Lustre LNet Networking Details

LNet (Lustre Networking) is the kernel-level networking infrastructure for Lustre, providing abstracted message passing over various networks like TCP/IP, InfiniBand, and Omni-Path. It handles routing, failover, load balancing, and high availability for clients, servers, and routers. This guide is based on Lustre 2.17.0 (January 2026), incorporating features like dynamic NID configuration and enhanced multi-rail (LMR). For recent JIRA updates, see LU-19763 (TCP zerocopy) and ongoing upstreaming efforts. Always refer to the Lustre Manual for full details.

Core Concepts

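| Concept | Description |
| --- | --- |
| NID (Network Identifier) | Format: <IP or hostname>@<LNet-label> (e.g., 192.168.1.10@tcp0). Uniquely identifies one network endpoint of a node; a node with several interfaces or networks has several NIDs. |
| LNet Label | Format: <LND><number> (e.g., tcp0, o2ib0). Names a network of a given type; an omitted number means 0, so tcp is the same network as tcp0. |
| Router | Intermediate node that forwards messages between LNet networks. |
| Peer | Remote node known by its NID(s); LNet tracks credits, health, and reference counts for each peer. |
| Credits | Flow-control counters: send credits (tx), router credits (rtr), and router buffer credits (peer_buffer_credits). |
| Portal | Indexed entry in the LNet portal table. Incoming messages are matched against the match entries and memory descriptors attached to a portal and then delivered to upper layers (e.g., ptlrpc). It is a matching structure, not a kernel thread. |
| CPT (CPU Partition) | Groups CPU cores into partitions so that message handling is spread across cores with good affinity and balance. |
| Multi-Rail (LMR) | Uses multiple interfaces/NIDs between the same nodes for aggregate bandwidth and redundancy (enhanced in 2.17). |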
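
The NID and label formats above can be illustrated with a small parser. This is a minimal sketch, not LNet's own parsing code: the Nid type and parse_nid function are hypothetical names introduced here, and the sketch assumes IPv4 or hostname addresses only.

```python
from typing import NamedTuple


class Nid(NamedTuple):
    """Hypothetical container for the parts of a NID string."""
    address: str  # IP address or hostname before the '@'
    lnd: str      # network type, e.g. "tcp" or "o2ib"
    net_num: int  # network number; an omitted number means 0


def parse_nid(nid: str) -> Nid:
    """Split '<address>@<LND><number>' into its three parts."""
    address, sep, label = nid.partition("@")
    if not sep or not address or not label:
        raise ValueError(f"not a NID: {nid!r}")
    # The network type is the label without its trailing digits,
    # so "o2ib0" yields "o2ib" (the embedded '2' is kept).
    lnd = label.rstrip("0123456789")
    if not lnd:
        raise ValueError(f"missing network type in {nid!r}")
    num = label[len(lnd):]
    return Nid(address, lnd, int(num) if num else 0)
```

With this sketch, parse_nid("10.0.0.1@tcp") and parse_nid("10.0.0.1@tcp0") return equal values, which mirrors the rule that tcp and tcp0 name the same network.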

Features

Modules and Supported Drivers

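| Module/Driver | Role | Supported Networks |
| --- | --- | --- |
| lnet | Core messaging, routing, credits. | All |
| libcfs | Kernel services, CPT, memory. | All |
| ksocklnd | TCP/IP driver (network type tcp). | Ethernet, IPoIB |
| ko2iblnd | RDMA driver (network type o2ib). | InfiniBand, Omni-Path |
| kgnilnd | Gemini/Aries driver (network type gni). | HPC fabrics |
| kralnd | RapidArray driver (network type ra). | Specialized |
| kqswlnd | Quadrics driver (network type elan, legacy). | Legacy HPC |
| lnet_selftest | Testing framework. | All |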

Load modules: modprobe libcfs; modprobe lnet; modprobe <LND>. Clients and servers use the same modules, but servers often need a high-bandwidth LND such as ko2iblnd (the o2ib driver).

Configuration

Module Parameters (/etc/modprobe.d/lustre.conf)

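# Use either networks or ip2nets, not both; LNet rejects a configuration that sets both.
options lnet networks="tcp0(eth0),o2ib0(ib0)"
# Alternative to networks: choose the network by matching the node's IP address.
# options lnet ip2nets="tcp0(eth0) 192.168.0.[2,4]"
# Routes: reach tcp0 through gateways on o2ib0, and o2ib0 through gateways on tcp0.
options lnet routes="tcp0 132.6.1.[1-8]@o2ib0; o2ib0 192.168.0.[1-8]@tcp0"
options ksocklnd credits=256 peer_credits=8
# The InfiniBand LND kernel module is named ko2iblnd.
options ko2iblnd conns_per_peer=4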

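Apply: module parameters are read when the modules load, so unmount Lustre and reload the modules, e.g. lustre_rmmod; modprobe lnet; lctl network up. A plain modprobe -r lnet typically fails while an LND module is still loaded on top of lnet.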
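
The bracket expressions in the ip2nets and routes values above denote sets of addresses: [2,4] is a list and [1-8] is an inclusive range. As a minimal sketch of how such a pattern expands, assuming only those two forms (wildcards and any other syntax LNet may accept are deliberately not handled), with hypothetical helper names:

```python
import itertools
from typing import List


def expand_field(spec: str) -> List[int]:
    """Expand one dotted field: '7', '[2,4]' or '[1-8]'."""
    if not (spec.startswith("[") and spec.endswith("]")):
        return [int(spec)]
    values: List[int] = []
    for part in spec[1:-1].split(","):
        lo, sep, hi = part.partition("-")
        if sep:
            # Inclusive range such as 1-8.
            values.extend(range(int(lo), int(hi) + 1))
        else:
            # Single list member such as 2.
            values.append(int(lo))
    return values


def expand_pattern(pattern: str) -> List[str]:
    """Return every dotted address matched by an ip2nets-style pattern."""
    fields = [expand_field(f) for f in pattern.split(".")]
    return [".".join(map(str, combo)) for combo in itertools.product(*fields)]
```

Expanding 132.6.1.[1-8] this way yields eight gateway addresses, 132.6.1.1 through 132.6.1.8, which is a quick way to check that a routes line names the gateways you intended.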

Runtime with lnetctl (2.7+)

# Initialize
lnetctl lnet configure

# Add network
lnetctl net add --net tcp2 --if eth0,eth1 --peer_timeout 180

# Add peer (multi-rail)
lnetctl peer add --prim_nid 10.10.10.2@tcp --nid 10.10.3.3@tcp1,10.4.4.5@tcp2

# Add route
lnetctl route add --net tcp2 --gateway 192.168.205.130@tcp1 --hop 2 --prio 1

# Enable routing
lnetctl set routing 1

# YAML import
lnetctl import config.yaml
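
When the same runtime steps must be repeated on many nodes, it can help to generate the command lines instead of typing them. The following is a sketch under stated assumptions: the function names are hypothetical, only flags that appear in the commands above are used, and nothing is executed; the functions just build argument lists for a dry run.

```python
import shlex
from typing import List, Optional, Sequence


def route_add_cmd(net: str, gateway: str,
                  hop: Optional[int] = None,
                  prio: Optional[int] = None) -> List[str]:
    """Build the argv for 'lnetctl route add' (dry run, nothing is executed)."""
    cmd = ["lnetctl", "route", "add", "--net", net, "--gateway", gateway]
    if hop is not None:
        cmd += ["--hop", str(hop)]
    if prio is not None:
        cmd += ["--prio", str(prio)]
    return cmd


def peer_add_cmd(prim_nid: str, nids: Sequence[str]) -> List[str]:
    """Build the argv for 'lnetctl peer add' with a comma-separated NID list."""
    return ["lnetctl", "peer", "add",
            "--prim_nid", prim_nid, "--nid", ",".join(nids)]


def as_shell(cmd: Sequence[str]) -> str:
    """Render an argv list as a shell-safe command line for review."""
    return shlex.join(cmd)
```

Printing as_shell(route_add_cmd("tcp2", "192.168.205.130@tcp1", hop=2, prio=1)) reproduces the route command shown above. On a node where LNet is configured, the same list could be handed to subprocess.run(cmd, check=True).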

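For dynamic NID configuration (2.17+), programmatic changes go through the liblnetconfig C API, e.g., lustre_lnet_config_net("tcp2", "eth0", 0, seq, &err). The call is shown in abbreviated form: function names and argument lists in this API differ between releases, so check liblnetconfig.h for the installed version before using it.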

Routing and Multi-Rail

Health Monitoring

Health value: each interface carries a value from 0 to 1000. A failure lowers it by health_sensitivity, and successful recovery pings raise it again. Set sensitivity: lnetctl set health_sensitivity 100. View: lnetctl net show -v 3.
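
The arithmetic of the health value can be sketched as follows. This is an illustrative model only, not LNet code: it assumes each failure subtracts the sensitivity, each successful recovery ping adds a fixed step of 1 (an assumption made for the example), and the value is clamped to the 0-1000 range given above.

```python
MAX_HEALTH = 1000  # upper bound of the health range


def on_failure(health: int, sensitivity: int = 100) -> int:
    """Lower the health value after a failure, never going below 0."""
    return max(0, health - sensitivity)


def on_recovery_ping(health: int, step: int = 1) -> int:
    """Raise the health value after a successful ping, capped at MAX_HEALTH.

    The step of 1 is an assumption for illustration.
    """
    return min(MAX_HEALTH, health + step)


def simulate(failures: int, pings: int, sensitivity: int = 100) -> int:
    """Apply a run of failures followed by a run of successful pings."""
    health = MAX_HEALTH
    for _ in range(failures):
        health = on_failure(health, sensitivity)
    for _ in range(pings):
        health = on_recovery_ping(health)
    return health
```

Under these assumptions, three failures at sensitivity 100 take an interface from 1000 to 700, and it needs 300 successful pings to return to full health. This asymmetry is why a higher sensitivity makes LNet steer away from a flaky interface for longer.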

Discovery and Self-Test

Client vs. server: clients mainly use multi-rail for bandwidth, while servers rely on routing for high availability. Recent change: TCP zerocopy by default (LU-19763, 2026) improves performance.