Dynamic NID Configuration Details
Dynamic NID configuration in Lustre LNet allows Network Identifiers (NIDs) to be managed at runtime without restarting services, and it supports multi-rail, failover, and routing setups. A NID identifies a node's interface on an LNet network (e.g., 192.168.1.2@tcp0). The feature was enhanced in Lustre 2.17.0 (released 2025), which simplifies online LNet interface configuration (LU-18815). The information here is based on the Lustre Operations Manual (updated 2025), release notes, and JIRA tickets (e.g., LU-17431 for dynamic nodemaps). For full details, see the Lustre Operations Manual.
NID Basics
- Format: ADDR@NETTYPE (e.g., 192.168.1.2@tcp0, 10.13.24.90@o2ib1).
- Multi-NID: Comma-separated for same host (NID1,NID2@tcp); colon for failover (NID1:NID2@tcp).
- Display: lctl list_nids (shows all NIDs while LNet is running).
- Static Config: Module params in /etc/modprobe.d/lustre.conf (e.g., networks="tcp0(eth0)").
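The NID format above can be illustrated with a small parser. This is a minimal sketch, not part of Lustre: the names `Nid`, `parse_nid`, and `split_failover` are hypothetical, every NID is assumed to be fully qualified (ADDR@NETTYPE), and addresses are assumed to be IPv4 so that the colon can safely act as the failover separator.

```python
from typing import List, NamedTuple


class Nid(NamedTuple):
    addr: str      # address part, e.g. "192.168.1.2"
    net_type: str  # network type, e.g. "tcp" or "o2ib"
    net_num: int   # network number; "tcp" is treated as "tcp0"


def parse_nid(nid: str) -> Nid:
    """Split ADDR@NETTYPE into its parts (hypothetical helper)."""
    addr, sep, net = nid.strip().partition("@")
    if not sep or not addr or not net:
        raise ValueError(f"not a NID: {nid!r}")
    # Trailing digits are the network number; the rest is the type.
    net_type = net.rstrip("0123456789")
    if not net_type:
        raise ValueError(f"missing network type in {nid!r}")
    digits = net[len(net_type):]
    return Nid(addr, net_type, int(digits) if digits else 0)


def split_failover(spec: str) -> List[List[Nid]]:
    """Colon separates failover nodes; comma separates NIDs of one host.

    Assumes IPv4 addresses only, since IPv6 addresses contain colons.
    """
    return [[parse_nid(n) for n in host.split(",")] for host in spec.split(":")]
```

For example, `split_failover("10.0.0.1@tcp,10.0.1.1@o2ib:10.0.0.2@tcp")` yields two host groups, the first with two NIDs and the second with one.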
Dynamic Configuration with lnetctl (2.7+)
| Command | Example | Description |
|---|---|---|
| lnetctl net add | lnetctl net add --net tcp2 --if eth0,eth1 --peer_timeout 180 | Add network with multiple interfaces (non-unique names in 2.10+). |
| lnetctl peer add | lnetctl peer add --prim_nid 10.10.10.2@tcp --nid 10.10.3.3@tcp1,10.4.4.5@tcp2 | Add peer with multi-rail NIDs (2.10+). |
| lnetctl route add | lnetctl route add --net tcp2 --gateway 192.168.205.130@tcp1 --hop 2 --prio 1 | Add dynamic route; supports asymmetrical (2.13+). |
| lnetctl set routing | lnetctl set routing 1 | Enable routing. |
| lnetctl import | lnetctl import config.yaml | Batch import from YAML (2.11+). |
| lnetctl net show | lnetctl net show --verbose | Show networks with details. |
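When these commands are scripted, it can help to build the argument lists in one place and preview them before anything is executed. The sketch below is one possible approach, using only the flags that appear in the table; the helper names (`net_add_cmd`, `route_add_cmd`, `apply`) are hypothetical, and actually running the commands assumes `lnetctl` is installed and the caller has root privileges.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def net_add_cmd(net: str, interfaces: Sequence[str]) -> List[str]:
    """argv for `lnetctl net add` with one or more interfaces."""
    return ["lnetctl", "net", "add", "--net", net, "--if", ",".join(interfaces)]


def route_add_cmd(net: str, gateway: str, hop: Optional[int] = None) -> List[str]:
    """argv for `lnetctl route add`; --hop is only added when given."""
    cmd = ["lnetctl", "route", "add", "--net", net, "--gateway", gateway]
    if hop is not None:
        cmd += ["--hop", str(hop)]
    return cmd


def apply(cmds: Sequence[List[str]], dry_run: bool = True) -> List[str]:
    """Return the rendered command lines; execute them unless dry_run."""
    rendered = []
    for cmd in cmds:
        rendered.append(shlex.join(cmd))
        if not dry_run:
            # Raises CalledProcessError if lnetctl exits non-zero.
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    plan = [
        net_add_cmd("tcp2", ["eth0", "eth1"]),
        route_add_cmd("tcp2", "192.168.205.130@tcp1", hop=2),
    ]
    for line in apply(plan, dry_run=True):
        print(line)
```

Keeping `dry_run=True` as the default means a typo in a network name is visible in the printed plan before it reaches a live node.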
Dynamic Peer Discovery (2.11+)
# Enable discovery
lnetctl set discovery 1
# Discover peer
lnetctl discover PEER_NID
# Clear peer
lctl clear_peer NID
Discovery runs automatically via LNet pings; manually configured peers take precedence, and mismatches with the discovered information are reported as warnings.
LNet Health and Multi-Rail (2.10+)
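- Health Scores (2.12+): Each interface has a health value from 0 to 1000; tune with lnet_health_sensitivity=100, lnet_retry_count=2.
- Multi-Rail: Use multiple interfaces on the same network; traffic is load-balanced across them using round-robin and available credits.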
- Route Checks (2.13+): check_routers_before_use=1, alive_router_check_interval=60s.
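The effect of the health tunables can be illustrated with a toy model. This is a simplified sketch rather than LNet's actual selection algorithm: it assumes a failure subtracts the sensitivity value from the interface's health, that recovery adds a configurable step (the real increment is not stated here, so `step` is an assumption), and that the healthiest interface is preferred. The `Interface` class and `pick_interface` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

MAX_HEALTH = 1000  # upper bound of the 0-1000 health range


@dataclass
class Interface:
    name: str
    health: int = MAX_HEALTH

    def record_failure(self, sensitivity: int = 100) -> None:
        # Assumed model: each failure costs `sensitivity`, floored at 0.
        self.health = max(0, self.health - sensitivity)

    def record_recovery(self, step: int = 1) -> None:
        # Assumed model: each successful recovery ping adds `step`,
        # capped at MAX_HEALTH.
        self.health = min(MAX_HEALTH, self.health + step)


def pick_interface(interfaces: Sequence[Interface]) -> Interface:
    """Prefer the interface with the highest health (first one on ties)."""
    return max(interfaces, key=lambda ni: ni.health)


if __name__ == "__main__":
    eth0, eth1 = Interface("eth0"), Interface("eth1")
    eth0.record_failure()  # eth0 drops below eth1
    print(pick_interface([eth0, eth1]).name)
```

In this model a larger sensitivity makes a failing interface lose preference faster and, with a small recovery step, take longer to return to full health.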
Failover and NID Replacement
# Set failover
mkfs.lustre --servicenode=NID1:NID2@tcp
# Replace NIDs (2.11+)
lctl replace_nids fsname-OSTxxxx NID1,NID2@tcp
# Remove an OST from the configuration (2.16+)
lctl del_ost --target fsname-OSTxxxx
C-API for Programmatic Control
| Function | IOCTL | Description |
|---|---|---|
| lustre_lnet_config_net | IOC_LIBCFS_ADD_NET | Add network interface. |
| lustre_lnet_del_net | IOC_LIBCFS_DEL_NET | Delete network. |
| lustre_lnet_show_net | IOC_LIBCFS_GET_NET | Show network config. |
| lustre_lnet_config_route | - | Add route. |
Return: 0 success, -EINVAL invalid argument, -ENOMEM memory issues.
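The return convention (0 on success, a negated errno value on failure) can be decoded generically. The following sketch does not call the C library itself; `describe_rc` is a hypothetical helper that assumes the caller has already obtained the integer return value, for example through a binding or by parsing tool output.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Translate a 0 / negative-errno return code into readable text."""
    if rc == 0:
        return "success"
    if rc < 0:
        code = -rc
        # errno.errorcode maps numbers to symbolic names such as "EINVAL".
        name = errno.errorcode.get(code, "UNKNOWN")
        return f"{name}: {os.strerror(code)}"
    return f"unexpected positive return value {rc}"


if __name__ == "__main__":
    for rc in (0, -errno.EINVAL, -errno.ENOMEM):
        print(rc, "->", describe_rc(rc))
```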
Integration with Nodemaps (2.17+)
- Dynamic Nodemaps (LU-17431): Hierarchical, ephemeral for jobs; NID ranges must fit parent.
- Fileset Isolation: Restrict access by NID; up to 255 alternates.
# Add fileset
lctl nodemap_fileset_add --name NODEMAP --fileset /dir --alt --ro
# Modify
lctl nodemap_fileset_modify --name NODEMAP --fileset /dir --rw
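The rule that a dynamic nodemap's NID ranges must fit inside its parent's can be checked before a range is submitted. The sketch below is illustrative only: it represents a range as a hypothetical (first address, last address, network) tuple rather than Lustre's own range syntax, and it assumes IPv4 addresses.

```python
import ipaddress
from typing import Tuple

# Hypothetical representation: (first_addr, last_addr, net), e.g.
# ("192.168.1.1", "192.168.1.254", "tcp").
NidRange = Tuple[str, str, str]


def range_fits_parent(child: NidRange, parent: NidRange) -> bool:
    """True if the child range lies entirely within the parent range."""
    c_lo, c_hi, c_net = child
    p_lo, p_hi, p_net = parent
    if c_net != p_net:
        return False  # ranges on different networks never nest
    c_first, c_last = ipaddress.IPv4Address(c_lo), ipaddress.IPv4Address(c_hi)
    p_first, p_last = ipaddress.IPv4Address(p_lo), ipaddress.IPv4Address(p_hi)
    if c_first > c_last or p_first > p_last:
        raise ValueError("range start must not exceed range end")
    return p_first <= c_first and c_last <= p_last


if __name__ == "__main__":
    parent = ("192.168.1.1", "192.168.1.254", "tcp")
    print(range_fits_parent(("192.168.1.10", "192.168.1.20", "tcp"), parent))
    print(range_fits_parent(("192.168.2.10", "192.168.2.20", "tcp"), parent))
```

Running such a check in the job prolog that creates an ephemeral nodemap lets a bad range fail early, before any lctl command is issued.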
Best Practices
- Use unique NIDs; test with LNet self-tests.
- Prefer lnetctl for runtime; YAML for persistence.
- Monitor: lctl get_param nis, lnetctl peer show.
- Client/Server: Same tools; servers handle more NIDs (up to 32+ in 2.17).
For updates, check JIRA (e.g., LU-18815) or LUG presentations (2025 on 2.17+).