Lustre Debugging Tutorial
Lustre provides a comprehensive set of debugging tools for troubleshooting file-system issues, including an internal debugger, debug logs, configurable debug levels, buffer management, and a debug daemon. This tutorial covers Lustre 2.17.0 (January 2026) and is based on the Lustre Operations Manual (updated 2025); refer to the manual for full details. This expanded guide includes explanations for users with limited experience, best practices, warnings, and additional troubleshooting tips.
Introduction for Beginners
If you're new to Lustre debugging, understand that Lustre is a complex distributed filesystem, and issues can arise from network problems, metadata inconsistencies, or resource contention. Debugging tools help capture and analyze logs to identify root causes. Key concepts:
- Debug Buffer: A memory area in the kernel where Lustre stores log messages temporarily.
- Subsystems: Components like metadata client (mdc) or network (lnet) that generate specific logs.
- Message Types: Categories like errors or traces that control log verbosity.
- lctl: The primary command-line tool for controlling Lustre, including debugging.
- Prerequisites: Root access on Lustre nodes, mounted filesystem, and basic Linux knowledge (e.g., grep, perl).
Best Practices: Start with minimal debug levels to avoid overwhelming logs. Reproduce issues in a test environment before enabling on production.
Warning: High debug levels can impact performance or fill disks—monitor CPU/memory/disk usage. Always disable after debugging to prevent overhead.
Internal Debugger
The Lustre kernel debug logging captures debug messages from Lustre kernel modules (e.g., mds, ost, lnet, ldlm, ptlrpc, etc.) and stores them in a circular debug buffer in kernel memory. For beginners: This is like a flight recorder for the filesystem, logging events for later analysis.
Key Features
- Buffer Type: Circular, fixed-size memory buffer (per-CPU or global). Old messages are overwritten when full.
- Default Size: ~5 MB per CPU core (configurable). Increase for longer history.
- Message Format: Includes subsystem, debug mask, CPU ID, timestamp, stack size, PID, file:line:function, and the message itself. Example: kmalloced '*obj': 24 at a375571c. Timestamps help correlate events across nodes.
- Subsystems: mdc, mds, osc, ost, ldlm, ptlrpc, lnet, etc. Focus on the ones relevant to your issue (e.g., lnet for network problems).
Best Practices: Use markers (lctl mark "Start test") to bookmark logs. Sync node clocks with NTP for multi-node analysis.
Warning: Large buffers consume kernel memory—avoid exceeding available RAM to prevent OOM kills.
Message Types
Message types categorize log entries. Beginners: Start with error/warning for critical issues; add trace for detailed flows.
| Type | Description | Beginner Notes |
|---|---|---|
| trace | Function entry/exit | Verbose; use for step-by-step debugging but expect large logs. |
| inode | Inode operations | Useful for file creation/deletion issues. |
| info | General non-critical info | Low overhead; good starting point. |
| warning | Significant but non-fatal issues | Alerts to potential problems. |
| error | Critical errors | Must-investigate; often with error codes. |
| emerg | Fatal conditions | System may crash; check immediately. |
| neterror | LNet/network errors | For connectivity issues. |
| rpctrace | RPC request/reply tracing | Tracks client-server communications. |
| malloc | Memory allocation tracking (used with `leak_finder.pl`) | Enable only for leak hunting; high overhead. |
| ha | Failover and recovery events | For high-availability setups. |
| quota | Space accounting | For quota-related errors. |
| sec | Security handling | For permission/ACL issues. |
| iotrace | IO path tracing | For performance bottlenecks in data paths. |
Commands
| Command | Purpose | Beginner Notes |
|---|---|---|
| lctl debug_kernel FILENAME | Write buffer to FILENAME (ASCII or raw) or stdout | Use ASCII for readability; raw for tools like debug_file. |
| lctl clear | Clear kernel debug buffer | Do this before tests for clean logs. |
| lctl mark [TEXT] | Insert timestamped marker TEXT into the kernel debug log | Helps segment logs (e.g., "Before failure"). |
| lctl set_param debug=[+-]TYPE... | Enable or disable debug logging of TYPE messages | + adds, - removes; combine like +error+warning. |
| lctl set_param subsystem_debug=[+-]SUBSYS... | Enable or disable logging of SUBSYS messages | Target specific areas to reduce noise. |
| lctl debug_file INPUT OUTPUT | Convert binary INPUT debug log file dumped by kernel to text in OUTPUT file | Essential for analyzing daemon outputs. |
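The commands above can be chained into a short capture session. A minimal sketch (the file path and marker text are illustrative):

```shell
# Minimal capture session using the lctl commands from the table above.
lctl clear                                # start from an empty buffer
lctl mark "=== reproduce issue XYZ ==="   # bookmark the start of the test
# ... reproduce the problem here ...
lctl mark "=== done ==="                  # bookmark the end
lctl debug_kernel /tmp/lustre_debug.txt   # dump the buffer as ASCII text
grep LustreError /tmp/lustre_debug.txt    # scan for critical errors
```

Clearing first keeps the dump small, and the two markers make it easy to grep out only the relevant window.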
Reading Debug Logs
Debug logs are accessible via kernel buffer dumps and user-space tools. For beginners: Logs can be voluminous—use grep for keywords like "LustreError".
Access Methods
- Kernel Log: View via dmesg or /var/log/messages. Quick for recent errors.
- Debug Buffer Dump: lctl debug_kernel [filename] dumps the buffer to a file. Example: lctl debug_kernel /tmp/lustre_debug.bin. Use on all nodes for distributed issues.
- Console Output: Enable full console output with lctl set_param printk=-1. Disable rate limiting with options libcfs libcfs_console_ratelimit=0 in /etc/modprobe.d/lustre.conf. Reboot or reload modules after changes.
- Log Path: Controlled by the lnet.debug_path parameter (default: /tmp/lustre-log). Use persistent storage like /var/log.
- Request History (PtlRPC): Stores RPC history. Parameters: req_buffer_history_len, req_buffer_history_max, req_history (sequence, NIDs, xid, etc.). Phases: New, Interpret, Complete. Useful for RPC timeouts.
Best Practices: Dump logs immediately after issues to avoid overwrite. Compress large files (e.g., gzip).
Warning: Frequent dumps can add I/O load—schedule during low activity.
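For distributed issues, the same dump can be triggered on every node at once. A sketch assuming passwordless root ssh and a hypothetical nodes.txt file listing one hostname per line:

```shell
# Dump the debug buffer on every node listed in nodes.txt (assumed file).
while read -r node; do
    ssh "$node" "lctl debug_kernel /tmp/lustre_debug.\$(hostname).txt" &
done < nodes.txt
wait
# Collect the dumps for side-by-side analysis (NTP-synced clocks assumed)
mkdir -p logs
while read -r node; do
    scp "$node:/tmp/lustre_debug.*.txt" logs/
done < nodes.txt
```

Embedding the hostname in each filename keeps the per-node dumps distinct once they are gathered in one directory.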
Analysis Tools
- leak_finder.pl: Parses "malloc" debug messages for mismatched allocation/free pairs to find memory leaks. Example: perl leak_finder.pl /tmp/debug.log. See the detailed section below.
Changing Debugging Levels
Debug verbosity is controlled via global and subsystem-specific masks. Beginners: Levels range from silent (0) to very detailed (full); start low and increase as needed.
Global Debug Mask
lctl set_param debug=[+-]TYPE
- 0: Disable all debugging.
- all: Full debugging (high overhead).
- TYPE: Log only TYPE messages (e.g., neterror, trace).
- -TYPE: Disable TYPE messages (e.g., -neterror -trace).
Default includes warning, error, emerg, ha, config, console.
Best Practices: Use + for additive enabling; test in isolation.
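In practice the additive form looks like this; a sketch that also shows cleaning up afterwards:

```shell
# Add message types to the current mask without replacing it.
lctl set_param debug=+neterror+rpctrace
lctl get_param debug          # verify which types are now active
# ... run the test ...
# Remove the extra types again to avoid lingering overhead
lctl set_param debug=-neterror-rpctrace
```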
Subsystem-Specific Debug
lctl set_param subsystem_debug=mds
Levels
| Level | Activation | Beginner Notes |
|---|---|---|
| 0 | No messages | Silent mode for production. |
| 1 | Critical errors | Minimal logging. |
| 2 | Warnings + errors | Balance for monitoring. |
| 3+ | Detailed tracing (function calls, RPCs) | High detail; use briefly. |
Targeting Different Subsystems
Subsystems allow focused debugging. Beginners: Choose based on symptoms (e.g., mds for metadata slowness).
| Subsystem | Role | Key Debug Parameters | Categories/Masks (Examples) | Activation Examples | Beginner Notes |
|---|---|---|---|---|---|
| mdc (Metadata Client) | Client-side metadata ops (create, unlink, getattr) | mdc.debug, mdc_rpc.debug, mdc_request.debug | +all, +rpctrace, +trace | lctl set_param debug=+all | Start here for client file ops issues. |
| mds (Metadata Server) | Server-side metadata (layout, locks, recovery) | mds.debug, mds_request.debug, mds_reint.debug | +rpctrace, +dlmtrace, +inode | lctl set_param debug=+inode | For server-side bottlenecks. |
| osc (Object Storage Client) | Client-side I/O to OSTs | osc.debug, osc_request.debug, osc_io.debug | +iotrace, +rpctrace | lctl set_param debug=+iotrace | Data read/write problems. |
| ost (Object Storage Target) | Server-side data storage | ost.debug, ost_io.debug, ost_create.debug | +rpctrace, +dlmtrace, +inode | lctl set_param ost.debug=+rpctrace | Storage server issues. |
| ldlm (Lustre Distributed Lock Manager) | Manages distributed locks | ldlm.debug, ldlm_enqueue.debug, ldlm_cancel.debug | +dlmtrace, +rpctrace | lctl set_param debug=+dlmtrace | Lock contention or deadlocks. |
| ptlrpc (Portal RPC) | RPC communication layer | ptlrpc.debug, ptlrpc_request.debug, ptlrpc_reply.debug | +rpctrace, +neterror, +trace | lctl set_param debug=+rpctrace | Communication failures. |
| lnet (Lustre Network) | Network routing & communication | lnet.debug, lnet_ni.debug, lnet_router.debug | +neterror | lctl set_param debug=+neterror | Network-specific errors. |
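As a concrete use of the table above, a client-side metadata investigation might narrow logging like this (a sketch; remember to undo the settings afterwards):

```shell
# Focus on the metadata client (mdc) with RPC tracing.
lctl set_param subsystem_debug=+mdc   # log only mdc activity
lctl set_param debug=+rpctrace        # trace client-server RPCs
lctl clear                            # start with an empty buffer
# ... run the failing metadata operation (stat, create, unlink) ...
lctl debug_kernel /tmp/mdc_trace.txt
lctl set_param debug=-rpctrace        # remove the extra type when done
```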
View & List
lctl get_param debug
lctl debug_list types
lctl debug_list subs
Permanent Settings
lctl set_param -P debug=+malloc
Warning: Permanent settings (-P) apply cluster-wide—test first without -P.
Basic Debugging Settings
Maximum Debug Buffer Size
lctl set_param debug_mb=1024 # 1 GiB total (value is in MB)
Default: ~5 MB per CPU core. Buffer wraps on overflow.
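Because the default scales with the CPU count, the buffer can be sized from the number of cores. A sketch; the 10 MB-per-core figure is an arbitrary assumption, and the lctl call only runs where Lustre is loaded:

```shell
# Size the debug buffer at roughly 10 MB per core (heuristic, not an official rule).
cores=$(nproc)
size_mb=$((cores * 10))
echo "debug_mb=${size_mb}"
# Apply only if lctl is available (i.e., this is a Lustre node)
if command -v lctl >/dev/null 2>&1; then
    lctl set_param debug_mb="${size_mb}"
fi
```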
Console Rate Limiting
Disable: Add to /etc/modprobe.d/lustre.conf: options libcfs libcfs_console_ratelimit=0. Reload modules.
Panic, Log Dump, and Upcall on LBUG
lctl set_param panic_on_lbug=1
lctl set_param debug_log_upcall=/path/to/script
Upcall script can automate dumps on errors.
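A minimal upcall script could compress and archive each dumped log. A sketch: the assumption that the dump file path arrives as the first argument, and the LUSTRE_DUMP_DIR override, are illustrative, so check your version's upcall convention:

```shell
#!/bin/sh
# Example debug_log_upcall handler (sketch).
# Assumption: Lustre passes the dumped log file path as $1.
LOG="$1"
# Archive location; override via LUSTRE_DUMP_DIR, use persistent storage in production.
ARCHIVE_DIR="${LUSTRE_DUMP_DIR:-/tmp/lustre-dumps}"
mkdir -p "$ARCHIVE_DIR"
if [ -f "$LOG" ]; then
    # Timestamped gzip copy so repeated dumps do not overwrite each other
    gzip -c "$LOG" > "$ARCHIVE_DIR/$(basename "$LOG").$(date +%s).gz"
fi
```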
Debug Daemon
The debug daemon is a userspace process that continuously dumps the kernel debug buffer to a file, preventing overflow and providing persistent logs. It runs as a background daemon, flushing the buffer at regular intervals or on demand. For beginners: Think of it as a continuous logger to capture long sessions without losing data.
Why Use It
- Persistent logging beyond volatile kernel buffers.
- Critical for long-running sessions, recovery analysis, or high-volume tracing.
- Prevents data loss ("Trace buffer full").
When to Use It
- During troubleshooting, failover testing, or performance debugging.
- When kernel buffer is insufficient (e.g., large clusters).
- Post-mortem analysis after a crash or after the issue has been reproduced.
Best Practices: Run on all relevant nodes (clients/servers). Use large size limits for extended tests.
How to Set It Up
| Command | Description | Beginner Notes |
|---|---|---|
| lctl debug_daemon start <filename> [MB] | Start logging to file (e.g., start /var/log/lustre.bin 40). The optional megabytes parameter limits the file size; daemon overwrites old logs if size exceeded. | Use binary (.bin) for efficiency; decode later. |
| lctl debug_daemon stop | Stop and flush final buffer to file. | Always stop to ensure complete logs. |
| lctl debug_daemon dump | Manual flush without stopping. | Useful mid-test. |
| lctl debug_file <input> <output> | Decode binary log to text (e.g., lctl debug_file lustre.bin lustre.txt). | Text files are grep-friendly. |
Behavior: Runs as kernel thread; auto-stops on shutdown. If file exists, overwrites. Use on servers/clients for distributed debugging.
Warning: Large files can fill disks—monitor with df; use rotation or limits.
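To guard against the disk-filling risk, a small watcher can stop the daemon when space runs low. A sketch; the 90% threshold is arbitrary and df --output requires GNU coreutils:

```shell
# Stop the debug daemon if /var/log usage exceeds 90% (arbitrary threshold).
usage=$(df --output=pcent /var/log | tail -1 | tr -dc '0-9')
if [ "$usage" -gt 90 ]; then
    # Guarded so the script is a no-op on nodes without Lustre loaded
    command -v lctl >/dev/null 2>&1 && lctl debug_daemon stop
    echo "debug daemon stopped: /var/log at ${usage}%"
fi
```

Run it from cron or a loop alongside long captures.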
Troubleshooting the Debug Daemon
- Daemon Not Starting: Check that Lustre modules are loaded (lsmod | grep lustre). Ensure root privileges. Verify path permissions: mkdir -p /var/log/lustre; chmod 755 /var/log/lustre.
- No Output in File: Confirm debug levels are set (lctl get_param debug); if 0, set to +trace or similar. Test with lctl mark "Test" to insert messages. Ensure the daemon is running: ps aux | grep lctl.
- File Not Written/Empty: Use absolute paths. Check disk space: df -h /var/log. If the size limit is too small, increase the [MB] argument.
- Buffer Overflow Before Dump: Increase debug_mb: lctl set_param debug_mb=200. Flush more often with manual dumps.
- Decode Fails: Ensure the input is binary (file input.bin should show "data"). Retry lctl debug_file input.bin output.txt. If corrupt, restart the daemon.
- Daemon Stuck/Hanging: Kill the process: pkill -f "lctl debug_daemon". Check logs for errors: dmesg | grep lustre.
- High Overhead: Limit subsystems/types to reduce volume. Run on specific nodes only.
- Rotation Issues: If files are not rotating, check the size limit; unlimited mode appends without rotation.
Example Workflow
# Enable debug
lctl set_param subsystem_debug=mds debug=+neterror
# Start daemon
lctl debug_daemon start /var/log/lustre_debug 1024
# Trigger issue
# Stop and decode
lctl debug_daemon stop
lctl debug_file /var/log/lustre_debug /var/log/lustre_debug.txt
# Analyze
grep "LustreError" /var/log/lustre_debug.txt
perl leak_finder.pl /var/log/lustre_debug.txt
Expanded Debug Daemon Examples
Example 1: Debugging MDS Recovery
# On MDS: Set debug for recovery
lctl set_param debug=+ha
# Start daemon with a 200 MB limit writing to tmpfs
lctl debug_daemon start /tmp/mds_recovery.bin 200
# Simulate recovery (e.g., unmount and remount MDT)
# Dump manually during process
lctl debug_daemon dump
# Stop after the issue has been reproduced
lctl debug_daemon stop
lctl debug_file /tmp/mds_recovery.bin /tmp/mds_recovery.txt
# Analyze
grep "recovery" /tmp/mds_recovery.txt
Example 2: Multi-Node Network Debugging
# On Client: Debug LNet/RPC
lctl set_param subsystem_debug=+lnet+ptlrpc
lctl set_param debug=+neterror+rpctrace+trace
# Start daemon
lctl debug_daemon start /tmp/client_net.bin 2048
# On Server: Mirror debug
lctl set_param subsystem_debug=+lnet+ptlrpc
lctl set_param debug=+neterror+rpctrace+trace
lctl debug_daemon start /tmp/server_net.bin 2048
# Run I/O test (e.g., dd on client)
# Stop both, decode, compare timestamps
Example 3: Memory Leak Hunting
# Enable malloc tracking
lctl set_param debug=+malloc
# Start daemon
lctl debug_daemon start /tmp/memleak.bin 1024
# Run workload (e.g., create/delete files loop)
# Stop and analyze
lctl debug_daemon stop
lctl debug_file /tmp/memleak.bin /tmp/memleak.txt
perl leak_finder.pl /tmp/memleak.txt
Example 4: Long-Running Performance Debug
# Restrict logging to I/O tracing to keep overhead low
lctl set_param debug=iotrace debug_mb=1024
# Start daemon with a large size limit (no automatic rotation)
lctl debug_daemon start /var/log/perf_debug.bin 20480 # Larger limit for long runs
# Run benchmark (e.g., IOR)
# Manual dump mid-run
lctl debug_daemon dump
# Stop at end
lctl debug_daemon stop
leak_finder.pl Usage
leak_finder.pl is a Perl script located in lustre/tests/ that analyzes debug logs captured with +malloc enabled and detects memory leaks by matching kmalloced/kfreed pairs. It reports unpaired allocations as potential leaks, grouped by call site or function. Use it after reproducing the issue and capturing logs with malloc tracing to identify leaks in kernel modules. For beginners: Memory leaks occur when allocated memory isn't freed, leading to exhaustion over time.
Preparing for Usage
# Enable malloc tracing
lctl set_param debug=+malloc
# Generate log (e.g., via debug daemon)
lctl debug_daemon start /tmp/leak_log.bin 2048
# Run suspected leaky code
lctl debug_daemon stop
lctl debug_file /tmp/leak_log.bin /tmp/leak_log.txt
Best Practices: Run workloads that stress memory (e.g., repeated allocations). Clear buffer before starting.
Warning: +malloc adds significant overhead—use only in testing; disable in production.
Running the Script
perl /path/to/lustre/tests/leak_finder.pl /tmp/leak_log.txt
Options:
- --by-func: Group leaks by function name.
- --help: Show usage.
Example Output: Lists allocations without frees, with addresses, sizes, and call sites (e.g., "Unmatched kmalloc at obd_alloc: 1024 bytes").
Expanded leak_finder.pl Examples
Example 1: Basic Leak Detection
# Run script
perl leak_finder.pl /tmp/leak_log.txt
# Output example:
Unmatched kmallocs:
obd_alloc: 2048 bytes at 0x12345678 (called from function1)
lustre_inode_alloc: 1024 bytes at 0x87654321 (called from function2)
Example 2: Group by Function
perl leak_finder.pl --by-func /tmp/leak_log.txt
# Output example:
Leaks by function:
function1: 3 allocations, 12048 bytes
function2: 2 allocations, 200 bytes
Example 3: Long-Run Analysis
# Start daemon with malloc
lctl set_param debug=+malloc
lctl debug_daemon start /tmp/long_leak.bin 200
# Run extended workload
# Stop and analyze
lctl debug_daemon stop
lctl debug_file /tmp/long_leak.bin /tmp/long_leak.txt
perl leak_finder.pl --by-func /tmp/long_leak.txt
Example 4: Troubleshooting No Leaks Reported
# If no output: Verify +malloc enabled
lctl get_param debug # Output should include "malloc"
# Rerun workload, ensure logs capture full run
perl leak_finder.pl /tmp/leak_log.txt # If empty, increase buffer size or use daemon
Notes: Higher levels increase overhead; disable in production.
Additional Resources and Troubleshooting
For more advanced debugging:
- Lustre Manual: Debugging Section.
- Wiki: Debugging Tips.
- Tools: Use strace for user-space calls, wireshark for network, crash for kernel dumps.
- Jobstats (2.15+): Monitor per-job I/O with lctl get_param *.*.job_stats; enable via lctl set_param -P jobid_var=procname_uid.
- Common Pitfalls: Forgetting to decode binary logs leads to unreadable files. High levels can cause "Trace buffer full"; increase debug_mb.
- Community: Join lustre-devel mailing list or LUG for help.
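The Jobstats bullet above can be sketched as a quick check; the mdt.* and obdfilter.* parameter paths are the common server-side forms, but they vary by target type:

```shell
# Enable job-level statistics keyed by process name + UID (cluster-wide, persistent).
lctl set_param -P jobid_var=procname_uid
# Read per-job I/O statistics on the servers.
lctl get_param mdt.*.job_stats        # metadata operations per job (MDS)
lctl get_param obdfilter.*.job_stats  # read/write statistics per job (OSS)
```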
If logs show "LustreError: went back in time", check disk caches. For persistent issues, use LFSCK for consistency checks.