Lustre Debugging Tutorial
Lustre provides a comprehensive set of debugging tools for troubleshooting file-system issues, including an internal debugger, debug logs, configurable debug levels, buffer management, and a debug daemon. This tutorial covers Lustre 2.17.0 (January 2026) and is based on the Lustre Operations Manual (updated 2025); refer to the manual for full details. This expanded guide includes explanations for users with limited experience, best practices, warnings, and additional troubleshooting tips.
Introduction for Beginners
If you're new to Lustre debugging, understand that Lustre is a complex distributed filesystem, and issues can arise from network problems, metadata inconsistencies, or resource contention. Debugging tools help capture and analyze logs to identify root causes. Key concepts:
- Debug Buffer: A memory area in the kernel where Lustre stores log messages temporarily.
- Subsystems: Components like metadata client (mdc) or network (lnet) that generate specific logs.
- Message Types: Categories like errors or traces that control log verbosity.
- lctl: The primary command-line tool for controlling Lustre, including debugging.
- Prerequisites: Root access on Lustre nodes, mounted filesystem, and basic Linux knowledge (e.g., grep, perl).
Best Practices: Start with minimal debug levels to avoid overwhelming logs. Reproduce issues in a test environment before enabling on production.
Warning: High debug levels can impact performance or fill disks—monitor CPU/memory/disk usage. Always disable after debugging to prevent overhead.
Internal Debugger
The Lustre kernel debug logging captures debug messages from Lustre kernel modules (e.g., mds, ost, lnet, ldlm, ptlrpc, etc.) and stores them in a circular debug buffer in kernel memory. For beginners: This is like a flight recorder for the filesystem, logging events for later analysis.
Key Features
- Buffer Type: Circular, fixed-size memory buffer (per-CPU or global). Old messages are overwritten when full.
- Default Size: ~5 MB per CPU core (configurable). Increase for longer history.
- Message Format: Includes subsystem, debug mask, CPU ID, timestamp, stack size, PID, file:line:function, and the message itself. Example: kmalloced '*obj': 24 at a375571c. Timestamps help correlate events across nodes.
- Subsystems: mdc, mds, osc, ost, ldlm, ptlrpc, lnet, etc. Focus on the ones relevant to your issue (e.g., lnet for network problems).
Best Practices: Use markers (lctl mark "Start test") to bookmark logs. Sync node clocks with NTP for multi-node analysis.
Warning: Large buffers consume kernel memory—avoid exceeding available RAM to prevent OOM kills.
Message Types
Message types categorize log entries. Beginners: Start with error/warning for critical issues; add trace for detailed flows.
| Type | Description | Beginner Notes |
|---|---|---|
| trace | Function entry/exit | Verbose; use for step-by-step debugging but expect large logs. |
| inode | Inode operations | Useful for file creation/deletion issues. |
| info | General non-critical info | Low overhead; good starting point. |
| warning | Significant but non-fatal issues | Alerts to potential problems. |
| error | Critical errors | Must-investigate; often with error codes. |
| emerg | Fatal conditions | System may crash; check immediately. |
| neterror | LNet/network errors | For connectivity issues. |
| rpctrace | RPC request/reply tracing | Tracks client-server communications. |
| malloc | Memory allocation tracking (used with `leak_finder.pl`) | Enable only for leak hunting; high overhead. |
| ha | Failover and recovery events | For high-availability setups. |
| quota | Space accounting | For quota-related errors. |
| sec | Security handling | For permission/ACL issues. |
| iotrace | IO path tracing | For performance bottlenecks in data paths. |
Commands
| Command | Purpose | Beginner Notes |
|---|---|---|
| lctl debug_kernel FILENAME | Write buffer to FILENAME (ASCII or raw) or stdout | Use ASCII for readability; raw for tools like debug_file. |
| lctl clear | Clear kernel debug buffer | Do this before tests for clean logs. |
| lctl mark [TEXT] | Insert timestamped marker TEXT into the kernel debug log | Helps segment logs (e.g., "Before failure"). |
| lctl set_param debug=[+-]TYPE... | Enable or disable debug logging of TYPE messages | + adds, - removes; combine like +error+warning. |
| lctl set_param subsystem_debug=[+-]SUBSYS... | Enable or disable logging of SUBSYS messages | Target specific areas to reduce noise. |
| lctl debug_file INPUT OUTPUT | Convert binary INPUT debug log file dumped by kernel to text in OUTPUT file | Essential for analyzing daemon outputs. |
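The commands above can be chained into a short capture session. A minimal sketch (the file path and marker text are illustrative):

```shell
# Minimal capture session using the lctl commands from the table above.
lctl clear                                # start from an empty buffer
lctl mark "=== reproduce issue XYZ ==="   # bookmark the start of the test
# ... reproduce the problem here ...
lctl mark "=== done ==="                  # bookmark the end
lctl debug_kernel /tmp/lustre_debug.txt   # dump the buffer as ASCII text
grep LustreError /tmp/lustre_debug.txt    # scan for critical errors
```

Clearing first keeps the dump small, and the two markers make it easy to grep out only the relevant window.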
Reading Debug Logs
Debug logs are accessible via kernel buffer dumps and user-space tools. For beginners: Logs can be voluminous—use grep for keywords like "LustreError".
Access Methods
- Kernel Log: View via dmesg or /var/log/messages. Quick for recent errors.
- Debug Buffer Dump: lctl debug_kernel [filename] dumps the buffer to a file. Example: lctl debug_kernel /tmp/lustre_debug.bin. Use on all nodes for distributed issues.
- Console Output: Enable full console output with lctl set_param printk=-1. Disable rate limiting with options libcfs libcfs_console_ratelimit=0 in /etc/modprobe.d/lustre.conf. Reboot or reload modules after changes.
- Log Path: Controlled by the lnet.debug_path parameter (default: /tmp/lustre-log). Use persistent storage like /var/log.
- Request History (PtlRPC): Stores RPC history. Parameters: req_buffer_history_len, req_buffer_history_max, req_history (sequence, NIDs, xid, etc.). Phases: New, Interpret, Complete. Useful for RPC timeouts.
Best Practices: Dump logs immediately after issues to avoid overwrite. Compress large files (e.g., gzip).
Warning: Frequent dumps can add I/O load—schedule during low activity.
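For distributed issues, the same dump can be triggered on every node at once. A sketch assuming passwordless root ssh and a hypothetical nodes.txt file listing one hostname per line:

```shell
# Dump the debug buffer on every node listed in nodes.txt (assumed file).
while read -r node; do
    ssh "$node" "lctl debug_kernel /tmp/lustre_debug.\$(hostname).txt" &
done < nodes.txt
wait
# Collect the dumps for side-by-side analysis (NTP-synced clocks assumed)
mkdir -p logs
while read -r node; do
    scp "$node:/tmp/lustre_debug.*.txt" logs/
done < nodes.txt
```

Embedding the hostname in each filename keeps the per-node dumps distinct once they are gathered in one directory.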
Analysis Tools
- leak_finder.pl: Parses "malloc" debug messages for mismatched allocation/free pairs to find memory leaks. Example: perl leak_finder.pl /tmp/debug.log. See the detailed section below.
Changing Debugging Levels
Debug verbosity is controlled via global and subsystem-specific masks. Beginners: Levels range from silent (0) to very detailed (full); start low and increase as needed.
Global Debug Mask
lctl set_param debug=[+-]TYPE
- 0: Disable all debugging.
- all: Full debugging (high overhead).
- TYPE: Log only TYPE messages (e.g., neterror, trace).
- -TYPE: Disable TYPE messages (e.g., -neterror -trace).
Default includes warning, error, emerg, ha, config, console.
Best Practices: Use + for additive enabling; test in isolation.
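In practice the additive form looks like this; a sketch that also shows cleaning up afterwards:

```shell
# Add message types to the current mask without replacing it.
lctl set_param debug=+neterror+rpctrace
lctl get_param debug          # verify which types are now active
# ... run the test ...
# Remove the extra types again to avoid lingering overhead
lctl set_param debug=-neterror-rpctrace
```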
Subsystem-Specific Debug
lctl set_param subsystem_debug=mds
Levels
| Level | Activation | Beginner Notes |
|---|---|---|
| 0 | No messages | Silent mode for production. |
| 1 | Critical errors | Minimal logging. |
| 2 | Warnings + errors | Balance for monitoring. |
| 3+ | Detailed tracing (function calls, RPCs) | High detail; use briefly. |
Targeting Different Subsystems
Subsystems allow focused debugging. Beginners: Choose based on symptoms (e.g., mds for metadata slowness).
| Subsystem | Role | Key Debug Parameters | Categories/Masks (Examples) | Activation Examples | Beginner Notes |
|---|---|---|---|---|---|
| mdc (Metadata Client) | Client-side metadata ops (create, unlink, getattr) | mdc.debug, mdc_rpc.debug, mdc_request.debug | +all, +rpctrace, +trace | lctl set_param debug=+all | Start here for client file ops issues. |
| mds (Metadata Server) | Server-side metadata (layout, locks, recovery) | mds.debug, mds_request.debug, mds_reint.debug | +rpctrace, +dlmtrace, +inode | lctl set_param debug=+inode | For server-side bottlenecks. |
| osc (Object Storage Client) | Client-side I/O to OSTs | osc.debug, osc_request.debug, osc_io.debug | +iotrace, +rpctrace | lctl set_param debug=+iotrace | Data read/write problems. |
| ost (Object Storage Target) | Server-side data storage | ost.debug, ost_io.debug, ost_create.debug | +rpctrace, +dlmtrace, +inode | lctl set_param ost.debug=+rpctrace | Storage server issues. |
| ldlm (Lustre Distributed Lock Manager) | Manages distributed locks | ldlm.debug, ldlm_enqueue.debug, ldlm_cancel.debug | +dlmtrace, +rpctrace | lctl set_param debug=+dlmtrace | Lock contention or deadlocks. |
| ptlrpc (Portal RPC) | RPC communication layer | ptlrpc.debug, ptlrpc_request.debug, ptlrpc_reply.debug | +rpctrace, +neterror, +trace | lctl set_param debug=+rpctrace | Communication failures. |
| lnet (Lustre Network) | Network routing & communication | lnet.debug, lnet_ni.debug, lnet_router.debug | +neterror | lctl set_param debug=+neterror | Network-specific errors. |
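As a concrete use of the table above, a client-side metadata investigation might narrow logging like this (a sketch; remember to undo the settings afterwards):

```shell
# Focus on the metadata client (mdc) with RPC tracing.
lctl set_param subsystem_debug=+mdc   # log only mdc activity
lctl set_param debug=+rpctrace        # trace client-server RPCs
lctl clear                            # start with an empty buffer
# ... run the failing metadata operation (stat, create, unlink) ...
lctl debug_kernel /tmp/mdc_trace.txt
lctl set_param debug=-rpctrace        # remove the extra type when done
```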
View & List
lctl get_param debug
lctl debug_list types
lctl debug_list subs
Permanent Settings
lctl set_param -P debug=+malloc
Warning: Permanent settings (-P) apply cluster-wide—test first without -P.
Basic Debugging Settings
Maximum Debug Buffer Size
lctl set_param debug_mb=1024 # 1 GiB total (value is in MB)
Default: ~5 MB per CPU core. Buffer wraps on overflow.
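Because the default scales with the CPU count, the buffer can be sized from the number of cores. A sketch; the 10 MB-per-core figure is an arbitrary assumption, and the lctl call only runs where Lustre is loaded:

```shell
# Size the debug buffer at roughly 10 MB per core (heuristic, not an official rule).
cores=$(nproc)
size_mb=$((cores * 10))
echo "debug_mb=${size_mb}"
# Apply only if lctl is available (i.e., this is a Lustre node)
if command -v lctl >/dev/null 2>&1; then
    lctl set_param debug_mb="${size_mb}"
fi
```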
Console Rate Limiting
Disable: Add to /etc/modprobe.d/lustre.conf: options libcfs libcfs_console_ratelimit=0. Reload modules.
Panic, Log Dump, and Upcall on LBUG
lctl set_param panic_on_lbug=1
lctl set_param debug_log_upcall=/path/to/script
Upcall script can automate dumps on errors.
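A minimal upcall script could compress and archive each dumped log. A sketch: the assumption that the dump file path arrives as the first argument, and the LUSTRE_DUMP_DIR override, are illustrative, so check your version's upcall convention:

```shell
#!/bin/sh
# Example debug_log_upcall handler (sketch).
# Assumption: Lustre passes the dumped log file path as $1.
LOG="$1"
# Archive location; override via LUSTRE_DUMP_DIR, use persistent storage in production.
ARCHIVE_DIR="${LUSTRE_DUMP_DIR:-/tmp/lustre-dumps}"
mkdir -p "$ARCHIVE_DIR"
if [ -f "$LOG" ]; then
    # Timestamped gzip copy so repeated dumps do not overwrite each other
    gzip -c "$LOG" > "$ARCHIVE_DIR/$(basename "$LOG").$(date +%s).gz"
fi
```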
Debug Daemon
The debug daemon is a userspace process that continuously dumps the kernel debug buffer to a file, preventing overflow and providing persistent logs. It runs as a background daemon, flushing the buffer at regular intervals or on demand. For beginners: Think of it as a continuous logger to capture long sessions without losing data.
Why Use It
- Persistent logging beyond volatile kernel buffers.
- Critical for long-running sessions, recovery analysis, or high-volume tracing.
- Prevents data loss ("Trace buffer full").
When to Use It
- During troubleshooting, failover testing, or performance debugging.
- When kernel buffer is insufficient (e.g., large clusters).
- Post-mortem analysis after a crash or after the issue has been reproduced.
Best Practices: Run on all relevant nodes (clients/servers). Use large size limits for extended tests.
How to Set It Up
| Command | Description | Beginner Notes |
|---|---|---|
| lctl debug_daemon start <filename> [MB] | Start logging to file (e.g., start /var/log/lustre.bin 40). The optional megabytes parameter limits the file size; daemon overwrites old logs if size exceeded. | Use binary (.bin) for efficiency; decode later. |
| lctl debug_daemon stop | Stop and flush final buffer to file. | Always stop to ensure complete logs. |
| lctl debug_daemon dump | Manual flush without stopping. | Useful mid-test. |
| lctl debug_file <input> <output> | Decode binary log to text (e.g., lctl debug_file lustre.bin lustre.txt). | Text files are grep-friendly. |
Behavior: Runs as kernel thread; auto-stops on shutdown. If file exists, overwrites. Use on servers/clients for distributed debugging.
Warning: Large files can fill disks—monitor with df; use rotation or limits.
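To guard against the disk-filling risk, a small watcher can stop the daemon when space runs low. A sketch; the 90% threshold is arbitrary and df --output requires GNU coreutils:

```shell
# Stop the debug daemon if /var/log usage exceeds 90% (arbitrary threshold).
usage=$(df --output=pcent /var/log | tail -1 | tr -dc '0-9')
if [ "$usage" -gt 90 ]; then
    # Guarded so the script is a no-op on nodes without Lustre loaded
    command -v lctl >/dev/null 2>&1 && lctl debug_daemon stop
    echo "debug daemon stopped: /var/log at ${usage}%"
fi
```

Run it from cron or a loop alongside long captures.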
Troubleshooting the Debug Daemon
- Daemon Not Starting: Check that Lustre modules are loaded (lsmod | grep lustre). Ensure root privileges. Verify path permissions: mkdir -p /var/log/lustre; chmod 755 /var/log/lustre.
- No Output in File: Confirm debug levels are set (lctl get_param debug); if 0, set to +trace or similar. Test with lctl mark "Test" to insert messages. Ensure the daemon is running: ps aux | grep lctl.
- File Not Written/Empty: Use absolute paths. Check disk space: df -h /var/log. If the size limit is too small, increase the [MB] argument.
- Buffer Overflow Before Dump: Increase debug_mb: lctl set_param debug_mb=200. Flush more often with manual dumps.
- Decode Fails: Ensure the input is binary (file input.bin should show "data"). Retry lctl debug_file input.bin output.txt. If corrupt, restart the daemon.
- Daemon Stuck/Hanging: Kill the process: pkill -f "lctl debug_daemon". Check logs for errors: dmesg | grep lustre.
- High Overhead: Limit subsystems/types to reduce volume. Run on specific nodes only.
- Rotation Issues: If files are not rotating, check the size limit; unlimited mode appends without rotation.
Example Workflow
# Enable debug
lctl set_param subsystem_debug=mds debug=+neterror
# Start daemon
lctl debug_daemon start /var/log/lustre_debug 1024
# Trigger issue
# Stop and decode
lctl debug_daemon stop
lctl debug_file /var/log/lustre_debug /var/log/lustre_debug.txt
# Analyze
grep "LustreError" /var/log/lustre_debug.txt
perl leak_finder.pl /var/log/lustre_debug.txt
Expanded Debug Daemon Examples
Example 1: Debugging MDS Recovery
# On MDS: Set debug for recovery
lctl set_param debug=+ha
# Start daemon with a 200 MB limit writing to tmpfs
lctl debug_daemon start /tmp/mds_recovery.bin 200
# Simulate recovery (e.g., unmount and remount MDT)
# Dump manually during process
lctl debug_daemon dump
# Stop after the issue has been reproduced
lctl debug_daemon stop
lctl debug_file /tmp/mds_recovery.bin /tmp/mds_recovery.txt
# Analyze
grep "recovery" /tmp/mds_recovery.txt
Example 2: Multi-Node Network Debugging
# On Client: Debug LNet/RPC
lctl set_param subsystem_debug=+lnet+ptlrpc
lctl set_param debug=+neterror+rpctrace+trace
# Start daemon
lctl debug_daemon start /tmp/client_net.bin 2048
# On Server: Mirror debug
lctl set_param subsystem_debug=+lnet+ptlrpc
lctl set_param debug=+neterror+rpctrace+trace
lctl debug_daemon start /tmp/server_net.bin 2048
# Run I/O test (e.g., dd on client)
# Stop both, decode, compare timestamps
Example 3: Memory Leak Hunting
# Enable malloc tracking
lctl set_param debug=+malloc
# Start daemon
lctl debug_daemon start /tmp/memleak.bin 1024
# Run workload (e.g., create/delete files loop)
# Stop and analyze
lctl debug_daemon stop
lctl debug_file /tmp/memleak.bin /tmp/memleak.txt
perl leak_finder.pl /tmp/memleak.txt
Example 4: Long-Running Performance Debug
# Restrict logging to I/O tracing to keep overhead low
lctl set_param debug=iotrace debug_mb=1024
# Start daemon with a large size limit (no automatic rotation)
lctl debug_daemon start /var/log/perf_debug.bin 20480 # Larger limit for long runs
# Run benchmark (e.g., IOR)
# Manual dump mid-run
lctl debug_daemon dump
# Stop at end
lctl debug_daemon stop
leak_finder.pl Usage
leak_finder.pl is a Perl script located in lustre/tests/ that analyzes debug logs captured with +malloc enabled and detects memory leaks by matching kmalloced/kfreed pairs. It reports unpaired allocations as potential leaks, grouped by call site or function. Use it after reproducing the issue and capturing logs with malloc tracing to identify leaks in kernel modules. For beginners: Memory leaks occur when allocated memory isn't freed, leading to exhaustion over time.
Preparing for Usage
# Enable malloc tracing
lctl set_param debug=+malloc
# Generate log (e.g., via debug daemon)
lctl debug_daemon start /tmp/leak_log.bin 2048
# Run suspected leaky code
lctl debug_daemon stop
lctl debug_file /tmp/leak_log.bin /tmp/leak_log.txt
Best Practices: Run workloads that stress memory (e.g., repeated allocations). Clear buffer before starting.
Warning: +malloc adds significant overhead—use only in testing; disable in production.
Running the Script
perl /path/to/lustre/tests/leak_finder.pl /tmp/leak_log.txt
Options:
- --by-func: Group leaks by function name.
- --help: Show usage.
Example Output: Lists allocations without frees, with addresses, sizes, and call sites (e.g., "Unmatched kmalloc at obd_alloc: 1024 bytes").
Expanded leak_finder.pl Examples
Example 1: Basic Leak Detection
# Run script
perl leak_finder.pl /tmp/leak_log.txt
# Output example:
Unmatched kmallocs:
obd_alloc: 2048 bytes at 0x12345678 (called from function1)
lustre_inode_alloc: 1024 bytes at 0x87654321 (called from function2)
Example 2: Group by Function
perl leak_finder.pl --by-func /tmp/leak_log.txt
# Output example:
Leaks by function:
function1: 3 allocations, 12048 bytes
function2: 2 allocations, 200 bytes
Example 3: Long-Run Analysis
# Start daemon with malloc
lctl set_param debug=+malloc
lctl debug_daemon start /tmp/long_leak.bin 200
# Run extended workload
# Stop and analyze
lctl debug_daemon stop
lctl debug_file /tmp/long_leak.bin /tmp/long_leak.txt
perl leak_finder.pl --by-func /tmp/long_leak.txt
Example 4: Troubleshooting No Leaks Reported
# If no output: Verify +malloc enabled
lctl get_param debug # Output should include "malloc"
# Rerun workload, ensure logs capture full run
perl leak_finder.pl /tmp/leak_log.txt # If empty, increase buffer size or use daemon
Notes: Higher levels increase overhead; disable in production.
Additional Resources and Troubleshooting
For more advanced debugging:
- Lustre Manual: Debugging Section.
- Wiki: Debugging Tips.
- Tools: Use strace for user-space calls, wireshark for network, crash for kernel dumps.
- Jobstats (2.15+): Monitor per-job I/O with lctl get_param *.*.job_stats; enable via lctl set_param -P jobid_var=procname_uid.
- Common Pitfalls: Forgetting to decode binary logs leads to unreadable files. High levels can cause "Trace buffer full"; increase debug_mb.
- Community: Join lustre-devel mailing list or LUG for help.
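The Jobstats bullet above can be sketched as a quick check; the mdt.* and obdfilter.* parameter paths are the common server-side forms, but they vary by target type:

```shell
# Enable job-level statistics keyed by process name + UID (cluster-wide, persistent).
lctl set_param -P jobid_var=procname_uid
# Read per-job I/O statistics on the servers.
lctl get_param mdt.*.job_stats        # metadata operations per job (MDS)
lctl get_param obdfilter.*.job_stats  # read/write statistics per job (OSS)
```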
If logs show "LustreError: went back in time", check disk caches. For persistent issues, use LFSCK for consistency checks.