Lustre Upstreaming Process Details

The Lustre upstreaming effort aims to integrate Lustre as a native filesystem in the mainline Linux kernel, reducing out-of-tree patches and vendor-specific modifications. This process makes Lustre more accessible, easier to maintain, and compatible with standard Linux distributions. This information is based on the latest wiki updates (June 2025) and LUG (Lustre User Group) 2024 presentations. The process is tracked under Lustre's master branch, with the main JIRA ticket LU-12511. For more, see the official wiki.

Introduction for Beginners

If you're new to Lustre or kernel development, upstreaming refers to the process of merging code from a project like Lustre into the official ("mainline") Linux kernel maintained by Linus Torvalds. This reduces the need for custom patches that vendors apply to their kernel versions (e.g., Red Hat Enterprise Linux or SUSE Linux Enterprise Server). Key benefits include better integration, fewer bugs from diverging codebases, and easier adoption by the broader Linux community.

Warning: Upstreaming involves complex kernel changes that can introduce instability if not tested thoroughly. Beginners should start by reading the Lustre wiki and contributing small bug fixes before tackling upstreaming tasks.

Best Practices: Always collaborate early via mailing lists to avoid duplicating efforts. Use version control (Git) properly and follow coding standards to streamline reviews.

Goals and Tenets

The upstreaming initiative focuses on long-term sustainability and community integration. For beginners: These tenets guide decisions to ensure Lustre evolves in harmony with the Linux kernel, reducing maintenance burdens for users and developers.

Best Practices: When developing, design changes to be kernel-agnostic where possible. Document compatibility implications in JIRA tickets.

Warning: Ignoring upstream tenets can lead to rejected patches or increased downstream (vendor-specific) maintenance costs.

Pending Work and Challenges

Upstreaming requires addressing technical hurdles to align Lustre with kernel standards. For beginners: Each item is a specific task tracked in JIRA, often involving code refactoring or feature enhancements.

Item | Description | JIRA | Beginner Notes
--- | --- | --- | ---
IPv6 Support | Mostly complete in 2.17, but a key barrier for LNet (Lustre Networking) upstreaming. | LU-18417 | IPv6 is the next-generation internet protocol; full support ensures Lustre works in modern networks.
Kernel Coding Style | Addressed via checkpatch.pl script. | LU-6142 | Use tools like checkpatch to enforce consistent code formatting, making reviews easier.
Kernel-Doc | Transition to Sphinx style in progress. | LU-9633 | Sphinx generates documentation; this improves Lustre's in-kernel docs.
Module Unification | Reduce number of kernel modules. | LU-17862 | Fewer modules simplify loading and maintenance.
Code Separation | Split kernel, compatibility, and userspace code. | LU-18687 | Isolates core Lustre from distro-specific hacks.
Hashing Replacement | Use rhashtable, mostly server-side. | LU-8130 | rhashtable is a standard kernel data structure for efficient lookups.
ProcFS to DebugFS | Removal ongoing, with netlink for stats. | LU-8066, LU-11850 | DebugFS is preferred for debugging info; ProcFS is deprecated for this use.
Folios Transition | In progress. | LU-17916 | A folio represents one or more physically contiguous pages managed as a unit; kernel filesystem APIs are moving from struct page to struct folio.
Writepage Deprecation | Align with upstream. | LU-18675 | Updates I/O paths to modern kernel APIs.
dcache Regressions | Fixes for upstream changes. | LU-11501, LU-9868 | dcache is the directory entry cache; regressions are bugs from kernel updates.
o2iblnd Simplification | Deferred for initial TCP/IP-only submission. | LU-8874 | o2iblnd handles InfiniBand; starting with TCP simplifies initial upstream.

Challenges include supporting older kernels (dropping RHEL7 support eases the transition), fixing dcache bugs, and keeping statistics readable by non-root users, since debugfs is normally accessible only to root.

Best Practices: Prioritize high-impact items like IPv6. Test changes on both mainline and vendor kernels.

Warning: Delaying fixes for older kernels can block upstream progress; plan deprecations carefully to avoid disrupting users.

Stages of Upstreaming

Upstreaming is phased to manage complexity. For beginners: Each stage builds on the previous, with ETAs as estimates—delays are common in kernel work.

  1. Code Separation (ETA: June 2025): Reorganize the tree into fs/lustre/, net/lnet/, lustre_compat/, etc. Compatibility code for older distros is kept in libcfs.ko. This isolates the upstreamable code.
  2. Mainline Compilation (ETA: November 2025): Ensure fs/ and net/ compile on mainline without the compat layer. Validate via the Gerrit and Jenkins CI tools.
  3. Separate Kernel Tree (ETA: November 2026): Maintain Lustre as patches on top of mainline, generate patch series from that tree for submission, and keep supporting older kernels through the compat layer. This allows parallel development.
  4. Patch Submission (ETA: November 2026): Submit during kernel merge windows (short periods when new features are accepted), test extensively, and backport fixes.
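For beginners: the stage 2 goal (fs/ and net/ building without the compat layer) can be approximated with a quick scan before a real mainline build is attempted. The sketch below is only an illustration, not part of the Lustre tooling. It assumes that compat headers are pulled in through an include path containing "lustre_compat"; the header name compat_example.h and the demo files are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Directories from stage 1 that are meant to build on mainline on their own.
UPSTREAM_DIRS = ("fs/lustre", "net/lnet")

# Assumption: compat headers are included via a path containing "lustre_compat".
COMPAT_INCLUDE = re.compile(
    r'^\s*#\s*include\s*[<"]([^>"]*lustre_compat[^>"]*)[>"]'
)


def find_compat_includes(tree_root):
    """Return (relative_path, line_number, header) for each compat include."""
    root = Path(tree_root)
    hits = []
    for subdir in UPSTREAM_DIRS:
        base = root / subdir
        if not base.is_dir():
            continue
        for path in sorted(base.rglob("*.[ch]")):
            text = path.read_text(errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                match = COMPAT_INCLUDE.match(line)
                if match:
                    rel = path.relative_to(root).as_posix()
                    hits.append((rel, lineno, match.group(1)))
    return hits


def demo():
    """Build a tiny throwaway tree (hypothetical files) and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "fs/lustre/llite").mkdir(parents=True)
        (root / "net/lnet").mkdir(parents=True)
        (root / "fs/lustre/llite/file.c").write_text(
            "#include <linux/fs.h>\n"
            "#include <lustre_compat/compat_example.h>\n"
        )
        (root / "net/lnet/api.c").write_text("#include <linux/net.h>\n")
        return find_compat_includes(root)
```

An empty result does not prove the tree compiles on mainline; it only shows that no file in the two directories names a compat header directly. The actual validation remains the Gerrit and Jenkins builds.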

Best Practices: Use automated CI (Continuous Integration) like Jenkins for early error detection. Document each stage's milestones in JIRA.

Warning: Missing a merge window can delay upstreaming by months; align submissions with the kernel release cycle (a new release roughly every 9 to 10 weeks, each followed by a merge window of about two weeks).
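For beginners: the cost of missing a merge window is easy to see with a little date arithmetic. The sketch below assumes a fixed 9-week cycle and a 2-week merge window that opens on the release date, and it assumes a patch series must be ready by the day the window opens. Real cycle lengths vary, and all dates used here are hypothetical.

```python
from datetime import date, timedelta

MERGE_WINDOW = timedelta(weeks=2)  # assumed merge window length
CYCLE = timedelta(weeks=9)         # assumed cycle length; real cycles vary


def next_merge_window(last_release, ready_on, cycle=CYCLE):
    """Return (opens, closes) for the first window opening on or after ready_on."""
    opens = last_release
    while opens < ready_on:
        opens += cycle
    return opens, opens + MERGE_WINDOW


# Hypothetical release date, used only to illustrate the arithmetic.
release = date(2025, 1, 5)
on_time = next_merge_window(release, ready_on=date(2025, 1, 5))
one_day_late = next_merge_window(release, ready_on=date(2025, 1, 6))
delay_days = (one_day_late[0] - on_time[0]).days
```

Under these assumptions, being one day late moves the submission from the window opening on 2025-01-05 to the one opening on 2025-03-09, a delay of 63 days.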

Submission Guidelines

Patches for upstreaming follow the standard Lustre submission process via Gerrit, ensuring quality and stability. Below are detailed guidelines, expanded for clarity.

Best Practices: Start small—fix a bug before submitting features. Always include tests to prove your change works.

Warning: Poorly formatted patches or incomplete tests will be rejected; follow guidelines strictly to avoid wasted effort.

Lustre Quality and Stability

Lustre development prioritizes stability, as many sites rely on it for production filesystems handling petabytes of data. Processes ensure careful evaluation of changes. For beginners: Stability means avoiding crashes or data corruption in HPC environments.

Landing a Patch to a Lustre Release

Landings freeze one month prior to General Availability (GA) to allow time for stress testing, performance benchmarking, and interoperability testing with different clients/servers.

Patch Landing Checklist

This step-by-step checklist ensures patches are robust. For beginners: Follow sequentially; tools like Git and checkpatch automate much of this.

  1. A JIRA ticket has been opened to track the issue. Include details like problem description and proposed fix.
  2. Test the change locally using acceptance-small.sh (see TestingLustreCode). This runs basic regression tests.
  3. Commit the change after verifying:
    1. Commit comments are well-formatted and useful (see Commit_Comments). Include what, why, and how.
    2. The patch follows Lustre Coding Style Guidelines, verified by running git show | contrib/scripts/checkpatch.pl -.
    3. A regression test has been created that fails without the patch and passes with it (see TestingLustreCode).
    4. The patch includes an appropriate Signed-off-by: line, certifying your contribution.
    5. Patch includes a Fixes: line if it is fixing a bug in a previous patch, referencing the commit hash.
  4. Upload the patch to Gerrit and review test results:
    1. Ensure newly-added or modified tests are passing consistently across runs.
    2. Review other test failures, which may include intermittent issues (e.g., network flakes).
    3. Associate known failures with existing LU tickets by searching Jira or Maloo (test result database).
    4. Raise bug(s) for failures seen across multiple patches without existing LU tickets.
    5. Fix failures associated only with your ticket, as patches causing failures cannot be landed.
    6. Retest failed sessions with known issues or resubmit the patch if changes are needed.
  5. Request at least two Patch Inspection approvals on the Gerrit change, preferably from developers experienced in the relevant code area (identified via contrib/scripts/get_maintainer.pl or the Code Reviewers page).
  6. Record the Gerrit change URL in the JIRA ticket (typically done automatically by Gerrit).
  7. Attach additional test results (e.g., interoperability with old clients, performance benchmarks) to the JIRA ticket.
  8. Once Maloo confirms all tests pass and the patch has two positive reviews (excluding the author), the Gatekeeper is automatically notified that the patch is ready for merge.
  9. The Gatekeeper reviews the patch, confirms test results and inspections, conducts merge testing, and submits it (typically takes about a week).
  10. If submission fails due to conflicts or regressions, rebase the patch with the target branch and repeat the steps.
  11. For other branches:
    1. Use the same JIRA ticket and Gerrit Change-Id labels for tracking.
    2. Add Lustre-commit: and Lustre-change: labels for Lustre-specific metadata.
    3. Remove Reviewed-on: and Tested-by: labels from the commit message to clean it up.
    4. If no conflicts, cherry-pick the patch to other branches from Gerrit after editing the commit message.
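For beginners: the commit message edits in step 11 can be sketched as a small text transformation. This is a minimal illustration, not an official Lustre script. The placement of the new labels (just before the first Signed-off-by: line) and the values passed for them are assumptions, and the names and addresses in the usage example are hypothetical.

```python
DROPPED_LABELS = ("Reviewed-on:", "Tested-by:")


def prepare_backport_message(message, commit_hash, change_ref):
    """Rewrite a landed commit message before cherry-picking to another branch."""
    # Drop the labels that step 11 says to remove; Change-Id is left untouched.
    lines = [
        line
        for line in message.strip("\n").split("\n")
        if not line.startswith(DROPPED_LABELS)
    ]
    new_labels = [
        f"Lustre-commit: {commit_hash}",
        f"Lustre-change: {change_ref}",
    ]
    # Assumed placement: before the first Signed-off-by line, else at the end.
    for index, line in enumerate(lines):
        if line.startswith("Signed-off-by:"):
            lines[index:index] = new_labels
            break
    else:
        lines.extend(new_labels)
    return "\n".join(lines) + "\n"


# Hypothetical landed commit message.
landed = (
    "LU-0000 example: fix a hypothetical bug\n"
    "\n"
    "Explain what changed and why.\n"
    "\n"
    "Signed-off-by: A Developer <dev@example.com>\n"
    "Change-Id: I0123\n"
    "Reviewed-on: https://example.invalid/1\n"
    "Tested-by: CI Bot <ci@example.com>\n"
    "Reviewed-by: B Reviewer <rev@example.com>\n"
)
backport = prepare_backport_message(landed, "abc123", "https://example.invalid/1")
```

Keeping the Change-Id: line unchanged is what lets the same Gerrit change be tracked across branches, as the first sub-step requires.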

Best Practices: Use descriptive commit messages. Automate testing with scripts to catch issues early.
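For beginners: several of the commit checks in step 3 of the checklist can be automated before uploading to Gerrit. The sketch below is a hypothetical helper, not a replacement for checkpatch.pl or the Commit_Comments guidelines. The subject layout ("LU-1234 component: summary") and the Fixes: layout (abbreviated hash followed by the quoted subject) are assumptions about the expected format.

```python
import re

# Assumed subject layout: "LU-1234 component: summary".
SUBJECT = re.compile(r"^LU-\d+ \w[\w-]*: \S")
SIGNOFF = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>@\s]+>$", re.M)
# Assumed Fixes layout: 'Fixes: <hash> ("<subject of the earlier commit>")'.
FIXES = re.compile(r'^Fixes: [0-9a-f]{8,40} \(".+"\)$', re.M)


def check_commit_message(message, fixes_earlier_patch=False):
    """Return a list of problems found; an empty list means no problem was seen."""
    problems = []
    lines = message.splitlines()
    if not lines or not SUBJECT.match(lines[0]):
        problems.append("subject should look like 'LU-1234 component: summary'")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    if not SIGNOFF.search(message):
        problems.append("missing Signed-off-by: line")
    if fixes_earlier_patch and not FIXES.search(message):
        problems.append("missing Fixes: line referencing the earlier commit")
    return problems


# Hypothetical messages used only for illustration.
good = (
    "LU-0000 llite: fix a hypothetical bug\n"
    "\n"
    "Explain what changed, why, and how.\n"
    "\n"
    'Fixes: 0123456789ab ("LU-0000 llite: earlier change")\n'
    "Signed-off-by: A Developer <dev@example.com>\n"
)
bad = "fix stuff\nno blank line here\n"
```

Here `check_commit_message(good, fixes_earlier_patch=True)` returns an empty list, while `check_commit_message(bad)` reports the subject, the missing blank line, and the missing Signed-off-by: line.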

Warning: Intermittent test failures can delay landing; investigate thoroughly to avoid false positives.
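For beginners: the triage rules in step 4 of the checklist reduce to a three-way decision per failed test. The sketch below encodes that decision only. How the inputs are gathered (for example, by searching Jira or Maloo) is outside its scope, and the test names and ticket number are hypothetical.

```python
def triage_failures(failures, known_tickets, seen_on_other_patches):
    """Map each failed test to a next action, following step 4 of the checklist.

    failures: test names that failed on this patch
    known_tickets: dict of test name -> existing LU ticket
    seen_on_other_patches: test names that also failed on unrelated patches
    """
    actions = {}
    for test in failures:
        if test in known_tickets:
            # Known issue: associate it with the existing ticket, then retest.
            actions[test] = f"link to {known_tickets[test]} and retest"
        elif test in seen_on_other_patches:
            # Seen across multiple patches with no ticket: raise a bug.
            actions[test] = "raise a new LU ticket"
        else:
            # Only this patch fails: it cannot land until fixed.
            actions[test] = "fix in this patch"
    return actions


# Hypothetical test names and ticket number.
result = triage_failures(
    failures=["suite-a test_1", "suite-b test_2", "suite-c test_3"],
    known_tickets={"suite-a test_1": "LU-0000"},
    seen_on_other_patches={"suite-b test_2"},
)
```

The order of the checks matters: a failure that already has a ticket is linked rather than reported again, even if it also appears on other patches.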

Landing a Feature to a Feature Release

Feature releases occur approximately every 6 months. Features are scheduled early in the development cycle to allow ample testing.

First Steps

Begin by checking for existing work to collaborate efficiently. For beginners: Collaboration prevents redundant work and incorporates diverse expertise.

Best Practices: Include diagrams or pseudocode in JIRA for complex designs. Seek feedback before coding.

Warning: Late discussions can lead to major rewrites; engage the community from the start.

Schedule and Timing

Lustre follows a "train model" with a fixed schedule, like a train leaving on time—miss it, and wait for the next. Features not ready are deferred.

Best Practices: Plan backwards from deadlines. Allocate time for reviews and iterations.

Warning: Missing freezes means your feature waits 6 months; prioritize completion over perfection.
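For beginners: the train model can be worked through as date arithmetic. The sketch below assumes a release every 6 months and reuses the one-month landing freeze before GA described earlier for releases; whether the same freeze length applies to features is an assumption. All dates are hypothetical.

```python
import calendar
from datetime import date


def add_months(day, months):
    """Shift a date by whole months, clamping to the end of the target month."""
    index = day.year * 12 + (day.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def target_release(first_ga, feature_ready, cadence_months=6, freeze_months=1):
    """Return (ga, freeze) for the first release whose freeze the feature makes."""
    step = 0
    while True:
        ga = add_months(first_ga, step * cadence_months)
        freeze = add_months(ga, -freeze_months)
        if feature_ready <= freeze:
            return ga, freeze
        step += 1


# Hypothetical GA date, used only to illustrate the arithmetic.
first_ga = date(2026, 6, 15)
made_it = target_release(first_ga, feature_ready=date(2026, 5, 15))
missed_it = target_release(first_ga, feature_ready=date(2026, 5, 16))
```

Under these assumptions, a feature ready on the freeze date (2026-05-15) ships in the 2026-06-15 release, while one ready a day later waits for the 2026-12-15 release.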

Feature Landing Checklist

This ensures features are production-ready. For beginners: Features require more scrutiny than patches due to broader impact.

Best Practices: Include scalability tests for HPC features. Update docs concurrently with code.

Warning: Incomplete test plans can lead to post-release bugs; cover all scenarios.

Seeking Guidance

Contact other Lustre developers via the lustre-devel mailing list or IRC for assistance with patch submission. Beginners: Don't hesitate to ask; the community is supportive.

Review and Validation Process

The process ensures upstream code meets kernel standards. For beginners: Automation catches issues early, but human reviews are crucial.

Best Practices: Respond promptly to review comments. Use CI feedback to iterate quickly.

Warning: Ignoring vendor kernel tests can break production deployments; balance upstream and downstream needs.

Status as of June 2025

Significant progress has been made since Lustre was removed from the kernel staging area (drivers/staging) in Linux 4.18 in 2018.

Best Practices: Monitor JIRA for opportunities to contribute. Attend LUG for networking.

Warning: Status can change rapidly; always check the wiki for the latest.

Additional Resources and Troubleshooting

For deeper dives, see the official wiki and the main JIRA ticket, LU-12511.

If stuck, start with small contributions to build experience. Always update to the latest Lustre version for testing upstream changes.