Lustre Upstreaming Process Details
The Lustre upstreaming effort aims to integrate Lustre as a native filesystem in the mainline Linux kernel, reducing out-of-tree patches and vendor-specific modifications. This makes Lustre more accessible, easier to maintain, and compatible with standard Linux distributions. This information is based on the latest wiki updates (June 2025) and LUG (Lustre User Group) 2024 presentations. Development happens on Lustre's master branch, and overall progress is tracked in the main JIRA ticket, LU-12511. For more, see the upstreaming page on the official Lustre wiki.
Introduction for Beginners
If you're new to Lustre or kernel development, upstreaming refers to the process of merging code from a project like Lustre into the official ("mainline") Linux kernel maintained by Linus Torvalds. This reduces the need for custom patches that vendors apply to their kernel versions (e.g., Red Hat Enterprise Linux or SUSE Linux Enterprise Server). Key benefits include better integration, fewer bugs from diverging codebases, and easier adoption by the broader Linux community.
- Mainline Kernel: The official Linux kernel source tree where all standard features are developed.
- Out-of-Tree Patches: Custom code not in the mainline, requiring separate maintenance.
- JIRA: A bug tracking system used by Lustre developers (similar to GitHub Issues).
- Gerrit: A code review tool for submitting and reviewing patches.
- LUG: Annual Lustre User Group conference for updates and discussions.
Warning: Upstreaming involves complex kernel changes that can introduce instability if not tested thoroughly. Beginners should start by reading the Lustre wiki and contributing small bug fixes before tackling upstreaming tasks.
Best Practices: Always collaborate early via mailing lists to avoid duplicating efforts. Keep Git history clean (one logical change per commit) and follow the coding standards to streamline reviews.
Goals and Tenets
The upstreaming initiative focuses on long-term sustainability and community integration. For beginners: These tenets guide decisions to ensure Lustre evolves in harmony with the Linux kernel, reducing maintenance burdens for users and developers.
- Make mainline Linux the source of truth for Lustre development, meaning all new features start there.
- Prioritize upstream-first changes, with backports to older vendor kernels (e.g., RHEL, SLES) as needed.
- Preserve community workflows and integrate with the broader filesystem community (e.g., collaborating with ext4 or XFS developers).
- Reduce patching needs by separating core code from compatibility layers, allowing Lustre to run on both new and old kernels.
Best Practices: When developing, design changes to be kernel-agnostic where possible. Document compatibility implications in JIRA tickets.
Warning: Ignoring upstream tenets can lead to rejected patches or increased downstream (vendor-specific) maintenance costs.
Pending Work and Challenges
Upstreaming requires addressing technical hurdles to align Lustre with kernel standards. For beginners: Each item is a specific task tracked in JIRA, often involving code refactoring or feature enhancements.
| Item | Description | Status/JIRA | Beginner Notes |
|---|---|---|---|
| IPv6 Support | Mostly complete in 2.17; the remaining gaps are a key barrier to upstreaming LNet (Lustre Networking). | LU-18417 | IPv6 is the next-generation Internet Protocol; full support ensures Lustre works on modern networks. |
| Kernel Coding Style | Addressed via checkpatch.pl script. | LU-6142 | Use tools like checkpatch to enforce consistent code formatting, making reviews easier. |
| Kernel-Doc | Transition to Sphinx style in progress. | LU-9633 | Sphinx generates documentation; this improves Lustre's in-kernel docs. |
| Module Unification | Reduce number of kernel modules. | LU-17862 | Fewer modules simplify loading and maintenance. |
| Code Separation | Split kernel, compatibility, and userspace code. | LU-18687 | Isolates core Lustre from distro-specific hacks. |
| Hashing Replacement | Replace libcfs hash tables with the kernel's rhashtable; remaining work is mostly server-side. | LU-8130 | rhashtable is a standard resizable kernel hash table for efficient lookups. |
| ProcFS to DebugFS | Removal ongoing, with netlink for stats. | LU-8066, LU-11850 | DebugFS is preferred for debugging info; ProcFS is deprecated for this use. |
| Folios Transition | In progress. | LU-17916 | A folio is a kernel abstraction for a group of one or more contiguous pages; filesystem APIs are moving from pages to folios. |
| Writepage Deprecation | Stop relying on the writepage address-space operation, which upstream is phasing out. | LU-18675 | Updates I/O paths to modern kernel APIs. |
| dcache Regressions | Fixes for upstream changes. | LU-11501, LU-9868 | dcache is the directory cache; regressions are bugs from kernel updates. |
| o2iblnd Simplification | Deferred for initial TCP/IP-only submission. | LU-8874 | o2iblnd handles InfiniBand; starting with TCP simplifies initial upstream. |
Challenges include the cost of supporting older kernels (for example, RHEL7 support is being dropped to ease the transition), fixing dcache regressions, and handling non-root access to debugfs safely, since debugfs is normally restricted to root.
Best Practices: Prioritize high-impact items like IPv6. Test changes on both mainline and vendor kernels.
Warning: Delaying fixes for older kernels can block upstream progress; plan deprecations carefully to avoid disrupting users.
Stages of Upstreaming
Upstreaming is phased to manage complexity. For beginners: Each stage builds on the previous, with ETAs as estimates—delays are common in kernel work.
- Code Separation (ETA: June 2025): Reorganize the tree into fs/lustre/, net/lnet/, lustre_compat/, etc., with compatibility code for older distributions built into libcfs.ko. This isolates the upstreamable code.
- Mainline Compilation (ETA: November 2025): Ensure fs/ and net/ compile on mainline without compat layer. Validate via Gerrit and Jenkins CI tools.
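- Separate Kernel Tree (ETA: November 2026): Maintain Lustre as patches against mainline, generate patch series from that tree for submission, and keep older kernels supported through the compat layer. This lets upstream work proceed in parallel with existing development.
- Patch Submission (ETA: November 2026): Target kernel merge windows (the roughly two-week period after each release when new features are merged), with patches reviewed and ready before the window opens; test extensively and backport fixes.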
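The Mainline Compilation goal, that fs/ and net/ build without the compat layer, can be approximated early with a simple static check before a real mainline build is wired into CI. The Python sketch below flags files under the planned upstreamable directories that include a compatibility header. The header name `lustre_compat.h`, the function name, and the example file paths are assumptions for illustration; a text scan is only a cheap first filter, not a substitute for compiling against mainline.

```python
import re
from typing import Dict, List, Tuple

# Directories planned to hold upstreamable code (taken from the stage description).
UPSTREAM_PREFIXES = ("fs/lustre/", "net/lnet/")

# Hypothetical compat header name; [ \t]* (not \s*) keeps each match on one line.
COMPAT_INCLUDE = re.compile(
    r'^[ \t]*#[ \t]*include[ \t]*[<"](?:[\w./-]*/)?lustre_compat\.h[>"]',
    re.MULTILINE,
)


def find_compat_leaks(files: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (path, line) pairs where upstreamable code includes the compat header."""
    leaks = []
    for path, text in sorted(files.items()):
        if not path.startswith(UPSTREAM_PREFIXES):
            continue  # compat and userspace code may depend on the compat layer
        for match in COMPAT_INCLUDE.finditer(text):
            # Convert the match's character offset into a 1-based line number.
            line_no = text.count("\n", 0, match.start()) + 1
            leaks.append((path, line_no))
    return leaks


# Hypothetical file contents keyed by path.
tree = {
    "fs/lustre/llite/file.c": '#include <linux/fs.h>\n#include "lustre_compat.h"\n',
    "lustre_compat/compat.c": '#include "lustre_compat.h"\n',
    "net/lnet/lnet/api-ni.c": "#include <linux/net.h>\n",
}
print(find_compat_leaks(tree))  # [('fs/lustre/llite/file.c', 2)]
```

Restricting the check to path prefixes mirrors the code-separation plan: anything under lustre_compat/ is allowed to depend on compatibility code, while fs/ and net/ are not.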
Best Practices: Use automated CI (Continuous Integration) like Jenkins for early error detection. Document each stage's milestones in JIRA.
Warning: Missing merge windows can delay upstreaming by months; align submissions with kernel release cycles (every 2-3 months).
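To see why a missed window is costly, it helps to estimate which merge window a submission can realistically target. The sketch below assumes a fixed nine-week release cycle (mainline cycles typically run nine to ten weeks) and a two-week merge window opening on each release day. The function name and all dates are hypothetical, and real release dates vary from cycle to cycle.

```python
from datetime import date, timedelta
from typing import Tuple


def next_merge_window(last_release: date, ready: date,
                      cycle_days: int = 63,
                      window_days: int = 14) -> Tuple[date, date]:
    """Return (opens, closes) of the first merge window opening on or after `ready`.

    Assumes a release every `cycle_days` after `last_release`, with the merge
    window opening on each release day and lasting `window_days`.
    """
    opens = last_release
    while opens < ready:
        opens += timedelta(days=cycle_days)
    return opens, opens + timedelta(days=window_days)


# Hypothetical dates for illustration only.
last = date(2025, 1, 1)
print(next_merge_window(last, date(2025, 3, 1)))  # window opens 2025-03-05
print(next_merge_window(last, date(2025, 3, 6)))  # one day late: opens 2025-05-07
```

Being ready a single day after a window opens pushes the target out by a full cycle, which is the multi-month delay the warning describes.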
Submission Guidelines
Patches for upstreaming follow the standard Lustre submission process via Gerrit, which is designed to ensure quality and stability. The guidelines below describe that process in detail.
Best Practices: Start small—fix a bug before submitting features. Always include tests to prove your change works.
Warning: Poorly formatted patches or incomplete tests will be rejected; follow guidelines strictly to avoid wasted effort.
Lustre Quality and Stability
Lustre development prioritizes stability, as many sites rely on it for production filesystems handling petabytes of data. Processes ensure careful evaluation of changes. For beginners: Stability means avoiding crashes or data corruption in HPC environments.
Landing a Patch to a Lustre Release
Landings freeze one month prior to General Availability (GA) to allow time for stress testing, performance benchmarking, and interoperability testing with different clients/servers.
Patch Landing Checklist
This step-by-step checklist ensures patches are robust. For beginners: Follow sequentially; tools like Git and checkpatch automate much of this.
- A JIRA ticket has been opened to track the issue. Include details like the problem description and proposed fix.
- Test the change locally using `acceptance-small.sh` (see TestingLustreCode). This runs basic regression tests.
- Commit the change after verifying:
  - Commit comments are well-formatted and useful (see Commit_Comments). Include what, why, and how.
  - The patch follows the Lustre Coding Style Guidelines, verified by running `git show | contrib/scripts/checkpatch.pl -`.
  - A regression test has been created that fails without the patch and passes with it (see TestingLustreCode).
  - The patch includes an appropriate `Signed-off-by:` line, certifying your right to submit the contribution.
  - The patch includes a `Fixes:` line if it fixes a bug introduced by a previous patch, referencing that patch's commit hash.
- Upload the patch to Gerrit and review test results:
  - Ensure newly-added or modified tests pass consistently across runs.
  - Review other test failures, which may include intermittent issues (e.g., network flakes).
  - Associate known failures with existing LU tickets by searching JIRA or Maloo (the test result database).
  - Raise bug(s) for failures seen across multiple patches that have no existing LU ticket.
  - Fix failures associated only with your ticket, as patches causing failures cannot be landed.
  - Retest failed sessions with known issues, or resubmit the patch if changes are needed.
- Request at least two Patch Inspection approvals on the Gerrit change, preferably from developers experienced in the relevant code area (identified via `contrib/scripts/get_maintainer.pl` or the Code Reviewers page).
- Record the Gerrit change URL in the JIRA ticket (typically done automatically by Gerrit).
- Attach additional test results (e.g., interoperability with old clients, performance benchmarks) to the JIRA ticket.
- Once Maloo confirms all tests pass and the patch has two positive reviews (excluding the author), the Gatekeeper is automatically notified that the patch is ready for merge.
- The Gatekeeper reviews the patch, confirms test results and inspections, conducts merge testing, and submits it (typically takes about a week).
- If submission fails due to conflicts or regressions, rebase the patch onto the target branch and repeat the steps.
- For other branches:
  - Use the same JIRA ticket and Gerrit `Change-Id` labels for tracking.
  - Add `Lustre-commit:` and `Lustre-change:` labels pointing back to the original master commit and Gerrit change.
  - Remove the `Reviewed-on:` and `Tested-by:` labels from the commit message, since they describe the original landing.
  - If there are no conflicts, cherry-pick the patch to the other branches from Gerrit after editing the commit message.
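The commit-message edits for other branches are mechanical enough to script. The Python sketch below drops `Reviewed-on:` and `Tested-by:` trailers and adds `Lustre-commit:` and `Lustre-change:` trailers while keeping the `Change-Id:`. The function name, the choice to place the new trailers just before `Change-Id:`, and the ticket, hash, and URLs in the example are all hypothetical; check the output against the target branch's existing commits before pushing.

```python
from typing import List

# Trailers describing the original master landing; dropped when backporting.
DROP_PREFIXES = ("Reviewed-on:", "Tested-by:")


def backport_message(message: str, master_commit: str, change_url: str) -> str:
    """Rewrite a landed master commit message for another branch (sketch)."""
    lines: List[str] = [
        line for line in message.rstrip("\n").split("\n")
        if not line.startswith(DROP_PREFIXES)
    ]
    new_trailers = [f"Lustre-commit: {master_commit}",
                    f"Lustre-change: {change_url}"]
    # Assumption: place the new trailers just before Change-Id:, else at the end.
    for i, line in enumerate(lines):
        if line.startswith("Change-Id:"):
            lines[i:i] = new_trailers
            break
    else:
        lines.extend(new_trailers)
    return "\n".join(lines) + "\n"


# Hypothetical message; the ticket, hash, and URLs are placeholders.
original = (
    "LU-12345 llite: fix example race\n"
    "\n"
    "Explain what was wrong and how the patch fixes it.\n"
    "\n"
    "Signed-off-by: Jane Dev <jane@example.com>\n"
    "Change-Id: I0123456789abcdef0123456789abcdef01234567\n"
    "Reviewed-on: https://review.example.com/12345\n"
    "Tested-by: Maloo <maloo@example.com>\n"
    "Reviewed-by: John Reviewer <john@example.com>\n"
)
print(backport_message(original, "0123456789ab", "https://review.example.com/12345"))
```

`Reviewed-by:` lines are kept because the checklist only asks for `Reviewed-on:` and `Tested-by:` to be removed.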
Best Practices: Use descriptive commit messages. Automate testing with scripts to catch issues early.
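Several checklist items concern the commit message itself, and a few of them can be checked automatically before upload. The sketch below looks for a summary line in the `LU-nnnn component: description` form, a blank line after the summary, a `Signed-off-by:` trailer, and a well-formed `Fixes:` trailer when one is present. The regular expressions are assumptions: the `Fixes:` pattern is modeled on the Linux kernel convention (at least 12 hex digits of the hash followed by the quoted subject), so confirm the details against the Commit_Comments page. This complements `checkpatch.pl` rather than replacing it.

```python
import re
from typing import List

# Assumed formats: Lustre-style summary line and kernel-style Fixes: trailer.
SUMMARY_RE = re.compile(r"^LU-\d+ [\w./-]+: \S")
FIXES_RE = re.compile(r'^Fixes: [0-9a-f]{12,40} \(".+"\)$')


def check_commit_message(message: str) -> List[str]:
    """Return the problems found; an empty list means these checks passed."""
    lines = message.rstrip("\n").split("\n")
    problems = []
    if not SUMMARY_RE.match(lines[0]):
        problems.append("summary should look like 'LU-nnnn component: description'")
    if len(lines) > 1 and lines[1] != "":
        problems.append("summary should be followed by a blank line")
    if not any(line.startswith("Signed-off-by: ") for line in lines):
        problems.append("missing Signed-off-by: trailer")
    for line in lines:
        if line.startswith("Fixes:") and not FIXES_RE.match(line):
            problems.append(f"malformed Fixes: trailer: {line!r}")
    return problems


# Hypothetical commit message; ticket numbers and hash are placeholders.
good = (
    "LU-12345 llite: fix example race\n"
    "\n"
    "Explain what was wrong, why it matters, and how this fixes it.\n"
    "\n"
    'Fixes: 0123456789ab ("LU-11111 llite: add example feature")\n'
    "Signed-off-by: Jane Dev <jane@example.com>\n"
)
print(check_commit_message(good))               # []
print(check_commit_message("fix stuff\nmore"))  # three problems
```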
Warning: Intermittent test failures can delay landing; investigate thoroughly to avoid false positives.
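The failure-review steps in the checklist amount to sorting each failed test into one of three buckets: a known issue with an existing LU ticket (retest), a failure also seen on other patches but with no ticket (raise a bug), or a failure seen only with this patch (fix before landing). The sketch below encodes that decision; the `Triage` type, the function name, the test names, and the ticket number are hypothetical, and in practice the inputs would come from searching JIRA and Maloo, which this code does not do.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Triage:
    retest: Dict[str, str] = field(default_factory=dict)  # test -> existing LU ticket
    raise_bug: List[str] = field(default_factory=list)    # seen on other patches, no ticket
    must_fix: List[str] = field(default_factory=list)     # only seen with this patch


def triage_failures(failed: List[str],
                    known_tickets: Dict[str, str],
                    other_patch_hits: Dict[str, int]) -> Triage:
    """Sort failed tests into the buckets described in the landing checklist."""
    result = Triage()
    for test in failed:
        if test in known_tickets:
            result.retest[test] = known_tickets[test]
        elif other_patch_hits.get(test, 0) > 0:
            result.raise_bug.append(test)
        else:
            # Failing only with this change: it blocks landing until fixed.
            result.must_fix.append(test)
    return result


# Hypothetical test names and ticket number.
t = triage_failures(
    ["sanity test_1", "recovery-small test_2", "sanity test_3"],
    known_tickets={"sanity test_1": "LU-12345"},
    other_patch_hits={"recovery-small test_2": 3},
)
print(t)
```

The `must_fix` bucket is deliberately conservative: a failure seen only with your patch is treated as caused by it until investigation shows otherwise.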
Landing a Feature to a Feature Release
Feature releases occur approximately every 6 months. Features are scheduled early in the development cycle to allow ample testing.
First Steps
Begin by checking for existing work to collaborate efficiently. For beginners: Collaboration prevents redundant work and incorporates diverse expertise.
- Check the Projects page to see if someone is already working on a similar feature.
- If a similar project exists, add yourself as a watcher to the JIRA ticket and offer to collaborate via comments or email.
- If no match is found:
  - Open a new JIRA ticket with detailed plans, including purpose, design thoughts, and potential impacts.
  - Add an entry to the Future Projects section on the Projects page.
  - Email the lustre-devel mailing list to draw attention to your project and the ticket.
- Discuss features early to avoid conflicts in protocol changes, code restructuring, or interoperability issues (e.g., ensuring new features work with old versions).
Best Practices: Include diagrams or pseudocode in JIRA for complex designs. Seek feedback before coding.
Warning: Late discussions can lead to major rewrites; engage the community from the start.
Schedule and Timing
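Lustre follows a "train model" with a fixed schedule: like a train that leaves on time, a release does not wait, and features that are not ready are deferred to the next one. Milestones are counted in months relative to GA (T0); for example, T-1 is the code freeze one month before GA.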
- T-7: Call for features sent to lustre-devel. Respond early with feature details using the Feature Landing Checklist.
- T-6: Initial review of candidate features, test plan creation, and landing schedule establishment.
- T-3: Feature freeze; only bug fixes allowed after this point.
- T-1: Code freeze; critical bug fixes only. Release candidate tagged, intensive testing begins.
- T0: General Availability (GA) announced; RPMs available at downloads.whamcloud.com.
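Because every milestone is an offset in months from GA, the calendar can be worked out backwards from a target GA date, which is the "plan backwards from deadlines" advice in concrete form. The sketch below does that arithmetic, clamping to the end of shorter months. The GA date in the example is hypothetical, and actual release dates are announced on lustre-devel rather than computed.

```python
import calendar
from datetime import date
from typing import List, Tuple

# Milestones from the schedule, as month offsets from GA (T0).
MILESTONES = [
    (-7, "Call for features"),
    (-6, "Feature review, test plans, landing schedule"),
    (-3, "Feature freeze: bug fixes only"),
    (-1, "Code freeze: critical fixes only, RC tagged"),
    (0, "General Availability"),
]


def shift_months(d: date, offset: int) -> date:
    """Move a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + offset
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def release_calendar(ga: date) -> List[Tuple[str, date, str]]:
    """Return (label, date, milestone) rows counting back from a GA date."""
    return [(f"T{offset}", shift_months(ga, offset), name)
            for offset, name in MILESTONES]


for label, when, name in release_calendar(date(2026, 3, 31)):  # hypothetical GA
    print(f"{label:>3}  {when.isoformat()}  {name}")
```

With a GA at the end of March, the code freeze lands on the last day of February, which is why the day is clamped rather than carried into the next month.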
Best Practices: Plan backwards from deadlines. Allocate time for reviews and iterations.
Warning: Missing freezes means your feature waits 6 months; prioritize completion over perfection.
Feature Landing Checklist
This ensures features are production-ready. For beginners: Features require more scrutiny than patches due to broader impact.
- High-level design reviewed and signed off by a senior Lustre engineer (e.g., via Gerrit or mailing list).
- Test plan reviewed and signed off, including performance testing, version interoperability (old/new servers/clients), and feature-specific tests (e.g., edge cases).
- Test plan results uploaded to Maloo.
- Proposed revisions to the manual provided (e.g., new configuration options).
- All criteria from the Patch Landing Checklist are met.
Best Practices: Include scalability tests for HPC features. Update docs concurrently with code.
Warning: Incomplete test plans can lead to post-release bugs; cover all scenarios.
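The Feature Landing Checklist adds design, test-plan, and documentation requirements on top of the patch-level criteria, such as two positive reviews from someone other than the author and passing Maloo results. The sketch below models that combined checklist as a list of outstanding blockers. The `Review` and `FeatureStatus` types and their fields are hypothetical bookkeeping, not a Gerrit or Maloo API, and the sign-offs themselves remain human decisions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Review:
    reviewer: str
    score: int  # positive = approval (simplified)


@dataclass
class FeatureStatus:
    author: str
    reviews: List[Review] = field(default_factory=list)
    maloo_tests_pass: bool = False
    design_signed_off: bool = False
    test_plan_signed_off: bool = False
    test_results_in_maloo: bool = False
    manual_revisions_provided: bool = False


def landing_blockers(s: FeatureStatus) -> List[str]:
    """List unmet criteria; an empty list means the feature meets this checklist."""
    blockers = []
    # Count distinct approvers, ignoring self-reviews by the author.
    approvers = {r.reviewer for r in s.reviews
                 if r.score > 0 and r.reviewer != s.author}
    if len(approvers) < 2:
        blockers.append("needs two positive reviews from people other than the author")
    checks = [
        (s.maloo_tests_pass, "Maloo test results are not all passing"),
        (s.design_signed_off, "high-level design not signed off"),
        (s.test_plan_signed_off, "test plan not signed off"),
        (s.test_results_in_maloo, "test plan results not uploaded to Maloo"),
        (s.manual_revisions_provided, "manual revisions not provided"),
    ]
    blockers.extend(msg for ok, msg in checks if not ok)
    return blockers


# Hypothetical status: the author's own +1 does not count toward the two reviews.
status = FeatureStatus(author="alice",
                       reviews=[Review("alice", 1), Review("bob", 1)],
                       maloo_tests_pass=True)
print(landing_blockers(status))
```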
Seeking Guidance
Contact other Lustre developers via the lustre-devel mailing list or IRC for assistance with patch submission. Beginners: Don't hesitate to ask; the community is supportive.
Review and Validation Process
The process ensures upstream code meets kernel standards. For beginners: Automation catches issues early, but human reviews are crucial.
- Submit patches to Gerrit for code review.
- Automated build validation for mainline compilation (non-enforced initially to allow gradual adoption).
- A minimal Jenkins kernel build checks that the code compiles without errors.
- Testing on vendor kernels remains primary to maintain compatibility.
Best Practices: Respond promptly to review comments. Use CI feedback to iterate quickly.
Warning: Ignoring vendor kernel tests can break production deployments; balance upstream and downstream needs.
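In this process the mainline build is informational at first, while vendor-kernel testing still decides whether a change can proceed. The sketch below expresses that policy as a gate over CI results. The job names and the `ENFORCED`/`ADVISORY` sets are hypothetical; making the mainline build enforced later would be a one-line change that moves it between the sets.

```python
from typing import Dict, List, Tuple

# Hypothetical CI job names. Only enforced jobs can block a change.
ENFORCED = {"build-rhel", "build-sles", "test-vendor-kernels"}
ADVISORY = {"build-mainline"}  # non-enforced while upstream support matures


def gate(results: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Return (can_proceed, messages) for CI job results (True means passed)."""
    messages = []
    ok = True
    for job in sorted(ENFORCED):
        if not results.get(job, False):
            ok = False
            messages.append(f"BLOCKING: {job} failed or did not run")
    for job in sorted(ADVISORY):
        if job in results and not results[job]:
            messages.append(f"warning: {job} failed (advisory only)")
    return ok, messages


print(gate({"build-rhel": True, "build-sles": True,
            "test-vendor-kernels": True, "build-mainline": False}))
# (True, ['warning: build-mainline failed (advisory only)'])
```

Reporting advisory failures instead of hiding them keeps mainline breakage visible to reviewers without blocking production-focused changes.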
Status as of June 2025
Significant progress has been made since the Lustre client was removed from the kernel's staging tree in 2018.
- Code cleanups (e.g., libcfs debug macros replaced, tracefile converted to ring_buffer) are ongoing.
- LNet IPv6 foundation complete, but full support needed for multi-rail and advanced features.
- ldiskfs/e2fsprogs upstreaming active (e.g., sparse_super2 for efficiency, parallel e2fsck for faster checks).
- Maintainers: James Simmons leading client efforts, with community contributions welcome.
- Future: Finalize IPv6, complete cleanups, minimize tree differences (e.g., UUID mounting for unique identification, ASLR for security).
Best Practices: Monitor JIRA for opportunities to contribute. Attend LUG for networking.
Warning: Status can change rapidly; always check the wiki for the latest.
Additional Resources and Troubleshooting
For deeper dives:
- Lustre Wiki: Upstreaming Page.
- JIRA: Track progress at LU-12511.
- Mailing Lists: lustre-devel for discussions.
- LUG Presentations: Search for LUG 2024/2025 slides on upstreaming.
- Common Pitfalls: For conflicts during a rebase, resolve each one carefully with Git and rerun the tests afterwards. For test failures, debug with the Lustre debug logs (`lctl debug_daemon`).
- Advanced: Join IRC (#lustre on OFTC) for real-time help. Consider contributing to related projects like e2fsprogs.
If stuck, start with small contributions to build experience. Always update to the latest Lustre version for testing upstream changes.