LF AI & Data - How to Apply and Join
29 October 2024
Greg Watson
1. Introduction
This document, written as part of the work of the Center for Open-Source Research Software Stewardship and Advancement (CORSA), describes the information that a particular software project needs to collect/have in order to apply to and then join the LF AI & Data Foundation. Much of the content in this document comes from the LF AI & Data Foundation Lifecycle Document, including the website.
The LF AI & Data Foundation is a global not-for-profit foundation that hosts critical components of the global AI & Data technology infrastructure. It brings together the world’s top developers, end users, and vendors to identify and contribute to the projects and initiatives that address industry challenges for the benefit of all participants.
Its mission is to build and support an open AI community, and to drive open source innovation in the AI, ML, DL and Data domains by enabling collaboration, sharing best practices, supporting development efforts, and creating new opportunities for all the members of the community.
Benefits of hosting your project at LF AI & Data include:
- A neutral home that increases adoption and contributions.
- Deep engagement with the open source AI and data community to enable collaboration.
- Staff eager to help and support.
- Program and project management services.
- Creative services available to all hosted projects.
- Marketing services to support community and ecosystem engagement.
- Source code scanning service.
- Strong presence in China via a local office and support staff.
- Ability to keep your maintainers and define your own governance, as long as it’s neutral.
- A world-class events team (virtual until it’s safe to hold in-person events again), able to run events around the world from 12 to 12,000 attendees.
A detailed description of services available to hosted projects is available here.
Some additional benefits from joining a foundation are discussed in Section 5 of Watson, et al., 2023.
2. Relationships with LF AI & Data and basic requirements
The LF AI & Data Foundation supports open source projects within artificial intelligence and the data space. Hosting a project with the Linux Foundation follows open governance, which means that there is no one company or individual in control of a project. When the maintainers of an open source project decide to host it at the Linux Foundation, they specifically transfer ownership of the trademark for their project to the Linux Foundation. They don’t transfer the copyright, however, since usage is already available to other users under the open source license. Note that LF AI & Data will only be hosting the upstream code.
3. Project Lifecycle
There are four stages of project hosting in LF AI & Data: Sandbox, Incubation, Graduation, and Emeritus. Each stage has specific requirements and offers different benefits to the projects in it.
A project transitions from one stage to another by meeting certain requirements. If a project’s request to transition to a higher stage is not approved, the project is eligible to re-apply six months after the first request.
The Technical Advisory Council (TAC) votes on new projects joining LF AI & Data, as well as on promoting projects across incubation stages. The TAC will annually review all LF AI & Data projects. This annual review will include assessing whether projects in Sandbox and Incubation are making adequate progress towards the Graduation stage, and whether projects in the Graduation stage are maintaining positive growth and adoption. Any project may be moved to the Emeritus stage, provided the TAC and the Governing Board approve the transition via a 2/3 affirmative vote.
Sandbox
The Sandbox hosting stage is specific to projects that intend to join LF AI & Data Incubation in the future and wish to lay the foundations for that. Such projects for instance aim to extend one or more LF AI & Data projects with functionality or interoperability libraries, or they generally fit the LF AI & Data mission and provide the potential for a novel approach to existing functional areas (or are an attempt to meet an unfulfilled need).
Projects at the Sandbox stage must:
- Fit the scope and mission of LF AI & Data
- Have an OSI-approved license.
- Have a sponsor who is an existing LF AI & Data member. Alternatively, a new organization would join LF AI & Data and sponsor the project’s incubation application.
- Have an open and documented technical governance. The LF team can help set this up.
- The project’s founders adopt an open governance model documented in a Technical Charter for the project, and execute the Project Contribution Agreement transferring the project’s assets to the LF.
Sandbox stage projects are eligible to receive the following benefits:
- Neutral hosting of the project’s trademark and assets by LF AI & Data.
- Appointment of a TAC member as a project sponsor who provides recommendations regarding governance best practices.
- LF AI & Data blog post or similar announcing the project’s hosting in the Foundation (quarterly announcements).
- Right to refer to the project as an “LF AI & Data Sandbox Project,” with the right, subject to applicable trademark usage guidelines, to display the LF AI & Data logo on the project’s code repository.
- An initial and regularly scheduled license scan of the project’s codebase with results reported to the project’s mailing list.
- Ongoing source code security scans and reports.
- Infrastructure support, including mailing lists, wiki space, a Slack channel, etc.
- Marketing, communication, and PR support, limited to significant announcements.
- Access to the LFX platform.
- Support of the Foundation staff who are eager to help with the project.
Incubation
Projects in the Sandbox stage are generally expected to move to Incubation within 12 months of joining the Foundation, following an evaluation by the TAC.
Projects at the Incubation stage must:
- Have at least three organizations actively contributing to the project.
- Have a defined Technical Steering Committee (TSC) with a chairperson identified, with open and transparent communication.
- Have reached a minimum of 500 stars on GitHub.
- Have achieved and maintained the OpenSSF Best Practices Silver Badge.
Incubation stage projects are eligible to receive all the benefits of the Sandbox stage projects plus:
- Right to refer to the project as an “LF AI & Data Incubation Project,” with the right, subject to applicable trademark usage guidelines, to display the LF AI & Data logo on the project’s code repository.
- Creative and artwork support covering website, logo, and other required creative work.
- Marketing, communication, and PR support, including project promotion via blog posts, social media, and LF AI & Data website.
- Access to the Bevy platform for community-hosted events.
Graduation
Projects in the Incubation stage are generally expected to move to Graduation within 12-18 months of joining the Foundation, following an evaluation by the TAC.
Projects at the Graduation stage must:
- Have a healthy number of code contributions from at least five organizations.
- Have reached a minimum of 1000 stars on GitHub.
- Have achieved and maintained the [OpenSSF Best Practices Gold Badge](https://www.bestpractices.dev/en/criteria/2).
- Have demonstrated a substantial ongoing flow of commits and merged contributions for the past 12 months.
- Have completed at least one collaboration with another LF AI & Data hosted project.
Since some of these criteria can vary depending on a project’s type, scope, and size, the TAC has final judgment over the activity level adequate to meet these criteria.
To graduate, the project must receive the affirmative vote of the TAC and the Governing Board.
When a project graduates, it will be eligible to have a technical lead appointed to represent the project on the LF AI & Data Technical Advisory Council. The project is expected to nominate a lead to the TAC who can attend and participate in the bi-weekly TAC calls.
Graduation stage projects are eligible for all the benefits of Incubation stage projects plus:
- LF AI & Data blog announcement or similar announcing the project graduation, including promotion activities.
- Graduation stage projects may receive support as determined by the Governing Board.
- Right to refer to the project as an “LF AI & Data Graduation Project,” with the right, subject to applicable trademark usage guidelines, to display the LF AI & Data logo on the project’s code repository.
- Voting seat on the TAC.
- Advanced IT infrastructure support (pending board approval).
- Additional ecosystem development opportunities, including training courses, certification development, and conformance programs (pending board approval).
Emeritus
There are times when projects become inactive for various reasons. There are cases where the TAC may no longer support a project. The project will be transitioned to the Emeritus stage and archived in such cases.
What does archiving for an LF AI & Data project mean?
- LF AI & Data will no longer provide support for the project beyond what’s deemed necessary as part of the archiving process.
- LF AI & Data will list the project online as archived.
- All assets, including trademarks of archived projects, will remain hosted by LF AI & Data and the Linux Foundation.
Process of Archiving a Project
- A proposal must be submitted to the TAC via the regularly scheduled TAC call.
- The proposal will be presented to the TAC and include an explanation supporting archiving of the project.
- The proposal must remain open for at least two weeks of discussion.
- A vote must be finalized with 2/3 approval from the TAC and 2/3 approval from the Governing Board.
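The two-thirds threshold in the last step can be checked with exact arithmetic. The following is a minimal sketch, not an official LF AI & Data procedure: the function names are hypothetical, and it assumes the 2/3 is measured against the number of eligible voters, which this document does not specify (it could equally be measured against votes cast).

```python
from fractions import Fraction


def meets_two_thirds(yes_votes: int, eligible_voters: int) -> bool:
    """Return True if yes_votes is at least 2/3 of eligible_voters.

    Assumption: the base of the fraction is the number of eligible voters.
    """
    if eligible_voters <= 0:
        raise ValueError("eligible_voters must be positive")
    if not 0 <= yes_votes <= eligible_voters:
        raise ValueError("yes_votes must be between 0 and eligible_voters")
    # Fractions avoid floating-point rounding right at the threshold.
    return Fraction(yes_votes, eligible_voters) >= Fraction(2, 3)


def archive_approved(tac_yes: int, tac_total: int,
                     board_yes: int, board_total: int) -> bool:
    """Archiving needs 2/3 approval from both the TAC and the Governing Board."""
    return (meets_two_thirds(tac_yes, tac_total)
            and meets_two_thirds(board_yes, board_total))
```

For example, under these assumptions 8 of 12 TAC votes meets the threshold, while 9 of 15 Governing Board votes does not, so that proposal would not pass.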
4. Joining requirements
The requirements for applying for Sandbox stage are:
- Fit the scope and mission of LF AI & Data.
- Have a sponsor who is an existing LF AI & Data member. Alternatively, a new organization would join LF AI & Data and sponsor the project’s incubation application.
- Have an open and documented technical governance. The Linux Foundation team can help set this up as part of the onboarding process.
- Have an OSI-approved license.
- The project’s founders adopt an open governance model documented in a Technical Charter for the project, and execute the Project Contribution Agreement transferring the project’s assets to the Linux Foundation.
To be accepted into the Incubation stage, a project must meet all the requirements of the Sandbox stage plus:
- Have at least three organizations actively contributing to the project.
- Have a defined Technical Steering Committee (TSC) with a chairperson identified, with open and transparent communication.
- Have a sponsor who is an existing LF AI & Data member. Alternatively, a new organization would join LF AI & Data and sponsor the project’s incubation application.
- Have at least 500 stars on GitHub.
- Have achieved and maintained the OpenSSF Best Practices Silver Badge.
In addition to the affirmative vote of the TAC, Incubation stage projects also require the affirmative vote of the Governing Board.
To be accepted into the Graduation stage, a project must meet the Incubation stage requirements plus:
- Have a healthy number of code contributions coming from at least five organizations.
- Have reached a minimum of 1000 stars on GitHub.
- Have achieved and maintained the OpenSSF Best Practices Gold Badge.
- Have demonstrated a substantial ongoing flow of commits and merged contributions for the past 12 months.
- Have completed at least one collaboration with another LF AI & Data hosted project.
- Have received the affirmative vote of two-thirds of the TAC and the affirmative vote of the Governing Board.
- Have a technical lead appointed for representation of the project on the LF AI & Data Technical Advisory Council.
Since these metrics can vary depending on the type, scope, and size of a project, the TAC has final judgment over the level of activity that is adequate to meet these criteria.
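A project can run a rough self-assessment of the measurable criteria before applying. The sketch below is illustrative only: the class, field, and function names are hypothetical, and it encodes just the numeric thresholds listed above (organizations, GitHub stars, OpenSSF badge level, TSC, collaboration). Sponsorship, governance documents, commit activity, and the TAC and Governing Board votes are judgment calls that it does not model.

```python
from dataclasses import dataclass

# Badge levels in ascending order; "none" is a placeholder for "no badge yet".
BADGE_ORDER = ["none", "passing", "silver", "gold"]


@dataclass
class ProjectStats:
    contributing_orgs: int
    github_stars: int
    openssf_badge: str  # one of BADGE_ORDER
    has_tsc_with_chair: bool = False
    has_cross_project_collaboration: bool = False


def badge_at_least(actual: str, required: str) -> bool:
    """Compare two badge levels using their position in BADGE_ORDER."""
    return BADGE_ORDER.index(actual) >= BADGE_ORDER.index(required)


def unmet_incubation_criteria(p: ProjectStats) -> list[str]:
    """List the measurable Incubation criteria the project does not yet meet."""
    unmet = []
    if p.contributing_orgs < 3:
        unmet.append("at least three contributing organizations")
    if not p.has_tsc_with_chair:
        unmet.append("defined TSC with a chairperson")
    if p.github_stars < 500:
        unmet.append("at least 500 GitHub stars")
    if not badge_at_least(p.openssf_badge, "silver"):
        unmet.append("OpenSSF Best Practices Silver Badge")
    return unmet


def unmet_graduation_criteria(p: ProjectStats) -> list[str]:
    """Graduation requires the Incubation criteria plus the ones below."""
    unmet = unmet_incubation_criteria(p)
    if p.contributing_orgs < 5:
        unmet.append("code contributions from at least five organizations")
    if p.github_stars < 1000:
        unmet.append("at least 1000 GitHub stars")
    if not badge_at_least(p.openssf_badge, "gold"):
        unmet.append("OpenSSF Best Practices Gold Badge")
    if not p.has_cross_project_collaboration:
        unmet.append("collaboration with another LF AI & Data hosted project")
    return unmet
```

An empty list means only that the numeric thresholds are satisfied; as noted above, the TAC has final judgment over whether the activity level is adequate.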
5. Proposing a project for Sandbox hosting
- Submit a Project Contribution Proposal via a GitHub pull request
- Move the project’s code into its own GitHub organization and not under its founders’ organization
- Enable two-factor authentication for all members of the project’s GitHub organization
- Install the GitHub DCO app on all repos
- Add @thelinuxfoundation as a co-owner of the GitHub organization
- Achieve and maintain the OpenSSF Best Practices Passing Badge.
- Identify who on the project will handle security issues (could be a team)
- LF AI & Data will set up a mailing list to receive and discuss security vulnerability reporting
- Have the following files in GitHub:
  - LICENSE.md: at the root of the repository, specifying the terms and conditions for using, distributing, and modifying the software. In addition, you should provide information on the license of any third-party code included in the project.
  - README.md: welcomes new community members to the project and explains why the project is useful and how to get started.
  - CONTRIBUTING.md: explains how to contribute to the project. The file explains the types of contributions needed and how the development process works.
  - CODEOWNERS: to define individuals or teams responsible for code in a repository; document current project owners and emeritus committers.
  - CODE_OF_CONDUCT.md: sets the ground rules for participants’ behavior and helps facilitate a friendly, welcoming environment. By default, projects should leverage the Linux Foundation Code of Conduct unless an alternate Code of Conduct is approved prior.
  - RELEASE.md: provides documentation on the release methodology, cadence, criteria, etc.
  - GOVERNANCE.md: documents the project’s technical governance.
  - SUPPORT.md: lets users and developers know how to get help with the project.
  - SECURITY.md: informs users and developers on how to report security issues and vulnerabilities.
- Optional but highly recommended: include a Software Package Data Exchange (SPDX) short-form identifier in a comment at the top of each source code file.
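The file checklist and the SPDX recommendation above lend themselves to a quick local check before submitting a proposal. The following is a minimal sketch with hypothetical function names. It assumes every listed file sits at the repository root (the document only states this for LICENSE.md; GitHub also recognizes CODEOWNERS in .github/ or docs/) and that the SPDX identifier appears within the first few lines of a source file.

```python
from pathlib import Path

# File names taken from the checklist above.
REQUIRED_FILES = [
    "LICENSE.md",
    "README.md",
    "CONTRIBUTING.md",
    "CODEOWNERS",
    "CODE_OF_CONDUCT.md",
    "RELEASE.md",
    "GOVERNANCE.md",
    "SUPPORT.md",
    "SECURITY.md",
]


def missing_required_files(repo_root) -> list[str]:
    """Return the checklist files not found at the repository root.

    Assumption: all files are expected at the root of the repository.
    """
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def has_spdx_identifier(source_file, max_lines: int = 5) -> bool:
    """Check the first max_lines lines for an SPDX short-form identifier,
    e.g. a comment such as '# SPDX-License-Identifier: Apache-2.0'.
    """
    with open(source_file, encoding="utf-8", errors="replace") as fh:
        # zip stops after max_lines lines without reading the whole file.
        for _, line in zip(range(max_lines), fh):
            if "SPDX-License-Identifier:" in line:
                return True
    return False
```

Calling `missing_required_files(".")` from a checkout lists what still needs to be added; the remaining checklist items (two-factor authentication, the DCO app, organization ownership) are GitHub settings that this sketch does not inspect.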
Projects that have questions about foundations are welcome to contact CORSA by emailing PI Greg Watson. Those with questions specific to LF AI & Data should use the contact link on its website.