LF AI & Data - How to Apply and Join

29 October 2024

Greg Watson

1. Introduction

This document, written as part of the work of the Center for Open-Source Research Software Stewardship and Advancement (CORSA), describes the information that a software project needs to gather in order to apply to, and then join, the LF AI & Data Foundation. Much of the content comes from the LF AI & Data Foundation Lifecycle Document and the foundation’s website.

The LF AI & Data Foundation is a global not-for-profit foundation that hosts critical components of the global AI & Data technology infrastructure. It brings together the world’s top developers, end users, and vendors to identify and contribute to the projects and initiatives that address industry challenges for the benefit of all participants.

Its mission is to build and support an open AI community and to drive open source innovation in the AI, ML, DL, and Data domains by enabling collaboration, sharing best practices, supporting development efforts, and creating new opportunities for all members of the community.

Benefits of hosting your project at LF AI & Data include:

A detailed description of services available to hosted projects is available here.

Some additional benefits from joining a foundation are discussed in Section 5 of Watson, et al., 2023.

2. Relationships with LF AI & Data and basic requirements

The LF AI & Data Foundation supports open source projects within artificial intelligence and the data space. Hosting a project with the Linux Foundation follows open governance, which means that there is no one company or individual in control of a project. When the maintainers of an open source project decide to host it at the Linux Foundation, they specifically transfer ownership of the trademark for their project to the Linux Foundation. They don’t transfer the copyright, however, since usage is already available to other users under the open source license. Note that LF AI & Data will only be hosting the upstream code.

3. Project Lifecycle

There are four stages of project hosting in LF AI & Data: Sandbox, Incubation, Graduation, and Emeritus. Each stage has specific requirements and offers different benefits to the projects in it.

A project transitions from one stage to another by meeting certain requirements. If a project’s request to transition to a higher stage is not approved, the project is eligible to re-apply six months after the first request.

The Technical Advisory Council (TAC) votes on new projects joining LF AI & Data, as well as on promoting projects from one stage to the next. The TAC reviews all LF AI & Data projects annually. This review assesses whether projects in Sandbox and Incubation are making adequate progress towards the Graduation stage, and whether projects in the Graduation stage are maintaining positive growth and adoption. Any project may be moved to the Emeritus stage, provided the TAC and the Governing Board approve the transition via a 2/3 affirmative vote.
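The timing and voting rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not an official LF AI & Data tool: the function names are hypothetical, "six months" is read as six calendar months after the first request, and the 2/3 threshold is assumed to apply to the votes cast in a single body (the actual counting rules are set by the foundation).

```python
import calendar
from datetime import date

# Stage names as used in this document.
STAGES = ["Sandbox", "Incubation", "Graduation", "Emeritus"]


def add_months(start: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping to month end."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_reapply(first_request: date) -> date:
    """Earliest re-application date, assuming six calendar months after the first request."""
    return add_months(first_request, 6)


def two_thirds_approved(yes_votes: int, votes_cast: int) -> bool:
    """True if yes_votes is at least 2/3 of votes_cast (assumed counting rule)."""
    return votes_cast > 0 and 3 * yes_votes >= 2 * votes_cast
```

For example, under these assumptions a request first made on 29 October 2024 could be repeated from `earliest_reapply(date(2024, 10, 29))`, and a move to Emeritus would need `two_thirds_approved(...)` to hold for both the TAC and the Governing Board.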

Sandbox

The Sandbox hosting stage is specific to projects that intend to join LF AI & Data Incubation in the future and wish to lay the foundations for that. Such projects for instance aim to extend one or more LF AI & Data projects with functionality or interoperability libraries, or they generally fit the LF AI & Data mission and provide the potential for a novel approach to existing functional areas (or are an attempt to meet an unfulfilled need).

Projects at the Sandbox stage must:

Sandbox stage projects are eligible to receive the following benefits:

Incubation

Projects in the Sandbox stage are generally expected to move to Incubation within 12 months of joining the Foundation, following an evaluation by the TAC.

Projects at the Incubation stage must:

Incubation stage projects are eligible to receive all the benefits of the Sandbox stage projects plus:

Graduation

Projects in the Incubation stage are generally expected to move to Graduation within 12-18 months of joining the Foundation, following an evaluation by the TAC.

Projects at the Graduation stage must:

Since some of these criteria can vary depending on a project’s type, scope, and size, the TAC has final judgment over the activity level adequate to meet these criteria.

To graduate, the project must receive the affirmative vote of the TAC and the Governing Board.

When a project graduates, it will be eligible to have a technical lead appointed to represent the project on the LF AI & Data Technical Advisory Council. The project is expected to nominate a lead to the TAC who can attend and participate in the bi-weekly TAC calls.

Graduation stage projects are eligible for all the benefits of Incubation stage projects plus:

Emeritus

Projects sometimes become inactive for various reasons, and in some cases the TAC may no longer support a project. In such cases, the project is transitioned to the Emeritus stage and archived.

What does archiving for an LF AI & Data project mean?

Process of Archiving a Project

4. Joining requirements

The requirements for applying for Sandbox stage are:

To be accepted into the Incubation stage, a project must meet all the requirements of the Sandbox stage plus:

In addition to the affirmative vote of the TAC, Incubation stage projects also require the affirmative vote of the Governing Board.

To be accepted into the Graduation stage, a project must meet the Incubation stage requirements plus:

Since these metrics can vary depending on the type, scope, and size of a project, the TAC has final judgment over the level of activity that is adequate to meet these criteria.

5. Proposing a project for Sandbox hosting

  1. Submit a Project Contribution Proposal via a GitHub pull request
  2. Move the project’s code into its own GitHub organization, rather than keeping it under its founders’ organization
    • Enable two-factor authentication for all members of the project’s GitHub organization
    • Install the GitHub DCO app on all repos
    • Add @thelinuxfoundation as a co-owner of the GitHub organization
  3. Achieve and maintain the OpenSSF Best Practices Passing Badge.
  4. Identify who on the project will handle security issues (could be a team)
    • LF AI & Data will set up a mailing list to receive and discuss security vulnerability reporting
  5. Have the following files in GitHub:
    • LICENSE.md at the root of the repository specifying the terms and conditions for using, distributing, and modifying the software. In addition, you should provide information on the license of any third-party code included in the project.
    • README.md welcomes new community members to the project and explains why the project is useful and how to get started.
    • CONTRIBUTING.md explains how to contribute to the project. The file explains the types of contributions needed and how the development process works.
    • CODEOWNERS to define individuals or teams responsible for code in a repository; document current project owners and emeritus committers.
    • CODE_OF_CONDUCT.md sets the ground rules for participants’ behavior and helps facilitate a friendly, welcoming environment. By default, projects should use the Linux Foundation Code of Conduct unless an alternate Code of Conduct has been approved in advance.
    • RELEASE.md provides documentation on the release methodology, cadence, criteria, etc.
    • GOVERNANCE.md documents the project’s technical governance.
    • SUPPORT.md lets users and developers know how to get help with the project.
    • SECURITY.md informs users and developers on how to report security issues and vulnerabilities.
  6. Optional but highly recommended: include a Software Package Data Exchange (SPDX) short-form identifier in a comment at the top of each source code file.
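
Before submitting a proposal, it can help to check a local clone against items 5 and 6 of the list above. The following is a minimal sketch, not an LF AI & Data tool: the function names are hypothetical, every file is assumed to sit at the repository root, only the first five lines of each source file are searched for the `SPDX-License-Identifier:` tag, and the default set of source-file suffixes is an arbitrary choice for illustration.

```python
from pathlib import Path

# Files named in item 5 of the list above (assumed to live at the repo root).
REQUIRED_FILES = [
    "LICENSE.md",
    "README.md",
    "CONTRIBUTING.md",
    "CODEOWNERS",
    "CODE_OF_CONDUCT.md",
    "RELEASE.md",
    "GOVERNANCE.md",
    "SUPPORT.md",
    "SECURITY.md",
]

SPDX_TAG = "SPDX-License-Identifier:"


def missing_files(repo_root):
    """Return the required file names that are absent from the repository root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def has_spdx_identifier(path, max_lines=5):
    """Return True if the SPDX tag appears within the first max_lines lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for _, line in zip(range(max_lines), handle):
            if SPDX_TAG in line:
                return True
    return False


def files_missing_spdx(repo_root, suffixes=(".py",)):
    """Return source files (by suffix) that lack an SPDX identifier near the top."""
    root = Path(repo_root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix in suffixes
        and not has_spdx_identifier(path)
    )


if __name__ == "__main__":
    print("Missing files:", missing_files("."))
    print("Files without SPDX tag:", [str(p) for p in files_missing_spdx(".")])
```

A source file passes the SPDX check if a line such as `# SPDX-License-Identifier: Apache-2.0` appears near its top; the license identifier shown here is only an example and should match the project's actual license. Note that GitHub also recognizes a CODEOWNERS file placed in `.github/` or `docs/`, which this root-only sketch would report as missing.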

Projects that have questions about foundations are welcome to contact CORSA by emailing PI Greg Watson. Those with questions specific to LF AI & Data should use the contact link on its website.