Recipe Contribution#

Congratulations! You’re reading about the most exciting part of Pangeo Forge: Recipe Contribution. By contributing your recipe to Pangeo Forge Cloud, you’re creating a maintainable dataset resource for the entire community.

To begin, you’ll need a thorough understanding of the source dataset you wish to ingest, a vision for the target dataset (format, chunk structure, etc.), and a sense of any cleaning / processing steps needed along the way.

Note

No code required! ✨ If there’s a dataset you’d love to see in analysis-ready, cloud optimized (ARCO) format, but you’re not ready to write code just yet, we invite you to open a new Issue on pangeo-forge/staged-recipes describing your dataset of interest. The community can then collaborate on developing recipes for these datasets together.

Browse existing proposed recipe Issues here for inspiration and insight into what others in the community are working on. If you see a dataset you’re interested in there, feel free to chime in on the discussion thread!

Required files#

Once you know what dataset you’d like to transform into analysis-ready, cloud optimized (ARCO) format, it’s time to begin writing the contribution itself. Every recipe contribution has two required files, a Recipe module and a meta.yaml, described below.

Recipe module#

The Recipe module is a Python file (i.e. a text file with the .py extension) which defines one or more Recipe Objects for your dataset of interest.

To write this Python file, you will need to understand the basics of Pangeo Forge Recipes. The Introduction Tutorial is a great place to start, and working through the Recipe Tutorials and Recipes User Guide can help develop your understanding.

During the development process, it is recommended to run subsets of your dataset transformation as you go along, to get a feel for how the resulting ARCO dataset will look. This can be done with either a local installation of pangeo-forge-recipes, or with the in-browser Pangeo Forge Sandbox.
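
For orientation, the sketch below shows roughly what a minimal recipe module might contain. It assumes the classic XarrayZarrRecipe class and the FilePattern / ConcatDim helpers from pangeo-forge-recipes; the URL layout, dimension keys, and chunk sizes are hypothetical placeholders, not part of any real dataset.

    # Hedged sketch of a recipe module; adapt every placeholder to your dataset.
    from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
    from pangeo_forge_recipes.recipes import XarrayZarrRecipe

    def make_url(time):
        # Hypothetical source-file URL layout; replace with your provider's layout.
        return f"https://data.example.com/sst/{time}.nc"

    # One source file per month along a "time" concatenation dimension.
    concat_dim = ConcatDim("time", keys=["2020-01", "2020-02", "2020-03"])
    pattern = FilePattern(make_url, concat_dim)

    # The variable name ("recipe") is what your meta.yaml will point to.
    recipe = XarrayZarrRecipe(pattern, target_chunks={"time": 1})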

meta.yaml#

The meta.yaml is a file which contains metadata and configuration for your recipe contribution, including:

  • Identifying name(s) for your submitted recipe object(s) along with the name of the Recipe module in which they can be found.

    Note

    No specific name for the recipe module is required. Instead, the name of the recipe module is defined in the meta.yaml.

  • The version of Pangeo Forge Recipes used to develop your recipe(s)

  • The source data provider’s information and license under which the source data is distributed

  • Your name, GitHub username, and ORCID ID

  • The Bakery on which to run your recipe(s)

Please refer to this template and/or the Create a meta.yaml file section of the Running your recipe on Pangeo Forge Cloud tutorial for further details on how to create this file.
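
To make the layout concrete, here is a hedged sketch of what a meta.yaml can look like; the authoritative field names and required values come from the template linked above, and every value shown here (title, IDs, provider, license, names, bakery) is a placeholder.

    title: "Example SST Dataset"
    description: "Monthly sea surface temperature, converted to an ARCO Zarr store."
    pangeo_forge_version: "0.9.0"
    recipes:
      - id: example-sst
        object: "recipe:recipe"   # {recipe module name}:{recipe object name}
    provenance:
      providers:
        - name: "Example Data Provider"
          description: "Hypothetical agency distributing the source files."
          roles:
            - producer
            - licensor
          url: https://data.example.com
      license: "CC-BY-4.0"
    maintainers:
      - name: "Your Name"
        orcid: "0000-0000-0000-0000"
        github: your-github-username
    bakery:
      id: "pangeo-ldeo-nsf-earthcube"

In this sketch, object: "recipe:recipe" points to a recipe object named recipe defined in a module named recipe.py, which is how the recipe module’s name is declared (see the Note above).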

Making a PR#

Once you have your Required files ready to go, it’s time to submit them! All new recipe contributions to Pangeo Forge Cloud are staged and evaluated via Pull Requests (PRs) against the pangeo-forge/staged-recipes repository.

To make a PR with your contribution, follow the steps below; a command-line sketch of the same workflow follows the list.

  1. Fork the pangeo-forge/staged-recipes GitHub repository

  2. Within the recipes/ directory of your fork, create a subdirectory with a descriptive name for your dataset.

    Note

    The name you choose for this subdirectory will be used to name the Feedstock repository created from your PR.

  3. Add your Required files (Recipe module and meta.yaml) to this new subdirectory of your fork, so that your directory tree now looks like this:

    staged-recipes/recipes/
                    └──{dataset-name}/
                            ├──meta.yaml
                            └──{recipe-module-name}.py
    
  4. Open a Pull Request against pangeo-forge/staged-recipes from your fork.
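
One possible command-line version of these steps, assuming you work locally with git and have already forked the repository on GitHub ({your-username}, {dataset-name}, and {recipe-module-name} are the placeholders from the tree above):

    # Hedged sketch of a local git workflow; replace every {placeholder}.
    git clone https://github.com/{your-username}/staged-recipes.git
    cd staged-recipes
    mkdir recipes/{dataset-name}
    cp /path/to/{recipe-module-name}.py /path/to/meta.yaml recipes/{dataset-name}/
    git checkout -b add-{dataset-name}
    git add recipes/{dataset-name}
    git commit -m "Add {dataset-name} recipe"
    git push origin add-{dataset-name}
    # Then open the Pull Request from your fork's branch against pangeo-forge/staged-recipes on GitHub.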

PR Checks#

Once you’ve opened a PR against pangeo-forge/staged-recipes, a series of checks will be performed to ensure that your Required files adhere to the expected format and that the Recipe Objects contained within your Recipe module can produce the expected datasets when executed in subsetted form.

A full listing of the checks performed on each PR is provided in PR Checks Reference.

Creating a new feedstock#

Once your Required files have passed all of the PR Checks documented in PR Checks Reference, a Pangeo Forge maintainer will merge your PR.

In Pangeo Forge, merging a PR takes on a special meaning. Rather than integrating your files into the pangeo-forge/staged-recipes repository, merging your PR results in the automatic creation of a new Feedstock repository, which is itself automatically populated with the files you’ve submitted in your PR.

Creation of this new repository will trigger the first full production build of the recipe(s) you’ve contributed, and the datasets produced by this build will be added to the Pangeo Forge Catalog.

Maintaining the feedstock#

Because your contribution now lives in its own repository, there is a dedicated place for the provenance of datasets built from your recipe(s). Your GitHub account will be granted maintainer permissions on this new repository at the time it is created. Subsequent GitHub Issues and PRs on this Feedstock repository can be used to correct and/or improve the recipe(s) contained within it, and to rebuild the datasets they produce, as needed.