PR Checks Reference#

Once you’ve opened a Pull Request (PR) with your Recipe Contribution, a series of automated checks is performed to ensure that the submitted files conform to the expected format. These checks fall into four sequential categories:

digraph g {
    graph [rankdir="LR"];

    node [shape=rect, style=rounded, color="#003B71"];
    a [label = "Structure"];
    b [label = "meta.yaml"];
    c [label = "Recipe: static"];
    d [label = "Recipe: execution"];

    a -> b -> c -> d;

}

The specific checks and status updates within each of these categories are as follows:

All checks up to and including Recipe run(s) created are automatically run against the latest commit of your PR each time you push to the PR branch. Once Recipe run(s) created succeeds, a human maintainer will initiate the transition from static recipe checks to the recipe execution test by issuing the /run recipe-test command.

Check results (including status and error messages) are reported via comments by @pangeo-forge-bot. This page lists examples of the types of comments you may receive based on various check results; navigate to them by following the links in the table above, the contents section of this page, or by simply scrolling down from here.

Structure#

As described in Recipe Contribution and Introduction Tutorial Part 3, your PR to the pangeo-forge/staged-recipes repository should add a single new directory within the recipes/ subdirectory:

staged-recipes/recipes/
                └──{dataset-name}/
                        ├──meta.yaml
                        └──{recipe-module-name}.py

The first check run against all PRs is that the content of the PR adheres to this structure.

All changes in recipes/ subdir#

If your PR has changed files outside of the recipes/ subdirectory, you will receive a comment notification like this:

pangeo-forge-bot pangeo-forge-bot commented

It looks like there may be a problem with the structure of your PR.

I encountered a FilesChangedOutsideRecipesSubdirError("This PR changes files outside the ``recipes/`` directory.").

Moving all changes within the recipes/ subdirectory will resolve this error.

Single layer of subdirectories#

If your PR contains additional subdirectories within the recipes/{dataset-name}/ directory, you will receive a comment notification like this:

pangeo-forge-bot pangeo-forge-bot commented

It looks like there may be a problem with the structure of your PR.

I encountered a MultipleLayersOfSubdirectoriesError('This PR uses more than one layer of subdirs.').

Placing all submitted files directly within the recipes/{dataset-name}/ directory will resolve this error.

Only one subdirectory#

If your PR contains more than one subdirectory within the recipes/ directory (e.g., recipes/dataset-name-0/ and recipes/dataset-name-1/), you will receive a comment notification like this:

pangeo-forge-bot pangeo-forge-bot commented

It looks like there may be a problem with the structure of your PR.

I encountered a TooManySubdirectoriesError("Not all files in this PR exist within the same subdirectory of ``recipes/``.").

Removing all but one subdirectory of recipes/ from your PR will resolve this error.
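
To anticipate these structure errors before pushing, the three rules above can be sketched as a small local check. This is a minimal illustration, not the bot's actual implementation: the function name check_pr_structure is hypothetical, the order of the checks is assumed to follow the order of the sections above, and the returned strings simply reuse the error names shown in the example comments.

```python
from pathlib import PurePosixPath


def check_pr_structure(changed_files):
    """Return the name of the first structure error found, or None.

    Illustrative only: the error names mirror the example bot comments,
    but this is a hypothetical re-implementation of the rules.
    """
    parts = [PurePosixPath(f).parts for f in changed_files]

    # 1. Every changed file must live under recipes/.
    if any(not p or p[0] != "recipes" for p in parts):
        return "FilesChangedOutsideRecipesSubdirError"

    # 2. Files must sit directly in recipes/{dataset-name}/, no deeper.
    if any(len(p) > 3 for p in parts):
        return "MultipleLayersOfSubdirectoriesError"

    # 3. All files must share the same {dataset-name} directory.
    #    (A file placed directly in recipes/ is not described in this
    #    reference; this sketch does not treat it specially.)
    if len({p[1] for p in parts if len(p) > 1}) > 1:
        return "TooManySubdirectoriesError"

    return None
```

Passing the list of paths your PR changes (for example, the output of git diff --name-only against the main branch) should return None when the layout matches the expected structure.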

meta.yaml#

Once the content of your PR is found to adhere to the expected structure, the next aspect that is checked is the meta.yaml file.

Presence#

The first meta.yaml check is simply to confirm that a file named exactly meta.yaml exists within your PR subdirectory. If no such file is found, you will receive a comment notification such as:

pangeo-forge-bot pangeo-forge-bot commented

I don’t see a meta.yaml in this PR, only these files:

['recipes/great-dataset/meta.yml', 'recipes/great-dataset/recipe.py']

Please commit a meta.yaml that follows this template.

Sorry, I only recognize the longform .yaml extension! If you’re using the shortform .yml, please update your filename to use the longform extension.

Note that the error may arise because the file is truly missing, or simply because its name is not exactly meta.yaml. In the example above, changing the filename as follows

- meta.yml
+ meta.yaml

will resolve the error.
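
The presence check can be approximated locally with a few lines of Python. The sketch below is an assumption about how such a check might look, not the bot's code; find_meta_yaml and its hint messages are hypothetical names introduced for illustration.

```python
from pathlib import PurePosixPath


def find_meta_yaml(changed_files):
    """Return (path, hint): the meta.yaml path if present, else a hint.

    Hypothetical helper; only the exact filename meta.yaml is accepted.
    """
    by_name = {PurePosixPath(f).name: f for f in changed_files}
    if "meta.yaml" in by_name:
        return by_name["meta.yaml"], None
    if "meta.yml" in by_name:
        # The shortform extension is present but is not recognized.
        return None, f"rename {by_name['meta.yml']} to use the .yaml extension"
    return None, "no meta.yaml found; add one to your dataset directory"
```

For the file list shown in the example comment, this returns no path and a hint to rename meta.yml.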

Loadability#

Pangeo Forge Cloud uses PyYAML’s yaml.safe_load to load the meta.yaml. If your meta.yaml cannot be loaded with this function, you will receive a comment notification such as:

pangeo-forge-bot pangeo-forge-bot commented

When I tried to load 'recipes/great-dataset/meta.yaml', I got a ScannerError. You should be able to replicate this error yourself.

First make sure you’re in the root of your cloned staged-recipes repo. Then run this code in a Python interpreter:

import yaml  # note: `pip install PyYAML` first

with open("recipes/great-dataset/meta.yaml", "r") as f:
    yaml.safe_load(f)

Please correct meta.yaml so that you’re able to run this code without error, then commit the corrected meta.yaml.

This notification arises only if your meta.yaml is not a properly formatted YAML file. Following the instructions in the comment will allow you to replicate the error, which is often caused by small mistakes such as incorrect indentation or missing or misplaced punctuation (e.g., - dashes or : colons). Committing a corrected meta.yaml that yaml.safe_load can load without error will allow you to move past this check.

Completeness#

Once your meta.yaml can be loaded, the completeness check confirms that all expected fields are included in the file. If any fields are found to be missing, you will receive a comment notification such as:

pangeo-forge-bot pangeo-forge-bot commented

It looks like your meta.yaml does not conform to the specification.

2 validation errors for MetaYaml
pangeo_notebook_version
  field required (type=value_error.missing)
maintainers -> 0 -> orcid
  field required (type=value_error.missing)

Please correct your meta.yaml and commit the corrections to this PR.

In this example, the meta.yaml was found to be missing the pangeo_notebook_version field and the orcid ID for one of the recipe maintainers. Adding the missing fields will resolve this error.

For a complete reference of required fields, see links provided in the meta.yaml section of Required files.
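
As a rough local approximation of the completeness check, you can test a loaded meta.yaml (a plain dict, as returned by yaml.safe_load) for missing keys. The sketch below is an assumption for illustration only: the two required-field lists are hypothetical and deliberately partial, covering just the fields named in the example error above, and the function does not reproduce the validation that Pangeo Forge Cloud actually performs.

```python
# Hypothetical, partial lists based only on the example error above.
# Consult the meta.yaml template for the complete set of fields.
REQUIRED_TOP_LEVEL = ["pangeo_notebook_version", "maintainers"]
REQUIRED_PER_MAINTAINER = ["orcid"]


def missing_fields(meta):
    """Return the missing fields of a loaded meta.yaml dict.

    Nested fields are reported in the same "a -> 0 -> b" style used in
    the example bot comment.
    """
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in meta]
    for i, maintainer in enumerate(meta.get("maintainers", [])):
        for key in REQUIRED_PER_MAINTAINER:
            if key not in maintainer:
                missing.append(f"maintainers -> {i} -> {key}")
    return missing
```

An empty list means none of the fields in these partial lists is missing; it does not guarantee that the full specification is satisfied.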

Recipe: static#

Once the meta.yaml is found to be present, loadable, and complete, static checks of the recipe module begin.

Presence#

The first check is for the presence of the recipe module.

Pangeo Forge Cloud does not require any specific name for the recipe module. Instead, as described in the Required files section, the name of the recipe module is defined in the meta.yaml.

If a recipe module with the name indicated in meta.yaml is not found in the PR, you will receive a comment notification such as:

pangeo-forge-bot pangeo-forge-bot commented

I’m having trouble finding your recipe module (i.e. Python file) in this PR.

Your meta.yaml recipes section currently includes a recipe declared as:

- id: great-recipe-id
  object: recipe:great_recipe

The object here should conform to the format {recipe-module-name}:{recipe-object-name}.

In your PR I only see the following files:

['recipes/great-dataset/meta.yaml', 'recipes/great-dataset/recipy.py']

…none of which end with /recipe.py, which is unexpected given the object shown above.

Please help me find your recipe module by either:

  • Updating the meta.yaml recipes section object declaration to point to an existing module name; or

  • Changing the names of the .py files in this PR to point to the existing object in your meta.yaml

This error may occur because the recipe module is truly missing, or because of an inconsistency between the recipe module name indicated in meta.yaml and the name of the actual file in the PR. In the example above, a simple typo causes the error: the file is named recipy.py (ending in y) rather than recipe.py (ending in e). Renaming the recipe module in the PR as follows:

- recipy.py
+ recipe.py

will resolve the error.
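
The relationship between the object declaration and the expected filename can be sketched as follows. This is a minimal illustration under stated assumptions: find_recipe_module is a hypothetical helper, and the suffix match on /{recipe-module-name}.py is inferred from the wording of the example comment rather than taken from the bot's source.

```python
def find_recipe_module(object_spec, changed_files):
    """Return the path of the module named in `object_spec`, or None.

    `object_spec` follows the {recipe-module-name}:{recipe-object-name}
    format, e.g. "recipe:great_recipe".
    """
    module_name, _, _object_name = object_spec.partition(":")
    expected_suffix = f"/{module_name}.py"
    for path in changed_files:
        if path.endswith(expected_suffix):
            return path
    return None
```

With the files from the example comment, find_recipe_module("recipe:great_recipe", files) returns None, because no path ends with /recipe.py.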

Recipe run(s) created#

Once the recipe module’s presence is confirmed, a new Recipe Run is registered with Pangeo Forge Cloud for every recipe included in the PR. When this is complete, you will receive a comment notification such as:

pangeo-forge-bot pangeo-forge-bot commented

🎉 New recipe runs created for the following recipes at sha abcdefg:

where abcdefg will be replaced with the actual SHA of your PR’s latest commit, and {recipe_run_id} will be replaced with an integer value uniquely identifying the newly created recipe run. If your PR defines more than one recipe, the comment notification will include additional bullet points, one for each recipe in the PR.

Note

The link in the above example comment does not resolve to a real webpage, because it does not have a {recipe_run_id} assigned to it. Please refer to

https://pangeo-forge.org/dashboard/recipe-runs/

for a listing of real Recipe Runs.

Recipe: execution#

/run recipe-test#

Automatically created recipe runs all start with a status of queued. To move the status of a recipe run to in_progress (thereby beginning the actual test execution of the recipe), a human maintainer of Pangeo Forge must issue a special command, as follows:

human-maintainer human-maintainer commented

/run recipe-test recipe_run_id={recipe_run_id}

In this example, {recipe_run_id} would be replaced with the integer ID of the recipe run to be executed.

Importability#

The first thing that happens after a Pangeo Forge maintainer issues the /run recipe-test command is a check that the recipe module is importable. If the recipe module references variables or packages that have not been defined or imported, a NameError will occur on import, and you will receive a comment notification such as:

pangeo-forge-bot pangeo-forge-bot commented

When I tried to import your recipe module, I encountered this error

line 43, in <module>
    pattern = patterns.FilePattern(format_function, variable_merge_dim, month_concat_dim)
NameError: name 'format_function' is not defined

Please correct your recipe module so that it’s importable.
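
You can usually catch import errors like this one locally before a maintainer runs the test. The sketch below imports a Python file by path using the standard library's importlib; try_import is a hypothetical helper, and this is only an assumption about how to reproduce the check, not the code that Pangeo Forge Cloud runs.

```python
import importlib.util


def try_import(module_path, module_name="recipe"):
    """Import a Python file by path; return None on success, else the error.

    Importing executes the module's top-level code, so any packages the
    recipe imports must be installed in your local environment.
    """
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception as err:  # e.g. NameError, ImportError, SyntaxError
        return err
    return None
```

For example, calling try_import("recipes/great-dataset/recipe.py") from the root of your cloned staged-recipes repo returns the exception raised during import, if there is one.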

Test status: in_progress#

Assuming your recipe module is importable, a test execution of the recipe will begin, and you will receive a status update comment such as:

pangeo-forge-bot pangeo-forge-bot commented

✨ A test of your recipe great-recipe-id is now running on Pangeo Forge Cloud!

I’ll notify you with a comment on this thread when this test is complete. (This could be a little while…)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/{recipe_run_id}

The logs link provided in this comment notification can be used to follow the build progress of your recipe in real time. Any errors that arise, along with associated stack traces, are viewable in these logs.

Note

The link in the above example comment does not resolve to a real webpage, because it does not have a {recipe_run_id} assigned to it. Please refer to

https://pangeo-forge.org/dashboard/recipe-runs/

for a listing of real Recipe Runs, from which example logs are available.

Test status: failed#

If the test fails for any reason, you will receive a comment notification such as:

pangeo-forge-bot pangeo-forge-bot commented

Pangeo Forge Cloud told me that our test of your recipe great-recipe-id failed. But don’t worry, I’m sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/{recipe_run_id}

If you haven’t yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

Test status: success#

Once your recipe test succeeds (which may happen the first time, or after iterative improvements following prior failures), you will receive a long status report comment such as:

pangeo-forge-bot pangeo-forge-bot commented

🥳 Hooray! The test execution of your recipe great-recipe-id succeeded.

Here is a static representation of the dataset built by this recipe:

            <xarray.Dataset>
Dimensions:             (time: 2, depth: 57, lat: 180, lon: 360, nbounds: 2)
Coordinates:
    climatology_bounds  (time, nbounds) float32 dask.array<chunksize=(1, 2), meta=np.ndarray>
    crs                 int32 ...
  * depth               (depth) float32 0.0 5.0 10.0 ... 1.45e+03 1.5e+03
    depth_bnds          (depth, nbounds) float32 dask.array<chunksize=(57, 2), meta=np.ndarray>
  * lat                 (lat) float32 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
    lat_bnds            (lat, nbounds) float32 dask.array<chunksize=(180, 2), meta=np.ndarray>
  * lon                 (lon) float32 -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
    lon_bnds            (lon, nbounds) float32 dask.array<chunksize=(360, 2), meta=np.ndarray>
  * time                (time) object 1986-01-16 00:00:00 1958-02-16 00:00:00
Dimensions without coordinates: nbounds
Data variables: (12/40)
    A_an                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    A_dd                (time, depth, lat, lon) float64 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    A_gp                (time, depth, lat, lon) float64 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    A_ma                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    A_mn                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    A_oa                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    ...                  ...
    t_gp                (time, depth, lat, lon) float64 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    t_ma                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    t_mn                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    t_oa                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    t_sd                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
    t_se                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 180, 360), meta=np.ndarray>
Attributes: (12/49)
    Conventions:                     CF-1.6, ACDD-1.3
    cdm_data_type:                   Grid
    comment:                         global climatology as part of the World ...
    contributor_name:                Ocean Climate Laboratory
    contributor_role:                Calculation of climatologies
    creator_email:                   NCEI.info@noaa.gov
    ...                              ...
    summary:                         Climatological mean Apparent Oxygen Util...
    time_coverage_duration:          P!!Y
    time_coverage_end:               2017-01-31
    time_coverage_resolution:        P01M
    time_coverage_start:             1900-01-01
    title:                           World Ocean Atlas 2018 : Apparent_Oxygen...

You can also open this dataset by running the following Python code

import fsspec
import xarray as xr

dataset_public_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/prod/recipe-run-11/pangeo-forge/staged-recipes/woa18-1deg-monthly.zarr'
mapper = fsspec.get_mapper(dataset_public_url)
ds = xr.open_zarr(mapper, consolidated=True)
ds

in this badge (or your Python interpreter of choice).

Checklist

Please copy-and-paste the list below into a new comment on this thread, and check the boxes off as you've reviewed them.

Note: This test execution is limited to two increments in the concatenation dimension, so you should expect the length of that dimension (e.g, "time" or equivalent) to be 2.

- [ ] Are the dimension lengths correct?
- [ ] Are all of the expected variables present?
- [ ] Does plotting the data produce a plot that looks like your dataset?
- [ ] Can you run a simple computation/reduction on the data and produce a plausible result?

Note

For illustrative purposes, the example comment above uses a dataset from:

At this point, Pangeo Forge maintainers will keep an eye out for your response comment:

recipe-contributor recipe-contributor commented

  • ☑️ Are the dimension lengths correct?

  • ☑️ Are all of the expected variables present?

  • ☑️ Does plotting the data produce a plot that looks like your dataset?

  • ☑️ Can you run a simple computation/reduction on the data and produce a plausible result?

based on the assessment you make of the test data. Once you’ve approved the test data with this comment, the PR will be merged by a Pangeo Forge maintainer.
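
The first checklist item (dimension lengths) lends itself to a quick programmatic check. The sketch below is one possible way to do it, assuming you know the full expected length of each dimension; check_dim_lengths is a hypothetical helper, and the default of two test increments comes from the note in the bot comment above.

```python
def check_dim_lengths(sizes, expected, concat_dim="time", test_increments=2):
    """Compare dimension sizes from a test run against expected sizes.

    In the test run, `concat_dim` is expected to have length
    `test_increments` rather than its full length. Returns a dict of
    {dim: (actual, wanted)} for every mismatch; empty means all match.
    """
    mismatches = {}
    for dim, full_length in expected.items():
        wanted = test_increments if dim == concat_dim else full_length
        actual = sizes.get(dim)
        if actual != wanted:
            mismatches[dim] = (actual, wanted)
    return mismatches
```

With the dataset opened as ds via the code in the bot comment, dict(ds.sizes) can be passed as sizes. The values in expected are yours to supply from your knowledge of the source data.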