Recipe Composition

Overview

A recipe describes the steps to transform archival source data in one format / location into analysis-ready, cloud-optimized (ARCO) data in another format / location. Technically, a recipe is a composite of Apache Beam transforms applied to the data collection associated with a file pattern. To write a recipe:

  1. Define a file pattern for your source data.

  2. Select a set of Transforms to apply to the source data.

  3. Put all of this code into a Python module (i.e., a file with a .py extension), as demonstrated in Example recipes.
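
As a rough illustration of step 1, a file pattern can be thought of as a mapping from a position along some dimension (here, time) to the URL of the source file holding that piece of data. The sketch below is plain Python and does not use the Pangeo Forge API; `make_url`, `pattern_items`, and the example.com URL template are hypothetical names chosen only for illustration.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def make_url(day: date) -> str:
    """Build the URL for one daily source file (hypothetical URL template)."""
    return f"https://example.com/data/{day:%Y/%m}/obs_{day:%Y%m%d}.nc"


def pattern_items(start: date, n_days: int) -> Iterator[Tuple[int, str]]:
    """Yield (position, url) pairs, one per source file, in time order."""
    for i in range(n_days):
        yield i, make_url(start + timedelta(days=i))


# Three consecutive daily files, crossing a month boundary.
items = list(pattern_items(date(2020, 1, 30), 3))
for position, url in items:
    print(position, url)
```

Each pair identifies one input and where it belongs in the combined dataset, which is the information the later transforms need in order to open and arrange the files.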

Generic sequence

Most recipes will be composed following the generic sequence:

FilePattern | Opener | Preprocessor (Optional) | Writer

Tip

In Apache Beam, transforms are connected with the | pipe operator.

Or, in pseudocode:

recipe = (
    beam.Create(pattern.items())
    | Opener
    | Preprocessor  # optional
    | Writer
)
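
To make the pseudocode above concrete without requiring Apache Beam, the following toy model shows how the `|` operator can chain stages in the Opener, Preprocessor, Writer order. It is a minimal sketch under stated assumptions: the `Stage` class and the three stand-in functions are hypothetical, operate on an in-memory list, and are not how Beam or Pangeo Forge implement their transforms.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Item = Tuple[int, str]  # (position, url), standing in for pattern.items()


class Stage:
    """Toy transform: applies a function to every element of its input."""

    def __init__(self, fn: Callable):
        self.fn = fn

    def __ror__(self, elements: Iterable) -> List:
        # Python calls this for `elements | stage` because a plain list
        # does not define `|` itself.
        return [self.fn(e) for e in elements]


store: Dict[int, str] = {}  # stand-in for the target dataset


def open_item(item: Item) -> Tuple[int, str]:
    position, url = item
    return position, f"contents of {url}"  # pretend to read the file


def preprocess(item: Tuple[int, str]) -> Tuple[int, str]:
    position, data = item
    return position, data.upper()  # stand-in for cleaning the data


def write_item(item: Tuple[int, str]) -> int:
    position, data = item
    store[position] = data  # pretend to write one piece of the dataset
    return position


items: List[Item] = [(0, "a.nc"), (1, "b.nc")]
written = items | Stage(open_item) | Stage(preprocess) | Stage(write_item)
print(written, store)
```

Removing `Stage(preprocess)` from the chain still leaves a valid sequence, which mirrors the optional status of the preprocessor in the generic sequence.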

Pangeo Forge does not provide any importable, pre-defined sequences of transforms. This is by design: it leaves the composition process flexible enough to accommodate the heterogeneity of real-world data. In practice, however, certain Common styles may work as the basis for many datasets.

Index