Storage

Recipes need a place to store data. A recipe gets this information from its .storage_config attribute, which is an object of type pangeo_forge_recipes.storage.StorageConfig. The StorageConfig object looks like this:

class pangeo_forge_recipes.storage.StorageConfig(target, cache=None, metadata=None)

A storage configuration container for recipe classes.

Parameters
  • target (FSSpecTarget) – The destination to which to write the output data.

  • cache (Optional[CacheFSSpecTarget]) – A location for caching source files.

  • metadata (Optional[MetadataTarget]) – A location for recipes to cache metadata about source files. Required if the concat dimension of the file pattern has nitems_per_file=None.

As shown above, the storage configuration includes three distinct parts: target, cache, and metadata.
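To make the signature concrete, here is a simplified stand-in, not the library's own source. The names StorageConfigSketch and metadata_cache_missing are hypothetical and introduced only for illustration. The sketch mirrors the documented parameters: a required target, plus an optional cache and metadata location that both default to None.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StorageConfigSketch:
    """Hypothetical stand-in that mirrors the documented StorageConfig parameters."""

    target: Any                      # required: where the output dataset is written
    cache: Optional[Any] = None      # optional: where source files are cached
    metadata: Optional[Any] = None   # optional: where source-file metadata is cached


def metadata_cache_missing(config: StorageConfigSketch, nitems_per_file: Optional[int]) -> bool:
    """Return True when a metadata location is needed but not configured.

    Follows the documented rule: metadata is required when the concat
    dimension of the file pattern has nitems_per_file=None.
    """
    return nitems_per_file is None and config.metadata is None


# A config with only a target leaves the other two locations unset.
config = StorageConfigSketch(target="target-placeholder")
```

With this config, metadata_cache_missing(config, nitems_per_file=None) reports that a metadata location still has to be supplied, while a known nitems_per_file does not need one.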

Default storage

When you create a new recipe, a default StorageConfig is automatically created, pointing at a local tempfile.TemporaryDirectory. This lets you write data to temporary local storage while developing and debugging a recipe, so any recipe can be executed immediately with minimal configuration. In a realistic “production” scenario, however, you will want to customize your storage locations.
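The reason temporary storage is unsuitable for production can be seen with the standard library alone. The following sketch does not use pangeo-forge-recipes at all; it only demonstrates the behavior of tempfile.TemporaryDirectory: its contents disappear as soon as the directory object is cleaned up. The file name example.txt is an arbitrary placeholder.

```python
import os
import tempfile

# Create a temporary directory, as the default storage does.
tmp = tempfile.TemporaryDirectory()

# Write a small file into it, standing in for recipe output.
scratch_file = os.path.join(tmp.name, "example.txt")
with open(scratch_file, "w") as f:
    f.write("intermediate output")

existed_before_cleanup = os.path.exists(scratch_file)

# Cleanup removes the directory and everything inside it.
tmp.cleanup()
exists_after_cleanup = os.path.exists(scratch_file)
```

Anything written to the default locations should therefore be treated as disposable.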

Customizing storage: the target

To write a recipe’s full dataset to a persistent storage location, reassign .storage_config to a pangeo_forge_recipes.storage.StorageConfig pointing to the location(s) of your choice. The minimal requirement for instantiating StorageConfig is a location in which to store the final dataset produced by the recipe. This is called the target. Pangeo Forge has a special class for this: pangeo_forge_recipes.storage.FSSpecTarget.

Creating a target requires two arguments:

  • The fs argument is an fsspec filesystem. Fsspec supports many different types of storage via its built-in and third-party implementations.

  • The root_path argument specifies the path on that filesystem where the data should be stored.

For example, creating a storage target for AWS S3 might look like this:

import s3fs
from pangeo_forge_recipes.storage import FSSpecTarget

fs = s3fs.S3FileSystem(key="MY_AWS_KEY", secret="MY_AWS_SECRET")
target_path = "pangeo-forge-bucket/my-dataset-v1.zarr"
target = FSSpecTarget(fs=fs, root_path=target_path)
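The literal key and secret above are fine for illustration, but credentials are often read from the environment instead. The helper below is a minimal sketch of that idea. The name s3_credentials_from_env is hypothetical and belongs to neither s3fs nor pangeo-forge-recipes, and the sketch assumes the conventional AWS variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

```python
import os


def s3_credentials_from_env(environ=None):
    """Return keyword arguments for s3fs.S3FileSystem from environment variables.

    Hypothetical helper. Returns an empty dict when either variable is
    missing, so that no partial credentials are passed on.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if not key or not secret:
        return {}
    return {"key": key, "secret": secret}


# Intended usage (requires s3fs to be installed):
# fs = s3fs.S3FileSystem(**s3_credentials_from_env())
```

The resulting filesystem is passed to FSSpecTarget exactly as in the example above. When the helper returns an empty dict, S3FileSystem is created without explicit credentials and relies on its own default credential lookup.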

This target can then be assigned to a recipe as follows:

from pangeo_forge_recipes.storage import StorageConfig

recipe.storage_config = StorageConfig(target)

Once assigned, the target can be accessed from the recipe with:

recipe.target

Customizing storage continued: caching

It is often useful to cache input files rather than read them directly from the data provider. Input files can be cached at a location defined by a pangeo_forge_recipes.storage.CacheFSSpecTarget object. Some recipes require separate caching of metadata, which is provided by a third class, pangeo_forge_recipes.storage.MetadataTarget.

A StorageConfig which declares all three storage locations is assigned as follows:

from pangeo_forge_recipes.storage import CacheFSSpecTarget, FSSpecTarget, MetadataTarget, StorageConfig

# define your fsspec filesystems for the target, cache, and metadata locations here

target = FSSpecTarget(fs=<fsspec-filesystem-for-target>, root_path="<path-for-target>")
cache = CacheFSSpecTarget(fs=<fsspec-filesystem-for-cache>, root_path="<path-for-cache>")
metadata = MetadataTarget(fs=<fsspec-filesystem-for-metadata>, root_path="<path-for-metadata>")

recipe.storage_config = StorageConfig(target, cache, metadata)
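As one way to fill in the placeholders above, the following sketch puts all three locations on the local filesystem under a single base directory. The helper names local_storage_paths and build_local_storage_config, and the sub-directory names, are assumptions made for illustration. The sketch uses fsspec's LocalFileSystem together with the classes and call signatures shown above. Library imports are deferred into the second function so that the path helper can be used on its own.

```python
import os


def local_storage_paths(base_dir):
    """Return one sub-directory path per storage location under base_dir."""
    return {
        name: os.path.join(base_dir, name)
        for name in ("target", "cache", "metadata")
    }


def build_local_storage_config(base_dir):
    """Build a StorageConfig whose three locations live under base_dir.

    Hypothetical helper; requires fsspec and pangeo-forge-recipes.
    """
    from fsspec.implementations.local import LocalFileSystem
    from pangeo_forge_recipes.storage import (
        CacheFSSpecTarget,
        FSSpecTarget,
        MetadataTarget,
        StorageConfig,
    )

    fs = LocalFileSystem()  # one local filesystem shared by all three locations
    paths = local_storage_paths(base_dir)
    target = FSSpecTarget(fs=fs, root_path=paths["target"])
    cache = CacheFSSpecTarget(fs=fs, root_path=paths["cache"])
    metadata = MetadataTarget(fs=fs, root_path=paths["metadata"])
    return StorageConfig(target, cache, metadata)


# Intended usage, with a directory of your choice:
# recipe.storage_config = build_local_storage_config("/path/to/scratch")
```

Keeping the three locations in separate sub-directories makes it possible to clear the cache without touching the finished dataset.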