Common styles#

Open with Xarray, write to Zarr#

This recipe category uses Xarray to open input files and Zarr as the target dataset format. Inputs can be in any file format Xarray can read, including NetCDF, OPeNDAP, GRIB, Zarr, and, via rasterio, GeoTIFF and other geospatial raster formats. The target Zarr dataset will conform to the Xarray Zarr encoding conventions.


The following example recipes are representative of this style:

Below we give a very basic overview of how this recipe style is used.

First, you must define a file pattern. Once you have a FilePattern object, the recipe pipeline will contain, at a minimum, the following transforms applied to the file pattern collection:
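To make the first step concrete, here is a minimal sketch of the idea a file pattern encodes: a function that formats the URL of one input file from its key along a dimension, plus the ordered keys of that dimension. This is plain Python, not the pangeo-forge-recipes API; the URL template, the `make_url` name, and the `time_keys` values are all hypothetical.

```python
def make_url(time: str) -> str:
    """Format the URL of one input file from its key along the time dimension."""
    # Hypothetical URL template, for illustration only.
    return f"https://data.example.org/precip/precip_{time}.nc"


# Ordered keys of the dimension along which the input files are concatenated.
time_keys = ["2020-01", "2020-02", "2020-03"]

# Each item pairs a position along "time" with the URL of the matching file.
# A collection like this is what the pipeline's transforms are applied to.
pattern_items = [(("time", i), make_url(key)) for i, key in enumerate(time_keys)]

for index, url in pattern_items:
    print(index, url)
```

In pangeo-forge-recipes itself, this role is played by the `FilePattern` and `ConcatDim` classes in `pangeo_forge_recipes.patterns`, which add more bookkeeping than the sketch shows.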

Open with Kerchunk, write to virtual Zarr#

Whereas the standard Zarr recipe creates a copy of the original dataset in the Zarr format, the Kerchunk-based reference recipe style does not copy the data. Instead, it creates a Kerchunk mapping, which allows archival formats (such as NetCDF and GRIB2) to be read as if they were Zarr datasets. More details about how Kerchunk works can be found in the Kerchunk docs and this blog post.
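The idea of a reference mapping can be illustrated without Kerchunk itself. The sketch below is a conceptual stand-in that uses only the standard library: it writes a small binary file to play the role of an archival file, then resolves Zarr-style keys to byte ranges inside it. The dictionary layout is loosely modeled on the version 1 reference format (inline strings for metadata, `[path, offset, length]` lists for chunks). The file contents, the `temperature` variable name, and the `read_key` helper are assumptions made for illustration.

```python
import json
import os
import tempfile

# Stand-in for an archival file: an 8-byte header followed by two 4-byte "chunks".
payload = b"HEADER--" + b"AAAA" + b"BBBB"
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
tmp.write(payload)
tmp.close()

# Reference mapping: Zarr-style key -> [path, byte offset, byte length].
# Metadata keys hold inline JSON strings instead of byte ranges.
references = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temperature/0": [tmp.name, 8, 4],
        "temperature/1": [tmp.name, 12, 4],
    },
}


def read_key(refs: dict, key: str) -> bytes:
    """Resolve one key: inline value, or a byte range read from the source file."""
    entry = refs["refs"][key]
    if isinstance(entry, str):
        return entry.encode()
    path, offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# The chunks are read straight from the original file; nothing was copied.
chunk0 = read_key(references, "temperature/0")
chunk1 = read_key(references, "temperature/1")
group_meta = read_key(references, ".zgroup")

os.unlink(tmp.name)  # clean up the stand-in archival file
```

The mapping stays small because it stores only locations, which is why this style is storage-efficient: the bytes of each chunk remain in the archival file.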


Examples of this recipe style currently exist in development form, and will be cited here as soon as they are integration tested, which is pending pangeo-forge/pangeo-forge-recipes#608.

Is this style right for my dataset?#

For archival data that are stored on high-throughput storage devices and do not require preprocessing, reference recipes are an ideal and storage-efficient option. When choosing whether to create a reference recipe, it is important to consider questions such as:

Where are the archival (i.e. source) files for this dataset currently stored?#

If the original data are not already in the cloud (or some other high-bandwidth storage device, such as an on-prem data center), the performance benefits of using a reference recipe may be limited, because network speeds to access the original data will constrain I/O throughput.

Does this dataset require preprocessing?#

With reference recipes, modification of the underlying data is not possible. For example, the chunking schema of a dataset cannot be modified with Kerchunk, so you are limited to the chunking schema of the archival data. If you need to optimize your dataset's chunking schema for space or time, the standard Zarr recipe is the only option. While you cannot modify chunking in a reference recipe, changes to the metadata (attributes, encoding, etc.) can be applied.
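A rough way to see why metadata can change while chunking cannot: in a reference mapping, metadata is typically stored inline as JSON text, whereas each chunk entry is a byte range pointing into the archival file. The sketch below edits attributes in such a mapping using only the standard library. The layout is loosely modeled on the version 1 reference format; the bucket URL, variable name, attribute values, and the `update_attrs` helper are hypothetical.

```python
import json

references = {
    "version": 1,
    "refs": {
        # Inline metadata: plain JSON text that can be rewritten freely.
        ".zattrs": json.dumps({"title": "raw archive"}),
        "temperature/.zarray": json.dumps(
            {"chunks": [10, 180, 360], "shape": [100, 180, 360]}
        ),
        # Chunk reference: a byte range in the source file. Its extent is fixed
        # by how the archival file was written, so it cannot be rechunked here.
        "temperature/0.0.0": ["s3://hypothetical-bucket/file_000.nc", 4096, 1048576],
    },
}


def update_attrs(refs: dict, key: str, **new_attrs) -> dict:
    """Return a copy of the references with the attributes at `key` updated."""
    updated = {**refs, "refs": dict(refs["refs"])}
    attrs = json.loads(updated["refs"].get(key, "{}"))
    attrs.update(new_attrs)
    updated["refs"][key] = json.dumps(attrs)
    return updated


patched = update_attrs(
    references, ".zattrs", title="Cleaned title", institution="Hypothetical Lab"
)
```

Only the inline attribute text changes. The chunk entry still points at the same byte range, which is why a different chunking schema requires copying the data with the standard Zarr recipe.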