Xarray-to-Zarr Sequential Recipe: NOAA OISST

This tutorial describes how to create a recipe from scratch. The source data is a sequence of NetCDF files accessed via HTTP. The target is a Zarr store.

Step 1: Get to know your source data

If you are developing a new recipe, you are probably starting from an existing dataset. The first step is to just get to know the dataset. For this tutorial, our example will be the NOAA Optimum Interpolation Sea Surface Temperature (OISST) v2.1. The authoritative website describing the data is https://www.ncdc.noaa.gov/oisst/optimum-interpolation-sea-surface-temperature-oisst-v21. This website contains links to the actual data files on the data access page. We will use the AVHRR-Only version of the data and follow the corresponding link to the Gridded netCDF Data. Browsing through the directories, we can see that there is one file per day. The very first day of the dataset is stored at the following URL:

https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc

From this example, we can work out the pattern of the file naming conventions. But first, let’s just download one of the files and open it up.

! curl -O https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1674k  100 1674k    0     0   881k      0  0:00:01  0:00:01 --:--:--  881k
import xarray as xr

ds = xr.open_dataset("oisst-avhrr-v02r01.19810901.nc")
ds
<xarray.Dataset>
Dimensions:  (lat: 720, lon: 1440, time: 1, zlev: 1)
Coordinates:
  * lat      (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
  * lon      (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9
  * time     (time) datetime64[ns] 1981-09-01T12:00:00
  * zlev     (zlev) float32 0.0
Data variables:
    anom     (time, zlev, lat, lon) float32 ...
    err      (time, zlev, lat, lon) float32 ...
    ice      (time, zlev, lat, lon) float32 ...
    sst      (time, zlev, lat, lon) float32 ...
Attributes: (12/37)
    title:                      NOAA/NCEI 1/4 Degree Daily Optimum Interpolat...
    source:                     ICOADS, NCEP_GTS, GSFC_ICE, NCEP_ICE, Pathfin...
    id:                         oisst-avhrr-v02r01.19810901.nc
    naming_authority:           gov.noaa.ncei
    summary:                    NOAAs 1/4-degree Daily Optimum Interpolation ...
    cdm_data_type:              Grid
    ...                         ...
    metadata_link:              https://doi.org/10.25921/RE9P-PT57
    ncei_template_version:      NCEI_NetCDF_Grid_Template_v2.0
    comment:                    Data was converted from NetCDF-3 to NetCDF-4 ...
    sensor:                     Thermometer, AVHRR
    Conventions:                CF-1.6, ACDD-1.3
    references:                 Reynolds, et al.(2007) Daily High-Resolution-...

We can see there are four data variables, all with dimension (time, zlev, lat, lon). There is a dimension coordinate for each dimension, and no non-dimension coordinates. Each file in the sequence presumably has the same zlev, lat, and lon, but we expect time to be different in each one.

Let’s also check the total size of the dataset in the file.

print(f"File size is {ds.nbytes/1e6} MB")
File size is 16.597452 MB

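Note that ds.nbytes reports the in-memory size of the dataset (about 16.6 MB), which is much larger than the roughly 1.7 MB that curl downloaded. The in-memory size is the one that matters here, because it will help us define the chunk size Pangeo Forge will use to build up the target dataset.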
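As a cross-check, the size reported above can be reproduced by hand from the dimensions and dtypes in the Dataset repr. The following is a minimal sketch using only the numbers printed earlier; the constant names are ours, chosen for illustration.

```python
# Dimension sizes and dtypes copied from the Dataset repr above.
dims = {"time": 1, "zlev": 1, "lat": 720, "lon": 1440}
FLOAT32_BYTES = 4
DATETIME64_BYTES = 8

# Four data variables (anom, err, ice, sst), each float32 over all four dimensions.
cells_per_variable = dims["time"] * dims["zlev"] * dims["lat"] * dims["lon"]
data_bytes = 4 * cells_per_variable * FLOAT32_BYTES

# Coordinates: lat, lon and zlev are float32; time is datetime64[ns].
coord_bytes = (
    (dims["lat"] + dims["lon"] + dims["zlev"]) * FLOAT32_BYTES
    + dims["time"] * DATETIME64_BYTES
)

total_bytes = data_bytes + coord_bytes
print(f"Estimated in-memory size: {total_bytes / 1e6} MB")
```

The total comes to 16,597,452 bytes, which agrees with the value printed from ds.nbytes. Almost all of it is the four data variables; the coordinates contribute less than 9 kB.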

Step 2: Define File Pattern

The first step in writing the recipe itself is to define a File Pattern. The file pattern describes how the source files (a.k.a. “inputs”) are organized.

In this case, we have a very simple sequence of files that we want to concatenate along a single dimension (time), so we can use the helper function pangeo_forge_recipes.patterns.pattern_from_file_sequence(). This allows us to simply pass a list of URLs, which we define explicitly.

from pangeo_forge_recipes.patterns import pattern_from_file_sequence

pattern_from_file_sequence?
Signature: pattern_from_file_sequence(file_list, concat_dim, nitems_per_file=None)
Docstring: Convenience function for creating a FilePattern from a list of files.
File:      ~/pangeo-forge/pangeo-forge/pangeo_forge_recipes/patterns.py
Type:      function

To populate the file_list, we need to understand the file naming conventions. Let’s look again at the first URL:

https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc

From this we deduce the following format string.

input_url_pattern = (
    "https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation"
    "/v2.1/access/avhrr/{yyyymm}/oisst-avhrr-v02r01.{yyyymmdd}.nc"
)

To convert this to an actual list of files, we use Pandas. At the time of writing, the latest available data is from 2021-01-05.

import pandas as pd

dates = pd.date_range("1981-09-01", "2021-01-05", freq="D")
input_urls = [
    input_url_pattern.format(
        yyyymm=day.strftime("%Y%m"), yyyymmdd=day.strftime("%Y%m%d")
    )
    for day in dates
]
print(f"Found {len(input_urls)} files!")
input_urls[-1]
Found 14372 files!
'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202101/oisst-avhrr-v02r01.20210105.nc'
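If Pandas is not available, the same list can be built with the standard library alone. This is a sketch of an equivalent approach rather than what the tutorial itself uses; the helper name daily_urls is our own.

```python
from datetime import date, timedelta

# Same format string as in the tutorial.
input_url_pattern = (
    "https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation"
    "/v2.1/access/avhrr/{yyyymm}/oisst-avhrr-v02r01.{yyyymmdd}.nc"
)


def daily_urls(start, end):
    """Return one URL per day from start to end, both inclusive."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [
        input_url_pattern.format(
            yyyymm=day.strftime("%Y%m"), yyyymmdd=day.strftime("%Y%m%d")
        )
        for day in days
    ]


urls = daily_urls(date(1981, 9, 1), date(2021, 1, 5))
print(len(urls))
print(urls[0])
```

Both endpoints are included, which is why the count is the day difference plus one. This mirrors the behaviour of pd.date_range with freq="D" used above.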

Now we can define our pattern. We will include one more piece of information: we know from examining the file above that there is only one timestep per file. So we can set nitems_per_file=1.

pattern = pattern_from_file_sequence(input_urls, "time", nitems_per_file=1)
pattern
<FilePattern {'time': 14372}>

To check our pattern, we can try to get the data back out. The pattern is designed to be iterated over, so to get the first key, we do:

for key in pattern:
    break
key
(DimIndex(name='time', index=0, sequence_len=14372, operation=<CombineOp.CONCAT: 2>),)

We can now use “getitem” syntax on the FilePattern object to retrieve the file name based on this key.

pattern[key]
'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc'

As an alternative way to create the same pattern, we could use the more verbose syntax and instantiate the FilePattern class directly. With this method, we have to define a function which returns the file path, given a particular key. We might do it like this.

from pangeo_forge_recipes.patterns import ConcatDim, FilePattern

def format_function(time):
    return input_url_pattern.format(
        yyyymm=time.strftime("%Y%m"), yyyymmdd=time.strftime("%Y%m%d")
    )

concat_dim = ConcatDim(name="time", keys=dates, nitems_per_file=1)
pattern = FilePattern(format_function, concat_dim)
pattern
<FilePattern {'time': 14372}>

We can check that it gives us the same thing:

pattern[key]
'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc'
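To make the relationship between keys, the format function, and file names more concrete, here is a toy model of the idea. ToyIndex and ToyPattern are hypothetical classes written for this illustration only; they are not the library's DimIndex or FilePattern, and the real implementation may differ substantially.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, NamedTuple, Sequence


class ToyIndex(NamedTuple):
    """Hypothetical stand-in for the library's DimIndex."""

    name: str
    index: int
    sequence_len: int


class ToyPattern:
    """Hypothetical, simplified pattern with a single concat dimension."""

    def __init__(
        self, format_function: Callable[[date], str], name: str, keys: Sequence[date]
    ):
        self.format_function = format_function
        self.name = name
        self.keys = list(keys)

    def __iter__(self) -> Iterator[tuple]:
        # Yield one key per input, in sequence order.
        for i in range(len(self.keys)):
            yield (ToyIndex(self.name, i, len(self.keys)),)

    def __getitem__(self, key: tuple) -> str:
        # Translate the positional index back into a date, then into a URL.
        (dim_index,) = key
        return self.format_function(self.keys[dim_index.index])


def format_function(time: date) -> str:
    return (
        "https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation"
        f"/v2.1/access/avhrr/{time:%Y%m}/oisst-avhrr-v02r01.{time:%Y%m%d}.nc"
    )


toy_dates = [date(1981, 9, 1) + timedelta(days=i) for i in range(3)]
toy_pattern = ToyPattern(format_function, "time", toy_dates)
first_key = next(iter(toy_pattern))
print(toy_pattern[first_key])
```

The point of the design is that the pattern stores positions along the concat dimension, and file names are derived on demand from those positions.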

Step 3: Pick a Recipe class

Now that we have the file pattern defined, we have to plug it into a Recipe. Since we are reading NetCDF files, we will use the pangeo_forge_recipes.recipes.XarrayZarrRecipe class. Let’s examine its documentation string in our notebook.

from pangeo_forge_recipes.recipes import XarrayZarrRecipe
XarrayZarrRecipe?
Init signature:
XarrayZarrRecipe(
    file_pattern: pangeo_forge_recipes.patterns.FilePattern,
    inputs_per_chunk: int = 1,
    target_chunks: Dict[str, int] = <factory>,
    target: Union[pangeo_forge_recipes.storage.AbstractTarget, NoneType] = None,
    input_cache: Union[pangeo_forge_recipes.storage.CacheFSSpecTarget, NoneType] = None,
    metadata_cache: Union[pangeo_forge_recipes.storage.MetadataTarget, NoneType] = None,
    cache_inputs: bool = True,
    copy_input_to_local_file: bool = False,
    consolidate_zarr: bool = True,
    xarray_open_kwargs: dict = <factory>,
    xarray_concat_kwargs: dict = <factory>,
    delete_input_encoding: bool = True,
    process_input: Union[Callable[[xarray.core.dataset.Dataset, str], xarray.core.dataset.Dataset], NoneType] = None,
    process_chunk: Union[Callable[[xarray.core.dataset.Dataset], xarray.core.dataset.Dataset], NoneType] = None,
    lock_timeout: Union[int, NoneType] = None,
    subset_inputs: Dict[str, int] = <factory>,
) -> None
Docstring:     
This class represents a dataset composed of many individual NetCDF files.
This class uses Xarray to read and write data and writes its output to Zarr.
The organization of the source files is described by the ``file_pattern``.
Currently this recipe supports at most one ``MergeDim`` and one ``ConcatDim``
in the File Pattern.

:param file_pattern: An object which describes the organization of the input files.
:param inputs_per_chunk: The number of inputs to use in each chunk along the concat dim.
   Must be an integer >= 1.
:param target_chunks: Desired chunk structure for the targret dataset. This is a dictionary
   mapping dimension names to chunk size. When using a :class:`patterns.FilePattern` with
   a :class:`patterns.ConcatDim` that specifies ``n_items_per_file``, then you don't need
   to include the concat dim in ``target_chunks``.
:param target: A location in which to put the dataset. Can also be assigned at run time.
:param input_cache: A location in which to cache temporary data.
:param metadata_cache: A location in which to cache metadata for inputs and chunks.
  Required if ``nitems_per_file=None`` on concat dim in file pattern.
:param cache_inputs: If ``True``, inputs are copied to ``input_cache`` before
  opening. If ``False``, try to open inputs directly from their source location.
:param copy_input_to_local_file: Whether to copy the inputs to a temporary
  local file. In this case, a path (rather than file object) is passed to
  ``xr.open_dataset``. This is required for engines that can't open
  file-like objects (e.g. pynio).
:param consolidate_zarr: Whether to consolidate the resulting Zarr dataset.
:param xarray_open_kwargs: Extra options for opening the inputs with Xarray.
:param xarray_concat_kwargs: Extra options to pass to Xarray when concatenating
  the inputs to form a chunk.
:param delete_input_encoding: Whether to remove Xarray encoding from variables
  in the input dataset
:param process_input: Function to call on each opened input, with signature
  `(ds: xr.Dataset, filename: str) -> ds: xr.Dataset`.
:param process_chunk: Function to call on each concatenated chunk, with signature
  `(ds: xr.Dataset) -> ds: xr.Dataset`.
:param lock_timeout: The default timeout for acquiring a chunk lock.
:param subset_inputs: If set, break each input file up into multiple chunks
  along dimension according to the specified mapping. For example,
  ``{'time': 5}`` would split each input file into 5 chunks along the
  time dimension. Multiple dimensions are allowed.
File:           ~/pangeo-forge/pangeo-forge/pangeo_forge_recipes/recipes/xarray_zarr.py
Type:           ABCMeta
Subclasses:     

There are lots of optional parameters, but only file_pattern is required. We can initialize our recipe by passing the file pattern to the recipe class.

from pangeo_forge_recipes.recipes import XarrayZarrRecipe

recipe = XarrayZarrRecipe(pattern)
recipe
XarrayZarrRecipe(file_pattern=<FilePattern {'time': 14372}>, inputs_per_chunk=1, target_chunks={}, target=None, input_cache=None, metadata_cache=None, cache_inputs=True, copy_input_to_local_file=False, consolidate_zarr=True, xarray_open_kwargs={}, xarray_concat_kwargs={}, delete_input_encoding=True, process_input=None, process_chunk=None, lock_timeout=None, subset_inputs={})

Now let’s think about the Zarr chunks that this recipe will produce. By default, each target chunk corresponds to one input, so each variable chunk would be only about 4 MB (720 × 1440 float32 values). That is too small. Let’s increase inputs_per_chunk to 10. This means that we will need to be able to hold 10 files like the one we examined above in memory at once. That’s about 16.6 MB * 10 = 166 MB. Not a problem!

recipe = XarrayZarrRecipe(pattern, inputs_per_chunk=10)
recipe
XarrayZarrRecipe(file_pattern=<FilePattern {'time': 14372}>, inputs_per_chunk=10, target_chunks={}, target=None, input_cache=None, metadata_cache=None, cache_inputs=True, copy_input_to_local_file=False, consolidate_zarr=True, xarray_open_kwargs={}, xarray_concat_kwargs={}, delete_input_encoding=True, process_input=None, process_chunk=None, lock_timeout=None, subset_inputs={})
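The arithmetic behind this choice can be written out explicitly. This is a back-of-the-envelope sketch that assumes the default target_chunks={} leaves lat and lon unchunked, so that one chunk of a variable spans inputs_per_chunk time steps of the full grid.

```python
inputs_per_chunk = 10
lat, lon, zlev = 720, 1440, 1
FLOAT32_BYTES = 4
N_DATA_VARIABLES = 4  # anom, err, ice, sst

# One chunk of a single variable spans inputs_per_chunk time steps.
variable_chunk_bytes = inputs_per_chunk * zlev * lat * lon * FLOAT32_BYTES

# All data variables are held in memory together while a chunk is assembled.
chunk_data_bytes = N_DATA_VARIABLES * variable_chunk_bytes

print(f"Per-variable chunk: {variable_chunk_bytes / 1e6} MB")
print(f"All data variables: {chunk_data_bytes / 1e6} MB")
```

A per-variable chunk of roughly 41 MB is a more comfortable size than the roughly 4 MB we would get with one input per chunk, while the total memory needed stays well under 200 MB.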

Step 4: Play with the recipe

Now we will just explore our recipe a bit to check whether things make sense.

We will also turn on Pangeo Forge’s logging in order to understand better what it is doing under the hood:

import logging
logger = logging.getLogger("pangeo_forge_recipes")
formatter = logging.Formatter('%(name)s:%(levelname)s - %(message)s')
handler = logging.StreamHandler()
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)
logger.setLevel(logging.INFO)
logger.addHandler(handler)

We can see how many inputs the recipe has like this:

all_inputs = list(recipe.iter_inputs())
len(all_inputs)
14372

And how many chunks:

all_chunks = list(recipe.iter_chunks())
len(all_chunks)
1438
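The chunk count follows from grouping consecutive inputs ten at a time, with a shorter final group. The function below is a hypothetical illustration of that grouping, not the library's own code.

```python
from typing import List


def group_inputs(n_inputs: int, inputs_per_chunk: int) -> List[List[int]]:
    """Split input indices 0..n_inputs-1 into consecutive groups."""
    return [
        list(range(start, min(start + inputs_per_chunk, n_inputs)))
        for start in range(0, n_inputs, inputs_per_chunk)
    ]


chunks = group_inputs(14372, 10)
print(len(chunks))
print(chunks[0])
print(len(chunks[-1]))
```

Since 14372 is not a multiple of 10, the last group holds only the two remaining inputs, and the total is 1437 full groups plus one partial group.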

We can now try to load the first chunk. Because we have not yet set up an input cache, the log output below shows each input being opened directly from its source URL.

(Note that the open_chunk and open_input methods must be called as context managers.)

%xmode minimal
with recipe.open_chunk(all_chunks[0]) as ds:
    display(ds)
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening inputs for chunk time-0
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-0: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc' directly.
Exception reporting mode: Minimal
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-1: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc' directly.
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-2: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810903.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810903.nc' directly.
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-3: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810904.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810904.nc' directly.
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-4: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810905.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810905.nc' directly.
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810906.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810906.nc' directly.
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-6: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810907.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810907.nc' directly.
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-7: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810908.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810908.nc' directly.
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-8: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810909.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810909.nc' directly.
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-9: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810910.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810910.nc' directly.
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Combining inputs for chunk 'time-0'
<xarray.Dataset>
Dimensions:  (lat: 720, lon: 1440, time: 10, zlev: 1)
Coordinates:
  * lat      (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
  * lon      (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9
  * time     (time) datetime64[ns] 1981-09-01T12:00:00 ... 1981-09-10T12:00:00
  * zlev     (zlev) float32 0.0
Data variables:
    anom     (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
    err      (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
    ice      (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
    sst      (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
Attributes: (12/37)
    title:                      NOAA/NCEI 1/4 Degree Daily Optimum Interpolat...
    source:                     ICOADS, NCEP_GTS, GSFC_ICE, NCEP_ICE, Pathfin...
    id:                         oisst-avhrr-v02r01.19810901.nc
    naming_authority:           gov.noaa.ncei
    summary:                    NOAAs 1/4-degree Daily Optimum Interpolation ...
    cdm_data_type:              Grid
    ...                         ...
    metadata_link:              https://doi.org/10.25921/RE9P-PT57
    ncei_template_version:      NCEI_NetCDF_Grid_Template_v2.0
    comment:                    Data was converted from NetCDF-3 to NetCDF-4 ...
    sensor:                     Thermometer, AVHRR
    Conventions:                CF-1.6, ACDD-1.3
    references:                 Reynolds, et al.(2007) Daily High-Resolution-...

Step 5: Create storage targets

In order to run our recipe, we need to define two places to store data:

  • The Input Cache, where we will temporarily store the files we have downloaded

  • The Target, where the final Zarr dataset will live

import tempfile
from fsspec.implementations.local import LocalFileSystem
from pangeo_forge_recipes.storage import FSSpecTarget, CacheFSSpecTarget

fs_local = LocalFileSystem()

cache_dir = tempfile.TemporaryDirectory()
cache_target = CacheFSSpecTarget(fs_local, cache_dir.name)

target_dir = tempfile.TemporaryDirectory()
target = FSSpecTarget(fs_local, target_dir.name)

recipe.input_cache = cache_target
recipe.target = target
recipe
XarrayZarrRecipe(file_pattern=<FilePattern {'time': 14372}>, inputs_per_chunk=10, target_chunks={}, target=FSSpecTarget(fs=<fsspec.implementations.local.LocalFileSystem object at 0x7f83e8d479d0>, root_path='/var/folders/n8/63q49ms55wxcj_gfbtykwp5r0000gn/T/tmpuz91tfhl'), input_cache=CacheFSSpecTarget(fs=<fsspec.implementations.local.LocalFileSystem object at 0x7f83e8d479d0>, root_path='/var/folders/n8/63q49ms55wxcj_gfbtykwp5r0000gn/T/tmpq3zo16e1'), metadata_cache=None, cache_inputs=True, copy_input_to_local_file=False, consolidate_zarr=True, xarray_open_kwargs={}, xarray_concat_kwargs={}, delete_input_encoding=True, process_input=None, process_chunk=None, lock_timeout=None, subset_inputs={})

Now we try to load the chunk.

with recipe.open_chunk(all_chunks[0]) as ds:
    display(ds)
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening inputs for chunk time-0
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-0: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc' from cache
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/n8/63q49ms55wxcj_gfbtykwp5r0000gn/T/tmpq3zo16e1/fe866b608e5c7eafba93f06954124ba1-https_www.ncei.noaa.gov_data_sea-surface-temperature-optimum-interpolation_v2.1_access_avhrr_198109_oisst-avhrr-v02r01.19810901.nc'

This time it didn’t work! That’s because the recipe now looks for each input in the cache, and we have not cached the inputs yet. We can have the recipe tell us which inputs are needed for each chunk via the inputs_for_chunk method, and then cache each one.

for input_file in recipe.inputs_for_chunk(all_chunks[0]):
    recipe.cache_input(input_file)
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-0'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-1'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-2'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810903.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810903.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-3'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810904.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810904.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-4'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810905.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810905.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810906.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810906.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-6'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810907.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810907.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-7'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810908.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810908.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-8'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810909.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810909.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-9'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810910.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810910.nc' to cache
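The sequence we just went through (open fails, cache the input, open succeeds) can be illustrated with a toy cache. ToyCache is a hypothetical class written for this example; it is not CacheFSSpecTarget, and the fetch callable merely stands in for an HTTP download.

```python
import tempfile
from pathlib import Path


class ToyCache:
    """Hypothetical local cache; not the library's CacheFSSpecTarget."""

    def __init__(self, root: str):
        self.root = Path(root)

    def _path(self, name: str) -> Path:
        return self.root / name

    def cache(self, name: str, fetch) -> None:
        # Fetch only if the file is not cached yet.
        path = self._path(name)
        if not path.exists():
            path.write_bytes(fetch())

    def open(self, name: str) -> bytes:
        # Raises FileNotFoundError when the input was never cached.
        return self._path(name).read_bytes()


with tempfile.TemporaryDirectory() as cache_dir:
    cache = ToyCache(cache_dir)
    try:
        cache.open("input-0.nc")
        opened_before_caching = True
    except FileNotFoundError:
        opened_before_caching = False

    cache.cache("input-0.nc", fetch=lambda: b"fake bytes standing in for a download")
    payload = cache.open("input-0.nc")

print(opened_before_caching, len(payload))
```

Separating the caching step from the opening step means that downloads can be retried or run in parallel independently of the work that builds each chunk.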

Step 6: Examine some chunks

Now we can finally open the first chunk!

with recipe.open_chunk(all_chunks[0]) as ds:
    display(ds)
    # need to load if we want to access the data outside of the context
    ds.load()
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening inputs for chunk time-0
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-0: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-1: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-2: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810903.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810903.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-3: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810904.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810904.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-4: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810905.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810905.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810906.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810906.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-6: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810907.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810907.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-7: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810908.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810908.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-8: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810909.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810909.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-9: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810910.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810910.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Combining inputs for chunk 'time-0'
<xarray.Dataset>
Dimensions:  (lat: 720, lon: 1440, time: 10, zlev: 1)
Coordinates:
  * lat      (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
  * lon      (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9
  * time     (time) datetime64[ns] 1981-09-01T12:00:00 ... 1981-09-10T12:00:00
  * zlev     (zlev) float32 0.0
Data variables:
    anom     (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
    err      (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
    ice      (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
    sst      (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
Attributes: (12/37)
    title:                      NOAA/NCEI 1/4 Degree Daily Optimum Interpolat...
    source:                     ICOADS, NCEP_GTS, GSFC_ICE, NCEP_ICE, Pathfin...
    id:                         oisst-avhrr-v02r01.19810901.nc
    naming_authority:           gov.noaa.ncei
    summary:                    NOAAs 1/4-degree Daily Optimum Interpolation ...
    cdm_data_type:              Grid
    ...                         ...
    metadata_link:              https://doi.org/10.25921/RE9P-PT57
    ncei_template_version:      NCEI_NetCDF_Grid_Template_v2.0
    comment:                    Data was converted from NetCDF-3 to NetCDF-4 ...
    sensor:                     Thermometer, AVHRR
    Conventions:                CF-1.6, ACDD-1.3
    references:                 Reynolds, et al.(2007) Daily High-Resolution-...
print(f'Total chunk size: {ds.nbytes / 1e6} MB')
Total chunk size: 165.896724 MB
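
The reported total can be reproduced by hand, which is a quick sanity check on the chunk's shape and dtypes. The sketch below assumes that the total is the sum of the uncompressed sizes of the four data variables and the four coordinates in the repr above, at 4 bytes per float32 value and 8 bytes per datetime64[ns] value.

```python
from math import prod

# Dimension sizes copied from the dataset repr above.
dims = {"time": 10, "zlev": 1, "lat": 720, "lon": 1440}
FLOAT32_BYTES = 4
DATETIME64_BYTES = 8

# Four data variables (anom, err, ice, sst), each spanning every dimension.
data_var_bytes = 4 * prod(dims.values()) * FLOAT32_BYTES

# One-dimensional coordinates: lat, lon and zlev are float32,
# time is datetime64[ns].
coord_bytes = (
    (dims["lat"] + dims["lon"] + dims["zlev"]) * FLOAT32_BYTES
    + dims["time"] * DATETIME64_BYTES
)

total_bytes = data_var_bytes + coord_bytes
print(f"Total chunk size: {total_bytes / 1e6} MB")
```

Almost all of the size comes from the data variables; the coordinates contribute only a few kilobytes.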

👀 Inspect the Xarray HTML repr carefully by clicking on the buttons to expand the different sections.

  • ✅ Is the shape of the variable what we expect?

  • ✅ Is time going in the right order?

  • ✅ Do the variable attributes make sense?
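
The time-ordering check can also be done programmatically rather than by eye. The following is a minimal sketch using only the standard library; `is_daily_and_increasing` is a hypothetical helper, not part of xarray or pangeo_forge_recipes, and the timestamps are built by hand to match the repr above. For a real chunk they would come from the dataset's `time` coordinate.

```python
from datetime import datetime, timedelta


def is_daily_and_increasing(times):
    """Return True if each timestamp is exactly one day after the previous one."""
    one_day = timedelta(days=1)
    return all(
        later - earlier == one_day
        for earlier, later in zip(times, times[1:])
    )


# Ten daily timestamps starting at 1981-09-01T12:00:00, as in chunk 'time-0'.
times = [datetime(1981, 9, 1, 12) + timedelta(days=i) for i in range(10)]

in_order = is_daily_and_increasing(times)
reversed_order = is_daily_and_increasing(times[::-1])
print(in_order, reversed_order)
```

A check like this catches both misordered inputs and missing days, since either one breaks the one-day spacing.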

Now let’s visualize some data and make sure things look good.

ds.sst[0].plot()
<matplotlib.collections.QuadMesh at 0x7f83f8b0f790>
../../_images/netcdf_zarr_sequential_47_1.png
ds.ice[-1].plot()
<matplotlib.collections.QuadMesh at 0x7f83daed3730>
../../_images/netcdf_zarr_sequential_48_1.png

The data look good! Now let’s try a random chunk from the middle.

chunk_number = 500
chunk_key = list(recipe.iter_chunks())[chunk_number]
for input_file in recipe.inputs_for_chunk(chunk_key):
    recipe.cache_input(input_file)
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5000'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950511.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950511.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5001'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950512.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950512.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5002'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950513.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950513.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5003'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950514.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950514.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5004'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950515.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950515.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5005'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950516.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950516.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5006'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950517.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950517.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5007'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950518.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950518.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5008'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950519.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950519.nc' to cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Caching input 'time-5009'
pangeo_forge_recipes.storage:INFO - Caching file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950520.nc'
pangeo_forge_recipes.storage:INFO - Coping remote file 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950520.nc' to cache
with recipe.open_chunk(chunk_key) as ds_chunk:
    ds_chunk.load()
ds_chunk
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening inputs for chunk time-500
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5000: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950511.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950511.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5001: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950512.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950512.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5002: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950513.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950513.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5003: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950514.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950514.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5004: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950515.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950515.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5005: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950516.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950516.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5006: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950517.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950517.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5007: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950518.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950518.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5008: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950519.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950519.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5009: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950520.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/199505/oisst-avhrr-v02r01.19950520.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Combining inputs for chunk 'time-500'
<xarray.Dataset>
Dimensions:  (lat: 720, lon: 1440, time: 10, zlev: 1)
Coordinates:
  * lat      (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
  * lon      (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9
  * time     (time) datetime64[ns] 1995-05-11T12:00:00 ... 1995-05-20T12:00:00
  * zlev     (zlev) float32 0.0
Data variables:
    anom     (time, zlev, lat, lon) float32 nan nan nan nan ... 0.11 0.11 0.11
    err      (time, zlev, lat, lon) float32 nan nan nan nan ... 0.3 0.3 0.3 0.3
    ice      (time, zlev, lat, lon) float32 nan nan nan nan ... 0.97 0.97 0.97
    sst      (time, zlev, lat, lon) float32 nan nan nan ... -1.69 -1.69 -1.69
Attributes: (12/37)
    title:                      NOAA/NCEI 1/4 Degree Daily Optimum Interpolat...
    source:                     ICOADS, NCEP_GTS, GSFC_ICE, NCEP_ICE, Pathfin...
    id:                         oisst-avhrr-v02r01.19950511.nc
    naming_authority:           gov.noaa.ncei
    summary:                    NOAAs 1/4-degree Daily Optimum Interpolation ...
    cdm_data_type:              Grid
    ...                         ...
    metadata_link:              https://doi.org/10.25921/RE9P-PT57
    ncei_template_version:      NCEI_NetCDF_Grid_Template_v2.0
    comment:                    Data was converted from NetCDF-3 to NetCDF-4 ...
    sensor:                     Thermometer, AVHRR
    Conventions:                CF-1.6, ACDD-1.3
    references:                 Reynolds, et al.(2007) Daily High-Resolution-...
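
The logs above show chunk time-500 drawing on inputs time-5000 through time-5009, which correspond to the files for 1995-05-11 through 1995-05-20. That mapping can be reproduced with a little date arithmetic. This is a sketch under stated assumptions: ten inputs per chunk and a URL pattern inferred from the logged file names. `inputs_for_chunk_number` is a hypothetical helper, not a pangeo_forge_recipes function, and it does not handle the final partial chunk.

```python
from datetime import date, timedelta

FIRST_DAY = date(1981, 9, 1)  # date of the first file in the dataset
INPUTS_PER_CHUNK = 10         # assumption: ten daily files per chunk, as in the logs

# URL pattern inferred from the file names in the logs.
URL_TEMPLATE = (
    "https://www.ncei.noaa.gov/data/"
    "sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/"
    "{day:%Y%m}/oisst-avhrr-v02r01.{day:%Y%m%d}.nc"
)


def inputs_for_chunk_number(chunk_number):
    """Return (input_index, url) pairs for a zero-based chunk number."""
    start = chunk_number * INPUTS_PER_CHUNK
    pairs = []
    for index in range(start, start + INPUTS_PER_CHUNK):
        day = FIRST_DAY + timedelta(days=index)
        pairs.append((index, URL_TEMPLATE.format(day=day)))
    return pairs


chunk_500 = inputs_for_chunk_number(500)
for index, url in chunk_500:
    print(f"time-{index}: {url}")
```

Comparing the printed URLs with the logged ones is a cheap way to confirm that the file pattern and the chunking agree.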

Step 7: Try writing data

Now that we can see our chunks opening correctly, we are ready to try writing data to our target.

First we need to prepare the target.

recipe.prepare_target()
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Creating a new dataset in target
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening inputs for chunk time-0
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-0: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-1: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-2: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810903.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810903.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-3: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810904.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810904.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-4: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810905.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810905.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810906.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810906.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-6: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810907.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810907.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-7: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810908.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810908.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-8: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810909.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810909.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-9: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810910.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810910.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Combining inputs for chunk 'time-0'
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Storing dataset in /var/folders/n8/63q49ms55wxcj_gfbtykwp5r0000gn/T/tmpuz91tfhl
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Expanding target concat dim 'time' to size 14372

We should now see a Zarr group at the target location. The arrays have been created at their full size, but only the coordinates have been written so far; the data variables are still empty.

import zarr
zgroup = zarr.open(target_dir.name)
print(zgroup.tree())
/
 ├── anom (14372, 1, 720, 1440) float32
 ├── err (14372, 1, 720, 1440) float32
 ├── ice (14372, 1, 720, 1440) float32
 ├── lat (720,) float32
 ├── lon (1440,) float32
 ├── sst (14372, 1, 720, 1440) float32
 ├── time (14372,) int64
 └── zlev (1,) float32

Let’s examine one of the data variables.

zgroup['sst'].info
Name               : /sst
Type               : zarr.core.Array
Data type          : float32
Shape              : (14372, 1, 720, 1440)
Chunk shape        : (10, 1, 720, 1440)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 59603558400 (55.5G)
No. bytes stored   : 611
Storage ratio      : 97550832.1
Chunks initialized : 0/1438
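
The numbers in this info output follow directly from the array shape and chunk shape. The sketch below recomputes them, assuming 4 bytes per float32 value and that the storage ratio is the uncompressed size divided by the bytes stored.

```python
from math import ceil, prod

# Values copied from the info output above.
shape = (14372, 1, 720, 1440)
chunk_shape = (10, 1, 720, 1440)
bytes_stored = 611  # before any data chunk has been written
FLOAT32_BYTES = 4

# Uncompressed size of the full array.
n_bytes = prod(shape) * FLOAT32_BYTES

# Number of chunks: only the time axis is split, so the other axes
# each contribute a factor of one.
n_chunks = prod(ceil(s / c) for s, c in zip(shape, chunk_shape))

# The time axis is not a multiple of ten, so the last chunk is partial.
last_chunk_len = shape[0] - (ceil(shape[0] / chunk_shape[0]) - 1) * chunk_shape[0]

storage_ratio = n_bytes / bytes_stored

print(f"No. bytes     : {n_bytes} ({n_bytes / 2**30:.1f}G)")
print(f"Storage ratio : {storage_ratio:.1f}")
print(f"Chunks        : {n_chunks} (last one holds {last_chunk_len} time steps)")
```

The enormous storage ratio at this stage reflects only the fact that no data chunks exist yet; it says nothing about how well the data will compress.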

Now let’s write the first chunk.

recipe.store_chunk(all_chunks[0])
zgroup['sst'].info
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening inputs for chunk time-0
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-0: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-1: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-2: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810903.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810903.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-3: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810904.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810904.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-4: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810905.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810905.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-5: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810906.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810906.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-6: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810907.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810907.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-7: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810908.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810908.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-8: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810909.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810909.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Opening input with Xarray time-9: 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810910.nc'
pangeo_forge_recipes.storage:INFO - Opening 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810910.nc' from cache
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Combining inputs for chunk 'time-0'
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Storing variable anom chunk time-0 to Zarr region (slice(0, 10, None), slice(None, None, None), slice(None, None, None), slice(None, None, None))
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Storing variable err chunk time-0 to Zarr region (slice(0, 10, None), slice(None, None, None), slice(None, None, None), slice(None, None, None))
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Storing variable ice chunk time-0 to Zarr region (slice(0, 10, None), slice(None, None, None), slice(None, None, None), slice(None, None, None))
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Storing variable sst chunk time-0 to Zarr region (slice(0, 10, None), slice(None, None, None), slice(None, None, None), slice(None, None, None))
pangeo_forge_recipes.recipes.xarray_zarr:INFO - Storing variable time chunk time-0 to Zarr region (slice(0, 10, None),)
Name               : /sst
Type               : zarr.core.Array
Data type          : float32
Shape              : (14372, 1, 720, 1440)
Chunk shape        : (10, 1, 720, 1440)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 59603558400 (55.5G)
No. bytes stored   : 19780925 (18.9M)
Storage ratio      : 3013.2
Chunks initialized : 1/1438

We can see that one of the chunks has been written! 🎉
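
The "Storing variable ... to Zarr region" log lines show where each chunk lands in the target: chunk time-0 occupies the first ten positions along the time axis and all of every other axis. The sketch below reconstructs that mapping and estimates how well the first chunk compressed. `zarr_region` is a hypothetical helper written for illustration, and the compression estimate assumes that the reported bytes stored are dominated by the single written chunk.

```python
from math import prod


def zarr_region(chunk_number, chunk_len=10, total_len=14372, n_other_dims=3):
    """Return the tuple of slices a chunk covers in a (time, zlev, lat, lon) array."""
    start = chunk_number * chunk_len
    stop = min(start + chunk_len, total_len)  # the final chunk may be short
    return (slice(start, stop),) + (slice(None),) * n_other_dims


first_region = zarr_region(0)
middle_region = zarr_region(500)
last_region = zarr_region(1437)
print(first_region[0], middle_region[0], last_region[0])

# Rough compression estimate for the one sst chunk written so far.
chunk_bytes = prod((10, 1, 720, 1440)) * 4  # uncompressed float32 chunk
bytes_stored = 19780925                     # from the info output above
compression_ratio = chunk_bytes / bytes_stored
print(f"Approximate compression ratio: {compression_ratio:.2f}")
```

Because each chunk maps to its own region along the time axis, writing one chunk does not touch the region of any other.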

We can also open the dataset with Xarray.

ds = xr.open_zarr(target_dir.name)
ds
<xarray.Dataset>
Dimensions:  (lat: 720, lon: 1440, time: 14372, zlev: 1)
Coordinates:
  * lat      (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
  * lon      (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9
  * time     (time) datetime64[ns] 1981-09-01T12:00:00 ... 1981-09-01T12:00:00
  * zlev     (zlev) float32 0.0
Data variables:
    anom     (time, zlev, lat, lon) float32 dask.array<chunksize=(10, 1, 720, 1440), meta=np.ndarray>
    err      (time, zlev, lat, lon) float32 dask.array<chunksize=(10, 1, 720, 1440), meta=np.ndarray>
    ice      (time, zlev, lat, lon) float32 dask.array<chunksize=(10, 1, 720, 1440), meta=np.ndarray>
    sst      (time, zlev, lat, lon) float32 dask.array<chunksize=(10, 1, 720, 1440), meta=np.ndarray>
Attributes: (12/37)
    Conventions:                CF-1.6, ACDD-1.3
    cdm_data_type:              Grid
    comment:                    Data was converted from NetCDF-3 to NetCDF-4 ...
    creator_email:              oisst-help@noaa.gov
    creator_url:                https://www.ncei.noaa.gov/
    date_created:               2020-05-08T19:05:13Z
    ...                         ...
    source:                     ICOADS, NCEP_GTS, GSFC_ICE, NCEP_ICE, Pathfin...
    standard_name_vocabulary:   CF Standard Name Table (v40, 25 January 2017)
    summary:                    NOAAs 1/4-degree Daily Optimum Interpolation ...
    time_coverage_end:          1981-09-01T23:59:59Z
    time_coverage_start:        1981-09-01T00:00:00Z
    title:                      NOAA/NCEI 1/4 Degree Daily Optimum Interpolat...

There should be data at the beginning…

ds.sst[0].plot()
<matplotlib.collections.QuadMesh at 0x7f83c86cf880>
../../_images/netcdf_zarr_sequential_64_1.png

But not the end…

ds.sst[-1].plot()
<matplotlib.collections.QuadMesh at 0x7f83e8f45610>
../../_images/netcdf_zarr_sequential_66_1.png

Postscript: Execute the full recipe

We are now confident that our recipe works as we expect. At this point we could either:

  • Execute it all ourselves (see Recipe Execution)

  • Create a new recipe feedstock on Pangeo Forge

If we wanted to execute it ourselves, one way would be to simply run the following code:

for input_name in recipe.iter_inputs():
    recipe.cache_input(input_name)
recipe.prepare_target()
for chunk in recipe.iter_chunks():
    recipe.store_chunk(chunk)
recipe.finalize_target()
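
The loop above runs everything serially. The four stages must happen in that order, but the calls within the caching stage and within the storing stage look independent of one another, which suggests running them concurrently. The following is a sketch only: it assumes that `cache_input` and `store_chunk` are safe to call from several threads at once, which this tutorial does not demonstrate. `execute_recipe` and `RecordingRecipe` are hypothetical names; `RecordingRecipe` is a stand-in that just records calls so the example runs without any downloads.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def execute_recipe(recipe, max_workers=4):
    """Run the four recipe stages in order, parallelizing within a stage."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Stage 1: cache every input. list() waits for all calls to finish
        # and re-raises any exception from a worker.
        list(pool.map(recipe.cache_input, recipe.iter_inputs()))
        # Stage 2: create the target store (serial).
        recipe.prepare_target()
        # Stage 3: store every chunk.
        list(pool.map(recipe.store_chunk, recipe.iter_chunks()))
    # Stage 4: finalize the target (serial).
    recipe.finalize_target()


class RecordingRecipe:
    """Hypothetical stand-in that records calls; NOT the real recipe class."""

    def __init__(self, n_inputs, inputs_per_chunk):
        self.n_inputs = n_inputs
        self.inputs_per_chunk = inputs_per_chunk
        self.calls = []

    def iter_inputs(self):
        return range(self.n_inputs)

    def iter_chunks(self):
        # Ceiling division: a partial final chunk still counts.
        return range(-(-self.n_inputs // self.inputs_per_chunk))

    def cache_input(self, input_name):
        self.calls.append(("cache_input", input_name))

    def prepare_target(self):
        self.calls.append(("prepare_target", None))

    def store_chunk(self, chunk):
        self.calls.append(("store_chunk", chunk))

    def finalize_target(self):
        self.calls.append(("finalize_target", None))


demo = RecordingRecipe(n_inputs=25, inputs_per_chunk=10)
execute_recipe(demo)

# Collapse consecutive calls of the same kind to see the stage order.
stage_order = [name for name, _ in groupby(call[0] for call in demo.calls)]
print(stage_order)
```

Waiting for each stage to finish before starting the next preserves the ordering of the serial loop while allowing the work inside a stage to overlap.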

We aren’t going to execute the full recipe in this notebook because it would take too long: it means caching all 14,372 daily input files and writing 1,438 chunks.

But hopefully now you have a better understanding of how Pangeo Forge recipes work.