NetCDF Zarr Multi-Variable Sequential Recipe: NOAA World Ocean Atlas#

This recipe is a little more complicated than the Xarray-to-Zarr Sequential Recipe: NOAA OISST. You should probably review that one first; here we will skip the basics.

For this example, we will use data from NOAA’s World Ocean Atlas. As we can see from the data access page, the dataset is spread over many different files. What’s important here is that:

  • There is a time sequence (month) to the files.

  • Different variables live in different files.

Because our dataset is spread over multiple files, we will have to use a more complex File Pattern than in the previous example.

Step 1: Get to know your source data#

This step can’t be skipped! It’s impossible to write a recipe if you don’t understand intimately how the source data are organized. World Ocean Atlas has eight different variables: Temperature, Salinity, Dissolved Oxygen, Percent Oxygen Saturation, Apparent Oxygen Utilization, Silicate, Phosphate, Nitrate. Each variable has a page that looks like this:

screenshot from NCEI website

For the purpose of this tutorial, we will use the 5-degree resolution monthly data. We can follow the links to finally find an HTTP download link for a single month of data.

download_url = 'https://www.ncei.noaa.gov/thredds-ocean/fileServer/ncei/woa/temperature/decav/5deg/woa18_decav_t01_5d.nc'

Let’s download it and try to open it with xarray.

! wget https://www.ncei.noaa.gov/thredds-ocean/fileServer/ncei/woa/temperature/decav/5deg/woa18_decav_t01_5d.nc
--2021-11-13 22:38:40--  https://www.ncei.noaa.gov/thredds-ocean/fileServer/ncei/woa/temperature/decav/5deg/woa18_decav_t01_5d.nc
Resolving www.ncei.noaa.gov... 205.167.25.177, 205.167.25.178, 205.167.25.167, ...
Connecting to www.ncei.noaa.gov|205.167.25.177|:443... connected.
HTTP request sent, awaiting response... 200 
Length: 2389903 (2.3M) [application/x-netcdf]
Saving to: ‘woa18_decav_t01_5d.nc.4’

woa18_decav_t01_5d. 100%[===================>]   2.28M  8.91MB/s    in 0.3s    

2021-11-13 22:38:41 (8.91 MB/s) - ‘woa18_decav_t01_5d.nc.4’ saved [2389903/2389903]
import xarray as xr

try:
    ds = xr.open_dataset("woa18_decav_t01_5d.nc")
except ValueError as e:
    print(e)
unable to decode time units 'months since 1955-01-01 00:00:00' with 'the default calendar'. Try opening your dataset with decode_times=False or installing cftime if it is not installed.

❗️ Oh no, we got an error!

This is a very common problem. The calendar is encoded using “months since” units, which are ambiguous under the CF Conventions. (The precise length of a month varies by month and year.)
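To see the ambiguity concretely, here is a small standard-library illustration (not part of the recipe): the number of days in a month depends on both the month and the year, so an offset like “372.5 months since 1955-01-01” has no unique timestamp.

```python
# Illustration only: month lengths vary by month and year,
# so "months since" offsets have no unique interpretation.
import calendar

print(calendar.monthrange(1955, 1)[1])  # 31 days in January 1955
print(calendar.monthrange(1955, 2)[1])  # 28 days in February 1955
print(calendar.monthrange(1956, 2)[1])  # 29 days in February 1956 (leap year)
```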

We will follow the error message’s advice:

ds = xr.open_dataset("woa18_decav_t01_5d.nc", decode_times=False)
ds
<xarray.Dataset>
Dimensions:             (lat: 36, nbounds: 2, lon: 72, depth: 57, time: 1)
Coordinates:
  * lat                 (lat) float32 -87.5 -82.5 -77.5 -72.5 ... 77.5 82.5 87.5
  * lon                 (lon) float32 -177.5 -172.5 -167.5 ... 167.5 172.5 177.5
  * depth               (depth) float32 0.0 5.0 10.0 ... 1.45e+03 1.5e+03
  * time                (time) float32 372.5
Dimensions without coordinates: nbounds
Data variables:
    crs                 int32 -2147483647
    lat_bnds            (lat, nbounds) float32 -90.0 -85.0 -85.0 ... 85.0 90.0
    lon_bnds            (lon, nbounds) float32 -180.0 -175.0 ... 175.0 180.0
    depth_bnds          (depth, nbounds) float32 0.0 2.5 ... 1.475e+03 1.5e+03
    climatology_bounds  (time, nbounds) float32 0.0 404.0
    t_mn                (time, depth, lat, lon) float32 ...
    t_dd                (time, depth, lat, lon) float64 ...
    t_sd                (time, depth, lat, lon) float32 ...
    t_se                (time, depth, lat, lon) float32 ...
Attributes: (12/49)
    Conventions:                     CF-1.6, ACDD-1.3
    title:                           World Ocean Atlas 2018 : sea_water_tempe...
    summary:                         PRERELEASE Climatological mean temperatu...
    references:                      Locarnini, R. A., A. V. Mishonov, O. K. ...
    institution:                     National Centers for Environmental Infor...
    comment:                         global climatology as part of the World ...
    ...                              ...
    publisher_email:                 NCEI.info@noaa.gov
    nodc_template_version:           NODC_NetCDF_Grid_Template_v2.0
    license:                         These data are openly available to the p...
    metadata_link:                   http://www.nodc.noaa.gov/OC5/WOA18/pr_wo...
    date_created:                    2018-02-19 
    date_modified:                   2018-02-19 
ds.time
<xarray.DataArray 'time' (time: 1)>
array([372.5], dtype=float32)
Coordinates:
  * time     (time) float32 372.5
Attributes:
    standard_name:  time
    long_name:      time
    units:          months since 1955-01-01 00:00:00
    axis:           T
    climatology:    climatology_bounds

We have opened the data, but the time coordinate is just a number, not an actual datetime object. We can work around this issue by explicitly specifying the 360_day calendar (in which every month is assumed to have 30 days).

ds.time.attrs['calendar'] = '360_day'
ds = xr.decode_cf(ds)
ds
<xarray.Dataset>
Dimensions:             (lat: 36, nbounds: 2, lon: 72, depth: 57, time: 1)
Coordinates:
  * lat                 (lat) float32 -87.5 -82.5 -77.5 -72.5 ... 77.5 82.5 87.5
  * lon                 (lon) float32 -177.5 -172.5 -167.5 ... 167.5 172.5 177.5
  * depth               (depth) float32 0.0 5.0 10.0 ... 1.45e+03 1.5e+03
  * time                (time) object 1986-01-16 00:00:00
Dimensions without coordinates: nbounds
Data variables:
    crs                 int32 ...
    lat_bnds            (lat, nbounds) float32 ...
    lon_bnds            (lon, nbounds) float32 ...
    depth_bnds          (depth, nbounds) float32 ...
    climatology_bounds  (time, nbounds) float32 ...
    t_mn                (time, depth, lat, lon) float32 ...
    t_dd                (time, depth, lat, lon) float64 ...
    t_sd                (time, depth, lat, lon) float32 ...
    t_se                (time, depth, lat, lon) float32 ...
Attributes: (12/49)
    Conventions:                     CF-1.6, ACDD-1.3
    title:                           World Ocean Atlas 2018 : sea_water_tempe...
    summary:                         PRERELEASE Climatological mean temperatu...
    references:                      Locarnini, R. A., A. V. Mishonov, O. K. ...
    institution:                     National Centers for Environmental Infor...
    comment:                         global climatology as part of the World ...
    ...                              ...
    publisher_email:                 NCEI.info@noaa.gov
    nodc_template_version:           NODC_NetCDF_Grid_Template_v2.0
    license:                         These data are openly available to the p...
    metadata_link:                   http://www.nodc.noaa.gov/OC5/WOA18/pr_wo...
    date_created:                    2018-02-19 
    date_modified:                   2018-02-19 
ds.time
<xarray.DataArray 'time' (time: 1)>
array([cftime.Datetime360Day(1986, 1, 16, 0, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 1986-01-16 00:00:00
Attributes:
    standard_name:  time
    long_name:      time
    axis:           T
    climatology:    climatology_bounds

We will need this trick for later.
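To double-check the arithmetic behind that decoded timestamp, here is a hand-rolled sketch (assuming the 360_day calendar, in which every month is exactly 30 days) of how an offset like 372.5 “months since 1955-01-01” becomes a concrete date. `decode_360day_months` is a hypothetical helper written just for this illustration, not part of xarray or cftime.

```python
# Sketch: decode a "months since" offset under the 360_day calendar
# (every month assumed to be exactly 30 days). Hypothetical helper,
# not part of xarray or cftime.
def decode_360day_months(offset_months, start_year=1955, start_month=1):
    whole_months = int(offset_months)
    frac_days = (offset_months - whole_months) * 30  # fraction of a 30-day month
    year = start_year + (start_month - 1 + whole_months) // 12
    month = (start_month - 1 + whole_months) % 12 + 1
    day = 1 + int(frac_days)
    return year, month, day

print(decode_360day_months(372.5))  # (1986, 1, 16) -- matches cftime's result above
```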

Step 2: Define the File Pattern#

We can browse through the files on the website and see how they are organized.

https://www.ncei.noaa.gov/thredds-ocean/fileServer/ncei/woa/temperature/decav/5deg/woa18_decav_t01_5d.nc
https://www.ncei.noaa.gov/thredds-ocean/fileServer/ncei/woa/temperature/decav/5deg/woa18_decav_t02_5d.nc
...
https://www.ncei.noaa.gov/thredds-ocean/fileServer/ncei/woa/salinity/decav/5deg/woa18_decav_s01_5d.nc
https://www.ncei.noaa.gov/thredds-ocean/fileServer/ncei/woa/salinity/decav/5deg/woa18_decav_s02_5d.nc
...

From this we can deduce the general pattern. We write a function to return the correct filename for a given variable / month combination.

# Here it is important that the function argument name "time" match
# the name of the dataset dimension "time"
def format_function(variable, time):
    return ("https://www.ncei.noaa.gov/thredds-ocean/fileServer/ncei/woa/"
            f"{variable}/decav/5deg/woa18_decav_{variable[0]}{time:02d}_5d.nc")

format_function("temperature", 2)
'https://www.ncei.noaa.gov/thredds-ocean/fileServer/ncei/woa/temperature/decav/5deg/woa18_decav_t02_5d.nc'
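As a quick sanity check (illustrative only, not required by pangeo-forge), we can enumerate every variable/month combination and confirm that the function generates the expected set of URLs:

```python
# Sanity check: enumerate the full variable/month grid of source files.
def format_function(variable, time):
    return ("https://www.ncei.noaa.gov/thredds-ocean/fileServer/ncei/woa/"
            f"{variable}/decav/5deg/woa18_decav_{variable[0]}{time:02d}_5d.nc")

urls = [
    format_function(variable, month)
    for variable in ["temperature", "salinity"]
    for month in range(1, 13)
]
print(len(urls))  # 24: 2 variables x 12 months
print(urls[12])   # the first salinity file
```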

Now we turn this into a FilePattern object. This pattern has two distinct combine_dims: variable name and month. We want to merge over variable names and concatenate over months.

from pangeo_forge_recipes import patterns

variable_merge_dim = patterns.MergeDim("variable", keys=["temperature", "salinity"])

# Here it is important that the ConcatDim name "time" match the name of the 
# dataset dimension "time" (and the argument name in format_function)
month_concat_dim = patterns.ConcatDim("time", keys=list(range(1, 13)), nitems_per_file=1)

pattern = patterns.FilePattern(format_function, variable_merge_dim, month_concat_dim)
pattern
<FilePattern {'variable': 2, 'time': 12}>

Step 3: Write the Recipe#

Now that we have a FilePattern, we are ready to write our XarrayZarrRecipe.

Define an Input Preprocessor Function#

Above we noted that the time was encoded incorrectly in the original data. We might also have noticed that many variables that seem like coordinates (e.g. lat_bnds) appear in the Data Variables part of the dataset. We will write a function that fixes both of these issues.

def fix_encoding_and_attrs(ds, fname):
    ds.time.attrs['calendar'] = '360_day'
    ds = xr.decode_cf(ds)
    ds = ds.set_coords(['crs', 'lat_bnds', 'lon_bnds', 'depth_bnds', 'climatology_bounds'])
    return ds

Define the Recipe Object#

from pangeo_forge_recipes.recipes import XarrayZarrRecipe

recipe = XarrayZarrRecipe(
    pattern,
    xarray_open_kwargs={'decode_times': False},
    process_input=fix_encoding_and_attrs
)
recipe
XarrayZarrRecipe(file_pattern=<FilePattern {'variable': 2, 'time': 12}>, storage_config=StorageConfig(target=FSSpecTarget(fs=<fsspec.implementations.local.LocalFileSystem object at 0x113568f70>, root_path='/var/folders/f8/rh42xb3d1tnbw2bxsjwgym1c0000gn/T/tmp6dljapud/PKyYuJFI'), cache=CacheFSSpecTarget(fs=<fsspec.implementations.local.LocalFileSystem object at 0x113568f70>, root_path='/var/folders/f8/rh42xb3d1tnbw2bxsjwgym1c0000gn/T/tmp6dljapud/37IRtt94'), metadata=MetadataTarget(fs=<fsspec.implementations.local.LocalFileSystem object at 0x113568f70>, root_path='/var/folders/f8/rh42xb3d1tnbw2bxsjwgym1c0000gn/T/tmp6dljapud/1kGCgZma')), inputs_per_chunk=1, target_chunks={}, cache_inputs=True, copy_input_to_local_file=False, consolidate_zarr=True, consolidate_dimension_coordinates=True, xarray_open_kwargs={'decode_times': False}, xarray_concat_kwargs={}, delete_input_encoding=True, process_input=<function fix_encoding_and_attrs at 0x112f39820>, process_chunk=None, lock_timeout=None, subset_inputs={}, open_input_with_fsspec_reference=False)

Step 4: Run the Recipe#

In Xarray-to-Zarr Sequential Recipe: NOAA OISST we went through each step of recipe execution in detail. Here we will not repeat that; instead, we will let Prefect do the work for us.

flow = recipe.to_prefect()
flow.run()
[2022-02-16 17:50:16-0500] INFO - prefect.FlowRunner | Beginning Flow run for 'pangeo-forge-recipe'
[2022-02-16 17:50:16-0500] INFO - prefect.TaskRunner | Task 'cache_input': Starting task run...
[2022-02-16 17:50:16-0500] INFO - prefect.TaskRunner | Task 'cache_input': Finished task run for task with final state: 'Mapped'
[2022-02-16 17:50:16-0500] INFO - prefect.TaskRunner | Task 'cache_input[0]': Starting task run...
[2022-02-16 17:50:17-0500] INFO - prefect.TaskRunner | Task 'cache_input[0]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:17-0500] INFO - prefect.TaskRunner | Task 'cache_input[1]': Starting task run...
[2022-02-16 17:50:17-0500] INFO - prefect.TaskRunner | Task 'cache_input[1]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:17-0500] INFO - prefect.TaskRunner | Task 'cache_input[2]': Starting task run...
[2022-02-16 17:50:17-0500] INFO - prefect.TaskRunner | Task 'cache_input[2]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:17-0500] INFO - prefect.TaskRunner | Task 'cache_input[3]': Starting task run...
[2022-02-16 17:50:17-0500] INFO - prefect.TaskRunner | Task 'cache_input[3]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:17-0500] INFO - prefect.TaskRunner | Task 'cache_input[4]': Starting task run...
[2022-02-16 17:50:17-0500] INFO - prefect.TaskRunner | Task 'cache_input[4]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:17-0500] INFO - prefect.TaskRunner | Task 'cache_input[5]': Starting task run...
[2022-02-16 17:50:18-0500] INFO - prefect.TaskRunner | Task 'cache_input[5]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:18-0500] INFO - prefect.TaskRunner | Task 'cache_input[6]': Starting task run...
[2022-02-16 17:50:18-0500] INFO - prefect.TaskRunner | Task 'cache_input[6]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:18-0500] INFO - prefect.TaskRunner | Task 'cache_input[7]': Starting task run...
[2022-02-16 17:50:18-0500] INFO - prefect.TaskRunner | Task 'cache_input[7]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:18-0500] INFO - prefect.TaskRunner | Task 'cache_input[8]': Starting task run...
[2022-02-16 17:50:18-0500] INFO - prefect.TaskRunner | Task 'cache_input[8]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:18-0500] INFO - prefect.TaskRunner | Task 'cache_input[9]': Starting task run...
[2022-02-16 17:50:18-0500] INFO - prefect.TaskRunner | Task 'cache_input[9]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:18-0500] INFO - prefect.TaskRunner | Task 'cache_input[10]': Starting task run...
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[10]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[11]': Starting task run...
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[11]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[12]': Starting task run...
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[12]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[13]': Starting task run...
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[13]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[14]': Starting task run...
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[14]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[15]': Starting task run...
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[15]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:19-0500] INFO - prefect.TaskRunner | Task 'cache_input[16]': Starting task run...
[2022-02-16 17:50:20-0500] INFO - prefect.TaskRunner | Task 'cache_input[16]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:20-0500] INFO - prefect.TaskRunner | Task 'cache_input[17]': Starting task run...
[2022-02-16 17:50:20-0500] INFO - prefect.TaskRunner | Task 'cache_input[17]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:20-0500] INFO - prefect.TaskRunner | Task 'cache_input[18]': Starting task run...
[2022-02-16 17:50:20-0500] INFO - prefect.TaskRunner | Task 'cache_input[18]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:20-0500] INFO - prefect.TaskRunner | Task 'cache_input[19]': Starting task run...
[2022-02-16 17:50:20-0500] INFO - prefect.TaskRunner | Task 'cache_input[19]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:20-0500] INFO - prefect.TaskRunner | Task 'cache_input[20]': Starting task run...
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'cache_input[20]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'cache_input[21]': Starting task run...
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'cache_input[21]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'cache_input[22]': Starting task run...
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'cache_input[22]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'cache_input[23]': Starting task run...
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'cache_input[23]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'prepare_target': Starting task run...
/Users/rwegener/repos/copy/pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py:111: RuntimeWarning: Failed to open Zarr store with consolidated metadata, falling back to try reading non-consolidated metadata. This is typically much slower for opening a dataset. To silence this warning, consider:
1. Consolidating metadata in this existing store with zarr.consolidate_metadata().
2. Explicitly setting consolidated=False, to avoid trying to read consolidate metadata, or
3. Explicitly setting consolidated=True, to raise an error in this case instead of falling back to try reading non-consolidated metadata.
  return xr.open_zarr(target.get_mapper())
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'prepare_target': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'store_chunk': Starting task run...
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'store_chunk': Finished task run for task with final state: 'Mapped'
[2022-02-16 17:50:21-0500] INFO - prefect.TaskRunner | Task 'store_chunk[0]': Starting task run...
[2022-02-16 17:50:22-0500] INFO - prefect.TaskRunner | Task 'store_chunk[0]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:22-0500] INFO - prefect.TaskRunner | Task 'store_chunk[1]': Starting task run...
[2022-02-16 17:50:22-0500] INFO - prefect.TaskRunner | Task 'store_chunk[1]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:22-0500] INFO - prefect.TaskRunner | Task 'store_chunk[2]': Starting task run...
[2022-02-16 17:50:22-0500] INFO - prefect.TaskRunner | Task 'store_chunk[2]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:22-0500] INFO - prefect.TaskRunner | Task 'store_chunk[3]': Starting task run...
[2022-02-16 17:50:22-0500] INFO - prefect.TaskRunner | Task 'store_chunk[3]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:22-0500] INFO - prefect.TaskRunner | Task 'store_chunk[4]': Starting task run...
[2022-02-16 17:50:22-0500] INFO - prefect.TaskRunner | Task 'store_chunk[4]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:22-0500] INFO - prefect.TaskRunner | Task 'store_chunk[5]': Starting task run...
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[5]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[6]': Starting task run...
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[6]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[7]': Starting task run...
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[7]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[8]': Starting task run...
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[8]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[9]': Starting task run...
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[9]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[10]': Starting task run...
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[10]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:23-0500] INFO - prefect.TaskRunner | Task 'store_chunk[11]': Starting task run...
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[11]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[12]': Starting task run...
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[12]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[13]': Starting task run...
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[13]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[14]': Starting task run...
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[14]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[15]': Starting task run...
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[15]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[16]': Starting task run...
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[16]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:24-0500] INFO - prefect.TaskRunner | Task 'store_chunk[17]': Starting task run...
[2022-02-16 17:50:25-0500] INFO - prefect.TaskRunner | Task 'store_chunk[17]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:25-0500] INFO - prefect.TaskRunner | Task 'store_chunk[18]': Starting task run...
[2022-02-16 17:50:25-0500] INFO - prefect.TaskRunner | Task 'store_chunk[18]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:25-0500] INFO - prefect.TaskRunner | Task 'store_chunk[19]': Starting task run...
[2022-02-16 17:50:25-0500] INFO - prefect.TaskRunner | Task 'store_chunk[19]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:25-0500] INFO - prefect.TaskRunner | Task 'store_chunk[20]': Starting task run...
[2022-02-16 17:50:25-0500] INFO - prefect.TaskRunner | Task 'store_chunk[20]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:25-0500] INFO - prefect.TaskRunner | Task 'store_chunk[21]': Starting task run...
[2022-02-16 17:50:25-0500] INFO - prefect.TaskRunner | Task 'store_chunk[21]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:25-0500] INFO - prefect.TaskRunner | Task 'store_chunk[22]': Starting task run...
[2022-02-16 17:50:26-0500] INFO - prefect.TaskRunner | Task 'store_chunk[22]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:26-0500] INFO - prefect.TaskRunner | Task 'store_chunk[23]': Starting task run...
[2022-02-16 17:50:26-0500] INFO - prefect.TaskRunner | Task 'store_chunk[23]': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:26-0500] INFO - prefect.TaskRunner | Task 'finalize_target': Starting task run...
[2022-02-16 17:50:26-0500] INFO - prefect.TaskRunner | Task 'finalize_target': Finished task run for task with final state: 'Success'
[2022-02-16 17:50:26-0500] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
<Success: "All reference tasks succeeded.">
flow.visualize()
task graph visualization of the recipe flow

Step 5: Check the Target#

All the data should be there!

ds = xr.open_zarr(recipe.target_mapper)
ds
<xarray.Dataset>
Dimensions:             (time: 12, nbounds: 2, depth: 57, lat: 36, lon: 72)
Coordinates:
    climatology_bounds  (time, nbounds) float32 dask.array<chunksize=(1, 2), meta=np.ndarray>
    crs                 int32 ...
  * depth               (depth) float32 0.0 5.0 10.0 ... 1.45e+03 1.5e+03
    depth_bnds          (depth, nbounds) float32 dask.array<chunksize=(57, 2), meta=np.ndarray>
  * lat                 (lat) float32 -87.5 -82.5 -77.5 -72.5 ... 77.5 82.5 87.5
    lat_bnds            (lat, nbounds) float32 dask.array<chunksize=(36, 2), meta=np.ndarray>
  * lon                 (lon) float32 -177.5 -172.5 -167.5 ... 167.5 172.5 177.5
    lon_bnds            (lon, nbounds) float32 dask.array<chunksize=(72, 2), meta=np.ndarray>
  * time                (time) object 1986-01-16 00:00:00 ... 1986-12-16 00:0...
Dimensions without coordinates: nbounds
Data variables:
    s_dd                (time, depth, lat, lon) float64 dask.array<chunksize=(1, 57, 36, 72), meta=np.ndarray>
    s_mn                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 36, 72), meta=np.ndarray>
    s_sd                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 36, 72), meta=np.ndarray>
    s_se                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 36, 72), meta=np.ndarray>
    t_dd                (time, depth, lat, lon) float64 dask.array<chunksize=(1, 57, 36, 72), meta=np.ndarray>
    t_mn                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 36, 72), meta=np.ndarray>
    t_sd                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 36, 72), meta=np.ndarray>
    t_se                (time, depth, lat, lon) float32 dask.array<chunksize=(1, 57, 36, 72), meta=np.ndarray>
Attributes: (12/49)
    Conventions:                     CF-1.6, ACDD-1.3
    cdm_data_type:                   Grid
    comment:                         global climatology as part of the World ...
    contributor_name:                Ocean Climate Laboratory
    contributor_role:                Calculation of climatologies
    creator_email:                   NCEI.info@noaa.gov
    ...                              ...
    summary:                         Climatological mean salinity for the glo...
    time_coverage_duration:          P63Y
    time_coverage_end:               2017-01-31
    time_coverage_resolution:        P01M
    time_coverage_start:             1955-01-01
    title:                           World Ocean Atlas 2018 : sea_water_salin...

Just to check, we will make a plot.

ds.s_mn.isel(depth=0).mean(dim='time').plot()
<matplotlib.collections.QuadMesh at 0x16a60b310>
plot of mean surface salinity (s_mn at depth=0, averaged over time)

🎉 Yay! Our recipe worked!