HRRR Zarr Data Loading Guide¶

There's a variety of samples script out there for using the HRRR Zarr data. They don't all load the data the same way, and how to load it can be pretty confusing. This guide provides our current (as of writing) recommendations for the "best" ways to access the data, with weight given to performance (speed), code length, and a preference for high-level APIs. In the examples, as much as possible is written as functions that can be copied to your code.

Basic data model + urls¶

Identifying a unique Zarr array in the HRRR Zarr Archive¶

The data is split into separate Zarr arrays based on the following:

  • Model run (identified by datetime)
  • Level type:
    • surface ("sfc")
    • pressure ("prs")
  • Variable name (including level and parameter short name)
    • see this table
  • Model type:
    • forecast ("fcst")
    • analysis ("anl")

For your use case, many of these values may always be the same. For instance, perhaps there are two variables and other than that only the model run changes. But the functions provided here need to be usable for all cases, so let's create a data class in Python to keep track of these.

In [1]:
import dataclasses
import datetime

@dataclasses.dataclass
class ZarrId:
    run_hour: datetime.datetime
    level_type: str
    var_level: str
    var_name: str
    model_type: str
        
    def format_chunk_id(self, chunk_id):
        if self.model_type == "fcst": 
            # Extra id part since forecasts have an additional (time) dimension
            return "0." + str(chunk_id)
        else:
            return chunk_id

Identifying a chunk¶

Each Zarr array is chunked such that every grid point is consistently in the same spot in the same chunk. This allows us to efficiently analyze data at a single grid point or small region without having to load the whole zarr array. This picture shows how the grid is chunked:

chunks_conus_labeled.png

As you can see, the chunk ids have the format <grid y chunk index>.<grid x chunk index>

Generating urls¶

For the browser¶

Browser-ready url to explore data available for a model run:

In [2]:
level_type = "sfc"
model_type = "fcst"
run_hour = datetime.datetime(2021, 1, 1, 7)

def create_hrrr_zarr_explorer_url(level_type, model_type, run_hour):
    url = "https://hrrrzarr.s3.amazonaws.com/index.html"
    url += run_hour.strftime(
        f"#{level_type}/%Y%m%d/%Y%m%d_%Hz_{model_type}.zarr/")
    return url

print(create_hrrr_zarr_explorer_url(level_type, model_type, run_hour))
print(create_hrrr_zarr_explorer_url("prs", "anl", run_hour))
https://hrrrzarr.s3.amazonaws.com/index.html#sfc/20210101/20210101_07z_fcst.zarr/
https://hrrrzarr.s3.amazonaws.com/index.html#prs/20210101/20210101_07z_anl.zarr/

Url to directly download a chunk:

In [3]:
zarr_id = ZarrId(
                run_hour=datetime.datetime(2020, 8, 1, 0), # Aug 1, 0Z
                level_type="sfc",
                var_level="1000mb",
                var_name="TMP",
                model_type="anl"
                )
chunk_id = "4.3"

def create_https_chunk_url(zarr_id, chunk_id):
    url = "https://hrrrzarr.s3.amazonaws.com"
    url += zarr_id.run_hour.strftime(
        f"/{zarr_id.level_type}/%Y%m%d/%Y%m%d_%Hz_{zarr_id.model_type}.zarr/")
    url += f"{zarr_id.var_level}/{zarr_id.var_name}/{zarr_id.var_level}/{zarr_id.var_name}"
    url += f"/{zarr_id.format_chunk_id(chunk_id)}"
    return url

create_https_chunk_url(zarr_id, chunk_id)
Out[3]:
'https://hrrrzarr.s3.amazonaws.com/sfc/20200801/20200801_00z_anl.zarr/1000mb/TMP/1000mb/TMP/4.3'

For use in code¶

For automated purposes, we improve performance by downloading from the S3 protocol instead of HTTPS. There are three S3 urls we care about that are related to the zarr data format:

  • The group url
    • Start of the zarr array data format
    • Includes metadata such as the grid
  • The subgroup url
    • Where the actual data variable is stored
    • While subgroups are part of the zarr spec, there's not necessarily a good reason for the data in this case to have this extra level of nesting
  • The chunk url
    • Contains just the data for the chunk (no metadata)
    • Current APIs (zarr, xarray) don't support reading zarr data by chunks, so we have to write relatively low-level code to load data on this level
In [4]:
def create_s3_group_url(zarr_id, prefix=True):
    url = "s3://hrrrzarr/" if prefix else "" # Skip when using boto3
    url += zarr_id.run_hour.strftime(
        f"{zarr_id.level_type}/%Y%m%d/%Y%m%d_%Hz_{zarr_id.model_type}.zarr/")
    url += f"{zarr_id.var_level}/{zarr_id.var_name}"
    return url

create_s3_group_url(zarr_id)
Out[4]:
's3://hrrrzarr/sfc/20200801/20200801_00z_anl.zarr/1000mb/TMP'
In [5]:
def create_s3_subgroup_url(zarr_id, prefix=True):
    url = create_s3_group_url(zarr_id, prefix)
    url += f"/{zarr_id.var_level}"
    return url

create_s3_subgroup_url(zarr_id)
Out[5]:
's3://hrrrzarr/sfc/20200801/20200801_00z_anl.zarr/1000mb/TMP/1000mb'
In [6]:
def create_s3_chunk_url(zarr_id, chunk_id, prefix=False):
    url = create_s3_subgroup_url(zarr_id, prefix)
    url += f"/{zarr_id.var_name}/{zarr_id.format_chunk_id(chunk_id)}"
    return url

create_s3_chunk_url(zarr_id, chunk_id)
Out[6]:
'sfc/20200801/20200801_00z_anl.zarr/1000mb/TMP/1000mb/TMP/4.3'

Handling the Projection¶

The projection description for the spatial grid, which is stored in the HRRR grib2 files, is not available directly in the zarr arrays. Additionally, the zarr data has the x and y coordinates in the native Lambert Conformal Conic projection, but not the latitude and longitude data. The various use cases will have more specific examples of how to handle this information, but here we'll note that:

  1. The proj params you need to define the grid correctly are:
  {'a': 6371229,
   'b': 6371229,
   'proj': 'lcc',
   'lon_0': 262.5,
   'lat_0': 38.5,
   'lat_1': 38.5,
   'lat_2': 38.5}
  1. We also offer a "chunk index" that has the latitude and longitude coordinates for the grid, more on that under Use Case 3.

  2. The following code sets up the correct CRS in cartopy:

In [7]:
import cartopy.crs as ccrs

projection = ccrs.LambertConformal(central_longitude=262.5, 
                                   central_latitude=38.5, 
                                   standard_parallels=(38.5, 38.5),
                                    globe=ccrs.Globe(semimajor_axis=6371229,
                                                     semiminor_axis=6371229))

Loading Data¶

Use Case 1: Whole grid, one model run¶

In this case, we recommend using xarray.open_mfdataset. It allows you to load the group with its metadata and the subgroup that contains the actual data variable at the same time.

Note: If you get an error like "CRSError: Invalid datum string", there's a fix for it at the bottom of this page.

In [8]:
import s3fs
import xarray as xr
import metpy

def load_dataset(urls):
    fs = s3fs.S3FileSystem(anon=True)    
    ds = xr.open_mfdataset([s3fs.S3Map(url, s3=fs) for url in urls], engine='zarr')
    
    # add the projection data
    ds = ds.rename(projection_x_coordinate="x", projection_y_coordinate="y")
    ds = ds.metpy.assign_crs(projection.to_cf())
    ds = ds.metpy.assign_latitude_longitude()    
    return ds

load_dataset([create_s3_group_url(zarr_id), create_s3_subgroup_url(zarr_id)])
Out[8]:
<xarray.Dataset> Size: 34MB
Dimensions:                  (y: 1059, x: 1799)
Coordinates:
  * x                        (x) float64 14kB -2.698e+06 ... 2.696e+06
  * y                        (y) float64 8kB -1.587e+06 -1.584e+06 ... 1.587e+06
    metpy_crs                object 8B Projection: lambert_conformal_conic
    latitude                 (y, x) float64 15MB 21.14 21.15 ... 47.85 47.84
    longitude                (y, x) float64 15MB -122.7 -122.7 ... -60.95 -60.92
Data variables:
    TMP                      (y, x) float16 4MB dask.array<chunksize=(150, 150), meta=np.ndarray>
    forecast_period          timedelta64[ns] 8B ...
    forecast_reference_time  datetime64[ns] 8B ...
    height                   float64 8B ...
    pressure                 float64 8B ...
    time                     datetime64[ns] 8B ...
xarray.Dataset
    • y: 1059
    • x: 1799
    • x
      (x)
      float64
      -2.698e+06 -2.695e+06 ... 2.696e+06
      standard_name :
      projection_x_coordinate
      units :
      m
      _metpy_axis :
      x
      array([-2697520.142522, -2694520.142522, -2691520.142522, ...,  2690479.857478,
              2693479.857478,  2696479.857478])
    • y
      (y)
      float64
      -1.587e+06 -1.584e+06 ... 1.587e+06
      standard_name :
      projection_y_coordinate
      units :
      m
      _metpy_axis :
      y
      array([-1587306.152557, -1584306.152557, -1581306.152557, ...,  1580693.847443,
              1583693.847443,  1586693.847443])
    • metpy_crs
      ()
      object
      Projection: lambert_conformal_conic
      array(<metpy.plots.mapping.CFProjection object at 0x108313fa0>,
            dtype=object)
    • latitude
      (y, x)
      float64
      21.14 21.15 21.15 ... 47.85 47.84
      units :
      degrees_north
      standard_name :
      latitude
      array([[21.138123  , 21.14511004, 21.1520901 , ..., 21.1545089 ,
              21.14753125, 21.14054663],
             [21.16299459, 21.1699845 , 21.17696744, ..., 21.17938723,
              21.1724067 , 21.16541921],
             [21.18786863, 21.19486142, 21.20184723, ..., 21.20426802,
              21.19728462, 21.19029425],
             ...,
             [47.78955926, 47.799849  , 47.81012868, ..., 47.81369093,
              47.80341474, 47.79312849],
             [47.81409316, 47.82438621, 47.8346692 , ..., 47.83823259,
              47.8279531 , 47.81766354],
             [47.8386235 , 47.84891986, 47.85920615, ..., 47.86277069,
              47.85248789, 47.84219502]])
    • longitude
      (y, x)
      float64
      -122.7 -122.7 ... -60.95 -60.92
      units :
      degrees_east
      standard_name :
      longitude
      array([[-122.719528  , -122.69286132, -122.6661903 , ...,  -72.3430592 ,
               -72.31638668,  -72.28971849],
             [-122.72702499, -122.70035119, -122.67367305, ...,  -72.33557892,
               -72.30889927,  -72.28222397],
             [-122.73452632, -122.7078454 , -122.68116014, ...,  -72.3280943 ,
               -72.30140753,  -72.2747251 ],
             ...,
             [-134.0648096 , -134.02828423, -133.99174671, ...,  -61.02092594,
               -60.9843842 ,  -60.94785462],
             [-134.08013858, -134.04360126, -134.00705178, ...,  -61.00562502,
               -60.96907132,  -60.93252978],
             [-134.09547973, -134.05893046, -134.02236901, ...,  -60.99031194,
               -60.95374627,  -60.91719277]])
    • TMP
      (y, x)
      float16
      dask.array<chunksize=(150, 150), meta=np.ndarray>
      GRIB_PARAM :
      [2, 0, 0, 0]
      long_name :
      1000mb/TMP
      units :
      K
      Array Chunk
      Bytes 3.63 MiB 43.95 kiB
      Shape (1059, 1799) (150, 150)
      Dask graph 96 chunks in 2 graph layers
      Data type float16 numpy.ndarray
      1799 1059
    • forecast_period
      ()
      timedelta64[ns]
      ...
      standard_name :
      forecast_period
      [1 values with dtype=timedelta64[ns]]
    • forecast_reference_time
      ()
      datetime64[ns]
      ...
      standard_name :
      forecast_reference_time
      [1 values with dtype=datetime64[ns]]
    • height
      ()
      float64
      ...
      long_name :
      height
      units :
      m
      [1 values with dtype=float64]
    • pressure
      ()
      float64
      ...
      long_name :
      pressure
      units :
      Pa
      [1 values with dtype=float64]
    • time
      ()
      datetime64[ns]
      ...
      standard_name :
      time
      [1 values with dtype=datetime64[ns]]
    • x
      PandasIndex
      PandasIndex(Index([-2697520.1425219304, -2694520.1425219304, -2691520.1425219304,
             -2688520.1425219304, -2685520.1425219304, -2682520.1425219304,
             -2679520.1425219304, -2676520.1425219304, -2673520.1425219304,
             -2670520.1425219304,
             ...
              2669479.8574780696,  2672479.8574780696,  2675479.8574780696,
              2678479.8574780696,  2681479.8574780696,  2684479.8574780696,
              2687479.8574780696,  2690479.8574780696,  2693479.8574780696,
              2696479.8574780696],
            dtype='float64', name='x', length=1799))
    • y
      PandasIndex
      PandasIndex(Index([-1587306.1525566636, -1584306.1525566636, -1581306.1525566636,
             -1578306.1525566636, -1575306.1525566636, -1572306.1525566636,
             -1569306.1525566636, -1566306.1525566636, -1563306.1525566636,
             -1560306.1525566636,
             ...
              1559693.8474433364,  1562693.8474433364,  1565693.8474433364,
              1568693.8474433364,  1571693.8474433364,  1574693.8474433364,
              1577693.8474433364,  1580693.8474433364,  1583693.8474433364,
              1586693.8474433364],
            dtype='float64', name='y', length=1059))

Use Case 2: Whole grid, multiple model runs¶

In this case, it's more performant to use xarray.open_dataset. You call it once on the group url to get the grid data––remember, the geospatial grid's always the same so you can use any model run hour for that part. Then call open_dataset again for every run hour you're interested in.

In [9]:
fs = s3fs.S3FileSystem(anon=True) 

def load(url, run_hour=None, new_time_dimension=None):
    # Download the data from S3. May be lazy.
    ds = xr.open_dataset(s3fs.S3Map(url, s3=fs), engine="zarr")
    
    # Add the model run hour in as a dimension
    if run_hour is not None:
        ds[new_time_dimension] = run_hour
        ds = ds.set_coords(new_time_dimension)
    
    # Later on we use metpy functions that expect the grid variables to be x and y
    ds = ds.rename(projection_x_coordinate="x", projection_y_coordinate="y")
    return ds

def load_combined_dataset(zarr_ids):
    
    # Get the grid data (at a long forecast hour in case the whole time dim is needed)
    grid_zarr_id = dataclasses.replace(zarr_ids[0]) # dataclasses.replace is just a fancy copy function
    grid_zarr_id.run_hour = grid_zarr_id.run_hour.replace(hour=0)  
    grid = load(create_s3_group_url(grid_zarr_id))
    
    is_forecast = zarr_ids[0].model_type == "fcst"
    new_time_dimension = "reference_time" if is_forecast else "time"
    
    datasets = [load(create_s3_subgroup_url(zarr_id), zarr_id.run_hour, new_time_dimension) 
                for zarr_id in zarr_ids]
    
    if is_forecast: # Align the time axes of each dataset (b/c forecasts have different lengths)
        for dataset in datasets:
            dataset["time"] = grid["time"][:len(dataset["time"])]
        datasets = xr.align(*datasets, join="outer")
        
    ds = xr.concat(datasets, dim=new_time_dimension, combine_attrs="override")
    
    # Add the geospatial data to the combined dataset
    ds["x"] = grid["x"]
    ds["y"] = grid["y"]  
    ds = ds.metpy.assign_crs(projection.to_cf())
    ds = ds.metpy.assign_latitude_longitude()
    
    return ds
In [10]:
zarr_ids = [ZarrId(
                run_hour=datetime.datetime(2021, 7, 16, 6) + datetime.timedelta(hours=time_delta),
                level_type="sfc",
                var_level="entire_atmosphere_single_layer",
                var_name="PWAT",
                model_type="fcst"
                )
            for time_delta in range(3)] # get 3 hours starting at the given time

load_combined_dataset(zarr_ids)
Out[10]:
<xarray.Dataset> Size: 579MB
Dimensions:         (time: 48, reference_time: 3, y: 1059, x: 1799)
Coordinates:
  * time            (time) datetime64[ns] 384B 2021-07-16T01:00:00 ... 2021-0...
  * reference_time  (reference_time) datetime64[ns] 24B 2021-07-16T06:00:00 ....
  * x               (x) float64 14kB -2.698e+06 -2.695e+06 ... 2.696e+06
  * y               (y) float64 8kB -1.587e+06 -1.584e+06 ... 1.587e+06
    metpy_crs       object 8B Projection: lambert_conformal_conic
    latitude        (y, x) float64 15MB 21.14 21.15 21.15 ... 47.86 47.85 47.84
    longitude       (y, x) float64 15MB -122.7 -122.7 -122.7 ... -60.95 -60.92
Data variables:
    PWAT            (reference_time, time, y, x) float16 549MB 40.81 ... nan
xarray.Dataset
    • time: 48
    • reference_time: 3
    • y: 1059
    • x: 1799
    • time
      (time)
      datetime64[ns]
      2021-07-16T01:00:00 ... 2021-07-18
      long_name :
      time
      _metpy_axis :
      time
      array(['2021-07-16T01:00:00.000000000', '2021-07-16T02:00:00.000000000',
             '2021-07-16T03:00:00.000000000', '2021-07-16T04:00:00.000000000',
             '2021-07-16T05:00:00.000000000', '2021-07-16T06:00:00.000000000',
             '2021-07-16T07:00:00.000000000', '2021-07-16T08:00:00.000000000',
             '2021-07-16T09:00:00.000000000', '2021-07-16T10:00:00.000000000',
             '2021-07-16T11:00:00.000000000', '2021-07-16T12:00:00.000000000',
             '2021-07-16T13:00:00.000000000', '2021-07-16T14:00:00.000000000',
             '2021-07-16T15:00:00.000000000', '2021-07-16T16:00:00.000000000',
             '2021-07-16T17:00:00.000000000', '2021-07-16T18:00:00.000000000',
             '2021-07-16T19:00:00.000000000', '2021-07-16T20:00:00.000000000',
             '2021-07-16T21:00:00.000000000', '2021-07-16T22:00:00.000000000',
             '2021-07-16T23:00:00.000000000', '2021-07-17T00:00:00.000000000',
             '2021-07-17T01:00:00.000000000', '2021-07-17T02:00:00.000000000',
             '2021-07-17T03:00:00.000000000', '2021-07-17T04:00:00.000000000',
             '2021-07-17T05:00:00.000000000', '2021-07-17T06:00:00.000000000',
             '2021-07-17T07:00:00.000000000', '2021-07-17T08:00:00.000000000',
             '2021-07-17T09:00:00.000000000', '2021-07-17T10:00:00.000000000',
             '2021-07-17T11:00:00.000000000', '2021-07-17T12:00:00.000000000',
             '2021-07-17T13:00:00.000000000', '2021-07-17T14:00:00.000000000',
             '2021-07-17T15:00:00.000000000', '2021-07-17T16:00:00.000000000',
             '2021-07-17T17:00:00.000000000', '2021-07-17T18:00:00.000000000',
             '2021-07-17T19:00:00.000000000', '2021-07-17T20:00:00.000000000',
             '2021-07-17T21:00:00.000000000', '2021-07-17T22:00:00.000000000',
             '2021-07-17T23:00:00.000000000', '2021-07-18T00:00:00.000000000'],
            dtype='datetime64[ns]')
    • reference_time
      (reference_time)
      datetime64[ns]
      2021-07-16T06:00:00 ... 2021-07-...
      array(['2021-07-16T06:00:00.000000000', '2021-07-16T07:00:00.000000000',
             '2021-07-16T08:00:00.000000000'], dtype='datetime64[ns]')
    • x
      (x)
      float64
      -2.698e+06 -2.695e+06 ... 2.696e+06
      standard_name :
      projection_x_coordinate
      units :
      m
      _metpy_axis :
      x
      array([-2697520.142522, -2694520.142522, -2691520.142522, ...,  2690479.857478,
              2693479.857478,  2696479.857478])
    • y
      (y)
      float64
      -1.587e+06 -1.584e+06 ... 1.587e+06
      standard_name :
      projection_y_coordinate
      units :
      m
      _metpy_axis :
      y
      array([-1587306.152557, -1584306.152557, -1581306.152557, ...,  1580693.847443,
              1583693.847443,  1586693.847443])
    • metpy_crs
      ()
      object
      Projection: lambert_conformal_conic
      array(<metpy.plots.mapping.CFProjection object at 0x108350760>,
            dtype=object)
    • latitude
      (y, x)
      float64
      21.14 21.15 21.15 ... 47.85 47.84
      units :
      degrees_north
      standard_name :
      latitude
      array([[21.138123  , 21.14511004, 21.1520901 , ..., 21.1545089 ,
              21.14753125, 21.14054663],
             [21.16299459, 21.1699845 , 21.17696744, ..., 21.17938723,
              21.1724067 , 21.16541921],
             [21.18786863, 21.19486142, 21.20184723, ..., 21.20426802,
              21.19728462, 21.19029425],
             ...,
             [47.78955926, 47.799849  , 47.81012868, ..., 47.81369093,
              47.80341474, 47.79312849],
             [47.81409316, 47.82438621, 47.8346692 , ..., 47.83823259,
              47.8279531 , 47.81766354],
             [47.8386235 , 47.84891986, 47.85920615, ..., 47.86277069,
              47.85248789, 47.84219502]])
    • longitude
      (y, x)
      float64
      -122.7 -122.7 ... -60.95 -60.92
      units :
      degrees_east
      standard_name :
      longitude
      array([[-122.719528  , -122.69286132, -122.6661903 , ...,  -72.3430592 ,
               -72.31638668,  -72.28971849],
             [-122.72702499, -122.70035119, -122.67367305, ...,  -72.33557892,
               -72.30889927,  -72.28222397],
             [-122.73452632, -122.7078454 , -122.68116014, ...,  -72.3280943 ,
               -72.30140753,  -72.2747251 ],
             ...,
             [-134.0648096 , -134.02828423, -133.99174671, ...,  -61.02092594,
               -60.9843842 ,  -60.94785462],
             [-134.08013858, -134.04360126, -134.00705178, ...,  -61.00562502,
               -60.96907132,  -60.93252978],
             [-134.09547973, -134.05893046, -134.02236901, ...,  -60.99031194,
               -60.95374627,  -60.91719277]])
    • PWAT
      (reference_time, time, y, x)
      float16
      40.81 40.81 40.81 ... nan nan nan
      GRIB_PARAM :
      [2, 0, 1, 3]
      long_name :
      entire_atmosphere_single_layer/PWAT
      units :
      kg m-2
      array([[[[40.8 , 40.8 , 40.8 , ..., 38.2 , 38.2 , 38.3 ],
               [40.8 , 41.44, 41.06, ..., 38.06, 38.2 , 38.06],
               [40.7 , 41.44, 41.06, ..., 37.8 , 37.94, 37.8 ],
               ...,
               [24.81, 24.94, 24.94, ..., 38.94, 39.2 , 38.8 ],
               [24.81, 24.81, 24.81, ..., 38.94, 39.2 , 38.56],
               [24.69, 24.69, 24.69, ..., 37.94, 38.06, 38.3 ]],
      
              [[40.97, 41.1 , 41.1 , ..., 39.1 , 39.22, 39.34],
               [40.97, 41.34, 41.22, ..., 38.97, 38.97, 39.1 ],
               [40.97, 41.34, 41.1 , ..., 38.6 , 38.72, 38.72],
               ...,
               [25.11, 25.11, 25.23, ..., 40.47, 40.6 , 40.47],
               [25.11, 25.11, 25.11, ..., 40.34, 40.47, 40.34],
               [24.98, 25.11, 25.11, ..., 39.6 , 39.84, 40.1 ]],
      
              [[41.28, 41.28, 41.4 , ..., 40.16, 40.28, 40.4 ],
               [41.28, 41.28, 41.4 , ..., 39.78, 39.9 , 40.03],
               [41.16, 41.28, 41.28, ..., 39.4 , 39.53, 39.66],
               ...,
      ...
               ...,
               [  nan,   nan,   nan, ...,   nan,   nan,   nan],
               [  nan,   nan,   nan, ...,   nan,   nan,   nan],
               [  nan,   nan,   nan, ...,   nan,   nan,   nan]],
      
              [[  nan,   nan,   nan, ...,   nan,   nan,   nan],
               [  nan,   nan,   nan, ...,   nan,   nan,   nan],
               [  nan,   nan,   nan, ...,   nan,   nan,   nan],
               ...,
               [  nan,   nan,   nan, ...,   nan,   nan,   nan],
               [  nan,   nan,   nan, ...,   nan,   nan,   nan],
               [  nan,   nan,   nan, ...,   nan,   nan,   nan]],
      
              [[  nan,   nan,   nan, ...,   nan,   nan,   nan],
               [  nan,   nan,   nan, ...,   nan,   nan,   nan],
               [  nan,   nan,   nan, ...,   nan,   nan,   nan],
               ...,
               [  nan,   nan,   nan, ...,   nan,   nan,   nan],
               [  nan,   nan,   nan, ...,   nan,   nan,   nan],
               [  nan,   nan,   nan, ...,   nan,   nan,   nan]]]], dtype=float16)
    • time
      PandasIndex
      PandasIndex(DatetimeIndex(['2021-07-16 01:00:00', '2021-07-16 02:00:00',
                     '2021-07-16 03:00:00', '2021-07-16 04:00:00',
                     '2021-07-16 05:00:00', '2021-07-16 06:00:00',
                     '2021-07-16 07:00:00', '2021-07-16 08:00:00',
                     '2021-07-16 09:00:00', '2021-07-16 10:00:00',
                     '2021-07-16 11:00:00', '2021-07-16 12:00:00',
                     '2021-07-16 13:00:00', '2021-07-16 14:00:00',
                     '2021-07-16 15:00:00', '2021-07-16 16:00:00',
                     '2021-07-16 17:00:00', '2021-07-16 18:00:00',
                     '2021-07-16 19:00:00', '2021-07-16 20:00:00',
                     '2021-07-16 21:00:00', '2021-07-16 22:00:00',
                     '2021-07-16 23:00:00', '2021-07-17 00:00:00',
                     '2021-07-17 01:00:00', '2021-07-17 02:00:00',
                     '2021-07-17 03:00:00', '2021-07-17 04:00:00',
                     '2021-07-17 05:00:00', '2021-07-17 06:00:00',
                     '2021-07-17 07:00:00', '2021-07-17 08:00:00',
                     '2021-07-17 09:00:00', '2021-07-17 10:00:00',
                     '2021-07-17 11:00:00', '2021-07-17 12:00:00',
                     '2021-07-17 13:00:00', '2021-07-17 14:00:00',
                     '2021-07-17 15:00:00', '2021-07-17 16:00:00',
                     '2021-07-17 17:00:00', '2021-07-17 18:00:00',
                     '2021-07-17 19:00:00', '2021-07-17 20:00:00',
                     '2021-07-17 21:00:00', '2021-07-17 22:00:00',
                     '2021-07-17 23:00:00', '2021-07-18 00:00:00'],
                    dtype='datetime64[ns]', name='time', freq='h'))
    • reference_time
      PandasIndex
      PandasIndex(DatetimeIndex(['2021-07-16 06:00:00', '2021-07-16 07:00:00',
                     '2021-07-16 08:00:00'],
                    dtype='datetime64[ns]', name='reference_time', freq=None))
    • x
      PandasIndex
      PandasIndex(Index([-2697520.1425219304, -2694520.1425219304, -2691520.1425219304,
             -2688520.1425219304, -2685520.1425219304, -2682520.1425219304,
             -2679520.1425219304, -2676520.1425219304, -2673520.1425219304,
             -2670520.1425219304,
             ...
              2669479.8574780696,  2672479.8574780696,  2675479.8574780696,
              2678479.8574780696,  2681479.8574780696,  2684479.8574780696,
              2687479.8574780696,  2690479.8574780696,  2693479.8574780696,
              2696479.8574780696],
            dtype='float64', name='x', length=1799))
    • y
      PandasIndex
      PandasIndex(Index([-1587306.1525566636, -1584306.1525566636, -1581306.1525566636,
             -1578306.1525566636, -1575306.1525566636, -1572306.1525566636,
             -1569306.1525566636, -1566306.1525566636, -1563306.1525566636,
             -1560306.1525566636,
             ...
              1559693.8474433364,  1562693.8474433364,  1565693.8474433364,
              1568693.8474433364,  1571693.8474433364,  1574693.8474433364,
              1577693.8474433364,  1580693.8474433364,  1583693.8474433364,
              1586693.8474433364],
            dtype='float64', name='y', length=1059))

Use Case 3: Gridpoint, multiple model runs¶

If you're downloading a lot of hours of data and don't need the whole grid, the xarray method will be slow, largely because it downloads various forms of metadata including the coordinates for every hour and variable. Instead, the data is chunked based on geospatial location, so you can download only the necessary chunks of the zarr array. Doing this is lower-level and won't decompress or read metadata automatically.

First, we get the chunk index file:

In [11]:
chunk_index = xr.open_zarr(s3fs.S3Map("s3://hrrrzarr/grid/HRRR_chunk_index.zarr", s3=fs))
chunk_index
Out[11]:
<xarray.Dataset> Size: 46MB
Dimensions:     (x: 1799, y: 1059)
Coordinates:
  * x           (x) float64 14kB -2.698e+06 -2.695e+06 ... 2.693e+06 2.696e+06
  * y           (y) float64 8kB -1.587e+06 -1.584e+06 ... 1.584e+06 1.587e+06
Data variables:
    chunk_id    (x, y) object 15MB dask.array<chunksize=(225, 265), meta=np.ndarray>
    chunk_x     (x) int32 7kB dask.array<chunksize=(1799,), meta=np.ndarray>
    chunk_y     (y) int32 4kB dask.array<chunksize=(1059,), meta=np.ndarray>
    in_chunk_x  (x) int32 7kB dask.array<chunksize=(1799,), meta=np.ndarray>
    in_chunk_y  (y) int32 4kB dask.array<chunksize=(1059,), meta=np.ndarray>
    index_x     (x) int32 7kB dask.array<chunksize=(1799,), meta=np.ndarray>
    index_y     (y) int32 4kB dask.array<chunksize=(1059,), meta=np.ndarray>
    latitude    (y, x) float64 15MB dask.array<chunksize=(133, 450), meta=np.ndarray>
    longitude   (y, x) float64 15MB dask.array<chunksize=(133, 450), meta=np.ndarray>
xarray.Dataset
    • x: 1799
    • y: 1059
    • x
      (x)
      float64
      -2.698e+06 -2.695e+06 ... 2.696e+06
      array([-2697520.142522, -2694520.142522, -2691520.142522, ...,  2690479.857478,
              2693479.857478,  2696479.857478])
    • y
      (y)
      float64
      -1.587e+06 -1.584e+06 ... 1.587e+06
      array([-1587306.152557, -1584306.152557, -1581306.152557, ...,  1580693.847443,
              1583693.847443,  1586693.847443])
    • chunk_id
      (x, y)
      object
      dask.array<chunksize=(225, 265), meta=np.ndarray>
      Array Chunk
      Bytes 14.54 MiB 465.82 kiB
      Shape (1799, 1059) (225, 265)
      Dask graph 32 chunks in 2 graph layers
      Data type object numpy.ndarray
      1059 1799
    • chunk_x
      (x)
      int32
      dask.array<chunksize=(1799,), meta=np.ndarray>
      Array Chunk
      Bytes 7.03 kiB 7.03 kiB
      Shape (1799,) (1799,)
      Dask graph 1 chunks in 2 graph layers
      Data type int32 numpy.ndarray
      1799 1
    • chunk_y
      (y)
      int32
      dask.array<chunksize=(1059,), meta=np.ndarray>
      Array Chunk
      Bytes 4.14 kiB 4.14 kiB
      Shape (1059,) (1059,)
      Dask graph 1 chunks in 2 graph layers
      Data type int32 numpy.ndarray
      1059 1
    • in_chunk_x
      (x)
      int32
      dask.array<chunksize=(1799,), meta=np.ndarray>
      Array Chunk
      Bytes 7.03 kiB 7.03 kiB
      Shape (1799,) (1799,)
      Dask graph 1 chunks in 2 graph layers
      Data type int32 numpy.ndarray
      1799 1
    • in_chunk_y
      (y)
      int32
      dask.array<chunksize=(1059,), meta=np.ndarray>
      Array Chunk
      Bytes 4.14 kiB 4.14 kiB
      Shape (1059,) (1059,)
      Dask graph 1 chunks in 2 graph layers
      Data type int32 numpy.ndarray
      1059 1
    • index_x
      (x)
      int32
      dask.array<chunksize=(1799,), meta=np.ndarray>
      Array Chunk
      Bytes 7.03 kiB 7.03 kiB
      Shape (1799,) (1799,)
      Dask graph 1 chunks in 2 graph layers
      Data type int32 numpy.ndarray
      1799 1
    • index_y
      (y)
      int32
      dask.array<chunksize=(1059,), meta=np.ndarray>
      Array Chunk
      Bytes 4.14 kiB 4.14 kiB
      Shape (1059,) (1059,)
      Dask graph 1 chunks in 2 graph layers
      Data type int32 numpy.ndarray
      1059 1
    • latitude
      (y, x)
      float64
      dask.array<chunksize=(133, 450), meta=np.ndarray>
      Array Chunk
      Bytes 14.54 MiB 467.58 kiB
      Shape (1059, 1799) (133, 450)
      Dask graph 32 chunks in 2 graph layers
      Data type float64 numpy.ndarray
      1799 1059
    • longitude
      (y, x)
      float64
      dask.array<chunksize=(133, 450), meta=np.ndarray>
      Array Chunk
      Bytes 14.54 MiB 467.58 kiB
      Shape (1059, 1799) (133, 450)
      Dask graph 32 chunks in 2 graph layers
      Data type float64 numpy.ndarray
      1799 1059
    • x
      PandasIndex
      PandasIndex(Index([-2697520.1425219304, -2694520.1425219304, -2691520.1425219304,
             -2688520.1425219304, -2685520.1425219304, -2682520.1425219304,
             -2679520.1425219304, -2676520.1425219304, -2673520.1425219304,
             -2670520.1425219304,
             ...
              2669479.8574780696,  2672479.8574780696,  2675479.8574780696,
              2678479.8574780696,  2681479.8574780696,  2684479.8574780696,
              2687479.8574780696,  2690479.8574780696,  2693479.8574780696,
              2696479.8574780696],
            dtype='float64', name='x', length=1799))
    • y
      PandasIndex
      PandasIndex(Index([-1587306.1525566636, -1584306.1525566636, -1581306.1525566636,
             -1578306.1525566636, -1575306.1525566636, -1572306.1525566636,
             -1569306.1525566636, -1566306.1525566636, -1563306.1525566636,
             -1560306.1525566636,
             ...
              1559693.8474433364,  1562693.8474433364,  1565693.8474433364,
              1568693.8474433364,  1571693.8474433364,  1574693.8474433364,
              1577693.8474433364,  1580693.8474433364,  1583693.8474433364,
              1586693.8474433364],
            dtype='float64', name='y', length=1059))

Next, we need to get the nearest point:

In [12]:
def get_nearest_point(projection, chunk_index, longitude, latitude):
    x, y = projection.transform_point(longitude, latitude, ccrs.PlateCarree())
    return chunk_index.sel(x=x, y=y, method="nearest")

nearest_point = get_nearest_point(projection, chunk_index, -111.8910, 40.7608)
chunk_id = nearest_point.chunk_id.values
print(chunk_id)
4.3

Then, we retrieve the chunk. We can't use xarray, since it doesn't understand how to load a single chunk of a zarr array. We could simply retrieve the data from the https url, but if we're going to do this request more than once (e.g. getting the same chunk for every hour of the day), it's faster to use boto3.

In [13]:
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Don't recreate this resource in a loop! That caused a 3-4x slowdown for me.
s3 = boto3.resource(service_name='s3', region_name='us-west-1', config=Config(signature_version=UNSIGNED))

def retrieve_object(s3, s3_url):
    obj = s3.Object('hrrrzarr', s3_url)
    return obj.get()['Body'].read()

zarr_id = ZarrId(
                run_hour=datetime.datetime(2019, 1, 1, 12),
                level_type="sfc",
                var_level="2m_above_ground",
                var_name="TMP",
                model_type="anl"
                )

compressed_data = retrieve_object(s3, create_s3_chunk_url(zarr_id, chunk_id))

Now we have to decompress. Since you're not using xarray or the zarr library, you need to hardcode the datatype you're looking for. See the float precision discussion here.

You may need to check the .zmetadata files for the variable you're using at the start and end of your date range to make sure you're using the correct datatype. An example .zmetadata file url is as follows:

https://hrrrzarr.s3.amazonaws.com/sfc/20190101/20190101_12z_anl.zarr/2m_above_ground/TMP/2m_above_ground/.zmetadata

If this file is missing you can check the .zarray,

https://hrrrzarr.s3.amazonaws.com/sfc/20190101/20190101_12z_anl.zarr/2m_above_ground/TMP/2m_above_ground/TMP/.zarray

These are plain-text json files.

In [22]:
import numcodecs as ncd
import numpy as np

def decompress_chunk(zarr_id, compressed_data):
    buffer = ncd.blosc.decompress(compressed_data)
    
    dtype = "<f2" # Note: The default datatype changed to <f4 on 2024-06-01_00z
                  # When in doubt please check the .zmetadata file for datatypes
    if zarr_id.var_level == "surface" and zarr_id.var_name == "PRES":
        dtype = "<f4"
        
    chunk = np.frombuffer(buffer, dtype=dtype)
    
    if zarr_id.model_type == "anl":
        data_array = np.reshape(chunk, (150, 150))
    else:
        entry_size = 22500
        data_array = np.reshape(chunk, (len(chunk)//entry_size, 150, 150))
        
    return data_array

chunk_data = decompress_chunk(zarr_id, compressed_data)

Once we have the decompressed data for the whole chunk, we need to index into the chunk to get the value at our point:

In [15]:
chunk_data[nearest_point.in_chunk_y.values, nearest_point.in_chunk_x.values]
Out[15]:
262.0

Putting this together to get a time series:

In [16]:
def get_value(zarr_id, chunk_id, nearest_point):
    compressed_data = retrieve_object(s3, create_s3_chunk_url(zarr_id, chunk_id))
    chunk_data = decompress_chunk(zarr_id, compressed_data)
    if zarr_id.model_type == "fcst":
        return chunk_data[:, nearest_point.in_chunk_y.values, nearest_point.in_chunk_x.values]
    else:
        return chunk_data[nearest_point.in_chunk_y.values, nearest_point.in_chunk_x.values]
    
zarr_ids = [dataclasses.replace(zarr_id, run_hour=zarr_id.run_hour + datetime.timedelta(hours=time_delta))
           for time_delta in range(12)]
[get_value(zid, chunk_id, nearest_point) for zid in zarr_ids]
Out[16]:
[262.0,
 261.2,
 261.2,
 261.5,
 262.8,
 264.5,
 265.0,
 265.8,
 266.2,
 267.5,
 267.5,
 267.5]
In [17]:
# forecast example
fcst_zarr_id = ZarrId(
                run_hour=datetime.datetime(2019, 1, 1, 12),
                level_type="sfc",
                var_level="surface",
                var_name="PRES",
                model_type="fcst"
                )
zarr_ids = [dataclasses.replace(fcst_zarr_id, run_hour=zarr_id.run_hour + datetime.timedelta(hours=time_delta))
           for time_delta in range(3)]
[get_value(zid, chunk_id, nearest_point) for zid in zarr_ids]
Out[17]:
[array([87240., 87380., 87420., 87460., 87430., 87460., 87500., 87450.,
        87470., 87500., 87490., 87560., 87640., 87660., 87690., 87700.,
        87710., 87780., 87760., 87770., 87800., 87800., 87780., 87790.,
        87810., 87840., 87850., 87880., 87900., 87900., 87850., 87790.,
        87750., 87740., 87750., 87760.], dtype=float32),
 array([87370., 87410., 87540., 87460., 87420., 87450., 87470., 87440.,
        87490., 87560., 87610., 87650., 87710., 87680., 87760., 87780.,
        87760., 87790.], dtype=float32),
 array([87500., 87500., 87550., 87460., 87530., 87460., 87460., 87470.,
        87490., 87530., 87620., 87680., 87740., 87700., 87780., 87800.,
        87730., 87750.], dtype=float32)]

Use Case 4: Small region of grid, multiple model runs¶

Here we get a subportion of the grid based on latitude/longitude bounds.

In [27]:
lat_top = 39
lat_bottom = 34
lon_top = -107
lon_bottom = -110 # Four Corners region

def check_boundaries(data):
    return ((lat_bottom < data.latitude) & (data.latitude < lat_top) & (
        lon_bottom < data.longitude) & (data.longitude < lon_top)).compute()

area = chunk_index.where(check_boundaries, drop=True)
area
Out[27]:
<xarray.Dataset> Size: 2MB
Dimensions:     (x: 110, y: 194)
Coordinates:
  * x           (x) float64 880B -1.15e+06 -1.147e+06 ... -8.255e+05 -8.225e+05
  * y           (y) float64 2kB -4.533e+05 -4.503e+05 ... 1.227e+05 1.257e+05
Data variables:
    chunk_id    (x, y) object 171kB dask.array<chunksize=(110, 152), meta=np.ndarray>
    chunk_x     (x, y) float64 171kB dask.array<chunksize=(110, 194), meta=np.ndarray>
    chunk_y     (y, x) float64 171kB dask.array<chunksize=(194, 110), meta=np.ndarray>
    in_chunk_x  (x, y) float64 171kB dask.array<chunksize=(110, 194), meta=np.ndarray>
    in_chunk_y  (y, x) float64 171kB dask.array<chunksize=(194, 110), meta=np.ndarray>
    index_x     (x, y) float64 171kB dask.array<chunksize=(110, 194), meta=np.ndarray>
    index_y     (y, x) float64 171kB dask.array<chunksize=(194, 110), meta=np.ndarray>
    latitude    (y, x) float64 171kB dask.array<chunksize=(21, 110), meta=np.ndarray>
    longitude   (y, x) float64 171kB dask.array<chunksize=(21, 110), meta=np.ndarray>
xarray.Dataset
    • x: 110
    • y: 194
    • x
      (x)
      float64
      -1.15e+06 -1.147e+06 ... -8.225e+05
      array([-1149520.142522, -1146520.142522, -1143520.142522, -1140520.142522,
             -1137520.142522, -1134520.142522, -1131520.142522, -1128520.142522,
             -1125520.142522, -1122520.142522, -1119520.142522, -1116520.142522,
             -1113520.142522, -1110520.142522, -1107520.142522, -1104520.142522,
             -1101520.142522, -1098520.142522, -1095520.142522, -1092520.142522,
             -1089520.142522, -1086520.142522, -1083520.142522, -1080520.142522,
             -1077520.142522, -1074520.142522, -1071520.142522, -1068520.142522,
             -1065520.142522, -1062520.142522, -1059520.142522, -1056520.142522,
             -1053520.142522, -1050520.142522, -1047520.142522, -1044520.142522,
             -1041520.142522, -1038520.142522, -1035520.142522, -1032520.142522,
             -1029520.142522, -1026520.142522, -1023520.142522, -1020520.142522,
             -1017520.142522, -1014520.142522, -1011520.142522, -1008520.142522,
             -1005520.142522, -1002520.142522,  -999520.142522,  -996520.142522,
              -993520.142522,  -990520.142522,  -987520.142522,  -984520.142522,
              -981520.142522,  -978520.142522,  -975520.142522,  -972520.142522,
              -969520.142522,  -966520.142522,  -963520.142522,  -960520.142522,
              -957520.142522,  -954520.142522,  -951520.142522,  -948520.142522,
              -945520.142522,  -942520.142522,  -939520.142522,  -936520.142522,
              -933520.142522,  -930520.142522,  -927520.142522,  -924520.142522,
              -921520.142522,  -918520.142522,  -915520.142522,  -912520.142522,
              -909520.142522,  -906520.142522,  -903520.142522,  -900520.142522,
              -897520.142522,  -894520.142522,  -891520.142522,  -888520.142522,
              -885520.142522,  -882520.142522,  -879520.142522,  -876520.142522,
              -873520.142522,  -870520.142522,  -867520.142522,  -864520.142522,
              -861520.142522,  -858520.142522,  -855520.142522,  -852520.142522,
              -849520.142522,  -846520.142522,  -843520.142522,  -840520.142522,
              -837520.142522,  -834520.142522,  -831520.142522,  -828520.142522,
              -825520.142522,  -822520.142522])
    • y
      (y)
      float64
      -4.533e+05 -4.503e+05 ... 1.257e+05
      array([-4.533062e+05, -4.503062e+05, -4.473062e+05, -4.443062e+05,
             -4.413062e+05, -4.383062e+05, -4.353062e+05, -4.323062e+05,
             -4.293062e+05, -4.263062e+05, -4.233062e+05, -4.203062e+05,
             -4.173062e+05, -4.143062e+05, -4.113062e+05, -4.083062e+05,
             -4.053062e+05, -4.023062e+05, -3.993062e+05, -3.963062e+05,
             -3.933062e+05, -3.903062e+05, -3.873062e+05, -3.843062e+05,
             -3.813062e+05, -3.783062e+05, -3.753062e+05, -3.723062e+05,
             -3.693062e+05, -3.663062e+05, -3.633062e+05, -3.603062e+05,
             -3.573062e+05, -3.543062e+05, -3.513062e+05, -3.483062e+05,
             -3.453062e+05, -3.423062e+05, -3.393062e+05, -3.363062e+05,
             -3.333062e+05, -3.303062e+05, -3.273062e+05, -3.243062e+05,
             -3.213062e+05, -3.183062e+05, -3.153062e+05, -3.123062e+05,
             -3.093062e+05, -3.063062e+05, -3.033062e+05, -3.003062e+05,
             -2.973062e+05, -2.943062e+05, -2.913062e+05, -2.883062e+05,
             -2.853062e+05, -2.823062e+05, -2.793062e+05, -2.763062e+05,
             -2.733062e+05, -2.703062e+05, -2.673062e+05, -2.643062e+05,
             -2.613062e+05, -2.583062e+05, -2.553062e+05, -2.523062e+05,
             -2.493062e+05, -2.463062e+05, -2.433062e+05, -2.403062e+05,
             -2.373062e+05, -2.343062e+05, -2.313062e+05, -2.283062e+05,
             -2.253062e+05, -2.223062e+05, -2.193062e+05, -2.163062e+05,
             -2.133062e+05, -2.103062e+05, -2.073062e+05, -2.043062e+05,
             -2.013062e+05, -1.983062e+05, -1.953062e+05, -1.923062e+05,
             -1.893062e+05, -1.863062e+05, -1.833062e+05, -1.803062e+05,
             -1.773062e+05, -1.743062e+05, -1.713062e+05, -1.683062e+05,
             -1.653062e+05, -1.623062e+05, -1.593062e+05, -1.563062e+05,
             -1.533062e+05, -1.503062e+05, -1.473062e+05, -1.443062e+05,
             -1.413062e+05, -1.383062e+05, -1.353062e+05, -1.323062e+05,
             -1.293062e+05, -1.263062e+05, -1.233062e+05, -1.203062e+05,
             -1.173062e+05, -1.143062e+05, -1.113062e+05, -1.083062e+05,
             -1.053062e+05, -1.023062e+05, -9.930615e+04, -9.630615e+04,
             -9.330615e+04, -9.030615e+04, -8.730615e+04, -8.430615e+04,
             -8.130615e+04, -7.830615e+04, -7.530615e+04, -7.230615e+04,
             -6.930615e+04, -6.630615e+04, -6.330615e+04, -6.030615e+04,
             -5.730615e+04, -5.430615e+04, -5.130615e+04, -4.830615e+04,
             -4.530615e+04, -4.230615e+04, -3.930615e+04, -3.630615e+04,
             -3.330615e+04, -3.030615e+04, -2.730615e+04, -2.430615e+04,
             -2.130615e+04, -1.830615e+04, -1.530615e+04, -1.230615e+04,
             -9.306153e+03, -6.306153e+03, -3.306153e+03, -3.061526e+02,
              2.693847e+03,  5.693847e+03,  8.693847e+03,  1.169385e+04,
              1.469385e+04,  1.769385e+04,  2.069385e+04,  2.369385e+04,
              2.669385e+04,  2.969385e+04,  3.269385e+04,  3.569385e+04,
              3.869385e+04,  4.169385e+04,  4.469385e+04,  4.769385e+04,
              5.069385e+04,  5.369385e+04,  5.669385e+04,  5.969385e+04,
              6.269385e+04,  6.569385e+04,  6.869385e+04,  7.169385e+04,
              7.469385e+04,  7.769385e+04,  8.069385e+04,  8.369385e+04,
              8.669385e+04,  8.969385e+04,  9.269385e+04,  9.569385e+04,
              9.869385e+04,  1.016938e+05,  1.046938e+05,  1.076938e+05,
              1.106938e+05,  1.136938e+05,  1.166938e+05,  1.196938e+05,
              1.226938e+05,  1.256938e+05])
    • chunk_id
      (x, y)
      object
      dask.array<chunksize=(110, 152), meta=np.ndarray>
      Array Chunk
      Bytes 166.72 kiB 130.62 kiB
      Shape (110, 194) (110, 152)
      Dask graph 2 chunks in 7 graph layers
      Data type object numpy.ndarray
      194 110
    • chunk_x
      (x, y)
      float64
      dask.array<chunksize=(110, 194), meta=np.ndarray>
      Array Chunk
      Bytes 166.72 kiB 166.72 kiB
      Shape (110, 194) (110, 194)
      Dask graph 1 chunks in 7 graph layers
      Data type float64 numpy.ndarray
      194 110
    • chunk_y
      (y, x)
      float64
      dask.array<chunksize=(194, 110), meta=np.ndarray>
      Array Chunk
      Bytes 166.72 kiB 166.72 kiB
      Shape (194, 110) (194, 110)
      Dask graph 1 chunks in 7 graph layers
      Data type float64 numpy.ndarray
      110 194
    • in_chunk_x
      (x, y)
      float64
      dask.array<chunksize=(110, 194), meta=np.ndarray>
      Array Chunk
      Bytes 166.72 kiB 166.72 kiB
      Shape (110, 194) (110, 194)
      Dask graph 1 chunks in 7 graph layers
      Data type float64 numpy.ndarray
      194 110
    • in_chunk_y
      (y, x)
      float64
      dask.array<chunksize=(194, 110), meta=np.ndarray>
      Array Chunk
      Bytes 166.72 kiB 166.72 kiB
      Shape (194, 110) (194, 110)
      Dask graph 1 chunks in 7 graph layers
      Data type float64 numpy.ndarray
      110 194
    • index_x
      (x, y)
      float64
      dask.array<chunksize=(110, 194), meta=np.ndarray>
      Array Chunk
      Bytes 166.72 kiB 166.72 kiB
      Shape (110, 194) (110, 194)
      Dask graph 1 chunks in 7 graph layers
      Data type float64 numpy.ndarray
      194 110
    • index_y
      (y, x)
      float64
      dask.array<chunksize=(194, 110), meta=np.ndarray>
      Array Chunk
      Bytes 166.72 kiB 166.72 kiB
      Shape (194, 110) (194, 110)
      Dask graph 1 chunks in 7 graph layers
      Data type float64 numpy.ndarray
      110 194
    • latitude
      (y, x)
      float64
      dask.array<chunksize=(21, 110), meta=np.ndarray>
      Array Chunk
      Bytes 166.72 kiB 114.30 kiB
      Shape (194, 110) (133, 110)
      Dask graph 3 chunks in 7 graph layers
      Data type float64 numpy.ndarray
      110 194
    • longitude
      (y, x)
      float64
      dask.array<chunksize=(21, 110), meta=np.ndarray>
      Array Chunk
      Bytes 166.72 kiB 114.30 kiB
      Shape (194, 110) (133, 110)
      Dask graph 3 chunks in 7 graph layers
      Data type float64 numpy.ndarray
      110 194
    • x
      PandasIndex
      PandasIndex(Index([-1149520.1425219304, -1146520.1425219304, -1143520.1425219304,
             -1140520.1425219304, -1137520.1425219304, -1134520.1425219304,
             -1131520.1425219304, -1128520.1425219304, -1125520.1425219304,
             -1122520.1425219304,
             ...
              -849520.1425219304,  -846520.1425219304,  -843520.1425219304,
              -840520.1425219304,  -837520.1425219304,  -834520.1425219304,
              -831520.1425219304,  -828520.1425219304,  -825520.1425219304,
              -822520.1425219304],
            dtype='float64', name='x', length=110))
    • y
      PandasIndex
      PandasIndex(Index([-453306.1525566636, -450306.1525566636, -447306.1525566636,
             -444306.1525566636, -441306.1525566636, -438306.1525566636,
             -435306.1525566636, -432306.1525566636, -429306.1525566636,
             -426306.1525566636,
             ...
              98693.84744333639, 101693.84744333639, 104693.84744333639,
             107693.84744333639, 110693.84744333639, 113693.84744333639,
             116693.84744333639, 119693.84744333639, 122693.84744333639,
             125693.84744333639],
            dtype='float64', name='y', length=194))

Note that while we've dropped rows and columns with no datapoints in the box, we still have lots of null datapoints where either the projection x or the projection y coordinate is sometimes within the box, but the combination is not.

Let's get the unique chunk ids from this dataset.

In [28]:
def get_unique(data):
    # We have to implement our own "unique" logic since missing values are NaN (a float) and the rest are string
    data = data.fillna(None).values.flatten()
    data = data[data != None]
    return np.unique(data)

chunk_ids = get_unique(area.chunk_id)

Now we download the chunks from AWS, decompress, and join the chunk data with the grid data.

In [34]:
def get_chunk(zarr_id, chunk_id):
    # retrieve data as before
    compressed_data = retrieve_object(s3, create_s3_chunk_url(zarr_id, chunk_id))
    chunk_data = decompress_chunk(zarr_id, compressed_data)
    
    # combine retrieved data with the chunk grid
    chunk_index.load()
    chunk_xarray = chunk_index.where(lambda x: x.chunk_id == chunk_id, drop=True)
    dimensions = ("y", "x") if zarr_id.model_type == "anl" else ("time", "y", "x")
    chunk_xarray[zarr_id.var_name] = (dimensions, chunk_data)
    return chunk_xarray

def get_chunks_combined(zarr_id, chunk_ids):
    chunks = [get_chunk(zarr_id, chunk_id) for chunk_id in chunk_ids]
    return xr.merge(chunks)

    
data = get_chunks_combined(zarr_id, chunk_ids)
data
Out[34]:
<xarray.Dataset> Size: 7MB
Dimensions:     (x: 300, y: 300)
Coordinates:
  * x           (x) float64 2kB -1.348e+06 -1.345e+06 ... -4.535e+05 -4.505e+05
  * y           (y) float64 2kB -6.873e+05 -6.843e+05 ... 2.067e+05 2.097e+05
Data variables:
    chunk_id    (x, y) object 720kB '2.3' '2.3' '2.3' ... '3.4' '3.4' '3.4'
    chunk_x     (x, y) float64 720kB 3.0 3.0 3.0 3.0 3.0 ... 4.0 4.0 4.0 4.0 4.0
    chunk_y     (y, x) float64 720kB 2.0 2.0 2.0 2.0 2.0 ... 3.0 3.0 3.0 3.0 3.0
    in_chunk_x  (x, y) float64 720kB 0.0 0.0 0.0 0.0 ... 149.0 149.0 149.0 149.0
    in_chunk_y  (y, x) float64 720kB 0.0 0.0 0.0 0.0 ... 149.0 149.0 149.0 149.0
    index_x     (x, y) float64 720kB 450.0 450.0 450.0 ... 749.0 749.0 749.0
    index_y     (y, x) float64 720kB 300.0 300.0 300.0 ... 599.0 599.0 599.0
    latitude    (y, x) float64 720kB 31.4 31.41 31.41 ... 40.27 40.27 40.27
    longitude   (y, x) float64 720kB -111.6 -111.6 -111.6 ... -102.8 -102.8
    TMP         (y, x) float16 180kB 276.8 276.8 277.0 ... 258.2 258.5 258.5
xarray.Dataset
    • x: 300
    • y: 300
    • x
      (x)
      float64
      -1.348e+06 ... -4.505e+05
      array([-1347520.142522, -1344520.142522, -1341520.142522, ...,  -456520.142522,
              -453520.142522,  -450520.142522])
    • y
      (y)
      float64
      -6.873e+05 -6.843e+05 ... 2.097e+05
      array([-687306.152557, -684306.152557, -681306.152557, ...,  203693.847443,
              206693.847443,  209693.847443])
    • chunk_id
      (x, y)
      object
      '2.3' '2.3' '2.3' ... '3.4' '3.4'
      array([['2.3', '2.3', '2.3', ..., '3.3', '3.3', '3.3'],
             ['2.3', '2.3', '2.3', ..., '3.3', '3.3', '3.3'],
             ['2.3', '2.3', '2.3', ..., '3.3', '3.3', '3.3'],
             ...,
             ['2.4', '2.4', '2.4', ..., '3.4', '3.4', '3.4'],
             ['2.4', '2.4', '2.4', ..., '3.4', '3.4', '3.4'],
             ['2.4', '2.4', '2.4', ..., '3.4', '3.4', '3.4']], dtype=object)
    • chunk_x
      (x, y)
      float64
      3.0 3.0 3.0 3.0 ... 4.0 4.0 4.0 4.0
      array([[3., 3., 3., ..., 3., 3., 3.],
             [3., 3., 3., ..., 3., 3., 3.],
             [3., 3., 3., ..., 3., 3., 3.],
             ...,
             [4., 4., 4., ..., 4., 4., 4.],
             [4., 4., 4., ..., 4., 4., 4.],
             [4., 4., 4., ..., 4., 4., 4.]])
    • chunk_y
      (y, x)
      float64
      2.0 2.0 2.0 2.0 ... 3.0 3.0 3.0 3.0
      array([[2., 2., 2., ..., 2., 2., 2.],
             [2., 2., 2., ..., 2., 2., 2.],
             [2., 2., 2., ..., 2., 2., 2.],
             ...,
             [3., 3., 3., ..., 3., 3., 3.],
             [3., 3., 3., ..., 3., 3., 3.],
             [3., 3., 3., ..., 3., 3., 3.]])
    • in_chunk_x
      (x, y)
      float64
      0.0 0.0 0.0 ... 149.0 149.0 149.0
      array([[  0.,   0.,   0., ...,   0.,   0.,   0.],
             [  1.,   1.,   1., ...,   1.,   1.,   1.],
             [  2.,   2.,   2., ...,   2.,   2.,   2.],
             ...,
             [147., 147., 147., ..., 147., 147., 147.],
             [148., 148., 148., ..., 148., 148., 148.],
             [149., 149., 149., ..., 149., 149., 149.]])
    • in_chunk_y
      (y, x)
      float64
      0.0 0.0 0.0 ... 149.0 149.0 149.0
      array([[  0.,   0.,   0., ...,   0.,   0.,   0.],
             [  1.,   1.,   1., ...,   1.,   1.,   1.],
             [  2.,   2.,   2., ...,   2.,   2.,   2.],
             ...,
             [147., 147., 147., ..., 147., 147., 147.],
             [148., 148., 148., ..., 148., 148., 148.],
             [149., 149., 149., ..., 149., 149., 149.]])
    • index_x
      (x, y)
      float64
      450.0 450.0 450.0 ... 749.0 749.0
      array([[450., 450., 450., ..., 450., 450., 450.],
             [451., 451., 451., ..., 451., 451., 451.],
             [452., 452., 452., ..., 452., 452., 452.],
             ...,
             [747., 747., 747., ..., 747., 747., 747.],
             [748., 748., 748., ..., 748., 748., 748.],
             [749., 749., 749., ..., 749., 749., 749.]])
    • index_y
      (y, x)
      float64
      300.0 300.0 300.0 ... 599.0 599.0
      array([[300., 300., 300., ..., 300., 300., 300.],
             [301., 301., 301., ..., 301., 301., 301.],
             [302., 302., 302., ..., 302., 302., 302.],
             ...,
             [597., 597., 597., ..., 597., 597., 597.],
             [598., 598., 598., ..., 598., 598., 598.],
             [599., 599., 599., ..., 599., 599., 599.]])
    • latitude
      (y, x)
      float64
      31.4 31.41 31.41 ... 40.27 40.27
      array([[31.40369289, 31.40778857, 31.41187537, ..., 32.2237807 ,
              32.22518206, 32.2265742 ],
             [31.43015611, 31.4342534 , 31.43834179, ..., 32.25056587,
              32.25196778, 32.25336047],
             [31.45662056, 31.46071945, 31.46480945, ..., 32.27735233,
              32.27875478, 32.28014802],
             ...,
             [39.29350372, 39.29808764, 39.30266163, ..., 40.21159423,
              40.21316347, 40.21472239],
             [39.3200864 , 39.324672  , 39.32924767, ..., 40.23851457,
              40.24008439, 40.24164388],
             [39.3466686 , 39.35125588, 39.35583323, ..., 40.26543448,
              40.26700488, 40.26856495]])
    • longitude
      (y, x)
      float64
      -111.6 -111.6 ... -102.8 -102.8
      array([[-111.64807219, -111.61706639, -111.58605737, ..., -102.32684331,
              -102.2951815 , -102.26351856],
             [-111.6528776 , -111.62186161, -111.59084238, ..., -102.32850583,
              -102.29683315, -102.26515934],
             [-111.65768625, -111.62666005, -111.59563062, ..., -102.33016949,
              -102.29848594, -102.26680125],
             ...,
             [-113.23324901, -113.19889805, -113.16454267, ..., -102.8766022 ,
              -102.84134966, -102.80609555],
             [-113.23918069, -113.20481728, -113.17044946, ..., -102.87866462,
              -102.84339862, -102.80813104],
             [-113.2451168 , -113.21074094, -113.17636066, ..., -102.88072862,
              -102.84544914, -102.81016809]])
    • TMP
      (y, x)
      float16
      276.8 276.8 277.0 ... 258.5 258.5
      array([[276.8, 276.8, 277. , ..., 269.2, 269.5, 269.5],
             [276.8, 276.8, 276.8, ..., 269.5, 269.5, 269.5],
             [276.8, 276.8, 276.8, ..., 269.2, 269.2, 269.2],
             ...,
             [261.8, 262.2, 262.5, ..., 258.2, 258.5, 258.8],
             [261.8, 262.5, 262.2, ..., 258.2, 258.5, 258.8],
             [259.2, 259.2, 262.5, ..., 258.2, 258.5, 258.5]], dtype=float16)
    • x
      PandasIndex
      PandasIndex(Index([-1347520.1425219304, -1344520.1425219304, -1341520.1425219304,
             -1338520.1425219304, -1335520.1425219304, -1332520.1425219304,
             -1329520.1425219304, -1326520.1425219304, -1323520.1425219304,
             -1320520.1425219304,
             ...
              -477520.1425219304,  -474520.1425219304,  -471520.1425219304,
              -468520.1425219304,  -465520.1425219304,  -462520.1425219304,
              -459520.1425219304,  -456520.1425219304,  -453520.1425219304,
              -450520.1425219304],
            dtype='float64', name='x', length=300))
    • y
      PandasIndex
      PandasIndex(Index([-687306.1525566636, -684306.1525566636, -681306.1525566636,
             -678306.1525566636, -675306.1525566636, -672306.1525566636,
             -669306.1525566636, -666306.1525566636, -663306.1525566636,
             -660306.1525566636,
             ...
              182693.8474433364,  185693.8474433364,  188693.8474433364,
              191693.8474433364,  194693.8474433364,  197693.8474433364,
              200693.8474433364,  203693.8474433364,  206693.8474433364,
              209693.8474433364],
            dtype='float64', name='y', length=300))

This has all the data for the four chunks. If we truly only want to look at the area of interest, we need to apply the filter again.

In [35]:
data.where(check_boundaries, drop=True)
Out[35]:
<xarray.Dataset> Size: 2MB
Dimensions:     (x: 110, y: 194)
Coordinates:
  * x           (x) float64 880B -1.15e+06 -1.147e+06 ... -8.255e+05 -8.225e+05
  * y           (y) float64 2kB -4.533e+05 -4.503e+05 ... 1.227e+05 1.257e+05
Data variables:
    chunk_id    (x, y) object 171kB nan nan nan nan nan ... nan nan nan nan nan
    chunk_x     (x, y) float64 171kB nan nan nan nan nan ... nan nan nan nan nan
    chunk_y     (y, x) float64 171kB nan nan nan nan nan ... nan nan nan nan nan
    in_chunk_x  (x, y) float64 171kB nan nan nan nan nan ... nan nan nan nan nan
    in_chunk_y  (y, x) float64 171kB nan nan nan nan nan ... nan nan nan nan nan
    index_x     (x, y) float64 171kB nan nan nan nan nan ... nan nan nan nan nan
    index_y     (y, x) float64 171kB nan nan nan nan nan ... nan nan nan nan nan
    latitude    (y, x) float64 171kB nan nan nan nan nan ... nan nan nan nan nan
    longitude   (y, x) float64 171kB nan nan nan nan nan ... nan nan nan nan nan
    TMP         (y, x) float16 43kB nan nan nan nan nan ... nan nan nan nan nan
xarray.Dataset
    • x: 110
    • y: 194
    • x
      (x)
      float64
      -1.15e+06 -1.147e+06 ... -8.225e+05
      array([-1149520.142522, -1146520.142522, -1143520.142522, -1140520.142522,
             -1137520.142522, -1134520.142522, -1131520.142522, -1128520.142522,
             -1125520.142522, -1122520.142522, -1119520.142522, -1116520.142522,
             -1113520.142522, -1110520.142522, -1107520.142522, -1104520.142522,
             -1101520.142522, -1098520.142522, -1095520.142522, -1092520.142522,
             -1089520.142522, -1086520.142522, -1083520.142522, -1080520.142522,
             -1077520.142522, -1074520.142522, -1071520.142522, -1068520.142522,
             -1065520.142522, -1062520.142522, -1059520.142522, -1056520.142522,
             -1053520.142522, -1050520.142522, -1047520.142522, -1044520.142522,
             -1041520.142522, -1038520.142522, -1035520.142522, -1032520.142522,
             -1029520.142522, -1026520.142522, -1023520.142522, -1020520.142522,
             -1017520.142522, -1014520.142522, -1011520.142522, -1008520.142522,
             -1005520.142522, -1002520.142522,  -999520.142522,  -996520.142522,
              -993520.142522,  -990520.142522,  -987520.142522,  -984520.142522,
              -981520.142522,  -978520.142522,  -975520.142522,  -972520.142522,
              -969520.142522,  -966520.142522,  -963520.142522,  -960520.142522,
              -957520.142522,  -954520.142522,  -951520.142522,  -948520.142522,
              -945520.142522,  -942520.142522,  -939520.142522,  -936520.142522,
              -933520.142522,  -930520.142522,  -927520.142522,  -924520.142522,
              -921520.142522,  -918520.142522,  -915520.142522,  -912520.142522,
              -909520.142522,  -906520.142522,  -903520.142522,  -900520.142522,
              -897520.142522,  -894520.142522,  -891520.142522,  -888520.142522,
              -885520.142522,  -882520.142522,  -879520.142522,  -876520.142522,
              -873520.142522,  -870520.142522,  -867520.142522,  -864520.142522,
              -861520.142522,  -858520.142522,  -855520.142522,  -852520.142522,
              -849520.142522,  -846520.142522,  -843520.142522,  -840520.142522,
              -837520.142522,  -834520.142522,  -831520.142522,  -828520.142522,
              -825520.142522,  -822520.142522])
    • y
      (y)
      float64
      -4.533e+05 -4.503e+05 ... 1.257e+05
      array([-4.533062e+05, -4.503062e+05, -4.473062e+05, -4.443062e+05,
             -4.413062e+05, -4.383062e+05, -4.353062e+05, -4.323062e+05,
             -4.293062e+05, -4.263062e+05, -4.233062e+05, -4.203062e+05,
             -4.173062e+05, -4.143062e+05, -4.113062e+05, -4.083062e+05,
             -4.053062e+05, -4.023062e+05, -3.993062e+05, -3.963062e+05,
             -3.933062e+05, -3.903062e+05, -3.873062e+05, -3.843062e+05,
             -3.813062e+05, -3.783062e+05, -3.753062e+05, -3.723062e+05,
             -3.693062e+05, -3.663062e+05, -3.633062e+05, -3.603062e+05,
             -3.573062e+05, -3.543062e+05, -3.513062e+05, -3.483062e+05,
             -3.453062e+05, -3.423062e+05, -3.393062e+05, -3.363062e+05,
             -3.333062e+05, -3.303062e+05, -3.273062e+05, -3.243062e+05,
             -3.213062e+05, -3.183062e+05, -3.153062e+05, -3.123062e+05,
             -3.093062e+05, -3.063062e+05, -3.033062e+05, -3.003062e+05,
             -2.973062e+05, -2.943062e+05, -2.913062e+05, -2.883062e+05,
             -2.853062e+05, -2.823062e+05, -2.793062e+05, -2.763062e+05,
             -2.733062e+05, -2.703062e+05, -2.673062e+05, -2.643062e+05,
             -2.613062e+05, -2.583062e+05, -2.553062e+05, -2.523062e+05,
             -2.493062e+05, -2.463062e+05, -2.433062e+05, -2.403062e+05,
             -2.373062e+05, -2.343062e+05, -2.313062e+05, -2.283062e+05,
             -2.253062e+05, -2.223062e+05, -2.193062e+05, -2.163062e+05,
             -2.133062e+05, -2.103062e+05, -2.073062e+05, -2.043062e+05,
             -2.013062e+05, -1.983062e+05, -1.953062e+05, -1.923062e+05,
             -1.893062e+05, -1.863062e+05, -1.833062e+05, -1.803062e+05,
             -1.773062e+05, -1.743062e+05, -1.713062e+05, -1.683062e+05,
             -1.653062e+05, -1.623062e+05, -1.593062e+05, -1.563062e+05,
             -1.533062e+05, -1.503062e+05, -1.473062e+05, -1.443062e+05,
             -1.413062e+05, -1.383062e+05, -1.353062e+05, -1.323062e+05,
             -1.293062e+05, -1.263062e+05, -1.233062e+05, -1.203062e+05,
             -1.173062e+05, -1.143062e+05, -1.113062e+05, -1.083062e+05,
             -1.053062e+05, -1.023062e+05, -9.930615e+04, -9.630615e+04,
             -9.330615e+04, -9.030615e+04, -8.730615e+04, -8.430615e+04,
             -8.130615e+04, -7.830615e+04, -7.530615e+04, -7.230615e+04,
             -6.930615e+04, -6.630615e+04, -6.330615e+04, -6.030615e+04,
             -5.730615e+04, -5.430615e+04, -5.130615e+04, -4.830615e+04,
             -4.530615e+04, -4.230615e+04, -3.930615e+04, -3.630615e+04,
             -3.330615e+04, -3.030615e+04, -2.730615e+04, -2.430615e+04,
             -2.130615e+04, -1.830615e+04, -1.530615e+04, -1.230615e+04,
             -9.306153e+03, -6.306153e+03, -3.306153e+03, -3.061526e+02,
              2.693847e+03,  5.693847e+03,  8.693847e+03,  1.169385e+04,
              1.469385e+04,  1.769385e+04,  2.069385e+04,  2.369385e+04,
              2.669385e+04,  2.969385e+04,  3.269385e+04,  3.569385e+04,
              3.869385e+04,  4.169385e+04,  4.469385e+04,  4.769385e+04,
              5.069385e+04,  5.369385e+04,  5.669385e+04,  5.969385e+04,
              6.269385e+04,  6.569385e+04,  6.869385e+04,  7.169385e+04,
              7.469385e+04,  7.769385e+04,  8.069385e+04,  8.369385e+04,
              8.669385e+04,  8.969385e+04,  9.269385e+04,  9.569385e+04,
              9.869385e+04,  1.016938e+05,  1.046938e+05,  1.076938e+05,
              1.106938e+05,  1.136938e+05,  1.166938e+05,  1.196938e+05,
              1.226938e+05,  1.256938e+05])
    • chunk_id
      (x, y)
      object
      nan nan nan nan ... nan nan nan nan
      array([[nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             ...,
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan]], dtype=object)
    • chunk_x
      (x, y)
      float64
      nan nan nan nan ... nan nan nan nan
      array([[nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             ...,
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan]])
    • chunk_y
      (y, x)
      float64
      nan nan nan nan ... nan nan nan nan
      array([[nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             ...,
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan]])
    • in_chunk_x
      (x, y)
      float64
      nan nan nan nan ... nan nan nan nan
      array([[nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             ...,
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan]])
    • in_chunk_y
      (y, x)
      float64
      nan nan nan nan ... nan nan nan nan
      array([[nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             ...,
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan]])
    • index_x
      (x, y)
      float64
      nan nan nan nan ... nan nan nan nan
      array([[nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             ...,
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan]])
    • index_y
      (y, x)
      float64
      nan nan nan nan ... nan nan nan nan
      array([[nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             ...,
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan]])
    • latitude
      (y, x)
      float64
      nan nan nan nan ... nan nan nan nan
      array([[nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             ...,
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan]])
    • longitude
      (y, x)
      float64
      nan nan nan nan ... nan nan nan nan
      array([[nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             ...,
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan]])
    • TMP
      (y, x)
      float16
      nan nan nan nan ... nan nan nan nan
      array([[nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             ...,
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan],
             [nan, nan, nan, ..., nan, nan, nan]], dtype=float16)
    • x
      PandasIndex
      PandasIndex(Index([-1149520.1425219304, -1146520.1425219304, -1143520.1425219304,
             -1140520.1425219304, -1137520.1425219304, -1134520.1425219304,
             -1131520.1425219304, -1128520.1425219304, -1125520.1425219304,
             -1122520.1425219304,
             ...
              -849520.1425219304,  -846520.1425219304,  -843520.1425219304,
              -840520.1425219304,  -837520.1425219304,  -834520.1425219304,
              -831520.1425219304,  -828520.1425219304,  -825520.1425219304,
              -822520.1425219304],
            dtype='float64', name='x', length=110))
    • y
      PandasIndex
      PandasIndex(Index([-453306.1525566636, -450306.1525566636, -447306.1525566636,
             -444306.1525566636, -441306.1525566636, -438306.1525566636,
             -435306.1525566636, -432306.1525566636, -429306.1525566636,
             -426306.1525566636,
             ...
              98693.84744333639, 101693.84744333639, 104693.84744333639,
             107693.84744333639, 110693.84744333639, 113693.84744333639,
             116693.84744333639, 119693.84744333639, 122693.84744333639,
             125693.84744333639],
            dtype='float64', name='y', length=194))

Despite all the NaN's there's still data!

In [36]:
data.TMP.plot()
Out[36]:
<matplotlib.collections.QuadMesh at 0x16b500bb0>

Finishing it off to get data at different times:

In [37]:
start = datetime.datetime(2018, 1, 1, 0)
times = [start + datetime.timedelta(weeks=week_delta) for week_delta in range(2)]

zarr_ids = [dataclasses.replace(zarr_id, run_hour=time) for time in times]

def get_data(zarr_ids, chunk_ids, is_forecast):
    datasets = []
    for zarr_id in zarr_ids:
        data = get_chunks_combined(zarr_id, chunk_ids)
        new_time_dimension = "run_time" if is_forecast else "time"
        data[new_time_dimension] = zarr_id.run_hour
        datasets.append(data)
    ds = xr.concat(datasets, dim=new_time_dimension, combine_attrs="override")
    return ds
    
get_data(zarr_ids, chunk_ids, False)
Out[37]:
<xarray.Dataset> Size: 13MB
Dimensions:     (x: 300, y: 300, time: 2)
Coordinates:
  * x           (x) float64 2kB -1.348e+06 -1.345e+06 ... -4.535e+05 -4.505e+05
  * y           (y) float64 2kB -6.873e+05 -6.843e+05 ... 2.067e+05 2.097e+05
  * time        (time) datetime64[ns] 16B 2018-01-01 2018-01-08
Data variables:
    chunk_id    (time, x, y) object 1MB '2.3' '2.3' '2.3' ... '3.4' '3.4' '3.4'
    chunk_x     (time, x, y) float64 1MB 3.0 3.0 3.0 3.0 3.0 ... 4.0 4.0 4.0 4.0
    chunk_y     (time, y, x) float64 1MB 2.0 2.0 2.0 2.0 2.0 ... 3.0 3.0 3.0 3.0
    in_chunk_x  (time, x, y) float64 1MB 0.0 0.0 0.0 0.0 ... 149.0 149.0 149.0
    in_chunk_y  (time, y, x) float64 1MB 0.0 0.0 0.0 0.0 ... 149.0 149.0 149.0
    index_x     (time, x, y) float64 1MB 450.0 450.0 450.0 ... 749.0 749.0 749.0
    index_y     (time, y, x) float64 1MB 300.0 300.0 300.0 ... 599.0 599.0 599.0
    latitude    (time, y, x) float64 1MB 31.4 31.41 31.41 ... 40.27 40.27 40.27
    longitude   (time, y, x) float64 1MB -111.6 -111.6 -111.6 ... -102.8 -102.8
    TMP         (time, y, x) float16 360kB 295.2 295.2 295.2 ... 277.2 277.2
xarray.Dataset
    • x: 300
    • y: 300
    • time: 2
    • x
      (x)
      float64
      -1.348e+06 ... -4.505e+05
      array([-1347520.142522, -1344520.142522, -1341520.142522, ...,  -456520.142522,
              -453520.142522,  -450520.142522])
    • y
      (y)
      float64
      -6.873e+05 -6.843e+05 ... 2.097e+05
      array([-687306.152557, -684306.152557, -681306.152557, ...,  203693.847443,
              206693.847443,  209693.847443])
    • time
      (time)
      datetime64[ns]
      2018-01-01 2018-01-08
      array(['2018-01-01T00:00:00.000000000', '2018-01-08T00:00:00.000000000'],
            dtype='datetime64[ns]')
    • chunk_id
      (time, x, y)
      object
      '2.3' '2.3' '2.3' ... '3.4' '3.4'
      array([[['2.3', '2.3', '2.3', ..., '3.3', '3.3', '3.3'],
              ['2.3', '2.3', '2.3', ..., '3.3', '3.3', '3.3'],
              ['2.3', '2.3', '2.3', ..., '3.3', '3.3', '3.3'],
              ...,
              ['2.4', '2.4', '2.4', ..., '3.4', '3.4', '3.4'],
              ['2.4', '2.4', '2.4', ..., '3.4', '3.4', '3.4'],
              ['2.4', '2.4', '2.4', ..., '3.4', '3.4', '3.4']],
      
             [['2.3', '2.3', '2.3', ..., '3.3', '3.3', '3.3'],
              ['2.3', '2.3', '2.3', ..., '3.3', '3.3', '3.3'],
              ['2.3', '2.3', '2.3', ..., '3.3', '3.3', '3.3'],
              ...,
              ['2.4', '2.4', '2.4', ..., '3.4', '3.4', '3.4'],
              ['2.4', '2.4', '2.4', ..., '3.4', '3.4', '3.4'],
              ['2.4', '2.4', '2.4', ..., '3.4', '3.4', '3.4']]], dtype=object)
    • chunk_x
      (time, x, y)
      float64
      3.0 3.0 3.0 3.0 ... 4.0 4.0 4.0 4.0
      array([[[3., 3., 3., ..., 3., 3., 3.],
              [3., 3., 3., ..., 3., 3., 3.],
              [3., 3., 3., ..., 3., 3., 3.],
              ...,
              [4., 4., 4., ..., 4., 4., 4.],
              [4., 4., 4., ..., 4., 4., 4.],
              [4., 4., 4., ..., 4., 4., 4.]],
      
             [[3., 3., 3., ..., 3., 3., 3.],
              [3., 3., 3., ..., 3., 3., 3.],
              [3., 3., 3., ..., 3., 3., 3.],
              ...,
              [4., 4., 4., ..., 4., 4., 4.],
              [4., 4., 4., ..., 4., 4., 4.],
              [4., 4., 4., ..., 4., 4., 4.]]])
    • chunk_y
      (time, y, x)
      float64
      2.0 2.0 2.0 2.0 ... 3.0 3.0 3.0 3.0
      array([[[2., 2., 2., ..., 2., 2., 2.],
              [2., 2., 2., ..., 2., 2., 2.],
              [2., 2., 2., ..., 2., 2., 2.],
              ...,
              [3., 3., 3., ..., 3., 3., 3.],
              [3., 3., 3., ..., 3., 3., 3.],
              [3., 3., 3., ..., 3., 3., 3.]],
      
             [[2., 2., 2., ..., 2., 2., 2.],
              [2., 2., 2., ..., 2., 2., 2.],
              [2., 2., 2., ..., 2., 2., 2.],
              ...,
              [3., 3., 3., ..., 3., 3., 3.],
              [3., 3., 3., ..., 3., 3., 3.],
              [3., 3., 3., ..., 3., 3., 3.]]])
    • in_chunk_x
      (time, x, y)
      float64
      0.0 0.0 0.0 ... 149.0 149.0 149.0
      array([[[  0.,   0.,   0., ...,   0.,   0.,   0.],
              [  1.,   1.,   1., ...,   1.,   1.,   1.],
              [  2.,   2.,   2., ...,   2.,   2.,   2.],
              ...,
              [147., 147., 147., ..., 147., 147., 147.],
              [148., 148., 148., ..., 148., 148., 148.],
              [149., 149., 149., ..., 149., 149., 149.]],
      
             [[  0.,   0.,   0., ...,   0.,   0.,   0.],
              [  1.,   1.,   1., ...,   1.,   1.,   1.],
              [  2.,   2.,   2., ...,   2.,   2.,   2.],
              ...,
              [147., 147., 147., ..., 147., 147., 147.],
              [148., 148., 148., ..., 148., 148., 148.],
              [149., 149., 149., ..., 149., 149., 149.]]])
    • in_chunk_y
      (time, y, x)
      float64
      0.0 0.0 0.0 ... 149.0 149.0 149.0
      array([[[  0.,   0.,   0., ...,   0.,   0.,   0.],
              [  1.,   1.,   1., ...,   1.,   1.,   1.],
              [  2.,   2.,   2., ...,   2.,   2.,   2.],
              ...,
              [147., 147., 147., ..., 147., 147., 147.],
              [148., 148., 148., ..., 148., 148., 148.],
              [149., 149., 149., ..., 149., 149., 149.]],
      
             [[  0.,   0.,   0., ...,   0.,   0.,   0.],
              [  1.,   1.,   1., ...,   1.,   1.,   1.],
              [  2.,   2.,   2., ...,   2.,   2.,   2.],
              ...,
              [147., 147., 147., ..., 147., 147., 147.],
              [148., 148., 148., ..., 148., 148., 148.],
              [149., 149., 149., ..., 149., 149., 149.]]])
    • index_x
      (time, x, y)
      float64
      450.0 450.0 450.0 ... 749.0 749.0
      array([[[450., 450., 450., ..., 450., 450., 450.],
              [451., 451., 451., ..., 451., 451., 451.],
              [452., 452., 452., ..., 452., 452., 452.],
              ...,
              [747., 747., 747., ..., 747., 747., 747.],
              [748., 748., 748., ..., 748., 748., 748.],
              [749., 749., 749., ..., 749., 749., 749.]],
      
             [[450., 450., 450., ..., 450., 450., 450.],
              [451., 451., 451., ..., 451., 451., 451.],
              [452., 452., 452., ..., 452., 452., 452.],
              ...,
              [747., 747., 747., ..., 747., 747., 747.],
              [748., 748., 748., ..., 748., 748., 748.],
              [749., 749., 749., ..., 749., 749., 749.]]])
    • index_y
      (time, y, x)
      float64
      300.0 300.0 300.0 ... 599.0 599.0
      array([[[300., 300., 300., ..., 300., 300., 300.],
              [301., 301., 301., ..., 301., 301., 301.],
              [302., 302., 302., ..., 302., 302., 302.],
              ...,
              [597., 597., 597., ..., 597., 597., 597.],
              [598., 598., 598., ..., 598., 598., 598.],
              [599., 599., 599., ..., 599., 599., 599.]],
      
             [[300., 300., 300., ..., 300., 300., 300.],
              [301., 301., 301., ..., 301., 301., 301.],
              [302., 302., 302., ..., 302., 302., 302.],
              ...,
              [597., 597., 597., ..., 597., 597., 597.],
              [598., 598., 598., ..., 598., 598., 598.],
              [599., 599., 599., ..., 599., 599., 599.]]])
    • latitude
      (time, y, x)
      float64
      31.4 31.41 31.41 ... 40.27 40.27
      array([[[31.40369289, 31.40778857, 31.41187537, ..., 32.2237807 ,
               32.22518206, 32.2265742 ],
              [31.43015611, 31.4342534 , 31.43834179, ..., 32.25056587,
               32.25196778, 32.25336047],
              [31.45662056, 31.46071945, 31.46480945, ..., 32.27735233,
               32.27875478, 32.28014802],
              ...,
              [39.29350372, 39.29808764, 39.30266163, ..., 40.21159423,
               40.21316347, 40.21472239],
              [39.3200864 , 39.324672  , 39.32924767, ..., 40.23851457,
               40.24008439, 40.24164388],
              [39.3466686 , 39.35125588, 39.35583323, ..., 40.26543448,
               40.26700488, 40.26856495]],
      
             [[31.40369289, 31.40778857, 31.41187537, ..., 32.2237807 ,
               32.22518206, 32.2265742 ],
              [31.43015611, 31.4342534 , 31.43834179, ..., 32.25056587,
               32.25196778, 32.25336047],
              [31.45662056, 31.46071945, 31.46480945, ..., 32.27735233,
               32.27875478, 32.28014802],
              ...,
              [39.29350372, 39.29808764, 39.30266163, ..., 40.21159423,
               40.21316347, 40.21472239],
              [39.3200864 , 39.324672  , 39.32924767, ..., 40.23851457,
               40.24008439, 40.24164388],
              [39.3466686 , 39.35125588, 39.35583323, ..., 40.26543448,
               40.26700488, 40.26856495]]])
    • longitude
      (time, y, x)
      float64
      -111.6 -111.6 ... -102.8 -102.8
      array([[[-111.64807219, -111.61706639, -111.58605737, ...,
               -102.32684331, -102.2951815 , -102.26351856],
              [-111.6528776 , -111.62186161, -111.59084238, ...,
               -102.32850583, -102.29683315, -102.26515934],
              [-111.65768625, -111.62666005, -111.59563062, ...,
               -102.33016949, -102.29848594, -102.26680125],
              ...,
              [-113.23324901, -113.19889805, -113.16454267, ...,
               -102.8766022 , -102.84134966, -102.80609555],
              [-113.23918069, -113.20481728, -113.17044946, ...,
               -102.87866462, -102.84339862, -102.80813104],
              [-113.2451168 , -113.21074094, -113.17636066, ...,
               -102.88072862, -102.84544914, -102.81016809]],
      
             [[-111.64807219, -111.61706639, -111.58605737, ...,
               -102.32684331, -102.2951815 , -102.26351856],
              [-111.6528776 , -111.62186161, -111.59084238, ...,
               -102.32850583, -102.29683315, -102.26515934],
              [-111.65768625, -111.62666005, -111.59563062, ...,
               -102.33016949, -102.29848594, -102.26680125],
              ...,
              [-113.23324901, -113.19889805, -113.16454267, ...,
               -102.8766022 , -102.84134966, -102.80609555],
              [-113.23918069, -113.20481728, -113.17044946, ...,
               -102.87866462, -102.84339862, -102.80813104],
              [-113.2451168 , -113.21074094, -113.17636066, ...,
               -102.88072862, -102.84544914, -102.81016809]]])
    • TMP
      (time, y, x)
      float16
      295.2 295.2 295.2 ... 277.2 277.2
      array([[[295.2, 295.2, 295.2, ..., 266.2, 266.2, 266.2],
              [295.8, 295.5, 295.2, ..., 266.2, 266.2, 266.2],
              [296. , 295.5, 294.8, ..., 266.2, 266.2, 266. ],
              ...,
              [277.2, 277.5, 278.8, ..., 260.5, 260.8, 261. ],
              [277. , 277.8, 278.8, ..., 260.8, 260.8, 261. ],
              [275.5, 275.5, 278.2, ..., 260.5, 260.8, 261. ]],
      
             [[295.5, 295.2, 295. , ..., 287.5, 287.2, 287.2],
              [295.5, 295.2, 294.8, ..., 287.2, 287.2, 287.2],
              [295.5, 295. , 294.2, ..., 287.2, 287.2, 287.2],
              ...,
              [278.2, 278.2, 277.2, ..., 277.5, 277.5, 277.8],
              [278.8, 279.2, 277.8, ..., 277.2, 277.5, 277.5],
              [277.5, 276.2, 277.8, ..., 277.2, 277.2, 277.2]]], dtype=float16)
    • x
      PandasIndex
      PandasIndex(Index([-1347520.1425219304, -1344520.1425219304, -1341520.1425219304,
             -1338520.1425219304, -1335520.1425219304, -1332520.1425219304,
             -1329520.1425219304, -1326520.1425219304, -1323520.1425219304,
             -1320520.1425219304,
             ...
              -477520.1425219304,  -474520.1425219304,  -471520.1425219304,
              -468520.1425219304,  -465520.1425219304,  -462520.1425219304,
              -459520.1425219304,  -456520.1425219304,  -453520.1425219304,
              -450520.1425219304],
            dtype='float64', name='x', length=300))
    • y
      PandasIndex
      PandasIndex(Index([-687306.1525566636, -684306.1525566636, -681306.1525566636,
             -678306.1525566636, -675306.1525566636, -672306.1525566636,
             -669306.1525566636, -666306.1525566636, -663306.1525566636,
             -660306.1525566636,
             ...
              182693.8474433364,  185693.8474433364,  188693.8474433364,
              191693.8474433364,  194693.8474433364,  197693.8474433364,
              200693.8474433364,  203693.8474433364,  206693.8474433364,
              209693.8474433364],
            dtype='float64', name='y', length=300))
    • time
      PandasIndex
      PandasIndex(DatetimeIndex(['2018-01-01', '2018-01-08'], dtype='datetime64[ns]', name='time', freq=None))

Misc. Use Cases¶

  • I'm only looking at a grid point or small area and only during one model run
    • Just load the whole grid, it's easier
  • I'm looking at the gridpoint/small area at one time, but for tons of variables
    • Looking up by chunk will be faster

Troubleshooting¶

In my Jupyter notebook setup, a previous version of the pyproj package gives the following error when it tries looking up the projection info:

CRSError: Invalid datum string: urn:ogc:def:datum:EPSG::6326: (Internal Proj Error: proj_create: SQLite error on SELECT name, ellipsoid_auth_name, ellipsoid_code, prime_meridian_auth_name, prime_meridian_code, publication_date, frame_reference_epoch, deprecated FROM geodetic_datum WHERE auth_name = ? AND code = ?: no such column: publication_date)

I believe this is because I run Jupyter from a conda environment that's different than the kernel Jupyter is using. In any case, there's an easy fix:

In [ ]:
import pyproj
pyproj.datadir.set_data_dir("/Users/<me>/.conda/envs/<this notebook's kernel env>/share/proj")