What is the Zarr format?

Zarr is a format for storing chunked, compressed N-dimensional arrays of any NumPy dtype. Key features:

  • Chunk arrays along any dimension
  • Compress and/or filter chunks using any Numcodecs codec
  • Store arrays flexibly: in memory, on disk, inside a Zip file, or in cloud object storage such as S3
  • Read or write an array concurrently from multiple threads or processes
  • More information can be found in the documentation here: https://zarr.readthedocs.io/en/stable/

    Zarr Chunk Requirements

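    Within an array, every chunk must have the same shape across all N dimensions. When a dimension is not an exact multiple of the chunk size, the chunks along that far edge are only partially filled.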

    For example, the HRRR data is chunked as follows (time, x, y):

  • Analyses: (1, 150, 150), e.g., chunk index 0.4.6
  • Forecasts: (48, 150, 150) or (18, 150, 150)
  • Chunks are indexed by their position in the domain, starting from the upper-left corner
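
    As a rough illustration of the chunk grid and indexing described above, the plain-Python sketch below (no Zarr dependency) computes how many chunks cover an array, how full the edge chunks are, and which chunk holds a given element. The array shape (24, 1000, 1200) is a hypothetical stand-in, not the real HRRR grid. The helper names are illustrative only. The dot-separated key format mirrors the Zarr v2 default.

```python
import math

def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunks))

def edge_chunk_sizes(shape, chunks):
    """Number of valid elements in the last chunk along each dimension."""
    return tuple(n - (math.ceil(n / c) - 1) * c for n, c in zip(shape, chunks))

def chunk_key(point, chunks, sep="."):
    """Key of the chunk containing an element; index 0 is the origin corner."""
    return sep.join(str(i // c) for i, c in zip(point, chunks))

shape = (24, 1000, 1200)      # hypothetical (time, x, y) array shape
analysis_chunks = (1, 150, 150)

print(chunk_grid(shape, analysis_chunks))            # (24, 7, 8)
print(edge_chunk_sizes(shape, analysis_chunks))      # (1, 100, 150)
print(chunk_key((0, 600, 900), analysis_chunks))     # 0.4.6
```

    With 1000 points along x and 150-point chunks, seven chunks are needed and the last one holds only 100 valid points. This is the partial edge chunk that the uniform-shape rule allows.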

    Zarr Compression

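    We compress our Zarr data chunk by chunk. Each chunk is compressed independently and stored as a separate object, so reading a region only requires decompressing the chunks that overlap it.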
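
    A minimal sketch of per-chunk compression, assuming a 1-D list of floats and a plain dict as the key/value store. It uses the stdlib zlib module rather than Zarr itself, and the function names are hypothetical. It shows that a single element can be read back by decompressing only its own chunk.

```python
import struct
import zlib

def encode_chunk(values):
    """Pack floats as little-endian float32, then zlib-compress the bytes."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.compress(raw)

def decode_chunk(blob):
    """Decompress one chunk and unpack it back into Python floats."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

def write_chunks(data, chunk_len):
    """Split a 1-D sequence into fixed-size chunks and compress each on its own.

    The store is a plain dict mapping a chunk key ("0", "1", ...) to bytes.
    Simplification: the final chunk is stored short rather than padded.
    """
    store = {}
    for start in range(0, len(data), chunk_len):
        key = str(start // chunk_len)
        store[key] = encode_chunk(data[start:start + chunk_len])
    return store

def read_element(store, i, chunk_len):
    """Random access: decompress only the chunk that contains element i."""
    chunk = decode_chunk(store[str(i // chunk_len)])
    return chunk[i % chunk_len]

data = [float(i) for i in range(1000)]
store = write_chunks(data, 150)
print(len(store), read_element(store, 637, 150))  # prints: 7 637.0
```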

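    Compression is a tradeoff between random-access speed and storage size. Larger chunks and stronger codecs usually shrink the data more. However, reading any single value means decompressing the entire chunk that contains it, so the codec and chunk shape we choose determine both access speed and compression ratio.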
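
    To see the tradeoff between codecs on your own data, a quick sketch like the one below compares compression ratio and decompression time. It uses the stdlib zlib, bz2 and lzma modules (the algorithms behind Numcodecs' Zlib, BZ2 and LZMA codecs; Blosc is not in the stdlib and is omitted). The sample buffer is a hypothetical smooth int16 field, not real HRRR data.

```python
import bz2
import lzma
import struct
import time
import zlib

# Stdlib codecs used as stand-ins for the compressor families Numcodecs offers.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def benchmark(raw):
    """Return {codec: (compression ratio, decompression seconds)} for one buffer."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        blob = compress(raw)
        start = time.perf_counter()
        restored = decompress(blob)
        elapsed = time.perf_counter() - start
        if restored != raw:
            raise ValueError(f"{name} round trip failed")
        results[name] = (len(raw) / len(blob), elapsed)
    return results

# Hypothetical smooth, repetitive int16 field standing in for one chunk of data.
sample = struct.pack("<4000h", *[(i // 10) % 300 for i in range(4000)])
for name, (ratio, secs) in benchmark(sample).items():
    print(f"{name:>4}: ratio {ratio:5.1f}x, decompress {secs * 1e3:.3f} ms")
```

    A single timing is noisy. For real decisions, repeat the measurement (for example with `timeit`) on representative chunks at the chunk shape you actually plan to use.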

    Numcodecs is a Python package providing buffer compression and transformation codecs for use in data storage and communication applications.

    These include:

  • Compression codecs, e.g., Zlib, BZ2, LZMA and Blosc
  • Pre-compression filters, e.g., Delta, Quantize, FixedScaleOffset, PackBits, Categorize
  • Integrity checks, e.g., CRC32, Adler32
  • More information can be found in the documentation here: https://numcodecs.readthedocs.io/en/stable/
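
    To show how a filter, a compressor and an integrity check combine on one chunk, here is a stdlib-only sketch of that pipeline. `delta_encode` mimics the idea behind the Delta filter but is not Numcodecs' implementation. Appending the CRC32 at the end is this sketch's own layout and may differ from how Numcodecs' CRC32 codec stores it. All function names are hypothetical.

```python
import struct
import zlib
from itertools import accumulate

def delta_encode(values):
    """Keep the first value, then store each value minus its predecessor."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))

def encode(values):
    """Filter (delta) -> compress (zlib) -> integrity check (CRC32 appended)."""
    raw = struct.pack(f"<{len(values)}i", *delta_encode(values))
    blob = zlib.compress(raw)
    return blob + struct.pack("<I", zlib.crc32(blob))

def decode(payload):
    """Verify the CRC32, then decompress and undo the delta filter."""
    blob, stored_crc = payload[:-4], struct.unpack("<I", payload[-4:])[0]
    if zlib.crc32(blob) != stored_crc:
        raise ValueError("checksum mismatch: chunk is corrupted")
    raw = zlib.decompress(blob)
    return delta_decode(list(struct.unpack(f"<{len(raw) // 4}i", raw)))

values = [1000 + 3 * i for i in range(5000)]  # hypothetical slowly increasing series
payload = encode(values)
plain = zlib.compress(struct.pack(f"<{len(values)}i", *values))
print(len(plain), len(payload), decode(payload) == values)
```

    On a steadily increasing series, the delta filter turns nearly every value into the same small number. That gives zlib far more redundancy to exploit than the raw values do, which is why filters are applied before compression.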