Zarr
Overview:
Create N-dimensional arrays with any NumPy dtype
Chunk arrays along any dimension
Compress and/or filter chunks using any NumCodecs codec
Flexible storage of arrays
Read or write an array concurrently from multiple threads or processes
https://zarr.readthedocs.io/en/stable/
Chunks: All chunks of an N-dimensional array must have the same shape
For example, the HRRR data is chunked as follows (time,x,y):
Analyses: (1,150,150) Ex. (0.4.6)
Forecasts: (36,150,150) or (18,150,150)
Chunks are indexed by their location in the domain, starting with the upper left corner
For a 2-dimensional array, the chunk structure and indexing would be as follows:
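The chunk grid for a 2-dimensional array can be sketched in a few lines of plain Python. This is only an illustration: the domain shape (450, 600) and the helper names `chunk_grid` and `chunk_id` are assumptions made for the example, and the dot-separated IDs follow the style of the (0.4.6) example above.

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunk_id(index, chunks):
    """Dot-separated ID of the chunk that holds the element at `index`."""
    return ".".join(str(i // c) for i, c in zip(index, chunks))


shape = (450, 600)   # hypothetical 2-D domain (rows, columns)
chunks = (150, 150)  # chunk shape

grid = chunk_grid(shape, chunks)

# Print the chunk IDs as a grid; chunk 0.0 sits in the upper left corner.
for row in range(grid[0]):
    print("  ".join(f"{row}.{col}" for col in range(grid[1])))

# Look up which chunk holds a given element.
corner = chunk_id((0, 0), chunks)
inner = chunk_id((200, 460), chunks)
print(corner, inner)
```

The first index counts chunks down the rows and the second counts chunks across the columns, so reading one element only requires loading the single chunk whose ID matches.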
Compressors: Zarr data is compressed chunk by chunk, so each chunk is compressed and decompressed independently
Data compression is a trade-off between random access and compressibility. The compressor we choose determines both the speed of access and the compression ratio.
Numcodecs is a Python package providing buffer compression and transformation codecs for use in data storage and communication applications. These include:
Compression codecs, e.g., Zlib, BZ2, LZMA and Blosc
Pre-compression filters, e.g., Delta, Quantize, FixedScaleOffset, PackBits, Categorize
Integrity checks, e.g., CRC32, Adler32
https://numcodecs.readthedocs.io/en/stable/
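The trade-off between speed and compression ratio can be explored with a rough sketch like the one below. It does not use Numcodecs itself. As an assumption for the example, it uses Python's standard-library `zlib`, `bz2` and `lzma` modules, which implement the same algorithms as the Zlib, BZ2 and LZMA codecs listed above. The synthetic 150 x 150 field is made up for illustration, so the timings and ratios it prints say nothing about real HRRR data.

```python
import bz2
import lzma
import time
import zlib
from array import array

# Hypothetical smooth field: one 150 x 150 chunk of 32-bit floats.
values = array(
    "f", ((i % 150) * 0.1 + (i // 150) * 0.05 for i in range(150 * 150))
)
raw = values.tobytes()

codecs = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    start = time.perf_counter()
    packed = compress(raw)
    elapsed = time.perf_counter() - start

    # Every codec here is lossless: the round trip must return the input.
    round_trip_ok = decompress(packed) == raw

    ratio = len(raw) / len(packed)
    results[name] = (ratio, elapsed, round_trip_ok)
    print(f"{name:5s} ratio={ratio:5.2f} compress_time={elapsed * 1e3:7.3f} ms")
```

Running the same loop on a real chunk is a quick way to see which codec gives an acceptable balance before committing to one for a whole archive.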
Output: Zarr data files can be written to a variety of storage sources:
Memory storage
Disk (NFS)
Zip storage
Cloud storage (Google, AWS)
Initialize the Zarr file store, then fill it with arrays and compress the chunked variables.