NOAA High-Resolution Rapid Refresh (HRRR) Data Archive: AWS Open Data Program
The High-Resolution Rapid Refresh (HRRR) is run operationally by the National Centers for Environmental Prediction for several regions, including the contiguous United States (CONUS). The HRRR is a 3-km resolution, cloud-resolving, convection-allowing atmospheric model. For more information, see NOAA ESRL's HRRR Website.
As part of the NOAA Open Data Program, the NOAA Cooperative Institute for Climate and Satellites-North Carolina (CICS-NC) has implemented a data hub architecture to facilitate transfer of key NOAA environmental datasets, such as the model output from the HRRR, to public cloud providers. Amazon's Sustainability Data Initiative supports the storage of HRRR model output in GRIB2 format. The GRIB2 format efficiently stores hundreds of two-dimensional fields for specific parameters, levels, and valid times. (Most HRRR GRIB2 files are at hourly intervals, but files containing fields for selected parameters are available at 15-minute intervals.)
This archive contains GRIB2 output from the HRRR model beginning on September 30, 2014. There have been four operational releases of the HRRR, and HRRR V4 will be the last operational version of the model. The number of files, forecast duration, and types of model output available differ between the model versions.
NOAA HRRR Documentation:
HRRR Smoke Quick Guide
About the AWS HRRR Zarr Archive Managed by MesoWest:
Graduate student Brian Blaylock in the MesoWest Group in the Department of Atmospheric Sciences at the University of Utah began an HRRR archive in 2015 to illustrate the use of a private cloud object store, Pando, developed by the Center for High Performance Computing (CHPC) at the University of Utah. Millions of two-dimensional gridded fields in GRIB2 format have been archived since that time. Each field contains over 1.9 million values over the contiguous United States from the HRRR data assimilation and forecast modeling system. The archive has been used for retrospective analyses of meteorological conditions during high-impact weather events, assessing the accuracy of the HRRR forecasts, and providing initial and boundary conditions for research simulations. Researchers at other institutions have been able to access the archive interactively and through automated download procedures, which users can tailor to extract individual two-dimensional grids from within the highly compressed files. Over a thousand users have voluntarily registered to use the HRRR archive at the University of Utah.
The University of Utah archive has grown to over 130 TB of HRRR model output. However, we no longer need to continue that effort because the GRIB2 files are now publicly available through AWS and Google.
Although GRIB2 files are highly compressed, each is often on the order of several hundred MB. Applications that need to access large numbers of files are therefore challenging because of the memory and compute resources required to parse them.
Graduate student Taylor Gowan in the MesoWest group began in 2019 to explore ways of archiving HRRR model output so that users can access only the data needed for common machine-learning applications. The objective was to provide faster access to selected subdomains, parameters of interest, and time periods without the I/O overhead that comes from accessing many GRIB2 files. With support from the Amazon Sustainability Data Initiative and based on Taylor's work, the MesoWest group is now creating and maintaining HRRR model output in an optimized format, Zarr, in a publicly accessible AWS S3 bucket called hrrrzarr. For each model run, this archive contains sets of analysis and forecast files, with every variable sectioned into 96 small chunks. Files within the AWS hrrrzarr S3 bucket are named to emulate a hierarchical data structure.
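To illustrate what "named to emulate a hierarchical data structure" means, here is a minimal sketch. An object store holds a flat list of keys, and the "/" separators in those keys let a client treat prefixes as directories. The key names below are hypothetical and chosen only for illustration; they are not the actual hrrrzarr naming scheme.

```python
# Hypothetical object keys (NOT the real hrrrzarr layout): a flat list of
# names whose "/" separators emulate directories.
KEYS = [
    "sfc/20210101/run_00z_anl.zarr/TMP/0.0",
    "sfc/20210101/run_00z_anl.zarr/TMP/0.1",
    "sfc/20210101/run_00z_fcst.zarr/TMP/0.0",
    "prs/20210101/run_00z_anl.zarr/HGT/0.0",
]


def list_children(keys, prefix=""):
    """Return the immediate entries under `prefix`, like a directory listing.

    Entries ending in "/" act as subdirectories; the others act as files.
    """
    children = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # the key is exactly the prefix; nothing below it
        head, sep, _ = rest.partition("/")
        children.add(head + sep)
    return sorted(children)


if __name__ == "__main__":
    print(list_children(KEYS))
    print(list_children(KEYS, "sfc/20210101/"))
    print(list_children(KEYS, "sfc/20210101/run_00z_anl.zarr/TMP/"))
```

Listing by prefix is what allows a reader to fetch only the chunks for one variable of one model run instead of scanning the whole bucket.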
Not all HRRR GRIB2 files are expected to be processed into Zarr format. Many users will find the GRIB2 format to be adequate for their needs. The use cases most relevant to the Zarr archive require surface sensible weather parameters or meteorological parameters at "standard" levels in the vertical. Those are found in the sfc files bucket. Applications that require meteorological parameters at all available levels will need to access the prs files bucket. HRRR CONUS analysis (F00) files, whether for sfc or prs files, are stored in 96 "chunks", each containing 150x150 grid points. HRRR CONUS forecast (F01-FXX) files are stored in 96 3-D cubes (XX,150,150), where the forecast duration, FXX, depends on HRRR version and time of day. For example, V4 forecasts are available out to XX=48h at 00, 06, 12, and 18 UTC and out to XX=18h at other hours of the day for variables produced in each forecast GRIB2 file.
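The chunk count and cube shapes above can be checked with a little arithmetic. The sketch below assumes the HRRR CONUS grid is 1059 rows by 1799 columns, a shape not stated in this text but consistent with "over 1.9 million values" per field. It also assumes the extended 48-hour V4 runs occur every six hours. The function names are illustrative only.

```python
import math

CHUNK = 150            # chunk edge length in grid points, from the text
NY, NX = 1059, 1799    # ASSUMED HRRR CONUS grid shape (rows, columns)


def chunk_grid(ny=NY, nx=NX, chunk=CHUNK):
    """Number of chunks along each axis; edge chunks are only partly filled."""
    return math.ceil(ny / chunk), math.ceil(nx / chunk)


def locate(iy, ix, chunk=CHUNK):
    """Map a grid index to (chunk row, chunk column, row in chunk, column in chunk)."""
    return iy // chunk, ix // chunk, iy % chunk, ix % chunk


def forecast_hours_v4(init_hour):
    """V4 forecast length XX: 48 h for six-hourly runs, 18 h otherwise (assumed)."""
    return 48 if init_hour % 6 == 0 else 18


def forecast_cube_shape(init_hour, chunk=CHUNK):
    """Shape (XX, 150, 150) of one forecast chunk, as described in the text."""
    return forecast_hours_v4(init_hour), chunk, chunk


if __name__ == "__main__":
    rows, cols = chunk_grid()
    print(rows, cols, rows * cols)   # chunk rows, chunk columns, total chunks
    print(NY * NX)                   # values per two-dimensional field
    print(locate(500, 1000))         # which chunk holds grid point (500, 1000)
    print(forecast_cube_shape(12), forecast_cube_shape(13))
```

Under these assumptions the grid splits into 8 by 12 chunks, which matches the 96 chunks quoted above, so a point query touches a single 150x150 chunk instead of the full field.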
Because all GRIB2 forecast fields must be available before processing can begin, the most recent Zarr files are typically available 3 hours after the initialization time, e.g., 00 UTC analysis and forecast files are available by 03 UTC. This is dependent on the availability of the GRIB2 data in the NOAA AWS bucket.
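A client that polls for new data can use the typical 3-hour delay to estimate which model run is worth requesting. The following is a minimal sketch under that assumption. The helper name is hypothetical, and because the delay is only typical, real code should still handle a missing file.

```python
from datetime import datetime, timedelta, timezone

ZARR_LATENCY = timedelta(hours=3)  # typical delay quoted in the text


def latest_available_run(now):
    """Estimate the newest hourly initialization time likely to have Zarr files."""
    cutoff = now - ZARR_LATENCY
    # HRRR runs are initialized on the hour, so round down to the hour.
    return cutoff.replace(minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    now = datetime(2021, 1, 1, 3, 40, tzinfo=timezone.utc)
    print(latest_available_run(now).isoformat())
```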
The following table describes the current status and plans for providing Zarr-formatted files for the CONUS region:
| | Sfc Analysis Files | Sfc Forecast Files | Prs Analysis Files |
| V4 Near Real Time | | | |
* No decision has been made yet about supporting the sub-hourly (15-minute interval) files.
Zarr Archive Documentation:
Taylor Gowan Thesis Presentation
Taylor Gowan Submitted Paper