Common Climate Data Formats: Overview

The following is by Dennis Shea (NCAR)

Data formats commonly encountered in climate research fall into three generic categories: GRIB, netCDF and HDF. All of these formats are portable (machine independent) and self-describing. Self-describing files can be examined and read by the appropriate software without the user knowing the file's structural details. Further, additional information about the data, called "metadata", may be included in the file. Typical metadata may include textual information about each variable's contents and units (e.g., "specific humidity" and "g/kg") or numerical information describing the coordinates (e.g., time, level, latitude, longitude) that apply to the variables in the file.

Each of these file formats has evolved over time to address the changing needs of the communities they support. Hence, there are multiple versions of each format. Unfortunately, the newer versions are not necessarily backward compatible despite the similar naming conventions, which can be both frustrating and confusing for users. Specifically, the data formats are as follows:

  • GRIB1:    GRIdded Binary (Edition 1), World Meteorological Organization
  • GRIB2:    GRIdded Binary (Edition 2), World Meteorological Organization
  • netCDF3:  Network Common Data Form (Version 3.x), Unidata (UCAR/NCAR)
  • netCDF4:  Network Common Data Form (Version 4.x), Unidata (UCAR/NCAR)
  • HDF4:     Hierarchical Data Format, (Version 4.x),  NCSA/NASA
  • HDF4-EOS2: HDF4-Earth Observing System, (Version 2; georeferenced data)
  • HDF5:     Hierarchical Data Format, (Version 5.x),  NCSA/NASA
  • HDF5-EOS5: HDF5-Earth Observing System, (Version 5; georeferenced data)
  • GeoTIFF: Georeferenced raster imagery

A subtle difference is that netCDF/HDF/HDF-EOS are file formats while GRIB is a record format. Because netCDF/HDF/HDF-EOS are file formats, there are rules on a file's contents. For example, a simple netCDF rule is that all variable names must be unique. HDF allows a file to contain multiple variables with the same name, but the variables must be in different 'groups'. These 'groups' can be complicated, but for this gross overview they can be considered analogous to Unix directories (Windows folders). Each GRIB-1 record (aka 'message') contains information for two horizontal dimensions (e.g., latitude and longitude) for one time and one level. GRIB-2 allows each record to contain multiple grids and levels for each time. A collection of GRIB records is called a GRIB file. However, there are no rules dictating the order of the collection of GRIB records (e.g., records can be in random chronological order).
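Because a GRIB file is just a collection of records with no guaranteed order, software reading one typically has to organize the records itself. The following is a minimal sketch of that idea in plain Python. The GribRecord class, its field names and the organize function are hypothetical stand-ins for illustration only; decoding real GRIB messages requires a dedicated GRIB library.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby


@dataclass(frozen=True)
class GribRecord:
    """Hypothetical stand-in for the header of one decoded GRIB-1 record:
    a single 2D (latitude x longitude) grid at one time and one level."""
    variable: str
    valid_time: datetime
    level_hpa: int


def organize(records):
    """Group records by variable; sort each group by (time, level)."""
    ordered = sorted(
        records, key=lambda r: (r.variable, r.valid_time, r.level_hpa)
    )
    return {
        name: list(group)
        for name, group in groupby(ordered, key=lambda r: r.variable)
    }


# Records as they might be encountered in a file: no particular order.
records = [
    GribRecord("T", datetime(2000, 1, 2), 500),
    GribRecord("U", datetime(2000, 1, 1), 850),
    GribRecord("T", datetime(2000, 1, 1), 850),
    GribRecord("T", datetime(2000, 1, 1), 500),
]

by_variable = organize(records)
for name, group in by_variable.items():
    for rec in group:
        print(name, rec.valid_time.date(), rec.level_hpa)
```

In a netCDF or HDF file this bookkeeping is unnecessary, since each variable is stored once under its own name (or group) along with its coordinates.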

One further note: the name "netCDF-4 file" is a bit of a misnomer. Such a file is actually an HDF5 file that uses a subset of HDF5 features, accessed through netCDF-3 style interfaces to the HDF5 software.
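One practical consequence can be seen in the first few bytes of a file. The sketch below guesses the format family from those leading bytes: a netCDF-4 file carries the HDF5 signature, not the "CDF" signature of a netCDF-3 file. The function name sniff_format and the returned labels are made up for this illustration. The sketch also assumes the signature sits at the very start of the file, which is the common case but not guaranteed (an HDF5 file may begin with a user block, and some GRIB files carry extra bytes before the first "GRIB" marker).

```python
import os
import tempfile


def sniff_format(path):
    """Guess a file's format family from its first 8 bytes (a heuristic)."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(b"\x89HDF\r\n\x1a\n"):
        # HDF5 signature; netCDF-4 files look the same at this level.
        return "HDF5 (includes netCDF-4)"
    if len(head) >= 4 and head.startswith(b"CDF") and head[3] in (1, 2, 5):
        # netCDF-3 variants: classic, 64-bit offset, CDF-5.
        return "netCDF-3"
    if head.startswith(b"\x0e\x03\x13\x01"):
        return "HDF4"
    if len(head) == 8 and head.startswith(b"GRIB"):
        # The 8th byte of a GRIB message holds the edition number.
        return f"GRIB edition {head[7]}"
    return "unknown"


# Demo on a file holding only the HDF5 signature (not a valid data file).
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.nc")
    with open(demo, "wb") as f:
        f.write(b"\x89HDF\r\n\x1a\n")
    print(sniff_format(demo))  # prints "HDF5 (includes netCDF-4)"
```

Distinguishing a netCDF-4 file from a generic HDF5 file requires looking inside the file with the HDF5 or netCDF software; the leading bytes alone cannot tell them apart.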