2. NetCDF and GRIB Datasets I/O#

Read a NetCDF File#

The command to open a single netCDF file is simply xarray.open_dataset(filename).

Example 1: Read an OLR file into an xarray.Dataset.

import xarray as xr

olr_ds = xr.open_dataset("data/olr.nc")
olr_ds
<xarray.Dataset> Size: 1GB
Dimensions:   (time: 8760, lon: 360, bnds: 2, lat: 90)
Coordinates:
  * time      (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon       (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat       (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Dimensions without coordinates: bnds
Data variables:
    lon_bnds  (lon, bnds) float32 3kB ...
    lat_bnds  (lat, bnds) float32 720B ...
    olr       (time, lat, lon) float32 1GB ...
Attributes: (12/49)
    CDI:                        Climate Data Interface version 1.9.10 (https:...
    Conventions:                CF-1.6
    source:                     NOAA Archive of HIRS L1B data from TIROS-N Se...
    institution:                UMD/ESSIC > Earth System Science Interdiscipl...
    history:                    Fri Jan 14 11:02:54 2022: cdo sellonlatbox,0,...
    conventions:                CF-1.6
    ...                         ...
    Metadata_Link:              gov.noaa.ncdc:C00875
    product_version:            Ver01Rev02
    platform:                   TIROS-N > Television Infrared Observation Sat...
    sensor:                     HIRS-2 > High Resolution Infra-red Sounder/2,...
    spatial_resolution:         1.0 by 1.0 degree equal angle
    CDO:                        Climate Data Operators version 1.9.10 (https:...

Now we obtain an xarray.Dataset, in which the olr variable and its coordinates are stored. Each coordinate corresponds to a named dimension. Note that the time coordinate has been converted to a datetime64 object.
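Once a Dataset is open, variables and coordinates can be pulled out by name. The sketch below uses a small synthetic dataset built in memory (the file path above is assumed to be unavailable here) to show the basic access patterns:

```python
import numpy as np
import pandas as pd
import xarray as xr

# A tiny synthetic stand-in for the OLR dataset above, so the
# access patterns can be tried without the actual data file.
times = pd.date_range("1998-01-01", periods=3)
ds = xr.Dataset(
    {"olr": (("time", "lat", "lon"), np.zeros((3, 2, 4)))},
    coords={"time": times, "lat": [-0.5, 0.5], "lon": [0.5, 1.5, 2.5, 3.5]},
)

da = ds["olr"]            # pull out a variable as an xarray.DataArray
print(da.dims)            # ('time', 'lat', 'lon')
print(ds.sizes["time"])   # length of the time dimension: 3
print(ds["time"].dtype)   # datetime64[ns]
```

Indexing a Dataset with a variable name returns an xarray.DataArray, which carries its own coordinates and attributes.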

Reading Multiple NetCDF Files#

Most operational centers distribute data across multiple files; for example, NCEP R2 data is provided as one file per year. We therefore use xarray.open_mfdataset(paths) to open all the files at once. We can also set the combine='by_coords' option so that xarray automatically infers from the coordinates how to concatenate the files.

Example 2: In this example, we open the NCEP R2 wind files, which are stored as one file per year.

uds = xr.open_mfdataset('data/ncep_r2_uv850/u850.*.nc',   # file name pattern
                        combine="by_coords",
                        parallel=True)                    # read files in parallel with dask
uds
<xarray.Dataset> Size: 369MB
Dimensions:    (time: 8766, bnds: 2, level: 1, lat: 73, lon: 144)
Coordinates:
  * time       (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon        (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * lat        (lat) float32 292B 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * level      (level) float32 4B 850.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 140kB dask.array<chunksize=(365, 2), meta=np.ndarray>
    uwnd       (time, level, lat, lon) float32 369MB dask.array<chunksize=(365, 1, 73, 144), meta=np.ndarray>
Attributes:
    CDI:            Climate Data Interface version 1.9.10 (https://mpimet.mpg...
    Conventions:    CF-1.0
    source:         NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Model
    institution:    National Centers for Environmental Prediction
    title:          Daily NCEP/DOE Reanalysis 2
    history:        Tue Jan 04 11:04:24 2022: cdo select,level=850 uwnd.1998....
    comments:       Data is from \nNCEP/DOE AMIP-II Reanalysis (Reanalysis-2)...
    platform:       Model
    dataset_title:  NCEP-DOE AMIP-II Reanalysis
    References:     https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.rean...
    source_url:     http://www.cpc.ncep.noaa.gov/products/wesley/reanalysis2/
    CDO:            Climate Data Operators version 1.9.10 (https://mpimet.mpg...

We can also set the option combine='nested' and specify the concatenation dimension manually with concat_dim='time'.

uds = xr.open_mfdataset('data/ncep_r2_uv850/u850.*.nc',
                        combine="nested",
                        concat_dim='time',
                        parallel=True)
uds
<xarray.Dataset> Size: 369MB
Dimensions:    (time: 8766, bnds: 2, level: 1, lat: 73, lon: 144)
Coordinates:
  * time       (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon        (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * lat        (lat) float32 292B 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * level      (level) float32 4B 850.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 140kB dask.array<chunksize=(365, 2), meta=np.ndarray>
    uwnd       (time, level, lat, lon) float32 369MB dask.array<chunksize=(365, 1, 73, 144), meta=np.ndarray>
Attributes:
    CDI:            Climate Data Interface version 1.9.10 (https://mpimet.mpg...
    Conventions:    CF-1.0
    source:         NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Model
    institution:    National Centers for Environmental Prediction
    title:          Daily NCEP/DOE Reanalysis 2
    history:        Tue Jan 04 11:04:24 2022: cdo select,level=850 uwnd.1998....
    comments:       Data is from \nNCEP/DOE AMIP-II Reanalysis (Reanalysis-2)...
    platform:       Model
    dataset_title:  NCEP-DOE AMIP-II Reanalysis
    References:     https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.rean...
    source_url:     http://www.cpc.ncep.noaa.gov/products/wesley/reanalysis2/
    CDO:            Climate Data Operators version 1.9.10 (https://mpimet.mpg...

We get the exact same result, so what is the difference between the two settings? From the definitions of the combine options on the xarray website - Combining data,

  • combine_nested(): requires specifying the order in which the objects should be combined. E.g.: a linearly-increasing ‘time’ dimension coordinate.

  • combine_by_coords(): attempts to infer this ordering automatically from the coordinates in the data.

In the first method with combine='by_coords', xarray concatenates the files automatically based on their coordinates, whereas the second method with combine='nested' requires the user to specify the concatenation dimension with the concat_dim option and to supply the files in the correct order. Unless you are very confident that the coordinates in every file are consistent, I would recommend combine='nested', which gives explicit control over the concatenation, rather than relying on combine='by_coords' to infer it.
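The difference is easiest to see with two small in-memory datasets passed out of order. This is a minimal sketch with synthetic data; xr.combine_by_coords and xr.combine_nested are the functions that open_mfdataset calls under the hood:

```python
import numpy as np
import pandas as pd
import xarray as xr

def yearly(start, n):
    """A tiny one-variable dataset covering n days starting from `start`."""
    t = pd.date_range(start, periods=n)
    return xr.Dataset({"u": ("time", np.arange(n))}, coords={"time": t})

late, early = yearly("1999-01-01", 5), yearly("1998-01-01", 5)

# combine_by_coords infers the order from the time coordinate,
# so passing the datasets out of order still works.
ds1 = xr.combine_by_coords([late, early])

# combine_nested concatenates in the order given, so the caller
# must supply the pieces already sorted along concat_dim.
ds2 = xr.combine_nested([early, late], concat_dim="time")

print(bool((ds1.time == ds2.time).all()))  # True
```

With combine_nested, passing [late, early] instead would produce a non-monotonic time axis without any error, which is exactly the pitfall the by_coords mode avoids.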

Specify a file list with glob#

We can use shell-style wildcard patterns with the glob.glob function to build a file list, then pass the list to xarray.open_mfdataset(). For example,

import glob

fls = (glob.glob('data/ncep_r2_uv850/u850.199?.nc') +
       glob.glob('data/ncep_r2_uv850/u850.200?.nc'))

uds = xr.open_mfdataset(fls,                   # list of file names
                        combine="by_coords",
                        parallel=True)         # read files in parallel with dask
uds
<xarray.Dataset> Size: 184MB
Dimensions:    (time: 4383, bnds: 2, level: 1, lat: 73, lon: 144)
Coordinates:
  * time       (time) datetime64[ns] 35kB 1998-01-01 1998-01-02 ... 2009-12-31
  * lon        (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * lat        (lat) float32 292B 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * level      (level) float32 4B 850.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 70kB dask.array<chunksize=(365, 2), meta=np.ndarray>
    uwnd       (time, level, lat, lon) float32 184MB dask.array<chunksize=(365, 1, 73, 144), meta=np.ndarray>
Attributes:
    CDI:            Climate Data Interface version 1.9.10 (https://mpimet.mpg...
    Conventions:    CF-1.0
    source:         NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Model
    institution:    National Centers for Environmental Prediction
    title:          Daily NCEP/DOE Reanalysis 2
    history:        Tue Jan 04 11:04:24 2022: cdo select,level=850 uwnd.1998....
    comments:       Data is from \nNCEP/DOE AMIP-II Reanalysis (Reanalysis-2)...
    platform:       Model
    dataset_title:  NCEP-DOE AMIP-II Reanalysis
    References:     https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.rean...
    source_url:     http://www.cpc.ncep.noaa.gov/products/wesley/reanalysis2/
    CDO:            Climate Data Operators version 1.9.10 (https://mpimet.mpg...

If we are concerned that the order of the file list may not match the time order, we can add the following line to re-sort along the time coordinate.

uds = uds.sortby('time')
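sortby reorders the whole dataset along a coordinate, regardless of the order the files were read in. A minimal sketch with a deliberately out-of-order time axis:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic dataset whose time coordinate is out of order.
t = pd.to_datetime(["1998-01-03", "1998-01-01", "1998-01-02"])
ds = xr.Dataset({"u": ("time", np.array([3.0, 1.0, 2.0]))}, coords={"time": t})

ds = ds.sortby("time")
print(ds["u"].values)   # [1. 2. 3.] - the data follow the sorted time axis
```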

The parallel Option#

In the above example, we added a parallel=True option. This option utilizes the dask package to parallelize reading the files, which speeds up the reading process.

parallel - If True, the open and preprocess steps of this function will be performed in parallel using dask.delayed. Default is False.

When opening files with open_mfdataset, each data variable is backed by a dask.array rather than loaded into memory as a NumPy array: the data is read lazily, only when a computation actually needs it. Use .load() to read the data into memory. Details on using dask will be introduced in Unit 12.

uds.load()
<xarray.Dataset> Size: 184MB
Dimensions:    (time: 4383, bnds: 2, level: 1, lat: 73, lon: 144)
Coordinates:
  * time       (time) datetime64[ns] 35kB 1998-01-01 1998-01-02 ... 2009-12-31
  * lon        (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * lat        (lat) float32 292B 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * level      (level) float32 4B 850.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 70kB 1998-01-01 ... 2010-01-01
    uwnd       (time, level, lat, lon) float32 184MB -7.99 -7.96 ... -3.74 -3.71
Attributes:
    CDI:            Climate Data Interface version 1.9.10 (https://mpimet.mpg...
    Conventions:    CF-1.0
    source:         NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Model
    institution:    National Centers for Environmental Prediction
    title:          Daily NCEP/DOE Reanalysis 2
    history:        Tue Jan 04 11:04:24 2022: cdo select,level=850 uwnd.1998....
    comments:       Data is from \nNCEP/DOE AMIP-II Reanalysis (Reanalysis-2)...
    platform:       Model
    dataset_title:  NCEP-DOE AMIP-II Reanalysis
    References:     https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.rean...
    source_url:     http://www.cpc.ncep.noaa.gov/products/wesley/reanalysis2/
    CDO:            Climate Data Operators version 1.9.10 (https://mpimet.mpg...

Create and Write to NetCDF File#

After analysis and computation, we can also save the data into a netCDF file. For example, if we’d like to save the concatenated uds into a single netCDF file, we do

uds.to_netcdf('ex_out/ncep_r2_u850.nc', unlimited_dims='time')

It is good practice to declare the time coordinate as unlimited with unlimited_dims='time', which is especially useful in combination with the Climate Data Operators (CDO). We will introduce CDO in Unit 11.
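The unlimited dimension can be verified after writing: when the file is re-opened, xarray reports it in the dataset's encoding. The sketch below writes a small synthetic dataset to a temporary file (it assumes a netCDF backend such as netCDF4 or scipy is installed):

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"u": (("time", "lat"), np.zeros((4, 3)))},
    coords={"time": pd.date_range("1998-01-01", periods=4), "lat": [0.0, 1.0, 2.0]},
)

path = os.path.join(tempfile.mkdtemp(), "u_example.nc")
ds.to_netcdf(path, unlimited_dims=["time"])   # declare time as the record dimension

# On re-opening, the unlimited dimension shows up in the dataset encoding.
with xr.open_dataset(path) as back:
    print(back.encoding.get("unlimited_dims"))   # a set containing 'time'
```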

Read GRIB files#

A GRIB (GRIdded Binary) file stores data in binary format together with grid information such as time, longitude, latitude, and pressure level. This format is widely used by ECMWF. xarray can also open and read GRIB files (the cfgrib package is required). The syntax is as follows:

ds_grib = xr.open_dataset("example.grib", engine="cfgrib")

After reading, cfgrib automatically creates an index file named like example.grib.923a8.idx, which speeds up subsequent reads of the same file. If the directory is not writable, the .idx file cannot be created; the file can still be read, just without the speedup. Index writing can also be disabled explicitly with backend_kwargs={'indexpath': ''}.