2. NetCDF and GRIB Datasets I/O#

Read a NetCDF File#

The command to open a single netCDF file is simply xarray.open_dataset(filename).

Example 1: Read an OLR file into an xarray.Dataset.

import xarray as xr

olr_ds = xr.open_dataset("data/olr.nc")
olr_ds
<xarray.Dataset> Size: 1GB
Dimensions:   (time: 8760, lon: 360, bnds: 2, lat: 90)
Coordinates:
  * time      (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon       (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat       (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Dimensions without coordinates: bnds
Data variables:
    lon_bnds  (lon, bnds) float32 3kB ...
    lat_bnds  (lat, bnds) float32 720B ...
    olr       (time, lat, lon) float32 1GB ...
Attributes: (12/49)
    CDI:                        Climate Data Interface version 1.9.10 (https:...
    Conventions:                CF-1.6
    source:                     NOAA Archive of HIRS L1B data from TIROS-N Se...
    institution:                UMD/ESSIC > Earth System Science Interdiscipl...
    history:                    Fri Jan 14 11:02:54 2022: cdo sellonlatbox,0,...
    conventions:                CF-1.6
    ...                         ...
    Metadata_Link:              gov.noaa.ncdc:C00875
    product_version:            Ver01Rev02
    platform:                   TIROS-N > Television Infrared Observation Sat...
    sensor:                     HIRS-2 > High Resolution Infra-red Sounder/2,...
    spatial_resolution:         1.0 by 1.0 degree equal angle
    CDO:                        Climate Data Operators version 1.9.10 (https:...

Now we obtain an xarray.Dataset, in which the olr variable and its coordinates are stored. Each coordinate corresponds to a named dimension. Note that the time coordinate has been converted to a datetime64 object.
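Once a Dataset is open, variables and coordinates can be pulled out by name. The sketch below uses a small synthetic dataset built in memory (the file path above is assumed to be unavailable here) to show the basic access patterns:

```python
import numpy as np
import pandas as pd
import xarray as xr

# A tiny synthetic stand-in for the OLR dataset above, so the
# access patterns can be tried without the actual data file.
times = pd.date_range("1998-01-01", periods=3)
ds = xr.Dataset(
    {"olr": (("time", "lat", "lon"), np.zeros((3, 2, 4)))},
    coords={"time": times, "lat": [-0.5, 0.5], "lon": [0.5, 1.5, 2.5, 3.5]},
)

da = ds["olr"]            # pull out a variable as an xarray.DataArray
print(da.dims)            # ('time', 'lat', 'lon')
print(ds.sizes["time"])   # length of the time dimension: 3
print(ds["time"].dtype)   # datetime64[ns]
```

Indexing a Dataset with a variable name returns an xarray.DataArray, which carries its own coordinates and attributes.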

Reading Multiple NetCDF Files#

Most operational centers distribute data across multiple files; for example, NCEP R2 data is provided as one file per year. We therefore use xarray.open_mfdataset(paths) to open all the files at once. We can also set the combine='by_coords' option so that xarray automatically infers from the coordinates how to concatenate the files.

Example 2: In this example, we open the NCEP R2 wind files, which are stored as one file per year.

uds = xr.open_mfdataset('data/ncep_r2_uv850/u850.*.nc',   # file name pattern
                        combine="by_coords",
                        parallel=True)                    # read files in parallel with dask
uds
<xarray.Dataset> Size: 369MB
Dimensions:    (time: 8766, bnds: 2, level: 1, lat: 73, lon: 144)
Coordinates:
  * time       (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon        (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * lat        (lat) float32 292B 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * level      (level) float32 4B 850.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 140kB dask.array<chunksize=(365, 2), meta=np.ndarray>
    uwnd       (time, level, lat, lon) float32 369MB dask.array<chunksize=(365, 1, 73, 144), meta=np.ndarray>
Attributes:
    CDI:            Climate Data Interface version 1.9.10 (https://mpimet.mpg...
    Conventions:    CF-1.0
    source:         NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Model
    institution:    National Centers for Environmental Prediction
    title:          Daily NCEP/DOE Reanalysis 2
    history:        Tue Jan 04 11:04:24 2022: cdo select,level=850 uwnd.1998....
    comments:       Data is from \nNCEP/DOE AMIP-II Reanalysis (Reanalysis-2)...
    platform:       Model
    dataset_title:  NCEP-DOE AMIP-II Reanalysis
    References:     https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.rean...
    source_url:     http://www.cpc.ncep.noaa.gov/products/wesley/reanalysis2/
    CDO:            Climate Data Operators version 1.9.10 (https://mpimet.mpg...

We can also set the option combine='nested' and specify the concatenation dimension manually with concat_dim='time'.

uds = xr.open_mfdataset('data/ncep_r2_uv850/u850.*.nc',
                        combine="nested",
                        concat_dim='time',
                        parallel=True)
uds
<xarray.Dataset> Size: 369MB
Dimensions:    (time: 8766, bnds: 2, level: 1, lat: 73, lon: 144)
Coordinates:
  * time       (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon        (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * lat        (lat) float32 292B 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * level      (level) float32 4B 850.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 140kB dask.array<chunksize=(365, 2), meta=np.ndarray>
    uwnd       (time, level, lat, lon) float32 369MB dask.array<chunksize=(365, 1, 73, 144), meta=np.ndarray>
Attributes:
    CDI:            Climate Data Interface version 1.9.10 (https://mpimet.mpg...
    Conventions:    CF-1.0
    source:         NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Model
    institution:    National Centers for Environmental Prediction
    title:          Daily NCEP/DOE Reanalysis 2
    history:        Tue Jan 04 11:04:24 2022: cdo select,level=850 uwnd.1998....
    comments:       Data is from \nNCEP/DOE AMIP-II Reanalysis (Reanalysis-2)...
    platform:       Model
    dataset_title:  NCEP-DOE AMIP-II Reanalysis
    References:     https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.rean...
    source_url:     http://www.cpc.ncep.noaa.gov/products/wesley/reanalysis2/
    CDO:            Climate Data Operators version 1.9.10 (https://mpimet.mpg...

We get the exact same result, so what is the difference between the two settings? From the definitions of the combine options on the xarray website - Combining data,

  • combine_nested(): requires specifying the order in which the objects should be combined. E.g.: a linearly-increasing ‘time’ dimension coordinate.

  • combine_by_coords(): attempts to infer this ordering automatically from the coordinates in the data.

In the first method with combine='by_coords', xarray concatenates the files automatically based on their coordinates, whereas the second method with combine='nested' requires the user to specify the concatenation dimension with the concat_dim option and to supply the files in the correct order. Unless you are very confident that the coordinates in every file are consistent, I would recommend combine='nested', which gives explicit control over the concatenation, rather than relying on combine='by_coords' to infer it.
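The difference is easiest to see with two small in-memory datasets passed out of order. This is a minimal sketch with synthetic data; xr.combine_by_coords and xr.combine_nested are the functions that open_mfdataset calls under the hood:

```python
import numpy as np
import pandas as pd
import xarray as xr

def yearly(start, n):
    """A tiny one-variable dataset covering n days starting from `start`."""
    t = pd.date_range(start, periods=n)
    return xr.Dataset({"u": ("time", np.arange(n))}, coords={"time": t})

late, early = yearly("1999-01-01", 5), yearly("1998-01-01", 5)

# combine_by_coords infers the order from the time coordinate,
# so passing the datasets out of order still works.
ds1 = xr.combine_by_coords([late, early])

# combine_nested concatenates in the order given, so the caller
# must supply the pieces already sorted along concat_dim.
ds2 = xr.combine_nested([early, late], concat_dim="time")

print(bool((ds1.time == ds2.time).all()))  # True
```

With combine_nested, passing [late, early] instead would produce a non-monotonic time axis without any error, which is exactly the pitfall the by_coords mode avoids.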

Specify a file list with glob#

We can use shell-style wildcard patterns with the glob.glob function to build a file list, then pass the list to xarray.open_mfdataset(). For example,

import glob

fls = (glob.glob('data/ncep_r2_uv850/u850.199?.nc') +
       glob.glob('data/ncep_r2_uv850/u850.200?.nc'))

uds = xr.open_mfdataset(fls,                   # list of file names
                        combine="by_coords",
                        parallel=True)         # read files in parallel with dask
uds
<xarray.Dataset> Size: 184MB
Dimensions:    (time: 4383, bnds: 2, level: 1, lat: 73, lon: 144)
Coordinates:
  * time       (time) datetime64[ns] 35kB 1998-01-01 1998-01-02 ... 2009-12-31
  * lon        (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * lat        (lat) float32 292B 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * level      (level) float32 4B 850.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 70kB dask.array<chunksize=(365, 2), meta=np.ndarray>
    uwnd       (time, level, lat, lon) float32 184MB dask.array<chunksize=(365, 1, 73, 144), meta=np.ndarray>
Attributes:
    CDI:            Climate Data Interface version 1.9.10 (https://mpimet.mpg...
    Conventions:    CF-1.0
    source:         NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Model
    institution:    National Centers for Environmental Prediction
    title:          Daily NCEP/DOE Reanalysis 2
    history:        Tue Jan 04 11:04:24 2022: cdo select,level=850 uwnd.1998....
    comments:       Data is from \nNCEP/DOE AMIP-II Reanalysis (Reanalysis-2)...
    platform:       Model
    dataset_title:  NCEP-DOE AMIP-II Reanalysis
    References:     https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.rean...
    source_url:     http://www.cpc.ncep.noaa.gov/products/wesley/reanalysis2/
    CDO:            Climate Data Operators version 1.9.10 (https://mpimet.mpg...

If we are concerned that the order of the file list may not match the time order, we can add the following line to re-sort along the time coordinate.

uds = uds.sortby('time')
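sortby reorders the whole dataset along a coordinate, regardless of the order the files were read in. A minimal sketch with a deliberately out-of-order time axis:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic dataset whose time coordinate is out of order.
t = pd.to_datetime(["1998-01-03", "1998-01-01", "1998-01-02"])
ds = xr.Dataset({"u": ("time", np.array([3.0, 1.0, 2.0]))}, coords={"time": t})

ds = ds.sortby("time")
print(ds["u"].values)   # [1. 2. 3.] - the data follow the sorted time axis
```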

The parallel Option#

In the above example, we added a parallel=True option. This option utilizes the dask package to parallelize reading the files, which speeds up the reading process.

parallel - If True, the open and preprocess steps of this function will be performed in parallel using dask.delayed. Default is False.

When opening files with open_mfdataset, each data variable is backed by a dask.array rather than loaded into memory as a NumPy array: the data is read lazily, only when a computation actually needs it. Use .load() to read the data into memory. Details on using dask will be introduced in Unit 12.

uds.load()
<xarray.Dataset> Size: 184MB
Dimensions:    (time: 4383, bnds: 2, level: 1, lat: 73, lon: 144)
Coordinates:
  * time       (time) datetime64[ns] 35kB 1998-01-01 1998-01-02 ... 2009-12-31
  * lon        (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5
  * lat        (lat) float32 292B 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
  * level      (level) float32 4B 850.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 70kB 1998-01-01 ... 2010-01-01
    uwnd       (time, level, lat, lon) float32 184MB -7.99 -7.96 ... -3.74 -3.71
Attributes:
    CDI:            Climate Data Interface version 1.9.10 (https://mpimet.mpg...
    Conventions:    CF-1.0
    source:         NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Model
    institution:    National Centers for Environmental Prediction
    title:          Daily NCEP/DOE Reanalysis 2
    history:        Tue Jan 04 11:04:24 2022: cdo select,level=850 uwnd.1998....
    comments:       Data is from \nNCEP/DOE AMIP-II Reanalysis (Reanalysis-2)...
    platform:       Model
    dataset_title:  NCEP-DOE AMIP-II Reanalysis
    References:     https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.rean...
    source_url:     http://www.cpc.ncep.noaa.gov/products/wesley/reanalysis2/
    CDO:            Climate Data Operators version 1.9.10 (https://mpimet.mpg...

Create and Write to NetCDF File#

After analysis and computation, we can also save the data into a netCDF file. For example, if we’d like to save the concatenated uds into a single netCDF file, we do

uds.to_netcdf('ex_out/ncep_r2_u850.nc', unlimited_dims='time')

It is good practice to declare the time coordinate as unlimited with unlimited_dims='time', which is especially useful in combination with the Climate Data Operators (CDO). We will introduce CDO in Unit 11.
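The unlimited dimension can be verified after writing: when the file is re-opened, xarray reports it in the dataset's encoding. The sketch below writes a small synthetic dataset to a temporary file (it assumes a netCDF backend such as netCDF4 or scipy is installed):

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"u": (("time", "lat"), np.zeros((4, 3)))},
    coords={"time": pd.date_range("1998-01-01", periods=4), "lat": [0.0, 1.0, 2.0]},
)

path = os.path.join(tempfile.mkdtemp(), "u_example.nc")
ds.to_netcdf(path, unlimited_dims=["time"])   # declare time as the record dimension

# On re-opening, the unlimited dimension shows up in the dataset encoding.
with xr.open_dataset(path) as back:
    print(back.encoding.get("unlimited_dims"))   # a set containing 'time'
```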

Read GRIB files#

A GRIB (GRIdded Binary) file stores data in binary format together with grid information such as time, longitude, latitude, and pressure level. This format is widely used by ECMWF. xarray can also open and read GRIB files (the cfgrib package is required). The syntax is as follows:

ds_grib = xr.open_dataset("example.grib", engine="cfgrib")

After reading, cfgrib automatically creates an index file named like example.grib.923a8.idx, which speeds up subsequent reads of the same file. If the directory is not writable, the .idx file cannot be created; the file can still be read, just without the speedup. Index writing can also be disabled explicitly with backend_kwargs={'indexpath': ''}.