# 11. Climate Data Operators (CDO)

Climate Data Operators (CDO) is a software package developed by the Max Planck Institute for Meteorology (MPI). It provides numerous **operators** for processing standard climate data and model forecast output, including simple statistics, arithmetic operations, data slicing, and regridding. CDO was originally a command-line tool only, but MPI has since developed a Python interface.

In this unit, we will focus on:
- Selecting specific temporal, spatial, or vertical ranges of data, or selecting a specific variable.
- Simple statistics.
- Arithmetic operations.
- Regridding.
- Converting data formats.

For other functions and the detailed usage of each operator, see the [CDO User Guide (for command line usage)](https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) or the [`python_cdo` Introduction](https://code.mpimet.mpg.de/attachments/download/27273/python_cdo_introduction.pdf).

## Fundamentals of `cdo`

### Linux Command Line Usage

The general syntax for running cdo on the Linux command line is:

~~~
cdo [Options] Operator1 [-Operator2] [-OperatorN] infile outfile
~~~

CDO applies the operators to `infile` and writes the result to `outfile`. To apply multiple operators, chain them by prefixing each additional operator with `-`. Chained operators are executed from right to left: `-OperatorN` first, then `-Operator2`, and finally `Operator1`.
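
The right-to-left execution order can be made concrete with a small sketch. The helper `build_cdo_command` below is hypothetical (it is not part of CDO or its Python package). It only assembles a command string from operators listed in the order they should run, assuming each chained operator takes exactly one input stream.

```python
def build_cdo_command(ops_in_run_order, infile, outfile):
    """Assemble a chained cdo command line (hypothetical helper).

    ops_in_run_order lists the operators in the order they are executed.
    On the command line the first operator to run sits closest to infile,
    so the list is reversed and all but the leftmost operator get a '-'.
    """
    if not ops_in_run_order:
        raise ValueError("at least one operator is required")
    reversed_ops = list(reversed(ops_in_run_order))
    chained = [reversed_ops[0]] + ["-" + op for op in reversed_ops[1:]]
    return " ".join(["cdo"] + chained + [infile, outfile])


# Run selmon,12 first, then timmean on its result.
command = build_cdo_command(["selmon,12", "timmean"], "in.nc", "out.nc")
print(command)  # cdo timmean -selmon,12 in.nc out.nc
```

Here `selmon,12` runs first and `timmean` runs last, even though `timmean` appears first on the command line.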

### Usage in Python Script


In [1]:
from cdo import Cdo
cdo = Cdo()

The **operators** in the command line are equivalent to the **methods** of `cdo`.


## Select Field

We first demonstrate how to slice data with cdo. Before we look into the following example, you can try to read the [CDO User Guide](https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) and find the solution first. Data selection is in Sec. 2.3.

**Example 1:** How do we select the OLR data for every December from 1998 to 2018?

We use the `select` operator in the command line and the `.select()` method in Python.

**1. Command Line:**
~~~bash
cdo select,year=1998/2018,month=12 data/olr.nc data/olr_1998-2018.12.nc
~~~

**2. Python:**

In [2]:
cdo.select('year=1998/2018,month=12', input='data/olr.nc', output='ex_out/olr_1998-2018.12.nc')

'ex_out/olr_1998-2018.12.nc'

The parameter `'year=1998/2018,month=12'` selects December of every year from 1998 to 2018. Note that `/` denotes a continuous range, whereas `year=1998,2018` would select only the two years 1998 and 2018. The usage is very similar on the command line and in Python. In the example above, `output` is a file. We can also ask cdo to return a DataArray instead of writing a netCDF file:

In [3]:
olr_sel = cdo.select('year=1998/2018,month=12', input='data/olr.nc', returnXArray='olr')
olr_sel

The value `'olr'` given to `returnXArray` is the name of the variable in the input file that should be returned.
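
The difference between the range form `1998/2018` and the list form `1998,2018` of the `year` parameter can be illustrated in plain Python. The helper `expand_years` below is hypothetical and only mimics the two forms used in this unit. It is a sketch of the semantics, not CDO's own parser.

```python
def expand_years(spec):
    """Expand a year specification into a list of years (hypothetical helper).

    'first/last' is treated as a continuous range, and commas separate
    individual entries, mirroring the two forms used in the text.
    """
    years = []
    for part in spec.split(","):
        if "/" in part:
            first, last = (int(p) for p in part.split("/"))
            years.extend(range(first, last + 1))
        else:
            years.append(int(part))
    return years


print(len(expand_years("1998/2018")))  # 21 (every year from 1998 to 2018)
print(expand_years("1998,2018"))       # [1998, 2018]
```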

Alternatively, we can select December first with `cdo.selmon()` and pass its result as the input of `cdo.selyear()`.

In [4]:
cdo.selyear('1998/2018', input=cdo.selmon('12', input='data/olr.nc'), 
                         returnXArray='olr')

We can also pass a DataArray for `input`. The above code can also be written as:

In [5]:
olr_dec = cdo.selmon('12',input='data/olr.nc',returnXArray='olr')
olr_sel = cdo.selyear('1998/2018', input=olr_dec, returnXArray='olr')
olr_sel

To return the result as a Dataset, use `returnXDataset` instead of `returnXArray`.

**Example 2 Select Field and Operator Chaining:** Merge the NCEP R2 850-hPa wind field files along the time dimension, then select the NDJ (November to January) season.

**1. Command Line:** 

~~~bash
cdo select,month=1,11,12 -mergetime data/ncep_r2_uv850/u850.*.nc data/ncep_r2_u850_ndj.nc
~~~

**2. Python:**

In [7]:
cdo.select('month=1,11,12', 
           input=cdo.mergetime(input='data/ncep_r2_uv850/u850.*.nc'),
           returnXArray='uwnd')

The `cdo.select()` method is used to select data for the months January, November, and December (`'month=1,11,12'`). This selection is applied to the data that is first merged along the time dimension using `cdo.mergetime()`. The merged data is specified as the input to the `cdo.select()` method.

```{note}
- `mergetime`: Merges all timesteps of all input files sorted by date and time. All input files need to have the same structure with the same variables on different timesteps. After this operation every input timestep is in outfile and all timesteps are sorted by date and time.
- `cat`: Concatenates all input datasets and appends the result to the end of outfile.
```

```{warning}
If the files are very large (for example, because of high resolution), running the above command in one step may overload the system. In that case, we can use a `for` loop to select the months from each file, write the intermediate results to disk, and then merge them.
```
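
The loop described in the warning could look like the sketch below. The function name `select_months_per_file`, the `sel_` file prefix, and the directory arguments are assumptions made for illustration. The `cdo` argument is expected to be a `Cdo()` instance as created earlier in this unit.

```python
import glob
import os


def select_months_per_file(cdo, pattern, months, part_dir, merged_path):
    """Select months file by file, then merge the small results in time.

    cdo         : a Cdo() instance (passed in so the function is easy to test)
    pattern     : glob pattern of the input files
    months      : comma-separated months, e.g. '1,11,12'
    part_dir    : directory for the intermediate files (created if missing)
    merged_path : final merged output file
    """
    infiles = sorted(glob.glob(pattern))
    if not infiles:
        raise FileNotFoundError(f"no files match {pattern}")
    os.makedirs(part_dir, exist_ok=True)

    parts = []
    for path in infiles:
        part = os.path.join(part_dir, "sel_" + os.path.basename(path))
        # Only one input file is processed at a time.
        cdo.select(f"month={months}", input=path, output=part)
        parts.append(part)

    # Merge the much smaller intermediate files along the time dimension.
    cdo.mergetime(input=" ".join(parts), output=merged_path)
    return parts


# Possible usage (paths are illustrative):
# select_months_per_file(cdo, 'data/ncep_r2_uv850/u850.*.nc', '1,11,12',
#                        'ex_out/parts', 'ex_out/ncep_r2_u850_ndj.nc')
```

The intermediate files can be deleted once the merged file has been written.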

**Example 3 Select a specific domain:** Use the `sellonlatbox` method to select the 40°-180°E, 20°S-30°N domain.

**1. Command Line:**

~~~bash
cdo sellonlatbox,40,180,-20,30 data/olr.nc olr_selbox.nc
~~~

**2. Python:**

In [8]:
cdo.sellonlatbox('40,180,-20,30',
                 input='data/olr.nc',
                 returnXArray='olr')

## Statistics

The detailed usage of statistical methods such as `sum`, `mean`, `avg`, `var`, `std`, `weighted avg`, etc., can be found in Section 2.8 of the User Guide.

```{note}
What is the difference between `mean` and `avg`? They treat missing values differently:
- `mean`: only non-missing values are considered part of the sample, so the sample size may be reduced.
- `avg`: all sample members are added and the sum is divided by the full sample size.
- Example: the mean of 1, 2, miss, and 3 is (1+2+3)/3 = 2, whereas the average is (1+2+miss+3)/4 = miss/4 = miss. If the sample contains no missing values, the average and the mean are identical.
```
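
The example in the note can be reproduced in plain Python by letting `nan` play the role of the missing value. This is only an analogy for the two behaviours and does not call CDO.

```python
import math

values = [1.0, 2.0, math.nan, 3.0]  # nan stands in for the missing value

# mean-like behaviour: drop missing values, so the sample size shrinks to 3
valid = [v for v in values if not math.isnan(v)]
mean_like = sum(valid) / len(valid)

# avg-like behaviour: keep every member, so the missing value propagates
avg_like = sum(values) / len(values)

print(mean_like)  # 2.0
print(avg_like)   # nan
```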

**Example 4:** Calculate mean OLR in December 1998-2018.

**1. Command Line:**

~~~bash
cdo timmean -select,year=1998/2018,month=12 data/olr.nc olr_dec_ltm.nc 
~~~

**2. Python:**

In [9]:
cdo.timmean(input=cdo.select('year=1998/2018,month=12', input='data/olr.nc'), 
            returnXArray='olr')

```{note}
The `time` coordinate "2008-12-16" is simply the "averaged time" of the entire period. 
```

**Example 5: Calculate daily climatology.** Calculate daily climatology of `olr.nc`.

**1. Command Line:**

~~~bash
cdo ydaymean data/olr.nc data/olr_dayClim.nc
~~~

**2. Python:**

In [10]:
cdo.ydaymean(input='data/olr.nc', returnXArray='olr')
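
To show what a daily climatology means, the sketch below averages all values that share the same calendar day across years. It is a pure-Python illustration with made-up numbers. The function name `daily_climatology` is hypothetical, and details such as how CDO treats 29 February are not covered here.

```python
from collections import defaultdict
from datetime import date


def daily_climatology(dates, values):
    """Average the values that fall on the same (month, day) over all years."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[(d.month, d.day)].append(v)
    return {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}


# Two days in each of two years (illustrative values only).
dates = [date(2000, 1, 1), date(2000, 1, 2), date(2001, 1, 1), date(2001, 1, 2)]
olr = [200.0, 210.0, 220.0, 230.0]

clim = daily_climatology(dates, olr)
print(clim)  # {(1, 1): 210.0, (1, 2): 220.0}
```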

## Arithmetics

Arithmetic in cdo is handled by the `expr` operator. It supports assignment (`=`), addition (`x+y`), subtraction (`x-y`), multiplication (`x*y`), division (`x/y`), absolute value `abs(x)`, square root `sqrt(x)` (note that `sqr(x)` is the square of `x`), exponential `exp(x)`, etc.

**Example 6:** The unit of mean sea level pressure (MSLP) in NCEP R2 data is Pa. Convert to hPa. 

**1. Command Line:**
~~~bash
cdo expr,'mslp=mslp/100' data/mslp.2021.nc mslp.hpa.2021.nc
~~~

**2. Python:**

In [14]:
mslp_hPa = cdo.expr('mslp=mslp/100.', 
                    input='data/mslp.2021.nc', 
                    returnXArray='mslp')
mslp_hPa = mslp_hPa.assign_attrs(units='hPa')  # Change attribute. 
mslp_hPa
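
When several variables are computed at once, the instruction string passed to `expr` can hold multiple assignments separated by `;`. The helper `build_expr` below is hypothetical and only builds that string. The output variable names `mslp_hpa` and `mslp_kpa` are made up for illustration.

```python
def build_expr(assignments):
    """Join 'name=formula' pairs with ';' to form an expr instruction string."""
    return ";".join(f"{name}={formula}" for name, formula in assignments.items())


single = build_expr({"mslp": "mslp/100."})
multiple = build_expr({"mslp_hpa": "mslp/100.", "mslp_kpa": "mslp/1000."})

print(single)    # mslp=mslp/100.
print(multiple)  # mslp_hpa=mslp/100.;mslp_kpa=mslp/1000.

# Possible usage with the Cdo() instance from this unit:
# cdo.expr(multiple, input='data/mslp.2021.nc', output='ex_out/mslp_units.nc')
```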

## Regrid

We may need to regrid a dataset when we compare or analyze two datasets with different grid resolutions, when we want to speed up computations or save RAM by coarsening the grid, or when we work with model output. The regrid operators in CDO can handle spherical harmonics grids, Gaussian grids, and longitude/latitude grids. The corresponding resolutions of these coordinate systems are summarized in the table below.

![](images/fig11.1.png)

### Bilinear Interpolation to the Standard Coordinate System

**`remapbil`**: Bilinear interpolation is the most commonly used regridding method. In the example below, the target `n32` is a predefined global regular Gaussian grid (N32).

![](images/fig11.2.png)

**1. Command Line:** 
~~~bash
cdo remapbil,n32 data/olr.nc data/olr.n32.nc
~~~

**2. Python:**

In [15]:
cdo.remapbil('n32', input='data/olr.nc', returnXArray='olr')
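
The size of the `n32` target grid can be estimated with a short calculation. The sketch assumes the usual convention for a regular (full) Gaussian grid N: 2N latitudes and 4N longitudes. The function name `gaussian_grid_shape` is made up for this example.

```python
def gaussian_grid_shape(n):
    """Return (nlon, nlat, lon_spacing_deg) of a regular Gaussian grid N<n>.

    Assumes 4*n longitudes and 2*n latitudes. The latitudes of a Gaussian
    grid are not exactly evenly spaced, so only the longitude spacing is given.
    """
    nlon = 4 * n
    nlat = 2 * n
    return nlon, nlat, 360.0 / nlon


print(gaussian_grid_shape(32))  # (128, 64, 2.8125)
```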

### First-order Conservative Remapping

As mentioned in Unit 7, rainfall must remain mass-conserving after regridding, so **conservative** regridding is required. The cdo operator for first-order conservative remapping is `remapcon`.

**1. Command Line:**
~~~bash
cdo remapcon,n32 data/cmorph_sample.nc cmorph_n32.nc
~~~

**2. Python:**

In [16]:
cdo.remapcon('n32',input='data/cmorph_sample.nc',returnXArray='cmorph')
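
Why conservation matters for rainfall can be seen in a one-dimensional toy example. The sketch below is not CDO's algorithm. It assumes equal-width cells, coarsens by a factor of 4, and compares block averaging (conservative) with linear interpolation to the coarse cell centers. All values are made up.

```python
def coarsen_conservative(values, factor):
    """Average each block of `factor` equal-width cells (keeps the total)."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values), factor)]


def interpolate_linear(x, xs, ys):
    """Linearly interpolate ys(xs) at x; xs must be increasing and bracket x."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            w = (x - xs[i]) / (xs[i + 1] - xs[i])
            return (1 - w) * ys[i] + w * ys[i + 1]
    raise ValueError("x lies outside the source grid")


# Eight fine cells of width 1 with a single rainy cell.
fine = [0.0, 0.0, 0.0, 0.0, 12.0, 0.0, 0.0, 0.0]
fine_centers = [i + 0.5 for i in range(len(fine))]
factor = 4
coarse_centers = [2.0, 6.0]  # centers of the two coarse cells of width 4

conservative = coarsen_conservative(fine, factor)
interpolated = [interpolate_linear(x, fine_centers, fine) for x in coarse_centers]

# Total amount = sum of (value * cell width).
total_fine = sum(fine) * 1.0
total_conservative = sum(conservative) * factor
total_interpolated = sum(interpolated) * factor

print(conservative, total_conservative)  # [0.0, 3.0] 12.0
print(interpolated, total_interpolated)  # [0.0, 0.0] 0.0
```

In this toy case the interpolated field misses the rainy cell entirely, while block averaging keeps the total of 12.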

### Remap to Any Grid Type

We first need to create a **grid description file**. For example, if the target grid system is the NCEP R2 grid system, the grid description file should include:

~~~
gridtype = lonlat
xsize    = 69
ysize    = 17
xfirst   = 40
xinc     = 2.5
yfirst   = -20
yinc     = 2.5
~~~


The file specifies the grid type, the number of grid points in each direction (`xsize`, `ysize`), the starting coordinates (`xfirst`, `yfirst`), and the grid spacing (`xinc`, `yinc`). Save the description file in the current folder under a name such as `ncep_r2_grid`; we can then remap data onto this grid.
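
A grid description like the one above can also be generated from Python, which makes it easy to check the extent implied by the numbers. The helper names `lonlat_grid_description` and `last_coordinate` are hypothetical. The sketch only builds the text and leaves writing the file to the reader.

```python
def lonlat_grid_description(xsize, ysize, xfirst, xinc, yfirst, yinc):
    """Return the text of a regular lon/lat grid description file."""
    lines = [
        "gridtype = lonlat",
        f"xsize    = {xsize}",
        f"ysize    = {ysize}",
        f"xfirst   = {xfirst}",
        f"xinc     = {xinc}",
        f"yfirst   = {yfirst}",
        f"yinc     = {yinc}",
    ]
    return "\n".join(lines) + "\n"


def last_coordinate(first, inc, size):
    """Coordinate of the last grid point along one axis."""
    return first + inc * (size - 1)


text = lonlat_grid_description(69, 17, 40, 2.5, -20, 2.5)
print(text)
print(last_coordinate(40, 2.5, 69))   # last longitude of this grid
print(last_coordinate(-20, 2.5, 17))  # last latitude of this grid

# To save the description next to the notebook:
# with open('ncep_r2_grid', 'w') as f:
#     f.write(text)
```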

**Example 7:** Regrid CMORPH data (0.25° resolution) onto the NCEP R2 grid with the first-order conservative remapping method.

**1. Command Line:**

```bash
cdo remapcon,ncep_r2_grid data/cmorph_sample.nc data/cmorph_remap.nc
```

**2. Python:**

In [18]:
cdo.remapcon('ncep_r2_grid', input='data/cmorph_sample.nc', returnXArray='cmorph')

Because the CMORPH sample file I provide is not global, the `ncep_r2_grid` file is not global either. For global data, the grid description file should be:


~~~
gridtype = lonlat
xsize    = 144
ysize    = 73
xfirst   = 0
xinc     = 2.5
yfirst   = -90
yinc     = 2.5
~~~

## File Output Format

By default, CDO writes the output in the same format as the input file (netCDF in the examples above). We can change it with the option `-f <format>`. The accepted formats are listed below.

![](images/fig11.3.png)

**1. Command Line:**

```bash
cdo -f grb2 copy data/olr.nc data/olr.grb2
```

**2. Python:**

In [19]:
cdo.copy(options='-f grb2',
         input='data/olr.nc',
         output='ex_out/olr.grib2')

'ex_out/olr.grib2'

### Binary Files

Some files are saved in a plain binary format. A binary file does not contain grid information, so it is accompanied by a control file (`.ctl`) that describes how to read it. This format is commonly used by GrADS. We can convert a binary file with its `.ctl` file to a netCDF file.

Below is an example of a `.ctl` file.

~~~
DSET infile.bin
OPTIONS sequential
UNDEF -9e+33
XDEF 360 LINEAR -179.5 1
YDEF 180 LINEAR -89.5 1
ZDEF 1 LINEAR 1 1
TDEF 1 LINEAR 00:00Z15jun1989 12hr
VARS 1
param 1 99 description of the variable
ENDVARS
~~~
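
The `XDEF` and `YDEF` entries carry the grid information that the binary file itself lacks. As a rough illustration, the sketch below expands a `LINEAR` axis definition into its coordinates. The function name `linear_axis` is hypothetical, and only the simple `LINEAR` form shown in this unit is handled.

```python
def linear_axis(ctl_line):
    """Expand e.g. 'XDEF 360 LINEAR -179.5 1' into a list of coordinates."""
    keyword, size, mapping, start, step = ctl_line.split()
    if mapping.upper() != "LINEAR":
        raise ValueError("only LINEAR axes are handled in this sketch")
    return [float(start) + i * float(step) for i in range(int(size))]


lons = linear_axis("XDEF 360 LINEAR -179.5 1")
lats = linear_axis("YDEF 180 LINEAR -89.5 1")

print(len(lons), lons[0], lons[-1])  # 360 -179.5 179.5
print(len(lats), lats[0], lats[-1])  # 180 -89.5 89.5
```

This describes a global 1° grid with 360 by 180 points.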

**1. Command Line:**
~~~bash
cdo -f nc import_binary infile.ctl outfile.nc
~~~

**2. Python:**
~~~python
cdo.import_binary(input='infile.ctl', output='outfile.nc', options='-f nc')
~~~

## Delete Temporary Files

When no `output` is given, the Python interface stores results in temporary files. The following call removes them:

In [20]:
cdo.cleanTempDir()