12. Climate Data Operator (CDO)#

CDO是由德國Max-Plank Institute for Meteorology研發出的軟體,它擁有許多可以處理標準氣候資料和預報模式資料的指令 (Operator),包括簡易的統計、算數、資料選擇、時間和空間網格處理、內插等,是一套方便又強大的工具。以往必須在命令列 (Command Line) 使用cdo,現在MPI把這套套件寫成Python的套件,因此也可以在Python程式中使用。

在這套講義中,我們主要會學習

  • 選擇資料中特定時間、空間、垂直層資料,或是特定的變數。

  • 計算簡易統計量。

  • 資料算術。

  • 資料網格內插。

  • 轉換資料格式。

Caution

cdo計算後的產品都是輸出.nc檔,不像在前面的單元中所學會輸出成一個矩陣在Python內部使用。因此計算完後,如果在程式中需要使用這筆資料,必須再用xarray.open_dataset讀進程式。

cdo基本指令#

Linux指令列 (Command Line)#

在Linux的指令列操作cdo的方式如下:

cdo [Options] Operator1 [-Operator2] [-OperatorN] infile outfile

也就是針對檔案infile操作指令OperatorN後,會得到計算好的結果outfile
若想要在一個命令列中執行N個指令,可以利用[-OperatorN]的方式 接起來;而cdo執行的順序為[-OperatorN][-Operator2]Operator1.

Python程式#

from cdo import *
cdo = Cdo()

在Linux指令列中所用的Operator,在Python中就是cdo的方法函數。

資料選擇 (Select Field)#

cdo有完整的指令說明書,可從官網上下載。我們從最簡單的資料選擇開始示範起,在看範例之前,可以先試著透過搜尋說明書自己做做看,例如資料選擇的方法記載在說明書的2.3節。

Example 1: 如何選取1998-2018年每年12月的OLR資料?

選取資料範圍使用select的方法函數。

1. Command Line:

cdo select,year=1998/2018,month=12 data/olr.nc data/olr_1998-2018.12.nc

2. Python:

cdo.select('year=1998/2018,month=12', input='data/olr.nc', output='data/olr_1998-2018.12.nc')
'data/olr_1998-2018.12.nc'

或是

cdo.selyear('1998/2018', input=cdo.selmon('12', input='data/olr.nc'), 
                         output='data/olr_1998-2018.12.nc')
'data/olr_1998-2018.12.nc'

'year=1998/2018,month=12' 這個參數代表年份選擇的範圍是1998-2018的十二月 (/ 代表選擇連續的範圍,而year=1998,2018 代表只選擇1998和2018這兩年)。這樣的表示法不管是在command line或是在Python程式碼都是一樣的。 而輸出 output 是一個檔案,我們也可以把結果輸出成 DataArray,例如

olr_sel = cdo.select('year=1998/2018,month=12', input='data/olr.nc', returnXArray='olr')
olr_sel
<xarray.DataArray 'olr' (time: 651, lat: 90, lon: 360)> Size: 84MB
[21092400 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 5kB 1998-12-01 1998-12-02 ... 2018-12-31
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat      (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Attributes:
    standard_name:  toa_outgoing_longwave_flux
    long_name:      NOAA Climate Data Record of Daily Mean Upward Longwave Fl...
    units:          W m-2
    cell_methods:   time: mean area: mean

returnXArray='olr',代表輸出時,olr是DataArray變數名稱。

另外一種方法,是利用 cdo.selmon() 先選擇月份,再 selyear 選擇年份。

cdo.selyear('1998/2018', input=cdo.selmon('12', input='data/olr.nc'), 
                         returnXArray='olr')
<xarray.DataArray 'olr' (time: 651, lat: 90, lon: 360)> Size: 84MB
[21092400 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 5kB 1998-12-01 1998-12-02 ... 2018-12-31
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat      (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Attributes:
    standard_name:  toa_outgoing_longwave_flux
    long_name:      NOAA Climate Data Record of Daily Mean Upward Longwave Fl...
    units:          W m-2
    cell_methods:   time: mean area: mean

我們甚至可以把DataArray傳遞給cdo。以上的程式碼可以改寫為

olr_dec = cdo.selmon('12',input='data/olr.nc',returnXArray='olr')
olr_sel = cdo.selyear('1998/2018', input=olr_dec, returnXArray='olr')
olr_sel
<xarray.DataArray 'olr' (time: 651, lat: 90, lon: 360)> Size: 84MB
[21092400 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 5kB 1998-12-01 1998-12-02 ... 2018-12-31
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat      (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Attributes:
    standard_name:  toa_outgoing_longwave_flux
    long_name:      NOAA Climate Data Record of Daily Mean Upward Longwave Fl...
    units:          W m-2
    cell_methods:   time: mean area: mean

如果想把結果輸出成Dataset,則最後設定 returnXDataset 即可。

Example 2 Select Field and Operator Chaining: 先以cdo合併NCEP R2 u850風場檔案時間,再選出NDJ的資料。

1. Command Line:

cdo select,month=1,11,12 -mergetime data/ncep_r2_uv850/u850.*.nc data/ncep_r2_u850_ndj.nc

2 Python:

cdo.select('month=1,11,12', 
           input=cdo.mergetime(input='data/ncep_r2_uv850/u850.*.nc'),
           returnXArray='uwnd')
<xarray.DataArray 'uwnd' (time: 2208, level: 1, lat: 73, lon: 144)> Size: 93MB
[23210496 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 18kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon      (lon) float32 576B 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
  * lat      (lat) float32 292B 90.0 87.5 85.0 82.5 ... -82.5 -85.0 -87.5 -90.0
  * level    (level) float32 4B 850.0
Attributes: (12/14)
    standard_name:         eastward_wind
    long_name:             Daily U-wind on Pressure Levels
    units:                 m/s
    unpacked_valid_range:  [-140.  175.]
    actual_range:          [-78.96 110.35]
    precision:             2
    ...                    ...
    var_desc:              u-wind
    dataset:               NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...
    level_desc:            Pressure Levels
    statistic:             Mean
    parent_stat:           Individual Obs
    cell_methods:          time: mean (of 4 6-hourly values in one day)

cdo.select() 是用來選擇資料範圍的方法,我們選擇了一月、十一月和十二月 ('month=1,11,12'). 而我們是先將時間全部合併 (cdo.mergetime())了之後才選擇月份的,也就是我們將 cdo.mergetime()的結果作為cdo.select()的輸入變數。

Note

  • mergetime: Merges all timesteps of all input files sorted by date and time. All input files need to have the same structure with the same variables on different timesteps. After this operation every input timestep is in outfile and all timesteps are sorted by date and time.

  • cat: Concatenates all input datasets and appends the result to the end of outfile.

Warning

如果資料太大時 (e.g. 解析度太高導致資料太大……),先做mergetime再選擇時間,可能會導致合併後的副產品太大而記憶體無法負荷。因此,也可以寫一個程式帶有for迴圈,先每一年選擇月份並輸出資料,最後再把這些中間產品全部合併起來。

Example 3 選擇特定區域的資料: 使用sellonlatbox選擇40˚-180˚E、20˚S-30˚N區域的資料。

1. Command Line:

cdo sellonlatbox,40,180,-20,30 data/olr.nc olr_selbox.nc

2. Python:

cdo.sellonlatbox('40,180,-20,30',
                 input='data/olr.nc',
                 returnXArray='olr')
<xarray.DataArray 'olr' (time: 8760, lat: 50, lon: 140)> Size: 245MB
[61320000 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon      (lon) float32 560B 40.5 41.5 42.5 43.5 ... 176.5 177.5 178.5 179.5
  * lat      (lat) float32 200B -19.5 -18.5 -17.5 -16.5 ... 26.5 27.5 28.5 29.5
Attributes:
    standard_name:  toa_outgoing_longwave_flux
    long_name:      NOAA Climate Data Record of Daily Mean Upward Longwave Fl...
    units:          W m-2
    cell_methods:   time: mean area: mean

統計 (Statistics)#

統計方法在說明書的第2.8節,例如sum, mean, avg, var, std, weighted avg and var…,請自行參閱其定義及用法。

Note

What’s different between mean and avg? To distinguish two different kinds of treatment of missing values:

  • mean: only the not missing values are considered to belong to the sample with the side effect of a probably reduced sample size.

  • avg: just adding the sample members and divide the result by the sample size.

  • E.g.: the mean of 1, 2, miss and 3 is (1+2+3)/3 = 2, whereas the average is (1+2+miss+3)/4 = miss/4 = miss. If there are no missing values in the sample, the average and the mean are identical.

Example 4: 利用CDO計算1998-2018年OLR降水資料十二月的長期平均。

1. Command Line:

cdo timmean -select,year=1998/2018,month=12 data/olr.nc olr_dec_ltm.nc 

2. Python:

cdo.timmean(input=cdo.select('year=1998/2018,month=12', input='data/olr.nc'), 
            returnXArray='olr')
<xarray.DataArray 'olr' (time: 1, lat: 90, lon: 360)> Size: 130kB
[32400 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 8B 2008-12-16
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat      (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Attributes:
    standard_name:  toa_outgoing_longwave_flux
    long_name:      NOAA Climate Data Record of Daily Mean Upward Longwave Fl...
    units:          W m-2
    cell_methods:   time: mean area: mean

Note

時間軸的 “2008-12-16” 只是整段資料的時間平均而已。

Example 5: 計算日氣候平均 計算olr.nc的日氣候平均。

1. Command Line:

cdo ydaymean data/olr.nc data/olr_dayClim.nc

2.Python:

cdo.ydaymean(input='data/olr.nc', returnXArray='olr')
<xarray.DataArray 'olr' (time: 365, lat: 90, lon: 360)> Size: 47MB
[11826000 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 3kB 2021-01-01 2021-01-02 ... 2021-12-31
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat      (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Attributes:
    standard_name:  toa_outgoing_longwave_flux
    long_name:      NOAA Climate Data Record of Daily Mean Upward Longwave Fl...
    units:          W m-2
    cell_methods:   time: mean area: mean

算術 (Arithmetics)#

cdo的算術(以EXPR的operator表示)包括= (assignment)、加 (x+y)、減 (x-y)、乘 (x*y)、除 (x/y)、絕對值abs(x)、開根號sqr(x)、取指數exp(x)等。

Example 6: NCEP MSLP資料原先單位是Pa,利用cdo將資料轉換成hPa。

1. Command Line:

cdo expr,'mslp=mslp/100' mslp.2021.nc mslp.hpa.2021.nc

2. Python:

mslp_hPa = cdo.expr('mslp=mslp/100.', 
                    input='data/mslp.2021.nc', 
                    returnXArray='mslp')
mslp_hPa = mslp_hPa.assign_attrs(units='hPa')  # Change attribute. 
mslp_hPa
<xarray.DataArray 'mslp' (time: 365, lat: 73, lon: 144)> Size: 15MB
[3836880 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 3kB 2021-01-01 2021-01-02 ... 2021-12-31
  * lon      (lon) float32 576B 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
  * lat      (lat) float32 292B 90.0 87.5 85.0 82.5 ... -82.5 -85.0 -87.5 -90.0
Attributes:
    standard_name:  pressure
    long_name:      daily mean 6-Hourly Mean Sea Level Pressure
    units:          hPa

網格內插 (Interpolation)#

例如當兩組資料網格解析度不同卻需要在相同網格上比較和分析時,或是降低資料的網格解析度以節省計算空間,或是處理模式結果時,我們需要改變資料的網格解析度。cdo有很好用的網格內插工具,可以處理球諧函數的網格座標 (spherical harmonics)、高斯座標 (Gausian Field)、直角坐標系 (Longitude/Latitude)等。這三種座標系統的解析度大約如下表所示:

內插到標準網格#

remapbil:Bilinear interpolation,是最常使用的網格內插方法。

1. Command Line:

cdo remapbil,n32 data/olr.nc data/olr.n32.nc

2. Python:

cdo.remapcon('n32',input='data/cmorph_sample.nc',returnXArray='cmorph')
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[12], line 1
----> 1 cdo.remapcon('n32',input='data/cmorph_sample.nc',returnXArray='cmorph')

File /data/wtsai/micromamba/p3t/lib/python3.10/site-packages/cdo/cdo.py:515, in Cdo.__call__(self, *args, **kwargs)
    511         outputs.append(self.tempStore.newFile())
    513 cmd.append(' '.join(outputs))
--> 515 retvals = self.__call(cmd, envOfCall)
    516 if self.__hasError(method_name, cmd, retvals):
    517     if self.returnNoneOnError:

File /data/wtsai/micromamba/p3t/lib/python3.10/site-packages/cdo/cdo.py:334, in Cdo.__call(self, cmd, envOfCall)
    326 env.update(envOfCall)
    328 proc = subprocess.Popen(' '.join(cmd),
    329                         shell=True,
    330                         stderr=subprocess.PIPE,
    331                         stdout=subprocess.PIPE,
    332                         env=env)
--> 334 retvals = proc.communicate()
    335 stdout = retvals[0].decode("utf-8")
    336 stderr = retvals[1].decode("utf-8")

File /data/wtsai/micromamba/p3t/lib/python3.10/subprocess.py:1154, in Popen.communicate(self, input, timeout)
   1151     endtime = None
   1153 try:
-> 1154     stdout, stderr = self._communicate(input, endtime, timeout)
   1155 except KeyboardInterrupt:
   1156     # https://bugs.python.org/issue25942
   1157     # See the detailed comment in .wait().
   1158     if timeout is not None:

File /data/wtsai/micromamba/p3t/lib/python3.10/subprocess.py:2021, in Popen._communicate(self, input, endtime, orig_timeout)
   2014     self._check_timeout(endtime, orig_timeout,
   2015                         stdout, stderr,
   2016                         skip_check_and_raise=True)
   2017     raise RuntimeError(  # Impossible :)
   2018         '_check_timeout(..., skip_check_and_raise=True) '
   2019         'failed to raise TimeoutExpired.')
-> 2021 ready = selector.select(timeout)
   2022 self._check_timeout(endtime, orig_timeout, stdout, stderr)
   2024 # XXX Rewrite these to use non-blocking I/O on the file
   2025 # objects; they are no longer using C stdio!

File /data/wtsai/micromamba/p3t/lib/python3.10/selectors.py:416, in _PollLikeSelector.select(self, timeout)
    414 ready = []
    415 try:
--> 416     fd_event_list = self._selector.poll(timeout)
    417 except InterruptedError:
    418     return ready

File /data/wtsai/micromamba/p3t/lib/python3.10/site-packages/cdo/cdo.py:835, in CdoTempfileStore.__catch__(self, signum, frame, throw, **kwargs)
    833 self.__del__()
    834 if callable(throw):
--> 835     throw(signum, frame, **kwargs)
    836 else:
    837     print("caught signal", signum, frame)

KeyboardInterrupt: 

內插到任意網格#

首先要對目標網格建立一個網格資訊檔。以NCEP R2的網格作為範例如下:

gridtype = lonlat
xsize    = 69
ysize    = 17
xfirst   = 40
xinc     = 2.5
yfirst   = -20
yinc     = 2.5

以上的檔案包含了網格型態、網格數、起始點等資訊。將這個網格資訊檔放入目前的資料夾中(取名為ncep_grid),就可以使用它來進行網格內插了。

Example 7: 將CMORPH資料 (解析度0.25度) 內插到NCEP R2網格。

1. Command Line:

cdo remapbil,ncep_grid data/cmorph_sample.nc data/cmorph_remap.nc

2. Python:

cdo.remapcon('ncep_r2_grid', input='data/cmorph_sample.nc', returnXArray='cmorph')
<xarray.DataArray 'cmorph' (time: 8766, lat: 17, lon: 69)> Size: 41MB
[10282518 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon      (lon) float64 552B 40.0 42.5 45.0 47.5 ... 202.5 205.0 207.5 210.0
  * lat      (lat) float64 136B -20.0 -17.5 -15.0 -12.5 ... 12.5 15.0 17.5 20.0
Attributes:
    standard_name:  lwe_precipitation_rate
    long_name:      precipitation
    units:          mm/day
    ver_note:       1998-2020: V1,0; 2021: V0.x.
    comment:        !!! CMORPH estimate is rainrate !!!

以上的範例受限於我們提供的CMORPH資料並非全球網格。如果針對全球資料,進行內插時ncep_grid網格資訊檔應寫為

gridtype = lonlat
xsize    = 144
ysize    = 73
xfirst   = 0
xinc     = 2.5
yfirst   = -90
yinc     = 2.5

檔案輸出格式#

CDO預設輸出的檔案格式為netCDF檔案,如果想要更改,可以在Operator前加上option -f <format>。可接受的格式如下‘:

1. Command Line:

cdo -f grb2 copy data/olr.nc data/olr.grb2

2. Python:

cdo.copy(option='-f grb2', 
         input='data/olr.nc',
         output='ex_out/olr.grib2')
'ex_out/olr.grib2'

Binary二元進位檔#

有些檔案是以二元進位 (binary) 的格式儲存的,檔案本身沒有網格資料,但通常會有附加的控制檔.ctl,告訴使用者要如何讀取這類檔案。二元進位檔在GrADS很常使用,利用cdo,也可以將二元進位檔轉換成netCDF格式。

首先控制檔格式如下:

DSET infile.bin
OPTIONS sequential
UNDEF − 9 e + 3 3
XDEF 360  LINEAR −179.5 1
YDEF  180 LINEAR −89.5 1
ZDEF LINEAR 1 1
TDEF 1 LINEAR 00:00 Z15jun1989 12hr
VARS 1
param 1 99 description of the variable
ENDVARS

1. Command Line:

cdo -f nc import_binary infile.ctl outfile.nc

2. Python:

cdo.import_binary(input='infile.ctl', output='outfile.nc', option='-f nc')

刪除暫存記憶體#

cdo.cleanTempDir()