5. Creating a DataArray Manually

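Reading netCDF files has already shown us how a DataArray is organized: the data are stored within a framework of coordinate axes. Atmospheric science data are most often stored in a four-dimensional (time, level, lat, lon) layout. These four axes are not mandatory, though. As long as the coordinate axes are specified, we can build a DataArray by hand.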

class xarray.DataArray(data=<NA>, coords=None, dims=None, name=None, attrs=None)

  • data – Values for this array. Must be a numpy.ndarray, ndarray-like, or castable to an ndarray.

  • dims – Name(s) of the data dimension(s).

  • coords – Coordinates (tick labels) to use for indexing along each dimension. The following notations are accepted:

    • mapping {dimension name: array-like}

    • mapping {coord name: DataArray}

    • mapping {coord name: Variable}

  • name – Name of this array.

  • attrs – Attributes to assign to the new instance.
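As a minimal sketch of how these parameters fit together, the snippet below builds a small two-dimensional DataArray. The values, coordinate labels, and the name `temperature` are made up for illustration and are not part of the OLR dataset used later.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 field; values and names are illustrative only.
temps = np.array([[25.1, 26.3, 27.0],
                  [24.8, 25.9, 26.5]])

da = xr.DataArray(
    data=temps,
    dims=["lat", "lon"],                       # one name per axis of `temps`
    coords={"lat": [10.0, 20.0],               # tick labels along each dimension
            "lon": [100.0, 110.0, 120.0]},
    name="temperature",
    attrs={"units": "degC"},
)

# Label-based lookup works because the coordinates were attached.
print(da.sel(lat=20.0, lon=110.0).item())  # 25.9
```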

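Example 1: Convert daily OLR data to a (year, pentad, lat, lon) layout. When studying intraseasonal variability, converting the data to pentad (5-day mean) units is convenient because it filters out the high-frequency, synoptic-scale variations.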
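The index arithmetic behind a pentad can be sketched as follows. A 365-day year splits into exactly 73 pentads of 5 days each. The helper `pentad_slice` is hypothetical (not part of xarray) and assumes a daily series that starts on 1 January of `first_year` and contains no 29 February.

```python
def pentad_slice(year, pentad, first_year=1998, days_per_year=365, days_per_pentad=5):
    """Return (start, stop) positions of a pentad along a no-leap daily time axis."""
    start = (year - first_year) * days_per_year + (pentad - 1) * days_per_pentad
    return start, start + days_per_pentad

print(pentad_slice(1998, 1))   # (0, 5)
print(pentad_slice(1999, 73))  # (725, 730)
```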

First, prepare the data:

import xarray as xr 

olr_ds = xr.open_dataset("data/olr.nc")   
olr_da = olr_ds.olr
# Pentads are 5-day blocks, so Feb 29 would give one pentad an extra day.
# Drop Feb 29 so that every year has exactly 365 days.
olr_noleap = olr_da.sel(time=~((olr_da.time.dt.month == 2) & (olr_da.time.dt.day == 29)))
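To see what the Feb 29 mask does without needing the OLR file, here is a small sketch on a synthetic daily series. The array `da` and its values are placeholders, not real data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series covering 2000 (a leap year) and 2001.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
da = xr.DataArray(np.arange(time.size, dtype=float), dims="time",
                  coords={"time": time}, name="demo")

# Boolean mask that is True only on Feb 29; select everything else.
is_feb29 = (da.time.dt.month == 2) & (da.time.dt.day == 29)
da_noleap = da.sel(time=~is_feb29)

print(da.sizes["time"], da_noleap.sizes["time"])  # 731 730
```

After the mask is applied, every year contributes exactly 365 days, which is what makes the fixed 73 x 5 pentad layout possible.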
olr_ptd = xr.DataArray( 
                       dims=['year','pentad','lat','lon'],
                       coords=dict(year=range(1998,2018), 
                                   pentad=range(1,74),
                                   lat=olr_da.lat,
                                   lon=olr_da.lon),
                       name='olr')
for yy in olr_ptd.year.values:
    for p in olr_ptd.pentad.values:
        # A pentad is 5 days: positions start .. start+5 on the no-leap time axis.
        start = int((yy - 1998) * 365 + (p - 1) * 5)
        olr_ptd.loc[yy, p, :, :] = olr_noleap[start:start + 5, :, :].mean(axis=0)
olr_ptd
<xarray.DataArray 'olr' (year: 20, pentad: 73, lat: 90, lon: 360)>
array([[[[...]]]])
Coordinates:
  * year     (year) int64 1998 1999 2000 2001 2002 ... 2013 2014 2015 2016 2017
  * pentad   (pentad) int64 1 2 3 4 5 6 7 8 9 10 ... 65 66 67 68 69 70 71 72 73
  * lat      (lat) float32 -44.5 -43.5 -42.5 -41.5 -40.5 ... 41.5 42.5 43.5 44.5
  * lon      (lon) float32 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5

The computation above first creates an empty DataArray with (year, pentad, lat, lon) coordinate axes and then stores each pentad mean in it.
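As an alternative to filling the array in a loop, the same layout can be obtained by reshaping. This is only a sketch on synthetic data, not the method used above. It assumes a no-leap daily array that starts on 1 January and spans whole years, and the small grid sizes are arbitrary.

```python
import numpy as np
import xarray as xr

n_years, n_lat, n_lon = 2, 3, 4
rng = np.random.default_rng(0)
daily = rng.random((n_years * 365, n_lat, n_lon))  # synthetic stand-in for olr_noleap

# (time, lat, lon) -> (year, pentad, day-in-pentad, lat, lon), then average the 5 days.
pentad_mean = daily.reshape(n_years, 73, 5, n_lat, n_lon).mean(axis=2)

ptd = xr.DataArray(
    pentad_mean,
    dims=["year", "pentad", "lat", "lon"],
    coords={"year": np.arange(1998, 1998 + n_years),
            "pentad": np.arange(1, 74),
            "lat": np.linspace(-1.0, 1.0, n_lat),
            "lon": np.linspace(0.0, 3.0, n_lon)},
    name="olr",
)

# Cross-check one cell against an explicit 5-day slice: year 1999, pentad 2.
start = 365 + (2 - 1) * 5
assert np.allclose(ptd.sel(year=1999, pentad=2).values,
                   daily[start:start + 5].mean(axis=0))
```

The reshape avoids the Python-level double loop, at the cost of requiring the time axis to be exactly `n_years * 365` long.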

Note

olr_ptd.loc[yy,p,:,:] Remember loc, the label-based way of selecting a range of data? (see Chapter 3)
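A small sketch of the two ingredients used above: a DataArray created without `data` (which comes back filled with NaN) and label-based assignment through `loc`. The array `empty` and the value 42.0 are illustrative only.

```python
import numpy as np
import xarray as xr

# Omitting `data` yields an array of NaN whose shape follows the coordinates.
empty = xr.DataArray(
    dims=["year", "pentad"],
    coords={"year": [1998, 1999], "pentad": [1, 2, 3]},
    name="demo",
)

# .loc indexes by coordinate label (year 1999, pentad 2), not by integer position.
empty.loc[1999, 2] = 42.0

print(int(empty.isnull().sum()))              # 5
print(empty.sel(year=1999, pentad=2).item())  # 42.0
```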

Caution

Make a habit of giving every DataArray a name, such as name='olr'. This avoids errors when calling xr.merge() or writing NetCDF files.
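The effect of a missing name on `xr.merge()` can be sketched with two tiny arrays. The arrays and the name `olr_copy` are made up for illustration.

```python
import numpy as np
import xarray as xr

coords = {"x": [0, 1, 2]}
named = xr.DataArray(np.zeros(3), dims="x", coords=coords, name="olr")
unnamed = xr.DataArray(np.ones(3), dims="x", coords=coords)  # no name given

ds = xr.merge([named])         # the name becomes the variable name in the Dataset
print(list(ds.data_vars))      # ['olr']

try:
    xr.merge([named, unnamed])  # an unnamed DataArray cannot become a Dataset variable
except ValueError as err:
    print("merge failed:", err)

# Assigning a name afterwards resolves the problem.
ds2 = xr.merge([named, unnamed.rename("olr_copy")])
print(sorted(ds2.data_vars))   # ['olr', 'olr_copy']
```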