5. Manually Create a DataArray

In Unit 3, we learned the structure of a DataArray: it stores a set of data together with its coordinates. In atmospheric science, most data are organized in a 4-dimensional framework (time, level, lat, lon), but a DataArray does not have to use these four coordinates. As long as we specify the dimensions and coordinates, we can manually create any DataArray.

class xarray.DataArray(data=<NA>, coords=None, dims=None, name=None, attrs=None)

  • data – Values for this array. Must be a numpy.ndarray, ndarray-like, or castable to an ndarray.

  • dims – Name(s) of the data dimension(s).

  • coords – Coordinates (tick labels) to use for indexing along each dimension. The following notations are accepted:

    • mapping {dimension name: array-like}

    • mapping {coord name: DataArray}

    • mapping {coord name: Variable}

  • name – Name of this array.

  • attrs – Attributes to assign to the new instance.
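As a minimal sketch of how these arguments fit together, the snippet below builds a small 2-D DataArray. The grid, the values, the name `demo`, and the `units` attribute are all made up for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 4 grid (values and coordinates are illustrative only)
lat = np.array([-10.0, 0.0, 10.0])
lon = np.array([100.0, 110.0, 120.0, 130.0])
values = np.arange(12, dtype=float).reshape(3, 4)

demo_da = xr.DataArray(
    data=values,                      # array of shape (lat, lon)
    dims=["lat", "lon"],              # one name per axis of `data`
    coords={"lat": lat, "lon": lon},  # mapping {dimension name: array-like}
    name="demo",
    attrs={"units": "W m-2"},
)

# Label-based selection works because coordinates were supplied
point = demo_da.sel(lat=0.0, lon=110.0).item()
```

The length of each coordinate has to match the size of the corresponding axis of `data`; otherwise the constructor raises an error.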

Example 1: Convert daily OLR data into (year, pentad, lat, lon) format. Daily data are commonly averaged into pentads (5-day means) to filter out high-frequency variability, which is useful when analyzing intraseasonal variability.

First, we prepare the data.

import xarray as xr 

olr_ds = xr.open_dataset("data/olr.nc")   
olr_da = olr_ds.olr
# 29 February would add an extra day to one pentad, so drop it for convenience.
# Every year then has exactly 365 days, i.e. 73 pentads.
olr_noleap = olr_da.sel(time=~((olr_da.time.dt.month == 2) & (olr_da.time.dt.day == 29)))
# Specify the dimension names, the coordinates, and the name of the DataArray.
olr_ptd = xr.DataArray(
                       dims=['year','pentad','lat','lon'],
                       coords=dict(year=range(1998,2018), 
                                   pentad=range(1,74),
                                   lat=olr_da.lat,
                                   lon=olr_da.lon),
                       name='olr')
for yy in olr_ptd.year: 
    for p in olr_ptd.pentad:
        start = int((yy - 1998) * 365 + (p - 1) * 5)  # index of the first day of this pentad
        olr_ptd.loc[yy, p, :, :] = olr_noleap[start : start + 5, :, :].mean(axis=0)  # 5-day mean
olr_ptd
<xarray.DataArray 'olr' (year: 20, pentad: 73, lat: 90, lon: 360)> Size: 378MB
array([[[[219.89698792, 219.15559387, 218.40510559, ..., 225.50248718,
          223.55485535, 223.65061951],
         [223.34541321, 224.53091431, 222.86465454, ..., 227.65988159,
          226.38674927, 224.60290527],
         [227.29621887, 226.65647888, 226.98161316, ..., 229.71142578,
          229.39892578, 228.51731873],
         ...,
         [211.89347839, 212.39381409, 216.15280151, ..., 217.59571838,
          214.77052307, 212.65678406],
         [212.62583923, 212.52403259, 216.57559204, ..., 221.01184082,
          215.40840149, 211.68217468],
         [213.66184998, 212.31236267, 213.03718567, ..., 222.91699219,
          218.48425293, 216.92329407]],

        [[224.29585266, 222.79246521, 222.32139587, ..., 226.94450378,
          225.85778809, 225.28334045],
         [228.91574097, 228.96260071, 227.50735474, ..., 232.39346313,
          231.34643555, 229.98277283],
         [233.0683136 , 231.8346405 , 231.92663574, ..., 236.21047974,
          236.50788879, 235.31025696],
...
         [214.0785675 , 214.45594788, 218.28843689, ..., 218.18579102,
          214.91642761, 215.7442627 ],
         [212.68942261, 212.64877319, 215.43222046, ..., 211.55192566,
          211.44346619, 209.45568848],
         [210.72885132, 210.49061584, 210.29870605, ..., 214.1131897 ,
          210.98692322, 208.78648376]],

        [[214.6018219 , 215.97660828, 215.11463928, ..., 215.63418579,
          215.11351013, 215.05775452],
         [217.36659241, 217.99145508, 217.28042603, ..., 217.16047668,
          217.06045532, 216.31282043],
         [220.6197052 , 221.49935913, 221.74649048, ..., 219.66426086,
          219.0226593 , 218.87825012],
         ...,
         [205.20707703, 208.00669861, 212.51377869, ..., 205.86964417,
          203.92337036, 205.05079651],
         [202.4234314 , 204.04801941, 207.63363647, ..., 198.36546326,
          197.54985046, 198.2558136 ],
         [198.34162903, 196.90713501, 197.9355011 , ..., 202.5773468 ,
          199.10409546, 198.11695862]]]])
Coordinates:
  * year     (year) int64 160B 1998 1999 2000 2001 2002 ... 2014 2015 2016 2017
  * pentad   (pentad) int64 584B 1 2 3 4 5 6 7 8 9 ... 66 67 68 69 70 71 72 73
  * lat      (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5

In the example above, we first create a DataArray with dimensions (year, pentad, lat, lon) without passing any data, so it is initially filled with NaN. We then compute each pentad mean and assign it to the corresponding position in the new DataArray.
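The same result can also be obtained without an explicit loop by reshaping the time axis. The following is only a sketch under stated assumptions: it uses synthetic random data on a tiny hypothetical grid for 2000 and 2001 instead of the OLR file, and it assumes the record starts on 1 January and covers whole years.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data for 2000-2001 (2000 is a leap year) on a tiny hypothetical grid
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
lat = np.array([-0.5, 0.5])
lon = np.array([0.5, 1.5, 2.5])
rng = np.random.default_rng(0)
daily = xr.DataArray(
    rng.normal(loc=240.0, scale=20.0, size=(time.size, lat.size, lon.size)),
    dims=["time", "lat", "lon"],
    coords={"time": time, "lat": lat, "lon": lon},
    name="olr",
)

# Drop 29 February so that each year has exactly 365 days = 73 pentads
is_leap_day = (daily.time.dt.month == 2) & (daily.time.dt.day == 29)
daily_noleap = daily.sel(time=~is_leap_day)

years = np.arange(2000, 2002)
n_pentad, n_day = 73, 5

# Split the time axis into (year, pentad, day-in-pentad), then average over the days
blocks = daily_noleap.values.reshape(years.size, n_pentad, n_day, lat.size, lon.size)
ptd = xr.DataArray(
    blocks.mean(axis=2),
    dims=["year", "pentad", "lat", "lon"],
    coords={"year": years, "pentad": np.arange(1, n_pentad + 1), "lat": lat, "lon": lon},
    name="olr",
)
```

The reshape only works when the number of time steps is exactly years × 73 × 5, which is why the leap days are removed first. The loop version is easier to adapt when the record does not cover whole years.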

Note

The assignment olr_ptd.loc[yy,p,:,:] uses the label-based selection method loc. Do you remember it? (see Unit 3)
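As a quick reminder, here is a small sketch of label-based assignment with loc on a made-up (year, pentad) array. The array `demo` and its values are hypothetical.

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(
    np.zeros((2, 3)),
    dims=["year", "pentad"],
    coords={"year": [1998, 1999], "pentad": [1, 2, 3]},
)

# Labels are given in dimension order: year first, then pentad
demo.loc[1999, 2] = 7.0

# A dict makes the dimension names explicit, so the order no longer matters
demo.loc[dict(pentad=3, year=1998)] = 5.0

# Reading the values back by label
a = demo.sel(year=1999, pentad=2).item()
b = demo.sel(year=1998, pentad=3).item()
```

Note that loc selects by coordinate label (1999, 2), not by integer position (1, 1).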

Caution

It is good practice to give every new DataArray an explicit name (name='olr'): xr.merge() raises an error for unnamed DataArrays, and a named DataArray is stored under that variable name when it is written to a netCDF file.
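The difference can be seen in a small sketch with two made-up DataArrays that are identical except for the name. The variable names `unnamed` and `named` are illustrative.

```python
import numpy as np
import xarray as xr

coords = {"x": [0, 1, 2]}
unnamed = xr.DataArray(np.arange(3.0), dims=["x"], coords=coords)
named = xr.DataArray(np.arange(3.0), dims=["x"], coords=coords, name="olr")

# Merging needs a variable name for each DataArray, so the unnamed one fails
try:
    xr.merge([unnamed])
    merge_failed = False
except ValueError:
    merge_failed = True

# The named DataArray becomes the variable "olr" in the merged Dataset
merged = xr.merge([named])
```

If a DataArray was created without a name, it can still be set afterwards, for example with `unnamed.rename("olr")`.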