# 5. Manually Create a DataArray

In Unit 3, we learned the structure of a DataArray. A DataArray stores a set of data together with its coordinates. In atmospheric science, most datasets are organized in a 4-dimensional framework (time, level, lat, lon). However, a DataArray does not need to have these 4 coordinates. As long as we specify the dimensions and coordinates, we can manually create any DataArray.

> *class* xarray.DataArray(data=`<NA>`, coords=None, dims=None, name=None, attrs=None) 

- data – Values for this array. Must be a numpy.ndarray, ndarray-like, or castable to an ndarray.
- dims – Name(s) of the data dimension(s).
- coords – Coordinates (tick labels) to use for indexing along each dimension. The following notations are accepted:
  - mapping {dimension name: array-like}
  - mapping {coord name: DataArray} 
  - mapping {coord name: Variable}
- name – Name of this array.
- attrs – Attributes to assign to the new instance.
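As a minimal sketch of how these parameters fit together, the block below builds a small 2-D DataArray from a NumPy array. The grid, the random values, the name `example`, and the `units` attribute are all placeholders chosen for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D field on a small (lat, lon) grid; values are random placeholders.
lat = np.array([-10.0, 0.0, 10.0])
lon = np.array([100.0, 110.0, 120.0, 130.0])
data = np.random.default_rng(0).normal(size=(lat.size, lon.size))

da = xr.DataArray(
    data=data,                        # ndarray whose shape matches dims
    dims=["lat", "lon"],              # one name per axis of `data`
    coords={"lat": lat, "lon": lon},  # mapping {dimension name: array-like}
    name="example",
    attrs={"units": "W m-2"},         # free-form metadata
)
print(da.dims, da.shape)  # ('lat', 'lon') (3, 4)
```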

**Example 1: Convert daily OLR data into (year, pentad, lat, lon) format.** It is common to coarsen daily data to a pentad (5-day) timescale to filter out high-frequency variability, which is useful when analyzing intraseasonal variability.

First, we prepare the data. 

```python
import xarray as xr

olr_ds = xr.open_dataset("data/olr.nc")
olr_da = olr_ds.olr
# Exclude February 29 for convenience: keeping it would add an extra day to one
# pentad in leap years. Without it, every year has exactly 365 days (73 pentads).
olr_noleap = olr_da.sel(time=~((olr_da.time.dt.month == 2) & (olr_da.time.dt.day == 29)))
```

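To see what the leap-day mask does, here is a minimal sketch on a synthetic leap year. The series, its values, and the variable name `da` are placeholders, not the contents of the OLR file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series for the leap year 2000 (366 days); values are placeholders.
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
da = xr.DataArray(np.arange(time.size, dtype=float), dims=["time"],
                  coords={"time": time}, name="olr")

# Same mask as above: keep every day that is NOT February 29.
is_feb29 = (da.time.dt.month == 2) & (da.time.dt.day == 29)
da_noleap = da.sel(time=~is_feb29)

print(da.time.size, da_noleap.time.size)  # 366 365
```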

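```python
# Create an empty DataArray by specifying only the dimension names, the
# coordinates, and the name; without `data`, xarray fills it with NaN.
olr_ptd = xr.DataArray(dims=['year', 'pentad', 'lat', 'lon'],
                       coords=dict(year=range(1998, 2018),
                                   pentad=range(1, 74),
                                   lat=olr_da.lat,
                                   lon=olr_da.lon),
                       name='olr')

# Fill each (year, pentad) slot with the mean of its 5 days.
# Assumes olr_noleap starts on 1998-01-01, so year yy begins at index (yy-1998)*365.
for yy in olr_ptd.year.values:
    for p in olr_ptd.pentad.values:
        start = (yy - 1998) * 365 + (p - 1) * 5
        olr_ptd.loc[yy, p, :, :] = olr_noleap[start:start + 5, :, :].mean(axis=0)
olr_ptd
```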

In the example above, we create an empty (NaN-filled) DataArray with dimensions (year, pentad, lat, lon) by passing only `dims`, `coords`, and `name`. We then compute the mean over each 5-day pentad and store it in the new DataArray.
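The double loop is easy to read but can be slow for long records. A vectorized alternative, sketched below on synthetic data, reshapes the time axis into (year, pentad, day) and averages over the days before wrapping the result in a DataArray. Like the loop, it assumes every year has exactly 365 days starting on 1 January; the random values, the tiny grid, and the two-year length are placeholders.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for olr_noleap: 2 no-leap years of daily data on a tiny grid.
years = np.arange(1998, 2000)
nlat, nlon = 2, 3
ndays = years.size * 365
daily = np.random.default_rng(1).normal(size=(ndays, nlat, nlon))

# (time, lat, lon) -> (year, pentad, day-in-pentad, lat, lon), then average the 5 days.
# This works only because every year has exactly 73 * 5 = 365 days.
ptd = daily.reshape(years.size, 73, 5, nlat, nlon).mean(axis=2)

olr_ptd = xr.DataArray(
    ptd,
    dims=["year", "pentad", "lat", "lon"],
    coords=dict(year=years, pentad=np.arange(1, 74),
                lat=np.linspace(-5, 5, nlat), lon=np.linspace(100, 120, nlon)),
    name="olr",
)
```

In principle, `olr_noleap.values` could take the place of `daily` (with `olr_da.lat` and `olr_da.lon` as coordinates) if the record covers whole no-leap years from 1 January 1998.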

```{Note} 
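Do you remember the label-based selection method `loc` (see Unit 3)? Here, `olr_ptd.loc[yy,p,:,:]` selects year `yy`, pentad `p`, and all latitudes and longitudes by coordinate label.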
```
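A tiny sketch of label-based assignment with `.loc`, using a hypothetical NaN-filled (year, pentad) array built the same way as `olr_ptd` (no data passed). The labels and values are placeholders.

```python
import numpy as np
import xarray as xr

# Hypothetical NaN-filled (year, pentad) array, created without passing data.
da = xr.DataArray(dims=["year", "pentad"],
                  coords=dict(year=[1998, 1999], pentad=[1, 2, 3]))

# .loc indexes by coordinate labels, not positions: year 1999, pentad 3.
da.loc[1999, 3] = 42.0
# Plain [] indexing uses integer positions instead: year 1998, pentad 1.
da[0, 0] = 7.0

print(float(da.sel(year=1999, pentad=3)))  # 42.0
print(int(np.isnan(da.values).sum()))      # 4
```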

```{caution}
It is good practice to explicitly name a new DataArray (`name='olr'`): `xr.merge()` cannot convert an unnamed DataArray into a Dataset variable and raises an error, and a name gives the variable a meaningful label when it is written to a netCDF file.
```
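A minimal sketch of the naming issue with two hypothetical 1-D arrays, only one of which has a name:

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D arrays; only the second one has a name.
unnamed = xr.DataArray(np.zeros(3), dims=["x"])
named = xr.DataArray(np.zeros(3), dims=["x"], name="olr")

merge_failed = False
try:
    xr.merge([unnamed])
except ValueError as err:
    merge_failed = True
    print("merge failed:", err)

merged = xr.merge([named])     # works: becomes a Dataset variable called "olr"
print(list(merged.data_vars))  # ['olr']
```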