4. Conditional Control of datetime Index

Conditional Control of datetime Index in xarray

In Unit 3, we demonstrated how to use the sel method with slice to select data over a continuous temporal or spatial range. However, sometimes we need to select non-contiguous time periods, such as certain months across several years. A single slice cannot express such a selection. Instead, we can build a Boolean condition and keep only the data that satisfy it. Specifically, the time coordinate in xarray is stored as datetime64 values, and its dt accessor exposes datetime attributes such as year, month, day, and so on. We can use these attributes to select the dates we want.

Example 1: Select only the July-August-September (JAS) data.

import xarray as xr 

olr_da = xr.open_dataset("data/olr.nc").olr
olr_jas = olr_da.sel(time=olr_da.time.dt.month.isin([7, 8, 9]))
olr_jas
<xarray.DataArray 'olr' (time: 2208, lat: 90, lon: 360)> Size: 286MB
[71539200 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 18kB 1998-07-01 1998-07-02 ... 2021-09-30
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat      (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Attributes:
    standard_name:  toa_outgoing_longwave_flux
    long_name:      NOAA Climate Data Record of Daily Mean Upward Longwave Fl...
    units:          W m-2
    cell_methods:   time: mean area: mean

In the expression olr_da.time.dt.month.isin([7, 8, 9]), xarray checks whether the month of each time step is 7, 8, or 9 (i.e., July, August, or September). Time steps that match are marked True, and all others are marked False. The sel method then keeps only the time steps marked True.
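To make the Boolean mask visible, here is a minimal sketch on synthetic data. The array da and the names is_jas and da_jas are hypothetical stand-ins for illustration; they are not part of the OLR dataset used above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for olr_da: one value per day for 2000 and 2001.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
da = xr.DataArray(
    np.arange(time.size, dtype="float32"),
    coords={"time": time},
    dims="time",
)

# Boolean mask: True where the month is July, August, or September.
is_jas = da.time.dt.month.isin([7, 8, 9])

# Keep only the time steps flagged True.
da_jas = da.sel(time=is_jas)

n_true = int(is_jas.sum())
months_kept = sorted(set(da_jas.time.dt.month.values.tolist()))
print(n_true, da_jas.sizes["time"])  # 184 184 (92 JAS days in each of 2 years)
print(months_kept)                   # [7, 8, 9]
```

Storing the mask in its own variable makes it easy to inspect or to combine with other conditions using & and | before passing it to sel.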

Example 2: Remove Leap Days

As in Example 1, we can build a Boolean condition, this time inverting it with ~ to remove the leap days. In other words, we select every date that is not February 29.

# Mask that is True on February 29; ~ inverts it so that leap days are dropped.
is_leap_day = (olr_da.time.dt.month == 2) & (olr_da.time.dt.day == 29)
olr_noleap = olr_da.sel(time=~is_leap_day)
olr_noleap
<xarray.DataArray 'olr' (time: 8760, lat: 90, lon: 360)> Size: 1GB
[283824000 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat      (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Attributes:
    standard_name:  toa_outgoing_longwave_flux
    long_name:      NOAA Climate Data Record of Daily Mean Upward Longwave Fl...
    units:          W m-2
    cell_methods:   time: mean area: mean
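A quick way to check this kind of selection is to count the time steps before and after on a small synthetic array. The sketch below uses a single leap year and hypothetical names (da, is_leap_day, da_noleap); it is an illustration, not the OLR data itself.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series covering 2000, which is a leap year (366 days).
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
da = xr.DataArray(np.zeros(time.size), coords={"time": time}, dims="time")

# True only on February 29.
is_leap_day = (da.time.dt.month == 2) & (da.time.dt.day == 29)

# ~ flips True and False, so the selection keeps everything except February 29.
da_noleap = da.sel(time=~is_leap_day)

n_before = da.sizes["time"]
n_after = da_noleap.sizes["time"]
n_leap_left = int(
    ((da_noleap.time.dt.month == 2) & (da_noleap.time.dt.day == 29)).sum()
)
print(n_before, n_after, n_leap_left)  # 366 365 0
```

Note that the two conditions must each be wrapped in parentheses, because & binds more tightly than == in Python.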

DatetimeIndex and Its Applications

Using pandas, we can easily create datetime objects. The to_datetime function converts date-formatted strings to a DatetimeIndex.

import pandas as pd

pd.to_datetime(["2000-01-01", "2000-02-02"])
DatetimeIndex(['2000-01-01', '2000-02-02'], dtype='datetime64[ns]', freq=None)

We can also create a time series by specifying the start time and the number of periods.

ts = pd.date_range("2000-01-01", periods=365)
ts
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
               '2000-01-09', '2000-01-10',
               ...
               '2000-12-21', '2000-12-22', '2000-12-23', '2000-12-24',
               '2000-12-25', '2000-12-26', '2000-12-27', '2000-12-28',
               '2000-12-29', '2000-12-30'],
              dtype='datetime64[ns]', length=365, freq='D')

Alternatively, we can specify the start time, the end time, and the sampling frequency.

ts = pd.date_range(start="2000-01-01", end="2000-12-30", freq="1D")
ts
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
               '2000-01-09', '2000-01-10',
               ...
               '2000-12-21', '2000-12-22', '2000-12-23', '2000-12-24',
               '2000-12-25', '2000-12-26', '2000-12-27', '2000-12-28',
               '2000-12-29', '2000-12-30'],
              dtype='datetime64[ns]', length=365, freq='D')

To convert datetimes to formatted strings, we can use the strftime method. For example, here we format each datetime in the style ‘Jan 01 00’:

ts.strftime("%b %d %y")
Index(['Jan 01 00', 'Jan 02 00', 'Jan 03 00', 'Jan 04 00', 'Jan 05 00',
       'Jan 06 00', 'Jan 07 00', 'Jan 08 00', 'Jan 09 00', 'Jan 10 00',
       ...
       'Dec 21 00', 'Dec 22 00', 'Dec 23 00', 'Dec 24 00', 'Dec 25 00',
       'Dec 26 00', 'Dec 27 00', 'Dec 28 00', 'Dec 29 00', 'Dec 30 00'],
      dtype='object', length=365)

Note that the result is an Index of strings, not a DatetimeIndex. The directive %b formats the month as its abbreviated name, and %y formats the year without century as a zero-padded decimal number. The full list of format directives can be found in Datetime: strftime-strptime Behavior.
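Because the formatted labels are plain strings, datetime attributes such as month are no longer available on them. If we need datetimes again, one option is to parse the strings back with pd.to_datetime and the same format string. The following is a minimal sketch on a three-day series; the variable names are hypothetical, and %b assumes an English locale.

```python
import pandas as pd

ts = pd.date_range("2000-01-01", periods=3)

# Formatting returns an Index of strings, not a DatetimeIndex.
labels = ts.strftime("%b %d %y")
print(list(labels))  # ['Jan 01 00', 'Jan 02 00', 'Jan 03 00']
print(isinstance(labels, pd.DatetimeIndex))  # False

# Parsing with the same format string recovers the original dates.
parsed = pd.to_datetime(labels, format="%b %d %y")
print(isinstance(parsed, pd.DatetimeIndex))  # True
print(list(parsed) == list(ts))              # True
```

In practice it is usually simpler to keep the original datetime index for computation and to call strftime only when producing labels.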

Similarly, we can format the time coordinate of a DataArray into a string format:

olr_da.time.dt.strftime("%b %d %y")
<xarray.DataArray 'strftime' (time: 8760)> Size: 70kB
array(['Jan 01 98', 'Jan 02 98', 'Jan 03 98', ..., 'Dec 29 21',
       'Dec 30 21', 'Dec 31 21'], dtype=object)
Coordinates:
  * time     (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31

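This shows that the DatetimeAccessor, reached through the dt attribute of a time coordinate (here olr_da.time.dt), provides the same kind of datetime attributes and methods as a pandas.DatetimeIndex.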

The strftime method is worth learning because it is useful for formatting the time labels on time series plots and Hovmöller diagrams.

datetime and timedelta

The DatetimeAccessor and pandas.DatetimeIndex are both modeled on Python's datetime objects and expose the same kind of attributes.

datetime.datetime: A combination of a date and a time. Attributes: year, month, day, hour, minute, second, microsecond, and tzinfo. (datetime official website)

We can also perform arithmetic on datetime objects. For example, we can combine datetime.datetime and datetime.timedelta to compute the date that lies a given duration before or after another date.

A timedelta object represents a duration, the difference between two dates or times. (datetime official website)

The following are some arithmetic rules for datetime.datetime and datetime.timedelta:

datetime2 = datetime1 + timedelta 
datetime2 = datetime1 - timedelta
timedelta = datetime1 - datetime2
datetime1 < datetime2
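The rules above can be tried directly with the standard library. This is a minimal sketch; the dates and the names start, step, later, earlier, and gap are chosen only for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2000, 2, 28)
step = timedelta(days=2)

# datetime + timedelta -> datetime (2000 is a leap year, so Feb 29 exists)
later = start + step
print(later)    # 2000-03-01 00:00:00

# datetime - timedelta -> datetime
earlier = start - step
print(earlier)  # 2000-02-26 00:00:00

# datetime - datetime -> timedelta
gap = later - earlier
print(gap)      # 4 days, 0:00:00

# datetimes can be compared chronologically
print(earlier < later)  # True
```

Calendar details such as month lengths and leap years are handled by the datetime type itself, so there is no need to adjust for them by hand.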