4. Conditional Control of datetime Index#
Conditional Control of datetime Index in xarray#
In Unit 3, we demonstrated how to use the sel method with slice to select data over a continuous temporal or spatial range. However, we sometimes need to select non-contiguous time periods, such as certain months across several years, which a single slice cannot express. In these cases, we can select with a condition instead, keeping only the data that meet the criteria we specify. Specifically, a time coordinate in xarray holds datetime values (datetime64[ns] in the examples below), and its dt accessor exposes datetime attributes such as year, month, and day. We can use these attributes to select the dates we want.
Example 1: Select only the JAS season data.
import xarray as xr
olr_da = xr.open_dataset("data/olr.nc").olr
olr_jas = olr_da.sel(time=(olr_da.time.dt.month.isin([7,8,9])))
olr_jas
<xarray.DataArray 'olr' (time: 2208, lat: 90, lon: 360)> Size: 286MB
[71539200 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 18kB 1998-07-01 1998-07-02 ... 2021-09-30
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat      (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Attributes:
    standard_name:  toa_outgoing_longwave_flux
    long_name:      NOAA Climate Data Record of Daily Mean Upward Longwave Fl...
    units:          W m-2
    cell_methods:   time: mean area: mean
In time=(olr_da.time.dt.month.isin([7, 8, 9])), xarray checks whether the month of each timestep is 7, 8, or 9 (i.e., July, August, or September). Timesteps that satisfy the condition are marked True, and all others are marked False. Only the timesteps marked True are kept in the result.
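The masking step described above can be inspected on a small synthetic series. This is only a sketch: the one-year daily array da is a made-up stand-in for olr_da, and its values are arbitrary.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series for one year (a stand-in for olr_da; values are arbitrary).
time = pd.date_range("2001-01-01", "2001-12-31", freq="D")
da = xr.DataArray(np.arange(time.size), coords={"time": time}, dims="time")

# Boolean mask: True where the month is July, August, or September.
is_jas = da.time.dt.month.isin([7, 8, 9])

# Passing the mask to sel keeps only the timesteps marked True.
da_jas = da.sel(time=is_jas)

print(int(is_jas.sum()))       # 92 True entries: 31 + 31 + 30 days
print(da_jas.sizes["time"])    # 92 timesteps remain
```

Storing the mask in its own variable makes it easy to check how many timesteps will be kept before applying the selection.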
Example 2: Remove Leap Days
Similar to Example 1, we can use reverse selection to remove the leap days. This means selecting all dates that are not February 29th.
# Build a mask for February 29, then invert it with ~ to keep every other day.
is_leap_day = (olr_da.time.dt.month == 2) & (olr_da.time.dt.day == 29)
olr_noleap = olr_da.sel(time=~is_leap_day)
olr_noleap
<xarray.DataArray 'olr' (time: 8760, lat: 90, lon: 360)> Size: 1GB
[283824000 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * lat      (lat) float32 360B -44.5 -43.5 -42.5 -41.5 ... 41.5 42.5 43.5 44.5
Attributes:
    standard_name:  toa_outgoing_longwave_flux
    long_name:      NOAA Climate Data Record of Daily Mean Upward Longwave Fl...
    units:          W m-2
    cell_methods:   time: mean area: mean
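A quick way to confirm that the inverted mask removes only February 29 is to count timesteps before and after on a short synthetic series. The array da below is hypothetical and simply spans one leap year and one common year.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series covering a leap year (2000) and a common year (2001).
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
da = xr.DataArray(np.zeros(time.size), coords={"time": time}, dims="time")

# True only on February 29; ~ flips the mask so that day is dropped.
is_leap_day = (da.time.dt.month == 2) & (da.time.dt.day == 29)
da_noleap = da.sel(time=~is_leap_day)

print(da.sizes["time"])         # 731 days: 366 + 365
print(da_noleap.sizes["time"])  # 730 days: one February 29 removed
```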
DatetimeIndex and Its Applications#
With pandas, we can easily create datetime objects. The to_datetime function converts date-formatted strings to datetime objects; given a list of strings, it returns a DatetimeIndex.
import pandas as pd
pd.to_datetime(["2000-01-01", "2000-02-02"])
DatetimeIndex(['2000-01-01', '2000-02-02'], dtype='datetime64[ns]', freq=None)
We can also create a time series by specifying the start time and the number of periods.
ts = pd.date_range("2000-01-01", periods=365)
ts
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
'2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
'2000-01-09', '2000-01-10',
...
'2000-12-21', '2000-12-22', '2000-12-23', '2000-12-24',
'2000-12-25', '2000-12-26', '2000-12-27', '2000-12-28',
'2000-12-29', '2000-12-30'],
dtype='datetime64[ns]', length=365, freq='D')
Alternatively, we can specify the start time, the end time, and the sampling frequency.
ts = pd.date_range(start='2000-01-01',end='2000-12-30',freq='1D')
ts
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
'2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
'2000-01-09', '2000-01-10',
...
'2000-12-21', '2000-12-22', '2000-12-23', '2000-12-24',
'2000-12-25', '2000-12-26', '2000-12-27', '2000-12-28',
'2000-12-29', '2000-12-30'],
dtype='datetime64[ns]', length=365, freq='D')
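The two calls above describe the same 365 daily timestamps, which can be checked directly. The sketch below also shows a non-daily frequency; the "MS" (month start) alias is not used elsewhere in this unit and appears here only as an illustration.

```python
import pandas as pd

# Same index built two ways: start + number of periods, or start + end + frequency.
by_periods = pd.date_range("2000-01-01", periods=365)
by_range = pd.date_range(start="2000-01-01", end="2000-12-30", freq="1D")
print(by_periods.equals(by_range))   # True

# A different sampling frequency: the first day of each month in 2000.
monthly = pd.date_range("2000-01-01", periods=12, freq="MS")
print(monthly[0], monthly[-1])       # 2000-01-01 00:00:00 2000-12-01 00:00:00
```

Because 2000 is a leap year, 365 daily periods starting on January 1 end on December 30 rather than December 31.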
To convert datetime objects to formatted strings, we can use the strftime method. For example, here we format the dates as 'Jan 01 00':
ts.strftime("%b %d %y")
Index(['Jan 01 00', 'Jan 02 00', 'Jan 03 00', 'Jan 04 00', 'Jan 05 00',
'Jan 06 00', 'Jan 07 00', 'Jan 08 00', 'Jan 09 00', 'Jan 10 00',
...
'Dec 21 00', 'Dec 22 00', 'Dec 23 00', 'Dec 24 00', 'Dec 25 00',
'Dec 26 00', 'Dec 27 00', 'Dec 28 00', 'Dec 29 00', 'Dec 30 00'],
dtype='object', length=365)
Note that the result is an index of strings, no longer a datetime object. The format code %b gives the month as an abbreviated name, %d gives the day of the month as a zero-padded decimal number, and %y gives the year without century as a zero-padded decimal number. The full list of format codes can be found in the Python datetime documentation, under strftime() and strptime() Behavior.
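Since the formatted labels are plain strings, datetime attributes are no longer available on them. As a minimal sketch, the strings can be parsed back with pd.to_datetime and the same format codes; this assumes an English locale for the %b month names, and the three-day index is chosen only to keep the example short.

```python
import pandas as pd

ts = pd.date_range("2000-01-01", periods=3)

# Formatting returns an Index of strings, not a DatetimeIndex.
labels = ts.strftime("%b %d %y")
print(labels.tolist())
print(isinstance(labels, pd.DatetimeIndex))   # False

# Parsing with the same format codes recovers the original timestamps.
parsed = pd.to_datetime(labels, format="%b %d %y")
print(bool((parsed == ts).all()))             # True
```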
Similarly, we can format the time coordinate of a DataArray into a string format:
olr_da.time.dt.strftime("%b %d %y")
<xarray.DataArray 'strftime' (time: 8760)> Size: 70kB
array(['Jan 01 98', 'Jan 02 98', 'Jan 03 98', ..., 'Dec 29 21', 'Dec 30 21',
       'Dec 31 21'], dtype=object)
Coordinates:
  * time     (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31
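Therefore, the datetime accessor of a time coordinate (olr_da.time.dt here) offers much of the same functionality as a pandas.DatetimeIndex, including attributes such as month and day and the strftime method.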
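The correspondence between the dt accessor and pandas.DatetimeIndex can be checked on a small array. In this sketch, da is a hypothetical 60-day series, and da.indexes["time"] is used to retrieve the pandas index that backs the time coordinate.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", periods=60)
da = xr.DataArray(np.zeros(time.size), coords={"time": time}, dims="time")

# The pandas index behind the xarray time coordinate.
idx = da.indexes["time"]
print(type(idx).__name__)   # DatetimeIndex

# The same attribute and the same method give matching results on both objects.
same_month = np.array_equal(da.time.dt.month.values, idx.month)
same_labels = list(da.time.dt.strftime("%b %d %y").values) == list(idx.strftime("%b %d %y"))
print(same_month, same_labels)   # True True
```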
It is important to learn the strftime method because it is useful for formatting the time labels on time series plots or Hovmöller diagrams.
datetime and timedelta#
The xarray datetime accessor and pandas.DatetimeIndex both work with datetime objects, which are closely related to Python's built-in datetime.datetime.
datetime.datetime: A combination of a date and a time. Attributes: year, month, day, hour, minute, second, microsecond, and tzinfo. (datetime official documentation)
We can also perform arithmetic on datetime objects. For example, we can combine datetime.datetime and datetime.timedelta to obtain a date at a given offset from another date.
A timedelta object represents a duration, the difference between two dates or times. (datetime official documentation)
The following are some arithmetic and comparison rules for datetime.datetime and datetime.timedelta:
datetime2 = datetime1 + timedelta
datetime2 = datetime1 - timedelta
timedelta = datetime1 - datetime2
datetime1 < datetime2
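The rules above can be tried with Python's standard datetime module. The start date and the two-day step in this short sketch are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2000, 2, 28)
step = timedelta(days=2)

later = start + step     # datetime + timedelta -> datetime
earlier = start - step   # datetime - timedelta -> datetime
gap = later - earlier    # datetime - datetime -> timedelta

print(later)             # 2000-03-01 00:00:00 (2000 is a leap year, so Feb 29 is counted)
print(earlier)           # 2000-02-26 00:00:00
print(gap)               # 4 days, 0:00:00
print(earlier < later)   # True: datetimes compare chronologically
```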