10. pandas and seaborn: Statistical Plots
Interpreting climate data depends heavily on statistical methods, so it is important to learn how to visualize statistical information. Although xarray can read netCDF files and plot many kinds of maps, it has no built-in methods for statistical plots such as box plots, regression plots, or density maps, so we need an external library to create them. Here we introduce seaborn, a library for statistical data visualization. Its plotting interface is simple but highly functional, and the resulting figures look good. In this unit, we will learn how to plot with seaborn after performing calculations with xarray.
seaborn works mainly with pandas.DataFrame objects, for example tables read from .csv files. Before passing a dataset to seaborn, we need to tidy it into a structure that seaborn can interpret, namely long form or wide form. The data structures accepted by seaborn are described in the seaborn tutorial. In this unit, we focus on how to convert a DataArray/Dataset to a long form or wide form pandas.DataFrame and pass it to seaborn. Plotting methods and options are not covered in detail here; see the official seaborn website.
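To make the two layouts concrete, here is a minimal sketch with made-up numbers. The values and the column names `year`, `model`, and `rmse` are illustrative assumptions, not taken from the sample files. The same six values are stored once in long form and once in wide form.

```python
import pandas as pd

# Long form: one row per observation, one column per variable.
long_df = pd.DataFrame({
    "year":  [1998, 1998, 1999, 1999, 2000, 2000],
    "model": ["bom", "cma", "bom", "cma", "bom", "cma"],
    "rmse":  [21.8, 30.1, 37.0, 25.4, 7.3, 12.9],   # made-up values
})

# Wide form: the same values arranged as a 2-D table,
# with years as the index and models as the column labels.
wide_df = long_df.pivot(index="year", columns="model", values="rmse")

print(long_df)
print(wide_df)
```

In long form the table grows downward as observations are added; in wide form each new model would add a column instead.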
Data Structure of pandas
Like xarray, pandas handles labeled data (in fact, xarray was inspired by pandas). Depending on the dimensionality of the data, pandas provides two main data structures: Series and DataFrame.
Series
A Series is a one-dimensional labeled array that can hold elements of any data type. In pandas, the labels are called the index. A new Series is created as follows:
s = pd.Series(data, index=index)
A Series is created from the data and an index of labels. Here is a simple example; more detailed usage can be found in the pandas tutorial.
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
s
a -1.538341
b 0.260428
c -0.999467
d -0.404520
e 0.528400
dtype: float64
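Because a Series carries its index, values can be selected by label, and arithmetic between two Series matches values by label rather than by position. The following is a small sketch with fixed values chosen only for illustration.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["c", "b", "a"])  # same labels, different order

one_value = s1["b"]              # select a single value by label
two_values = s1.loc[["a", "c"]]  # select several labels at once

# Addition aligns the two Series on their labels, not on their positions.
total = s1 + s2
print(total)
```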
DataFrame
A DataFrame is a 2-dimensional labeled array. It is similar to an Excel sheet. The method to create a new DataFrame is:
df = pd.DataFrame(data, index=index, columns=columns)
Index is the label for each row, whereas columns is the label for each column.
d = np.random.randn(5,3)
df = pd.DataFrame(d, index=['a','b','c','d','e'], columns=['one','two','three'])
df
 | one | two | three |
---|---|---|---|
a | 0.312066 | 0.547977 | 0.269142 |
b | 0.381498 | 0.387017 | 1.601161 |
c | 0.014665 | 2.131940 | -0.909217 |
d | 0.630053 | 1.009775 | 0.922236 |
e | 1.251739 | 0.538616 | -0.149302 |
Alternatively, we can create a DataFrame from a dictionary. The keys of the dictionary are used as the column labels (bom, cma, ecmwf, and ncep in the following example).
df = pd.DataFrame(dict(bom=np.random.randn(10),
                       cma=np.random.randn(10),
                       ecmwf=np.random.randn(10),
                       ncep=np.random.randn(10)),
                  index=range(1998, 2008))
df
 | bom | cma | ecmwf | ncep |
---|---|---|---|---|
1998 | 0.841404 | -0.296407 | -0.591968 | -1.519123 |
1999 | 0.036797 | -0.230799 | -0.640525 | 1.352359 |
2000 | 1.112564 | 0.787573 | -1.023185 | 1.869436 |
2001 | 0.951301 | 1.492301 | -1.219619 | -1.177986 |
2002 | -0.868670 | -0.218609 | -0.510641 | 0.189584 |
2003 | -0.369046 | -1.118256 | -0.692269 | -1.117366 |
2004 | 0.573508 | 1.112475 | 0.230167 | -0.100584 |
2005 | -0.630151 | 1.322432 | 0.239550 | 0.247403 |
2006 | 1.585305 | -1.136694 | -0.105676 | -0.765231 |
2007 | -0.062118 | -1.985050 | 0.406856 | -1.706546 |
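Rows and columns of a DataFrame are also selected by label. The sketch below builds a small table shaped like the one above, with fixed made-up values so the results are reproducible, and shows a few common selections. The name `df_demo` is used only for this illustration.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"bom": [0.8, 0.0, 1.1], "cma": [-0.3, -0.2, 0.8]},  # made-up values
    index=range(1998, 2001),
)

col = df_demo["bom"]              # one column -> Series
row = df_demo.loc[1999]           # one row, selected by index label -> Series
val = df_demo.loc[2000, "cma"]    # a single element
col_means = df_demo.mean()        # column-wise mean -> Series indexed by column label

print(col_means)
```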
Read .csv File with pandas
We can read a .csv file with pandas.read_csv(), which returns the file contents as a pandas.DataFrame.
Example 1a: Tsai et al. (2021, Atmosphere) defined a “Subseasonal Peak Rainfall Event (SPRE)” as the 15-day period with maximum rainfall within a season. The SPRE is a useful index for evaluating how well forecast models predict extreme rainfall. Tsai et al. (2021) identified the SPRE of the DJF season for each year from 1998 to 2019 and calculated the percentile rank (PR) of its 15-day accumulated rainfall in both observations and model reforecasts. The difference in PR between observations and models is quantified by the root-mean-squared error (RMSE). The sample file sns_sample_s2s_pr_rmse.csv contains the RMSEs of the SPRE PRs in the BoM and CMA reforecasts over the first 15 days of prediction lead time.
import pandas as pd
df = pd.read_csv("data/sns_sample_s2s_pr_rmse.csv")
df.head()
 | Models | Lead time (days) | Year | PR_RMSE |
---|---|---|---|---|
0 | BoM | 1.0 | 1998.0 | 21.78 |
1 | BoM | 1.0 | 1999.0 | 36.98 |
2 | BoM | 1.0 | 2000.0 | 7.25 |
3 | BoM | 1.0 | 2001.0 | 13.18 |
4 | BoM | 1.0 | 2002.0 | 19.64 |
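pandas.read_csv accepts a path or any file-like object. The sketch below reads a few made-up rows, laid out with the same column names as the sample file, from an in-memory string so that it runs without the data file. It then checks the column types, which is worth doing before plotting. The `astype` conversion is an optional tidy-up assumed here for illustration, not a step the workflow above requires.

```python
import io
import pandas as pd

# A few made-up rows that mimic the layout of the sample CSV.
csv_text = """Models,Lead time (days),Year,PR_RMSE
BoM,1.0,1998.0,20.0
BoM,2.0,1998.0,25.0
CMA,1.0,1998.0,30.0
CMA,2.0,1998.0,28.0
"""

demo = pd.read_csv(io.StringIO(csv_text))
print(demo.dtypes)

# Lead time and year were read as floats; converting them to integers
# makes the category labels read "1", "2" instead of "1.0", "2.0".
demo = demo.astype({"Lead time (days)": int, "Year": int})
print(demo.head())
```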
Plotting Long Form pandas.DataFrame with seaborn
The .csv file above is already in the long form accepted by seaborn, so we can plot the data directly.
Example 1b: Use a box plot to show how the RMSE of the SPRE PRs (abbreviated as PR_RMSE in the file) changes with prediction lead time. The x-axis is the prediction lead time, the y-axis is the magnitude of the RMSE, and the spread of each box shows the multi-year distribution of PR_RMSE.
import matplotlib as mpl
from matplotlib import pyplot as plt
import seaborn as sns
mpl.rcParams['figure.dpi'] = 100
sns.set_theme(style="white", palette=None)
fig, ax = plt.subplots(figsize=(8,4))
bxplt = sns.boxplot(data=df,
x='Lead time (days)', y='PR_RMSE',
ax=ax,
hue='Models',
palette="Set3")
ax.set_ylabel("PR_RMSE")
plt.show()
In the code above, we tell seaborn which columns to use for the x and y axes. The hue parameter adds a third dimension to the plot by assigning a different color to each category or value of the specified variable.
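A box plot summarizes each group by its quartiles. As a rough sketch of what each box represents, the same grouping can be computed directly with pandas. The values below are made up, and the column names simply follow the sample file.

```python
import pandas as pd

demo = pd.DataFrame({
    "Models": ["BoM"] * 4 + ["CMA"] * 4,
    "Lead time (days)": [1, 1, 1, 1] * 2,
    "PR_RMSE": [10.0, 20.0, 30.0, 40.0, 5.0, 15.0, 25.0, 35.0],  # made-up values
})

# One box is drawn per (x, hue) combination. Its edges are the 25th and 75th
# percentiles of the group, and the line inside it is the median.
quartiles = (demo.groupby(["Lead time (days)", "Models"])["PR_RMSE"]
                 .quantile([0.25, 0.5, 0.75])
                 .unstack())
print(quartiles)
```

Each row of `quartiles` corresponds to one box in a plot with `x='Lead time (days)'` and `hue='Models'`.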
We can also use the col option to spread the plot over multiple subplots. To do this, we need to use catplot instead of boxplot, because setting col draws on a FacetGrid rather than on a single subplot.
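sns.set_theme(style="white", palette=None)
# catplot returns a FacetGrid with one subplot per model
bxplt = sns.catplot(data=df,
                    x='Lead time (days)', y='PR_RMSE',
                    kind='box', col='Models',
                    hue='Models',
                    palette="Set3")
# Label the y-axis through the FacetGrid; `ax` belongs to the previous figure
bxplt.set_ylabels("PR_RMSE")
plt.show()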
Conversion of xarray.DataArray to pandas.DataFrame
xarray.DataArray.to_pandas
The API reference states that the converted pandas data type is determined by the dimensions of the DataArray.
Convert this array into a pandas object with the same shape. The type of the returned object depends on the number of DataArray dimensions:
0D -> xarray.DataArray
1D -> pandas.Series
2D -> pandas.DataFrame
Only works for arrays with 2 or fewer dimensions.
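The dimension rule quoted above can be checked on small arrays. This is a minimal sketch with arbitrary shapes and made-up coordinate names (`x`, `y`, `t`).

```python
import numpy as np
import xarray as xr

da1 = xr.DataArray(np.arange(3.0), dims="x", coords={"x": [10, 20, 30]})
da2 = xr.DataArray(np.zeros((2, 3)), dims=("y", "x"),
                   coords={"y": ["a", "b"], "x": [10, 20, 30]})
da3 = xr.DataArray(np.zeros((2, 2, 3)), dims=("t", "y", "x"))

obj1 = da1.to_pandas()   # 1D -> pandas.Series indexed by x
obj2 = da2.to_pandas()   # 2D -> pandas.DataFrame with y as index and x as columns
print(type(obj1).__name__, type(obj2).__name__)

# A 3D array cannot be mapped onto a pandas object with the same shape.
try:
    da3.to_pandas()
    failed = False
except ValueError as err:
    failed = True
    print("to_pandas failed:", err)
```

For arrays with more than two dimensions, one common option is to reduce or select along a dimension first so that at most two remain.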
Example: Scatter Plot and Regression Line
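Example 3: Draw a scatter plot of the Western Pacific Subtropical High (WPSH) index against domain-averaged rainfall over the Yangtze River Basin (105.5˚-122˚E, 27˚-33.5˚N) during the MJJ season, and calculate the regression line. The WPSH index is calculated from the 850-hPa zonal wind. Following the calculation in Step 2 below, it is the difference between the area-mean zonal wind in a northern band and in a southern band:
$$\mathrm{WPSH} = \overline{U_{850}}\,(28^\circ\text{-}30^\circ\mathrm{N},\ 115^\circ\text{-}140^\circ\mathrm{E}) - \overline{U_{850}}\,(15^\circ\text{-}17^\circ\mathrm{N},\ 115^\circ\text{-}140^\circ\mathrm{E})$$
The scatter plot shows the relationship between the two variables (rainfall and WPSH). The two variables are stored in two DataArrays, which can be merged into one Dataset. We then convert the Dataset to a pandas.DataFrame so that we can plot it with seaborn.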
Step 1: Read and select rainfall and wind data.
import xarray as xr
pcpds = xr.open_dataset('data/cmorph_sample.nc')
pcp = (pcpds.sel(time=slice('1998-01-01','2018-12-31'),
lat=slice(27,33.5),
lon=slice(105.5,122)).cmorph)
pcp
<xarray.DataArray 'cmorph' (time: 7670, lat: 26, lon: 66)> Size: 53MB
[13161720 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 61kB 1998-01-01 1998-01-02 ... 2018-12-31
  * lon      (lon) float32 264B 105.6 105.9 106.1 106.4 ... 121.4 121.6 121.9
  * lat      (lat) float32 104B 27.12 27.38 27.62 27.88 ... 32.88 33.12 33.38
Attributes:
    standard_name:  lwe_precipitation_rate
    long_name:      precipitation
    units:          mm/day
    ver_note:       1998-2020: V1,0; 2021: V0.x.
    comment:        !!! CMORPH estimate is rainrate !!!
uds = xr.open_mfdataset( 'data/ncep_r2_uv850/u850.*.nc',
combine = "nested",
concat_dim='time',
parallel=True
)
u = uds.sel(time=slice('1998-01-01','2018-12-31'),
level=850,
lat=slice(30,15),
lon=slice(115,140)).uwnd.load()
u
<xarray.DataArray 'uwnd' (time: 7670, lat: 7, lon: 11)> Size: 2MB
array([[[ 1.6199951 , 2.199997 , 3.7700043 , ..., 10.75 , 11.020004 , 10.369995 ],
        [ 2.7700043 , 3.2200012 , 4.399994 , ..., 10.520004 , 8.569992 , 5.7700043 ],
        [ 4.0200043 , 4.319992 , 4.75 , ..., 6.869995 , 3.7700043 , 0.09999084],
        ...,
        [ 0.05000305, 0.02999878, 0.02999878, ..., -2.8600006 , -2.7100067 , -3.3600006 ],
        [ -3.380005 , -3.2900085 , -3.0800018 , ..., -5.630005 , -4.7599945 , -4.630005 ],
        [ -5.2900085 , -4.9100037 , -4.8099976 , ..., -6.6100006 , -6.8399963 , -7.2299957 ]],
       [[ 6.849991 , 6.6399994 , 5.5 , ..., 8.169998 , 9.75 , 10.599991 ],
        [ 8.669998 , 7.0899963 , 5. , ..., 7.849991 , 9.669998 , 11.099991 ],
        [ 6.9900055 , 5.1900024 , 3.3399963 , ..., 3.9900055 , 4.619995 , 5.7400055 ],
        ...
        [-10.199997 , -12.449997 , -12.220001 , ..., -6.300003 , -4.75 , -3.600006 ],
        [-14.470001 , -15.869995 , -14.449997 , ..., -9.869995 , -8.770004 , -8.169998 ],
        [-15.949997 , -17.270004 , -16.350006 , ..., -11.720001 , -11.169998 , -10.869995 ]],
       [[ -0.19999695, 0.02999878, 0.13000488, ..., 2.649994 , 2.7799988 , 3. ],
        [ -1.3500061 , -2.4700012 , -3.2200012 , ..., 1.6499939 , 1.300003 , 0.94999695],
        [ -2.869995 , -5.220001 , -6.669998 , ..., -1.0700073 , -1.449997 , -1.75 ],
        ...,
        [-10.199997 , -11.850006 , -12. , ..., -7.650009 , -6.970001 , -6.050003 ],
        [-14.070007 , -13.949997 , -12.600006 , ..., -9.619995 , -9. , -8.150009 ],
        [-16.15001 , -14.350006 , -12.25 , ..., -10.720001 , -10.5 , -10.220001 ]]], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 61kB 1998-01-01 1998-01-02 ... 2018-12-31
  * lon      (lon) float32 44B 115.0 117.5 120.0 122.5 ... 135.0 137.5 140.0
  * lat      (lat) float32 28B 30.0 27.5 25.0 22.5 20.0 17.5 15.0
    level    float32 4B 850.0
Attributes: (12/14)
    standard_name:         eastward_wind
    long_name:             Daily U-wind on Pressure Levels
    units:                 m/s
    unpacked_valid_range:  [-140. 175.]
    actual_range:          [-78.96 110.35]
    precision:             2
    ...                    ...
    var_desc:              u-wind
    dataset:               NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...
    level_desc:            Pressure Levels
    statistic:             Mean
    parent_stat:           Individual Obs
    cell_methods:          time: mean (of 4 6-hourly values in one day)
Step 2: Statistical analysis. Calculate the area-mean rainfall and the WPSH index, then select the season we need.
pcpts = (pcp.mean(axis=(1,2))
.sel(time=~((pcp.time.dt.month == 2) & (pcp.time.dt.day == 29)))
)
pcp_ptd = pcpts.coarsen(time=5,side='left', coord_func={"time": "min"}).mean() # compute the pentad (5-day) mean
pcp_ptd_mjj = pcp_ptd.sel(time=(pcp_ptd.time.dt.month.isin([5,6,7])))
pcp_ptd_mjj
<xarray.DataArray 'cmorph' (time: 399)> Size: 2kB array([ 4.8970165 , 7.9518876 , 6.319289 , 2.4725406 , 5.27035 , 2.170851 , 4.783776 , 3.937529 , 11.973356 , 10.144266 , 11.061295 , 13.090139 , 9.517157 , 2.0094523 , 3.4387412 , 6.7226806 , 11.502437 , 10.5547905 , 9.479826 , 2.9567833 , 3.5055594 , 4.1681933 , 6.725932 , 11.227705 , 2.9874592 , 2.4019814 , 4.487879 , 2.9615617 , 9.328345 , 8.361376 , 17.8 , 5.0629954 , 9.045652 , 4.8956294 , 9.328997 , 3.7565734 , 3.8314571 , 4.111655 , 0.3021329 , 3.6132984 , 3.5981002 , 2.3257692 , 7.2976217 , 5.5318065 , 13.691736 , 7.9762588 , 4.2428904 , 3.8893006 , 13.693474 , 6.9901514 , 9.405932 , 4.689091 , 6.358776 , 3.2129135 , 2.252133 , 7.383217 , 2.8664687 , 4.964475 , 4.936305 , 1.1263635 , 2.2311187 , 2.2804198 , 4.0579023 , 8.117238 , 6.8227034 , 5.0488696 , 8.666551 , 5.666422 , 4.5268536 , 3.3424478 , 3.6482053 , 6.963858 , 1.558951 , 3.8080535 , 2.550478 , 5.6249647 , 9.637308 , 6.783858 , 12.585258 , 4.844767 , 1.3790094 , 4.6789045 , 1.5510608 , 7.1460257 , 5.1195927 , 7.1560845 , 12.34908 , 12.694429 , 3.5173545 , 2.4259324 , 0.28142193, 9.145932 , 12.452902 , 4.6736712 , 5.1581006 , 6.8205705 , 4.7389283 , 9.617587 , 3.4697907 , 3.9260025 , ... 
5.9538927 , 5.01324 , 0.9436598 , 4.4753027 , 6.041911 , 6.934068 , 5.07711 , 4.8284035 , 8.588112 , 1.5448952 , 10.228043 , 1.7740676 , 1.9260607 , 7.228357 , 8.549138 , 6.3187065 , 12.640233 , 1.4202797 , 11.316084 , 7.279161 , 4.142191 , 3.5260372 , 2.4751165 , 4.1878905 , 7.2561064 , 11.466655 , 4.0412 , 1.946387 , 7.6451874 , 10.063392 , 6.7887535 , 6.0969815 , 12.194288 , 8.405164 , 10.297377 , 7.8729143 , 2.699021 , 4.153112 , 5.390607 , 10.084627 , 3.1376457 , 2.0847204 , 9.51324 , 11.786633 , 3.1671562 , 5.3714223 , 4.1579022 , 5.4215965 , 14.319791 , 3.3686714 , 4.624697 , 13.882413 , 12.359255 , 10.334114 , 21.119303 , 5.0222025 , 7.4891496 , 11.105817 , 0.966352 , 3.1370628 , 6.3480887 , 4.5376463 , 4.9669466 , 7.596236 , 1.2960141 , 4.0143824 , 0.98304194, 9.380851 , 8.573579 , 11.14049 , 4.2023306 , 13.881643 , 9.043998 , 9.088369 , 9.648952 , 4.0862937 , 3.827599 , 0.41491842, 1.7904428 , 4.379033 , 6.227797 , 4.3578324 , 1.9046853 , 12.030944 , 7.8016205 , 5.260373 , 2.5838928 , 3.4374127 , 2.6103146 , 6.182331 , 4.548438 , 7.4217596 , 12.894266 , 7.690734 , 2.6264687 , 0.7665617 , 2.525 , 7.5979834 , 5.8932753 ], dtype=float32) Coordinates: * time (time) datetime64[ns] 3kB 1998-05-01 1998-05-06 ... 2018-07-30
ushear = ( u.sel(lat=slice(30,28)).mean(axis=(1,2)) -
u.sel(lat=slice(17,15)).mean(axis=(1,2)) )
ushear_ts = ushear.sel(time=~((ushear.time.dt.month == 2) & (ushear.time.dt.day == 29)))
us_ptd = ushear_ts.coarsen(time=5,side='left', coord_func={"time": "min"}).mean() # compute the pentad (5-day) mean
us_ptd_mjj = us_ptd.sel(time=(us_ptd.time.dt.month.isin([5,6,7])))
us_ptd_mjj
<xarray.DataArray 'uwnd' (time: 399)> Size: 2kB array([ 1.08629093e+01, 1.14845448e+01, 1.27180004e+01, 7.67090893e+00, -9.08927250e+00, 1.62546352e-01, 6.54854584e+00, 3.03854513e+00, 1.47341814e+01, 1.10549088e+01, 9.51418018e+00, 1.52740021e+01, 1.17298174e+01, 2.34254599e+00, 2.37345576e+00, 7.63490915e+00, 4.29800034e+00, 6.97127247e+00, 4.96727324e+00, 2.77472639e+00, 1.08218277e+00, 4.87781811e+00, 8.18581676e+00, 1.13078175e+01, 6.89981747e+00, -2.22527266e+00, 3.71763611e+00, 1.82545400e+00, 1.15070896e+01, 9.05963612e+00, 1.38921814e+01, 3.99600029e+00, 5.89817762e-01, -4.41818285e+00, 2.03655250e-02, -1.17494535e+01, -5.74236393e+00, -1.61383629e+01, 3.51745367e+00, -4.62781763e+00, 2.14727211e+00, -1.37327230e+00, -1.08218157e+00, 1.28032742e+01, 1.37232723e+01, 6.91254520e+00, 1.41690862e+00, 4.48781872e+00, 1.29796352e+01, 1.31552734e+01, -3.22763681e+00, -1.13463602e+01, 5.61654615e+00, -1.32600141e+00, 3.53109050e+00, -3.57618141e+00, -1.29454046e-01, 3.42327356e+00, 6.09636402e+00, 5.71090996e-01, 3.73472667e+00, 2.64709067e+00, 4.57490826e+00, 8.40818024e+00, 7.98672724e+00, 3.31927180e+00, 1.02049084e+01, 7.53345490e+00, 1.54910917e+01, 1.94982028e+00, 8.39455009e-01, -2.55490899e+00, -1.37581897e+00, -6.59454703e-01, -4.54090881e+00, -2.80399990e+00, 7.21654510e+00, 1.01392727e+01, 8.40781975e+00, 2.78236127e+00, ... 
-3.35909081e+00, -2.14472914e+00, -1.34690914e+01, 6.40454388e+00, 8.35018063e+00, 1.52945461e+01, 5.94981766e+00, 4.46163559e+00, 6.36672592e+00, 1.17292728e+01, 1.14947262e+01, 1.54529085e+01, 1.32661800e+01, 7.82145548e+00, 1.52783642e+01, 5.10400009e+00, -1.55118198e+01, -1.47272739e+01, -6.11872625e+00, 4.78290796e+00, 7.42691040e+00, 5.28327274e+00, 1.16412716e+01, 1.51381836e+01, 8.13599968e+00, 2.07236433e+00, -7.25818276e-01, 7.70108938e+00, 1.57998517e-01, 2.49309182e+00, 7.68818140e+00, 1.67116356e+01, 1.72925453e+01, 1.54359989e+01, 1.11898193e+01, -3.40909266e+00, 1.10725451e+01, 1.00247269e+01, 2.75854492e+00, 1.69309044e+00, -3.19054580e+00, 5.67236233e+00, 1.40927277e+01, 1.23472738e+01, 6.06909454e-01, -4.36436462e+00, 3.72109175e+00, 1.00218236e+00, 8.18345451e+00, 5.64036417e+00, -1.42000675e-01, 1.12516356e+01, 1.03294554e+01, 1.05718184e+01, 1.32959995e+01, 8.56436348e+00, 1.11461811e+01, 1.51509058e+00, -1.21736355e+01, -1.27656345e+01, 1.33679991e+01, 9.29127407e+00, 8.62581921e+00, 7.17381811e+00, 5.08327198e+00, 8.33309078e+00, 4.55763721e+00, -1.31781673e+00, -6.21254635e+00, -6.81823716e-02, 1.00396366e+01, 5.55800009e+00, -1.58636284e+00, -2.48218107e+00, -5.53545380e+00, -1.40667267e+01, -1.13407259e+01, -6.04981899e+00, -6.60945368e+00], dtype=float32) Coordinates: * time (time) datetime64[ns] 3kB 1998-05-01 1998-05-06 ... 2018-07-30 level float32 4B 850.0
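The pentad calculation in Step 2 can be followed on one year of synthetic data. This is a sketch under simplified assumptions: the daily values are just a counter, and the names `daily`, `pentad`, and `pentad_mjj` are chosen here for illustration. It shows why 29 February is removed first (365 days divide evenly into 73 pentads) and how each pentad is labeled with its first day.

```python
import numpy as np
import pandas as pd
import xarray as xr

# One leap year of made-up daily data (366 days).
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
daily = xr.DataArray(np.arange(time.size, dtype=float), dims="time",
                     coords={"time": time}, name="demo")

# Drop 29 February so that the year has 365 days = 73 pentads.
is_feb29 = (daily.time.dt.month == 2) & (daily.time.dt.day == 29)
daily_365 = daily.sel(time=~is_feb29)

# Average non-overlapping 5-day blocks; label each block with its first day.
pentad = daily_365.coarsen(time=5, side="left", coord_func={"time": "min"}).mean()

# Keep only the pentads whose first day falls in May, June, or July.
pentad_mjj = pentad.sel(time=pentad.time.dt.month.isin([5, 6, 7]))
print(pentad.sizes["time"], pentad_mjj.sizes["time"])
```

Each year contributes 19 MJJ pentads, which matches the 399 values (21 years × 19) in the outputs above.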
Step 3: Convert the DataArrays to a long form DataFrame and pass it to seaborn for plotting. To draw a scatter plot with a regression line, use seaborn.regplot.
scatter_df = (xr.merge([pcp_ptd_mjj.rename('pcp'), us_ptd_mjj.rename('ushear')])
.to_dataframe())
scatter_df
time | pcp | level | ushear |
---|---|---|---|
1998-05-01 | 4.897017 | 850.0 | 10.862909 |
1998-05-06 | 7.951888 | 850.0 | 11.484545 |
1998-05-11 | 6.319289 | 850.0 | 12.718000 |
1998-05-16 | 2.472541 | 850.0 | 7.670909 |
1998-05-21 | 5.270350 | 850.0 | -9.089272 |
... | ... | ... | ... |
2018-07-10 | 2.626469 | 850.0 | -5.535454 |
2018-07-15 | 0.766562 | 850.0 | -14.066727 |
2018-07-20 | 2.525000 | 850.0 | -11.340726 |
2018-07-25 | 7.597983 | 850.0 | -6.049819 |
2018-07-30 | 5.893275 | 850.0 | -6.609454 |
399 rows × 3 columns
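The merge-and-convert step can be tried on two tiny arrays. The sketch below uses made-up values and the hypothetical names `pcp_demo` and `us_demo`. It also drops the scalar `level` coordinate before conversion, an optional step assumed here to avoid the constant `level` column seen in the table above.

```python
import pandas as pd
import xarray as xr

time = pd.date_range("1998-05-01", periods=4, freq="5D")
pcp_demo = xr.DataArray([4.9, 8.0, 6.3, 2.5], dims="time",
                        coords={"time": time}, name="pcp")       # made-up values
us_demo = xr.DataArray([10.9, 11.5, 12.7, 7.7], dims="time",
                       coords={"time": time, "level": 850.0}, name="ushear")

# Merging aligns the two arrays on their shared `time` coordinate.
merged = xr.merge([pcp_demo, us_demo])

# Drop the scalar `level` coordinate, then convert: each data variable
# becomes a column and `time` becomes the index.
demo_df = merged.drop_vars("level").to_dataframe()
print(demo_df)
```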
from scipy import stats

def corr(x, y):
    # Return the Pearson correlation coefficient and its p-value from a single call.
    r, p = stats.pearsonr(x, y)
    return r, p

# Compute the correlation coefficient and its statistical significance.
r, p = corr(us_ptd_mjj.values, pcp_ptd_mjj.values)

fig, ax = plt.subplots(figsize=(8,8))
sns.set_theme(style="white", palette=None)
plot = sns.regplot(x="ushear", y="pcp",
                   data=scatter_df,
                   ci=95, ax=ax)  # ci is the confidence level (%) of the shaded band
ax.set_title(f'$R=$ {r:5.3f}, $p=$ {p:8.2e}', loc='right' ) # Correlation coefficient and p-value
Text(1.0, 1.0, '$R=$ 0.394, $p=$ 3.07e-16')
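By default, seaborn.regplot draws an ordinary least-squares straight line through the points. As a rough cross-check of how that line relates to the correlation coefficient, the same quantities can be computed with NumPy alone. The data here are synthetic and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)                         # synthetic predictor
y = 0.5 * x + rng.normal(scale=1.0, size=200)    # synthetic response with noise

r = np.corrcoef(x, y)[0, 1]                  # Pearson correlation coefficient
slope, intercept = np.polyfit(x, y, deg=1)   # least-squares line y = slope * x + intercept

# For a simple linear regression, slope = r * std(y) / std(x).
slope_from_r = r * y.std() / x.std()
print(slope, slope_from_r)
```

The correlation coefficient measures how tightly the points follow the line, while the slope also depends on the units of the two variables.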
Plotting Wide Form pandas.DataFrame with seaborn
Example 4: Plot a heat map of pentad mean rainfall percentile ranks (PRs) from April to November in 1998-2020 over the Taiwan-northern South China Sea region (18˚-24˚N, 116˚-126˚E).
lats, latn = 18, 24
lon1, lon2 = 116, 126
pcp = pcpds.sel(time=slice('1998-01-01','2020-12-31'),
lat=slice(lats,latn),
lon=slice(lon1,lon2)).cmorph
pcp_ptd_ts = (pcp.mean(axis=(1,2))
.sel(time=~((pcp.time.dt.month == 2) & (pcp.time.dt.day == 29)))
.coarsen(time=5,side='left', coord_func={"time": "min"})
.sum())
pcp_season = pcp_ptd_ts.sel(time=(pcp_ptd_ts.time.dt.month.isin([4,5,6,7,8,9,10,11])))
pcp_rank = pcp_season.rank(dim='time',pct=True) * 100. # Use `DataArray.rank` to rank. `pct=True` can calculate percentiles.
pcp_rank_da = xr.DataArray(data=pcp_rank.values.reshape(23,49), # reshape the array to (year, pentad)
dims=["year", "pentad"],
coords=dict(
year = range(1998,2021,1),
pentad = range(19,68,1),
),
name='precip')
pcp_rank_da
<xarray.DataArray 'precip' (year: 23, pentad: 49)> Size: 9kB
array([[2.40461402e+01, 7.98580302e-01, 7.22271517e+01, ..., 6.67258208e+01, 8.33185448e+01, 3.62910382e+01],
       [3.99290151e+01, 1.87222715e+01, 6.03371783e+01, ..., 2.72404614e+01, 1.30434783e+01, 7.00976043e+01],
       [4.31233363e+01, 3.47826087e+01, 4.62289264e+01, ..., 6.34427684e+01, 4.42768412e+01, 6.29991127e+00],
       ...,
       [8.87311446e-02, 3.44276841e+01, 4.09937888e+01, ..., 1.56166815e+01, 5.49245785e+01, 3.02573203e+01],
       [4.30346051e+01, 2.66193434e-01, 5.42147294e+01, ..., 9.13930790e+01, 3.16770186e+01, 1.20674357e+01],
       [4.27684117e+01, 5.04880213e+01, 2.71517303e+01, ..., 1.48181012e+01, 1.27772848e+01, 4.57852706e+01]])
Coordinates:
  * year     (year) int64 184B 1998 1999 2000 2001 2002 ... 2017 2018 2019 2020
  * pentad   (pentad) int64 392B 19 20 21 22 23 24 25 ... 61 62 63 64 65 66 67
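The ranking and reshaping can be illustrated on six made-up values. The sketch below uses pandas `Series.rank(pct=True)` as a stand-in for `DataArray.rank(..., pct=True)`, on the assumption that both divide the rank by the sample size. The years and pentad numbers are illustrative labels only.

```python
import numpy as np
import pandas as pd

# Made-up pentad rainfall for 2 "years" x 3 "pentads", stored as one long series.
rain = pd.Series([10.0, 30.0, 20.0, 60.0, 50.0, 40.0])

# Rank over the whole series, expressed as a percentage of the sample size.
pr = rain.rank(pct=True) * 100.0

# Reshape to (year, pentad). This assumes the series is ordered year by year
# with the same number of pentads in every year.
pr_table = pd.DataFrame(pr.to_numpy().reshape(2, 3),
                        index=[1998, 1999], columns=[19, 20, 21])
print(pr_table)
```

With this definition the smallest value receives 100/N rather than 0, which is consistent with the minimum of about 0.0887 (100/1127) in the output above.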
pcp_rank_df = pcp_rank_da.to_pandas()
pcp_rank_df
pentad | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | ... | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
year | |||||||||||||||||||||
1998 | 24.046140 | 0.798580 | 72.227152 | 77.107365 | 32.475599 | 68.411713 | 51.197870 | 26.086957 | 45.519077 | 86.867791 | ... | 94.942325 | 43.300799 | 98.225377 | 15.527950 | 66.104703 | 52.440106 | 20.585626 | 66.725821 | 83.318545 | 36.291038 |
1999 | 39.929015 | 18.722272 | 60.337178 | 5.501331 | 7.897072 | 81.366460 | 70.275067 | 76.220053 | 55.723159 | 49.778172 | ... | 85.891748 | 64.951198 | 10.559006 | 24.312334 | 28.748891 | 13.487134 | 12.156167 | 27.240461 | 13.043478 | 70.097604 |
2000 | 43.123336 | 34.782609 | 46.228926 | 26.974268 | 34.605146 | 58.207631 | 39.219166 | 4.081633 | 70.541260 | 84.117125 | ... | 69.565217 | 48.890861 | 25.554570 | 96.628217 | 88.553682 | 65.838509 | 65.306122 | 63.442768 | 44.276841 | 6.299911 |
2001 | 39.751553 | 37.267081 | 37.799468 | 40.195209 | 61.579414 | 22.448980 | 37.089618 | 58.385093 | 92.280390 | 77.639752 | ... | 55.989352 | 5.944987 | 1.863354 | 26.619343 | 15.882875 | 57.409051 | 66.637090 | 6.654836 | 4.170364 | 2.040816 |
2002 | 4.259095 | 3.637977 | 21.916593 | 6.743567 | 3.992902 | 4.702751 | 0.177462 | 3.460515 | 33.895297 | 79.858030 | ... | 11.002662 | 49.423248 | 43.212067 | 49.068323 | 34.072760 | 50.399290 | 17.125111 | 62.999113 | 26.796806 | 25.820763 |
2003 | 48.003549 | 56.255546 | 20.496894 | 31.322094 | 78.615794 | 7.808341 | 37.533274 | 6.122449 | 11.268855 | 47.914818 | ... | 44.454303 | 11.712511 | 2.750665 | 61.490683 | 57.231588 | 27.861579 | 64.596273 | 50.842946 | 46.140195 | 8.340728 |
2004 | 44.099379 | 18.367347 | 21.118012 | 17.746229 | 5.678793 | 8.518190 | 9.671695 | 30.434783 | 19.698314 | 77.994676 | ... | 15.705413 | 38.509317 | 62.910382 | 1.685892 | 3.194321 | 13.753327 | 19.343390 | 42.945874 | 56.965395 | 28.926353 |
2005 | 7.187223 | 1.419698 | 26.264419 | 8.074534 | 18.988465 | 11.801242 | 17.480035 | 86.956522 | 61.668146 | 27.595386 | ... | 13.664596 | 9.405501 | 29.281278 | 56.166815 | 0.976043 | 42.502218 | 50.044366 | 66.015972 | 33.007986 | 50.931677 |
2006 | 2.218279 | 15.350488 | 46.406389 | 34.960071 | 19.254658 | 49.334516 | 27.950311 | 3.105590 | 69.476486 | 81.543922 | ... | 5.590062 | 11.446318 | 33.984028 | 87.133984 | 72.848270 | 29.547471 | 35.492458 | 54.303461 | 21.827862 | 33.096717 |
2007 | 51.020408 | 33.185448 | 5.767524 | 39.396628 | 27.772848 | 13.398403 | 32.386868 | 52.706300 | 30.967169 | 74.090506 | ... | 30.612245 | 12.954747 | 20.053239 | 54.835847 | 91.481810 | 75.421473 | 7.364685 | 25.199645 | 87.488909 | 43.833185 |
2008 | 12.333629 | 3.904170 | 12.599823 | 48.269743 | 51.641526 | 43.389530 | 46.850044 | 30.079858 | 56.433008 | 71.960958 | ... | 72.315883 | 7.630878 | 14.374445 | 10.825200 | 19.964508 | 93.611358 | 32.209406 | 54.037267 | 62.732919 | 47.204969 |
2009 | 30.700976 | 38.952972 | 50.754215 | 81.721384 | 90.683230 | 40.550133 | 12.688554 | 55.013310 | 0.532387 | 20.763088 | ... | 59.982254 | 72.049689 | 70.629991 | 33.362910 | 60.425909 | 4.525288 | 48.624667 | 46.761313 | 22.626442 | 19.432121 |
2010 | 14.640639 | 35.314996 | 32.564330 | 24.134871 | 33.629104 | 76.841171 | 64.063886 | 21.472937 | 24.755989 | 9.849157 | ... | 67.701863 | 96.716948 | 63.975155 | 23.336291 | 78.083407 | 72.759539 | 80.745342 | 36.645963 | 41.171251 | 10.204082 |
2011 | 10.647737 | 2.484472 | 1.153505 | 58.473824 | 18.101154 | 18.012422 | 10.026619 | 73.469388 | 26.530612 | 86.335404 | ... | 87.577640 | 44.986690 | 29.458740 | 33.451642 | 67.435670 | 94.676131 | 80.035492 | 65.572316 | 50.221828 | 68.677906 |
2012 | 31.588287 | 36.557232 | 1.597161 | 56.699201 | 22.981366 | 64.507542 | 37.976930 | 19.875776 | 54.392192 | 97.249335 | ... | 13.575865 | 3.726708 | 37.444543 | 59.804791 | 69.653949 | 12.244898 | 52.085182 | 59.094942 | 35.758651 | 39.041704 |
2013 | 59.538598 | 64.685004 | 31.410825 | 24.578527 | 29.724933 | 57.941437 | 75.332742 | 28.305235 | 66.193434 | 72.138421 | ... | 39.840284 | 14.285714 | 23.425022 | 76.929902 | 48.447205 | 35.048802 | 42.857143 | 23.070098 | 41.259982 | 42.058563 |
2014 | 49.955634 | 24.489796 | 7.275954 | 0.709849 | 10.292813 | 30.346051 | 64.862467 | 73.824312 | 36.202307 | 37.000887 | ... | 7.985803 | 35.403727 | 34.693878 | 4.436557 | 16.592724 | 9.937888 | 66.903283 | 51.286602 | 16.503993 | 41.881100 |
2015 | 25.643301 | 42.324756 | 39.574091 | 34.871340 | 41.703638 | 16.149068 | 10.115350 | 59.449867 | 38.065661 | 52.883762 | ... | 52.173913 | 85.714286 | 27.417924 | 17.213842 | 54.569654 | 31.144632 | 20.408163 | 28.482697 | 41.614907 | 25.377107 |
2016 | 0.621118 | 40.638864 | 67.169476 | 9.228039 | 29.192547 | 37.355812 | 13.220941 | 20.230701 | 50.310559 | 58.651287 | ... | 69.387755 | 80.390417 | 8.163265 | 46.938776 | 14.729370 | 25.022183 | 16.770186 | 45.430346 | 74.445430 | 81.810115 |
2017 | 19.165927 | 1.952085 | 47.116238 | 26.885537 | 59.361136 | 36.379769 | 10.913931 | 11.623780 | 50.133097 | 95.563443 | ... | 90.417036 | 24.667258 | 11.978705 | 28.837622 | 86.069210 | 64.330080 | 47.648625 | 48.802130 | 46.583851 | 53.149956 |
2018 | 0.088731 | 34.427684 | 40.993789 | 18.456078 | 37.178350 | 6.921029 | 11.091393 | 65.394854 | 4.791482 | 1.064774 | ... | 46.495120 | 28.393966 | 11.357587 | 91.215617 | 23.158829 | 1.774623 | 16.681455 | 15.616681 | 54.924579 | 30.257320 |
2019 | 43.034605 | 0.266193 | 54.214729 | 69.742680 | 8.606921 | 29.015084 | 78.172138 | 75.865129 | 47.293700 | 46.051464 | ... | 24.223602 | 14.995563 | 19.520852 | 37.888199 | 88.819876 | 47.826087 | 38.243123 | 91.393079 | 31.677019 | 12.067436 |
2020 | 42.768412 | 50.488021 | 27.151730 | 1.330967 | 47.471162 | 29.103815 | 3.549246 | 4.880213 | 42.590949 | 89.263531 | ... | 45.607808 | 84.472050 | 79.680568 | 53.238687 | 74.001775 | 76.131322 | 77.817214 | 14.818101 | 12.777285 | 45.785271 |
23 rows × 49 columns
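The DataFrame above is in wide form.
In a long form DataFrame, each variable has its own column and each row is one observation, so labels such as year and pentad are stored as ordinary columns. In contrast, a wide form DataFrame spreads the values over a two-dimensional table whose index and columns both act as labels, as year and pentad do here.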
# Plot
fig, ax = plt.subplots(figsize=(12,8))
sns.set_theme()
ax = sns.heatmap(pcp_rank_df,
cmap='jet_r',
square=True,
vmin=1,vmax=100,
cbar_kws={"shrink": 0.55, 'extend':'neither'},
xticklabels=2)
plt.xlabel("Pentad")
plt.ylabel("Years")
ax.set_title("Taiwan-Northern SCS, April to November",loc='left')
plt.savefig("pcp_pr_heatmap_obs_chn.png",orientation='portrait',dpi=300)
Conversion between Long Form and Wide Form
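To convert a wide form DataFrame to long form, use pandas.DataFrame.unstack. It moves the column labels into the index, and reset_index then turns the index levels into ordinary columns.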
pcp_rank_long = pcp_rank_df.unstack().reset_index(name='PR')
pcp_rank_long
 | pentad | year | PR |
---|---|---|---|
0 | 19 | 1998 | 24.046140 |
1 | 19 | 1999 | 39.929015 |
2 | 19 | 2000 | 43.123336 |
3 | 19 | 2001 | 39.751553 |
4 | 19 | 2002 | 4.259095 |
... | ... | ... | ... |
1122 | 67 | 2016 | 81.810115 |
1123 | 67 | 2017 | 53.149956 |
1124 | 67 | 2018 | 30.257320 |
1125 | 67 | 2019 | 12.067436 |
1126 | 67 | 2020 | 45.785271 |
1127 rows × 3 columns
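The conversion also works in the other direction. As a minimal sketch with a made-up 2 × 2 table, the long form produced by `unstack` can be turned back into wide form with `DataFrame.pivot`. The names `wide`, `long_df`, and `wide_again` are illustrative.

```python
import pandas as pd

wide = pd.DataFrame([[24.0, 0.8], [39.9, 18.7]],
                    index=pd.Index([1998, 1999], name="year"),
                    columns=pd.Index([19, 20], name="pentad"))   # made-up values

# Wide -> long: column labels move into the index, then become ordinary columns.
long_df = wide.unstack().reset_index(name="PR")

# Long -> wide: pivot the long table back into a year x pentad table.
wide_again = long_df.pivot(index="year", columns="pentad", values="PR")

print(long_df)
print(wide_again)
```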