10. pandas and seaborn: Statistical Plots

10. `pandas` and `seaborn`: Statistical Plots#

Interpreting climate data relies heavily on statistical methods, so it is important to learn how to visualize statistical information. Although xarray can read netCDF files and draw many kinds of maps, it has no built-in methods for statistical plots such as box plots, regression plots, or density plots, so we need an external library for these. Here we introduce seaborn, a library for statistical data visualization built on matplotlib. Its plotting functions are concise yet flexible, and the default output looks good. In this unit, we will learn how to plot with seaborn after performing calculations with xarray.

seaborn works mainly with pandas.DataFrame objects, which can be built in memory or read from files such as .csv. Before passing a dataset to seaborn, we need to arrange it in a structure that seaborn recognizes: either long form or wide form. Both structures are described on the seaborn tutorial page. In this unit, we will focus on how to convert a DataArray/Dataset to a long form or wide form pandas.DataFrame and pass it to seaborn. For plotting methods and options, which we do not cover in detail here, see the official seaborn website.
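To make the two layouts concrete, here is a minimal sketch that stores the same small table in long form and in wide form. The column names (`model`, `lead`, `rmse`) and all numbers are made up for illustration; they are not taken from the sample data used later.

```python
import pandas as pd

# Hypothetical RMSE values for two models at two lead times (illustrative only).
long_df = pd.DataFrame({
    "model": ["BoM", "BoM", "CMA", "CMA"],
    "lead": [1, 2, 1, 2],
    "rmse": [21.0, 25.0, 19.0, 27.0],
})

# Long form: one row per observation, one column per variable.
print(long_df)

# Wide form: one row per lead time, one column per model, values in the cells.
wide_df = long_df.pivot(index="lead", columns="model", values="rmse")
print(wide_df)
```

Both tables hold the same four numbers; only the place where the labels live differs.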

Data Structure of `pandas`#

Like xarray, pandas handles labeled data (in fact, xarray was inspired by pandas and is built on top of it). pandas provides two main data structures, distinguished by dimensionality: Series (1-dimensional) and DataFrame (2-dimensional).

Series#

A Series is a 1-dimensional labeled array that can hold any data type (integers, floats, strings, Python objects, and so on). In pandas, the labels are collectively called the index. A new Series is created as follows:

s = pd.Series(data, index=index)

A Series is created from the data values and an index of labels. Here is a simple example; more detailed usage can be found in the pandas tutorial.

import numpy as np 
import pandas as pd 

s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
s

a   -1.538341
b    0.260428
c   -0.999467
d   -0.404520
e    0.528400
dtype: float64
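The index is what makes a Series more than a plain array. The following sketch, using two small made-up Series, shows selection by label and the way arithmetic aligns values on their labels.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Select by label rather than by position.
b_value = s1["b"]

# Arithmetic aligns on the index; labels missing from either side give NaN.
total = s1 + s2
print(total)
```

Only the labels shared by both Series (`b` and `c`) produce numbers; `a` and `d` become NaN.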

DataFrame#

A DataFrame is a 2-dimensional labeled array. It is similar to an Excel sheet. The method to create a new DataFrame is:

df = pd.DataFrame(data, index=index, columns=columns)

Here index holds the label of each row, and columns holds the label of each column.

d = np.random.randn(5,3)
df = pd.DataFrame(d, index=['a','b','c','d','e'], columns=['one','two','three'])
df

	one	two	three
a	0.312066	0.547977	0.269142
b	0.381498	0.387017	1.601161
c	0.014665	2.131940	-0.909217
d	0.630053	1.009775	0.922236
e	1.251739	0.538616	-0.149302

Or we can create a DataFrame using a dictionary. The keys of the dictionary will be treated as column labels (bom, cma, ecmwf, and ncep in the following example).

df = pd.DataFrame(dict(bom=np.random.randn(10),
                       cma=np.random.randn(10),
                       ecmwf=np.random.randn(10),
                       ncep=np.random.randn(10)), 
                  index=range(1998,2008)
                  )
df

	bom	cma	ecmwf	ncep
1998	0.841404	-0.296407	-0.591968	-1.519123
1999	0.036797	-0.230799	-0.640525	1.352359
2000	1.112564	0.787573	-1.023185	1.869436
2001	0.951301	1.492301	-1.219619	-1.177986
2002	-0.868670	-0.218609	-0.510641	0.189584
2003	-0.369046	-1.118256	-0.692269	-1.117366
2004	0.573508	1.112475	0.230167	-0.100584
2005	-0.630151	1.322432	0.239550	0.247403
2006	1.585305	-1.136694	-0.105676	-0.765231
2007	-0.062118	-1.985050	0.406856	-1.706546
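Once the labels are in place, rows and columns can be selected by name. The sketch below rebuilds a similar DataFrame with a seeded random generator (the name `df_demo` and the seed are choices made here for reproducibility) and shows a few common selections.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
df_demo = pd.DataFrame(
    {name: rng.standard_normal(10) for name in ["bom", "cma", "ecmwf", "ncep"]},
    index=range(1998, 2008),
)

ecmwf = df_demo["ecmwf"]      # one column -> Series indexed by year
row_2000 = df_demo.loc[2000]  # one row -> Series indexed by column label
col_means = df_demo.mean()    # column-wise mean over the 10 years
print(col_means)
```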

Read `.csv` File with `pandas`#

We can read a .csv file using pandas.read_csv() and convert the file contents to a pandas.DataFrame.

Example 1a: Tsai et al. (2021, Atmosphere) defined a “Subseasonal Peak Rainfall Event (SPRE)” as the 15-day period of maximum rainfall within a season. The SPRE is a useful index for evaluating how well forecast models predict extreme rainfall. Tsai et al. (2021) identified the SPRE of the DJF season for each year from 1998 to 2019 and calculated the percentile rank (PR) of the 15-day accumulated rainfall in both observations and model reforecasts. The difference in PR between observations and models is quantified by the root-mean-squared error (RMSE). The sample file sns_sample_s2s_pr_rmse.csv contains the RMSEs of the SPRE PRs in the BoM and CMA reforecasts for the first 15 days of prediction lead time.

import pandas as pd

df = pd.read_csv("data/sns_sample_s2s_pr_rmse.csv")
df.head()

	Models	Lead time (days)	Year	PR_RMSE
0	BoM	1.0	1998.0	21.78
1	BoM	1.0	1999.0	36.98
2	BoM	1.0	2000.0	7.25
3	BoM	1.0	2001.0	13.18
4	BoM	1.0	2002.0	19.64
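The output above shows that Year and Lead time (days) were read as floats. As a minimal sketch, the block below reads the same five rows from an in-memory string instead of the real file and casts those two columns to integers. The comma delimiter and the name `df_demo` are assumptions; the actual file layout is not shown in this unit.

```python
import io
import pandas as pd

# Stand-in for the sample file: the five rows printed by df.head() above,
# assuming a comma-separated layout.
csv_text = """Models,Lead time (days),Year,PR_RMSE
BoM,1.0,1998.0,21.78
BoM,1.0,1999.0,36.98
BoM,1.0,2000.0,7.25
BoM,1.0,2001.0,13.18
BoM,1.0,2002.0,19.64
"""
df_demo = pd.read_csv(io.StringIO(csv_text))

# Cast the float columns to integers for tidier axis labels.
df_demo = df_demo.astype({"Lead time (days)": int, "Year": int})
print(df_demo.dtypes)
```

Integer lead times give tick labels such as 1, 2, 3 instead of 1.0, 2.0, 3.0 when the column is used as a categorical axis.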

Plotting Long Form `pandas.DataFrame` with `seaborn`#

The above .csv file is long form data accepted by seaborn. Now we can plot the data.

Example 1b: Use a box plot to show how the RMSEs of SPRE PRs (abbreviated as PR_RMSE in the file) change with prediction lead time. The x-axis is the prediction lead time, the y-axis is the magnitude of the RMSE, and the spread of each box is the multi-year distribution of PR_RMSE.

import matplotlib as mpl
from matplotlib import pyplot as plt
import seaborn as sns

mpl.rcParams['figure.dpi'] = 100

sns.set_theme(style="white", palette=None)
fig, ax = plt.subplots(figsize=(8,4)) 
bxplt = sns.boxplot(data=df,
                    x='Lead time (days)', y='PR_RMSE', 
                    ax=ax,
                    hue='Models',
                    palette="Set3")
ax.set_ylabel("PR_RMSE")
plt.show()

_images/a9ef1e4bfdf89923a16ca35daf85d380eec472b65ce2e8e52cca776c32c38247.png

In the code above, x and y tell seaborn which columns to plot on the x and y axes. The hue parameter adds a third dimension to the plot by assigning a different color to each category or value of the specified variable (here, Models).

We can also use the col option to split the plot into multiple subplots. To do this, we need catplot instead of boxplot, because col draws on a FacetGrid (a grid of subplots) rather than on a single Axes.

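sns.set_theme(style="white", palette=None)
# catplot returns a FacetGrid; col='Models' draws one panel per model.
bxplt = sns.catplot(data=df,
                    x='Lead time (days)', y='PR_RMSE', 
                    kind='box', col='Models',
                    hue='Models',
                    palette="Set3")
# Label the y-axis through the FacetGrid; `ax` belongs to the previous figure.
bxplt.set_ylabels("PR_RMSE")
plt.show()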

_images/49f7bca443876fdc424f9d66b9b4044caa99be252f727120a9d8ee8f56590cc5.png

Conversion of `xarray.DataArray` to `pandas.DataFrame`#

`xarray.DataArray.to_pandas`#

The API reference states that the converted pandas data type is determined by the dimensions of the DataArray.

Convert this array into a pandas object with the same shape. The type of the returned object depends on the number of DataArray dimensions:
0D -> xarray.DataArray
1D -> pandas.Series
2D -> pandas.DataFrame
Only works for arrays with 2 or fewer dimensions.
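The rule quoted above can be checked with two small arrays. This is a minimal sketch with made-up values; the dimension names `year` and `pentad` are chosen to match the wide form example later in this unit.

```python
import numpy as np
import pandas as pd
import xarray as xr

da_1d = xr.DataArray(
    np.arange(3.0), dims="time",
    coords={"time": pd.date_range("2000-01-01", periods=3)},
)
da_2d = xr.DataArray(
    np.arange(6.0).reshape(2, 3), dims=("year", "pentad"),
    coords={"year": [1998, 1999], "pentad": [19, 20, 21]},
)

series = da_1d.to_pandas()  # 1D -> pandas.Series indexed by time
frame = da_2d.to_pandas()   # 2D -> pandas.DataFrame (rows: year, columns: pentad)
print(type(series).__name__, type(frame).__name__)
```

The first dimension becomes the row index and the second becomes the columns, so the order of dimensions in the DataArray decides the orientation of the table.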

Example: Scatter Plot and Regression Line#

Example 3: Draw a scatter plot of the Western Pacific Subtropical High (WPSH) index against domain-averaged rainfall over the Yangtze River Basin (105.5°-122°E, 27°-33.5°N) during the MJJ season, and calculate the regression line. The WPSH index is calculated from the 850-hPa zonal wind and is defined as

\[\mathrm{WPSH} = U_{850}\left[(115^\circ\text{-}140^\circ\mathrm{E},\ 28^\circ\text{-}30^\circ\mathrm{N})\right] - U_{850}\left[(115^\circ\text{-}140^\circ\mathrm{E},\ 15^\circ\text{-}17^\circ\mathrm{N})\right]\]

This scatter plot allows us to understand the relationship between the two variables (rainfall and WPSH). The two variables are stored in two DataArrays, and they can be merged into one Dataset. Then we convert the Dataset to a pandas.DataFrame so that we can plot with seaborn.
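As a rough sketch of the index definition, the function below takes the difference between two box-averaged zonal winds. The helper name `wpsh_index`, the synthetic field `u_demo`, and its grid are all hypothetical. The sketch assumes latitude is stored in descending order and uses unweighted box means, with no cosine-latitude weighting.

```python
import numpy as np
import pandas as pd
import xarray as xr

def wpsh_index(u850):
    """Northern-box minus southern-box mean of U850 (unweighted, descending latitude)."""
    north = u850.sel(lon=slice(115, 140), lat=slice(30, 28)).mean(dim=("lat", "lon"))
    south = u850.sel(lon=slice(115, 140), lat=slice(17, 15)).mean(dim=("lat", "lon"))
    return north - south

# Synthetic test field: the wind value equals the latitude at every grid point,
# so the expected index is mean(28..30) - mean(15..17) = 29 - 16 = 13.
lat = np.arange(40.0, 9.0, -1.0)   # 40, 39, ..., 10 (descending)
lon = np.arange(100.0, 151.0, 1.0)
time = pd.date_range("2000-05-01", periods=4)
values = np.broadcast_to(lat[None, :, None], (time.size, lat.size, lon.size)).copy()
u_demo = xr.DataArray(values, dims=("time", "lat", "lon"),
                      coords={"time": time, "lat": lat, "lon": lon})

index = wpsh_index(u_demo)
print(index.values)
```

Using named dimensions in `mean(dim=...)` avoids depending on the axis order of the array.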

Step 1: Read and select rainfall and wind data.

Step 2: Statistical analysis. Calculate the area-mean rainfall and the WPSH index, then select the season we need.

pcpts = (pcp.mean(axis=(1,2))
            .sel(time=~((pcp.time.dt.month == 2) & (pcp.time.dt.day == 29)))
            )
pcp_ptd = pcpts.coarsen(time=5,side='left', coord_func={"time": "min"}).mean()  # compute the pentad mean
pcp_ptd_mjj = pcp_ptd.sel(time=(pcp_ptd.time.dt.month.isin([5,6,7])))
pcp_ptd_mjj

<xarray.DataArray 'cmorph' (time: 399)> Size: 2kB
array([ 4.8970165 ,  7.9518876 ,  6.319289  ,  2.4725406 ,  5.27035   ,
        2.170851  ,  4.783776  ,  3.937529  , 11.973356  , 10.144266  ,
       11.061295  , 13.090139  ,  9.517157  ,  2.0094523 ,  3.4387412 ,
        6.7226806 , 11.502437  , 10.5547905 ,  9.479826  ,  2.9567833 ,
        3.5055594 ,  4.1681933 ,  6.725932  , 11.227705  ,  2.9874592 ,
        2.4019814 ,  4.487879  ,  2.9615617 ,  9.328345  ,  8.361376  ,
       17.8       ,  5.0629954 ,  9.045652  ,  4.8956294 ,  9.328997  ,
        3.7565734 ,  3.8314571 ,  4.111655  ,  0.3021329 ,  3.6132984 ,
        3.5981002 ,  2.3257692 ,  7.2976217 ,  5.5318065 , 13.691736  ,
        7.9762588 ,  4.2428904 ,  3.8893006 , 13.693474  ,  6.9901514 ,
        9.405932  ,  4.689091  ,  6.358776  ,  3.2129135 ,  2.252133  ,
        7.383217  ,  2.8664687 ,  4.964475  ,  4.936305  ,  1.1263635 ,
        2.2311187 ,  2.2804198 ,  4.0579023 ,  8.117238  ,  6.8227034 ,
        5.0488696 ,  8.666551  ,  5.666422  ,  4.5268536 ,  3.3424478 ,
        3.6482053 ,  6.963858  ,  1.558951  ,  3.8080535 ,  2.550478  ,
        5.6249647 ,  9.637308  ,  6.783858  , 12.585258  ,  4.844767  ,
        1.3790094 ,  4.6789045 ,  1.5510608 ,  7.1460257 ,  5.1195927 ,
        7.1560845 , 12.34908   , 12.694429  ,  3.5173545 ,  2.4259324 ,
        0.28142193,  9.145932  , 12.452902  ,  4.6736712 ,  5.1581006 ,
        6.8205705 ,  4.7389283 ,  9.617587  ,  3.4697907 ,  3.9260025 ,
...
        5.9538927 ,  5.01324   ,  0.9436598 ,  4.4753027 ,  6.041911  ,
        6.934068  ,  5.07711   ,  4.8284035 ,  8.588112  ,  1.5448952 ,
       10.228043  ,  1.7740676 ,  1.9260607 ,  7.228357  ,  8.549138  ,
        6.3187065 , 12.640233  ,  1.4202797 , 11.316084  ,  7.279161  ,
        4.142191  ,  3.5260372 ,  2.4751165 ,  4.1878905 ,  7.2561064 ,
       11.466655  ,  4.0412    ,  1.946387  ,  7.6451874 , 10.063392  ,
        6.7887535 ,  6.0969815 , 12.194288  ,  8.405164  , 10.297377  ,
        7.8729143 ,  2.699021  ,  4.153112  ,  5.390607  , 10.084627  ,
        3.1376457 ,  2.0847204 ,  9.51324   , 11.786633  ,  3.1671562 ,
        5.3714223 ,  4.1579022 ,  5.4215965 , 14.319791  ,  3.3686714 ,
        4.624697  , 13.882413  , 12.359255  , 10.334114  , 21.119303  ,
        5.0222025 ,  7.4891496 , 11.105817  ,  0.966352  ,  3.1370628 ,
        6.3480887 ,  4.5376463 ,  4.9669466 ,  7.596236  ,  1.2960141 ,
        4.0143824 ,  0.98304194,  9.380851  ,  8.573579  , 11.14049   ,
        4.2023306 , 13.881643  ,  9.043998  ,  9.088369  ,  9.648952  ,
        4.0862937 ,  3.827599  ,  0.41491842,  1.7904428 ,  4.379033  ,
        6.227797  ,  4.3578324 ,  1.9046853 , 12.030944  ,  7.8016205 ,
        5.260373  ,  2.5838928 ,  3.4374127 ,  2.6103146 ,  6.182331  ,
        4.548438  ,  7.4217596 , 12.894266  ,  7.690734  ,  2.6264687 ,
        0.7665617 ,  2.525     ,  7.5979834 ,  5.8932753 ], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 3kB 1998-05-01 1998-05-06 ... 2018-07-30
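The pentad calculation can be verified on a synthetic daily series. In this minimal sketch (the two-year period and the constant values are arbitrary choices), removing 29 February leaves 365 days per year, which divides into exactly 73 pentads, and the MJJ selection keeps 19 pentads per year.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", "2001-12-31", freq="D")  # 2000 is a leap year
daily = xr.DataArray(np.ones(time.size), dims="time", coords={"time": time})

# Drop 29 February so that every year has 365 days = 73 pentads.
no_leap = daily.sel(time=~((daily.time.dt.month == 2) & (daily.time.dt.day == 29)))

# Average each block of 5 consecutive days; label each pentad with its first day.
pentad = no_leap.coarsen(time=5, side="left", coord_func={"time": "min"}).mean()

# Keep the pentads whose first day falls in May, June, or July.
mjj = pentad.sel(time=pentad.time.dt.month.isin([5, 6, 7]))
print(pentad.sizes["time"], mjj.sizes["time"])
```

With 19 MJJ pentads per year, the 21 years from 1998 to 2018 give the 399 time steps seen in the output above.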

ushear =  ( u.sel(lat=slice(30,28)).mean(axis=(1,2)) -
            u.sel(lat=slice(17,15)).mean(axis=(1,2)) )
ushear_ts = ushear.sel(time=~((ushear.time.dt.month == 2) & (ushear.time.dt.day == 29)))

us_ptd = ushear_ts.coarsen(time=5,side='left', coord_func={"time": "min"}).mean()  # compute the pentad mean
us_ptd_mjj = us_ptd.sel(time=(us_ptd.time.dt.month.isin([5,6,7])))
us_ptd_mjj

<xarray.DataArray 'uwnd' (time: 399)> Size: 2kB
array([ 1.08629093e+01,  1.14845448e+01,  1.27180004e+01,  7.67090893e+00,
       -9.08927250e+00,  1.62546352e-01,  6.54854584e+00,  3.03854513e+00,
        1.47341814e+01,  1.10549088e+01,  9.51418018e+00,  1.52740021e+01,
        1.17298174e+01,  2.34254599e+00,  2.37345576e+00,  7.63490915e+00,
        4.29800034e+00,  6.97127247e+00,  4.96727324e+00,  2.77472639e+00,
        1.08218277e+00,  4.87781811e+00,  8.18581676e+00,  1.13078175e+01,
        6.89981747e+00, -2.22527266e+00,  3.71763611e+00,  1.82545400e+00,
        1.15070896e+01,  9.05963612e+00,  1.38921814e+01,  3.99600029e+00,
        5.89817762e-01, -4.41818285e+00,  2.03655250e-02, -1.17494535e+01,
       -5.74236393e+00, -1.61383629e+01,  3.51745367e+00, -4.62781763e+00,
        2.14727211e+00, -1.37327230e+00, -1.08218157e+00,  1.28032742e+01,
        1.37232723e+01,  6.91254520e+00,  1.41690862e+00,  4.48781872e+00,
        1.29796352e+01,  1.31552734e+01, -3.22763681e+00, -1.13463602e+01,
        5.61654615e+00, -1.32600141e+00,  3.53109050e+00, -3.57618141e+00,
       -1.29454046e-01,  3.42327356e+00,  6.09636402e+00,  5.71090996e-01,
        3.73472667e+00,  2.64709067e+00,  4.57490826e+00,  8.40818024e+00,
        7.98672724e+00,  3.31927180e+00,  1.02049084e+01,  7.53345490e+00,
        1.54910917e+01,  1.94982028e+00,  8.39455009e-01, -2.55490899e+00,
       -1.37581897e+00, -6.59454703e-01, -4.54090881e+00, -2.80399990e+00,
        7.21654510e+00,  1.01392727e+01,  8.40781975e+00,  2.78236127e+00,
...
       -3.35909081e+00, -2.14472914e+00, -1.34690914e+01,  6.40454388e+00,
        8.35018063e+00,  1.52945461e+01,  5.94981766e+00,  4.46163559e+00,
        6.36672592e+00,  1.17292728e+01,  1.14947262e+01,  1.54529085e+01,
        1.32661800e+01,  7.82145548e+00,  1.52783642e+01,  5.10400009e+00,
       -1.55118198e+01, -1.47272739e+01, -6.11872625e+00,  4.78290796e+00,
        7.42691040e+00,  5.28327274e+00,  1.16412716e+01,  1.51381836e+01,
        8.13599968e+00,  2.07236433e+00, -7.25818276e-01,  7.70108938e+00,
        1.57998517e-01,  2.49309182e+00,  7.68818140e+00,  1.67116356e+01,
        1.72925453e+01,  1.54359989e+01,  1.11898193e+01, -3.40909266e+00,
        1.10725451e+01,  1.00247269e+01,  2.75854492e+00,  1.69309044e+00,
       -3.19054580e+00,  5.67236233e+00,  1.40927277e+01,  1.23472738e+01,
        6.06909454e-01, -4.36436462e+00,  3.72109175e+00,  1.00218236e+00,
        8.18345451e+00,  5.64036417e+00, -1.42000675e-01,  1.12516356e+01,
        1.03294554e+01,  1.05718184e+01,  1.32959995e+01,  8.56436348e+00,
        1.11461811e+01,  1.51509058e+00, -1.21736355e+01, -1.27656345e+01,
        1.33679991e+01,  9.29127407e+00,  8.62581921e+00,  7.17381811e+00,
        5.08327198e+00,  8.33309078e+00,  4.55763721e+00, -1.31781673e+00,
       -6.21254635e+00, -6.81823716e-02,  1.00396366e+01,  5.55800009e+00,
       -1.58636284e+00, -2.48218107e+00, -5.53545380e+00, -1.40667267e+01,
       -1.13407259e+01, -6.04981899e+00, -6.60945368e+00], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 3kB 1998-05-01 1998-05-06 ... 2018-07-30
    level    float32 4B 850.0

Step 3: Merge the two DataArrays, convert the result to a long form DataFrame, and pass it to seaborn for plotting. To draw the scatter plot together with the regression line, use seaborn.regplot.

scatter_df = (xr.merge([pcp_ptd_mjj.rename('pcp'), us_ptd_mjj.rename('ushear')])
                .to_dataframe())
scatter_df

	pcp	level	ushear
time
1998-05-01	4.897017	850.0	10.862909
1998-05-06	7.951888	850.0	11.484545
1998-05-11	6.319289	850.0	12.718000
1998-05-16	2.472541	850.0	7.670909
1998-05-21	5.270350	850.0	-9.089272
...	...	...	...
2018-07-10	2.626469	850.0	-5.535454
2018-07-15	0.766562	850.0	-14.066727
2018-07-20	2.525000	850.0	-11.340726
2018-07-25	7.597983	850.0	-6.049819
2018-07-30	5.893275	850.0	-6.609454

399 rows × 3 columns

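from scipy import stats

def corr(x, y):
    """Return the Pearson correlation coefficient and its two-sided p-value."""
    r, p = stats.pearsonr(x, y)  # call pearsonr once and unpack both results
    return r, p

# Compute the correlation coefficient and its statistical significance.
r, p = corr(us_ptd_mjj.values, pcp_ptd_mjj.values)

fig, ax = plt.subplots(figsize=(8,8))
sns.set_theme(style="white", palette=None)
plot = sns.regplot(x="ushear", y="pcp",
                  data=scatter_df,
                  ci=95, ax=ax) # ci is the size (%) of the confidence interval of the regression estimate
ax.set_title(f'$R=$ {r:5.3f}, $p=$ {p:8.2e}', loc='right' )  # Correlation coefficient and p-values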

Text(1.0, 1.0, '$R=$ 0.394, $p=$ 3.07e-16')
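seaborn.regplot draws the fitted line but does not return its coefficients. If the slope and intercept are needed, one option is to fit the same kind of simple linear regression separately with scipy.stats.linregress. The sketch below uses synthetic data (`x` and `y` are stand-ins, not the WPSH and rainfall series) built with a true slope of 0.5.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=200)            # stand-in for the WPSH index
y = 0.5 * x + rng.normal(size=200)  # stand-in for rainfall, true slope 0.5

fit = stats.linregress(x, y)        # ordinary least-squares line y = slope * x + intercept
r, p = stats.pearsonr(x, y)

print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.3f}, "
      f"R={fit.rvalue:.3f}, p={fit.pvalue:.2e}")
```

For a single predictor, `fit.rvalue` equals the Pearson correlation coefficient, so either function can supply the R shown in the plot title.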

_images/ac8a90f8df6070e8667f7a6ede17accbf0b29aad0d6e6df7053e7ffdf204dde6.png

Plotting Wide Form `pandas.DataFrame` with `seaborn`#

Example 4: Plot a heat map of pentad mean rainfall percentile ranks (PRs) from April to November in 1998-2020 over the Taiwan-northern South China Sea region (18°-24°N, 116°-126°E).

lats, latn = 18, 24                  
lon1, lon2 = 116, 126         

pcp = pcpds.sel(time=slice('1998-01-01','2020-12-31'),
                  lat=slice(lats,latn),
                  lon=slice(lon1,lon2)).cmorph

pcp_ptd_ts = (pcp.mean(axis=(1,2))
                 .sel(time=~((pcp.time.dt.month == 2) & (pcp.time.dt.day == 29)))
                 .coarsen(time=5,side='left', coord_func={"time": "min"})
                 .sum())
pcp_season = pcp_ptd_ts.sel(time=(pcp_ptd_ts.time.dt.month.isin([4,5,6,7,8,9,10,11])))
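The cell that builds pcp_rank_da from pcp_season is not shown in this unit. The sketch below is one possible way to do it, not necessarily the method used for the table that follows. It assumes that the percentile rank is rank / N × 100 with all pentads pooled, and that the pentad number is derived from the day of year. The data are synthetic, and the names `rain`, `pr_wide`, and `pcp_rank_demo` are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for pcp_season: pentad start dates of two non-leap years,
# restricted to April-November, with random "rainfall" amounts.
rng = np.random.default_rng(1)
starts = pd.date_range("1998-01-01", "1999-12-31", freq="D")[::5]
season = starts[starts.month.isin([4, 5, 6, 7, 8, 9, 10, 11])]
rain = pd.Series(rng.gamma(2.0, 20.0, size=season.size), index=season)

# Percentile rank of each pentad among all pentads (assumed: rank / N * 100).
pr = rain.rank(method="average") / rain.size * 100.0

# Label each value by year and pentad-of-year, then reshape to year x pentad.
table = pd.DataFrame({
    "year": np.asarray(season.year),
    "pentad": np.asarray((season.dayofyear - 1) // 5 + 1),
    "PR": pr.to_numpy(),
})
pr_wide = table.pivot(index="year", columns="pentad", values="PR")

# Wrap as a 2D DataArray so that .to_pandas() returns a wide form DataFrame.
pcp_rank_demo = xr.DataArray(
    pr_wide.to_numpy(), dims=("year", "pentad"),
    coords={"year": pr_wide.index.to_numpy(), "pentad": pr_wide.columns.to_numpy()},
)
print(pcp_rank_demo.to_pandas().iloc[:, :3])
```

With this labeling, April to November covers pentads 19 to 67, which matches the 49 columns of the table below.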

pcp_rank_df = pcp_rank_da.to_pandas()
pcp_rank_df

pentad	19	20	21	22	23	24	25	26	27	28	...	58	59	60	61	62	63	64	65	66	67
year
1998	24.046140	0.798580	72.227152	77.107365	32.475599	68.411713	51.197870	26.086957	45.519077	86.867791	...	94.942325	43.300799	98.225377	15.527950	66.104703	52.440106	20.585626	66.725821	83.318545	36.291038
1999	39.929015	18.722272	60.337178	5.501331	7.897072	81.366460	70.275067	76.220053	55.723159	49.778172	...	85.891748	64.951198	10.559006	24.312334	28.748891	13.487134	12.156167	27.240461	13.043478	70.097604
2000	43.123336	34.782609	46.228926	26.974268	34.605146	58.207631	39.219166	4.081633	70.541260	84.117125	...	69.565217	48.890861	25.554570	96.628217	88.553682	65.838509	65.306122	63.442768	44.276841	6.299911
2001	39.751553	37.267081	37.799468	40.195209	61.579414	22.448980	37.089618	58.385093	92.280390	77.639752	...	55.989352	5.944987	1.863354	26.619343	15.882875	57.409051	66.637090	6.654836	4.170364	2.040816
2002	4.259095	3.637977	21.916593	6.743567	3.992902	4.702751	0.177462	3.460515	33.895297	79.858030	...	11.002662	49.423248	43.212067	49.068323	34.072760	50.399290	17.125111	62.999113	26.796806	25.820763
2003	48.003549	56.255546	20.496894	31.322094	78.615794	7.808341	37.533274	6.122449	11.268855	47.914818	...	44.454303	11.712511	2.750665	61.490683	57.231588	27.861579	64.596273	50.842946	46.140195	8.340728
2004	44.099379	18.367347	21.118012	17.746229	5.678793	8.518190	9.671695	30.434783	19.698314	77.994676	...	15.705413	38.509317	62.910382	1.685892	3.194321	13.753327	19.343390	42.945874	56.965395	28.926353
2005	7.187223	1.419698	26.264419	8.074534	18.988465	11.801242	17.480035	86.956522	61.668146	27.595386	...	13.664596	9.405501	29.281278	56.166815	0.976043	42.502218	50.044366	66.015972	33.007986	50.931677
2006	2.218279	15.350488	46.406389	34.960071	19.254658	49.334516	27.950311	3.105590	69.476486	81.543922	...	5.590062	11.446318	33.984028	87.133984	72.848270	29.547471	35.492458	54.303461	21.827862	33.096717
2007	51.020408	33.185448	5.767524	39.396628	27.772848	13.398403	32.386868	52.706300	30.967169	74.090506	...	30.612245	12.954747	20.053239	54.835847	91.481810	75.421473	7.364685	25.199645	87.488909	43.833185
2008	12.333629	3.904170	12.599823	48.269743	51.641526	43.389530	46.850044	30.079858	56.433008	71.960958	...	72.315883	7.630878	14.374445	10.825200	19.964508	93.611358	32.209406	54.037267	62.732919	47.204969
2009	30.700976	38.952972	50.754215	81.721384	90.683230	40.550133	12.688554	55.013310	0.532387	20.763088	...	59.982254	72.049689	70.629991	33.362910	60.425909	4.525288	48.624667	46.761313	22.626442	19.432121
2010	14.640639	35.314996	32.564330	24.134871	33.629104	76.841171	64.063886	21.472937	24.755989	9.849157	...	67.701863	96.716948	63.975155	23.336291	78.083407	72.759539	80.745342	36.645963	41.171251	10.204082
2011	10.647737	2.484472	1.153505	58.473824	18.101154	18.012422	10.026619	73.469388	26.530612	86.335404	...	87.577640	44.986690	29.458740	33.451642	67.435670	94.676131	80.035492	65.572316	50.221828	68.677906
2012	31.588287	36.557232	1.597161	56.699201	22.981366	64.507542	37.976930	19.875776	54.392192	97.249335	...	13.575865	3.726708	37.444543	59.804791	69.653949	12.244898	52.085182	59.094942	35.758651	39.041704
2013	59.538598	64.685004	31.410825	24.578527	29.724933	57.941437	75.332742	28.305235	66.193434	72.138421	...	39.840284	14.285714	23.425022	76.929902	48.447205	35.048802	42.857143	23.070098	41.259982	42.058563
2014	49.955634	24.489796	7.275954	0.709849	10.292813	30.346051	64.862467	73.824312	36.202307	37.000887	...	7.985803	35.403727	34.693878	4.436557	16.592724	9.937888	66.903283	51.286602	16.503993	41.881100
2015	25.643301	42.324756	39.574091	34.871340	41.703638	16.149068	10.115350	59.449867	38.065661	52.883762	...	52.173913	85.714286	27.417924	17.213842	54.569654	31.144632	20.408163	28.482697	41.614907	25.377107
2016	0.621118	40.638864	67.169476	9.228039	29.192547	37.355812	13.220941	20.230701	50.310559	58.651287	...	69.387755	80.390417	8.163265	46.938776	14.729370	25.022183	16.770186	45.430346	74.445430	81.810115
2017	19.165927	1.952085	47.116238	26.885537	59.361136	36.379769	10.913931	11.623780	50.133097	95.563443	...	90.417036	24.667258	11.978705	28.837622	86.069210	64.330080	47.648625	48.802130	46.583851	53.149956
2018	0.088731	34.427684	40.993789	18.456078	37.178350	6.921029	11.091393	65.394854	4.791482	1.064774	...	46.495120	28.393966	11.357587	91.215617	23.158829	1.774623	16.681455	15.616681	54.924579	30.257320
2019	43.034605	0.266193	54.214729	69.742680	8.606921	29.015084	78.172138	75.865129	47.293700	46.051464	...	24.223602	14.995563	19.520852	37.888199	88.819876	47.826087	38.243123	91.393079	31.677019	12.067436
2020	42.768412	50.488021	27.151730	1.330967	47.471162	29.103815	3.549246	4.880213	42.590949	89.263531	...	45.607808	84.472050	79.680568	53.238687	74.001775	76.131322	77.817214	14.818101	12.777285	45.785271

23 rows × 49 columns

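The DataFrame above is in wide form.

In a long form DataFrame, each variable has its own column and each row is one observation, so labels such as year and pentad are stored as column values. In a wide form DataFrame, the labels are stored in both the index (here, year) and the columns (here, pentad), and the cells hold the values. This gives a two-dimensional table that seaborn.heatmap can plot directly.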

# Plot
fig, ax = plt.subplots(figsize=(12,8))
sns.set_theme()
ax = sns.heatmap(pcp_rank_df,
                 cmap='jet_r',
                 square=True,
                 vmin=1,vmax=100,
                 cbar_kws={"shrink": 0.55, 'extend':'neither'},
                 xticklabels=2)
plt.xlabel("Pentad")
plt.ylabel("Years")
ax.set_title("Taiwan-Northern SCS, April to November",loc='left')
plt.savefig("pcp_pr_heatmap_obs_chn.png",orientation='portrait',dpi=300)

_images/fed5a3367644b1ef0e6ac7ec7b919a4799b08de49bcc445d2fdde97606642756.png

Conversion between Long Form and Wide Form#

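Use pandas.DataFrame.unstack to convert wide form to long form. It moves the column labels into the index and returns a Series with a two-level index (pentad, year); reset_index then turns both levels into ordinary columns and names the value column PR.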

pcp_rank_long = pcp_rank_df.unstack().reset_index(name='PR')
pcp_rank_long

	pentad	year	PR
0	19	1998	24.046140
1	19	1999	39.929015
2	19	2000	43.123336
3	19	2001	39.751553
4	19	2002	4.259095
...	...	...	...
1122	67	2016	81.810115
1123	67	2017	53.149956
1124	67	2018	30.257320
1125	67	2019	12.067436
1126	67	2020	45.785271

1127 rows × 3 columns
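The conversion also works in the other direction. As a minimal sketch with a made-up 2 × 2 table, pandas.DataFrame.pivot rebuilds the wide form from the long form produced by unstack.

```python
import pandas as pd

# Illustrative values only; index and column names mirror the table above.
wide = pd.DataFrame(
    [[24.0, 0.8], [39.9, 18.7]],
    index=pd.Index([1998, 1999], name="year"),
    columns=pd.Index([19, 20], name="pentad"),
)

long = wide.unstack().reset_index(name="PR")                    # wide -> long
back = long.pivot(index="year", columns="pentad", values="PR")  # long -> wide
print(long)
print(back)
```

pivot requires each (year, pentad) pair to appear only once; if there are duplicates, pandas.DataFrame.pivot_table with an aggregation function is the usual alternative.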
