{ "cells": [ { "cell_type": "markdown", "id": "117cb6ed-efe8-4e6e-927f-e2e744afc323", "metadata": {}, "source": [ "# 12. Processing Large Datasets\n", "\n", "Large datasets, or \"big data\", are usually associated with machine learning, but in atmospheric science our notion of big data is probably closer to the following definition from Wikipedia:\n", "\n", "> Big data is data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them.\n", "\n", "Atmospheric datasets span long periods, and their horizontal grids and vertical levels keep getting finer. Handled carelessly, they can exceed what memory can hold and produce an error such as:\n", "\n", "`MemoryError: Unable to allocate 52.2 GiB for an array with shape (365, 37, 721, 1440) and data type float32`\n", "\n", "This happens because the amount of data to be analyzed exceeds the computer's available memory (RAM). How can we avoid it?" ] }, { "cell_type": "markdown", "id": "a36f5b20-0665-46a8-baa9-a7108c87e627", "metadata": {}, "source": [ "## Dask \n", "\n", "Dask is a Python package that runs computations in parallel across multiple CPU cores, which can improve performance. During a computation the program does not read all of the data up front. It first records the operations symbolically and evaluates them only when a result is requested. This is called \"lazy computation\", and it keeps the computation from consuming large amounts of RAM.\n", "\n", "To understand how dask works with xarray, we start with a demonstration on a 1000 × 4000 array.\n", "\n", "**1. NumPy array**" ] }, { "cell_type": "code", "execution_count": 1, "id": "69d43e34-ec78-49e9-9483-b4149571ee88", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1., ..., 1., 1., 1.],\n", " [1., 1., 1., ..., 1., 1., 1.],\n", " [1., 1., 1., ..., 1., 1., 1.],\n", " ...,\n", " [1., 1., 1., ..., 1., 1., 1.],\n", " [1., 1., 1., ..., 1., 1., 1.],\n", " [1., 1., 1., ..., 1., 1., 1.]])" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "shape = (1000, 4000)\n", "ones_np = np.ones(shape)\n", "ones_np" ] }, { "cell_type": "markdown", "id": "381fd8c6-a09d-4c77-815c-902b79ddcf8b", "metadata": {}, "source": [ "**2. Dask array**" ] }, { "cell_type": "code", "execution_count": 2, "id": "6fa512f6-a440-4dbe-9474-e7f617d80bae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", "
<table>\n", "<tr><td></td><th>Array</th><th>Chunk</th></tr>\n", "<tr><th>Bytes</th><td>30.52 MiB</td><td>30.52 MiB</td></tr>\n", "<tr><th>Shape</th><td>(1000, 4000)</td><td>(1000, 4000)</td></tr>\n", "<tr><th>Count</th><td>1 Tasks</td><td>1 Chunks</td></tr>\n", "<tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>\n", "</table>" ], "text/plain": [ "dask.array" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import dask.array as da\n", "\n", "ones = da.ones(shape)\n", "ones" ] }, { "cell_type": "markdown", "id": "5a4a3f14-9ab0-48d6-b413-b4fce0d85b3d", "metadata": {}, "source": [ "![](https://docs.dask.org/en/latest/_images/dask-array.svg)\n", "\n", "Dask splits an array into many sub-arrays called \"chunks\". In dask we can specify the chunk size ourselves." ] }, { "cell_type": "code", "execution_count": 3, "id": "79a316b6-7c1a-4103-84bf-b3d0b3ba424a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", "
<table>\n", "<tr><td></td><th>Array</th><th>Chunk</th></tr>\n", "<tr><th>Bytes</th><td>30.52 MiB</td><td>7.63 MiB</td></tr>\n", "<tr><th>Shape</th><td>(1000, 4000)</td><td>(1000, 1000)</td></tr>\n", "<tr><th>Count</th><td>4 Tasks</td><td>4 Chunks</td></tr>\n", "<tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>\n", "</table>" ], "text/plain": [ "dask.array" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chunk_shape = (1000, 1000)\n", "ones = da.ones(shape, chunks=chunk_shape)\n", "ones" ] }, { "cell_type": "markdown", "id": "23679a30-ea4a-4972-8284-490d2c1b396d", "metadata": {}, "source": [ "Now let's do some computation, for example multiplying the array by a reversed copy of itself and then taking the mean:" ] }, { "cell_type": "code", "execution_count": 4, "id": "02676710-31d4-4363-baae-e20e08c5e09b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", "
<table>\n", "<tr><td></td><th>Array</th><th>Chunk</th></tr>\n", "<tr><th>Bytes</th><td>8 B</td><td>8.0 B</td></tr>\n", "<tr><th>Shape</th><td>()</td><td>()</td></tr>\n", "<tr><th>Count</th><td>19 Tasks</td><td>1 Chunks</td></tr>\n", "<tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>\n", "</table>" ], "text/plain": [ "dask.array" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ones_mean = (ones * ones[::-1, ::-1]).mean()\n", "ones_mean" ] }, { "cell_type": "markdown", "id": "10c1f9b2-288c-4ca6-b350-9c7228e7dae5", "metadata": {}, "source": [ "The computation proceeds as follows:\n", "\n", "![](https://earth-env-data-science.github.io/_images/dask_arrays_16_0.png)\n", "\n", "Each chunk is first processed on its own core, and the partial results are then combined into the final result.\n", "\n", "This gives a rough picture of how dask computes. For more advanced usage, see the official dask documentation.\n", "\n", "The example also shows that dask arrays support many of the functions we know from NumPy. xarray, in turn, can use dask arrays as its underlying data, so most xarray operations work on them too. This is very helpful when we use dask to process large atmospheric datasets. So how does dask help speed up data processing?" ] }, { "cell_type": "markdown", "id": "b10dc64b-5982-4558-832c-111f51d05d00", "metadata": {}, "source": [ "## Processing Large Meteorological Datasets\n", "\n", "In Unit 2 we saw that when opening multiple files with `xarray.open_mfdataset`, adding `parallel=True` speeds things up: the files are opened in parallel with dask and the data are represented as dask arrays, so multiple cores read the files at the same time. Below we open the NCEP R2 horizontal wind files, specify the chunk size with `chunks={'time': 183}`, and compute the daily climatology." ] }, { "cell_type": "code", "execution_count": 5, "id": "01bf73c7-b14d-418e-b3f1-d1c827aed33d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1 µs, sys: 0 ns, total: 1 µs\n", "Wall time: 2.86 µs\n" ] } ], "source": [ "%%time\n", "import xarray as xr\n", "import dask\n", "\n", "with dask.config.set(**{'array.slicing.split_large_chunks': False}):\n", "    # Chunk along time only: the spatial dimensions in these files are named\n", "    # 'lat' and 'lon', so 'latitude'/'longitude' keys would have no effect.\n", "    uds = xr.open_mfdataset('./data/ncep_r2_uv850/u850.*.nc',\n", "                            combine=\"by_coords\",\n", "                            parallel=True,\n", "                            chunks={'time': 183})\n", "    vds = xr.open_mfdataset('./data/ncep_r2_uv850/v850.*.nc',\n", "                            combine=\"by_coords\",\n", "                            parallel=True,\n", "                            chunks={'time': 183})" ] }, { "cell_type": "code", "execution_count": 6, "id": "64240f94-ae35-472f-9202-3adc72663b84", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1e+03 ns, sys: 0 ns, total: 1e+03 ns\n", "Wall 
time: 3.1 µs\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'uwnd' (time: 8766, level: 1, lat: 37, lon: 144)>\n",
       "dask.array<getitem, shape=(8766, 1, 37, 144), dtype=float32, chunksize=(183, 1, 37, 144), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * time     (time) datetime64[ns] 1998-01-01 1998-01-02 ... 2021-12-31\n",
       "  * lon      (lon) float32 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5\n",
       "  * lat      (lat) float32 90.0 87.5 85.0 82.5 80.0 ... 10.0 7.5 5.0 2.5 0.0\n",
       "  * level    (level) float32 850.0\n",
       "Attributes:\n",
       "    standard_name:         eastward_wind\n",
       "    long_name:             Daily U-wind on Pressure Levels\n",
       "    units:                 m/s\n",
       "    unpacked_valid_range:  [-140.  175.]\n",
       "    actual_range:          [-78.96 110.35]\n",
       "    precision:             2\n",
       "    GRIB_id:               33\n",
       "    GRIB_name:             UGRD\n",
       "    var_desc:              u-wind\n",
       "    dataset:               NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...\n",
       "    level_desc:            Pressure Levels\n",
       "    statistic:             Mean\n",
       "    parent_stat:           Individual Obs\n",
       "    cell_methods:          time: mean (of 4 6-hourly values in one day)
" ], "text/plain": [ "<xarray.DataArray 'uwnd' (time: 8766, level: 1, lat: 37, lon: 144)>\n", "dask.array<getitem, shape=(8766, 1, 37, 144), dtype=float32, chunksize=(183, 1, 37, 144), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * time (time) datetime64[ns] 1998-01-01 1998-01-02 ... 2021-12-31\n", " * lon (lon) float32 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5\n", " * lat (lat) float32 90.0 87.5 85.0 82.5 80.0 ... 10.0 7.5 5.0 2.5 0.0\n", " * level (level) float32 850.0\n", "Attributes:\n", " standard_name: eastward_wind\n", " long_name: Daily U-wind on Pressure Levels\n", " units: m/s\n", " unpacked_valid_range: [-140. 175.]\n", " actual_range: [-78.96 110.35]\n", " precision: 2\n", " GRIB_id: 33\n", " GRIB_name: UGRD\n", " var_desc: u-wind\n", " dataset: NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...\n", " level_desc: Pressure Levels\n", " statistic: Mean\n", " parent_stat: Individual Obs\n", " cell_methods: time: mean (of 4 6-hourly values in one day)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "u = uds.sel(lat=slice(90,0)).uwnd\n", "v = vds.sel(lat=slice(90,0)).vwnd\n", "u" ] }, { "cell_type": "markdown", "id": "afce89f7-1a9b-4d81-99e8-7643079a7dcb", "metadata": {}, "source": [ "The preview above shows the chunk information. Under dask's lazy computation the program has not yet filled in the actual values; it has only split the data into chunks for parallel processing. At this point none of the data has actually been loaded into memory.\n", "\n", "Next we compute the climatological mean. According to the xarray documentation, however, dask does not yet support `groupby()` and `resample()` well, and running them on dask arrays can be quite inefficient. We therefore recommend calling `load()` to read the data into memory after subsetting and before any complex computation, provided the subset fits in RAM.\n", "\n", "> `xarray.Dataset.load`: Manually trigger loading and/or computation of this dataset’s data from disk or a remote source into memory and return this dataset. \n", "\n", "That said, the more tasks shown in the preview above and the larger the data files, the longer this step takes." ] }, { "cell_type": "code", "execution_count": 7, "id": "606fa133", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs\n", "Wall time: 4.77 µs\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'uwnd' (time: 8766, level: 1, lat: 37, lon: 144)>\n",
       "array([[[[ -7.9900055 ,  -7.9600067 ,  -7.9400024 , ...,  -7.9799957 ,\n",
       "           -7.9600067 ,  -8.0099945 ],\n",
       "         [ -3.5800018 ,  -3.2900085 ,  -3.0099945 , ...,  -4.5099945 ,\n",
       "           -4.2100067 ,  -3.8600006 ],\n",
       "         [  4.1900024 ,   4.669998  ,   5.050003  , ...,   2.4199982 ,\n",
       "            3.069992  ,   3.699997  ],\n",
       "         ...,\n",
       "         [ -2.5400085 ,  -3.6100006 ,  -4.1600037 , ...,   0.6199951 ,\n",
       "           -0.21000671,  -1.3399963 ],\n",
       "         [ -4.1100006 ,  -4.630005  ,  -4.3099976 , ...,  -0.29000854,\n",
       "           -1.4900055 ,  -2.9100037 ],\n",
       "         [ -7.4600067 ,  -7.3899994 ,  -6.1900024 , ...,  -3.6600037 ,\n",
       "           -5.2299957 ,  -6.6100006 ]]],\n",
       "\n",
       "\n",
       "       [[[ -5.6100006 ,  -5.1600037 ,  -4.7100067 , ...,  -6.8099976 ,\n",
       "           -6.4100037 ,  -6.0099945 ],\n",
       "         [  2.2200012 ,   2.6699982 ,   3.069992  , ...,   0.56999207,\n",
       "            1.199997  ,   1.6900024 ],\n",
       "         [  8.319992  ,   8.789993  ,   9.139999  , ...,   6.369995  ,\n",
       "...\n",
       "            1.6250005 ,   1.9250002 ],\n",
       "         [ -1.2750001 ,  -0.7999997 ,  -0.2249999 , ...,  -2.7249997 ,\n",
       "           -2.2       ,  -1.7499998 ],\n",
       "         [ -3.9249995 ,  -3.4999998 ,  -2.9250002 , ...,  -5.4       ,\n",
       "           -4.975     ,  -4.5       ]]],\n",
       "\n",
       "\n",
       "       [[[ -6.7499995 ,  -6.775     ,  -6.8250003 , ...,  -6.6       ,\n",
       "           -6.625     ,  -6.7249994 ],\n",
       "         [ -6.8749995 ,  -6.925     ,  -6.9499993 , ...,  -6.575     ,\n",
       "           -6.675     ,  -6.775     ],\n",
       "         [ -7.05      ,  -7.175     ,  -7.225     , ...,  -6.5000005 ,\n",
       "           -6.725     ,  -6.8749995 ],\n",
       "         ...,\n",
       "         [ -0.625     ,  -0.8500004 ,  -0.5750003 , ...,  -2.6       ,\n",
       "           -1.5500002 ,  -0.7249999 ],\n",
       "         [ -1.6749997 ,  -1.3499999 ,  -1.2750001 , ...,  -5.9500003 ,\n",
       "           -4.4749994 ,  -2.775     ],\n",
       "         [ -3.15      ,  -2.8249998 ,  -2.8250003 , ...,  -7.700001  ,\n",
       "           -6.125     ,  -4.3       ]]]], dtype=float32)\n",
       "Coordinates:\n",
       "  * time     (time) datetime64[ns] 1998-01-01 1998-01-02 ... 2021-12-31\n",
       "  * lon      (lon) float32 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5\n",
       "  * lat      (lat) float32 90.0 87.5 85.0 82.5 80.0 ... 10.0 7.5 5.0 2.5 0.0\n",
       "  * level    (level) float32 850.0\n",
       "Attributes:\n",
       "    standard_name:         eastward_wind\n",
       "    long_name:             Daily U-wind on Pressure Levels\n",
       "    units:                 m/s\n",
       "    unpacked_valid_range:  [-140.  175.]\n",
       "    actual_range:          [-78.96 110.35]\n",
       "    precision:             2\n",
       "    GRIB_id:               33\n",
       "    GRIB_name:             UGRD\n",
       "    var_desc:              u-wind\n",
       "    dataset:               NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...\n",
       "    level_desc:            Pressure Levels\n",
       "    statistic:             Mean\n",
       "    parent_stat:           Individual Obs\n",
       "    cell_methods:          time: mean (of 4 6-hourly values in one day)
" ], "text/plain": [ "\n", "array([[[[ -7.9900055 , -7.9600067 , -7.9400024 , ..., -7.9799957 ,\n", " -7.9600067 , -8.0099945 ],\n", " [ -3.5800018 , -3.2900085 , -3.0099945 , ..., -4.5099945 ,\n", " -4.2100067 , -3.8600006 ],\n", " [ 4.1900024 , 4.669998 , 5.050003 , ..., 2.4199982 ,\n", " 3.069992 , 3.699997 ],\n", " ...,\n", " [ -2.5400085 , -3.6100006 , -4.1600037 , ..., 0.6199951 ,\n", " -0.21000671, -1.3399963 ],\n", " [ -4.1100006 , -4.630005 , -4.3099976 , ..., -0.29000854,\n", " -1.4900055 , -2.9100037 ],\n", " [ -7.4600067 , -7.3899994 , -6.1900024 , ..., -3.6600037 ,\n", " -5.2299957 , -6.6100006 ]]],\n", "\n", "\n", " [[[ -5.6100006 , -5.1600037 , -4.7100067 , ..., -6.8099976 ,\n", " -6.4100037 , -6.0099945 ],\n", " [ 2.2200012 , 2.6699982 , 3.069992 , ..., 0.56999207,\n", " 1.199997 , 1.6900024 ],\n", " [ 8.319992 , 8.789993 , 9.139999 , ..., 6.369995 ,\n", "...\n", " 1.6250005 , 1.9250002 ],\n", " [ -1.2750001 , -0.7999997 , -0.2249999 , ..., -2.7249997 ,\n", " -2.2 , -1.7499998 ],\n", " [ -3.9249995 , -3.4999998 , -2.9250002 , ..., -5.4 ,\n", " -4.975 , -4.5 ]]],\n", "\n", "\n", " [[[ -6.7499995 , -6.775 , -6.8250003 , ..., -6.6 ,\n", " -6.625 , -6.7249994 ],\n", " [ -6.8749995 , -6.925 , -6.9499993 , ..., -6.575 ,\n", " -6.675 , -6.775 ],\n", " [ -7.05 , -7.175 , -7.225 , ..., -6.5000005 ,\n", " -6.725 , -6.8749995 ],\n", " ...,\n", " [ -0.625 , -0.8500004 , -0.5750003 , ..., -2.6 ,\n", " -1.5500002 , -0.7249999 ],\n", " [ -1.6749997 , -1.3499999 , -1.2750001 , ..., -5.9500003 ,\n", " -4.4749994 , -2.775 ],\n", " [ -3.15 , -2.8249998 , -2.8250003 , ..., -7.700001 ,\n", " -6.125 , -4.3 ]]]], dtype=float32)\n", "Coordinates:\n", " * time (time) datetime64[ns] 1998-01-01 1998-01-02 ... 2021-12-31\n", " * lon (lon) float32 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5\n", " * lat (lat) float32 90.0 87.5 85.0 82.5 80.0 ... 
10.0 7.5 5.0 2.5 0.0\n", " * level (level) float32 850.0\n", "Attributes:\n", " standard_name: eastward_wind\n", " long_name: Daily U-wind on Pressure Levels\n", " units: m/s\n", " unpacked_valid_range: [-140. 175.]\n", " actual_range: [-78.96 110.35]\n", " precision: 2\n", " GRIB_id: 33\n", " GRIB_name: UGRD\n", " var_desc: u-wind\n", " dataset: NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...\n", " level_desc: Pressure Levels\n", " statistic: Mean\n", " parent_stat: Individual Obs\n", " cell_methods: time: mean (of 4 6-hourly values in one day)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "u.load()\n", "v.load()\n", "u" ] }, { "cell_type": "markdown", "id": "afea06fa", "metadata": {}, "source": [ "The data are now an in-memory DataArray backed by a NumPy array, so computing the climatological mean is fast." ] }, { "cell_type": "code", "execution_count": 8, "id": "dbee03bf-d221-46cc-9934-e816044e5847", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'uwnd' (dayofyear: 366, level: 1, lat: 37, lon: 144)>\n",
       "array([[[[-1.5247937e+00, -1.4514604e+00, -1.3754172e+00, ...,\n",
       "          -1.7235428e+00, -1.6562518e+00, -1.5960444e+00],\n",
       "         [-5.6333429e-01, -4.1833475e-01, -2.7458504e-01, ...,\n",
       "          -9.6874976e-01, -8.4125155e-01, -6.9416922e-01],\n",
       "         [ 7.5583190e-01,  9.2770594e-01,  1.0993727e+00, ...,\n",
       "           3.0166462e-01,  4.4770643e-01,  6.0145718e-01],\n",
       "         ...,\n",
       "         [-1.4350017e+00, -1.3072938e+00, -1.2437533e+00, ...,\n",
       "          -3.2308362e+00, -2.5950012e+00, -1.8743768e+00],\n",
       "         [-7.3541850e-01, -6.0729337e-01, -4.0958592e-01, ...,\n",
       "          -2.8920867e+00, -2.0904186e+00, -1.2187523e+00],\n",
       "         [-6.3625211e-01, -6.9604486e-01, -6.7854387e-01, ...,\n",
       "          -2.7762527e+00, -1.9468770e+00, -1.0535429e+00]]],\n",
       "\n",
       "\n",
       "       [[[ 1.3624781e-01,  1.8458217e-01,  2.3916422e-01, ...,\n",
       "          -1.3961315e-02,  3.6873043e-02,  8.3957411e-02],\n",
       "         [ 2.3895724e-01,  3.3770636e-01,  4.3708053e-01, ...,\n",
       "          -5.7292778e-02,  4.5207102e-02,  1.4082973e-01],\n",
       "         [ 2.3937146e-01,  3.5499719e-01,  4.8520657e-01, ...,\n",
       "...\n",
       "         [-1.4406258e+00, -1.0618763e+00, -4.6687618e-01, ...,\n",
       "          -3.2627103e+00, -2.4233356e+00, -1.8195858e+00],\n",
       "         [-8.7166828e-01, -7.2979182e-01, -4.7916898e-01, ...,\n",
       "          -2.8472939e+00, -1.9472933e+00, -1.2368768e+00]]],\n",
       "\n",
       "\n",
       "       [[[-4.1091676e+00, -3.9441681e+00, -3.7783356e+00, ...,\n",
       "          -4.5566688e+00, -4.4050002e+00, -4.2674980e+00],\n",
       "         [-3.2116699e+00, -3.1075017e+00, -2.9991691e+00, ...,\n",
       "          -3.5141671e+00, -3.4233353e+00, -3.3258324e+00],\n",
       "         [-3.0250015e+00, -3.1033335e+00, -3.1783364e+00, ...,\n",
       "          -2.5975020e+00, -2.7550004e+00, -2.8883379e+00],\n",
       "         ...,\n",
       "         [ 6.7749852e-01,  9.3666548e-01,  8.9166456e-01, ...,\n",
       "          -2.1883342e+00, -1.1716664e+00, -9.6667372e-02],\n",
       "         [ 3.4666610e-01,  7.5583148e-01,  1.2258316e+00, ...,\n",
       "          -2.0166693e+00, -1.1691660e+00, -3.0000091e-01],\n",
       "         [ 2.3749900e-01,  4.7500062e-01,  9.8333144e-01, ...,\n",
       "          -1.7283325e+00, -9.7916764e-01, -2.2666772e-01]]]],\n",
       "      dtype=float32)\n",
       "Coordinates:\n",
       "  * lon        (lon) float32 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5\n",
       "  * lat        (lat) float32 90.0 87.5 85.0 82.5 80.0 ... 10.0 7.5 5.0 2.5 0.0\n",
       "  * level      (level) float32 850.0\n",
       "  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 ... 360 361 362 363 364 365 366
" ], "text/plain": [ "\n", "array([[[[-1.5247937e+00, -1.4514604e+00, -1.3754172e+00, ...,\n", " -1.7235428e+00, -1.6562518e+00, -1.5960444e+00],\n", " [-5.6333429e-01, -4.1833475e-01, -2.7458504e-01, ...,\n", " -9.6874976e-01, -8.4125155e-01, -6.9416922e-01],\n", " [ 7.5583190e-01, 9.2770594e-01, 1.0993727e+00, ...,\n", " 3.0166462e-01, 4.4770643e-01, 6.0145718e-01],\n", " ...,\n", " [-1.4350017e+00, -1.3072938e+00, -1.2437533e+00, ...,\n", " -3.2308362e+00, -2.5950012e+00, -1.8743768e+00],\n", " [-7.3541850e-01, -6.0729337e-01, -4.0958592e-01, ...,\n", " -2.8920867e+00, -2.0904186e+00, -1.2187523e+00],\n", " [-6.3625211e-01, -6.9604486e-01, -6.7854387e-01, ...,\n", " -2.7762527e+00, -1.9468770e+00, -1.0535429e+00]]],\n", "\n", "\n", " [[[ 1.3624781e-01, 1.8458217e-01, 2.3916422e-01, ...,\n", " -1.3961315e-02, 3.6873043e-02, 8.3957411e-02],\n", " [ 2.3895724e-01, 3.3770636e-01, 4.3708053e-01, ...,\n", " -5.7292778e-02, 4.5207102e-02, 1.4082973e-01],\n", " [ 2.3937146e-01, 3.5499719e-01, 4.8520657e-01, ...,\n", "...\n", " [-1.4406258e+00, -1.0618763e+00, -4.6687618e-01, ...,\n", " -3.2627103e+00, -2.4233356e+00, -1.8195858e+00],\n", " [-8.7166828e-01, -7.2979182e-01, -4.7916898e-01, ...,\n", " -2.8472939e+00, -1.9472933e+00, -1.2368768e+00]]],\n", "\n", "\n", " [[[-4.1091676e+00, -3.9441681e+00, -3.7783356e+00, ...,\n", " -4.5566688e+00, -4.4050002e+00, -4.2674980e+00],\n", " [-3.2116699e+00, -3.1075017e+00, -2.9991691e+00, ...,\n", " -3.5141671e+00, -3.4233353e+00, -3.3258324e+00],\n", " [-3.0250015e+00, -3.1033335e+00, -3.1783364e+00, ...,\n", " -2.5975020e+00, -2.7550004e+00, -2.8883379e+00],\n", " ...,\n", " [ 6.7749852e-01, 9.3666548e-01, 8.9166456e-01, ...,\n", " -2.1883342e+00, -1.1716664e+00, -9.6667372e-02],\n", " [ 3.4666610e-01, 7.5583148e-01, 1.2258316e+00, ...,\n", " -2.0166693e+00, -1.1691660e+00, -3.0000091e-01],\n", " [ 2.3749900e-01, 4.7500062e-01, 9.8333144e-01, ...,\n", " -1.7283325e+00, -9.7916764e-01, -2.2666772e-01]]]],\n", " 
dtype=float32)\n", "Coordinates:\n", " * lon (lon) float32 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5\n", " * lat (lat) float32 90.0 87.5 85.0 82.5 80.0 ... 10.0 7.5 5.0 2.5 0.0\n", " * level (level) float32 850.0\n", " * dayofyear (dayofyear) int64 1 2 3 4 5 6 7 8 ... 360 361 362 363 364 365 366" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uDayClim = u.groupby('time.dayofyear').mean('time')\n", "vDayClim = v.groupby('time.dayofyear').mean('time')\n", "\n", "uDayClim" ] }, { "cell_type": "markdown", "id": "ba25fdd2", "metadata": {}, "source": [ "### Some good habits when using dask\n", "\n", "1. dask is not currently well optimized for `resample()` or `groupby()`, so we recommend calling `load()` before using them to avoid a very large number of tasks. For example, computing everything lazily from the start through the `ws` step takes 2279930 tasks, which is bound to be slow!\n", "2. Save intermediate results to netCDF files and read them back in; this saves time.\n", "3. Prefer small spatial chunks (e.g., chunks={'latitude': 10, 'longitude': 10}, with keys that match the dataset's dimension names), but not so small that the number of tasks explodes, because every chunk adds scheduling overhead.\n", "4. The xarray documentation suggests setting `engine='h5netcdf'` when opening multiple files, which can be faster than `engine='netcdf4'`.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "vscode": { "interpreter": { "hash": "8e905df1d4d920326545d879dea538d50859be77412bc9bf54949dad3bde9dd6" } } }, "nbformat": 4, "nbformat_minor": 5 }