{ "cells": [ { "cell_type": "markdown", "id": "117cb6ed-efe8-4e6e-927f-e2e744afc323", "metadata": {}, "source": [ "# 12. `dask` and Large Datasets Computations\n", "\n", "Some people think that large datasets or \"big data\" are mostly applied to machine learning and artificial intelligence fields. In atmospheric science, \n", "\n", "> Big data refers to data sets that are so voluminous and complex that traditional data processing application software is inadequate to deal with them.\n", "\n", "Due to the long periods and finer grid resolutions of modern reanalysis data or model outputs, it often leads to system overload if not processed properly. You may see the following error message:\n", "\n", "`MemoryError: Unable to allocate 52.2 GiB for an array with shape (365, 37, 721, 1440) and data type float32`\n", "\n", "This error message appears because the data size has exceeded the RAM capacity. How should we avoid this situation?" ] }, { "cell_type": "markdown", "id": "a36f5b20-0665-46a8-baa9-a7108c87e627", "metadata": {}, "source": [ "## Dask \n", "\n", "`dask`` is a flexible library for parallel computing in Python. It can scale up to operate on large datasets and perform computations that cannot fit into memory. dask achieves this by breaking down large computations into smaller tasks, which are then executed in parallel. This can avoid consuming large amount of RAM. \n", "\n", "To understand the usage of dask, with demonstrate first with a 1000 × 4000 array size. \n", "\n", "**1. Numpy Array:**" ] }, { "cell_type": "code", "execution_count": 1, "id": "69d43e34-ec78-49e9-9483-b4149571ee88", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1., ..., 1., 1., 1.],\n", " [1., 1., 1., ..., 1., 1., 1.],\n", " [1., 1., 1., ..., 1., 1., 1.],\n", " ...,\n", " [1., 1., 1., ..., 1., 1., 1.],\n", " [1., 1., 1., ..., 1., 1., 1.],\n", " [1., 1., 1., ..., 1., 1., 1.]])" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "shape = (1000, 4000)\n", "ones_np = np.ones(shape)\n", "ones_np" ] }, { "cell_type": "markdown", "id": "381fd8c6-a09d-4c77-815c-902b79ddcf8b", "metadata": {}, "source": [ "**2. Dask Array:**" ] }, { "cell_type": "code", "execution_count": 2, "id": "6fa512f6-a440-4dbe-9474-e7f617d80bae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Array Chunk
Bytes 30.52 MiB 30.52 MiB
Shape (1000, 4000) (1000, 4000)
Dask graph 1 chunks in 1 graph layer
Data type float64 numpy.ndarray
\n", "
\n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " 4000\n", " 1000\n", "\n", "
" ], "text/plain": [ "dask.array" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import dask.array as da\n", "\n", "ones = da.ones(shape)\n", "ones" ] }, { "cell_type": "markdown", "id": "5a4a3f14-9ab0-48d6-b413-b4fce0d85b3d", "metadata": {}, "source": [ "![](https://docs.dask.org/en/latest/_images/dask-array.svg)\n", "\n", "Dask devides the entire array into sub-arrays named \"chunk\". In `dask`, we can specify the size of a chunk." ] }, { "cell_type": "code", "execution_count": 3, "id": "79a316b6-7c1a-4103-84bf-b3d0b3ba424a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Array Chunk
Bytes 30.52 MiB 7.63 MiB
Shape (1000, 4000) (1000, 1000)
Dask graph 4 chunks in 1 graph layer
Data type float64 numpy.ndarray
\n", "
\n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " 4000\n", " 1000\n", "\n", "
" ], "text/plain": [ "dask.array" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chunk_shape = (1000, 1000)\n", "ones = da.ones(shape, chunks=chunk_shape)\n", "ones" ] }, { "cell_type": "markdown", "id": "23679a30-ea4a-4972-8284-490d2c1b396d", "metadata": {}, "source": [ "We can do some arithmetic calculations, such as multiplication and averaging." ] }, { "cell_type": "code", "execution_count": 4, "id": "02676710-31d4-4363-baae-e20e08c5e09b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Array Chunk
Bytes 8 B 8 B
Shape () ()
Dask graph 1 chunks in 6 graph layers
Data type float64 numpy.ndarray
\n", "
\n", " \n", "
" ], "text/plain": [ "dask.array" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ones_mean = (ones * ones[::-1, ::-1]).mean()\n", "ones_mean" ] }, { "cell_type": "markdown", "id": "10c1f9b2-288c-4ca6-b350-9c7228e7dae5", "metadata": {}, "source": [ "Following is the calculation procedure:\n", "\n", "![](https://earth-env-data-science.github.io/_images/dask_arrays_16_0.png)\n", "\n", "Dask allows computation of each chunk in each memory core, and finally combines all the computation of each chunk to a final result. For more advanced usages of dask, see [`dask` official tutorial](https://www.dask.org/).\n", "\n", "Dask integrates commonly-used functions in `numpy` and `xarray`, which is beneficial to processing climate data. Then how will `dask` help with large datasets?\n" ] }, { "cell_type": "markdown", "id": "b10dc64b-5982-4558-832c-111f51d05d00", "metadata": {}, "source": [ "## Large Climate Dataset Processing\n", "\n", "In Unit 2, we introduced the `parallel=True` option in `xarray.open_mfdataset`. This option allows `xarray` to read the file using `dask` array. Therefore, the system will read data using multi-core computation to speed up the reading process. Now, we read the NCEP R2 850-hPa wind field and specify the size of chunks by `chunks={'time': 183, 'level': 1, 'longitude': 93*2, 'latitude': 91*2}` then calculate climatology, anomaly, and plot." ] }, { "cell_type": "code", "execution_count": 5, "id": "01bf73c7-b14d-418e-b3f1-d1c827aed33d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 12 μs, sys: 2 μs, total: 14 μs\n", "Wall time: 26.5 μs\n" ] } ], "source": [ "import xarray as xr\n", "import dask\n", "\n", "with dask.config.set(**{'array.slicing.split_large_chunks': False}):\n", " uds = xr.open_mfdataset('./data/ncep_r2_uv850/u850.*.nc',\n", " combine = \"by_coords\", \n", " parallel=True,\n", " chunks={'time':183, 'longitude':36,'latitude':24}\n", " )\n", " vds = xr.open_mfdataset('data/ncep_r2_uv850/v850.*.nc',\n", " combine = \"by_coords\", \n", " parallel=True,\n", " chunks={'time':183, 'longitude':36,'latitude':24}\n", " )\n", "%time" ] }, { "cell_type": "code", "execution_count": 6, "id": "64240f94-ae35-472f-9202-3adc72663b84", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 9 μs, sys: 2 μs, total: 11 μs\n", "Wall time: 23.1 μs\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'uwnd' (time: 8766, level: 1, lat: 37, lon: 144)> Size: 187MB\n",
       "dask.array<getitem, shape=(8766, 1, 37, 144), dtype=float32, chunksize=(183, 1, 37, 144), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * time     (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31\n",
       "  * lon      (lon) float32 576B 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5\n",
       "  * lat      (lat) float32 148B 90.0 87.5 85.0 82.5 80.0 ... 7.5 5.0 2.5 0.0\n",
       "  * level    (level) float32 4B 850.0\n",
       "Attributes: (12/14)\n",
       "    standard_name:         eastward_wind\n",
       "    long_name:             Daily U-wind on Pressure Levels\n",
       "    units:                 m/s\n",
       "    unpacked_valid_range:  [-140.  175.]\n",
       "    actual_range:          [-78.96 110.35]\n",
       "    precision:             2\n",
       "    ...                    ...\n",
       "    var_desc:              u-wind\n",
       "    dataset:               NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...\n",
       "    level_desc:            Pressure Levels\n",
       "    statistic:             Mean\n",
       "    parent_stat:           Individual Obs\n",
       "    cell_methods:          time: mean (of 4 6-hourly values in one day)
" ], "text/plain": [ " Size: 187MB\n", "dask.array\n", "Coordinates:\n", " * time (time) datetime64[ns] 70kB 1998-01-01 1998-01-02 ... 2021-12-31\n", " * lon (lon) float32 576B 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5\n", " * lat (lat) float32 148B 90.0 87.5 85.0 82.5 80.0 ... 7.5 5.0 2.5 0.0\n", " * level (level) float32 4B 850.0\n", "Attributes: (12/14)\n", " standard_name: eastward_wind\n", " long_name: Daily U-wind on Pressure Levels\n", " units: m/s\n", " unpacked_valid_range: [-140. 175.]\n", " actual_range: [-78.96 110.35]\n", " precision: 2\n", " ... ...\n", " var_desc: u-wind\n", " dataset: NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...\n", " level_desc: Pressure Levels\n", " statistic: Mean\n", " parent_stat: Individual Obs\n", " cell_methods: time: mean (of 4 6-hourly values in one day)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u = uds.sel(lat=slice(90,0)).uwnd\n", "v = vds.sel(lat=slice(90,0)).vwnd\n", "%time\n", "u" ] }, { "cell_type": "markdown", "id": "afce89f7-1a9b-4d81-99e8-7643079a7dcb", "metadata": {}, "source": [ "Now we see the chunk information. This is because of the dask **lazy computation** that the program didn't really compute with real data values. At this moment, the data is not loaded in the RAM of the system.\n", "\n", "Now, we are going to calculate the climatological mean. However, the built-in functions `groupby()` and `resample()` in `xarray` are not well supported. Therefore, we will compute with the `flox` libray, which is a library with significantly faster groupby calculations. The `flox` groupby method is [`flox.xarray.xarray_reduce`](https://flox.readthedocs.io/en/latest/generated/flox.xarray.xarray_reduce.html). " ] }, { "cell_type": "code", "execution_count": 7, "id": "4c52fc19", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'uwnd' (dayofyear: 366, level: 1, lat: 37, lon: 144)> Size: 8MB\n",
       "dask.array<transpose, shape=(366, 1, 37, 144), dtype=float32, chunksize=(366, 1, 37, 144), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * lon        (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5\n",
       "  * lat        (lat) float32 148B 90.0 87.5 85.0 82.5 80.0 ... 7.5 5.0 2.5 0.0\n",
       "  * level      (level) float32 4B 850.0\n",
       "  * dayofyear  (dayofyear) int64 3kB 1 2 3 4 5 6 7 ... 361 362 363 364 365 366\n",
       "Attributes: (12/14)\n",
       "    standard_name:         eastward_wind\n",
       "    long_name:             Daily U-wind on Pressure Levels\n",
       "    units:                 m/s\n",
       "    unpacked_valid_range:  [-140.  175.]\n",
       "    actual_range:          [-78.96 110.35]\n",
       "    precision:             2\n",
       "    ...                    ...\n",
       "    var_desc:              u-wind\n",
       "    dataset:               NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...\n",
       "    level_desc:            Pressure Levels\n",
       "    statistic:             Mean\n",
       "    parent_stat:           Individual Obs\n",
       "    cell_methods:          time: mean (of 4 6-hourly values in one day)
" ], "text/plain": [ " Size: 8MB\n", "dask.array\n", "Coordinates:\n", " * lon (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5\n", " * lat (lat) float32 148B 90.0 87.5 85.0 82.5 80.0 ... 7.5 5.0 2.5 0.0\n", " * level (level) float32 4B 850.0\n", " * dayofyear (dayofyear) int64 3kB 1 2 3 4 5 6 7 ... 361 362 363 364 365 366\n", "Attributes: (12/14)\n", " standard_name: eastward_wind\n", " long_name: Daily U-wind on Pressure Levels\n", " units: m/s\n", " unpacked_valid_range: [-140. 175.]\n", " actual_range: [-78.96 110.35]\n", " precision: 2\n", " ... ...\n", " var_desc: u-wind\n", " dataset: NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...\n", " level_desc: Pressure Levels\n", " statistic: Mean\n", " parent_stat: Individual Obs\n", " cell_methods: time: mean (of 4 6-hourly values in one day)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from flox.xarray import xarray_reduce\n", "\n", "uDayClm = xarray_reduce(u,u.time.dt.dayofyear,func='mean',dim='time')\n", "vDayClm = xarray_reduce(u,u.time.dt.dayofyear,func='mean',dim='time')\n", "\n", "uDayClm " ] }, { "cell_type": "markdown", "id": "28e7047f", "metadata": {}, "source": [ "After lazy computation, we can now really trigger computation and return the final result with `.compute()`." ] }, { "cell_type": "code", "execution_count": 8, "id": "cbec20b2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'uwnd' (dayofyear: 366, level: 1, lat: 37, lon: 144)> Size: 8MB\n",
       "array([[[[-1.5247940e+00, -1.4514602e+00, -1.3754174e+00, ...,\n",
       "          -1.7235428e+00, -1.6562518e+00, -1.5960444e+00],\n",
       "         [-5.6333429e-01, -4.1833475e-01, -2.7458504e-01, ...,\n",
       "          -9.6874976e-01, -8.4125155e-01, -6.9416922e-01],\n",
       "         [ 7.5583190e-01,  9.2770594e-01,  1.0993727e+00, ...,\n",
       "           3.0166462e-01,  4.4770643e-01,  6.0145718e-01],\n",
       "         ...,\n",
       "         [-1.4350017e+00, -1.3072938e+00, -1.2437533e+00, ...,\n",
       "          -3.2308366e+00, -2.5950010e+00, -1.8743768e+00],\n",
       "         [-7.3541850e-01, -6.0729337e-01, -4.0958592e-01, ...,\n",
       "          -2.8920867e+00, -2.0904186e+00, -1.2187523e+00],\n",
       "         [-6.3625211e-01, -6.9604486e-01, -6.7854387e-01, ...,\n",
       "          -2.7762527e+00, -1.9468769e+00, -1.0535429e+00]]],\n",
       "\n",
       "\n",
       "       [[[ 1.3624781e-01,  1.8458217e-01,  2.3916422e-01, ...,\n",
       "          -1.3961315e-02,  3.6873043e-02,  8.3957411e-02],\n",
       "         [ 2.3895724e-01,  3.3770636e-01,  4.3708053e-01, ...,\n",
       "          -5.7292778e-02,  4.5207102e-02,  1.4082973e-01],\n",
       "         [ 2.3937146e-01,  3.5499719e-01,  4.8520657e-01, ...,\n",
       "...\n",
       "         [-1.4406258e+00, -1.0618763e+00, -4.6687624e-01, ...,\n",
       "          -3.2627106e+00, -2.4233358e+00, -1.8195858e+00],\n",
       "         [-8.7166828e-01, -7.2979182e-01, -4.7916898e-01, ...,\n",
       "          -2.8472936e+00, -1.9472933e+00, -1.2368768e+00]]],\n",
       "\n",
       "\n",
       "       [[[-4.1091676e+00, -3.9441681e+00, -3.7783356e+00, ...,\n",
       "          -4.5566688e+00, -4.4050002e+00, -4.2674980e+00],\n",
       "         [-3.2116699e+00, -3.1075017e+00, -2.9991691e+00, ...,\n",
       "          -3.5141671e+00, -3.4233353e+00, -3.3258324e+00],\n",
       "         [-3.0250015e+00, -3.1033335e+00, -3.1783364e+00, ...,\n",
       "          -2.5975020e+00, -2.7550004e+00, -2.8883379e+00],\n",
       "         ...,\n",
       "         [ 6.7749852e-01,  9.3666548e-01,  8.9166456e-01, ...,\n",
       "          -2.1883342e+00, -1.1716664e+00, -9.6667372e-02],\n",
       "         [ 3.4666610e-01,  7.5583148e-01,  1.2258316e+00, ...,\n",
       "          -2.0166693e+00, -1.1691660e+00, -3.0000091e-01],\n",
       "         [ 2.3749900e-01,  4.7500062e-01,  9.8333144e-01, ...,\n",
       "          -1.7283325e+00, -9.7916764e-01, -2.2666772e-01]]]],\n",
       "      dtype=float32)\n",
       "Coordinates:\n",
       "  * lon        (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5\n",
       "  * lat        (lat) float32 148B 90.0 87.5 85.0 82.5 80.0 ... 7.5 5.0 2.5 0.0\n",
       "  * level      (level) float32 4B 850.0\n",
       "  * dayofyear  (dayofyear) int64 3kB 1 2 3 4 5 6 7 ... 361 362 363 364 365 366\n",
       "Attributes: (12/14)\n",
       "    standard_name:         eastward_wind\n",
       "    long_name:             Daily U-wind on Pressure Levels\n",
       "    units:                 m/s\n",
       "    unpacked_valid_range:  [-140.  175.]\n",
       "    actual_range:          [-78.96 110.35]\n",
       "    precision:             2\n",
       "    ...                    ...\n",
       "    var_desc:              u-wind\n",
       "    dataset:               NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...\n",
       "    level_desc:            Pressure Levels\n",
       "    statistic:             Mean\n",
       "    parent_stat:           Individual Obs\n",
       "    cell_methods:          time: mean (of 4 6-hourly values in one day)
" ], "text/plain": [ " Size: 8MB\n", "array([[[[-1.5247940e+00, -1.4514602e+00, -1.3754174e+00, ...,\n", " -1.7235428e+00, -1.6562518e+00, -1.5960444e+00],\n", " [-5.6333429e-01, -4.1833475e-01, -2.7458504e-01, ...,\n", " -9.6874976e-01, -8.4125155e-01, -6.9416922e-01],\n", " [ 7.5583190e-01, 9.2770594e-01, 1.0993727e+00, ...,\n", " 3.0166462e-01, 4.4770643e-01, 6.0145718e-01],\n", " ...,\n", " [-1.4350017e+00, -1.3072938e+00, -1.2437533e+00, ...,\n", " -3.2308366e+00, -2.5950010e+00, -1.8743768e+00],\n", " [-7.3541850e-01, -6.0729337e-01, -4.0958592e-01, ...,\n", " -2.8920867e+00, -2.0904186e+00, -1.2187523e+00],\n", " [-6.3625211e-01, -6.9604486e-01, -6.7854387e-01, ...,\n", " -2.7762527e+00, -1.9468769e+00, -1.0535429e+00]]],\n", "\n", "\n", " [[[ 1.3624781e-01, 1.8458217e-01, 2.3916422e-01, ...,\n", " -1.3961315e-02, 3.6873043e-02, 8.3957411e-02],\n", " [ 2.3895724e-01, 3.3770636e-01, 4.3708053e-01, ...,\n", " -5.7292778e-02, 4.5207102e-02, 1.4082973e-01],\n", " [ 2.3937146e-01, 3.5499719e-01, 4.8520657e-01, ...,\n", "...\n", " [-1.4406258e+00, -1.0618763e+00, -4.6687624e-01, ...,\n", " -3.2627106e+00, -2.4233358e+00, -1.8195858e+00],\n", " [-8.7166828e-01, -7.2979182e-01, -4.7916898e-01, ...,\n", " -2.8472936e+00, -1.9472933e+00, -1.2368768e+00]]],\n", "\n", "\n", " [[[-4.1091676e+00, -3.9441681e+00, -3.7783356e+00, ...,\n", " -4.5566688e+00, -4.4050002e+00, -4.2674980e+00],\n", " [-3.2116699e+00, -3.1075017e+00, -2.9991691e+00, ...,\n", " -3.5141671e+00, -3.4233353e+00, -3.3258324e+00],\n", " [-3.0250015e+00, -3.1033335e+00, -3.1783364e+00, ...,\n", " -2.5975020e+00, -2.7550004e+00, -2.8883379e+00],\n", " ...,\n", " [ 6.7749852e-01, 9.3666548e-01, 8.9166456e-01, ...,\n", " -2.1883342e+00, -1.1716664e+00, -9.6667372e-02],\n", " [ 3.4666610e-01, 7.5583148e-01, 1.2258316e+00, ...,\n", " -2.0166693e+00, -1.1691660e+00, -3.0000091e-01],\n", " [ 2.3749900e-01, 4.7500062e-01, 9.8333144e-01, ...,\n", " -1.7283325e+00, -9.7916764e-01, -2.2666772e-01]]]],\n", " dtype=float32)\n", "Coordinates:\n", " * lon (lon) float32 576B 0.0 2.5 5.0 7.5 ... 350.0 352.5 355.0 357.5\n", " * lat (lat) float32 148B 90.0 87.5 85.0 82.5 80.0 ... 7.5 5.0 2.5 0.0\n", " * level (level) float32 4B 850.0\n", " * dayofyear (dayofyear) int64 3kB 1 2 3 4 5 6 7 ... 361 362 363 364 365 366\n", "Attributes: (12/14)\n", " standard_name: eastward_wind\n", " long_name: Daily U-wind on Pressure Levels\n", " units: m/s\n", " unpacked_valid_range: [-140. 175.]\n", " actual_range: [-78.96 110.35]\n", " precision: 2\n", " ... ...\n", " var_desc: u-wind\n", " dataset: NCEP/DOE AMIP-II Reanalysis (Reanalysis-2) Daily A...\n", " level_desc: Pressure Levels\n", " statistic: Mean\n", " parent_stat: Individual Obs\n", " cell_methods: time: mean (of 4 6-hourly values in one day)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uDayClm_fn = uDayClm.compute()\n", "vDayClm_fn = vDayClm.compute()\n", "\n", "uDayClm_fn" ] }, { "cell_type": "markdown", "id": "9494f300", "metadata": {}, "source": [ "The computed result is now saved in a DataArray. " ] }, { "cell_type": "markdown", "id": "ba25fdd2", "metadata": {}, "source": [ "### Best Practices with Working with `dask`\n", "\n", "1. Save intermediate results to disk as a netCDF files (using `to_netcdf()`) and then load them again with `open_dataset()` for further computations.\n", "2. Specify smaller chunks across space when using `open_mfdataset()` (e.g., `chunks={'latitude': 10, 'longitude': 10}`)。\n", "3. 
Chunk as early as possible, and avoid rechunking as much as possible. Always pass the `chunks={}` argument to `open_mfdataset()` to avoid redundant file reads.\n", "4. `groupby()` is a costly operation and will perform a lot better if the `flox` package is installed. See the `flox` documentation for more. By default, Xarray will use `flox` if it is installed.\n" ] }
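, { "cell_type": "markdown", "id": "added-tonetcdf-md", "metadata": {}, "source": [ "As an illustration of best practice 1 above (a sketch with a hypothetical file name; this cell was not executed), the computed climatology can be written to disk once and reloaded later instead of being recomputed from the raw daily files:" ] }, { "cell_type": "code", "execution_count": null, "id": "added-tonetcdf", "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch of best practice 1 (hypothetical file name, not executed):\n", "# save the computed climatology once, then reload it for further analysis\n", "# instead of recomputing it from the raw daily files.\n", "uDayClm_fn.to_netcdf('./data/u850_day_clim.nc')\n", "uclim = xr.open_dataset('./data/u850_day_clim.nc')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" }, "vscode": { "interpreter": { "hash": "8e905df1d4d920326545d879dea538d50859be77412bc9bf54949dad3bde9dd6" } } }, "nbformat": 4, "nbformat_minor": 5 }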