{ "cells": [ { "cell_type": "markdown", "id": "ece5758f-c463-474e-b429-3c184f5ff3ee", "metadata": {}, "source": [ "## Loading data with chunks\n", "\n", "The yt_xarray accessor can loaded gridded data in chunks for some gridded data (at present, fields must be 3D and the grid must be a uniform grid). \n", "\n", "Given an xarray dataset:" ] }, { "cell_type": "code", "execution_count": 1, "id": "4b579c9a-3aab-49fa-bbf0-99c0c9778e78", "metadata": {}, "outputs": [], "source": [ "from yt_xarray.sample_data import load_random_xr_data\n", "import yt \n", "import yt_xarray \n", "\n", "fields = {'temperature': ('x', 'y', 'z'), 'pressure': ('x', 'y', 'z')}\n", "dims = {'x': (0,1,15), 'y': (0, 1, 10), 'z': (0, 1, 15)}\n", "ds = load_random_xr_data(fields, dims, length_unit='m')" ] }, { "cell_type": "markdown", "id": "08ab39fc-5ccb-47a4-b16d-b285ba57cc05", "metadata": {}, "source": [ "the `chunksizes` argument to `ds.yt.load_grid` will split the grid into sub-grids, each containing a number of cell sizes equal to `chunksizes` (with extra grids containing any partial chunks): " ] }, { "cell_type": "code", "execution_count": 2, "id": "aae128d8-0c81-471e-86bd-bb0d5903a344", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt_xarray : [INFO ] 2023-02-06 13:47:37,093: Inferred geometry type is cartesian. To override, use ds.yt.set_geometry\n", "yt_xarray : [INFO ] 2023-02-06 13:47:37,095: Attempting to detect if yt_xarray will require field interpolation:\n", "yt_xarray : [INFO ] 2023-02-06 13:47:37,096: Cartesian geometry on uniform grid: yt_xarray will not interpolate.\n", "yt_xarray : [INFO ] 2023-02-06 13:47:37,096: Constructing a yt chunked grid with 18 chunks.\n", "yt : [INFO ] 2023-02-06 13:47:37,188 Parameters: current_time = 0.0\n", "yt : [INFO ] 2023-02-06 13:47:37,189 Parameters: domain_dimensions = [15 10 15]\n", "yt : [INFO ] 2023-02-06 13:47:37,190 Parameters: domain_left_edge = [-0.03571429 -0.05555556 -0.03571429]\n", "yt : [INFO ] 2023-02-06 13:47:37,191 Parameters: domain_right_edge = [1.03571429 1.05555556 1.03571429]\n", "yt : [INFO ] 2023-02-06 13:47:37,192 Parameters: cosmological_simulation = 0\n" ] } ], "source": [ "yt_ds = ds.yt.load_grid(chunksizes=5)" ] }, { "cell_type": "markdown", "id": "00865725-9b12-4bbd-af30-63531bb2f2b7", "metadata": {}, "source": [ "Now when yt processes data, it will iterate over the chunks for most functions. The yt grids can be visualized on plot objects with the `annotate_grids` plot callback:" ] }, { "cell_type": "code", "execution_count": 3, "id": "cdd51be3-5d93-4d50-94e8-293df9a48f6f", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt : [INFO ] 2023-02-06 13:47:37,442 xlim = -0.035714 1.035714\n", "yt : [INFO ] 2023-02-06 13:47:37,442 ylim = -0.055556 1.055556\n", "yt : [INFO ] 2023-02-06 13:47:37,443 xlim = -0.035714 1.035714\n", "yt : [INFO ] 2023-02-06 13:47:37,444 ylim = -0.055556 1.055556\n", "yt : [INFO ] 2023-02-06 13:47:37,448 Making a fixed resolution buffer of (('stream', 'temperature')) 800 by 800\n", "/home/chris/.pyenv/versions/3.9.0/envs/yt_xarray/lib/python3.9/site-packages/unyt/array.py:1882: RuntimeWarning: invalid value encountered in true_divide\n", " out_arr = func(\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "slc = yt.SlicePlot(yt_ds, \"z\", (\"stream\", \"temperature\"), origin=\"native\")\n", "slc.set_log((\"stream\", \"temperature\"), False)\n", "slc.annotate_grids()\n", "slc.show()" ] }, { "cell_type": "markdown", "id": "d8678d80-3981-4c4f-b68b-50c32e7a7cd3", "metadata": {}, "source": [ "You can also specify the `chunksizes` argument as a tuple if you want to have different sized chunks in each dimension:" ] }, { "cell_type": "code", "execution_count": 4, "id": "8680374e-fb7e-4d14-9266-9f8297b6771f", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt_xarray : [INFO ] 2023-02-06 13:47:38,156: Attempting to detect if yt_xarray will require field interpolation:\n", "yt_xarray : [INFO ] 2023-02-06 13:47:38,157: Cartesian geometry on uniform grid: yt_xarray will not interpolate.\n", "yt_xarray : [INFO ] 2023-02-06 13:47:38,158: Constructing a yt chunked grid with 96 chunks.\n", "yt : [INFO ] 2023-02-06 13:47:38,248 Parameters: current_time = 0.0\n", "yt : [INFO ] 2023-02-06 13:47:38,249 Parameters: domain_dimensions = [15 10 15]\n", "yt : [INFO ] 2023-02-06 13:47:38,250 Parameters: domain_left_edge = [-0.03571429 -0.05555556 -0.03571429]\n", "yt : [INFO ] 2023-02-06 13:47:38,251 Parameters: domain_right_edge = [1.03571429 1.05555556 1.03571429]\n", "yt : [INFO ] 2023-02-06 13:47:38,251 Parameters: cosmological_simulation = 0\n", "yt : [INFO ] 2023-02-06 13:47:38,373 xlim = -0.035714 1.035714\n", "yt : [INFO ] 2023-02-06 13:47:38,374 ylim = -0.055556 1.055556\n", "yt : [INFO ] 2023-02-06 13:47:38,375 xlim = -0.035714 1.035714\n", "yt : [INFO ] 2023-02-06 13:47:38,376 ylim = -0.055556 1.055556\n", "yt : [INFO ] 2023-02-06 13:47:38,379 Making a fixed resolution buffer of (('stream', 'temperature')) 800 by 800\n", "/home/chris/.pyenv/versions/3.9.0/envs/yt_xarray/lib/python3.9/site-packages/unyt/array.py:1882: RuntimeWarning: invalid value encountered in true_divide\n", " out_arr = func(\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "yt_ds = ds.yt.load_grid(chunksizes=(2,3,5))\n", "slc = yt.SlicePlot(yt_ds, \"z\", (\"stream\", \"temperature\"), origin=\"native\")\n", "slc.set_log((\"stream\", \"temperature\"), False)\n", "slc.annotate_grids()\n", "slc.show()" ] }, { "cell_type": "markdown", "id": "3fd3e9c8-a50c-4768-9db5-e348473bad6d", "metadata": {}, "source": [ "## yt_xarray and Dask chunks\n", "\n", "While yt_xarray can handle chunked xarray fields, the current implementation does not guarantee alignment between yt and xarray chunks. This means that when using a Dask backend, yt may request index ranges that span dask chunks when operating on yt grid objects. To illustrate this, let's first build an xarray dataset with a single field defined by a dask array:" ] }, { "cell_type": "code", "execution_count": 5, "id": "14ee420b-e865-4aa4-bbe9-9d0759b72934", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'test_field' (x: 800, y: 650, z: 500)>\n",
       "dask.array<random_sample, shape=(800, 650, 500), dtype=float64, chunksize=(100, 100, 100), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * x        (x) float64 0.0 0.001252 0.002503 0.003755 ... 0.9975 0.9987 1.0\n",
       "  * y        (y) float64 0.0 0.001541 0.003082 0.004622 ... 0.9969 0.9985 1.0\n",
       "  * z        (z) float64 0.0 0.002004 0.004008 0.006012 ... 0.996 0.998 1.0
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " * x (x) float64 0.0 0.001252 0.002503 0.003755 ... 0.9975 0.9987 1.0\n", " * y (y) float64 0.0 0.001541 0.003082 0.004622 ... 0.9969 0.9985 1.0\n", " * z (z) float64 0.0 0.002004 0.004008 0.006012 ... 0.996 0.998 1.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dask import array as da\n", "import xarray as xr\n", "import numpy as np \n", "import yt_xarray \n", "\n", "shp = (800, 650, 500)\n", "f1 = da.random.random(shp , chunks=100)\n", "coords = {'x': np.linspace(0, 1, shp[0]), \n", " 'y': np.linspace(0, 1, shp[1]), \n", " 'z': np.linspace(0, 1, shp[2])}\n", "\n", "data = {'test_field': xr.DataArray(f1, coords=coords, dims=('x', 'y', 'z'))}\n", "ds = xr.Dataset(data)\n", "ds.test_field" ] }, { "cell_type": "markdown", "id": "60e0b5ae-1554-4746-86b5-920cc6791911", "metadata": {}, "source": [ "Now when calling `load_grid`, the yt grid objects will contain dask chunks:" ] }, { "cell_type": "code", "execution_count": 6, "id": "a41ee76b-8c62-443f-bdd2-075171682732", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt_xarray : [INFO ] 2023-02-06 13:47:38,993: Inferred geometry type is cartesian. To override, use ds.yt.set_geometry\n", "yt_xarray : [INFO ] 2023-02-06 13:47:38,995: Attempting to detect if yt_xarray will require field interpolation:\n", "yt_xarray : [INFO ] 2023-02-06 13:47:38,996: Cartesian geometry on uniform grid: yt_xarray will not interpolate.\n", "yt_xarray : [INFO ] 2023-02-06 13:47:38,997: Constructing a yt chunked grid with 280 chunks.\n", "yt : [INFO ] 2023-02-06 13:47:39,073 Parameters: current_time = 0.0\n", "yt : [INFO ] 2023-02-06 13:47:39,075 Parameters: domain_dimensions = [800 650 500]\n", "yt : [INFO ] 2023-02-06 13:47:39,076 Parameters: domain_left_edge = [-0.00062578 -0.00077042 -0.001002 ]\n", "yt : [INFO ] 2023-02-06 13:47:39,077 Parameters: domain_right_edge = [1.00062578 1.00077042 1.001002 ]\n", "yt : [INFO ] 2023-02-06 13:47:39,078 Parameters: cosmological_simulation = 0\n" ] } ], "source": [ "yt_ds = ds.yt.load_grid(length_unit='m', chunksizes=100)" ] }, { "cell_type": "markdown", "id": "fc0370e9-dd01-473e-b62c-0d781924a51e", "metadata": {}, "source": [ "Because the chunking division follows the same axis-ordering as xarray and we chose the same chunksize, the resulting yt grids should line up with the dask array chunks:" ] }, { "cell_type": "code", "execution_count": 7, "id": "09e6e870-af6a-498c-a507-53407b1a50ac", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt : [INFO ] 2023-02-06 13:47:39,897 xlim = -0.000626 1.000626\n", "yt : [INFO ] 2023-02-06 13:47:39,898 ylim = -0.000770 1.000770\n", "yt : [INFO ] 2023-02-06 13:47:39,899 xlim = -0.000626 1.000626\n", "yt : [INFO ] 2023-02-06 13:47:39,899 ylim = -0.000770 1.000770\n", "yt : [INFO ] 2023-02-06 13:47:39,904 Making a fixed resolution buffer of (('stream', 'test_field')) 800 by 800\n", "/home/chris/.pyenv/versions/3.9.0/envs/yt_xarray/lib/python3.9/site-packages/unyt/array.py:1882: RuntimeWarning: invalid value encountered in true_divide\n", " out_arr = func(\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "slc = yt.SlicePlot(yt_ds, \"z\", (\"stream\", \"test_field\"))\n", "slc.annotate_grids()\n", "slc.show()" ] }, { "cell_type": "markdown", "id": "9748e8a5-57cd-4932-b461-c8b3737e0c15", "metadata": {}, "source": [ "But you can also set the yt grid chunksize independently of the underlying dask chunks. If loading the yt datasets with a chunksize larger than that of the underlying dask array, then multiple dask chunks will be contained within a yt grid:" ] }, { "cell_type": "code", "execution_count": 8, "id": "dfd3e4e6-121b-4a84-8106-67fd81bd47e3", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt_xarray : [INFO ] 2023-02-06 13:47:41,013: Attempting to detect if yt_xarray will require field interpolation:\n", "yt_xarray : [INFO ] 2023-02-06 13:47:41,013: Cartesian geometry on uniform grid: yt_xarray will not interpolate.\n", "yt_xarray : [INFO ] 2023-02-06 13:47:41,014: Constructing a yt chunked grid with 48 chunks.\n", "yt : [INFO ] 2023-02-06 13:47:41,063 Parameters: current_time = 0.0\n", "yt : [INFO ] 2023-02-06 13:47:41,064 Parameters: domain_dimensions = [800 650 500]\n", "yt : [INFO ] 2023-02-06 13:47:41,065 Parameters: domain_left_edge = [-0.00062578 -0.00077042 -0.001002 ]\n", "yt : [INFO ] 2023-02-06 13:47:41,066 Parameters: domain_right_edge = [1.00062578 1.00077042 1.001002 ]\n", "yt : [INFO ] 2023-02-06 13:47:41,067 Parameters: cosmological_simulation = 0\n" ] } ], "source": [ "yt_ds = ds.yt.load_grid(length_unit='m', chunksizes=200)" ] }, { "cell_type": "code", "execution_count": 9, "id": "7595d987-1bd4-4487-b6bd-1da1fc67c37a", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt : [INFO ] 2023-02-06 13:47:41,846 xlim = -0.000626 1.000626\n", "yt : [INFO ] 2023-02-06 13:47:41,847 ylim = -0.000770 1.000770\n", "yt : [INFO ] 2023-02-06 13:47:41,848 xlim = -0.000626 1.000626\n", "yt : [INFO ] 2023-02-06 13:47:41,849 ylim = -0.000770 1.000770\n", "yt : [INFO ] 2023-02-06 13:47:41,857 Making a fixed resolution buffer of (('stream', 'test_field')) 800 by 800\n", "/home/chris/.pyenv/versions/3.9.0/envs/yt_xarray/lib/python3.9/site-packages/unyt/array.py:1882: RuntimeWarning: invalid value encountered in true_divide\n", " out_arr = func(\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "slc = yt.SlicePlot(yt_ds, \"z\", (\"stream\", \"test_field\"))\n", "slc.annotate_grids()\n", "slc.show()" ] }, { "cell_type": "markdown", "id": "c5a22be8-5d5b-491d-a76c-a6d54fd1dd1e", "metadata": {}, "source": [ "In some cases, the mismatch may lead to memory issues: if you set the yt grids contain multiple dask chunks then those dask chunks need to fit in memory. But there may also be some performance savings by containing multiple dask chunks within a single yt sub-grid.\n", "\n", "Work is under way to better sync the xarray-dask chunks with yt grid objects, see [Issue 28](https://github.com/data-exp-lab/yt_xarray/issues/28) for any updates or to get involved!" ] }, { "cell_type": "code", "execution_count": null, "id": "3d600447-385d-49f2-b5ac-7db9d7a397a1", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 5 }