{ "cells": [ { "cell_type": "markdown", "id": "9abc7ae0-1bba-47d8-a1df-17386def470c", "metadata": {}, "source": [ "## Loading data with yt_xarray \n", "\n", "This notebook demonstrates how to initialize a yt dataset object from an open xarray dataset.\n", "\n", "After describing the sample data that is used, the notebook covers:\n", "\n", "* [Loading all fields](#Loading-all-fields)\n", "* [Overview of yt datasets](#A-brief-overview-of-yt-datasets)\n", "* [Loading a subset of fields](#Loading-a-subset-of-fields)\n", "* [Loading method and memory usage](#Loading-method-and-memory-usage)\n", "\n", "\n", "### Sample data\n", "\n", "We'll use randomly generated sample data in this notebook (and in many of the others), produced by a convenience function, `yt_xarray.sample_data.load_random_xr_data()`. To use it, we supply two dictionaries: one mapping each field name to its dimension names, and a second mapping each dimension name to its start value, end value and number of elements:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "9b750bb0-f68d-4765-8860-caeb37f1b2f2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset>\n",
       "Dimensions:      (x: 15, y: 10, z: 15)\n",
       "Coordinates:\n",
       "  * x            (x) float64 0.0 0.07143 0.1429 0.2143 ... 0.8571 0.9286 1.0\n",
       "  * y            (y) float64 0.0 0.1111 0.2222 0.3333 ... 0.7778 0.8889 1.0\n",
       "  * z            (z) float64 0.0 0.07143 0.1429 0.2143 ... 0.8571 0.9286 1.0\n",
       "Data variables:\n",
       "    temperature  (x, y, z) float64 0.7337 0.2377 0.9107 ... 0.3052 0.8424 0.615\n",
       "    pressure     (x, y, z) float64 0.8171 0.6735 0.5087 ... 0.5845 0.2743 0.3072\n",
       "Attributes:\n",
       "    geospatial_vertical_units:  m
" ], "text/plain": [ "<xarray.Dataset>\n", "Dimensions: (x: 15, y: 10, z: 15)\n", "Coordinates:\n", " * x (x) float64 0.0 0.07143 0.1429 0.2143 ... 0.8571 0.9286 1.0\n", " * y (y) float64 0.0 0.1111 0.2222 0.3333 ... 0.7778 0.8889 1.0\n", " * z (z) float64 0.0 0.07143 0.1429 0.2143 ... 0.8571 0.9286 1.0\n", "Data variables:\n", " temperature (x, y, z) float64 0.7337 0.2377 0.9107 ... 0.3052 0.8424 0.615\n", " pressure (x, y, z) float64 0.8171 0.6735 0.5087 ... 0.5845 0.2743 0.3072\n", "Attributes:\n", " geospatial_vertical_units: m" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from yt_xarray.sample_data import load_random_xr_data\n", "\n", "fields = {'temperature': ('x', 'y', 'z'), 'pressure': ('x', 'y', 'z')}\n", "dims = {'x': (0,1,15), 'y': (0, 1, 10), 'z': (0, 1, 15)}\n", "ds = load_random_xr_data(fields, dims, length_unit='m')\n", "ds" ] }, { "cell_type": "markdown", "id": "c0f06a1a-7196-4da0-b41d-4298fa5644e7", "metadata": {}, "source": [ "While we're using random sample data here, note that yt_xarray provides a thin wrapper around xarray's standard `open_dataset` function that also checks yt's `test_data_dir` for the file if it is not found at the given path. It is used in the same way as the xarray function:\n", "\n", "```python\n", "ds = yt_xarray.open_dataset(\"path/to/your/dataset.nc\")\n", "```\n", "\n" ] }, { "cell_type": "markdown", "id": "c7603170-3ece-4240-be38-bfc9f963f1b2", "metadata": {}, "source": [ "### Loading all fields\n", "\n", "The primary way of loading data into yt is by creating a yt dataset object. \n", "\n", "To create a yt dataset that loads all the data variables:" ] }, { "cell_type": "code", "execution_count": 2, "id": "272662d7-65cc-426f-8a42-b41cf5a07313", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt_xarray : [INFO ] 2023-02-06 12:24:07,825: Inferred geometry type is cartesian. 
To override, use ds.yt.set_geometry\n", "yt_xarray : [INFO ] 2023-02-06 12:24:07,826: Attempting to detect if yt_xarray will require field interpolation:\n", "yt_xarray : [INFO ] 2023-02-06 12:24:07,827: Cartesian geometry on uniform grid: yt_xarray will not interpolate.\n", "yt : [INFO ] 2023-02-06 12:24:07,927 Parameters: current_time = 0.0\n", "yt : [INFO ] 2023-02-06 12:24:07,928 Parameters: domain_dimensions = [15 10 15]\n", "yt : [INFO ] 2023-02-06 12:24:07,929 Parameters: domain_left_edge = [-0.03571429 -0.05555556 -0.03571429]\n", "yt : [INFO ] 2023-02-06 12:24:07,930 Parameters: domain_right_edge = [1.03571429 1.05555556 1.03571429]\n", "yt : [INFO ] 2023-02-06 12:24:07,931 Parameters: cosmological_simulation = 0\n" ] } ], "source": [ "import yt_xarray\n", "import yt\n", "\n", "yt_ds = ds.yt.load_grid()" ] }, { "cell_type": "markdown", "id": "ccde9182-3221-4ade-a4e7-2b85bb1cdde6", "metadata": {}, "source": [ "Note that this yt dataset maintains references to the open xarray dataset: data are loaded into yt only as needed. \n", "\n", "\n", "### A brief overview of yt datasets\n", "\n", "Now that we have a yt dataset, let's take a quick tour of some of its features:\n", "\n", "\n", "**Field Tuples**\n", "\n", "You can check the available fields in a yt dataset with:" ] }, { "cell_type": "code", "execution_count": 3, "id": "0dcf56f3-434c-41a9-a94d-d26e96cfbf65", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('stream', 'pressure'), ('stream', 'temperature')]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yt_ds.field_list" ] }, { "cell_type": "markdown", "id": "baf4848a-b204-4f7c-b942-c92b7312f892", "metadata": {}, "source": [ "Fields in yt are identified by both a field type and a field name. yt_xarray relies on yt's \"stream\" frontend infrastructure, so all of our xarray fields end up with a field type of \"stream\". 
This is important because when referring to fields, you'll need to supply both the field type and the field name. \n", "\n", "To construct a `SlicePlot`, for example:" ] }, { "cell_type": "code", "execution_count": 4, "id": "f8bdd9b1-ad86-459e-ab7b-5baded948653", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt : [INFO ] 2023-02-06 12:24:08,113 xlim = -0.035714 1.035714\n", "yt : [INFO ] 2023-02-06 12:24:08,114 ylim = -0.055556 1.055556\n", "yt : [INFO ] 2023-02-06 12:24:08,115 xlim = -0.035714 1.035714\n", "yt : [INFO ] 2023-02-06 12:24:08,115 ylim = -0.055556 1.055556\n", "yt : [INFO ] 2023-02-06 12:24:08,121 Making a fixed resolution buffer of (('stream', 'temperature')) 800 by 800\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "slc = yt.SlicePlot(yt_ds, \"z\", (\"stream\", \"temperature\"))\n", "slc.set_log((\"stream\", \"temperature\"), False)\n", "slc.show()" ] }, { "cell_type": "markdown", "id": "0d007ae9-28d8-4673-8884-7e3745020bb3", "metadata": {}, "source": [ "**Domain extent and units**\n", "\n", "yt datasets also have some useful attributes for quickly checking the domain extent:" ] }, { "cell_type": "code", "execution_count": 5, "id": "6585e4c2-5010-4483-bbd2-9582a549970d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.5 0.5 0.5] code_length\n" ] } ], "source": [ "print(yt_ds.domain_center)" ] }, { "cell_type": "code", "execution_count": 6, "id": "2c6e209d-b953-4af8-9143-d80ad77a9983", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.07142857 1.11111111 1.07142857] code_length\n" ] } ], "source": [ "print(yt_ds.domain_width)" ] }, { "cell_type": "markdown", "id": "0c8ac07d-d47f-4557-b771-55d35abf19cd", "metadata": {}, "source": [ "You'll notice that the outputs above are `unyt` arrays. The \"code_length\" unit is the dataset's internal length unit; its physical value is given by the dataset's `length_unit`. 
You can convert a unyt array to different units with `.to()`:\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "abe0b1f8-968f-4f7d-b517-e8edc3e345c9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.07142857 1.11111111 1.07142857] m\n" ] } ], "source": [ "print(yt_ds.domain_width.to('m'))" ] }, { "cell_type": "markdown", "id": "42c9e68b-5093-444c-b123-9d8bcc14ee64", "metadata": {}, "source": [ "\n", "The above sample dataset sets an attribute, `geospatial_vertical_units`:" ] }, { "cell_type": "code", "execution_count": 8, "id": "284b12f5-b595-40da-aff7-a079f28b8aa0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'m'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.geospatial_vertical_units" ] }, { "cell_type": "markdown", "id": "f99f4103-d17d-4765-be60-6a694d393f09", "metadata": {}, "source": [ "which yt_xarray uses as the dataset's `length_unit`:" ] }, { "cell_type": "code", "execution_count": 9, "id": "abeab962-3051-477e-ac80-fb87df87e5de", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "unyt_quantity(1., 'm')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yt_ds.length_unit" ] }, { "cell_type": "markdown", "id": "6f3677b7-1532-4397-b540-bbbde5e4f750", "metadata": {}, "source": [ "If your dataset lacks this attribute (or you want to override it), you can set the length unit explicitly so that yt knows the units of your input coordinates:" ] }, { "cell_type": "code", "execution_count": 10, "id": "27e472e6-2c01-4dcc-b7ba-75566cd4749d", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt_xarray : [INFO ] 2023-02-06 12:24:08,945: Attempting to detect if yt_xarray will require field interpolation:\n", "yt_xarray : [INFO ] 2023-02-06 12:24:08,946: Cartesian geometry on uniform grid: yt_xarray will not interpolate.\n", "yt : [INFO ] 2023-02-06 12:24:09,045 Parameters: current_time = 0.0\n", "yt : [INFO ] 2023-02-06 12:24:09,046 
Parameters: domain_dimensions = [15 10 15]\n", "yt : [INFO ] 2023-02-06 12:24:09,047 Parameters: domain_left_edge = [-0.03571429 -0.05555556 -0.03571429]\n", "yt : [INFO ] 2023-02-06 12:24:09,048 Parameters: domain_right_edge = [1.03571429 1.05555556 1.03571429]\n", "yt : [INFO ] 2023-02-06 12:24:09,049 Parameters: cosmological_simulation = 0\n" ] } ], "source": [ "yt_ds = ds.yt.load_grid(length_unit='km')" ] }, { "cell_type": "code", "execution_count": 11, "id": "6207284e-1a85-4ac9-8209-0d2d094956d2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "unyt_array([1071.42857143, 1111.11111111, 1071.42857143], 'm')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yt_ds.domain_width.to('m')" ] }, { "cell_type": "markdown", "id": "341ce0e1-4aa6-475b-834b-ee6ad96b9e4e", "metadata": {}, "source": [ "For more on units in unyt and yt, see the [yt documentation on units](https://yt-project.org/doc/analyzing/units.html). \n", "\n" ] }, { "cell_type": "markdown", "id": "3fb9cf0a-64d1-481e-9433-1c3e2b4dee5a", "metadata": {}, "source": [ "You might also notice that the domain width of the yt dataset is **slightly** larger than the extent of the coordinates in the xarray dataset:" ] }, { "cell_type": "code", "execution_count": 12, "id": "018bb5f4-da2e-4e33-b1fc-ec5a4a62cfdb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.07142857 1.11111111 1.07142857] code_length\n" ] } ], "source": [ "print(yt_ds.domain_width)" ] }, { "cell_type": "code", "execution_count": 13, "id": "efb1aa9c-0d3c-468b-a8b2-3bf47be7725a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1.0, 1.0, 1.0]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[ds.coords[dim].max().values.item() - ds.coords[dim].min().values.item() for dim in ds.dims]" ] }, { "cell_type": "markdown", "id": "77a8d8d8-b994-4656-ae9f-ae20bc70ac8f", "metadata": {}, "source": [ "This is related to how yt_xarray builds cells from 
node values: for the uniform grids used here, each coordinate value is treated as a cell center, so the yt domain extends half a cell width beyond the outermost coordinates on each side. See [the yt_xarray documentation on supported grids](https://yt-xarray.readthedocs.io/en/latest/supported_grids.html) for a full explanation." ] }, { "cell_type": "markdown", "id": "64143714-9589-42d6-afb2-fa4def4f937f", "metadata": {}, "source": [ "### Loading a subset of fields\n", "\n", "When you call `ds.yt.load_grid()` without arguments, it attempts to collect references to all of the available fields. If you only want to work with a subset of fields, you can supply the `fields` argument:" ] }, { "cell_type": "code", "execution_count": 14, "id": "2fdaf1b5-0248-4cc9-870f-6a5ed8b24bd4", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt_xarray : [INFO ] 2023-02-06 12:24:09,089: Attempting to detect if yt_xarray will require field interpolation:\n", "yt_xarray : [INFO ] 2023-02-06 12:24:09,091: Cartesian geometry on uniform grid: yt_xarray will not interpolate.\n", "yt : [INFO ] 2023-02-06 12:24:09,164 Parameters: current_time = 0.0\n", "yt : [INFO ] 2023-02-06 12:24:09,165 Parameters: domain_dimensions = [15 10 15]\n", "yt : [INFO ] 2023-02-06 12:24:09,166 Parameters: domain_left_edge = [-0.03571429 -0.05555556 -0.03571429]\n", "yt : [INFO ] 2023-02-06 12:24:09,168 Parameters: domain_right_edge = [1.03571429 1.05555556 1.03571429]\n", "yt : [INFO ] 2023-02-06 12:24:09,169 Parameters: cosmological_simulation = 0\n" ] } ], "source": [ "yt_ds = ds.yt.load_grid(fields=('temperature',))" ] }, { "cell_type": "markdown", "id": "9c981496-9c50-4bee-9a2c-e1fc4b0088c6", "metadata": {}, "source": [ "One current limitation of `yt_xarray` is that all fields loaded into yt must have the same dimensions. 
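One way to find which fields can be loaded together is to group them by their dimension names. The following is a minimal, plain-Python sketch of that idea, not part of the yt_xarray API: the helper `group_fields_by_dims` is hypothetical, and the field mapping simply mirrors the form of the `fields` dictionary passed to `load_random_xr_data`:

```python
from collections import defaultdict


def group_fields_by_dims(field_dims):
    # Hypothetical helper: map each distinct tuple of dimension names
    # to the list of field names that use exactly those dimensions.
    # field_dims has the form {field_name: (dim_1, dim_2, ...)}.
    groups = defaultdict(list)
    for name, dims in field_dims.items():
        groups[tuple(dims)].append(name)
    return dict(groups)


# A mix of 2D spatial, 3D spatial and 2D space + time variables
field_dims = {
    'temperature': ('x', 'y', 'z'),
    'pressure': ('x', 'y', 'z'),
    'precip': ('x', 'y', 'time'),
    'cumulative_precip': ('x', 'y'),
}

groups = group_fields_by_dims(field_dims)
for dims, names in groups.items():
    # Fields within one group share identical dimensions
    print(dims, names)
```

With an open xarray dataset, the same mapping could be built as `{name: da.dims for name, da in ds.data_vars.items()}`, and the names in a single group passed as `fields=tuple(names)` to `ds.yt.load_grid`. Groups that include a time dimension still need a time index selected.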
\n", "\n", "For example, suppose we had a dataset with a mix of 2D spatial `(x, y)`, 3D spatial `(x, y, z)` and 2D space + time `(x, y, time)` variables:\n" ] }, { "cell_type": "code", "execution_count": 15, "id": "58628610-e6e6-428a-a49e-7a43ddc98bbe", "metadata": {}, "outputs": [], "source": [ "fields = {'temperature': ('x', 'y', 'z'), \n", " 'pressure': ('x', 'y', 'z'), \n", " 'precip': ('x', 'y', 'time'), \n", " 'cumulative_precip': ('x', 'y')}\n", "dims = {'x': (0,1,14), 'y': (0, 1, 10), 'z': (0, 1, 15), 'time': (0, 1, 11)}\n", "ds = load_random_xr_data(fields, dims, length_unit='m')" ] }, { "cell_type": "markdown", "id": "c49a8f17-c79c-422d-bf63-d9eb5fbb5bc7", "metadata": {}, "source": [ "and then tried to load the full dataset with \n", "\n", "```python\n", "ds.yt.load_grid()\n", "```\n", "then we would get an error because the variables do not all share the same dimensions. In this case, we can load the temperature and pressure fields, which share the dimensions `(x, y, z)`, with:" ] }, { "cell_type": "code", "execution_count": 16, "id": "c65a0110-816b-4e78-b6af-7c427639e7c7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt_xarray : [INFO ] 2023-02-06 12:24:09,202: Inferred geometry type is cartesian. 
To override, use ds.yt.set_geometry\n", "yt_xarray : [INFO ] 2023-02-06 12:24:09,203: Attempting to detect if yt_xarray will require field interpolation:\n", "yt_xarray : [INFO ] 2023-02-06 12:24:09,204: Cartesian geometry on uniform grid: yt_xarray will not interpolate.\n", "yt : [INFO ] 2023-02-06 12:24:09,266 Parameters: current_time = 0.0\n", "yt : [INFO ] 2023-02-06 12:24:09,267 Parameters: domain_dimensions = [14 10 15]\n", "yt : [INFO ] 2023-02-06 12:24:09,268 Parameters: domain_left_edge = [-0.03846154 -0.05555556 -0.03571429]\n", "yt : [INFO ] 2023-02-06 12:24:09,269 Parameters: domain_right_edge = [1.03846154 1.05555556 1.03571429]\n", "yt : [INFO ] 2023-02-06 12:24:09,270 Parameters: cosmological_simulation = 0\n" ] } ], "source": [ "yt_ds = ds.yt.load_grid(fields=('temperature', 'pressure'))" ] }, { "cell_type": "markdown", "id": "b0f0c000-c6b5-4d5b-90e1-a99741218f46", "metadata": {}, "source": [ "Furthermore, when loading variables with a time dimension, you should select the time index to load:" ] }, { "cell_type": "code", "execution_count": 17, "id": "320cb3f4-eed1-4a25-b14a-dbbaf12b9096", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt_xarray : [INFO ] 2023-02-06 12:24:09,285: Attempting to detect if yt_xarray will require field interpolation:\n", "yt_xarray : [INFO ] 2023-02-06 12:24:09,287: Cartesian geometry on uniform grid: yt_xarray will not interpolate.\n", "yt : [INFO ] 2023-02-06 12:24:09,344 Parameters: current_time = 0.1\n", "yt : [INFO ] 2023-02-06 12:24:09,345 Parameters: domain_dimensions = [14 10 1]\n", "yt : [INFO ] 2023-02-06 12:24:09,347 Parameters: domain_left_edge = [-0.03846154 -0.05555556 -0.5 ]\n", "yt : [INFO ] 2023-02-06 12:24:09,348 Parameters: domain_right_edge = [1.03846154 1.05555556 0.5 ]\n", "yt : [INFO ] 2023-02-06 12:24:09,349 Parameters: cosmological_simulation = 0\n" ] } ], "source": [ "yt_ds = ds.yt.load_grid(fields=('precip',), sel_dict={'time':1})" ] }, { "cell_type": 
"code", "execution_count": 18, "id": "a96806a9-b9df-4466-97df-b87f7554e3f3", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt : [INFO ] 2023-02-06 12:24:09,494 min value is 1.64191e-02 at 0.9230769230769234 0.3333333333333334 0.0000000000000000\n" ] } ], "source": [ "value, location = yt_ds.find_min(('stream', 'precip'))" ] }, { "cell_type": "markdown", "id": "7e410e6b-73c1-4512-a3d5-3e9c7cadd555", "metadata": {}, "source": [ "### Loading method and memory usage\n", "\n", "By default, `ds.yt.load_grid` gives yt references to the open xarray dataset rather than copies of its data, so data are read only as yt needs them. If your dataset fits comfortably in memory, you can instead supply `use_callable=False` to have yt_xarray work on in-memory copies of the data:" ] }, { "cell_type": "code", "execution_count": 19, "id": "e7b9892b-20d4-4af2-9322-d59f7413b2bc", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "yt_xarray : [INFO ] 2023-02-06 12:24:09,500: Attempting to detect if yt_xarray will require field interpolation:\n", "yt_xarray : [INFO ] 2023-02-06 12:24:09,502: Cartesian geometry on uniform grid: yt_xarray will not interpolate.\n", "yt : [INFO ] 2023-02-06 12:24:09,552 Parameters: current_time = 0.0\n", "yt : [INFO ] 2023-02-06 12:24:09,553 Parameters: domain_dimensions = [14 10 15]\n", "yt : [INFO ] 2023-02-06 12:24:09,554 Parameters: domain_left_edge = [-0.03846154 -0.05555556 -0.03571429]\n", "yt : [INFO ] 2023-02-06 12:24:09,556 Parameters: domain_right_edge = [1.03846154 1.05555556 1.03571429]\n", "yt : [INFO ] 2023-02-06 12:24:09,557 Parameters: cosmological_simulation = 0\n" ] } ], "source": [ "yt_ds = ds.yt.load_grid(fields=('temperature',), use_callable=False)" ] }, { "cell_type": "markdown", "id": "77d32668-466a-4203-a2e0-4cea364e6c4e", "metadata": {}, "source": [ "and data will be copied from the xarray dataset during creation of the yt 
dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "d876c775-2d74-4124-8ab9-8ea540a54343", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 5 }