diff --git a/docs/development/policies.md b/docs/development/policies.md index 6cc751240..be7404ed9 100644 --- a/docs/development/policies.md +++ b/docs/development/policies.md @@ -22,7 +22,7 @@ Parcels follows [Intended Effort Versioning (EffVer)](https://jacobtomlinson.dev When making backward incompatible changes, we will make sure these changes and instructions to upgrade are communicated to the user via change logs or migration guides, and (where applicable) informative error messaging. -Note when conducting research we highly recommend documenting which version of Parcels (and other packages) you are using. This can be as easy as doing `conda env export > environment.yml` alongside your project code. The Parcels version used to generate an output file is also stored as metadata entry in the `.zarr` output file. +Note that when conducting research, we highly recommend documenting which version of Parcels (and other packages) you are using. This can be as easy as doing `conda env export > environment.yml` alongside your project code. The Parcels version used to generate an output file is also stored as a metadata entry in the `.parquet` output file. ## Changes in policies diff --git a/docs/getting_started/explanation_concepts.md b/docs/getting_started/explanation_concepts.md index d8b285039..0283299ff 100644 --- a/docs/getting_started/explanation_concepts.md +++ b/docs/getting_started/explanation_concepts.md @@ -186,7 +186,7 @@ pset.execute(kernels=kernels, dt=dt, runtime=runtime) ### Output -To analyse the particle data generated in the simulation, we need to define a `parcels.ParticleFile` and add it as an argument to `parcels.ParticleSet.execute()`. The output will be written in a [zarr format](https://zarr.readthedocs.io/en/stable/), which can be opened as an `xarray.Dataset`. The dataset will contain the particle data with at least `time`, `z`, `lat` and `lon`, for each particle at timesteps defined by the `outputdt` argument. 
+To analyse the particle data generated in the simulation, we need to define a `parcels.ParticleFile` and add it as an argument to `parcels.ParticleSet.execute()`. The output will be written in a [parquet format](https://parquet.apache.org/), which can be opened as a `polars.DataFrame`. The dataset will contain the particle data with at least `time`, `z`, `lat` and `lon`, for each particle at timesteps defined by the `outputdt` argument. There are many ways to analyze particle output, and although we provide [a short tutorial to get started](./tutorial_output.ipynb), we recommend writing your own analysis code and checking out [related Lagrangian analysis projects in our community page](../community/index.md#analysis-code). diff --git a/docs/getting_started/tutorial_output.ipynb b/docs/getting_started/tutorial_output.ipynb index c3dbba852..a5e70a42e 100644 --- a/docs/getting_started/tutorial_output.ipynb +++ b/docs/getting_started/tutorial_output.ipynb @@ -21,7 +21,7 @@ "- [**Plotting**](#plotting)\n", "- [**Animations**](#animations)\n", "\n", - "For more advanced reading and tutorials on the analysis of Lagrangian trajectories, we recommend checking out the [Lagrangian Diagnostics Analysis Cookbook](https://lagrangian-diags.readthedocs.io/en/latest/tutorials.html) and the project in general. The [TrajAn package](https://opendrift.github.io/trajan/index.html) can be used to read and plot datasets of Lagrangian trajectories." + "For more advanced reading and tutorials on the analysis of Lagrangian trajectories, we recommend checking out the [Lagrangian Diagnostics Analysis Cookbook](https://lagrangian-diags.readthedocs.io/en/latest/tutorials.html) and the project in general." 
] }, { @@ -30,10 +30,13 @@ "metadata": {}, "outputs": [], "source": [ - "from datetime import datetime, timedelta\n", + "from datetime import datetime\n", "\n", + "import matplotlib.pyplot as plt\n", "import numpy as np\n", - "import xarray as xr\n", + "import pandas as pd\n", + "import polars as pl\n", + "import pyarrow.parquet as pq\n", "\n", "import parcels\n", "import parcels.tutorial" ] }, { @@ -81,14 +84,14 @@ " fieldset=fieldset, pclass=parcels.Particle, lon=lon, lat=lat, time=time, z=z\n", ")\n", "\n", - "output_file = parcels.ParticleFile(\"output.zarr\", outputdt=np.timedelta64(2, \"h\"))" + "output_file = parcels.ParticleFile(\"output.parquet\", outputdt=np.timedelta64(2, \"h\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Parcels saves some metadata in the output file with every simulation (Parcels version, CF convention information, etc.). This metadata is just a dictionary which is propogated to `xr.Dataset(attrs=...)` and is stored in the `.metadata` attribute. We are free to manipulate this dictionary to add any custom, xarray-compatible metadata relevant to their simulation. Here we add a custom metadata field `date_created` to the output file." + "Parcels saves some metadata in the output file with every simulation (Parcels version, CF convention information, etc.). This metadata is just a dictionary which is propagated to the parquet metadata. We are free to manipulate this dictionary to add any custom metadata relevant to our simulation. Here we add a custom metadata field `date_created` to the output file." ] }, { @@ -126,23 +129,37 @@ ")" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reading the output file\n", + "\n" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", - "TODO: add section on chunking\n", + "As of Parcels v4, the default output format is [`parquet`](https://parquet.apache.org/) (instead of `zarr`). 
The `parquet` output format is a tabular format, in which every row corresponds to an observation of a particle trajectory. The `zarr` output format is a multidimensional array format, in which the data is stored in a 2D array with dimensions `traj` and `obs`. The `parquet` format is more compact and faster to read.\n", + "\n", + "However, the `parquet` format does not support the [CF-convention for trajectories data](http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_multidimensional_array_representation_of_trajectories) implemented with the [NCEI trajectory template](https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/trajectoryIncomplete.cdl). We are working on efficient tooling to convert the parcels `parquet` output into a CF-compliant format.\n", + "\n", + "TODO: Add link to tracking issue on GitHub for this tooling.\n", + "```" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "## Reading the output file\n", + "Parcels exports output trajectories in `parquet` [format](https://parquet.apache.org/). Files in `parquet` are stored as tabular data, so each row corresponds to a particle at a given time step, and columns correspond to particle attributes (lon, lat, time, etc.). \n", "\n", - "Parcels exports output trajectories in `zarr` [format](https://zarr.readthedocs.io/en/stable/). Files in `zarr` are typically _much_ smaller in size than netcdf, although may be slightly more challenging to handle (but `xarray` has a fairly seamless `open_zarr()` method). Note when we display the dataset we can see our custom metadata field `date_created`.\n" + "The files can be analysed with a wide range of tools, including `pandas`, `polars`, or the [`lt toolbox`](https://github.com/oj-tooth/lt_toolbox). 
The latter is specifically designed for the analysis of Lagrangian trajectories, and can be used to compute a wide range of Lagrangian diagnostics, but is still in an alpha stage of development.\n", + "\n", + "In Polars, these files can be opened with `polars.read_parquet`:" ] }, { @@ -151,29 +168,34 @@ "metadata": {}, "outputs": [], "source": [ - "ds_particles = xr.open_zarr(\"output.zarr\")\n", + "df_polars = pl.read_parquet(\"output.parquet\")\n", "\n", - "print(ds_particles)" + "print(df_polars)" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "Once you have loaded the data as an `xarray` DataSet using `xr.open_zarr()`, you can always save the file to NetCDF if you prefer with the `.to_netcdf()` method.\n" + "To see our custom metadata field `date_created`, we need to use `pyarrow.parquet`:" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "schema = pq.read_schema(\"output.parquet\")\n", + "for k, v in schema.metadata.items():\n", + " print(k.decode(), \"->\", v.decode())" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "## Trajectory data structure\n", - "\n", - "The data zarr file are organised according to the [CF-convention for trajectories data](http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_multidimensional_array_representation_of_trajectories) implemented with the [NCEI trajectory template](https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/trajectoryIncomplete.cdl). The data is stored in a **two-dimensional array** with the dimensions `traj` and `obs`. Each particle trajectory is essentially stored as a time series where the coordinate data (`lon`, `lat`, `time`) are a function of the observation (`obs`).\n", - "\n", - "The output dataset used here contains **10 particles** and **13 observations**. 
Not every particle has 13 observations however; since we released particles at different times some particle trajectories are shorter than others.\n" + "As you may have noticed above, the `time` is shown as a `float64` (in seconds) in `df_polars`. That is because `polars.read_parquet` does not automatically convert the cftime values to datetimes. To handle this, we also provide a helper function `parcels.read_particlefile` to read ParticleFiles, which does automatically convert the cftime values. " ] }, { @@ -182,11 +204,9 @@ "metadata": {}, "outputs": [], "source": [ - "np.set_printoptions(linewidth=160)\n", - "one_hour = np.timedelta64(1, \"h\") # Define timedelta object to help with conversion\n", - "time_from_start = ds_particles[\"time\"].values - fieldset.time_interval.left\n", + "df_particles = parcels.read_particlefile(\"output.parquet\")\n", "\n", - "print(time_from_start / one_hour) # timedelta / timedelta -> float number of hours" + "print(df_particles)" ] }, { @@ -194,7 +214,22 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note how the first observation occurs at a different time for each trajectory. So remember that `obs != time`\n" + "## Trajectory data structure\n", + "\n", + "The output dataset used here contains **10 particles** and **25 observations**. Not every particle has 25 observations, however; since we released particles at different times, some particle trajectories are shorter than others." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for traj in df_particles.partition_by(\"particle_id\", maintain_order=True):\n", + " time_origin = pd.Timestamp(fieldset.time_interval.left).to_pydatetime()\n", + " time_in_hour = (traj[\"time\"] - time_origin).dt.total_hours()\n", + " traj_id = traj[\"particle_id\"][0]\n", + " print(f\"Particle {traj_id}: \" + \"\".join(f\"{int(t):2d} \" for t in time_in_hour))" ] }, { @@ -202,29 +237,23 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Analysis\n", - "\n", - "Sometimes, trajectories are analyzed as they are stored: as individual time series. If we want to study the distance travelled as a function of time, the time we are interested in is the time relative to the start of the each particular trajectory: the array operations are simple since each trajectory is analyzed as a function of `obs`. The time variable is only needed to express the results in the correct units.\n" + "Note how the first observation occurs at a different time for each trajectory.\n" ] }, { - "cell_type": "code", - "execution_count": null, + "attachments": {}, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "import matplotlib.pyplot as plt\n", + "## Plotting trajectories\n", "\n", - "x = ds_particles[\"lon\"].values\n", - "y = ds_particles[\"lat\"].values\n", - "distance = np.cumsum(\n", - " np.sqrt(np.square(np.diff(x)) + np.square(np.diff(y))), axis=1\n", - ") # d = (dx^2 + dy^2)^(1/2)\n", + "Parcels output consists of particle trajectories through time and space. An important way to explore patterns in this information is to draw the trajectories in space. The [**trajan**](https://opendrift.github.io/trajan/index.html) package can be used to quickly plot parcels results, but users are encouraged to create their own figures, for example by using the comprehensive [**matplotlib**](https://matplotlib.org/) library. 
Here we show a basic setup on how to process the parcels output into trajectory plots and animations.\n", + "\n", + "```{warning}\n", + "Trajan is not yet compatible with the `parquet` output format, but we are working on a solution to this.\n", + "```\n", "\n", - "real_time = time_from_start / one_hour # convert time to hours\n", - "time_since_release = (\n", - " real_time.transpose() - real_time[:, 0]\n", - ") # substract the initial time from each timeseries" + "Since the `parquet` output format is tabular, a simple plot of the longitude and latitude of all particles will show one continuous line:" ] }, { @@ -233,26 +262,16 @@ "metadata": {}, "outputs": [], "source": [ - "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)\n", - "\n", - "ax1.set_ylabel(\"Distance travelled [m]\")\n", - "ax1.set_xlabel(\"observation\", weight=\"bold\")\n", - "d_plot = ax1.plot(distance.transpose())\n", - "\n", - "ax2.set_ylabel(\"Distance travelled [m]\")\n", - "ax2.set_xlabel(\"time since release [hours]\", weight=\"bold\")\n", - "d_plot_t = ax2.plot(time_since_release[1:], distance.transpose())\n", + "fig, ax = plt.subplots(figsize=(5, 3))\n", + "ax.plot(df_particles[\"lon\"], df_particles[\"lat\"], \".-\")\n", "plt.show()" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "The two figures above show the same graph. Time is not needed to create the first figure. 
The time variable minus the first value of each trajectory gives the x-axis the correct units in the second figure.\n", - "\n", - "We can also plot the distance travelled as a function of the absolute time easily, since the `time` variable matches up with the data for each individual trajectory.\n" + "To show the trajectories of individual particles, we must group the data by trajectory and plot each trajectory separately:" ] }, { @@ -261,21 +280,20 @@ "metadata": {}, "outputs": [], "source": [ - "plt.figure()\n", - "ax = plt.axes()\n", - "ax.set_ylabel(\"Distance travelled [m]\")\n", - "ax.set_xlabel(\"time [hours]\", weight=\"bold\")\n", - "d_plot_t = ax.plot(real_time.T[1:], distance.transpose())" + "fig, ax = plt.subplots(figsize=(5, 3))\n", + "for traj in df_particles.partition_by(\"particle_id\", maintain_order=True):\n", + " traj_id = traj[\"particle_id\"][0]\n", + " ax.plot(traj[\"lon\"], traj[\"lat\"], \".-\", label=f\"P{traj_id}\")\n", + "ax.legend(loc=\"center left\", bbox_to_anchor=(1.02, 0.5), borderaxespad=0.0)\n", + "plt.tight_layout()\n", + "plt.show()" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "### Conditional selection\n", - "\n", - "In other cases, the processing of the data itself however depends on the absolute time at which the observations are made, e.g. studying seasonal phenomena. In that case the array structure is not as simple: the data must be selected by their `time` value. Here we show how the mean location of the particles evolves through time. This also requires the trajectory data to be aligned in time. The data are selected using `xr.DataArray.where()` which compares the time variable to a specific time. 
This type of selecting data with a condition (`ds_particles['time']==time`) is a powerful tool to analyze trajectory data.\n" + "However, if you want to plot only the locations at a certain time step, you can simply select the data at that time step:" ] }, { @@ -284,65 +302,46 @@ "metadata": {}, "outputs": [], "source": [ - "# Using xarray\n", - "mean_lon_x = []\n", - "mean_lat_x = []\n", + "time_step = np.timedelta64(18, \"h\")\n", + "time_to_plot = fieldset.time_interval.left + time_step\n", + "particles = df_particles.filter(pl.col(\"time\") == pl.lit(time_to_plot))\n", "\n", - "timerange = np.arange(\n", - " np.nanmin(ds_particles[\"time\"].values),\n", - " np.nanmax(ds_particles[\"time\"].values) + np.timedelta64(timedelta(hours=2)),\n", - " timedelta(hours=2),\n", - ") # timerange in nanoseconds\n", - "\n", - "for time in timerange:\n", - " # if all trajectories share an observation at time\n", - " if np.all(np.any(ds_particles[\"time\"] == time, axis=1)):\n", - " # find the data that share the time\n", - " mean_lon_x += [\n", - " np.nanmean(ds_particles[\"lon\"].where(ds_particles[\"time\"] == time).values)\n", - " ]\n", - " mean_lat_x += [\n", - " np.nanmean(ds_particles[\"lat\"].where(ds_particles[\"time\"] == time).values)\n", - " ]" + "fig, ax = plt.subplots(figsize=(5, 3))\n", + "ax.plot(particles[\"lon\"], particles[\"lat\"], \"o\")\n", + "title_time = pd.to_datetime(time_to_plot).strftime(\"%Y-%m-%d %H:%M:%S\")\n", + "ax.set_title(f\"Particle locations at {title_time}\")\n", + "plt.show()" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "plt.figure()\n", - "ax = plt.axes()\n", - "ax.set_ylabel(\"Meridional distance [m]\")\n", - "ax.set_xlabel(\"Zonal distance [m]\")\n", - "ax.grid()\n", - "ax.scatter(mean_lon_x, mean_lat_x, marker=\"^\", s=80)\n", - "plt.show()" + "Or, if you want to plot the particles a certain amount of time after they were released, you can select 
the data based on the time since release:" ] }, { - "attachments": {}, - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "## Plotting\n", - "\n", - "Parcels output consists of particle trajectories through time and space. An important way to explore patterns in this information is to draw the trajectories in space. The [**trajan**](https://opendrift.github.io/trajan/index.html) package can be used to quickly plot parcels results, but users are encouraged to create their own figures, for example by using the comprehensive [**matplotlib**](https://matplotlib.org/) library. Here we show a basic setup on how to process the parcels output into trajectory plots and animations.\n", - "\n", - "Some other packages to help you make beautiful figures are:\n", + "time_step = np.timedelta64(18, \"h\")\n", + "particles = df_particles.filter(\n", + " (pl.col(\"time\") - pl.col(\"time\").min().over(\"particle_id\")) == pl.lit(time_step)\n", + ")\n", "\n", - "- [**cartopy**](https://scitools.org.uk/cartopy/docs/latest/), a map-drawing tool especially compatible with matplotlib\n", - "- [**trajan**](https://opendrift.github.io/trajan/index.html), a package to quickly plot trajectories\n", - "- [**cmocean**](https://matplotlib.org/cmocean/), a set of ocean-relevant colormaps\n" + "fig, ax = plt.subplots(figsize=(5, 3))\n", + "ax.plot(particles[\"lon\"], particles[\"lat\"], \"o\")\n", + "ax.set_title(f\"Particle locations {time_step} after their release\")\n", + "plt.show()" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "To draw the trajectory data in space usually it is informative to draw points at the observed coordinates to see the resolution of the output and draw a line through them to separate the different trajectories. The coordinates to draw are in `lon` and `lat` and can be passed to either `matplotlib.pyplot.plot` or `matplotlib.pyplot.scatter`. 
Note however, that the default way matplotlib plots 2D arrays is to plot a separate set for each column. In the parcels 2D output, the columns correspond to the `obs` dimension, so to separate the different trajectories we need to transpose the 2D array using `.T`.\n" + "We can plot the distance travelled as a function of absolute and relative time:" ] }, { @@ -351,23 +350,24 @@ "metadata": {}, "outputs": [], "source": [ - "fig, (ax1, ax2, ax3, ax4) = plt.subplots(\n", - " 1, 4, figsize=(16, 3.5), constrained_layout=True\n", - ")\n", + "fig, ax = plt.subplots(1, 2, figsize=(10, 4))\n", + "\n", + "for traj in df_particles.partition_by(\"particle_id\", maintain_order=True):\n", + " distance = np.sqrt(\n", + " (traj[\"lon\"] - traj[\"lon\"][0]) ** 2 + (traj[\"lat\"] - traj[\"lat\"][0]) ** 2\n", + " )\n", + " ax[0].plot(traj[\"time\"], distance, \".-\", label=f\"P{traj['particle_id'][0]}\")\n", "\n", - "###-Points-###\n", - "ax1.set_title(\"Points\")\n", - "ax1.scatter(ds_particles[\"lon\"].T, ds_particles[\"lat\"].T)\n", - "###-Lines-###\n", - "ax2.set_title(\"Lines\")\n", - "ax2.plot(ds_particles[\"lon\"].T, ds_particles[\"lat\"].T)\n", - "###-Points + Lines-###\n", - "ax3.set_title(\"Points + Lines\")\n", - "ax3.plot(ds_particles[\"lon\"].T, ds_particles[\"lat\"].T, marker=\"o\")\n", - "###-Not Transposed-###\n", - "ax4.set_title(\"Not transposed\")\n", - "ax4.plot(ds_particles[\"lon\"], ds_particles[\"lat\"], marker=\"o\")\n", + " rel_time = (traj[\"time\"] - traj[\"time\"][0]).dt.total_hours()\n", + " ax[1].plot(rel_time, distance, \".-\", label=f\"P{traj['particle_id'][0]}\")\n", "\n", + "ax[0].set_xlabel(\"Date\")\n", + "ax[0].set_ylabel(\"Distance travelled [degrees]\")\n", + "ax[0].tick_params(axis=\"x\", labelrotation=45)\n", + "ax[1].set_xlabel(\"Time since release [hours]\")\n", + "ax[1].set_ylabel(\"Distance travelled [degrees]\")\n", + "\n", + "plt.tight_layout()\n", "plt.show()" ] }, @@ -384,9 +384,7 @@ "cell_type": "markdown", "metadata": {}, "source": 
[ - "Trajectory plots like the ones above can become very cluttered for large sets of particles. To better see patterns, it's a good idea to create an animation in time and space. To do this, matplotlib offers an [animation package](https://matplotlib.org/stable/api/animation_api.html). Here we show how to use the [**FuncAnimation**](https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.animation.FuncAnimation.html#matplotlib.animation.FuncAnimation) class to animate parcels trajectory data, based on [this visualisation tutorial](https://github.com/Parcels-code/10year-anniversary-session5/blob/eaf7ac35f43c222280fa5577858be81dc346c06b/animations_tutorial.ipynb) from 10-years Parcels. \n", - "\n", - "To correctly reveal the patterns in time we must remember that the `obs` dimension does not necessarily correspond to the `time` variable ([see the section of Trajectory data structure above](#trajectory-data-structure)). In the animation of the particles, we usually want to draw the points at each consecutive moment in time, not necessarily at each moment since the start of the trajectory. To do this we must [select the correct data](#conditional-selection) in each rendering.\n" + "Trajectory plots like the ones above can become very cluttered for large sets of particles. To better see patterns, it's a good idea to create an animation in time and space. To do this, matplotlib offers an [animation package](https://matplotlib.org/stable/api/animation_api.html). 
Here we show how to use the [**FuncAnimation**](https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.animation.FuncAnimation.html#matplotlib.animation.FuncAnimation) class to animate parcels trajectory data.\n" ] }, { @@ -398,15 +396,8 @@ "import cartopy.crs as ccrs\n", "import cartopy.feature as cfeature\n", "import matplotlib\n", - "from matplotlib.animation import FuncAnimation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ + "from matplotlib.animation import FuncAnimation\n", + "\n", "# for interactive display of animation\n", "plt.rcParams[\"animation.html\"] = \"jshtml\"" ] @@ -417,49 +408,22 @@ "metadata": {}, "outputs": [], "source": [ - "# Number of timesteps to animate\n", - "nframes = 25 # use less frames for testing purposes\n", - "nreducedtrails = 1 # every 10th particle will have a trail (if 1, all particles have trails. Adjust for faster performance)\n", - "\n", + "time_step = np.timedelta64(2, \"h\") # time step for animation frames\n", "\n", - "# Set up the colors and associated trajectories:\n", - "# get release times for each particle (first valide obs for each trajectory)\n", - "release_times = ds_particles[\"time\"].min(dim=\"obs\", skipna=True).values\n", - "\n", - "# get unique release times and assign colors\n", - "unique_release_times = np.unique(release_times[~np.isnat(release_times)])\n", - "n_release_times = len(unique_release_times)\n", - "print(f\"Number of unique release times: {n_release_times}\")\n", + "timerange = np.arange(\n", + " np.nanmin(df_particles[\"time\"]),\n", + " np.nanmax(df_particles[\"time\"]) + time_step,\n", + " time_step,\n", + ")\n", "\n", - "# choose a continuous colormap\n", + "# set up a unique color for each trajectory\n", "colormap = matplotlib.colormaps[\"tab20b\"]\n", - "\n", - "# set up a unique color for each release time\n", - "release_time_to_color = {}\n", - "for i, release_time in enumerate(unique_release_times):\n", - " 
release_time_to_color[release_time] = colormap(i / max(n_release_times - 1, 1))\n", - "\n", - "\n", - "# --> Store data for all timeframes (this is needed for faster performance)\n", - "print(\"Pre-computing all particle positions...\")\n", - "all_particles_data = []\n", - "for i, target_time in enumerate(timerange):\n", - " time_id = np.where(ds_particles[\"time\"] == target_time)\n", - " lons = ds_particles[\"lon\"].values[time_id]\n", - " lats = ds_particles[\"lat\"].values[time_id]\n", - " particle_indices = time_id[0]\n", - " valid = ~np.isnan(lons) & ~np.isnan(lats)\n", - "\n", - " all_particles_data.append(\n", - " {\n", - " \"lons\": lons[valid],\n", - " \"lats\": lats[valid],\n", - " \"particle_indices\": particle_indices[valid],\n", - " \"valid_count\": np.sum(valid),\n", - " }\n", + "trajectory_to_color = {}\n", + "for i, trajectory in enumerate(df_particles[\"particle_id\"].unique()):\n", + " trajectory_to_color[trajectory] = colormap(\n", + " i / max(len(df_particles[\"particle_id\"].unique()) - 1, 1)\n", " )\n", "\n", - "\n", "# figure setup\n", "fig, ax = plt.subplots(figsize=(6, 5), subplot_kw={\"projection\": ccrs.PlateCarree()})\n", "ax.set_xlim(30, 33)\n", @@ -469,86 +433,67 @@ "ax.set_yticks(ticks=np.arange(-33, -29.5, 0.5))\n", "ax.set_yticklabels(np.arange(33, 29.5, -0.5).astype(str))\n", "ax.set_ylabel(\"Latitude (deg S)\")\n", - "ax.coastlines(color=\"saddlebrown\")\n", - "ax.add_feature(cfeature.LAND, alpha=0.5, facecolor=\"saddlebrown\")\n", - "\n", - "# --> Use pre-computed data for initial setup\n", - "initial_data = all_particles_data[0]\n", - "initial_colors = []\n", - "for particle_idx in initial_data[\"particle_indices\"]:\n", - " rt = release_times[particle_idx]\n", - " if rt in release_time_to_color:\n", - " initial_colors.append(release_time_to_color[rt])\n", - " else:\n", - " initial_colors.append(\"blue\")\n", + "ax.coastlines()\n", + "ax.add_feature(cfeature.LAND)\n", "\n", "# --> plot first timestep\n", - "scatter = 
ax.scatter(initial_data[\"lons\"], initial_data[\"lats\"], s=10, c=initial_colors)\n", + "particles = df_particles.filter(pl.col(\"time\") == pl.lit(timerange[0]))\n", + "scatter = ax.scatter(\n", + " particles[\"lon\"],\n", + " particles[\"lat\"],\n", + " s=10,\n", + " c=[trajectory_to_color[p] for p in particles[\"particle_id\"]],\n", + ")\n", "\n", "# --> initialize trails\n", "trail_plot = []\n", "\n", "# Set initial title\n", - "t_str = str(timerange[0])[:19] # Format datetime nicely\n", + "t_str = pd.to_datetime(timerange[0]).strftime(\n", + " \"%Y-%m-%d %H:%M:%S\"\n", + ") # Format datetime nicely\n", "title = ax.set_title(f\"Particles at t = {t_str}\")\n", "\n", "\n", "# loop over for animation\n", "def animate(i):\n", - " print(f\"Animating frame {i + 1}/{len(timerange)} at time {timerange[i]}\")\n", - " t_str = str(timerange[i])[:19]\n", + " t_str = pd.to_datetime(timerange[i]).strftime(\"%Y-%m-%d %H:%M:%S\")\n", " title.set_text(f\"Particles at t = {t_str}\")\n", "\n", " # Find particles at current time\n", - " current_data = all_particles_data[i]\n", - "\n", - " if current_data[\"valid_count\"] > 0:\n", - " current_colors = []\n", - " for particle_idx in current_data[\"particle_indices\"]:\n", - " rt = release_times[particle_idx]\n", - " current_colors.append(release_time_to_color[rt])\n", - "\n", - " scatter.set_offsets(np.c_[current_data[\"lons\"], current_data[\"lats\"]])\n", - " scatter.set_color(current_colors)\n", + " particles = df_particles.filter(pl.col(\"time\") == pl.lit(timerange[i]))\n", "\n", - " # --> add trails\n", + " if len(particles) > 0:\n", + " scatter.set_offsets(np.c_[particles[\"lon\"], particles[\"lat\"]])\n", + " scatter.set_color([trajectory_to_color[p] for p in particles[\"particle_id\"]])\n", "\n", + " # --> reset trails\n", " for trail in trail_plot:\n", " trail.remove()\n", " trail_plot.clear()\n", - "\n", " trail_length = min(10, i) # trails will have max length of 10 time steps\n", - "\n", " if trail_length > 0:\n", - " 
sampled_particles = current_data[\"particle_indices\"][\n", - " ::nreducedtrails\n", - " ] # use all or sample if you want faster computation\n", - "\n", - " for particle_idx in sampled_particles:\n", - " trail_lons = []\n", - " trail_lats = []\n", - " for j in range(i - trail_length, i + 1):\n", - " past_data = all_particles_data[j]\n", - " if particle_idx in past_data[\"particle_indices\"]:\n", - " idx = np.where(past_data[\"particle_indices\"] == particle_idx)[\n", - " 0\n", - " ][0]\n", - " trail_lons.append(past_data[\"lons\"][idx])\n", - " trail_lats.append(past_data[\"lats\"][idx])\n", - " if len(trail_lons) > 1:\n", - " rt = release_times[particle_idx]\n", - " color = release_time_to_color[rt]\n", + " for traj in particles[\"particle_id\"].unique():\n", + " traj_trail = df_particles.filter(\n", + " (pl.col(\"particle_id\") == traj)\n", + " & (pl.col(\"time\") >= pl.lit(timerange[max(0, i - trail_length)]))\n", + " & (pl.col(\"time\") <= pl.lit(timerange[i]))\n", + " )\n", + " if len(traj_trail) > 1:\n", " (trail,) = ax.plot(\n", - " trail_lons, trail_lats, color=color, linewidth=0.6, alpha=0.6\n", + " traj_trail[\"lon\"],\n", + " traj_trail[\"lat\"],\n", + " color=trajectory_to_color[traj],\n", + " linewidth=0.6,\n", + " alpha=0.6,\n", " )\n", " trail_plot.append(trail)\n", - "\n", " else:\n", " scatter.set_offsets(np.empty((0, 2)))\n", "\n", "\n", "# Create animation\n", - "anim = FuncAnimation(fig, animate, frames=nframes, interval=100)\n", + "anim = FuncAnimation(fig, animate, frames=len(timerange), interval=100)\n", "plt.close(fig)\n", "anim" ] @@ -557,7 +502,7 @@ "metadata": { "celltoolbar": "Metagegevens bewerken", "kernelspec": { - "display_name": "default", + "display_name": "docs", "language": "python", "name": "python3" }, @@ -571,7 +516,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.14.3" + "version": "3.14.4" } }, "nbformat": 4, diff --git a/docs/getting_started/tutorial_quickstart.md 
b/docs/getting_started/tutorial_quickstart.md index cf17e4bb4..5f96a5791 100644 --- a/docs/getting_started/tutorial_quickstart.md +++ b/docs/getting_started/tutorial_quickstart.md @@ -13,12 +13,12 @@ read more, we have a [concepts overview](./explanation_concepts.md) discussing t ## Imports -Parcels depends on `xarray`, expecting inputs in the form of [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) -and writing output files that can be read with xarray. +Parcels depends on `xarray`, expecting inputs in the form of [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html). Output files can be read with `polars`. ```{code-cell} import numpy as np import xarray as xr +import polars as pl import parcels import parcels.tutorial ``` @@ -123,11 +123,11 @@ Before starting the simulation, we must define where and how frequent we want to We can define this in a {py:obj}`parcels.ParticleFile` object: ```{code-cell} -output_file = parcels.ParticleFile("output-quickstart.zarr", outputdt=np.timedelta64(1, "h")) +output_file = parcels.ParticleFile("output-quickstart.parquet", outputdt=np.timedelta64(1, "h")) ``` -The output files are in `.zarr` [format](https://zarr.readthedocs.io/en/stable/), which can be read by `xarray`. -See the [Parcels output tutorial](./tutorial_output.ipynb) for more information on the zarr format. We want to choose +The output files are in `.parquet` [format](https://parquet.apache.org/), which can be read by `polars`. +See the [Parcels output tutorial](./tutorial_output.ipynb) for more information on the parquet format. We want to choose the `outputdt` argument so that it captures the smallest timescales of our interest. 
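As a quick sanity check on `outputdt` (a sketch using plain `numpy`; the 24-hour runtime is the one used later in this quickstart), you can estimate how many observations each particle will produce: the initial state plus one per `outputdt` interval.

```python
import numpy as np

runtime = np.timedelta64(24, "h")  # total simulation length assumed here
outputdt = np.timedelta64(1, "h")  # output interval, as in this quickstart

# One observation at release, plus one per outputdt interval
n_obs = int(runtime / outputdt) + 1
print(n_obs)  # prints 25
```

If this number is much larger than you need, increase `outputdt` to keep the output file small; if it is too coarse to resolve the motions you care about, decrease it.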
## Run Simulation: `ParticleSet.execute()` @@ -155,23 +155,22 @@ pset.execute( -To start analyzing the trajectories computed by **Parcels**, we can open the `ParticleFile` using `xarray`: +To start analyzing the trajectories computed by **Parcels**, we can read the output file using `parcels.read_particlefile`: ```{code-cell} -ds_particles = xr.open_zarr("output-quickstart.zarr") -ds_particles +df = parcels.read_particlefile("output-quickstart.parquet") +df ``` -The 10 particle trajectories are stored along the `trajectory` dimension, and each trajectory contains 25 observations -(initial values + 24 hourly timesteps) along the `obs` dimension. The [working with Parcels output tutorial](./tutorial_output.ipynb) -provides more detail about the dataset and how to analyse it. +The file contains 250 rows: 25 observations for the 10 particle trajectories. +The [working with Parcels output tutorial](./tutorial_output.ipynb) provides more detail about the dataset and how to analyse it. Let's verify that Parcels has computed the advection of the virtual particles! ```{code-cell} import matplotlib.pyplot as plt -# plot positions and color particles by number of observation -scatter = plt.scatter(ds_particles.lon.T, ds_particles.lat.T, c=np.repeat(ds_particles.obs.values,npart)) -plt.scatter(ds_particles.lon[:,0],ds_particles.lat[:,0],facecolors="none",edgecolors='r') # starting positions -plt.scatter(lon,lat,facecolors="none",edgecolors='r') # starting positions +# plot positions and color particles by time +scatter = plt.scatter(df['lon'], df['lat'], c=df['time']) +plt.scatter(df['lon'][:npart], df['lat'][:npart], facecolors="none", edgecolors='r') # starting positions +plt.scatter(lon, lat, facecolors="none", edgecolors='r') # starting positions plt.xlim(31,33) plt.ylabel("Latitude [deg N]") plt.ylim(-33,-30) @@ -196,7 +195,7 @@ location!
```{code-cell} :tags: [hide-output] # set up output file -output_file = parcels.ParticleFile("output-backwards.zarr", outputdt=np.timedelta64(1, "h")) +output_file = parcels.ParticleFile("output-backwards.parquet", outputdt=np.timedelta64(1, "h")) # execute simulation in backwards time pset.execute( @@ -210,10 +209,11 @@ pset.execute( When we check the output, we can see that the particles have returned to their original position! ```{code-cell} -ds_particles_back = xr.open_zarr("output-backwards.zarr") +df_back = parcels.read_particlefile("output-backwards.parquet") -scatter = plt.scatter(ds_particles_back.lon.T, ds_particles_back.lat.T, c=np.repeat(ds_particles_back.obs.values,npart)) -plt.scatter(ds_particles_back.lon[:,0],ds_particles_back.lat[:,0],facecolors="none",edgecolors='r') # starting positions +scatter = plt.scatter(df_back['lon'], df_back['lat'], c=df_back['time']) +particles_at_start = df_back.filter(pl.col("time") == df_back["time"].min()) +plt.scatter(particles_at_start['lon'], particles_at_start['lat'], facecolors="none", edgecolors='r') # starting positions plt.xlabel("Longitude [deg E]") plt.xlim(31,33) plt.ylabel("Latitude [deg N]") @@ -226,6 +226,6 @@ Using Euler forward advection, the final positions are equal to the original pos ```{code-cell} # testing that final location == original location -np.testing.assert_almost_equal(ds_particles_back['lat'].values[:,-1],ds_particles['lat'].values[:,0], 2) -np.testing.assert_almost_equal(ds_particles_back['lon'].values[:,-1],ds_particles['lon'].values[:,0], 2) +np.testing.assert_almost_equal(particles_at_start["lat"], lat, 2) +np.testing.assert_almost_equal(particles_at_start['lon'], lon, 2) ``` diff --git a/docs/user_guide/examples/explanation_kernelloop.md b/docs/user_guide/examples/explanation_kernelloop.md index c4a9a58d1..1bff904e2 100644 --- a/docs/user_guide/examples/explanation_kernelloop.md +++ b/docs/user_guide/examples/explanation_kernelloop.md @@ -108,7 +108,7 @@ lats = np.linspace(-32.5, 
-30.5, npart) pset = parcels.ParticleSet(fieldset, pclass=parcels.Particle, z=z, lat=lats, lon=lons) output_file = parcels.ParticleFile( - store="advection_then_wind.zarr", outputdt=np.timedelta64(6,'h') + path="advection_then_wind.parquet", outputdt=np.timedelta64(6,'h') ) pset.execute( [parcels.kernels.AdvectionRK2, wind_kernel], @@ -126,7 +126,7 @@ pset_reverse = parcels.ParticleSet( fieldset, pclass=parcels.Particle, z=z, lat=lats, lon=lons ) output_file_reverse = parcels.ParticleFile( - store="wind_then_advection.zarr", outputdt=np.timedelta64(6,"h") + path="wind_then_advection.parquet", outputdt=np.timedelta64(6,"h") ) pset_reverse.execute( [wind_kernel, parcels.kernels.AdvectionRK2], @@ -140,10 +140,14 @@ Finally, plot the trajectories to show that they are identical in the two simulations ```{code-cell} # Plot the resulting particle trajectories overlapped for both cases -advection_then_wind = xr.open_zarr("advection_then_wind.zarr") -wind_then_advection = xr.open_zarr("wind_then_advection.zarr") -plt.plot(wind_then_advection.lon.T, wind_then_advection.lat.T, "-") -plt.plot(advection_then_wind.lon.T, advection_then_wind.lat.T, "--", c="k", alpha=0.7) +advection_then_wind = parcels.read_particlefile("advection_then_wind.parquet") +wind_then_advection = parcels.read_particlefile("wind_then_advection.parquet") + +fig, ax = plt.subplots(figsize=(5, 3)) +for traj in wind_then_advection.partition_by("particle_id", maintain_order=True): + ax.plot(traj["lon"], traj["lat"], "-") +for traj in advection_then_wind.partition_by("particle_id", maintain_order=True): + ax.plot(traj["lon"], traj["lat"], "--", c="k", alpha=0.7) plt.show() ``` diff --git a/docs/user_guide/examples/tutorial_Argofloats.ipynb b/docs/user_guide/examples/tutorial_Argofloats.ipynb index a96e7bcb6..6704bf262 100644 --- a/docs/user_guide/examples/tutorial_Argofloats.ipynb +++ b/docs/user_guide/examples/tutorial_Argofloats.ipynb @@ -159,9 +159,8 @@ "\n", "# Create a ParticleFile object to store the
output\n", "output_file = parcels.ParticleFile(\n", - " store=\"argo_float.zarr\",\n", + " \"argo_float.parquet\",\n", " outputdt=timedelta(minutes=15),\n", - " chunks=(1, 500), # setting to write in chunks of 500 observations\n", ")\n", "\n", - "# Now execute the Kernels for 30 days, saving data every 30 minutes\n", + "# Now execute the Kernels for 30 days, saving data every 15 minutes\n", @@ -183,20 +182,6 @@ "First plot the depth as a function of time, with the temperature as color (only on the upcast)." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds_particles = xr.open_zarr(output_file.store)\n", - "x = ds_particles[\"lon\"][:].squeeze()\n", - "y = ds_particles[\"lat\"][:].squeeze()\n", - "z = ds_particles[\"z\"][:].squeeze()\n", - "time = ds_particles[\"time\"][:].squeeze()\n", - "temp = ds_particles[\"temp\"][:].squeeze()" - ] - }, { "cell_type": "code", "execution_count": null, @@ -205,10 +190,12 @@ "source": [ "import matplotlib.pyplot as plt\n", "\n", + "df = parcels.read_particlefile(\"argo_float.parquet\")\n", + "\n", "fig = plt.figure(figsize=(13, 6))\n", "ax = plt.axes()\n", - "ax.plot(time, z, color=\"gray\")\n", - "cb = ax.scatter(time, z, c=temp, s=20, marker=\"o\", zorder=2)\n", + "ax.plot(df[\"time\"], df[\"z\"], color=\"gray\")\n", + "cb = ax.scatter(df[\"time\"], df[\"z\"], c=df[\"temp\"], s=20, marker=\"o\", zorder=2)\n", "ax.set_xlabel(\"Time [days]\")\n", "ax.set_ylabel(\"Depth (m)\")\n", "ax.invert_yaxis()\n", @@ -234,12 +221,12 @@ "fig = plt.figure(figsize=(13, 8))\n", "ax = plt.axes(projection=\"3d\")\n", "ax.view_init(azim=-145)\n", - "ax.plot3D(x, y, z, color=\"gray\")\n", - "cb = ax.scatter(x, y, z, c=temp, s=20, marker=\"o\", zorder=2)\n", + "ax.plot3D(df[\"lon\"], df[\"lat\"], df[\"z\"], color=\"gray\")\n", + "cb = ax.scatter(df[\"lon\"], df[\"lat\"], df[\"z\"], c=df[\"temp\"], s=20, marker=\"o\", zorder=2)\n", "ax.set_xlabel(\"Longitude\")\n", "ax.set_ylabel(\"Latitude\")\n", "ax.set_zlabel(\"Depth (m)\")\n", - "ax.set_zlim(np.max(z), 0)\n", +
"ax.set_zlim(df[\"z\"].max(), 0)\n", "fig.colorbar(cb, label=\"Temperature (°C)\")\n", "plt.show()" ] @@ -261,7 +248,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.14.2" + "version": "3.14.4" } }, "nbformat": 4, diff --git a/docs/user_guide/examples/tutorial_croco_3D.ipynb b/docs/user_guide/examples/tutorial_croco_3D.ipynb index 7ea142439..50917272e 100644 --- a/docs/user_guide/examples/tutorial_croco_3D.ipynb +++ b/docs/user_guide/examples/tutorial_croco_3D.ipynb @@ -34,7 +34,17 @@ "metadata": {}, "outputs": [], "source": [ - "import matplotlib.pyplot as plt\nimport numpy as np\nimport xarray as xr\n\nimport parcels\nimport parcels.tutorial\n\nds_fields = parcels.tutorial.open_dataset(\"CROCOidealized_data/data\")\n\nds_fields.load(); # Preload data to speed up access" + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import polars as pl\n", + "import xarray as xr\n", + "\n", + "import parcels\n", + "import parcels.tutorial\n", + "\n", + "ds_fields = parcels.tutorial.open_dataset(\"CROCOidealized_data/data\")\n", + "\n", + "ds_fields.load(); # Preload data to speed up access" ] }, { @@ -119,7 +129,7 @@ ")\n", "\n", "outputfile = parcels.ParticleFile(\n", - " store=\"croco_particles3D.zarr\",\n", + " path=\"croco_particles3D.parquet\",\n", " outputdt=np.timedelta64(5000, \"s\"),\n", ")\n", "\n", @@ -149,10 +159,11 @@ "outputs": [], "source": [ "fig, ax = plt.subplots(1, 1, figsize=(6, 4))\n", - "ds = xr.open_zarr(\"croco_particles3D.zarr\")\n", + "df = pl.read_parquet(\"croco_particles3D.parquet\")\n", "\n", "ax.plot(X / 1e3, Z, \"k.\", label=\"Initial positions\")\n", - "ax.plot(ds.lon.T / 1e3, ds.z.T, \".-\")\n", + "for traj in df.partition_by(\"particle_id\", maintain_order=True):\n", + " ax.plot(traj[\"lon\"] / 1e3, traj[\"z\"], \".-\")\n", "\n", "for z in ds_fields.s_w.values:\n", " ax.plot(\n", @@ -208,7 +219,7 @@ ")\n", "\n", "outputfile = parcels.ParticleFile(\n", - " 
store=\"croco_particles_noW.zarr\", outputdt=np.timedelta64(5000, \"s\")\n", + " path=\"croco_particles_noW.parquet\", outputdt=np.timedelta64(5000, \"s\")\n", ")\n", "\n", "pset_noW.execute(\n", @@ -219,10 +230,11 @@ ")\n", "\n", "fig, ax = plt.subplots(1, 1, figsize=(6, 4))\n", - "ds = xr.open_zarr(\"croco_particles_noW.zarr\")\n", + "df = pl.read_parquet(\"croco_particles_noW.parquet\")\n", "\n", "ax.plot(X / 1e3, Z, \"k.\", label=\"Initial positions\")\n", - "ax.plot(ds.lon.T / 1e3, ds.z.T, \".-\")\n", + "for traj in df.partition_by(\"particle_id\", maintain_order=True):\n", + " ax.plot(traj[\"lon\"] / 1e3, traj[\"z\"], \".-\")\n", "\n", "for z in ds_fields.s_w.values:\n", " ax.plot(\n", @@ -306,7 +318,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.14.2" + "version": "3.14.4" } }, "nbformat": 4, diff --git a/docs/user_guide/examples/tutorial_delaystart.ipynb b/docs/user_guide/examples/tutorial_delaystart.ipynb index 8bb3ffd95..7678e2a17 100644 --- a/docs/user_guide/examples/tutorial_delaystart.ipynb +++ b/docs/user_guide/examples/tutorial_delaystart.ipynb @@ -26,6 +26,8 @@ "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", + "import pandas as pd\n", + "import polars as pl\n", "import xarray as xr\n", "from matplotlib.animation import FuncAnimation\n", "\n", @@ -115,7 +117,7 @@ "outputs": [], "source": [ "output_file = parcels.ParticleFile(\n", - " \"delayparticle_time.zarr\", outputdt=np.timedelta64(1, \"h\")\n", + " \"delayparticle_time.parquet\", outputdt=np.timedelta64(1, \"h\")\n", ")\n", "\n", "pset.execute(\n", @@ -140,7 +142,7 @@ "metadata": {}, "outputs": [], "source": [ - "ds_particles = xr.open_zarr(\"delayparticle_time.zarr\")\n", + "df_particles = parcels.read_particlefile(\"delayparticle_time.parquet\")\n", "\n", "fig = plt.figure(figsize=(7, 5), constrained_layout=True)\n", "ax = fig.add_subplot()\n", @@ -150,27 +152,17 @@ "ax.set_xlim(31, 33)\n", "ax.set_ylim(-32, 
-30)\n", "\n", - "timerange = np.unique(ds_particles[\"time\"].values[np.isfinite(ds_particles[\"time\"])])\n", + "timerange = df_particles[\"time\"].unique()\n", "\n", - "# Indices of the data where time = 0\n", - "time_id = np.where(ds_particles[\"time\"] == timerange[0])\n", - "\n", - "sc = ax.scatter(\n", - " ds_particles[\"lon\"].values[time_id], ds_particles[\"lat\"].values[time_id]\n", - ")\n", - "\n", - "t = timerange[0].astype(\"datetime64[h]\")\n", - "title = ax.set_title(f\"Particles at t = {t}\")\n", + "particles = df_particles.filter(pl.col(\"time\") == pl.lit(timerange[0]))\n", + "sc = ax.scatter(particles[\"lon\"], particles[\"lat\"])\n", + "title = ax.set_title(f\"Particles at t = {timerange[0]}\")\n", "\n", "\n", "def animate(i):\n", - " t = timerange[i].astype(\"datetime64[h]\")\n", - " title.set_text(f\"Particles at t = {t}\")\n", - "\n", - " time_id = np.where(ds_particles[\"time\"] == timerange[i])\n", - " sc.set_offsets(\n", - " np.c_[ds_particles[\"lon\"].values[time_id], ds_particles[\"lat\"].values[time_id]]\n", - " )\n", + " particles = df_particles.filter(pl.col(\"time\") == pl.lit(timerange[i]))\n", + " sc.set_offsets(np.c_[particles[\"lon\"], particles[\"lat\"]])\n", + " title.set_text(f\"Particles at t = {timerange[i]}\")\n", "\n", "\n", "anim = FuncAnimation(fig, animate, frames=len(timerange), interval=100)\n", @@ -254,7 +246,7 @@ "outputs": [], "source": [ "output_file = parcels.ParticleFile(\n", - " \"delayparticle_releasedt.zarr\", outputdt=np.timedelta64(1, \"h\")\n", + " \"delayparticle_releasedt.parquet\", outputdt=np.timedelta64(1, \"h\")\n", ")\n", "\n", "pset.execute(\n", @@ -279,7 +271,7 @@ "metadata": {}, "outputs": [], "source": [ - "ds_particles = xr.open_zarr(\"delayparticle_releasedt.zarr\")\n", + "df_particles = parcels.read_particlefile(\"delayparticle_releasedt.parquet\")\n", "\n", "fig = plt.figure(figsize=(7, 5), constrained_layout=True)\n", "ax = fig.add_subplot()\n", @@ -289,27 +281,17 @@ "ax.set_xlim(31, 
33)\n", "ax.set_ylim(-32, -30)\n", "\n", - "timerange = np.unique(ds_particles[\"time\"].values[np.isfinite(ds_particles[\"time\"])])\n", + "timerange = df_particles[\"time\"].unique()\n", "\n", - "# Indices of the data where time = 0\n", - "time_id = np.where(ds_particles[\"time\"] == timerange[0])\n", - "\n", - "sc = ax.scatter(\n", - " ds_particles[\"lon\"].values[time_id], ds_particles[\"lat\"].values[time_id]\n", - ")\n", - "\n", - "t = timerange[0].astype(\"datetime64[h]\")\n", - "title = ax.set_title(f\"Particles at t = {t}\")\n", + "particles = df_particles.filter(pl.col(\"time\") == pl.lit(timerange[0]))\n", + "sc = ax.scatter(particles[\"lon\"], particles[\"lat\"])\n", + "title = ax.set_title(f\"Particles at t = {timerange[0]}\")\n", "\n", "\n", "def animate(i):\n", - " t = timerange[i].astype(\"datetime64[h]\")\n", - " title.set_text(f\"Particles at t = {t}\")\n", - "\n", - " time_id = np.where(ds_particles[\"time\"] == timerange[i])\n", - " sc.set_offsets(\n", - " np.c_[ds_particles[\"lon\"].values[time_id], ds_particles[\"lat\"].values[time_id]]\n", - " )\n", + " particles = df_particles.filter(pl.col(\"time\") == pl.lit(timerange[i]))\n", + " sc.set_offsets(np.c_[particles[\"lon\"], particles[\"lat\"]])\n", + " title.set_text(f\"Particles at t = {timerange[i]}\")\n", "\n", "\n", "anim = FuncAnimation(fig, animate, frames=len(timerange), interval=100)\n", @@ -326,20 +308,7 @@ "\n", "Note that, because the `outputdt` variable controls the Kernel-loop, all particles are written _at the same time_, even when they start at a non-multiple of `outputdt`.\n", "\n", - "For example, if your particles start at `time=[0, 1, 2]` and `outputdt=2`, then the times written (for `dt=1` and `endtime=4`) will be\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "outtime_expected = np.array(\n", - " [[0, 2, 4], [2, 4, np.datetime64(\"NaT\")], [2, 4, np.datetime64(\"NaT\")]],\n", - " 
dtype=\"timedelta64[h]\",\n", - ")\n", - "print(outtime_expected)" + "For example, if your particles start at `time=[0, 1, 2]` and `outputdt=2`, then the times written (for `dt=1` and `endtime=4`) will be `[0, 2, 2, 2, 4, 4, 4]`" ] }, { @@ -352,7 +321,7 @@ }, "outputs": [], "source": [ - "outfilepath = \"delayparticle_nonmatchingtime.zarr\"\n", + "outfilepath = \"delayparticle_nonmatchingtime.parquet\"\n", "\n", "pset = parcels.ParticleSet(\n", " fieldset=fieldset,\n", @@ -377,7 +346,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "And indeed, the `time` values in the NetCDF output file are as expected\n" + "And indeed, the `time` values in the output file are as expected\n" ] }, { @@ -386,15 +355,8 @@ "metadata": {}, "outputs": [], "source": [ - "outtime_infile = (\n", - " xr.open_zarr(outfilepath).time.values[:] - ds_fields.time.values[0]\n", - ") # subtract initial time to convert from datetime64 to timedelta64\n", - "print(outtime_infile.astype(\"timedelta64[h]\"))\n", - "\n", - "assert (\n", - " outtime_expected[np.isfinite(outtime_expected)]\n", - " == outtime_infile[np.isfinite(outtime_infile)]\n", - ").all()" + "outtime_infile = parcels.read_particlefile(outfilepath)\n", + "print(outtime_infile[\"time\"])" ] }, { @@ -420,6 +382,7 @@ " time=ds_fields.time.values[0] + times,\n", " z=[0.5] * len(times),\n", " )\n", + " outfilepath = f\"delayparticle_nonmatchingtime_{times[0]}_{times[1]}.parquet\"\n", " output_file = parcels.ParticleFile(outfilepath, outputdt=np.timedelta64(2, \"h\"))\n", " pset.execute(\n", " parcels.kernels.AdvectionRK2,\n", @@ -428,8 +391,8 @@ " output_file=output_file,\n", " verbose_progress=False,\n", " )\n", - " outtime_infile = xr.open_zarr(outfilepath).time.values[:] - ds_fields.time.values[0]\n", - " print(outtime_infile.astype(\"timedelta64[h]\"))" + " outtime_infile = parcels.read_particlefile(outfilepath)\n", + " print(outtime_infile[\"time\"])" ] } ], @@ -449,7 +412,7 @@ "name": "python", "nbconvert_exporter": "python", 
"pygments_lexer": "ipython3", - "version": "3.14.2" + "version": "3.14.4" } }, "nbformat": 4, diff --git a/docs/user_guide/examples/tutorial_diffusion.ipynb b/docs/user_guide/examples/tutorial_diffusion.ipynb index 5010f8406..8541433d5 100644 --- a/docs/user_guide/examples/tutorial_diffusion.ipynb +++ b/docs/user_guide/examples/tutorial_diffusion.ipynb @@ -116,7 +116,6 @@ "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", - "import trajan as ta\n", "import xarray as xr\n", "\n", "import parcels\n", @@ -253,11 +252,7 @@ "outputs": [], "source": [ "testParticles = get_test_particles()\n", - "output_file = parcels.ParticleFile(\n", - " store=\"M1_out.zarr\",\n", - " chunks=(len(testParticles), 50),\n", - " outputdt=np.timedelta64(1, \"ms\"),\n", - ")\n", + "output_file = parcels.ParticleFile(\"M1_out.parquet\", outputdt=np.timedelta64(1, \"ms\"))\n", "\n", "testParticles.execute(\n", " parcels.kernels.AdvectionDiffusionM1,\n", @@ -273,7 +268,7 @@ "metadata": {}, "outputs": [], "source": [ - "M1_out = xr.open_zarr(\"M1_out.zarr\")" + "M1_out = parcels.read_particlefile(\"M1_out.parquet\")" ] }, { @@ -294,19 +289,24 @@ "fig.set_figwidth(12)\n", "\n", "x = np.arange(0, 0.3, 0.001)\n", - "for data, ai, dim, ystart, ylim in zip(\n", - " [M1_out.lat, M1_out.lon], ax, (\"y\", \"x\"), (0.75, 0), [(0, 1), (-1, 1)]\n", - "):\n", - " ai.plot(x, data.T[: len(x), :], alpha=0.3)\n", - " ai.scatter(0, ystart, s=20, c=\"r\", zorder=3)\n", + "for traj in M1_out.partition_by(\"particle_id\", maintain_order=True):\n", + " ax[0].plot(x, traj[\"lat\"][: len(x)], alpha=0.3)\n", + "ax[0].scatter(0, 0.75, s=20, c=\"r\", zorder=3)\n", + "ax[0].set_ylabel(\"y\")\n", + "ax[0].set_ylim(0, 1)\n", + "\n", + "for traj in M1_out.partition_by(\"particle_id\", maintain_order=True):\n", + " ax[1].plot(x, traj[\"lon\"][: len(x)], alpha=0.3)\n", + "ax[1].scatter(0, 0, s=20, c=\"r\", zorder=3)\n", + "ax[1].set_ylabel(\"x\")\n", + "ax[1].set_ylim(-1, 1)\n", + "\n", + "for ai in 
ax:\n", " ai.set_xlabel(\"t\")\n", - " ai.set_ylabel(dim)\n", " ai.set_xlim(0, 0.3)\n", - " ai.set_ylim(ylim)\n", "\n", "fig.suptitle(\n", - " \"`AdvectionDiffusionM1` Simulation: \"\n", - " f\"Particle trajectories in the x- and y-directions against time\"\n", + " \"`AdvectionDiffusionM1` Simulation: Particle trajectories in the x- and y-directions against time\"\n", ")\n", "plt.show()" ] @@ -332,11 +332,7 @@ "outputs": [], "source": [ "testParticles = get_test_particles()\n", - "output_file = parcels.ParticleFile(\n", - " store=\"EM_out.zarr\",\n", - " chunks=(len(testParticles), 50),\n", - " outputdt=np.timedelta64(1, \"ms\"),\n", - ")\n", + "output_file = parcels.ParticleFile(\"EM_out.parquet\", outputdt=np.timedelta64(1, \"ms\"))\n", "np.random.seed(1636) # Random seed for reproducibility\n", "testParticles.execute(\n", " parcels.kernels.AdvectionDiffusionEM,\n", @@ -352,7 +348,7 @@ "metadata": {}, "outputs": [], "source": [ - "EM_out = xr.open_zarr(\"EM_out.zarr\")" + "EM_out = parcels.read_particlefile(\"EM_out.parquet\")" ] }, { @@ -365,19 +361,24 @@ "fig.set_figwidth(12)\n", "\n", "x = np.arange(0, 0.3, 0.001)\n", - "for data, ai, dim, ystart, ylim in zip(\n", - " [EM_out.lat, EM_out.lon], ax, (\"y\", \"x\"), (0.75, 0), [(0, 1), (-1, 1)]\n", - "):\n", - " ai.plot(x, data.T[: len(x), :], alpha=0.3)\n", - " ai.scatter(0, ystart, s=20, c=\"r\", zorder=3)\n", + "for traj in EM_out.partition_by(\"particle_id\", maintain_order=True):\n", + " ax[0].plot(x, traj[\"lat\"][: len(x)], alpha=0.3)\n", + "ax[0].scatter(0, 0.75, s=20, c=\"r\", zorder=3)\n", + "ax[0].set_ylabel(\"y\")\n", + "ax[0].set_ylim(0, 1)\n", + "\n", + "for traj in EM_out.partition_by(\"particle_id\", maintain_order=True):\n", + " ax[1].plot(x, traj[\"lon\"][: len(x)], alpha=0.3)\n", + "ax[1].scatter(0, 0, s=20, c=\"r\", zorder=3)\n", + "ax[1].set_ylabel(\"x\")\n", + "ax[1].set_ylim(-1, 1)\n", + "\n", + "for ai in ax:\n", " ai.set_xlabel(\"t\")\n", - 
ai.set_ylabel(dim)\n", " ai.set_xlim(0, 0.3)\n", - " ai.set_ylim(ylim)\n", "\n", "fig.suptitle(\n", - " \"`AdvectionDiffusionEM` Simulation: \"\n", - " f\"Particle trajectories in the x- and y-directions against time\"\n", + " \"`AdvectionDiffusionEM` Simulation: Particle trajectories in the x- and y-directions against time\"\n", ")\n", "plt.show()" ] @@ -568,9 +570,7 @@ " lon=lon,\n", ")\n", "\n", - "output_file = parcels.ParticleFile(\n", - " store=\"smagdiff.zarr\", outputdt=np.timedelta64(1, \"h\"), chunks=(1, 57)\n", - ")\n", + "output_file = parcels.ParticleFile(\"smagdiff.parquet\", outputdt=np.timedelta64(1, \"h\"))\n", "\n", "np.random.seed(1636) # Random seed for reproducibility\n", "\n", @@ -596,13 +596,14 @@ "metadata": {}, "outputs": [], "source": [ - "ds_particles = xr.open_zarr(\"smagdiff.zarr\")\n", + "df = parcels.read_particlefile(\"smagdiff.parquet\")\n", "\n", "temperature = ds_fields.isel(time=0, depth=0).thetao.plot(cmap=\"magma\")\n", "velocity = ds_fields.isel(time=0, depth=0).plot.quiver(\n", " x=\"longitude\", y=\"latitude\", u=\"uo\", v=\"vo\"\n", ")\n", - "particles = ds_particles.traj.plot(color=\"blue\")\n", + "for traj in df.partition_by(\"particle_id\", maintain_order=True):\n", + " plt.plot(traj[\"lon\"], traj[\"lat\"], color=\"blue\")\n", "plt.ylim(-31, -30)\n", "plt.xlim(31, 32.1)\n", "plt.show()" @@ -634,7 +635,7 @@ ], "metadata": { "kernelspec": { - "display_name": "test-notebooks", + "display_name": "docs", "language": "python", "name": "python3" }, @@ -648,7 +649,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.0" + "version": "3.14.4" } }, "nbformat": 4, diff --git a/docs/user_guide/examples/tutorial_dt_integrators.ipynb b/docs/user_guide/examples/tutorial_dt_integrators.ipynb index bd4d93de5..2acd3ebda 100644 --- a/docs/user_guide/examples/tutorial_dt_integrators.ipynb +++ b/docs/user_guide/examples/tutorial_dt_integrators.ipynb @@ -53,10 +53,12 @@ "metadata": {}, 
"outputs": [], "source": [ + "from pathlib import Path\n", + "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", - "import xarray as xr\n", + "import polars as pl\n", "\n", "import parcels\n", "import parcels.tutorial\n", @@ -67,6 +69,9 @@ ")\n", "ds_fields.load() # load the dataset into memory\n", "\n", + "OUTPUT_FOLDER = Path(\"output\")\n", + "OUTPUT_FOLDER.mkdir(exist_ok=True)\n", + "\n", "# Convert to SGRID-compliant dataset and create FieldSet\n", "fields = {\"U\": ds_fields[\"uo\"], \"V\": ds_fields[\"vo\"]}\n", "ds_fset = parcels.convert.copernicusmarine_to_sgrid(fields=fields)\n", @@ -237,14 +242,11 @@ " lon=initial_release_lons,\n", " )\n", " outputdt = dt\n", - " chunks = int(\n", - " runtime / outputdt / 2\n", - " ) # Because we will store a lot of positions, to speed up our simulation we need to chunk the output datafile\n", "\n", " pfile = parcels.ParticleFile(\n", - " store=f\"output/AdvectionRK2_dt_{int(dt / np.timedelta64(1, 's'))}s.zarr\",\n", + " path=OUTPUT_FOLDER\n", + " / f\"AdvectionRK2_dt_{int(dt / np.timedelta64(1, 's'))}s.parquet\",\n", " outputdt=outputdt,\n", - " chunks=(len(pset), chunks),\n", " )\n", "\n", " print(f\"Begin simulation for dt = {int(dt / np.timedelta64(1, 's'))} s\")\n", @@ -294,18 +296,21 @@ "ax = plt.axes()\n", "temperature = ds_fields.isel(time=0, depth=0).thetao.plot(cmap=\"Greys\")\n", "for j, dt in enumerate(dt_choices):\n", - " ds = xr.open_zarr(\n", - " f\"output/AdvectionRK2_dt_{int(dt / np.timedelta64(1, 's'))}s.zarr\"\n", - " )\n", - " labels = [f\"dt = {str(dt)}\"] + [None] * (ds.lon.shape[0] - 1)\n", - " ax.plot(\n", - " ds.lon.T,\n", - " ds.lat.T,\n", - " alpha=0.75,\n", - " color=plt.cm.viridis(dt_colours[j]),\n", - " label=labels,\n", + " df = parcels.read_particlefile(\n", + " OUTPUT_FOLDER / f\"AdvectionRK2_dt_{int(dt / np.timedelta64(1, 's'))}s.parquet\"\n", " )\n", - "ax.scatter(ds.lon[:, 0], ds.lat[:, 0], c=\"r\", marker=\"s\", label=\"starting locations\")\n", 
+ " for i, traj in enumerate(df.partition_by(\"particle_id\", maintain_order=True)):\n", + " ax.plot(\n", + " traj[\"lon\"],\n", + " traj[\"lat\"],\n", + " alpha=0.75,\n", + " color=plt.cm.viridis(dt_colours[j]),\n", + " label=f\"dt = {dt}\" if i == 0 else None,\n", + " )\n", + "df_start = df.filter(pl.col(\"time\") == df[\"time\"].min())\n", + "ax.scatter(\n", + " df_start[\"lon\"], df_start[\"lat\"], c=\"r\", marker=\"s\", label=\"starting locations\"\n", + ")\n", "ax.legend()\n", "ax.set_ylim(-32.7, -31.3)\n", "ax.set_xlim(31, 32.4)\n", @@ -335,7 +340,6 @@ " Haversine formula used, which assumes the Earth is a sphere.\n", " source: https://stackoverflow.com/questions/19412462/getting-distance-between-two-points-based-on-latitude-longitude\n", " \"\"\"\n", - "\n", " R = 6371.0 # approximate radius of earth in km\n", "\n", " lat1 = np.radians(lata)\n", @@ -378,45 +382,37 @@ "axs[1].set_ylim(0, 50)\n", "\n", "# set 5 minute dt as benchmark\n", - "ds_5min = xr.open_zarr(f\"output/AdvectionRK2_dt_300s.zarr\")\n", + "df_5min = parcels.read_particlefile(OUTPUT_FOLDER / \"AdvectionRK2_dt_300s.parquet\")\n", "for i, dt in enumerate(dt_choices[:-1]):\n", - " ds = xr.open_zarr(\n", - " f\"output/AdvectionRK2_dt_{int(dt / np.timedelta64(1, 's'))}s.zarr\"\n", + " df = parcels.read_particlefile(\n", + " OUTPUT_FOLDER / f\"AdvectionRK2_dt_{int(dt / np.timedelta64(1, 's'))}s.parquet\"\n", " )\n", - " labels = [f\"dt = {str(dt)}\"] + [None] * (ds.lon.shape[0] - 1)\n", "\n", " # subset 5 minute data to match dt\n", - " lon_5min_sub = ds_5min.lon.where(\n", - " ds_5min.time.isin(ds.time.values).compute(), drop=True\n", - " ).values\n", - " lat_5min_sub = ds_5min.lat.where(\n", - " ds_5min.time.isin(ds.time.values).compute(), drop=True\n", - " ).values\n", - "\n", - " # remove nans\n", - " lon_valid = ds.lon.where(~np.isnan(ds.lon).compute(), drop=True).values\n", - " lat_valid = ds.lat.where(~np.isnan(ds.lat).compute(), drop=True).values\n", + " lon_5min_sub = 
df_5min.filter(pl.col(\"time\").is_in(df[\"time\"].implode()))[\"lon\"]\n", + " lat_5min_sub = df_5min.filter(pl.col(\"time\").is_in(df[\"time\"].implode()))[\"lat\"]\n", "\n", " # compute separation distance between each particle in km\n", - " dist = dist_km(lon_valid, lon_5min_sub, lat_valid, lat_5min_sub)\n", + " dist = dist_km(df[\"lon\"], lon_5min_sub, df[\"lat\"], lat_5min_sub)\n", + " df = df.with_columns(pl.Series(\"dist\", dist))\n", "\n", " # plot\n", - " time_valid = ds.time.where(~np.isnan(ds.time).compute(), drop=True)\n", - " axs[0].plot(\n", - " time_valid.T,\n", - " dist.T,\n", - " alpha=0.75,\n", - " color=plt.cm.viridis(dt_colours[i]),\n", - " label=labels,\n", - " )\n", - " axs[1].plot(\n", - " time_valid.T,\n", - " dist.T,\n", - " alpha=0.75,\n", - " color=plt.cm.viridis(dt_colours[i]),\n", - " label=labels,\n", - " )\n", - " dist_end[i] = dist[:, -1]\n", + " for j, traj in enumerate(df.partition_by(\"particle_id\", maintain_order=True)):\n", + " axs[0].plot(\n", + " traj[\"time\"],\n", + " traj[\"dist\"],\n", + " alpha=0.75,\n", + " color=plt.cm.viridis(dt_colours[i]),\n", + " label=f\"dt = {dt}\" if j == 0 else None,\n", + " )\n", + " axs[1].plot(\n", + " traj[\"time\"],\n", + " traj[\"dist\"],\n", + " alpha=0.75,\n", + " color=plt.cm.viridis(dt_colours[i]),\n", + " label=f\"dt = {dt}\" if j == 0 else None,\n", + " )\n", + " dist_end[i] = df.filter(pl.col(\"time\") == df[\"time\"].max())[\"dist\"]\n", "axs[0].legend()\n", "axs[0].set_ylabel(\"Distance (km)\")\n", "plt.show()" @@ -445,7 +441,7 @@ " (dt / np.timedelta64(1, \"m\")).astype(int),\n", " np.mean(dist_end[i]),\n", " color=plt.cm.viridis(dt_colours[i]),\n", - " label=f\"dt = {str(dt)}\",\n", + " label=f\"dt = {dt}\",\n", " )\n", "ax[0].plot(\n", " (dt_choices[:-1] / np.timedelta64(1, \"m\")).astype(int), np.mean(dist_end, axis=1)\n", @@ -460,7 +456,7 @@ " (dt / np.timedelta64(1, \"m\")).astype(int),\n", " sim_duration[i],\n", " color=plt.cm.viridis(dt_colours[i]),\n", - " label=f\"dt = 
{str(dt)}\",\n", + " label=f\"dt = {dt}\",\n", " )\n", "ax[1].set_ylabel(\"Simulation Duration (s)\")\n", "ax[1].set_xlabel(\"dt (minutes)\")\n", @@ -589,14 +585,11 @@ " lon=initial_release_lons,\n", " )\n", " outputdt = dt\n", - " chunks = int(\n", - " runtime / outputdt / 2\n", - " ) # Because we will store a lot of positions, to speed up our simulation we need to chunk the output datafile\n", "\n", " pfile = parcels.ParticleFile(\n", - " store=f\"output/{advection_scheme.__name__}_dt_{int(dt / np.timedelta64(1, 's'))}s.zarr\",\n", + " path=OUTPUT_FOLDER\n", + " / f\"KernelCompare_{advection_scheme.__name__}_dt_{int(dt / np.timedelta64(1, 's'))}s.parquet\",\n", " outputdt=outputdt,\n", - " chunks=(len(pset), chunks),\n", " )\n", "\n", " print(\n", @@ -630,22 +623,24 @@ "for i, dt in enumerate(dt_choices):\n", " m = i // 3\n", " n = i % 3\n", - " axs[m, n].set_title(f\"dt = {str(dt)}\")\n", + " axs[m, n].set_title(f\"dt = {dt}\")\n", " axs[m, n].set_xlabel(\"Longitude\")\n", " for j, advection_scheme in enumerate(advection_schemes):\n", - " ds = xr.open_zarr(\n", - " f\"output/{advection_scheme.__name__}_dt_{int(dt / np.timedelta64(1, 's'))}s.zarr\"\n", - " )\n", - " labels = [f\"{advection_scheme.__name__}\"] + [None] * (ds.lon.shape[0] - 1)\n", - " axs[m, n].plot(\n", - " ds.lon.T,\n", - " ds.lat.T,\n", - " alpha=0.75,\n", - " color=plt.cm.viridis(scheme_colours[j]),\n", - " label=labels,\n", + " df = parcels.read_particlefile(\n", + " OUTPUT_FOLDER\n", + " / f\"KernelCompare_{advection_scheme.__name__}_dt_{int(dt / np.timedelta64(1, 's'))}s.parquet\"\n", " )\n", + " for i, traj in enumerate(df.partition_by(\"particle_id\", maintain_order=True)):\n", + " axs[m, n].plot(\n", + " traj[\"lon\"],\n", + " traj[\"lat\"],\n", + " alpha=0.75,\n", + " color=plt.cm.viridis(scheme_colours[j]),\n", + " label=f\"{advection_scheme.__name__}\" if i == 0 else None,\n", + " )\n", + " df_start = df.filter(pl.col(\"time\") == df[\"time\"].min())\n", " axs[m, n].scatter(\n", - " 
ds.lon[:, 0], ds.lat[:, 0], c=\"r\", marker=\"s\", label=\"starting locations\"\n", + " df_start[\"lon\"], df_start[\"lat\"], c=\"r\", marker=\"s\", label=\"starting locations\"\n", " )\n", " axs[m, n].grid()\n", "axs[-1, -1].axis(\"off\")\n", @@ -677,46 +672,39 @@ "for i, dt in enumerate(dt_choices):\n", " m = i // 3\n", " n = i % 3\n", - " axs[m, n].set_title(f\"dt = {str(dt)}\")\n", + " axs[m, n].set_title(f\"dt = {dt}\")\n", " axs[m, n].set_xlabel(\"Time\")\n", " axs[m, n].tick_params(\"x\", rotation=45)\n", " axs[m, n].set_yscale(\"log\")\n", " axs[m, n].set_ylim(1e-4, 1e1)\n", - " ds_RK4 = xr.open_zarr(\n", - " f\"output/AdvectionRK4_dt_{int(dt / np.timedelta64(1, 's'))}s.zarr\"\n", + " df_RK4 = parcels.read_particlefile(\n", + " OUTPUT_FOLDER\n", + " / f\"KernelCompare_AdvectionRK4_dt_{int(dt / np.timedelta64(1, 's'))}s.parquet\"\n", " )\n", " for j, advection_scheme in enumerate(advection_schemes[:-1]):\n", - " ds = xr.open_zarr(\n", - " f\"output/{advection_scheme.__name__}_dt_{int(dt / np.timedelta64(1, 's'))}s.zarr\"\n", - " )\n", - " labels = [f\"|{advection_scheme.__name__} - AdvectionRK4|\"] + [None] * (\n", - " ds.lon.shape[0] - 1\n", + " df = parcels.read_particlefile(\n", + " OUTPUT_FOLDER\n", + " / f\"KernelCompare_{advection_scheme.__name__}_dt_{int(dt / np.timedelta64(1, 's'))}s.parquet\"\n", " )\n", "\n", - " # remove nans\n", - " lon_valid_RK4 = ds_RK4.lon.where(\n", - " ~np.isnan(ds_RK4.lon).compute(), drop=True\n", - " ).values\n", - " lat_valid_RK4 = ds_RK4.lat.where(\n", - " ~np.isnan(ds_RK4.lat).compute(), drop=True\n", - " ).values\n", - " lon_valid = ds.lon.where(~np.isnan(ds.lon).compute(), drop=True).values\n", - " lat_valid = ds.lat.where(~np.isnan(ds.lat).compute(), drop=True).values\n", - " dist = dist_km(lon_valid, lon_valid_RK4, lat_valid, lat_valid_RK4)\n", - " time_valid = ds.time.where(~np.isnan(ds.time).compute(), drop=True).values\n", - " axs[m, n].plot(\n", - " time_valid.T,\n", - " dist.T,\n", - " alpha=0.75,\n", - " 
color=plt.cm.viridis(scheme_colours[j]),\n", - " label=labels,\n", - " )\n", - " dist_end[j, i] = dist[:, -1]\n", + " dist = dist_km(df[\"lon\"], df_RK4[\"lon\"], df[\"lat\"], df_RK4[\"lat\"])\n", + " df = df.with_columns(pl.Series(\"dist\", dist))\n", + " for k, traj in enumerate(df.partition_by(\"particle_id\", maintain_order=True)):\n", + " axs[m, n].plot(\n", + " traj[\"time\"],\n", + " traj[\"dist\"],\n", + " alpha=0.75,\n", + " color=plt.cm.viridis(scheme_colours[j]),\n", + " label=f\"|{advection_scheme.__name__} - AdvectionRK4|\"\n", + " if k == 0\n", + " else None,\n", + " )\n", + " dist_end[j, i] = df.filter(pl.col(\"time\") == df[\"time\"].max())[\"dist\"]\n", " axs[m, n].grid()\n", "axs[-1, -1].axis(\"off\")\n", "axs[0, 0].legend()\n", - "axs[0, 0].set_ylabel(\"Latitude\")\n", - "axs[1, 0].set_ylabel(\"Latitude\")\n", + "axs[0, 0].set_ylabel(\"Distance (km)\")\n", + "axs[1, 0].set_ylabel(\"Distance (km)\")\n", "plt.tight_layout()\n", "plt.show()" ] @@ -838,7 +826,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.14.2" + "version": "3.14.4" } }, "nbformat": 4, diff --git a/docs/user_guide/examples/tutorial_interaction.ipynb b/docs/user_guide/examples/tutorial_interaction.ipynb index 6eb5cc3a2..97fe828e7 100644 --- a/docs/user_guide/examples/tutorial_interaction.ipynb +++ b/docs/user_guide/examples/tutorial_interaction.ipynb @@ -38,7 +38,7 @@ "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", - "import xarray as xr\n", + "import polars as pl\n", "from matplotlib.animation import FuncAnimation\n", "\n", "import parcels\n", @@ -56,7 +56,8 @@ "source": [ "def Pull(particles, fieldset):\n", " \"\"\"Kernel that \"pulls\" all neighbour particles\n", - " toward the attracting particle with a constant velocity\"\"\"\n", + " toward the attracting particle with a constant velocity\n", + " \"\"\"\n", " interaction_distance = 0.5\n", " velocity = -0.04 # predefined attracting velocity\n", "\n", 
@@ -129,7 +130,7 @@ ")\n", "\n", "output_file = parcels.ParticleFile(\n", - " store=\"InteractingParticles.zarr\",\n", + " path=\"InteractingParticles.parquet\",\n", " outputdt=np.timedelta64(1, \"s\"),\n", ")\n", "\n", @@ -154,15 +155,11 @@ "metadata": {}, "outputs": [], "source": [ - "data_xarray = xr.open_zarr(\"InteractingParticles.zarr\")\n", - "data_attr = data_xarray.where(data_xarray[\"attractor\"].compute() == 1, drop=True)\n", - "data_other = data_xarray.where(data_xarray[\"attractor\"].compute() == 0, drop=True)\n", - "\n", - "timerange = np.arange(\n", - " np.nanmin(data_xarray[\"time\"].values),\n", - " np.nanmax(data_xarray[\"time\"].values),\n", - " np.timedelta64(1, \"s\"),\n", - ")\n", + "df = parcels.read_particlefile(\"InteractingParticles.parquet\")\n", + "df_attr = df.filter(pl.col(\"attractor\") == 1)\n", + "df_other = df.filter(pl.col(\"attractor\") == 0)\n", + "\n", + "timerange = df[\"time\"].unique()\n", "\n", "fig = plt.figure(figsize=(4, 4), constrained_layout=True)\n", "ax = fig.add_subplot()\n", @@ -172,28 +169,14 @@ "ax.set_xlim(-1.1, 1.1)\n", "ax.set_ylim(-1.1, 1.1)\n", "\n", - "time_id = np.where(data_other[\"time\"] == timerange[0])\n", - "time_id_attr = np.where(data_attr[\"time\"] == timerange[0])\n", - "\n", - "scatter = ax.scatter(\n", - " data_other[\"lon\"].values[time_id],\n", - " data_other[\"lat\"].values[time_id],\n", - " c=\"b\",\n", - " s=5,\n", - " zorder=1,\n", - ")\n", + "particles = df_other.filter(pl.col(\"time\") == pl.lit(timerange[0]))\n", + "scatter = ax.scatter(particles[\"lon\"], particles[\"lat\"], c=\"b\", s=5, zorder=1)\n", + "particles_attr = df_attr.filter(pl.col(\"time\") == pl.lit(timerange[0]))\n", "scatter_attr = ax.scatter(\n", - " data_attr[\"lon\"].values[time_id_attr],\n", - " data_attr[\"lat\"].values[time_id_attr],\n", - " c=\"r\",\n", - " s=40,\n", - " zorder=2,\n", + " particles_attr[\"lon\"], particles_attr[\"lat\"], c=\"r\", s=40, zorder=2\n", ")\n", - "\n", "circs = []\n", - "for lon_a, 
lat_a in zip(\n", - " data_attr[\"lon\"].values[time_id_attr], data_attr[\"lat\"].values[time_id_attr]\n", - "):\n", + "for lon_a, lat_a in zip(particles_attr[\"lon\"], particles_attr[\"lat\"], strict=True):\n", " circs.append(\n", " ax.add_patch(\n", " plt.Circle(\n", @@ -202,30 +185,23 @@ " )\n", " )\n", "\n", - "t = str(timerange[0].astype(\"timedelta64[s]\"))\n", - "title = ax.set_title(\"Particles at t = \" + t + \" (Red particles are attractors)\")\n", + "title = ax.set_title(\n", + " f\"Particles at t = {timerange[0].total_seconds()}s\\n(Red particles are attractors)\"\n", + ")\n", "\n", "\n", "def animate(i):\n", - " t = str(timerange[i].astype(\"timedelta64[s]\"))\n", - " title.set_text(\"Particles at t = \" + t + \"\\n (Red particles are attractors)\")\n", - "\n", - " time_id = np.where(data_other[\"time\"] == timerange[i])\n", - " time_id_attr = np.where(data_attr[\"time\"] == timerange[i])\n", - " scatter.set_offsets(\n", - " np.c_[data_other[\"lon\"].values[time_id], data_other[\"lat\"].values[time_id]]\n", - " )\n", - " scatter_attr.set_offsets(\n", - " np.c_[\n", - " data_attr[\"lon\"].values[time_id_attr], data_attr[\"lat\"].values[time_id_attr]\n", - " ]\n", - " )\n", + " particles = df_other.filter(pl.col(\"time\") == pl.lit(timerange[i]))\n", + " scatter.set_offsets(np.c_[particles[\"lon\"], particles[\"lat\"]])\n", + " particles_attr = df_attr.filter(pl.col(\"time\") == pl.lit(timerange[i]))\n", + " scatter_attr.set_offsets(np.c_[particles_attr[\"lon\"], particles_attr[\"lat\"]])\n", " for c, lon_a, lat_a in zip(\n", - " circs,\n", - " data_attr[\"lon\"].values[time_id_attr],\n", - " data_attr[\"lat\"].values[time_id_attr],\n", + " circs, particles_attr[\"lon\"], particles_attr[\"lat\"], strict=True\n", " ):\n", " c.center = (lon_a, lat_a)\n", + " title.set_text(\n", + " f\"Particles at t = {timerange[i].total_seconds()}s\\n(Red particles are attractors)\"\n", + " )\n", "\n", "\n", "# Create animation\n", @@ -321,7 +297,7 @@ ")\n", "\n", 
"output_file = parcels.ParticleFile(\n", - " store=\"MergingParticles.zarr\",\n", + " path=\"MergingParticles.parquet\",\n", " outputdt=np.timedelta64(1, \"s\"),\n", ")\n", "\n", @@ -346,13 +322,8 @@ "metadata": {}, "outputs": [], "source": [ - "data_xarray = xr.open_zarr(\"MergingParticles.zarr\")\n", - "\n", - "timerange = np.arange(\n", - " np.nanmin(data_xarray[\"time\"].values),\n", - " np.nanmax(data_xarray[\"time\"].values),\n", - " np.timedelta64(1, \"s\"),\n", - ")\n", + "df = parcels.read_particlefile(\"MergingParticles.parquet\")\n", + "timerange = df[\"time\"].unique()\n", "\n", "fig = plt.figure(figsize=(4, 4), constrained_layout=True)\n", "ax = fig.add_subplot()\n", @@ -362,31 +333,18 @@ "ax.set_xlim(-1.1, 1.1)\n", "ax.set_ylim(-1.1, 1.1)\n", "\n", - "time_id = np.where(data_xarray[\"time\"] == timerange[0])\n", - "\n", + "particles = df.filter(pl.col(\"time\") == pl.lit(timerange[0]))\n", "scatter = ax.scatter(\n", - " data_xarray[\"lon\"].values[time_id],\n", - " data_xarray[\"lat\"].values[time_id],\n", - " s=data_xarray[\"mass\"].values[time_id],\n", - " c=\"b\",\n", - " zorder=1,\n", + " particles[\"lon\"], particles[\"lat\"], c=\"b\", s=particles[\"mass\"], zorder=1\n", ")\n", - "\n", - "t = str(timerange[0].astype(\"timedelta64[s]\"))\n", - "title = ax.set_title(\"Particles at t = \" + t)\n", + "title = ax.set_title(f\"Particles at t = {timerange[0].total_seconds()}s\")\n", "\n", "\n", "def animate(i):\n", - " t = str(timerange[i].astype(\"timedelta64[s]\"))\n", - " title.set_text(\"Particles at t = \" + t)\n", - "\n", - " time_id = np.where(data_xarray[\"time\"] == timerange[i])\n", - " scatter.set_offsets(\n", - " np.c_[data_xarray[\"lon\"].values[time_id], data_xarray[\"lat\"].values[time_id]]\n", - " )\n", - " scatter.set_sizes(data_xarray[\"mass\"].values[time_id])\n", - "\n", - " return (scatter,)\n", + " particles = df.filter(pl.col(\"time\") == pl.lit(timerange[i]))\n", + " scatter.set_offsets(np.c_[particles[\"lon\"], 
particles[\"lat\"]])\n", + " scatter.set_sizes(particles[\"mass\"])\n", + " title.set_text(f\"Particles at t = {timerange[i].total_seconds()}s\")\n", "\n", "\n", "anim = FuncAnimation(fig, animate, frames=len(timerange), interval=100)\n", @@ -397,7 +355,7 @@ ], "metadata": { "kernelspec": { - "display_name": "test-latest", + "display_name": "docs", "language": "python", "name": "python3" }, @@ -411,7 +369,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.0" + "version": "3.14.4" } }, "nbformat": 4, diff --git a/docs/user_guide/examples/tutorial_manipulating_field_data.ipynb b/docs/user_guide/examples/tutorial_manipulating_field_data.ipynb index 4196ba4a5..cf02db372 100644 --- a/docs/user_guide/examples/tutorial_manipulating_field_data.ipynb +++ b/docs/user_guide/examples/tutorial_manipulating_field_data.ipynb @@ -131,7 +131,7 @@ "\n", "pset = parcels.ParticleSet(fieldset, pclass=parcels.Particle, z=z, lat=lats, lon=lons)\n", "output_file = parcels.ParticleFile(\n", - " store=\"summed_advection_wind.zarr\", outputdt=np.timedelta64(6, \"h\")\n", + " path=\"summed_advection_wind.parquet\", outputdt=np.timedelta64(6, \"h\")\n", ")\n", "pset.execute(\n", " [parcels.kernels.AdvectionRK2],\n", @@ -157,8 +157,9 @@ "outputs": [], "source": [ "# Plot the resulting particle trajectories overlapped for both cases\n", - "summed_advection_wind = xr.open_zarr(\"summed_advection_wind.zarr\")\n", - "plt.plot(summed_advection_wind.lon.T, summed_advection_wind.lat.T, \"-\")\n", + "summed_advection_wind = parcels.read_particlefile(\"summed_advection_wind.parquet\")\n", + "for traj in summed_advection_wind.partition_by(\"particle_id\", maintain_order=True):\n", + " plt.plot(traj[\"lon\"], traj[\"lat\"], \"-\")\n", "plt.show()" ] } @@ -179,7 +180,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.14.2" + "version": "3.14.4" } }, "nbformat": 4, diff --git 
a/docs/user_guide/examples/tutorial_mitgcm.ipynb b/docs/user_guide/examples/tutorial_mitgcm.ipynb index ae46c7fce..92258c6ca 100644 --- a/docs/user_guide/examples/tutorial_mitgcm.ipynb +++ b/docs/user_guide/examples/tutorial_mitgcm.ipynb @@ -25,7 +25,6 @@ "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", - "import xarray as xr\n", "\n", "import parcels\n", "import parcels.tutorial\n", @@ -94,7 +93,7 @@ "pset = parcels.ParticleSet(fieldset=fieldset, lon=X, lat=Y, z=Z)\n", "\n", "outputfile = parcels.ParticleFile(\n", - " store=\"mitgcm_particles.zarr\",\n", + " path=\"mitgcm_particles.parquet\",\n", " outputdt=np.timedelta64(5000, \"s\"),\n", ")\n", "\n", @@ -121,9 +120,10 @@ "metadata": {}, "outputs": [], "source": [ - "ds = xr.open_zarr(\"mitgcm_particles.zarr\")\n", + "df = parcels.read_particlefile(\"mitgcm_particles.parquet\")\n", "\n", - "plt.plot(ds.lon.T, ds.lat.T, \".-\")\n", + "for traj in df.partition_by(\"particle_id\", maintain_order=True):\n", + " plt.plot(traj[\"lon\"], traj[\"lat\"], \".-\")\n", "plt.show()" ] } @@ -144,7 +144,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.14.2" + "version": "3.14.4" } }, "nbformat": 4, diff --git a/docs/user_guide/examples/tutorial_nemo.ipynb b/docs/user_guide/examples/tutorial_nemo.ipynb index 61fdbac56..628f04649 100644 --- a/docs/user_guide/examples/tutorial_nemo.ipynb +++ b/docs/user_guide/examples/tutorial_nemo.ipynb @@ -48,7 +48,7 @@ "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", - "import xarray as xr\n", + "import polars as pl\n", "\n", "import parcels\n", "import parcels.tutorial" @@ -151,7 +151,7 @@ "\n", "pset = parcels.ParticleSet(fieldset, lon=lonp, lat=latp)\n", "pfile = parcels.ParticleFile(\n", - " store=\"output_curvilinear.zarr\", outputdt=np.timedelta64(1, \"D\")\n", + " \"output_curvilinear.parquet\", outputdt=np.timedelta64(1, \"D\")\n", ")\n", "\n", "pset.execute(\n", @@ -176,9 +176,11 @@ 
"metadata": {}, "outputs": [], "source": [ - "ds = xr.open_zarr(\"output_curvilinear.zarr\")\n", + "df = parcels.read_particlefile(\"output_curvilinear.parquet\")\n", + "\n", + "for traj in df.partition_by(\"particle_id\", maintain_order=True):\n", + " plt.plot(traj[\"lon\"], traj[\"lat\"], \".-\")\n", "\n", - "plt.plot(ds.lon.T, ds.lat.T, \".-\")\n", "plt.vlines(np.arange(-180, 901, 360), -90, 90, color=\"r\", label=\"antimeridian\")\n", "plt.ylabel(\"Latitude [deg N]\")\n", "plt.xlabel(\"Longitude [deg E]\")\n", @@ -202,8 +204,13 @@ "outputs": [], "source": [ "# post processing\n", - "ds[\"lon\"] = ds[\"lon\"] % 360\n", - "ds[\"lon\"] = ds[\"lon\"].where(ds[\"lon\"] <= 180, ds[\"lon\"] - 360)" + "df = df.with_columns((pl.col(\"lon\") % 360).alias(\"lon\"))\n", + "df = df.with_columns(\n", + " pl.when(pl.col(\"lon\") <= 180)\n", + " .then(pl.col(\"lon\"))\n", + " .otherwise(pl.col(\"lon\") - 360)\n", + " .alias(\"lon\")\n", + ")" ] }, { @@ -225,7 +232,7 @@ "\n", "pset = parcels.ParticleSet(fieldset, lon=lonp, lat=latp)\n", "pfile = parcels.ParticleFile(\n", - " store=\"output_curvilinear_periodic.zarr\", outputdt=np.timedelta64(1, \"D\")\n", + " \"output_curvilinear_periodic.parquet\", outputdt=np.timedelta64(1, \"D\")\n", ")\n", "\n", "pset.execute(\n", @@ -242,10 +249,9 @@ "metadata": {}, "outputs": [], "source": [ - "ds_periodic = xr.open_zarr(\"output_curvilinear_periodic.zarr\")\n", - "\n", "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n", - "ax[0].plot(ds.lon.T, ds.lat.T, \".-\")\n", + "for traj in df.partition_by(\"particle_id\", maintain_order=True):\n", + " ax[0].plot(traj[\"lon\"], traj[\"lat\"], \".-\")\n", "ax[0].vlines(np.arange(-180, 360, 360), -90, 90, color=\"r\", label=\"antimeridian\")\n", "ax[0].set_ylabel(\"Latitude [deg N]\")\n", "ax[0].set_xlabel(\"Longitude [deg E]\")\n", @@ -253,7 +259,11 @@ "ax[0].set_title(\"in post processing\")\n", "ax[0].legend(loc=\"lower center\")\n", "\n", - "ax[1].plot(ds_periodic.lon.T, ds_periodic.lat.T, 
\".-\")\n", + "\n", + "df_periodic = parcels.read_particlefile(\"output_curvilinear_periodic.parquet\")\n", + "for traj in df_periodic.partition_by(\"particle_id\", maintain_order=True):\n", + " ax[1].plot(traj[\"lon\"], traj[\"lat\"], \".-\")\n", + "\n", "ax[1].vlines(np.arange(-180, 360, 360), -90, 90, color=\"r\", label=\"antimeridian\")\n", "ax[1].set_ylabel(\"Latitude [deg N]\")\n", "ax[1].set_xlabel(\"Longitude [deg E]\")\n", @@ -325,9 +335,7 @@ " z=100 * np.ones(npart),\n", ")\n", "\n", - "pfile = parcels.ParticleFile(\n", - " store=\"output_nemo3D.zarr\", outputdt=np.timedelta64(1, \"D\")\n", - ")\n", + "pfile = parcels.ParticleFile(\"output_nemo3D.parquet\", outputdt=np.timedelta64(1, \"D\"))\n", "\n", "pset.execute(\n", " parcels.kernels.AdvectionRK2_3D,\n", @@ -354,8 +362,9 @@ "field = field.where(field != 0, np.nan) # Mask land values for better plotting\n", "plt.pcolormesh(fieldset.U.grid.lon, fieldset.U.grid.lat, field, cmap=\"RdBu\")\n", "\n", - "ds_out = xr.open_zarr(\"output_nemo3D.zarr\")\n", - "plt.scatter(ds_out.lon.T, ds_out.lat.T, c=-ds_out.z.T, marker=\".\")\n", + "df_out = parcels.read_particlefile(\"output_nemo3D.parquet\")\n", + "for traj in df_out.partition_by(\"particle_id\", maintain_order=True):\n", + " plt.scatter(traj[\"lon\"], traj[\"lat\"], c=-traj[\"z\"], marker=\".\")\n", "plt.colorbar(label=\"Depth (m)\")\n", "plt.show()" ] @@ -363,7 +372,7 @@ ], "metadata": { "kernelspec": { - "display_name": "default", + "display_name": "docs", "language": "python", "name": "python3" }, @@ -377,7 +386,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.14.3" + "version": "3.14.4" } }, "nbformat": 4, diff --git a/docs/user_guide/examples/tutorial_nestedgrids.ipynb b/docs/user_guide/examples/tutorial_nestedgrids.ipynb index 86095e6b8..f06ec84c6 100644 --- a/docs/user_guide/examples/tutorial_nestedgrids.ipynb +++ b/docs/user_guide/examples/tutorial_nestedgrids.ipynb @@ -483,7 +483,7 @@ "\n", "pset 
= parcels.ParticleSet(fieldset, pclass=NestedGridParticle, lon=lon, lat=lat)\n", "ofile = parcels.ParticleFile(\n", - " \"nestedgrid_particles.zarr\", outputdt=np.timedelta64(1, \"D\")\n", + " \"nestedgrid_particles.parquet\", outputdt=np.timedelta64(1, \"D\")\n", ")\n", "pset.execute(\n", " AdvectEE_NestedGrids,\n", @@ -503,12 +503,15 @@ "source": [ "fig, ax = plt.subplots(1, 1, figsize=(10, 5))\n", "\n", - "ds_out = xr.open_zarr(\"nestedgrid_particles.zarr\")\n", + "df = parcels.read_particlefile(\"nestedgrid_particles.parquet\")\n", "\n", - "plt.plot(ds_out.lon.T, ds_out.lat.T, \"k\", linewidth=0.5)\n", - "sc = ax.scatter(ds_out.lon, ds_out.lat, c=ds_out.gridID, s=4, cmap=cmap, vmin=0, vmax=2)\n", - "xl, yl = ax.get_xlim(), ax.get_ylim()\n", + "for traj in df.partition_by(\"particle_id\", maintain_order=True):\n", + " plt.plot(traj[\"lon\"], traj[\"lat\"], \".-\")\n", + " sc = ax.scatter(\n", + " traj[\"lon\"], traj[\"lat\"], c=traj[\"gridID\"], s=4, cmap=cmap, vmin=0, vmax=2\n", + " )\n", "\n", + "xl, yl = ax.get_xlim(), ax.get_ylim()\n", "for i in range(n_grids - 1):\n", " poly = grid_polygons[i]\n", " ax.plot(\n", @@ -524,7 +527,6 @@ "cbar = plt.colorbar(sc, ticks=[0, 1, 2], ax=ax)\n", "cbar.set_label(\"Grid ID\")\n", "ax.set_title(\"Particle advection through nested Grids\")\n", - "plt.tight_layout\n", "plt.show()" ] }, @@ -541,7 +543,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "docs", "language": "python", "name": "python3" }, @@ -555,7 +557,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.11" + "version": "3.14.4" } }, "nbformat": 4, diff --git a/docs/user_guide/examples/tutorial_sampling.ipynb b/docs/user_guide/examples/tutorial_sampling.ipynb index fc4583dad..a9ded57b3 100644 --- a/docs/user_guide/examples/tutorial_sampling.ipynb +++ b/docs/user_guide/examples/tutorial_sampling.ipynb @@ -38,9 +38,6 @@ "import matplotlib.pyplot as plt\n", "import 
numpy as np\n", "\n", - "# To open and look at the temperature data\n", - "import xarray as xr\n", - "\n", "import parcels\n", "import parcels.tutorial" ] @@ -156,7 +153,7 @@ " fieldset=fieldset, pclass=SampleParticle, lon=lon, lat=lat, time=time, z=z\n", ")\n", "\n", - "output_file = parcels.ParticleFile(\"sampletemp.zarr\", outputdt=timedelta(hours=1))\n", + "output_file = parcels.ParticleFile(\"sampletemp.parquet\", outputdt=timedelta(hours=1))\n", "\n", "pset.execute(\n", " [parcels.kernels.AdvectionRK2, SampleT],\n", @@ -180,22 +177,23 @@ "metadata": {}, "outputs": [], "source": [ - "ds_particles = xr.open_zarr(\"sampletemp.zarr\")\n", + "df = parcels.read_particlefile(\"sampletemp.parquet\")\n", "\n", "plt.figure()\n", "ax = plt.axes()\n", "ax.set_ylabel(\"Latitude\")\n", "ax.set_xlabel(\"Longitude\")\n", - "ax.plot(ds_particles.lon.transpose(), ds_particles.lat.transpose(), c=\"k\", zorder=1)\n", - "T_scatter = ax.scatter(\n", - " ds_particles.lon,\n", - " ds_particles.lat,\n", - " c=ds_particles.temperature,\n", - " cmap=plt.cm.inferno,\n", - " norm=mpl.colors.Normalize(vmin=22.0, vmax=24.0),\n", - " edgecolor=\"k\",\n", - " zorder=2,\n", - ")\n", + "for traj in df.partition_by(\"particle_id\", maintain_order=True):\n", + " ax.plot(traj[\"lon\"], traj[\"lat\"], c=\"k\", zorder=1)\n", + " T_scatter = ax.scatter(\n", + " traj[\"lon\"],\n", + " traj[\"lat\"],\n", + " c=traj[\"temperature\"],\n", + " cmap=plt.cm.inferno,\n", + " norm=mpl.colors.Normalize(vmin=22.0, vmax=24.0),\n", + " edgecolor=\"k\",\n", + " zorder=2,\n", + " )\n", "plt.colorbar(T_scatter, label=r\"T [$^{\\circ} C$]\")\n", "plt.show()" ] diff --git a/docs/user_guide/v4-migration.md b/docs/user_guide/v4-migration.md index e4f9f1c06..53a974821 100644 --- a/docs/user_guide/v4-migration.md +++ b/docs/user_guide/v4-migration.md @@ -35,11 +35,13 @@ Version 4 of Parcels is unreleased at the moment. 
The information in this migrat

 ## ParticleFile
 
+- ParticleFile output is now written in Parquet format by default, instead of zarr. This means that ParticleFile output can now be read with `polars.read_parquet`. We also provide a helper function `parcels.read_particlefile` to read ParticleFiles, which automatically decodes the cftime-encoded `time` column into datetime or duration values.
 - ParticleFiles should be created by `ParticleFile(...)` instead of `pset.ParticleFile(...)`
-- `ParticleFile` output is now in Parquet format
 - `ParticleFile` writing behaviour now errors out if there's existing output (this is being further discussed in https://github.com/Parcels-code/Parcels/issues/2593 )
 - A utility to read in ParticleFile output is now available: `parcels.read_particlefile()`
 - "trajectory" is now called "particle_id" in the particle file output
+- The `name` argument in `ParticleFile` has been replaced by `path` and can now be a string or a Path.
+- The `chunks` argument in `ParticleFile` has been removed.
 - The `to_write="once"` option has been removed. A variable can now either be written at every output time step, or not written at all.
## Field diff --git a/pixi.toml b/pixi.toml index 2c6a89cff..674823235 100644 --- a/pixi.toml +++ b/pixi.toml @@ -62,6 +62,7 @@ xgcm = { version = "0.9.*", channel = "conda-forge" } cf_xarray = "0.10.*" cftime = "1.6.*" pooch = "1.8.*" +polars = "*" [feature.py311.dependencies] python = "3.11.*" diff --git a/src/parcels/_core/particlefile.py b/src/parcels/_core/particlefile.py index a7daf4f64..0f1a2d2c5 100644 --- a/src/parcels/_core/particlefile.py +++ b/src/parcels/_core/particlefile.py @@ -8,6 +8,7 @@ import numpy as np import pandas as pd +import polars as pl import pyarrow as pa import pyarrow.parquet as pq import xarray as xr @@ -232,14 +233,18 @@ def read_particlefile(path: PathLike, decode_times: bool = True) -> pd.DataFrame attrs = {k.decode(): v.decode() for k, v in time_field.metadata.items()} - df = pd.read_parquet(path) + df = pl.read_parquet(path) if not decode_times: return df values = table.column("time").to_numpy() var = xr.Variable(("time",), values, attrs) values = xr.coders.CFDatetimeCoder(time_unit="s").decode(var).values - - df["time"] = values + if "since" in attrs["units"]: + values = values.astype("datetime64[ns]") + df = df.with_columns(pl.Series("time", values, dtype=pl.Datetime("ns"))) + else: + values = values.astype("timedelta64[ns]") * 1e9 + df = df.with_columns(pl.Series("time", values, dtype=pl.Duration("ns"))) return df diff --git a/src/parcels/_reprs.py b/src/parcels/_reprs.py index d27eee379..e87d4dc4c 100644 --- a/src/parcels/_reprs.py +++ b/src/parcels/_reprs.py @@ -97,7 +97,7 @@ def particleset_repr(pset: ParticleSet) -> str: def particlesetview_repr(pview: Any) -> str: """Return a pretty repr for ParticleSetView""" time_string = "not_yet_set" if pview.time is None or np.isnan(pview.time) else f"{pview.time:f}" - out = f"P[{pview.trajectory}]: time={time_string}, z={pview.z:f}, lat={pview.lat:f}, lon={pview.lon:f}" + out = f"P[{pview.particle_id}]: time={time_string}, z={pview.z:f}, lat={pview.lat:f}, lon={pview.lon:f}" vars = 
[v.name for v in pview._ptype.variables if v.to_write is True and v.name not in ["lon", "lat", "z", "time"]] for var in vars: out += f", {var}={getattr(pview, var):f}" @@ -129,8 +129,6 @@ def particlefile_repr(pfile: Any) -> str: out = f"""<{type(pfile).__name__}> path : {pfile.path} outputdt : {pfile.outputdt!r} - chunks : {pfile.chunks!r} - create_new_zarrfile : {pfile.create_new_zarrfile!r} metadata : {_format_list_items_multiline(pfile.metadata, level=2, with_brackets=False)} """ diff --git a/tests/test_particlefile.py b/tests/test_particlefile.py index 759ac5274..5173cba7e 100755 --- a/tests/test_particlefile.py +++ b/tests/test_particlefile.py @@ -4,6 +4,7 @@ import numpy as np import pandas as pd +import polars as pl import pyarrow as pa import pyarrow.parquet as pq import pytest @@ -298,9 +299,10 @@ def IncreaseAge(particles, fieldset): # pragma: no cover df = parcels.read_particlefile(tmp_parquet) # Map sorted particle IDs to release times (0, 1, ..., npart-1 seconds) - for index, df_traj in df.groupby("particle_id"): - release_time = time[index] - np.testing.assert_equal(df_traj["age"].astype("timedelta64[s]").values, (df_traj["time"] - release_time).values) + for i, df_traj in enumerate(df.partition_by("particle_id", maintain_order=True)): + release_time = pd.Timestamp(time[i]).to_pydatetime() + traj_time = (df_traj["time"] - release_time).dt.total_seconds() + assert (df_traj["age"] == traj_time).all() def test_reset_dt(fieldset, tmp_parquet): @@ -362,9 +364,8 @@ def test_pset_execute_outputdt_forwards(fieldset): dt = timedelta(minutes=5) df = setup_pset_execute(fieldset=fieldset, outputdt=outputdt, execute_kwargs=dict(runtime=runtime, dt=dt)) - particle_0_times = df[df["particle_id"] == 0].time.values - - np.testing.assert_equal(np.diff(particle_0_times), outputdt.seconds) + particle_0_times = df.filter(pl.col("particle_id") == 0)["time"] + np.testing.assert_equal(np.diff(particle_0_times) / 1e9, outputdt.seconds) def 
test_pset_execute_output_time_forwards(fieldset): @@ -374,8 +375,8 @@ def test_pset_execute_output_time_forwards(fieldset): dt = np.timedelta64(5, "m") df = setup_pset_execute(fieldset=fieldset, outputdt=outputdt, execute_kwargs=dict(runtime=runtime, dt=dt)) - assert df.time.min() == pd.Timestamp(fieldset.time_interval.left) - assert df.time.max() - df.time.min() == runtime + assert df["time"].min() == pd.Timestamp(fieldset.time_interval.left) + assert df["time"].max() - df["time"].min() == runtime def test_pset_execute_outputdt_backwards(fieldset): @@ -385,8 +386,8 @@ def test_pset_execute_outputdt_backwards(fieldset): dt = -timedelta(minutes=5) df = setup_pset_execute(fieldset=fieldset, outputdt=outputdt, execute_kwargs=dict(runtime=runtime, dt=dt)) - particle_0_times = df[df["particle_id"] == 0].time.values - np.testing.assert_equal(np.diff(particle_0_times), -outputdt.seconds) + particle_0_times = df.filter(pl.col("particle_id") == 0)["time"] + np.testing.assert_equal(np.diff(particle_0_times) / 1e9, -outputdt.seconds) def test_pset_execute_outputdt_backwards_fieldset_timevarying(): @@ -404,8 +405,8 @@ def test_pset_execute_outputdt_backwards_fieldset_timevarying(): fieldset = FieldSet.from_sgrid_conventions(ds_fset) df = setup_pset_execute(outputdt=outputdt, execute_kwargs=dict(runtime=runtime, dt=dt), fieldset=fieldset) - particle_0_times = df[df["particle_id"] == 0].time.values - np.testing.assert_equal(np.diff(particle_0_times), -outputdt.seconds) + particle_0_times = df.filter(pl.col("particle_id") == 0)["time"] + np.testing.assert_equal(np.diff(particle_0_times) / 1e9, -outputdt.seconds) def test_particlefile_init(tmp_parquet): diff --git a/tests/utils.py b/tests/utils.py index 33d6e0012..c87198fd7 100644 --- a/tests/utils.py +++ b/tests/utils.py @@ -4,6 +4,7 @@ import struct from collections import defaultdict +from datetime import datetime from pathlib import Path import cftime @@ -160,7 +161,7 @@ def assert_cftime_like_particlefile(parquet_path: Path) 
-> None: df = parcels.read_particlefile(parquet_path, decode_times=True) # check first value (and hence rest of array) is what we expect - assert isinstance(df["time"].values[0], (cftime.datetime, np.datetime64)), ( + assert isinstance(df["time"][0], (cftime.datetime, datetime)), ( "CF-time values in Parquet did not get properly decoded. Are the attributes correct?" ) return