Update the documentation to use parquet output #2607
erikvansebille wants to merge 76 commits into Parcels-code:main from
Conversation
Covered by test_write_dtypes_pfile
for more information, see https://pre-commit.ci
Remove temporary test_cftime.py file
This function is now independent of the time_interval as time is now stored as float
Remove nested key - save on root instead
VeckoTheGecko left a comment
As part of the review I've both looked at the code and visually compared plots before and after.
I've gone through and pushed some edits which were quite straightforward:
- 1b35bf9 Fixing a notebook
- d977c88e7
Now the docs builds are passing
Other than that, I have some small comments - nothing major.
Given we're now using Polars in the docs, the tests, and in the `read_particlefile` function, I think it's easiest for us to just add it as a core dependency of Parcels. We could make it an optional dependency, but we don't really have the tooling for that in Parcels (and I don't think it's worth adding the tooling in this case).
If we add it as a core dependency:
- Update `pyproject.toml` and `pixi.toml` (`run-dependencies` to `= ">=1.31.0"` and `feature.minimum.dependencies` to `= "1.31.*"`)
- Update `recipe.yaml`
I'm happy to make those updates.
```diff
- The output files are in `.zarr` [format](https://zarr.readthedocs.io/en/stable/), which can be read by `xarray`.
- See the [Parcels output tutorial](./tutorial_output.ipynb) for more information on the zarr format. We want to choose
+ The output files are in `.parquet` [format](https://parquet.apache.org/), which can be read by `polars`.
```
Would this be a good place to link to Polars?
```diff
- The output files are in `.parquet` [format](https://parquet.apache.org/), which can be read by `polars`.
+ The output files are in `.parquet` [format](https://parquet.apache.org/), which can be read by [Polars](https://pola.rs/).
```
I don't think we link to it yet in the docs here
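For context, reading the parquet output with Polars is a one-liner. A minimal sketch, assuming an output file named `output.parquet` (the filename is illustrative):

```python
import polars as pl

# Read the Parcels parquet output into a long-format DataFrame;
# the exact columns depend on the Variables written by the ParticleSet.
df = pl.read_parquet("output.parquet")
print(df.head())
```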
```diff
- Parcels depends on `xarray`, expecting inputs in the form of [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html)
- and writing output files that can be read with xarray.
+ Parcels depends on `xarray`, expecting inputs in the form of [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html). Output files can be read with `pandas`.
```
```diff
- Parcels depends on `xarray`, expecting inputs in the form of [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html). Output files can be read with `pandas`.
+ Parcels depends on `xarray`, expecting inputs in the form of [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html). Output files can be read with `polars`.
```
```diff
@@ -155,23 +155,22 @@ pset.execute(
 To start analyzing the trajectories computed by **Parcels**, we can open the `ParticleFile` using `xarray`:
```
This needs to be updated from "xarray"
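A sketch of what the updated tutorial cell could look like, using the `parcels.read_particlefile()` helper that this PR updates to use polars (the filename `Output.parquet` is a placeholder):

```python
import parcels

# Open the ParticleFile output as a polars DataFrame instead of an xarray Dataset.
df = parcels.read_particlefile("Output.parquet")
df.head()
```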
| if "since" in attrs["units"]: | ||
| values = values.astype("datetime64[ns]") | ||
| df = df.with_columns(pl.Series("time", values, dtype=pl.Datetime("ns"))) | ||
| else: | ||
| values = values.astype("timedelta64[ns]") * 1e9 | ||
| df = df.with_columns(pl.Series("time", values, dtype=pl.Duration("ns"))) |
I don't think this works properly with cf-time variables, and I think it will silently fail by providing incorrect times. It's worth updating the docstring and also adding a check (if the calendar in the metadata isn't supported, i.e. is CF-specific, raise a NotImplementedError), since this is quite a user-facing function.
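A hedged sketch of the suggested guard inside `read_particlefile`; the attribute key `"calendar"` and the set of supported calendar names are assumptions, not the actual implementation:

```python
# Calendars that decode cleanly to numpy datetime64; anything CF-specific
# (e.g. "noleap", "360_day") would silently produce wrong times, so refuse it.
SUPPORTED_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}  # assumed set

calendar = attrs.get("calendar", "standard")
if calendar not in SUPPORTED_CALENDARS:
    raise NotImplementedError(
        f"Calendar {calendar!r} is not supported when decoding the 'time' column; "
        "read the parquet file directly and decode the times manually."
    )
```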
| assert isinstance(df["time"][0], (cftime.datetime, datetime)), ( | ||
| "CF-time values in Parquet did not get properly decoded. Are the attributes correct?" | ||
| ) |
This assert should be updated pending the discussion from the other comment on the read_particlefile function.
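Depending on the outcome of that discussion, the check might end up asserting on the Polars dtype instead of on Python objects; a sketch under that assumption:

```python
# Hypothetical replacement: verify the decoded "time" column is a proper
# Polars Datetime rather than checking individual cftime/datetime objects.
assert df["time"].dtype == pl.Datetime("ns"), (
    "Time values in the parquet output were not decoded to datetimes. Are the attributes correct?"
)
```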
````diff
 ```{code-cell}
-ds_particles_back = xr.open_zarr("output-backwards.zarr")
+df_back = parcels.read_particlefile("output-backwards.parquet")

-scatter = plt.scatter(ds_particles_back.lon.T, ds_particles_back.lat.T, c=np.repeat(ds_particles_back.obs.values,npart))
-plt.scatter(ds_particles_back.lon[:,0],ds_particles_back.lat[:,0],facecolors="none",edgecolors='r') # starting positions
+scatter = plt.scatter(df_back['lon'], df_back['lat'], c=df_back['time'])
+particles_at_start = df_back.filter(pl.col("time") == df_back["time"].min())
+plt.scatter(particles_at_start['lon'], particles_at_start['lat'], facecolors="none", edgecolors='r') # starting positions
 plt.xlabel("Longitude [deg E]")
 plt.xlim(31,33)
 plt.ylabel("Latitude [deg N]")
 plt.ylim(-33,-30)
 plt.colorbar(scatter, label="Observation number")
 plt.show()
 ```
````
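Since the parquet output is long-format rather than the wide (trajectory × obs) zarr layout, per-trajectory line plots need a group-by. A sketch, assuming the particle-ID column is called `trajectory` (the name may differ in the tutorial):

```python
import matplotlib.pyplot as plt
import parcels

df_back = parcels.read_particlefile("output-backwards.parquet")

# Partition the long-format DataFrame into one frame per particle and
# plot each trajectory as a line ordered by time.
for traj in df_back.partition_by("trajectory"):
    traj = traj.sort("time")
    plt.plot(traj["lon"], traj["lat"], linewidth=0.8)
plt.xlabel("Longitude [deg E]")
plt.ylabel("Latitude [deg N]")
plt.show()
```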
I've gone through this - really nice update! I think it's quite clear.



Description
This PR updates all the documentation and tutorial notebooks to parse the parquet output introduced in #2600, as tracked in #2582. It also updates `parcels.read_particlefile()` to use polars, which scales better for large output files.
Checklist
- Targeted the correct branch (`main` for normal development, `v3-support` for v3 support)
AI Disclosure