
perf: cache default ArraySpec for regular chunk grids #3908

Open
d-v-b wants to merge 5 commits into zarr-developers:main from d-v-b:perf/cache-default-chunk-spec

Contributor

d-v-b commented Apr 15, 2026

For regular grids, all chunks have the same codec_shape, so we can
build the ArraySpec once and reuse it for every chunk — avoiding the
per-chunk ChunkGrid.__getitem__ + ArraySpec construction overhead.

Adds _get_default_chunk_spec() and uses it in _get_selection and
_set_selection. Saves ~5ms per 1000 chunks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
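
A minimal sketch of the pattern described above, using hypothetical stand-ins (`Spec`, `Grid`, `get_default_spec`, `get_chunk_spec`, `specs_for`) rather than zarr's actual `ArraySpec`, `ChunkGrid`, or private helpers. It only illustrates hoisting the spec construction out of the per-chunk loop when the grid is regular:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for a per-chunk array spec (not zarr's ArraySpec)."""
    shape: tuple
    dtype: str


@dataclass(frozen=True)
class Grid:
    """Hypothetical stand-in for a chunk grid (not zarr's ChunkGrid)."""
    chunk_shape: tuple
    is_regular: bool = True

    def shape_at(self, coords: tuple) -> tuple:
        # A genuinely irregular grid would vary the shape by coords;
        # this toy version always returns the same shape.
        return self.chunk_shape


def get_default_spec(grid: Grid, dtype: str) -> Optional[Spec]:
    """Return one shared Spec for a regular grid, or None otherwise."""
    if not grid.is_regular:
        return None
    return Spec(shape=grid.chunk_shape, dtype=dtype)


def get_chunk_spec(grid: Grid, dtype: str, coords: tuple) -> Spec:
    """Per-chunk path: one grid lookup and one new Spec per call."""
    return Spec(shape=grid.shape_at(coords), dtype=dtype)


def specs_for(grid: Grid, dtype: str, all_coords: list) -> list:
    """Collect the spec for each chunk, reusing the default when available."""
    default = get_default_spec(grid, dtype)  # built once, outside the loop
    return [
        default if default is not None else get_chunk_spec(grid, dtype, c)
        for c in all_coords
    ]
```

With a regular grid every entry returned by `specs_for` is the same object; with `is_regular=False` the sketch falls back to building one spec per chunk.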

github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Apr 15, 2026
github-actions bot removed the "needs release notes" label Apr 15, 2026

codecov Bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.11%. Comparing base (029c376) to head (6fa93cb).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3908   +/-   ##
=======================================
  Coverage   93.11%   93.11%           
=======================================
  Files          85       85           
  Lines       11365    11371    +6     
=======================================
+ Hits        10582    10588    +6     
  Misses        783      783           
Files with missing lines Coverage Δ
src/zarr/core/array.py 97.72% <100.00%> (+0.01%) ⬆️

@d-v-b d-v-b requested a review from maxrjones April 21, 2026 19:12
Comment thread src/zarr/core/array.py
Comment on lines +5373 to +5378
def _get_default_chunk_spec(
    metadata: ArrayMetadata,
    chunk_grid: ChunkGrid,
    array_config: ArrayConfig,
    prototype: BufferPrototype,
) -> ArraySpec | None:
Member

given the name of _get_default_chunk_spec, should this be a ChunkSpec?

Contributor Author

it's the ArraySpec for a chunk, not a ChunkSpec. And the consumer needs an ArraySpec, so we can't change the return type.

Member

ok, still a bit confusing but I guess that might relate to the fact that a better design wouldn't need a per-chunk ArraySpec.

My only other question is why build a new function rather than making chunk_coords: tuple[int, ...] optional in _get_chunk_spec, such that it could return a default?

Contributor Author

what would the default chunk coordinates be? the "origin" chunk coordinate depends on the dimensionality of the array
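
To make the point concrete: the origin coordinate has one zero per dimension, so it cannot be written as a fixed literal default in a signature and would have to be derived from the array's dimensionality at call time. `origin_chunk_coords` below is a hypothetical helper for illustration, not part of zarr:

```python
def origin_chunk_coords(ndim: int) -> tuple:
    """Return the coordinate of the first chunk for an `ndim`-dimensional array."""
    return (0,) * ndim


print(origin_chunk_coords(1))  # (0,)
print(origin_chunk_coords(3))  # (0, 0, 0)
```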

Contributor Author

Part of the goal here is to avoid object creation overhead inside _get_chunk_spec. Adding default parameters to _get_chunk_spec would not help us, because we would still create many identical ArraySpec objects. The only change to _get_chunk_spec that would help is adding a caching layer via @lru_cache. I should see if that works.
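
A rough sketch of what the `@lru_cache` variant could look like, assuming every argument that determines the spec is hashable. `Spec` and `cached_spec` are hypothetical stand-ins, not zarr's `ArraySpec` or `_get_chunk_spec`:

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for a per-chunk array spec (not zarr's ArraySpec)."""
    shape: tuple
    dtype: str


@lru_cache(maxsize=None)
def cached_spec(shape: tuple, dtype: str) -> Spec:
    # lru_cache keys on the call arguments, so they must all be hashable.
    return Spec(shape=shape, dtype=dtype)


a = cached_spec((10, 10), "float64")
b = cached_spec((10, 10), "float64")
print(a is b)  # True: the second call returns the cached object
```

The cache removes the repeated object construction, but each call still hashes its arguments to look up the entry, so whether this matches the savings from building the spec once outside the loop would need to be measured.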


2 participants