perf: cache default ArraySpec for regular chunk grids#3908
perf: cache default ArraySpec for regular chunk grids#3908d-v-b wants to merge 5 commits intozarr-developers:mainfrom
Conversation
For regular grids, all chunks have the same codec_shape, so we can build the ArraySpec once and reuse it for every chunk — avoiding the per-chunk ChunkGrid.__getitem__ + ArraySpec construction overhead. Adds _get_default_chunk_spec() and uses it in _get_selection and _set_selection. Saves ~5ms per 1000 chunks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3908 +/- ##
=======================================
Coverage 93.11% 93.11%
=======================================
Files 85 85
Lines 11365 11371 +6
=======================================
+ Hits 10582 10588 +6
Misses 783 783
🚀 New features to boost your workflow:
|
| def _get_default_chunk_spec( | ||
| metadata: ArrayMetadata, | ||
| chunk_grid: ChunkGrid, | ||
| array_config: ArrayConfig, | ||
| prototype: BufferPrototype, | ||
| ) -> ArraySpec | None: |
There was a problem hiding this comment.
given the name of _get_default_chunk_spec, should this be a ChunkSpec?
There was a problem hiding this comment.
it's the arrayspec for a chunk, not a chunkspec. And the consumer needs an ArraySpec, so we can't change the return type.
There was a problem hiding this comment.
ok, still a bit confusing but I guess that might relate to the fact that a better design wouldn't need a per-chunk ArraySpec.
My only other question is why build a new function rather than making chunk_coords: tuple[int, ...] optional in _get_chunk_spec, such that it could return a default?
There was a problem hiding this comment.
what would the default chunk coordinates be? the "origin" chunk coordinate depends on the dimensionality of the array
There was a problem hiding this comment.
Part of the goal here is to avoid object creation overhead inside _get_chunk_spec. Adding default parameters to _get_chunk_spec would not help us, because we would still create many identical ArraySpec objects. The only change to _get_chunk_spec that would help is adding a caching layer via @lru_cache, I should see if that works
For regular grids, all chunks have the same codec_shape, so we can
build the ArraySpec once and reuse it for every chunk — avoiding the
per-chunk ChunkGrid.getitem + ArraySpec construction overhead.
Adds _get_default_chunk_spec() and uses it in _get_selection and
_set_selection. Saves ~5ms per 1000 chunks.
Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com