Open
63 commits
17e74a9
feat(metadata): scaffold zarr-metadata package structure
d-v-b Apr 21, 2026
8da469f
build(metadata): depend on zarr-metadata via local uv workspace source
d-v-b Apr 21, 2026
e43bd36
feat(metadata): add JSON, NamedConfig, NamedRequiredConfig primitives
d-v-b Apr 21, 2026
28efcde
feat(metadata): add v3 array metadata types
d-v-b Apr 21, 2026
23aed89
feat(metadata): add v3 consolidated metadata type
d-v-b Apr 21, 2026
43eefba
feat(metadata): add v3 group metadata type
d-v-b Apr 21, 2026
f26e1bd
feat(metadata): wire up zarr_metadata.v3 re-exports
d-v-b Apr 21, 2026
27b000c
feat(metadata): add faithful v2 array metadata types
d-v-b Apr 21, 2026
2a45d2d
feat(metadata): add v2 group metadata type
d-v-b Apr 21, 2026
d530cb4
feat(metadata): add v2 consolidated metadata type (canonical impl, no…
d-v-b Apr 21, 2026
266a8eb
feat(metadata): wire up zarr_metadata.v2 re-exports
d-v-b Apr 21, 2026
e431dee
feat(metadata): add ArrayMetadata, GroupMetadata version-polymorphic …
d-v-b Apr 21, 2026
c547f55
feat(metadata): add Codec envelope and blosc codec configurations
d-v-b Apr 21, 2026
1517cd8
feat(metadata): add dtype types (DType, LengthBytesConfig, FixedLengt…
d-v-b Apr 21, 2026
b90fb68
test(metadata): smoke + structural tests for the package
d-v-b Apr 21, 2026
bb0183c
refactor(common): re-export JSON, NamedConfig, NamedRequiredConfig fr…
d-v-b Apr 21, 2026
fc09be6
refactor(metadata): re-export v3 types from zarr-metadata
d-v-b Apr 21, 2026
33bfc99
refactor(metadata): re-export faithful v2 array metadata type
d-v-b Apr 21, 2026
1098718
refactor(codecs): re-export blosc codec configurations from zarr-meta…
d-v-b Apr 21, 2026
b437812
refactor(abc): re-export CodecJSON from zarr-metadata
d-v-b Apr 21, 2026
2578ad8
refactor(dtype): re-export DTypeJSON from zarr-metadata
d-v-b Apr 21, 2026
51a1df3
refactor(dtype): re-export LengthBytesConfig from zarr-metadata
d-v-b Apr 21, 2026
d06fad4
refactor(dtype): re-export FixedLengthBytesConfig from zarr-metadata
d-v-b Apr 21, 2026
a2c2960
refactor(dtype): re-export TimeConfig from zarr-metadata
d-v-b Apr 21, 2026
d42a508
refactor(metadata): use tuple[int, ...] for fixed-length fields + typ…
d-v-b Apr 21, 2026
e7ff23c
refactor(metadata): fix explicit re-exports and complete DateTimeUnit…
d-v-b Apr 21, 2026
99b2571
refactor(metadata): extract primitives to common.py to break import c…
d-v-b Apr 21, 2026
a88716b
fix(metadata): address review findings
d-v-b Apr 21, 2026
7571dbc
refactor(metadata): remove consolidated_metadata from GroupMetadataV3
d-v-b Apr 21, 2026
bb98cde
chore(metadata): don't track zarr-metadata's uv.lock
d-v-b Apr 21, 2026
08c7643
feat(metadata): add v3 codec types for bytes, crc32c, gzip, zstd, tra…
d-v-b Apr 21, 2026
2feb4be
refactor(metadata): define codec envelope TypedDicts explicitly
d-v-b Apr 21, 2026
a12fe70
feat(metadata): add Final string constants for codec names and enum-v…
d-v-b Apr 21, 2026
ec22950
docs(metadata): say "codec metadata" instead of "codec envelope"
d-v-b Apr 21, 2026
275bf55
docs(metadata): use single-backtick markdown code formatting
d-v-b Apr 21, 2026
374181a
feat(metadata): add v3 spec data type metadata
d-v-b Apr 21, 2026
9554ca3
refactor(metadata): per-dtype modules with fill-value types and valid…
d-v-b Apr 21, 2026
8d2bd63
refactor(metadata): per-grid and per-encoding modules for chunk_grid …
d-v-b Apr 21, 2026
e6139a6
refactor(metadata): move codec/ and dtype/ under v3/
d-v-b Apr 21, 2026
ac0304c
refactor(metadata): rename v3/dtype/ -> v3/data_type/
d-v-b Apr 21, 2026
ef4b773
Merge branch 'main' into refactor/metadata-package
d-v-b Apr 22, 2026
b7b055e
feat: add zarr-metadata package
d-v-b Apr 22, 2026
0ae8db9
Merge branch 'main' of github.com:zarr-developers/zarr-python into re…
d-v-b Apr 22, 2026
1b62c4c
Merge branch 'main' into refactor/metadata-package
d-v-b Apr 22, 2026
c6fcde9
test(metadata): drop tests that don't actually test anything
d-v-b Apr 22, 2026
c90c9a0
build(metadata): lower minimum Python to 3.11
d-v-b Apr 22, 2026
331ea93
Merge branch 'main' into refactor/metadata-package
d-v-b Apr 27, 2026
33c9a80
Merge branch 'refactor/metadata-package' of https://github.com/d-v-b/…
d-v-b Apr 28, 2026
700d916
refactor: remove generic base metadata
d-v-b Apr 28, 2026
84d5ca1
refactor: clean up codecs init
d-v-b Apr 29, 2026
c30a768
fix: correct v2 structured dtype spec
d-v-b Apr 29, 2026
fa66cc9
refactor: drop readonly for numcodecs config
d-v-b Apr 29, 2026
cf12cdc
docs: improve docstring
d-v-b Apr 29, 2026
82e10d6
fix: use empty typeddict for crc32c config
d-v-b Apr 29, 2026
4b5bd11
fix: remove arbitrary json from consolidated model
d-v-b Apr 29, 2026
6a0be8c
Merge branch 'main' into refactor/metadata-package
d-v-b Apr 29, 2026
e6e5920
fix: don't depend on zarr-metadata yet
d-v-b Apr 29, 2026
a732fb2
fix: typesize is not required
d-v-b Apr 29, 2026
8691138
fix: re-wire zarr-metadata up as a dependency for zarr-python
d-v-b Apr 29, 2026
a6d0e5e
chore: revert changes to src/zarr
d-v-b Apr 29, 2026
039fd7e
Merge branch 'main' into refactor/metadata-package
d-v-b Apr 29, 2026
b8d67fe
chore: mypy ignore the new package
d-v-b Apr 29, 2026
cdedda2
Merge branch 'refactor/metadata-package' of https://github.com/d-v-b/…
d-v-b Apr 30, 2026
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -91,3 +91,6 @@ tests/.hypothesis

zarr/version.py
zarr.egg-info/

# zarr-metadata package lockfile (a library, not an app)
packages/zarr-metadata/uv.lock
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -31,7 +31,7 @@ repos:
rev: v1.19.1
hooks:
- id: mypy
files: src|tests
files: ^(src|tests)/
additional_dependencies:
# Package dependencies
- packaging
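
The hunk above replaces `files: src|tests` with `files: ^(src|tests)/`. As far as I understand pre-commit, `files` is a Python regular expression searched against each path relative to the repository root, so the unanchored pattern also selects the new package's `src/` directory. A minimal sketch of the difference; `selected` is a hypothetical helper and `src/zarr/example.py` is a made-up path:

```python
import re

OLD_PATTERN = re.compile(r"src|tests")
NEW_PATTERN = re.compile(r"^(src|tests)/")

paths = [
    # A file added by this PR; it contains "src" but does not start with it.
    "packages/zarr-metadata/src/zarr_metadata/common.py",
    # Hypothetical file under the top-level src/ directory.
    "src/zarr/example.py",
]


def selected(pattern: re.Pattern[str], path: str) -> bool:
    """Return True if a search for the pattern succeeds anywhere in the path."""
    return pattern.search(path) is not None


for path in paths:
    print(path, selected(OLD_PATTERN, path), selected(NEW_PATTERN, path))
```

Under that assumption, the anchored pattern keeps the mypy hook on the top-level `src/` and `tests/` trees only, which matches the intent of the "mypy ignore the new package" commit.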
15 changes: 15 additions & 0 deletions packages/zarr-metadata/README.md
@@ -0,0 +1,15 @@
# zarr-metadata

Spec-defined metadata types for Zarr v2 and v3, distributed as pure-typing
artifacts (TypedDicts, type aliases, unions). No runtime logic, no numpy,
no storage backends.

`zarr-metadata` is developed in the [zarr-python](https://github.com/zarr-developers/zarr-python)
repository at `packages/zarr-metadata/`.

## Principle

Every type that models a spec artifact (v2 or v3 array/group/consolidated
metadata, chunk grids, codec metadata, dtype shapes) belongs in
`zarr-metadata`. Zarr-python implementation details (runtime codecs,
config dataclasses, numcodecs-derived helpers) stay in `zarr`.
51 changes: 51 additions & 0 deletions packages/zarr-metadata/pyproject.toml
@@ -0,0 +1,51 @@
[build-system]
requires = ["hatchling>=1.29.0"]
build-backend = "hatchling.build"

[project]
name = "zarr-metadata"
version = "0.1.0"
description = "Spec-defined metadata types for Zarr v2 and v3."
readme = "README.md"
requires-python = ">=3.11"
license = "MIT"
authors = [
{ name = "Davis Bennett", email = "davis.v.bennett@gmail.com" },
]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Programming Language :: Python :: 3.14",
"Typing :: Typed",
]
dependencies = [
"typing_extensions>=4.13",
]

[project.optional-dependencies]
test = ["pytest"]

[tool.hatch.build.targets.wheel]
packages = ["src/zarr_metadata"]

[tool.numpydoc_validation]
checks = [
"GL10",
"SS04",
"PR02",
"PR03",
"PR05",
"PR06",
]

[tool.pyright]
include = ["src"]
enableExperimentalFeatures = true
typeCheckingMode = "strict"
pythonVersion = "3.11"
23 changes: 23 additions & 0 deletions packages/zarr-metadata/src/zarr_metadata/__init__.py
@@ -0,0 +1,23 @@
from zarr_metadata.common import JSON, NamedConfig
from zarr_metadata.v2.array import ArrayMetadataV2
from zarr_metadata.v2.group import GroupMetadataV2
from zarr_metadata.v3.array import ArrayMetadataV3
from zarr_metadata.v3.group import GroupMetadataV3

ArrayMetadata = ArrayMetadataV2 | ArrayMetadataV3
"""Any Zarr array metadata document (v2 or v3)."""

GroupMetadata = GroupMetadataV2 | GroupMetadataV3
"""Any Zarr group metadata document (v2 or v3)."""


__all__ = [
"JSON",
"ArrayMetadata",
"ArrayMetadataV2",
"ArrayMetadataV3",
"GroupMetadata",
"GroupMetadataV2",
"GroupMetadataV3",
"NamedConfig",
]
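
Because `ArrayMetadata` and `GroupMetadata` are plain unions, a consumer has to discriminate on `zarr_format` itself. A minimal sketch of that check, assuming the document has already been decoded from JSON into a mapping; `zarr_format_of` is a hypothetical helper, not part of the package:

```python
from collections.abc import Mapping
from typing import Any, Literal


def zarr_format_of(doc: Mapping[str, Any]) -> Literal[2, 3]:
    """Return the Zarr format version declared by a metadata document."""
    fmt = doc.get("zarr_format")
    if fmt == 2:
        return 2
    if fmt == 3:
        return 3
    raise ValueError(f"unsupported zarr_format: {fmt!r}")


v2_group = {"zarr_format": 2}
v3_group = {"zarr_format": 3, "node_type": "group"}
print(zarr_format_of(v2_group), zarr_format_of(v3_group))
```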
24 changes: 24 additions & 0 deletions packages/zarr-metadata/src/zarr_metadata/common.py
@@ -0,0 +1,24 @@
"""
Top-level cross-version primitives for Zarr metadata.

Version-specific types live under `zarr_metadata.v2` and `zarr_metadata.v3`.
Codec and dtype spec types live under `zarr_metadata.v3.codec` and
`zarr_metadata.v3.data_type`.
"""

from collections.abc import Mapping, Sequence
from typing import NotRequired, TypedDict

JSON = str | int | float | bool | Mapping[str, "JSON"] | Sequence["JSON"] | None
"""Any valid JSON value."""


class NamedConfig(TypedDict):
    """
    Externally-tagged union member for a metadata field.

    `name` identifies the variant; the optional `configuration` mapping
    carries its parameters.
    """

    name: str
    configuration: NotRequired[Mapping[str, JSON]]
Empty file.
15 changes: 15 additions & 0 deletions packages/zarr-metadata/src/zarr_metadata/v2/__init__.py
@@ -0,0 +1,15 @@
"""Zarr v2 metadata types."""

from zarr_metadata.v2.array import ArrayMetadataV2, DataTypeV2, DataTypeV2Structured
from zarr_metadata.v2.codec import NumcodecsConfig
from zarr_metadata.v2.consolidated import ConsolidatedMetadataV2
from zarr_metadata.v2.group import GroupMetadataV2

__all__ = [
"ArrayMetadataV2",
"ConsolidatedMetadataV2",
"DataTypeV2",
"DataTypeV2Structured",
"GroupMetadataV2",
"NumcodecsConfig",
]
55 changes: 55 additions & 0 deletions packages/zarr-metadata/src/zarr_metadata/v2/array.py
@@ -0,0 +1,55 @@
"""Zarr v2 array metadata types."""

from __future__ import annotations

from typing import TYPE_CHECKING, Literal, NotRequired, TypedDict

if TYPE_CHECKING:
    from zarr_metadata.common import JSON
    from zarr_metadata.v2.codec import NumcodecsConfig


DataTypeV2Structured = tuple[str, str] | tuple[str, str, tuple[int, ...]]
"""
A single field entry inside a structured v2 dtype.

Spec-faithful: `datatype` is a numpy-style dtype string; `shape` is
present only when the field is a subarray field.

See https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html#data-type-encoding
"""

DataTypeV2 = str | tuple[DataTypeV2Structured, ...]
"""The v2 dtype representation.

Simple dtypes are numpy-style strings (e.g. `"<f8"`, `"|S10"`).
Structured dtypes are lists of field records. Endianness is encoded in the
prefix character of the string; parsing it out is a caller concern, not
part of this type.
"""


class ArrayMetadataV2(TypedDict):
    """
    Zarr v2 array metadata document (the `.zarray` content).

    See https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html
    """

    zarr_format: Literal[2]
    shape: tuple[int, ...]
    chunks: tuple[int, ...]
    dtype: DataTypeV2
    compressor: NumcodecsConfig | None
    fill_value: JSON
    order: Literal["C", "F"]
    filters: tuple[NumcodecsConfig, ...] | None
    dimension_separator: NotRequired[Literal[".", "/"]]
    attributes: JSON


__all__ = [
"ArrayMetadataV2",
"DataTypeV2",
"DataTypeV2Structured",
]
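
The `DataTypeV2` docstring leaves endianness parsing to the caller. A minimal sketch of what that could look like for simple (non-structured) dtype strings; `split_byte_order` is a hypothetical helper and is not part of this package:

```python
def split_byte_order(dtype: str) -> tuple[str, str]:
    """Split a simple v2 dtype string into (byte-order character, remainder).

    The v2 spec uses "<" (little-endian), ">" (big-endian), and "|" (not
    relevant) as the first character of the dtype string.
    """
    if dtype and dtype[0] in ("<", ">", "|"):
        return dtype[0], dtype[1:]
    raise ValueError(f"dtype string has no byte-order prefix: {dtype!r}")


print(split_byte_order("<f8"))
print(split_byte_order("|S10"))
```

Structured dtypes would need a recursive walk over the field records; that case is deliberately left out here.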
29 changes: 29 additions & 0 deletions packages/zarr-metadata/src/zarr_metadata/v2/codec.py
@@ -0,0 +1,29 @@
"""
Zarr v2 codec configuration shape.

V2 compressors and filters are numcodecs configuration dicts: a required
`id` field naming the codec, plus arbitrary codec-specific extra fields.
"""

from typing_extensions import TypedDict

from zarr_metadata.common import JSON


class NumcodecsConfig(TypedDict, extra_items=JSON):  # type: ignore[call-arg]
    """
    A numcodecs configuration dict, used as a v2 compressor or filter.

    The required `id` field names the codec; codec-specific parameters
    (e.g. `cname`, `clevel` for blosc) appear as extra fields.

    See the "compressor" and "filters" sections of
    https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html
    """

    id: str


__all__ = [
"NumcodecsConfig",
]
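
At runtime a `NumcodecsConfig` is an ordinary dict, so the "required `id` plus arbitrary extras" shape can be handled without the type itself. A minimal sketch, using a plain mapping so it runs without `typing_extensions`; `split_codec_config` is a hypothetical helper and the blosc values are illustrative:

```python
from collections.abc import Mapping
from typing import Any


def split_codec_config(config: Mapping[str, Any]) -> tuple[str, dict[str, Any]]:
    """Separate the required `id` from the codec-specific extra fields."""
    codec_id = config["id"]  # raises KeyError if the required field is missing
    if not isinstance(codec_id, str):
        raise TypeError(f"codec id must be a string, got {type(codec_id).__name__}")
    params = {key: value for key, value in config.items() if key != "id"}
    return codec_id, params


blosc = {"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1}
print(split_codec_config(blosc))
```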
35 changes: 35 additions & 0 deletions packages/zarr-metadata/src/zarr_metadata/v2/consolidated.py
@@ -0,0 +1,35 @@
"""Zarr v2 consolidated metadata (`.zmetadata` file).

This module models the de-facto `.zmetadata` file used by the reference
Python implementation of Zarr v2. **This is NOT a spec artifact.** There
is no Zarr v2 specification that defines `.zmetadata`; it is a
canonical-implementation convention.
"""

from __future__ import annotations

from typing import TYPE_CHECKING, TypedDict

if TYPE_CHECKING:
    from collections.abc import Mapping

    from .array import ArrayMetadataV2
    from .group import GroupMetadataV2


class ConsolidatedMetadataV2(TypedDict):
    """
    `.zmetadata` file contents.

    The `metadata` map uses flat path keys (`"foo/bar/.zarray"`,
    `"foo/.zattrs"`, etc.) pointing to the JSON contents of the file at
    that path. The keys include the filename suffix, not just the node path.
    """

    zarr_consolidated_format: int
    metadata: Mapping[str, GroupMetadataV2 | ArrayMetadataV2]


__all__ = [
"ConsolidatedMetadataV2",
]
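
Since the `metadata` keys carry the filename suffix, a consumer typically needs to split each key into a node path and a filename. A minimal sketch of that step; `split_consolidated_key` is a hypothetical helper, not something this package provides:

```python
def split_consolidated_key(key: str) -> tuple[str, str]:
    """Split a flat `.zmetadata` key into (node path, metadata filename).

    Keys for the root node have no "/" and yield an empty node path.
    """
    node_path, _, filename = key.rpartition("/")
    return node_path, filename


for key in ("foo/bar/.zarray", "foo/.zattrs", ".zgroup"):
    print(key, split_consolidated_key(key))
```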
19 changes: 19 additions & 0 deletions packages/zarr-metadata/src/zarr_metadata/v2/group.py
@@ -0,0 +1,19 @@
"""Zarr v2 group metadata types."""

from typing import Literal, TypedDict


class GroupMetadataV2(TypedDict):
    """
    Zarr v2 group metadata document (the `.zgroup` content).

    Attributes live in a sibling `.zattrs` file, so they are not part
    of this dict.
    """

    zarr_format: Literal[2]


__all__ = [
"GroupMetadataV2",
]
13 changes: 13 additions & 0 deletions packages/zarr-metadata/src/zarr_metadata/v3/__init__.py
@@ -0,0 +1,13 @@
"""Zarr v3 metadata types."""

from zarr_metadata.v3.array import AllowedExtraField, ArrayMetadataV3, MetadataField
from zarr_metadata.v3.consolidated import ConsolidatedMetadataV3
from zarr_metadata.v3.group import GroupMetadataV3

__all__ = [
"AllowedExtraField",
"ArrayMetadataV3",
"ConsolidatedMetadataV3",
"GroupMetadataV3",
"MetadataField",
]
52 changes: 52 additions & 0 deletions packages/zarr-metadata/src/zarr_metadata/v3/array.py
@@ -0,0 +1,52 @@
"""Zarr v3 array metadata types."""

from collections.abc import Mapping
from typing import Literal, NotRequired

from typing_extensions import TypedDict

from zarr_metadata.common import JSON, NamedConfig


class AllowedExtraField(TypedDict, extra_items=JSON): # type: ignore[call-arg]
Member:
I'd slightly prefer OptionalExtension as the name here

    """
    Extra field on a v3 array metadata document.
    Extras must include `must_understand: false` and may carry arbitrary
    additional JSON data.
    """

    must_understand: Literal[False]


MetadataField = str | NamedConfig
"""A string, or a `{name: str, configuration: {...}}` object in which the `configuration` key may be omitted."""


class ArrayMetadataV3(TypedDict, extra_items=AllowedExtraField): # type: ignore[call-arg]
    """
    Zarr v3 array metadata document (the `zarr.json` content for an array).
    Extra keys are permitted if they conform to `AllowedExtraField`.
    See https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#array-metadata
    """

    zarr_format: Literal[3]
    node_type: Literal["array"]
    data_type: MetadataField
    shape: tuple[int, ...]
    chunk_grid: MetadataField
    chunk_key_encoding: MetadataField
    fill_value: JSON
    codecs: tuple[MetadataField, ...]
    attributes: NotRequired[Mapping[str, JSON]]
    storage_transformers: NotRequired[tuple[MetadataField, ...]]
Comment on lines +37 to +44
Member:
there are more restrictions on data_type, chunk_grid, chunk_key_encoding, and codecs than implied by MetadataField. This would be more useful if it were more strictly typed. In addition, MetadataField allowing a string shorthand is wrong in the chunk grid case.

Contributor Author (d-v-b, Apr 28, 2026):
which restrictions are you thinking of? per the spec they can all be strings or {name: <str>} or {name: <str>, config: <object>}

Member:
I was thinking of strict type definitions for <object>

Contributor Author:
such as? we don't know anything more about the configuration field except that it's Mapping[str, JSON]

Comment on lines +37 to +44
Member:
storage_transformers should be empty array until one is defined as an extension

Contributor Author:
why?

    dimension_names: NotRequired[tuple[str | None, ...]]


__all__ = [
"AllowedExtraField",
"ArrayMetadataV3",
"MetadataField",
]
10 changes: 10 additions & 0 deletions packages/zarr-metadata/src/zarr_metadata/v3/chunk_grid/__init__.py
@@ -0,0 +1,10 @@
"""
Zarr v3 chunk grid metadata types.

Each chunk grid lives in its own submodule:

- `regular` -- core v3 spec
- `rectilinear` -- zarr-extensions

See https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#chunk-grids
"""