Releases · Blosc/python-blosc2

07 May 11:38

FrancescAlted

v4.2.0

81a9c09

Release 4.2.0 Latest

Changes from 4.1.2 to 4.2.0

CTable: columnar compressed tables

Introduced blosc2.CTable, a new columnar table container for compressed, typed columns. CTables support dataclass- and schema-based construction, row iteration, column access, table views, head() / tail() / sample(), sorting, selection and compact where expressions.
Added persistent CTables backed by TreeStore, with support for blosc2.open(), CTable.open(), CTable.load(), CTable.save(), CTable.to_b2d() and CTable.to_b2z(). CTable views can be saved too, and .b2z/.b2d path handling has been tightened.
Added mutation operations for CTables, including append(), extend(), delete(), compact(), add_column(), drop_column(), rename_column() and related schema validation.
Added computed columns, including virtual computed columns backed by lazy expressions, materialized computed columns and automatic filling of materialized computed columns during inserts.
Added CTable indexing support, including persistent indexes, direct expression indexes, ordered index reuse, boolean LazyExpr/NDArray masks in CTable.__getitem__, iter_sorted() and indexing support for .b2z tables.
Added nullable schema support and null policies for CTable scalar columns, preserving nullable scalar Parquet round-trips.
Added variable-length CTable column support via ListArray / ObjectArray, including vlstring and vlbytes schema specs, fixed-length string/bytes import support and list/struct Arrow/Parquet round-trips.
Added Arrow, Parquet and CSV interoperability for CTables, including batch-wise Arrow/Parquet import/export, Arrow schema metadata preservation, CTable.from_arrow_batches() improvements and a new parquet-to-blosc2 CLI utility.
Added CTable documentation, tutorials, examples and benchmarks covering schema definition, persistence, querying, indexing, mutations, nullable columns, computed columns and variable-length columns.
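
To make the "columnar compressed table" idea concrete, here is a conceptual sketch using only the Python standard library. It is not the blosc2.CTable API and does not use Blosc2 compression: the Reading dataclass, the SCHEMA mapping and both helper functions are hypothetical names invented for this illustration, and zlib stands in for the real codec. It only shows the general pattern of pivoting typed rows into one compressed buffer per column.

```python
import zlib
from array import array
from dataclasses import dataclass


@dataclass
class Reading:
    # Hypothetical row type, used only for this illustration.
    sensor_id: int
    temperature: float


# Hypothetical schema: column name -> array typecode
# ("q" = signed 64-bit integer, "d" = 64-bit float).
SCHEMA = {"sensor_id": "q", "temperature": "d"}


def rows_to_compressed_columns(rows):
    """Pivot dataclass rows into one zlib-compressed buffer per column."""
    columns = {}
    for name, typecode in SCHEMA.items():
        values = array(typecode, (getattr(row, name) for row in rows))
        columns[name] = zlib.compress(values.tobytes())
    return columns


def read_column(columns, name):
    """Decompress a single column without touching the others."""
    values = array(SCHEMA[name])
    values.frombytes(zlib.decompress(columns[name]))
    return values.tolist()


rows = [Reading(i % 4, 20.0 + 0.5 * (i % 4)) for i in range(1000)]
table = rows_to_compressed_columns(rows)
first_ids = read_column(table, "sensor_id")[:5]
```

Because each column holds values of a single type, similar bytes sit next to each other and tend to compress well, and a query that needs one column only has to decompress that column.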

Indexing and ordering

Added a new indexing subsystem for NDArrays and CTables, including full, partial/bucket, light/medium and OPSI-style index kinds, out-of-core index builders and sidecar storage.
Added blosc2.Index as the unified public index handle, plus APIs such as create_index(), compact_index(), iter_sorted(), will_use_index() and related query explanation support.
Added materialized expression indexes for NDArrays and direct expression indexes for CTables.
Added persistent query-result caching for indexed lookups, with FIFO pruning and cache accounting.
Added blosc2.argsort() and refactored indexing APIs around explicit index enums and sorting helpers.
Improved indexed query performance with Cython accelerators, threaded chunk batching, zero-copy/cached mmap reads, chunk-aware and reduced-order layouts and faster scattered row gathering.
Reduced memory usage during index creation and lookup by avoiding full sidecar materialization, replacing memmap staging with Blosc2 scratch arrays and adding tmpdir support for full out-of-core indexes.
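
The notes above mention query-result caching with FIFO pruning and cache accounting. The following is a minimal in-memory sketch of that general technique, not the implementation shipped in python-blosc2: the FIFOResultCache class, its byte budget and its counters are assumptions made for illustration, and the real cache is persistent rather than a Python dictionary.

```python
from collections import OrderedDict


class FIFOResultCache:
    """Toy query-result cache: oldest insertions are pruned first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0  # accounting: bytes currently stored
        self.hits = 0
        self.misses = 0
        self._entries = OrderedDict()  # query key -> result bytes

    def get(self, key):
        result = self._entries.get(key)
        if result is None:
            self.misses += 1
        else:
            # FIFO, not LRU: a hit does not refresh the entry's position.
            self.hits += 1
        return result

    def put(self, key, result):
        if len(result) > self.max_bytes:
            return  # can never fit in the budget, so do not cache it
        if key in self._entries:
            # Replacing an entry counts as a fresh insertion.
            self.used_bytes -= len(self._entries.pop(key))
        self._entries[key] = result
        self.used_bytes += len(result)
        # Prune the oldest insertions until the budget is respected.
        while self.used_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = FIFOResultCache(max_bytes=10)
cache.put("a > 1", b"12345")
cache.put("a > 2", b"12345")
cache.get("a > 1")           # a hit, but it does not protect the entry
cache.put("a > 3", b"123")   # over budget: the oldest entry is pruned
```

With FIFO pruning the entry for "a > 1" is dropped even though it was just read, which keeps the bookkeeping simple at the cost of sometimes evicting popular results.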

Persistence, stores and serialization

Added structured Blosc2 serialization based on b2object carriers, including persisted C2Array, LazyExpr and DSL LazyUDF objects.
Added blosc2.Ref for serializing external references, plus examples for b2object bundles and persisted expressions/UDFs.
Added blosc2.load() as a convenience loader.
Added vlmeta support to LazyArray objects.
Improved store handling by preserving lazy b2object carriers in DictStore, allowing reopened proxies to refill caches after read-only opens, relaxing DictStore/TreeStore suffix requirements and adding DictStore.to_b2d().
Accelerated blosc2.open() by trying standard opens first and warning on implicit append mode.

Arrays, computation and containers

Added ObjectArray for fully general object data and renamed the earlier VLArray work accordingly; added ListArray docstrings and Arrow integration improvements.
Added schema helpers including numeric specs, blosc2.struct() and blosc2.object() for nested/fully general column declarations.
Improved fromiter() with direct chunked construction and substantially lower peak memory use.
Improved asarray() behavior for NDArray inputs when copy-inducing keyword arguments are supplied.
Added SChunk.reorder_offsets().
Improved BatchArray defaults and documentation; the default compression level is now tuned for faster lookup/scan behavior.
Continued matmul/linalg optimization work and shared-thread-pool integration.

CLI, docs and examples

Added the parquet-to-blosc2 command with options such as --max-rows, --parquet-batch-size, --blosc2-items-per-block and --use-dict.
Added new CTable, ObjectArray, BatchArray, containers, indexing and serialization tutorials and examples.
Reorganized and expanded the API reference for CTable, Column, schema specs, Index, save/load helpers and miscellaneous APIs.
Updated benchmark suites for CTables, indexing, Parquet import/export, BatchArray and NDArray construction/indexing.

Fixes and compatibility

Updated bundled C-Blosc2 to v3.0.2; building against a system library now requires C-Blosc2 >= 3.0.0.
Updated bundled C-Blosc2 and miniexpr sources multiple times.
Restored compatibility with NumPy < 2.
Fixed Windows and mmap/file-locking issues in index creation, rebuilds and temporary file cleanup.
Fixed full-index query failures for large CTable columns and full out-of-core merge failures on systems with small /tmp.
Fixed stale sidecar/cache reuse and targeted cache invalidation when persistent sidecars are replaced.
Fixed .b2z double-open corruption caused by GC-triggered repacking and made temporary .b2z unpacking default to the source file directory.
Fixed a regression when reopening persisted proxies in read-only mode.
Fixed GC-induced thread hangs on macOS with Python 3.14 and hardened async chunk reading/cache cleanup paths.
Fixed lazy-chunk source-size handling in decode/getitem callers.
Fixed nullable validation, dictionary extend validation, CTable close propagation, print alignment and NumPy mask support.
Fixed arange() regressions and several pre-existing set_slice error-handling issues.
Clamped indexing/thread defaults for wasm32.

03 Mar 11:09

lshaw8317

v4.1.2

0fc782e

Blosc2 v4.1.2

Updated C-Blosc2 to pick up a memory-leak fix and other bug fixes.

02 Mar 15:03

lshaw8317

v4.1.1

58e4515

Blosc2 v4.1.1

Update miniexpr version to fix bug on Ubuntu-arm64.

28 Feb 07:13

lshaw8317

v4.1.0

c275744

Blosc2 v4.1.0

Add DSL kernel functionality for faster, compiled, user-defined functions which broadly respect python syntax and implement the LazyArray interface. See the introductory tutorial at: https://blosc.org/python-blosc2/getting_started/tutorials/03.lazyarray-udf-kernels.html
Add read-only mmap support for store containers: DictStore, TreeStore, and EmbedStore now accept mmap_mode="r" when opened with mode="r" (including via blosc2.open for .b2d, .b2z, and .b2e).
New .meta entry for store containers, allowing better store recognition at blosc2.open() time. Fixes #546.
Add cumulative_sum and cumulative_prod functions for Array API compliance.
Add Unicode string arrays, support comparison operations on them, and add an optimised compression path for them.
Add endswith and startswith, and extend contains to support strings, using miniexpr multithreaded computation when possible.
Use DSL kernels to accelerate arange/linspace constructors by 6-10x.
Improve documentation for filters and filters_meta.
Fix edge case issues with resize and constructors so that chunks may be set independently of shape, and arrays may be extended from empty consistently.
Continued work on miniexpr integration, interface, and support.
Ruff fixes and implementation of PEP recommendations.
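
For readers unfamiliar with the cumulative functions mentioned above, this small sketch shows what a cumulative sum and a cumulative product compute for a one-dimensional input. It uses only itertools from the standard library and is meant to illustrate the semantics, not the signatures or behaviour of the blosc2 functions themselves.

```python
import operator
from itertools import accumulate

values = [1, 2, 3, 4]

# Each output element is the sum (or product) of all inputs up to that position.
running_sum = list(accumulate(values))
running_prod = list(accumulate(values, operator.mul))
```

Here running_sum is [1, 3, 6, 10] and running_prod is [1, 2, 6, 24].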

29 Jan 14:18

lshaw8317

v4.0.0

58cce0f

Blosc2 v4.0.0

What's Changed

The main change is hyperfast, fully multithreaded computation with miniexpr (final PR: Miniexpr for Windows by @FrancescAlted in #565).
In addition, the internal wheel structure has been changed to implement PEP 427 (@lshaw8317 in #560). Other changes:

feat: add support for .b2z, .b2d, .b2e files and update related tests by @bossbeagle1509 in #541
Add none indexing for lazyudf/lazyarray by @lshaw8317 in #545
Respect NUMEXPR_MAX_THREADS when setting numexpr thread count by @skmendez in #567
Add openzl_plugin support by @lshaw8317 in #559
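
As an illustration of what respecting NUMEXPR_MAX_THREADS can look like, the sketch below caps a requested thread count by the environment variable when it is set. The capped_thread_count helper is hypothetical and the exact policy used in python-blosc2 is not described in these notes, so treat this as one plausible reading rather than the actual code from #567.

```python
import os


def capped_thread_count(requested, env=os.environ):
    """Return `requested`, capped by NUMEXPR_MAX_THREADS when it is set."""
    raw = env.get("NUMEXPR_MAX_THREADS")
    if raw is None:
        return requested
    try:
        limit = int(raw)
    except ValueError:
        return requested  # assumption: a malformed value is ignored
    # Never go below one thread, never above the configured limit.
    return max(1, min(requested, limit))


capped = capped_thread_count(16, {"NUMEXPR_MAX_THREADS": "4"})
uncapped = capped_thread_count(8, {})
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.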

Full Changelog: v3.12.2...v4.0.0

Contributors

FrancescAlted, skmendez, and 2 other contributors

22 Jan 15:43

lshaw8317

v4.0.0-b1

a21b920

Blosc2 v4.0.0-b1 Pre-release

This is a beta version with hyperfast multithreaded expression calculation via the incorporation of miniexpr, as well as better support for plugins (stay tuned for the blosc2_openzl plugin!).

What's Changed

Update pre-commit hooks by @pre-commit-ci[bot] in #537
Fix fancy index item bug by @ykcUconn in #543
feat: add support for .b2z, .b2d, .b2e files and update related tests by @bossbeagle1509 in #541
Add none indexing for lazyudf/lazyarray by @lshaw8317 in #545
Bump actions/download-artifact from 6 to 7 by @dependabot[bot] in #547
Bump actions/upload-artifact from 5 to 6 by @dependabot[bot] in #548
Update pre-commit hooks by @pre-commit-ci[bot] in #550
PEP 639 compliance by @DimitriPapadopoulos in #552
Multi-threaded reductions by @FrancescAlted in #549
Implement PEP recommendations by @lshaw8317 in #560
Add openzl_plugin support by @lshaw8317 in #559

New Contributors

@ykcUconn made their first contribution in #543
@bossbeagle1509 made their first contribution in #541

Full Changelog: v3.12.2...v4.0.0-b1

Contributors

FrancescAlted, DimitriPapadopoulos, and 5 other contributors

04 Dec 11:46

lshaw8317

v3.12.2

f855c2f

Blosc2 v3.12.2

What's Changed

Hotfix to change WASM wheel hosting to separate repo

03 Dec 17:10

lshaw8317

v3.12.1

1e38896

Blosc2 v3.12.1

What's Changed

Allow saving of numba-decorated lazyudfs by @lshaw8317 in #538
Automate upload of WASM wheels to GitHub pages

Contributors

lshaw8317

02 Dec 16:11

lshaw8317

v3.12.0

32211fe

Blosc2 v3.12.0

What's Changed

LazyUDF objects can now be saved to disk
Calls to the NumPy __matmul__ ufunc are now passed to blosc2.matmul
Streamlined LazyUDF.compute is now much more robust and functional
The get_chunk method for LazyExpr is more efficient and enabled for general LazyArray objects
LazyExpr calculation can now be done even for expressions with pure scalar operands, e.g. 10 * 3 + 1.

Full Changelog: v3.11.1...v3.12.0

16 Nov 16:40

lshaw8317

v3.11.1

de06734

Blosc2 3.11.1

What's Changed

✅ Change the NDArray.size to return the number of elements in array, instead of the size of the array in bytes
✅ Bug fixes for lazy expressions to allow a wider range of functionality
✅ Small bug fix for slice indexing with step larger than chunksize
✅ Tweak automatic chunk sizing of results for certain (e.g. linalg) operations to enhance performance
✅ Various cosmetic fixes and streamlining (thanks to the indefatigable @DimitriPapadopoulos)
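
The change to NDArray.size is easy to misread, so here is the arithmetic behind it as a plain-Python sketch. The shape and item size are made-up example values; no blosc2 call is involved.

```python
import math

shape = (100, 200)  # example array shape
itemsize = 8        # bytes per element, e.g. for a 64-bit float

# From 3.11.1 on, size means the number of elements (product of the shape).
n_elements = math.prod(shape)

# The old meaning was the size in bytes: elements times bytes per element.
n_bytes = n_elements * itemsize
```

For this example n_elements is 20000 while n_bytes is 160000, so code that relied on the old meaning of size would be off by a factor equal to the item size.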

Full Changelog: v3.11.0...v3.11.1

Contributors

DimitriPapadopoulos
