Releases: Blosc/python-blosc2

Release 4.2.0

07 May 11:38

Changes from 4.1.2 to 4.2.0

CTable: columnar compressed tables

  • Introduced blosc2.CTable, a new columnar table container for compressed, typed columns. CTables support dataclass- and schema-based construction, row iteration, column access, table views, head() / tail() / sample(), sorting, selection and compact where expressions.
  • Added persistent CTables backed by TreeStore, with support for blosc2.open(), CTable.open(), CTable.load(), CTable.save(), CTable.to_b2d() and CTable.to_b2z(). CTable views can be saved too, and .b2z/.b2d path handling has been tightened.
  • Added mutation operations for CTables, including append(), extend(), delete(), compact(), add_column(), drop_column(), rename_column() and related schema validation.
  • Added computed columns, including virtual computed columns backed by lazy expressions, materialized computed columns and automatic filling of materialized computed columns during inserts.
  • Added CTable indexing support, including persistent indexes, direct expression indexes, ordered index reuse, boolean LazyExpr/NDArray masks in CTable.__getitem__, iter_sorted() and indexing support for .b2z tables.
  • Added nullable schema support and null policies for CTable scalar columns, preserving nullable scalar Parquet round-trips.
  • Added variable-length CTable column support via ListArray / ObjectArray, including vlstring and vlbytes schema specs, fixed-length string/bytes import support and list/struct Arrow/Parquet round-trips.
  • Added Arrow, Parquet and CSV interoperability for CTables, including batch-wise Arrow/Parquet import/export, Arrow schema metadata preservation, CTable.from_arrow_batches() improvements and a new parquet-to-blosc2 CLI utility.
  • Added CTable documentation, tutorials, examples and benchmarks covering schema definition, persistence, querying, indexing, mutations, nullable columns, computed columns and variable-length columns.

Indexing and ordering

  • Added a new indexing subsystem for NDArrays and CTables, including full, partial/bucket, light/medium and OPSI-style index kinds, out-of-core index builders and sidecar storage.
  • Added blosc2.Index as the unified public index handle, plus APIs such as create_index(), compact_index(), iter_sorted(), will_use_index() and related query explanation support.
  • Added materialized expression indexes for NDArrays and direct expression indexes for CTables.
  • Added persistent query-result caching for indexed lookups, with FIFO pruning and cache accounting.
  • Added blosc2.argsort() and refactored indexing APIs around explicit index enums and sorting helpers.
  • Improved indexed query performance with Cython accelerators, threaded chunk batching, zero-copy/cached mmap reads, chunk-aware and reduced-order layouts and faster scattered row gathering.
  • Reduced memory usage during index creation and lookup by avoiding full sidecar materialization, replacing memmap staging with Blosc2 scratch arrays and adding tmpdir support for full out-of-core indexes.
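
The idea behind a full index and argsort can be illustrated with a minimal pure-Python sketch. This is not the blosc2 implementation and does not use the blosc2 API: it only assumes that a full index stores the sorted values together with their original positions, so a range query becomes a binary search. The helper names `argsort`, `build_full_index` and `query_range` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def argsort(values):
    """Return the positions that would sort `values` (stable for ties)."""
    return sorted(range(len(values)), key=values.__getitem__)


def build_full_index(values):
    """Hypothetical helper: sorted values plus their original positions."""
    order = argsort(values)
    return [values[i] for i in order], order


def query_range(index, low, high):
    """Return the original positions of all values v with low <= v <= high."""
    sorted_values, positions = index
    start = bisect_left(sorted_values, low)
    stop = bisect_right(sorted_values, high)
    # Only the matching slice of the index is touched, not the whole column.
    return sorted(positions[start:stop])


data = [42, 7, 19, 7, 88, 23]
index = build_full_index(data)
hits = query_range(index, 7, 23)
print(hits)  # original positions of the values 7, 19, 7 and 23
```

The query cost depends on the number of hits plus a logarithmic search, which is why a sorted index can also serve ordered iteration without re-sorting the data.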

Persistence, stores and serialization

  • Added structured Blosc2 serialization based on b2object carriers, including persisted C2Array, LazyExpr and DSL LazyUDF objects.
  • Added blosc2.Ref for serializing external references, plus examples for b2object bundles and persisted expressions/UDFs.
  • Added blosc2.load() as a convenience loader.
  • Added vlmeta support to LazyArray objects.
  • Improved store handling by preserving lazy b2object carriers in DictStore, allowing reopened proxies to refill caches after read-only opens, relaxing DictStore/TreeStore suffix requirements and adding DictStore.to_b2d().
  • Accelerated blosc2.open() by trying standard opens first and warning on implicit append mode.

Arrays, computation and containers

  • Added ObjectArray for fully general object data and renamed the earlier VLArray work accordingly; added ListArray docstrings and Arrow integration improvements.
  • Added schema helpers including numeric specs, blosc2.struct() and blosc2.object() for nested/fully general column declarations.
  • Improved fromiter() with direct chunked construction and substantially lower peak memory use.
  • Improved asarray() behavior for NDArray inputs when copy-inducing keyword arguments are supplied.
  • Added SChunk.reorder_offsets().
  • Improved BatchArray defaults and documentation; the default compression level is now tuned for faster lookup/scan behavior.
  • Continued matmul/linalg optimization work and shared-thread-pool integration.
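
As a rough illustration of why chunked construction lowers peak memory in something like fromiter(), the sketch below consumes an iterator one fixed-size chunk at a time instead of materializing it all at once. It is a generic pure-Python pattern, not the blosc2 code; `iter_chunks` and `chunk_len` are hypothetical names, and a real container would compress and append each chunk rather than collect them in a list.

```python
from itertools import islice


def iter_chunks(iterable, chunk_len):
    """Yield lists of at most `chunk_len` items without materializing the input."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_len))
        if not chunk:
            return
        yield chunk


# A generator stands in for a large data source that should not be held in memory.
squares = (i * i for i in range(10))

# Collected here only to show the result; a container would store each chunk
# as it arrives and then discard it.
chunks = list(iter_chunks(squares, chunk_len=4))
print(chunks)
```

Only one chunk is alive at a time inside the loop, so peak memory is bounded by the chunk length rather than by the total number of items.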

CLI, docs and examples

  • Added the parquet-to-blosc2 command with options such as --max-rows, --parquet-batch-size, --blosc2-items-per-block and --use-dict.
  • Added new CTable, ObjectArray, BatchArray, containers, indexing and serialization tutorials and examples.
  • Reorganized and expanded the API reference for CTable, Column, schema specs, Index, save/load helpers and miscellaneous APIs.
  • Updated benchmark suites for CTables, indexing, Parquet import/export, BatchArray and NDArray construction/indexing.

Fixes and compatibility

  • Updated bundled C-Blosc2 to v3.0.2; C-Blosc2 >= 3.0.0 is now required when building against a system library.
  • Updated bundled C-Blosc2 and miniexpr sources multiple times.
  • Restored compatibility with NumPy < 2.
  • Fixed Windows and mmap/file-locking issues in index creation, rebuilds and temporary file cleanup.
  • Fixed full-index query failures for large CTable columns and full out-of-core merge failures on systems with small /tmp.
  • Fixed stale sidecar/cache reuse and targeted cache invalidation when persistent sidecars are replaced.
  • Fixed .b2z double-open corruption caused by GC-triggered repacking and made temporary .b2z unpacking default to the source file directory.
  • Fixed a regression when reopening persisted proxies in read-only mode.
  • Fixed GC-induced thread hangs on macOS with Python 3.14 and hardened async chunk reading/cache cleanup paths.
  • Fixed lazy-chunk source-size handling in decode/getitem callers.
  • Fixed nullable validation, dictionary extend validation, CTable close propagation, print alignment and NumPy mask support.
  • Fixed arange() regressions and several pre-existing set_slice error-handling issues.
  • Clamped indexing/thread defaults for wasm32.

Blosc2 v4.1.2

03 Mar 11:09

Updated c-blosc2 for memory leak and other bug fixes

Blosc2 v4.1.1

02 Mar 15:03

Update miniexpr version to fix a bug on Ubuntu-arm64.

Blosc2 v4.1.0

28 Feb 07:13

  • Add DSL kernel functionality for faster, compiled, user-defined functions that broadly respect Python syntax and implement the LazyArray interface. See the introductory tutorial at: https://blosc.org/python-blosc2/getting_started/tutorials/03.lazyarray-udf-kernels.html
  • Add read-only mmap support for store containers: DictStore, TreeStore, and EmbedStore now accept mmap_mode="r" when opened with mode="r" (including via blosc2.open for .b2d, .b2z, and .b2e).
  • New .meta entry for store containers, allowing better store recognition at blosc2.open() time. Fixes #546.
  • Add cumulative_sum and cumulative_prod functions for Array API compliance.
  • Add Unicode string arrays, with support for comparison operations and an optimised compression path.
  • Add endswith and startswith, and extend contains to support strings, using miniexpr multithreaded computation when possible.
  • Use DSL kernels to accelerate arange/linspace constructors by 6-10x.
  • Improve documentation for filters and filters_meta.
  • Fix edge case issues with resize and constructors so that chunks may be set independently of shape, and arrays may be extended from empty consistently.
  • Continued work on miniexpr integration, interface, and support.
  • Ruff fixes and implementation of PEP recommendations.
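
For readers unfamiliar with the Array API names, the following pure-Python sketch shows what cumulative_sum and cumulative_prod compute along a single axis. It does not call blosc2. The `include_initial` flag mirrors the optional keyword defined by the Array API standard, and whether the blosc2 functions expose it is an assumption not confirmed by these notes. The function names `cumulative_sum_1d` and `cumulative_prod_1d` are hypothetical.

```python
from itertools import accumulate
from operator import mul


def cumulative_sum_1d(values, include_initial=False):
    """Running sum of a 1-D sequence; optionally prepend the additive identity."""
    out = list(accumulate(values))
    return [0, *out] if include_initial else out


def cumulative_prod_1d(values, include_initial=False):
    """Running product of a 1-D sequence; optionally prepend the multiplicative identity."""
    out = list(accumulate(values, mul))
    return [1, *out] if include_initial else out


values = [1, 2, 3, 4]
print(cumulative_sum_1d(values))                        # running sums
print(cumulative_prod_1d(values))                       # running products
print(cumulative_sum_1d(values, include_initial=True))  # one extra leading element
```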

Blosc2 v4.0.0

29 Jan 14:18

What's Changed

The main change is hyperfast, fully multithreaded computation with miniexpr (final PR: Miniexpr for Windows by @FrancescAlted in #565).
In addition, the internal wheel structure has been changed to implement PEP 427 (@lshaw8317 in #560).

Full Changelog: v3.12.2...v4.0.0

Blosc2 v4.0.0-b1

22 Jan 15:43

Pre-release

This is a beta version with hyperfast multithreaded expression calculation via the incorporation of miniexpr, as well as better support for plugins (stay tuned for the blosc2_openzl plugin!).

Full Changelog: v3.12.2...v4.0.0-b1

Blosc2 v3.12.2

04 Dec 11:46

What's Changed

  • Hotfix to change WASM wheel hosting to separate repo

Blosc2 v3.12.1

03 Dec 17:10

What's Changed

  • Allow saving of numba-decorated lazyudfs by @lshaw8317 in #538
  • Automate upload of WASM wheels to GitHub pages

Blosc2 v3.12.0

02 Dec 16:11

What's Changed

  • LazyUDF objects can now be saved to disk
  • Calls to the NumPy __matmul__ ufunc are now passed to blosc2.matmul
  • Streamlined LazyUDF.compute is now much more robust and functional
  • The get_chunk method for LazyExpr is more efficient and enabled for general LazyArray objects
  • LazyExpr calculation now also works for expressions with purely scalar operands, e.g. 10 * 3 + 1.

Full Changelog: v3.11.1...v3.12.0

Blosc2 3.11.1

16 Nov 16:40

What's Changed

✅ Change NDArray.size to return the number of elements in the array, instead of the size of the array in bytes
✅ Bug fixes for lazy expressions to allow a wider range of functionality
✅ Small bug fix for slice indexing with step larger than chunksize
✅ Tweak automatic chunk sizing of results for certain (e.g. linalg) operations to enhance performance
✅ Various cosmetic fixes and streamlining (thanks to the indefatigable @DimitriPapadopoulos)
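
The difference behind the NDArray.size change can be shown with plain arithmetic. The sketch below does not use blosc2; the shape and item size are made-up example values, and the `n_elements` / `n_bytes` names are illustrative. It follows the NumPy convention, where `size` is the element count and the byte count is a separate quantity.

```python
from math import prod

# Example values chosen for illustration only.
shape = (1000, 250)   # two-dimensional array
itemsize = 8          # bytes per element, e.g. a 64-bit float

n_elements = prod(shape)          # what `size` reports after this change
n_bytes = n_elements * itemsize   # uncompressed size in bytes (the old meaning)

print(n_elements, n_bytes)
```

Code that relied on the old byte-based value needs to multiply by the item size, or read the byte count from a dedicated attribute, after upgrading.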

Full Changelog: v3.11.0...v3.11.1