Releases: Blosc/python-blosc2
Release 4.2.0
Changes from 4.1.2 to 4.2.0
CTable: columnar compressed tables
- Introduced blosc2.CTable, a new columnar table container for compressed, typed columns. CTables support dataclass- and schema-based construction, row iteration, column access, table views, head()/tail()/sample(), sorting, selection and compact where expressions.
- Added persistent CTables backed by TreeStore, with support for blosc2.open(), CTable.open(), CTable.load(), CTable.save(), CTable.to_b2d() and CTable.to_b2z(). CTable views can be saved too, and .b2z/.b2d path handling has been tightened.
- Added mutation operations for CTables, including append(), extend(), delete(), compact(), add_column(), drop_column(), rename_column() and related schema validation.
- Added computed columns, including virtual computed columns backed by lazy expressions, materialized computed columns and automatic filling of materialized computed columns during inserts.
- Added CTable indexing support, including persistent indexes, direct expression indexes, ordered index reuse, boolean LazyExpr/NDArray masks in CTable.__getitem__, iter_sorted() and indexing support for .b2z tables.
- Added nullable schema support and null policies for CTable scalar columns, preserving nullable scalar Parquet round-trips.
- Added variable-length CTable column support via ListArray/ObjectArray, including vlstring and vlbytes schema specs, fixed-length string/bytes import support and list/struct Arrow/Parquet round-trips.
- Added Arrow, Parquet and CSV interoperability for CTables, including batch-wise Arrow/Parquet import/export, Arrow schema metadata preservation, CTable.from_arrow_batches() improvements and a new parquet-to-blosc2 CLI utility.
- Added CTable documentation, tutorials, examples and benchmarks covering schema definition, persistence, querying, indexing, mutations, nullable columns, computed columns and variable-length columns.
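For readers new to column-oriented storage, the idea behind a dataclass-defined columnar table can be sketched in plain Python. This is only a conceptual illustration: ToyColumnarTable, Reading and the where() signature are hypothetical names invented here, they are not the blosc2.CTable API, and no compression is involved.

```python
from dataclasses import dataclass, fields


@dataclass
class Reading:
    # Hypothetical row schema, used only for this illustration.
    sensor_id: int
    temperature: float


class ToyColumnarTable:
    """Toy column-oriented table: one Python list per dataclass field."""

    def __init__(self, row_type):
        self.row_type = row_type
        self.columns = {f.name: [] for f in fields(row_type)}

    def append(self, row):
        # Split the row so each value lands in its own column.
        for name, column in self.columns.items():
            column.append(getattr(row, name))

    def __len__(self):
        return len(next(iter(self.columns.values())))

    def where(self, column, predicate):
        # Scan a single column, then rebuild only the matching rows.
        hits = [i for i, value in enumerate(self.columns[column]) if predicate(value)]
        return [
            self.row_type(**{name: col[i] for name, col in self.columns.items()})
            for i in hits
        ]


table = ToyColumnarTable(Reading)
for row in (Reading(1, 20.5), Reading(2, 31.0), Reading(3, 28.25)):
    table.append(row)

hot = table.where("temperature", lambda t: t > 25.0)
```

Because each column is stored contiguously, a filter only has to touch the column it tests, which is the property that makes per-column compression and indexing attractive.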
Indexing and ordering
- Added a new indexing subsystem for NDArrays and CTables, including full, partial/bucket, light/medium and OPSI-style index kinds, out-of-core index builders and sidecar storage.
- Added blosc2.Index as the unified public index handle, plus APIs such as create_index(), compact_index(), iter_sorted(), will_use_index() and related query explanation support.
- Added materialized expression indexes for NDArrays and direct expression indexes for CTables.
- Added persistent query-result caching for indexed lookups, with FIFO pruning and cache accounting.
- Added blosc2.argsort() and refactored indexing APIs around explicit index enums and sorting helpers.
- Improved indexed query performance with Cython accelerators, threaded chunk batching, zero-copy/cached mmap reads, chunk-aware and reduced-order layouts and faster scattered row gathering.
- Reduced memory usage during index creation and lookup by avoiding full sidecar materialization, replacing memmap staging with Blosc2 scratch arrays and adding tmpdir support for full out-of-core indexes.
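The query-result cache mentioned above is described as using FIFO pruning with cache accounting. A minimal in-memory sketch of that policy follows; FifoResultCache and its methods are hypothetical names, and the real cache is persistent and certainly differs in detail.

```python
from collections import OrderedDict


class FifoResultCache:
    """Toy query-result cache with FIFO pruning and hit/miss accounting."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # keeps insertion order
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self.hits += 1
            # No reordering on read: this is FIFO, not LRU.
            return self._entries[key]
        self.misses += 1
        return None

    def put(self, key, result):
        # Prune the oldest inserted entry when a new key would overflow the cache.
        if key not in self._entries and len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)
        self._entries[key] = result


cache = FifoResultCache(max_entries=2)
cache.put("a > 1", [3, 4])
cache.put("b < 0", [0])
first = cache.get("a > 1")   # hit
cache.put("c == 2", [7])     # evicts "a > 1", the oldest insertion
second = cache.get("a > 1")  # miss, despite the recent read
```

Unlike LRU, a recent read does not protect an entry from eviction, which keeps the bookkeeping on the read path trivial.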
Persistence, stores and serialization
- Added structured Blosc2 serialization based on b2object carriers, including persisted C2Array, LazyExpr and DSL LazyUDF objects.
- Added blosc2.Ref for serializing external references, plus examples for b2object bundles and persisted expressions/UDFs.
- Added blosc2.load() as a convenience loader.
- Added vlmeta support to LazyArray objects.
- Improved store handling by preserving lazy b2object carriers in DictStore, allowing reopened proxies to refill caches after read-only opens, relaxing DictStore/TreeStore suffix requirements and adding DictStore.to_b2d().
- Accelerated blosc2.open() by trying standard opens first and warning on implicit append mode.
Arrays, computation and containers
- Added ObjectArray for fully general object data and renamed the earlier VLArray work accordingly; added ListArray docstrings and Arrow integration improvements.
- Added schema helpers including numeric specs, blosc2.struct() and blosc2.object() for nested/fully general column declarations.
- Improved fromiter() with direct chunked construction and substantially lower peak memory use.
- Improved asarray() behavior for NDArray inputs when copy-inducing keyword arguments are supplied.
- Added SChunk.reorder_offsets().
- Improved BatchArray defaults and documentation; the default compression level is now tuned for faster lookup/scan behavior.
- Continued matmul/linalg optimization work and shared-thread-pool integration.
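The fromiter() improvement above is described as direct chunked construction. The general technique, consuming an iterator one fixed-size chunk at a time so the full sequence is never held in memory, can be sketched with the standard library. chunks_from_iter is a hypothetical helper written for this note and is not how blosc2 implements it.

```python
from array import array
from itertools import islice


def chunks_from_iter(iterable, chunk_len, typecode="d"):
    """Yield typed chunks of at most chunk_len items from any iterable."""
    iterator = iter(iterable)
    while True:
        # Pull the next chunk_len items; only this chunk is held in memory.
        chunk = array(typecode, islice(iterator, chunk_len))
        if not chunk:
            return
        yield chunk


squares = (i * i for i in range(10))  # a generator, never fully materialized
chunk_sizes = [len(c) for c in chunks_from_iter(squares, chunk_len=4)]
```

In a compressed container each yielded chunk would be compressed and appended before the next one is read, so peak memory stays close to one chunk.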
CLI, docs and examples
- Added the parquet-to-blosc2 command with options such as --max-rows, --parquet-batch-size, --blosc2-items-per-block and --use-dict.
- Added new CTable, ObjectArray, BatchArray, containers, indexing and serialization tutorials and examples.
- Reorganized and expanded the API reference for CTable, Column, schema specs, Index, save/load helpers and miscellaneous APIs.
- Updated benchmark suites for CTables, indexing, Parquet import/export, BatchArray and NDArray construction/indexing.
Fixes and compatibility
- Updated bundled C-Blosc2 to v3.0.2 and require C-Blosc2 >= 3.0.0 when building against a system library.
- Updated bundled C-Blosc2 and miniexpr sources multiple times.
- Restored compatibility with NumPy < 2.
- Fixed Windows and mmap/file-locking issues in index creation, rebuilds and temporary file cleanup.
- Fixed full-index query failures for large CTable columns and full out-of-core merge failures on systems with small /tmp.
- Fixed stale sidecar/cache reuse and targeted cache invalidation when persistent sidecars are replaced.
- Fixed .b2z double-open corruption caused by GC-triggered repacking and made temporary .b2z unpacking default to the source file directory.
- Fixed a regression when reopening persisted proxies in read-only mode.
- Fixed GC-induced thread hangs on macOS with Python 3.14 and hardened async chunk reading/cache cleanup paths.
- Fixed lazy-chunk source-size handling in decode/getitem callers.
- Fixed nullable validation, dictionary extend validation, CTable close propagation, print alignment and NumPy mask support.
- Fixed arange() regressions and several pre-existing set_slice error-handling issues.
- Clamped indexing/thread defaults for wasm32.
Blosc2 v4.1.2
Updated c-blosc2 for memory leak and other bug fixes
Blosc2 v4.1.1
Update miniexpr version to fix bug on Ubuntu-arm64.
Blosc2 v4.1.0
- Add DSL kernel functionality for faster, compiled, user-defined functions which broadly respect Python syntax and implement the LazyArray interface. See the introductory tutorial at: https://blosc.org/python-blosc2/getting_started/tutorials/03.lazyarray-udf-kernels.html
- Add read-only mmap support for store containers: DictStore, TreeStore, and EmbedStore now accept mmap_mode="r" when opened with mode="r" (including via blosc2.open for .b2d, .b2z, and .b2e).
- New .meta entry for store containers, allowing better store recognition at blosc2.open() time. Fixes #546.
- Add cumulative_sum and cumulative_prod functions for Array API compliance.
- Add Unicode string arrays, support comparison operations with them, and optimised compression path.
- Add endswith and startswith and extend contains to support strings and offer miniexpr multithreaded computation when possible.
- Use DSL kernels to accelerate arange/linspace constructors by 6-10x.
- Improve documentation for filters and filters_meta.
- Fix edge case issues with resize and constructors so that chunks may be set independently of shape, and arrays may be extended from empty consistently.
- Continued work on miniexpr integration, interface, and support.
- Ruff fixes and implementation of PEP recommendations.
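As a reminder of what the Array API cumulative functions compute, here is a one-dimensional sketch using only the standard library. The helper names are hypothetical; include_initial mirrors the keyword defined by the Array API standard for cumulative_sum, and whether blosc2 exposes the same keyword is not stated in these notes.

```python
from itertools import accumulate
import operator


def cumulative_sum_1d(values, include_initial=False):
    """Running sum of a 1-D sequence (plain-Python illustration)."""
    running = list(accumulate(values))
    # With include_initial, the additive identity is prepended.
    return [0] + running if include_initial else running


def cumulative_prod_1d(values):
    """Running product of a 1-D sequence (plain-Python illustration)."""
    return list(accumulate(values, operator.mul))


sums = cumulative_sum_1d([1, 2, 3, 4])
sums_with_initial = cumulative_sum_1d([1, 2, 3, 4], include_initial=True)
prods = cumulative_prod_1d([1, 2, 3, 4])
```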
Blosc2 v4.0.0
What's Changed
The main change is hyperfast, fully multithreaded computation with miniexpr (final PR: Miniexpr for Windows by @FrancescAlted in #565).
The internal wheel structure has also been changed to implement PEP 427 (@lshaw8317 in #560). In addition:
- feat: add support for .b2z, .b2d, .b2e files and update related tests by @bossbeagle1509 in #541
- Add none indexing for lazyudf/lazyarray by @lshaw8317 in #545
- Respect NUMEXPR_MAX_THREADS when setting numexpr thread count by @skmendez in #567
- Add openzl_plugin support by @lshaw8317 in #559
Full Changelog: v3.12.2...v4.0.0
Blosc2 v4.0.0-b1
This is a beta version with hyperfast multithreaded expression calculation via the incorporation of miniexpr, as well as better support for plugins (stay tuned for the blosc2_openzl plugin!).
What's Changed
- Update pre-commit hooks by @pre-commit-ci[bot] in #537
- Fix fancy index item bug by @ykcUconn in #543
- feat: add support for .b2z, .b2d, .b2e files and update related tests by @bossbeagle1509 in #541
- Add none indexing for lazyudf/lazyarray by @lshaw8317 in #545
- Bump actions/download-artifact from 6 to 7 by @dependabot[bot] in #547
- Bump actions/upload-artifact from 5 to 6 by @dependabot[bot] in #548
- Update pre-commit hooks by @pre-commit-ci[bot] in #550
- PEP 639 compliance by @DimitriPapadopoulos in #552
- Multi-threaded reductions by @FrancescAlted in #549
- Implement PEP recommendations by @lshaw8317 in #560
- Add openzl_plugin support by @lshaw8317 in #559
New Contributors
- @ykcUconn made their first contribution in #543
- @bossbeagle1509 made their first contribution in #541
Full Changelog: v3.12.2...v4.0.0-b1
Blosc2 v3.12.2
What's Changed
- Hotfix to change WASM wheel hosting to separate repo
Blosc2 v3.12.1
What's Changed
- Allow saving of numba-decorated lazyudfs by @lshaw8317 in #538
- Automate upload of WASM wheels to GitHub pages
Blosc2 v3.12.0
What's Changed
- LazyUDF objects can now be saved to disk
- Calls to __matmul__ NumPy ufunc now passed to blosc2.matmul
- Streamlined LazyUDF.compute is now much more robust and functional
- The get_chunk method for LazyExpr is more efficient and enabled for general LazyArray objects
- LazyExpr calculation can now be done even with expressions with pure scalar operands, e.g. 10 * 3 + 1.
Full Changelog: v3.11.1...v3.12.0
Blosc2 3.11.1
What's Changed
✅ Change the NDArray.size to return the number of elements in array, instead of the size of the array in bytes
✅ Bug fixes for lazy expressions to allow a wider range of functionality
✅ Small bug fix for slice indexing with step larger than chunksize
✅ Tweak automatic chunk sizing of results for certain (e.g. linalg) operations to enhance performance
✅ Various cosmetic fixes and streamlining (thanks to the indefatigable @DimitriPapadopoulos)
Full Changelog: v3.11.0...v3.11.1
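The NDArray.size change in 3.11.1 is easy to trip over when upgrading, so the distinction is spelled out here in plain Python. The helper names are hypothetical and only illustrate the arithmetic; they are not blosc2 functions.

```python
from math import prod


def element_count(shape):
    """Number of elements: what size reports after the 3.11.1 change."""
    return prod(shape)


def byte_size(shape, itemsize):
    """Uncompressed size in bytes: what size used to report."""
    return element_count(shape) * itemsize


shape = (100, 200)
itemsize = 8  # bytes per float64 element

n_elements = element_count(shape)
n_bytes = byte_size(shape, itemsize)
```

Code that relied on the old byte-based value needs to multiply by the item size, as byte_size does.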