perf(encoding): #111 Phase 3 — const-generic Strategy dispatch (#123)
Conversation
Phase 3 commit 1: lay down the type-level skeleton that future commits
will route the encoder hot paths through. No call sites change — this
commit only adds new types alongside the existing runtime
`HcParseMode` / `MatcherBackend` dispatch.
- `Strategy` trait with associated consts mirroring the donor
`ZSTD_compressionParameters` template parameters: `BACKEND`,
`MIN_MATCH`, `ACCURATE_PRICE`, `FAVOR_SMALL_OFFSETS`, `USE_HASH3`,
`USE_BT`, `OPT_LEVEL`, plus a `PARSE_MODE` bridge to the existing
runtime enum.
- 7 concrete strategy types `Fast` / `Dfast` / `Greedy` / `Lazy` /
`BtOpt` / `BtUltra` / `BtUltra2`. Levels 5-15 collapse onto `Lazy`
because their differences are runtime `HcConfig` fields, not
compile-time consts.
- `BackendTag` enum + `StrategyTag` runtime dispatcher with a
`for_level(u8) -> StrategyTag` const fn that mirrors the donor
`clevels.h` mapping (1 → Fast, 2-3 → Dfast, 4 → Greedy, 5-15 → Lazy,
16-17 → BtOpt, 18-19 → BtUltra, 20-22 → BtUltra2).
- Bridge methods `StrategyTag::parse_mode()` / `::backend()` so the
next commits can introduce a generic entry point that still hands
off to the runtime-enum-consuming code.
- Test coverage:
- Each `Strategy` impl's `PARSE_MODE` and `BACKEND` agree with the
matching `StrategyTag` arm.
- `for_level` returns the right tag at every band boundary plus a
mid-band level.
- `USE_BT` aligns with parse_mode (only BtOpt / BtUltra / BtUltra2
walk the BT).
- `USE_HASH3` is set exclusively for BtUltra2 (donor parity).
- `ACCURATE_PRICE` / `FAVOR_SMALL_OFFSETS` track the
`HcOptimalCostProfile::for_mode` runtime table row-for-row so the
eventual `const_for::<S>` rewrite is a mechanical swap.
Refs #122 Phase 3.
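A minimal sketch of the skeleton these bullets describe: the trait shape, two of the seven ZSTs, and the band map. The const values are plausible stand-ins rather than the crate's actual table, and the level-0 arm is an assumption (the commit only specifies levels 1-22).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendTag { Simple, Dfast, Row, HashChain }

pub trait Strategy {
    const BACKEND: BackendTag;
    const MIN_MATCH: u32;
    const ACCURATE_PRICE: bool;
    const FAVOR_SMALL_OFFSETS: bool;
    const USE_HASH3: bool;
    const USE_BT: bool;
    const OPT_LEVEL: u8;
}

pub struct Lazy;
impl Strategy for Lazy {
    const BACKEND: BackendTag = BackendTag::HashChain;
    const MIN_MATCH: u32 = 3;
    const ACCURATE_PRICE: bool = false;
    const FAVOR_SMALL_OFFSETS: bool = false;
    const USE_HASH3: bool = false;
    const USE_BT: bool = false;
    const OPT_LEVEL: u8 = 0;
}

pub struct BtUltra2;
impl Strategy for BtUltra2 {
    const BACKEND: BackendTag = BackendTag::HashChain;
    const MIN_MATCH: u32 = 3;
    const ACCURATE_PRICE: bool = true;
    const FAVOR_SMALL_OFFSETS: bool = false;
    const USE_HASH3: bool = true;
    const USE_BT: bool = true;
    const OPT_LEVEL: u8 = 2;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StrategyTag { Fast, Dfast, Greedy, Lazy, BtOpt, BtUltra, BtUltra2 }

impl StrategyTag {
    /// Band map in the donor `clevels.h` shape from the bullet above.
    pub const fn for_level(level: u8) -> StrategyTag {
        match level {
            0 => StrategyTag::Dfast, // assumption: level 0 routes like the default level
            1 => StrategyTag::Fast,
            2 | 3 => StrategyTag::Dfast,
            4 => StrategyTag::Greedy,
            5..=15 => StrategyTag::Lazy,
            16 | 17 => StrategyTag::BtOpt,
            18 | 19 => StrategyTag::BtUltra,
            _ => StrategyTag::BtUltra2,
        }
    }
}

fn main() {
    assert_eq!(StrategyTag::for_level(10), StrategyTag::Lazy);
    assert_eq!(StrategyTag::for_level(22), StrategyTag::BtUltra2);
    println!("band map ok");
}
```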
Phase 3 commit 2: resolve the compile-time strategy tag once per
`reset()` and store it on the driver alongside the existing
`active_backend` runtime enum. No hot-path dispatch changes yet — the
field is set, debug-asserted to stay in lock-step with `active_backend`,
and covered by a new test.
- Add `StrategyTag::for_compression_level(CompressionLevel)` mirroring
  `resolve_level_params` exactly: named variants (`Uncompressed` → Fast,
  `Default` → Dfast, `Better`/`Best` → Lazy) and numeric `Level(n)`
  clamped through the existing `for_level(u8)` band map. Negative levels
  fall back to `Fast` (Simple matcher with elevated `hash_fill_step`),
  matching the `resolve_level_params` Simple arm.
- Add `strategy_tag: StrategyTag` to `MatchGeneratorDriver`. Initialise
  to `StrategyTag::Fast` to match the existing
  `active_backend = MatcherBackend::Simple` default.
- Resolve `strategy_tag` from the requested `CompressionLevel` inside
  `reset()`, immediately after the existing `active_backend` assignment.
  A `debug_assert!` enforces the invariant
  `strategy_tag.backend() == active_backend` so future drift in
  `for_compression_level` or `resolve_level_params` is caught in debug
  builds.
- New regression test
  `driver_reset_keeps_strategy_tag_in_sync_with_active_backend` walks
  every band boundary (1, 2-3, 4, 5-15, 16-17, 18-19, 20-22) plus all
  four named variants and asserts both the expected tag and the matching
  backend.
This commit is purely additive — no behaviour change for any caller.
Sets up the data flow that commit 3 will branch the hot dispatch on to
enter `compress_block::<S>` monomorphisations.
Refs #122 Phase 3.
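The reset-time invariant this commit describes reduces to a few lines. This is a hypothetical sketch: the two-variant enums, the level thresholds, and the field layout are stand-ins for the real driver, which resolves all seven strategies.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MatcherBackend { Simple, HashChain }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrategyTag { Fast, Lazy }

impl StrategyTag {
    fn backend(self) -> MatcherBackend {
        match self {
            StrategyTag::Fast => MatcherBackend::Simple,
            StrategyTag::Lazy => MatcherBackend::HashChain,
        }
    }
}

struct MatchGeneratorDriver {
    active_backend: MatcherBackend,
    strategy_tag: StrategyTag,
}

impl MatchGeneratorDriver {
    fn reset(&mut self, level: u8) {
        // The pre-existing runtime resolution stays authoritative...
        self.active_backend =
            if level <= 1 { MatcherBackend::Simple } else { MatcherBackend::HashChain };
        // ...and the new tag is resolved immediately after it.
        self.strategy_tag = if level <= 1 { StrategyTag::Fast } else { StrategyTag::Lazy };
        // Debug builds catch any drift between the two resolvers.
        debug_assert_eq!(self.strategy_tag.backend(), self.active_backend);
    }
}

fn main() {
    let mut d = MatchGeneratorDriver {
        active_backend: MatcherBackend::Simple,
        strategy_tag: StrategyTag::Fast,
    };
    d.reset(9);
    assert_eq!(d.strategy_tag, StrategyTag::Lazy);
}
```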
Phase 3 commit 3: introduce the monomorphised per-block entry point
without changing any inner hot-loop behaviour. Each
`Matcher::start_matching` call now picks a concrete `S: Strategy` from
`self.strategy_tag` (resolved at `reset()` in commit 2) and hands off to
`compress_block::<S>`. The body still uses runtime dispatch internally,
so this commit is behaviour-preserving — it exists to plumb the type
parameter into the HC matcher so later commits can replace inner
`match self.parse_mode` branches with `if S::USE_BT` / `if S::USE_HASH3`
const checks.
- `MatchGeneratorDriver::start_matching` becomes a 7-arm match over
  `self.strategy_tag` dispatching into `compress_block::<S>` per
  variant.
- `compress_block<S: Strategy>(&mut self, …)` lives next to the
  `impl Matcher` block. Body switches on `S::BACKEND` (an associated
  `const`), so each monomorphisation keeps exactly one backend arm
  after const-eval folds the `match` — verified by inspection of the
  generated IR before/after.
- For the `HashChain` backend, `compress_block` calls a new
  `HcMatchGenerator::start_matching_strategy<S>` method that today just
  delegates back to the existing runtime-dispatched `start_matching`. A
  `debug_assert_eq!` enforces `S::PARSE_MODE == self.parse_mode` so the
  bridge cannot drift silently while we migrate the inner loops in
  commits 4-7.
463/463 nextest, clippy clean, build × 2.
Refs #122 Phase 3.
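The two-layer dispatch shape this commit introduces looks roughly like the following sketch (two strategies instead of seven, string returns instead of real compression; all names are stand-ins for the crate's). The outer match pays the runtime tag check once per block; the inner match is on an associated const, so each monomorphisation keeps a single backend arm.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendTag { Simple, HashChain }

trait Strategy { const BACKEND: BackendTag; }

struct Fast;
impl Strategy for Fast { const BACKEND: BackendTag = BackendTag::Simple; }
struct Lazy;
impl Strategy for Lazy { const BACKEND: BackendTag = BackendTag::HashChain; }

#[derive(Clone, Copy)]
enum StrategyTag { Fast, Lazy }

struct Driver { strategy_tag: StrategyTag }

impl Driver {
    fn start_matching(&mut self) -> &'static str {
        // Runtime tag check happens once per block, out here...
        match self.strategy_tag {
            StrategyTag::Fast => self.compress_block::<Fast>(),
            StrategyTag::Lazy => self.compress_block::<Lazy>(),
        }
    }

    fn compress_block<S: Strategy>(&mut self) -> &'static str {
        // ...while this match is on an associated const: after const
        // evaluation each monomorphisation keeps exactly one arm.
        match S::BACKEND {
            BackendTag::Simple => "simple backend path",
            BackendTag::HashChain => "hash-chain backend path",
        }
    }
}

fn main() {
    let mut d = Driver { strategy_tag: StrategyTag::Lazy };
    assert_eq!(d.start_matching(), "hash-chain backend path");
}
```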
Phase 3 commit 4: the strategy-aware entry on `HcMatchGenerator` now
branches on the compile-time `S::USE_BT` constant. Each
monomorphisation (Lazy / BtOpt / BtUltra / BtUltra2) keeps exactly
one of `start_matching_lazy` / `start_matching_optimal` — the dead
arm is dropped at codegen time, so the per-block dispatch no longer
pays for a runtime tag check.
The lazy/optimal split is the outermost parse_mode branch — every
encoded position inside the block already runs the right inner
parser, but until now the dispatch *itself* re-evaluated
`self.parse_mode` on every block call.
- `HcMatchGenerator::start_matching_strategy<S>` body becomes
`if S::USE_BT { start_matching_optimal } else { start_matching_lazy }`.
- The runtime-dispatched inherent `start_matching` is now `#[cfg(test)]`
— the production driver path goes through `start_matching_strategy`
exclusively. Tests that exercise HcMatchGenerator directly still
use the inherent method; the final cleanup commit removes both the
inherent method and the `parse_mode` field.
- `debug_assert_eq!(self.parse_mode, S::PARSE_MODE)` at the entry
catches any future drift between the runtime field (still set by
`configure`) and the const passed by the dispatcher.
- Cost-model and parse-mode parity assertions in
`strategy::tests` moved from `#[test] fn` into compile-time
`const _: () = { assert!(…) }` blocks per
`clippy::assertions_on_constants` (the asserts only inspect
associated `const`s — they belong at compile time, not in
the runner).
460/460 nextest, clippy clean (default + `--tests`), build × 2.
Refs #122 Phase 3.
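Both moves in this commit can be reduced to a short sketch: the per-block dispatch becomes an `if` on a strategy const, and const-only parity checks move into `const _` blocks that fail the build instead of running in the test harness. The names and the string return values are invented stand-ins.

```rust
trait Strategy {
    const USE_BT: bool;
    const OPT_LEVEL: u8;
}

struct Lazy;
impl Strategy for Lazy { const USE_BT: bool = false; const OPT_LEVEL: u8 = 0; }
struct BtOpt;
impl Strategy for BtOpt { const USE_BT: bool = true; const OPT_LEVEL: u8 = 0; }

// Compile-time parity assertions: they only inspect associated consts,
// so they belong here rather than in a #[test] fn.
const _: () = assert!(BtOpt::USE_BT);
const _: () = assert!(!Lazy::USE_BT && Lazy::OPT_LEVEL == 0);

fn start_matching_strategy<S: Strategy>() -> &'static str {
    // The dead arm is dropped per monomorphisation at codegen time.
    if S::USE_BT { "start_matching_optimal" } else { "start_matching_lazy" }
}

fn main() {
    assert_eq!(start_matching_strategy::<Lazy>(), "start_matching_lazy");
    println!("ok");
}
```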
Phase 3 commit 5: thread the const-generic `S: Strategy` through the
optimal parser entry, BT dispatcher, and the four SIMD-kernel wrappers
down to the `build_optimal_plan_impl_body!` macro. Inside the macro the
two `parse_mode` reads that previously drove every candidate insertion
and traceback step are replaced with strategy-projected `let` bindings —
each is a compile-time constant per monomorphisation, so LLVM folds
them and drops the dead arms in each strategy's instance.
- `build_optimal_plan_impl_body!` gains a `$strategy_ty:ty` parameter
  and computes `abort_on_worse_match = <S>::OPT_LEVEL == 0` /
  `opt_level = <S>::OPT_LEVEL >= 2`. A `debug_assert!` enforces
  `<S>::USE_BT` so the body never runs for a non-BT strategy.
- Each of `build_optimal_plan_impl_neon` / `_sse42` / `_avx2_bmi2` /
  `_scalar` gains `S: Strategy` as a leading type parameter alongside
  the existing `ACCURATE_PRICE` / `FAVOR_SMALL_OFFSETS` const-generics;
  they pass S through into the macro body.
- `build_optimal_plan_impl<S>` plumbs S into the 4 kernel dispatchers.
  The runtime 4-arm match on
  `(profile.accurate, profile.favor_small_offsets)` stays for now —
  `generic_const_exprs` would be needed to project the bool consts
  directly into a const-generic argument list, and the dispatch is
  per-block, not per-position. In practice only two arms are reached
  (BtOpt → (false, true), BtUltra / BtUltra2 → (true, false)).
- `build_optimal_plan<S>` and `start_matching_optimal<S>` follow. Both
  `start_matching_strategy<S>` (production path) and the `#[cfg(test)]`
  inherent `start_matching` (now 4-arm on `HcParseMode` → concrete
  `BtOpt` / `BtUltra` / `BtUltra2` types) invoke the generic entry.
  `run_btultra2_seed_pass` pins `S = BtUltra2` locally — by name it is
  BtUltra2-exclusive.
- `nested const items cannot project through outer generics`: switched
  the per-strategy flags from `const ABORT_ON_WORSE_MATCH: bool = …` to
  `let abort_on_worse_match: bool = …` so the expressions live in the
  parent fn's substitution scope. The optimiser still folds them per
  monomorphisation.
460/460 nextest, clippy clean (default + `--tests`), build × 2.
Refs #122 Phase 3.
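An illustrative reduction of the macro change, including the `let`-not-`const` fix from the build-error note (a nested `const` item cannot name the outer fn's generics, while a `let` binding lives in the parent fn's substitution scope and still folds per monomorphisation). All names are stand-ins for the real macro and its callers.

```rust
trait Strategy { const USE_BT: bool; const OPT_LEVEL: u8; }

struct BtOpt;
impl Strategy for BtOpt { const USE_BT: bool = true; const OPT_LEVEL: u8 = 0; }
struct BtUltra2;
impl Strategy for BtUltra2 { const USE_BT: bool = true; const OPT_LEVEL: u8 = 2; }

macro_rules! build_optimal_plan_impl_body {
    ($strategy_ty:ty) => {{
        debug_assert!(<$strategy_ty>::USE_BT, "body must not run for non-BT strategies");
        // `let`, not a nested `const` item: the expressions can refer to
        // the parent fn's generic parameter and still fold per instance.
        let abort_on_worse_match: bool = <$strategy_ty>::OPT_LEVEL == 0;
        let opt_level_2: bool = <$strategy_ty>::OPT_LEVEL >= 2;
        (abort_on_worse_match, opt_level_2)
    }};
}

fn build_optimal_plan_impl<S: Strategy>() -> (bool, bool) {
    build_optimal_plan_impl_body!(S)
}

fn main() {
    assert_eq!(build_optimal_plan_impl::<BtOpt>(), (true, false));
    assert_eq!(build_optimal_plan_impl::<BtUltra2>(), (false, true));
}
```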
Three ignored manual probes that document the level22 perf claims with
executable evidence. All gated on `#[ignore]` so they do not run in CI
by default; invoke with
`cargo nextest run --release ... --run-ignored ignored-only`.
- `he_level22_ratio` — confirms the high-entropy-1m scenario speedup
  (~10x vs C FFI) is driven by the source-size-hinted raw-fast-path;
  toggles the hint off to verify the slow-path matches C-zstd timing.
- `incompressibility_falsepos` — exercises the
  `block_looks_incompressible` heuristic on hard-random, mixed, textish,
  and pattern-only inputs to confirm zero false positives (Rust/C
  output sizes match within a 7-byte frame overhead).
- `corpus_level22_gap` — quantifies the +0.08 % decodecorpus-z000033
  output gap as a block-split heuristic divergence (233 vs 256
  compressed blocks), not a core-compression regression.
Refs #122 Phase 3 perf accounting.
Phase 3 commit 6: thread S through the optimal-candidates body and
its four SIMD-kernel wrappers, then gate the hash3 short-match
lookup (donor `static (mls==3)` probe) on `S::USE_HASH3`. Only
`BtUltra2` sets `USE_HASH3 = true`; every other monomorphisation
drops the entire `update_hash3_until` + `$hash3` block at codegen
time. On the level22 hot path this removes one indirect read +
branch + table walk per encoded position.
- `collect_optimal_candidates_initialized_body!` gains
`$strategy_ty:ty` and a `let use_hash3: bool = <S>::USE_HASH3`
binding. The hash3 sub-block becomes `if use_hash3 && … { … }`
so the dead arm folds away per S.
- `debug_assert!` confirms the runtime/compile-time invariant
(`USE_HASH3 = true` implies `hash3_log != 0`).
- Each of the four SIMD wrappers
  (`collect_optimal_candidates_initialized_neon` / `_sse42` /
  `_avx2_bmi2` / `_scalar`) gains `S: Strategy` alongside the existing
  `USE_BT_MATCHFINDER` const. The cross-platform dispatcher
  (`collect_optimal_candidates_initialized`) plumbs S through to all
  four arms.
- The DP body call sites in `build_optimal_plan_impl_body!`
(`$self.$collect::<$strategy_ty, true>(…)`) thread S into the
per-position pipeline.
- `#[cfg(test)] collect_optimal_candidates` rewritten as a four-arm
match on `(uses_bt, hash3_log != 0)` so artificial test
configurations still reach the right monomorphisation —
production callers never hit this helper.
460/460 nextest, clippy clean, build × 2.
Refs #122 Phase 3.
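The hash3 gate this commit describes can be sketched with counting stand-ins for the real hash3 machinery. `collect_candidates`, the probe counter, and the depth values are invented for illustration; the point is that the probe block sits behind a strategy const, so non-BtUltra2 monomorphisations compile it out entirely.

```rust
trait Strategy { const USE_HASH3: bool; }

struct BtUltra;
impl Strategy for BtUltra { const USE_HASH3: bool = false; }
struct BtUltra2;
impl Strategy for BtUltra2 { const USE_HASH3: bool = true; }

/// Returns how many hash3 probes ran over `positions` encoded positions.
fn collect_candidates<S: Strategy>(positions: usize, hash3_log: u32) -> usize {
    let use_hash3: bool = S::USE_HASH3; // compile-time constant per S
    // Runtime/compile-time invariant from the bullet above.
    debug_assert!(!use_hash3 || hash3_log != 0, "USE_HASH3 implies a hash3 table");
    let mut probes = 0;
    for _pos in 0..positions {
        if use_hash3 {
            // Stand-in for the update_hash3_until + short-match lookup;
            // this whole arm folds away when USE_HASH3 is false.
            probes += 1;
        }
        // ...ordinary candidate collection would happen here...
    }
    probes
}

fn main() {
    assert_eq!(collect_candidates::<BtUltra>(8, 0), 0);
    assert_eq!(collect_candidates::<BtUltra2>(8, 5), 8);
}
```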
Phase 3 commit 7: make `should_run_btultra2_seed_pass` generic over
`S: Strategy` and short-circuit the predicate at compile time for
every non-BtUltra2 monomorphisation. Only `BtUltra2` satisfies
`S::OPT_LEVEL == 2 && S::USE_HASH3`, so Fast / Dfast / Greedy /
Lazy / BtOpt / BtUltra now drop both the predicate body and the
inlined call to `run_btultra2_seed_pass` after the per-strategy
monomorphisation pass.
- `should_run_btultra2_seed_pass<S>` starts with
`if !(S::OPT_LEVEL == 2 && S::USE_HASH3) { return false; }` —
a const expression per S, so LLVM folds it and the rest of the
predicate (HcBackend::Bt unwrap, frame-start checks, dictionary
state) only emits for BtUltra2.
- Replaced the runtime `self.is_btultra2()` read inside the
predicate with the const path; the `is_btultra2` helper stays
for the (single remaining) profile-selection site.
- `start_matching_optimal::<S>` now invokes
`self.should_run_btultra2_seed_pass::<S>(current_len)`.
- All test call sites pin `S = BtUltra2` (the only strategy where
the predicate is interesting); the corresponding assertions
remain unchanged.
460/460 nextest, clippy clean, build × 2.
Refs #122 Phase 3.
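The early-out shape this commit gives the predicate, reduced to a sketch (the struct field and the trailing runtime checks are invented; only the const guard mirrors the commit): the first check is a const expression per S, so every non-BtUltra2 instance folds the whole predicate to `false` and the remaining runtime checks are never emitted.

```rust
trait Strategy { const USE_HASH3: bool; const OPT_LEVEL: u8; }

struct BtUltra;
impl Strategy for BtUltra { const USE_HASH3: bool = false; const OPT_LEVEL: u8 = 2; }
struct BtUltra2;
impl Strategy for BtUltra2 { const USE_HASH3: bool = true; const OPT_LEVEL: u8 = 2; }

struct Gen { at_frame_start: bool }

impl Gen {
    fn should_run_btultra2_seed_pass<S: Strategy>(&self, current_len: usize) -> bool {
        if !(S::OPT_LEVEL == 2 && S::USE_HASH3) {
            return false; // const per S: folds to `false` for non-BtUltra2
        }
        // Only BtUltra2 monomorphisations emit these runtime checks
        // (stand-ins for the frame-start / dictionary-state logic).
        self.at_frame_start && current_len > 0
    }
}

fn main() {
    let g = Gen { at_frame_start: true };
    assert!(!g.should_run_btultra2_seed_pass::<BtUltra>(16));
    assert!(g.should_run_btultra2_seed_pass::<BtUltra2>(16));
}
```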
Phase 3 commit 8: replace the runtime profile selection in
`start_matching_optimal::<S>` with
`HcOptimalCostProfile::const_for_strategy::<S>()`. The four profile
fields (max chain
depth, sufficient match length, accurate flag, favor-small-offsets
flag) come straight from the strategy's associated consts, so the
optimiser folds the whole profile into a single literal at codegen
time. The old two-arm runtime read (`if self.is_btultra2() {
for_mode(parse_mode, true) } else { for_mode(parse_mode, false) }`)
disappears.
- Add `Strategy::MAX_CHAIN_DEPTH` and `Strategy::SUFFICIENT_MATCH_LEN`
associated consts, mirroring the `HcOptimalCostProfile::for_mode`
table row-for-row. Non-BT strategies (Fast / Dfast / Greedy) get
the Lazy2 row so the trait stays total — those values are never
read because USE_BT short-circuits the path.
- Add `HcOptimalCostProfile::const_for_strategy<S>()` next to the
existing `for_mode`. Marked `#[inline]` so the literal lands at
the call site after monomorphisation.
- `start_matching_optimal::<S>` calls the new const peer; the
inherent `HcMatchGenerator::is_btultra2()` accessor is gone (no
remaining readers).
- `for_mode` itself stays — it has callers in tests and in
`run_btultra2_seed_pass` (which pins `S = BtUltra2` and could
switch to `const_for_strategy` later, but the call is once per
frame so it is not on the hot path).
460/460 nextest, clippy clean (default + `--tests`), build × 2.
Refs #122 Phase 3.
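The const peer this commit adds looks roughly like this sketch: every field comes from an associated const, so the whole struct folds to a literal per monomorphisation. The numeric values and the `debug_assert!` on `USE_BT` are placeholders, not the crate's table.

```rust
trait Strategy {
    const USE_BT: bool;
    const ACCURATE_PRICE: bool;
    const FAVOR_SMALL_OFFSETS: bool;
    const MAX_CHAIN_DEPTH: u32;
    const SUFFICIENT_MATCH_LEN: u32;
}

struct BtUltra2;
impl Strategy for BtUltra2 {
    const USE_BT: bool = true;
    const ACCURATE_PRICE: bool = true;
    const FAVOR_SMALL_OFFSETS: bool = false;
    const MAX_CHAIN_DEPTH: u32 = 999;      // placeholder value
    const SUFFICIENT_MATCH_LEN: u32 = 64;  // placeholder value
}

#[derive(Debug, PartialEq, Eq)]
struct HcOptimalCostProfile {
    max_chain_depth: u32,
    sufficient_match_len: u32,
    accurate: bool,
    favor_small_offsets: bool,
}

impl HcOptimalCostProfile {
    #[inline] // the literal lands at the call site after monomorphisation
    fn const_for_strategy<S: Strategy>() -> Self {
        debug_assert!(S::USE_BT, "profiles are only meaningful for BT strategies");
        Self {
            max_chain_depth: S::MAX_CHAIN_DEPTH,
            sufficient_match_len: S::SUFFICIENT_MATCH_LEN,
            accurate: S::ACCURATE_PRICE,
            favor_small_offsets: S::FAVOR_SMALL_OFFSETS,
        }
    }
}

fn main() {
    let p = HcOptimalCostProfile::const_for_strategy::<BtUltra2>();
    assert!(p.accurate && !p.favor_small_offsets);
}
```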
Phase 3 commit 9 (close-out): delete both legacy runtime dispatch
enums and every read site that depended on them. The encoder now
selects its match-finder backend exclusively through the
const-generic `S: Strategy` trait and the once-per-frame
`StrategyTag` resolved at `reset()`.
- `HcParseMode` (Lazy2 / BtOpt / BtUltra / BtUltra2) deleted from
`strategy.rs`. The `Strategy::PARSE_MODE` associated const and the
`StrategyTag::parse_mode()` bridge are gone with it — nothing
reads them now that every dispatch routes through `S::USE_BT`,
`S::USE_HASH3`, `S::OPT_LEVEL`, etc.
- `MatcherBackend` (Simple / Dfast / Row / HashChain) deleted from
`match_generator.rs`. All references (`LevelParams::backend`,
`MatchGeneratorDriver::active_backend`, plus the
`retire_dictionary_budget` / `trim_after_budget_retire` /
`skip_matching_for_dictionary_priming` 4-arm matches) now use the
unified `super::strategy::BackendTag`. Same 4-variant shape, one
enum.
- `HcConfig` drops its `parse_mode: HcParseMode` field; every
`HcConfig` literal in `LEVEL_TABLE` and the per-source-size
variants (`BTOPT_HC_CONFIG`, `BTULTRA_HC_CONFIG`,
`BTULTRA2_HC_CONFIG`, etc.) shrinks by one line.
- `HcMatchGenerator::parse_mode` field deleted.
`HcMatchGenerator::configure` gains an explicit
`tag: StrategyTag` argument from which it derives `is_btultra2`,
`uses_bt`, and the search-depth cap. The driver passes
`self.strategy_tag` (set at `reset()`) on the single production
call site.
- `HcOptimalCostProfile::for_mode(HcParseMode, bool)` deleted from
`cost_model::mod`. `const_for_strategy::<S>()` is now the only
way to build a profile. Every call site (production +
`run_btultra2_seed_pass` + the cost-model tests) was migrated to
pass an explicit `S = strategy::{Lazy, BtOpt, BtUltra, BtUltra2}`.
- The three `level_*_use_*_parse_mode` donor-parity tests are
rewritten to assert against `StrategyTag::for_level(level)`
directly (the data they were checking lives in the const-generic
trait now). The `#[cfg(test)] HcMatchGenerator::start_matching`
helper picks a strategy from the runtime table flags
(`uses_bt`/`is_btultra2`/`hash3_log`) so artificial test setups
still reach the right monomorphisation.
- `start_matching_strategy::<S>`'s `debug_assert_eq!` now compares
`S::USE_BT` against `self.table.uses_bt` (the field set by
`configure`) instead of the deleted `self.parse_mode` /
`S::PARSE_MODE` pair.
- Module-level docs in `strategy.rs` are rewritten as a single
Phase-3 dispatch flow diagram covering every const-folded gate
(`USE_BT`, `USE_HASH3`, `OPT_LEVEL`, `ACCURATE_PRICE`,
`FAVOR_SMALL_OFFSETS`, `MAX_CHAIN_DEPTH`, `SUFFICIENT_MATCH_LEN`).
`encoding/mod.rs` header updated to drop the now-stale
`HcParseMode` reference and to add `perf/post-pr-121-baseline`.
460/460 nextest, clippy clean (default + `--tests`), build × 2.
Phase 3 acceptance criterion "HcParseMode enum deleted" — done.
Refs #122 Phase 3.
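The new `configure` contract from this close-out can be sketched as follows. The field names, the `matches!` derivations, and the depth-cap values are stand-ins; the real method also wires the rest of the table state.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrategyTag { Lazy, BtOpt, BtUltra, BtUltra2 }

struct HcMatchGenerator {
    uses_bt: bool,
    is_btultra2: bool,
    search_depth_cap: u32,
}

impl HcMatchGenerator {
    /// Derives the remaining runtime flags from the driver-resolved tag;
    /// the deleted `parse_mode` field has no replacement.
    fn configure(&mut self, tag: StrategyTag) {
        self.uses_bt = matches!(
            tag,
            StrategyTag::BtOpt | StrategyTag::BtUltra | StrategyTag::BtUltra2
        );
        self.is_btultra2 = matches!(tag, StrategyTag::BtUltra2);
        // Illustrative caps only; the real values come from the level table.
        self.search_depth_cap = if self.uses_bt { 1 << 10 } else { 1 << 6 };
    }
}

fn main() {
    let mut g = HcMatchGenerator { uses_bt: false, is_btultra2: false, search_depth_cap: 0 };
    g.configure(StrategyTag::BtUltra2);
    assert!(g.uses_bt && g.is_btultra2);
}
```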
Before: a single CI runner per target ran every level back-to-back
(~45 min on i686, with level22 dominating ~20 min on its own).
After: matrix split target × level (3 targets × 6 levels = 18
parallel runners) plus a single aggregation job that consolidates
per-target shards back into the existing per-target artifact
shape, so the cross-target merge step downstream stays unchanged.
- `support::supported_levels_filtered()` honours
`STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` (comma-separated level names);
every loop in `compare_ffi.rs` that used to iterate over
`supported_levels()` now uses the filtered variant. With no env
var, behaviour is identical to before.
- `.github/scripts/aggregate-bench-levels.py` reads
`benchmark-{results,delta,relative,report}.<target>.<level>.<ext>`
shards (uploaded as `benchmark-shard-<target>-<level>` artifacts)
and emits per-target consolidated
`benchmark-{results,delta,relative,report}.<target>.<ext>` files
matching the pre-split naming. Markdown shards are concatenated
under per-level `## Level: <name>` sections.
- CI workflow:
- `benchmark` job: matrix becomes `bench × level`. Each runner
sets `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` and uploads its
`benchmark-shard-<target>-<level>` artifact.
- new `benchmark-aggregate` job (post-benchmark): downloads
every `benchmark-shard-*` artifact, runs the aggregation
script, and re-uploads three target-shaped artifacts
(`benchmark-x86_64-gnu`, `benchmark-i686-gnu`,
`benchmark-x86_64-musl`). `github-action-benchmark` moved
here so the regression alert fires on the consolidated
x86_64-gnu result.
- `benchmark-pages` now depends on `benchmark-aggregate` and
downloads the three per-target artifacts explicitly (instead
of the previous `benchmark-*` glob) so the shard artifacts
don't leak into `merge-benchmarks.py`'s `<target>.<level>`
name extraction.
- Per-runner `timeout-minutes` cut from 40/55/40 to 25/30/25
matching the new per-level work share.
Smoke-test:
- `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER=fastest cargo bench
--list` emits only `fastest` benchmarks; no env var → all 6
levels.
- Aggregator round-trips a synthetic two-shard fixture and
produces the expected consolidated JSON + markdown files.
- Local nextest 460/460, clippy clean (default + benches).
Refs #122 Phase 3 — supporting infrastructure.
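A minimal sketch of the bench-side filter plumbing described above, assuming a bare `LevelConfig` with only a `name` field (the real support module carries more per-level data). The env read is injected as a parameter here so the sketch is testable; the benches would read `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` via `std::env::var` instead.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct LevelConfig { name: &'static str }

fn supported_levels() -> Vec<LevelConfig> {
    // Placeholder level names, not the crate's real six.
    ["fastest", "default", "best"]
        .iter()
        .map(|&n| LevelConfig { name: n })
        .collect()
}

fn level_filter(raw: Option<&str>) -> Option<Vec<String>> {
    // Comma-separated level names, trimmed, empties dropped.
    raw.map(|v| {
        v.split(',')
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect()
    })
}

fn supported_levels_filtered(raw: Option<&str>) -> Vec<LevelConfig> {
    let all = supported_levels();
    match level_filter(raw) {
        None => all, // no env var: identical to the unfiltered behaviour
        Some(keep) => all
            .into_iter()
            .filter(|cfg| keep.iter().any(|name| name == cfg.name))
            .collect(),
    }
}

fn main() {
    assert_eq!(supported_levels_filtered(None).len(), 3);
    assert_eq!(supported_levels_filtered(Some("default")).len(), 1);
}
```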
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough: Adds a nightly fuzz job; splits benchmarks into bench-matrix, bench-build, and a benchmark job matrix that emits per-(target, level) shards; introduces benchmark-aggregate to merge level shards into per-target artifacts.
Sequence diagram:

```mermaid
sequenceDiagram
    participant BenchBuild as bench-build (GH job)
    participant ArtifactStore as Actions Artifacts
    participant BenchmarkJob as benchmark (GH job matrix)
    participant RunnerScript as run-benchmarks.sh
    participant AggregateJob as benchmark-aggregate (GH job)
    participant AggregateScript as aggregate-bench-levels.py
    participant PagesJob as benchmark-pages (GH job)
    BenchBuild->>ArtifactStore: upload bench-binary-<target>
    BenchmarkJob->>ArtifactStore: download bench-binary-<target>
    BenchmarkJob->>RunnerScript: invoke with STRUCTURED_ZSTD_BENCH_LEVEL_FILTER
    RunnerScript->>BenchmarkJob: produce per-(target,level) shard files
    BenchmarkJob->>ArtifactStore: upload benchmark-shard-<target>-<level>
    AggregateJob->>ArtifactStore: download benchmark-shard-*
    AggregateJob->>AggregateScript: run aggregate-bench-levels.py to merge shards
    AggregateScript->>ArtifactStore: upload consolidated benchmark-aggregated
    PagesJob->>ArtifactStore: download benchmark-aggregated
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed.
Pull request overview
Phase 3 of the #111 encoder rewrite. Replaces the runtime HcParseMode / MatcherBackend enum dispatch with const-generic Strategy monomorphisation. A new Strategy trait + 7 ZST implementors (Fast/Dfast/Greedy/Lazy/BtOpt/BtUltra/BtUltra2) lift the parser/finder choice into the type system; hot entry points (compress_block::<S>, start_matching_optimal::<S>, build_optimal_plan_impl::<S, …>, collect_optimal_candidates_initialized::<S, …>) are now generic and let the optimiser fold S::USE_BT / S::USE_HASH3 / S::OPT_LEVEL branches per monomorphisation. The runtime HcParseMode enum and MatcherBackend enum are deleted; BackendTag (in strategy) and a per-driver StrategyTag field resolved at reset() become the single source of truth. CI also splits the bench matrix into target × level shards with a new benchmark-aggregate job consolidating shards for merge-benchmarks.py.
Changes:
- Introduce `Strategy` trait + 7 ZST strategies + `StrategyTag`; thread `<S>` through optimal-parser dispatch, SIMD wrappers, cost-profile builder, and BTULTRA2 seed-pass gate.
- Delete `HcParseMode` and `MatcherBackend`; replace `for_mode(mode, pass2)` with `const_for_strategy::<S>()`; update all in-crate tests accordingly.
- Split CI bench matrix per level (matrix.bench × matrix.level), add `benchmark-aggregate` job + `aggregate-bench-levels.py` to recombine shards.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| zstd/src/encoding/strategy.rs | New Strategy trait, 7 ZSTs, BackendTag, StrategyTag::for_level/for_compression_level |
| zstd/src/encoding/mod.rs | Module doc refresh referencing the new strategy module |
| zstd/src/encoding/match_generator.rs | Driver strategy_tag; compress_block::<S>; monomorphised optimal parser / BT-seed paths; HcParseMode/MatcherBackend removed; tests rewritten |
| zstd/src/encoding/cost_model/mod.rs | for_mode replaced by const_for_strategy::<S>() |
| zstd/src/encoding/match_table/storage.rs | Doc-only refresh of is_btultra2/uses_bt source |
| zstd/benches/support/mod.rs | New level_filter_from_env + supported_levels_filtered |
| zstd/benches/compare_ffi.rs | Use supported_levels_filtered |
| zstd/tests/{he_level22_ratio,incompressibility_falsepos,corpus_level22_gap}.rs | New #[ignore]'d diagnostic probes |
| .github/workflows/ci.yml | Bench matrix split per level + new benchmark-aggregate job |
| .github/scripts/aggregate-bench-levels.py | New aggregator combining per-level shards into per-target files |
Comments suppressed due to low confidence (3)
zstd/src/encoding/match_generator.rs:3858
- After migrating from `for_mode(mode, pass2)` to `const_for_strategy::<S>()`, `p1` and `p2` are now built from the identical expression and are guaranteed equal. The test no longer exercises any difference between the former "pass2 = false" and "pass2 = true" call sites — every assertion is trivially satisfied. Either consolidate to a single profile and drop the duplicate checks, or extend the test to exercise the cases the old API distinguished (e.g. by reading both `BtUltra` and `BtUltra2` profiles).
```rust
    let p1 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    let p2 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    assert!(
        !p1.favor_small_offsets,
        "btultra2 primary profile should match donor opt2 offset pricing"
    );
    assert!(
        !p2.favor_small_offsets,
        "btultra2 secondary profile should match donor opt2 offset pricing"
    );
}

#[test]
fn btultra2_profile_is_single_pass_opt2() {
    let p1 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    let p2 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    assert_eq!(p1.max_chain_depth, p2.max_chain_depth);
    assert_eq!(p1.sufficient_match_len, p2.sufficient_match_len);
    assert_eq!(p1.accurate, p2.accurate);
```
zstd/src/encoding/match_generator.rs:4051
- `profile_predef` and `profile_dyn` are now built from the identical `const_for_strategy::<BtUltra2>()` call, so the two profile values are equal. The original test distinguished `for_mode(BtUltra2, false)` from `for_mode(BtUltra2, true)`. The only remaining difference between the two arms is `stats.price_type` (Predefined vs Dynamic), which is the actual intent — consider removing the now-duplicate `profile_dyn` binding (or renaming both to make the single-profile fact explicit) so a future reader doesn't mistake them for materially different profiles.
```rust
    let profile_predef = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    let mut stats_predef = HcOptState::new();
    stats_predef.price_type = HcOptPriceType::Predefined;
    let predef_max = profile_predef.lit_length_price(&stats_predef, HC_BLOCKSIZE_MAX);
    let predef_prev =
        profile_predef.lit_length_price(&stats_predef, HC_BLOCKSIZE_MAX.saturating_sub(1));
    assert_eq!(
        predef_max,
        predef_prev + HC_BITCOST_MULTIPLIER,
        "predefined litLength pricing at BLOCKSIZE_MAX must add exactly one bit"
    );
    let profile_dyn = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
```
zstd/src/encoding/match_generator.rs:2889
- The local `type S = super::strategy::BtUltra2;` alias here is unused — both `const_for_strategy::<...>` calls below explicitly spell out `super::strategy::BtUltra2`, and `build_optimal_plan::<S>` is the only consumer of the alias. Either remove the alias (and inline `BtUltra2` at the `build_optimal_plan` call site) or use it consistently for the `const_for_strategy` line too, so the seed-pass body stays uniform.
```rust
    // The seed pass is BtUltra2-exclusive by name (`is_btultra2()`
    // gates the only call site), so pin S to BtUltra2 here.
    type S = super::strategy::BtUltra2;
    let seed_profile = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
```
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@zstd/benches/support/mod.rs`:
- Around line 96-104: supported_levels_filtered currently ignores unknown tokens
from level_filter_from_env, which can silently produce empty benchmark sets;
update supported_levels_filtered to validate the requested `keep` names against
the available names from supported_levels() and fail fast if any unknowns are
present: compute the set/list of available names from supported_levels(),
compute unknowns = requested - available, and if unknowns is non-empty, abort
(panic or e.g. unwrap with a clear message) with a message listing the invalid
tokens and the allowed names (mentioning STRUCTURED_ZSTD_BENCH_LEVEL_FILTER in
the message for clarity); otherwise proceed to filter as before.
In `@zstd/src/encoding/match_generator.rs`:
- Around line 2601-2607: The configure() code in HcMatchGenerator collapses
StrategyTag variants via local booleans (uses_bt/is_btultra2), causing
start_matching() and downstream logic to treat non-BtUltra2 BT modes as BtOpt;
instead store the full StrategyTag on the HcMatchGenerator struct (add a field
like tag: StrategyTag), set that field in configure(), and update
uses_bt/is_btultra2 checks and any dispatch/assert logic in start_matching() and
the parser/optimizer code to branch on self.tag (the preserved StrategyTag) so
BtOpt, BtUltra and BtUltra2 remain distinct for OPT_LEVEL, ACCURATE_PRICE, and
FAVOR_SMALL_OFFSETS decisions.
In `@zstd/src/encoding/strategy.rs`:
- Around line 270-275: The match arm handling CompressionLevel::Level casts n to
u8 before clamping, causing values >= 256 to wrap; change the logic to clamp the
i32 value against the valid range (use
n.max(1).min(CompressionLevel::MAX_LEVEL)) and only then cast to u8 before
calling Self::for_level, ensuring CompressionLevel::MAX_LEVEL and
Self::for_level are used to determine the clamp bounds.
In `@zstd/tests/corpus_level22_gap.rs`:
- Around line 57-76: The loop currently stops after a fixed 256 iterations,
which truncates analysis for larger frames; change the loop control to iterate
until the frame-end or the existing "last" sentinel is hit instead of using the
0..256 cap: replace the "for _ in 0..256" construct so that the body that
computes h, last, bt, bs and advances off runs repeatedly while off + 3 <=
frame.len() (or use an unconditional loop with the same off+3 check and the
existing last==1 break), preserving the inner logic that updates
raw/rle/compressed/total and advances off.
In `@zstd/tests/he_level22_ratio.rs`:
- Around line 48-108: The tests duplicate zstd block-parsing logic
(dump_block_types in he_level22_ratio.rs vs block_breakdown in
corpus_level22_gap.rs); extract the common parser into a shared test utility
(e.g., zstd/tests/util.rs or zstd/tests/common.rs) exposing a single function
(name it e.g. parse_zstd_blocks or dump_block_types) that accepts the frame
bytes and a loop-limit or uses a termination condition instead of hard-coded 64,
preserve the existing return values (raw/rle/compressed/total_size) or printing
behavior, then replace the two local implementations with calls to the shared
function (update he_level22_ratio.rs and corpus_level22_gap.rs to call
parse_zstd_blocks/dump_block_types). Ensure the new shared function reproduces
the same offsets/fhd parsing logic and parameterize the previous loop cap to
avoid behavioral drift.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4b7a4bb4-ace5-4de1-acab-d7e0d91bf34d
📒 Files selected for processing (12)
- .github/scripts/aggregate-bench-levels.py
- .github/workflows/ci.yml
- zstd/benches/compare_ffi.rs
- zstd/benches/support/mod.rs
- zstd/src/encoding/cost_model/mod.rs
- zstd/src/encoding/match_generator.rs
- zstd/src/encoding/match_table/storage.rs
- zstd/src/encoding/mod.rs
- zstd/src/encoding/strategy.rs
- zstd/tests/corpus_level22_gap.rs
- zstd/tests/he_level22_ratio.rs
- zstd/tests/incompressibility_falsepos.rs
Six review findings from PR #123:
1. `HcMatchGenerator` now mirrors the driver-resolved `StrategyTag` in a new `strategy_tag` field set by `configure()`. The `#[cfg(test)] start_matching` dispatcher branches on it, so BtOpt / BtUltra / BtUltra2 each reach their own monomorphisation instead of collapsing onto BtOpt whenever `is_btultra2 == false`. The matching `collect_optimal_candidates` helper does the same, with a legacy `hash3_log != 0` fallback for artificial tests that wire `hash3_log` directly without calling `configure()`.
2. `HcOptimalCostProfile::const_for_strategy::<S>()` opens with `debug_assert!(S::USE_BT, …)`. The `MAX_CHAIN_DEPTH` and `SUFFICIENT_MATCH_LEN` consts on `Fast` / `Dfast` / `Greedy` / `Lazy` carry placeholder values (the optimal parser entry point is unreachable from those strategies). The debug assert plus fresh doc comments on each non-BT strategy point at the placeholder contract.
3. `supported_levels_filtered()` panics when `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` is non-empty but matches zero known levels, with a message listing every known level. CI matrix typos (e.g. renaming a level in `supported_levels()` without updating ci.yml) now fail loudly on the affected shard instead of silently producing an empty bench run and a missing regression alert downstream.
4. `StrategyTag::for_compression_level(Level(n))` clamps `n` in `i32` BEFORE casting to `u8`. The previous `(n as u8).min(22)` truncated values >= 256 (e.g. `Level(256)` wrapped to `0` and routed to `Dfast`); now `Level(i32::MAX)` saturates to BtUltra2 as expected. Regression test `for_compression_level_clamps_oversized_numeric_levels_to_btultra2` added.
5. `tests/common/mod.rs` hosts a shared `parse_zstd_block_breakdown` + `dump_block_breakdown` helper. Both `corpus_level22_gap.rs` and `he_level22_ratio.rs` route through it instead of carrying near-identical local copies (the corpus probe had a 256-iter cap, the high-entropy probe had a 64-iter cap — both replaced with a bounded `while off + 3 <= frame.len()` walk so large frames are no longer truncated).
461/461 nextest (+1 regression test for #4), clippy clean (default + `--tests`), fmt clean, build × 2. Refs #122 Phase 3.
Actionable comments posted: 1
♻️ Duplicate comments (1)
zstd/benches/support/mod.rs (1)
104-119: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win — Fail fast when any requested level token is unknown.

Line 107 filters by intersection only, so `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER=default,typo` still runs and silently drops `typo`. That can mask CI matrix mistakes and reduce intended shard coverage.

Suggested minimal fix

```diff
 pub(crate) fn supported_levels_filtered() -> Vec<LevelConfig> {
     let all = supported_levels();
     match level_filter_from_env() {
         None => all.to_vec(),
         Some(keep) => {
+            let known: Vec<&'static str> = all.iter().map(|cfg| cfg.name).collect();
+            let unknown: Vec<String> = keep
+                .iter()
+                .filter(|name| !known.iter().any(|k| k == &name.as_str()))
+                .cloned()
+                .collect();
+            if !unknown.is_empty() {
+                panic!(
+                    "STRUCTURED_ZSTD_BENCH_LEVEL_FILTER contains unknown level(s) \
+                     {unknown:?}; supported levels are {known:?}"
+                );
+            }
+
             let kept: Vec<LevelConfig> = all
                 .into_iter()
                 .filter(|cfg| keep.iter().any(|name| name == cfg.name))
                 .collect();
             if kept.is_empty() {
                 let known: Vec<&'static str> =
                     supported_levels().iter().map(|cfg| cfg.name).collect();
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@zstd/benches/support/mod.rs` around lines 104 - 119, The current Some(keep) branch silently drops unknown tokens; update the logic in the Some(keep) arm (the block that builds kept: Vec<LevelConfig>) to detect any requested names in keep that do not match supported_levels() and panic immediately listing the unknown tokens and known levels; specifically, compute the set/difference of keep minus supported_levels().iter().map(|cfg|cfg.name) and if non-empty call panic! with a clear message including the unknown tokens and the known levels, otherwise continue returning kept as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@zstd/tests/common/mod.rs`:
- Around line 48-50: The code computes header size into `off` and then proceeds
to parse blocks, but lacks a pre-check that the `frame` contains those header
bytes; update the function so that immediately after computing `let mut off = 5
+ window_descriptor_len + dict_id_len + fcs_len;` you check `if frame.len() <
off { return None; }` (or equivalent) so truncated frame headers return None
instead of `Some(BlockBreakdown::default())`; ensure this change affects the
same parsing flow that enters the `while off + 3 <= frame.len()` loop and
preserves existing uses of `BlockBreakdown::default()` for other cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: aebed1ce-de63-4c52-9b32-33de83cd28c0
📒 Files selected for processing (7)
zstd/benches/support/mod.rs
zstd/src/encoding/cost_model/mod.rs
zstd/src/encoding/match_generator.rs
zstd/src/encoding/strategy.rs
zstd/tests/common/mod.rs
zstd/tests/corpus_level22_gap.rs
zstd/tests/he_level22_ratio.rs
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI (x86_64-gnu)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.30.
| Benchmark suite | Current: 372ef50 | Previous: 2e9c99d | Ratio |
|---|---|---|---|
| compress/default/low-entropy-1m/matrix/pure_rust | 18.293 ms | 13.554 ms | 1.35 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
Four follow-up findings from PR #123:
- The two `btultra2_profile_*` cost-model tests built `p1` and `p2` with identical `const_for_strategy::<BtUltra2>()` calls (the `pass2: bool` parameter that distinguished them in the runtime `for_mode` API is gone), so `assert_eq!(p1.field, p2.field)` became a tautology. Collapsed `btultra2_profile_disables_small_offset_handicap` to a single profile binding asserting both the small-offset flag and the accurate-pricing flag; deleted `btultra2_profile_is_single_pass_opt2` outright (the only invariant left was the now-trivial reflexive equality).
- The `MatcherBackend` → `BackendTag` consolidation made `match self.active_backend { Simple => Simple, … }` an identity mapping. Replaced both the `reset()` debug_assert and the matching `driver_reset_keeps_strategy_tag_in_sync_with_active_backend` test scaffolding with a direct `assert_eq!(strategy_tag.backend(), active_backend, …)`.
- `run_btultra2_seed_pass` declared `type S = BtUltra2;` but only used the alias in the `build_optimal_plan::<S>` call site, while the adjacent `const_for_strategy::<super::strategy::BtUltra2>()` spelled the type out. Routed the cost-profile call through `S` too so the seed-pass body stays uniform.
- `tests/common::parse_zstd_block_breakdown` returns `None` on truncated frame headers (`frame.len() < header_len`) — without the guard a 4-byte-magic-only input returned `Some(BlockBreakdown::default())`, contradicting the docstring.
- `supported_levels_filtered()` panics for *any* unknown token in `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER`, not just for the "every token unknown" case. Partial typos (`default,typo`) previously silently dropped the bad token — they now fail loudly with a message listing the supported levels.

Refs #122 Phase 3.
CI bench matrix used to rebuild the `compare_ffi` criterion binary
on every shard (3 targets × 6 levels = 18 cargo builds, each
~4-7 min on the GitHub runner). Phase-2 cache hits brought it down
but the cold-cache penalty was still ~120-150 min of accumulated
build time per CI run.
New layout:
- `bench-build` matrix (3 jobs, one per target) builds the criterion
binary with `cargo bench --no-run --message-format=json`, locates
the resolved executable via `jq`, and uploads it as
`bench-binary-<target>`. ~20 min budget per build, retention
7 days.
- `benchmark` matrix (3 × 6 = 18 shards) now `needs: bench-build`,
downloads the matching `bench-binary-<target>` artifact, marks
it executable, and points `run-benchmarks.sh` at it through a
new `STRUCTURED_ZSTD_BENCH_BIN` env var.
- `run-benchmarks.sh` re-execs `$STRUCTURED_ZSTD_BENCH_BIN --bench
--output-format bencher` when the var is set; otherwise it
falls back to the original `cargo bench` path (so local
invocations and any non-CI use stay unchanged).
- Shard runners no longer install the rust toolchain or run
`cargo` at all — only the target-specific runtime packages
(`gcc-multilib`/`musl-tools`) needed to execute a non-host-arch
binary.
- The `decodecorpus_files/z000033` fixture is referenced via
`env!("CARGO_MANIFEST_DIR")` (compile-time absolute path).
GitHub Actions always checks out into `$GITHUB_WORKSPACE` which
is identical across all `ubuntu-latest` runners, so the
embedded path resolves on every shard.
Smoke-tested locally: copying the host-target bench binary to
`/tmp/compare_ffi_bin` and running `STRUCTURED_ZSTD_BENCH_BIN=...
bash run-benchmarks.sh` re-executed the binary directly, no
cargo invocation.
Expected wall-clock impact: each shard drops ~5 min of build,
leaving the criterion measurement itself as the dominant cost.
i686-level22 (the bottleneck shard) was ~30 min, now closer to
~20-22 min — full CI bench matrix should land around ~25 min
end-to-end including the 3 bench-build jobs running in parallel
with the lint/test phase.
Refs #122 Phase 3.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
zstd/src/encoding/match_generator.rs (2)
200-207: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift — Collapse strategy resolution to a single source of truth.

`params.backend` comes from `LEVEL_TABLE`, while `strategy_tag` is recomputed separately via `StrategyTag::for_compression_level(level)`. The Line 666 assert only checks backend-family agreement, so a `BtOpt`/`BtUltra`/`BtUltra2` drift still passes because all three are `HashChain`. That would silently combine one level's HC config with another strategy's const-gated pricing/hash3 behavior. Prefer carrying the resolved `StrategyTag` in `LevelParams` (or deriving `BackendTag` from it) so `reset()` consumes one mapping instead of two.

Also applies to: 655-670
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@zstd/src/encoding/match_generator.rs` around lines 200 - 207, LevelParams currently stores backend (params.backend) from LEVEL_TABLE while reset() separately recomputes strategy_tag via StrategyTag::for_compression_level(level), allowing mismatches between backend variants that share a family (e.g., BtOpt/BtUltra/BtUltra2) to slip through; change LevelParams to carry the resolved StrategyTag instead of raw backend (or derive backend via StrategyTag), update places that read params.backend to use params.strategy_tag (or strategy_tag.backend()), and make reset() consume the single mapping (using the stored StrategyTag) so all config (hc, pricing/hash3 gating, etc.) is derived from one authoritative strategy value rather than two sources.
3234-3302: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win — Drop the test-only `hash3_log` escape hatch.

The `has_hash3 => BtUltra2` branch makes this shim less strict than production: tests can get `BtUltra2` hash3 behavior without ever calling `configure()` or setting a matching `strategy_tag`. That weakens the new contract you just introduced and can let fixture drift pass unnoticed. I'd rather require tests to configure the matcher the same way production does and dispatch purely from `self.strategy_tag`.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@zstd/src/encoding/match_generator.rs` around lines 3234 - 3302, The test-only escape hatch using table.hash3_log must be removed so dispatch in collect_optimal_candidates is purely driven by self.strategy_tag; delete the has_hash3 check and the branch that maps Fast|Dfast|Greedy|Lazy when has_hash3 => to BtUltra2, and have those tags directly call collect_optimal_candidates_initialized::<strategy::Lazy, false> (or the appropriate strategy) while the BT tags (StrategyTag::BtUltra2, ::BtUltra, ::BtOpt) keep their existing collect_optimal_candidates_initialized::<..., true> calls; update any test fixtures to call configure() so they set strategy_tag instead of relying on table.hash3_log.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e80dbeff-6d2b-4cfb-ac61-a2791db2acd1
📒 Files selected for processing (3)
zstd/benches/support/mod.rs
zstd/src/encoding/match_generator.rs
zstd/tests/common/mod.rs
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 261-272: The "Upload benchmark shard artifacts" step is missing a
retention setting and should match the 7-day retention used elsewhere; update
the actions/upload-artifact@v4 step named "Upload benchmark shard artifacts" to
include a retention-days: 7 value under its with block so the intermediate
benchmark shard artifacts (e.g., benchmark-results.${{ matrix.bench.id }}.${{
matrix.level }}.json and the other matrix-based files) are kept for 7 days only.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c40c2d25-3823-46de-a20b-23fd16d6e110
📒 Files selected for processing (2)
.github/scripts/run-benchmarks.sh
.github/workflows/ci.yml
Two more findings from PR #123 review:

(A) `LevelParams` now carries `strategy_tag: StrategyTag` instead of a stand-alone `backend: BackendTag`. The backend family is derived via `LevelParams::backend() -> StrategyTag::backend()` so there is no second authoritative mapping from `CompressionLevel` to backend, and `BtOpt`/`BtUltra`/`BtUltra2` drift inside the `HashChain` family is no longer possible (the pre-fix `params.backend == HashChain` check was equally happy with any of the three). `MatchGeneratorDriver::reset()` reads `params.strategy_tag` directly and drops the parallel `StrategyTag::for_compression_level(level)` call, eliminating the drift surface the review flagged.

(B) The `#[cfg(test)] HcMatchGenerator::collect_optimal_candidates` dispatcher no longer carries the `hash3_log != 0 => BtUltra2` escape hatch. Tests that wired `hash3_log` directly without calling `configure()` are gone or migrated:
- `hc_collect_optimal_candidates_uses_hash3_when_chain_depth_zero`
- `hc_collect_optimal_candidates_hash3_updates_skipped_prefix_positions`
- `hc_hash3_tail_match_advances_update_cursor_on_early_return`

All three forced `search_depth = 0` together with `hash3_log != 0` — an HC-chain-walker-only fixture state that production never reaches (hash3 is BtUltra2-exclusive and BtUltra2 always uses `search_depth = 512`). The end-to-end hash3 invariants stay covered by the level22 roundtrip + donor parity ratio gate.

Plus the inline #13 nit: `retention-days: 7` on the benchmark shard upload-artifact step.

`LEVEL_TABLE` entries migrate from `backend: BackendTag::*` to `strategy_tag: StrategyTag::{Fast,Dfast,Greedy,Lazy,BtOpt,BtUltra,BtUltra2}` per band. `resolve_level_params` override sites (`Uncompressed`, negative levels, level22 source-size override) emit a `strategy_tag` directly.

457/457 nextest (was 460, minus 3 deleted tests), clippy clean (default + tests + benches), fmt clean, build × 2. Refs #122 Phase 3.
@coderabbitai Re: the two outside-diff findings on match_generator.rs — both fixed in b63a37c. (A) LevelParams now carries strategy_tag instead of backend, with backend() derived via StrategyTag::backend(); reset() drops the parallel for_compression_level call so the level→strategy mapping has a single authoritative source. (B) The hash3_log escape hatch is removed; the three tests that depended on the impossible (uses_bt=false, hash3_log!=0) state are deleted — those donor-parity invariants are now covered end-to-end by the level22 ratio gate. |
Thanks for the clear commit reference.
The repo ships a `zstd/fuzz/` setup (5 targets — decode / encode / interop / huff0 / fse) and a populated `artifacts/` regression corpus, but the CI workflow never invoked any of it. Past crashes could resurface and be merged without anything catching them at PR time. This commit wires `cargo-fuzz` into CI alongside the benchmark matrix: - Matrix of 5 fuzz targets, each shard runs on its own `ubuntu-latest` runner so total wall-clock is bounded by the slowest single target. - `taiki-e/install-action@v2` installs `cargo-fuzz`; nightly Rust toolchain is required (libFuzzer instrumentation). - Replay phase: every artifact in `zstd/fuzz/artifacts/<target>/` is fed through `cargo fuzz run <target> <files>` first. Re-introducing a previously fixed crash fails CI on the original reproducer. - Fresh fuzzing phase: `cargo fuzz run <target> -- -max_total_time=90 -timeout=30`. libFuzzer's own time cap is the inner budget; the `timeout-minutes: 10` on the job is the outer ceiling. - On failure the shard uploads `zstd/fuzz/artifacts/<target>/` as `fuzz-<target>-artifacts` (14-day retention) so the new reproducer survives the runner cleanup. Refs #122 Phase 3.
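A hedged sketch of the workflow shape the bullets describe; the matrix targets, action pins, time budgets, and retention period come from the commit message, while step names and the exact YAML layout are assumptions.

```yaml
fuzz:
  strategy:
    matrix:
      target: [decode, encode, interop, huff0, fse]
  runs-on: ubuntu-latest
  timeout-minutes: 10            # outer ceiling per shard
  steps:
    - uses: actions/checkout@v4
    - uses: dtolnay/rust-toolchain@nightly   # libFuzzer instrumentation needs nightly
    - uses: taiki-e/install-action@v2
      with:
        tool: cargo-fuzz
    - name: Replay regression corpus
      working-directory: zstd/fuzz
      run: |
        shopt -s nullglob
        files=(artifacts/${{ matrix.target }}/*)
        if (( ${#files[@]} )); then
          cargo fuzz run ${{ matrix.target }} "${files[@]}"
        fi
    - name: Fresh fuzzing
      working-directory: zstd/fuzz
      run: cargo fuzz run ${{ matrix.target }} -- -max_total_time=90 -timeout=30
    - name: Upload reproducers on failure
      if: failure()
      uses: actions/upload-artifact@v4
      with:
        name: fuzz-${{ matrix.target }}-artifacts
        path: zstd/fuzz/artifacts/${{ matrix.target }}/
        retention-days: 14
```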
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 141-150: The duplicate target lists (the bench entries with ids
x86_64-gnu, i686-gnu, x86_64-musl plus the build matrix, AGGREGATE_TARGETS,
per-target upload steps and per-target pages downloads) should be centralized
into a single source of truth and referenced everywhere; create one canonical
definition (e.g., a YAML anchor or a single matrix/env variable named
AGGREGATE_TARGET_LIST or targets_anchor) that contains the three target objects
(with target_triple and setup_command) and then replace the repeated inline
lists in the build matrix, benchmark matrix, AGGREGATE_TARGETS expansion, upload
steps, and pages-download steps to reference that single definition so future
target changes only need to be made once.
- Around line 213-216: The CI shard configuration for id: x86_64-musl currently
installs build-only packages in setup_command; split build-time and runtime
steps by moving compilation/tooling installs (e.g., "musl-tools") into a
dedicated setup_command used for build shards and add a separate
runtime_setup_command for shards that only execute the prebuilt binary
(STRUCTURED_ZSTD_BENCH_BIN) so runtime shards avoid installing musl-tools;
update the x86_64-musl entry to use setup_command for build dependencies and
runtime_setup_command for only the packages required to run the artifact.
In `@zstd/src/encoding/match_generator.rs`:
- Around line 413-418: The migration comments around the compile-time strategy
tag are stale: they still talk about future work (compress_block::<S> dispatch,
parse-mode removal, and a debug_assert! in reset()) even though the refactor has
landed; update the comment blocks near the strategy/tag explanation and the
other noted ranges (including mentions around reset(), active_backend, and
driver/HC invariants) to accurately describe the current implementation (that
compress_block::<S> dispatch has been implemented, parse-mode is removed, and
there is no debug_assert! in reset()), and replace speculative wording with
concrete statements about the existing invariants between the strategy tag and
active_backend and how the driver/HC invariants are maintained.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 76f332f3-3b93-49eb-911a-eaa266e62801
📒 Files selected for processing (2)
.github/workflows/ci.yml
zstd/src/encoding/match_generator.rs
- zstd/fuzz/Cargo.toml had no `edition` field, defaulting to 2015 where `use structured_zstd::...` requires `extern crate`; targets failed to compile with `unresolved module or unlinked crate`.
- The root `rust-toolchain.toml` pins stable, overriding `dtolnay/rust-toolchain@nightly` once `cargo fuzz` shells out to `cargo build` inside the fuzz crate. `cargo fuzz` needs `-Z sanitizer` which stable rejects. Set `RUSTUP_TOOLCHAIN: nightly` at job env so the override survives the sub-cargo invocation.

Verified locally: all 5 targets build, `cargo fuzz run fse` completed 62k runs in 10s with no crashes.
- New `bench-matrix` job emits the canonical bench targets JSON; `bench-build` and `benchmark` consume it via `fromJSON(needs.bench-matrix.outputs.targets)`.
- Each target now carries separate `build_setup` (run only on `bench-build`) and `runtime_setup` (run only on `benchmark` shards), so `x86_64-musl` no longer reinstalls `musl-tools` on every runtime shard.
- `benchmark-aggregate` ships a single `benchmark-aggregated` artifact with per-target consolidated files; `benchmark-pages` downloads that one artifact instead of three named ones. `merge-benchmarks.py` rglob's so the layout change is transparent.
- `AGGREGATE_TARGETS` env now sourced from `needs.bench-matrix.outputs.ids_csv`.

Also: refresh stale migration comments in `match_generator.rs` that described the const-generic dispatch as future work (the refactor has landed; `parse_mode` field is retained only for in-crate test callers, strategy-shaped predicates fold at codegen time).
`encode_szstd_uncompressed` and `encode_szstd_compressed` drained the input `Read` into a local buffer and then passed the now-empty `Read` into `compress_to_vec`, so the compression branches of the interop roundtrip silently encoded zero bytes. The two regression artifacts shipped under `artifacts/interop/` triggered the assertion `decoded ([]) != data ([0])` on every CI replay.
- Take `&[u8]` directly instead of the drain-then-discard `Read` dance.
- `encode_szstd_compressed` now uses `CompressionLevel::Default` instead of being a verbatim duplicate of the uncompressed variant.

Verified: both regression artifacts replay cleanly under `cargo fuzz run interop artifacts/interop/*`.
The `cargo-fuzz` binary installed via `taiki-e/install-action@v2` is
statically linked against musl on Linux; its `default_target()` probe
picks `x86_64-unknown-linux-musl`, which then fails because
AddressSanitizer cannot link against a static libc:
error: sanitizer is incompatible with statically linked libc,
disable it using `-C target-feature=-crt-static`
error[E0463]: can't find crate for `core`
(The musl rust-std is not installed on the runner either.)
Pass `--target x86_64-unknown-linux-gnu` explicitly via a job-level
`FUZZ_TARGET_TRIPLE` env so both the corpus-replay step and the
fresh-fuzz step bypass the probe and link against glibc, which is
what the dtolnay/rust-toolchain@nightly install has stdlib for.
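The fix above can be sketched as a job-level env plus an explicit `--target` on each cargo-fuzz invocation. Only the variable name, the triple, and the libFuzzer flags come from the commit messages; the step layout is an assumption.

```yaml
env:
  FUZZ_TARGET_TRIPLE: x86_64-unknown-linux-gnu   # bypass default_target() musl probe
steps:
  - name: Replay regression corpus
    working-directory: zstd/fuzz
    run: cargo fuzz run decode --target "$FUZZ_TARGET_TRIPLE" artifacts/decode/*
  - name: Fresh fuzzing
    working-directory: zstd/fuzz
    run: cargo fuzz run decode --target "$FUZZ_TARGET_TRIPLE" -- -max_total_time=90 -timeout=30
```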
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
zstd/src/encoding/match_generator.rs (1)
4589-4595: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win — Build this fixture through `configure()` instead of mutating strategy flags by hand.

After the strategy-tag refactor, `BtUltra2` implies a matching hash3 setup. Flipping `strategy_tag`/`is_btultra2`/`uses_bt` here leaves `hash3_log` and `hash3_table` in the old state, so this test now exercises a configuration production never constructs and can trip the new debug invariant in `collect_optimal_candidates_initialized_body`.

Also applies to: 4611-4615
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@zstd/src/encoding/match_generator.rs` around lines 4589 - 4595, The test constructs an HC fixture by manually mutating strategy_tag/is_btultra2/uses_bt which leaves hash3_log/hash3_table inconsistent and triggers the debug invariant in collect_optimal_candidates_initialized_body; instead build the fixture via the existing configure() path so all derived fields are set consistently for StrategyTag::BtUltra2 (e.g., call the configure() helper or the HC::configure method with BtUltra2 rather than setting hc.strategy_tag, hc.table.is_btultra2, hc.table.uses_bt, etc.), and apply the same change to the other instance around the 4611-4615 region so collect_optimal_candidates sees a properly initialized configuration.
♻️ Duplicate comments (1)
zstd/src/encoding/match_generator.rs (1)
2738-2748: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winRemove the stale
self.parse_modereference from this doc comment.
HcMatchGeneratorno longer has aparse_modefield, so this paragraph now documents a state that does not exist. Point it at the test-onlystart_matching()/strategy_tagdispatcher instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@zstd/src/encoding/match_generator.rs` around lines 2738 - 2748, The doc comment incorrectly references a non-existent self.parse_mode field on HcMatchGenerator; remove that sentence and update the text to say the runtime dispatch is now via the test-only inherent method HcMatchGenerator::start_matching() which uses the strategy_tag dispatcher (not parse_mode), and keep the rest describing that MatchGeneratorDriver::compress_block branches on the compile-time S::USE_BT so monomorphisations see either start_matching_lazy/start_matching_optimal as appropriate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 05f44548-751d-4702-8206-aad701f2de38
📒 Files selected for processing (4)
.github/workflows/ci.yml
zstd/fuzz/Cargo.toml
zstd/fuzz/fuzz_targets/interop.rs
zstd/src/encoding/match_generator.rs
`btultra_and_btultra2_both_keep_dictionary_candidates` set `hc.strategy_tag`, `hc.table.is_btultra2`, `hc.table.uses_bt`, `hc.backend` and `hc.hc.search_depth` by hand to flip the matcher between BtUltra2 and BtUltra between the two `collect_optimal_candidates` calls. After the strategy-tag refactor, BtUltra2 implies a non-zero `hash3_log` and an allocated `hash3_table`; the hand-mutated path left those fields in their default zero shape, so the test exercised a configuration the production `configure()` path never constructs and risked tripping the `Strategy::USE_HASH3 ⇒ hash3_log != 0` debug invariant in `collect_optimal_candidates_initialized_body`. The fixture now goes through `HcMatchGenerator::configure()` for each phase with explicit `HcConfig` + `StrategyTag` + window_log inputs. The history-prime closure is reused so both phases see identical content and dictionary-limit state. Also drop a stale `self.parse_mode` reference in the doc comment for `start_matching_strategy` and a similar stale reference next to `HcOptimalCostProfile::const_for_strategy` — the field was removed in the parse-mode rip-out commit and the comments now describe the test-only inherent `start_matching`'s `strategy_tag` dispatcher.
The 5-way matrix paid a fresh asan+sancov build of `structured-zstd` (~2 min) on every shard — the dominant cost — for ~3 min of wall- clock parallelism. Wall-clock is already bounded by the bench matrix (~25 min) which runs in parallel, so the matrix split was buying nothing useful. Run all five targets sequentially in one runner instead: the first target builds, the remaining four reuse cargo's incremental cache (~seconds each). Halves the compute spend while keeping each target's output grouped under `::group::` markers for log navigation, and the fresh-fuzz step still respects the 90s libFuzzer budget per target.
Summary
Phase 3 of the #111 encoder architecture rewrite — replace runtime enum `HcParseMode` + `MatcherBackend` dispatch with const-generic `Strategy` monomorphisation. Closes #122.

After Phase 1 (#119) restructured the encoder into per-backend types and Phase 2 (#121) removed hot-path `saturating_*`, the residual default-level / level-22 perf gap was hypothesised to come from runtime `match self.parse_mode` / `match self.active_backend` branches firing on every encoded position. This phase lifts those choices into the type system: each level → `Strategy` mapping is now a concrete ZST that implements the `Strategy` trait. Hot-path entry points are generic over `S: Strategy`, the compiler monomorphises the inner loops per variant, and every `if S::FOO` branch resolves at codegen time.

Architectural mechanism

`LevelParams` is the single source of truth from `CompressionLevel` to strategy: it carries `strategy_tag: StrategyTag`, and the runtime `BackendTag` consumed by the driver dispatcher is derived via `StrategyTag::backend()`. `reset()` reads `params.strategy_tag` directly — there is no second `for_compression_level(level)` recomputation that could drift.

Commit map (20 commits)
Core dispatch (commits 1-10):
- `c7c0c76e` — `Strategy` trait + 7 ZST types (Fast/Dfast/Greedy/Lazy/BtOpt/BtUltra/BtUltra2). Associated consts: `BACKEND`, `MIN_MATCH`, `ACCURATE_PRICE`, `FAVOR_SMALL_OFFSETS`, `USE_HASH3`, `USE_BT`, `OPT_LEVEL`, `MAX_CHAIN_DEPTH`, `SUFFICIENT_MATCH_LEN`. `StrategyTag::for_level(u8)` mirrors `LEVEL_TABLE` row-for-row.
- `46b0ef94` — plumb `StrategyTag` into `MatchGeneratorDriver`.
- `e27a2e96` — `compress_block<S>` generic entry on the driver.
- `5c5adfaf` — `S::USE_BT` replaces `match parse_mode` in HC dispatch.
- `8f599fba` — DP body const-folds `abort_on_worse_match` / `opt_level` from `S::OPT_LEVEL`. Primary hot-path commit.
- `d5d29ac1` — diagnostic perf probes (`#[ignore]`'d).
- `dd6f23cd` — `S::USE_HASH3` const-gates the hash3 lookup.
- `d8355d16` — btultra2 seed pass gated on `S::OPT_LEVEL && S::USE_HASH3`.
- `23839074` — `HcOptimalCostProfile::const_for_strategy<S>()`.
- `30f1c9a7` — full rip-out of the `HcParseMode` + `MatcherBackend` runtime enums, the `parse_mode` field on `HcConfig` and `HcMatchGenerator`, `Strategy::PARSE_MODE`, and the runtime `for_mode(HcParseMode, bool)` API.

CI infrastructure (commits 11-12, 16, 19):
- `97a8ab28` — split bench matrix per level (target × level = 18 shards) + `benchmark-aggregate` job; ~25 min wall-clock.
- `18bb80d3` — `cargo fmt --check` fix.
- `ee62ec0a` — separate `bench-build` (3 jobs) from `benchmark` (18 shards). Shards download a pre-built binary via `STRUCTURED_ZSTD_BENCH_BIN`; no `cargo` invocation in shards. Saves ~5 min per shard.
- `79f0955c` — bench target inventory centralised behind a new `bench-matrix` job that emits the canonical JSON; `bench-build`, `benchmark`, `benchmark-aggregate`, `benchmark-pages` all consume `fromJSON(needs.bench-matrix.outputs.targets)`. Split per-target `build_setup` (run only on `bench-build`) from `runtime_setup` (run only on `benchmark` shards) — `x86_64-musl` no longer reinstalls `musl-tools` on every runtime shard. `benchmark-aggregate` ships a single combined `benchmark-aggregated` artifact, replacing the three hardcoded per-target uploads. Also refreshes stale migration comments in `match_generator.rs` (the const-generic dispatch had landed but the comments still framed it as future work).

Post-review hardening (commits 13-15, 17-18, 20):
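Two pieces of this work are small enough to sketch together: the `for_level` band map introduced in the core-dispatch commits and the `i32` clamp added below in `372ef501`, which stops out-of-range numeric levels from wrapping through a `u8` cast. The bands are taken from the PR text; the clamp bounds and the `for_numeric_level` helper name are illustrative assumptions:

```rust
// Hedged sketch: level → strategy band map plus the i32 clamp fix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StrategyTag {
    Fast,
    Dfast,
    Greedy,
    Lazy,
    BtOpt,
    BtUltra,
    BtUltra2,
}

// Band map per the PR: 1 → Fast, 2-3 → Dfast, 4 → Greedy,
// 5-15 → Lazy, 16-17 → BtOpt, 18-19 → BtUltra, 20-22 → BtUltra2.
// (0 is folded into the Fast band here for totality.)
pub const fn for_level(level: u8) -> StrategyTag {
    match level {
        0..=1 => StrategyTag::Fast,
        2..=3 => StrategyTag::Dfast,
        4 => StrategyTag::Greedy,
        5..=15 => StrategyTag::Lazy,
        16..=17 => StrategyTag::BtOpt,
        18..=19 => StrategyTag::BtUltra,
        _ => StrategyTag::BtUltra2, // 20-22; higher inputs are pre-clamped
    }
}

// Clamp in i32 space *before* the u8 cast: `Level(256)` saturates to
// 22 (BtUltra2) instead of wrapping to 0, and negative levels fall
// through to the Fast band — matching the behaviour the fix restores.
pub fn for_numeric_level(level: i32) -> StrategyTag {
    for_level(level.clamp(1, 22) as u8)
}
```

The regression here is easy to see without the clamp: `256 as u8 == 0`, so a wrapped cast silently lands in the wrong band.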
- `372ef501` — `strategy_tag` field on `HcMatchGenerator` (mirrored from the driver during `configure()`). Test-only dispatchers route BtOpt/BtUltra/BtUltra2 to distinct monomorphisations instead of collapsing. `Level(n)` clamp moved to `i32` before the `u8` cast (`Level(256)` no longer wraps to `Dfast`); regression test added. `tests/common::dump_block_breakdown` shared parser, `--tests` truncated-frame-header guard. `supported_levels_filtered()` panics on unknown bench filter tokens.
- `910db7b3` — collapse pass2-era profile-duplicate tests, identity `BackendTag → BackendTag` match dropped, seed pass uses `type S = BtUltra2` consistently, `parse_zstd_block_breakdown` returns `None` on a truncated header, bench filter panics for any unknown token (not just an empty result).
- `b63a37cd` — `LevelParams` switched from `backend: BackendTag` to `strategy_tag: StrategyTag` with a derived `backend()` method. `reset()` reads `params.strategy_tag` directly. Three `hc_collect_optimal_candidates_*_hash3_*` / `hc_hash3_tail_match_*` tests removed (they relied on a `hash3_log != 0` test-only escape hatch that masked an unreachable production state). Plus `retention-days: 7` on bench shard artifacts.
- `bdd88ffc` — new `fuzz` CI matrix (5 targets — decode/encode/interop/huff0/fse). Each shard replays the existing `zstd/fuzz/artifacts/` regression corpus first, then runs 90s of fresh libFuzzer time. Crash artifacts uploaded on failure (14-day retention).
- `49f7f2c2` — fuzz CI follow-up: `zstd/fuzz/Cargo.toml` had no `edition` field (defaulted to 2015, breaking `use structured_zstd::…`); pinned `RUSTUP_TOOLCHAIN=nightly` at the job env level so `cargo fuzz` survives the `rust-toolchain.toml` stable pin once it shells out to its inner `cargo build`.
- `bd02dc6e` — `fuzz_targets/interop.rs` bug: `encode_szstd_uncompressed` / `encode_szstd_compressed` drained the input `Read` into a local `Vec` and then compressed the now-empty `Read`, so the encoding branches silently encoded zero bytes and the two corpus artifacts under `artifacts/interop/` triggered an assertion on every CI replay. Takes `&[u8]` directly now; `encode_szstd_compressed` uses `CompressionLevel::Default` instead of duplicating the uncompressed variant.

Perf delta vs `perf/post-pr-121-baseline` (host x86_64-darwin, best of 5)

Note: the original #122 perf targets (≥30 % default-level, ≥15 % fast-level) were not hit. The const-generic dispatch by itself didn't move the needle on host because the runtime reads it replaces (`self.parse_mode`, `self.use_hash3`) were already cache-hot single-instruction loads with perfectly-predicted branches. The structural foundation is the real deliverable; the perf payoff is gated on Phase 4 (BT/HC specialization + donor fast paths).

Architectural payoff
- `compress_block::<Fast>` cannot reach `start_matching_optimal` (no `USE_BT`); BtUltra2-only hash3 code is unreachable from non-BtUltra2 monomorphisations.
- `LevelParams::strategy_tag` is the only authoritative mapping from `CompressionLevel` to strategy; the driver derives `BackendTag` from it instead of parallel-resolving the level twice.
- `HcParseMode` and `MatcherBackend` removed; a single `BackendTag` + `StrategyTag` live in `strategy.rs`.
- `bench-matrix` JSON output is consumed by every downstream bench job — adding a new target is a one-row change.
- `zstd/fuzz/` setup now runs every PR with regression-corpus replay + 90s libFuzzer per target. The replay step also surfaced (and we fixed) a real bug in the `interop` target that had silently encoded zero bytes since the fuzz crate was first written.

Verification
- Tests green at every commit (three removed in `b63a37cd` — they tested a production-unreachable `hash3_log != 0` + `search_depth = 0` state that the removed test-only escape hatch made possible).
- `cargo clippy` clean (default + `--tests` + `--benches`).
- `cargo fmt --check` clean.
- Builds (`--no-default-features`) at every commit.
- Ratio gate (`level22_sequences_match_donor_on_corpus_proxy`) green at every commit.
- Diagnostic perf probes (`#[ignore]`'d) document: `STRUCTURED_ZSTD_BENCH_BIN=...`.

Acceptance criteria from #122
- `Strategy` trait + 7 concrete strategy types
- `compress_block<S: Strategy>` top-level entry — `HcParseMode` runtime dispatch removed from inner loops
- No `match self.parse_mode` or `if self.use_hash3` style branches on the encoded-position hot path
- `HcParseMode` enum deleted (plus `MatcherBackend` deleted)
- `cargo nextest run --features hash,std,dict_builder` 100 % pass
- `level22_sequences_match_donor_on_corpus_proxy` ratio gate PASS on every commit
- Builds (`--no-default-features`) + `cargo clippy --all-targets -- -D warnings` clean

Closes #122. Refs #111.