
perf(encoding): #111 Phase 3 — const-generic Strategy dispatch #123

Merged
polaz merged 23 commits into main from feat/#122-phase3-const-generic-strategy
May 14, 2026

Conversation

@polaz
Member

@polaz polaz commented May 14, 2026

Summary

Phase 3 of the #111 encoder architecture rewrite — replace runtime enum HcParseMode + MatcherBackend dispatch with const-generic Strategy monomorphisation. Closes #122.

After Phase 1 (#119) restructured the encoder into per-backend types and Phase 2 (#121) removed hot-path saturating_*, the residual default-level / level-22 perf gap was hypothesised to come from runtime match self.parse_mode / match self.active_backend branches firing on every encoded position. This phase lifts those choices into the type system: each level → Strategy mapping is now a concrete ZST that implements the Strategy trait. Hot-path entry points are generic over S: Strategy, the compiler monomorphises the inner loops per variant, and every if S::FOO branch resolves at codegen time.
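A minimal sketch of the mechanism (the trait and ZST names follow the PR description; the const values and the fn body are illustrative stand-ins, not the crate's real definitions):

```rust
// Sketch of the const-generic dispatch: a trait of associated consts,
// ZST strategy types, and a generic hot-path fn. The compiler emits one
// copy of the fn per strategy, and every `if S::FOO` is a compile-time
// constant in that copy, so the dead arm is dropped at codegen time.
pub trait Strategy {
    const USE_BT: bool;    // true => optimal (BT) parser
    const USE_HASH3: bool; // true => hash3 short-match lookup
    const OPT_LEVEL: u8;   // DP refinement level
}

pub struct Lazy;
impl Strategy for Lazy {
    const USE_BT: bool = false;
    const USE_HASH3: bool = false;
    const OPT_LEVEL: u8 = 0;
}

pub struct BtUltra2;
impl Strategy for BtUltra2 {
    const USE_BT: bool = true;
    const USE_HASH3: bool = true;
    const OPT_LEVEL: u8 = 2;
}

// Stand-in for a hot-path entry point: the branch resolves per
// monomorphisation, with no runtime enum read.
pub fn parser_for<S: Strategy>() -> &'static str {
    if S::USE_BT { "optimal" } else { "lazy" }
}
```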

Architectural mechanism

Matcher::start_matching                              // 7-arm match on StrategyTag (per block)
 └─ compress_block::<S>                              // S::BACKEND const match
     ├─ Simple/Dfast/Row                             // backends without parse_mode
     └─ HcMatchGenerator::start_matching_strategy::<S>
         ├─ S::USE_BT == false → start_matching_lazy
         └─ S::USE_BT == true  → start_matching_optimal::<S>
             ├─ HcOptimalCostProfile::const_for_strategy::<S>()
             ├─ should_run_btultra2_seed_pass::<S>           // const false unless S = BtUltra2
             └─ build_optimal_plan::<S>
                 └─ build_optimal_plan_impl::<S, ACC, FAV>
                     └─ SIMD wrapper::<S, ACC, FAV>
                         └─ build_optimal_plan_impl_body!(S)
                             ├─ S::OPT_LEVEL == 0  → abort_on_worse_match
                             ├─ S::OPT_LEVEL >= 2  → opt_level (refined)
                             └─ $collect::<S, true>
                                 └─ collect_optimal_candidates_initialized_body!(S)
                                     └─ S::USE_HASH3 → hash3 lookup (const-gated)

LevelParams is the single source of truth from CompressionLevel to strategy: it carries strategy_tag: StrategyTag, and the runtime BackendTag consumed by the driver dispatcher is derived via StrategyTag::backend(). reset() reads params.strategy_tag directly — there is no second for_compression_level(level) recomputation that could drift.
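The shape of that derivation, sketched below. The tag-to-backend mapping here is assumed for illustration; what matters is that BackendTag is computed from the one authoritative StrategyTag rather than resolved from the level a second time:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum BackendTag { Simple, Dfast, Row, HashChain }

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum StrategyTag { Fast, Dfast, Greedy, Lazy, BtOpt, BtUltra, BtUltra2 }

impl StrategyTag {
    // The driver derives its backend from the already-resolved tag, so
    // there is no parallel level -> backend table that could drift.
    // (This particular arm assignment is an assumption of the sketch.)
    pub const fn backend(self) -> BackendTag {
        match self {
            StrategyTag::Fast => BackendTag::Simple,
            StrategyTag::Dfast => BackendTag::Dfast,
            StrategyTag::Greedy => BackendTag::Row,
            StrategyTag::Lazy
            | StrategyTag::BtOpt
            | StrategyTag::BtUltra
            | StrategyTag::BtUltra2 => BackendTag::HashChain,
        }
    }
}
```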

Commit map (20 commits)

Core dispatch (commits 1-10):

  1. c7c0c76e — Strategy trait + 7 ZST types (Fast / Dfast / Greedy / Lazy / BtOpt / BtUltra / BtUltra2). Associated consts: BACKEND, MIN_MATCH, ACCURATE_PRICE, FAVOR_SMALL_OFFSETS, USE_HASH3, USE_BT, OPT_LEVEL, MAX_CHAIN_DEPTH, SUFFICIENT_MATCH_LEN. StrategyTag::for_level(u8) mirrors LEVEL_TABLE row-for-row.
  2. 46b0ef94 — plumb StrategyTag into MatchGeneratorDriver.
  3. e27a2e96 — compress_block<S> generic entry on the driver.
  4. 5c5adfaf — S::USE_BT replaces match parse_mode in HC dispatch.
  5. 8f599fba — DP body const-folds abort_on_worse_match / opt_level from S::OPT_LEVEL. Primary hot-path commit.
  6. d5d29ac1 — diagnostic perf probes (#[ignore]'d).
  7. dd6f23cd — S::USE_HASH3 const-gates the hash3 lookup.
  8. d8355d16 — btultra2 seed pass gated on S::OPT_LEVEL && S::USE_HASH3.
  9. 23839074 — HcOptimalCostProfile::const_for_strategy<S>().
  10. 30f1c9a7 — full rip-out of the HcParseMode + MatcherBackend runtime enums, the parse_mode field on HcConfig and HcMatchGenerator, Strategy::PARSE_MODE, and the runtime for_mode(HcParseMode, bool) API.

CI infrastructure (commits 11-12, 16, 19):

  11. 97a8ab28 — split bench matrix per level (target × level = 18 shards) + benchmark-aggregate job; ~25 min wall-clock.
  12. 18bb80d3 — cargo fmt --check fix.
  16. ee62ec0a — separate bench-build (3 jobs) from benchmark (18 shards). Shards download a pre-built binary via STRUCTURED_ZSTD_BENCH_BIN; no cargo invocation in shards. Saves ~5 min per shard.
  19. 79f0955c — bench target inventory centralised behind a new bench-matrix job that emits the canonical JSON; bench-build, benchmark, benchmark-aggregate, benchmark-pages all consume fromJSON(needs.bench-matrix.outputs.targets). Split per-target build_setup (run only on bench-build) from runtime_setup (run only on benchmark shards) — x86_64-musl no longer reinstalls musl-tools on every runtime shard. benchmark-aggregate ships a single combined benchmark-aggregated artifact, replacing the three hardcoded per-target uploads. Also refreshes stale migration comments in match_generator.rs (the const-generic dispatch had landed but the comments still framed it as future work).

Post-review hardening (commits 13-15, 17-18, 20):

  13. 372ef501 — strategy_tag field on HcMatchGenerator (mirrored from driver during configure()). Test-only dispatchers route BtOpt/BtUltra/BtUltra2 to distinct monomorphisations instead of collapsing. Level(n) clamp moved to i32 before u8 cast (Level(256) no longer wraps to Dfast); regression test added. tests/common::dump_block_breakdown shared parser, --tests truncated frame header guard. supported_levels_filtered() panics on unknown bench filter tokens.
  14. 910db7b3 — collapse pass2-era profile-duplicate tests, identity BackendTag → BackendTag match dropped, seed pass uses type S = BtUltra2 consistently, parse_zstd_block_breakdown returns None on truncated header, bench filter panics for any unknown token (not just empty result).
  15. b63a37cd — LevelParams switched from backend: BackendTag to strategy_tag: StrategyTag with derived backend() method. reset() reads params.strategy_tag directly. Three hc_collect_optimal_candidates_*_hash3_* / hc_hash3_tail_match_* tests removed (they relied on a hash3_log != 0 test-only escape hatch that masked an unreachable production state). Plus retention-days: 7 on bench shard artifacts.
  17. bdd88ffc — new fuzz CI matrix (5 targets — decode/encode/interop/huff0/fse). Each shard replays the existing zstd/fuzz/artifacts/ regression corpus first, then runs 90s of fresh libFuzzer time. Crash artifacts uploaded on failure (14-day retention).
  18. 49f7f2c2 — fuzz CI followup: zstd/fuzz/Cargo.toml had no edition field (defaulted to 2015, breaking use structured_zstd::…); pinned RUSTUP_TOOLCHAIN=nightly at the job env level so cargo fuzz survives the rust-toolchain.toml stable pin once it shells out to its inner cargo build.
  20. bd02dc6e — fuzz_targets/interop.rs bug: encode_szstd_uncompressed / encode_szstd_compressed drained the input Read into a local Vec and then compressed the now-empty Read, so the encoding branches silently encoded zero bytes and the two corpus artifacts under artifacts/interop/ triggered an assertion on every CI replay. Takes &[u8] directly now; encode_szstd_compressed uses CompressionLevel::Default instead of duplicating the uncompressed variant.

Perf delta vs perf/post-pr-121-baseline (host x86_64-darwin, best of 5)

Note: the original #122 perf targets (≥30 % default-level, ≥15 % fast-level) were not hit. The const-generic dispatch by itself didn't move the needle on host because the runtime reads it replaces (self.parse_mode, self.use_hash3) were already cache-hot single-instruction loads with perfectly-predicted branches. The structural foundation is the real deliverable; the perf payoff is gated on Phase 4 (BT/HC specialization + donor fast paths).

Scenario              Level  Baseline    HEAD        Δ
decodecorpus-z000033      1   19.36 ms    18.87 ms   −2.53 %
decodecorpus-z000033      3   92.60 ms    94.23 ms   +1.76 %
decodecorpus-z000033      7   91.17 ms    92.02 ms   +0.93 %
decodecorpus-z000033     11  105.45 ms   102.95 ms   −2.37 %
decodecorpus-z000033     22  372.44 ms   366.61 ms   −1.56 %
high-entropy-1m           1   0.741 ms    0.706 ms   −4.72 %
high-entropy-1m           7    1.19 ms     1.12 ms   −5.98 %
high-entropy-1m          11    1.66 ms     1.51 ms   −9.17 %
high-entropy-1m          22    2.67 ms     2.81 ms   +5.59 %
low-entropy-1m         1–22          —           —   ±1 % (noise)
large-log-stream          1   16.15 ms    16.13 ms   −0.14 %
large-log-stream         22   20.37 ms    19.81 ms   −2.74 %

Architectural payoff

  • Type system enforces correctness: compress_block::<Fast> cannot reach start_matching_optimal (no USE_BT); BtUltra2-only hash3 code is unreachable from non-BtUltra2 monomorphisations.
  • Single source of truth for strategy. LevelParams::strategy_tag is the only authoritative mapping from CompressionLevel to strategy; the driver derives BackendTag from it instead of parallel-resolving the level twice.
  • HcParseMode and MatcherBackend removed, single BackendTag + StrategyTag in strategy.rs.
  • Foundation for Phase 4-5 ready — LDM, block splitter, BT/HC specialization can plug into the per-monomorphisation hot path without restructuring dispatch.
  • CI bench wall-clock cut from ~45 min to ~25 min, with builds reused across the 18 bench shards (one build per target instead of one per shard). Bench target inventory now lives in a single bench-matrix JSON output consumed by every downstream bench job — adding a new target is a one-row change.
  • Fuzz coverage restored — the existing zstd/fuzz/ setup now runs every PR with regression-corpus replay + 90s libFuzzer per target. The replay step also surfaced (and we fixed) a real bug in the interop target that had silently encoded zero bytes since the fuzz crate was first written.

Verification

  • 457/457 nextest (3 tests removed in b63a37cd — they tested a production-unreachable hash3_log != 0 + search_depth = 0 state that the removed test-only escape hatch made possible).
  • cargo clippy clean (default + --tests + --benches).
  • cargo fmt --check clean.
  • Build × 2 (default + --no-default-features) at every commit.
  • Level22 ratio gate (level22_sequences_match_donor_on_corpus_proxy) green at every commit.
  • Three new diagnostic probes (#[ignore]'d) document:
    • high-entropy-1m level22 speedup origin (source-size-hint-gated raw fast path)
    • incompressibility detector false-positive check (mixed-content + textish + pattern-only — all ratios identical to C ±7 byte frame overhead)
    • corpus z000033 +0.08 % gap is block-split heuristic (233 vs 256 compressed blocks), not core compression
  • CI bench split round-tripped through a synthetic two-shard fixture; new bench-build pipeline smoke-tested locally with STRUCTURED_ZSTD_BENCH_BIN=....

Acceptance criteria from #122

  • Strategy trait + 7 concrete strategy types
  • compress_block<S: Strategy> top-level entry — HcParseMode runtime dispatch removed from inner loops
  • No match self.parse_mode or if self.use_hash3 style branches on the encoded-position hot path
  • HcParseMode enum deleted (plus MatcherBackend deleted)
  • cargo nextest run --features hash,std,dict_builder 100 % pass
  • level22_sequences_match_donor_on_corpus_proxy ratio gate PASS on every commit
  • Roundtrip + cross-validation tests pass on every commit
  • Build clean (default + --no-default-features) + cargo clippy --all-targets -- -D warnings clean
  • Perf targets NOT met — see report above. Phase 3 ships as architectural refactor; perf payoff deferred to Phase 4.

Closes #122. Refs #111.

Summary by CodeRabbit

  • Refactor
    • Encoder internals reorganized to a strategy-driven design, reducing runtime branching and improving hot-path performance.
  • CI
    • Added nightly fuzz smoke job and reworked benchmarking into build, per-level shards, and a new aggregation step; publishing updated to use the consolidated benchmark artifact.
    • Benchmark runner can now use a prebuilt bench binary.
  • New Tools
    • Added a script to merge per-level benchmark shards into per-target consolidated outputs.
  • Tests
    • Added several ignored/manual probe tests and new frame-analysis test utilities.

Review Change Stack

polaz added 11 commits May 14, 2026 19:31
Phase 3 commit 1: lay down the type-level skeleton that future commits
will route the encoder hot paths through. No call sites change — this
commit only adds new types alongside the existing runtime
`HcParseMode` / `MatcherBackend` dispatch.

- `Strategy` trait with associated consts mirroring the donor
  `ZSTD_compressionParameters` template parameters: `BACKEND`,
  `MIN_MATCH`, `ACCURATE_PRICE`, `FAVOR_SMALL_OFFSETS`, `USE_HASH3`,
  `USE_BT`, `OPT_LEVEL`, plus a `PARSE_MODE` bridge to the existing
  runtime enum.
- 7 concrete strategy types `Fast` / `Dfast` / `Greedy` / `Lazy` /
  `BtOpt` / `BtUltra` / `BtUltra2`. Levels 5-15 collapse onto `Lazy`
  because their differences are runtime `HcConfig` fields, not
  compile-time consts.
- `BackendTag` enum + `StrategyTag` runtime dispatcher with a
  `for_level(u8) -> StrategyTag` const fn that mirrors the donor
  `clevels.h` mapping (1 → Fast, 2-3 → Dfast, 4 → Greedy, 5-15 → Lazy,
  16-17 → BtOpt, 18-19 → BtUltra, 20-22 → BtUltra2).
- Bridge methods `StrategyTag::parse_mode()` / `::backend()` so the
  next commits can introduce a generic entry point that still hands
  off to the runtime-enum-consuming code.
- Test coverage:
  - Each `Strategy` impl's `PARSE_MODE` and `BACKEND` agree with the
    matching `StrategyTag` arm.
  - `for_level` returns the right tag at every band boundary plus a
    mid-band level.
  - `USE_BT` aligns with parse_mode (only BtOpt / BtUltra / BtUltra2
    walk the BT).
  - `USE_HASH3` is set exclusively for BtUltra2 (donor parity).
  - `ACCURATE_PRICE` / `FAVOR_SMALL_OFFSETS` track the
    `HcOptimalCostProfile::for_mode` runtime table row-for-row so the
    eventual `const_for::<S>` rewrite is a mechanical swap.

Refs #122 Phase 3.
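The band map described above can be sketched as a const fn over the raw level (the bands are taken from this commit message; the treatment of level 0 here is an assumption of the sketch — the real code clamps through LevelParams):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum StrategyTag { Fast, Dfast, Greedy, Lazy, BtOpt, BtUltra, BtUltra2 }

// Band map mirroring the donor clevels.h mapping: 1 -> Fast, 2-3 -> Dfast,
// 4 -> Greedy, 5-15 -> Lazy, 16-17 -> BtOpt, 18-19 -> BtUltra,
// 20-22 -> BtUltra2 (out-of-band inputs handled as an assumption here).
pub const fn for_level(level: u8) -> StrategyTag {
    match level {
        0 | 1 => StrategyTag::Fast,
        2..=3 => StrategyTag::Dfast,
        4 => StrategyTag::Greedy,
        5..=15 => StrategyTag::Lazy,
        16..=17 => StrategyTag::BtOpt,
        18..=19 => StrategyTag::BtUltra,
        _ => StrategyTag::BtUltra2,
    }
}
```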
Phase 3 commit 2: resolve the compile-time strategy tag once per
`reset()` and store it on the driver alongside the existing
`active_backend` runtime enum. No hot-path dispatch changes yet — the
field is set, debug-asserted to stay in lock-step with
`active_backend`, and covered by a new test.

- Add `StrategyTag::for_compression_level(CompressionLevel)` mirroring
  `resolve_level_params` exactly: named variants (`Uncompressed` →
  Fast, `Default` → Dfast, `Better`/`Best` → Lazy) and numeric
  `Level(n)` clamped through the existing `for_level(u8)` band map.
  Negative levels fall back to `Fast` (Simple matcher with elevated
  `hash_fill_step`), matching the `resolve_level_params` Simple-arm.
- Add `strategy_tag: StrategyTag` to `MatchGeneratorDriver`. Initialise
  to `StrategyTag::Fast` to match the existing `active_backend =
  MatcherBackend::Simple` default.
- Resolve `strategy_tag` from the requested `CompressionLevel` inside
  `reset()`, immediately after the existing `active_backend`
  assignment. A `debug_assert!` enforces the invariant
  `strategy_tag.backend() == active_backend` so future drift in
  `for_compression_level` or `resolve_level_params` is caught in
  debug builds.
- New regression test
  `driver_reset_keeps_strategy_tag_in_sync_with_active_backend` walks
  every band boundary (1, 2-3, 4, 5-15, 16-17, 18-19, 20-22) plus all
  four named variants and asserts both the expected tag and the
  matching backend.

This commit is purely additive — no behaviour change for any caller.
Sets up the data flow that commit 3 will branch the hot dispatch on
to enter `compress_block::<S>` monomorphisations.

Refs #122 Phase 3.
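The reset-time synchronisation described above, sketched with two stand-in variants (the real driver has seven, and the legacy backend resolution here is an inline stand-in for resolve_level_params):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum BackendTag { Simple, HashChain }

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum StrategyTag { Fast, Lazy } // 7 variants in the PR; 2 shown

impl StrategyTag {
    pub const fn for_level(level: u8) -> StrategyTag {
        if level <= 1 { StrategyTag::Fast } else { StrategyTag::Lazy }
    }
    pub const fn backend(self) -> BackendTag {
        match self {
            StrategyTag::Fast => BackendTag::Simple,
            StrategyTag::Lazy => BackendTag::HashChain,
        }
    }
}

pub struct Driver { strategy_tag: StrategyTag, active_backend: BackendTag }

impl Driver {
    pub fn reset(&mut self, level: u8) {
        // Existing runtime resolution (stand-in for resolve_level_params).
        self.active_backend =
            if level <= 1 { BackendTag::Simple } else { BackendTag::HashChain };
        // New: resolve the tag once per reset() and pin the invariant so
        // drift between the two mappings is caught in debug builds.
        self.strategy_tag = StrategyTag::for_level(level);
        debug_assert!(self.strategy_tag.backend() == self.active_backend);
    }
}

// Helper for exercising the sketch.
pub fn tag_after_reset(level: u8) -> StrategyTag {
    let mut d = Driver {
        strategy_tag: StrategyTag::Fast,
        active_backend: BackendTag::Simple,
    };
    d.reset(level);
    d.strategy_tag
}
```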
Phase 3 commit 3: introduce the monomorphised per-block entry point
without changing any inner hot-loop behaviour. Each
`Matcher::start_matching` call now picks a concrete `S: Strategy`
from `self.strategy_tag` (resolved at `reset()` in commit 2) and
hands off to `compress_block::<S>`. The body still uses runtime
dispatch internally, so this commit is behaviour-preserving — it
exists to plumb the type parameter into the HC matcher so later
commits can replace inner `match self.parse_mode` branches with
`if S::USE_BT` / `if S::USE_HASH3` const checks.

- `MatchGeneratorDriver::start_matching` becomes a 7-arm match over
  `self.strategy_tag` dispatching into `compress_block::<S>` per
  variant.
- `compress_block<S: Strategy>(&mut self, …)` lives next to the
  `impl Matcher` block. Body switches on `S::BACKEND` (an associated
  `const`), so each monomorphisation keeps exactly one backend arm
  after const-eval folds the `match` — verified by inspection of the
  generated IR before/after.
- For the `HashChain` backend, `compress_block` calls a new
  `HcMatchGenerator::start_matching_strategy<S>` method that today
  just delegates back to the existing runtime-dispatched
  `start_matching`. A `debug_assert_eq!` enforces
  `S::PARSE_MODE == self.parse_mode` so the bridge cannot drift
  silently while we migrate the inner loops in commits 4-7.

463/463 nextest, clippy clean, build × 2.

Refs #122 Phase 3.
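The per-block dispatch shape, sketched with three of the seven variants (the ID const and the string return values are stand-ins for the real per-backend block body):

```rust
pub trait Strategy { const ID: &'static str; }
pub struct Fast;     impl Strategy for Fast     { const ID: &'static str = "fast"; }
pub struct Lazy;     impl Strategy for Lazy     { const ID: &'static str = "lazy"; }
pub struct BtUltra2; impl Strategy for BtUltra2 { const ID: &'static str = "btultra2"; }

#[derive(Clone, Copy)]
pub enum StrategyTag { Fast, Lazy, BtUltra2 } // 7 variants in the PR; 3 shown

pub struct Driver { pub strategy_tag: StrategyTag }

impl Driver {
    // One runtime match per *block*; everything below compress_block::<S>
    // is monomorphised, so strategy flags cost nothing per position.
    pub fn start_matching(&mut self) -> &'static str {
        match self.strategy_tag {
            StrategyTag::Fast => self.compress_block::<Fast>(),
            StrategyTag::Lazy => self.compress_block::<Lazy>(),
            StrategyTag::BtUltra2 => self.compress_block::<BtUltra2>(),
        }
    }

    fn compress_block<S: Strategy>(&mut self) -> &'static str {
        S::ID // stand-in for the real per-backend block body
    }
}

// Helper for exercising the sketch.
pub fn dispatch(tag: StrategyTag) -> &'static str {
    Driver { strategy_tag: tag }.start_matching()
}
```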

Phase 3 commit 4: the strategy-aware entry on `HcMatchGenerator` now
branches on the compile-time `S::USE_BT` constant. Each
monomorphisation (Lazy / BtOpt / BtUltra / BtUltra2) keeps exactly
one of `start_matching_lazy` / `start_matching_optimal` — the dead
arm is dropped at codegen time, so the per-block dispatch no longer
pays for a runtime tag check.

The lazy/optimal split is the outer-most parse_mode branch — every
encoded position inside the block already runs the right inner
parser, but until now the dispatch *itself* re-evaluated
`self.parse_mode` on every block call.

- `HcMatchGenerator::start_matching_strategy<S>` body becomes
  `if S::USE_BT { start_matching_optimal } else { start_matching_lazy }`.
- The runtime-dispatched inherent `start_matching` is now `#[cfg(test)]`
  — the production driver path goes through `start_matching_strategy`
  exclusively. Tests that exercise HcMatchGenerator directly still
  use the inherent method; the final cleanup commit removes both the
  inherent method and the `parse_mode` field.
- `debug_assert_eq!(self.parse_mode, S::PARSE_MODE)` at the entry
  catches any future drift between the runtime field (still set by
  `configure`) and the const passed by the dispatcher.
- Cost-model and parse-mode parity assertions in
  `strategy::tests` moved from `#[test] fn` into compile-time
  `const _: () = { assert!(…) }` blocks per
  `clippy::assertions_on_constants` (the asserts only inspect
  associated `const`s — they belong at compile time, not in
  the runner).

460/460 nextest, clippy clean (default + `--tests`), build × 2.

Refs #122 Phase 3.
Phase 3 commit 5: thread the const-generic `S: Strategy` through the
optimal parser entry, BT dispatcher, and the four SIMD-kernel
wrappers down to the `build_optimal_plan_impl_body!` macro. Inside
the macro the two `parse_mode` reads that previously drove every
candidate insertion and traceback step are replaced with
strategy-projected `let` bindings — each is a compile-time constant
per monomorphisation, so LLVM folds them and drops the dead arms in
each strategy's instance.

- `build_optimal_plan_impl_body!` gains a `$strategy_ty:ty` parameter
  and computes `abort_on_worse_match = <S>::OPT_LEVEL == 0` /
  `opt_level = <S>::OPT_LEVEL >= 2`. A `debug_assert!` enforces
  `<S>::USE_BT` so the body never runs for a non-BT strategy.
- Each of `build_optimal_plan_impl_neon` / `_sse42` / `_avx2_bmi2` /
  `_scalar` gains `S: Strategy` as a leading type parameter
  alongside the existing `ACCURATE_PRICE` / `FAVOR_SMALL_OFFSETS`
  const-generics; they pass S through into the macro body.
- `build_optimal_plan_impl<S>` plumbs S into the 4 kernel
  dispatchers. The runtime 4-arm match on
  `(profile.accurate, profile.favor_small_offsets)` stays for now —
  `generic_const_exprs` would be needed to project the bool consts
  directly into a const-generic argument list, and the dispatch is
  per-block, not per-position. In practice only two arms are reached
  (BtOpt → (false, true), BtUltra / BtUltra2 → (true, false)).
- `build_optimal_plan<S>` and `start_matching_optimal<S>` follow.
  Both `start_matching_strategy<S>` (production path) and the
  `#[cfg(test)]` inherent `start_matching` (now 4-arm on
  `HcParseMode` → concrete `BtOpt` / `BtUltra` / `BtUltra2` types)
  invoke the generic entry. `run_btultra2_seed_pass` pins
  `S = BtUltra2` locally — by name it is BtUltra2-exclusive.
- `nested const items cannot project through outer generics`:
  switched the per-strategy flags from `const ABORT_ON_WORSE_MATCH:
  bool = …` to `let abort_on_worse_match: bool = …` so the
  expressions live in the parent fn's substitution scope. The
  optimiser still folds them per monomorphisation.

460/460 nextest, clippy clean (default + `--tests`), build × 2.

Refs #122 Phase 3.
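The workaround from the last bullet, in isolation (the strategy types and OPT_LEVEL values below are illustrative):

```rust
pub trait Strategy { const OPT_LEVEL: u8; }
pub struct BtOpt;    impl Strategy for BtOpt    { const OPT_LEVEL: u8 = 0; } // illustrative
pub struct BtUltra2; impl Strategy for BtUltra2 { const OPT_LEVEL: u8 = 2; }

pub fn plan_flags<S: Strategy>() -> (bool, bool) {
    // `const ABORT: bool = S::OPT_LEVEL == 0;` is rejected here: a nested
    // const item cannot use the enclosing fn's generic parameters. A `let`
    // binding lives in the fn's substitution scope instead, and the
    // optimiser still folds it to a literal in each monomorphisation.
    let abort_on_worse_match = S::OPT_LEVEL == 0;
    let opt_level_refined = S::OPT_LEVEL >= 2;
    (abort_on_worse_match, opt_level_refined)
}
```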
Three ignored manual probes that document the level22 perf claims
with executable evidence. All gated on `#[ignore]` so they do not
run in CI by default; invoke with
`cargo nextest run --release ... --run-ignored ignored-only`.

- `he_level22_ratio` — confirms the high-entropy-1m scenario
  speedup (~10x vs C FFI) is driven by the source-size-hinted
  raw-fast-path; toggles the hint off to verify the slow-path
  matches C-zstd timing.
- `incompressibility_falsepos` — exercises the
  `block_looks_incompressible` heuristic on hard-random, mixed,
  textish, and pattern-only inputs to confirm zero false positives
  (rust/C output sizes match within a 7-byte frame overhead).
- `corpus_level22_gap` — quantifies the +0.08 % decodecorpus-z000033
  output gap as a block-split heuristic divergence (233 vs 256
  compressed blocks), not a core-compression regression.

Refs #122 Phase 3 perf accounting.
Phase 3 commit 6: thread S through the optimal-candidates body and
its four SIMD-kernel wrappers, then gate the hash3 short-match
lookup (donor `static (mls==3)` probe) on `S::USE_HASH3`. Only
`BtUltra2` sets `USE_HASH3 = true`; every other monomorphisation
drops the entire `update_hash3_until` + `$hash3` block at codegen
time. On the level22 hot path this removes one indirect read +
branch + table walk per encoded position.

- `collect_optimal_candidates_initialized_body!` gains
  `$strategy_ty:ty` and a `let use_hash3: bool = <S>::USE_HASH3`
  binding. The hash3 sub-block becomes `if use_hash3 && … { … }`
  so the dead arm folds away per S.
- `debug_assert!` confirms the runtime/compile-time invariant
  (`USE_HASH3 = true` implies `hash3_log != 0`).
- Each of the four SIMD wrappers (`collect_optimal_candidates_
  initialized_neon` / `_sse42` / `_avx2_bmi2` / `_scalar`) gains
  `S: Strategy` alongside the existing `USE_BT_MATCHFINDER` const.
  The cross-platform dispatcher (`collect_optimal_candidates_
  initialized`) plumbs S through to all four arms.
- The DP body call sites in `build_optimal_plan_impl_body!`
  (`$self.$collect::<$strategy_ty, true>(…)`) thread S into the
  per-position pipeline.
- `#[cfg(test)] collect_optimal_candidates` rewritten as a four-arm
  match on `(uses_bt, hash3_log != 0)` so artificial test
  configurations still reach the right monomorphisation —
  production callers never hit this helper.

460/460 nextest, clippy clean, build × 2.

Refs #122 Phase 3.
Phase 3 commit 7: make `should_run_btultra2_seed_pass` generic over
`S: Strategy` and short-circuit the predicate at compile time for
every non-BtUltra2 monomorphisation. Only `BtUltra2` satisfies
`S::OPT_LEVEL == 2 && S::USE_HASH3`, so Fast / Dfast / Greedy /
Lazy / BtOpt / BtUltra now drop both the predicate body and the
inlined call to `run_btultra2_seed_pass` after the per-strategy
monomorphisation pass.

- `should_run_btultra2_seed_pass<S>` starts with
  `if !(S::OPT_LEVEL == 2 && S::USE_HASH3) { return false; }` —
  a const expression per S, so LLVM folds it and the rest of the
  predicate (HcBackend::Bt unwrap, frame-start checks, dictionary
  state) only emits for BtUltra2.
- Replaced the runtime `self.is_btultra2()` read inside the
  predicate with the const path; the `is_btultra2` helper stays
  for the (single remaining) profile-selection site.
- `start_matching_optimal::<S>` now invokes
  `self.should_run_btultra2_seed_pass::<S>(current_len)`.
- All test call sites pin `S = BtUltra2` (the only strategy where
  the predicate is interesting); the corresponding assertions
  remain unchanged.

460/460 nextest, clippy clean, build × 2.

Refs #122 Phase 3.
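A sketch of the const short-circuit (strategy definitions reduced to the two relevant consts; the trailing runtime check is a stand-in for the real frame-start/dictionary predicate):

```rust
pub trait Strategy { const OPT_LEVEL: u8; const USE_HASH3: bool; }
pub struct BtOpt;
impl Strategy for BtOpt    { const OPT_LEVEL: u8 = 0; const USE_HASH3: bool = false; }
pub struct BtUltra2;
impl Strategy for BtUltra2 { const OPT_LEVEL: u8 = 2; const USE_HASH3: bool = true; }

// The guard is a compile-time constant per S: every non-BtUltra2
// monomorphisation folds the whole fn to `false` and the runtime checks
// below the guard are never emitted for it.
pub fn should_run_seed_pass<S: Strategy>(at_frame_start: bool) -> bool {
    if !(S::OPT_LEVEL == 2 && S::USE_HASH3) {
        return false;
    }
    at_frame_start // stand-in for the remaining runtime predicate
}
```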
Phase 3 commit 8: replace the runtime profile selection in
`start_matching_optimal::<S>` with `HcOptimalCostProfile::
const_for_strategy::<S>()`. The four profile fields (max chain
depth, sufficient match length, accurate flag, favor-small-offsets
flag) come straight from the strategy's associated consts, so the
optimiser folds the whole profile into a single literal at codegen
time. The old two-arm runtime read (`if self.is_btultra2() {
for_mode(parse_mode, true) } else { for_mode(parse_mode, false) }`)
disappears.

- Add `Strategy::MAX_CHAIN_DEPTH` and `Strategy::SUFFICIENT_MATCH_LEN`
  associated consts, mirroring the `HcOptimalCostProfile::for_mode`
  table row-for-row. Non-BT strategies (Fast / Dfast / Greedy) get
  the Lazy2 row so the trait stays total — those values are never
  read because USE_BT short-circuits the path.
- Add `HcOptimalCostProfile::const_for_strategy<S>()` next to the
  existing `for_mode`. Marked `#[inline]` so the literal lands at
  the call site after monomorphisation.
- `start_matching_optimal::<S>` calls the new const peer; the
  inherent `HcMatchGenerator::is_btultra2()` accessor is gone (no
  remaining readers).
- `for_mode` itself stays — it has callers in tests and in
  `run_btultra2_seed_pass` (which pins `S = BtUltra2` and could
  switch to `const_for_strategy` later, but the call is once per
  frame so it is not on the hot path).

460/460 nextest, clippy clean (default + `--tests`), build × 2.

Refs #122 Phase 3.
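The const profile construction, sketched (the field values here are illustrative, not the real for_mode table rows):

```rust
pub trait Strategy {
    const MAX_CHAIN_DEPTH: u32;
    const SUFFICIENT_MATCH_LEN: u32;
    const ACCURATE_PRICE: bool;
    const FAVOR_SMALL_OFFSETS: bool;
}

pub struct BtUltra2;
impl Strategy for BtUltra2 {
    // Illustrative values only.
    const MAX_CHAIN_DEPTH: u32 = 64;
    const SUFFICIENT_MATCH_LEN: u32 = 64;
    const ACCURATE_PRICE: bool = true;
    const FAVOR_SMALL_OFFSETS: bool = false;
}

#[derive(Debug, PartialEq, Eq)]
pub struct HcOptimalCostProfile {
    pub max_chain_depth: u32,
    pub sufficient_match_len: u32,
    pub accurate: bool,
    pub favor_small_offsets: bool,
}

impl HcOptimalCostProfile {
    // Every field comes from an associated const, so after
    // monomorphisation the whole profile is a single literal at the
    // (inlined) call site — no runtime table read.
    #[inline]
    pub fn const_for_strategy<S: Strategy>() -> Self {
        HcOptimalCostProfile {
            max_chain_depth: S::MAX_CHAIN_DEPTH,
            sufficient_match_len: S::SUFFICIENT_MATCH_LEN,
            accurate: S::ACCURATE_PRICE,
            favor_small_offsets: S::FAVOR_SMALL_OFFSETS,
        }
    }
}
```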
Phase 3 commit 9 (close-out): delete both legacy runtime dispatch
enums and every read site that depended on them. The encoder now
selects its match-finder backend exclusively through the
const-generic `S: Strategy` trait and the once-per-frame
`StrategyTag` resolved at `reset()`.

- `HcParseMode` (Lazy2 / BtOpt / BtUltra / BtUltra2) deleted from
  `strategy.rs`. The `Strategy::PARSE_MODE` associated const and the
  `StrategyTag::parse_mode()` bridge are gone with it — nothing
  reads them now that every dispatch routes through `S::USE_BT`,
  `S::USE_HASH3`, `S::OPT_LEVEL`, etc.
- `MatcherBackend` (Simple / Dfast / Row / HashChain) deleted from
  `match_generator.rs`. All references (`LevelParams::backend`,
  `MatchGeneratorDriver::active_backend`, plus the
  `retire_dictionary_budget` / `trim_after_budget_retire` /
  `skip_matching_for_dictionary_priming` 4-arm matches) now use the
  unified `super::strategy::BackendTag`. Same 4-variant shape, one
  enum.
- `HcConfig` drops its `parse_mode: HcParseMode` field; every
  `HcConfig` literal in `LEVEL_TABLE` and the per-source-size
  variants (`BTOPT_HC_CONFIG`, `BTULTRA_HC_CONFIG`,
  `BTULTRA2_HC_CONFIG`, etc.) shrinks by one line.
- `HcMatchGenerator::parse_mode` field deleted.
  `HcMatchGenerator::configure` gains an explicit
  `tag: StrategyTag` argument from which it derives `is_btultra2`,
  `uses_bt`, and the search-depth cap. The driver passes
  `self.strategy_tag` (set at `reset()`) on the single production
  call site.
- `HcOptimalCostProfile::for_mode(HcParseMode, bool)` deleted from
  `cost_model::mod`. `const_for_strategy::<S>()` is now the only
  way to build a profile. Every call site (production +
  `run_btultra2_seed_pass` + the cost-model tests) was migrated to
  pass an explicit `S = strategy::{Lazy, BtOpt, BtUltra, BtUltra2}`.
- The three `level_*_use_*_parse_mode` donor-parity tests are
  rewritten to assert against `StrategyTag::for_level(level)`
  directly (the data they were checking lives in the const-generic
  trait now). The `#[cfg(test)] HcMatchGenerator::start_matching`
  helper picks a strategy from the runtime table flags
  (`uses_bt`/`is_btultra2`/`hash3_log`) so artificial test setups
  still reach the right monomorphisation.
- `start_matching_strategy::<S>`'s `debug_assert_eq!` now compares
  `S::USE_BT` against `self.table.uses_bt` (the field set by
  `configure`) instead of the deleted `self.parse_mode` /
  `S::PARSE_MODE` pair.
- Module-level docs in `strategy.rs` are rewritten as a single
  Phase-3 dispatch flow diagram covering every const-folded gate
  (`USE_BT`, `USE_HASH3`, `OPT_LEVEL`, `ACCURATE_PRICE`,
  `FAVOR_SMALL_OFFSETS`, `MAX_CHAIN_DEPTH`, `SUFFICIENT_MATCH_LEN`).
  `encoding/mod.rs` header updated to drop the now-stale
  `HcParseMode` reference and to add `perf/post-pr-121-baseline`.

460/460 nextest, clippy clean (default + `--tests`), build × 2.
Phase 3 acceptance criterion "HcParseMode enum deleted" — done.

Refs #122 Phase 3.
Before: a single CI runner per target ran every level back-to-back
(~45 min on i686, with level22 dominating ~20 min on its own).

After: matrix split target × level (3 targets × 6 levels = 18
parallel runners) plus a single aggregation job that consolidates
per-target shards back into the existing per-target artifact
shape, so the cross-target merge step downstream stays unchanged.

- `support::supported_levels_filtered()` honours
  `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` (comma-separated level names);
  every loop in `compare_ffi.rs` that used to iterate over
  `supported_levels()` now uses the filtered variant. With no env
  var, behaviour is identical to before.
- `.github/scripts/aggregate-bench-levels.py` reads
  `benchmark-{results,delta,relative,report}.<target>.<level>.<ext>`
  shards (uploaded as `benchmark-shard-<target>-<level>` artifacts)
  and emits per-target consolidated
  `benchmark-{results,delta,relative,report}.<target>.<ext>` files
  matching the pre-split naming. Markdown shards are concatenated
  under per-level `## Level: <name>` sections.
- CI workflow:
  - `benchmark` job: matrix becomes `bench × level`. Each runner
    sets `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` and uploads its
    `benchmark-shard-<target>-<level>` artifact.
  - new `benchmark-aggregate` job (post-benchmark): downloads
    every `benchmark-shard-*` artifact, runs the aggregation
    script, and re-uploads three target-shaped artifacts
    (`benchmark-x86_64-gnu`, `benchmark-i686-gnu`,
    `benchmark-x86_64-musl`). `github-action-benchmark` moved
    here so the regression alert fires on the consolidated
    x86_64-gnu result.
  - `benchmark-pages` now depends on `benchmark-aggregate` and
    downloads the three per-target artifacts explicitly (instead
    of the previous `benchmark-*` glob) so the shard artifacts
    don't leak into `merge-benchmarks.py`'s `<target>.<level>`
    name extraction.
  - Per-runner `timeout-minutes` cut from 40/55/40 to 25/30/25
    matching the new per-level work share.

Smoke-test:
- `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER=fastest cargo bench
  --list` emits only `fastest` benchmarks; no env var → all 6
  levels.
- Aggregator round-trips a synthetic two-shard fixture and
  produces the expected consolidated JSON + markdown files.
- Local nextest 460/460, clippy clean (default + benches).

Refs #122 Phase 3 — supporting infrastructure.
Copilot AI review requested due to automatic review settings May 14, 2026 20:09
@coderabbitai

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 3c7f8b7 and b7e45c9.

📒 Files selected for processing (2)
  • .github/workflows/ci.yml
  • zstd/src/encoding/match_generator.rs

📝 Walkthrough

Adds a nightly fuzz job; splits benchmarks into bench-matrix, bench-build, and a benchmark job matrix that emits per-(target,level) shards; introduces benchmark-aggregate to merge level shards into per-target artifacts using .github/scripts/aggregate-bench-levels.py; updates pages and runner to consume consolidated outputs.

Changes

CI benchmark pipeline

  • Fuzz job — .github/workflows/ci.yml: Adds fuzz job running short libFuzzer smoke runs per fuzz target, replaying regression artifacts and uploading crash artifacts on failure.
  • bench-matrix job (targets inventory) — .github/workflows/ci.yml: Produces targets.json and exports targets / ids_csv workflow outputs for downstream matrix and build jobs.
  • bench-build job (build & upload bench binary) — .github/workflows/ci.yml: Adds bench-build job that builds the compare_ffi benchmark binary per target and uploads bench-binary-<target-id> artifacts.
  • benchmark job (target × level matrix and shard uploads) — .github/workflows/ci.yml: Reworks benchmark job to a target × level matrix, sets STRUCTURED_ZSTD_BENCH_LEVEL_FILTER for per-level runs, renames outputs to include target+level, and uploads per-(target,level) shard artifacts benchmark-shard-<target>-<level>.
  • Aggregate shards into per-target artifacts & pages update — .github/workflows/ci.yml, .github/scripts/aggregate-bench-levels.py: benchmark-aggregate downloads benchmark-shard-* shards, runs aggregate-bench-levels.py to merge level shards into consolidated per-target benchmark-<target> artifacts, uploads a single benchmark-aggregated artifact; benchmark-pages now depends on benchmark-aggregate and downloads only the consolidated artifact.
  • Run-benchmarks script (reuse pre-built binary) — .github/scripts/run-benchmarks.sh: Supports a pre-built Criterion bench binary via STRUCTURED_ZSTD_BENCH_BIN (executable check + run with --bench --output-format bencher), falling back to cargo bench when unset; preserves existing raw output capture.

Sequence Diagram(s)

sequenceDiagram
  participant BenchBuild as bench-build (GH job)
  participant ArtifactStore as Actions Artifacts
  participant BenchmarkJob as benchmark (GH job matrix)
  participant RunnerScript as run-benchmarks.sh
  participant AggregateJob as benchmark-aggregate (GH job)
  participant AggregateScript as aggregate-bench-levels.py
  participant PagesJob as benchmark-pages (GH job)

  BenchBuild->>ArtifactStore: upload bench-binary-<target>
  BenchmarkJob->>ArtifactStore: download bench-binary-<target>
  BenchmarkJob->>RunnerScript: invoke with STRUCTURED_ZSTD_BENCH_LEVEL_FILTER
  RunnerScript->>BenchmarkJob: produce per-(target,level) shard files
  BenchmarkJob->>ArtifactStore: upload benchmark-shard-<target>-<level>
  AggregateJob->>ArtifactStore: download benchmark-shard-*
  AggregateJob->>AggregateScript: run aggregate-bench-levels.py to merge shards
  AggregateScript->>ArtifactStore: upload consolidated benchmark-aggregated
  PagesJob->>ArtifactStore: download benchmark-aggregated

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

"I'm a rabbit in the CI shrub,
I hop through artifacts and log,
I build a tiny binary, then I stitch the shards,
Levels line up like garden rows, targets stacked in bog.
I nibble duplicates and stitch the rest — metrics grow like carrot logs."

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
- Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
- Title check (✅ Passed): The title clearly summarizes the main change: Phase 3 of the encoder rewrite implementing const-generic Strategy dispatch to replace runtime dispatch with compile-time monomorphization.
- Linked Issues check (✅ Passed): All major objectives from #122 are satisfied: Strategy trait with 7 concrete types, const-generic compress_block dispatch, hot-path runtime branches eliminated, HcParseMode removed, all tests passing, and benchmarks show the architectural foundation is complete despite mixed perf gains.
- Out of Scope Changes check (✅ Passed): All changes align with #122 scope: CI/benchmark restructuring (fuzz job, bench-matrix, sharding, aggregation) directly supports the Phase 3 perf testing infrastructure. Test files probe implementation correctness. No unrelated changes detected.
- Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


Copilot AI left a comment


Pull request overview

Phase 3 of the #111 encoder rewrite. Replaces the runtime HcParseMode / MatcherBackend enum dispatch with const-generic Strategy monomorphisation. A new Strategy trait + 7 ZST implementors (Fast/Dfast/Greedy/Lazy/BtOpt/BtUltra/BtUltra2) lift the parser/finder choice into the type system; hot entry points (compress_block::<S>, start_matching_optimal::<S>, build_optimal_plan_impl::<S, …>, collect_optimal_candidates_initialized::<S, …>) are now generic and let the optimiser fold S::USE_BT / S::USE_HASH3 / S::OPT_LEVEL branches per monomorphisation. The runtime HcParseMode enum and MatcherBackend enum are deleted; BackendTag (in strategy) and a per-driver StrategyTag field resolved at reset() become the single source of truth. CI also splits the bench matrix into target × level shards with a new benchmark-aggregate job consolidating shards for merge-benchmarks.py.

Changes:

  • Introduce Strategy trait + 7 ZST strategies + StrategyTag; thread <S> through optimal-parser dispatch, SIMD wrappers, cost-profile builder, and BTULTRA2 seed-pass gate.
  • Delete HcParseMode and MatcherBackend; replace for_mode(mode, pass2) with const_for_strategy::<S>(); update all in-crate tests accordingly.
  • Split CI bench matrix per level (matrix.bench × matrix.level), add benchmark-aggregate job + aggregate-bench-levels.py to recombine shards.
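
The dispatch shape described above can be condensed to a two-variant sketch. The real crate defines seven ZSTs and more associated consts; the names below mirror the PR, but the bodies are illustrative:

```rust
/// Per-strategy configuration lives on a trait as associated consts.
trait Strategy {
    const USE_BT: bool; // optimal (binary-tree) parser?
    const OPT_LEVEL: u32; // pricing aggressiveness
}

// Zero-sized types: one concrete Strategy per compression family.
struct Lazy;
struct BtUltra2;

impl Strategy for Lazy {
    const USE_BT: bool = false;
    const OPT_LEVEL: u32 = 0;
}
impl Strategy for BtUltra2 {
    const USE_BT: bool = true;
    const OPT_LEVEL: u32 = 2;
}

#[derive(Clone, Copy)]
enum StrategyTag {
    Lazy,
    BtUltra2,
}

/// Generic hot path: `S::USE_BT` is a constant per monomorphisation,
/// so this branch resolves at codegen time, not per encoded position.
fn compress_block<S: Strategy>() -> &'static str {
    if S::USE_BT { "optimal" } else { "lazy" }
}

/// The single runtime match, executed once per block.
fn dispatch(tag: StrategyTag) -> &'static str {
    match tag {
        StrategyTag::Lazy => compress_block::<Lazy>(),
        StrategyTag::BtUltra2 => compress_block::<BtUltra2>(),
    }
}

fn main() {
    assert_eq!(dispatch(StrategyTag::Lazy), "lazy");
    assert_eq!(dispatch(StrategyTag::BtUltra2), "optimal");
}
```

The payoff is that inside `compress_block::<BtUltra2>` the optimiser sees `S::USE_BT == true` as a literal and deletes the dead branch entirely, which is what the PR relies on to remove the per-position enum matches.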

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file:
- `zstd/src/encoding/strategy.rs`: New Strategy trait, 7 ZSTs, BackendTag, StrategyTag::for_level/for_compression_level
- `zstd/src/encoding/mod.rs`: Module doc refresh referencing the new strategy module
- `zstd/src/encoding/match_generator.rs`: Driver strategy_tag; compress_block::<S>; monomorphised optimal parser / BT-seed paths; HcParseMode/MatcherBackend removed; tests rewritten
- `zstd/src/encoding/cost_model/mod.rs`: for_mode replaced by const_for_strategy::<S>()
- `zstd/src/encoding/match_table/storage.rs`: Doc-only refresh of is_btultra2/uses_bt source
- `zstd/benches/support/mod.rs`: New level_filter_from_env + supported_levels_filtered
- `zstd/benches/compare_ffi.rs`: Use supported_levels_filtered
- `zstd/tests/{he_level22_ratio,incompressibility_falsepos,corpus_level22_gap}.rs`: New #[ignore]'d diagnostic probes
- `.github/workflows/ci.yml`: Bench matrix split per level + new benchmark-aggregate job
- `.github/scripts/aggregate-bench-levels.py`: New aggregator combining per-level shards into per-target files
Comments suppressed due to low confidence (3)

zstd/src/encoding/match_generator.rs:3858

  • After migrating from for_mode(mode, pass2) to const_for_strategy::<S>(), p1 and p2 are now built from the identical expression and are guaranteed equal. The test no longer exercises any difference between the former "pass2 = false" and "pass2 = true" call sites — every assertion is trivially satisfied. Either consolidate to a single profile and drop the duplicate checks, or extend the test to exercise the cases the old API distinguished (e.g. by reading both BtUltra and BtUltra2 profiles).
    let p1 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    let p2 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    assert!(
        !p1.favor_small_offsets,
        "btultra2 primary profile should match donor opt2 offset pricing"
    );
    assert!(
        !p2.favor_small_offsets,
        "btultra2 secondary profile should match donor opt2 offset pricing"
    );
}

#[test]
fn btultra2_profile_is_single_pass_opt2() {
    let p1 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    let p2 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    assert_eq!(p1.max_chain_depth, p2.max_chain_depth);
    assert_eq!(p1.sufficient_match_len, p2.sufficient_match_len);
    assert_eq!(p1.accurate, p2.accurate);

zstd/src/encoding/match_generator.rs:4051

  • profile_predef and profile_dyn are now built from the identical const_for_strategy::<BtUltra2>() call, so the two profile values are equal. The original test distinguished for_mode(BtUltra2, false) from for_mode(BtUltra2, true). The only remaining difference between the two arms is stats.price_type (Predefined vs Dynamic), which is the actual intent — consider removing the now-duplicate profile_dyn binding (or renaming both to make the single-profile fact explicit) so a future reader doesn't mistake them for materially different profiles.
    let profile_predef = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    let mut stats_predef = HcOptState::new();
    stats_predef.price_type = HcOptPriceType::Predefined;
    let predef_max = profile_predef.lit_length_price(&stats_predef, HC_BLOCKSIZE_MAX);
    let predef_prev =
        profile_predef.lit_length_price(&stats_predef, HC_BLOCKSIZE_MAX.saturating_sub(1));
    assert_eq!(
        predef_max,
        predef_prev + HC_BITCOST_MULTIPLIER,
        "predefined litLength pricing at BLOCKSIZE_MAX must add exactly one bit"
    );

    let profile_dyn = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();

zstd/src/encoding/match_generator.rs:2889

  • The local type S = super::strategy::BtUltra2; alias here is unused — both const_for_strategy::<...> calls below explicitly spell out super::strategy::BtUltra2, and build_optimal_plan::<S> is the only consumer of the alias. Either remove the alias (and inline BtUltra2 at the build_optimal_plan call site) or use it consistently for the const_for_strategy line too, so the seed-pass body stays uniform.
        // The seed pass is BtUltra2-exclusive by name (`is_btultra2()`
        // gates the only call site), so pin S to BtUltra2 here.
        type S = super::strategy::BtUltra2;
        let seed_profile = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();

@codecov

codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 92.20779% with 6 lines in your changes missing coverage. Please review.

Files with missing lines:
- `zstd/src/encoding/strategy.rs`: patch coverage 91.30%, 6 lines missing ⚠️


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@zstd/benches/support/mod.rs`:
- Around line 96-104: supported_levels_filtered currently ignores unknown tokens
from level_filter_from_env, which can silently produce empty benchmark sets;
update supported_levels_filtered to validate the requested `keep` names against
the available names from supported_levels() and fail fast if any unknowns are
present: compute the set/list of available names from supported_levels(),
compute unknowns = requested - available, and if unknowns is non-empty, abort
(panic or e.g. unwrap with a clear message) with a message listing the invalid
tokens and the allowed names (mentioning STRUCTURED_ZSTD_BENCH_LEVEL_FILTER in
the message for clarity); otherwise proceed to filter as before.

In `@zstd/src/encoding/match_generator.rs`:
- Around line 2601-2607: The configure() code in HcMatchGenerator collapses
StrategyTag variants via local booleans (uses_bt/is_btultra2), causing
start_matching() and downstream logic to treat non-BtUltra2 BT modes as BtOpt;
instead store the full StrategyTag on the HcMatchGenerator struct (add a field
like tag: StrategyTag), set that field in configure(), and update
uses_bt/is_btultra2 checks and any dispatch/assert logic in start_matching() and
the parser/optimizer code to branch on self.tag (the preserved StrategyTag) so
BtOpt, BtUltra and BtUltra2 remain distinct for OPT_LEVEL, ACCURATE_PRICE, and
FAVOR_SMALL_OFFSETS decisions.

In `@zstd/src/encoding/strategy.rs`:
- Around line 270-275: The match arm handling CompressionLevel::Level casts n to
u8 before clamping, causing values >= 256 to wrap; change the logic to clamp the
i32 value against the valid range (use
n.max(1).min(CompressionLevel::MAX_LEVEL)) and only then cast to u8 before
calling Self::for_level, ensuring CompressionLevel::MAX_LEVEL and
Self::for_level are used to determine the clamp bounds.

In `@zstd/tests/corpus_level22_gap.rs`:
- Around line 57-76: The loop currently stops after a fixed 256 iterations,
which truncates analysis for larger frames; change the loop control to iterate
until the frame-end or the existing "last" sentinel is hit instead of using the
0..256 cap: replace the "for _ in 0..256" construct so that the body that
computes h, last, bt, bs and advances off runs repeatedly while off + 3 <=
frame.len() (or use an unconditional loop with the same off+3 check and the
existing last==1 break), preserving the inner logic that updates
raw/rle/compressed/total and advances off.

In `@zstd/tests/he_level22_ratio.rs`:
- Around line 48-108: The tests duplicate zstd block-parsing logic
(dump_block_types in he_level22_ratio.rs vs block_breakdown in
corpus_level22_gap.rs); extract the common parser into a shared test utility
(e.g., zstd/tests/util.rs or zstd/tests/common.rs) exposing a single function
(name it e.g. parse_zstd_blocks or dump_block_types) that accepts the frame
bytes and a loop-limit or uses a termination condition instead of hard-coded 64,
preserve the existing return values (raw/rle/compressed/total_size) or printing
behavior, then replace the two local implementations with calls to the shared
function (update he_level22_ratio.rs and corpus_level22_gap.rs to call
parse_zstd_blocks/dump_block_types). Ensure the new shared function reproduces
the same offsets/fhd parsing logic and parameterize the previous loop cap to
avoid behavioral drift.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4b7a4bb4-ace5-4de1-acab-d7e0d91bf34d

📥 Commits

Reviewing files that changed from the base of the PR and between 2e9c99d and 18bb80d.

📒 Files selected for processing (12)
  • .github/scripts/aggregate-bench-levels.py
  • .github/workflows/ci.yml
  • zstd/benches/compare_ffi.rs
  • zstd/benches/support/mod.rs
  • zstd/src/encoding/cost_model/mod.rs
  • zstd/src/encoding/match_generator.rs
  • zstd/src/encoding/match_table/storage.rs
  • zstd/src/encoding/mod.rs
  • zstd/src/encoding/strategy.rs
  • zstd/tests/corpus_level22_gap.rs
  • zstd/tests/he_level22_ratio.rs
  • zstd/tests/incompressibility_falsepos.rs

Five review findings from PR #123:

(1) HcMatchGenerator now mirrors the driver-resolved StrategyTag in a
    new `strategy_tag` field set by `configure()`. The
    `#[cfg(test)] start_matching` dispatcher branches on it, so
    BtOpt / BtUltra / BtUltra2 each reach their own monomorphisation
    instead of collapsing onto BtOpt whenever
    `is_btultra2 == false`. The matching `collect_optimal_candidates`
    helper does the same, with a legacy `hash3_log != 0` fallback for
    artificial tests that wire `hash3_log` directly without calling
    `configure()`.

(2) `HcOptimalCostProfile::const_for_strategy::<S>()` opens with
    `debug_assert!(S::USE_BT, …)`. The `MAX_CHAIN_DEPTH` and
    `SUFFICIENT_MATCH_LEN` consts on `Fast` / `Dfast` / `Greedy` /
    `Lazy` carry placeholder values (the optimal parser entry point
    is unreachable from those strategies). The debug assert plus
    fresh doc comments on each non-BT strategy point at the
    placeholder contract.

(3) `supported_levels_filtered()` panics when
    `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` is non-empty but matches
    zero known levels, with a message listing every known level. CI
    matrix typos (e.g. renaming a level in `supported_levels()`
    without updating ci.yml) now fail loudly on the affected shard
    instead of silently producing an empty bench run and a missing
    regression alert downstream.

(4) `StrategyTag::for_compression_level(Level(n))` clamps `n` in
    `i32` BEFORE casting to `u8`. The previous `(n as u8).min(22)`
    truncated values >= 256 (e.g. `Level(256)` wrapped to `0` and
    routed to `Dfast`); now `Level(i32::MAX)` saturates to BtUltra2
    as expected. Regression test
    `for_compression_level_clamps_oversized_numeric_levels_to_btultra2`
    added.

(5) `tests/common/mod.rs` hosts a shared `parse_zstd_block_breakdown`
    + `dump_block_breakdown` helper. Both `corpus_level22_gap.rs` and
    `he_level22_ratio.rs` route through it instead of carrying
    near-identical local copies (the corpus probe had a 256-iter cap,
    the high-entropy probe had a 64-iter cap — both replaced with a
    bounded `while off + 3 <= frame.len()` walk so large frames are
    no longer truncated).
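
The clamp-order bug in (4) is easy to reproduce in isolation. `clamp_level` below is an illustrative stand-in for the fixed `StrategyTag::for_compression_level` logic, with `MAX_LEVEL` assumed to be 22 as the test name implies:

```rust
const MAX_LEVEL: i32 = 22; // illustrative; the crate's real bound

/// Fixed order: clamp in i32 FIRST, then cast. `n as u8` cannot wrap
/// because the clamped value is always in 1..=22.
fn clamp_level(n: i32) -> u8 {
    n.clamp(1, MAX_LEVEL) as u8
}

/// The pre-fix order, kept only to show the wrap: `256 as u8` is 0,
/// so Level(256) was routed to a low-level strategy.
fn clamp_level_buggy(n: i32) -> u8 {
    (n as u8).min(MAX_LEVEL as u8)
}

fn main() {
    assert_eq!(clamp_level(256), 22); // saturates as expected
    assert_eq!(clamp_level(i32::MAX), 22);
    assert_eq!(clamp_level(0), 1);
    assert_eq!(clamp_level_buggy(256), 0); // the old wraparound
}
```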

461/461 nextest (+1 regression test for finding 4), clippy clean
(default + --tests), fmt clean, build × 2.

Refs #122 Phase 3.

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
zstd/benches/support/mod.rs (1)

104-119: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when any requested level token is unknown.

Line 107 filters by intersection only, so STRUCTURED_ZSTD_BENCH_LEVEL_FILTER=default,typo still runs and silently drops typo. That can mask CI matrix mistakes and reduce intended shard coverage.

Suggested minimal fix
 pub(crate) fn supported_levels_filtered() -> Vec<LevelConfig> {
     let all = supported_levels();
     match level_filter_from_env() {
         None => all.to_vec(),
         Some(keep) => {
+            let known: Vec<&'static str> = all.iter().map(|cfg| cfg.name).collect();
+            let unknown: Vec<String> = keep
+                .iter()
+                .filter(|name| !known.iter().any(|k| k == &name.as_str()))
+                .cloned()
+                .collect();
+            if !unknown.is_empty() {
+                panic!(
+                    "STRUCTURED_ZSTD_BENCH_LEVEL_FILTER contains unknown level(s) \
+                     {unknown:?}; supported levels are {known:?}"
+                );
+            }
+
             let kept: Vec<LevelConfig> = all
                 .into_iter()
                 .filter(|cfg| keep.iter().any(|name| name == cfg.name))
                 .collect();
             if kept.is_empty() {
                 let known: Vec<&'static str> =
                     supported_levels().iter().map(|cfg| cfg.name).collect();
🤖 Prompt for AI Agents

In `@zstd/benches/support/mod.rs` around lines 104 - 119, The current Some(keep)
branch silently drops unknown tokens; update the logic in the Some(keep) arm
(the block that builds kept: Vec<LevelConfig>) to detect any requested names in
keep that do not match supported_levels() and panic immediately listing the
unknown tokens and known levels; specifically, compute the set/difference of
keep minus supported_levels().iter().map(|cfg|cfg.name) and if non-empty call
panic! with a clear message including the unknown tokens and the known levels,
otherwise continue returning kept as before.
🤖 Prompt for all review comments with AI agents

Inline comments:
In `@zstd/tests/common/mod.rs`:
- Around line 48-50: The code computes header size into `off` and then proceeds
to parse blocks, but lacks a pre-check that the `frame` contains those header
bytes; update the function so that immediately after computing `let mut off = 5
+ window_descriptor_len + dict_id_len + fcs_len;` you check `if frame.len() <
off { return None; }` (or equivalent) so truncated frame headers return None
instead of `Some(BlockBreakdown::default())`; ensure this change affects the
same parsing flow that enters the `while off + 3 <= frame.len()` loop and
preserves existing uses of `BlockBreakdown::default()` for other cases.

---

Duplicate comments:
In `@zstd/benches/support/mod.rs`:
- Around line 104-119: The current Some(keep) branch silently drops unknown
tokens; update the logic in the Some(keep) arm (the block that builds kept:
Vec<LevelConfig>) to detect any requested names in keep that do not match
supported_levels() and panic immediately listing the unknown tokens and known
levels; specifically, compute the set/difference of keep minus
supported_levels().iter().map(|cfg|cfg.name) and if non-empty call panic! with a
clear message including the unknown tokens and the known levels, otherwise
continue returning kept as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: aebed1ce-de63-4c52-9b32-33de83cd28c0

📥 Commits

Reviewing files that changed from the base of the PR and between 18bb80d and 372ef50.

📒 Files selected for processing (7)
  • zstd/benches/support/mod.rs
  • zstd/src/encoding/cost_model/mod.rs
  • zstd/src/encoding/match_generator.rs
  • zstd/src/encoding/strategy.rs
  • zstd/tests/common/mod.rs
  • zstd/tests/corpus_level22_gap.rs
  • zstd/tests/he_level22_ratio.rs


@sw-release-bot sw-release-bot Bot left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'structured-zstd vs C FFI (x86_64-gnu)'.
Benchmark result of this commit is worse than the previous benchmark result, exceeding the 1.30 threshold.

Benchmark suite: compress/default/low-entropy-1m/matrix/pure_rust
Current (372ef50): 18.293 ms | Previous (2e9c99d): 13.554 ms | Ratio: 1.35

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

@polaz
Member Author

polaz commented May 14, 2026

@coderabbitai Re: the STRUCTURED_ZSTD_BENCH_LEVEL_FILTER duplicate finding — fixed in HEAD. supported_levels_filtered now panics for ANY unknown token in the filter list (not only when every token is unknown), so a partial typo like default,typo fails loudly with a message listing the supported levels.

@coderabbitai

coderabbitai Bot commented May 14, 2026


@polaz: Thanks for the update — the stricter per-token validation is the right call. Panicking on any unrecognised token (rather than only when the entire set matches nothing) means a partially-typo'd filter like default,typo won't silently run just the recognised subset and give misleading CI bench results. I'll note this for future reviews.

🐰 (nods approvingly, ears perked)


✏️ Learnings added
Learnt from: polaz
URL: https://github.com/structured-world/structured-zstd/pull/123

Timestamp: 2026-05-14T20:44:18.095Z
Learning: In `zstd/benches/support/mod.rs`, `supported_levels_filtered()` (controlled by `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` env var) panics for ANY unrecognised token in the comma-separated filter list — not only when all tokens are unknown. A partial typo such as `default,typo` causes a loud failure that lists all supported level names.


polaz added 2 commits May 14, 2026 23:45
…matches

Five follow-up findings from PR #123:

- The two `btultra2_profile_*` cost-model tests built `p1` and `p2`
  with identical `const_for_strategy::<BtUltra2>()` calls (the
  `pass2: bool` parameter that distinguished them in the runtime
  `for_mode` API is gone), so `assert_eq!(p1.field, p2.field)`
  became a tautology. Collapsed
  `btultra2_profile_disables_small_offset_handicap` to a single
  profile binding asserting both the small-offset flag and the
  accurate-pricing flag; deleted
  `btultra2_profile_is_single_pass_opt2` outright (the only
  invariant left was the now-trivial reflexive equality).
- The `MatcherBackend` → `BackendTag` consolidation made
  `match self.active_backend { Simple => Simple, … }` an identity
  mapping. Replaced both the `reset()` debug_assert and the
  matching `driver_reset_keeps_strategy_tag_in_sync_with_active_backend`
  test scaffolding with a direct
  `assert_eq!(strategy_tag.backend(), active_backend, …)`.
- `run_btultra2_seed_pass` declared `type S = BtUltra2;` but only
  used the alias in the `build_optimal_plan::<S>` call site, while
  the adjacent `const_for_strategy::<super::strategy::BtUltra2>()`
  spelled the type out. Routed the cost-profile call through `S`
  too so the seed-pass body stays uniform.
- `tests/common::parse_zstd_block_breakdown` returns `None` on
  truncated frame headers (`frame.len() < header_len`) — without
  the guard a 4-byte-magic-only input returned
  `Some(BlockBreakdown::default())`, contradicting the docstring.
- `supported_levels_filtered()` panics for *any* unknown token in
  `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER`, not just for the "every
  token unknown" case. Partial typos (`default,typo`) previously
  silently dropped the bad token — they now fail loudly with a
  message listing the supported levels.
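
The truncated-header guard from the fourth bullet can be sketched as follows. `header_len` is a hypothetical reduction of `parse_zstd_block_breakdown`'s header arithmetic (field widths per RFC 8878's frame header descriptor); the point is the final length check that turns a too-short frame into `None`:

```rust
/// Compute the zstd frame-header length from the descriptor byte and
/// return None when the frame is shorter than its own header, instead
/// of reporting an empty-but-valid breakdown.
fn header_len(frame: &[u8]) -> Option<usize> {
    if frame.len() < 5 {
        return None; // magic (4 bytes) + frame header descriptor (1)
    }
    let fhd = frame[4];
    let single_segment = (fhd >> 5) & 1 == 1;
    let window_descriptor_len = if single_segment { 0 } else { 1 };
    let dict_id_len = match fhd & 0b11 {
        0 => 0,
        1 => 1,
        2 => 2,
        _ => 4,
    };
    let fcs_len = match fhd >> 6 {
        0 => {
            if single_segment { 1 } else { 0 }
        }
        1 => 2,
        2 => 4,
        _ => 8,
    };
    let off = 5 + window_descriptor_len + dict_id_len + fcs_len;
    if frame.len() < off { None } else { Some(off) }
}

fn main() {
    // 4-byte magic alone is a truncated header, not an empty frame.
    assert_eq!(header_len(&[0x28, 0xB5, 0x2F, 0xFD]), None);
    // Descriptor promises a 1-byte window descriptor that is missing.
    assert_eq!(header_len(&[0x28, 0xB5, 0x2F, 0xFD, 0x00]), None);
    // The same frame with the window descriptor present parses.
    assert_eq!(header_len(&[0x28, 0xB5, 0x2F, 0xFD, 0x00, 0x00]), Some(6));
}
```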

Refs #122 Phase 3.
CI bench matrix used to rebuild the `compare_ffi` criterion binary
on every shard (3 targets × 6 levels = 18 cargo builds, each
~4-7 min on the GitHub runner). Phase-2 cache hits brought it down
but the cold-cache penalty was still ~120-150 min of accumulated
build time per CI run.

New layout:

- `bench-build` matrix (3 jobs, one per target) builds the criterion
  binary with `cargo bench --no-run --message-format=json`, locates
  the resolved executable via `jq`, and uploads it as
  `bench-binary-<target>`. ~20 min budget per build, retention
  7 days.
- `benchmark` matrix (3 × 6 = 18 shards) now `needs: bench-build`,
  downloads the matching `bench-binary-<target>` artifact, marks
  it executable, and points `run-benchmarks.sh` at it through a
  new `STRUCTURED_ZSTD_BENCH_BIN` env var.
- `run-benchmarks.sh` re-execs `$STRUCTURED_ZSTD_BENCH_BIN --bench
  --output-format bencher` when the var is set; otherwise it
  falls back to the original `cargo bench` path (so local
  invocations and any non-CI use stay unchanged).
- Shard runners no longer install the rust toolchain or run
  `cargo` at all — only the target-specific runtime packages
  (`gcc-multilib`/`musl-tools`) needed to execute a non-host-arch
  binary.
- The `decodecorpus_files/z000033` fixture is referenced via
  `env!("CARGO_MANIFEST_DIR")` (compile-time absolute path).
  GitHub Actions always checks out into `$GITHUB_WORKSPACE` which
  is identical across all `ubuntu-latest` runners, so the
  embedded path resolves on every shard.

Smoke-tested locally: copying the host-target bench binary to
`/tmp/compare_ffi_bin` and running `STRUCTURED_ZSTD_BENCH_BIN=...
bash run-benchmarks.sh` re-executed the binary directly, no
cargo invocation.

Expected wall-clock impact: each shard drops ~5 min of build,
leaving the criterion measurement itself as the dominant cost.
i686-level22 (the bottleneck shard) was ~30 min, now closer to
~20-22 min — full CI bench matrix should land around ~25 min
end-to-end including the 3 bench-build jobs running in parallel
with the lint/test phase.

Refs #122 Phase 3.

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
zstd/src/encoding/match_generator.rs (2)

200-207: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Collapse strategy resolution to a single source of truth.

params.backend comes from LEVEL_TABLE, while strategy_tag is recomputed separately via StrategyTag::for_compression_level(level). The Line 666 assert only checks backend-family agreement, so a BtOpt/BtUltra/BtUltra2 drift still passes because all three are HashChain. That would silently combine one level’s HC config with another strategy’s const-gated pricing/hash3 behavior. Prefer carrying the resolved StrategyTag in LevelParams (or deriving BackendTag from it) so reset() consumes one mapping instead of two.

Also applies to: 655-670
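
A minimal sketch of the suggested single-source-of-truth shape (all names condensed and illustrative; the real enums have more variants and `LevelParams` carries many more fields):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum BackendTag {
    Simple,
    HashChain,
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum StrategyTag {
    Fast,
    BtOpt,
    BtUltra2,
}

impl StrategyTag {
    /// Backend family is DERIVED from the strategy, never stored twice.
    fn backend(self) -> BackendTag {
        match self {
            StrategyTag::Fast => BackendTag::Simple,
            StrategyTag::BtOpt | StrategyTag::BtUltra2 => BackendTag::HashChain,
        }
    }
}

/// One authoritative field instead of two independently-resolved ones:
/// reset() consumes params.strategy_tag and cannot pair one level's HC
/// config with another strategy's const-gated behaviour.
struct LevelParams {
    strategy_tag: StrategyTag,
    // ... chain_log, search_log, etc.
}

impl LevelParams {
    fn backend(&self) -> BackendTag {
        self.strategy_tag.backend()
    }
}

fn main() {
    let p = LevelParams { strategy_tag: StrategyTag::BtUltra2 };
    assert_eq!(p.backend(), BackendTag::HashChain);
    assert_eq!(StrategyTag::Fast.backend(), BackendTag::Simple);
}
```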

🤖 Prompt for AI Agents

In `@zstd/src/encoding/match_generator.rs` around lines 200 - 207, LevelParams
currently stores backend (params.backend) from LEVEL_TABLE while reset()
separately recomputes strategy_tag via
StrategyTag::for_compression_level(level), allowing mismatches between backend
variants that share a family (e.g., BtOpt/BtUltra/BtUltra2) to slip through;
change LevelParams to carry the resolved StrategyTag instead of raw backend (or
derive backend via StrategyTag), update places that read params.backend to use
params.strategy_tag (or strategy_tag.backend()), and make reset() consume the
single mapping (using the stored StrategyTag) so all config (hc, pricing/hash3
gating, etc.) is derived from one authoritative strategy value rather than two
sources.

3234-3302: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Drop the test-only hash3_log escape hatch.

The has_hash3 => BtUltra2 branch makes this shim less strict than production: tests can get BtUltra2 hash3 behavior without ever calling configure() or setting a matching strategy_tag. That weakens the new contract you just introduced and can let fixture drift pass unnoticed. I’d rather require tests to configure the matcher the same way production does and dispatch purely from self.strategy_tag.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@zstd/src/encoding/match_generator.rs` around lines 3234 - 3302, The test-only
escape hatch using table.hash3_log must be removed so dispatch in
collect_optimal_candidates is purely driven by self.strategy_tag; delete the
has_hash3 check and the branch that maps Fast|Dfast|Greedy|Lazy when has_hash3
=> to BtUltra2, and have those tags directly call
collect_optimal_candidates_initialized::<strategy::Lazy, false> (or the
appropriate strategy) while the BT tags (StrategyTag::BtUltra2, ::BtUltra,
::BtOpt) keep their existing collect_optimal_candidates_initialized::<..., true>
calls; update any test fixtures to call configure() so they set strategy_tag
instead of relying on table.hash3_log.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e80dbeff-6d2b-4cfb-ac61-a2791db2acd1

📥 Commits

Reviewing files that changed from the base of the PR and between 372ef50 and 910db7b.

📒 Files selected for processing (3)
  • zstd/benches/support/mod.rs
  • zstd/src/encoding/match_generator.rs
  • zstd/tests/common/mod.rs


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 261-272: The "Upload benchmark shard artifacts" step is missing a
retention setting and should match the 7-day retention used elsewhere; update
the actions/upload-artifact@v4 step named "Upload benchmark shard artifacts" to
include a retention-days: 7 value under its with block so the intermediate
benchmark shard artifacts (e.g., benchmark-results.${{ matrix.bench.id }}.${{
matrix.level }}.json and the other matrix-based files) are kept for 7 days only.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c40c2d25-3823-46de-a20b-23fd16d6e110

📥 Commits

Reviewing files that changed from the base of the PR and between 910db7b and ee62ec0.

📒 Files selected for processing (2)
  • .github/scripts/run-benchmarks.sh
  • .github/workflows/ci.yml


Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated no new comments.

Two more findings from PR #123 review:

(A) `LevelParams` now carries `strategy_tag: StrategyTag` instead of
    a stand-alone `backend: BackendTag`. The backend family is
    derived via `LevelParams::backend() -> StrategyTag::backend()`
    so there is no second authoritative mapping from
    `CompressionLevel` to backend, and `BtOpt`/`BtUltra`/`BtUltra2`
    drift inside the `HashChain` family is no longer possible (the
    pre-fix `params.backend == HashChain` check was equally happy
    with any of the three).
    `MatchGeneratorDriver::reset()` reads `params.strategy_tag`
    directly and drops the parallel
    `StrategyTag::for_compression_level(level)` call, eliminating
    the drift surface the review flagged.

(B) The `#[cfg(test)] HcMatchGenerator::collect_optimal_candidates`
    dispatcher no longer carries the `hash3_log != 0 => BtUltra2`
    escape hatch. Tests that wired `hash3_log` directly without
    calling `configure()` are gone or migrated:
    - `hc_collect_optimal_candidates_uses_hash3_when_chain_depth_zero`
    - `hc_collect_optimal_candidates_hash3_updates_skipped_prefix_positions`
    - `hc_hash3_tail_match_advances_update_cursor_on_early_return`
    All three forced `search_depth = 0` together with
    `hash3_log != 0` — an HC-chain-walker-only fixture state that
    production never reaches (hash3 is BtUltra2-exclusive and
    BtUltra2 always uses `search_depth = 512`). The end-to-end
    hash3 invariants stay covered by the level22 roundtrip + donor
    parity ratio gate.

Plus the inline #13 nit: `retention-days: 7` on the benchmark shard
upload-artifact step.

LEVEL_TABLE entries migrate from `backend: BackendTag::*` to
`strategy_tag: StrategyTag::{Fast,Dfast,Greedy,Lazy,BtOpt,BtUltra,
BtUltra2}` per band. `resolve_level_params` override sites
(`Uncompressed`, negative levels, level22 source-size override)
emit a `strategy_tag` directly.

457/457 nextest (was 460, minus 3 deleted tests), clippy clean
(default + tests + benches), fmt clean, build × 2.

Refs #122 Phase 3.
@polaz
Member Author

polaz commented May 14, 2026

@coderabbitai Re: the two outside-diff findings on match_generator.rs — both fixed in b63a37c. (A) LevelParams now carries strategy_tag instead of backend, with backend() derived via StrategyTag::backend(); reset() drops the parallel for_compression_level call so the level→strategy mapping has a single authoritative source. (B) The hash3_log escape hatch is removed; the three tests that depended on the impossible (uses_bt=false, hash3_log!=0) state are deleted — those donor-parity invariants are now covered end-to-end by the level22 ratio gate.


coderabbitai Bot commented May 14, 2026


@polaz: Both fixes land cleanly:

  • (A) Deriving backend() from strategy_tag via StrategyTag::backend() is the right consolidation — a single call-site in reset() is all that should ever decide level → strategy, and having backend as a separate stored field was exactly the kind of drift risk that causes subtle mismatches under future edits.
  • (B) Deleting the hash3_log escape hatch alongside the three tests that exercised the (uses_bt=false, hash3_log!=0) impossible state is the correct resolution. Keeping dead-state tests alive would only signal false confidence; the level22 ratio gate is the right integration harness for this class of donor-parity invariant.

Thanks for the clear commit reference.


✏️ Learnings added
Learnt from: polaz
URL: https://github.com/structured-world/structured-zstd/pull/123

Timestamp: 2026-05-14T21:32:19.114Z
Learning: In `zstd/src/encoding/match_generator.rs`, `LevelParams` carries `strategy_tag: StrategyTag` (not `backend` directly); `backend()` is always derived via `StrategyTag::backend()`. `reset()` has a single authoritative `for_compression_level` call so the level→strategy mapping has one source of truth — there is no parallel `for_compression_level` call elsewhere. Fixed in commit b63a37cd on PR `#123`.

Learnt from: polaz
URL: https://github.com/structured-world/structured-zstd/pull/123

Timestamp: 2026-05-14T21:32:19.114Z
Learning: In `zstd/src/encoding/match_generator.rs`, the `hash3_log` escape hatch (which allowed `hash3_log != 0` when `uses_bt=false`) was removed as an impossible state. The three tests that depended on that `(uses_bt=false, hash3_log!=0)` configuration were deleted; donor-parity invariants for hash3 behaviour are now covered end-to-end by the level22 ratio gate in `zstd/tests/`. Fixed in commit b63a37cd on PR `#123`.


The repo ships a `zstd/fuzz/` setup (5 targets — decode / encode /
interop / huff0 / fse) and a populated `artifacts/` regression
corpus, but the CI workflow never invoked any of it. Past crashes
could resurface and be merged without anything catching them at PR
time.

This commit wires `cargo-fuzz` into CI alongside the benchmark
matrix:

- Matrix of 5 fuzz targets, each shard runs on its own
  `ubuntu-latest` runner so total wall-clock is bounded by the
  slowest single target.
- `taiki-e/install-action@v2` installs `cargo-fuzz`; nightly Rust
  toolchain is required (libFuzzer instrumentation).
- Replay phase: every artifact in `zstd/fuzz/artifacts/<target>/`
  is fed through `cargo fuzz run <target> <files>` first.
  Re-introducing a previously fixed crash fails CI on the original
  reproducer.
- Fresh fuzzing phase: `cargo fuzz run <target> --
  -max_total_time=90 -timeout=30`. libFuzzer's own time cap is the
  inner budget; the `timeout-minutes: 10` on the job is the outer
  ceiling.
- On failure the shard uploads `zstd/fuzz/artifacts/<target>/`
  as `fuzz-<target>-artifacts` (14-day retention) so the new
  reproducer survives the runner cleanup.

Refs #122 Phase 3.

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated no new comments.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 141-150: The duplicate target lists (the bench entries with ids
x86_64-gnu, i686-gnu, x86_64-musl plus the build matrix, AGGREGATE_TARGETS,
per-target upload steps and per-target pages downloads) should be centralized
into a single source of truth and referenced everywhere; create one canonical
definition (e.g., a YAML anchor or a single matrix/env variable named
AGGREGATE_TARGET_LIST or targets_anchor) that contains the three target objects
(with target_triple and setup_command) and then replace the repeated inline
lists in the build matrix, benchmark matrix, AGGREGATE_TARGETS expansion, upload
steps, and pages-download steps to reference that single definition so future
target changes only need to be made once.
- Around line 213-216: The CI shard configuration for id: x86_64-musl currently
installs build-only packages in setup_command; split build-time and runtime
steps by moving compilation/tooling installs (e.g., "musl-tools") into a
dedicated setup_command used for build shards and add a separate
runtime_setup_command for shards that only execute the prebuilt binary
(STRUCTURED_ZSTD_BENCH_BIN) so runtime shards avoid installing musl-tools;
update the x86_64-musl entry to use setup_command for build dependencies and
runtime_setup_command for only the packages required to run the artifact.

In `@zstd/src/encoding/match_generator.rs`:
- Around line 413-418: The migration comments around the compile-time strategy
tag are stale: they still talk about future work (compress_block::<S> dispatch,
parse-mode removal, and a debug_assert! in reset()) even though the refactor has
landed; update the comment blocks near the strategy/tag explanation and the
other noted ranges (including mentions around reset(), active_backend, and
driver/HC invariants) to accurately describe the current implementation (that
compress_block::<S> dispatch has been implemented, parse-mode is removed, and
there is no debug_assert! in reset()), and replace speculative wording with
concrete statements about the existing invariants between the strategy tag and
active_backend and how the driver/HC invariants are maintained.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 76f332f3-3b93-49eb-911a-eaa266e62801

📥 Commits

Reviewing files that changed from the base of the PR and between ee62ec0 and b63a37c.

📒 Files selected for processing (2)
  • .github/workflows/ci.yml
  • zstd/src/encoding/match_generator.rs

polaz added 3 commits May 15, 2026 00:42
- zstd/fuzz/Cargo.toml had no `edition` field, defaulting to 2015 where
  `use structured_zstd::...` requires `extern crate`; targets failed to
  compile with `unresolved module or unlinked crate`.
- The root `rust-toolchain.toml` pins stable, overriding
  `dtolnay/rust-toolchain@nightly` once `cargo fuzz` shells out to
  `cargo build` inside the fuzz crate. `cargo fuzz` needs `-Z sanitizer`
  which stable rejects. Set `RUSTUP_TOOLCHAIN: nightly` at job env so
  the override survives the sub-cargo invocation.

Verified locally: all 5 targets build, `cargo fuzz run fse` completed
62k runs in 10s with no crashes.
- New `bench-matrix` job emits the canonical bench targets JSON;
  `bench-build` and `benchmark` consume it via
  `fromJSON(needs.bench-matrix.outputs.targets)`.
- Each target now carries separate `build_setup` (run only on
  `bench-build`) and `runtime_setup` (run only on `benchmark`
  shards), so `x86_64-musl` no longer reinstalls `musl-tools` on
  every runtime shard.
- `benchmark-aggregate` ships a single `benchmark-aggregated`
  artifact with per-target consolidated files;
  `benchmark-pages` downloads that one artifact instead of three
  named ones. `merge-benchmarks.py` globs recursively, so the layout
  change is transparent.
- `AGGREGATE_TARGETS` env now sourced from
  `needs.bench-matrix.outputs.ids_csv`.

Also: refresh stale migration comments in `match_generator.rs`
that described the const-generic dispatch as future work (the
refactor has landed; `parse_mode` field is retained only for
in-crate test callers, strategy-shaped predicates fold at
codegen time).
`encode_szstd_uncompressed` and `encode_szstd_compressed` drained the
input `Read` into a local buffer and then passed the now-empty `Read`
into `compress_to_vec`, so the compression branches of the interop
roundtrip silently encoded zero bytes. The two regression artifacts
shipped under `artifacts/interop/` triggered the assertion `decoded
([]) != data ([0])` on every CI replay.

- Take `&[u8]` directly instead of the drain-then-discard `Read` dance.
- `encode_szstd_compressed` now uses `CompressionLevel::Default`
  instead of being a verbatim duplicate of the uncompressed variant.

Verified: both regression artifacts replay cleanly under
`cargo fuzz run interop artifacts/interop/*`.
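The drain-then-discard bug is easy to reproduce in miniature. `compress_to_vec` below is a passthrough stand-in for the real `structured_zstd` entry point — just enough to make the zero-byte failure visible:

```rust
use std::io::Read;

// Stand-in for the real compressor: reads whatever the reader yields.
fn compress_to_vec<R: Read>(mut r: R) -> Vec<u8> {
    let mut out = Vec::new();
    r.read_to_end(&mut out).unwrap();
    out
}

// Buggy shape: drain the reader into a scratch buffer, then hand the
// now-empty reader to the compressor — it encodes zero bytes.
fn encode_buggy<R: Read>(mut input: R) -> Vec<u8> {
    let mut scratch = Vec::new();
    input.read_to_end(&mut scratch).unwrap();
    compress_to_vec(input)
}

// Fixed shape: take &[u8] directly; nothing is consumed twice.
fn encode_fixed(data: &[u8]) -> Vec<u8> {
    compress_to_vec(data)
}

fn main() {
    let data = [0u8];
    // The buggy path sees nothing — the `decoded ([]) != data ([0])` symptom.
    assert_eq!(encode_buggy(&data[..]), Vec::<u8>::new());
    assert_eq!(encode_fixed(&data), vec![0u8]);
}
```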

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated no new comments.

The `cargo-fuzz` binary installed via `taiki-e/install-action@v2` is
statically linked against musl on Linux; its `default_target()` probe
picks `x86_64-unknown-linux-musl`, which then fails because
AddressSanitizer cannot link against a static libc:

  error: sanitizer is incompatible with statically linked libc,
         disable it using `-C target-feature=-crt-static`
  error[E0463]: can't find crate for `core`

(The musl rust-std is not installed on the runner either.)

Pass `--target x86_64-unknown-linux-gnu` explicitly via a job-level
`FUZZ_TARGET_TRIPLE` env so both the corpus-replay step and the
fresh-fuzz step bypass the probe and link against glibc, which is
what the dtolnay/rust-toolchain@nightly install has stdlib for.

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
zstd/src/encoding/match_generator.rs (1)

4589-4595: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Build this fixture through configure() instead of mutating strategy flags by hand.

After the strategy-tag refactor, BtUltra2 implies a matching hash3 setup. Flipping strategy_tag/is_btultra2/uses_bt here leaves hash3_log and hash3_table in the old state, so this test now exercises a configuration production never constructs and can trip the new debug invariant in collect_optimal_candidates_initialized_body.

Also applies to: 4611-4615

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@zstd/src/encoding/match_generator.rs` around lines 4589 - 4595, The test
constructs an HC fixture by manually mutating strategy_tag/is_btultra2/uses_bt
which leaves hash3_log/hash3_table inconsistent and triggers the debug invariant
in collect_optimal_candidates_initialized_body; instead build the fixture via
the existing configure() path so all derived fields are set consistently for
StrategyTag::BtUltra2 (e.g., call the configure() helper or the HC::configure
method with BtUltra2 rather than setting hc.strategy_tag, hc.table.is_btultra2,
hc.table.uses_bt, etc.), and apply the same change to the other instance around
the 4611-4615 region so collect_optimal_candidates sees a properly initialized
configuration.
♻️ Duplicate comments (1)
zstd/src/encoding/match_generator.rs (1)

2738-2748: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Remove the stale self.parse_mode reference from this doc comment.

HcMatchGenerator no longer has a parse_mode field, so this paragraph now documents a state that does not exist. Point it at the test-only start_matching() / strategy_tag dispatcher instead.
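The compile-time split the comment should describe can be sketched with a simplified trait (return values here are hypothetical labels; the real entry points are `start_matching_lazy` / `start_matching_optimal`):

```rust
// Simplified sketch of the Strategy trait's const flag; the dead arm is
// eliminated per monomorphisation, so no runtime parse_mode check remains.
trait Strategy {
    const USE_BT: bool;
}

struct Lazy;
impl Strategy for Lazy {
    const USE_BT: bool = false;
}

struct BtUltra2;
impl Strategy for BtUltra2 {
    const USE_BT: bool = true;
}

// Mirrors the shape of HcMatchGenerator::start_matching_strategy::<S>:
fn start_matching_strategy<S: Strategy>() -> &'static str {
    if S::USE_BT {
        "start_matching_optimal" // BT strategies take the optimal-parse path
    } else {
        "start_matching_lazy" // non-BT strategies take the lazy path
    }
}

fn main() {
    assert_eq!(start_matching_strategy::<Lazy>(), "start_matching_lazy");
    assert_eq!(start_matching_strategy::<BtUltra2>(), "start_matching_optimal");
}
```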

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@zstd/src/encoding/match_generator.rs` around lines 2738 - 2748, The doc
comment incorrectly references a non-existent self.parse_mode field on
HcMatchGenerator; remove that sentence and update the text to say the runtime
dispatch is now via the test-only inherent method
HcMatchGenerator::start_matching() which uses the strategy_tag dispatcher (not
parse_mode), and keep the rest describing that
MatchGeneratorDriver::compress_block branches on the compile-time S::USE_BT so
monomorphisations see either start_matching_lazy/start_matching_optimal as
appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 05f44548-751d-4702-8206-aad701f2de38

📥 Commits

Reviewing files that changed from the base of the PR and between ee62ec0 and bd02dc6.

📒 Files selected for processing (4)
  • .github/workflows/ci.yml
  • zstd/fuzz/Cargo.toml
  • zstd/fuzz/fuzz_targets/interop.rs
  • zstd/src/encoding/match_generator.rs

polaz added 2 commits May 15, 2026 01:11
`btultra_and_btultra2_both_keep_dictionary_candidates` set
`hc.strategy_tag`, `hc.table.is_btultra2`, `hc.table.uses_bt`,
`hc.backend` and `hc.hc.search_depth` by hand to flip the matcher
between BtUltra2 and BtUltra between the two `collect_optimal_candidates`
calls. After the strategy-tag refactor, BtUltra2 implies a non-zero
`hash3_log` and an allocated `hash3_table`; the hand-mutated path
left those fields in their default zero shape, so the test exercised
a configuration the production `configure()` path never constructs and
risked tripping the `Strategy::USE_HASH3 ⇒ hash3_log != 0` debug
invariant in `collect_optimal_candidates_initialized_body`.

The fixture now goes through `HcMatchGenerator::configure()` for each
phase with explicit `HcConfig` + `StrategyTag` + window_log inputs.
The history-prime closure is reused so both phases see identical
content and dictionary-limit state.

Also drop a stale `self.parse_mode` reference in the doc comment for
`start_matching_strategy` and a similar stale reference next to
`HcOptimalCostProfile::const_for_strategy` — the field was removed in
the parse-mode rip-out commit and the comments now describe the
test-only inherent `start_matching`'s `strategy_tag` dispatcher.
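The invariant `configure()` enforces can be sketched like this — field names and the hash3 log value are hypothetical placeholders, not the crate's real configuration:

```rust
#[derive(Clone, Copy, PartialEq)]
enum StrategyTag { Lazy, BtOpt, BtUltra, BtUltra2 }

struct Matcher {
    strategy_tag: StrategyTag,
    uses_bt: bool,
    hash3_log: u32, // non-zero only for BtUltra2 (hash3 is BtUltra2-exclusive)
}

impl Matcher {
    // Every strategy-dependent field is derived from the one tag, so a
    // fixture built this way can never reach the impossible
    // (uses_bt = false, hash3_log != 0) state that hand-mutation allowed.
    fn configure(tag: StrategyTag) -> Self {
        let uses_bt = matches!(
            tag,
            StrategyTag::BtOpt | StrategyTag::BtUltra | StrategyTag::BtUltra2
        );
        let hash3_log = if tag == StrategyTag::BtUltra2 { 5 } else { 0 }; // 5 is a placeholder
        Matcher { strategy_tag: tag, uses_bt, hash3_log }
    }
}

fn main() {
    let m = Matcher::configure(StrategyTag::BtUltra2);
    assert!(m.uses_bt && m.hash3_log != 0);
    let l = Matcher::configure(StrategyTag::Lazy);
    assert!(!l.uses_bt && l.hash3_log == 0);
    let _ = m.strategy_tag;
}
```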
The 5-way matrix paid a fresh asan+sancov build of `structured-zstd`
(~2 min) on every shard — the dominant cost — for ~3 min of wall-
clock parallelism. Wall-clock is already bounded by the bench matrix
(~25 min) which runs in parallel, so the matrix split was buying
nothing useful.

Run all five targets sequentially in one runner instead: the first
target builds, the remaining four reuse cargo's incremental cache
(~seconds each). Halves the compute spend while keeping each target's
output grouped under `::group::` markers for log navigation, and the
fresh-fuzz step still respects the 90s libFuzzer budget per target.

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated no new comments.

@polaz polaz merged commit b486614 into main May 14, 2026
37 checks passed
@polaz polaz deleted the feat/#122-phase3-const-generic-strategy branch May 14, 2026 22:18
@sw-release-bot sw-release-bot Bot mentioned this pull request May 14, 2026
