perf(encoding): #111 Phase 3 — const-generic Strategy dispatch (#123)
Conversation
Phase 3 commit 1: lay down the type-level skeleton that future commits
will route the encoder hot paths through. No call sites change — this
commit only adds new types alongside the existing runtime
`HcParseMode` / `MatcherBackend` dispatch.
- `Strategy` trait with associated consts mirroring the donor
`ZSTD_compressionParameters` template parameters: `BACKEND`,
`MIN_MATCH`, `ACCURATE_PRICE`, `FAVOR_SMALL_OFFSETS`, `USE_HASH3`,
`USE_BT`, `OPT_LEVEL`, plus a `PARSE_MODE` bridge to the existing
runtime enum.
- 7 concrete strategy types `Fast` / `Dfast` / `Greedy` / `Lazy` /
`BtOpt` / `BtUltra` / `BtUltra2`. Levels 5-15 collapse onto `Lazy`
because their differences are runtime `HcConfig` fields, not
compile-time consts.
- `BackendTag` enum + `StrategyTag` runtime dispatcher with a
`for_level(u8) -> StrategyTag` const fn that mirrors the donor
`clevels.h` mapping (1 → Fast, 2-3 → Dfast, 4 → Greedy, 5-15 → Lazy,
16-17 → BtOpt, 18-19 → BtUltra, 20-22 → BtUltra2).
- Bridge methods `StrategyTag::parse_mode()` / `::backend()` so the
next commits can introduce a generic entry point that still hands
off to the runtime-enum-consuming code.
- Test coverage:
- Each `Strategy` impl's `PARSE_MODE` and `BACKEND` agree with the
matching `StrategyTag` arm.
- `for_level` returns the right tag at every band boundary plus a
mid-band level.
- `USE_BT` aligns with parse_mode (only BtOpt / BtUltra / BtUltra2
walk the BT).
- `USE_HASH3` is set exclusively for BtUltra2 (donor parity).
- `ACCURATE_PRICE` / `FAVOR_SMALL_OFFSETS` track the
`HcOptimalCostProfile::for_mode` runtime table row-for-row so the
eventual `const_for::<S>` rewrite is a mechanical swap.
Refs #122 Phase 3.
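A minimal sketch of the skeleton these bullets describe: the trait shape, two of the seven ZSTs, and the band map. The const values are plausible stand-ins rather than the crate's actual table, and the level-0 arm is an assumption (the commit only specifies levels 1-22).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendTag { Simple, Dfast, Row, HashChain }

pub trait Strategy {
    const BACKEND: BackendTag;
    const MIN_MATCH: u32;
    const ACCURATE_PRICE: bool;
    const FAVOR_SMALL_OFFSETS: bool;
    const USE_HASH3: bool;
    const USE_BT: bool;
    const OPT_LEVEL: u8;
}

pub struct Lazy;
impl Strategy for Lazy {
    const BACKEND: BackendTag = BackendTag::HashChain;
    const MIN_MATCH: u32 = 3;
    const ACCURATE_PRICE: bool = false;
    const FAVOR_SMALL_OFFSETS: bool = false;
    const USE_HASH3: bool = false;
    const USE_BT: bool = false;
    const OPT_LEVEL: u8 = 0;
}

pub struct BtUltra2;
impl Strategy for BtUltra2 {
    const BACKEND: BackendTag = BackendTag::HashChain;
    const MIN_MATCH: u32 = 3;
    const ACCURATE_PRICE: bool = true;
    const FAVOR_SMALL_OFFSETS: bool = false;
    const USE_HASH3: bool = true;
    const USE_BT: bool = true;
    const OPT_LEVEL: u8 = 2;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StrategyTag { Fast, Dfast, Greedy, Lazy, BtOpt, BtUltra, BtUltra2 }

impl StrategyTag {
    /// Band map in the donor `clevels.h` shape from the bullet above.
    pub const fn for_level(level: u8) -> StrategyTag {
        match level {
            0 => StrategyTag::Dfast, // assumption: level 0 routes like the default level
            1 => StrategyTag::Fast,
            2 | 3 => StrategyTag::Dfast,
            4 => StrategyTag::Greedy,
            5..=15 => StrategyTag::Lazy,
            16 | 17 => StrategyTag::BtOpt,
            18 | 19 => StrategyTag::BtUltra,
            _ => StrategyTag::BtUltra2,
        }
    }
}

fn main() {
    assert_eq!(StrategyTag::for_level(10), StrategyTag::Lazy);
    assert_eq!(StrategyTag::for_level(22), StrategyTag::BtUltra2);
    println!("band map ok");
}
```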
Phase 3 commit 2: resolve the compile-time strategy tag once per
`reset()` and store it on the driver alongside the existing
`active_backend` runtime enum. No hot-path dispatch changes yet — the
field is set, debug-asserted to stay in lock-step with `active_backend`,
and covered by a new test.
- Add `StrategyTag::for_compression_level(CompressionLevel)` mirroring
  `resolve_level_params` exactly: named variants (`Uncompressed` → Fast,
  `Default` → Dfast, `Better`/`Best` → Lazy) and numeric `Level(n)`
  clamped through the existing `for_level(u8)` band map. Negative levels
  fall back to `Fast` (Simple matcher with elevated `hash_fill_step`),
  matching the `resolve_level_params` Simple arm.
- Add `strategy_tag: StrategyTag` to `MatchGeneratorDriver`. Initialise
  to `StrategyTag::Fast` to match the existing
  `active_backend = MatcherBackend::Simple` default.
- Resolve `strategy_tag` from the requested `CompressionLevel` inside
  `reset()`, immediately after the existing `active_backend` assignment.
  A `debug_assert!` enforces the invariant
  `strategy_tag.backend() == active_backend` so future drift in
  `for_compression_level` or `resolve_level_params` is caught in debug
  builds.
- New regression test
  `driver_reset_keeps_strategy_tag_in_sync_with_active_backend` walks
  every band boundary (1, 2-3, 4, 5-15, 16-17, 18-19, 20-22) plus all
  four named variants and asserts both the expected tag and the matching
  backend.
This commit is purely additive — no behaviour change for any caller.
Sets up the data flow that commit 3 will branch the hot dispatch on to
enter `compress_block::<S>` monomorphisations.
Refs #122 Phase 3.
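The reset-time invariant this commit describes reduces to a few lines. This is a hypothetical sketch: the two-variant enums, the level thresholds, and the field layout are stand-ins for the real driver, which resolves all seven strategies.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MatcherBackend { Simple, HashChain }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrategyTag { Fast, Lazy }

impl StrategyTag {
    fn backend(self) -> MatcherBackend {
        match self {
            StrategyTag::Fast => MatcherBackend::Simple,
            StrategyTag::Lazy => MatcherBackend::HashChain,
        }
    }
}

struct MatchGeneratorDriver {
    active_backend: MatcherBackend,
    strategy_tag: StrategyTag,
}

impl MatchGeneratorDriver {
    fn reset(&mut self, level: u8) {
        // The pre-existing runtime resolution stays authoritative...
        self.active_backend =
            if level <= 1 { MatcherBackend::Simple } else { MatcherBackend::HashChain };
        // ...and the new tag is resolved immediately after it.
        self.strategy_tag = if level <= 1 { StrategyTag::Fast } else { StrategyTag::Lazy };
        // Debug builds catch any drift between the two resolvers.
        debug_assert_eq!(self.strategy_tag.backend(), self.active_backend);
    }
}

fn main() {
    let mut d = MatchGeneratorDriver {
        active_backend: MatcherBackend::Simple,
        strategy_tag: StrategyTag::Fast,
    };
    d.reset(9);
    assert_eq!(d.strategy_tag, StrategyTag::Lazy);
}
```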
Phase 3 commit 3: introduce the monomorphised per-block entry point
without changing any inner hot-loop behaviour. Each
`Matcher::start_matching` call now picks a concrete `S: Strategy` from
`self.strategy_tag` (resolved at `reset()` in commit 2) and hands off to
`compress_block::<S>`. The body still uses runtime dispatch internally,
so this commit is behaviour-preserving — it exists to plumb the type
parameter into the HC matcher so later commits can replace inner
`match self.parse_mode` branches with `if S::USE_BT` / `if S::USE_HASH3`
const checks.
- `MatchGeneratorDriver::start_matching` becomes a 7-arm match over
  `self.strategy_tag` dispatching into `compress_block::<S>` per
  variant.
- `compress_block<S: Strategy>(&mut self, …)` lives next to the
  `impl Matcher` block. Body switches on `S::BACKEND` (an associated
  `const`), so each monomorphisation keeps exactly one backend arm
  after const-eval folds the `match` — verified by inspection of the
  generated IR before/after.
- For the `HashChain` backend, `compress_block` calls a new
  `HcMatchGenerator::start_matching_strategy<S>` method that today just
  delegates back to the existing runtime-dispatched `start_matching`. A
  `debug_assert_eq!` enforces `S::PARSE_MODE == self.parse_mode` so the
  bridge cannot drift silently while we migrate the inner loops in
  commits 4-7.
463/463 nextest, clippy clean, build × 2.
Refs #122 Phase 3.
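The two-layer dispatch shape this commit introduces looks roughly like the following sketch (two strategies instead of seven, string returns instead of real compression; all names are stand-ins for the crate's). The outer match pays the runtime tag check once per block; the inner match is on an associated const, so each monomorphisation keeps a single backend arm.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendTag { Simple, HashChain }

trait Strategy { const BACKEND: BackendTag; }

struct Fast;
impl Strategy for Fast { const BACKEND: BackendTag = BackendTag::Simple; }
struct Lazy;
impl Strategy for Lazy { const BACKEND: BackendTag = BackendTag::HashChain; }

#[derive(Clone, Copy)]
enum StrategyTag { Fast, Lazy }

struct Driver { strategy_tag: StrategyTag }

impl Driver {
    fn start_matching(&mut self) -> &'static str {
        // Runtime tag check happens once per block, out here...
        match self.strategy_tag {
            StrategyTag::Fast => self.compress_block::<Fast>(),
            StrategyTag::Lazy => self.compress_block::<Lazy>(),
        }
    }

    fn compress_block<S: Strategy>(&mut self) -> &'static str {
        // ...while this match is on an associated const: after const
        // evaluation each monomorphisation keeps exactly one arm.
        match S::BACKEND {
            BackendTag::Simple => "simple backend path",
            BackendTag::HashChain => "hash-chain backend path",
        }
    }
}

fn main() {
    let mut d = Driver { strategy_tag: StrategyTag::Lazy };
    assert_eq!(d.start_matching(), "hash-chain backend path");
}
```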
Phase 3 commit 4: the strategy-aware entry on `HcMatchGenerator` now
branches on the compile-time `S::USE_BT` constant. Each
monomorphisation (Lazy / BtOpt / BtUltra / BtUltra2) keeps exactly
one of `start_matching_lazy` / `start_matching_optimal` — the dead
arm is dropped at codegen time, so the per-block dispatch no longer
pays for a runtime tag check.
The lazy/optimal split is the outermost parse_mode branch — every
encoded position inside the block already runs the right inner
parser, but until now the dispatch *itself* re-evaluated
`self.parse_mode` on every block call.
- `HcMatchGenerator::start_matching_strategy<S>` body becomes
`if S::USE_BT { start_matching_optimal } else { start_matching_lazy }`.
- The runtime-dispatched inherent `start_matching` is now `#[cfg(test)]`
— the production driver path goes through `start_matching_strategy`
exclusively. Tests that exercise HcMatchGenerator directly still
use the inherent method; the final cleanup commit removes both the
inherent method and the `parse_mode` field.
- `debug_assert_eq!(self.parse_mode, S::PARSE_MODE)` at the entry
catches any future drift between the runtime field (still set by
`configure`) and the const passed by the dispatcher.
- Cost-model and parse-mode parity assertions in
`strategy::tests` moved from `#[test] fn` into compile-time
`const _: () = { assert!(…) }` blocks per
`clippy::assertions_on_constants` (the asserts only inspect
associated `const`s — they belong at compile time, not in
the runner).
460/460 nextest, clippy clean (default + `--tests`), build × 2.
Refs #122 Phase 3.
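Both moves in this commit can be reduced to a short sketch: the per-block dispatch becomes an `if` on a strategy const, and const-only parity checks move into `const _` blocks that fail the build instead of running in the test harness. The names and the string return values are invented stand-ins.

```rust
trait Strategy {
    const USE_BT: bool;
    const OPT_LEVEL: u8;
}

struct Lazy;
impl Strategy for Lazy { const USE_BT: bool = false; const OPT_LEVEL: u8 = 0; }
struct BtOpt;
impl Strategy for BtOpt { const USE_BT: bool = true; const OPT_LEVEL: u8 = 0; }

// Compile-time parity assertions: they only inspect associated consts,
// so they belong here rather than in a #[test] fn.
const _: () = assert!(BtOpt::USE_BT);
const _: () = assert!(!Lazy::USE_BT && Lazy::OPT_LEVEL == 0);

fn start_matching_strategy<S: Strategy>() -> &'static str {
    // The dead arm is dropped per monomorphisation at codegen time.
    if S::USE_BT { "start_matching_optimal" } else { "start_matching_lazy" }
}

fn main() {
    assert_eq!(start_matching_strategy::<Lazy>(), "start_matching_lazy");
    println!("ok");
}
```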
Phase 3 commit 5: thread the const-generic `S: Strategy` through the
optimal parser entry, BT dispatcher, and the four SIMD-kernel wrappers
down to the `build_optimal_plan_impl_body!` macro. Inside the macro the
two `parse_mode` reads that previously drove every candidate insertion
and traceback step are replaced with strategy-projected `let` bindings —
each is a compile-time constant per monomorphisation, so LLVM folds
them and drops the dead arms in each strategy's instance.
- `build_optimal_plan_impl_body!` gains a `$strategy_ty:ty` parameter
  and computes `abort_on_worse_match = <S>::OPT_LEVEL == 0` /
  `opt_level = <S>::OPT_LEVEL >= 2`. A `debug_assert!` enforces
  `<S>::USE_BT` so the body never runs for a non-BT strategy.
- Each of `build_optimal_plan_impl_neon` / `_sse42` / `_avx2_bmi2` /
  `_scalar` gains `S: Strategy` as a leading type parameter alongside
  the existing `ACCURATE_PRICE` / `FAVOR_SMALL_OFFSETS` const-generics;
  they pass S through into the macro body.
- `build_optimal_plan_impl<S>` plumbs S into the 4 kernel dispatchers.
  The runtime 4-arm match on
  `(profile.accurate, profile.favor_small_offsets)` stays for now —
  `generic_const_exprs` would be needed to project the bool consts
  directly into a const-generic argument list, and the dispatch is
  per-block, not per-position. In practice only two arms are reached
  (BtOpt → (false, true), BtUltra / BtUltra2 → (true, false)).
- `build_optimal_plan<S>` and `start_matching_optimal<S>` follow. Both
  `start_matching_strategy<S>` (production path) and the `#[cfg(test)]`
  inherent `start_matching` (now 4-arm on `HcParseMode` → concrete
  `BtOpt` / `BtUltra` / `BtUltra2` types) invoke the generic entry.
  `run_btultra2_seed_pass` pins `S = BtUltra2` locally — by name it is
  BtUltra2-exclusive.
- `nested const items cannot project through outer generics`: switched
  the per-strategy flags from `const ABORT_ON_WORSE_MATCH: bool = …` to
  `let abort_on_worse_match: bool = …` so the expressions live in the
  parent fn's substitution scope. The optimiser still folds them per
  monomorphisation.
460/460 nextest, clippy clean (default + `--tests`), build × 2.
Refs #122 Phase 3.
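An illustrative reduction of the macro change, including the `let`-not-`const` fix from the build-error note (a nested `const` item cannot name the outer fn's generics, while a `let` binding lives in the parent fn's substitution scope and still folds per monomorphisation). All names are stand-ins for the real macro and its callers.

```rust
trait Strategy { const USE_BT: bool; const OPT_LEVEL: u8; }

struct BtOpt;
impl Strategy for BtOpt { const USE_BT: bool = true; const OPT_LEVEL: u8 = 0; }
struct BtUltra2;
impl Strategy for BtUltra2 { const USE_BT: bool = true; const OPT_LEVEL: u8 = 2; }

macro_rules! build_optimal_plan_impl_body {
    ($strategy_ty:ty) => {{
        debug_assert!(<$strategy_ty>::USE_BT, "body must not run for non-BT strategies");
        // `let`, not a nested `const` item: the expressions can refer to
        // the parent fn's generic parameter and still fold per instance.
        let abort_on_worse_match: bool = <$strategy_ty>::OPT_LEVEL == 0;
        let opt_level_2: bool = <$strategy_ty>::OPT_LEVEL >= 2;
        (abort_on_worse_match, opt_level_2)
    }};
}

fn build_optimal_plan_impl<S: Strategy>() -> (bool, bool) {
    build_optimal_plan_impl_body!(S)
}

fn main() {
    assert_eq!(build_optimal_plan_impl::<BtOpt>(), (true, false));
    assert_eq!(build_optimal_plan_impl::<BtUltra2>(), (false, true));
}
```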
Three ignored manual probes that document the level22 perf claims with
executable evidence. All gated on `#[ignore]` so they do not run in CI
by default; invoke with
`cargo nextest run --release ... --run-ignored ignored-only`.
- `he_level22_ratio` — confirms the high-entropy-1m scenario speedup
  (~10x vs C FFI) is driven by the source-size-hinted raw-fast-path;
  toggles the hint off to verify the slow-path matches C-zstd timing.
- `incompressibility_falsepos` — exercises the
  `block_looks_incompressible` heuristic on hard-random, mixed, textish,
  and pattern-only inputs to confirm zero false positives (Rust/C
  output sizes match within a 7-byte frame overhead).
- `corpus_level22_gap` — quantifies the +0.08 % decodecorpus-z000033
  output gap as a block-split heuristic divergence (233 vs 256
  compressed blocks), not a core-compression regression.
Refs #122 Phase 3 perf accounting.
Phase 3 commit 6: thread S through the optimal-candidates body and
its four SIMD-kernel wrappers, then gate the hash3 short-match
lookup (donor `static (mls==3)` probe) on `S::USE_HASH3`. Only
`BtUltra2` sets `USE_HASH3 = true`; every other monomorphisation
drops the entire `update_hash3_until` + `$hash3` block at codegen
time. On the level22 hot path this removes one indirect read +
branch + table walk per encoded position.
- `collect_optimal_candidates_initialized_body!` gains
`$strategy_ty:ty` and a `let use_hash3: bool = <S>::USE_HASH3`
binding. The hash3 sub-block becomes `if use_hash3 && … { … }`
so the dead arm folds away per S.
- `debug_assert!` confirms the runtime/compile-time invariant
(`USE_HASH3 = true` implies `hash3_log != 0`).
- Each of the four SIMD wrappers
  (`collect_optimal_candidates_initialized_neon` / `_sse42` /
  `_avx2_bmi2` / `_scalar`) gains `S: Strategy` alongside the existing
  `USE_BT_MATCHFINDER` const. The cross-platform dispatcher
  (`collect_optimal_candidates_initialized`) plumbs S through to all
  four arms.
- The DP body call sites in `build_optimal_plan_impl_body!`
(`$self.$collect::<$strategy_ty, true>(…)`) thread S into the
per-position pipeline.
- `#[cfg(test)] collect_optimal_candidates` rewritten as a four-arm
match on `(uses_bt, hash3_log != 0)` so artificial test
configurations still reach the right monomorphisation —
production callers never hit this helper.
460/460 nextest, clippy clean, build × 2.
Refs #122 Phase 3.
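The hash3 gate this commit describes can be sketched with counting stand-ins for the real hash3 machinery. `collect_candidates`, the probe counter, and the depth values are invented for illustration; the point is that the probe block sits behind a strategy const, so non-BtUltra2 monomorphisations compile it out entirely.

```rust
trait Strategy { const USE_HASH3: bool; }

struct BtUltra;
impl Strategy for BtUltra { const USE_HASH3: bool = false; }
struct BtUltra2;
impl Strategy for BtUltra2 { const USE_HASH3: bool = true; }

/// Returns how many hash3 probes ran over `positions` encoded positions.
fn collect_candidates<S: Strategy>(positions: usize, hash3_log: u32) -> usize {
    let use_hash3: bool = S::USE_HASH3; // compile-time constant per S
    // Runtime/compile-time invariant from the bullet above.
    debug_assert!(!use_hash3 || hash3_log != 0, "USE_HASH3 implies a hash3 table");
    let mut probes = 0;
    for _pos in 0..positions {
        if use_hash3 {
            // Stand-in for the update_hash3_until + short-match lookup;
            // this whole arm folds away when USE_HASH3 is false.
            probes += 1;
        }
        // ...ordinary candidate collection would happen here...
    }
    probes
}

fn main() {
    assert_eq!(collect_candidates::<BtUltra>(8, 0), 0);
    assert_eq!(collect_candidates::<BtUltra2>(8, 5), 8);
}
```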
Phase 3 commit 7: make `should_run_btultra2_seed_pass` generic over
`S: Strategy` and short-circuit the predicate at compile time for
every non-BtUltra2 monomorphisation. Only `BtUltra2` satisfies
`S::OPT_LEVEL == 2 && S::USE_HASH3`, so Fast / Dfast / Greedy /
Lazy / BtOpt / BtUltra now drop both the predicate body and the
inlined call to `run_btultra2_seed_pass` after the per-strategy
monomorphisation pass.
- `should_run_btultra2_seed_pass<S>` starts with
`if !(S::OPT_LEVEL == 2 && S::USE_HASH3) { return false; }` —
a const expression per S, so LLVM folds it and the rest of the
predicate (HcBackend::Bt unwrap, frame-start checks, dictionary
state) only emits for BtUltra2.
- Replaced the runtime `self.is_btultra2()` read inside the
predicate with the const path; the `is_btultra2` helper stays
for the (single remaining) profile-selection site.
- `start_matching_optimal::<S>` now invokes
`self.should_run_btultra2_seed_pass::<S>(current_len)`.
- All test call sites pin `S = BtUltra2` (the only strategy where
the predicate is interesting); the corresponding assertions
remain unchanged.
460/460 nextest, clippy clean, build × 2.
Refs #122 Phase 3.
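The early-out shape this commit gives the predicate, reduced to a sketch (the struct field and the trailing runtime checks are invented; only the const guard mirrors the commit): the first check is a const expression per S, so every non-BtUltra2 instance folds the whole predicate to `false` and the remaining runtime checks are never emitted.

```rust
trait Strategy { const USE_HASH3: bool; const OPT_LEVEL: u8; }

struct BtUltra;
impl Strategy for BtUltra { const USE_HASH3: bool = false; const OPT_LEVEL: u8 = 2; }
struct BtUltra2;
impl Strategy for BtUltra2 { const USE_HASH3: bool = true; const OPT_LEVEL: u8 = 2; }

struct Gen { at_frame_start: bool }

impl Gen {
    fn should_run_btultra2_seed_pass<S: Strategy>(&self, current_len: usize) -> bool {
        if !(S::OPT_LEVEL == 2 && S::USE_HASH3) {
            return false; // const per S: folds to `false` for non-BtUltra2
        }
        // Only BtUltra2 monomorphisations emit these runtime checks
        // (stand-ins for the frame-start / dictionary-state logic).
        self.at_frame_start && current_len > 0
    }
}

fn main() {
    let g = Gen { at_frame_start: true };
    assert!(!g.should_run_btultra2_seed_pass::<BtUltra>(16));
    assert!(g.should_run_btultra2_seed_pass::<BtUltra2>(16));
}
```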
Phase 3 commit 8: replace the runtime profile selection in
`start_matching_optimal::<S>` with
`HcOptimalCostProfile::const_for_strategy::<S>()`. The four profile
fields (max chain
depth, sufficient match length, accurate flag, favor-small-offsets
flag) come straight from the strategy's associated consts, so the
optimiser folds the whole profile into a single literal at codegen
time. The old two-arm runtime read (`if self.is_btultra2() {
for_mode(parse_mode, true) } else { for_mode(parse_mode, false) }`)
disappears.
- Add `Strategy::MAX_CHAIN_DEPTH` and `Strategy::SUFFICIENT_MATCH_LEN`
associated consts, mirroring the `HcOptimalCostProfile::for_mode`
table row-for-row. Non-BT strategies (Fast / Dfast / Greedy) get
the Lazy2 row so the trait stays total — those values are never
read because USE_BT short-circuits the path.
- Add `HcOptimalCostProfile::const_for_strategy<S>()` next to the
existing `for_mode`. Marked `#[inline]` so the literal lands at
the call site after monomorphisation.
- `start_matching_optimal::<S>` calls the new const peer; the
inherent `HcMatchGenerator::is_btultra2()` accessor is gone (no
remaining readers).
- `for_mode` itself stays — it has callers in tests and in
`run_btultra2_seed_pass` (which pins `S = BtUltra2` and could
switch to `const_for_strategy` later, but the call is once per
frame so it is not on the hot path).
460/460 nextest, clippy clean (default + `--tests`), build × 2.
Refs #122 Phase 3.
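The const peer this commit adds looks roughly like this sketch: every field comes from an associated const, so the whole struct folds to a literal per monomorphisation. The numeric values and the `debug_assert!` on `USE_BT` are placeholders, not the crate's table.

```rust
trait Strategy {
    const USE_BT: bool;
    const ACCURATE_PRICE: bool;
    const FAVOR_SMALL_OFFSETS: bool;
    const MAX_CHAIN_DEPTH: u32;
    const SUFFICIENT_MATCH_LEN: u32;
}

struct BtUltra2;
impl Strategy for BtUltra2 {
    const USE_BT: bool = true;
    const ACCURATE_PRICE: bool = true;
    const FAVOR_SMALL_OFFSETS: bool = false;
    const MAX_CHAIN_DEPTH: u32 = 999;      // placeholder value
    const SUFFICIENT_MATCH_LEN: u32 = 64;  // placeholder value
}

#[derive(Debug, PartialEq, Eq)]
struct HcOptimalCostProfile {
    max_chain_depth: u32,
    sufficient_match_len: u32,
    accurate: bool,
    favor_small_offsets: bool,
}

impl HcOptimalCostProfile {
    #[inline] // the literal lands at the call site after monomorphisation
    fn const_for_strategy<S: Strategy>() -> Self {
        debug_assert!(S::USE_BT, "profiles are only meaningful for BT strategies");
        Self {
            max_chain_depth: S::MAX_CHAIN_DEPTH,
            sufficient_match_len: S::SUFFICIENT_MATCH_LEN,
            accurate: S::ACCURATE_PRICE,
            favor_small_offsets: S::FAVOR_SMALL_OFFSETS,
        }
    }
}

fn main() {
    let p = HcOptimalCostProfile::const_for_strategy::<BtUltra2>();
    assert!(p.accurate && !p.favor_small_offsets);
}
```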
Phase 3 commit 9 (close-out): delete both legacy runtime dispatch
enums and every read site that depended on them. The encoder now
selects its match-finder backend exclusively through the
const-generic `S: Strategy` trait and the once-per-frame
`StrategyTag` resolved at `reset()`.
- `HcParseMode` (Lazy2 / BtOpt / BtUltra / BtUltra2) deleted from
`strategy.rs`. The `Strategy::PARSE_MODE` associated const and the
`StrategyTag::parse_mode()` bridge are gone with it — nothing
reads them now that every dispatch routes through `S::USE_BT`,
`S::USE_HASH3`, `S::OPT_LEVEL`, etc.
- `MatcherBackend` (Simple / Dfast / Row / HashChain) deleted from
`match_generator.rs`. All references (`LevelParams::backend`,
`MatchGeneratorDriver::active_backend`, plus the
`retire_dictionary_budget` / `trim_after_budget_retire` /
`skip_matching_for_dictionary_priming` 4-arm matches) now use the
unified `super::strategy::BackendTag`. Same 4-variant shape, one
enum.
- `HcConfig` drops its `parse_mode: HcParseMode` field; every
`HcConfig` literal in `LEVEL_TABLE` and the per-source-size
variants (`BTOPT_HC_CONFIG`, `BTULTRA_HC_CONFIG`,
`BTULTRA2_HC_CONFIG`, etc.) shrinks by one line.
- `HcMatchGenerator::parse_mode` field deleted.
`HcMatchGenerator::configure` gains an explicit
`tag: StrategyTag` argument from which it derives `is_btultra2`,
`uses_bt`, and the search-depth cap. The driver passes
`self.strategy_tag` (set at `reset()`) on the single production
call site.
- `HcOptimalCostProfile::for_mode(HcParseMode, bool)` deleted from
`cost_model::mod`. `const_for_strategy::<S>()` is now the only
way to build a profile. Every call site (production +
`run_btultra2_seed_pass` + the cost-model tests) was migrated to
pass an explicit `S = strategy::{Lazy, BtOpt, BtUltra, BtUltra2}`.
- The three `level_*_use_*_parse_mode` donor-parity tests are
rewritten to assert against `StrategyTag::for_level(level)`
directly (the data they were checking lives in the const-generic
trait now). The `#[cfg(test)] HcMatchGenerator::start_matching`
helper picks a strategy from the runtime table flags
(`uses_bt`/`is_btultra2`/`hash3_log`) so artificial test setups
still reach the right monomorphisation.
- `start_matching_strategy::<S>`'s `debug_assert_eq!` now compares
`S::USE_BT` against `self.table.uses_bt` (the field set by
`configure`) instead of the deleted `self.parse_mode` /
`S::PARSE_MODE` pair.
- Module-level docs in `strategy.rs` are rewritten as a single
Phase-3 dispatch flow diagram covering every const-folded gate
(`USE_BT`, `USE_HASH3`, `OPT_LEVEL`, `ACCURATE_PRICE`,
`FAVOR_SMALL_OFFSETS`, `MAX_CHAIN_DEPTH`, `SUFFICIENT_MATCH_LEN`).
`encoding/mod.rs` header updated to drop the now-stale
`HcParseMode` reference and to add `perf/post-pr-121-baseline`.
460/460 nextest, clippy clean (default + `--tests`), build × 2.
Phase 3 acceptance criterion "HcParseMode enum deleted" — done.
Refs #122 Phase 3.
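The new `configure` contract from this close-out can be sketched as follows. The field names, the `matches!` derivations, and the depth-cap values are stand-ins; the real method also wires the rest of the table state.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrategyTag { Lazy, BtOpt, BtUltra, BtUltra2 }

struct HcMatchGenerator {
    uses_bt: bool,
    is_btultra2: bool,
    search_depth_cap: u32,
}

impl HcMatchGenerator {
    /// Derives the remaining runtime flags from the driver-resolved tag;
    /// the deleted `parse_mode` field has no replacement.
    fn configure(&mut self, tag: StrategyTag) {
        self.uses_bt = matches!(
            tag,
            StrategyTag::BtOpt | StrategyTag::BtUltra | StrategyTag::BtUltra2
        );
        self.is_btultra2 = matches!(tag, StrategyTag::BtUltra2);
        // Illustrative caps only; the real values come from the level table.
        self.search_depth_cap = if self.uses_bt { 1 << 10 } else { 1 << 6 };
    }
}

fn main() {
    let mut g = HcMatchGenerator { uses_bt: false, is_btultra2: false, search_depth_cap: 0 };
    g.configure(StrategyTag::BtUltra2);
    assert!(g.uses_bt && g.is_btultra2);
}
```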
Before: a single CI runner per target ran every level back-to-back
(~45 min on i686, with level22 dominating ~20 min on its own).
After: matrix split target × level (3 targets × 6 levels = 18
parallel runners) plus a single aggregation job that consolidates
per-target shards back into the existing per-target artifact
shape, so the cross-target merge step downstream stays unchanged.
- `support::supported_levels_filtered()` honours
`STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` (comma-separated level names);
every loop in `compare_ffi.rs` that used to iterate over
`supported_levels()` now uses the filtered variant. With no env
var, behaviour is identical to before.
- `.github/scripts/aggregate-bench-levels.py` reads
`benchmark-{results,delta,relative,report}.<target>.<level>.<ext>`
shards (uploaded as `benchmark-shard-<target>-<level>` artifacts)
and emits per-target consolidated
`benchmark-{results,delta,relative,report}.<target>.<ext>` files
matching the pre-split naming. Markdown shards are concatenated
under per-level `## Level: <name>` sections.
- CI workflow:
- `benchmark` job: matrix becomes `bench × level`. Each runner
sets `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` and uploads its
`benchmark-shard-<target>-<level>` artifact.
- new `benchmark-aggregate` job (post-benchmark): downloads
every `benchmark-shard-*` artifact, runs the aggregation
script, and re-uploads three target-shaped artifacts
(`benchmark-x86_64-gnu`, `benchmark-i686-gnu`,
`benchmark-x86_64-musl`). `github-action-benchmark` moved
here so the regression alert fires on the consolidated
x86_64-gnu result.
- `benchmark-pages` now depends on `benchmark-aggregate` and
downloads the three per-target artifacts explicitly (instead
of the previous `benchmark-*` glob) so the shard artifacts
don't leak into `merge-benchmarks.py`'s `<target>.<level>`
name extraction.
- Per-runner `timeout-minutes` cut from 40/55/40 to 25/30/25
matching the new per-level work share.
Smoke-test:
- `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER=fastest cargo bench
--list` emits only `fastest` benchmarks; no env var → all 6
levels.
- Aggregator round-trips a synthetic two-shard fixture and
produces the expected consolidated JSON + markdown files.
- Local nextest 460/460, clippy clean (default + benches).
Refs #122 Phase 3 — supporting infrastructure.
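A minimal sketch of the bench-side filter plumbing described above, assuming a bare `LevelConfig` with only a `name` field (the real support module carries more per-level data). The env read is injected as a parameter here so the sketch is testable; the benches would read `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` via `std::env::var` instead.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct LevelConfig { name: &'static str }

fn supported_levels() -> Vec<LevelConfig> {
    // Placeholder level names, not the crate's real six.
    ["fastest", "default", "best"]
        .iter()
        .map(|&n| LevelConfig { name: n })
        .collect()
}

fn level_filter(raw: Option<&str>) -> Option<Vec<String>> {
    // Comma-separated level names, trimmed, empties dropped.
    raw.map(|v| {
        v.split(',')
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect()
    })
}

fn supported_levels_filtered(raw: Option<&str>) -> Vec<LevelConfig> {
    let all = supported_levels();
    match level_filter(raw) {
        None => all, // no env var: identical to the unfiltered behaviour
        Some(keep) => all
            .into_iter()
            .filter(|cfg| keep.iter().any(|name| name == cfg.name))
            .collect(),
    }
}

fn main() {
    assert_eq!(supported_levels_filtered(None).len(), 3);
    assert_eq!(supported_levels_filtered(Some("default")).len(), 1);
}
```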
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough: Adds a nightly fuzz job; splits benchmarks into bench-matrix, bench-build, and a benchmark job matrix that emits per-(target, level) shards; introduces benchmark-aggregate to merge level shards into per-target artifacts.
Sequence diagram:

```mermaid
sequenceDiagram
    participant BenchBuild as bench-build (GH job)
    participant ArtifactStore as Actions Artifacts
    participant BenchmarkJob as benchmark (GH job matrix)
    participant RunnerScript as run-benchmarks.sh
    participant AggregateJob as benchmark-aggregate (GH job)
    participant AggregateScript as aggregate-bench-levels.py
    participant PagesJob as benchmark-pages (GH job)
    BenchBuild->>ArtifactStore: upload bench-binary-<target>
    BenchmarkJob->>ArtifactStore: download bench-binary-<target>
    BenchmarkJob->>RunnerScript: invoke with STRUCTURED_ZSTD_BENCH_LEVEL_FILTER
    RunnerScript->>BenchmarkJob: produce per-(target,level) shard files
    BenchmarkJob->>ArtifactStore: upload benchmark-shard-<target>-<level>
    AggregateJob->>ArtifactStore: download benchmark-shard-*
    AggregateJob->>AggregateScript: run aggregate-bench-levels.py to merge shards
    AggregateScript->>ArtifactStore: upload consolidated benchmark-aggregated
    PagesJob->>ArtifactStore: download benchmark-aggregated
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed.
Pull request overview
Phase 3 of the #111 encoder rewrite. Replaces the runtime HcParseMode / MatcherBackend enum dispatch with const-generic Strategy monomorphisation. A new Strategy trait + 7 ZST implementors (Fast/Dfast/Greedy/Lazy/BtOpt/BtUltra/BtUltra2) lift the parser/finder choice into the type system; hot entry points (compress_block::<S>, start_matching_optimal::<S>, build_optimal_plan_impl::<S, …>, collect_optimal_candidates_initialized::<S, …>) are now generic and let the optimiser fold S::USE_BT / S::USE_HASH3 / S::OPT_LEVEL branches per monomorphisation. The runtime HcParseMode enum and MatcherBackend enum are deleted; BackendTag (in strategy) and a per-driver StrategyTag field resolved at reset() become the single source of truth. CI also splits the bench matrix into target × level shards with a new benchmark-aggregate job consolidating shards for merge-benchmarks.py.
Changes:
- Introduce `Strategy` trait + 7 ZST strategies + `StrategyTag`; thread `<S>` through optimal-parser dispatch, SIMD wrappers, cost-profile builder, and BTULTRA2 seed-pass gate.
- Delete `HcParseMode` and `MatcherBackend`; replace `for_mode(mode, pass2)` with `const_for_strategy::<S>()`; update all in-crate tests accordingly.
- Split CI bench matrix per level (matrix.bench × matrix.level), add `benchmark-aggregate` job + `aggregate-bench-levels.py` to recombine shards.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| zstd/src/encoding/strategy.rs | New Strategy trait, 7 ZSTs, BackendTag, StrategyTag::for_level/for_compression_level |
| zstd/src/encoding/mod.rs | Module doc refresh referencing the new strategy module |
| zstd/src/encoding/match_generator.rs | Driver strategy_tag; compress_block::<S>; monomorphised optimal parser / BT-seed paths; HcParseMode/MatcherBackend removed; tests rewritten |
| zstd/src/encoding/cost_model/mod.rs | for_mode replaced by const_for_strategy::<S>() |
| zstd/src/encoding/match_table/storage.rs | Doc-only refresh of is_btultra2/uses_bt source |
| zstd/benches/support/mod.rs | New level_filter_from_env + supported_levels_filtered |
| zstd/benches/compare_ffi.rs | Use supported_levels_filtered |
| zstd/tests/{he_level22_ratio,incompressibility_falsepos,corpus_level22_gap}.rs | New #[ignore]'d diagnostic probes |
| .github/workflows/ci.yml | Bench matrix split per level + new benchmark-aggregate job |
| .github/scripts/aggregate-bench-levels.py | New aggregator combining per-level shards into per-target files |
Comments suppressed due to low confidence (3)
zstd/src/encoding/match_generator.rs:3858
- After migrating from `for_mode(mode, pass2)` to `const_for_strategy::<S>()`, `p1` and `p2` are now built from the identical expression and are guaranteed equal. The test no longer exercises any difference between the former "pass2 = false" and "pass2 = true" call sites — every assertion is trivially satisfied. Either consolidate to a single profile and drop the duplicate checks, or extend the test to exercise the cases the old API distinguished (e.g. by reading both `BtUltra` and `BtUltra2` profiles).
```rust
    let p1 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    let p2 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    assert!(
        !p1.favor_small_offsets,
        "btultra2 primary profile should match donor opt2 offset pricing"
    );
    assert!(
        !p2.favor_small_offsets,
        "btultra2 secondary profile should match donor opt2 offset pricing"
    );
}

#[test]
fn btultra2_profile_is_single_pass_opt2() {
    let p1 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    let p2 = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    assert_eq!(p1.max_chain_depth, p2.max_chain_depth);
    assert_eq!(p1.sufficient_match_len, p2.sufficient_match_len);
    assert_eq!(p1.accurate, p2.accurate);
```
zstd/src/encoding/match_generator.rs:4051
- `profile_predef` and `profile_dyn` are now built from the identical `const_for_strategy::<BtUltra2>()` call, so the two profile values are equal. The original test distinguished `for_mode(BtUltra2, false)` from `for_mode(BtUltra2, true)`. The only remaining difference between the two arms is `stats.price_type` (Predefined vs Dynamic), which is the actual intent — consider removing the now-duplicate `profile_dyn` binding (or renaming both to make the single-profile fact explicit) so a future reader doesn't mistake them for materially different profiles.
```rust
    let profile_predef = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
    let mut stats_predef = HcOptState::new();
    stats_predef.price_type = HcOptPriceType::Predefined;
    let predef_max = profile_predef.lit_length_price(&stats_predef, HC_BLOCKSIZE_MAX);
    let predef_prev =
        profile_predef.lit_length_price(&stats_predef, HC_BLOCKSIZE_MAX.saturating_sub(1));
    assert_eq!(
        predef_max,
        predef_prev + HC_BITCOST_MULTIPLIER,
        "predefined litLength pricing at BLOCKSIZE_MAX must add exactly one bit"
    );
    let profile_dyn = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
```
zstd/src/encoding/match_generator.rs:2889
- The local `type S = super::strategy::BtUltra2;` alias here is unused — both `const_for_strategy::<...>` calls below explicitly spell out `super::strategy::BtUltra2`, and `build_optimal_plan::<S>` is the only consumer of the alias. Either remove the alias (and inline `BtUltra2` at the `build_optimal_plan` call site) or use it consistently for the `const_for_strategy` line too, so the seed-pass body stays uniform.
```rust
    // The seed pass is BtUltra2-exclusive by name (`is_btultra2()`
    // gates the only call site), so pin S to BtUltra2 here.
    type S = super::strategy::BtUltra2;
    let seed_profile = HcOptimalCostProfile::const_for_strategy::<super::strategy::BtUltra2>();
```
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@zstd/benches/support/mod.rs`:
- Around line 96-104: supported_levels_filtered currently ignores unknown tokens
from level_filter_from_env, which can silently produce empty benchmark sets;
update supported_levels_filtered to validate the requested `keep` names against
the available names from supported_levels() and fail fast if any unknowns are
present: compute the set/list of available names from supported_levels(),
compute unknowns = requested - available, and if unknowns is non-empty, abort
(panic or e.g. unwrap with a clear message) with a message listing the invalid
tokens and the allowed names (mentioning STRUCTURED_ZSTD_BENCH_LEVEL_FILTER in
the message for clarity); otherwise proceed to filter as before.
In `@zstd/src/encoding/match_generator.rs`:
- Around line 2601-2607: The configure() code in HcMatchGenerator collapses
StrategyTag variants via local booleans (uses_bt/is_btultra2), causing
start_matching() and downstream logic to treat non-BtUltra2 BT modes as BtOpt;
instead store the full StrategyTag on the HcMatchGenerator struct (add a field
like tag: StrategyTag), set that field in configure(), and update
uses_bt/is_btultra2 checks and any dispatch/assert logic in start_matching() and
the parser/optimizer code to branch on self.tag (the preserved StrategyTag) so
BtOpt, BtUltra and BtUltra2 remain distinct for OPT_LEVEL, ACCURATE_PRICE, and
FAVOR_SMALL_OFFSETS decisions.
In `@zstd/src/encoding/strategy.rs`:
- Around line 270-275: The match arm handling CompressionLevel::Level casts n to
u8 before clamping, causing values >= 256 to wrap; change the logic to clamp the
i32 value against the valid range (use
n.max(1).min(CompressionLevel::MAX_LEVEL)) and only then cast to u8 before
calling Self::for_level, ensuring CompressionLevel::MAX_LEVEL and
Self::for_level are used to determine the clamp bounds.
In `@zstd/tests/corpus_level22_gap.rs`:
- Around line 57-76: The loop currently stops after a fixed 256 iterations,
which truncates analysis for larger frames; change the loop control to iterate
until the frame-end or the existing "last" sentinel is hit instead of using the
0..256 cap: replace the "for _ in 0..256" construct so that the body that
computes h, last, bt, bs and advances off runs repeatedly while off + 3 <=
frame.len() (or use an unconditional loop with the same off+3 check and the
existing last==1 break), preserving the inner logic that updates
raw/rle/compressed/total and advances off.
In `@zstd/tests/he_level22_ratio.rs`:
- Around line 48-108: The tests duplicate zstd block-parsing logic
(dump_block_types in he_level22_ratio.rs vs block_breakdown in
corpus_level22_gap.rs); extract the common parser into a shared test utility
(e.g., zstd/tests/util.rs or zstd/tests/common.rs) exposing a single function
(name it e.g. parse_zstd_blocks or dump_block_types) that accepts the frame
bytes and a loop-limit or uses a termination condition instead of hard-coded 64,
preserve the existing return values (raw/rle/compressed/total_size) or printing
behavior, then replace the two local implementations with calls to the shared
function (update he_level22_ratio.rs and corpus_level22_gap.rs to call
parse_zstd_blocks/dump_block_types). Ensure the new shared function reproduces
the same offsets/fhd parsing logic and parameterize the previous loop cap to
avoid behavioral drift.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4b7a4bb4-ace5-4de1-acab-d7e0d91bf34d
📒 Files selected for processing (12)
- .github/scripts/aggregate-bench-levels.py
- .github/workflows/ci.yml
- zstd/benches/compare_ffi.rs
- zstd/benches/support/mod.rs
- zstd/src/encoding/cost_model/mod.rs
- zstd/src/encoding/match_generator.rs
- zstd/src/encoding/match_table/storage.rs
- zstd/src/encoding/mod.rs
- zstd/src/encoding/strategy.rs
- zstd/tests/corpus_level22_gap.rs
- zstd/tests/he_level22_ratio.rs
- zstd/tests/incompressibility_falsepos.rs
Six review findings from PR #123:
1. `HcMatchGenerator` now mirrors the driver-resolved `StrategyTag` in a new `strategy_tag` field set by `configure()`. The `#[cfg(test)] start_matching` dispatcher branches on it, so BtOpt / BtUltra / BtUltra2 each reach their own monomorphisation instead of collapsing onto BtOpt whenever `is_btultra2 == false`. The matching `collect_optimal_candidates` helper does the same, with a legacy `hash3_log != 0` fallback for artificial tests that wire `hash3_log` directly without calling `configure()`.
2. `HcOptimalCostProfile::const_for_strategy::<S>()` opens with `debug_assert!(S::USE_BT, …)`. The `MAX_CHAIN_DEPTH` and `SUFFICIENT_MATCH_LEN` consts on `Fast` / `Dfast` / `Greedy` / `Lazy` carry placeholder values (the optimal parser entry point is unreachable from those strategies). The debug assert plus fresh doc comments on each non-BT strategy point at the placeholder contract.
3. `supported_levels_filtered()` panics when `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` is non-empty but matches zero known levels, with a message listing every known level. CI matrix typos (e.g. renaming a level in `supported_levels()` without updating ci.yml) now fail loudly on the affected shard instead of silently producing an empty bench run and a missing regression alert downstream.
4. `StrategyTag::for_compression_level(Level(n))` clamps `n` in `i32` BEFORE casting to `u8`. The previous `(n as u8).min(22)` truncated values >= 256 (e.g. `Level(256)` wrapped to `0` and routed to `Dfast`); now `Level(i32::MAX)` saturates to BtUltra2 as expected. Regression test `for_compression_level_clamps_oversized_numeric_levels_to_btultra2` added.
5. `tests/common/mod.rs` hosts a shared `parse_zstd_block_breakdown` + `dump_block_breakdown` helper. Both `corpus_level22_gap.rs` and `he_level22_ratio.rs` route through it instead of carrying near-identical local copies (the corpus probe had a 256-iter cap, the high-entropy probe had a 64-iter cap — both replaced with a bounded `while off + 3 <= frame.len()` walk so large frames are no longer truncated).
461/461 nextest (+1 regression test for #4), clippy clean (default + `--tests`), fmt clean, build × 2. Refs #122 Phase 3.
Actionable comments posted: 1
♻️ Duplicate comments (1)
zstd/benches/support/mod.rs (1)
104-119: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win — Fail fast when any requested level token is unknown.

Line 107 filters by intersection only, so `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER=default,typo` still runs and silently drops `typo`. That can mask CI matrix mistakes and reduce intended shard coverage.

Suggested minimal fix

```diff
 pub(crate) fn supported_levels_filtered() -> Vec<LevelConfig> {
     let all = supported_levels();
     match level_filter_from_env() {
         None => all.to_vec(),
         Some(keep) => {
+            let known: Vec<&'static str> = all.iter().map(|cfg| cfg.name).collect();
+            let unknown: Vec<String> = keep
+                .iter()
+                .filter(|name| !known.iter().any(|k| k == &name.as_str()))
+                .cloned()
+                .collect();
+            if !unknown.is_empty() {
+                panic!(
+                    "STRUCTURED_ZSTD_BENCH_LEVEL_FILTER contains unknown level(s) \
+                     {unknown:?}; supported levels are {known:?}"
+                );
+            }
+
             let kept: Vec<LevelConfig> = all
                 .into_iter()
                 .filter(|cfg| keep.iter().any(|name| name == cfg.name))
                 .collect();
             if kept.is_empty() {
                 let known: Vec<&'static str> =
                     supported_levels().iter().map(|cfg| cfg.name).collect();
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@zstd/benches/support/mod.rs` around lines 104 - 119, The current Some(keep) branch silently drops unknown tokens; update the logic in the Some(keep) arm (the block that builds kept: Vec<LevelConfig>) to detect any requested names in keep that do not match supported_levels() and panic immediately listing the unknown tokens and known levels; specifically, compute the set/difference of keep minus supported_levels().iter().map(|cfg|cfg.name) and if non-empty call panic! with a clear message including the unknown tokens and the known levels, otherwise continue returning kept as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@zstd/tests/common/mod.rs`:
- Around line 48-50: The code computes header size into `off` and then proceeds
to parse blocks, but lacks a pre-check that the `frame` contains those header
bytes; update the function so that immediately after computing `let mut off = 5
+ window_descriptor_len + dict_id_len + fcs_len;` you check `if frame.len() <
off { return None; }` (or equivalent) so truncated frame headers return None
instead of `Some(BlockBreakdown::default())`; ensure this change affects the
same parsing flow that enters the `while off + 3 <= frame.len()` loop and
preserves existing uses of `BlockBreakdown::default()` for other cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: aebed1ce-de63-4c52-9b32-33de83cd28c0
📒 Files selected for processing (7)
zstd/benches/support/mod.rs
zstd/src/encoding/cost_model/mod.rs
zstd/src/encoding/match_generator.rs
zstd/src/encoding/strategy.rs
zstd/tests/common/mod.rs
zstd/tests/corpus_level22_gap.rs
zstd/tests/he_level22_ratio.rs
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI (x86_64-gnu)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.30.
| Benchmark suite | Current: 372ef50 | Previous: 2e9c99d | Ratio |
|---|---|---|---|
| compress/default/low-entropy-1m/matrix/pure_rust | 18.293 ms | 13.554 ms | 1.35 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
Four follow-up findings from PR #123:
- The two `btultra2_profile_*` cost-model tests built `p1` and `p2` with identical `const_for_strategy::<BtUltra2>()` calls (the `pass2: bool` parameter that distinguished them in the runtime `for_mode` API is gone), so `assert_eq!(p1.field, p2.field)` became a tautology. Collapsed `btultra2_profile_disables_small_offset_handicap` to a single profile binding asserting both the small-offset flag and the accurate-pricing flag; deleted `btultra2_profile_is_single_pass_opt2` outright (the only invariant left was the now-trivial reflexive equality).
- The `MatcherBackend` → `BackendTag` consolidation made `match self.active_backend { Simple => Simple, … }` an identity mapping. Replaced both the `reset()` debug_assert and the matching `driver_reset_keeps_strategy_tag_in_sync_with_active_backend` test scaffolding with a direct `assert_eq!(strategy_tag.backend(), active_backend, …)`.
- `run_btultra2_seed_pass` declared `type S = BtUltra2;` but only used the alias in the `build_optimal_plan::<S>` call site, while the adjacent `const_for_strategy::<super::strategy::BtUltra2>()` spelled the type out. Routed the cost-profile call through `S` too so the seed-pass body stays uniform.
- `tests/common::parse_zstd_block_breakdown` returns `None` on truncated frame headers (`frame.len() < header_len`) — without the guard a 4-byte-magic-only input returned `Some(BlockBreakdown::default())`, contradicting the docstring.
- `supported_levels_filtered()` panics for *any* unknown token in `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER`, not just for the "every token unknown" case. Partial typos (`default,typo`) previously silently dropped the bad token — they now fail loudly with a message listing the supported levels.

Refs #122 Phase 3.
CI bench matrix used to rebuild the `compare_ffi` criterion binary
on every shard (3 targets × 6 levels = 18 cargo builds, each
~4-7 min on the GitHub runner). Phase-2 cache hits brought it down
but the cold-cache penalty was still ~120-150 min of accumulated
build time per CI run.
New layout:
- `bench-build` matrix (3 jobs, one per target) builds the criterion
binary with `cargo bench --no-run --message-format=json`, locates
the resolved executable via `jq`, and uploads it as
`bench-binary-<target>`. ~20 min budget per build, retention
7 days.
- `benchmark` matrix (3 × 6 = 18 shards) now `needs: bench-build`,
downloads the matching `bench-binary-<target>` artifact, marks
it executable, and points `run-benchmarks.sh` at it through a
new `STRUCTURED_ZSTD_BENCH_BIN` env var.
- `run-benchmarks.sh` re-execs `$STRUCTURED_ZSTD_BENCH_BIN --bench
--output-format bencher` when the var is set; otherwise it
falls back to the original `cargo bench` path (so local
invocations and any non-CI use stay unchanged).
- Shard runners no longer install the rust toolchain or run
`cargo` at all — only the target-specific runtime packages
(`gcc-multilib`/`musl-tools`) needed to execute a non-host-arch
binary.
- The `decodecorpus_files/z000033` fixture is referenced via
`env!("CARGO_MANIFEST_DIR")` (compile-time absolute path).
GitHub Actions always checks out into `$GITHUB_WORKSPACE` which
is identical across all `ubuntu-latest` runners, so the
embedded path resolves on every shard.
Smoke-tested locally: copying the host-target bench binary to
`/tmp/compare_ffi_bin` and running `STRUCTURED_ZSTD_BENCH_BIN=...
bash run-benchmarks.sh` re-executed the binary directly, no
cargo invocation.
Expected wall-clock impact: each shard drops ~5 min of build,
leaving the criterion measurement itself as the dominant cost.
i686-level22 (the bottleneck shard) was ~30 min, now closer to
~20-22 min — full CI bench matrix should land around ~25 min
end-to-end including the 3 bench-build jobs running in parallel
with the lint/test phase.
Refs #122 Phase 3.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
zstd/src/encoding/match_generator.rs (2)
200-207: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift — Collapse strategy resolution to a single source of truth.

`params.backend` comes from `LEVEL_TABLE`, while `strategy_tag` is recomputed separately via `StrategyTag::for_compression_level(level)`. The Line 666 assert only checks backend-family agreement, so a `BtOpt`/`BtUltra`/`BtUltra2` drift still passes because all three are `HashChain`. That would silently combine one level's HC config with another strategy's const-gated pricing/hash3 behavior. Prefer carrying the resolved `StrategyTag` in `LevelParams` (or deriving `BackendTag` from it) so `reset()` consumes one mapping instead of two.

Also applies to: 655-670
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@zstd/src/encoding/match_generator.rs` around lines 200 - 207, LevelParams currently stores backend (params.backend) from LEVEL_TABLE while reset() separately recomputes strategy_tag via StrategyTag::for_compression_level(level), allowing mismatches between backend variants that share a family (e.g., BtOpt/BtUltra/BtUltra2) to slip through; change LevelParams to carry the resolved StrategyTag instead of raw backend (or derive backend via StrategyTag), update places that read params.backend to use params.strategy_tag (or strategy_tag.backend()), and make reset() consume the single mapping (using the stored StrategyTag) so all config (hc, pricing/hash3 gating, etc.) is derived from one authoritative strategy value rather than two sources.
3234-3302: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win — Drop the test-only `hash3_log` escape hatch.

The `has_hash3 => BtUltra2` branch makes this shim less strict than production: tests can get `BtUltra2` hash3 behavior without ever calling `configure()` or setting a matching `strategy_tag`. That weakens the new contract you just introduced and can let fixture drift pass unnoticed. I'd rather require tests to configure the matcher the same way production does and dispatch purely from `self.strategy_tag`.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@zstd/src/encoding/match_generator.rs` around lines 3234 - 3302, The test-only escape hatch using table.hash3_log must be removed so dispatch in collect_optimal_candidates is purely driven by self.strategy_tag; delete the has_hash3 check and the branch that maps Fast|Dfast|Greedy|Lazy when has_hash3 => to BtUltra2, and have those tags directly call collect_optimal_candidates_initialized::<strategy::Lazy, false> (or the appropriate strategy) while the BT tags (StrategyTag::BtUltra2, ::BtUltra, ::BtOpt) keep their existing collect_optimal_candidates_initialized::<..., true> calls; update any test fixtures to call configure() so they set strategy_tag instead of relying on table.hash3_log.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e80dbeff-6d2b-4cfb-ac61-a2791db2acd1
📒 Files selected for processing (3)
zstd/benches/support/mod.rs
zstd/src/encoding/match_generator.rs
zstd/tests/common/mod.rs
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 261-272: The "Upload benchmark shard artifacts" step is missing a
retention setting and should match the 7-day retention used elsewhere; update
the actions/upload-artifact@v4 step named "Upload benchmark shard artifacts" to
include a retention-days: 7 value under its with block so the intermediate
benchmark shard artifacts (e.g., benchmark-results.${{ matrix.bench.id }}.${{
matrix.level }}.json and the other matrix-based files) are kept for 7 days only.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c40c2d25-3823-46de-a20b-23fd16d6e110
📒 Files selected for processing (2)
.github/scripts/run-benchmarks.sh
.github/workflows/ci.yml
Two more findings from PR #123 review:

(A) `LevelParams` now carries `strategy_tag: StrategyTag` instead of a stand-alone `backend: BackendTag`. The backend family is derived via `LevelParams::backend() -> StrategyTag::backend()` so there is no second authoritative mapping from `CompressionLevel` to backend, and `BtOpt`/`BtUltra`/`BtUltra2` drift inside the `HashChain` family is no longer possible (the pre-fix `params.backend == HashChain` check was equally happy with any of the three). `MatchGeneratorDriver::reset()` reads `params.strategy_tag` directly and drops the parallel `StrategyTag::for_compression_level(level)` call, eliminating the drift surface the review flagged.

(B) The `#[cfg(test)] HcMatchGenerator::collect_optimal_candidates` dispatcher no longer carries the `hash3_log != 0 => BtUltra2` escape hatch. Tests that wired `hash3_log` directly without calling `configure()` are gone or migrated:
- `hc_collect_optimal_candidates_uses_hash3_when_chain_depth_zero`
- `hc_collect_optimal_candidates_hash3_updates_skipped_prefix_positions`
- `hc_hash3_tail_match_advances_update_cursor_on_early_return`

All three forced `search_depth = 0` together with `hash3_log != 0` — an HC-chain-walker-only fixture state that production never reaches (hash3 is BtUltra2-exclusive and BtUltra2 always uses `search_depth = 512`). The end-to-end hash3 invariants stay covered by the level22 roundtrip + donor parity ratio gate.

Plus the inline #13 nit: `retention-days: 7` on the benchmark shard upload-artifact step.

`LEVEL_TABLE` entries migrate from `backend: BackendTag::*` to `strategy_tag: StrategyTag::{Fast,Dfast,Greedy,Lazy,BtOpt,BtUltra,BtUltra2}` per band. `resolve_level_params` override sites (`Uncompressed`, negative levels, level22 source-size override) emit a `strategy_tag` directly.

457/457 nextest (was 460, minus 3 deleted tests), clippy clean (default + tests + benches), fmt clean, build × 2. Refs #122 Phase 3.
@coderabbitai Re: the two outside-diff findings on match_generator.rs — both fixed in b63a37c. (A) LevelParams now carries strategy_tag instead of backend, with backend() derived via StrategyTag::backend(); reset() drops the parallel for_compression_level call so the level→strategy mapping has a single authoritative source. (B) The hash3_log escape hatch is removed; the three tests that depended on the impossible (uses_bt=false, hash3_log!=0) state are deleted — those donor-parity invariants are now covered end-to-end by the level22 ratio gate. |
Thanks for the clear commit reference.
The repo ships a `zstd/fuzz/` setup (5 targets — decode / encode / interop / huff0 / fse) and a populated `artifacts/` regression corpus, but the CI workflow never invoked any of it. Past crashes could resurface and be merged without anything catching them at PR time. This commit wires `cargo-fuzz` into CI alongside the benchmark matrix: - Matrix of 5 fuzz targets, each shard runs on its own `ubuntu-latest` runner so total wall-clock is bounded by the slowest single target. - `taiki-e/install-action@v2` installs `cargo-fuzz`; nightly Rust toolchain is required (libFuzzer instrumentation). - Replay phase: every artifact in `zstd/fuzz/artifacts/<target>/` is fed through `cargo fuzz run <target> <files>` first. Re-introducing a previously fixed crash fails CI on the original reproducer. - Fresh fuzzing phase: `cargo fuzz run <target> -- -max_total_time=90 -timeout=30`. libFuzzer's own time cap is the inner budget; the `timeout-minutes: 10` on the job is the outer ceiling. - On failure the shard uploads `zstd/fuzz/artifacts/<target>/` as `fuzz-<target>-artifacts` (14-day retention) so the new reproducer survives the runner cleanup. Refs #122 Phase 3.
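A hedged sketch of the workflow shape the bullets describe; the matrix targets, action pins, time budgets, and retention period come from the commit message, while step names and the exact YAML layout are assumptions.

```yaml
fuzz:
  strategy:
    matrix:
      target: [decode, encode, interop, huff0, fse]
  runs-on: ubuntu-latest
  timeout-minutes: 10            # outer ceiling per shard
  steps:
    - uses: actions/checkout@v4
    - uses: dtolnay/rust-toolchain@nightly   # libFuzzer instrumentation needs nightly
    - uses: taiki-e/install-action@v2
      with:
        tool: cargo-fuzz
    - name: Replay regression corpus
      working-directory: zstd/fuzz
      run: |
        shopt -s nullglob
        files=(artifacts/${{ matrix.target }}/*)
        if (( ${#files[@]} )); then
          cargo fuzz run ${{ matrix.target }} "${files[@]}"
        fi
    - name: Fresh fuzzing
      working-directory: zstd/fuzz
      run: cargo fuzz run ${{ matrix.target }} -- -max_total_time=90 -timeout=30
    - name: Upload reproducers on failure
      if: failure()
      uses: actions/upload-artifact@v4
      with:
        name: fuzz-${{ matrix.target }}-artifacts
        path: zstd/fuzz/artifacts/${{ matrix.target }}/
        retention-days: 14
```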
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 141-150: The duplicate target lists (the bench entries with ids
x86_64-gnu, i686-gnu, x86_64-musl plus the build matrix, AGGREGATE_TARGETS,
per-target upload steps and per-target pages downloads) should be centralized
into a single source of truth and referenced everywhere; create one canonical
definition (e.g., a YAML anchor or a single matrix/env variable named
AGGREGATE_TARGET_LIST or targets_anchor) that contains the three target objects
(with target_triple and setup_command) and then replace the repeated inline
lists in the build matrix, benchmark matrix, AGGREGATE_TARGETS expansion, upload
steps, and pages-download steps to reference that single definition so future
target changes only need to be made once.
- Around line 213-216: The CI shard configuration for id: x86_64-musl currently
installs build-only packages in setup_command; split build-time and runtime
steps by moving compilation/tooling installs (e.g., "musl-tools") into a
dedicated setup_command used for build shards and add a separate
runtime_setup_command for shards that only execute the prebuilt binary
(STRUCTURED_ZSTD_BENCH_BIN) so runtime shards avoid installing musl-tools;
update the x86_64-musl entry to use setup_command for build dependencies and
runtime_setup_command for only the packages required to run the artifact.
In `@zstd/src/encoding/match_generator.rs`:
- Around line 413-418: The migration comments around the compile-time strategy
tag are stale: they still talk about future work (compress_block::<S> dispatch,
parse-mode removal, and a debug_assert! in reset()) even though the refactor has
landed; update the comment blocks near the strategy/tag explanation and the
other noted ranges (including mentions around reset(), active_backend, and
driver/HC invariants) to accurately describe the current implementation (that
compress_block::<S> dispatch has been implemented, parse-mode is removed, and
there is no debug_assert! in reset()), and replace speculative wording with
concrete statements about the existing invariants between the strategy tag and
active_backend and how the driver/HC invariants are maintained.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 76f332f3-3b93-49eb-911a-eaa266e62801
📒 Files selected for processing (2)
.github/workflows/ci.yml
zstd/src/encoding/match_generator.rs
- zstd/fuzz/Cargo.toml had no `edition` field, defaulting to 2015 where `use structured_zstd::...` requires `extern crate`; targets failed to compile with `unresolved module or unlinked crate`.
- The root `rust-toolchain.toml` pins stable, overriding `dtolnay/rust-toolchain@nightly` once `cargo fuzz` shells out to `cargo build` inside the fuzz crate. `cargo fuzz` needs `-Z sanitizer` which stable rejects. Set `RUSTUP_TOOLCHAIN: nightly` at job env so the override survives the sub-cargo invocation.

Verified locally: all 5 targets build, `cargo fuzz run fse` completed 62k runs in 10s with no crashes.
- New `bench-matrix` job emits the canonical bench targets JSON; `bench-build` and `benchmark` consume it via `fromJSON(needs.bench-matrix.outputs.targets)`.
- Each target now carries separate `build_setup` (run only on `bench-build`) and `runtime_setup` (run only on `benchmark` shards), so `x86_64-musl` no longer reinstalls `musl-tools` on every runtime shard.
- `benchmark-aggregate` ships a single `benchmark-aggregated` artifact with per-target consolidated files; `benchmark-pages` downloads that one artifact instead of three named ones. `merge-benchmarks.py` rglob's so the layout change is transparent.
- `AGGREGATE_TARGETS` env now sourced from `needs.bench-matrix.outputs.ids_csv`.

Also: refresh stale migration comments in `match_generator.rs` that described the const-generic dispatch as future work (the refactor has landed; `parse_mode` field is retained only for in-crate test callers, strategy-shaped predicates fold at codegen time).
`encode_szstd_uncompressed` and `encode_szstd_compressed` drained the input `Read` into a local buffer and then passed the now-empty `Read` into `compress_to_vec`, so the compression branches of the interop roundtrip silently encoded zero bytes. The two regression artifacts shipped under `artifacts/interop/` triggered the assertion `decoded ([]) != data ([0])` on every CI replay.
- Take `&[u8]` directly instead of the drain-then-discard `Read` dance.
- `encode_szstd_compressed` now uses `CompressionLevel::Default` instead of being a verbatim duplicate of the uncompressed variant.

Verified: both regression artifacts replay cleanly under `cargo fuzz run interop artifacts/interop/*`.
The `cargo-fuzz` binary installed via `taiki-e/install-action@v2` is
statically linked against musl on Linux; its `default_target()` probe
picks `x86_64-unknown-linux-musl`, which then fails because
AddressSanitizer cannot link against a static libc:
error: sanitizer is incompatible with statically linked libc,
disable it using `-C target-feature=-crt-static`
error[E0463]: can't find crate for `core`
(The musl rust-std is not installed on the runner either.)
Pass `--target x86_64-unknown-linux-gnu` explicitly via a job-level
`FUZZ_TARGET_TRIPLE` env so both the corpus-replay step and the
fresh-fuzz step bypass the probe and link against glibc, which is
what the dtolnay/rust-toolchain@nightly install has stdlib for.
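The fix above can be sketched as a job-level env plus an explicit `--target` on each cargo-fuzz invocation. Only the variable name, the triple, and the libFuzzer flags come from the commit messages; the step layout is an assumption.

```yaml
env:
  FUZZ_TARGET_TRIPLE: x86_64-unknown-linux-gnu   # bypass default_target() musl probe
steps:
  - name: Replay regression corpus
    working-directory: zstd/fuzz
    run: cargo fuzz run decode --target "$FUZZ_TARGET_TRIPLE" artifacts/decode/*
  - name: Fresh fuzzing
    working-directory: zstd/fuzz
    run: cargo fuzz run decode --target "$FUZZ_TARGET_TRIPLE" -- -max_total_time=90 -timeout=30
```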
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
zstd/src/encoding/match_generator.rs (1)
4589-4595: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win — Build this fixture through `configure()` instead of mutating strategy flags by hand.

After the strategy-tag refactor, `BtUltra2` implies a matching hash3 setup. Flipping `strategy_tag`/`is_btultra2`/`uses_bt` here leaves `hash3_log` and `hash3_table` in the old state, so this test now exercises a configuration production never constructs and can trip the new debug invariant in `collect_optimal_candidates_initialized_body`.

Also applies to: 4611-4615
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@zstd/src/encoding/match_generator.rs` around lines 4589 - 4595, The test constructs an HC fixture by manually mutating strategy_tag/is_btultra2/uses_bt which leaves hash3_log/hash3_table inconsistent and triggers the debug invariant in collect_optimal_candidates_initialized_body; instead build the fixture via the existing configure() path so all derived fields are set consistently for StrategyTag::BtUltra2 (e.g., call the configure() helper or the HC::configure method with BtUltra2 rather than setting hc.strategy_tag, hc.table.is_btultra2, hc.table.uses_bt, etc.), and apply the same change to the other instance around the 4611-4615 region so collect_optimal_candidates sees a properly initialized configuration.
♻️ Duplicate comments (1)
zstd/src/encoding/match_generator.rs (1)
2738-2748: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winRemove the stale
self.parse_modereference from this doc comment.
HcMatchGeneratorno longer has aparse_modefield, so this paragraph now documents a state that does not exist. Point it at the test-onlystart_matching()/strategy_tagdispatcher instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@zstd/src/encoding/match_generator.rs` around lines 2738 - 2748, The doc comment incorrectly references a non-existent self.parse_mode field on HcMatchGenerator; remove that sentence and update the text to say the runtime dispatch is now via the test-only inherent method HcMatchGenerator::start_matching() which uses the strategy_tag dispatcher (not parse_mode), and keep the rest describing that MatchGeneratorDriver::compress_block branches on the compile-time S::USE_BT so monomorphisations see either start_matching_lazy/start_matching_optimal as appropriate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 05f44548-751d-4702-8206-aad701f2de38
📒 Files selected for processing (4)
.github/workflows/ci.yml
zstd/fuzz/Cargo.toml
zstd/fuzz/fuzz_targets/interop.rs
zstd/src/encoding/match_generator.rs
`btultra_and_btultra2_both_keep_dictionary_candidates` set `hc.strategy_tag`, `hc.table.is_btultra2`, `hc.table.uses_bt`, `hc.backend` and `hc.hc.search_depth` by hand to flip the matcher between BtUltra2 and BtUltra between the two `collect_optimal_candidates` calls. After the strategy-tag refactor, BtUltra2 implies a non-zero `hash3_log` and an allocated `hash3_table`; the hand-mutated path left those fields in their default zero shape, so the test exercised a configuration the production `configure()` path never constructs and risked tripping the `Strategy::USE_HASH3 ⇒ hash3_log != 0` debug invariant in `collect_optimal_candidates_initialized_body`. The fixture now goes through `HcMatchGenerator::configure()` for each phase with explicit `HcConfig` + `StrategyTag` + window_log inputs. The history-prime closure is reused so both phases see identical content and dictionary-limit state. Also drop a stale `self.parse_mode` reference in the doc comment for `start_matching_strategy` and a similar stale reference next to `HcOptimalCostProfile::const_for_strategy` — the field was removed in the parse-mode rip-out commit and the comments now describe the test-only inherent `start_matching`'s `strategy_tag` dispatcher.
The 5-way matrix paid a fresh asan+sancov build of `structured-zstd` (~2 min) on every shard — the dominant cost — for ~3 min of wall- clock parallelism. Wall-clock is already bounded by the bench matrix (~25 min) which runs in parallel, so the matrix split was buying nothing useful. Run all five targets sequentially in one runner instead: the first target builds, the remaining four reuse cargo's incremental cache (~seconds each). Halves the compute spend while keeping each target's output grouped under `::group::` markers for log navigation, and the fresh-fuzz step still respects the 90s libFuzzer budget per target.
Summary
Phase 3 of the #111 encoder architecture rewrite — replace runtime enum `HcParseMode` + `MatcherBackend` dispatch with const-generic `Strategy` monomorphisation. Closes #122.

After Phase 1 (#119) restructured the encoder into per-backend types and Phase 2 (#121) removed hot-path `saturating_*`, the residual default-level / level-22 perf gap was hypothesised to come from runtime `match self.parse_mode` / `match self.active_backend` branches firing on every encoded position. This phase lifts those choices into the type system: each level → `Strategy` mapping is now a concrete ZST that implements the `Strategy` trait. Hot-path entry points are generic over `S: Strategy`, the compiler monomorphises the inner loops per variant, and every `if S::FOO` branch resolves at codegen time.

Architectural mechanism

`LevelParams` is the single source of truth from `CompressionLevel` to strategy: it carries `strategy_tag: StrategyTag`, and the runtime `BackendTag` consumed by the driver dispatcher is derived via `StrategyTag::backend()`. `reset()` reads `params.strategy_tag` directly — there is no second `for_compression_level(level)` recomputation that could drift.

Commit map (20 commits)
Core dispatch (commits 1-10):
- `c7c0c76e` — `Strategy` trait + 7 ZST types (Fast/Dfast/Greedy/Lazy/BtOpt/BtUltra/BtUltra2). Associated consts: `BACKEND`, `MIN_MATCH`, `ACCURATE_PRICE`, `FAVOR_SMALL_OFFSETS`, `USE_HASH3`, `USE_BT`, `OPT_LEVEL`, `MAX_CHAIN_DEPTH`, `SUFFICIENT_MATCH_LEN`. `StrategyTag::for_level(u8)` mirrors `LEVEL_TABLE` row-for-row.
- `46b0ef94` — plumb `StrategyTag` into `MatchGeneratorDriver`.
- `e27a2e96` — `compress_block<S>` generic entry on the driver.
- `5c5adfaf` — `S::USE_BT` replaces `match parse_mode` in HC dispatch.
- `8f599fba` — DP body const-folds `abort_on_worse_match` / `opt_level` from `S::OPT_LEVEL`. Primary hot-path commit.
- `d5d29ac1` — diagnostic perf probes (`#[ignore]`'d).
- `dd6f23cd` — `S::USE_HASH3` const-gates the hash3 lookup.
- `d8355d16` — btultra2 seed pass gated on `S::OPT_LEVEL && S::USE_HASH3`.
- `23839074` — `HcOptimalCostProfile::const_for_strategy<S>()`.
- `30f1c9a7` — full rip-out of the `HcParseMode` + `MatcherBackend` runtime enums, the `parse_mode` field on `HcConfig` and `HcMatchGenerator`, `Strategy::PARSE_MODE`, and the runtime `for_mode(HcParseMode, bool)` API.

CI infrastructure (commits 11-12, 16, 19):
- `97a8ab28` — split bench matrix per level (target × level = 18 shards) + `benchmark-aggregate` job; ~25 min wall-clock.
- `18bb80d3` — `cargo fmt --check` fix.
- `ee62ec0a` — separate `bench-build` (3 jobs) from `benchmark` (18 shards). Shards download a pre-built binary via `STRUCTURED_ZSTD_BENCH_BIN`; no `cargo` invocation in shards. Saves ~5 min per shard.
- `79f0955c` — bench target inventory centralised behind a new `bench-matrix` job that emits the canonical JSON; `bench-build`, `benchmark`, `benchmark-aggregate`, `benchmark-pages` all consume `fromJSON(needs.bench-matrix.outputs.targets)`. Split per-target `build_setup` (run only on `bench-build`) from `runtime_setup` (run only on `benchmark` shards) — `x86_64-musl` no longer reinstalls `musl-tools` on every runtime shard. `benchmark-aggregate` ships a single combined `benchmark-aggregated` artifact, replacing the three hardcoded per-target uploads. Also refreshes stale migration comments in `match_generator.rs` (the const-generic dispatch had landed but the comments still framed it as future work).

Post-review hardening (commits 13-15, 17-18, 20):
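Two pieces of this work are small enough to sketch together: the `for_level` band map introduced in the core-dispatch commits and the `i32` clamp added below in `372ef501`, which stops out-of-range numeric levels from wrapping through a `u8` cast. The bands are taken from the PR text; the clamp bounds and the `for_numeric_level` helper name are illustrative assumptions:

```rust
// Hedged sketch: level → strategy band map plus the i32 clamp fix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StrategyTag {
    Fast,
    Dfast,
    Greedy,
    Lazy,
    BtOpt,
    BtUltra,
    BtUltra2,
}

// Band map per the PR: 1 → Fast, 2-3 → Dfast, 4 → Greedy,
// 5-15 → Lazy, 16-17 → BtOpt, 18-19 → BtUltra, 20-22 → BtUltra2.
// (0 is folded into the Fast band here for totality.)
pub const fn for_level(level: u8) -> StrategyTag {
    match level {
        0..=1 => StrategyTag::Fast,
        2..=3 => StrategyTag::Dfast,
        4 => StrategyTag::Greedy,
        5..=15 => StrategyTag::Lazy,
        16..=17 => StrategyTag::BtOpt,
        18..=19 => StrategyTag::BtUltra,
        _ => StrategyTag::BtUltra2, // 20-22; higher inputs are pre-clamped
    }
}

// Clamp in i32 space *before* the u8 cast: `Level(256)` saturates to
// 22 (BtUltra2) instead of wrapping to 0, and negative levels fall
// through to the Fast band — matching the behaviour the fix restores.
pub fn for_numeric_level(level: i32) -> StrategyTag {
    for_level(level.clamp(1, 22) as u8)
}
```

The regression here is easy to see without the clamp: `256 as u8 == 0`, so a wrapped cast silently lands in the wrong band.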
- `372ef501` — `strategy_tag` field on `HcMatchGenerator` (mirrored from the driver during `configure()`). Test-only dispatchers route BtOpt/BtUltra/BtUltra2 to distinct monomorphisations instead of collapsing. `Level(n)` clamp moved to `i32` before the `u8` cast (`Level(256)` no longer wraps to `Dfast`); regression test added. `tests/common::dump_block_breakdown` shared parser, `--tests` truncated-frame-header guard. `supported_levels_filtered()` panics on unknown bench filter tokens.
- `910db7b3` — collapse pass2-era profile-duplicate tests, identity `BackendTag → BackendTag` match dropped, seed pass uses `type S = BtUltra2` consistently, `parse_zstd_block_breakdown` returns `None` on a truncated header, bench filter panics for any unknown token (not just an empty result).
- `b63a37cd` — `LevelParams` switched from `backend: BackendTag` to `strategy_tag: StrategyTag` with a derived `backend()` method. `reset()` reads `params.strategy_tag` directly. Three `hc_collect_optimal_candidates_*_hash3_*` / `hc_hash3_tail_match_*` tests removed (they relied on a `hash3_log != 0` test-only escape hatch that masked an unreachable production state). Plus `retention-days: 7` on bench shard artifacts.
- `bdd88ffc` — new `fuzz` CI matrix (5 targets — decode/encode/interop/huff0/fse). Each shard replays the existing `zstd/fuzz/artifacts/` regression corpus first, then runs 90s of fresh libFuzzer time. Crash artifacts uploaded on failure (14-day retention).
- `49f7f2c2` — fuzz CI follow-up: `zstd/fuzz/Cargo.toml` had no `edition` field (defaulted to 2015, breaking `use structured_zstd::…`); pinned `RUSTUP_TOOLCHAIN=nightly` at the job env level so `cargo fuzz` survives the `rust-toolchain.toml` stable pin once it shells out to its inner `cargo build`.
- `bd02dc6e` — `fuzz_targets/interop.rs` bug: `encode_szstd_uncompressed` / `encode_szstd_compressed` drained the input `Read` into a local `Vec` and then compressed the now-empty `Read`, so the encoding branches silently encoded zero bytes and the two corpus artifacts under `artifacts/interop/` triggered an assertion on every CI replay. Takes `&[u8]` directly now; `encode_szstd_compressed` uses `CompressionLevel::Default` instead of duplicating the uncompressed variant.

Perf delta vs `perf/post-pr-121-baseline` (host x86_64-darwin, best of 5)

Note: the original #122 perf targets (≥30 % default-level, ≥15 % fast-level) were not hit. The const-generic dispatch by itself didn't move the needle on host because the runtime reads it replaces (`self.parse_mode`, `self.use_hash3`) were already cache-hot single-instruction loads with perfectly-predicted branches. The structural foundation is the real deliverable; the perf payoff is gated on Phase 4 (BT/HC specialization + donor fast paths).

Architectural payoff
- `compress_block::<Fast>` cannot reach `start_matching_optimal` (no `USE_BT`); BtUltra2-only hash3 code is unreachable from non-BtUltra2 monomorphisations.
- `LevelParams::strategy_tag` is the only authoritative mapping from `CompressionLevel` to strategy; the driver derives `BackendTag` from it instead of parallel-resolving the level twice.
- `HcParseMode` and `MatcherBackend` removed; a single `BackendTag` + `StrategyTag` live in `strategy.rs`.
- `bench-matrix` JSON output is consumed by every downstream bench job — adding a new target is a one-row change.
- `zstd/fuzz/` setup now runs every PR with regression-corpus replay + 90s libFuzzer per target. The replay step also surfaced (and we fixed) a real bug in the `interop` target that had silently encoded zero bytes since the fuzz crate was first written.

Verification
- Tests green at every commit (three removed in `b63a37cd` — they tested a production-unreachable `hash3_log != 0` + `search_depth = 0` state that the removed test-only escape hatch made possible).
- `cargo clippy` clean (default + `--tests` + `--benches`).
- `cargo fmt --check` clean.
- Builds (`--no-default-features`) at every commit.
- Ratio gate (`level22_sequences_match_donor_on_corpus_proxy`) green at every commit.
- Diagnostic perf probes (`#[ignore]`'d) document: `STRUCTURED_ZSTD_BENCH_BIN=...`.

Acceptance criteria from #122
- `Strategy` trait + 7 concrete strategy types
- `compress_block<S: Strategy>` top-level entry — `HcParseMode` runtime dispatch removed from inner loops
- No `match self.parse_mode` or `if self.use_hash3` style branches on the encoded-position hot path
- `HcParseMode` enum deleted (plus `MatcherBackend` deleted)
- `cargo nextest run --features hash,std,dict_builder` 100 % pass
- `level22_sequences_match_donor_on_corpus_proxy` ratio gate PASS on every commit
- Builds (`--no-default-features`) + `cargo clippy --all-targets -- -D warnings` clean

Closes #122. Refs #111.