Skip to content

perf(encoding): #124 Phase 4 — HC speculative tail check + MatcherStorage enum dispatch#125

Open
polaz wants to merge 4 commits into
mainfrom
feat/#124-phase4-donor-fastpaths
Open

perf(encoding): #124 Phase 4 — HC speculative tail check + MatcherStorage enum dispatch#125
polaz wants to merge 4 commits into
mainfrom
feat/#124-phase4-donor-fastpaths

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented May 14, 2026

Summary

Phase 4 of #111. Three commits land donor-parity hot-path improvements + the supporting backend refactor:

  • Phase 4.1 (82e85fcf, hardened in c16ca32b): port donor zstd_lazy.c:714 speculative tail check into HcMatcher::hash_chain_candidate. Once best is set, the per-byte common_prefix_len walk is gated on a 4-byte tail compare at tail_off = best.match_len − lit_len − 3 (via checked_sub(lit_len + 3)). The − lit_len term accounts for extend_backwards having added up to lit_len backward bytes to best.match_len; a new candidate can in turn replace B_best of its own length with up to lit_len backward bytes, so the worst-case forward length it needs to outscore best (under the LIFO offset-monotonic walk argument) is best.match_len − lit_len + 1. When lit_len == 0 the formula reduces to best.match_len − 3 (no backward extension possible).

  • Phase 4.3 (8809bc01): collapse the four parallel driver backend fields (match_generator, Option<DfastMatchGenerator>, Option<RowMatchGenerator>, Option<HcMatchGenerator>) plus active_backend: BackendTag into a single storage: MatcherStorage enum. Eliminates dead-variant allocation across backend switches — the old layout kept drained metadata for every backend ever touched. The active backend is derived directly from the storage variant; no separate runtime tag to drift. Variant docs enumerate the full CompressionLevel → backend mapping (Simple = Uncompressed / Fastest / Level(1) / non-positive levels; Dfast = Default / Level(0,2,3); Row = Level(4); HashChain = Better / Best / Level(5..=22)).

  • Phase 4 Dfast donor parity (c16ca32b): port donor zstd_double_fast.c post-match rep-0 extension into the Dfast fast-loop and general-loop emit paths. After each emit_candidate, the new helper extend_with_repcode_after_match chains additional zero-literal rep sequences while 4 bytes at the new cursor keep matching cursor − offset_hist[1]. Each iteration emits one rep sequence with the swapped-in offset, advances pos and literals_start, updates offset_hist via the donor offset_2 = offset_1; offset_1 = old_offset_2; swap, inserts every position of the rep match into the long/short hash tables, and skips the full hash probe on every extra match. Uses donor's MINMATCH = 4 for the rep threshold (not DFAST_MIN_MATCH_LEN = 6) — a 4-byte rep is always a net win over re-running the full hash search.

Sub-goals from #124

  • 4.1 HC speculative tail check — landed; correctness bound was tightened in c16ca32b after Copilot review (the original formula used best.match_len directly and was unaware of extend_backwards backward bytes; new formula subtracts lit_len).
  • 4.2 matchEndIdx-8 skip — audited; already implemented in main (skip_insert_until_abs wired across BT walker + the gate at the top of collect_optimal_candidates_initialized_body! + bt_update_tree_until_* start-from-skip + apply_limited_update_after_long_match). Issue text outdated, no new work needed.
  • 4.3 Matcher trait wiring — landed as enum dispatch (MatcherStorage) per the user-approved trait-vs-enum trade-off (cleaner for closed set, zero vtable cost, matches the existing HcBackend precedent inside HcMatchGenerator).
  • 4.4 Residual strategy_tag sweep — audited. Inner loops of compress_block::<S>start_matching_strategy::<S>start_matching_optimal::<S>build_optimal_plan_impl<S,_,_> are fully generic; no match self.strategy_tag / parse_mode / active_backend inside. The remaining match self.active_backend() calls in the driver are per-frame/per-slice setup paths (eviction, dictionary priming, slice commit), not hot path. self.is_btultra2 reads in per-CPU bt_update_tree_until_* kernels were kept — making them S-generic would multiply the binary by 4× across the target_feature umbrellas for a single register-resident bool read.
  • Dfast donor parity extensionextend_with_repcode_after_match ports donor's post-match rep-0 chain that the existing Rust Dfast skipped (it re-ran the full hash search at every position even after a primary match into a repetitive run). Closes the biggest hot-path feature gap between Rust Dfast and donor ZSTD_compressBlock_doubleFast_*.

Verification

  • cargo nextest run --profile ci -p structured-zstd --features hash,std,dict_builder 475/475 pass, 3 skipped (incl. level22_sequences_match_donor_on_corpus_proxy, full cross-validation, roundtrip + parity gates for default / better / best / level22).
  • cargo test --doc -p structured-zstd --features hash,std,dict_builder 10/10 pass.
  • cargo clippy -p structured-zstd --features hash,std,dict_builder --all-targets -- -D warnings clean.
  • Local cargo bench --quick '^compress/default' on x86_64-darwin: every default-level scenario improves relative to the c_ffi baseline-noise floor (c_ffi shifted uniformly +25 % on the same run — criterion's stored baseline ran in a colder state). Most relevant numbers: decodecorpus-z000033 (the 27× C-FFI gap target from feat: encoder architecture rewrite (split monolith, const-generic Strategy, arena allocator) #111) gains −11 % vs c_ffi noise; high-entropy-1m −16 %; low-entropy-1m −8 %.
  • Linux x86_64-gnu sanity-check (Intel i9-9900K, Fedora 44, rustc 1.94 stable, cargo clean -p between commits): Phase 2 / Phase 3 / Phase 4 HEAD all land within ±1 % of 1.52 ms on compress/level22/low-entropy-1m/matrix/pure_rust — i.e. no level22 regression on a fresh clean build. The 1.527 → 2.483 ms CI alert that fired on the Phase 3 merge commit was not reproducible across two architectures (ARM64 darwin + x86_64-gnu) and appears to be GitHub-runner microarch / VM-noise, not a real regression.

Test plan

  • CI green across (test × {ubuntu,macos,windows}), cross-i686, Fuzz smoke, msrv.
  • CI level22_sequences_match_donor_on_corpus_proxy ratio gate green.
  • CI bench-matrix (18 shards) full numbers in PR comment from the benchmark action — looking specifically for the default-level (compress/default/*) shards to confirm the Dfast post-match rep extension wins.

Closes part of #124.

Summary by CodeRabbit

  • Refactoring

    • Optimized compression matching pipeline with enhanced candidate evaluation techniques
    • Improved repeated pattern detection and handling during compression operations
    • Restructured matcher backend architecture for better code organization and maintainability
  • Tests

    • Added regression tests validating compression matching behavior

Review Change Stack

polaz added 2 commits May 15, 2026 01:49
Port donor `zstd_lazy.c:714` (`ZSTD_HcFindBestMatch`) speculative
4-byte tail compare into `HcMatcher::hash_chain_candidate`. Once a
`best` candidate is set with `match_len >= 4`, gate the per-byte
`common_prefix_len` walk on `concat[i+best_ml-3..best_ml+1]` vs
`concat[c+best_ml-3..best_ml+1]`. A mismatch proves the candidate
cannot reach `best_ml + 1` and the walk would only confirm
`len <= best_ml`.

Correctness rests on the monotonic-offset property of LIFO chain
walks: each subsequent candidate has `new.offset_bits >=
best.offset_bits`, so `better_candidate` (gain = `len*4 -
offset_bits`) can return a strict winner only when
`new.match_len > best_ml`. Equal-length or shorter candidates can
only tie or lose, and `better_candidate` keeps the existing best on
ties.

Bounds: when `current_idx + best_ml + 1 > history_tail`, the current
position cannot reach `best_ml + 1` either, so the candidate cannot
win — same skip.

Verification:
- `cargo nextest run -p structured-zstd --features hash,std,dict_builder`
  476/476 pass (incl. `level22_sequences_match_donor_on_corpus_proxy`
  and the cross-validation roundtrip + parity gates).
- `cargo test --doc` 10/10 pass.
- `cargo clippy -p structured-zstd --features hash,std,dict_builder
  --all-targets -- -D warnings` clean.
- `cargo bench --quick '^compress/better'` (Lazy backend, the actual
  HC chain-walker user) on x86_64-darwin: one scenario −5.4 %
  (small-10k-random, p = 0.05), the rest within ±2 % of stored
  baseline — `--quick` noise floor swallows whatever steady-state
  win remains. Full baseline-vs-feature bench will be measured on
  the CI bench-matrix.
Collapse the driver's four parallel backend fields
(`match_generator: MatchGenerator`,
`dfast_match_generator: Option<DfastMatchGenerator>`,
`row_match_generator: Option<RowMatchGenerator>`,
`hc_match_generator: Option<HcMatchGenerator>`) plus the
`active_backend: BackendTag` discriminator into a single
`storage: MatcherStorage` enum with four variants
(`Simple` / `Dfast` / `Row` / `HashChain`). The active backend is
now derived directly from the storage variant via
`MatcherStorage::backend()` — no separate runtime tag that could
drift against the variant.

Why
- The prior layout kept drained-but-allocated inner structures for
  every backend the driver had ever touched (the lazy `Option`s
  + the always-present `match_generator` retained their backing
  metadata across backend switches). With the enum, the inactive
  variants are dropped on switch, so the driver carries exactly
  one backend's state at any time.
- Every per-frame / per-slice driver operation
  (`reset` / `prime_with_dictionary` / `commit_space` /
  `get_last_space` / `skip_matching_with_hint` /
  `retire_dictionary_budget` / `trim_after_budget_retire` /
  `skip_matching_for_dictionary_priming`) used to dispatch on
  `active_backend` to pick the matching field. The match-on-tag
  + field-access pair is replaced with `match &mut self.storage`,
  which fuses dispatch and accessor and gives the borrow checker
  field-disjoint splits "for free" (manual destructure where the
  closure captures `vec_pool` / `suffix_pool` alongside the
  backend borrow).
- This is non-hot-path dispatch (per block at most, per frame
  typically). The hot per-block `compress_block::<S>` already
  dispatches via `match S::BACKEND` over a `const`, which the
  compiler const-folds out per monomorphisation — that path is
  unchanged.

Notes
- Backend transitions inside `Matcher::reset` drain the old
  variant's allocations into `vec_pool` / `suffix_pool`, then
  swap `self.storage` to a freshly constructed variant for the
  new backend. The old variant is dropped at the swap; with the
  HashChain → Simple transition the pre-swap `hash_table` /
  `chain_table` / `hash3_table` zeroing kept from the prior
  field-shaped impl ensures the BtUltra2 hash3 allocation
  (`1 << HC3_HASH_LOG`) does not survive the switch.
- Three tests went obsolete because they exercised the old
  lazy-init recovery / drained-but-allocated invariant
  (`driver_reset_from_row_backend_tolerates_missing_row_matcher`
  flipped `row_match_generator = None` by hand;
  `driver_best_to_fastest_releases_oversized_hc_tables` and
  `…reclaims_row_buffer_pool` inspected the post-switch state of
  a backend that no longer exists). Replaced with one new test
  proving the new invariant — buffer pool grows across the
  HashChain → Simple switch, backend tag flips, no field-shaped
  inspection of the dropped variant.
- Immutable accessors (`simple` / `dfast_matcher` / `row_matcher`
  / `hc_matcher`) are now `#[cfg(test)]` — production code only
  reads via the mutable counterparts or pattern-matches `storage`
  directly.

Verification
- `cargo nextest run -p structured-zstd --features hash,std,dict_builder`
  475/475 pass, 3 skipped (`level22_sequences_match_donor_on_corpus_proxy`
  + full cross-validation + roundtrip + parity gates).
- `cargo test --doc` 10/10 pass.
- `cargo clippy -p structured-zstd --features hash,std,dict_builder
  --all-targets -- -D warnings` clean.
Copilot AI review requested due to automatic review settings May 14, 2026 23:20
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 27c4a3e0-b1b8-4a3a-a0ea-7a52a013bebb

📥 Commits

Reviewing files that changed from the base of the PR and between c16ca32 and 06dbcd1.

📒 Files selected for processing (3)
  • zstd/src/encoding/dfast/mod.rs
  • zstd/src/encoding/hc/mod.rs
  • zstd/src/encoding/match_generator.rs

📝 Walkthrough

Walkthrough

This PR adds donor-style rep‑0 post-match chaining to the Dfast matcher, a speculative 4‑byte tail-equality check to the HC matcher candidate loop, and refactors MatchGeneratorDriver to hold exactly one active backend in a MatcherStorage enum with pattern-matched accessors and reset/drain semantics.

Changes

Backend Storage Consolidation and HC + Dfast Enhancements

Layer / File(s) Summary
Dfast post-match rep‑0 chaining
zstd/src/encoding/dfast/mod.rs
Adds extend_with_repcode_after_match and calls it from general and fast loops to opportunistically chain zero-literal rep matches, update literals_start, insert rep ranges into hash tables, and advance pos without re-probing hashes.
HC Matcher speculative 4-byte tail check
zstd/src/encoding/hc/mod.rs
After a best with match_len >= 4 exists, candidate evaluation performs an early 4-byte tail-equality check (tail at best_ml - 3) and skips candidates that fail before calling common_prefix_len.
MatcherStorage enum and backend derivation
zstd/src/encoding/match_generator.rs
Introduces MatcherStorage enum (Simple/Dfast/Row/HashChain) with backend() to derive active family, replacing per-backend optional fields plus a separate active_backend tag.
Driver initialization and accessor helpers
zstd/src/encoding/match_generator.rs
MatchGeneratorDriver::new() initializes storage with Simple; adds active_backend(), simple()/simple_mut(), dfast_matcher(_mut), row_matcher(_mut), hc_matcher() accessors that pattern-match the enum.
Core driver operations using new storage model
zstd/src/encoding/match_generator.rs
Rewrites dictionary priming, budget retirement, trimming, reset, get_last_space/commit_space, skipping, and dispatch to match on storage variants; reset drains outgoing variant buffers into pools and clears HC tables when switching away.
Test updates for new storage accessors and invariants
zstd/src/encoding/match_generator.rs
Updates tests to use driver.active_backend() and backend accessors (simple()/simple_mut()/hc_matcher()), and to validate storage swaps and pool recycling rather than inspecting removed optional fields or persistent HC internals.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hopped through hashes, quick and spry,
Chained rep matches as I bounded by.
Tails now checked with four‑byte sight,
One enum holds the backends tight.
A rabbit's hop—compression done with delight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the three main changes: HC speculative tail check, MatcherStorage enum dispatch, and relates to #124 Phase 4.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/#124-phase4-donor-fastpaths

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 94.72222% with 19 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
zstd/src/encoding/match_generator.rs 94.18% 10 Missing ⚠️
zstd/src/encoding/dfast/mod.rs 93.87% 9 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR continues the encoder architecture/performance work by adding an HC speculative tail check and replacing parallel matcher backend fields with a single active MatcherStorage enum.

Changes:

  • Adds a speculative 4-byte tail comparison in HC hash-chain candidate probing.
  • Collapses matcher backend storage into a single enum variant plus derived backend dispatch.
  • Updates driver tests and backend-switch assertions for the new storage model.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
zstd/src/encoding/match_generator.rs Introduces MatcherStorage and rewires driver reset, commit, dictionary, and tests around enum-backed storage.
zstd/src/encoding/hc/mod.rs Adds the HC speculative tail gate before running full prefix matching.

Comment thread zstd/src/encoding/hc/mod.rs Outdated
Comment thread zstd/src/encoding/match_generator.rs Outdated
Comment thread zstd/src/encoding/match_generator.rs
Comment thread zstd/src/encoding/match_generator.rs Outdated
Bundles the Dfast donor port and the HC speculative-gate correctness fix
from Copilot review on PR #125. Same Phase 4 hot-path family, shares one
retest cycle.

## Dfast post-match rep-0 extension

Port donor `zstd_double_fast.c` post-match rep chain into the Dfast
fast-loop and general-loop emit paths. After each `emit_candidate`,
the new helper `extend_with_repcode_after_match` chains additional
zero-literal rep sequences while 4 bytes at the new cursor keep
matching `cursor - offset_hist[1]`. Each iteration emits one rep
sequence with the swapped-in offset, advances `pos` and
`literals_start` past the rep match, updates `offset_hist` via
`encode_offset_with_history` (the donor `offset_2 = offset_1;
offset_1 = old_offset_2;` swap), inserts every position of the rep
match into the long/short hash tables so subsequent positions can
hit them, and skips the full hash-table probe entirely on every
extra match.

Uses donor's `MINMATCH = 4` for the rep threshold rather than the
stricter `DFAST_MIN_MATCH_LEN = 6` enforced on the main search —
donor accepts any 4-byte rep extension and we mirror that because
the rep emission carries no offset cost.

## HC speculative tail check — correctness fix

Copilot flagged a backward-extension-unaware bound in the gate
introduced by 82e85fc. The old code used `best.match_len` directly
as the forward-extent reference, but `extend_backwards` may have
added up to `lit_len` backward bytes to that total. A new chain
candidate can in turn replace `B_best` of its own length with up to
`lit_len` backward bytes, so the worst-case forward length it
needs to outscore `best` (under the LIFO offset-monotonic walk
argument) is `best.match_len - lit_len + 1`. The 4-byte tail probe
now anchors at `tail_off = best.match_len - lit_len - 3` (via
`checked_sub(lit_len + 3)`), which covers exactly that boundary.
When `best.match_len <= lit_len + 3` the worst-case forward target
is so close to `current_idx` that the chain-walk cost is already
trivial — the gate is skipped via the `checked_sub`. When
`lit_len == 0` the formula reduces to the pre-fix `best.match_len
- 3` (no backward extension possible).

Comment block in `hash_chain_candidate` now enumerates both the
backward-extension bound and the offset-monotonicity argument.

## Driver test tightening (Copilot threads #2 / #3 / #4)

- `driver_reset_from_row_backend_recycles_row_buffers_into_pool`:
  assertion changed from `vec_pool.len() >= before_pool` to `>` so
  the test fails on the regression it is meant to cover. With
  `before_pool = 0` on a fresh driver the weaker bound passed
  trivially even if the Row to Simple drain closure never pushed
  anything back into `vec_pool`.
- `row_get_last_space_and_reset_to_fastest_clears_window` renamed
  to `..._drops_row_variant` with a doc comment explaining why
  the original `clears_window` assertion is intentionally gone (the
  variant itself is dropped on swap; no `row_matcher()` to inspect
  post-switch). Points at the sibling pool-recycling test by name.
- `MatcherStorage` variant docs rewritten to enumerate the full
  `CompressionLevel` to backend mapping that `resolve_level_params`
  produces for each variant (Simple covers `Uncompressed`,
  `Fastest`, `Level(1)`, and any non-positive `Level(n) != 0`;
  Dfast covers `Default`, `Level(0)`, `Level(2)`, `Level(3)`; Row
  covers `Level(4)`; HashChain covers `Better`, `Best`, and
  `Level(5..=22)` across Lazy / BtOpt / BtUltra / BtUltra2). The
  pre-fix doc said "Level 1" / "Levels 2-3" which was misleading
  for the named-level variants.

## Verification

- nextest 475/475 pass, 3 skipped (incl.
  `level22_sequences_match_donor_on_corpus_proxy` parity gate and
  the full cross-validation + roundtrip + ratio gates for
  default / better / best / level22).
- doctests 10/10 pass.
- clippy --all-targets -D warnings clean.
- bench --quick on `^compress/default` (x86_64-darwin): every
  default-level scenario improves relative to the c_ffi baseline
  noise floor. Most relevant numbers: `decodecorpus-z000033` (the
  27x C-FFI gap target from #111) gains -11 % vs c_ffi noise;
  `high-entropy-1m` -16 %; `low-entropy-1m` -8 %.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@zstd/src/encoding/dfast/mod.rs`:
- Around line 452-460: The check `rep > pos` incorrectly prevents chaining to
retained history across block boundaries; keep the `rep == 0` early break but
remove the `rep > pos` guard and rely on the `cur_idx.checked_sub(rep)`
(`cur_idx`, `cand_idx`, `current_abs_start`, `history_abs_start`, `offset_hist`)
to detect out-of-range repeats so repeated offsets that point into retained
history are allowed; in short, delete the `rep > pos` condition and let the
existing `checked_sub` None branch handle invalid sources.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52ac21cd-64af-4ece-ad72-447894484c1f

📥 Commits

Reviewing files that changed from the base of the PR and between 8809bc0 and c16ca32.

📒 Files selected for processing (3)
  • zstd/src/encoding/dfast/mod.rs
  • zstd/src/encoding/hc/mod.rs
  • zstd/src/encoding/match_generator.rs

Comment thread zstd/src/encoding/dfast/mod.rs
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread zstd/src/encoding/dfast/mod.rs Outdated
Comment thread zstd/src/encoding/match_generator.rs Outdated
Comment thread zstd/src/encoding/dfast/mod.rs
Comment thread zstd/src/encoding/hc/mod.rs
…culative gate

Round-2 review fixes from CodeRabbit / Copilot on PR #125. Bundles the
behavioural fix called out by both reviewers (block-local `rep > pos`
guard wrongly rejected retained-history offsets) plus targeted
regression tests for the helpers that previously had only roundtrip
coverage.

## Behavioural fix: `extend_with_repcode_after_match` accepts cross-block offsets

CodeRabbit (Major / Quick win) and Copilot both flagged that the
helper's `if rep == 0 || rep > pos { break; }` guard rejected valid
repeat offsets pointing into retained history whenever the source
lived past the current block-local cursor. Since `DfastMatchGenerator`
keeps a contiguous `live_history()` buffer covering
`history_abs_start..history_abs_end`, the only correct bound is
"does the candidate land at or after `history_abs_start`" — which
`cur_idx.checked_sub(rep)` already enforces (`None` ⇒ rep points
before `history_abs_start`, break).

Replaced the guard with `if rep == 0 { break; }` and added a comment
explaining that `checked_sub` is the authoritative bound. Near
block boundaries this restores the donor-parity chain win the helper
is meant to recover.

## Regression tests

* `dfast::extend_with_repcode_tests::dfast_repcode_extension_emits_zero_literal_rep_on_constant_run`
  exercises the post-match rep helper directly with a hand-built
  post-primary-match state: a constant-byte block, `offset_hist[1]
  = 1`, and an in-block cursor mid-block. Verifies the helper emits
  a zero-literal Triple with `offset = 1` (donor parity) and
  advances `pos` / `literals_start` consistently. Going through
  `start_matching` was unreliable because the primary `best_match`
  greedily consumes the entire constant run in a single Triple
  (`offset = 1, match_len = block - 1`), leaving the helper nothing
  to extend.
* `dfast::extend_with_repcode_tests::dfast_repcode_extension_walks_into_retained_history`
  builds a two-block fixture and probes with `rep = 40` against a
  block-local cursor at `pos = 5`. The pre-fix `rep > pos` guard
  would have rejected this; the post-fix code emits the rep
  correctly.
* `hc::hc_tests::hash_chain_candidate_speculative_gate_handles_lit_len_backward_extension`
  guards the post-fix `tail_off = best.match_len - lit_len - 3`
  formula with a monotonicity check: for the same probe position
  and chain content, the returned `match_len` must be
  non-decreasing in `lit_len` because more backward-extension
  headroom can only enlarge (or leave unchanged) every candidate's
  effective length. A regression where the gate skips candidates
  that backward extension would have rescued surfaces as
  `len_high_lit < len_low_lit`.

## Doc / comment clarifications (Copilot threads on `MatcherStorage`)

* The post-switch state comment in
  `driver_best_to_fastest_releases_oversized_hc_tables` previously
  said the manual `hash_table = Vec::new()` assignments release
  allocations "immediately rather than waiting for `Drop`". That
  was misleading — assigning `self.storage = MatcherStorage::Simple(...)`
  would drop the old `HashChain` variant in the same call regardless.
  The real reason for the up-front reassignment is to cap peak
  memory during the swap: by releasing the table allocations
  *before* constructing the replacement variant, the peak resident
  set is "old data buffers being drained into `vec_pool` + new
  `MatchGenerator` skeleton" instead of "old tables still resident
  + new variant under construction". Comment updated to reflect
  that.

## Verification

- nextest 478/478 pass, 3 skipped (478 = 475 + the three new tests).
- doctests 10/10 pass.
- clippy --all-targets -D warnings clean.
@polaz polaz requested a review from Copilot May 15, 2026 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants