
feat(quote): downscore peers that supply bad-bound quotes #77

Open
grumbach wants to merge 2 commits into WithAutonomi:main from grumbach:plan1/bad-node-eviction

@grumbach commented May 7, 2026

Summary

  • When a peer's storage quote ships a pub_key that does not BLAKE3-hash to the peer's claimed PeerId, report a strong negative trust event so the local AdaptiveDHT swaps that peer out of the routing table on the next admission cycle. This stops the operator-monopolises-close-K failure mode that took uploads below quorum on 2026-05-06.
  • Wires the report at both detection sites in ant-core/src/data/client/quote.rs:
    • per-peer classify_quote_response (primary defence inside the async closure) — fires for every BadQuoteBinding verdict;
    • post-collection drop_quotes_with_bad_bindings (defensive filter) — fires for every peer whose quote slipped past the per-peer handler.
  • Both sites go through a small TrustReporter trait so the wiring can be unit-tested with a mock recorder rather than a live P2PNode. Production paths use the blanket impl on Arc<P2PNode>, which forwards to P2PNode::report_application_failure(peer, 5.0).
  • The 5.0 weight is sized to drop a peer from neutral 0.5 to ~0.26 in a single event, well below the production swap-out threshold (saorsa_core::adaptive::DEFAULT_SWAP_THRESHOLD = 0.35). saorsa-core clamps consumer weights at MAX_CONSUMER_WEIGHT = 5.0 so this is the strongest legal signal — appropriate for a verifiable cryptographic mismatch.
  • See notes/plan-1-bad-node-eviction.md for the full design and the production failure that motivates this work.
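A minimal sketch of the reporting seam described above, assuming a simplified shape. The trait name `TrustReporter` and the `BadQuoteBinding` verdict name come from this PR; the method `report_bad_binding`, the `PeerId` stand-in, the other `QuoteVerdict` variants, `classify_and_report` and `MockRecorder` are hypothetical illustrations, not the code in `quote.rs`. The recorded weight mirrors the 5.0 quoted above.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the real `PeerId` type (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PeerId(pub [u8; 32]);

/// Weight attached to a bad-binding report; mirrors the 5.0 quoted in the PR.
pub const BAD_BINDING_WEIGHT: f64 = 5.0;

/// Reporting seam. The trait name is from the PR; this method is hypothetical.
pub trait TrustReporter {
    fn report_bad_binding(&self, peer: PeerId);
}

/// Hypothetical verdicts; only `BadQuoteBinding` is named in the PR.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum QuoteVerdict {
    Good,
    BadQuoteBinding,
    Timeout,
}

/// Hypothetical per-peer handler: reports only on a binding mismatch and
/// passes every verdict through unchanged.
pub fn classify_and_report<R: TrustReporter>(
    peer: PeerId,
    verdict: QuoteVerdict,
    reporter: &R,
) -> QuoteVerdict {
    if verdict == QuoteVerdict::BadQuoteBinding {
        reporter.report_bad_binding(peer);
    }
    verdict
}

/// Test double that records (peer, weight) pairs instead of touching a node.
#[derive(Default)]
pub struct MockRecorder {
    pub events: Mutex<Vec<(PeerId, f64)>>,
}

impl TrustReporter for MockRecorder {
    fn report_bad_binding(&self, peer: PeerId) {
        self.events.lock().unwrap().push((peer, BAD_BINDING_WEIGHT));
    }
}

/// Forwarding impl so an `Arc`-wrapped reporter can be passed directly,
/// analogous in spirit to the PR's impl on `Arc<P2PNode>`.
impl<R: TrustReporter + ?Sized> TrustReporter for Arc<R> {
    fn report_bad_binding(&self, peer: PeerId) {
        (**self).report_bad_binding(peer);
    }
}
```

Because the handler is generic over the reporter, a test can pass an `Arc<MockRecorder>` and assert that a bad-binding verdict produces exactly one event while good and timed-out peers produce none, which is the shape of the B1 and non-binding tests listed below.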

What's new in tests

  • classify_quote_response_reports_trust_event_on_bad_binding (B1)
  • classify_quote_response_does_not_report_on_good_binding
  • classify_quote_response_does_not_report_on_non_binding_failures
  • drop_quotes_with_bad_bindings_reports_one_event_per_dropped_peer (B2)
  • bad_binding_does_not_affect_trust_for_other_peers (B3)

Existing tests for drop_quotes_with_bad_bindings are updated for the new return type (Vec<PeerId> rather than usize count) so the caller can attribute trust events per peer.
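The return-type change can be sketched as follows, assuming a simplified quote shape. `Quote`, its `binding_ok` flag (standing in for the check that `pub_key` BLAKE3-hashes to the claimed `PeerId`), the `PeerId(u8)` stand-in and `filter_and_report` are hypothetical; the real signature in `quote.rs` may differ. Returning the dropped peers instead of a count is what lets the caller attribute one event to each offender and none to honest peers.

```rust
/// Hypothetical stand-in for the real `PeerId` type (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PeerId(pub u8);

/// Hypothetical quote shape; `binding_ok` stands in for the BLAKE3 binding check.
#[derive(Clone, Debug)]
pub struct Quote {
    pub peer: PeerId,
    pub binding_ok: bool,
}

/// Removes quotes whose binding check failed and returns the offending peers
/// (previously only a `usize` count would have been available to the caller).
pub fn drop_quotes_with_bad_bindings(quotes: &mut Vec<Quote>) -> Vec<PeerId> {
    let mut dropped = Vec::new();
    quotes.retain(|q| {
        if q.binding_ok {
            true
        } else {
            dropped.push(q.peer);
            false
        }
    });
    dropped
}

/// Hypothetical caller: one trust report per dropped peer.
/// A closure stands in for `TrustReporter` to keep the sketch short.
pub fn filter_and_report<F: FnMut(PeerId)>(quotes: &mut Vec<Quote>, mut report: F) -> usize {
    let dropped = drop_quotes_with_bad_bindings(quotes);
    for peer in &dropped {
        report(*peer);
    }
    dropped.len()
}
```

With one good and two bad quotes, the two bad peers are each reported once, the good quote survives the filter, and the good peer receives no event, mirroring B2 and B3.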

Test plan

  • cargo test -p ant-core --lib data::client::quote (25/25 pass)
  • cargo clippy -p ant-core --lib -- -D warnings
  • cargo fmt --all -- --check

Dependencies

Cross-links

Behaviour-preservation argument

  • Why this is safe: trust events are advisory; the 0.35 swap threshold is already the production knob. We are emitting a stronger negative signal for strictly stronger evidence — a verifiable cryptographic mismatch is more conclusive than a dial timeout.
  • NAT-safe: trust is per-PeerId, not per-IP. Legitimate peers behind a shared NAT are unaffected.
  • Recoverable false positives for honest peers: the penalty decays and the score climbs back above the swap threshold in ~1 day, so a peer that fixes a temporary mis-sign re-enters the lookup pool naturally.
  • Sybil resistance preserved: the eviction makes crossed-key Sybils less effective, not more — the same attacker now has to spin up new identities at a higher rate to keep occupying close-K slots.
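The recovery argument above depends on how saorsa-core decays scores, which this PR does not spell out. As a worked illustration only, assume the gap between a peer's score and the neutral 0.5 shrinks exponentially with a free half-life parameter. `score_after`, `hours_to_threshold` and the half-life are hypothetical; the constants 0.5 and 0.35 and the post-event score of ~0.26 are the figures quoted in this PR.

```rust
/// Neutral score and swap threshold as quoted in the PR (local copies).
pub const NEUTRAL: f64 = 0.5;
pub const SWAP_THRESHOLD: f64 = 0.35;

/// Hypothetical decay model: the gap to neutral halves every `half_life_h` hours.
pub fn score_after(s0: f64, hours: f64, half_life_h: f64) -> f64 {
    NEUTRAL - (NEUTRAL - s0) * 0.5f64.powf(hours / half_life_h)
}

/// Hours until the score climbs back to the swap threshold under that model.
/// Solves (NEUTRAL - s0) * 2^(-t/h) = NEUTRAL - SWAP_THRESHOLD for t.
pub fn hours_to_threshold(s0: f64, half_life_h: f64) -> f64 {
    if s0 >= SWAP_THRESHOLD {
        return 0.0; // already at or above the threshold: never swapped out
    }
    half_life_h * ((NEUTRAL - s0) / (NEUTRAL - SWAP_THRESHOLD)).log2()
}
```

Under this model, recovering from 0.26 takes h·log2(0.24/0.15) ≈ 0.68·h hours, so the time out of the lookup pool scales linearly with whatever half-life the real implementation uses.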

When a peer's storage quote ships a `pub_key` that does not BLAKE3-hash
to the peer's claimed `PeerId` ("crossed-key" / bad-binding), report a
strong negative trust event so the local AdaptiveDHT swaps that peer
out of the routing table on the next admission cycle. This stops the
operator-monopolises-close-K failure mode that took uploads below
quorum on 2026-05-06 (see notes/plan-1-bad-node-eviction.md).

Wires the report at both detection sites in
`ant-core/src/data/client/quote.rs`:

  - per-peer `classify_quote_response` (primary defence inside the
    async closure) — fires for every BadQuoteBinding verdict;
  - post-collection `drop_quotes_with_bad_bindings` (defensive filter)
    — fires for every peer whose quote slipped past the per-peer
    handler. Should be empty in normal operation; non-empty signals
    an upstream regression.

Depends on: saorsa-labs/saorsa-core#XXX
  (P2PNode::report_application_failure entry point)

ant-node ships the storer-side mirror of this in a separate PR so
this side and that side apply the same penalty for the same evidence
on each end of the wire.

Routes the workspace's saorsa-core dep through the fork branch carrying
saorsa-labs/saorsa-core#114, which adds the
`P2PNode::report_application_failure` method this PR's quote.rs
depends on. CI was failing because that method does not exist on the
upstream rc-2026.4.4 branch yet.

REMOVE this patch block once #114 merges and the regular saorsa-core
git pin in ant-core/Cargo.toml resolves to a commit that includes
the new public method.