feat(dkg): implement sync#311
Conversation
|
Claude finished @iamquang95's task in 6m 16s —— View job Code Review
SummaryThis PR implements the DKG step-synchronization protocol ( Findings[High] Step is updated for peers that fail validation In Evidence: if let Err(error) = protocol::validate_request_with_public_key(...) {
// sets server error, response.error
} else {
server.mark_connected(peer_id).await;
}
// ← this runs even after validation failure
if server.update_step(peer_id, message.step).await? {
send_inbound_event(...);
}
// loop continues - next read will also come from the invalid peerA peer that sends an invalid hash signature (wrong cluster) can still advance the step tracker and the loop keeps reading from that peer. The Go reference returns from the handler after an invalid message. The function should [Medium] Protocol ID has trailing slash — verify against Go
pub const PROTOCOL_NAME: libp2p::StreamProtocol = StreamProtocol::new("/charon/dkg/sync/1.0.0/");The trailing [Medium]
[Low] Redundant explicit arms before wildcard in
ConnectionEvent::AddressChange(_)
| ConnectionEvent::LocalProtocolsChange(_)
| ConnectionEvent::RemoteProtocolsChange(_) => {}
ConnectionEvent::ListenUpgradeError(_) => {}
_ => {} // ← makes the lines above unreachableThe named arms are unreachable because [Low]
[Low]
Parity Matrix
TestsTests were not run (no
Missing coverage:
Open Questions
|
Same as charon implementation
Matched with golang:
Matched with charon, also it's not a bug. |
|
@claude now you will have the charon source code, review it again |
|
Claude finished @iamquang95's task in 3m 23s —— View job Code Review (with Charon source)
SummaryWith access to Corrections to Previous Review[Retracted — High] Step updated after validation failure Confirmed correct. Go's if err := s.validReq(pubkey, msg); err != nil {
s.setErr(...)
resp.Error = err.Error()
} else if !s.isConnected(pID) {
s.setConnected(pID)
}
if err := s.updateStep(pID, int(msg.GetStep())); err != nil { // ← unconditional
return err
}Rust [Retracted — Medium] Protocol ID trailing slash Confirmed correct. Go uses [Retracted — Medium] Confirmed correct. Go's Remaining Valid Findings[Low] Redundant match arms before wildcard in
ConnectionEvent::AddressChange(_)
| ConnectionEvent::LocalProtocolsChange(_)
| ConnectionEvent::RemoteProtocolsChange(_) => {}
ConnectionEvent::ListenUpgradeError(_) => {}
_ => {} // makes the four named arms above dead codeThe named arms are unreachable. Remove them. Fix this → [Low]
New Finding[Low] Relay error detection may miss Go's func isRelayError(err error) bool {
return errors.Is(err, network.ErrReset) ||
errors.Is(err, network.ErrResourceScopeClosed) // ← second case
}Rust ( relay_reset: error.kind() == std::io::ErrorKind::ConnectionReset,
Recommend mapping Informational NoteRust Go's Parity Matrix (updated)
|
Don't have the equivalent error in rust. |
varex83agent
left a comment
There was a problem hiding this comment.
Review: feat(dkg): implement sync
This PR ports the Charon DKG sync protocol to Rust. The architecture is solid — the Arc<Inner> / RwLock<ServerState> / Notify design is correct, the wire format matches Go's i64 LE framing, and the step validation rules are faithful. Test coverage is good with both unit tests mirroring Go's test shapes and an integration round-trip test.
However there are four bugs that need to be fixed before merge:
stop_rx.changed()spurious cancel —activate()sendsfalseonstop_tx, which fireschanged()in the running outbound stream and kills it prematurely (handler.rs:299,319+client.rs:178)- Inbound future replaced without cleanup — dropping the old
BoxFutureskipsserver.clear_connected, leaving the peer permanently in the connected set (handler.rs:257) - No backoff on Reconnectable exit — transitioning to
Idledirectly can spin-loop under relay churn; theon_dial_upgrade_errorpath correctly callsschedule_retry()but the stream-exit path does not (handler.rs:226) NegotiationFailedretries forever — when a peer doesn't support the protocol, retrying with backoff is pointless; the handler should callclient.finish(Err(Unsupported))(handler.rs:125)
Additionally, await_all_shutdown is missing the error-state check that await_all_connected and await_all_at_step both have, and the MAX_MESSAGE_SIZE constant (128 MiB) diverges from Go's 32 MiB cap.
Inline comments are filed for each issue individually.
varex83agent
left a comment
There was a problem hiding this comment.
This PR implements the DKG sync protocol (port of charon/dkg/sync) using a libp2p NetworkBehaviour + per-connection ConnectionHandler, with Server/Client Arc handles for external callers. The overall architecture is solid and the step-validation, version-checking, and signature-verification logic is functionally equivalent to the Go reference. There is one resource-lifecycle bug and two major concurrency/security issues that must be addressed before merge.
Bugs (must fix):
clear_connected()is never called when the inboundBoxFutureis replaced or the connection is dropped mid-stream — the peer stays inserver.connectedpermanently, breakingawait_all_connectedafter reconnect cycles.
Major findings:
2. read_fixed_size_protobuf uses the global 128 MB limit for MsgSync/MsgSyncResponse messages that are at most ~200 bytes — any authenticated peer can force a 128 MB heap allocation.
3. When WaitingRetry fires but try_claim_outbound() fails (a second connection holds the claim), the handler falls back to Idle with no waker registered — it can be permanently stuck until an unrelated connection event arrives.
Minor findings: relay-error detection misses ErrResourceScopeClosed; DialFailure only retries on Transport errors (too narrow); RTT is not recorded to the libp2p peerstore; no integration tests for the error-propagation paths (version mismatch → set_err); inbound stream continues after set_err refreshing notify_waiters at 100 ms; client.shutdown() in the integration test has no timeout.
Verdict: REQUEST_CHANGES — the clear_connected lifecycle bug can cause silent test hangs and incorrect barrier semantics; the 128 MB allocation vector is a DoS risk in a cooperative but potentially adversarial network; and the stuck-Idle handler is a correctness issue in multi-connection scenarios.
# Conflicts: # crates/dkg/src/lib.rs
Merge to unblock dkg, will fix any issues later
Should address #133