fix: apply fixes from bohdan/finish-dkg#361

Open
varex83 wants to merge 1 commit into main from bohdan/pre-dkg-bug-fix

varex83 commented Apr 29, 2026

Summary

This PR fixes several bugs discovered during DKG integration testing, primarily around relay connection lifecycle management in the P2P layer.

Relay reconnection on disconnect / dial failure (relay.rs)

Previously, if a relay connection dropped or a dial attempt failed, the node would not attempt to re-establish the connection, leaving it permanently unable to route traffic through that relay.

  • Added relay_peers: HashMap<PeerId, Peer> to MutableRelayReservation to remember all known relay peers so they can be re-dialed when a connection is lost.
  • Added connected_relays: HashSet<PeerId> to skip redundant dials when a connection is already established or in-flight.
  • on_swarm_event now handles ConnectionClosed (last connection dropped) and DialFailure to trigger a re-dial, clearing stale pending/connected state first.
  • Switched relay dials from DialOpts::unknown_peer_id() to DialOpts::peer_id(...).condition(DisconnectedAndNotDialing) so libp2p can deduplicate concurrent dial attempts.
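The reconnection flow above can be modeled in plain Rust. This is a simplified sketch, not the code from relay.rs: PeerId is assumed to be a String, Peer holds only a base address, the Event enum is a hypothetical stand-in loosely modeled on libp2p's ConnectionEstablished / ConnectionClosed / DialFailure swarm events, and dial_queue replaces a real swarm dial. The duplicate check in queue_relay_dial plays the role that the DisconnectedAndNotDialing dial condition plays in the PR.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins: a PeerId is a String and a relay Peer is just its base address.
type PeerId = String;

#[derive(Clone, Debug)]
struct Peer {
    addr: String,
}

// Simplified swarm events (not libp2p's real types).
enum Event {
    ConnectionEstablished { peer: PeerId },
    ConnectionClosed { peer: PeerId, num_established: u32 },
    DialFailure { peer: PeerId },
}

#[derive(Default)]
struct RelayState {
    relay_peers: HashMap<PeerId, Peer>, // every known relay, kept so it can be re-dialed
    connected_relays: HashSet<PeerId>,  // relays with an established connection
    pending_relays: HashSet<PeerId>,    // relays with a dial in flight
    dial_queue: Vec<(PeerId, String)>,  // dials handed to the (hypothetical) swarm
}

impl RelayState {
    fn add_relay(&mut self, id: &str, peer: Peer) {
        self.relay_peers.insert(id.to_string(), peer);
        self.queue_relay_dial(id);
    }

    // Queue a dial unless the relay is already connected or being dialed.
    fn queue_relay_dial(&mut self, id: &str) {
        if self.connected_relays.contains(id) || self.pending_relays.contains(id) {
            return;
        }
        if let Some(peer) = self.relay_peers.get(id) {
            self.pending_relays.insert(id.to_string());
            self.dial_queue.push((id.to_string(), peer.addr.clone()));
        }
    }

    fn on_event(&mut self, event: Event) {
        match event {
            Event::ConnectionEstablished { peer } => {
                self.pending_relays.remove(&peer);
                if self.relay_peers.contains_key(&peer) {
                    self.connected_relays.insert(peer);
                }
            }
            // Only the last connection dropping counts as a disconnect:
            // clear stale state first, then re-dial from relay_peers.
            Event::ConnectionClosed { peer, num_established: 0 } => {
                self.connected_relays.remove(&peer);
                self.pending_relays.remove(&peer);
                self.queue_relay_dial(&peer);
            }
            Event::ConnectionClosed { .. } => {}
            Event::DialFailure { peer } => {
                self.pending_relays.remove(&peer);
                self.queue_relay_dial(&peer);
            }
        }
    }
}

fn main() {
    let mut state = RelayState::default();
    state.add_relay("relay-a", Peer { addr: "/ip4/127.0.0.1/tcp/3640".to_string() });
    state.queue_relay_dial("relay-a"); // skipped: a dial is already in flight
    assert_eq!(state.dial_queue.len(), 1);

    state.on_event(Event::ConnectionEstablished { peer: "relay-a".to_string() });
    state.on_event(Event::ConnectionClosed { peer: "relay-a".to_string(), num_established: 0 });
    assert_eq!(state.dial_queue.len(), 2); // re-dialed after the drop

    state.on_event(Event::DialFailure { peer: "relay-a".to_string() });
    assert_eq!(state.dial_queue.len(), 3); // re-dialed after the failure
}
```

The sketch re-dials immediately after a failure. The PR description does not say whether the real code applies any backoff between attempts.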

Relay-ready settling delay (relay.rs)

RelayRouter was immediately attempting to route peers through a relay the moment the connection was established, before the relay reservation handshake could complete. This caused circuit-dial attempts to fail silently.

  • Added connected_relays: HashMap<PeerId, Instant> to RelayRouter to record when each relay became connected.
  • Added relay_ready() which gates routing behind a 2-second RELAY_READY_DELAY so the reservation handshake has time to finish.
  • RelayRouter::run_relay_router now iterates over connected_relays (only relays with an active connection) instead of all configured relays, and skips any that haven't settled yet.
  • Peer dials via relay also use DisconnectedAndNotDialing to avoid redundant dial attempts.
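The settling gate can be sketched as a pure function of time. The 2-second value and the RELAY_READY_DELAY name come from the description above. Passing `now` explicitly, using `&'static str` as a stand-in PeerId, and the `ready_relays` helper are assumptions made so the example needs no libp2p types and no sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Delay taken from the PR description.
const RELAY_READY_DELAY: Duration = Duration::from_secs(2);

type PeerId = &'static str; // hypothetical stand-in for libp2p's PeerId

// A relay is ready once it has been connected for at least RELAY_READY_DELAY.
// saturating_duration_since returns zero if `now` is earlier than `connected_at`.
fn relay_ready(connected_at: Instant, now: Instant) -> bool {
    now.saturating_duration_since(connected_at) >= RELAY_READY_DELAY
}

// Hypothetical helper: only relays that are connected AND settled are used for routing.
fn ready_relays(connected_relays: &HashMap<PeerId, Instant>, now: Instant) -> Vec<PeerId> {
    let mut ready: Vec<PeerId> = connected_relays
        .iter()
        .filter(|&(_, &at)| relay_ready(at, now))
        .map(|(&id, _)| id)
        .collect();
    ready.sort(); // deterministic order for the example
    ready
}

fn main() {
    let t0 = Instant::now();
    let mut connected_relays = HashMap::new();
    connected_relays.insert("relay-a", t0);
    connected_relays.insert("relay-b", t0 + Duration::from_millis(1500));

    // At t0 + 3s, relay-a has been up for 3s but relay-b only for 1.5s.
    let now = t0 + Duration::from_secs(3);
    assert_eq!(ready_relays(&connected_relays, now), vec!["relay-a"]);
}
```

A fixed delay is a heuristic rather than a confirmation that the reservation succeeded; it trades a short routing delay for fewer failed circuit dials.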

Fix malformed circuit multiaddresses (relay.rs, utils.rs)

Relay peer addresses stored in the Peer struct sometimes already included a trailing /p2p/<peer-id> component. Appending another /p2p/<relay-id>/p2p-circuit (or /p2p-circuit/p2p/<target-id>) on top produced invalid multiaddresses that libp2p silently rejected.

  • Both queue_relay_dial and multi_addrs_via_relay now strip any existing /p2p/... protocol components from the base address before constructing circuit or direct-dial addresses.
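The stripping step can be illustrated with a string-based sketch. The helper names strip_p2p and circuit_addr are hypothetical, and the real code presumably operates on libp2p's Multiaddr type rather than on strings. The sketch assumes every `p2p` component is followed by exactly one value (the peer ID) and that no component value contains a `/`.

```rust
// Hypothetical helper: drop every "/p2p/<id>" component from a multiaddr string.
fn strip_p2p(addr: &str) -> String {
    let mut out = String::new();
    let mut parts = addr.split('/').filter(|s| !s.is_empty());
    while let Some(part) = parts.next() {
        if part == "p2p" {
            parts.next(); // skip the peer ID that follows "p2p"
            continue;
        }
        out.push('/');
        out.push_str(part); // "p2p-circuit" is a different token and is kept
    }
    out
}

// Hypothetical helper: <relay-base>/p2p/<relay-id>/p2p-circuit/p2p/<target-id>.
fn circuit_addr(relay_addr: &str, relay_id: &str, target_id: &str) -> String {
    format!("{}/p2p/{}/p2p-circuit/p2p/{}", strip_p2p(relay_addr), relay_id, target_id)
}

fn main() {
    // The stored relay address already ends in /p2p/<relay-id>.
    let stored = "/ip4/10.0.0.1/tcp/3640/p2p/RelayId";
    assert_eq!(strip_p2p(stored), "/ip4/10.0.0.1/tcp/3640");
    assert_eq!(
        circuit_addr(stored, "RelayId", "TargetId"),
        "/ip4/10.0.0.1/tcp/3640/p2p/RelayId/p2p-circuit/p2p/TargetId"
    );
}
```

Without the stripping step, appending to `stored` would yield `.../p2p/RelayId/p2p/RelayId/p2p-circuit/...`, the kind of doubled component described above.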

Fix encode_0x_hex for empty input (serde_utils.rs)

encode_0x_hex(&[]) previously returned "0x" instead of "", which caused downstream deserialization failures for optional byte fields encoded as empty strings.
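The before/after behaviour can be shown with a small standalone sketch. Only encode_0x_hex is named in the PR; encode_0x_hex_old is a hypothetical name for the previous behaviour, and neither body is copied from serde_utils.rs.

```rust
use std::fmt::Write as _;

// Hypothetical name for the previous behaviour: always emit the "0x" prefix.
fn encode_0x_hex_old(bytes: &[u8]) -> String {
    let mut out = String::from("0x");
    for b in bytes {
        let _ = write!(out, "{b:02x}"); // writing to a String cannot fail
    }
    out
}

// Behaviour described in the PR: "" for empty input, otherwise lowercase "0x"-prefixed hex.
fn encode_0x_hex(bytes: &[u8]) -> String {
    if bytes.is_empty() {
        return String::new();
    }
    encode_0x_hex_old(bytes)
}

fn main() {
    assert_eq!(encode_0x_hex_old(&[]), "0x");
    assert_eq!(encode_0x_hex(&[]), "");
    assert_eq!(encode_0x_hex(&[0xde, 0xad, 0x01]), "0xdead01");
}
```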

Comment thread crates/p2p/src/relay.rs
&& let Some(peer) = self.relay_peers.get(&peer_id).cloned()
{
self.pending_relays.remove(&peer_id);
Self::queue_relay_dial(

}

/// Encodes bytes as lowercase `0x`-prefixed hex.
/// In case of empty bytes, returns an empty string.

This change will affect the eth2util serialization.
See: https://github.com/NethermindEth/pluto/blob/main/crates/eth2api/src/spec/bellatrix.rs#L121

The bellatrix extra_data field uses Hex0x, which calls encode_0x_hex.

In go-eth2-client, this field is expected to be "0x" when extra_data is empty: https://github.com/attestantio/go-eth2-client/blob/master/spec/bellatrix/executionpayload.go#L91
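To make the concern concrete, here is a hypothetical decoder that treats extra_data the way the comment describes go-eth2-client doing, with the "0x" prefix mandatory even for an empty value. It models that expectation only; it is not go-eth2-client's actual (Go) parsing code, and decode_extra_data is an invented name.

```rust
// Hypothetical strict decoder: extra_data must carry the "0x" prefix, even when empty.
fn decode_extra_data(s: &str) -> Result<Vec<u8>, String> {
    let hex = s
        .strip_prefix("0x")
        .ok_or_else(|| format!("extra_data missing 0x prefix: {s:?}"))?;
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("invalid hex in extra_data: {s:?}"));
    }
    // Decode two hex digits at a time; the checks above make slicing safe.
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

fn main() {
    // Pre-PR output of encode_0x_hex for an empty field: accepted.
    assert_eq!(decode_extra_data("0x"), Ok(Vec::new()));
    // Post-PR output of encode_0x_hex(&[]): rejected by the strict decoder.
    assert!(decode_extra_data("").is_err());
    assert_eq!(decode_extra_data("0x0aff"), Ok(vec![0x0a, 0xff]));
}
```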
