Skip to content

Fix FORCE_SRFLX/FORCE_RELAY escalation for silent peers#88

Open
juerghegglin wants to merge 1 commit into
FAForever:masterfrom
juerghegglin:fix/silent-peer-relay-escalation
Open

Fix FORCE_SRFLX/FORCE_RELAY escalation for silent peers#88
juerghegglin wants to merge 1 commit into
FAForever:masterfrom
juerghegglin:fix/silent-peer-relay-escalation

Conversation

@juerghegglin
Copy link
Copy Markdown

@juerghegglin juerghegglin commented May 6, 2026

Problem

PeerIceModule.gatherCandidates() reads connectivityAttemptTimes to decide whether to drop host or srflx candidates on each retry, via FORCE_SRFLX_COUNT and FORCE_RELAY_COUNT. The intent is clear: if recent attempts have failed, escalate to relay-only because direct candidates aren't working.

But connectivityAttemptTimes is only ever appended in startIce(), which runs after the peer's candidates have arrived and ICE has reached CHECKING. Silent peers never reach CHECKING. So the list stays empty, every retry reads 0 prior attempts, and escalation never fires for the exact case it's most needed: a peer with hard NAT or a network path that selectively breaks direct connectivity.

The check at PeerIceModule.gatherCandidates():197:

long previousConnectivityAttempts = getConnectivityAttempsInThePast(FORCE_SRFLX_RELAY_INTERVAL);
CandidatesMessage localCandidatesMessage = CandidateUtil.packCandidates(
        IceAdapter.getId(),
        peer.getRemoteId(),
        agent,
        component,
        previousConnectivityAttempts < FORCE_SRFLX_COUNT && ALLOW_HOST,
        previousConnectivityAttempts < FORCE_RELAY_COUNT && ALLOW_REFLEXIVE,
        ALLOW_RELAY);

The append at PeerIceModule.startIce():334:

private void startIce() {
    connectivityAttemptTimes.add(0, System.currentTimeMillis());
    ...
}

Reaching startIce() requires onIceMessageReceived() first, which only fires when the peer's candidates arrive via RPC. If they never do, the list stays empty.

Real-world evidence

In a 16-player game session: a peer who had genuinely disconnected mid-game, 112 retry attempts, every single one offered host(udp), host(udp), srflx(udp), relay(udp) — the full set. Escalation never fired because the list was empty after each attempt.

A teammate's log from a different session shows a peer flapping but responding. There the candidate set escalates as expected: host, srflxsrflx only → relay only across 3 retries. That's because startIce was firing on each successful re-establishment, growing the list.

Fix

Move the connectivityAttemptTimes.add(0, System.currentTimeMillis()) call from startIce() into gatherCandidates(), placed immediately after the count is read so the current attempt doesn't influence its own escalation decision.

         long previousConnectivityAttempts = getConnectivityAttempsInThePast(FORCE_SRFLX_RELAY_INTERVAL);
+        // Count this attempt now so FORCE_SRFLX/FORCE_RELAY escalation also fires for peers that
+        // never respond (otherwise startIce() would be the only place the count grows, and that
+        // path is unreachable for silent peers).
+        connectivityAttemptTimes.add(0, System.currentTimeMillis());
         CandidatesMessage localCandidatesMessage = CandidateUtil.packCandidates(
     private void startIce() {
-        connectivityAttemptTimes.add(0, System.currentTimeMillis());
-
         log.debug("{} Starting ICE for peer {}", getLogPrefix(), peer.getRemoteId());

Risk

Scenario Before After
Peer responds on first try (startIce reached) List grows by 1 in startIce List grows by 1 in gatherCandidates (same net effect)
Peer never responds (silent, e.g. crashed mid-game) List stays empty across all retries; escalation never fires; relay-only never offered List grows by 1 per retry; escalation fires at retry 2 (drop host) and retry 3 (relay only)
Peer responds intermittently (flap) List grows on each successful CHECKING; escalation works Same — list grows on each gather; escalation works (each gather counts once, regardless of outcome)

Effectively a one-line move plus a comment, with the only behavior change being for silent-peer retries — which is exactly the case FORCE_RELAY_COUNT = 2 was designed for.

Verification

  • Local :ice-adapter:check (compile + spotlessCheck) passes after spotlessApply.
  • The fix changes when the list grows but not what it represents, so no API or contract changes.

Co-authored with Claude (Anthropic); reviewed and locally verified by me.

Summary by CodeRabbit

  • Bug Fixes
    • Improved connectivity escalation mechanism for peer-to-peer connections that do not respond to initial attempts.

PeerIceModule's connectivityAttemptTimes list is read in
gatherCandidates() to decide whether to drop host or srflx candidates
on retry (FORCE_SRFLX_COUNT, FORCE_RELAY_COUNT). The list is appended
to in startIce(), which only runs after the peer's candidates arrive
and CHECKING is reached.

When a peer never responds (machine crashed, FA closed, route to peer
broken), startIce never runs, the list never grows, every retry reads
0 prior attempts, and the escalation never fires. The retry loop ends
up offering host+srflx+relay forever to a peer that can't be reached
at any of those candidate types.

Real-world evidence from a 16-player session: 112 retries to a peer
that had truly disconnected, every single one offering the full
candidate set. A separate teammate's log (peer responding) shows the
escalation working correctly because startIce was being reached.

Fix: move the connectivityAttemptTimes.add() call from startIce() to
gatherCandidates(), placed AFTER the count is read so the current
attempt isn't included in its own escalation decision. Each gather
attempt now counts once. Behavior for peers that respond is unchanged
(startIce previously appended; gatherCandidates now appends - same
net effect). Behavior for silent peers is fixed: retry 2 drops host,
retry 3 drops srflx, as the FORCE_*_COUNT thresholds intend.

Pairs naturally with the cap-reconnect-attempts PR (#TBD): with that
PR's cap of 5, silent peers now also reach relay-only by attempt 3,
maximizing the chance the cap completes for genuinely-flaky peers
before giving up.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d465d78-e4b9-4bc6-960c-163e4bbc41ae

📥 Commits

Reviewing files that changed from the base of the PR and between d49f9fe and eac77e6.

📒 Files selected for processing (1)
  • ice-adapter/src/main/java/com/faforever/iceadapter/ice/PeerIceModule.java

📝 Walkthrough

Walkthrough

The connectivity attempt timestamp is now recorded in the gatherCandidates method instead of startIce to ensure the escalation trigger for non-responsive peers activates at the correct point in the ICE candidate gathering flow.

Changes

Connectivity Attempt Timestamp Relocation

Layer / File(s) Summary
ICE Timing Logic
ice-adapter/src/main/java/com/faforever/iceadapter/ice/PeerIceModule.java
Timestamp recording for connectivityAttemptTimes moved from startIce method to gatherCandidates method to align with when candidates are actively gathered, ensuring FORCE_SRFLX/FORCE_RELAY escalation logic triggers correctly for unresponsive peers.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A hop through the timeline,
Where timestamps now flow just right—
From start to gather they leap,
ICE escalates when peers stay out of sight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix FORCE_SRFLX/FORCE_RELAY escalation for silent peers' directly and specifically describes the main change: fixing escalation behavior for peers that don't respond. It accurately summarizes the core issue and solution.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant