Fix FORCE_SRFLX/FORCE_RELAY escalation for silent peers#88
Conversation
PeerIceModule's connectivityAttemptTimes list is read in gatherCandidates() to decide whether to drop host or srflx candidates on retry (FORCE_SRFLX_COUNT, FORCE_RELAY_COUNT). The list is appended to in startIce(), which only runs after the peer's candidates arrive and CHECKING is reached. When a peer never responds (machine crashed, FA closed, route to peer broken), startIce never runs, the list never grows, every retry reads 0 prior attempts, and the escalation never fires. The retry loop ends up offering host+srflx+relay forever to a peer that can't be reached at any of those candidate types. Real-world evidence from a 16-player session: 112 retries to a peer that had truly disconnected, every single one offering the full candidate set. A separate teammate's log (peer responding) shows the escalation working correctly because startIce was being reached. Fix: move the connectivityAttemptTimes.add() call from startIce() to gatherCandidates(), placed AFTER the count is read so the current attempt isn't included in its own escalation decision. Each gather attempt now counts once. Behavior for peers that respond is unchanged (startIce previously appended; gatherCandidates now appends - same net effect). Behavior for silent peers is fixed: retry 2 drops host, retry 3 drops srflx, as the FORCE_*_COUNT thresholds intend. Pairs naturally with the cap-reconnect-attempts PR (#TBD): with that PR's cap of 5, silent peers now also reach relay-only by attempt 3, maximizing the chance the cap completes for genuinely-flaky peers before giving up.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe connectivity attempt timestamp is now recorded in the ChangesConnectivity Attempt Timestamp Relocation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Problem
PeerIceModule.gatherCandidates()readsconnectivityAttemptTimesto decide whether to drop host or srflx candidates on each retry, viaFORCE_SRFLX_COUNTandFORCE_RELAY_COUNT. The intent is clear: if recent attempts have failed, escalate to relay-only because direct candidates aren't working.But
connectivityAttemptTimesis only ever appended instartIce(), which runs after the peer's candidates have arrived and ICE has reachedCHECKING. Silent peers never reachCHECKING. So the list stays empty, every retry reads 0 prior attempts, and escalation never fires for the exact case it's most needed: a peer with hard NAT or a network path that selectively breaks direct connectivity.The check at
PeerIceModule.gatherCandidates():197:The append at
PeerIceModule.startIce():334:Reaching
startIce()requiresonIceMessageReceived()first, which only fires when the peer's candidates arrive via RPC. If they never do, the list stays empty.Real-world evidence
In a 16-player game session: a peer who had genuinely disconnected mid-game, 112 retry attempts, every single one offered
host(udp), host(udp), srflx(udp), relay(udp)— the full set. Escalation never fired because the list was empty after each attempt.A teammate's log from a different session shows a peer flapping but responding. There the candidate set escalates as expected:
host, srflx→srflxonly →relayonly across 3 retries. That's becausestartIcewas firing on each successful re-establishment, growing the list.Fix
Move the
connectivityAttemptTimes.add(0, System.currentTimeMillis())call fromstartIce()intogatherCandidates(), placed immediately after the count is read so the current attempt doesn't influence its own escalation decision.private void startIce() { - connectivityAttemptTimes.add(0, System.currentTimeMillis()); - log.debug("{} Starting ICE for peer {}", getLogPrefix(), peer.getRemoteId());Risk
startIcereached)startIcegatherCandidates(same net effect)Effectively a one-line move plus a comment, with the only behavior change being for silent-peer retries — which is exactly the case
FORCE_RELAY_COUNT = 2was designed for.Verification
:ice-adapter:check(compile + spotlessCheck) passes afterspotlessApply.Co-authored with Claude (Anthropic); reviewed and locally verified by me.
Summary by CodeRabbit