Cap reconnect attempts in PeerIceModule (default 5)#86

Open
juerghegglin wants to merge 1 commit into FAForever:master from juerghegglin:fix/cap-reconnect-attempts

Conversation

juerghegglin commented May 6, 2026

Problem

PeerIceModule.onConnectionLost() schedules a new initiateIce() on every failure with no upper bound on attempts. When a peer truly disappears mid-game — their machine crashes, FA closes, or a P2P route in the mesh breaks in a way that won't recover — the retry loop runs until local FA shutdown:

  • every ~11 seconds: gather STUN + TURN candidates, send them via RPC to the lobby server, wait 6 s for the peer's response, fail, schedule another attempt
  • in a 16-player game with one bad link, this is one P2P session burning CPU/network/log indefinitely while the simulation has long since moved past

Real-world evidence

A deadlocked 16-player session: a single peer drop produced 112 ICE re-initiations across 20 minutes for one peer that never returned. The local FA's lockstep took ~10 s to mark the peer defeated and continue (per game-engine state in JsonStats), but the ice-adapter spent the next ~20 minutes retrying into the void:

  • 100+ state-change notifications (onIceConnectionStateChanged: gathering / awaitingCandidates / disconnected) sent to the FAF client
  • Continuous STUN/TURN harvesting (5 servers per attempt × 112 attempts = ~560 round trips)
  • Telemetry websocket churn coupled to each state change

A separate teammate's log from a different session shows a peer flapping with 3 successful reconnects in 26 seconds — those recoveries happen well within a 5-attempt budget, so this cap does not regress the transient-flap recovery case that the existing retry loop is designed to handle.

Change

Adds a counter for consecutive failed reconnect attempts:

  • Reset to 0 in startIce() when CONNECTED is reached. A successful connection clears the budget.
  • Reset to 0 in onConnectionLost() when the previous state was CONNECTED (a fresh disconnect deserves a fresh retry budget).
  • Incremented on every retry attempt.
  • Once the counter exceeds MAX_RECONNECT_ATTEMPTS (default 5), onConnectionLost() returns without scheduling a new initiateIce(); the state stays DISCONNECTED.

Total automatic-retry budget: about 55 seconds (5 attempts × ~11 s each).
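The counter semantics described above can be sketched as a standalone model (class, field, and method names are assumptions mirroring this PR description, not the actual PeerIceModule code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal model of the capped-retry decision described above.
class ReconnectBudget {
    static final int MAX_RECONNECT_ATTEMPTS = 5;
    private final AtomicInteger consecutiveFailedAttempts = new AtomicInteger();

    // startIce() reaching CONNECTED clears the budget.
    void onConnected() {
        consecutiveFailedAttempts.set(0);
    }

    // A fresh disconnect (previous state was CONNECTED) also resets it.
    void onFreshDisconnect() {
        consecutiveFailedAttempts.set(0);
    }

    // Called once per retry attempt; true while another initiateIce()
    // should still be scheduled.
    boolean shouldScheduleRetry() {
        return consecutiveFailedAttempts.incrementAndGet() <= MAX_RECONNECT_ATTEMPTS;
    }
}
```

Under this model, attempts 1 through 5 schedule a retry and attempt 6 is blocked until either a successful connection or a fresh disconnect resets the counter.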

Recovery semantics (unchanged)

  • The FAF client's manual reconnect button (#61) calls GameSession.reconnectToPeer(), which disconnects and then calls connectToPeer() again. That creates a fresh Peer and a fresh PeerIceModule, so the counter starts at 0 and users retain explicit control to retry.
  • Answerer-mode peers continue to react to new offers received via onIceMessageReceived() regardless of cap state — that path goes through initiateIce() directly, not the retry scheduler.

Risk

  • The MAX_RECONNECT_ATTEMPTS = 5 constant is conservative. Could be made configurable via IceOptions in a follow-up if maintainers want tournament-mode tuning. Kept hardcoded here for minimal scope.
  • Behavior is unchanged for any peer that successfully reconnects within 5 attempts (which is every recovering peer in the available real-world logs).
  • Behavior changes for peers that never recover: the adapter now stops retrying after ~55 s instead of looping forever.

Verification

  • Local :ice-adapter:check (compile + spotlessCheck) passes after spotlessApply.
  • No JUnit tests added — ice-adapter has no JUnit infrastructure yet, and bringing it in alongside this change would expand scope. Happy to follow up with a regression test (and the JUnit infrastructure for it) if maintainers want.

Co-authored with Claude (Anthropic); reviewed and locally verified by me.

Summary by CodeRabbit

Bug Fixes

  • Implemented a limit of 5 consecutive reconnection attempts to prevent infinite connection loops and improve system stability.
  • Enhanced connection recovery with intelligent retry delays and improved connection state tracking for better resilience during network disruptions.
  • Added notifications to inform users when maximum reconnection attempts are reached.

PeerIceModule.onConnectionLost() previously scheduled a new initiateIce()
on every failure with no upper bound on attempts. When a peer truly
disappears (their machine dies, FA closes, or a P2P route in the mesh
breaks unrecoverably), the loop runs until the local FA shuts down: every
~11 seconds, gather STUN+TURN candidates, send them via RPC, wait 6s for
a reply that never comes, repeat.

Real-world data from a deadlocked 16-player session: one peer drop led
to 112 ICE re-initiations across 20 minutes for a single peer that was
never coming back. During that whole window the local game generates
state-change RPC notifications, telemetry events, and STUN/TURN load
that contributes nothing to recovery. In a P2P mesh, this can also
contribute CPU pressure that delays echo replies on healthy peers,
inducing further false-positive timeouts on otherwise-good links.

This change adds a counter for consecutive failed reconnect attempts:
  - reset to 0 in startIce() when CONNECTED is reached;
  - reset to 0 in onConnectionLost() when the previous state was
    CONNECTED (a fresh disconnect deserves a fresh retry budget);
  - incremented on every retry attempt;
  - once the counter exceeds MAX_RECONNECT_ATTEMPTS (default 5),
    the retry loop stops and the peer stays in DISCONNECTED.

Recovery path is unchanged: the FAF client's manual reconnect button
(PR FAForever#61) calls GameSession.reconnectToPeer(), which creates a fresh
Peer and a fresh PeerIceModule (counter starts at 0 again), so the user
retains explicit control. Answerer-mode peers also re-engage normally
when they receive a new offer via RPC.

5 attempts at ~11s each gives roughly 55s of automatic retry before
giving up: generous enough for transient flaps (logs from a separate
session show a peer flapping with 3 successful reconnects in 26s, well
within budget), short enough that a truly-gone peer doesn't burn 20
minutes of background work.
coderabbitai Bot commented May 6, 2026

📝 Walkthrough

Walkthrough

The pull request introduces a capped ICE reconnection policy in PeerIceModule that limits consecutive reconnection attempts to 5 and applies strategic delays before reattempts based on prior connection state.

Changes

ICE Reconnection Cap

All changes are in ice-adapter/src/main/java/com/faforever/iceadapter/ice/PeerIceModule.java:

  • State & Constants: MAX_RECONNECT_ATTEMPTS constant set to 5, and a consecutiveFailedAttempts AtomicInteger field added to track reconnection attempts.
  • Connection Establishment: when a connection is established, consecutiveFailedAttempts is reset to 0.
  • Reconnection Logic: enhanced onConnectionLost() increments the attempt counter, enforces the cap, logs a warning when the max is reached, and schedules reattempts with conditional delays (0 ms if previously connected, 5 seconds otherwise).
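The conditional delay mentioned in the reconnection-logic summary could be modeled as follows (the 0 ms / 5 s values come from this review summary and should be treated as assumptions about the patch, not verified constants):

```java
// Sketch of the conditional retry delay from the summary above.
// Assumption: immediate retry after a drop from CONNECTED, 5 s otherwise.
class RetryDelay {
    static long delayMillis(boolean previouslyConnected) {
        // A peer that just fell out of CONNECTED gets an immediate retry;
        // a peer that never stabilized waits before the next attempt.
        return previouslyConnected ? 0L : 5_000L;
    }
}
```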

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes



coderabbitai Bot left a comment
🧹 Nitpick comments (1)
ice-adapter/src/main/java/com/faforever/iceadapter/ice/PeerIceModule.java (1)

451-464: 🏗️ Heavy lift

Add automated boundary tests for retry cap/reset behavior.

This async branch is easy to regress; please add tests for: (1) retries 1..5 scheduled, (2) retry 6 blocked, (3) counter reset after reconnect success, and (4) immediate-vs-delayed retry timing by previous state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ice-adapter/src/main/java/com/faforever/iceadapter/ice/PeerIceModule.java`
around lines 451 - 464, Add automated unit tests in the PeerIceModule test suite
covering retry cap and reset: simulate peer.isLocalOffer() true and assert
consecutiveFailedAttempts increments and that initiateIce is scheduled for
attempts 1..MAX_RECONNECT_ATTEMPTS (use a controllable executor or mock
IceAdapter.getExecutor()), assert that attempt MAX_RECONNECT_ATTEMPTS+1 does not
schedule initiateIce (blocked), verify that a successful reconnect resets
consecutiveFailedAttempts to 0, and verify delayMs is 0 when previousState ==
CONNECTED and 5000 otherwise by asserting which delayedExecutor delay is used;
target symbols: PeerIceModule, consecutiveFailedAttempts,
MAX_RECONNECT_ATTEMPTS, initiateIce, previousState, CONNECTED.


📥 Commits

Reviewing files that changed from the base of the PR and between d49f9fe and f08236f.

📒 Files selected for processing (1)
  • ice-adapter/src/main/java/com/faforever/iceadapter/ice/PeerIceModule.java

juerghegglin (Author) commented

Acknowledging the suggestion and proposing to defer it.

ice-adapter currently has no JUnit infrastructure: no junit-jupiter testImplementation, no useJUnitPlatform() in build.gradle, and src/test/java contains only the legacy main()-style integration helpers (IceTest.java, SignallingServer.java). Adding the proposed boundary tests would also require adding the test framework, which expands this PR from a focused 1-file behavior change into a project-wide infrastructure addition.

I'd rather propose JUnit onboarding separately so maintainers can accept (or reject) the test framework on its own merits, independent of accepting this fix. The boundary tests are written and passing locally on a separate branch — happy to open a follow-up PR introducing JUnit Jupiter + these tests (plus equivalent regression tests for #84/#85) if there's appetite for unit-testing infrastructure in this module.

In the meantime the change is small enough to review by inspection: the entire retry-decision block fits on one screen and the counter semantics are isolated to the existing lockLostConnection critical section.
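For reference, the cap/reset boundary conditions the reviewer asks about can be exercised without JUnit, in the same main()-style convention as the existing test helpers. This is a sketch that models only the counter semantics, not the real PeerIceModule:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Framework-free boundary check in the module's existing main()-style
// convention. Names and the increment-then-compare shape are assumptions.
class ReconnectCapCheck {
    static final int MAX_RECONNECT_ATTEMPTS = 5;

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();

        // Attempts 1..5 must be allowed.
        for (int i = 1; i <= MAX_RECONNECT_ATTEMPTS; i++) {
            if (attempts.incrementAndGet() > MAX_RECONNECT_ATTEMPTS) {
                throw new AssertionError("attempt " + i + " wrongly blocked");
            }
        }
        // Attempt 6 must be blocked.
        if (attempts.incrementAndGet() <= MAX_RECONNECT_ATTEMPTS) {
            throw new AssertionError("attempt 6 should be blocked");
        }
        // A successful reconnect resets the budget.
        attempts.set(0);
        if (attempts.incrementAndGet() > MAX_RECONNECT_ATTEMPTS) {
            throw new AssertionError("retry after reset should be allowed");
        }
        System.out.println("reconnect-cap boundary check passed");
    }
}
```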
