Make echo interval and timeout configurable #87
juerghegglin wants to merge 1 commit into FAForever:master
PeerConnectivityCheckerModule has been the only thing between "this peer
is healthy" and "this peer is dead." The echo cadence (1000 ms) and
silence threshold (10000 ms) were hardcoded constants. The 10-second
threshold sets the floor on the user-visible deadlock window when a
peer drops mid-game: lockstep stalls until the adapter declares the
peer disconnected, and that takes ten seconds no matter what.
Real-world data: in a 16-player game session, a single peer-side issue
froze every other player's simulation for the full 10 seconds before
the game's vote-to-continue could fire. Tournament players in
particular have asked for tighter detection.
This change exposes two CLI options:
--echo-interval-ms (default 1000)
--echo-timeout-ms (default 10000)
Defaults preserve current behavior. Power users can pass, for example,
'--echo-timeout-ms 2000' to detect peer loss in 2 seconds instead of
10. The log message "for the past 10 seconds" becomes "for the past
{}ms" so it reflects the configured value.
The diagnostic-grep pattern "Didn't receive any answer to echo
requests" is unchanged, so log-analysis tooling is not affected.
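As a rough illustration of the behavior described above, the following dependency-free sketch shows a timeout check driven by the two configurable values. `EchoSettings`, `fromArgs`, `isTimedOut`, and `timeoutMessage` are hypothetical names chosen for this example; the PR itself wires the options through `IceOptions` and `IceAdapter`, and its exact comparison and logging calls may differ.

```java
// Hypothetical stand-in for the adapter's echo settings; not the PR's actual classes.
final class EchoSettings {
    final long intervalMs;
    final long timeoutMs;

    EchoSettings(long intervalMs, long timeoutMs) {
        this.intervalMs = intervalMs;
        this.timeoutMs = timeoutMs;
    }

    // Hand-rolled parsing of "--echo-interval-ms N" and "--echo-timeout-ms N".
    // Missing flags fall back to the defaults that match the old constants.
    static EchoSettings fromArgs(String... args) {
        long interval = 1000;
        long timeout = 10000;
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("--echo-interval-ms")) {
                interval = Long.parseLong(args[i + 1]);
            } else if (args[i].equals("--echo-timeout-ms")) {
                timeout = Long.parseLong(args[i + 1]);
            }
        }
        return new EchoSettings(interval, timeout);
    }

    // Assumption: a peer counts as lost once no echo reply has arrived
    // for strictly longer than the configured timeout.
    boolean isTimedOut(long lastEchoReceivedMs, long nowMs) {
        return nowMs - lastEchoReceivedMs > timeoutMs;
    }

    // Keeps the grep-able prefix and reports the configured window in milliseconds.
    String timeoutMessage(String peer) {
        return "Didn't receive any answer to echo requests for the past "
                + timeoutMs + "ms from " + peer;
    }
}
```

With no arguments, `EchoSettings.fromArgs()` yields the old 1000 ms / 10000 ms behavior; passing `"--echo-timeout-ms", "2000"` changes only the timeout.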
Problem
PeerConnectivityCheckerModule is the only thing standing between "this peer is healthy" and "this peer is dead." Both knobs are hardcoded: the echo interval (1000 ms) and the silence threshold (10000 ms). The 10-second silence threshold puts a hard floor on the user-visible deadlock window when a peer drops mid-game: SC's lockstep simulation stalls until the adapter declares the peer disconnected, and that takes 10 seconds regardless of how fast the actual drop was. Tournament players (and anyone debugging chronic mesh drops) have no way to tighten this.
Real-world evidence
A 16-player game session: one peer experienced what looked like a JVM/CPU pause on their machine. From the local adapter's perspective the peer simply went silent. The game froze for every other player while the lockstep waited for input that wasn't coming, until the 10-second echo timeout fired and the simulation could move on with a vote-to-continue. The full freeze duration is essentially ECHO_TIMEOUT_MS + (FA's vote-grace period).

A separate teammate's log shows a flapping peer with three echo timeouts inside 26 seconds, each followed by a successful re-establishment. With the same data and a 2-second timeout, those would have been 2-second blips instead of 10-second freezes.
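The freeze estimate above can be written out as a small helper to compare configurations. This is only a sketch: `FreezeEstimate`, `freezeMs`, and `savedMs` are hypothetical names, and the vote-grace period is passed in as a parameter because its real value is not stated here.

```java
// Hypothetical helper mirroring "ECHO_TIMEOUT_MS + (FA's vote-grace period)".
final class FreezeEstimate {
    // Approximate stall: the echo timeout must elapse first,
    // then the game's own grace period before vote-to-continue.
    static long freezeMs(long echoTimeoutMs, long voteGraceMs) {
        return echoTimeoutMs + voteGraceMs;
    }

    // Time saved per timeout event when tightening the echo timeout;
    // the grace period cancels out because it is the same in both cases.
    static long savedMs(long oldTimeoutMs, long newTimeoutMs, long voteGraceMs) {
        return freezeMs(oldTimeoutMs, voteGraceMs) - freezeMs(newTimeoutMs, voteGraceMs);
    }
}
```

Whatever the grace period turns out to be, `savedMs(10000, 2000, g)` is 8000, which matches the 10-second versus 2-second comparison in the text.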
Change
Two new CLI options; defaults preserve current behavior:
--echo-interval-ms (default 1000)
--echo-timeout-ms (default 10000)
Files touched:
IceOptions.java: adds the two @Option fields.
IceAdapter.java: adds two static accessors (getEchoIntervalMs(), getEchoTimeoutMs()) following the existing pattern (getPingCount, getAcceptableLatency).
PeerConnectivityCheckerModule.java: replaces the two hardcoded values with calls to the new accessors. The log message at line 138 becomes "for the past {}ms from {}" (was "for the past 10 seconds from {}") so it reflects the configured value, but the prefix "Didn't receive any answer to echo requests" stays identical so existing log-grep tooling continues to work.
Use cases
--echo-timeout-ms 2000 --echo-interval-ms 500. Detects peer loss in 2 seconds; tolerates 4 lost echoes before declaring the peer dead (still robust to occasional packet loss). Combined with the FAF client's existing reconnect logic, this makes brief network blips much less disruptive.
--echo-timeout-ms 15000 if needed.
Risk
Verification
:ice-adapter:check (compile + spotlessCheck) passes after spotlessApply.
--echo-timeout-ms 2000 and observing the connectivity check now fires after 2 s (verified with the existing repro pattern, a silent UDP server).
Co-authored with Claude (Anthropic); reviewed and locally verified by me.