Skip to content

fix: persist preferred host updates correctly in HttpScheduler retry logic#1206

Open
ttypic wants to merge 1 commit intomainfrom
ECO-5705/fix-host-repin
Open

fix: persist preferred host updates correctly in HttpScheduler retry logic#1206
ttypic wants to merge 1 commit intomainfrom
ECO-5705/fix-host-repin

Conversation

@ttypic
Copy link
Copy Markdown
Contributor

@ttypic ttypic commented Apr 30, 2026

  • Prevent successful response from re-establish fallback host as the preference after fallback timeout
  • Ensured the preferred host is only updated when necessary during retry attempts

Summary by CodeRabbit

  • Bug Fixes

    • Improved host fallback logic to prevent unnecessary switching of preferred hosts after successful fallback connections. The client now only updates the preferred host if the successful connection differs from the original, providing more stable connection behavior in scenarios with multiple hosts and timeout handling.
  • Tests

    • Added comprehensive test coverage for fallback host behavior and timeout scenarios.

…y logic

- Prevent successful response from re-establish fallback host as the preference after fallback timeout
- Ensured the preferred host is only updated when necessary during retry attempts
@ttypic ttypic requested a review from sacOO7 April 30, 2026 22:43
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Walkthrough

This change modifies host fallback pinning logic in the HTTP scheduler to only persist a successful fallback host when it differs from the original preferred host, and adds a test verifying that late-arriving responses don't trigger unwanted re-pinning after the fallback retry timeout expires.

Changes

Cohort / File(s) Summary
Core Fallback Pinning Logic
lib/src/main/java/io/ably/lib/http/HttpScheduler.java
Captures preferred host snapshot at request start and only sets a new preferred host on success if the successful candidate host differs from the original snapshot, preventing unnecessary re-pinning.
Fallback Timeout Behavior Test
lib/src/test/java/io/ably/lib/test/rest/HttpTest.java
Adds integration test verifying that a delayed fallback response arriving after the retry timeout expires does not cause the client to re-pin the fallback host, ensuring correct timeout semantics.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A host snaps its picture, so wise and so keen,
With snapshots and comparisons, the best we have seen,
No re-pinning late guests after timeouts expire,
Fallback behavior perfected—the code we admire! 📌✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: fixing how the preferred host is persisted in HttpScheduler retry logic, which aligns with the changeset that prevents re-pinning fallback hosts after timeout.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ECO-5705/fix-host-repin

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/src/main/java/io/ably/lib/http/HttpScheduler.java`:
- Around line 199-210: The current repinning logic compares candidateHost to the
preferredHost captured at request start (preferredHost / candidateHost) which
can incorrectly repin if the original pin expired during the request; change the
check before calling httpCore.hosts.setPreferredHost(candidateHost, true) to
verify the current preferred host and pin age/validity instead of the
start-of-request string — fetch the live preferred host and its pin expiry/age
from httpCore.hosts (or add an accessor like
getPreferredHostExpiry/isPreferredHostPinned) after httpExecuteWithRetry
returns, and only call httpCore.hosts.setPreferredHost(candidateHost, true) if
candidateHost differs from the current live preferred host and the existing pin
is expired/older than fallbackRetryTimeout (or otherwise invalid). Ensure the
change touches the block that calls httpExecuteWithRetry(...) and the subsequent
setPreferredHost(...) call so persistence is gated on current pin validity
rather than the initial preferredHost snapshot.

In `@lib/src/test/java/io/ably/lib/test/rest/HttpTest.java`:
- Around line 1479-1500: The test creates an ExecutorService named executor for
requestFuture but never guarantees shutdown on failures; wrap the code that uses
executor/ requestFuture/ delayedRequestStarted in a try { ... } finally { ... }
block and call executor.shutdownNow() (or shutdown() then awaitTermination) in
the finally to ensure the single-thread executor is always cleaned up; keep
existing assertions and requestFuture.get() inside the try and perform executor
shutdown/termination in the finally to avoid stray test threads.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 086a75d9-c331-4a12-974e-92f411409ff8

📥 Commits

Reviewing files that changed from the base of the PR and between 5c75ea3 and c1dcc7b.

📒 Files selected for processing (2)
  • lib/src/main/java/io/ably/lib/http/HttpScheduler.java
  • lib/src/test/java/io/ably/lib/test/rest/HttpTest.java

Comment on lines +199 to +210
String preferredHost = httpCore.hosts.getPreferredHost();
String candidateHost = preferredHost;
int retryCountRemaining = (httpCore.hosts.fallbackHostsRemaining(candidateHost) > 0) ? httpCore.options.httpMaxRetryCount : 0;

while(!isCancelled) {
try {
boolean shouldPersist = !candidateHost.equals(preferredHost);
result = httpExecuteWithRetry(candidateHost, path, requireAblyAuth);
setResult(result);
httpCore.hosts.setPreferredHost(candidateHost, true);
if (shouldPersist) {
httpCore.hosts.setPreferredHost(candidateHost, true);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid repinning based only on the start-of-request snapshot.

This only compares the final candidateHost against the host captured at request start. A request that begins on a pinned fallback, retries onto another fallback, and completes after fallbackRetryTimeout can still re-pin a fallback host here, which brings back the behavior this PR is trying to remove.

Consider carrying the pin expiry/age through the request or checking whether the pin is still valid at completion time, rather than keying persistence off the initial host string alone.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/src/main/java/io/ably/lib/http/HttpScheduler.java` around lines 199 -
210, The current repinning logic compares candidateHost to the preferredHost
captured at request start (preferredHost / candidateHost) which can incorrectly
repin if the original pin expired during the request; change the check before
calling httpCore.hosts.setPreferredHost(candidateHost, true) to verify the
current preferred host and pin age/validity instead of the start-of-request
string — fetch the live preferred host and its pin expiry/age from
httpCore.hosts (or add an accessor like
getPreferredHostExpiry/isPreferredHostPinned) after httpExecuteWithRetry
returns, and only call httpCore.hosts.setPreferredHost(candidateHost, true) if
candidateHost differs from the current live preferred host and the existing pin
is expired/older than fallbackRetryTimeout (or otherwise invalid). Ensure the
change touches the block that calls httpExecuteWithRetry(...) and the subsequent
setPreferredHost(...) call so persistence is gated on current pin validity
rather than the initial preferredHost snapshot.

Comment on lines +1479 to +1500
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<Long> requestFuture = executor.submit(() -> {
try { return client.time(); }
catch (AblyException e) { throw new RuntimeException(e); }
});

// Wait until req3 has actually entered the listener before starting the clock
assertTrue("Delayed request must start within 5 s", delayedRequestStarted.await(5, TimeUnit.SECONDS));

// Wait 150 ms so that fallbackRetryTimeout (100 ms) expires
Thread.sleep(150L);

// req4: timeout expired → primary tried again
client.time();

// Wait for req3's delayed response to arrive (late fallback success)
requestFuture.get(5, TimeUnit.SECONDS);

// req5: late success from req3 must NOT have re-pinned the fallback
client.time();

executor.shutdown();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Shut down the executor in a finally block.

If await, get, or one of the assertions fails, this single-thread executor never gets cleaned up. That can leave stray test threads running and make failures noisier.

Suggested cleanup
-        ExecutorService executor = Executors.newSingleThreadExecutor();
-        Future<Long> requestFuture = executor.submit(() -> {
-            try { return client.time(); }
-            catch (AblyException e) { throw new RuntimeException(e); }
-        });
+        ExecutorService executor = Executors.newSingleThreadExecutor();
+        try {
+            Future<Long> requestFuture = executor.submit(() -> {
+                try { return client.time(); }
+                catch (AblyException e) { throw new RuntimeException(e); }
+            });
 
-        // Wait until req3 has actually entered the listener before starting the clock
-        assertTrue("Delayed request must start within 5 s", delayedRequestStarted.await(5, TimeUnit.SECONDS));
+            // Wait until req3 has actually entered the listener before starting the clock
+            assertTrue("Delayed request must start within 5 s", delayedRequestStarted.await(5, TimeUnit.SECONDS));
 
-        // Wait 150 ms so that fallbackRetryTimeout (100 ms) expires
-        Thread.sleep(150L);
+            // Wait 150 ms so that fallbackRetryTimeout (100 ms) expires
+            Thread.sleep(150L);
 
-        // req4: timeout expired → primary tried again
-        client.time();
+            // req4: timeout expired → primary tried again
+            client.time();
 
-        // Wait for req3's delayed response to arrive (late fallback success)
-        requestFuture.get(5, TimeUnit.SECONDS);
+            // Wait for req3's delayed response to arrive (late fallback success)
+            requestFuture.get(5, TimeUnit.SECONDS);
 
-        // req5: late success from req3 must NOT have re-pinned the fallback
-        client.time();
+            // req5: late success from req3 must NOT have re-pinned the fallback
+            client.time();
+        } finally {
+            executor.shutdownNow();
+        }
-
-        executor.shutdown();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<Long> requestFuture = executor.submit(() -> {
try { return client.time(); }
catch (AblyException e) { throw new RuntimeException(e); }
});
// Wait until req3 has actually entered the listener before starting the clock
assertTrue("Delayed request must start within 5 s", delayedRequestStarted.await(5, TimeUnit.SECONDS));
// Wait 150 ms so that fallbackRetryTimeout (100 ms) expires
Thread.sleep(150L);
// req4: timeout expired → primary tried again
client.time();
// Wait for req3's delayed response to arrive (late fallback success)
requestFuture.get(5, TimeUnit.SECONDS);
// req5: late success from req3 must NOT have re-pinned the fallback
client.time();
executor.shutdown();
ExecutorService executor = Executors.newSingleThreadExecutor();
try {
Future<Long> requestFuture = executor.submit(() -> {
try { return client.time(); }
catch (AblyException e) { throw new RuntimeException(e); }
});
// Wait until req3 has actually entered the listener before starting the clock
assertTrue("Delayed request must start within 5 s", delayedRequestStarted.await(5, TimeUnit.SECONDS));
// Wait 150 ms so that fallbackRetryTimeout (100 ms) expires
Thread.sleep(150L);
// req4: timeout expired → primary tried again
client.time();
// Wait for req3's delayed response to arrive (late fallback success)
requestFuture.get(5, TimeUnit.SECONDS);
// req5: late success from req3 must NOT have re-pinned the fallback
client.time();
} finally {
executor.shutdownNow();
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/src/test/java/io/ably/lib/test/rest/HttpTest.java` around lines 1479 -
1500, The test creates an ExecutorService named executor for requestFuture but
never guarantees shutdown on failures; wrap the code that uses executor/
requestFuture/ delayedRequestStarted in a try { ... } finally { ... } block and
call executor.shutdownNow() (or shutdown() then awaitTermination) in the finally
to ensure the single-thread executor is always cleaned up; keep existing
assertions and requestFuture.get() inside the try and perform executor
shutdown/termination in the finally to avoid stray test threads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant