Skip to content

fix(mothership): queue supersede crash#4297

Merged
icecrasher321 merged 3 commits intostagingfrom
fix/mothership-abort-race
Apr 25, 2026
Merged

fix(mothership): queue supersede crash#4297
icecrasher321 merged 3 commits intostagingfrom
fix/mothership-abort-race

Conversation

@icecrasher321
Copy link
Copy Markdown
Collaborator

Summary

Queue supersede via double enter abort propagation state machine improvement.

Type of Change

  • Bug fix

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 25, 2026 1:24am

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 25, 2026

PR Summary

Medium Risk
Changes stream termination and cancellation classification around abort markers and orchestrator results, which can affect how partial responses are persisted and how runs are traced. Risk is moderated by added unit coverage for the new abort/body-close and cancel paths.

Overview
Improves cancellation/abort propagation for Copilot SSE streaming so a Go stream that closes without a terminal event can be treated as an explicit user abort when a Redis abort marker exists.

runStreamLoop now checks hasAbortMarker(messageId) on body close and, if present, marks the stream aborted and triggers a new onAbortObserved(AbortReason.MarkerObservedAtBodyClose) hook; otherwise it preserves the existing fail-closed 503 error behavior.

Lifecycle outcome classification is updated to treat result.cancelled as cancelled even when the local abortSignal was not aborted (streaming + headless), and startAbortPoller now defaults to a faster 250ms cadence; tests were added/updated to cover these scenarios and the expanded explicit-stop vocabulary.

Reviewed by Cursor Bugbot for commit cd1f826. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR fixes a crash ("queue supersede") where a user-initiated stop propagated via Redis abort marker would close the Go SSE body before the 250 ms poller's next tick, causing runStreamLoop to misclassify the clean abort as a CopilotBackendError 503. The fix adds a Redis marker check at body close that reclassifies the outcome as Aborted, wires a new onAbortObserved callback so the local AbortController gets the correct reason for telemetry, and fixes the outcome-priority ordering in headless.ts to match the SSE path (result.success evaluated first).

Confidence Score: 5/5

Safe to merge; no new P0 or P1 findings — all substantive concerns were addressed in prior review threads.

The logic is well-reasoned: fail-closed on Redis error, correct abort-before-clear ordering in the poller, consistent outcome priority between SSE and headless paths, and strong test coverage for all new branches including the Redis-failure edge case. The open design choices (marker not cleared on the body-close path, 250 ms poll interval) were deliberately discussed in prior review threads and left intentionally.

No files require special attention.

Important Files Changed

Filename Overview
apps/sim/lib/copilot/request/go/stream.ts New body-close abort marker check reclassifies spurious 503s as Aborted; marker is not cleared on this path (pre-existing design choice noted in prior thread, delegated to cleanupAbortMarker in start.ts finally).
apps/sim/lib/copilot/request/lifecycle/start.ts Adds onAbortObserved callback that fires abortController.abort(reason) so the body-close path sets signal reason; adds result.cancelled to the outcome condition on the happy path.
apps/sim/lib/copilot/request/lifecycle/headless.ts Fixes outcome priority: result.success now evaluated before abortSignal?.aborted, and adds result.cancelled to the cancelled condition — consistent with the SSE path.
apps/sim/lib/copilot/request/session/abort.ts Lowers poll interval from 1000 ms to 250 ms; logic otherwise unchanged. The abort-before-clear ordering in the poller is correct and tested.
apps/sim/lib/copilot/request/session/abort-reason.ts Adds MarkerObservedAtBodyClose reason and correctly includes it in isExplicitStopReason so the new path is classified as explicit_stop in telemetry.
apps/sim/lib/copilot/request/types.ts Adds onAbortObserved to OrchestratorOptions with clear documentation of its purpose and wiring contract.
apps/sim/lib/copilot/request/go/stream.test.ts Adds four well-scoped tests covering the body-close abort marker paths including the Redis failure (fail-closed) and onAbortObserved callback assertions.
apps/sim/lib/copilot/request/lifecycle/start.test.ts Adds a test asserting that result.cancelled=true (without abortSignal.aborted) emits a complete/cancelled event rather than an error event.
apps/sim/lib/copilot/request/session/abort.test.ts New tests verify abort-before-clear ordering and that a pre-aborted controller does not trigger a second clearAbortMarker call.

Sequence Diagram

sequenceDiagram
    participant Browser
    participant SimSSE as Sim (SSE node)
    participant Redis
    participant GoBackend as Go Backend

    Browser->>SimSSE: POST /chat/abort
    SimSSE->>Redis: writeAbortMarker(streamId)
    SimSSE-->>Browser: 200 OK

    Note over GoBackend: Observes stop signal,
closes SSE body cleanly
(before 250ms poller tick)

    GoBackend-->>SimSSE: SSE body closes (no terminal event)

    SimSSE->>SimSSE: !streamComplete && !aborted && !wasAborted

    SimSSE->>Redis: hasAbortMarker(messageId)
    Redis-->>SimSSE: true

    SimSSE->>SimSSE: onAbortObserved(MarkerObservedAtBodyClose)
    SimSSE->>SimSSE: abortController.abort(reason)
    SimSSE->>SimSSE: context.wasAborted = true, endedOn = Aborted

    Note over SimSSE: No 503 thrown — outcome
classified as cancelled
    SimSSE->>SimSSE: isExplicitStopReason → true, recordCancelled → explicit_stop
Loading

Reviews (2): Last reviewed commit: "abort observed marker" | Re-trigger Greptile

Comment thread apps/sim/lib/copilot/request/lifecycle/headless.ts Outdated
Comment thread apps/sim/lib/copilot/request/session/abort.ts
Comment thread apps/sim/lib/copilot/request/go/stream.ts
Comment thread apps/sim/lib/copilot/request/lifecycle/start.ts
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit cd1f826. Configure here.

@icecrasher321 icecrasher321 merged commit df581c3 into staging Apr 25, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant