
v0.6.57: mothership reliability, ashby refactor, tables row count, copilot id fix, bun upgrade#4293

Merged
TheodoreSpeaks merged 9 commits into main from staging
Apr 24, 2026

Conversation

TheodoreSpeaks (Collaborator) commented Apr 24, 2026

TheodoreSpeaks and others added 6 commits April 24, 2026 14:36
…endering (#4287)

* improvement(mothership): stream retry state machine, progressive re-rendering

* address comments
…ct rule (#4268)

AGENTS.md / CLAUDE.md forbid crypto.randomUUID() (non-secure contexts throw
TypeError in browsers). Four copilot server-side files still violated this
rule, left over after PR #3397 polyfilled the client.

Routes through request lifecycle, OAuth draft insertion, persisted message
normalization, and table-row generation now use generateId from @sim/utils/id,
which is a drop-in UUID v4 producer that falls back to crypto.getRandomValues
when randomUUID is unavailable.

Refs #3393.
* refactor(ashby): align tools, block, and triggers with Ashby API

Audit-driven refactor to destructure rich fields per Ashby's API docs,
centralize output shapes via shared mappers in tools/ashby/utils.ts,
and align webhook provider handler with trigger IDs via a shared
action map. Removes stale block outputs left over from prior flat
response shapes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ashby): remove stale noteId output and reject ping events

- Remove stale `noteId` output descriptor from block (create_note
  now returns `id` at the top level via the shared note mapper).
- Explicitly reject `ping` events in the webhook matchEvent before
  falling back to the generic triggerId check, so webhook records
  missing triggerId cannot execute workflows on Ashby ping probes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
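The ordering described above — reject `ping` before the generic fallback — might look roughly like this (all names and the exact matching logic are assumptions, not the actual webhook provider code):

```typescript
// Hypothetical reduction of the matchEvent guard: Ashby sends `ping` probes
// when a webhook is created, and those must never execute workflows — even
// for webhook records that have no specific triggerId configured.
function matchEvent(event: { action: string }, triggerId?: string): boolean {
  if (event.action === 'ping') return false
  // Generic fallback: a record without a triggerId matches any non-ping event.
  // (In the real handler a shared action map narrows this further.)
  if (!triggerId) return true
  return triggerId === event.action
}
```

The key point is that the `ping` check sits before the fallback, so the permissive no-triggerId path can never fire on a probe.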

* fix(ashby): trim optional ID params in create/update tools

Optional ID params in create_application, change_application_stage,
and update_candidate were passed through to the request body without
.trim(), unlike their required ID siblings. Normalize to prevent
copy-paste whitespace errors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ashby): add subblock migration for removed filterCandidateId

Add SUBBLOCK_ID_MIGRATIONS entry so deployed workflows that previously
used the `filterCandidateId` subBlock on `list_applications` don't break
after the field was removed (Ashby's application.list doesn't filter by
candidateId). Also regenerate docs to sync noteId removal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ashby): final API alignment from parallel validation

- create_candidate: email is optional per Ashby docs (only name is
  required); tool, types, and block all made non-required.
- list_applications: guard NaN when createdAfter can't be parsed so we
  don't send a bad value to Ashby's API.
- webhook provider: replace createHmacVerifier with explicit
  fail-closed verifyAuth that 401s when secretToken, signature header,
  or signature match is missing (was previously fail-open on missing
  secret).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
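The `createdAfter` NaN guard can be sketched like this (helper and parameter names are assumed, not the actual tool code):

```typescript
// Only forward createdAfter when it parses to a valid epoch-milliseconds
// timestamp. Date.parse returns NaN for unparseable strings; sending that
// to Ashby's API would be a bad value.
function parseCreatedAfter(input: string | undefined): number | undefined {
  if (!input) return undefined
  const ms = Date.parse(input)
  return Number.isNaN(ms) ? undefined : ms
}
```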

* fix(ashby): preserve input.data path in webhook formatInput

Restore the explicit `data` key alongside the spread so deployed
workflows that reference `input.data.application.*`, `input.data.candidate.*`,
etc. keep working. The spread alone dropped those paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(ashby): drop legacy input.data key from webhook formatInput

Keep formatInput aligned with the advertised trigger outputs schema
(flat top-level entities) and drop the legacy input.data.* compat path.
Every field declared in each trigger's outputs is now populated 1:1 by
the data spread plus the explicit action key — no undeclared keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ashby): trim remaining ID body params for parity

Add .trim() on sourceId (create_candidate), jobId (list_applications),
applicationId and interviewStageId (list_interviews) to match the
trim-on-IDs pattern used across the rest of the Ashby tools and guard
against copy-paste whitespace.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* update docs

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…#4289)

* improvement(tables): race-free row-count trigger + scoped tx timeouts

* fix(tables): close upsert race + serialize replaceTableRows

Two concurrency bugs flagged by review:

1. `upsertRow` insert path: removing FOR UPDATE broke serialization between
   the initial existing-row SELECT and the INSERT. Two concurrent upserts
   on the same conflict target could both find no match, then both insert,
   producing a duplicate that bypasses the app-level unique check. Fixed
   by re-checking for the matching row *after* acquiring the per-table
   advisory lock; if a racing tx already inserted, switch to UPDATE.

2. `replaceTableRows`: under READ COMMITTED each tx's DELETE uses its own
   MVCC snapshot, so two concurrent replaces could each DELETE only the
   rows visible at their start, then both INSERT, leaving the table with
   the union of both row sets. Fixed by acquiring the per-table advisory
   lock at the start of the tx to serialize replaces against each other
   and against auto-position inserts.

Also updated a stale docstring on `upsertRow` that still referenced the
removed FOR UPDATE.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
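The re-check-after-lock pattern in point 1 can be modeled in-process like this — an illustrative TypeScript reduction, where the toy mutex and in-memory table stand in for the Postgres advisory lock and real SQL:

```typescript
type Row = { key: string; value: string }
const table: Row[] = []

// Toy async mutex standing in for the per-table advisory lock.
let chain: Promise<void> = Promise.resolve()
function withLock<T>(fn: () => Promise<T>): Promise<T> {
  const result = chain.then(fn)
  chain = result.then(
    () => undefined,
    () => undefined
  )
  return result
}

async function upsertRow(key: string, value: string): Promise<'inserted' | 'updated'> {
  // First pass: cheap lookup without the lock.
  const existing = table.find((r) => r.key === key)
  if (existing) {
    existing.value = value
    return 'updated'
  }
  // No match: acquire the lock, then RE-CHECK — a racing upsert may have
  // inserted between our first lookup and lock acquisition.
  return withLock(async () => {
    const raced = table.find((r) => r.key === key)
    if (raced) {
      raced.value = value
      return 'updated'
    }
    table.push({ key, value })
    return 'inserted'
  })
}
```

Without the re-check inside the critical section, two concurrent upserts on the same conflict target could both miss on the first pass and both insert — exactly the duplicate described above.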

* fix(tables): serialize explicit-position inserts with advisory lock

The `(table_id, position)` index is non-unique. Concurrent explicit-
position inserts at the same slot can both observe the slot empty, both
skip the shift, then each INSERT — producing duplicate `(table_id,
position)` rows.

Take the per-table advisory lock in the explicit-position branches of
`insertRow` and `batchInsertRows` (the auto-position branches already do
this). Updated the test that asserted the explicit path skipped the lock,
and added coverage for `batchInsertRows` with explicit positions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(tables): dedupe upsert UPDATE path + extract nextAutoPosition

Two pure cleanups on the round-1 changes:

1. Extract `nextAutoPosition(trx, tableId)` — the `SELECT coalesce(max(
   position), -1) + 1` pattern was repeated in three call sites
   (`insertRow` auto branch, `batchInsertRows` auto branch, `upsertRow`
   insert branch). One helper, same behavior.

2. Consolidate `upsertRow` update path. The initial-SELECT match and the
   post-lock re-select match previously had two literal duplicates of the
   same UPDATE + return block. Resolve `matchedRowId` first, then run one
   UPDATE branch. Lock is still only acquired when we don't find a match
   on the first pass.

No behavior change. 98/98 table unit tests unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
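The extracted helper's semantics, reduced to an in-memory sketch (the real `nextAutoPosition` runs the query inside the transaction against the table):

```typescript
// SQL equivalent: SELECT coalesce(max(position), -1) + 1
// An empty table yields -1 + 1 = 0, so auto-positions start at zero.
function nextAutoPosition(positions: number[]): number {
  return positions.reduce((max, p) => Math.max(max, p), -1) + 1
}
```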

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

vercel Bot commented Apr 24, 2026

The latest updates on your projects.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | Apr 24, 2026 11:37pm |


cursor Bot commented Apr 24, 2026

PR Summary

Medium Risk
Medium risk because it changes streaming/SSE behavior and core table row mutation/query logic (locking, timeouts, counts), which could affect performance or correctness under load despite added tests.

Overview
Mothership/Copilot streaming reliability: the chat stream resume endpoint now emits SSE comments (: accepted + periodic keepalive) and tracks last-write time; Go-side stream requests stop retrying non-idempotent failures, treat error events as terminal for telemetry, and ensure spans close correctly on fetch failures. Chat abort polling now heartbeats Redis chat-stream locks via a new extendLock helper (short TTL + periodic renewals) so dead pods don’t strand chats; related tests were added.
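The `extendLock` heartbeat described above amounts to a compare-and-set TTL refresh. A sketch against an in-memory store (all names illustrative — in production the ownership check and expiry refresh run as one Redis Lua script, so they are atomic):

```typescript
type Entry = { value: string; expiresAt: number }
const store = new Map<string, Entry>()

function acquireLock(key: string, owner: string, ttlMs: number): boolean {
  const now = Date.now()
  const existing = store.get(key)
  if (existing && existing.expiresAt > now) return false
  store.set(key, { value: owner, expiresAt: now + ttlMs })
  return true
}

function extendLock(key: string, owner: string, ttlMs: number): boolean {
  const now = Date.now()
  const entry = store.get(key)
  // Compare-and-set: only the current owner may refresh the TTL.
  if (!entry || entry.expiresAt <= now || entry.value !== owner) return false
  entry.expiresAt = now + ttlMs
  return true
}
```

The short TTL plus periodic renewal is what lets dead pods release their chats quickly: if the heartbeat stops, the lock simply expires.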

Tables performance & correctness: row listing APIs add includeTotal to optionally skip COUNT(*) (returning totalCount: null), with client/query hooks updated to pass/handle it and avoid counting in grid views. Table mutations now set per-transaction Postgres timeouts via SET LOCAL, rely on the row-count trigger instead of app-side counting/FOR UPDATE, and use per-table advisory locks to serialize position-sensitive writes (insert/batch insert/replace/upsert) plus targeted position recompaction after bulk deletes; unique-constraint checks and multi-row updates were optimized.
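The `includeTotal` shape described above might reduce to something like this (a hedged sketch, not the actual route code — the query callbacks stand in for the Drizzle queries):

```typescript
// When the caller doesn't need a total (e.g. the grid view), skip COUNT(*)
// entirely and report totalCount: null; otherwise run rows and count in
// parallel via Promise.all.
async function listRows(
  fetchRows: () => Promise<unknown[]>,
  countRows: () => Promise<number>,
  includeTotal: boolean
): Promise<{ rows: unknown[]; totalCount: number | null }> {
  if (!includeTotal) {
    return { rows: await fetchRows(), totalCount: null }
  }
  const [rows, totalCount] = await Promise.all([fetchRows(), countRows()])
  return { rows, totalCount }
}
```

Making `totalCount` explicitly `null` (rather than omitting it) keeps the response shape stable for clients that already destructure it.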

Ashby alignment & UI: Ashby docs/blocks/triggers were updated to reflect richer tool outputs, new/changed inputs (e.g., optional candidate email, job posting expand flags, includeArchived/syncToken for tags), and removal of the candidateId filter from list applications; the chat UI uses useProgressiveList to progressively render/stage messages and avoid expensive rescans for the preceding user query.

Reviewed by Cursor Bugbot for commit 60b80ec.

waleedlatif1 changed the title from "v0.6.57: mothership reliability, ashby refactor, tables row count, copilot id fix" to "v0.6.57: mothership reliability, ashby refactor, tables row count, copilot id fix, bun upgrade" on Apr 24, 2026.

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR bundles five distinct reliability improvements: heartbeat-based Redis chat-stream locks (TTL reduced from 2 h → 60 s, renewed every 20 s), a state-machine retry loop with progressive re-rendering for mothership streams, a race-free Postgres trigger + advisory lock for table row counts, optional COUNT(*) skipping for the grid UI (includeTotal: false), and the crypto.randomUUID() → generateId() copilot fix. The Ashby refactor aligns tool schemas with the upstream API.

  • The new increment_user_table_row_count trigger's atomic conditional UPDATE ... WHERE row_count < max_rows correctly replaces the old TOCTOU-prone SELECT → check → UPDATE pattern. Application-level SELECT FOR UPDATE is cleanly removed, and the one-shot reconciliation migration handles historical drift.

Confidence Score: 5/5

PR is safe to merge — all findings are P2 style/documentation suggestions with no blocking correctness issues.

The trigger change is logically correct (atomic conditional UPDATE eliminates TOCTOU), the advisory lock correctly serializes position writes, the heartbeat mechanism has adequate safety margin (3 beats before lock expiry), and the cursor-advancement fix in the stream route is a genuine correctness improvement. The totalCount → number | null type change is handled correctly at all call sites. No P0/P1 issues found.

No files require special attention. The migration (0198_tables_race_free_trigger.sql) and service.ts are the highest-impact files and both look correct.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/copilot/request/session/abort.ts | Heartbeat added to startAbortPoller: chat stream lock TTL reduced from 2h to 60s, renewed every 20s via extendLock Lua CAS. Clean separation of heartbeat from abort-poll logic. |
| apps/sim/lib/core/config/redis.ts | New extendLock using Lua CAS (get-and-compare before expire) to safely refresh TTL only if the caller still owns the lock. No-op path for Redis-less deployments. |
| apps/sim/lib/table/service.ts | Application-level FOR UPDATE + COUNT removed; capacity now enforced atomically by the new trigger. Advisory lock serializes position-aware inserts. Scoped SET LOCAL timeouts added to all mutating transactions. |
| packages/db/migrations/0198_tables_race_free_trigger.sql | Replaces TOCTOU-prone two-step trigger with atomic conditional UPDATE; includes one-shot reconciliation to correct any historical row_count drift. |
| apps/sim/app/api/copilot/chat/stream/route.ts | Cursor/state now advanced only after successful enqueue (preventing position drift on controller close). SSE keepalive comments injected every 15s of idle. enqueueComment helper correctly updates lastWriteTime. |
| apps/sim/hooks/use-progressive-list.ts | Refactored to two separate effects (key sync + RAF chain). renderState computed inline during render for flash-free key transitions. Logic is sound but notably more complex than the prior version. |
| apps/sim/app/workspace/[workspaceId]/home/components/mothership-chat/mothership-chat.tsx | Progressive rendering integrated; precedingUserContent now O(n) via memoized index map instead of O(n²) reverse scan per message. Index arithmetic (stagedOffset + localIndex) for precedingUserContentByIndex lookup is correct. |
| apps/sim/hooks/queries/tables.ts | totalCount type changed to number \| null to reflect skipped counts. |
| apps/sim/app/api/table/[tableId]/rows/route.ts | includeTotal query param added; COUNT(*) skipped when false. Row-count trigger error message ("row limit") still matched by existing catch clauses. |
| apps/sim/app/api/v1/tables/[tableId]/rows/route.ts | Same includeTotal treatment as the internal route; when includeTotal=true, rowsPromise and countQuery run via Promise.all (parallel). When false, sequential — correct since only rows are needed. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant C as Client
    participant SR as Stream Route (resume)
    participant AP as AbortPoller
    participant Redis
    participant DB as Postgres

    C->>SR: POST /copilot/chat/stream (resume)
    SR->>SR: enqueueComment("accepted")
    SR->>Redis: acquireLock(chatStreamKey, streamId, TTL=60s)
    SR->>AP: startAbortPoller(streamId, {chatId})

    loop Every 250ms poll
        AP->>DB: Check abort marker
        alt Every 20s heartbeat
            AP->>Redis: extendLock(chatStreamKey, streamId, 60s) [Lua CAS]
            Redis-->>AP: 1 (owned) or 0 (lost)
        end
    end

    loop Every 250ms poll (if no terminal event)
        SR->>DB: readEvents(streamId, cursor)
        DB-->>SR: events[]
        SR->>C: SSE envelopes (cursor updated after successful enqueue)
        alt No write for 15s
            SR->>C: SSE comment("keepalive")
        end
    end

    SR->>Redis: releaseLock(chatStreamKey, streamId)
```

Comments Outside Diff (1)

  1. apps/sim/lib/copilot/request/session/abort.ts, line 306-340 (link)

    P2 Heartbeat blocked while abort-poll is in flight

    The heartbeat block runs only after pollingStreams.delete(streamId) in the finally block, meaning it executes in the same tick that clears the guard. However, if another interval fires while the IIFE is awaiting the abort check (pollingStreams.has(streamId) is true), that interval exits immediately — skipping the heartbeat for that tick. Since CHAT_STREAM_LOCK_HEARTBEAT_INTERVAL_MS (20 s) is checked against lastHeartbeatAt, a single blocked tick doesn't cause lock expiry given the 60 s TTL, but a prolonged slow abort-check could delay a heartbeat beyond the 20 s window. This is unlikely in practice given the abort check is a lightweight Redis GET, but worth noting in a comment alongside CHAT_STREAM_LOCK_HEARTBEAT_INTERVAL_MS.

Reviews (1): Last reviewed commit: "chore(bun): bump bun to 1.3.13 (#4291)"

Comment on lines +71 to +82
```typescript
 *
 * `lock_timeout` is the critical one: without it, a waiter inherits the full
 * `statement_timeout` clock, so one stuck writer can drain the pool.
 *
 * Safe under pgBouncer transaction pooling — `SET LOCAL` is transaction-scoped
 * and cleared at COMMIT/ROLLBACK before the session returns to the pool.
 */
async function setTableTxTimeouts(
  trx: DbTransaction,
  opts?: { statementMs?: number; lockMs?: number; idleMs?: number }
) {
  const s = opts?.statementMs ?? 10_000
```

P2 sql.raw() with string interpolation for SET LOCAL

sql.raw() with template-string number interpolation works here because s, l, and i are always application-computed numbers, but it bypasses Drizzle's parameterization entirely and could silently accept non-numeric values if the call sites ever evolve. PostgreSQL's SET LOCAL doesn't support $1 placeholders, so raw SQL is necessary — a brief inline comment explaining that would help future readers avoid the "why isn't this parameterized?" question.

Comment on lines 62 to +69

```typescript
const completedKeysRef = useRef(new Set<string>())
const prevKeyRef = useRef(key)
const stagingCountRef = useRef(initialBatch)
const [count, setCount] = useState(() => {
  if (items.length <= initialBatch) return items.length
  return initialBatch
})
const renderState =
  state.key === key && (state.count > 0 || items.length === 0 || state.caughtUp)
    ? state
    : createInitialState(key, items.length, initialBatch)

useEffect(() => {
  if (completedKeysRef.current.has(key)) {
    setCount(items.length)
    return
  }
  setState((prev) => {
```

P2 Inline createInitialState during render on key mismatch

When state.key !== key, renderState is computed inline (not from React state), so staged uses the correct initial slice on the very first render after a key change. This avoids a flash of stale content. However, the inline object is recreated on every render while the two clocks are mismatched (before the useEffect fires). This is intentional and cheap, but it means staged is a new array reference on every render during that window — components that memo-compare staged will see unnecessary re-renders. A small comment explaining the intentional dual-clock pattern would help future readers.


…4294)

The visual filter/sort builders read the selected tableId from subBlock id
'tableId', but the Table block stores it under 'tableSelector' (basic) or
'manualTableId' (advanced) via canonicalParamId. The lookup always returned
null, so useTable was disabled and the column picker always showed
"no options available".

Adds useCanonicalSubBlockValue that resolves by canonicalParamId through
the canonical index, mirroring the pattern used by dropdown dependsOn.
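The resolution pattern might look roughly like this — a hypothetical reduction, since useCanonicalSubBlockValue is a React hook over the canonical index and these names and types are assumptions:

```typescript
type SubBlock = { canonicalParamId?: string; value: unknown }

// Resolve a subBlock value by its canonical param id rather than a
// hard-coded subBlock id, so 'tableId' finds the value whether the block
// stored it under 'tableSelector' (basic) or 'manualTableId' (advanced).
function resolveCanonicalValue(
  subBlocks: Record<string, SubBlock>,
  canonicalId: string
): unknown {
  for (const sb of Object.values(subBlocks)) {
    if (sb.canonicalParamId === canonicalId && sb.value != null) return sb.value
  }
  return null
}
```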
@TheodoreSpeaks TheodoreSpeaks merged commit 595c4c3 into main Apr 24, 2026
30 checks passed
