Skip to content

refactor(storage): replace batch (queue,state) index with active_batch membership#233

Open
albertywu wants to merge 1 commit into
mainfrom
wua/batch-active-membership
Open

refactor(storage): replace batch (queue,state) index with active_batch membership#233
albertywu wants to merge 1 commit into
mainfrom
wua/batch-active-membership

Conversation

@albertywu

@albertywu albertywu commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

The batch table carried a (queue, state) secondary index solely to answer
"which batches in this queue are still active?" — a server-maintained index
over a mutable, non-key column that no plausible key-value/document backend
offers cheaply. Replace it with an explicit active_batch membership table so
the only queue-scoped query becomes a primary-key-prefix read.

  • BatchStore: GetByQueueAndStates(queue, states) -> ListActive(queue). The
    store returns every non-terminal batch; callers filter states in memory
    (batch -> {created,speculating,merging}; cancel -> all active). This pushes
    filtering to the caller and keeps the contract to get/put-by-key semantics.
  • schema: drop idx_queue_state; add active_batch(queue, batch_id) with queue
    leading the PK (shardable by queue; maps to partition+sort key on a KV store).
  • membership is written on Create (membership-first, so a failed batch insert
    leaves a recoverable dangling entry rather than an invisible active batch)
    and self-healed on read: ListActive resolves each member by PK and best-effort
    deletes rows whose batch is missing or terminal. Cleanup failures are swallowed
    so a read never fails on index maintenance; terminal writers need not touch
    the index. A future prune job sweeps orphans that never reach a terminal state.
  • tests: contract-suite coverage for active listing (returns active, excludes
    terminal via both update paths, queue-scoped, unknown queue) plus MySQL-specific
    self-heal tests for terminal and dangling membership.

Co-Authored-By: Claude Opus 4.8 (1M context) noreply@anthropic.com

Test Plan

make fmt && make build && make test && make e2e-test

Issues

Copilot AI review requested due to automatic review settings June 10, 2026 03:18

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors how “active batches” are queried in the storage layer by removing reliance on a (queue, state) secondary index and introducing an explicit active_batch(queue, batch_id) membership table, enabling queue-scoped listing via primary-key-prefix reads.

Changes:

  • Replace BatchStore.GetByQueueAndStates(queue, states) with BatchStore.ListActive(queue) and move state filtering into callers.
  • Add active_batch membership table; remove the idx_queue_state index from the MySQL schema; update MySQL BatchStore to maintain/self-heal membership.
  • Extend integration/contract tests to cover ListActive semantics and MySQL-specific self-healing behavior.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/integration/submitqueue/extension/storage/suite.go Adds contract tests for BatchStore.ListActive and exposes storage via GetStorage() for backend-specific assertions.
test/integration/submitqueue/extension/storage/mysql/storage_test.go Adds MySQL-specific tests validating self-healing behavior of active_batch rows.
test/integration/submitqueue/extension/storage/mysql/BUILD.bazel Adds entity dependency required by new MySQL integration tests.
submitqueue/orchestrator/controller/cancel/cancel.go Switches cancel controller from GetByQueueAndStates to ListActive.
submitqueue/orchestrator/controller/cancel/cancel_test.go Updates mocks/expectations for ListActive.
submitqueue/orchestrator/controller/batch/batch.go Uses ListActive and filters desired states in-memory for conflict analysis.
submitqueue/orchestrator/controller/batch/batch_test.go Updates mocks/expectations and adds coverage for in-memory filtering behavior.
submitqueue/extension/storage/mysql/schema/README.md Documents the new membership-table approach and maintenance model.
submitqueue/extension/storage/mysql/schema/batch.sql Removes secondary index from batch table schema.
submitqueue/extension/storage/mysql/schema/active_batch.sql Adds new active_batch membership table schema.
submitqueue/extension/storage/mysql/batch_store.go Implements membership-first Create and ListActive resolution + best-effort terminal cleanup.
submitqueue/extension/storage/mock/batch_store_mock.go Updates mock interface to ListActive.
submitqueue/extension/storage/batch_store.go Updates storage interface to ListActive and documents new contract.
Files not reviewed (1)
  • submitqueue/extension/storage/mock/batch_store_mock.go: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread submitqueue/extension/storage/mysql/schema/README.md Outdated
Comment thread submitqueue/extension/storage/mysql/schema/active_batch.sql Outdated
Comment thread submitqueue/extension/storage/mysql/schema/active_batch.sql Outdated
@albertywu albertywu force-pushed the wua/batch-active-membership branch 2 times, most recently from ebba199 to 7c00f7b Compare June 10, 2026 04:07
Copilot AI review requested due to automatic review settings June 10, 2026 04:07

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Files not reviewed (1)
  • submitqueue/extension/storage/mock/batch_store_mock.go: Language not supported

Comment thread submitqueue/extension/storage/mysql/batch_store.go
Comment thread submitqueue/extension/storage/mysql/schema/batch.sql Outdated
…h membership

Replace the secondary index over the batch table's mutable `state` column with
an `active_batch` membership table that answers the only queue-scoped query the
pipeline needs: "which batches in this queue are still active?" (the batch
controller uses it to find conflict dependencies; the cancel controller uses it
to find the batch holding a request). A row is intended to exist while its batch
is non-terminal, so the table stays bounded by the live speculation window
rather than growing with batch history. `queue` leads the PK so listing is a
PK-prefix scan and the table is shardable by queue — an access pattern that
ports cleanly to a key-value store (queue = partition key, batch_id = sort key),
unlike a server-maintained secondary index over a mutable non-key column.

Membership is best-effort, not an exact mirror of batch state, and is
maintained without transactions:

- Create writes the membership row before the batch row. This ordering is
  required for correctness: whenever a batch row is visible to a reader its
  membership row is already present, so a concurrent ListActive can never miss
  an active batch. INSERT IGNORE keeps the membership write idempotent across
  retries.
- If the batch insert then fails, Create deliberately leaves the membership row
  in place. A returned error does not prove the row was not written (an
  ambiguous failure can commit the batch row and still return an error), so
  deleting would risk permanently orphaning a live, non-terminal batch from
  ListActive. A dangling membership is the safe direction.
- ListActive resolves each member by primary key: a terminal batch's membership
  is best-effort removed (race-free — a terminal batch is fully committed and
  its id is never reused); a missing batch is skipped but NOT removed (it may
  belong to an in-flight Create that has written its membership but not yet its
  batch row). Cleanup failures are swallowed so a read never fails on index
  maintenance, and terminal-state writers (merge, speculate, dlq) need not touch
  the index.

Genuinely dangling rows (failed/crashed creates) and batches stuck in a
non-terminal state are left for a future reconcile/prune job, documented in
schema/README.md.

Integration tests cover the self-heal and membership invariants:
  - TestActiveBatch_SelfHealsTerminalMembership
  - TestActiveBatch_SkipsDanglingMembershipWithoutDeleting
  - TestActiveBatch_CreateKeepsMembershipOnDuplicate
  - TestActiveBatch_CreateKeepsMembershipOnFailedInsert

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@albertywu albertywu force-pushed the wua/batch-active-membership branch from 7c00f7b to 49119c4 Compare June 10, 2026 04:27
@albertywu albertywu marked this pull request as ready for review June 10, 2026 04:34
Copilot AI review requested due to automatic review settings June 10, 2026 04:34
@albertywu albertywu requested review from a team, behinddwalls and sbalabanov as code owners June 10, 2026 04:34

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • submitqueue/extension/storage/mock/batch_store_mock.go: Language not supported

Comment thread submitqueue/extension/storage/mysql/schema/README.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants