refactor(storage): replace batch (queue,state) index with active_batch membership#233
Open
albertywu wants to merge 1 commit into
Open
refactor(storage): replace batch (queue,state) index with active_batch membership#233albertywu wants to merge 1 commit into
albertywu wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors how “active batches” are queried in the storage layer by removing reliance on a (queue, state) secondary index and introducing an explicit active_batch(queue, batch_id) membership table, enabling queue-scoped listing via primary-key-prefix reads.
Changes:
- Replace
BatchStore.GetByQueueAndStates(queue, states)withBatchStore.ListActive(queue)and move state filtering into callers. - Add
active_batchmembership table; remove theidx_queue_stateindex from the MySQL schema; update MySQLBatchStoreto maintain/self-heal membership. - Extend integration/contract tests to cover
ListActivesemantics and MySQL-specific self-healing behavior.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/integration/submitqueue/extension/storage/suite.go | Adds contract tests for BatchStore.ListActive and exposes storage via GetStorage() for backend-specific assertions. |
| test/integration/submitqueue/extension/storage/mysql/storage_test.go | Adds MySQL-specific tests validating self-healing behavior of active_batch rows. |
| test/integration/submitqueue/extension/storage/mysql/BUILD.bazel | Adds entity dependency required by new MySQL integration tests. |
| submitqueue/orchestrator/controller/cancel/cancel.go | Switches cancel controller from GetByQueueAndStates to ListActive. |
| submitqueue/orchestrator/controller/cancel/cancel_test.go | Updates mocks/expectations for ListActive. |
| submitqueue/orchestrator/controller/batch/batch.go | Uses ListActive and filters desired states in-memory for conflict analysis. |
| submitqueue/orchestrator/controller/batch/batch_test.go | Updates mocks/expectations and adds coverage for in-memory filtering behavior. |
| submitqueue/extension/storage/mysql/schema/README.md | Documents the new membership-table approach and maintenance model. |
| submitqueue/extension/storage/mysql/schema/batch.sql | Removes secondary index from batch table schema. |
| submitqueue/extension/storage/mysql/schema/active_batch.sql | Adds new active_batch membership table schema. |
| submitqueue/extension/storage/mysql/batch_store.go | Implements membership-first Create and ListActive resolution + best-effort terminal cleanup. |
| submitqueue/extension/storage/mock/batch_store_mock.go | Updates mock interface to ListActive. |
| submitqueue/extension/storage/batch_store.go | Updates storage interface to ListActive and documents new contract. |
Files not reviewed (1)
- submitqueue/extension/storage/mock/batch_store_mock.go: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ebba199 to
7c00f7b
Compare
…h membership Replace the secondary index over the batch table's mutable `state` column with an `active_batch` membership table that answers the only queue-scoped query the pipeline needs: "which batches in this queue are still active?" (the batch controller uses it to find conflict dependencies; the cancel controller uses it to find the batch holding a request). A row is intended to exist while its batch is non-terminal, so the table stays bounded by the live speculation window rather than growing with batch history. `queue` leads the PK so listing is a PK-prefix scan and the table is shardable by queue — an access pattern that ports cleanly to a key-value store (queue = partition key, batch_id = sort key), unlike a server-maintained secondary index over a mutable non-key column. Membership is best-effort, not an exact mirror of batch state, and is maintained without transactions: - Create writes the membership row before the batch row. This ordering is required for correctness: whenever a batch row is visible to a reader its membership row is already present, so a concurrent ListActive can never miss an active batch. INSERT IGNORE keeps the membership write idempotent across retries. - If the batch insert then fails, Create deliberately leaves the membership row in place. A returned error does not prove the row was not written (an ambiguous failure can commit the batch row and still return an error), so deleting would risk permanently orphaning a live, non-terminal batch from ListActive. A dangling membership is the safe direction. - ListActive resolves each member by primary key: a terminal batch's membership is best-effort removed (race-free — a terminal batch is fully committed and its id is never reused); a missing batch is skipped but NOT removed (it may belong to an in-flight Create that has written its membership but not yet its batch row). Cleanup failures are swallowed so a read never fails on index maintenance, and terminal-state writers (merge, speculate, dlq) need not touch the index. Genuinely dangling rows (failed/crashed creates) and batches stuck in a non-terminal state are left for a future reconcile/prune job, documented in schema/README.md. Integration tests cover the self-heal and membership invariants: - TestActiveBatch_SelfHealsTerminalMembership - TestActiveBatch_SkipsDanglingMembershipWithoutDeleting - TestActiveBatch_CreateKeepsMembershipOnDuplicate - TestActiveBatch_CreateKeepsMembershipOnFailedInsert Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
7c00f7b to
49119c4
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The batch table carried a (queue, state) secondary index solely to answer
"which batches in this queue are still active?" — a server-maintained index
over a mutable, non-key column that no plausible key-value/document backend
offers cheaply. Replace it with an explicit active_batch membership table so
the only queue-scoped query becomes a primary-key-prefix read.
store returns every non-terminal batch; callers filter states in memory
(batch -> {created,speculating,merging}; cancel -> all active). This pushes
filtering to the caller and keeps the contract to get/put-by-key semantics.
leading the PK (shardable by queue; maps to partition+sort key on a KV store).
leaves a recoverable dangling entry rather than an invisible active batch)
and self-healed on read: ListActive resolves each member by PK and best-effort
deletes rows whose batch is missing or terminal. Cleanup failures are swallowed
so a read never fails on index maintenance; terminal writers need not touch
the index. A future prune job sweeps orphans that never reach a terminal state.
terminal via both update paths, queue-scoped, unknown queue) plus MySQL-specific
self-heal tests for terminal and dangling membership.
Co-Authored-By: Claude Opus 4.8 (1M context) noreply@anthropic.com
Test Plan
✅
make fmt && make build && make test && make e2e-testIssues