Fix block LWW tombstone parity handling on local metadata upserts#46
Open
Fix block LWW tombstone parity handling on local metadata upserts#46
Conversation
Make SQL_CLOUDSYNC_UPSERT_RAW_COLVERSION preserve the caller's requested live/tombstone parity on conflict instead of blindly incrementing col_version. This fixes the block-LWW case where re-marking a live block could flip it to an even tombstone version and disappear from materialization, and keeps repeated tombstone writes even instead of resurrecting to an odd version. Qualify the PostgreSQL ON CONFLICT target-side col_version references to avoid ambiguous-column errors, and update the shared formatter call so both backends receive the right table-name substitutions. Add unit regressions for site-id resolution, double tombstone writes, and the block-LWW upsert tombstone scenario.
The debug image builds postgres from source, which uses the upstream postgresql.conf.sample where listen_addresses defaults to localhost. The Debian-packaged postgres in the release image patches the sample to '*', so this only bites the debug compose target — host psql can't reach the container after a fresh volume init. Pass `-c listen_addresses=*` via the compose command override so the fix is applied on every fresh initdb without rebuilding the image.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes a parity bug in SQL_CLOUDSYNC_UPSERT_RAW_COLVERSION that affected block-level LWW metadata updates on the local trigger path.
Previously, the ON CONFLICT branch always did:
col_version = col_version + 1That was unsafe for block LWW because block metadata uses col_version parity to encode liveness:
Because of that, re-marking an already-live block as live could flip it from odd to even and silently tombstone it. Repeated tombstone writes could also flip an even tombstone back to odd and
effectively resurrect the block metadata.
This PR changes the raw metadata upsert so that, on conflict, it preserves the caller’s requested parity:
That keeps col_version monotonic while preserving the intended live/tombstone state.
How The Bug Could Happen In Practice
This bug was not just theoretical. A realistic case is a block-LWW text column where one row is split into multiple block metadata entries, for example a note, document, or message body.
A normal local write can revisit a block that is already live and issue another metadata upsert for that same block. Before this fix, the raw upsert conflict path always incremented by +1, so:
That turns a live block into a tombstone even though the caller was still saying “this block is alive”.
In a real app, that could show up as:
The failure mode is especially subtle because the base row still exists. What breaks is the internal block metadata, so the user-visible symptom is partial text loss or unexpected block resurrection during later materialization or sync.
What Changed
Effect On Regular Non-Tombstone Rows
This fix slightly changes the numeric shape of col_version for regular local metadata rows that use this raw upsert path but do not rely on tombstone parity.
Before this fix, repeated conflicting local upserts would typically evolve like:
After this fix, when the caller keeps requesting the same parity, they can evolve like:
So for some regular non-tombstone rows, the winner may now jump by 2 instead of always by 1.
Why This Does Not Break CRDT Correctness
This does not change the ordering semantics that the CRDT relies on.
What the algorithm needs for correctness is:
That still holds after this fix:
So this PR changes the numeric progression, but not the correctness property the merge algorithm depends on. The winner is still greater than the loser; it just may now be greater by 2 instead of 1.
For parity-sensitive metadata, that broader behavior is necessary to preserve liveness intent. For non-parity-sensitive metadata, it is a harmless change in version spacing, not a change in
conflict outcome.
Why This Matters
This fixes two bad block-LWW behaviors on the local metadata path:
It also fixes the PostgreSQL ambiguity introduced by the parity-aware ON CONFLICT update.
Tests
Validated with:
Relevant regressions now pass:
PostgreSQL smoke/full tests also run cleanly again after qualifying the ON CONFLICT target-side col_version references.
Why The Reproducer Uses A Direct Metadata Upsert
The regression test for the original bug uses the internal metadata helper on an existing block row instead of trying to reproduce the issue only through plain text edits.
That is intentional.
In the current block diff implementation, ordinary text edits usually do not re-upsert the same live block metadata row:
So with the current algorithm, normal text editing does not reliably hit the exact failure mode of “upsert the same already-live block row again”.
The old bug happened in the lower-level raw metadata UPSERT path itself: if the same block metadata row was upserted again with a live intent, the conflict update always did col_version + 1, which
could flip an odd live block version to an even tombstone version.
The direct helper-based regression was added because it isolates and proves that exact engine-level bug without depending on a higher-level diff pattern that the current implementation mostly
avoids.
This also matters for future evolution of block diffing. The code comments already mention a possible
BLOCK_DIFF_MODIFIEDcase. If that is implemented later as “same block identity, new content”,then ordinary user edits could naturally re-upsert the same live block row and would have hit the old bug very easily. So this fix protects both the current low-level path and a plausible future
block-diff implementation.