Skip to content

Fix block LWW tombstone parity handling on local metadata upserts#46

Open
andinux wants to merge 3 commits intomainfrom
fix/block-lww-upsert-tombstone
Open

Fix block LWW tombstone parity handling on local metadata upserts#46
andinux wants to merge 3 commits intomainfrom
fix/block-lww-upsert-tombstone

Conversation

@andinux
Copy link
Copy Markdown
Collaborator

@andinux andinux commented Apr 23, 2026

Summary

This PR fixes a parity bug in SQL_CLOUDSYNC_UPSERT_RAW_COLVERSION that affected block-level LWW metadata updates on the local trigger path.

Previously, the ON CONFLICT branch always did:

col_version = col_version + 1

That was unsafe for block LWW because block metadata uses col_version parity to encode liveness:

  • odd col_version = live block
  • even col_version = tombstone

Because of that, re-marking an already-live block as live could flip it from odd to even and silently tombstone it. Repeated tombstone writes could also flip an even tombstone back to odd and
effectively resurrect the block metadata.

This PR changes the raw metadata upsert so that, on conflict, it preserves the caller’s requested parity:

  • if existing parity already matches the requested parity, bump by +2
  • otherwise bump by +1

That keeps col_version monotonic while preserving the intended live/tombstone state.

How The Bug Could Happen In Practice

This bug was not just theoretical. A realistic case is a block-LWW text column where one row is split into multiple block metadata entries, for example a note, document, or message body.

A normal local write can revisit a block that is already live and issue another metadata upsert for that same block. Before this fix, the raw upsert conflict path always incremented by +1, so:

  • existing live block 1
  • incoming live intent 1
  • result 2

That turns a live block into a tombstone even though the caller was still saying “this block is alive”.

In a real app, that could show up as:

  • a line of text disappearing after a local edit
  • a block vanishing after a no-op rewrite or a second pass over the same row
  • a repeated delete of the same block causing it to come back later because tombstone parity flipped back to odd

The failure mode is especially subtle because the base row still exists. What breaks is the internal block metadata, so the user-visible symptom is partial text loss or unexpected block resurrection during later materialization or sync.

What Changed

  • Updated SQLite SQL_CLOUDSYNC_UPSERT_RAW_COLVERSION to preserve parity on conflict
  • Updated PostgreSQL SQL_CLOUDSYNC_UPSERT_RAW_COLVERSION with the same parity logic
  • Qualified PostgreSQL target-side col_version references in ON CONFLICT DO UPDATE to avoid ambiguous-column errors
  • Updated the shared SQL formatting call to provide the PostgreSQL template with the required table-name substitutions
  • Added unit regressions for:
    • block LWW live-upsert parity preservation
    • double tombstone writes staying tombstoned
    • repeated row-level delete/resurrect cycles with out-of-order delivery
    • site ID resolution through cloudsync_changes

Effect On Regular Non-Tombstone Rows

This fix slightly changes the numeric shape of col_version for regular local metadata rows that use this raw upsert path but do not rely on tombstone parity.

Before this fix, repeated conflicting local upserts would typically evolve like:

  • 1 -> 2 -> 3 -> 4

After this fix, when the caller keeps requesting the same parity, they can evolve like:

  • 1 -> 3 -> 5 -> 7

So for some regular non-tombstone rows, the winner may now jump by 2 instead of always by 1.

Why This Does Not Break CRDT Correctness

This does not change the ordering semantics that the CRDT relies on.

What the algorithm needs for correctness is:

  • newer winning changes must get a strictly larger col_version
  • comparisons between changes must still work by ordering (>, <, =)

That still holds after this fix:

  • versions remain strictly monotonic
  • a newer local winner still gets a larger col_version
  • merge logic still compares versions by relative order, not by assuming an exact +1 step

So this PR changes the numeric progression, but not the correctness property the merge algorithm depends on. The winner is still greater than the loser; it just may now be greater by 2 instead of 1.

For parity-sensitive metadata, that broader behavior is necessary to preserve liveness intent. For non-parity-sensitive metadata, it is a harmless change in version spacing, not a change in
conflict outcome.

Why This Matters

This fixes two bad block-LWW behaviors on the local metadata path:

  • a live block could be accidentally tombstoned after a conflicting local upsert
  • writing the same tombstone twice could resurrect metadata parity

It also fixes the PostgreSQL ambiguity introduced by the parity-aware ON CONFLICT update.

Tests

Validated with:

make dist/unit
HOME=/tmp ./dist/unit

Relevant regressions now pass:

  • Block LWW UPSERT Tombstone Bug
  • Block LWW Double Tombstone
  • Delete/Resurrect Multi-Cycle
  • Site ID Resolution

PostgreSQL smoke/full tests also run cleanly again after qualifying the ON CONFLICT target-side col_version references.

Why The Reproducer Uses A Direct Metadata Upsert

The regression test for the original bug uses the internal metadata helper on an existing block row instead of trying to reproduce the issue only through plain text edits.

That is intentional.

In the current block diff implementation, ordinary text edits usually do not re-upsert the same live block metadata row:

  • unchanged blocks are matched by content and produce no diff entry
  • newly created blocks typically get a new position_id, so they become a different block col_name
  • removed blocks are emitted as removals on the old block identity

So with the current algorithm, normal text editing does not reliably hit the exact failure mode of “upsert the same already-live block row again”.

The old bug happened in the lower-level raw metadata UPSERT path itself: if the same block metadata row was upserted again with a live intent, the conflict update always did col_version + 1, which
could flip an odd live block version to an even tombstone version.

The direct helper-based regression was added because it isolates and proves that exact engine-level bug without depending on a higher-level diff pattern that the current implementation mostly
avoids.

This also matters for future evolution of block diffing. The code comments already mention a possible BLOCK_DIFF_MODIFIED case. If that is implemented later as “same block identity, new content”,
then ordinary user edits could naturally re-upsert the same live block row and would have hit the old bug very easily. So this fix protects both the current low-level path and a plausible future
block-diff implementation.

andinux added 3 commits April 23, 2026 13:32
Make SQL_CLOUDSYNC_UPSERT_RAW_COLVERSION preserve the caller's requested live/tombstone parity on conflict instead of blindly incrementing col_version. This fixes the block-LWW case where re-marking a live block could flip it to an even tombstone version and disappear from materialization, and keeps repeated tombstone writes even instead of resurrecting to an odd version.

Qualify the PostgreSQL ON CONFLICT target-side col_version references to avoid ambiguous-column errors, and update the shared formatter call so both backends receive the right table-name substitutions.

Add unit regressions for site-id resolution, double tombstone writes, and the block-LWW upsert tombstone scenario.
The debug image builds postgres from source, which uses the upstream
postgresql.conf.sample where listen_addresses defaults to localhost.
The Debian-packaged postgres in the release image patches the sample
to '*', so this only bites the debug compose target — host psql can't
reach the container after a fresh volume init.

Pass `-c listen_addresses=*` via the compose command override so the
fix is applied on every fresh initdb without rebuilding the image.
@andinux andinux requested a review from marcobambini April 23, 2026 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant