## Background
The sync process generally involves two steps:

1. Downloading new operations incrementally into local storage (`ps_oplog`).
2. Applying the downloaded operations to the local data tables (the `sync_local` step).
The big motivation for the split is consistency: The user sees no changes directly during step 1, instead getting an atomic update to the next (or first) checkpoint in the sync_local step.
The user experience in step 1 is generally good: data is downloaded incrementally, and we report progress on it. But when there is a lot of data, the `sync_local` step can take a long time to complete (minutes in some cases), blocks all writes, and gives no feedback on progress.
I'd like to address this in two ways:

1. Make the `sync_local` step faster.
2. Make the `sync_local` step incremental, so it can report progress and optionally avoid blocking writes for its full duration.
## Current implementation
Currently, we have three indexes on `ps_oplog`. We use the indexes in these ways:

- `ps_oplog_key`: de-duplicates operations on the same source row in the same bucket, to avoid storing the entire history.
- `ps_oplog_row`: groups operations by target row, allowing us to de-duplicate multiple copies of the same target row if it is synced via multiple buckets or via multiple source rows within the same bucket (the latter is an edge case where the synced id is not unique).
- `ps_oplog_opid`: used to find unapplied changes from `ps_oplog` during the `sync_local` step, allowing us to efficiently perform incremental updates.

Then we also have this table:
This is used to:
We combine those to compute all updated rows for sync_local:
And then query the latest data for each row (in the same query, using the above CTE):
This is slightly more complex with partial checkpoints, which I'm not covering in detail here.
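As a rough, runnable sketch of this two-part computation (the real schema and queries differ; the column set, the `last_applied_op` bookkeeping, and the final data lookup are simplified stand-ins):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Simplified stand-ins for the real tables (columns reduced for illustration).
CREATE TABLE ps_buckets (name TEXT PRIMARY KEY, last_applied_op INTEGER);
CREATE TABLE ps_oplog (bucket TEXT, op_id INTEGER, row_type TEXT, row_id TEXT, data TEXT);
CREATE INDEX ps_oplog_opid ON ps_oplog (bucket, op_id);
CREATE INDEX ps_oplog_row ON ps_oplog (row_type, row_id);
CREATE TABLE ps_updated_rows (row_type TEXT, row_id TEXT, PRIMARY KEY (row_type, row_id));
""")

# One bucket with two synced operations; both arrived after the last sync_local.
con.execute("INSERT INTO ps_buckets VALUES ('b1', 1)")
con.executemany("INSERT INTO ps_oplog VALUES (?, ?, ?, ?, ?)", [
    ("b1", 1, "todos", "t1", '{"description": "old"}'),
    ("b1", 2, "todos", "t1", '{"description": "new"}'),
])

# Shape of the current sync_local query: compute updated rows from unapplied
# oplog entries (via the ps_oplog_opid index) unioned with ps_updated_rows,
# then pick the latest data per row, grouping on ps_oplog.
rows = con.execute("""
WITH updated_rows AS (
  SELECT DISTINCT o.row_type, o.row_id
    FROM ps_buckets AS buckets
    JOIN ps_oplog AS o
      ON o.bucket = buckets.name AND o.op_id > buckets.last_applied_op
  UNION
  SELECT row_type, row_id FROM ps_updated_rows
)
SELECT u.row_type, u.row_id,
       (SELECT data FROM ps_oplog o
         WHERE o.row_type = u.row_type AND o.row_id = u.row_id
         ORDER BY o.op_id DESC LIMIT 1) AS data
  FROM updated_rows u
""").fetchall()
print(rows)  # -> [('todos', 't1', '{"description": "new"}')]
```

Note how the first branch of the CTE has to scan `ps_oplog_opid` and then look up each hit in `ps_oplog` — the extra work that part 1 below removes.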
## Proposal

### Part 1
Remove the `ps_oplog_opid` index. Instead, when we sync a new operation, also insert it into `ps_updated_rows`. This simplifies the query for `updated_rows` to a simple `SELECT row_type, row_id FROM ps_updated_rows`.

The direct advantage is that this should be slightly faster to query: we can iterate through the rows from `ps_updated_rows` directly, whereas SQLite currently needs to (1) iterate through the `ps_oplog_opid` index, then (2) look up those rows in `ps_oplog` to get the `row_type` and `row_id`.

It is not strictly required to remove the `ps_oplog_opid` index here, but doing so offsets the write overhead from writing more rows to `ps_updated_rows`. Removing it does have implications for partial checkpoints - see below.

We'd also need to change the implementation of `powersync_trigger_resync()` - the current implementation relies on just setting `last_applied_op = 0` for all buckets.
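A minimal sketch of this change, again using a simplified stand-in schema (illustrative only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Illustrative, simplified schema; note there is no ps_oplog_opid index.
CREATE TABLE ps_oplog (bucket TEXT, op_id INTEGER, row_type TEXT, row_id TEXT, data TEXT);
CREATE TABLE ps_updated_rows (row_type TEXT, row_id TEXT, PRIMARY KEY (row_type, row_id));
""")

def insert_operation(bucket, op_id, row_type, row_id, data):
    """On sync, write the operation and record the target row as updated."""
    con.execute("INSERT INTO ps_oplog VALUES (?, ?, ?, ?, ?)",
                (bucket, op_id, row_type, row_id, data))
    con.execute("INSERT OR IGNORE INTO ps_updated_rows VALUES (?, ?)",
                (row_type, row_id))

insert_operation("b1", 1, "todos", "t1", '{"description": "old"}')
insert_operation("b1", 2, "todos", "t1", '{"description": "new"}')

# updated_rows is now a plain table scan instead of an index scan plus lookups:
updated = con.execute("SELECT row_type, row_id FROM ps_updated_rows").fetchall()
print(updated)  # -> [('todos', 't1')]
```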
#### Partial checkpoints

Partial checkpoints aren't covered directly by the above. These are more tricky, since they need to separately keep track of the priorities of changes, to filter incremental updates by specific priorities. I believe we can't even store the priority on `ps_updated_rows` directly, since the priority for an entire bucket can change at any point.

One option is to just keep the `ps_oplog_opid` index and use the current approach for partial checkpoints, but that's not ideal.

It could work to instead store the relevant bucket(s) on `ps_updated_rows`, which may require a re-design of that table. For example:

Initially, I had the primary key on `(row_type, row_id, bucket)`, with the idea that we can efficiently group on `(row_type, row_id)` and just filter on `bucket`, accepting some overhead for partial checkpoints to do that filter. But from actual tests, it appears unnecessary to do that grouping here: we can read all rows from `ps_updated_rows`, then do the grouping on `ps_oplog`, where we need to do it anyway.

### Part 2 (probably not)
Update: This appears unnecessary given the gains from part 1 above.
A previous idea I had: for bulk sync, we can make sync faster by not computing `updated_rows` at all, but instead re-syncing the entire `ps_oplog` table, grouped by `(row_type, row_id)` using the `ps_oplog_row` index.

The tricky part is knowing when this is faster than the incremental version. We could keep track of the total `ps_oplog` count versus the number of rows in `ps_updated_rows`, and switch over when, for example, `count(ps_oplog) * 0.5 > count(ps_updated_rows)`. Note that counting the tables directly can be expensive by itself, so we'd need to persist separate counters for this.

TODO: Test whether this is actually faster once we have implemented part 1, since that already removes the extra scan through `ps_oplog_opid` and the lookups in `ps_oplog`. This may also prevent optimizations from part 3.

### Part 3: Incremental/chunked sync_local
Once we only use `ps_updated_rows` to keep track of rows that need to be copied, we can use it to do the process in separate chunks:

1. Copy the latest data for a chunk of rows from `ps_updated_rows`.
2. Delete those processed rows from `ps_updated_rows`.

We can still keep the atomic nature of `sync_local` by wrapping all of this in a single transaction. But the client can control that, which means the client can also report progress (we may need to track progress counters if we want to report an actual percentage).
In theory a client could even opt to not wrap that in a single transaction, to avoid the blocking behavior, at the cost of losing consistency properties - see details below.
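A rough sketch of the chunked flow, assuming a single `todos` data table and committing per chunk rather than in one outer transaction (both of these would be client policy, not fixed behavior):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.isolation_level = None  # autocommit; we manage transactions explicitly
con.executescript("""
CREATE TABLE ps_oplog (bucket TEXT, op_id INTEGER, row_type TEXT, row_id TEXT, data TEXT);
CREATE TABLE ps_updated_rows (row_type TEXT, row_id TEXT, PRIMARY KEY (row_type, row_id));
CREATE TABLE todos (id TEXT PRIMARY KEY, data TEXT);
""")
for i in range(10):
    con.execute("INSERT INTO ps_oplog VALUES ('b1', ?, 'todos', ?, ?)",
                (i + 1, f"t{i}", f'{{"n": {i}}}'))
    con.execute("INSERT OR IGNORE INTO ps_updated_rows VALUES ('todos', ?)", (f"t{i}",))

CHUNK_SIZE = 4
total = con.execute("SELECT count(*) FROM ps_updated_rows").fetchone()[0]
done = 0
while True:
    con.execute("BEGIN")  # one transaction per chunk: writes are only blocked briefly
    chunk = con.execute(
        "SELECT row_type, row_id FROM ps_updated_rows LIMIT ?", (CHUNK_SIZE,)
    ).fetchall()
    if not chunk:
        con.execute("COMMIT")
        break
    for row_type, row_id in chunk:
        # Copy the latest synced data for this row into the data table.
        latest = con.execute(
            "SELECT data FROM ps_oplog WHERE row_type = ? AND row_id = ? "
            "ORDER BY op_id DESC LIMIT 1", (row_type, row_id)).fetchone()
        con.execute("INSERT OR REPLACE INTO todos VALUES (?, ?)",
                    (row_id, latest[0] if latest else None))
        con.execute("DELETE FROM ps_updated_rows WHERE row_type = ? AND row_id = ?",
                    (row_type, row_id))
    done += len(chunk)
    con.execute("COMMIT")
    print(f"sync_local progress: {done}/{total}")

remaining = con.execute("SELECT count(*) FROM ps_updated_rows").fetchone()[0]
synced = con.execute("SELECT count(*) FROM todos").fetchone()[0]
```

Wrapping the whole loop in one outer transaction restores the current atomic behavior; committing per chunk trades consistency for responsiveness, as discussed below.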
## Optional consistency
For standard full checkpoints, our consistency behavior is always well-defined: all data tables atomically switch to a checkpoint, and only once all local changes have been uploaded and acknowledged via a write checkpoint.
When using bucket priorities, we relax some of those properties, to get more responsiveness. Currently:
The changes above can make some of this behavior configurable:
### Deletes in partial checkpoints
If we're tracking specific buckets in `ps_updated_rows`, we can optionally sync deletes in partial checkpoints:

Update after discussion with @simolus3: Arguably, the case of moving data between different-priority buckets is much more of an edge case than needing consistency within a bucket / sync stream, so we should consider whether deletes in partial checkpoints should become the new default. It could be considered a breaking change, though.
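A sketch of what that could look like, assuming the redesigned `ps_updated_rows` with a `bucket` column and a hypothetical `priority` column on `ps_buckets` (all names illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Illustrative schema: buckets carry a priority, and updated rows track
-- which bucket they were changed in.
CREATE TABLE ps_buckets (name TEXT PRIMARY KEY, priority INTEGER);
CREATE TABLE ps_oplog (bucket TEXT, op_id INTEGER, row_type TEXT, row_id TEXT, data TEXT);
CREATE TABLE ps_updated_rows (row_type TEXT, row_id TEXT, bucket TEXT,
                              PRIMARY KEY (row_type, row_id, bucket));
""")
con.executemany("INSERT INTO ps_buckets VALUES (?, ?)", [("prio0", 0), ("prio3", 3)])
# t1 was removed from the prio0 bucket (no oplog data left): a pending delete.
# t2 still has data, but only in the prio3 bucket.
con.executemany("INSERT INTO ps_updated_rows VALUES (?, ?, ?)",
                [("todos", "t1", "prio0"), ("todos", "t2", "prio3")])
con.execute("INSERT INTO ps_oplog VALUES ('prio3', 1, 'todos', 't2', '{}')")

def partial_updated_rows(priority):
    # Rows changed in buckets at or above this priority (lower number = higher).
    return con.execute("""
        SELECT DISTINCT u.row_type, u.row_id FROM ps_updated_rows u
          JOIN ps_buckets b ON b.name = u.bucket
         WHERE b.priority <= ?""", (priority,)).fetchall()

# A priority-0 partial checkpoint now sees t1, whose data is gone from
# ps_oplog -- i.e. a delete we could apply without waiting for the full
# checkpoint.
print(partial_updated_rows(0))  # -> [('todos', 't1')]
```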
### Avoid overwriting local changes
With priority 0, we can avoid overwriting local changes: skip updating rows as long as there is a local entry in `ps_updated_rows` with `bucket = 0` (a local write).

That would mean that if a row was updated locally, any synced updates to it would be blocked until the changes are uploaded and the write checkpoint is synced back. Effectively, it disables the priority 0 behavior for those specific rows.
It would still be in the realm of "eventual consistency" properties for priority 0, but it would avoid the "flicker" currently seen.
Implementing this change is likely to improve apparent consistency in all affected cases, though perhaps there is an edge case where the current behavior is desirable?
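A sketch of that filter, again assuming the redesigned `ps_updated_rows` with a `bucket` column and `bucket = 0` as the marker for local writes (illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Redesigned table from part 1, with the bucket tracked per updated row.
-- bucket = 0 is used here as the marker for a local (not yet uploaded) write.
CREATE TABLE ps_updated_rows (row_type TEXT, row_id TEXT, bucket INTEGER,
                              PRIMARY KEY (row_type, row_id, bucket));
""")
con.executemany("INSERT INTO ps_updated_rows VALUES (?, ?, ?)", [
    ("todos", "t1", 1),  # synced change from bucket 1
    ("todos", "t2", 1),  # synced change from bucket 1
    ("todos", "t2", 0),  # t2 also has a pending local write
])

# Only apply synced updates for rows without a pending local write:
applyable = con.execute("""
SELECT DISTINCT row_type, row_id FROM ps_updated_rows AS u
 WHERE bucket != 0
   AND NOT EXISTS (SELECT 1 FROM ps_updated_rows AS l
                    WHERE l.row_type = u.row_type AND l.row_id = u.row_id
                      AND l.bucket = 0)
""").fetchall()
print(applyable)  # -> [('todos', 't1')]
```

Here `t2` stays untouched until its local write is uploaded and the write checkpoint comes back, which is exactly the "no flicker" behavior described above.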
### Incremental sync_local
Applying the entire `sync_local` step in one transaction is important if we want to maintain the current consistency properties. But we could add an option to relax those properties, avoiding blocking all local writes for a long time and getting better responsiveness:

For each individual row, data would be atomically updated from one checkpoint to the next. But the overall local data would not be consistent: different rows would be updated at different times.
We could make this configurable by priority level:
I'm not sure whether that will actually help though: Full checkpoints would still be blocking in that case, so you're only removing that blocking behavior for a small number of cases.
## Comparing with JourneyApps Platform
The PowerSync sync system was designed as an evolution of the JourneyApps Platform sync system, but with much stronger consistency properties.
The JourneyApps Platform sync system effectively has "eventual consistency" only. More specifically:
It is effectively similar to PowerSync with all buckets as priority 0, coupled with "Deletes in partial checkpoints", "Avoid overwriting local changes" and "Incremental sync_local" as described above.
Despite the big reduction in consistency properties, we still have apps syncing hundreds of thousands of rows per client, with no reports of issues caused by sync inconsistencies in practice.
All of that is to say: there could be a valid case for relaxing the consistency properties as an opt-in option.