Skip to content

feat(ios): introduce SQLiteKVCache — a persistent KV blob cache#477

Merged
jkmassel merged 1 commit intotrunkfrom
jkmassel/replace-urlcache
Apr 30, 2026
Merged

feat(ios): introduce SQLiteKVCache — a persistent KV blob cache#477
jkmassel merged 1 commit intotrunkfrom
jkmassel/replace-urlcache

Conversation

@jkmassel
Copy link
Copy Markdown
Contributor

@jkmassel jkmassel commented Apr 27, 2026

Introduces SQLiteKVCache, a small self-contained SQLite-backed key-value blob cache. Standalone — not yet wired into any consumer. Migration of EditorURLCache (and follow-up of HTMLPreviewManager) onto this cache is a separate PR.

What it is

let cache = SQLiteKVCache(handle: "MyCache", diskCapacity: 100 * 1024 * 1024)
try cache.put(key: "foo", storageDate: .now, metadata: Data(), value: Data("hello"))
let entry = try cache.get(key: "foo")    // (storageDate, metadata, value)?
try cache.delete(key: "foo")
try cache.clear()

Each entry is (key, storageDate, metadata, value). Values and metadata are stored as raw bytes — the caller owns any serialization on top.

API

init(handle: StaticString, directory: URL = URL.cachesDirectory, diskCapacity: Int)
func get(key: String) throws -> Entry?
func put(key: String, storageDate: Date, metadata: Data, value: Data) throws
func delete(key: String) throws
func clear() throws

enum Error: Swift.Error, CustomStringConvertible, LocalizedError {
    case databaseUnavailable
    case readFailed(sqliteCode: Int32)
    case writeFailed(sqliteCode: Int32)
}

init is non-throwing. The SQLite connection is opened lazily on the first operation and the result (success or cached failure) is reused for the lifetime of the cache — callers using try? get a uniform "cache unavailable" fall-through without a separate error path at the construction site.

get, put, delete, and clear all throw. get returns nil only for a genuine miss (SQLITE_DONE); any other non-row step result or a prepare/bind failure throws Error.readFailed. Callers wanting "treat read failures as misses" semantics wrap get in try?. Writes throw on IO failure (disk full, blob oversize, corruption); the SQLite result code is preserved in the associated value for callers that want to pattern-match.

The Error enum renders through sqlite3_errstr, so print(error) and error.localizedDescription both produce sentences like "SQLite write failed: disk I/O error (code 10)" instead of "writeFailed(sqliteCode: 10)".

Notable behavior

  • Cache, not store. synchronous = NORMAL trades the last few committed writes on power loss; schema-version mismatch drops the table; eviction is destructive and oldest-first by storage_date. Cache in the type name is load-bearing — primary-store data does not belong here.
  • Single autocommit statements, no Swift-level lock. Every operation is one SQL statement: put is INSERT … ON CONFLICT DO UPDATE with the eviction sweep happening inside AFTER INSERT/AFTER UPDATE triggers; get/delete/clear are single SELECT/DELETE statements. SQLITE_OPEN_FULLMUTEX serializes the C-level API calls and SQLite's per-statement atomicity (including trigger bodies) handles the rest — there's no Swift-level lock because there's no multi-statement transaction to fence. Single-thread put then get always observes the put.
  • Oldest-first eviction via triggers. Two triggers (AFTER INSERT and AFTER UPDATE, because SQLite fires UPDATE triggers — not INSERT — on the upsert's ON CONFLICT DO UPDATE branch) with identical bodies: a window-function DELETE that removes oldest-by-storage_date rows until total stored size is at or below diskCapacity. Each is guarded by a WHEN (SELECT SUM…) > diskCapacity aggregate so puts that don't push the table past cap pay only the cheap aggregate scan, not the full window-function query. An entry whose own size exceeds diskCapacity is silently dropped (the trigger would insert it then evict it immediately, so the round trip is skipped). Not strict LRU: get doesn't bump recency. Callers wanting LRU semantics re-put on access.
  • Triggers recreated on every open. installEvictionTriggers(on:diskCapacity:) runs unconditionally during the lazy open, with the current instance's diskCapacity baked into the trigger SQL. diskCapacity == 0 drops the triggers and doesn't recreate them, disabling eviction entirely.
  • WAL is required, not best-effort. journal_mode = WAL is verified by reading the resulting mode from the PRAGMA's row (sqlite3_exec can't tell "set" from "stayed at default"), and open fails if anything else comes back. WAL is a correctness dependency for synchronous = NORMAL — NORMAL is documented safe under WAL but only "probably safe" under rollback-journal — so the unsafe pairing is impossible by construction. Acceptable durability tradeoff for a cache: losing the last few writes on power loss costs an extra refetch, not data loss against a primary store.
  • Disk reclamation is one-shot at open. diskCapacity caps content size (sum of length(value) + length(metadata) across rows), not on-disk file size — eviction and DELETE leave free pages on the freelist, which auto_vacuum = NONE (the default) doesn't reclaim. Rather than auto_vacuum = FULL (which writes freelist pages on every transaction), the open path runs a single VACUUM if the freelist exceeds 25% of total pages. Bounds SSD churn to once per process; tight launches pay only the freelist pragma reads.
  • Permissive blob input. Any String key is SHA-256-hashed (raw 32-byte digest) before binding, so length, encoding, embedded null bytes, and SQL-escaping concerns are all moot. Any Data value/metadata uses sqlite3_bind_blob64 so the byte-count cast can't trap on hypothetical >2 GiB inputs.
  • Tight handle input. handle is StaticString, so callers must pass a string literal — runtime-constructed handles are rejected at compile time. The handle is lowercased before becoming the filename (so "EditorURLCache" and "editorurlcache" share a database file, matching case-insensitive filesystem defaults). A debug-build assert catches handles that aren't [a-zA-Z0-9._-]; release builds use the lowercased handle as-given.
  • One instance per backing file. Two SQLiteKVCache instances against the same <directory>/<handle>.sqlite is undefined behavior — the eviction triggers are recreated unconditionally on open, so two instances with different caps would clobber each other's triggers. Documented but not enforced at runtime.
  • Schema versioning. PRAGMA user_version is checked on open; anything other than the constant baked into code triggers a drop-and-recreate. That includes the fresh-DB case (user_version defaults to 0 on a new SQLite file). Currently at v1 with no shipped history; future schema changes that hit user devices bump the constant and accept the cache reset as the migration model.
  • Bind failures throw at the bind site. sqlite3_bind_blob64 / sqlite3_bind_double are wrapped in throwing helpers that surface the bind result code (SQLITE_TOOBIG, SQLITE_RANGE, SQLITE_NOMEM) under the operation's own Error category — no laundering into SQLITE_MISUSE at step time.

Test coverage

SQLiteKVCacheTests — focused tests at the KV-cache API:

  • Round-trip and basic CRUD: put/get round-trip, missing key, overwrite, clear, distinct keys, empty value/metadata round-trip, delete one, delete missing key
  • Persistence: entries persist across instances (same directory + handle), handle independence in the same directory
  • Eviction (oldest-first by storage_date): under capacity nothing evicts, oldest evicted over capacity, eviction tie-breaker deterministic on equal storage_date, diskCapacity: 0 disables, oversized-entry dropped on store
  • Adversarial keys: SQL-shaped ('; DROP TABLE entries; --), unicode/emoji, embedded null bytes — all round-trip independently because SHA-256 distinguishes them
  • Handle validation: isValidHandle accepts ordinary identifiers, rejects empty / . / .. / unsafe characters / uppercase / non-ASCII; init lowercases mixed-case handles so they share a database file
  • Error descriptions: writeFailed, readFailed, and databaseUnavailable all render through sqlite3_errstr; localizedDescription matches description
  • Concurrency: 200-task concurrent put/get against 10 overlapping keys (no eviction); a sibling test runs the same shape with a small diskCapacity so eviction fires on most puts, exercising the eviction-trigger path under contention
  • Schema setup: fresh DB ends up at the current schemaVersion; legacy schema-mismatch recovery (uses Int32.max as a sentinel user_version so the test stays collision-proof if schemaVersion is bumped)
  • WAL: opens with journal_mode = wal, verified via a separate connection
  • VACUUM on open: reclaims the freelist after heavy churn (populate, delete most rows, reopen, assert freelist is zero); skipped when freelist is below threshold (write-only workload, reopen, assert page count unchanged)

What's not in this PR

  • No changes to EditorURLCache — it's still URLCache-backed, exactly as on trunk
  • No changes to RESTAPIRepository or HTMLPreviewManager (the prospective consumers)

The follow-up PR migrates EditorURLCache onto SQLiteKVCache. A subsequent one does the same for HTMLPreviewManager.

Test plan

  • make test-swift-package — all suites pass
  • EditorURLCacheTests (unchanged from trunk) still passes — the new file doesn't introduce any unintended coupling
  • Confirm Buildkite swift-test passes

@github-actions github-actions Bot added the [Type] Task Issues or PRs that have been broken down into an individual action to take label Apr 27, 2026
@jkmassel jkmassel changed the title refactor(ios): replace URLCache with filesystem KV store refactor(ios): replace URLCache with SQLite-backed EditorURLCache Apr 27, 2026
@jkmassel jkmassel force-pushed the jkmassel/replace-urlcache branch from 159613e to 35f9b31 Compare April 28, 2026 18:33
@jkmassel jkmassel changed the title refactor(ios): replace URLCache with SQLite-backed EditorURLCache feat(ios): introduce SQLiteKVStore — a persistent KV blob store Apr 28, 2026
@jkmassel jkmassel requested a review from crazytonyli April 28, 2026 18:47
@jkmassel jkmassel self-assigned this Apr 28, 2026
@jkmassel jkmassel marked this pull request as draft April 28, 2026 20:38
@jkmassel jkmassel force-pushed the jkmassel/replace-urlcache branch 7 times, most recently from e45bace to 150e298 Compare April 28, 2026 22:54
init(
handle: StaticString,
directory: URL = URL.cachesDirectory,
diskCapacity: Int
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using https://developer.apple.com/documentation/Foundation/Measurement, so that the unit is part of the type?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh I was planning to do this in the next PR as a convenience init 😁

I think I always want the ability to specify the size in bytes, but it's nice to have some syntactic sugar atop that.

}
}

var errorDescription: String? { description }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description is not localized.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually deliberate – sqlite will emit an English description for any error code, so that's what goes here. Cache errors should never be shown to a user, they're just for debugging. But this way I can print(error) and it prints the nice description.

/// compared against `PRAGMA user_version` on open; mismatches drop and recreate
/// the table. The check trusts the version: if a buggy past build wrote a
/// matching `user_version` but the wrong column shape, this won't detect it.
final class SQLiteKVStore: @unchecked Sendable {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should explicitly add "cache" to the type name, to avoid it being misused.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid point!

}
}

/// Inserts or overwrites the entry at `key`. Runs the LRU eviction sweep
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The evictPastCapacity deletes oldest entries, which is not "LRU eviction", I think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the approach is more FIFO and I've noted that now.

let result = sqlite3_exec(db, sql, nil, nil, nil)
if result != SQLITE_OK {
Self.logger.error("exec failed: \(Self.describe(result)) — \(sql)")
throw Error.writeFailed(sqliteCode: result)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: The exec function is not just used for writing, so I think writeFailed may not be the right semantic for the SQLite error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted this

@jkmassel jkmassel force-pushed the jkmassel/replace-urlcache branch 4 times, most recently from 83ae293 to 59af9b2 Compare April 29, 2026 22:58
@jkmassel jkmassel changed the title feat(ios): introduce SQLiteKVStore — a persistent KV blob store feat(ios): introduce SQLiteKVCache — a persistent KV blob cache Apr 29, 2026
A small, self-contained SQLite-backed cache for opaque blobs. Each
entry is (key, storageDate, metadata, value); the caller owns any
serialization on top. Standalone — not yet wired into any consumer.
The EditorURLCache migration and HTMLPreviewManager migration are
follow-up PRs that build on this.

API shape

  init(handle: StaticString, directory: URL = ..., diskCapacity: Int)
  func get(key: String) throws -> Entry?
  func put(key: String, storageDate: Date, metadata: Data, value: Data) throws
  func delete(key: String) throws
  func clear() throws

  enum Error: Swift.Error {
      case databaseUnavailable
      case readFailed(sqliteCode: Int32)
      case writeFailed(sqliteCode: Int32)
  }

Init is non-throwing — the SQLite connection is opened lazily on the
first operation and the result (success or cached failure) is reused
for the lifetime of the cache. Callers using `try?` get a uniform
"cache unavailable" fall-through without a separate error path at the
construction site.

`get`, `put`, `delete`, and `clear` all throw; callers wanting "treat
read failure as a miss" semantics wrap `get` in `try?`. The Error enum
conforms to `CustomStringConvertible` and `LocalizedError`, so
`print(error)` and `Alert(message: error.localizedDescription)` both
render through `sqlite3_errstr` ("SQLite read failed: disk I/O error
(code 10)").

Notable behavior

- Every operation is a single autocommit SQL statement. `put` is
  `INSERT … ON CONFLICT DO UPDATE` with the eviction sweep happening
  inside `AFTER INSERT` / `AFTER UPDATE` triggers; the others are
  single SELECT/DELETE statements. `SQLITE_OPEN_FULLMUTEX` serializes
  the C-level API calls and SQLite's per-statement atomicity (including
  trigger bodies) handles the rest — no Swift-level lock needed because
  there's no multi-statement transaction to fence.
- Oldest-first eviction by storage_date, capped at diskCapacity, runs
  inside the trigger atomically with the firing INSERT/UPDATE. Two
  triggers (AFTER INSERT and AFTER UPDATE) because SQLite fires UPDATE
  triggers, not INSERT, on the upsert's `ON CONFLICT DO UPDATE` path.
  Bodies are identical: a window-function DELETE guarded by a
  `WHEN (SELECT SUM…) > diskCapacity` aggregate, so puts that don't
  push the table past cap pay only the cheap aggregate scan. An entry
  whose own size exceeds diskCapacity is silently dropped (the trigger
  would insert it then evict it immediately). Not strict LRU: `get`
  doesn't bump recency. Callers wanting LRU semantics re-`put` on
  access.
- Triggers are dropped and recreated on every successful open with the
  current instance's `diskCapacity` baked in. `diskCapacity == 0`
  drops the triggers and doesn't recreate them, disabling eviction.
- Single-thread `put` then `get` always observes the put.
- WAL is required, not best-effort. `journal_mode = WAL` is verified
  by reading the resulting mode from the PRAGMA's row (sqlite3_exec
  can't tell "set" from "stayed at default"), and open fails if
  anything else comes back. WAL is a correctness dependency for
  `synchronous = NORMAL` — NORMAL is documented safe under WAL but
  only "probably safe" under rollback-journal — so the unsafe pairing
  is impossible by construction. Acceptable durability tradeoff for a
  cache: losing the last few writes on power loss costs an extra
  refetch, not data loss against a primary store.
- Disk reclamation is one-shot at open. `diskCapacity` caps content
  size (sum of `length(value) + length(metadata)` across rows), not
  on-disk file size — eviction and DELETE leave free pages on the
  freelist, which `auto_vacuum = NONE` (the default) doesn't reclaim.
  Rather than `auto_vacuum = FULL` (which writes freelist pages on
  every transaction), the open path runs a single VACUUM if the
  freelist exceeds 25% of total pages. Bounds SSD churn to once per
  process; tight launches pay only the freelist pragma reads.
- One instance per backing file. Two SQLiteKVCaches against the same
  `<directory>/<handle>.sqlite` is undefined behavior — the eviction
  triggers are recreated unconditionally on open, so two instances
  with different caps would clobber each other's triggers. Documented
  but not enforced at runtime.
- Permissive input contract — any String key (SHA-256 hashed before
  binding so length, encoding, embedded nulls, and SQL escaping are
  all moot), StaticString handle (compile-time literal, lowercased
  for the filename, debug-build assert that it matches
  [a-zA-Z0-9._-]), any Data value/metadata (uses sqlite3_bind_blob64
  so the byte-count cast can't trap on hypothetical >2 GiB inputs).
- Schema versioning via PRAGMA user_version — mismatches (including
  fresh DBs, where user_version defaults to 0) drop and recreate the
  table, accepting cache reset as the migration model. Currently at v1
  with no shipped history.

Notable safety / correctness fixes from review iteration

- sqlite3_close_v2 in deinit (tolerates leaked statements);
  deinit only closes if the lazy open succeeded
- bind_blob64 (UInt64 length) avoids the Int32 cast trap
- Prepare-result checked at every prepare site (get, put, delete,
  readSchemaVersion, enableWAL) — symmetric error surfacing instead
  of laundering prepare failures into MISUSE at step time
- All sqlite3_exec calls log on non-OK return codes via a wrapper
- bind_blob64 / bind_double / bindKey wrapped in throwing helpers so
  bind failures surface with the operation's own Error category
  (readFailed / writeFailed) and the specific bind code (TOOBIG,
  RANGE, NOMEM) at the bind site, not laundered into MISUSE at step
  time

Test coverage

  SQLiteKVCacheTests — focused tests at the KV-cache API:
  - put + get round-trip, missing key, overwrite, clear, distinct
    keys, empty value/metadata
  - persistence across instances, handle independence in the same
    directory, init lowercases mixed-case handles
  - Eviction: under capacity, oldest evicted, ties broken
    deterministically, diskCapacity:0 disables, oversized entry
    dropped on store
  - Error description formatting (writeFailed, readFailed,
    databaseUnavailable)
  - Handle validation: isValidHandle accepts/rejects the expected
    shapes
  - 200-task concurrent stress test, plus a sibling that exercises
    the eviction-trigger path under contention with a small
    diskCapacity
  - Schema setup: fresh DB ends up at the current schemaVersion;
    legacy schema-mismatch recovery
  - WAL: opens at journal_mode=WAL (verified via separate connection)
  - Adversarial keys: SQL-shaped, unicode, embedded nulls
  - VACUUM on open: reclaims freelist after heavy churn; skipped
    when freelist is below threshold
@jkmassel jkmassel force-pushed the jkmassel/replace-urlcache branch from 59af9b2 to 476061e Compare April 29, 2026 23:05
@jkmassel jkmassel marked this pull request as ready for review April 29, 2026 23:39
@jkmassel jkmassel requested a review from crazytonyli April 29, 2026 23:39
Copy link
Copy Markdown
Contributor

@crazytonyli crazytonyli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about sqlite3's WAL and VACUUM. If they are for improving file writing performance and accurate capacity control, I wonder if the implementation can be dumbed down a little bit:

  1. Should we improve performance after we have a baseline benchmark?
  2. If we can't keep capacity accurately, the worst thing is probably that we are a few megabytes (or less) over the limit, which can probably be fixed next time the "evict capacity" is invoked. That does not feel like a big deal to me?

/// silently wipe users' caches a second time on the next upgrade).
private static let schemaVersion: Int32 = 1

private static let logger = Logger(subsystem: "GutenbergKit", category: "sqlite-kv-cache")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: use swift-log, which would work better in the app?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually have swift-log in this project yet. I'd be ok to adopt it, but I think that's a separate PR.

@jkmassel
Copy link
Copy Markdown
Contributor Author

WAL

WAL helps with performance, but more importantly correctness – see https://github.com/wordpress-mobile/GutenbergKit/pull/477/changes#diff-2dca77c03b10dc2fa896ac59deb88c66f46b98aec3c541f445eba596ba5b33ceR402-R405. It's the default for Core Data, so it's not unusual to use it this way.

the worst thing is probably that we are a few megabytes (or less) over the limit

The issue is that without VACUUM that space is never given back to the system. We might only ever cache 100MB of data, but if we hit that number in the DB then evict every entry the DB still has ~100MB reserved on-disk. VACUUM is how we give that space back to the system (I mean...the OS can also just delete our file since it's in the cache directory, but this is slightly nicer).

@jkmassel jkmassel merged commit 9072d67 into trunk Apr 30, 2026
19 checks passed
@jkmassel jkmassel deleted the jkmassel/replace-urlcache branch April 30, 2026 16:03
/// delete.
func clear() throws {
let db = try connection()
try Self.exec("DELETE FROM entries;", on: db)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that without VACUUM that space is never given back to the system.

This statement would need a VACUUM, too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

[Type] Task Issues or PRs that have been broken down into an individual action to take

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants