feat(ios): introduce SQLiteKVCache — a persistent KV blob cache#477
feat(ios): introduce SQLiteKVCache — a persistent KV blob cache#477
Conversation
159613e to
35f9b31
Compare
e45bace to
150e298
Compare
| init( | ||
| handle: StaticString, | ||
| directory: URL = URL.cachesDirectory, | ||
| diskCapacity: Int |
There was a problem hiding this comment.
Using https://developer.apple.com/documentation/Foundation/Measurement, so that the unit is part of the type?
There was a problem hiding this comment.
Heh I was planning to do this in the next PR as a convenience init 😁
I think I always want the ability to specify the size in bytes, but it's nice to have some syntactic sugar atop that.
| } | ||
| } | ||
|
|
||
| var errorDescription: String? { description } |
There was a problem hiding this comment.
The description is not localized.
There was a problem hiding this comment.
This is actually deliberate – sqlite will emit an English description for any error code, so that's what goes here. Cache errors should never be shown to a user, they're just for debugging. But this way I can print(error) and it prints the nice description.
| /// compared against `PRAGMA user_version` on open; mismatches drop and recreate | ||
| /// the table. The check trusts the version: if a buggy past build wrote a | ||
| /// matching `user_version` but the wrong column shape, this won't detect it. | ||
| final class SQLiteKVStore: @unchecked Sendable { |
There was a problem hiding this comment.
I think we should explicitly add "cache" to the type name, to avoid it being misused.
| } | ||
| } | ||
|
|
||
| /// Inserts or overwrites the entry at `key`. Runs the LRU eviction sweep |
There was a problem hiding this comment.
The evictPastCapacity deletes oldest entries, which is not "LRU eviction", I think?
There was a problem hiding this comment.
Yeah, the approach is more FIFO and I've noted that now.
| let result = sqlite3_exec(db, sql, nil, nil, nil) | ||
| if result != SQLITE_OK { | ||
| Self.logger.error("exec failed: \(Self.describe(result)) — \(sql)") | ||
| throw Error.writeFailed(sqliteCode: result) |
There was a problem hiding this comment.
Nitpick: The exec function is not just used for writing, so I think writeFailed may not be the right semantic for the SQLite error.
83ae293 to
59af9b2
Compare
A small, self-contained SQLite-backed cache for opaque blobs. Each
entry is (key, storageDate, metadata, value); the caller owns any
serialization on top. Standalone — not yet wired into any consumer.
The EditorURLCache migration and HTMLPreviewManager migration are
follow-up PRs that build on this.
API shape
init(handle: StaticString, directory: URL = ..., diskCapacity: Int)
func get(key: String) throws -> Entry?
func put(key: String, storageDate: Date, metadata: Data, value: Data) throws
func delete(key: String) throws
func clear() throws
enum Error: Swift.Error {
case databaseUnavailable
case readFailed(sqliteCode: Int32)
case writeFailed(sqliteCode: Int32)
}
Init is non-throwing — the SQLite connection is opened lazily on the
first operation and the result (success or cached failure) is reused
for the lifetime of the cache. Callers using `try?` get a uniform
"cache unavailable" fall-through without a separate error path at the
construction site.
`get`, `put`, `delete`, and `clear` all throw; callers wanting "treat
read failure as a miss" semantics wrap `get` in `try?`. The Error enum
conforms to `CustomStringConvertible` and `LocalizedError`, so
`print(error)` and `Alert(message: error.localizedDescription)` both
render through `sqlite3_errstr` ("SQLite read failed: disk I/O error
(code 10)").
Notable behavior
- Every operation is a single autocommit SQL statement. `put` is
`INSERT … ON CONFLICT DO UPDATE` with the eviction sweep happening
inside `AFTER INSERT` / `AFTER UPDATE` triggers; the others are
single SELECT/DELETE statements. `SQLITE_OPEN_FULLMUTEX` serializes
the C-level API calls and SQLite's per-statement atomicity (including
trigger bodies) handles the rest — no Swift-level lock needed because
there's no multi-statement transaction to fence.
- Oldest-first eviction by storage_date, capped at diskCapacity, runs
inside the trigger atomically with the firing INSERT/UPDATE. Two
triggers (AFTER INSERT and AFTER UPDATE) because SQLite fires UPDATE
triggers, not INSERT, on the upsert's `ON CONFLICT DO UPDATE` path.
Bodies are identical: a window-function DELETE guarded by a
`WHEN (SELECT SUM…) > diskCapacity` aggregate, so puts that don't
push the table past cap pay only the cheap aggregate scan. An entry
whose own size exceeds diskCapacity is silently dropped (the trigger
would insert it then evict it immediately). Not strict LRU: `get`
doesn't bump recency. Callers wanting LRU semantics re-`put` on
access.
- Triggers are dropped and recreated on every successful open with the
current instance's `diskCapacity` baked in. `diskCapacity == 0`
drops the triggers and doesn't recreate them, disabling eviction.
- Single-thread `put` then `get` always observes the put.
- WAL is required, not best-effort. `journal_mode = WAL` is verified
by reading the resulting mode from the PRAGMA's row (sqlite3_exec
can't tell "set" from "stayed at default"), and open fails if
anything else comes back. WAL is a correctness dependency for
`synchronous = NORMAL` — NORMAL is documented safe under WAL but
only "probably safe" under rollback-journal — so the unsafe pairing
is impossible by construction. Acceptable durability tradeoff for a
cache: losing the last few writes on power loss costs an extra
refetch, not data loss against a primary store.
- Disk reclamation is one-shot at open. `diskCapacity` caps content
size (sum of `length(value) + length(metadata)` across rows), not
on-disk file size — eviction and DELETE leave free pages on the
freelist, which `auto_vacuum = NONE` (the default) doesn't reclaim.
Rather than `auto_vacuum = FULL` (which writes freelist pages on
every transaction), the open path runs a single VACUUM if the
freelist exceeds 25% of total pages. Bounds SSD churn to once per
process; tight launches pay only the freelist pragma reads.
- One instance per backing file. Two SQLiteKVCaches against the same
`<directory>/<handle>.sqlite` is undefined behavior — the eviction
triggers are recreated unconditionally on open, so two instances
with different caps would clobber each other's triggers. Documented
but not enforced at runtime.
- Permissive input contract — any String key (SHA-256 hashed before
binding so length, encoding, embedded nulls, and SQL escaping are
all moot), StaticString handle (compile-time literal, lowercased
for the filename, debug-build assert that it matches
[a-zA-Z0-9._-]), any Data value/metadata (uses sqlite3_bind_blob64
so the byte-count cast can't trap on hypothetical >2 GiB inputs).
- Schema versioning via PRAGMA user_version — mismatches (including
fresh DBs, where user_version defaults to 0) drop and recreate the
table, accepting cache reset as the migration model. Currently at v1
with no shipped history.
Notable safety / correctness fixes from review iteration
- sqlite3_close_v2 in deinit (tolerates leaked statements);
deinit only closes if the lazy open succeeded
- bind_blob64 (UInt64 length) avoids the Int32 cast trap
- Prepare-result checked at every prepare site (get, put, delete,
readSchemaVersion, enableWAL) — symmetric error surfacing instead
of laundering prepare failures into MISUSE at step time
- All sqlite3_exec calls log on non-OK return codes via a wrapper
- bind_blob64 / bind_double / bindKey wrapped in throwing helpers so
bind failures surface with the operation's own Error category
(readFailed / writeFailed) and the specific bind code (TOOBIG,
RANGE, NOMEM) at the bind site, not laundered into MISUSE at step
time
Test coverage
SQLiteKVCacheTests — focused tests at the KV-cache API:
- put + get round-trip, missing key, overwrite, clear, distinct
keys, empty value/metadata
- persistence across instances, handle independence in the same
directory, init lowercases mixed-case handles
- Eviction: under capacity, oldest evicted, ties broken
deterministically, diskCapacity:0 disables, oversized entry
dropped on store
- Error description formatting (writeFailed, readFailed,
databaseUnavailable)
- Handle validation: isValidHandle accepts/rejects the expected
shapes
- 200-task concurrent stress test, plus a sibling that exercises
the eviction-trigger path under contention with a small
diskCapacity
- Schema setup: fresh DB ends up at the current schemaVersion;
legacy schema-mismatch recovery
- WAL: opens at journal_mode=WAL (verified via separate connection)
- Adversarial keys: SQL-shaped, unicode, embedded nulls
- VACUUM on open: reclaims freelist after heavy churn; skipped
when freelist is below threshold
59af9b2 to
476061e
Compare
crazytonyli
left a comment
There was a problem hiding this comment.
I don't know much about sqlite3's WAL and VACUUM. If they are for improving file writing performance and accurate capacity control, I wonder if the implementation can be dumbed down a little bit:
- Should we improve performance after we have a baseline benchmark?
- If we can't keep capacity accurately, the worst thing is probably that we are a few megabytes (or less) over the limit, which can probably be fixed next time the "evict capacity" is invoked. That does not feel like a big deal to me?
| /// silently wipe users' caches a second time on the next upgrade). | ||
| private static let schemaVersion: Int32 = 1 | ||
|
|
||
| private static let logger = Logger(subsystem: "GutenbergKit", category: "sqlite-kv-cache") |
There was a problem hiding this comment.
Nitpick: use swift-log, which would work better in the app?
There was a problem hiding this comment.
We don't actually have swift-log in this project yet. I'd be ok to adopt it, but I think that's a separate PR.
WAL helps with performance, but more importantly correctness – see https://github.com/wordpress-mobile/GutenbergKit/pull/477/changes#diff-2dca77c03b10dc2fa896ac59deb88c66f46b98aec3c541f445eba596ba5b33ceR402-R405. It's the default for Core Data, so it's not unusual to use it this way.
The issue is that without VACUUM that space is never given back to the system. We might only ever cache 100MB of data, but if we hit that number in the DB then evict every entry the DB still has ~100MB reserved on-disk. VACUUM is how we give that space back to the system (I mean...the OS can also just delete our file since it's in the cache directory, but this is slightly nicer). |
| /// delete. | ||
| func clear() throws { | ||
| let db = try connection() | ||
| try Self.exec("DELETE FROM entries;", on: db) |
There was a problem hiding this comment.
The issue is that without VACUUM that space is never given back to the system.
This statement would need a VACUUM, too?
Introduces
SQLiteKVCache, a small self-contained SQLite-backed key-value blob cache. Standalone — not yet wired into any consumer. Migration ofEditorURLCache(and follow-up ofHTMLPreviewManager) onto this cache is a separate PR.What it is
Each entry is
(key, storageDate, metadata, value). Values and metadata are stored as raw bytes — the caller owns any serialization on top.API
initis non-throwing. The SQLite connection is opened lazily on the first operation and the result (success or cached failure) is reused for the lifetime of the cache — callers usingtry?get a uniform "cache unavailable" fall-through without a separate error path at the construction site.get,put,delete, andclearall throw.getreturnsnilonly for a genuine miss (SQLITE_DONE); any other non-row step result or a prepare/bind failure throwsError.readFailed. Callers wanting "treat read failures as misses" semantics wrapgetintry?. Writes throw on IO failure (disk full, blob oversize, corruption); the SQLite result code is preserved in the associated value for callers that want to pattern-match.The
Errorenum renders throughsqlite3_errstr, soprint(error)anderror.localizedDescriptionboth produce sentences like"SQLite write failed: disk I/O error (code 10)"instead of"writeFailed(sqliteCode: 10)".Notable behavior
synchronous = NORMALtrades the last few committed writes on power loss; schema-version mismatch drops the table; eviction is destructive and oldest-first bystorage_date.Cachein the type name is load-bearing — primary-store data does not belong here.putisINSERT … ON CONFLICT DO UPDATEwith the eviction sweep happening insideAFTER INSERT/AFTER UPDATEtriggers;get/delete/clearare singleSELECT/DELETEstatements.SQLITE_OPEN_FULLMUTEXserializes the C-level API calls and SQLite's per-statement atomicity (including trigger bodies) handles the rest — there's no Swift-level lock because there's no multi-statement transaction to fence. Single-threadputthengetalways observes the put.AFTER INSERTandAFTER UPDATE, because SQLite fires UPDATE triggers — not INSERT — on the upsert'sON CONFLICT DO UPDATEbranch) with identical bodies: a window-functionDELETEthat removes oldest-by-storage_daterows until total stored size is at or belowdiskCapacity. Each is guarded by aWHEN (SELECT SUM…) > diskCapacityaggregate so puts that don't push the table past cap pay only the cheap aggregate scan, not the full window-function query. An entry whose own size exceedsdiskCapacityis silently dropped (the trigger would insert it then evict it immediately, so the round trip is skipped). Not strict LRU:getdoesn't bump recency. Callers wanting LRU semantics re-puton access.installEvictionTriggers(on:diskCapacity:)runs unconditionally during the lazy open, with the current instance'sdiskCapacitybaked into the trigger SQL.diskCapacity == 0drops the triggers and doesn't recreate them, disabling eviction entirely.journal_mode = WALis verified by reading the resulting mode from the PRAGMA's row (sqlite3_execcan't tell "set" from "stayed at default"), and open fails if anything else comes back. WAL is a correctness dependency forsynchronous = NORMAL— NORMAL is documented safe under WAL but only "probably safe" under rollback-journal — so the unsafe pairing is impossible by construction. Acceptable durability tradeoff for a cache: losing the last few writes on power loss costs an extra refetch, not data loss against a primary store.diskCapacitycaps content size (sum oflength(value) + length(metadata)across rows), not on-disk file size — eviction and DELETE leave free pages on the freelist, whichauto_vacuum = NONE(the default) doesn't reclaim. Rather thanauto_vacuum = FULL(which writes freelist pages on every transaction), the open path runs a singleVACUUMif the freelist exceeds 25% of total pages. Bounds SSD churn to once per process; tight launches pay only the freelist pragma reads.Stringkey is SHA-256-hashed (raw 32-byte digest) before binding, so length, encoding, embedded null bytes, and SQL-escaping concerns are all moot. AnyDatavalue/metadata usessqlite3_bind_blob64so the byte-count cast can't trap on hypothetical >2 GiB inputs.handleisStaticString, so callers must pass a string literal — runtime-constructed handles are rejected at compile time. The handle is lowercased before becoming the filename (so"EditorURLCache"and"editorurlcache"share a database file, matching case-insensitive filesystem defaults). A debug-buildassertcatches handles that aren't[a-zA-Z0-9._-]; release builds use the lowercased handle as-given.SQLiteKVCacheinstances against the same<directory>/<handle>.sqliteis undefined behavior — the eviction triggers are recreated unconditionally on open, so two instances with different caps would clobber each other's triggers. Documented but not enforced at runtime.PRAGMA user_versionis checked on open; anything other than the constant baked into code triggers a drop-and-recreate. That includes the fresh-DB case (user_versiondefaults to 0 on a new SQLite file). Currently at v1 with no shipped history; future schema changes that hit user devices bump the constant and accept the cache reset as the migration model.sqlite3_bind_blob64/sqlite3_bind_doubleare wrapped in throwing helpers that surface the bind result code (SQLITE_TOOBIG,SQLITE_RANGE,SQLITE_NOMEM) under the operation's ownErrorcategory — no laundering intoSQLITE_MISUSEat step time.Test coverage
SQLiteKVCacheTests— focused tests at the KV-cache API:put/getround-trip, missing key, overwrite, clear, distinct keys, empty value/metadata round-trip, delete one, delete missing keystorage_date): under capacity nothing evicts, oldest evicted over capacity, eviction tie-breaker deterministic on equalstorage_date,diskCapacity: 0disables, oversized-entry dropped on store'; DROP TABLE entries; --), unicode/emoji, embedded null bytes — all round-trip independently because SHA-256 distinguishes themisValidHandleaccepts ordinary identifiers, rejects empty /./../ unsafe characters / uppercase / non-ASCII;initlowercases mixed-case handles so they share a database filewriteFailed,readFailed, anddatabaseUnavailableall render throughsqlite3_errstr;localizedDescriptionmatchesdescriptionput/getagainst 10 overlapping keys (no eviction); a sibling test runs the same shape with a smalldiskCapacityso eviction fires on most puts, exercising the eviction-trigger path under contentionschemaVersion; legacy schema-mismatch recovery (usesInt32.maxas a sentineluser_versionso the test stays collision-proof ifschemaVersionis bumped)journal_mode = wal, verified via a separate connectionWhat's not in this PR
EditorURLCache— it's stillURLCache-backed, exactly as ontrunkRESTAPIRepositoryorHTMLPreviewManager(the prospective consumers)The follow-up PR migrates
EditorURLCacheontoSQLiteKVCache. A subsequent one does the same forHTMLPreviewManager.Test plan
make test-swift-package— all suites passEditorURLCacheTests(unchanged fromtrunk) still passes — the new file doesn't introduce any unintended coupling