Environment
- engraph 1.7.2 (prebuilt engraph-macos-arm64)
- macOS, Apple Silicon (Metal), embed model embeddinggemma-300M-Q8_0
- Vault ~4,100 files / ~607k chunks
Summary
With intelligence enabled (engraph configure --enable-intelligence), a full re-index appears to run the entire mining + embedding pass inside a single, never-committed SQLite transaction. The WAL file (engraph.db-wal) grew to ~12 GB, after which every page operation crawls inside walFindFrame over that WAL. The job ran ~6.5 hours with zero commits and had to be killed.
Impact
A full rebuild is the only way to populate mention / auto-link edges — a plain incremental engraph index does not mine mentions. So on a larger vault, auto-linking is effectively unreachable: the only path to it is the rebuild that doesn't terminate.
Steps to reproduce
- engraph configure --enable-intelligence
- Run a full re-index of a few-thousand-file vault.
- Watch engraph.db-wal grow without bound; CPU pegged in walFindFrame; no commits land.
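For anyone trying to confirm the growth pattern, a minimal sketch of a WAL size sampler follows. The helper names (wal_size_bytes, sample_wal_growth) and the sampling interval are illustrative assumptions, not part of engraph; the only thing taken from the report is that the WAL lives next to the database as engraph.db-wal.

```python
import os
import time


def wal_size_bytes(db_path):
    """Return the size of the '-wal' sidecar file, or 0 if it does not exist."""
    try:
        return os.path.getsize(db_path + "-wal")
    except FileNotFoundError:
        return 0


def sample_wal_growth(db_path, samples=5, interval_s=60.0):
    """Sample the WAL size `samples` times, sleeping `interval_s` between reads.

    Hypothetical helper: a list that only ever increases while the index job
    runs is consistent with a write transaction that never commits.
    """
    sizes = []
    for i in range(samples):
        sizes.append(wal_size_bytes(db_path))
        if i < samples - 1:
            time.sleep(interval_s)
    return sizes
```

Calling sample_wal_growth("engraph.db") from the vault's index directory while the rebuild is running would give a rough growth curve; the actual location of engraph.db depends on the local setup.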
Observations
- Removing all readers does not shrink the WAL, which points to one large uncommitted write transaction rather than a reader-blocked checkpoint.
- The same full rebuild with intelligence OFF completes normally.
- Recovery that worked: kill the job → engraph configure --disable-intelligence → sqlite3 engraph.db "PRAGMA wal_checkpoint(TRUNCATE);" (uncommitted WAL rolls back to last good index) → plain incremental engraph index.
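The checkpoint step of that recovery can also be scripted. The sketch below uses Python's standard sqlite3 module and assumes the index job has already been killed; checkpoint_truncate is a hypothetical helper name, and the only statement taken from the report is the PRAGMA itself.

```python
import sqlite3


def checkpoint_truncate(db_path):
    """Run PRAGMA wal_checkpoint(TRUNCATE) and return its result row.

    The row is (busy, log_frames, checkpointed_frames). busy == 0 means the
    checkpoint was not blocked by another connection. On a database that is
    not in WAL mode SQLite reports (0, -1, -1).
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA wal_checkpoint(TRUNCATE);").fetchone()
    finally:
        conn.close()
```

A nonzero busy value would suggest another process still holds the database open, in which case the WAL is not truncated and the call is worth retrying after that process exits.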
Request
Commit/checkpoint periodically during the intelligence mining + embedding pass (e.g. batched every N files/chunks) so the WAL stays bounded and progress is durable and resumable. Alternatively, document the constraint and offer a batched/resumable rebuild mode.
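To illustrate the shape of the request, here is a minimal sketch of batched commits using Python's sqlite3 module. It is not engraph's code: the chunks table, the index_in_batches name, and the batch size are assumptions made for the example, and the real mining + embedding pass would do far more work per row.

```python
import sqlite3


def index_in_batches(conn, rows, batch_size=500):
    """Insert (path, text) rows, committing every `batch_size` rows.

    Returns the number of commits performed. The 'chunks' table is a
    hypothetical stand-in for whatever the indexer actually writes.
    """
    commits = 0
    pending = 0
    for path, text in rows:
        conn.execute("INSERT INTO chunks(path, text) VALUES (?, ?)", (path, text))
        pending += 1
        if pending >= batch_size:
            conn.commit()  # makes this batch durable
            # A passive checkpoint copies committed frames back into the main
            # database file so the WAL can be reused instead of growing.
            conn.execute("PRAGMA wal_checkpoint(PASSIVE);").fetchone()
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the final partial batch
        commits += 1
    return commits
```

With this structure an interrupted run loses at most one batch, and a resumable mode could skip files whose rows are already committed.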
Related, lower priority: BERT-architecture embedding models (bge-small-en-v1.5, all-MiniLM-L6-v2) crash on load with GGML_ASSERT(i01 >= 0 && i01 < ne01) failed in ggml_compute_forward_get_rows; only Gemma-arch embedders load. A newer llama-cpp-2 bump may resolve it. (Not blocking — a QAT q4_0 Gemma quant works as an alternative.)