Full re-index with intelligence enabled balloons the WAL to multi-GB and never commits #42

**Environment**
- engraph 1.7.2 (prebuilt `engraph-macos-arm64`)
- macOS, Apple Silicon (Metal), embed model `embeddinggemma-300M-Q8_0`
- Vault ~4,100 files / ~607k chunks

**Summary**
With intelligence enabled (`engraph configure --enable-intelligence`), a full re-index appears to run the entire mining + embedding pass inside a single SQLite transaction that is never committed. The WAL file (`engraph.db-wal`) grew to ~12 GB; past that point, every page lookup crawls inside `walFindFrame`, which has to search that oversized WAL. The job ran ~6.5 hours with **zero** commits and had to be killed.

**Impact**
A full rebuild is the only way to populate mention / auto-link edges; a plain incremental `engraph index` does not mine mentions. On a larger vault, auto-linking is therefore effectively unreachable, because the only path to it is a rebuild that never finishes.

**Steps to reproduce**
1. `engraph configure --enable-intelligence`
2. Run a full re-index of a few-thousand-file vault.
3. Watch `engraph.db-wal` grow without bound; CPU pegged in `walFindFrame`; no commits land.

**Observations**
- Removing all readers does **not** shrink the WAL, which points to one large uncommitted write transaction rather than a reader-blocked checkpoint.
- The same full rebuild with intelligence **OFF** completes normally.
- Recovery that worked:
  1. Kill the job.
  2. `engraph configure --disable-intelligence`
  3. `sqlite3 engraph.db "PRAGMA wal_checkpoint(TRUNCATE);"` (the uncommitted WAL frames are discarded, leaving the database at the last good index)
  4. Run a plain incremental `engraph index`.
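The reasoning in the observations above can be reproduced in miniature with Python's built-in `sqlite3` module. This is a sketch, not engraph itself: the `chunks` table, the row count, and the tiny `cache_size` are illustrative assumptions. SQLite writes pages of an uncommitted transaction into the WAL once its page cache fills, so the small cache makes the WAL grow quickly while nothing is committed. A second connection, standing in for the `sqlite3` CLI, then runs the same `PRAGMA wal_checkpoint(TRUNCATE)`: while the write transaction is open, the checkpoint reports busy and the WAL does not shrink even though there are no readers. Once the transaction ends without committing (a `ROLLBACK` here stands in for killing the job), the same checkpoint truncates the WAL to zero bytes and the table is back at its last committed state.

```python
import os
import sqlite3
import tempfile


def wal_bytes(db_path):
    """Size of the -wal sidecar file, or 0 if it does not exist."""
    try:
        return os.path.getsize(db_path + "-wal")
    except FileNotFoundError:
        return 0


def truncate_checkpoint(conn):
    """Run PRAGMA wal_checkpoint(TRUNCATE); returns (busy, wal_frames, checkpointed_frames)."""
    return conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()[0]


def demo(db_path, rows=20_000):
    writer = sqlite3.connect(db_path, isolation_level=None)  # we issue BEGIN/ROLLBACK ourselves
    writer.execute("PRAGMA journal_mode=WAL").fetchall()
    # Tiny page cache: dirty pages of the open transaction spill into the WAL before any COMMIT.
    writer.execute("PRAGMA cache_size=50")
    writer.execute("CREATE TABLE chunks(id INTEGER PRIMARY KEY, body BLOB)")
    # Second connection plays the role of the sqlite3 CLI; timeout=0 means it never waits on locks.
    other = sqlite3.connect(db_path, isolation_level=None, timeout=0)

    writer.execute("BEGIN")  # one long write transaction that is never committed
    for _ in range(rows):
        writer.execute("INSERT INTO chunks(body) VALUES (?)", (os.urandom(512),))
    grown = wal_bytes(db_path)
    # No readers exist, but the open transaction holds the WAL write lock,
    # so TRUNCATE falls back to a passive checkpoint and reports busy.
    busy_during = truncate_checkpoint(other)[0]
    wal_during = wal_bytes(db_path)

    writer.execute("ROLLBACK")  # stands in for killing the job
    busy_after = truncate_checkpoint(other)[0]
    wal_after = wal_bytes(db_path)
    committed_rows = other.execute("SELECT count(*) FROM chunks").fetchone()[0]
    writer.close()
    other.close()
    return grown, busy_during, wal_during, busy_after, wal_after, committed_rows


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    print(demo(path))
```

After a real kill there is no explicit `ROLLBACK`, but the effect is the same: SQLite ignores WAL frames that were never followed by a commit record, which is why the `TRUNCATE` step in the recovery above lands on the last good index.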

**Request**
Commit/checkpoint periodically during the intelligence mining + embedding pass (e.g. batched every N files/chunks) so the WAL stays bounded and progress is durable and resumable. Alternatively, document the constraint and offer a batched/resumable rebuild mode.
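A minimal sketch of the batched, resumable pattern requested above, again using Python's `sqlite3` so it runs standalone. Everything engraph-specific is assumed: the input shape (a list of `(path, chunks)` pairs), the `chunks` and `progress` tables, and `batch_size` standing in for "N" are hypothetical, and engraph's real schema and pipeline will differ. Each batch is written in its own short transaction together with a progress marker, so a killed run restarts after the last committed batch, and a `TRUNCATE` checkpoint after each commit keeps the WAL bounded by roughly one batch's worth of pages.

```python
import os
import sqlite3


def wal_bytes(db_path):
    """Size of the -wal sidecar file, or 0 if it does not exist."""
    try:
        return os.path.getsize(db_path + "-wal")
    except FileNotFoundError:
        return 0


def index_in_batches(db_path, files, batch_size=500):
    """Write `files` (hypothetical list of (path, [chunk_bytes, ...]) pairs) in batches.

    Hypothetical schema: `chunks` holds the mined/embedded output and `progress`
    stores the index of the next file to process, so a killed run resumes after
    the last committed batch. Returns the largest WAL size seen right after a
    commit, as a rough measure of how large the WAL gets.
    """
    conn = sqlite3.connect(db_path, isolation_level=None)  # explicit BEGIN/COMMIT
    conn.execute("PRAGMA journal_mode=WAL").fetchall()
    conn.execute("CREATE TABLE IF NOT EXISTS chunks(file TEXT, body BLOB)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress("
        "id INTEGER PRIMARY KEY CHECK (id = 1), next_file INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT next_file FROM progress WHERE id = 1").fetchone()
    start = row[0] if row else 0  # resume point left by a previous (possibly killed) run

    peak_wal = 0
    for batch_start in range(start, len(files), batch_size):
        batch = files[batch_start:batch_start + batch_size]
        conn.execute("BEGIN IMMEDIATE")  # one short write transaction per batch
        for path, chunks in batch:
            conn.executemany(
                "INSERT INTO chunks(file, body) VALUES (?, ?)",
                [(path, chunk) for chunk in chunks],
            )
        # Record progress in the same transaction as the batch itself.
        conn.execute(
            "INSERT OR REPLACE INTO progress(id, next_file) VALUES (1, ?)",
            (batch_start + len(batch),),
        )
        conn.execute("COMMIT")
        peak_wal = max(peak_wal, wal_bytes(db_path))
        # Copy committed frames into the main file and reset the WAL to 0 bytes.
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
    conn.close()
    return peak_wal
```

Committing the progress marker in the same transaction as the batch means the two can never disagree: either the chunks and the new marker both land, or neither does. A long-lived reader can still prevent truncation (the pragma then reports busy), but every committed batch stays durable and the next run picks up from it.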

---

_Related, lower priority:_ BERT-architecture embedding models (`bge-small-en-v1.5`, `all-MiniLM-L6-v2`) crash on load with `GGML_ASSERT(i01 >= 0 && i01 < ne01) failed` in `ggml_compute_forward_get_rows`; only Gemma-architecture embedders load. Bumping `llama-cpp-2` to a newer release may resolve it. (Not blocking: a QAT q4_0 Gemma quant works as an alternative.)
