autobahn: add data_prune_after to bound data.State memory (CON-256) #3375
wen-coding wants to merge 3 commits into main from
`data.State.runPruning` is a background goroutine that drops in-memory blocks/QCs/AppProposals older than a configurable duration, but its config knob (`data.Config.PruneAfter`) was never wired up: giga_router constructed `data.NewState` with only `Committee` set, so the pruner never spawned. `data.State.PruneBefore` (the giga_router-driven path based on cosmos-sdk `RetainHeight`) is also a no-op when the chain is configured with `pruning="nothing"` (Sei's localnode default, common in test setups), so in-memory `data.State` grew with the chain under sustained load and eventually OOM-killed nodes.

Plumb `DataPruneAfter` through:
`AutobahnFileConfig.data_prune_after` (json) → `GigaRouterConfig.DataPruneAfter` → `data.Config.PruneAfter` → `data.State.runPruning`.

Production default (gen-autobahn-config): 30m. This gives operators plenty of recent history for `/block`, `/tx`, `/trace_*`, etc. while bounding memory under load. Localnode/test override (step4_config_override.sh): 1m. This keeps `data.State` small under sustained-throughput tests where cosmos pruning is "nothing".
Codecov Report ❌ Patch coverage is

Coverage diff (main vs #3375):

| | main | #3375 | +/- |
|---|---:|---:|---:|
| Coverage | 59.10% | 59.16% | +0.06% |
| Files | 2106 | 2097 | -9 |
| Lines | 173525 | 172337 | -1188 |
| Hits | 102558 | 101961 | -597 |
| Misses | 62093 | 61529 | -564 |
| Partials | 8874 | 8847 | -27 |
FYI, `pruneAfter` in `data` was used in sei-v3, but in sei-chain it is the application that is responsible for pruning, via the `retainHeight` field in `ResponseCommit`. I currently don't know whether we want to move ownership of pruning from the application to consensus. IMO it would make sense, given that the application should be solely concerned with the latest state at all times. However, the sei-chain app may make some assumptions about which blocks are available (I can imagine that it does, but I haven't looked into that yet).
Sorry, I'm confused. I'm not planning to change prune ownership in this PR (although we can discuss whether that should be done; I'm generally of the opinion that this is consensus cleanup and should probably be controlled by consensus). I just want to set a smaller prune period in tests, so that on less powerful machines (my Mac) we can still run long throughput tests without the validators getting OOM-killed. The 30m default in gen-autobahn-config is a defensive cap (still opt-out: operators can drop the field), not a replacement for app-driven pruning.
Currently, pruning is driven by `retainHeight`, computed in `sei-chain/sei-cosmos/baseapp/abci.go` (line 680 at commit 52d368f).
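To contrast the two mechanisms, here is a minimal sketch of height-driven pruning, assuming blocks are tracked by ascending height. `heightStore` and `pruneBefore` are hypothetical names; the zero check follows the ABCI convention that a `RetainHeight` of 0 means "retain all blocks", which is why, in this sketch, the path frees nothing when the app never asks for pruning.

```go
package main

import "sort"

// heightStore is a hypothetical stand-in for the height-indexed part of data.State.
type heightStore struct {
	heights []uint64 // ascending block heights currently held in memory
}

// pruneBefore drops every height strictly below retainHeight and returns how
// many were dropped. Following the ABCI convention, retainHeight == 0 means
// "retain all blocks", so nothing is freed and memory keeps growing.
func (s *heightStore) pruneBefore(retainHeight uint64) int {
	if retainHeight == 0 {
		return 0
	}
	// Index of the first height >= retainHeight.
	n := sort.Search(len(s.heights), func(i int) bool { return s.heights[i] >= retainHeight })
	s.heights = append([]uint64(nil), s.heights[n:]...)
	return n
}
```

The time-based pruner bounds memory by wall-clock age regardless of what the app reports, whereas this path depends entirely on the app supplying a non-zero height.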
The 1m localnet override is too aggressive for E2E test suites that take longer than a minute between cluster start and querying earlier blocks (e.g. `eth_getBlockByNumber` for fork-version fixtures, or the seed-deploy-block lookup in `sei_getTransactionReceiptExcludeTraceFail`). Raising it to 5m still actively exercises the pruning code path under sustained load (production runs accumulate millions of blocks; 5m is well within the in-memory window), while letting the integration suites finish their setup-then-query flow without hitting "height not available" on early blocks. The production default (gen-autobahn-config) remains 30m, unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Things done