
autobahn: add data_prune_after to bound data.State memory (CON-256)#3375

Open
wen-coding wants to merge 3 commits into main from wen/data_prune_after_for_autobahn

@wen-coding
Contributor

data.State.runPruning is a background goroutine that drops in-memory blocks/QCs/AppProposals older than a configurable duration, but the config knob (data.Config.PruneAfter) was never wired up: giga_router constructed data.NewState with only Committee set, so the pruner never spawned. data.State.PruneBefore (the giga_router-driven path based on cosmos-sdk RetainHeight) is also a no-op when the chain is configured with pruning="nothing" (the localnode default, common in test setups), so the in-memory data.State grew with the chain under sustained load until nodes were OOM-killed.

Plumb DataPruneAfter through:
`AutobahnFileConfig.data_prune_after` (json) → `GigaRouterConfig.DataPruneAfter` → `data.Config.PruneAfter` → `data.State.runPruning`.

Production default (gen-autobahn-config): 30m, which gives operators plenty of recent history for /block, /tx, /trace_*, etc. while bounding memory under load.

Localnode/test override (step4_config_override.sh): 1m, which keeps data.State small under sustained-throughput tests where cosmos pruning is "nothing".

Things done

  • AutobahnFileConfig.DataPruneAfter
  • GigaRouterConfig.DataPruneAfter + thread into data.Config
  • node/setup.go pass-through from autobahn.json
  • gen-autobahn-config production default (30m)
  • step4_config_override.sh localnode override (1m)
  • gofmt, vet clean

@github-actions

github-actions Bot commented May 4, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
|---|---|---|---|---|
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | May 7, 2026, 9:22 PM |

@codecov

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.16%. Comparing base (eb17459) to head (f1737bf).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sei-tendermint/node/setup.go | 50.00% | 1 Missing and 1 partial ⚠️ |
| ...int/cmd/tendermint/commands/gen_autobahn_config.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3375      +/-   ##
==========================================
+ Coverage   59.10%   59.16%   +0.06%     
==========================================
  Files        2106     2097       -9     
  Lines      173525   172337    -1188     
==========================================
- Hits       102558   101961     -597     
+ Misses      62093    61529     -564     
+ Partials     8874     8847      -27     
| Flag | Coverage Δ |
|---|---|
| sei-chain-pr | 68.93% <66.66%> (?) |
| sei-db | 70.41% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| sei-tendermint/config/autobahn.go | 28.57% <ø> (ø) |
| sei-tendermint/internal/p2p/giga_router.go | 69.45% <100.00%> (+0.45%) ⬆️ |
| ...int/cmd/tendermint/commands/gen_autobahn_config.go | 18.18% <0.00%> (-0.34%) ⬇️ |
| sei-tendermint/node/setup.go | 69.23% <50.00%> (-0.35%) ⬇️ |

... and 85 files with indirect coverage changes


@wen-coding wen-coding changed the title autobahn: add data_prune_after to bound data.State memory (CON-257) autobahn: add data_prune_after to bound data.State memory (CON-256) May 4, 2026
@pompon0
Contributor

pompon0 commented May 4, 2026

fyi, pruneAfter in data was used in sei-v3, but in sei-chain it is the application that is responsible for pruning, via the retainHeight field in ResponseCommit. I currently don't know whether we want to move ownership of pruning from the application to consensus. IMO it would make sense, given that the application should be concerned solely with the latest state at all times. However, the sei-chain app may make some assumptions about which blocks are available (I can imagine that it does, but I haven't looked into that yet).

@wen-coding
Contributor Author


Sorry, I'm confused. I'm not planning to change pruning ownership in this PR (although we can discuss whether that should be done; I'm generally of the opinion that this is consensus cleanup and should probably be controlled by consensus). I just want to set a smaller prune period in tests, so that on less powerful machines (my Mac) we can still run long throughput tests without the validators getting OOM-killed. The 30m default in gen-autobahn-config is a defensive cap (still opt-out: operators can drop the field), not a replacement for app-driven pruning.
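One way to picture "defensive cap, not a replacement" is to model the effective prune floor as the stricter of the two bounds. The first is the app-driven retainHeight, where 0 means retain everything, as with pruning="nothing". The second is the lowest height still younger than PruneAfter, where 0 means the time cap is disabled. This max-of-both rule and the `pruneFloor` helper are hypothetical illustrations. The PR does not show how data.State actually combines PruneBefore and runPruning.

```go
package main

import "fmt"

// pruneFloor returns the lowest height kept under a hypothetical model in
// which both mechanisms are active and the stricter (higher) bound wins.
//   retainHeight: app-driven bound from ResponseCommit (0 = keep everything)
//   ageFloor:     lowest height younger than PruneAfter (0 = cap disabled)
func pruneFloor(retainHeight, ageFloor int64) int64 {
	if retainHeight > ageFloor {
		return retainHeight
	}
	return ageFloor
}

func main() {
	fmt.Println(pruneFloor(0, 0))          // 0: pruning="nothing", no data_prune_after
	fmt.Println(pruneFloor(0, 9000))       // 9000: only the time cap applies
	fmt.Println(pruneFloor(12000, 9000))   // 12000: app retention is stricter
}
```

Under this model, the time cap only bites when app-driven retention lags behind it. When retainHeight advances normally, the cap changes nothing.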

@pompon0
Contributor

pompon0 commented May 5, 2026

Currently pruning is driven by retainHeight computed via `func (app *BaseApp) GetBlockRetentionHeight(commitHeight int64) (int64, error)`. If pruning in tests does not work, we should first check whether this function actually advances retainHeight and fix it if it doesn't (or if it doesn't keep up with the block rate).
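A sketch of the check suggested above: record (commitHeight, retainHeight) pairs, for example by logging what GetBlockRetentionHeight returns, and then verify two things. First, retainHeight never moves backwards. Second, it stays within some acceptable lag of the tip. The `retainSample` type, `checkRetention` helper and the `maxLag` threshold are hypothetical. The sketch relies only on the ABCI convention that a retain height of 0 means "retain all blocks".

```go
package main

import "fmt"

// retainSample is one hypothetical observation of the retainHeight returned
// for a given commit height.
type retainSample struct {
	commitHeight int64
	retainHeight int64
}

// checkRetention fails if retainHeight moves backwards or lags the commit
// height by more than maxLag. A retainHeight of 0 ("retain all") therefore
// fails once the chain grows past maxLag blocks.
func checkRetention(samples []retainSample, maxLag int64) error {
	var prev int64
	for _, s := range samples {
		if s.retainHeight < prev {
			return fmt.Errorf("height %d: retainHeight went back from %d to %d",
				s.commitHeight, prev, s.retainHeight)
		}
		if lag := s.commitHeight - s.retainHeight; lag > maxLag {
			return fmt.Errorf("height %d: retainHeight %d lags by %d (> %d)",
				s.commitHeight, s.retainHeight, lag, maxLag)
		}
		prev = s.retainHeight
	}
	return nil
}

func main() {
	healthy := []retainSample{{100, 0}, {200, 100}, {300, 200}}
	stuck := []retainSample{{100, 0}, {200, 0}, {300, 0}}
	fmt.Println(checkRetention(healthy, 150)) // <nil>
	fmt.Println(checkRetention(stuck, 150))   // error: lags by 200 at height 200
}
```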

wen-coding and others added 2 commits May 7, 2026 12:55
The 1m localnet override is too aggressive for E2E test suites that
take longer than a minute between cluster start and querying earlier
blocks (e.g. eth_getBlockByNumber for fork-version fixtures, or
sei_getTransactionReceiptExcludeTraceFail's seed-deploy-block lookup).

Raising to 5m still actively exercises the pruning code path under
sustained load (production runs accumulate millions of blocks; 5m is
well within the in-memory window), while letting the integration suites
finish their setup-then-query flow without hitting "height not
available" on early blocks.

Production default (gen-autobahn-config) remains 30m — unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2 participants