Fix peer store management on channel closure#895
Open
Jolah1 wants to merge 4 commits intolightningdevkit:mainfrom
Open
Fix peer store management on channel closure#895Jolah1 wants to merge 4 commits intolightningdevkit:mainfrom
Jolah1 wants to merge 4 commits intolightningdevkit:mainfrom
Conversation
|
I've assigned @tnull as a reviewer! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR resolves both issues described in #745 related to peer store management on channel closure.
Fixed behaviors
channel_reestablishrecoveryProblems
1. Local force-close removes peer too early
When we force-closed a channel in
close_channel_internal, we immediately removed the peer from the store regardless of closure type:This broke recovery. The background reconnection task only reconnects to peers in the store. Removing the peer immediately guaranteed we would never reconnect, so
channel_reestablishcould never complete. This is especially problematic for LND peers, which may not handle force-closure error messages correctly and rely on us reconnecting to recover.2. Counterparty force-close never cleans up peer
When a counterparty force-closed their last channel with us, the
ChannelClosedevent handler did nothing with the peer store. The peer remained persisted indefinitely and the node kept attempting to reconnect on every background tick. wasted resources and misleading peer state.Solution
1. Retain peer on local force-close (
src/lib.rs)In
close_channel_internal, peers are now only removed for cooperativecloses:
For force-closes, the peer is retained in the store so the background reconnection task can fire,
channel_reestablishcan complete, and peer cleanup is deferred to theChannelClosedevent handler.2. Remove peer on counterparty-initiated closure (
src/event.rs)In the
LdkEvent::ChannelClosedhandler, we now check:If both conditions are true, the peer is removed from the store.
Implementation Details
LDK timing behavior
LDK emits
ChannelClosedbefore removing the channel from its internal state. A naive call tolist_channels_with_counterparty()inside the handler would still return the closing channel, making the "last channel" check always wrong. We fix this by excluding the closing channel explicitly:Ordering to avoid race conditions
Peer removal is performed before calling
add_event. This ensures consumers ofnext_event_async()always observe a consistent peer store when they react toChannelClosed— avoiding a race where the event is visible before the removal completes.Tests
Added two integration tests in
tests/integration_tests_rust.rs:test_peer_removed_on_counterparty_force_closeVerifies that after the counterparty force-closes the last channel, the peer is no longer persisted in the local peer store.
test_peer_retained_on_local_force_closeVerifies that after a locally initiated force-close, the peer remains persisted in the local peer store to allow background reconnection and
channel_reestablishrecovery.Both tests assert on
is_persistedrather than peer presence to correctly distinguish stored peers from transient TCP connections that may still appear inlist_peers()during teardown.Results
test test_peer_removed_on_counterparty_force_close ... ok
test test_peer_retained_on_local_force_close ... ok
test result: ok. 41 passed; 0 failed; 0 ignored; 0 measured
cargo build → no errors
cargo clippy → no new warnings
cargo fmt → applied