Conversation

@valentinewallace
Contributor

@valentinewallace valentinewallace commented Nov 17, 2025

We have an overarching goal of (mostly) getting rid of ChannelManager persistence and rebuilding the ChannelManager's state from existing ChannelMonitors, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure.

Here we start this process by rebuilding ChannelManager::decode_update_add_htlcs, forward_htlcs, and pending_intercepted_htlcs from the Channels, which will soon be included in the ChannelMonitors as part of #4218.

  • test upgrading from a manager containing pending HTLC forwards that was serialized on <= LDK 0.2, i.e. where Channels will not contain committed update_add_htlcs
  • currently, no tests fail when we force using the new rebuilt decode_update_add_htlcs map and ignoring the legacy maps. This may indicate missing test coverage, since in theory we need to re-forward these HTLCs sometimes so they go back in the forward_htlcs map for processing
  • only use the old legacy maps if the manager and its channels were last serialized on <= 0.2. Currently this is not guaranteed
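
For context, here is a rough sketch of the fill-then-prune reconstruction described above. It is only an illustration: the struct and helper names (UpdateAddHtlc, ChannelSnapshot, rebuild_decode_update_add_htlcs, prune_already_forwarded) are simplified stand-ins, not the actual LDK types or API.

use std::collections::HashMap;

// Simplified stand-ins for illustration only; the real types live in
// lightning::ln::msgs and lightning::ln::channelmanager.
#[derive(Clone, PartialEq)]
struct UpdateAddHtlc {
    htlc_id: u64,
    // ...amount, cltv_expiry, payment hash, onion packet, etc.
}

struct ChannelSnapshot {
    outbound_scid_alias: u64,
    // update_adds the counterparty has irrevocably committed on this (inbound) channel
    inbound_committed_update_adds: Vec<UpdateAddHtlc>,
}

// Pass 1: rebuild decode_update_add_htlcs purely from per-channel data.
fn rebuild_decode_update_add_htlcs(
    channels: &[ChannelSnapshot],
) -> HashMap<u64, Vec<UpdateAddHtlc>> {
    let mut map = HashMap::new();
    for chan in channels {
        if !chan.inbound_committed_update_adds.is_empty() {
            map.insert(chan.outbound_scid_alias, chan.inbound_committed_update_adds.clone());
        }
    }
    map
}

// Pass 2: prune any HTLC that a downstream ChannelMonitor shows as already forwarded,
// so it is not decoded and forwarded a second time on restart.
fn prune_already_forwarded(
    map: &mut HashMap<u64, Vec<UpdateAddHtlc>>, prev_scid: u64, prev_htlc_id: u64,
) {
    if let Some(update_adds) = map.get_mut(&prev_scid) {
        update_adds.retain(|add| add.htlc_id != prev_htlc_id);
    }
}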

@ldk-reviews-bot

ldk-reviews-bot commented Nov 17, 2025

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.37%. Comparing base (de384ff) to head (a24dcff).
⚠️ Report is 45 commits behind head on main.

Files with missing lines | Patch % | Lines
lightning/src/ln/channelmanager.rs | 62.50% | 46 Missing and 2 partials ⚠️
lightning/src/ln/channel.rs | 80.00% | 7 Missing and 2 partials ⚠️
lightning/src/ln/reload_tests.rs | 98.30% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4227      +/-   ##
==========================================
+ Coverage   89.33%   89.37%   +0.03%     
==========================================
  Files         180      180              
  Lines      139042   139599     +557     
  Branches   139042   139599     +557     
==========================================
+ Hits       124219   124770     +551     
- Misses      12196    12240      +44     
+ Partials     2627     2589      -38     
Flag | Coverage | Δ
fuzzing | 35.13% <27.90%> | (-0.84%) ⬇️
tests | 88.71% <75.00%> | (+<0.01%) ⬆️


@valentinewallace
Contributor Author

valentinewallace commented Nov 19, 2025

@joostjager helped me realize this may be way overcomplicated: essentially all tests pass on main when we simply read-and-discard the pending forwards maps. It's a bit suspicious that all tests pass, though, so additional test coverage seems like it could be useful.

Nvm, our test coverage for reload of these maps is just pretty incomplete.

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from 0b4eb68 to ce2ccac on November 19, 2025 21:56
@valentinewallace
Contributor Author

Updated with new testing and a few tweaks: diff

Will rebase next

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from ce2ccac to 1e86521 on November 19, 2025 22:03
Contributor

@joostjager joostjager left a comment

Did an initial review pass. Even though the change is pretty contained, I still found it difficult to fully follow what's happening.

Overall I think comments are very much on the light side in LDK, and the code areas touched in this PR are no exception in my opinion. Maybe, now that you've invested the time to build understanding, you can liberally sprinkle comments on your changes and nearby code?

Similar to the other /target directories we ignore where a bunch of files are
generated during testing.
@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from 1e86521 to 26c6dc7 on December 2, 2025 00:51
@valentinewallace
Contributor Author

I still need to finish adding some comments and responding to feedback, but I believe the tests are done and this should be mostly there so pushed up the latest state.

@tnull
Contributor

tnull commented Dec 2, 2025

> I still need to finish adding some comments and responding to feedback, but I believe the tests are done and this should be mostly there so pushed up the latest state.

We had discussed whether to split out receive_htlcs as part of this work, since you're already reconstructing the forward_htlcs map. Are you still considering doing this as part of this PR or rather not?

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from 26c6dc7 to ad79155 on December 2, 2025 23:00
@valentinewallace valentinewallace marked this pull request as ready for review December 2, 2025 23:01
@valentinewallace valentinewallace requested review from joostjager and removed request for jkczyz December 2, 2025 23:03
@valentinewallace
Contributor Author

> > I still need to finish adding some comments and responding to feedback, but I believe the tests are done and this should be mostly there so pushed up the latest state.
>
> We had discussed whether to split out receive_htlcs as part of this work, since you're already reconstructing the forward_htlcs map. Are you still considering doing this as part of this PR or rather not?

Probably not in this PR since it's already decently sized and has a clear scope, but I will look into it for follow-up!

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch 2 times, most recently from 6b919ce to 0b83e80 on December 3, 2025 01:19
@tnull
Contributor

tnull commented Dec 3, 2025

> > > I still need to finish adding some comments and responding to feedback, but I believe the tests are done and this should be mostly there so pushed up the latest state.
> >
> > We had discussed whether to split out receive_htlcs as part of this work, since you're already reconstructing the forward_htlcs map. Are you still considering doing this as part of this PR or rather not?
>
> Probably not in this PR since it's already decently sized and has a clear scope, but I will look into it for follow-up!

SGTM!

@tnull
Contributor

tnull commented Dec 3, 2025

Btw, feel free to ping me whenever you think this is ready for a secondary reviewer!

Contributor

@joostjager joostjager left a comment

I like the new commit structure. Left a few remarks and questions, some probably due to a few pieces of understanding missing in my mental model.

// Reconstruct `ChannelManager::decode_update_add_htlcs` from the serialized
// `Channel`, as part of removing the requirement to regularly persist the
// `ChannelManager`.
decode_update_add_htlcs.insert(
Contributor

Can't this be done in the next loop over the monitors where the update_add_htlcs are deduped? This loop seems to be about payments.

Contributor Author

Hmm, IMO the first loop is about repopulating HTLC maps from monitors, and the second loop is about pruning maps from monitors (hence having two loops so we don't prune before the maps are completely filled out), so this approach seems to fit.

Contributor

I thought the first loop is about repopulating (outbound) payments. We probably mean the same thing.

Contributor

But just for populating decode_update_add_htlcs, I don't think a two-step process is needed. I've been playing around a bit along these lines. It's not exactly the same, but you get the idea.

diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 299ff60ba..5fedf6fb9 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -17670,21 +17670,6 @@ where
 					let mut peer_state_lock = peer_state_mtx.lock().unwrap();
 					let peer_state = &mut *peer_state_lock;
 					is_channel_closed = !peer_state.channel_by_id.contains_key(channel_id);
-					if let Some(chan) = peer_state.channel_by_id.get(channel_id) {
-						if let Some(funded_chan) = chan.as_funded() {
-							let inbound_committed_update_adds =
-								funded_chan.get_inbound_committed_update_adds();
-							if !inbound_committed_update_adds.is_empty() {
-								// Reconstruct `ChannelManager::decode_update_add_htlcs` from the serialized
-								// `Channel`, as part of removing the requirement to regularly persist the
-								// `ChannelManager`.
-								decode_update_add_htlcs.insert(
-									funded_chan.context.outbound_scid_alias(),
-									inbound_committed_update_adds,
-								);
-							}
-						}
-					}
 				}
 
 				if is_channel_closed {
@@ -17719,9 +17704,17 @@ where
 			for (channel_id, monitor) in args.channel_monitors.iter() {
 				let mut is_channel_closed = true;
 				let counterparty_node_id = monitor.get_counterparty_node_id();
+				let mut inbound_committed_update_adds = Vec::new();
 				if let Some(peer_state_mtx) = per_peer_state.get(&counterparty_node_id) {
 					let mut peer_state_lock = peer_state_mtx.lock().unwrap();
 					let peer_state = &mut *peer_state_lock;
+					if let Some(chan) = peer_state.channel_by_id.get(channel_id) {
+						if let Some(funded_chan) = chan.as_funded() {
+							inbound_committed_update_adds =
+								funded_chan.get_inbound_committed_update_adds();
+						}
+					}
+
 					is_channel_closed = !peer_state.channel_by_id.contains_key(channel_id);
 				}
 
@@ -17746,12 +17739,9 @@ where
 								// still have an entry for this HTLC in `forward_htlcs`,
 								// `pending_intercepted_htlcs`, or `decode_update_add_htlcs`, we were apparently not
 								// persisted after the monitor was when forwarding the payment.
-								dedup_decode_update_add_htlcs(
-									&mut decode_update_add_htlcs,
-									&prev_hop_data,
-									"HTLC was forwarded to the closed channel",
-									&args.logger,
-								);
+								inbound_committed_update_adds.retain(|update_add| {
+									update_add.htlc_id != prev_hop_data.htlc_id
+								});
 								dedup_decode_update_add_htlcs(
 									&mut decode_update_add_htlcs_legacy,
 									&prev_hop_data,
@@ -17877,6 +17867,8 @@ where
 					}
 				}
 
+				decode_update_add_htlcs.insert(*channel_id, inbound_committed_update_adds);
+
 				// Whether the downstream channel was closed or not, try to re-apply any payment
 				// preimages from it which may be needed in upstream channels for forwarded
 				// payments.

Contributor Author

Are tests passing for you with that diff? I tried moving the repopulating down to the lower loop and got test failures

Contributor

I didn't try, but they also fail for me. I did a few more restructurings below that make the tests pass. But the restructuring did lead to the realization that when the channel is closed, there are no inbound_committed_update_adds, and so there is no point in deduplicating them and adding them to decode_update_add_htlcs? Am I overlooking something here?

diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 299ff60ba..8a101e791 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -17670,21 +17670,6 @@ where
 					let mut peer_state_lock = peer_state_mtx.lock().unwrap();
 					let peer_state = &mut *peer_state_lock;
 					is_channel_closed = !peer_state.channel_by_id.contains_key(channel_id);
-					if let Some(chan) = peer_state.channel_by_id.get(channel_id) {
-						if let Some(funded_chan) = chan.as_funded() {
-							let inbound_committed_update_adds =
-								funded_chan.get_inbound_committed_update_adds();
-							if !inbound_committed_update_adds.is_empty() {
-								// Reconstruct `ChannelManager::decode_update_add_htlcs` from the serialized
-								// `Channel`, as part of removing the requirement to regularly persist the
-								// `ChannelManager`.
-								decode_update_add_htlcs.insert(
-									funded_chan.context.outbound_scid_alias(),
-									inbound_committed_update_adds,
-								);
-							}
-						}
-					}
 				}
 
 				if is_channel_closed {
@@ -17717,12 +17702,21 @@ where
 				}
 			}
 			for (channel_id, monitor) in args.channel_monitors.iter() {
+				let mut chan_state = None;
 				let mut is_channel_closed = true;
 				let counterparty_node_id = monitor.get_counterparty_node_id();
 				if let Some(peer_state_mtx) = per_peer_state.get(&counterparty_node_id) {
 					let mut peer_state_lock = peer_state_mtx.lock().unwrap();
 					let peer_state = &mut *peer_state_lock;
-					is_channel_closed = !peer_state.channel_by_id.contains_key(channel_id);
+					if let Some(chan) = peer_state.channel_by_id.get(channel_id) {
+						if let Some(funded_chan) = chan.as_funded() {
+							chan_state = Some((
+								funded_chan.context.outbound_scid_alias(),
+								funded_chan.get_inbound_committed_update_adds(),
+							));
+						}
+						is_channel_closed = false;
+					}
 				}
 
 				if is_channel_closed {
@@ -17746,12 +17740,12 @@ where
 								// still have an entry for this HTLC in `forward_htlcs`,
 								// `pending_intercepted_htlcs`, or `decode_update_add_htlcs`, we were apparently not
 								// persisted after the monitor was when forwarding the payment.
-								dedup_decode_update_add_htlcs(
-									&mut decode_update_add_htlcs,
-									&prev_hop_data,
-									"HTLC was forwarded to the closed channel",
-									&args.logger,
-								);
+
+								// No need to do this because chan_state is None?
+								//
+								// chan_state.1.retain(|update_add| {
+								// 	update_add.htlc_id != prev_hop_data.htlc_id
+								// });
 								dedup_decode_update_add_htlcs(
 									&mut decode_update_add_htlcs_legacy,
 									&prev_hop_data,
@@ -17875,6 +17869,12 @@ where
 							completion_action,
 						));
 					}
+
+					// Nothing to add because the channel is closed??
+					//
+					// if !chan_state.1.is_empty() {
+					// 	decode_update_add_htlcs.insert(chan_state.0, chan_state.1);
+					// }
 				}
 
 				// Whether the downstream channel was closed or not, try to re-apply any payment

Contributor Author

After some staring and print statements, what's missing from your thoughts above/our offline discussion is the case where we add an HTLC forward to decode_update_add_htlcs when processing the inbound edge channel, then remove it again while processing the (closed) outbound edge channel. So I believe it is important to aggregate them all at once in the first loop and then prune them in the second loop, to avoid failing to prune if we process the outbound edge first.
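
A toy illustration of the ordering hazard described above, with made-up SCIDs and HTLC ids rather than real LDK code: if filling and pruning are interleaved per channel and the closed outbound channel happens to be processed first, the stale entry survives; with two separate passes it is correctly removed.

use std::collections::HashMap;

fn main() {
    // scid 42 = inbound edge, htlc_id 7 = an HTLC already forwarded over the
    // now-closed outbound channel (so it must NOT be re-decoded on restart).

    // Interleaved fill + prune, closed outbound channel processed first:
    let mut interleaved: HashMap<u64, Vec<u64>> = HashMap::new();
    if let Some(adds) = interleaved.get_mut(&42) {
        adds.retain(|id| *id != 7); // prune finds nothing yet...
    }
    interleaved.insert(42, vec![7]); // ...then the inbound edge re-adds it.
    assert_eq!(interleaved[&42], vec![7]); // stale entry survives

    // Two passes: fill everything first, then prune.
    let mut two_pass: HashMap<u64, Vec<u64>> = HashMap::new();
    two_pass.insert(42, vec![7]); // pass 1: from the inbound channel
    if let Some(adds) = two_pass.get_mut(&42) {
        adds.retain(|id| *id != 7); // pass 2: pruned via the outbound monitor
    }
    assert!(two_pass[&42].is_empty());
}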

Contributor

Oh interesting, didn't think of that. Do you want to add that explanation as a code comment, so that the knowledge is recorded?

Contributor Author

Addressed in #4289

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from 0b83e80 to dfef41a on December 3, 2025 23:23
@valentinewallace
Contributor Author

I think the semver CI failure is spurious

@tnull
Contributor

tnull commented Dec 4, 2025

> I think the semver CI failure is spurious

Fixed by #4260.

@joostjager
Contributor

I've tried rebasing this PR on top of #4151 to see if that fixes some of the test failures: https://github.com/joostjager/rust-lightning/tree/reconstruct-mgr-fwd-htlcs-rebased

Unfortunately that is not the case. There are the same 44 failures as before, plus 2 new ones: the tests added in this PR.

The new tests fail because the channel isn't in the right state after deserializing the ChannelManager. That's probably unrelated to the tests themselves, and more of a general problem that masks the specific issues.

Contributor

@joostjager joostjager left a comment

Left some comments, but overall I think this PR is ready for reviewer two.

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from dfef41a to 77d6dfa on December 5, 2025 23:55
@ldk-reviews-bot

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Contributor

@tnull tnull left a comment

AFAICT the changes look good (though I might ofc lack context on the overall plan/next steps).

Just some minor comments on the tests.

Contributor

@elnosh elnosh left a comment

I'm reaching the limits of my knowledge of this part of the code. It took some time to build context on the problem and understand the change. Is this the problematic part https://github.com/valentinewallace/rust-lightning/blob/77d6dfa3c2170c768e01302081d3550920a836af/lightning/src/ln/channelmanager.rs#L16945-L17029 that we want to get rid of? So in the future that channel state will come from the monitors, avoiding these out-of-sync issues.

As far as my understanding goes, I think the changes here look good.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

As part of this, we plan to store at least parts of Channels in
ChannelMonitors, and that Channel data will be used in rebuilding the manager.

Once we store update_adds in Channels, we can use them on restart when
reconstructing ChannelManager maps such as forward_htlcs and
pending_intercepted_htlcs. Upcoming commits will start doing this
reconstruction.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

As part of rebuilding ChannelManager forward HTLCs maps, we will also add
a fix that will regenerate HTLCIntercepted events for HTLC intercepts that
are present but have no corresponding event in the queue. That fix will use
this new method.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

We'll use this new util when reconstructing the
ChannelManager::decode_update_add_htlcs map from Channel data in upcoming
commits. While the Channel data is not included in the monitors yet, it will be
in future work.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

Soon we'll be reconstructing these now-legacy maps from Channel data (that will
also be included in ChannelMonitors in future work), so rename them as part of
moving towards not needing to persist them in ChannelManager.

Makes an upcoming commit cleaner

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

Here we start this process by rebuilding
ChannelManager::decode_update_add_htlcs from the Channels, which will soon be
included in the ChannelMonitors as part of a different series of PRs.

The newly built map is not yet used but will be in the next commit.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

Here we start this process by rebuilding
ChannelManager::decode_update_add_htlcs, forward_htlcs, and
pending_intercepted_htlcs from Channel data, which will soon be included in the
ChannelMonitors as part of a different series of PRs.

We also fix the reload_node test util to use the node's pre-reload config after
restart. The previous behavior was a bit surprising and led to one of this
commit's tests failing.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

In the previous commit we started this process by rebuilding
ChannelManager::decode_update_add_htlcs, forward_htlcs, and
pending_intercepted_htlcs from the Channel data, which will soon be included in the
ChannelMonitors as part of a different series of PRs.

Here we test that HTLC forwards that were originally received on 0.2 can still
be successfully forwarded using the new reload + legacy handling code that will
be merged for 0.3.
@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from 77d6dfa to a24dcff on December 10, 2025 22:50
@valentinewallace
Contributor Author

> I'm reaching the limits of my knowledge of this part of the code. It took some time to build context on the problem and understand the change. Is this the problematic part https://github.com/valentinewallace/rust-lightning/blob/77d6dfa3c2170c768e01302081d3550920a836af/lightning/src/ln/channelmanager.rs#L16945-L17029 that we want to get rid of? So in the future that channel state will come from the monitors, avoiding these out-of-sync issues.

Yes, that's right :) We're trying to avoid hitting "A ChannelManager is stale compared to the current ChannelMonitor!", basically.

Contributor

@joostjager joostjager left a comment

Only non-blocking comments. Happy that you've managed to get a first step towards fixing the force closures landed!

HTLCForwardInfo::FailHTLC { htlc_id, .. }
| HTLCForwardInfo::FailMalformedHTLC { htlc_id, .. } => (*scid, *htlc_id),
};
if let Some(pending_update_adds) = decode_update_add_htlcs.get_mut(&prev_scid) {
Contributor

get_mut -> get?

match decode_update_add_htlcs.entry(scid) {
hash_map::Entry::Occupied(mut update_adds) => {
for legacy_update_add in legacy_update_adds {
if !update_adds.get().contains(&legacy_update_add) {
Contributor

store .get_mut() in a var outside of the loop and reuse it inside?
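
Something along the lines of the suggestion above, sketched with simplified value types; the real code deduplicates msgs::UpdateAddHTLC entries during ChannelManager deserialization, so this is only the shape of the change.

use std::collections::{hash_map, HashMap};

fn merge_legacy_update_adds(
    decode_update_add_htlcs: &mut HashMap<u64, Vec<u64>>, scid: u64, legacy_update_adds: Vec<u64>,
) {
    match decode_update_add_htlcs.entry(scid) {
        hash_map::Entry::Occupied(mut entry) => {
            // Borrow the Vec mutably once rather than calling .get()/.get_mut()
            // on every iteration.
            let update_adds = entry.get_mut();
            for legacy_update_add in legacy_update_adds {
                if !update_adds.contains(&legacy_update_add) {
                    update_adds.push(legacy_update_add);
                }
            }
        },
        hash_map::Entry::Vacant(entry) => {
            entry.insert(legacy_update_adds);
        },
    }
}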

/// Used to rebuild `ChannelManager` HTLC state on restart. Previously the manager would track
/// and persist all HTLC forwards and receives itself, but newer LDK versions avoid relying on
/// its persistence and instead reconstruct state based on `Channel` and `ChannelMonitor` data.
update_add_htlc_opt: Option<msgs::UpdateAddHTLC>,
Contributor

Just a check question: is the disk space requirement for this acceptable?

Contributor Author

Should be, given per-channel HTLC limits. If a node has a ton of forwards going on, it's expected to be better resourced, I would think.

Contributor

@tnull tnull left a comment

LGTM, mod pending comments/questions.

@valentinewallace valentinewallace merged commit d1f819d into lightningdevkit:main Dec 11, 2025
20 checks passed
@valentinewallace
Contributor Author

Will address the tweaks in the follow-up!

@valentinewallace valentinewallace self-assigned this Dec 11, 2025
to_forward_infos.push((forward_info, htlc.htlc_id));
htlc.state = InboundHTLCState::Committed;
htlc.state = InboundHTLCState::Committed {
// HTLCs will only be in state `InboundHTLCResolution::Resolved` if they were
Collaborator

So should this be an assertion that gets validated more explicitly on deserialization? This comment seems to imply both that this code is not reachable and that it will somehow misbehave; the second part isn't clear to me, but we should validate the first either way.

Contributor Author

@valentinewallace valentinewallace Dec 15, 2025

What I meant in this comment is that this code is reachable but it shouldn't matter that we set update_add_htlc_opt to None here because we already require this HTLC to not be here on restart, according to the changelog (and restart is the only time when update_add_htlc_opt is used).

We can start explicitly returning DecodeError::InvalidValue if this variant is present on read -- to be clear, is this what you're advocating for?

I unfortunately am not sure of the exact misbehavior that can occur; according to the changelog it comes from #3355, but it isn't explained in much detail that I can see.
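
For illustration, a minimal sketch of the validation being discussed, using placeholder enums rather than LDK's actual InboundHTLCResolution/DecodeError types and Readable plumbing; it only shows the shape of the check.

#[derive(Debug)]
enum DecodeError {
    InvalidValue,
}

enum InboundHtlcResolution {
    Resolved,
    Pending,
}

// On read, refuse to deserialize if a variant that the upgrade requirements say
// cannot be present shows up anyway, instead of silently dropping the update_add.
fn validate_resolution_on_read(res: &InboundHtlcResolution) -> Result<(), DecodeError> {
    match res {
        InboundHtlcResolution::Resolved => Err(DecodeError::InvalidValue),
        InboundHtlcResolution::Pending => Ok(()),
    }
}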

Collaborator

> we already require this HTLC to not be here on restart, according to the changelog (and restart is the only time when update_add_htlc_opt is used).
> We can start explicitly returning DecodeError::InvalidValue if this variant is present on read -- to be clear, is this what you're advocating for?

Right, if there's some implicit assumption based on the changelog and upgrade behavior, we really need to be validating that before allowing an object to be deserialized.

&InboundHTLCState::Committed => {
&InboundHTLCState::Committed { ref update_add_htlc_opt } => {
3u8.write(writer)?;
inbound_committed_update_adds.push(update_add_htlc_opt.clone());
Collaborator

Can we not avoid cloning during serialization?

Contributor Author

Addressed in #4289

/// Used to rebuild `ChannelManager` HTLC state on restart. Previously the manager would track
/// and persist all HTLC forwards and receives itself, but newer LDK versions avoid relying on
/// its persistence and instead reconstruct state based on `Channel` and `ChannelMonitor` data.
update_add_htlc_opt: Option<msgs::UpdateAddHTLC>,
Collaborator

Bleh, writing the full onion packet for every pending HTLC (not just ones waiting to be forwarded) on every write (today in ChannelManager but also eventually in ChannelMonitorUpdates) is pretty gross (especially if we add an extra onion in the HTLCs for extra data, which we kinda need to do). Maybe the solution is to remove this field once the HTLC has been (irrevocably, by persisting a monitor update) forwarded, though it might be a bit annoying code-wise.

Contributor Author

@valentinewallace valentinewallace Dec 16, 2025

Are we sure this will cause real performance issues? 1000 outstanding inbound HTLCs = 1.3 extra MB per manager persist. It seems potentially like a premature optimization, or at least like the LN might have to really take off in between the time we merge this change and when we release support for not having to persist the manager.

That said, it might be easy to remove these irrevocable forwards on timer_tick_occurred? Or possibly return them from the call to channel.monitor_updating_restored(..) and remove them then.
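
For reference, the back-of-envelope math behind the figures above; the sizes are approximate BOLT #2/#4 wire sizes, and LDK's actual serialized form will differ somewhat.

fn main() {
    // update_add_htlc: channel_id + id + amount_msat + cltv_expiry + payment_hash
    let update_add_fields = 32 + 8 + 8 + 4 + 32;
    // onion_routing_packet: version + ephemeral pubkey + hop payloads + HMAC
    let onion_packet = 1 + 33 + 1300 + 32;
    let per_htlc = update_add_fields + onion_packet; // roughly 1.45 KB
    let pending_htlcs = 1000;
    let extra_bytes = per_htlc * pending_htlcs;
    // Roughly 1.4 MB of extra data per ChannelManager persist in this scenario.
    println!("~{:.1} MB extra", extra_bytes as f64 / 1_000_000.0);
}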

Collaborator

An extra MiB (on an object that's usually smaller than, or close to for a large routing node, a single MiB or so - it's currently 255K on my node) is a lot of extra overhead for an object we write all the time. But I'm also interested in its size in ChannelMonitorUpdates - storing all pending HTLCs takes ChannelMonitorUpdates from small diffs to huge objects, adding a ton of write time...

> That said, it might be easy to remove these irrevocable forwards on timer_tick_occurred?

That seems like a lot of overhead to iterate every HTLC and see if it's been forwarded on each timer tick.

> Or possibly return them from the call to channel.monitor_updating_restored(..) and remove them then.

This seems simpler.

},
_ => None,
})
.collect()
Collaborator

Bleh, the common case as of this PR is that we don't actually use the newly-cloned onion; they're generally already forwarded. Maybe we should just fix that part and not worry about it here, but it would be nice to avoid copying everything every time just to not need it.

decode_update_add_htlcs_legacy.unwrap_or_else(|| new_hash_map());
let mut pending_intercepted_htlcs_legacy =
pending_intercepted_htlcs_legacy.unwrap_or_else(|| new_hash_map());
let mut decode_update_add_htlcs = new_hash_map();
Collaborator

Why do we have logic to decode into both simultaneously? If we have pending_intercepted_htlcs_legacy we can use it; if we don't, we can insert into it. Trying to handle both at the same time (i.e. preferring the new-style decoding over the old style) just seems like extra work for no reason?

Contributor Author

@valentinewallace valentinewallace Dec 15, 2025

Part of the reason was to get test coverage for the new paths and gain confidence in them with prod usage. Seems nice to have the same behavior in test and prod.

But we could instead always prefer the old maps in prod and only avoid writing the old maps in tests; would you prefer something along those lines?

Collaborator

That might be simpler. I'm fine with wanting to use the new code, but we have a lot of extra logic to handle both objects which seems like we're trying too hard when we could just do what you describe here. Also nothing wrong with keeping the old logic in prod for now...
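
A sketch of the simpler behavior being suggested here, with placeholder types and a made-up helper name; the actual logic would live in ChannelManager's ReadableArgs implementation.

use std::collections::HashMap;

// If the serialized manager still carried the legacy map (written by an older LDK,
// or by tests that keep writing it), just use it; otherwise fall back to the state
// reconstructed from Channel/ChannelMonitor data.
fn choose_forward_htlcs(
    legacy: Option<HashMap<u64, Vec<u64>>>, rebuilt_from_channels: HashMap<u64, Vec<u64>>,
) -> HashMap<u64, Vec<u64>> {
    legacy.unwrap_or(rebuilt_from_channels)
}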

Contributor Author

Addressed in #4289

@TheBlueMatt TheBlueMatt mentioned this pull request Dec 11, 2025