bug fix: valid but different LockPoints after a reorg #23683

pull glozow wants to merge 2 commits into bitcoin:master from glozow:2021-12-lockpoints-fix changing 3 files +13 −14
  1. glozow commented at 11:26 am on December 6, 2021: member

    I introduced a bug in #22677 (sorry! :sweat_smile:)

    Mempool entries cache LockPoints, containing the first height/blockhash/CBlockIndex* at which the transaction becomes valid. During a reorg, we re-check timelocks on all mempool entries using CheckSequenceLocks(useExistingLockPoints=false) and remove any now-invalid entries. CheckSequenceLocks() also mutates the LockPoints passed in, and we update valid entries’ LockPoints using update_lock_points. Thus, update_lock_points(lp) needs to be called right after CheckSequenceLocks(lp), otherwise we lose the data in lp. I incorrectly assumed they could be called in separate loops.

    The incorrect behavior introduced is: if we have a reorg in which a timelocked mempool transaction is still valid but becomes valid at a different block, the cached LockPoints will be incorrect.

    This PR fixes the bug, adds a test, and adds an assertion at the end of removeForReorg() to check that all mempool entries’ lockpoints are valid. You can reproduce the bug by running the test added in the [test] commit on the code before the [bugfix] commit.
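
    To make the failure mode concrete, here is a minimal, self-contained C++ sketch (hypothetical Entry and RecheckStub names, not Bitcoin Core code) of why the write-back must happen in the same loop iteration that recomputes lp:

     #include <cassert>
     #include <vector>

     struct LockPoints { int height{0}; };
     struct Entry { LockPoints lp; };

     // Stand-in for CheckSequenceLocks(useExistingLockPoints=false): it writes
     // freshly computed lockpoints into the caller's lp and reports validity.
     bool RecheckStub(LockPoints& lp, int first_valid_height)
     {
         lp.height = first_valid_height;
         return true;
     }

     int main()
     {
         std::vector<Entry> mempool{{LockPoints{100}}};

         // Buggy shape (two separate loops): the recomputed lp is a local and
         // is destroyed before a second loop could write it back to the entry.
         for (Entry& e : mempool) {
             LockPoints lp = e.lp;
             RecheckStub(lp, 105); // lp now holds the post-reorg lockpoints...
         }                         // ...and is lost here
         assert(mempool[0].lp.height == 100); // entry kept the stale value

         // Fixed shape: write lp back in the same iteration it was recomputed.
         for (Entry& e : mempool) {
             LockPoints lp = e.lp;
             if (RecheckStub(lp, 105)) e.lp = lp;
         }
         assert(mempool[0].lp.height == 105);
     }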

  2. glozow commented at 11:27 am on December 6, 2021: member
    cc @MarcoFalke, thanks for catching my bug :bug:
  3. in test/functional/mempool_reorg.py:109 in 48d1b46f3e outdated
    105@@ -106,8 +106,45 @@ def run_test(self):
    106 
    107         self.log.info("Check that the mempool is empty")
    108         assert_equal(set(self.nodes[0].getrawmempool()), set())
    109+        assert_equal(set(self.nodes[1].getrawmempool()), set())
    


    MarcoFalke commented at 11:39 am on December 6, 2021:
    nit: Why is this needed? The self.sync_all() in the next line should take care of that already. I think it should be fine to remove either this line or the next.

    glozow commented at 12:33 pm on December 6, 2021:
    removed
  4. in test/functional/mempool_reorg.py:112 in 48d1b46f3e outdated
    105@@ -106,8 +106,45 @@ def run_test(self):
    106 
    107         self.log.info("Check that the mempool is empty")
    108         assert_equal(set(self.nodes[0].getrawmempool()), set())
    109+        assert_equal(set(self.nodes[1].getrawmempool()), set())
    110         self.sync_all()
    111 
    112 
    113+        self.log.info("Simulate natural reorg")
    114+        self.connect_nodes(0, 1)
    


    MarcoFalke commented at 11:40 am on December 6, 2021:
    nit: Why is this needed? Are they not connected already?

    glozow commented at 12:32 pm on December 6, 2021:
    They don’t stay in sync if I don’t call connect_nodes here… it also doesn’t work if I call connect_nodes(1, 0) instead. So my guess is that this creation of a manual connection from node0 to node1 adds a permission that we need.

    MarcoFalke commented at 2:24 pm on December 6, 2021:
    Ah ok, it is not possible to use compact blocks during IBD
  5. in test/functional/mempool_reorg.py:116 in 48d1b46f3e outdated
    111 
    112 
    113+        self.log.info("Simulate natural reorg")
    114+        self.connect_nodes(0, 1)
    115+        assert_equal(self.nodes[0].getblockcount(), self.nodes[1].getblockcount())
    116+        assert_equal(set(self.nodes[0].getrawmempool()), set(self.nodes[0].getrawmempool()))
    


    MarcoFalke commented at 11:41 am on December 6, 2021:
    nit: Why are the asserts needed? The implicit self.sync_all in the next line should assert this already.

    glozow commented at 12:32 pm on December 6, 2021:
    removed
  6. in test/functional/mempool_reorg.py:122 in 48d1b46f3e outdated
    117+        self.generate(self.nodes[0], 3)
    118+        # Create another transaction which is time-locked to two blocks in the future
    119+        utxo = wallet.get_utxo()
    120+        timelock_tx_2 = wallet.create_self_transfer(
    121+            from_node=self.nodes[0],
    122+            utxo_to_spend=utxo,
    


    MarcoFalke commented at 11:41 am on December 6, 2021:
    nit: Why is this needed? The wallet will pick a utxo by itself.

    glozow commented at 12:32 pm on December 6, 2021:
    removed
  7. in test/functional/mempool_reorg.py:133 in 48d1b46f3e outdated
    128+        assert_raises_rpc_error(-26, "non-final", self.nodes[1].sendrawtransaction, timelock_tx_2)
    129+
    130+        self.log.info("Disconnect nodes and mine separate forks")
    131+        self.disconnect_nodes(0, 1)
    132+        # Don't try to sync nodes because they are disconnected
    133+        no_sync_func = lambda: True
    


    MarcoFalke commented at 11:42 am on December 6, 2021:
    nit: Why is this needed? You can use self.no_op

    glozow commented at 12:33 pm on December 6, 2021:
    ahh, didn’t know about that - fixed
  8. MarcoFalke commented at 11:43 am on December 6, 2021: member
    Concept ACK. Left some questions on the test.
  9. glozow force-pushed on Dec 6, 2021
  10. MarcoFalke added this to the milestone 23.0 on Dec 6, 2021
  11. glozow added the label Bug on Dec 6, 2021
  12. glozow added the label Mempool on Dec 6, 2021
  13. glozow added the label Tests on Dec 6, 2021
  14. in src/txmempool.h:316 in 034f13a14a outdated
    311@@ -312,6 +312,19 @@ class CompareTxMemPoolEntryByAncestorFee
    312     }
    313 };
    314 
    315+/**
    316+ * Update LockPoints in a mempool entry for which the caller only has a const iterator to.
    


    MarcoFalke commented at 1:38 pm on December 6, 2021:
    I think this comment can be removed. Mutable iterators in the mempool don’t exist and the function name is self-explanatory.

    glozow commented at 3:32 pm on December 6, 2021:
    removed
  15. in src/validation.cpp:378 in f1f3f19f84 outdated
    377@@ -378,6 +378,8 @@ void CChainState::MaybeUpdateMempoolForReorg(
    378                 }
    


    MarcoFalke commented at 1:41 pm on December 6, 2021:
    in the second commit: Maybe mention that the bug was introduced in commit bedf246?

    glozow commented at 3:32 pm on December 6, 2021:
    added
  16. in test/functional/mempool_reorg.py:119 in bd716bf774 outdated
    114+        self.generate(self.nodes[0], 2)
    115+        # Create another transaction which is time-locked to two blocks in the future
    116+        timelock_tx_2 = wallet.create_self_transfer(
    117+            from_node=self.nodes[0],
    118+            mempool_valid=False,
    119+            locktime=self.nodes[0].getblockcount() + 2
    


    MarcoFalke commented at 2:07 pm on December 6, 2021:
    nit: missing trailing ,.

    glozow commented at 3:32 pm on December 6, 2021:
    added
  17. MarcoFalke approved
  18. MarcoFalke commented at 2:56 pm on December 6, 2021: member

    Too bad this can’t properly be tested on regtest. I guess the only way is via a pre-mined main-chain?

    Concept ACK bd716bf774d21956ec7ae616d06a2a896568ed18 🙆

    Signature:

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA512

    Concept ACK bd716bf774d21956ec7ae616d06a2a896568ed18 🙆
    -----BEGIN PGP SIGNATURE-----

    iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
    pUjr2gwAqMWws2MpZHhtGe0UDOPIk0PKWrc/Xbg991hVwlMwmHv2ZdgTm0aeYa15
    zI0/9PVfKF7P1mTn5iWjJGaoAzsWPyRQAO7imfKwiYOUQnqBBP8s4qpjZxAyE+y4
    n9CtBkoDxJd6FBY3/PaC+A/TQCXqrfoefLXMqkCjkFl5aF85YRyAvB69xpKDwMp0
    KJLqSga6ThHVkHJOXPoD09yjCEOkT/2K7z2XMTyzGMkzmTBJnhBaLd50Lfy31Ioj
    EQz4G7wEskfJkQBATIDzvy0nbwbH38hL30GEVolasAYLUdlxnRQr1QnZEnEUEoV1
    MrdzGtQYuBIUyEoaqJyvL/WVb+WEPy1XL/R3VmHf8vWbhAJK2W8RkGW1auE7WZwP
    W8HubXwO/dqmXqGd2A9QUaRN1RtYRKL7QwA5ffym++tJQRtpqT4/htuFH4BaIfv/
    +Z4d4NPkP8ySa1fhwWvMvLf2JM+5V9572LPInHka1B7t9z9DRjVCN3fb/nMOBzUp
    A2NkaSsm
    =9LGj
    -----END PGP SIGNATURE-----
    
  19. MOVEONLY: update_lock_points to txmempool.h b6002b07a3
  20. [bugfix] update lockpoints correctly during reorg
    During a reorg, we re-check timelocks on all mempool entries using
    CheckSequenceLocks(useExistingLockPoints=false) and remove any
    now-invalid entries. CheckSequenceLocks() also mutates the LockPoints
    passed in, and we update valid entries' LockPoints using
    update_lock_points. Thus, update_lock_points(lp) needs to be called
    right after CheckSequenceLocks(lp), otherwise we lose the data in lp.
    commit bedf246 introduced a bug by separating those two loops.
    b4adc5ad67
  21. glozow force-pushed on Dec 6, 2021
  22. MarcoFalke removed the label Tests on Dec 7, 2021
  23. in src/validation.cpp:382 in b4adc5ad67
    377@@ -378,6 +378,8 @@ void CChainState::MaybeUpdateMempoolForReorg(
    378                 }
    379             }
    380         }
    381+        // CheckSequenceLocks updates lp. Update the mempool entry LockPoints.
    382+        if (!validLP) m_mempool->mapTx.modify(it, update_lock_points(lp));
    


    naumenkogs commented at 9:21 am on December 7, 2021:

    I was thinking “what if we skip those transactions with should_remove=true here?”. I guess we’ll have the same original problem: they won’t be cleared because we would mistakenly think they have valid lp in removeForReorg?

    Perhaps this deserves a comment extension, saying that txs to be removed must be updated here.


    glozow commented at 9:41 am on December 7, 2021:
    No, they’ll still be cleared - removeForReorg just applies check_final_and_mature on all entries and removes them if it returns true; it doesn’t check whether the lp is valid. I’m not 100% sure if we could skip the should_remove=true ones, but I think it would be fine. The CheckSequenceLocks function doesn’t guarantee lp to be reliable if it returns false, and the function doesn’t even get executed if CheckFinalTx returns false.

    MarcoFalke commented at 10:09 am on December 7, 2021:
    I agree that it is safe to skip if the tx will be removed anyway. When this was introduced in commit 14d6324a248df50cb79fbeb5b60a978687a3b64e, it updated unconditionally.

    vasild commented at 4:24 pm on December 7, 2021:

    I was just about to post the same optimization suggestion: change this to if (!validLP && !should_remove).

    I guess, we’ll have the same original problem: they won’t be cleared because we would mistakenly think they have valid lp in removeForReorg

    I don’t see it that way - the tx will be removed if this lambda check_final_and_mature() returns true, so why bother updating that tx data? (I see no reason).


    vasild commented at 8:26 am on December 8, 2021:
    After sleeping on this, I now think it is better to leave it as is - just if (!validLP) - because skipping the update for the to-be-removed transactions would spare some CPU cycles but would increase the coupling between check_final_and_mature() and removeForReorg(): the former would assume the latter will remove the transactions, and the latter would have to be aware that some transactions have invalid lock points in case it decides to do something else before removal. Plus more complex code. Not worth it.

    glozow commented at 12:14 pm on December 9, 2021:

    skipping update for the to-be-removed transactions will spare some CPU cycles but will increase the coupling between check_final_and_mature() and removeForReorg()

    Agreed 👍

  24. naumenkogs commented at 9:53 am on December 7, 2021: member

    utACK 28b60ce3122b29ffad2c60790d3bfe9a2ede2471

    The bug is pretty scary, good thing we found it this soon, and the fix is straightforward. The test looks correct.

  25. MarcoFalke commented at 10:13 am on December 7, 2021: member

    Too bad this can’t properly be tested on regtest. I guess the only way is via a pre-mined main-chain?

    Ah no. I don’t think this is possible to test at all. The lockpoints are just a cache and if it is outdated, evaluating it will always return false, in which case the lockpoints are re-evaluated. So this is not fixing a behaviour bug, but potentially improving performance a bit?

  26. glozow commented at 11:18 am on December 7, 2021: member

    Too bad this can’t properly be tested on regtest. I guess the only way is via a pre-mined main-chain?

    Ah no. I don’t think this is possible to test at all. The lockpoints are just a cache and if it is outdated, evaluating it will always return false, in which case the lockpoints are re-evaluated. So this is not fixing a behaviour bug, but potentially improving performance a bit?

    The “natural reorg to a new chain that is shorter (but has more work)” is probably not possible without significant changes to regtest chainparams. Since this PR adds assertions for the lockpoints, making sure they’re updated is testable. And true, I suppose outdated lockpoints are not that bad, though now I’m wondering whether the assertions are worth adding if the staleness is relatively harmless.

  27. shaavan approved
  28. shaavan commented at 2:49 pm on December 7, 2021: contributor

    ACK 28b60ce3122b29ffad2c60790d3bfe9a2ede2471

    The updates look good to me, and they seem to fix the bug introduced in #22677.

    I was able to run the test successfully on the master branch. Thanks for fixing this quickly, @glozow!

    p.s. While running all the tests on the PR branch using test/functional/test_runner.py, the following tests failed:

    feature_coinstatsindex.py                          | ✖ Failed  | 2 s
    feature_rbf.py --legacy-wallet                     | ✖ Failed  | 2 s
    interface_bitcoin_cli.py                           | ✖ Failed  | 3 s
    wallet_transactiontime_rescan.py                   | ✖ Failed  | 6 s
    

    Though it is probably not related to this PR, I still wanted to bring it to other contributors’ attention. I shall also try running these tests on the master branch to check if the error persists.

    Update: The same tests fail on master too (with top commit d20d6ac545159fb98bd9ac828678ddf20f5d016d). I am not sure what the reason is, though.

  29. MarcoFalke commented at 2:52 pm on December 7, 2021: member

    Since this PR adds assertions for the lockpoints

    I was hoping there was a way to test this on current master without modifying the code or looking at the process internal memory, but yeah…

  30. in src/txmempool.cpp:642 in 28b60ce312 outdated
    638@@ -649,10 +639,7 @@ void CTxMemPool::removeForReorg(CChain& chain, std::function<bool(txiter)> check
    639     }
    640     RemoveStaged(setAllRemoves, false, MemPoolRemovalReason::REORG);
    641     for (indexed_transaction_set::const_iterator it = mapTx.begin(); it != mapTx.end(); it++) {
    642-        const LockPoints lp{it->GetLockPoints()};
    643-        if (!TestLockPointValidity(chain, lp)) {
    644-            mapTx.modify(it, update_lock_points(lp));
    645-        }
    646+        assert(TestLockPointValidity(chain, it->GetLockPoints()));
    


    vasild commented at 4:31 pm on December 7, 2021:

    Is it not better to add this assert as soon as the data is updated? That is, something like this:

    diff --git i/src/txmempool.cpp w/src/txmempool.cpp
    index 26cb2ff4b4..8149c6dcd7 100644
    --- i/src/txmempool.cpp
    +++ w/src/txmempool.cpp
    @@ -635,15 +635,12 @@ void CTxMemPool::removeForReorg(CChain& chain, std::function<bool(txiter)> check
         }
         setEntries setAllRemoves;
         for (txiter it : txToRemove) {
             CalculateDescendants(it, setAllRemoves);
         }
         RemoveStaged(setAllRemoves, false, MemPoolRemovalReason::REORG);
    -    for (indexed_transaction_set::const_iterator it = mapTx.begin(); it != mapTx.end(); it++) {
    -        assert(TestLockPointValidity(chain, it->GetLockPoints()));
    -    }
     }
     
     void CTxMemPool::removeConflicts(const CTransaction &tx)
     {
         // Remove transactions which depend on inputs of tx, recursively
         AssertLockHeld(cs);
    diff --git i/src/validation.cpp w/src/validation.cpp
    index 68729e4863..541b7be499 100644
    --- i/src/validation.cpp
    +++ w/src/validation.cpp
    @@ -377,12 +377,15 @@ void CChainState::MaybeUpdateMempoolForReorg(
                         break;
                     }
                 }
             }
             // CheckSequenceLocks updates lp. Update the mempool entry LockPoints.
             if (!validLP) m_mempool->mapTx.modify(it, update_lock_points(lp));
    +        if (!should_remove) {
    +            assert(TestLockPointValidity(m_chain, it->GetLockPoints()));
    +        }
             return should_remove;
         };
     
         // We also need to remove any now-immature transactions
         m_mempool->removeForReorg(m_chain, check_final_and_mature);
         // Re-limit mempool size, in case we added any transactions
    

    MarcoFalke commented at 5:05 pm on December 7, 2021:
    Seems better to run it on all remaining txs unconditionally before returning as a sanity check.

    glozow commented at 12:18 pm on December 9, 2021:
    I agree it seems like a better sanity check to run at the end of removeForReorg, once all of the entries have been processed
  31. in test/functional/mempool_reorg.py:113 in 28b60ce312 outdated
    107@@ -108,6 +108,37 @@ def run_test(self):
    108         assert_equal(set(self.nodes[0].getrawmempool()), set())
    109         self.sync_all()
    110 
    111+        self.log.info("Simulate a natural reorg where mempool transaction stays valid but on a different block")
    112+        self.connect_nodes(0, 1)
    113+        # Mine a block to create mature coinbase
    


    vasild commented at 4:35 pm on December 7, 2021:

    This test also passes if the fix is missing. Is this expected? If yes, then what is the point of the test? I think tests should fail if the bug resurfaces.

    On top of this PR:

    • git show b4adc5ad67 |git apply -R
    • recompile
    • run test/functional/mempool_reorg.py - it passes

    MarcoFalke commented at 5:04 pm on December 7, 2021:
    I don’t think it is possible to test this without modifying the source code (e.g. adding an assert) or inspecting the process internal memory.

    vasild commented at 8:36 am on December 8, 2021:
    So, what is the purpose of this test? It should be removed?

    MarcoFalke commented at 8:37 am on December 8, 2021:
    I think the goal is to do a reorg in this test without using invalidateblock

    glozow commented at 9:07 am on December 8, 2021:
    The test is supposed to fail without the bugfix if the asserts are there

    MarcoFalke commented at 9:17 am on December 8, 2021:
    I recall adding the assert and seeing this and one other test fail already

    vasild commented at 10:05 am on December 8, 2021:
    What assert? Something extra, that is not in the PR?

    MarcoFalke commented at 10:33 am on December 8, 2021:

    Yes, assert.

    As mentioned previously it is not possible to test this without modifying the source code or looking at internal memory.

    Adding an assert is one way to modify the source code. Another way would be to return the struct over rpc, or log it to the debug log, or …



    glozow commented at 12:17 pm on December 9, 2021:
    In a previous version of this PR I think I had added the assert in a separate commit from the bugfix.

    vasild commented at 1:50 pm on December 9, 2021:

    I see (which assert). So I took latest master (7629efcc2c3a8a8a7c17b1300cd466ec6c8c1f3f) and applied this patch:

    --- i/src/txmempool.cpp
    +++ w/src/txmempool.cpp
    @@ -653,6 +653,7 @@ void CTxMemPool::removeForReorg(CChain& chain, std::function<bool(txiter)> check
             if (!TestLockPointValidity(chain, lp)) {
                 mapTx.modify(it, update_lock_points(lp));
             }
    +        assert(TestLockPointValidity(chain, lp));
         }
     }
    

    test/functional/mempool_reorg.py then fails because the added assert is hit. This is unmodified mempool_reorg.py, without the extension from this PR. If the bug can be reproduced by adding an assert + unmodified mempool_reorg.py, then why bother to change (extend) the test?

    I think the goal is to do a reorg in this test without using invalidateblock

    Why bother if the test fails even without that?


    MarcoFalke commented at 2:05 pm on December 9, 2021:
    reorgs in the real network happen without invalidateblock, so I think we should attempt to not use the call

    glozow commented at 3:38 pm on December 9, 2021:
    The new test is not strictly necessary in order to hit this, but it’s not pointless. We didn’t have any natural reorgs. We used invalidateblock as a shortcut to do it manually. I can remove it if you insist, but I don’t really see why it’s problematic?

    vasild commented at 10:19 am on December 10, 2021:

    Alright, I agree that for the test the natural reorg is better than invalidateblock. But the change in this PR just adds a natural reorg, without removing the invalidateblock one. Shouldn’t it replace invalidateblock with a natural reorg?

    I checked the code coverage with and without the extension of the test from this PR (28b60ce3122b29ffad2c60790d3bfe9a2ede2471) and it is the same. In other words, the newly added snippet does not extend the coverage. It exercises code that has already been exercised by preceding parts of mempool_reorg.py.

    # compile
    F="-fprofile-instr-generate -fcoverage-mapping" CXXFLAGS=${F} LDFLAGS=${F} ./configure and make as usual

    # run test and gather coverage data
    LLVM_PROFILE_FILE="/tmp/coverage1/%m-%p.profraw" test/functional/mempool_reorg.py
    llvm-profdata13 merge /tmp/coverage1/*.profraw -o /tmp/coverage1/all.profdata

    # remove topmost commit (test extension)
    git show |git apply -R

    # run test and gather coverage data
    LLVM_PROFILE_FILE="/tmp/coverage2/%m-%p.profraw" test/functional/mempool_reorg.py
    llvm-profdata13 merge /tmp/coverage2/*.profraw -o /tmp/coverage2/all.profdata

    # compare the coverage (both are identical)
    md5 -r /tmp/coverage*/all.profdata
    14196d89227bc6028834aeff32b79ff5 /tmp/coverage1/all.profdata
    14196d89227bc6028834aeff32b79ff5 /tmp/coverage2/all.profdata
    

    I don’t insist on removing the test snippet. I am just trying to understand the point of it. So far, with my understanding, I am ~0 on it because I fail to see what value it adds. If the invalidateblock reorg is removed and replaced by a natural reorg without reducing code coverage, then that would be a definite +1.

  32. vasild commented at 4:39 pm on December 7, 2021: member

    Concept ACK 28b60ce3122b29ffad2c60790d3bfe9a2ede2471

    I verified that this PR reverts bedf246f1e2497a3641093c6e8fa11fb34dddac4 modulo some code reorganization that happened in the meantime. Also, it adds an assert that was not present before and a test (that always passes?).

  33. fanquake deleted a comment on Dec 7, 2021
  34. ariard commented at 2:09 am on December 8, 2021: member

    Concept ACK a64078e

    Ah no. I don’t think this is possible to test at all. The lockpoints are just a cache and if it is outdated, evaluating it will always return false, in which case the lockpoints are re-evaluated. So this is not fixing a behaviour bug, but potentially improving performance a bit?

    Yes, I don’t think this is a bug introducing invalid transactions in the mempool. Just a perf regression.

    If you have a reorg from chain H to chain H’’ and the lockpoints are no longer valid on H’’, the cache is not going to be updated accordingly.

    Let’s say you have one more reorg from chain H’’ to chain H’’’. If H’’’ includes the max CBlockIndex (tested in TestLockPointValidity), the lockpoints are valid again. If H’’’ does not include the max CBlockIndex, the lockpoints are going to be recomputed by CheckSequenceLocks to determine if the transaction is final.

    So if this reasoning is correct, I also agree that caching lockpoints is a perf improvement, in the sense that they save a few coin fetches and block-index walks?
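
    A toy model of that fail-closed behavior (hypothetical names; the real TestLockPointValidity checks whether lp's max CBlockIndex is still contained in the active chain, not a vector of strings):

     #include <algorithm>
     #include <cassert>
     #include <string>
     #include <vector>

     // The cached lockpoints remember the block they were computed against,
     // mirroring the max CBlockIndex discussed above.
     struct LockPoints { std::string max_input_block; };

     // Valid only while that block is still on the active chain.
     bool TestLockPointValidityStub(const std::vector<std::string>& chain, const LockPoints& lp)
     {
         return std::find(chain.begin(), chain.end(), lp.max_input_block) != chain.end();
     }

     int main()
     {
         std::vector<std::string> chain{"A", "B", "C"};
         LockPoints lp{"C"};
         assert(TestLockPointValidityStub(chain, lp));

         // Reorg from ...C to ...C'-D': the cached entry is stale, the check
         // fails closed, and the caller falls back to recomputing the
         // lockpoints (via CheckSequenceLocks in the real code).
         chain = {"A", "B", "C'", "D'"};
         assert(!TestLockPointValidityStub(chain, lp));
     }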

  35. vasild commented at 8:37 am on December 8, 2021: member

    cc @MarcoFalke, thanks for catching my bug

    How did you catch this? It looks pretty obscure to me.

  36. MarcoFalke commented at 8:46 am on December 8, 2021: member

    cc @MarcoFalke, thanks for catching my bug

    How did you catch this? It looks pretty obscure to me.

    Through code review, basically. Though it was a bit tricky, because there was never an externally observable logic bug, at most a performance difference. For a few hours I even thought the code could be removed (#23660) :sweat_smile:

  37. jnewbery commented at 10:56 am on December 9, 2021: member

    Change looks good to me. I agree it fixes the lockpoint invalidity bug introduced in #22677.

    A few suggestions for the check_final_and_mature lambda:

    • A contributing factor to the bug being introduced was misleading naming (of update_lock_points()) and a lack of documentation. Therefore, we should be extra explicit about what’s going on in these functions by adding plenty of code documentation.
    • the return value is the inverse of what I’d expect. I’d expect a check_blah() function to return true if the check passes and false if it fails.
    • the local should_remove variable seems unnecessary. The function can just return immediately if one of the checks fails.

    I’ve included a suggested patch below. Feel free to take whatever you think is an improvement.

    diff --git a/src/txmempool.cpp b/src/txmempool.cpp
    index 26cb2ff4b4..c742548875 100644
    --- a/src/txmempool.cpp
    +++ b/src/txmempool.cpp
    @@ -631,7 +631,7 @@ void CTxMemPool::removeForReorg(CChain& chain, std::function<bool(txiter)> check
     
         setEntries txToRemove;
         for (indexed_transaction_set::const_iterator it = mapTx.begin(); it != mapTx.end(); it++) {
    -        if (check_final_and_mature(it)) txToRemove.insert(it);
    +        if (!check_final_and_mature(it)) txToRemove.insert(it);
         }
         setEntries setAllRemoves;
         for (txiter it : txToRemove) {
    diff --git a/src/validation.cpp b/src/validation.cpp
    index f53641e786..8da2e80510 100644
    --- a/src/validation.cpp
    +++ b/src/validation.cpp
    @@ -350,21 +350,38 @@ void CChainState::MaybeUpdateMempoolForReorg(
         // the disconnectpool that were added back and cleans up the mempool state.
         m_mempool->UpdateTransactionsFromBlock(vHashUpdate);
     
    -    const auto check_final_and_mature = [this, flags=STANDARD_LOCKTIME_VERIFY_FLAGS](CTxMemPool::txiter it)
    +    // Check whether the transaction is still final/mature, and update
    +    // the transaction's cached lockpoints if necessary.
    +    // If this returns false, the mempool *must* remove the transaction, since
    +    // it is no longer valid for inclusion in the next block (and its cached
    +    // lockpoints may be invalid).
    +    const auto check_final_and_mature = [this, flags=STANDARD_LOCKTIME_VERIFY_FLAGS](CTxMemPool::txiter it) -> bool
             EXCLUSIVE_LOCKS_REQUIRED(m_mempool->cs, ::cs_main) {
    -        bool should_remove = false;
             AssertLockHeld(m_mempool->cs);
             AssertLockHeld(::cs_main);
             const CTransaction& tx = it->GetTx();
    +
    +        // Check transaction finality
    +        if (!CheckFinalTx(m_chain.Tip(), tx, flags)) {
    +            return false;
    +        }
    +
    +        // Check sequence locks
             LockPoints lp = it->GetLockPoints();
             const bool validLP{TestLockPointValidity(m_chain, lp)};
             CCoinsViewMemPool view_mempool(&CoinsTip(), *m_mempool);
    -        if (!CheckFinalTx(m_chain.Tip(), tx, flags)
    -            || !CheckSequenceLocks(m_chain.Tip(), view_mempool, tx, flags, &lp, validLP)) {
    -            // Note if CheckSequenceLocks fails the LockPoints may still be invalid
    -            // So it's critical that we remove the tx and not depend on the LockPoints.
    -            should_remove = true;
    -        } else if (it->GetSpendsCoinbase()) {
    +
    +        if (!CheckSequenceLocks(m_chain.Tip(), view_mempool, tx, flags, &lp, validLP)) {
    +            // Sequence locks fail. The mempool *must* remove this transaction since
    +            // we're not going to update its cached lockpoints.
    +            return false;
    +        } else if (!validLP) {
    +            // CheckSequenceLocks has modified lp. Update the cached lockpoints.
    +             m_mempool->mapTx.modify(it, update_lock_points(lp));
    +        }
    +
    +        // Check coinbase spend maturity
    +        if (it->GetSpendsCoinbase()) {
                 for (const CTxIn& txin : tx.vin) {
                     auto it2 = m_mempool->mapTx.find(txin.prevout.hash);
                     if (it2 != m_mempool->mapTx.end())
    @@ -373,18 +390,19 @@ void CChainState::MaybeUpdateMempoolForReorg(
                     assert(!coin.IsSpent());
                     const auto mempool_spend_height{m_chain.Tip()->nHeight + 1};
                     if (coin.IsSpent() || (coin.IsCoinBase() && mempool_spend_height - coin.nHeight < COINBASE_MATURITY)) {
    -                    should_remove = true;
    -                    break;
    +                    // Coinbase spend is now immature. Remove tx from mempool.
    +                    return false;
                     }
                 }
             }
    -        // CheckSequenceLocks updates lp. Update the mempool entry LockPoints.
    -        if (!validLP) m_mempool->mapTx.modify(it, update_lock_points(lp));
    -        return should_remove;
    +
    +        // Transaction is still final and mature. Cached lockpoints have been updated.
    +        return true;
         };
     
    -    // We also need to remove any now-immature transactions
    +    // Remove any transactions that are no longer final/mature, and update cached lockpoints.
         m_mempool->removeForReorg(m_chain, check_final_and_mature);
    +
         // Re-limit mempool size, in case we added any transactions
         LimitMempoolSize(
             *m_mempool,
    
  38. vasild approved
  39. vasild commented at 10:28 am on December 10, 2021: member

    ACK b4adc5ad6769e4a5a6179dfff271cd4c9dc47a5b

    Notice - the ACKed commit is the HEAD~ of this PR, i.e. without the topmost commit, which adds a test. I do not think the test is harmful and I do not object to it. But I don’t see what value it adds because:

    • It passes with and without the fix
    • If an assert is added to the code to make the test fail, then mempool_reorg.py fails even without the extension added in this PR
    • The extension from this PR does not increase the code coverage

    This is not an objection to the added test snippet, rather ~0 on it. I am happy if it gathers enough ACKs from other devs and gets merged.

  40. glozow force-pushed on Dec 14, 2021
  41. glozow commented at 12:51 pm on December 14, 2021: member

    Notice - the ACKed commit is the HEAD~ of this PR, without the topmost commit which adds a test which I do not think is harmful and I do not object. But I don’t see what value it adds

    I’ve removed the test. Perhaps a followup dedicated to test improvements can convert the invalidateblocks to natural reorgs.

    A few suggestions for the check_final_and_mature lambda:

    Thanks, I’ve applied (most of) your diff. I see what you mean by the return results being unexpected for something called “check.” My wish is to follow the convention of a unary predicate that returns true if something needs to be removed. I’ve renamed check to filter.
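
    (That is the same convention as std::remove_if / std::erase_if: the unary predicate returns true for elements that should go. A tiny standalone illustration, unrelated to the PR's code:)

     #include <algorithm>
     #include <cassert>
     #include <vector>

     int main()
     {
         std::vector<int> v{1, 2, 3, 4};
         // Unary predicate returning true for elements to remove - the
         // convention the renamed filter lambda follows.
         v.erase(std::remove_if(v.begin(), v.end(), [](int x) { return x % 2 == 0; }), v.end());
         assert(v == (std::vector<int>{1, 3}));
     }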

  42. glozow force-pushed on Dec 14, 2021
  43. in src/txmempool.cpp:641 in 2586226ec2 outdated
    638@@ -649,10 +639,7 @@ void CTxMemPool::removeForReorg(CChain& chain, std::function<bool(txiter)> check
    639     }
    640     RemoveStaged(setAllRemoves, false, MemPoolRemovalReason::REORG);
    641     for (indexed_transaction_set::const_iterator it = mapTx.begin(); it != mapTx.end(); it++) {
    


    vasild commented at 4:50 pm on December 14, 2021:

    This for loop is removed in commit [bugfix] update lockpoints during reorg only to be re-added later in commit [mempool] check that all LockPoints are valid reorg removals. I think the latter should be squashed into the former. It was like this in a previous incarnation of this PR, and it made sense to me because the added assert is logically related to the rest of the changes:

         for (indexed_transaction_set::const_iterator it = mapTx.begin(); it != mapTx.end(); it++) {
    -        const LockPoints lp{it->GetLockPoints()};
    -        if (!TestLockPointValidity(chain, lp)) {
    -            mapTx.modify(it, update_lock_points(lp));
    -        }
    +        assert(TestLockPointValidity(chain, it->GetLockPoints()));
         }
    

    glozow commented at 5:45 pm on December 16, 2021:
    reverted back to original, squashed commit
  44. in src/validation.cpp:370 in 2586226ec2 outdated
    373+        if (!CheckFinalTx(m_chain.Tip(), tx, flags)) return true;
    374+        // Note if CheckSequenceLocks fails the LockPoints may still be invalid
    375+        // So it's critical that we remove the tx and not depend on the LockPoints.
    376+        if (!CheckSequenceLocks(m_chain.Tip(), view_mempool, tx, flags, &lp, validLP)) {
    377+            return true;
    378+        } else if (!validLP) {
    


    vasild commented at 5:07 pm on December 14, 2021:

    nit: else is not necessary after return. May as well remove it also for consistency with the above if (!CheckFinalTx... (which does not include else after return).

            }
            if (!validLP) {
    

    jnewbery commented at 7:11 am on December 21, 2021:
    The !validLP check here is logically associated with the !CheckSequenceLocks() check, so it makes sense to have it in an else branch. Logically the two things are the same since the if branch contains an early return, but structuring it as if/else signals to the developer that the two things are connected.
  45. in src/validation.cpp:372 in 2586226ec2 outdated
    375+        // So it's critical that we remove the tx and not depend on the LockPoints.
    376+        if (!CheckSequenceLocks(m_chain.Tip(), view_mempool, tx, flags, &lp, validLP)) {
    377+            return true;
    378+        } else if (!validLP) {
    379+            // CheckSequenceLocks updates lp. Update the mempool entry LockPoints.
    380+            m_mempool->mapTx.modify(it, update_lock_points(lp));
    


    vasild commented at 5:11 pm on December 14, 2021:
    In a previous incarnation of this PR, this modify() would have been executed regardless of the outcome of CheckFinalTx() and CheckSequenceLocks(). Not so anymore. From #23683 (review) I got the impression that it is better to update for all transactions, even the to-be-removed ones.
  46. in src/validation.cpp:375 in 2586226ec2 outdated
    378+        } else if (!validLP) {
    379+            // CheckSequenceLocks updates lp. Update the mempool entry LockPoints.
    380+            m_mempool->mapTx.modify(it, update_lock_points(lp));
    381+        }
    382+        // If the transaction spends any coinbase outputs, check maturity.
    383+        if (it->GetSpendsCoinbase()) {
    


    vasild commented at 5:14 pm on December 14, 2021:
    Previously, this check and the body of the if would have been executed before updating the lock points. Now the order is reversed. Could the updated lock points change the outcome of any of this code, e.g. the mapTx.find() call?
  47. in src/validation.cpp:354 in 2586226ec2 outdated
    349@@ -350,21 +350,29 @@ void CChainState::MaybeUpdateMempoolForReorg(
    350     // the disconnectpool that were added back and cleans up the mempool state.
    351     m_mempool->UpdateTransactionsFromBlock(vHashUpdate);
    352 
    353-    const auto check_final_and_mature = [this, flags=STANDARD_LOCKTIME_VERIFY_FLAGS](CTxMemPool::txiter it)
    354+    // Check whether the transaction is still final and, if it spends a coinbase output, mature.
    355+    // Update the entry's cached LockPoints if needed. If this returns true, the tx would be invalid
    


    vasild commented at 5:17 pm on December 14, 2021:

    Update the entry’s cached LockPoints if needed

    This is a bit vague with the current code. The lock points may or may not be updated if this function returns true and will be updated for sure if it returns false. I think it is better to update them in all cases or change this comment to say “Update the entry’s cached LockPoints if false is returned (and also in some cases if true is returned :disappointed:)”.

  48. glozow force-pushed on Dec 16, 2021
  49. glozow commented at 5:44 pm on December 16, 2021: member
    Sorry for the churn. I had wanted to incorporate the suggestions in #23683 (comment) for the lambda, but I now think it’s best saved for a different PR.
  50. jnewbery commented at 7:14 am on December 21, 2021: member

    Sorry for the churn. I had wanted to incorporate the suggestions in #23683 (comment) for the lambda, but I now think it’s best saved for a different PR.

    I think it’s very important that the comments are updated to clarify what this function is doing. A bug was introduced in this code because the code was unclear, so we should take the opportunity to clarify it to prevent a similar regression being added again.

    This branch does fix the recent regression, but I think it’s significantly worse than the previous version since good tests and clarifying code comments have been removed.

  51. mzumsande commented at 7:16 pm on December 27, 2021: member

    Code Review ACK b4adc5ad6769e4a5a6179dfff271cd4c9dc47a5b

    I agree that this fixes the “bug”, and verified that mempool_reorg.py fails if only the assert from b4adc5ad6769e4a5a6179dfff271cd4c9dc47a5b is added but not the fix. Can’t comment on any of the older versions/tests - I think that the refactoring/doc proposed by @jnewbery looks sensible, but could also be a follow-up PR.

    Commit message nit:

    During a reorg, we re-check timelocks on all mempool entries using CheckSequenceLocks(useExistingLockPoints=false) and remove any now-invalid entries.

    That sounds as if we call CheckSequenceLocks(useExistingLockPoints=false) for all entries, but we call TestLockPointValidity() for all mempool entries, and call CheckSequenceLocks(useExistingLockPoints=false) only for those with invalid lockpoints.

  52. hebasto commented at 11:23 am on December 28, 2021: member
    Concept ACK.
  53. in src/txmempool.cpp:655 in b4adc5ad67
    638@@ -649,10 +639,7 @@ void CTxMemPool::removeForReorg(CChain& chain, std::function<bool(txiter)> check
    639     }
    640     RemoveStaged(setAllRemoves, false, MemPoolRemovalReason::REORG);
    641     for (indexed_transaction_set::const_iterator it = mapTx.begin(); it != mapTx.end(); it++) {
    642-        const LockPoints lp{it->GetLockPoints()};
    643-        if (!TestLockPointValidity(chain, lp)) {
    644-            mapTx.modify(it, update_lock_points(lp));
    645-        }
    


    hebasto commented at 1:33 pm on December 29, 2021:
    The removed code is a no-op, actually. After the if statement, it->GetLockPoints() returns the same value as lp.
  54. in src/txmempool.h:324 in b4adc5ad67
    319+    void operator() (CTxMemPoolEntry &e) { e.UpdateLockPoints(lp); }
    320+
    321+private:
    322+    const LockPoints& lp;
    323+};
    324+
    


    hebasto commented at 1:36 pm on December 29, 2021:

    Maybe drop update_lock_points altogether:

    diff --git a/src/txmempool.h b/src/txmempool.h
    index ba1d381a5..f87ecc9cd 100644
    --- a/src/txmempool.h
    +++ b/src/txmempool.h
    @@ -312,16 +312,6 @@ public:
         }
     };
     
    -struct update_lock_points
    -{
    -    explicit update_lock_points(const LockPoints& _lp) : lp(_lp) { }
    -
    -    void operator() (CTxMemPoolEntry &e) { e.UpdateLockPoints(lp); }
    -
    -private:
    -    const LockPoints& lp;
    -};
    -
     // Multi_index tag names
     struct descendant_score {};
     struct entry_time {};
    diff --git a/src/validation.cpp b/src/validation.cpp
    index 68729e486..a42e1f52a 100644
    --- a/src/validation.cpp
    +++ b/src/validation.cpp
    @@ -379,7 +379,7 @@ void CChainState::MaybeUpdateMempoolForReorg(
                 }
             }
             // CheckSequenceLocks updates lp. Update the mempool entry LockPoints.
    -        if (!validLP) m_mempool->mapTx.modify(it, update_lock_points(lp));
    +        if (!validLP) m_mempool->mapTx.modify(it, [lp](CTxMemPoolEntry& e) { e.UpdateLockPoints(lp); });
             return should_remove;
         };
     
    

    ?
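
    For context on why a callable is needed here at all: mapTx is a boost::multi_index_container, whose iterators only expose const elements, so in-place mutation goes through modify(), which re-indexes the element afterwards. A minimal standalone sketch of that pattern (hypothetical Entry/txid/lock_height names, not the mempool's types):

     #include <boost/multi_index_container.hpp>
     #include <boost/multi_index/member.hpp>
     #include <boost/multi_index/ordered_index.hpp>
     #include <cassert>

     struct Entry {
         int txid;
         int lock_height;
     };

     using EntrySet = boost::multi_index_container<
         Entry,
         boost::multi_index::indexed_by<boost::multi_index::ordered_unique<
             boost::multi_index::member<Entry, int, &Entry::txid>>>>;

     int main()
     {
         EntrySet entries;
         entries.insert({/*txid=*/1, /*lock_height=*/100});
         auto it = entries.find(1); // dereferences to const Entry&

         // modify() runs the callable on a mutable reference - the slot that
         // update_lock_points (or the lambda above) plugs into.
         const int new_height{105};
         entries.modify(it, [new_height](Entry& e) { e.lock_height = new_height; });
         assert(it->lock_height == 105);
     }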

  55. hebasto approved
  56. hebasto commented at 1:36 pm on December 29, 2021: member
    ACK b4adc5ad6769e4a5a6179dfff271cd4c9dc47a5b
  57. jnewbery commented at 3:58 pm on December 29, 2021: member

    ACK b4adc5ad6769e4a5a6179dfff271cd4c9dc47a5b

    But please do update the comments and add a test in a follow-up.

  58. MarcoFalke approved
  59. MarcoFalke commented at 1:29 pm on January 3, 2022: member

    re-ACK b4adc5ad6769e4a5a6179dfff271cd4c9dc47a5b 🏁

    Only changes:

    • Remove commit that adds test (bd716bf774)
    • Remove useless doxygen comment on struct update_lock_points
    • Expand the commit message of commit b4adc5ad67 “[bugfix] update lockpoints correctly during reorg”

    Signature:

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA512

    re-ACK b4adc5ad6769e4a5a6179dfff271cd4c9dc47a5b 🏁

    Only changes:
    * Remove commit that adds test (bd716bf774)
    * Remove useless doxygen comment on struct update_lock_points
    * Expand the commit message of commit b4adc5ad67 "[bugfix] update lockpoints correctly during reorg"
    -----BEGIN PGP SIGNATURE-----

    iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
    pUjyJgwAiIcxIULHGXbeJdpJrr/yTOIL9za7KmSpiwuFZgB5alGaxZHVwNXz5tMs
    JgREgzSCQyGFl6otzezJKQ4ZwZ3+m6Xkjlop9DGQNgVXlRPs0hAUwePLj0c1a0Y7
    XpJaCS8U3Ovi/av/eTXBaAAVZ/tH/ngqiSo1Y7E41e4c12IGtE4JlgXaaI8+vaC8
    4oF29XugN/FWuoY8WXqXGHwnRyxyhid9QR7uLfmQC+0tcEUW2bu9yDH+hWuFeNs1
    8gu6mb76OuMqRQR/PRA7Jx+vN2WlN9eCua+TFVWvxS9vl06jOOrz2Yw4qC8gisgz
    J5rzAH8y5aMnztO9I2//smKojsxtierk2dIe2BXBGsgoo8GqxsbLocCHuTrvlt56
    eBJuUZLFYc3G5WtxD55cc6WpPGMVJJ/5sfptnYYNiaaca6bIYk1dmBklAQDdeyGg
    GtwDfvZIbYFMU2Aq+cAta59oo7Xtw9tSWS6Kp1sKJOQxQfhB39XBxjgc5qwZ7L1z
    hSX5amzs
    =W5Xo
    -----END PGP SIGNATURE-----
    
  60. MarcoFalke merged this on Jan 3, 2022
  61. MarcoFalke closed this on Jan 3, 2022

  62. sidhujag referenced this in commit f76bc09f0e on Jan 4, 2022
  63. glozow deleted the branch on Jan 4, 2022
  64. MarcoFalke referenced this in commit 06b6369766 on Jan 19, 2022
  65. Fabcien referenced this in commit 848f2f48c2 on Nov 10, 2022
  66. Fabcien referenced this in commit 3b8d42aff6 on Nov 10, 2022
  67. Fabcien referenced this in commit 096a1319b8 on Nov 10, 2022
  68. glozow referenced this in commit a8080c0def on Feb 28, 2023
  69. DrahtBot locked this on Mar 21, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 13:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me