validation: Add missing mempool locks

MarcoFalke commented at 9:15 pm on September 10, 2018: member

Take the mempool read lock during reorgs, so that we don’t accidentally read an inconsistent mempool.

MarcoFalke added the label Validation on Sep 10, 2018

in src/validation.cpp:2689 in fa3e0471a8 outdated

2685@@ -2686,6 +2686,7 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
2686 
2687         {
2688             LOCK(cs_main);
2689+            LOCK(::mempool.cs);

practicalswift commented at 9:34 pm on September 10, 2018:

Nit: LOCK2(cs_main, ::mempool.cs);

ryanofsky commented at 9:07 pm on September 12, 2018:

What’s the reason for acquiring mempool lock at a broad scope instead of more narrowly where the mempool is actually accessed? Would be nice to have comments in the places where this PR adds locking to indicate purpose of locks and reasoning behind placement.

ryanofsky commented at 5:40 pm on November 13, 2018:

In commit “validation: Add missing mempool locks” (dea5dbf2d5ca0177d3a4be665790288bef40e27e)

It’s not clear why the lock is being acquired here instead of before calling ActivateBestChainStep. It’s fine to acquire locks at a broad scope, but it’s scary to work with code that acquires locks in unexpected places and gives you no clue where or whether the locks are safe to move. Would suggest either moving the lock to where it is more obviously needed or commenting:

0// Mempool lock for ActivateBestChainStep calls below. (Could be released between calls.)
1LOCK(::mempool.cs);

Or else explaining why it’s held more broadly.

MarcoFalke commented at 1:53 am on November 21, 2018:

It is held just as long as cs_main, because anything shorter would allow other threads (potentially rpc threads) to read from the mempool and get a result that is most likely useless in case we are in the middle of a larger reorg.

DrahtBot commented at 11:50 pm on September 10, 2018: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#16273 (refactor: Remove unused includes by practicalswift)
#16194 (refactor: share blockmetadata with BlockManager by jamesob)
#16066 (mempool: Skip estimator if block is older than X by promag)
#15192 (Add missing cs_main locks in ThreadImport(…)/Shutdown(…)/gettxoutsetinfo(…)/InitScriptExecutionCache(). Add missing locking annotations. by practicalswift)
#13949 (Introduce MempoolObserver interface to break “policy/fees -> txmempool -> policy/fees” circular dependency by Empact)
#10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

sdaftuar commented at 2:26 pm on September 14, 2018: member

Is the idea that you’re trying to prevent RPC users from having an inconsistent state, like if they call getbestblockhash() and then call getrawmempool(), the latter should be consistent with the former?

That may be reasonable, but I think it’d be good to have some documentation (whether a project document or even a gist explaining our thinking) explaining the consistency guarantees for RPC calls with respect to the mempool, validation, and the wallet. I feel like this kind of thing has come up before in a haphazard way and caused issues when we didn’t think things through or missed an implication of a change.

(Also, I think it would aid RPC users if they knew what our consistency guarantees were.)

ryanofsky commented at 5:31 pm on September 14, 2018: member

Is the idea that you’re trying to prevent RPC users from having an inconsistent state

From the description, I assumed the intent of the PR was to prevent RPCs like getmempoolentry, which acquire mempool.cs but not cs_main, from returning garbage or segfaulting during reorgs because UpdateMempoolForReorg and InvalidateBlock functions were apparently not acquiring mempool.cs while updating the mempool.

But actually it seems like in all the places where the mempool is updated, the code already has both cs_main and mempool.cs locks. So it should be perfectly safe to read the mempool while holding either of the two locks, and trying to acquire both is wasteful.

So maybe this PR is not doing what it intended to do, and instead it would be better just to add comments and fix thread safety annotations somehow. Maybe it would be possible to annotate cs_main and mempool.cs individually as shared locks needed for reading mempool data, and cs_main+mempool.cs both together as an exclusive lock needed for updating mempool data?

MarcoFalke commented at 5:50 pm on September 14, 2018: member

@ryanofsky With inconsistent I don’t mean invalid reads, but rather transactions that are invalid (and still in the process of removal) as of our current block tip.

ryanofsky commented at 7:05 pm on September 14, 2018: member

Code utACK fa3e0471a81252e333886b97d8d33bb275061d42. Code change seems perfectly fine, and it could perhaps lead to more consistency in some cases, but like the previous comment #14193 (comment) says, it’s unclear which cases are better off after this change or what case motivated this PR. More comments and documentation would be helpful.

MarcoFalke force-pushed on Oct 27, 2018

MarcoFalke commented at 3:04 pm on October 27, 2018: member

@sdaftuar, @ryanofsky: Added a test with comments to illustrate my motivation a bit more.

MarcoFalke force-pushed on Oct 27, 2018

MarcoFalke force-pushed on Oct 28, 2018

in src/validation.cpp:2798 in dea5dbf2d5 outdated

2794@@ -2794,6 +2795,7 @@ bool PreciousBlock(CValidationState& state, const CChainParams& params, CBlockIn
2795 bool CChainState::InvalidateBlock(CValidationState& state, const CChainParams& chainparams, CBlockIndex *pindex)
2796 {
2797     AssertLockHeld(cs_main);
2798+    LOCK(::mempool.cs);

ryanofsky commented at 5:27 pm on November 13, 2018:

In commit “validation: Add missing mempool locks” (dea5dbf2d5ca0177d3a4be665790288bef40e27e)

Can you add a comment about what this is for? Do you actually need to hold the lock at the broad scope, or could it be acquired just before calling UpdateMempoolForReorg? Maybe:

0// Mempool lock for UpdateMempool calls below. (Could be released between calls.)
1LOCK(::mempool.cs);

MarcoFalke commented at 2:11 am on November 21, 2018:

Calling it right before UpdateMempoolForReorg is too fragile and exactly the issue I want to avoid. (E.g. we DisconnetTip and then fail to call UpdateMempoolForReorg, because another thread acquired the lock)

At the very least the mempool lock should be held as long as disconnectpool sits on the stack.

in src/rpc/mining.cpp:472 in dea5dbf2d5 outdated

467@@ -468,21 +468,29 @@ static UniValue getblocktemplate(const JSONRPCRequest& request)
468             nTransactionsUpdatedLastLP = nTransactionsUpdatedLast;
469         }
470 
471-        // Release the wallet and main lock while waiting
472+        // Release lock while waiting

ryanofsky commented at 9:16 pm on November 13, 2018:

In commit “validation: Add missing mempool locks” (dea5dbf2d5ca0177d3a4be665790288bef40e27e)

This is hard to follow. Can you mention this change the commit description? It seems like there is no difference in behavior here except to avoid holding g_best_block_mutex while calling mempool.GetTransactionsUpdated()? Or am I misinterpreting the diff? If that is the change, it seems good to release the lock, but why is it important do to that now when it wasn’t needed before?

MarcoFalke commented at 1:41 am on November 21, 2018:

It seems like there is no difference in behavior here except to avoid holding g_best_block_mutex while calling mempool.GetTransactionsUpdated()?

That is indeed the motivation and required to avoid a deadlock. (You should be able to see the deadlock stack by reverting this hunk and running the functional test suite)

MarcoFalke commented at 1:53 am on November 21, 2018:

Oh, and the wallet lock was removed a long time ago, but the comment was never updated.

jamesob commented at 8:41 pm on November 28, 2018:

This hunk as-is seems a little overcomplicated to me too if all we want to do is release g_best_block_mutex for the mempool operation. This patch (atop master) should do:

 0diff --git a/src/rpc/mining.cpp b/src/rpc/mining.cpp
 1index c0287ec17..b7f324a97 100644
 2--- a/src/rpc/mining.cpp
 3+++ b/src/rpc/mining.cpp
 4@@ -514,9 +514,14 @@ static UniValue getblocktemplate(const JSONRPCRequest& request)
 5             {
 6                 if (g_best_block_cv.wait_until(lock, checktxtime) == std::cv_status::timeout)
 7                 {
 8-                    // Timeout: Check transactions for update
 9+                    // Drop the best block mutex while accessing the mempool to avoid deadlock with
10+                    // ::mempool.cs.
11+                    LEAVE_CRITICAL_SECTION(lock);
12+                    // Break if we've seen new transactions after the timeout has elapsed.
13                     if (mempool.GetTransactionsUpdated() != nTransactionsUpdatedLastLP)
14                         break;
15+                    ENTER_CRITICAL_SECTION(lock);
16+
17                     checktxtime += std::chrono::seconds(10);
18                 }
19             }

Edit: ah wait, this isn’t exactly the same behavior since it’d be LOCKish and not WAIT_LOCKish… back to the drawing board. Edit 2: on second inspection, this should work since ENTER_CRITICAL_SECTION blocks in the same way WAIT_LOCK does.

ryanofsky commented at 9:34 pm on November 13, 2018: member

Started review:

dea5dbf2d5ca0177d3a4be665790288bef40e27e validation: Add missing mempool locks (1/2)
fa27735e0f1fd52f03a17d7e9b94a66f310ddfa1 [test] Add test to check mempool consistency in case of reorgs (2/2)

I didn’t look closely at the test yet, since it seems pretty complex. If you could a write a one or two sentence comment before the test describing how it works, that might be helpful.

I also think it would be useful to have a comment on the mempool::cs variable. It’s definitely being used in a way I wouldn’t have expected. From first appearance, it looks like it’s just being used to protect mapTx and allow maybe reads & writes from multiple threads:

https://github.com/bitcoin/bitcoin/blob/5d605b27457dd307e31ac320c83f3b01a41e1ae1/src/txmempool.h#L488-L489

But apparently it’s supposed to not just provide thread safety, but also guarantee consistency between mempool and chainActive? Would suggest maybe:

0    /**
1     * Lock to guarantee consistency between mempool and external chain state.
2     *
3     * Lock must be held whenever mempool is read from or written to and
4     * whenever a block is connected. Lock can be released when the mempool
5     * contains no transactions that conflict internally or externally. It
6     * should be acquired after `cs_main` if both locks need to be held at the
7     * same time.
8     */
9    mutable CCriticalSection cs;

re: #14193 (comment)

I think it’d be good to have some documentation (whether a project document or even a gist explaining our thinking) explaining the consistency guarantees for RPC calls with respect to the mempool, validation, and the wallet

Documentation was added in #14592.

MarcoFalke commented at 2:19 am on November 21, 2018: member

The test is trivial, but looks scary because of the boilerplate overhead of the cpp unit tests. Pretty much the only thing it is doing is to check that during a reorg a (simulated) rpc thread (e.g. getrawmempool) would either return a consistent mempool as of before the reorg or a consistent mempool as of after the reorg and never something inconsistent in the middle of a reorg.

MarcoFalke force-pushed on Nov 28, 2018

MarcoFalke commented at 7:15 pm on November 28, 2018: member

Added a commit with further comments and documentation.

in src/test/validation_block_tests.cpp:224 in fad0712c6e outdated

196@@ -181,4 +197,101 @@ BOOST_AUTO_TEST_CASE(processnewblock_signals_ordering)
197     BOOST_CHECK_EQUAL(sub.m_expected_tip, chainActive.Tip()->GetBlockHash());
198 }
199 
200+BOOST_AUTO_TEST_CASE(mempool_locks_reorg)

jamesob commented at 3:58 pm on November 29, 2018:

Here’s a diff with some added doc and cosmetics:

 0diff --git a/src/test/validation_block_tests.cpp b/src/test/validation_block_tests.cpp
 1index 8c285c7e5..f23e299b1 100644
 2--- a/src/test/validation_block_tests.cpp
 3+++ b/src/test/validation_block_tests.cpp
 4@@ -197,14 +197,29 @@ BOOST_AUTO_TEST_CASE(processnewblock_signals_ordering)
 5     BOOST_CHECK_EQUAL(sub.m_expected_tip, chainActive.Tip()->GetBlockHash());
 6 }
 7 
 8+/**
 9+ * Test that mempool updates happen atomically with reorgs.
10+ *
11+ * This prevents RPC clients, among others, from retrieving immediately-out-of-date mempool data
12+ * during large reorgs.
13+ *
14+ * The test verifies this by creating a chain of `num_txs` blocks, matures their coinbases, and then
15+ * submits txns spending from their coinbase to the mempool. A fork chain is then processed,
16+ * invalidating the txns and evicting them from the mempool.
17+ *
18+ * We verify that the mempool updates atomically by polling it continuously from another thread
19+ * during the reorg and checking that its size only changes once.
20+ */
21 BOOST_AUTO_TEST_CASE(mempool_locks_reorg)
22 {
23     auto last_mined = GoodBlock(Params().GenesisBlock().GetHash());
24-
25     bool ignored;
26+    auto ProcessBlock = [&ignored](std::shared_ptr<const CBlock> block) -> bool  {
27+        return ProcessNewBlock(Params(), block, /* fForceProcessing */ true, /* fNewBlock */ &ignored);
28+    };
29     // Process all mined blocks
30-    BOOST_REQUIRE(ProcessNewBlock(Params(), std::make_shared<CBlock>(Params().GenesisBlock()), /* fForceProcessing */ true, /* fNewBlock */ &ignored));
31-    BOOST_REQUIRE(ProcessNewBlock(Params(), last_mined, /* fForceProcessing */ true, /* fNewBlock */ &ignored));
32+    BOOST_REQUIRE(ProcessBlock(std::make_shared<CBlock>(Params().GenesisBlock())));
33+    BOOST_REQUIRE(ProcessBlock(last_mined));
34 
35     // Run the test multiple times
36     for (int test_runs = 3; !!test_runs; --test_runs) {
37@@ -225,13 +240,13 @@ BOOST_AUTO_TEST_CASE(mempool_locks_reorg)
38             txs.push_back(MakeTransactionRef(mtx));
39 
40             last_mined = GoodBlock(last_mined->GetHash());
41-            BOOST_REQUIRE(ProcessNewBlock(Params(), last_mined, /* fForceProcessing */ true, /* fNewBlock */ &ignored));
42+            BOOST_REQUIRE(ProcessBlock(last_mined));
43         }
44 
45         // Mature the inputs of the txs
46         for (int j = COINBASE_MATURITY; !!j; --j) {
47             last_mined = GoodBlock(last_mined->GetHash());
48-            BOOST_REQUIRE(ProcessNewBlock(Params(), last_mined, /* fForceProcessing */ true, /* fNewBlock */ &ignored));
49+            BOOST_REQUIRE(ProcessBlock(last_mined));
50         }
51 
52         // Add the txs to the tx pool
53@@ -282,10 +297,10 @@ BOOST_AUTO_TEST_CASE(mempool_locks_reorg)
54 
55         // Mine a reorg in this thread to invalidate and remove the txs from the tx pool
56         last_mined = GoodBlock(split_hash);
57-        ProcessNewBlock(Params(), last_mined, /* fForceProcessing */ true, /* fNewBlock */ &ignored);
58+        ProcessBlock(last_mined);
59         for (size_t j = COINBASE_MATURITY + txs.size() + 1; j; --j) {
60             last_mined = GoodBlock(last_mined->GetHash());
61-            ProcessNewBlock(Params(), last_mined, /* fForceProcessing */ true, /* fNewBlock */ &ignored);
62+            ProcessBlock(last_mined);
63         }
64         // Check that the reorg was eventually successful
65         BOOST_CHECK_EQUAL(last_mined->GetHash(), ::chainActive.Tip()->GetBlockHash());

jamesob commented at 5:24 pm on November 29, 2018: member

The change in general seems fine to me (mempool state should update atomically after a contiguous batch of chain maintenance and not during) and the test is great, but it might be nice to:

explain what the motivations are. There’s an argument for RPC client consistency, but IIRC this may also be a prereq for Dandelion?
add comments for new lock acquisitions explaining why we’re grabbing a seemingly unrelated lock at a coarse grain (in line with what @ryanofsky is saying).

In practice, ActivateBestChain calls are very short and so I don’t think there are any real-world downsides to locking ::mempool.cs for their duration - although I’m curious if this affects mempool access during IBD/reindex.

MarcoFalke force-pushed on Nov 29, 2018

MarcoFalke commented at 8:17 pm on November 29, 2018: member

add comments for new lock acquisitions explaining why we’re grabbing a seemingly unrelated lock at a coarse grain (in line with what @ryanofsky is saying).

I think I already did that. Let me know if any of them are still unclear.

MarcoFalke force-pushed on Nov 30, 2018

in src/txmempool.h:593 in fae71ffc99 outdated

558@@ -559,7 +559,7 @@ class CTxMemPool
559      * Check that none of this transactions inputs are in the mempool, and thus
560      * the tx is not dependent on other mempool transactions to be included in a block.
561      */
562-    bool HasNoInputsOf(const CTransaction& tx) const;
563+    bool HasNoInputsOf(const CTransaction& tx) const EXCLUSIVE_LOCKS_REQUIRED(cs);

jamesob commented at 8:14 pm on November 30, 2018:

Would it be clearer to just add LOCK(cs); to the top of the function definition instead of this annotation since there’s a recursive lock acquisition in CTxMempool::exists anyway?

MarcoFalke commented at 9:01 pm on November 30, 2018:

Imo it is clearer to let the caller figure out how long the lock is required to be held. Imagine this is called in a loop, the you, most likely, wouldn’t want to release the lock in between each iteration? Also, it happens that all callers already have the lock taken, since this annotation can be added without any other changes.

jamesob commented at 9:13 pm on December 3, 2018:

Yep, makes sense.

in src/validation.cpp:439 in fae71ffc99 outdated

435@@ -436,7 +436,7 @@ bool CheckSequenceLocks(const CTxMemPool& pool, const CTransaction& tx, int flag
436 // Returns the script flags which should be checked for a given block
437 static unsigned int GetBlockScriptFlags(const CBlockIndex* pindex, const Consensus::Params& chainparams);
438 
439-static void LimitMempoolSize(CTxMemPool& pool, size_t limit, unsigned long age) {
440+static void LimitMempoolSize(CTxMemPool& pool, size_t limit, unsigned long age) EXCLUSIVE_LOCKS_REQUIRED(pool.cs) {

jamesob commented at 8:19 pm on November 30, 2018:

We require the pool.cs lock for the duration of this function (instead of just relying on the lock in pool.Expire) because we want to keep the mempool from changing while pcoinsTip modifications happen, correct?

MarcoFalke commented at 9:02 pm on November 30, 2018:

Yeah, calling LimitMempoolSize only makes sense after you have written to the mempool, in which case you already have acquired the lock and can keep it for LimitMempoolSize.

jamesob approved

jamesob commented at 8:22 pm on November 30, 2018: member

utACK https://github.com/bitcoin/bitcoin/pull/14193/commits/fae71ffc99b04bbcb96f4e2b6fa25ed2fe258912

Just a few clarifying questions.

in src/rpc/mining.cpp:488 in fae71ffc99 outdated

483+                    // ::mempool.cs.
484+                    LEAVE_CRITICAL_SECTION(lock);
485+                    // Break if we've seen new transactions after the timeout has elapsed.
486                     if (mempool.GetTransactionsUpdated() != nTransactionsUpdatedLastLP)
487                         break;
488+                    ENTER_CRITICAL_SECTION(lock);

practicalswift commented at 3:22 pm on December 6, 2018:

ENTER_CRITICAL_SECTION(…); isn’t RAII, right?

~~Do we need to explicitly LEAVE_CRITICAL_SECTION(…); when exiting the loop?~~

jamesob commented at 3:32 pm on December 6, 2018:

Nope; the unlock()/debug stack pop that LEAVE_CRITICAL_SECTION(...) would perform is implicit to the destruction of lock when it falls out of scope after the break.

practicalswift commented at 3:36 pm on December 6, 2018:

@jamesob Makes perfect sense: saw the WAIT_LOCK(g_best_block_mutex, lock); now. Thanks.

in src/test/validation_block_tests.cpp:211 in fae71ffc99 outdated

206+ * The test verifies this by creating a chain of `num_txs` blocks, matures their coinbases, and then
207+ * submits txns spending from their coinbase to the mempool. A fork chain is then processed,
208+ * invalidating the txns and evicting them from the mempool.
209+ *
210+ * We verify that the mempool updates atomically by polling it continuously from another thread
211+ * during the reorg and checking that its size only changes once.

ryanofsky commented at 7:03 pm on December 10, 2018:

It wasn’t clear from reading this why size changing once was a useful thing to check. It would help to add some more explanation from your comment in #14193 (comment). Maybe an additional sentence: “The size changing exactly once indicates that the polling thread’s view of the mempool is either consistent with the chain state before reorg, or consistent with the chain state after the reorg, and not just consistent with some intermediate state during the reorg.

in src/test/validation_block_tests.cpp:215 in fae71ffc99 outdated

210+ * We verify that the mempool updates atomically by polling it continuously from another thread
211+ * during the reorg and checking that its size only changes once.
212+ */
213+BOOST_AUTO_TEST_CASE(mempool_locks_reorg)
214+{
215+    auto last_mined = GoodBlock(Params().GenesisBlock().GetHash());

ryanofsky commented at 7:08 pm on December 10, 2018:

Maybe declare last_mined below where it is used. It was confusing to see it not used at all in the lambda and then later referenced below.

in src/test/validation_block_tests.cpp:225 in fae71ffc99 outdated

220+    // Process all mined blocks
221+    BOOST_REQUIRE(ProcessBlock(std::make_shared<CBlock>(Params().GenesisBlock())));
222+    BOOST_REQUIRE(ProcessBlock(last_mined));
223+
224+    // Run the test multiple times
225+    for (int test_runs = 3; !!test_runs; --test_runs) {

ryanofsky commented at 7:10 pm on December 10, 2018:

Why !!test_runs everywhere in this PR instead of test_runs > 0? This seems strange, and less readable. Also, why count backwards instead of forwards?

in src/test/validation_block_tests.cpp:301 in fae71ffc99 outdated

296+        }};
297+
298+        // Mine a reorg in this thread to invalidate and remove the txs from the tx pool
299+        last_mined = GoodBlock(split_hash);
300+        ProcessBlock(last_mined);
301+        for (size_t j = COINBASE_MATURITY + txs.size() + 1; j; --j) {

ryanofsky commented at 7:20 pm on December 10, 2018:

No more !! here? And I was just getting used to it…

in src/test/validation_block_tests.cpp:315 in fae71ffc99 outdated

285+                // Internally, we might be in the middle of the reorg, but
286+                // externally the reorg to the most-proof-of-work chain should
287+                // be atomic. So the caller assumes that the returned mempool
288+                // is consistent. That is, it has all txs that were there
289+                // before the reorg.
290+                assert(::mempool.mapTx.size() == txs.size());

ryanofsky commented at 7:27 pm on December 10, 2018:

This is actually doing more than checking that “size only changes once” as described in the summary comment above. It is checking that the mempool either contains all of the transactions invalidated by the reorg, or none of them, and not some intermediate amount. Could update the comment above to say this instead.

MarcoFalke commented at 7:37 pm on February 11, 2019:

The test has a ton of comments. Let me know if I need to add anything else to parts where the code is not self-explanatory.

in src/rpc/mining.cpp:483 in d9e6cb497b outdated

477@@ -478,9 +478,14 @@ static UniValue getblocktemplate(const JSONRPCRequest& request)
478             {
479                 if (g_best_block_cv.wait_until(lock, checktxtime) == std::cv_status::timeout)
480                 {
481-                    // Timeout: Check transactions for update
482+                    // Drop the best block mutex while accessing the mempool to avoid deadlock with
483+                    // ::mempool.cs.

ryanofsky commented at 8:06 pm on December 10, 2018:

This seems less than ideal. I would guess that we don’t actually need ConnectTip and DisconnectTip be holding on to mempool.cs while they wait to notify this RPC. At least would seem more straightforward if g_best_block_cv got notified after the fact, when mempool.cs was already released.

I think it would be good to add a TODO to untangle these locks on the future, or a normal comment if mixing these locks has some benefits which are not apparent. The best place to comment would be on the g_best_block_mutex variable. It might also be good to add ACQUIRED_AFTER annotations there to prevent another deadlock like this.

MarcoFalke commented at 7:36 pm on February 11, 2019:

Does you comment still apply after my latest rebase? Also, ACQUIRED_AFTER only works in very limited setttings (e.g. not across translation units or even function bodies). We don’t use it anywhere else, so I will leave this as is for now.

in src/validation.cpp:2689 in d9e6cb497b outdated

2685@@ -2686,6 +2686,7 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
2686 
2687         {
2688             LOCK(cs_main);
2689+            LOCK(::mempool.cs); // Lock for at least as long as it takes for connectTrace to be consumed

ryanofsky commented at 11:01 pm on December 10, 2018:

In commit “validation: Add missing mempool locks” (d9e6cb497bd540078dbf32536825fa3bfd60df42)

re:https://github.com/bitcoin/bitcoin/pull/14193#discussion_r233150812

It’s not clear why the lock is being acquired here instead of before calling ActivateBestChainStep.

Thank you for adding the comment. But this is still not clear to me. If this code locked the mempool.cs before calling ActivateBestChainStep and unlocked it immediately after it returned, would there be a bug? ActivateBestChainStep already updates the mempool, so if this code isn’t locking to maintain the invariant that views of the mempool are consistent with chainActive, what invariant is it trying to maintain?

in src/validation.cpp:2801 in d9e6cb497b outdated

2794@@ -2794,6 +2795,7 @@ bool PreciousBlock(CValidationState& state, const CChainParams& params, CBlockIn
2795 bool CChainState::InvalidateBlock(CValidationState& state, const CChainParams& chainparams, CBlockIndex *pindex)
2796 {
2797     AssertLockHeld(cs_main);
2798+    LOCK(::mempool.cs); // Lock for as long as disconnectpool is in scope to make sure UpdateMempoolForReorg is called after DisconnetTip without unlocking in between

ryanofsky commented at 11:38 pm on December 10, 2018:

In commit “[test] Add test to check mempool consistency in case of reorgs” (fae71ffc99b04bbcb96f4e2b6fa25ed2fe258912)

Calling it right before UpdateMempoolForReorg is too fragile and exactly the issue I want to avoid. (E.g. we DisconnetTip and then fail to call UpdateMempoolForReorg, because another thread acquired the lock)

Thanks for explaining and adding the comment. It is clear if mempool.cs needs to provide views of the mempool that are consistent with chainActive, then it needs to be locked between the first DisconnectTip call and the final UpdateMempoolForReorg call. But it would seem clearer and safer to lock the mempool right where disconnectpool is declared and to unlock it right after UpdateMempoolForReorg returns, instead of continuing to hold it while sending UI notifications.

Also it seems like both DisconnectTip and ConnectTip should be annotated with EXCLUSIVE_LOCKS_REQUIRED(cs_main, ::mempool.cs), and for some reason they aren’t.

MarcoFalke commented at 7:37 pm on February 11, 2019:

Added locking annotations.

ryanofsky commented at 0:44 am on December 11, 2018: member

I requested this in my last review (https://github.com/bitcoin/bitcoin/pull/14193#pullrequestreview-174491669) but I really think mempool.cs needs a comment saying how it’s supposed to be used and what invariants it guarantees. Here’s an attempt at a comment based on what I currently understand about what you’re trying to do with this PR:

 0//! Recursive mutex ensuring internal consistency of transactions within the
 1//! mempool, and external consistency between transactions in the mempool and
 2//! the current chain tip. This mutex must be locked:
 3//!
 4//! - whenever reading or writing to `mempool::mapTx`
 5//! - whenever writing to `chainActive` or `pcoinsTip`
 6//!
 7//! and be held as long as mempool state and chain state are not consistent with
 8//! each other.
 9//!
10//! Locking either the `cs_main` mutex or the `mempool::cs` mutex will lock the
11//! chain tip and prevent it from being updated. But having two mutexes does
12//! allow some useful concurrency, because it lets a thread that reads or writes
13//! to the mempool run at the same time as a thread that reads the chain state.
14mutable CCriticalSection cs;

ryanofsky commented at 1:42 am on December 11, 2018: member

I’m probably missing some context, but what is wrong with a simpler design where you just acquire cs_main to get a consistent view of chain state, and mempool::cs to get a consistent view of the mempool, and both to get a consistent view of both together?

The new unit test and the getrawmempool example both seem very contrived. Why do these cases need to only acquire mempool::cs without acquiring cs_main? If they need to views of the mempool consistent with the current chain tip, why don’t they just acquire both mutexes?

DrahtBot added the label Needs rebase on Dec 13, 2018

ryanofsky commented at 6:51 pm on December 17, 2018: member

It looks like #14963 replaces this PR. Perhaps this should be closed, or marked dependent.

MarcoFalke force-pushed on Dec 22, 2018

DrahtBot removed the label Needs rebase on Dec 24, 2018

ryanofsky commented at 6:03 pm on January 4, 2019: member

re: #14193 (comment)

It looks like #14963 replaces this PR. Perhaps this should be closed, or marked dependent.

Turns out #14963 doesn’t really replace this PR, it just documents a bug which this PR (which is currently WIP) will fix:

https://github.com/bitcoin/bitcoin/blob/fa5c346c5a336ccdce8e50befb53746c280f053e/src/txmempool.h#L515-L522

re: #14193 (comment)

I’m probably missing some context, but what is wrong with a simpler design where you just acquire cs_main to get a consistent view of chain state, and mempool::cs to get a consistent view of the mempool, and both to get a consistent view of both together?

Responding to my own question, but answer from conversation seems to be that we want to allow applications that frequently query the mempool to (1) only acquire cs.mempool without cs_main for better performance and (2) not see any missing transactions if they happen to query the mempool in the middle of a reorg.

MarcoFalke force-pushed on Jan 15, 2019

MarcoFalke commented at 6:52 pm on January 15, 2019: member

Addressed some of @ryanofsky’s feedback

DrahtBot added the label Needs rebase on Mar 7, 2019

MarcoFalke force-pushed on Mar 7, 2019

DrahtBot removed the label Needs rebase on Mar 7, 2019

in src/txmempool.h:522 in fadeeadd93 outdated

515-     * @par Consistency bug
516-     *
517-     * The second guarantee above is not currently enforced, but
518-     * https://github.com/bitcoin/bitcoin/pull/14193 will fix it. No known code
519-     * in bitcoin currently depends on second guarantee, but it is important to
520-     * fix for third party code that needs be able to frequently poll the

ryanofsky commented at 4:33 pm on March 20, 2019:

In commit “validation: Add missing mempool locks” (fadeeadd93a928c8ececa8c96ad2afa45bd25a29)

Might be worth keeping text that says why guarantee 2 is useful. James also asked about it here #14193#pullrequestreview-179871886, wondering if it might be useful for dandelion.

MarcoFalke commented at 5:18 pm on March 20, 2019:

Not sure about that. We generally don’t update (forget to update) comments when adding new features. So if a feature is added in the far future (e.g. Dandelion has no chance to hit any time soon), the phrase “No known code in bitcoin currently depends on second guarantee” might no longer reflect reality.

I believe the test that I added serves a a nice documentation that if this guarantee is violated, it will fail. The motivation to fix this bug for (enterprise) rpc users that poll the mempool should be enough motivation on its own.

ryanofsky approved

ryanofsky commented at 4:36 pm on March 20, 2019: member

utACK fa33e8f41fe14cfd136ac7c96e192a9e14f1d0d3. Looks good! This makes a lot more sense to me now, and the test is actually pretty clear too.

MarcoFalke added this to the milestone 0.19.0 on Mar 20, 2019

MarcoFalke added the label Bug on Mar 20, 2019

in src/validation.cpp:2797 in fa33e8f41f outdated

2793@@ -2793,6 +2794,7 @@ bool CChainState::InvalidateBlock(CValidationState& state, const CChainParams& c
2794         LimitValidationInterfaceQueue();
2795 
2796         LOCK(cs_main);
2797+        LOCK(::mempool.cs); // Lock for as long as disconnectpool is in scope to make sure UpdateMempoolForReorg is called after DisconnetTip without unlocking in between

practicalswift commented at 2:02 pm on April 15, 2019:

Nit: Should be “DisconnectTip” :-)

CLEANSKY97 commented at 1:50 am on April 16, 2019: none

تکراری از #

MarcoFalke force-pushed on Apr 16, 2019

ryanofsky approved

ryanofsky commented at 5:05 pm on April 17, 2019: member

utACK fa81e866ebb2533489cea13618af0d58a1a355fd. Only change since last review is rebase after #15788 and fixing DisconnetTip spelling in comment

MarcoFalke force-pushed on Apr 25, 2019

MarcoFalke commented at 5:58 pm on April 25, 2019: member

rewritten test now that segwit is always active in unit tests:

Add missing call to GenerateCoinbaseCommitment
Mine the reorg before adding txs to mempool, to not include them in the reorg blocks

MarcoFalke force-pushed on Apr 25, 2019

DrahtBot added the label Needs rebase on May 7, 2019

MarcoFalke force-pushed on May 7, 2019

DrahtBot removed the label Needs rebase on May 7, 2019

ryanofsky approved

ryanofsky commented at 9:41 pm on May 7, 2019: member

utACK faa0d61f8ab42cc921bdcd73d6fd4b8b680aa22e. Only changes since last review are rebase and described test changes.

Mine the reorg before adding txs to mempool, to not include them in the reorg blocks.

I don’t understand how it’d be possible for transactions in the mempool to end up in the reorg, if the reorg code is just calling Block() and generating empty blocks. I also don’t understand what changed in the rebase if mining the reorg after adding transactions used to work, but now stopped working. In any case, it’d be useful to have a comment in the test saying that it’s important to mine the reorg before adding transactions, and why, assuming it is important.

MarcoFalke commented at 9:52 pm on May 7, 2019: member

The mining is using CreateNewBlock, which gets txs from the mempool (if available).

Why this worked previously, IIRC, was that segwit wasn’t active on regtest in unit tests, but non-standard txs would be accepted to the mempool. (See https://github.com/bitcoin/bitcoin/pull/15788/files#diff-01edb0ae76ec7f6061ec6ddbdb18375aL45 and #15891). Since the txs have witness, they wouldn’t be mined previously, but now that segwit is always active on regtest, they are.

In this test we don’t want to mine any txs from the mempool, so we mine the reorg when the mempool is empty and then add the txs to the mempool.

MarcoFalke commented at 1:51 pm on May 14, 2019: member

Also, the test is already clear on this: // Mine a reorg (and hold it back) before adding the txs to the mempool. If you have specific concerns, please suggest a rewording of that sentence.

CLEANSKY97 approved

in src/rpc/mining.cpp:484 in fa84fdbb1e outdated

478@@ -479,9 +479,14 @@ static UniValue getblocktemplate(const JSONRPCRequest& request)
479             {
480                 if (g_best_block_cv.wait_until(lock, checktxtime) == std::cv_status::timeout)
481                 {
482-                    // Timeout: Check transactions for update
483+                    // Drop the best block mutex while accessing the mempool to avoid deadlock with
484+                    // ::mempool.cs.
485+                    LEAVE_CRITICAL_SECTION(lock);

sipa commented at 9:52 pm on May 22, 2019:

This is a pretty weird construction; generally you shouldn’t be releasing the lock that protects the state you’re waiting to change while inspecting it.

What about making CTxMemPool::nTransactionsUpdated an atomic, and making CTxMemPool::GetTransactionsUpdated not grap cs? Would it still be needed to release lock here then?

Would you be able to write this as g_best_block_cv.wait_until(lock, checktxtime, [&](){ return g_best_block != hashWatchedChain || !IsRPCRunning() || mempool.GetTransactionsUpdate() != nTransactionsUpdatedLastLP; });?

MarcoFalke commented at 10:35 am on June 7, 2019:

Made it atomic, but I left the refactoring change for a future pull request.

MarcoFalke force-pushed on May 23, 2019

DrahtBot added the label Needs rebase on Jun 5, 2019

MarcoFalke force-pushed on Jun 5, 2019

DrahtBot removed the label Needs rebase on Jun 5, 2019

MarcoFalke force-pushed on Jun 7, 2019

txpool: Make nTransactionsUpdated atomic fa0c9dbf91

validation: Add missing mempool locks fabeb1f613

[test] Add test to check mempool consistency in case of reorgs fa2b083c3f

MarcoFalke force-pushed on Jun 7, 2019

laanwj added this to the "Blockers" column in a project

MarcoFalke moved this from the "Blockers" to the "Bugfixes" column in a project

ryanofsky approved

ryanofsky commented at 6:27 pm on June 27, 2019: member

utACK fa2b083c3feb0522baf652045efa6b73458761a3 [EDIT: was ~~e284e422e75189794e24fe482819d8b1407857c3~~, from bad copy and paste]. Changes since last review: rebase after #15976, adding vTxHashes lock annotation, adding new commit dropping mempool lock for nTransactionsUpdated and making it atomic to avoid deadlock between mempool lock and g_best_block_mutex

in src/txmempool.cpp:326 in fa0c9dbf91 outdated

321@@ -322,8 +322,8 @@ void CTxMemPoolEntry::UpdateAncestorState(int64_t modifySize, CAmount modifyFee,
322     assert(int(nSigOpCostWithAncestors) >= 0);
323 }
324 
325-CTxMemPool::CTxMemPool(CBlockPolicyEstimator* estimator) :
326-    nTransactionsUpdated(0), minerPolicyEstimator(estimator)
327+CTxMemPool::CTxMemPool(CBlockPolicyEstimator* estimator)
328+    : nTransactionsUpdated(0), minerPolicyEstimator(estimator)

promag commented at 10:55 pm on June 29, 2019:

:eyes:

in src/test/validation_block_tests.cpp:316 in fa2b083c3f

311+                // externally the reorg to the most-proof-of-work chain should
312+                // be atomic. So the caller assumes that the returned mempool
313+                // is consistent. That is, it has all txs that were there
314+                // before the reorg.
315+                assert(::mempool.mapTx.size() == txs.size());
316+                continue;

promag commented at 11:07 pm on June 29, 2019:

fa2b083c3feb0522baf652045efa6b73458761a3

Why continue?

MarcoFalke commented at 3:19 pm on June 30, 2019:

Because the thread should continue and not exit (for now)

promag commented at 8:28 pm on July 1, 2019:

Why would it exit?

MarcoFalke commented at 8:46 pm on July 1, 2019:

It would exit when it breaks. I think this is well documented in the test case and I am not sure what you are asking exactly.

If you are asking about cpp syntax, I find cppreference really helpful. E.g. https://en.cppreference.com/w/cpp/language/break

If you are asking about this test case logic, please write a more verbose question.

ryanofsky commented at 2:11 pm on July 2, 2019:

I don’t get this either. If you have a loop, and an unconditional continue is the last statement in the loop, the continue isn’t doing anything and could be removed.

https://github.com/bitcoin/bitcoin/blob/fa2b083c3feb0522baf652045efa6b73458761a3/src/test/validation_block_tests.cpp#L316

MarcoFalke commented at 2:54 pm on July 2, 2019:

Oh, I now I understand that you were asking why I put a redundant continue there.

That was intentional, because I felt it was more clear back when I wrote the test. Happy to remove, but I think we shouldn’t spend too much time on style of tests.

promag commented at 3:04 pm on July 2, 2019:

I didn’t mean to discuss style! I honestly though something was wrong because of the pointless continue, like missing code or something else.

MarcoFalke commented at 3:07 pm on July 2, 2019:

Ah, good point. Sorry for the confusion.

in src/test/validation_block_tests.cpp:304 in fa2b083c3f

299+        // validation is doing a reorg
300+        std::thread rpc_thread{[&]() {
301+            // This thread is checking that the mempool either contains all of
302+            // the transactions invalidated by the reorg, or none of them, and
303+            // not some intermediate amount.
304+            while (true) {

promag commented at 11:07 pm on June 29, 2019:

fa2b083c3feb0522baf652045efa6b73458761a3

I think this is flawless, what guarantees that there’s a non-atomic change once mempool.cs is released?

MarcoFalke commented at 3:18 pm on June 30, 2019:

The code changes in this pull request should guarantee that this doesn’t happen here in the test (you can check it by running the test before the code changes and after), as well as for “real” rpc polling threads. Though, I didn’t write a functional test for this.

promag commented at 11:08 pm on June 29, 2019: member

Concept ACK, some comments.

laanwj commented at 2:05 pm on July 2, 2019: member

code review ACK fa2b083c3feb0522baf652045efa6b73458761a3 @ryanofsky didn’t you ack the wrong commit above ? e284e422e75189794e24fe482819d8b1407857c3 (“Remove getBlockDepth method from Chain::interface”) isn’t part of this PR

ryanofsky commented at 2:20 pm on July 2, 2019: member

@ryanofsky didn’t you ack the wrong commit above

Sorry, bad copy and paste from the command line. Edited #14193#pullrequestreview-255403869 to fix.

laanwj commented at 2:29 pm on July 2, 2019: member

Thanks!

laanwj merged this on Jul 2, 2019

laanwj closed this on Jul 2, 2019

laanwj referenced this in commit c6e42f1ca9 on Jul 2, 2019

MarcoFalke deleted the branch on Jul 2, 2019

laanwj removed this from the "Bugfixes" column in a project

sidhujag referenced this in commit 4408967f33 on Jul 3, 2019

laanwj referenced this in commit e9f73b839a on Oct 26, 2019

jasonbcox referenced this in commit bcc81a5690 on Sep 25, 2020

jasonbcox referenced this in commit 58e79dae40 on Sep 25, 2020

jasonbcox referenced this in commit 472a674203 on Sep 25, 2020

jasonbcox referenced this in commit 211afcf44c on Sep 25, 2020

LarryRuane referenced this in commit b02e36c230 on May 10, 2021

LarryRuane referenced this in commit 9ee6369bcb on May 10, 2021

Munkybooty referenced this in commit 5fdbb0b6de on Nov 4, 2021

DrahtBot locked this on Dec 16, 2021

validation: Add missing mempool locks #14193

Conflicts