validation: Add missing mempool locks #14193
pull MarcoFalke wants to merge 3 commits into bitcoin:master from MarcoFalke:Mf1809-valLocksMempool changing 6 files +183 −45
MarcoFalke commented at 9:15 pm on September 10, 2018: member
Take the mempool read lock during reorgs, so that we don't accidentally read an inconsistent mempool.
-
MarcoFalke added the label Validation on Sep 10, 2018
-
in src/validation.cpp:2689 in fa3e0471a8 outdated
2685@@ -2686,6 +2686,7 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
2686
2687     {
2688         LOCK(cs_main);
2689+        LOCK(::mempool.cs);
practicalswift commented at 9:34 pm on September 10, 2018:
Nit: LOCK2(cs_main, ::mempool.cs);
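For reference, a minimal sketch of what the nit suggests (illustrative only, not the PR's diff): LOCK2 takes both mutexes in one statement, which keeps the acquisition order explicit for reviewers and for the debug lock-order checker.

{
    LOCK2(cs_main, ::mempool.cs); // instead of LOCK(cs_main); followed by LOCK(::mempool.cs);
    // ... work that needs a mempool view consistent with the chain tip ...
} // both locks are released here, in reverse order of acquisition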
ryanofsky commented at 9:07 pm on September 12, 2018:
What's the reason for acquiring the mempool lock at a broad scope instead of more narrowly where the mempool is actually accessed? Would be nice to have comments in the places where this PR adds locking to indicate the purpose of the locks and the reasoning behind their placement.

ryanofsky commented at 5:40 pm on November 13, 2018:
In commit "validation: Add missing mempool locks" (dea5dbf2d5ca0177d3a4be665790288bef40e27e)
It’s not clear why the lock is being acquired here instead of before calling ActivateBestChainStep. It’s fine to acquire locks at a broad scope, but it’s scary to work with code that acquires locks in unexpected places and gives you no clue where or whether the locks are safe to move. Would suggest either moving the lock to where it is more obviously needed or commenting:
// Mempool lock for ActivateBestChainStep calls below. (Could be released between calls.)
LOCK(::mempool.cs);
Or else explaining why it’s held more broadly.
MarcoFalke commented at 1:53 am on November 21, 2018:
It is held just as long as cs_main, because anything shorter would allow other threads (potentially rpc threads) to read from the mempool and get a result that is most likely useless in case we are in the middle of a larger reorg.

DrahtBot commented at 11:50 pm on September 10, 2018: member
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #16273 (refactor: Remove unused includes by practicalswift)
- #16194 (refactor: share blockmetadata with BlockManager by jamesob)
- #16066 (mempool: Skip estimator if block is older than X by promag)
- #15192 (Add missing cs_main locks in ThreadImport(…)/Shutdown(…)/gettxoutsetinfo(…)/InitScriptExecutionCache(). Add missing locking annotations. by practicalswift)
- #13949 (Introduce MempoolObserver interface to break “policy/fees -> txmempool -> policy/fees” circular dependency by Empact)
- #10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
sdaftuar commented at 2:26 pm on September 14, 2018: member
Is the idea that you're trying to prevent RPC users from having an inconsistent state, like if they call getbestblockhash() and then call getrawmempool(), the latter should be consistent with the former?
That may be reasonable, but I think it’d be good to have some documentation (whether a project document or even a gist explaining our thinking) explaining the consistency guarantees for RPC calls with respect to the mempool, validation, and the wallet. I feel like this kind of thing has come up before in a haphazard way and caused issues when we didn’t think things through or missed an implication of a change.
(Also, I think it would aid RPC users if they knew what our consistency guarantees were.)
ryanofsky commented at 5:31 pm on September 14, 2018: member
Is the idea that you're trying to prevent RPC users from having an inconsistent state
From the description, I assumed the intent of the PR was to prevent RPCs like getmempoolentry, which acquire mempool.cs but not cs_main, from returning garbage or segfaulting during reorgs because the UpdateMempoolForReorg and InvalidateBlock functions were apparently not acquiring mempool.cs while updating the mempool.
But actually it seems like in all the places where the mempool is updated, the code already has both cs_main and mempool.cs locks. So it should be perfectly safe to read the mempool while holding either of the two locks, and trying to acquire both is wasteful.
So maybe this PR is not doing what it intended to do, and instead it would be better just to add comments and fix thread safety annotations somehow. Maybe it would be possible to annotate cs_main and mempool.cs individually as shared locks needed for reading mempool data, and cs_main + mempool.cs both together as an exclusive lock needed for updating mempool data?

MarcoFalke commented at 5:50 pm on September 14, 2018: member
@ryanofsky With inconsistent I don't mean invalid reads, but rather transactions that are invalid (and still in the process of removal) as of our current block tip.

ryanofsky commented at 7:05 pm on September 14, 2018: member
Code utACK fa3e0471a81252e333886b97d8d33bb275061d42. Code change seems perfectly fine, and it could perhaps lead to more consistency in some cases, but like the previous comment #14193 (comment) says, it's unclear which cases are better off after this change or what case motivated this PR. More comments and documentation would be helpful.

MarcoFalke force-pushed on Oct 27, 2018
MarcoFalke force-pushed on Oct 27, 2018

MarcoFalke commented at 3:04 pm on October 27, 2018: member
@sdaftuar, @ryanofsky: Added a test with comments to illustrate my motivation a bit more.

MarcoFalke force-pushed on Oct 27, 2018
MarcoFalke force-pushed on Oct 28, 2018
MarcoFalke force-pushed on Oct 28, 2018

in src/validation.cpp:2798 in dea5dbf2d5 outdated
2794@@ -2794,6 +2795,7 @@ bool PreciousBlock(CValidationState& state, const CChainParams& params, CBlockIn
2795 bool CChainState::InvalidateBlock(CValidationState& state, const CChainParams& chainparams, CBlockIndex *pindex)
2796 {
2797     AssertLockHeld(cs_main);
2798+    LOCK(::mempool.cs);
ryanofsky commented at 5:27 pm on November 13, 2018:
In commit "validation: Add missing mempool locks" (dea5dbf2d5ca0177d3a4be665790288bef40e27e)
Can you add a comment about what this is for? Do you actually need to hold the lock at the broad scope, or could it be acquired just before calling UpdateMempoolForReorg? Maybe:
// Mempool lock for UpdateMempool calls below. (Could be released between calls.)
LOCK(::mempool.cs);
MarcoFalke commented at 2:11 am on November 21, 2018:
Calling it right before UpdateMempoolForReorg is too fragile and exactly the issue I want to avoid. (E.g. we DisconnectTip and then fail to call UpdateMempoolForReorg, because another thread acquired the lock.)
At the very least the mempool lock should be held as long as disconnectpool sits on the stack.
in src/rpc/mining.cpp:472 in dea5dbf2d5 outdated
467@@ -468,21 +468,29 @@ static UniValue getblocktemplate(const JSONRPCRequest& request)
468         nTransactionsUpdatedLastLP = nTransactionsUpdatedLast;
469     }
470
471-    // Release the wallet and main lock while waiting
472+    // Release lock while waiting
ryanofsky commented at 9:16 pm on November 13, 2018:
In commit "validation: Add missing mempool locks" (dea5dbf2d5ca0177d3a4be665790288bef40e27e)
This is hard to follow. Can you mention this change in the commit description? It seems like there is no difference in behavior here except to avoid holding g_best_block_mutex while calling mempool.GetTransactionsUpdated()? Or am I misinterpreting the diff? If that is the change, it seems good to release the lock, but why is it important to do that now when it wasn't needed before?
MarcoFalke commented at 1:41 am on November 21, 2018:
It seems like there is no difference in behavior here except to avoid holding g_best_block_mutex while calling mempool.GetTransactionsUpdated()?
That is indeed the motivation and required to avoid a deadlock. (You should be able to see the deadlock stack by reverting this hunk and running the functional test suite.)

MarcoFalke commented at 1:53 am on November 21, 2018:
Oh, and the wallet lock was removed a long time ago, but the comment was never updated.
jamesob commented at 8:41 pm on November 28, 2018:
This hunk as-is seems a little overcomplicated to me too if all we want to do is release g_best_block_mutex for the mempool operation. This patch (atop master) should do:

diff --git a/src/rpc/mining.cpp b/src/rpc/mining.cpp
index c0287ec17..b7f324a97 100644
--- a/src/rpc/mining.cpp
+++ b/src/rpc/mining.cpp
@@ -514,9 +514,14 @@ static UniValue getblocktemplate(const JSONRPCRequest& request)
     {
         if (g_best_block_cv.wait_until(lock, checktxtime) == std::cv_status::timeout)
         {
-            // Timeout: Check transactions for update
+            // Drop the best block mutex while accessing the mempool to avoid deadlock with
+            // ::mempool.cs.
+            LEAVE_CRITICAL_SECTION(lock);
+            // Break if we've seen new transactions after the timeout has elapsed.
             if (mempool.GetTransactionsUpdated() != nTransactionsUpdatedLastLP)
                 break;
+            ENTER_CRITICAL_SECTION(lock);
+
             checktxtime += std::chrono::seconds(10);
         }
     }
Edit: ah wait, this isn’t exactly the same behavior since it’d be
LOCK
ish and notWAIT_LOCK
ish… back to the drawing board. Edit 2: on second inspection, this should work sinceENTER_CRITICAL_SECTION
blocks in the same wayWAIT_LOCK
does.ryanofsky commented at 9:34 pm on November 13, 2018: memberStarted review:
- dea5dbf2d5ca0177d3a4be665790288bef40e27e validation: Add missing mempool locks (1/2)
- fa27735e0f1fd52f03a17d7e9b94a66f310ddfa1 [test] Add test to check mempool consistency in case of reorgs (2/2)
I didn't look closely at the test yet, since it seems pretty complex. If you could write a one or two sentence comment before the test describing how it works, that might be helpful.
I also think it would be useful to have a comment on the mempool::cs variable. It's definitely being used in a way I wouldn't have expected. From first appearance, it looks like it's just being used to protect mapTx and allow maybe reads & writes from multiple threads. But apparently it's supposed to not just provide thread safety, but also guarantee consistency between mempool and chainActive? Would suggest maybe:

    /**
     * Lock to guarantee consistency between mempool and external chain state.
     *
     * Lock must be held whenever mempool is read from or written to and
     * whenever a block is connected. Lock can be released when the mempool
     * contains no transactions that conflict internally or externally. It
     * should be acquired after `cs_main` if both locks need to be held at the
     * same time.
     */
    mutable CCriticalSection cs;
re: #14193 (comment)
I think it’d be good to have some documentation (whether a project document or even a gist explaining our thinking) explaining the consistency guarantees for RPC calls with respect to the mempool, validation, and the wallet
Documentation was added in #14592.
MarcoFalke commented at 2:19 am on November 21, 2018: member
The test is trivial, but looks scary because of the boilerplate overhead of the cpp unit tests. Pretty much the only thing it is doing is to check that during a reorg a (simulated) rpc thread (e.g. getrawmempool) would either return a consistent mempool as of before the reorg or a consistent mempool as of after the reorg, and never something inconsistent in the middle of a reorg.

MarcoFalke force-pushed on Nov 28, 2018

MarcoFalke commented at 7:15 pm on November 28, 2018: member
Added a commit with further comments and documentation.

in src/test/validation_block_tests.cpp:224 in fad0712c6e outdated
196@@ -181,4 +197,101 @@ BOOST_AUTO_TEST_CASE(processnewblock_signals_ordering)
197     BOOST_CHECK_EQUAL(sub.m_expected_tip, chainActive.Tip()->GetBlockHash());
198 }
199
200+BOOST_AUTO_TEST_CASE(mempool_locks_reorg)
jamesob commented at 3:58 pm on November 29, 2018:
Here's a diff with some added doc and cosmetics:

diff --git a/src/test/validation_block_tests.cpp b/src/test/validation_block_tests.cpp
index 8c285c7e5..f23e299b1 100644
--- a/src/test/validation_block_tests.cpp
+++ b/src/test/validation_block_tests.cpp
@@ -197,14 +197,29 @@ BOOST_AUTO_TEST_CASE(processnewblock_signals_ordering)
     BOOST_CHECK_EQUAL(sub.m_expected_tip, chainActive.Tip()->GetBlockHash());
 }
 
+/**
+ * Test that mempool updates happen atomically with reorgs.
+ *
+ * This prevents RPC clients, among others, from retrieving immediately-out-of-date mempool data
+ * during large reorgs.
+ *
+ * The test verifies this by creating a chain of `num_txs` blocks, matures their coinbases, and then
+ * submits txns spending from their coinbase to the mempool. A fork chain is then processed,
+ * invalidating the txns and evicting them from the mempool.
+ *
+ * We verify that the mempool updates atomically by polling it continuously from another thread
+ * during the reorg and checking that its size only changes once.
+ */
 BOOST_AUTO_TEST_CASE(mempool_locks_reorg)
 {
     auto last_mined = GoodBlock(Params().GenesisBlock().GetHash());
-
     bool ignored;
+    auto ProcessBlock = [&ignored](std::shared_ptr<const CBlock> block) -> bool {
+        return ProcessNewBlock(Params(), block, /* fForceProcessing */ true, /* fNewBlock */ &ignored);
+    };
     // Process all mined blocks
-    BOOST_REQUIRE(ProcessNewBlock(Params(), std::make_shared<CBlock>(Params().GenesisBlock()), /* fForceProcessing */ true, /* fNewBlock */ &ignored));
-    BOOST_REQUIRE(ProcessNewBlock(Params(), last_mined, /* fForceProcessing */ true, /* fNewBlock */ &ignored));
+    BOOST_REQUIRE(ProcessBlock(std::make_shared<CBlock>(Params().GenesisBlock())));
+    BOOST_REQUIRE(ProcessBlock(last_mined));
 
     // Run the test multiple times
     for (int test_runs = 3; !!test_runs; --test_runs) {
@@ -225,13 +240,13 @@
             txs.push_back(MakeTransactionRef(mtx));
 
             last_mined = GoodBlock(last_mined->GetHash());
-            BOOST_REQUIRE(ProcessNewBlock(Params(), last_mined, /* fForceProcessing */ true, /* fNewBlock */ &ignored));
+            BOOST_REQUIRE(ProcessBlock(last_mined));
         }
 
         // Mature the inputs of the txs
         for (int j = COINBASE_MATURITY; !!j; --j) {
             last_mined = GoodBlock(last_mined->GetHash());
-            BOOST_REQUIRE(ProcessNewBlock(Params(), last_mined, /* fForceProcessing */ true, /* fNewBlock */ &ignored));
+            BOOST_REQUIRE(ProcessBlock(last_mined));
         }
 
         // Add the txs to the tx pool
@@ -282,10 +297,10 @@
 
         // Mine a reorg in this thread to invalidate and remove the txs from the tx pool
         last_mined = GoodBlock(split_hash);
-        ProcessNewBlock(Params(), last_mined, /* fForceProcessing */ true, /* fNewBlock */ &ignored);
+        ProcessBlock(last_mined);
         for (size_t j = COINBASE_MATURITY + txs.size() + 1; j; --j) {
             last_mined = GoodBlock(last_mined->GetHash());
-            ProcessNewBlock(Params(), last_mined, /* fForceProcessing */ true, /* fNewBlock */ &ignored);
+            ProcessBlock(last_mined);
         }
         // Check that the reorg was eventually successful
         BOOST_CHECK_EQUAL(last_mined->GetHash(), ::chainActive.Tip()->GetBlockHash());
jamesob commented at 5:24 pm on November 29, 2018: member
The change in general seems fine to me (mempool state should update atomically after a contiguous batch of chain maintenance and not during) and the test is great, but it might be nice to:
- explain what the motivations are. There’s an argument for RPC client consistency, but IIRC this may also be a prereq for Dandelion?
- add comments for new lock acquisitions explaining why we’re grabbing a seemingly unrelated lock at a coarse grain (in line with what @ryanofsky is saying).
In practice, ActivateBestChain calls are very short and so I don't think there are any real-world downsides to locking ::mempool.cs for their duration - although I'm curious if this affects mempool access during IBD/reindex.

MarcoFalke force-pushed on Nov 29, 2018

MarcoFalke commented at 8:17 pm on November 29, 2018: member
add comments for new lock acquisitions explaining why we're grabbing a seemingly unrelated lock at a coarse grain (in line with what @ryanofsky is saying).
I think I already did that. Let me know if any of them are still unclear.
MarcoFalke force-pushed on Nov 30, 2018

in src/txmempool.h:593 in fae71ffc99 outdated
558@@ -559,7 +559,7 @@ class CTxMemPool
559      * Check that none of this transactions inputs are in the mempool, and thus
560      * the tx is not dependent on other mempool transactions to be included in a block.
561      */
562-    bool HasNoInputsOf(const CTransaction& tx) const;
563+    bool HasNoInputsOf(const CTransaction& tx) const EXCLUSIVE_LOCKS_REQUIRED(cs);
jamesob commented at 8:14 pm on November 30, 2018:
Would it be clearer to just add LOCK(cs); to the top of the function definition instead of this annotation, since there's a recursive lock acquisition in CTxMemPool::exists anyway?
MarcoFalke commented at 9:01 pm on November 30, 2018:
Imo it is clearer to let the caller figure out how long the lock is required to be held. Imagine this is called in a loop; then you, most likely, wouldn't want to release the lock in between each iteration. Also, it happens that all callers already have the lock taken, so this annotation can be added without any other changes.
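A small sketch of that trade-off (the batch caller is hypothetical, added for illustration): with the annotation, the caller decides how long the lock is held, so a loop over many transactions pays for one lock acquisition instead of one per call.

// Annotated function: clang's thread-safety analysis now rejects callers that
// do not already hold pool.cs.
bool CTxMemPool::HasNoInputsOf(const CTransaction& tx) const EXCLUSIVE_LOCKS_REQUIRED(cs)
{
    for (unsigned int i = 0; i < tx.vin.size(); i++) {
        if (exists(tx.vin[i].prevout.hash)) return false;
    }
    return true;
}

// Hypothetical caller: take the lock once around the whole batch.
void CheckBatch(const CTxMemPool& pool, const std::vector<CTransactionRef>& txs)
{
    LOCK(pool.cs);
    for (const auto& tx : txs) {
        if (pool.HasNoInputsOf(*tx)) { /* e.g. count standalone txs */ }
    }
}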
jamesob commented at 9:13 pm on December 3, 2018:
Yep, makes sense.

in src/validation.cpp:439 in fae71ffc99 outdated
435@@ -436,7 +436,7 @@ bool CheckSequenceLocks(const CTxMemPool& pool, const CTransaction& tx, int flag
436 // Returns the script flags which should be checked for a given block
437 static unsigned int GetBlockScriptFlags(const CBlockIndex* pindex, const Consensus::Params& chainparams);
438
439-static void LimitMempoolSize(CTxMemPool& pool, size_t limit, unsigned long age) {
440+static void LimitMempoolSize(CTxMemPool& pool, size_t limit, unsigned long age) EXCLUSIVE_LOCKS_REQUIRED(pool.cs) {
jamesob commented at 8:19 pm on November 30, 2018:
We require the pool.cs lock for the duration of this function (instead of just relying on the lock in pool.Expire) because we want to keep the mempool from changing while pcoinsTip modifications happen, correct?
MarcoFalke commented at 9:02 pm on November 30, 2018:
Yeah, calling LimitMempoolSize only makes sense after you have written to the mempool, in which case you already have acquired the lock and can keep it for LimitMempoolSize.

jamesob approved

jamesob commented at 8:22 pm on November 30, 2018: member
utACK https://github.com/bitcoin/bitcoin/pull/14193/commits/fae71ffc99b04bbcb96f4e2b6fa25ed2fe258912
Just a few clarifying questions.
in src/rpc/mining.cpp:488 in fae71ffc99 outdated
483+            // ::mempool.cs.
484+            LEAVE_CRITICAL_SECTION(lock);
485+            // Break if we've seen new transactions after the timeout has elapsed.
486             if (mempool.GetTransactionsUpdated() != nTransactionsUpdatedLastLP)
487                 break;
488+            ENTER_CRITICAL_SECTION(lock);
practicalswift commented at 3:22 pm on December 6, 2018:
ENTER_CRITICAL_SECTION(…); isn't RAII, right? Do we need to explicitly LEAVE_CRITICAL_SECTION(…); when exiting the loop?

jamesob commented at 3:32 pm on December 6, 2018:
Nope; the unlock()/debug stack pop that LEAVE_CRITICAL_SECTION(...) would perform is implicit to the destruction of lock when it falls out of scope after the break.
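A generic C++ illustration of this point (plain std::unique_lock here, not Bitcoin Core's sync.h macros): a scoped lock object releases the mutex in its destructor only if it still owns it, so breaking out of the loop while the mutex is manually unlocked is safe.

#include <mutex>

std::mutex m;

void wait_loop()
{
    std::unique_lock<std::mutex> lock(m); // RAII: owns m after construction
    while (true) {
        lock.unlock();                    // manual release, like LEAVE_CRITICAL_SECTION
        const bool updated = true;        // stand-in for polling the mempool counter
        if (updated) break;               // fine: lock no longer owns m, destructor is a no-op
        lock.lock();                      // re-acquire, like ENTER_CRITICAL_SECTION
    }
} // if the loop ever exits while locked, ~unique_lock releases m here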
practicalswift commented at 3:36 pm on December 6, 2018:
@jamesob Makes perfect sense: saw the WAIT_LOCK(g_best_block_mutex, lock); now. Thanks.

in src/test/validation_block_tests.cpp:211 in fae71ffc99 outdated
206+ * The test verifies this by creating a chain of `num_txs` blocks, matures their coinbases, and then
207+ * submits txns spending from their coinbase to the mempool. A fork chain is then processed,
208+ * invalidating the txns and evicting them from the mempool.
209+ *
210+ * We verify that the mempool updates atomically by polling it continuously from another thread
211+ * during the reorg and checking that its size only changes once.
ryanofsky commented at 7:03 pm on December 10, 2018:
It wasn't clear from reading this why size changing once was a useful thing to check. It would help to add some more explanation from your comment in #14193 (comment). Maybe an additional sentence: "The size changing exactly once indicates that the polling thread's view of the mempool is either consistent with the chain state before reorg, or consistent with the chain state after the reorg, and not just consistent with some intermediate state during the reorg."

in src/test/validation_block_tests.cpp:215 in fae71ffc99 outdated
210+ * We verify that the mempool updates atomically by polling it continuously from another thread
211+ * during the reorg and checking that its size only changes once.
212+ */
213+BOOST_AUTO_TEST_CASE(mempool_locks_reorg)
214+{
215+    auto last_mined = GoodBlock(Params().GenesisBlock().GetHash());
ryanofsky commented at 7:08 pm on December 10, 2018:
Maybe declare last_mined below where it is used. It was confusing to see it not used at all in the lambda and then later referenced below.

in src/test/validation_block_tests.cpp:225 in fae71ffc99 outdated
220+    // Process all mined blocks
221+    BOOST_REQUIRE(ProcessBlock(std::make_shared<CBlock>(Params().GenesisBlock())));
222+    BOOST_REQUIRE(ProcessBlock(last_mined));
223+
224+    // Run the test multiple times
225+    for (int test_runs = 3; !!test_runs; --test_runs) {
ryanofsky commented at 7:10 pm on December 10, 2018:
Why !!test_runs everywhere in this PR instead of test_runs > 0? This seems strange, and less readable. Also, why count backwards instead of forwards?

in src/test/validation_block_tests.cpp:301 in fae71ffc99 outdated
296+        }};
297+
298+        // Mine a reorg in this thread to invalidate and remove the txs from the tx pool
299+        last_mined = GoodBlock(split_hash);
300+        ProcessBlock(last_mined);
301+        for (size_t j = COINBASE_MATURITY + txs.size() + 1; j; --j) {
ryanofsky commented at 7:20 pm on December 10, 2018:
No more !! here? And I was just getting used to it…

in src/test/validation_block_tests.cpp:315 in fae71ffc99 outdated
285+                // Internally, we might be in the middle of the reorg, but
286+                // externally the reorg to the most-proof-of-work chain should
287+                // be atomic. So the caller assumes that the returned mempool
288+                // is consistent. That is, it has all txs that were there
289+                // before the reorg.
290+                assert(::mempool.mapTx.size() == txs.size());
ryanofsky commented at 7:27 pm on December 10, 2018:
This is actually doing more than checking that "size only changes once" as described in the summary comment above. It is checking that the mempool either contains all of the transactions invalidated by the reorg, or none of them, and not some intermediate amount. Could update the comment above to say this instead.

MarcoFalke commented at 7:37 pm on February 11, 2019:
The test has a ton of comments. Let me know if I need to add anything else to parts where the code is not self-explanatory.

in src/rpc/mining.cpp:483 in d9e6cb497b outdated
477@@ -478,9 +478,14 @@ static UniValue getblocktemplate(const JSONRPCRequest& request)
478     {
479         if (g_best_block_cv.wait_until(lock, checktxtime) == std::cv_status::timeout)
480         {
481-            // Timeout: Check transactions for update
482+            // Drop the best block mutex while accessing the mempool to avoid deadlock with
483+            // ::mempool.cs.
ryanofsky commented at 8:06 pm on December 10, 2018:
This seems less than ideal. I would guess that we don't actually need ConnectTip and DisconnectTip to be holding on to mempool.cs while they wait to notify this RPC. At least it would seem more straightforward if g_best_block_cv got notified after the fact, when mempool.cs was already released.
I think it would be good to add a TODO to untangle these locks in the future, or a normal comment if mixing these locks has some benefits which are not apparent. The best place to comment would be on the g_best_block_mutex variable. It might also be good to add ACQUIRED_AFTER annotations there to prevent another deadlock like this.

MarcoFalke commented at 7:36 pm on February 11, 2019:
Does your comment still apply after my latest rebase? Also, ACQUIRED_AFTER only works in very limited settings (e.g. not across translation units or even function bodies). We don't use it anywhere else, so I will leave this as is for now.

in src/validation.cpp:2689 in d9e6cb497b outdated
2685@@ -2686,6 +2686,7 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
2686
2687     {
2688         LOCK(cs_main);
2689+        LOCK(::mempool.cs); // Lock for at least as long as it takes for connectTrace to be consumed
ryanofsky commented at 11:01 pm on December 10, 2018:
In commit "validation: Add missing mempool locks" (d9e6cb497bd540078dbf32536825fa3bfd60df42)
re: https://github.com/bitcoin/bitcoin/pull/14193#discussion_r233150812
It's not clear why the lock is being acquired here instead of before calling ActivateBestChainStep.
Thank you for adding the comment. But this is still not clear to me. If this code locked mempool.cs before calling ActivateBestChainStep and unlocked it immediately after it returned, would there be a bug? ActivateBestChainStep already updates the mempool, so if this code isn't locking to maintain the invariant that views of the mempool are consistent with chainActive, what invariant is it trying to maintain?

in src/validation.cpp:2801 in d9e6cb497b outdated
2794@@ -2794,6 +2795,7 @@ bool PreciousBlock(CValidationState& state, const CChainParams& params, CBlockIn
2795 bool CChainState::InvalidateBlock(CValidationState& state, const CChainParams& chainparams, CBlockIndex *pindex)
2796 {
2797     AssertLockHeld(cs_main);
2798+    LOCK(::mempool.cs); // Lock for as long as disconnectpool is in scope to make sure UpdateMempoolForReorg is called after DisconnetTip without unlocking in between
ryanofsky commented at 11:38 pm on December 10, 2018:
In commit "[test] Add test to check mempool consistency in case of reorgs" (fae71ffc99b04bbcb96f4e2b6fa25ed2fe258912)
Calling it right before UpdateMempoolForReorg is too fragile and exactly the issue I want to avoid. (E.g. we DisconnectTip and then fail to call UpdateMempoolForReorg, because another thread acquired the lock.)
Thanks for explaining and adding the comment. It is clear that if mempool.cs needs to provide views of the mempool that are consistent with chainActive, then it needs to be locked between the first DisconnectTip call and the final UpdateMempoolForReorg call. But it would seem clearer and safer to lock the mempool right where disconnectpool is declared and to unlock it right after UpdateMempoolForReorg returns, instead of continuing to hold it while sending UI notifications.
Also it seems like both DisconnectTip and ConnectTip should be annotated with EXCLUSIVE_LOCKS_REQUIRED(cs_main, ::mempool.cs), and for some reason they aren't.

MarcoFalke commented at 7:37 pm on February 11, 2019:
Added locking annotations.

ryanofsky commented at 0:44 am on December 11, 2018: member
I requested this in my last review (https://github.com/bitcoin/bitcoin/pull/14193#pullrequestreview-174491669) but I really think mempool.cs needs a comment saying how it's supposed to be used and what invariants it guarantees. Here's an attempt at a comment based on what I currently understand about what you're trying to do with this PR:

//! Recursive mutex ensuring internal consistency of transactions within the
//! mempool, and external consistency between transactions in the mempool and
//! the current chain tip. This mutex must be locked:
//!
//! - whenever reading or writing to `mempool::mapTx`
//! - whenever writing to `chainActive` or `pcoinsTip`
//!
//! and be held as long as mempool state and chain state are not consistent with
//! each other.
//!
//! Locking either the `cs_main` mutex or the `mempool::cs` mutex will lock the
//! chain tip and prevent it from being updated. But having two mutexes does
//! allow some useful concurrency, because it lets a thread that reads or writes
//! to the mempool run at the same time as a thread that reads the chain state.
mutable CCriticalSection cs;
ryanofsky commented at 1:42 am on December 11, 2018: member
I'm probably missing some context, but what is wrong with a simpler design where you just acquire cs_main to get a consistent view of chain state, and mempool::cs to get a consistent view of the mempool, and both to get a consistent view of both together?
The new unit test and the getrawmempool example both seem very contrived. Why do these cases need to only acquire mempool::cs without acquiring cs_main? If they need views of the mempool consistent with the current chain tip, why don't they just acquire both mutexes?

DrahtBot added the label Needs rebase on Dec 13, 2018
MarcoFalke force-pushed on Dec 22, 2018
MarcoFalke force-pushed on Dec 22, 2018
MarcoFalke force-pushed on Dec 22, 2018
DrahtBot removed the label Needs rebase on Dec 24, 2018

ryanofsky commented at 6:03 pm on January 4, 2019: member
re: #14193 (comment)
It looks like #14963 replaces this PR. Perhaps this should be closed, or marked dependent.
Turns out #14963 doesn’t really replace this PR, it just documents a bug which this PR (which is currently WIP) will fix:
re: #14193 (comment)
I’m probably missing some context, but what is wrong with a simpler design where you just acquire cs_main to get a consistent view of chain state, and mempool::cs to get a consistent view of the mempool, and both to get a consistent view of both together?
Responding to my own question, but the answer from the conversation seems to be that we want to allow applications that frequently query the mempool to (1) only acquire mempool.cs without cs_main for better performance and (2) not see any missing transactions if they happen to query the mempool in the middle of a reorg.
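An illustrative sketch of that polling pattern (hypothetical helper, not an existing RPC): with this PR, a reader that takes only ::mempool.cs sees either the pre-reorg or the post-reorg mempool, never a half-updated one.

// Hypothetical read-only poller, for illustration only: no cs_main is taken.
std::vector<uint256> SnapshotMempoolTxids()
{
    std::vector<uint256> txids;
    LOCK(::mempool.cs);
    txids.reserve(::mempool.mapTx.size());
    for (const CTxMemPoolEntry& entry : ::mempool.mapTx) {
        txids.push_back(entry.GetTx().GetHash());
    }
    return txids;
}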
MarcoFalke force-pushed on Jan 15, 2019

MarcoFalke commented at 6:52 pm on January 15, 2019: member
Addressed some of @ryanofsky's feedback

DrahtBot added the label Needs rebase on Mar 7, 2019
MarcoFalke force-pushed on Mar 7, 2019
MarcoFalke force-pushed on Mar 7, 2019
DrahtBot removed the label Needs rebase on Mar 7, 2019

in src/txmempool.h:522 in fadeeadd93 outdated
515- * @par Consistency bug
516- *
517- * The second guarantee above is not currently enforced, but
518- * https://github.com/bitcoin/bitcoin/pull/14193 will fix it. No known code
519- * in bitcoin currently depends on second guarantee, but it is important to
520- * fix for third party code that needs be able to frequently poll the
ryanofsky commented at 4:33 pm on March 20, 2019:
In commit "validation: Add missing mempool locks" (fadeeadd93a928c8ececa8c96ad2afa45bd25a29)
Might be worth keeping text that says why guarantee 2 is useful. James also asked about it here #14193#pullrequestreview-179871886, wondering if it might be useful for dandelion.

MarcoFalke commented at 5:18 pm on March 20, 2019:
Not sure about that. We generally don't update (forget to update) comments when adding new features. So if a feature is added in the far future (e.g. Dandelion has no chance to hit any time soon), the phrase "No known code in bitcoin currently depends on second guarantee" might no longer reflect reality.
I believe the test that I added serves as nice documentation: if this guarantee is violated, it will fail. The motivation to fix this bug for (enterprise) rpc users that poll the mempool should be enough motivation on its own.

ryanofsky approved

ryanofsky commented at 4:36 pm on March 20, 2019: member
utACK fa33e8f41fe14cfd136ac7c96e192a9e14f1d0d3. Looks good! This makes a lot more sense to me now, and the test is actually pretty clear too.

MarcoFalke added this to the milestone 0.19.0 on Mar 20, 2019
MarcoFalke added the label Bug on Mar 20, 2019

in src/validation.cpp:2797 in fa33e8f41f outdated
2793@@ -2793,6 +2794,7 @@ bool CChainState::InvalidateBlock(CValidationState& state, const CChainParams& c
2794         LimitValidationInterfaceQueue();
2795
2796         LOCK(cs_main);
2797+        LOCK(::mempool.cs); // Lock for as long as disconnectpool is in scope to make sure UpdateMempoolForReorg is called after DisconnetTip without unlocking in between
practicalswift commented at 2:02 pm on April 15, 2019:
Nit: Should be "DisconnectTip" :-)

CLEANSKY97 commented at 1:50 am on April 16, 2019: none
Duplicate of #

MarcoFalke force-pushed on Apr 16, 2019
ryanofsky approved
MarcoFalke force-pushed on Apr 25, 2019
MarcoFalke force-pushed on Apr 25, 2019

MarcoFalke commented at 5:58 pm on April 25, 2019: member
rewritten test now that segwit is always active in unit tests:
- Add missing call to GenerateCoinbaseCommitment
- Mine the reorg before adding txs to mempool, to not include them in the reorg blocks
MarcoFalke force-pushed on Apr 25, 2019
DrahtBot added the label Needs rebase on May 7, 2019
MarcoFalke force-pushed on May 7, 2019
MarcoFalke force-pushed on May 7, 2019
DrahtBot removed the label Needs rebase on May 7, 2019
ryanofsky approved

ryanofsky commented at 9:41 pm on May 7, 2019: member
utACK faa0d61f8ab42cc921bdcd73d6fd4b8b680aa22e. Only changes since last review are rebase and described test changes.
Mine the reorg before adding txs to mempool, to not include them in the reorg blocks.
I don't understand how it'd be possible for transactions in the mempool to end up in the reorg, if the reorg code is just calling Block() and generating empty blocks. I also don't understand what changed in the rebase if mining the reorg after adding transactions used to work, but now stopped working. In any case, it'd be useful to have a comment in the test saying that it's important to mine the reorg before adding transactions, and why, assuming it is important.

MarcoFalke commented at 9:52 pm on May 7, 2019: member
The mining is using CreateNewBlock, which gets txs from the mempool (if available).
Why this worked previously, IIRC, was that segwit wasn’t active on regtest in unit tests, but non-standard txs would be accepted to the mempool. (See https://github.com/bitcoin/bitcoin/pull/15788/files#diff-01edb0ae76ec7f6061ec6ddbdb18375aL45 and #15891). Since the txs have witness, they wouldn’t be mined previously, but now that segwit is always active on regtest, they are.
In this test we don’t want to mine any txs from the mempool, so we mine the reorg when the mempool is empty and then add the txs to the mempool.
MarcoFalke commented at 1:51 pm on May 14, 2019: member
Also, the test is already clear on this: // Mine a reorg (and hold it back) before adding the txs to the mempool. If you have specific concerns, please suggest a rewording of that sentence.

CLEANSKY97 approved

in src/rpc/mining.cpp:484 in fa84fdbb1e outdated
478@@ -479,9 +479,14 @@ static UniValue getblocktemplate(const JSONRPCRequest& request)
479     {
480         if (g_best_block_cv.wait_until(lock, checktxtime) == std::cv_status::timeout)
481         {
482-            // Timeout: Check transactions for update
483+            // Drop the best block mutex while accessing the mempool to avoid deadlock with
484+            // ::mempool.cs.
485+            LEAVE_CRITICAL_SECTION(lock);
sipa commented at 9:52 pm on May 22, 2019:
This is a pretty weird construction; generally you shouldn't be releasing the lock that protects the state you're waiting to change while inspecting it.
What about making CTxMemPool::nTransactionsUpdated an atomic, and making CTxMemPool::GetTransactionsUpdated not grab cs? Would it still be needed to release lock here then?
Would you be able to write this as g_best_block_cv.wait_until(lock, checktxtime, [&](){ return g_best_block != hashWatchedChain || !IsRPCRunning() || mempool.GetTransactionsUpdated() != nTransactionsUpdatedLastLP; });?
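A minimal sketch of the first half of this suggestion (heavily simplified; the real class has many more members): once the counter is a std::atomic, GetTransactionsUpdated no longer needs to take cs at all, so polling it cannot deadlock against other mutexes the caller holds.

#include <atomic>

class CTxMemPool
{
    // Lock-free counter: safe to read without holding cs.
    std::atomic<unsigned int> nTransactionsUpdated{0};

public:
    unsigned int GetTransactionsUpdated() const { return nTransactionsUpdated; }
    void AddTransactionsUpdated(unsigned int n) { nTransactionsUpdated += n; }
};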
MarcoFalke commented at 10:35 am on June 7, 2019:
Made it atomic, but I left the refactoring change for a future pull request.

MarcoFalke force-pushed on May 23, 2019
MarcoFalke force-pushed on May 23, 2019
DrahtBot added the label Needs rebase on Jun 5, 2019
MarcoFalke force-pushed on Jun 5, 2019
DrahtBot removed the label Needs rebase on Jun 5, 2019
MarcoFalke force-pushed on Jun 7, 2019

txpool: Make nTransactionsUpdated atomic fa0c9dbf91
validation: Add missing mempool locks fabeb1f613
[test] Add test to check mempool consistency in case of reorgs fa2b083c3f

MarcoFalke force-pushed on Jun 7, 2019
laanwj added this to the "Blockers" column in a project
MarcoFalke moved this from the "Blockers" to the "Bugfixes" column in a project
ryanofsky approved

ryanofsky commented at 6:27 pm on June 27, 2019: member
utACK fa2b083c3feb0522baf652045efa6b73458761a3 [EDIT: was e284e422e75189794e24fe482819d8b1407857c3, from bad copy and paste]. Changes since last review: rebase after #15976, adding vTxHashes lock annotation, adding new commit dropping mempool lock for nTransactionsUpdated and making it atomic to avoid deadlock between mempool lock and g_best_block_mutex.

in src/txmempool.cpp:326 in fa0c9dbf91 outdated
321@@ -322,8 +322,8 @@ void CTxMemPoolEntry::UpdateAncestorState(int64_t modifySize, CAmount modifyFee,
322     assert(int(nSigOpCostWithAncestors) >= 0);
323 }
324
325-CTxMemPool::CTxMemPool(CBlockPolicyEstimator* estimator) :
326-    nTransactionsUpdated(0), minerPolicyEstimator(estimator)
327+CTxMemPool::CTxMemPool(CBlockPolicyEstimator* estimator)
328+    : nTransactionsUpdated(0), minerPolicyEstimator(estimator)
promag commented at 10:55 pm on June 29, 2019:
:eyes:

in src/test/validation_block_tests.cpp:316 in fa2b083c3f
311+                // externally the reorg to the most-proof-of-work chain should
312+                // be atomic. So the caller assumes that the returned mempool
313+                // is consistent. That is, it has all txs that were there
314+                // before the reorg.
315+                assert(::mempool.mapTx.size() == txs.size());
316+                continue;
promag commented at 11:07 pm on June 29, 2019:
fa2b083c3feb0522baf652045efa6b73458761a3
Why continue?

MarcoFalke commented at 3:19 pm on June 30, 2019:
Because the thread should continue and not exit (for now)

promag commented at 8:28 pm on July 1, 2019:
Why would it exit?

MarcoFalke commented at 8:46 pm on July 1, 2019:
It would exit when it breaks. I think this is well documented in the test case and I am not sure what you are asking exactly.
If you are asking about cpp syntax, I find cppreference really helpful. E.g. https://en.cppreference.com/w/cpp/language/break
If you are asking about this test case logic, please write a more verbose question.
ryanofsky commented at 2:11 pm on July 2, 2019:
I don't get this either. If you have a loop, and an unconditional continue is the last statement in the loop, the continue isn't doing anything and could be removed.
MarcoFalke commented at 2:54 pm on July 2, 2019:
Oh, now I understand that you were asking why I put a redundant continue there. That was intentional, because I felt it was more clear back when I wrote the test. Happy to remove, but I think we shouldn't spend too much time on the style of tests.

promag commented at 3:04 pm on July 2, 2019:
I didn't mean to discuss style! I honestly thought something was wrong because of the pointless continue, like missing code or something else.

MarcoFalke commented at 3:07 pm on July 2, 2019:
Ah, good point. Sorry for the confusion.

in src/test/validation_block_tests.cpp:304 in fa2b083c3f
299+        // validation is doing a reorg
300+        std::thread rpc_thread{[&]() {
301+            // This thread is checking that the mempool either contains all of
302+            // the transactions invalidated by the reorg, or none of them, and
303+            // not some intermediate amount.
304+            while (true) {
promag commented at 11:07 pm on June 29, 2019:
fa2b083c3feb0522baf652045efa6b73458761a3
I think this is flawless, what guarantees that there's a non-atomic change once mempool.cs is released?

MarcoFalke commented at 3:18 pm on June 30, 2019:
The code changes in this pull request should guarantee that this doesn't happen here in the test (you can check it by running the test before the code changes and after), as well as for "real" rpc polling threads. Though, I didn't write a functional test for this.

promag commented at 11:08 pm on June 29, 2019: member
Concept ACK, some comments.

laanwj commented at 2:05 pm on July 2, 2019: member
code review ACK fa2b083c3feb0522baf652045efa6b73458761a3
@ryanofsky didn't you ack the wrong commit above? e284e422e75189794e24fe482819d8b1407857c3 ("Remove getBlockDepth method from Chain::interface") isn't part of this PR

ryanofsky commented at 2:20 pm on July 2, 2019: member
@ryanofsky didn't you ack the wrong commit above
Sorry, bad copy and paste from the command line. Edited #14193#pullrequestreview-255403869 to fix.
laanwj commented at 2:29 pm on July 2, 2019: member
Thanks!

laanwj merged this on Jul 2, 2019
laanwj closed this on Jul 2, 2019
laanwj referenced this in commit c6e42f1ca9 on Jul 2, 2019
MarcoFalke deleted the branch on Jul 2, 2019
laanwj removed this from the "Bugfixes" column in a project
sidhujag referenced this in commit 4408967f33 on Jul 3, 2019
laanwj referenced this in commit e9f73b839a on Oct 26, 2019
jasonbcox referenced this in commit bcc81a5690 on Sep 25, 2020
jasonbcox referenced this in commit 58e79dae40 on Sep 25, 2020
jasonbcox referenced this in commit 472a674203 on Sep 25, 2020
jasonbcox referenced this in commit 211afcf44c on Sep 25, 2020
LarryRuane referenced this in commit b02e36c230 on May 10, 2021
LarryRuane referenced this in commit 9ee6369bcb on May 10, 2021
Munkybooty referenced this in commit 5fdbb0b6de on Nov 4, 2021
DrahtBot locked this on Dec 16, 2021
MarcoFalke practicalswift ryanofsky DrahtBot sdaftuar jamesob CLEANSKY97 sipa promag laanwj

Labels
Bug Validation

Milestone
0.19.0
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 21:12 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me