Improve DisconnectTip performance

sdaftuar commented at 10:42 am on November 23, 2016: member

During a reorg, we currently try to add transactions from each disconnected block back to the mempool, even though we are likely to find many of those transactions in the blocks that get connected.

This PR stores those transactions in a separate “disconnect pool” for later processing. This has a few side effects:

There should be many fewer transactions to re-add to the mempool, assuming the reorg is benign and most transactions appear on both forks, which means we’ll do less work.
When we do add things to the mempool, we’ll do it in one attempt rather than separately for each disconnected block, so mempool chain limits will be calculated and applied differently for reorgs than before. Overall it should be much more efficient, because we will have fewer calls chasing long chains in the mempool to keep track of ancestor and descendant state (both because we will call UpdateTransactionsFromBlock on a smaller set of transactions overall, and because the descendant caching it does isn’t reused across calls, so it’s more efficient to call UpdateTransactionsFromBlock once for all of a reorg’s transactions than multiple times).
Storing the transactions in this new disconnect pool means we use more memory during a reorg. I’ve added a bound on how much memory can be used, and if we would exceed it we just try to keep the oldest transactions that are disconnected (the idea being that it’s more important to ensure that older transactions that are reorged out have an opportunity to be re-mined, compared with newer ones). This could probably be improved in the future, since in such a (rare) scenario, we would probably want to throw out our current mempool in favor of being able to hold more older transactions, or we might want to re-read those transactions off disk to try to pick up everything that is now unconfirmed. But for now I made the memory bound large enough that this shouldn’t practically occur.

fanquake added the label Refactoring on Nov 23, 2016

sdaftuar commented at 2:05 am on November 29, 2016: member

I just realized there’s a potential issue here with the callbacks for conflicted transactions; in the case where a disconnected block transaction ends up conflicting with some tx in a block on the reorg, the call to ATMP will fail (because some input is spent) but not be reported via SyncTransaction() as a conflicted transaction.

EDIT: After some discussion with @morcos I think there’s not an issue here. Mempool conflict tracking in general is basically just best-efforts and, in the case of reorgs, dependent on mempool policy already.

sdaftuar force-pushed on Dec 26, 2016

sdaftuar force-pushed on Jan 4, 2017

sdaftuar renamed this:
~~[WIP] Improve DisconnectTip performance~~
Improve DisconnectTip performance
on Jan 4, 2017

sdaftuar commented at 9:05 pm on January 4, 2017: member

Rebased, and removed WIP tag as this is now ready for review.

dcousens commented at 9:48 pm on January 4, 2017: contributor

concept ACK, will review

gmaxwell commented at 9:08 am on January 8, 2017: contributor

Needs rebase. utACK.

sdaftuar force-pushed on Jan 8, 2017

sdaftuar commented at 3:13 pm on January 8, 2017: member

Rebased

in src/txmempool.h: in 37a90f1ad0 outdated

780+    void removeForBlock(const std::vector<CTransactionRef>& vtx)
781+    {
782+        for (auto const &tx : vtx) {
783+            auto it = queuedTx.find(tx->GetHash());
784+            if (it != queuedTx.end()) {
785+                cachedInnerUsage -= RecursiveDynamicUsage(*it);

TheBlueMatt commented at 7:31 pm on February 28, 2017:

DRY: Maybe use removeEntry here?

sdaftuar commented at 7:16 pm on March 28, 2017:

I’m going to leave this, since removeEntry takes an iterator into a different multi_index index, and I’d rather removeForBlock be as fast as possible.

ryanofsky commented at 10:33 pm on February 28, 2017: member

utACK 37a90f1ad0b57de2ab4294dfa4a97d2e666dd479

PR makes sense, and nothing has changed except a few comments since my last review (for 4d70f47a94cc739e92f7d3f8f4f7b11cf7457ca9).

One thing I don’t understand is why the pruning test needs to change when it sounds like you are saying the disconnect pool memory bound is large enough that losing transactions “shouldn’t practically occur.” Are you just updating the pruning test as a precaution, or are changes there needed because the chain it creates is unusual?

Also this PR has merge conflicts, but they are very minor. The validation.cpp conflict can be fixed by updating AcceptToMemoryPool arguments which changed in edded808fc4eee94c178e1779b90d1c87a08c23a (#9499).

dcousens approved

dcousens commented at 11:32 pm on February 28, 2017: contributor

utACK 37a90f1

sdaftuar force-pushed on Mar 1, 2017

sdaftuar commented at 6:34 pm on March 1, 2017: member

Rebased. @ryanofsky The changes to pruning.py are because:

Policy limits get applied differently than before; we used to ignore some of the chain limits when adding transactions back to the mempool during a reorg (because we wouldn’t look for in-mempool descendants until the end).
The pruning test is trying to build really big blocks in a number of places, to make the chain big enough to be pruned, and in particular in a few places it relies on the mempool being populated with disconnected transactions after a reorg (so changes to policy that would prevent that are problematic for the test).

So I mitigated this in part by increasing the chain limits and in part by trying to keep the mempool somewhat populated with a node’s wallet transactions at a couple relevant places in the test.

[I actually briefly thought that because we now save the mempool in between restarts, we may not need the resendwallettransactions() calls I added (which I use to populate the mempool somewhat in a couple places), but I tested locally that removing them results in test failure (plausibly because the mined blocks aren’t big enough, which I think is due to the reorg behavior change). I do think this test needs to be overhauled now that many of the assumptions around mempool behavior have changed, but I think we should do that in another PR.]

sdaftuar force-pushed on Mar 6, 2017

sdaftuar force-pushed on Mar 7, 2017

sdaftuar commented at 7:38 pm on March 7, 2017: member

Rebased after #9602.

in src/validation.cpp: in 344862a0ac outdated

591+        // ignore validation errors in resurrected transactions
592+        CValidationState stateDummy;
593+        if ((*it)->IsCoinBase() || !AcceptToMemoryPool(mempool, stateDummy, *it, false, NULL, NULL, true)) {
594+            // If the transaction doesn't make it in to the mempool, remove any
595+            // transactions that depend on it (which would now be orphans).
596+            mempool.removeRecursive(**it);

ryanofsky commented at 6:15 pm on March 8, 2017:

In commit “Store disconnected block transactions outside mempool during reorg”:

It looks like the new MemPoolRemovalReason::REORG argument for removeRecursive got dropped during the rebase. (Shame that we don’t a have test that would catch this, and that the argument is optional to begin with.)

sdaftuar commented at 2:28 am on March 11, 2017:

Thanks for catching this! Fixed.

sipa commented at 6:24 am on March 12, 2017:

I don’t see MemPoolRemovalReason::REORG anywhere?

ryanofsky commented at 6:24 pm on March 8, 2017: member

utACK c8d4c5fa8f8d2375ea781523ae7619f1cefea6f9 if the MemPoolRemovalReason::REORG argument is added back.

Confirmed no other code changes since the previous rebase.

sdaftuar force-pushed on Mar 10, 2017

sdaftuar force-pushed on Mar 11, 2017

sdaftuar commented at 2:27 am on March 11, 2017: member

Rebased again to pick up the pruning.py fix from #9972

in src/txmempool.h: in 2863901070 outdated

200     result_type operator() (const CTxMemPoolEntry &entry) const
201     {
202         return entry.GetTx().GetHash();
203     }
204+
205+    result_type operator() (CTransactionRef tx) const

sipa commented at 4:53 am on March 12, 2017:

Pass the CTransactionRef by reference? That will avoid an atomic inc/dec when copying/destroing the CTransactionRef.

in src/txmempool.h: in 2863901070 outdated

739+    // no exact formula for boost::multi_index_contained is implemented.
740+    size_t DynamicMemoryUsage() const {
741+        return memusage::MallocUsage(sizeof(CTransactionRef) + 6 * sizeof(void*)) * queuedTx.size() + cachedInnerUsage;
742+    }
743+
744+    void addTransaction(CTransactionRef tx)

sipa commented at 5:13 am on March 12, 2017:

const CTransactionRef& here as well, or queuedTx.insert(std::move(tx)) below.

sdaftuar commented at 11:03 am on March 13, 2017: member

@sipa Thanks for reviewing (yikes, I think I must have done my second rebase on Friday using an older branch). All the comments should be addressed in the last two fixup commits; let me know if I should go ahead and squash.

in src/validation.cpp: in 2863901070 outdated

2443@@ -2402,6 +2444,10 @@ static bool ActivateBestChainStep(CValidationState& state, const CChainParams& c
2444         }
2445     }
2446 
2447+    // If any blocks were disconnected, disconnectpool may be non empty.  Add
2448+    // any disconnected transactions back to the mempool.
2449+    ReacceptToMemoryPool(disconnectpool);

morcos commented at 3:25 pm on March 13, 2017:

nit: Any reason to put ReacceptToMemoryPool here and not under if (fBlocksDisconnected)? It might make the logic slightly clearer.

sdaftuar commented at 7:16 pm on March 28, 2017:

Fixed

morcos commented at 3:27 pm on March 13, 2017: member

fast review utACK

in src/validation.cpp: in 10853b3517 outdated

2388@@ -2351,9 +2389,13 @@ static bool ActivateBestChainStep(CValidationState& state, const CChainParams& c
2389 
2390     // Disconnect active blocks which are no longer in the best chain.
2391     bool fBlocksDisconnected = false;
2392+    DisconnectedBlockTransactions disconnectpool;
2393     while (chainActive.Tip() && chainActive.Tip() != pindexFork) {
2394-        if (!DisconnectTip(state, chainparams))
2395+        if (!DisconnectTip(state, chainparams, disconnectpool)) {
2396+            // The mempool will be inconsistent, but we're about to shutdown

TheBlueMatt commented at 3:46 pm on March 13, 2017:

I dont think this is always true. Callers dont always enforce this (eg ActivateBestChains in net_processing, PreciousBlock from RPC, etc,e tc) and callees dont either (eg DisconnectBlock’s UndoReadFromDisk doesnt appear to generate an AbortNode). These might actually also be bugs, but I think you’re introducing a stronger assumption than previously here.

sdaftuar commented at 7:16 pm on March 28, 2017:

Fixed. I do think that a failure in DisconnectTip() should always cause shutdown, as we can no longer be sure we’re in consensus, but no need to worry about that here.

in src/validation.cpp: in 10853b3517 outdated

3750@@ -3697,7 +3751,9 @@ bool RewindBlockIndex(const CChainParams& params)
3751             // of the blockchain).
3752             break;
3753         }
3754-        if (!DisconnectTip(state, params, true)) {
3755+        DisconnectedBlockTransactions dummypool;
3756+        // fBare=true means nothing will actually be added to dummypool

TheBlueMatt commented at 4:10 pm on March 13, 2017:

Hmm, thats a super confusing API now….maybe either pass in a pointer to a DisconnectedBlockTransactions as both fBare and the pool?

sdaftuar commented at 7:16 pm on March 28, 2017:

Fixed.

in src/validation.h: in 10853b3517 outdated

68@@ -69,6 +69,8 @@ static const unsigned int DEFAULT_DESCENDANT_LIMIT = 25;
69 static const unsigned int DEFAULT_DESCENDANT_SIZE_LIMIT = 101;
70 /** Default for -mempoolexpiry, expiration time for mempool transactions in hours */
71 static const unsigned int DEFAULT_MEMPOOL_EXPIRY = 336;
72+/** Maximum kilobytes for transactions to store for processing during reorg */
73+static const unsigned int MAX_DISCONNECTED_TX_POOL_SIZE = 75000;

TheBlueMatt commented at 4:11 pm on March 13, 2017:

nit: Yeesh, isnt that a bit high? Maybe like 10MB - if we reorg something larger than 2/3 blocks throwing away tons of txn is fine by me.

sdaftuar commented at 7:16 pm on March 28, 2017:

Dropping it to 20 (segwit blocks can be bigger).

TheBlueMatt commented at 4:14 pm on March 13, 2017: member

Strong concept ACK

dcousens commented at 7:05 am on March 24, 2017: contributor

Needs rebase?

sdaftuar force-pushed on Mar 28, 2017

sdaftuar commented at 7:17 pm on March 28, 2017: member

First, I rebased https://github.com/sdaftuar/bitcoin/commits/fd-rebased.2 -> https://github.com/sdaftuar/bitcoin/commits/fd-rebased.3, no actual conflicts.

Then I added 4 fixup commits to address the nits mentioned so far. I also added an assertion that a DisconnectedBlockTransactions object can’t be destroyed if it’s non-empty… Not sure how good an idea that is, but I thought it might catch code bugs in the future related to mempool consistency?

sdaftuar force-pushed on Mar 28, 2017

sdaftuar commented at 7:21 pm on March 28, 2017: member

Squashed all those commits down (https://github.com/sdaftuar/bitcoin/commits/fd-rebased.4 -> https://github.com/bitcoin/bitcoin/pull/9208/commits/5685657564e7a19635529eabea6af829a3a4996c)

in src/validation.cpp:2469 in 5685657564 outdated

2428@@ -2379,7 +2429,7 @@ static bool ActivateBestChainStep(CValidationState& state, const CChainParams& c
2429 
2430         // Connect new blocks.
2431         BOOST_REVERSE_FOREACH(CBlockIndex *pindexConnect, vpindexToConnect) {
2432-            if (!ConnectTip(state, chainparams, pindexConnect, pindexConnect == pindexMostWork ? pblock : std::shared_ptr<const CBlock>(), connectTrace)) {
2433+            if (!ConnectTip(state, chainparams, pindexConnect, pindexConnect == pindexMostWork ? pblock : std::shared_ptr<const CBlock>(), connectTrace, disconnectpool)) {

TheBlueMatt commented at 0:54 am on March 30, 2017:

We’ll now assert() on L2445 when we return false if ConnectTip fails, instead of gracefully AbortNode()ing.

sdaftuar commented at 6:07 pm on March 30, 2017:

I don’t know what the best answer here is. I could call ReacceptToMemoryPool(), but doing so after we’ve invoked AbortNode() seems either annoying (trying to lookup UTXOs from disk when we’re potentially having disk trouble seems bad), or potentially problematic (is it okay to trigger SyncTransaction() callbacks when our on-disk state is potentially corrupt?).

One idea: check to see if fShutdownRequested is set, and if so, just wipe the disconnectpool and the mempool so that any observers of the mempool won’t see a potentially inconsistent state (and if fShutdownRequested is not set, call ReacceptToMemoryPool() – this should never happen). Then the mempool saving won’t work in the event of an AbortNode(), but maybe that’s okay, we’re probably out of disk space anyway. @sipa Any suggestions on how to handle this?

TheBlueMatt commented at 8:31 pm on March 30, 2017:

For posterity: meeting conclusion, broadly, appeared to be that we should move forward with AbortNode() eventually migrating towards being used only for non-critical errors that result in a “we should shutdown soonish”, and critical errors such as on-disk corruption should be an assert(false), ie that we should move forward here ensuring that the mempool is consistent when we return.

sdaftuar commented at 9:13 pm on March 30, 2017:

@TheBlueMatt I pushed up a commit that addresses the edge-cases better, I think – it now will just erase from the mempool to become consistent again. Also patches up the missing mempool.removeForReorg calls in a couple places, and moves all the reorg consistency handling into one function to get rid of some code duplication.

TheBlueMatt commented at 10:26 pm on April 12, 2017: member

utACK 65a4dcc46f1ef942d52d8445cdc4a2ef385c816b, though please squash it with e4a0440fab759a0a3c2d4d03f9bd7463d84dde9a

sdaftuar force-pushed on Apr 13, 2017

sdaftuar commented at 2:03 am on April 13, 2017: member

Squashed 65a4dcc -> 688417b

in src/txmempool.h:753 in 688417b932 outdated

758+    }
759+
760+    // Remove entries based on txid_index, and update memory usage.
761+    void removeForBlock(const std::vector<CTransactionRef>& vtx)
762+    {
763+        for (auto const &tx : vtx) {

morcos commented at 3:05 pm on April 13, 2017:

probably should shortcut to skip this loop if disconnectpool is empty which is the common case in connecttip

sdaftuar commented at 3:54 pm on April 13, 2017:

Agreed, fixed in 15b46d94b53af6429f808bb47ebc2a40da9dc791

morcos commented at 3:12 pm on April 13, 2017: member

utACK 688417b modulo performance nit

sdaftuar force-pushed on Apr 13, 2017

sdaftuar commented at 3:55 pm on April 13, 2017: member

Squashed 15b46d94b53af6429f808bb47ebc2a40da9dc791 -> 75b072df512442829bf288f82ed8ce0830512763

morcos commented at 5:25 pm on April 17, 2017: member

utACK 75b072d

morcos approved

in src/core_memusage.h:68 in 75b072df51 outdated

62@@ -63,4 +63,9 @@ static inline size_t RecursiveDynamicUsage(const CBlockLocator& locator) {
63     return memusage::DynamicUsage(locator.vHave);
64 }
65 
66+template<typename X>
67+static inline size_t RecursiveDynamicUsage(const std::shared_ptr<X>& p) {
68+    return RecursiveDynamicUsage(*p) + memusage::DynamicUsage(p);

theuni commented at 5:12 pm on April 20, 2017:

Watch the null deref:

0return p ? memusage::DynamicUsage(p) + RecursiveDynamicUsage(*p) : 0;

sdaftuar commented at 12:52 pm on April 21, 2017:

Thanks, fixed in ea1b122

sdaftuar force-pushed on Apr 21, 2017

sdaftuar commented at 12:59 pm on April 21, 2017: member

Fixed @theuni’s nit in ea1b122 (https://github.com/sdaftuar/bitcoin/commits/fd-rebased.7), and squashed in 2cb9509

jonasschnelli added this to the "Blockers" column in a project

[qa] Relax assumptions on mempool behavior during reorg

Policy limits (such as chain limits and mempool total size) could reasonably
be enforced more aggressively during a reorg, so use resendwallettransactions
to repopulate the mempool to avoid mined blocks being too small, and increase
the chain limits from the default for this test.

This is in preparation for a change in mempool behavior during a reorg.

9decd648ac

Store disconnected block transactions outside mempool during reorg

Rather than re-add disconnected block transactions back to the mempool
immediately, store them in a separate disconnectpool for later processing,
because we expect most such transactions to reappear in the chain that is
still to be connected (and thus we can avoid the work of reprocessing those
transactions through the mempool altogether).

71f1903353

Add RecursiveDynamicUsage overload for std::shared_ptr

This simplifies a few usage expressions.

c1235e3f2d

sdaftuar force-pushed on May 4, 2017

sdaftuar commented at 9:12 pm on May 4, 2017: member

Rebased again due to merge conflict in pruning.py

sdaftuar commented at 1:25 am on May 12, 2017: member

Anything else needed here?

gmaxwell approved

gmaxwell commented at 2:05 pm on May 22, 2017: contributor

tested ACK (I previously utACKed, I know longer remember the code but I’ve used it a bunch and beat on it a bit)

TheBlueMatt commented at 9:26 pm on May 29, 2017: member

re-utACK c1235e3f2dd5b01b63b020d1b8f7283e8badaf09

morcos commented at 3:20 pm on May 30, 2017: member

re-utACK c1235e3

laanwj merged this on May 30, 2017

laanwj closed this on May 30, 2017

laanwj referenced this in commit acd9957b72 on May 30, 2017

laanwj removed this from the "Blockers" column in a project

PastaPastaPasta referenced this in commit d686d1c8f4 on Jun 20, 2019

PastaPastaPasta referenced this in commit c64f2c635c on Jun 22, 2019

PastaPastaPasta referenced this in commit 923dc3db00 on Jun 22, 2019

PastaPastaPasta referenced this in commit 04de684675 on Jun 22, 2019

PastaPastaPasta referenced this in commit 33387d52a2 on Jun 22, 2019

PastaPastaPasta referenced this in commit 08fcd38f4c on Jun 22, 2019

PastaPastaPasta referenced this in commit fc63960a13 on Jun 22, 2019

PastaPastaPasta referenced this in commit b7c941a1b5 on Jun 22, 2019

PastaPastaPasta referenced this in commit c3d5c66440 on Jun 22, 2019

PastaPastaPasta referenced this in commit 03780c7932 on Jun 22, 2019

PastaPastaPasta referenced this in commit d2638d7823 on Jun 22, 2019

PastaPastaPasta referenced this in commit a8566404be on Jun 24, 2019

barrystyle referenced this in commit f378afe693 on Jan 22, 2020

random-zebra referenced this in commit 1f655a649a on Mar 22, 2021

random-zebra referenced this in commit 425f72f00c on Mar 25, 2021

random-zebra referenced this in commit b33d4596f9 on Mar 25, 2021

random-zebra referenced this in commit 2a2fb4b168 on Mar 25, 2021

random-zebra referenced this in commit b80ec44f3b on Mar 30, 2021

random-zebra referenced this in commit 2ae953af3d on Apr 4, 2021

random-zebra referenced this in commit af984e1780 on Apr 6, 2021

DrahtBot locked this on Sep 8, 2021

Improve DisconnectTip performance #9208