Improve DisconnectTip performance #9208

pull sdaftuar wants to merge 3 commits into bitcoin:master from sdaftuar:faster-disconnect-rebased changing 6 files +213 −35
  1. sdaftuar commented at 10:42 am on November 23, 2016: member

    During a reorg, we currently try to add transactions from each disconnected block back to the mempool, even though we are likely to find many of those transactions in the blocks that get connected.

    This PR stores those transactions in a separate “disconnect pool” for later processing. This has a few side effects:

    • There should be many fewer transactions to re-add to the mempool, assuming the reorg is benign and most transactions appear on both forks, which means we’ll do less work.

    • When we do add things to the mempool, we’ll do it in one attempt rather than separately for each disconnected block, so mempool chain limits will be calculated and applied differently for reorgs than before. Overall it should be much more efficient, because we will have fewer calls chasing long chains in the mempool to keep track of ancestor and descendant state (both because we will call UpdateTransactionsFromBlock on a smaller set of transactions overall, and because the descendant caching it does isn’t reused across calls, so it’s more efficient to call UpdateTransactionsFromBlock once for all of a reorg’s transactions than multiple times).

    • Storing the transactions in this new disconnect pool means we use more memory during a reorg. I’ve added a bound on how much memory can be used, and if we would exceed it we just try to keep the oldest transactions that are disconnected (the idea being that it’s more important to ensure that older transactions that are reorged out have an opportunity to be re-mined, compared with newer ones). This could probably be improved in the future, since in such a (rare) scenario, we would probably want to throw out our current mempool in favor of being able to hold more older transactions, or we might want to re-read those transactions off disk to try to pick up everything that is now unconfirmed. But for now I made the memory bound large enough that this shouldn’t practically occur.

  2. fanquake added the label Refactoring on Nov 23, 2016
  3. sdaftuar commented at 2:05 am on November 29, 2016: member

    I just realized there’s a potential issue here with the callbacks for conflicted transactions; in the case where a disconnected block transaction ends up conflicting with some tx in a block on the reorg, the call to ATMP will fail (because some input is spent) but not be reported via SyncTransaction() as a conflicted transaction.

    EDIT: After some discussion with @morcos I think there’s not an issue here. Mempool conflict tracking in general is basically just best-efforts and, in the case of reorgs, dependent on mempool policy already.

  4. sdaftuar force-pushed on Dec 26, 2016
  5. sdaftuar force-pushed on Jan 4, 2017
  6. sdaftuar renamed this:
    [WIP] Improve DisconnectTip performance
    Improve DisconnectTip performance
    on Jan 4, 2017
  7. sdaftuar commented at 9:05 pm on January 4, 2017: member
    Rebased, and removed WIP tag as this is now ready for review.
  8. dcousens commented at 9:48 pm on January 4, 2017: contributor
    concept ACK, will review
  9. gmaxwell commented at 9:08 am on January 8, 2017: contributor
    Needs rebase. utACK.
  10. sdaftuar force-pushed on Jan 8, 2017
  11. sdaftuar commented at 3:13 pm on January 8, 2017: member
    Rebased
  12. in src/txmempool.h: in 37a90f1ad0 outdated
    780+    void removeForBlock(const std::vector<CTransactionRef>& vtx)
    781+    {
    782+        for (auto const &tx : vtx) {
    783+            auto it = queuedTx.find(tx->GetHash());
    784+            if (it != queuedTx.end()) {
    785+                cachedInnerUsage -= RecursiveDynamicUsage(*it);
    


    TheBlueMatt commented at 7:31 pm on February 28, 2017:
    DRY: Maybe use removeEntry here?

    sdaftuar commented at 7:16 pm on March 28, 2017:
    I’m going to leave this, since removeEntry takes an iterator into a different multi_index index, and I’d rather removeForBlock be as fast as possible.
  13. ryanofsky commented at 10:33 pm on February 28, 2017: member

    utACK 37a90f1ad0b57de2ab4294dfa4a97d2e666dd479

    PR makes sense, and nothing has changed except a few comments since my last review (for 4d70f47a94cc739e92f7d3f8f4f7b11cf7457ca9).

    One thing I don’t understand is why the pruning test needs to change when it sounds like you are saying the disconnect pool memory bound is large enough that losing transactions “shouldn’t practically occur.” Are you just updating the pruning test as a precaution, or are changes there needed because the chain it creates is unusual?

    Also this PR has merge conflicts, but they are very minor. The validation.cpp conflict can be fixed by updating AcceptToMemoryPool arguments which changed in edded808fc4eee94c178e1779b90d1c87a08c23a (#9499).

  14. dcousens approved
  15. dcousens commented at 11:32 pm on February 28, 2017: contributor
    utACK 37a90f1
  16. sdaftuar force-pushed on Mar 1, 2017
  17. sdaftuar commented at 6:34 pm on March 1, 2017: member

    Rebased. @ryanofsky The changes to pruning.py are because:

    • Policy limits get applied differently than before; we used to ignore some of the chain limits when adding transactions back to the mempool during a reorg (because we wouldn’t look for in-mempool descendants until the end).
    • The pruning test is trying to build really big blocks in a number of places, to make the chain big enough to be pruned, and in particular in a few places it relies on the mempool being populated with disconnected transactions after a reorg (so changes to policy that would prevent that are problematic for the test).

    So I mitigated this in part by increasing the chain limits and in part by trying to keep the mempool somewhat populated with a node’s wallet transactions at a couple relevant places in the test.

    [I actually briefly thought that because we now save the mempool in between restarts, we may not need the resendwallettransactions() calls I added (which I use to populate the mempool somewhat in a couple places), but I tested locally that removing them results in test failure (plausibly because the mined blocks aren’t big enough, which I think is due to the reorg behavior change). I do think this test needs to be overhauled now that many of the assumptions around mempool behavior have changed, but I think we should do that in another PR.]

  18. sdaftuar force-pushed on Mar 6, 2017
  19. sdaftuar force-pushed on Mar 7, 2017
  20. sdaftuar commented at 7:38 pm on March 7, 2017: member
    Rebased after #9602.
  21. in src/validation.cpp: in 344862a0ac outdated
    591+        // ignore validation errors in resurrected transactions
    592+        CValidationState stateDummy;
    593+        if ((*it)->IsCoinBase() || !AcceptToMemoryPool(mempool, stateDummy, *it, false, NULL, NULL, true)) {
    594+            // If the transaction doesn't make it in to the mempool, remove any
    595+            // transactions that depend on it (which would now be orphans).
    596+            mempool.removeRecursive(**it);
    


    ryanofsky commented at 6:15 pm on March 8, 2017:

    In commit “Store disconnected block transactions outside mempool during reorg”:

    It looks like the new MemPoolRemovalReason::REORG argument for removeRecursive got dropped during the rebase. (Shame that we don’t a have test that would catch this, and that the argument is optional to begin with.)


    sdaftuar commented at 2:28 am on March 11, 2017:
    Thanks for catching this! Fixed.

    sipa commented at 6:24 am on March 12, 2017:
    I don’t see MemPoolRemovalReason::REORG anywhere?
  22. ryanofsky commented at 6:24 pm on March 8, 2017: member

    utACK c8d4c5fa8f8d2375ea781523ae7619f1cefea6f9 if the MemPoolRemovalReason::REORG argument is added back.

    Confirmed no other code changes since the previous rebase.

  23. sdaftuar force-pushed on Mar 10, 2017
  24. sdaftuar force-pushed on Mar 10, 2017
  25. sdaftuar force-pushed on Mar 11, 2017
  26. sdaftuar commented at 2:27 am on March 11, 2017: member
    Rebased again to pick up the pruning.py fix from #9972
  27. in src/txmempool.h: in 2863901070 outdated
    200     result_type operator() (const CTxMemPoolEntry &entry) const
    201     {
    202         return entry.GetTx().GetHash();
    203     }
    204+
    205+    result_type operator() (CTransactionRef tx) const
    


    sipa commented at 4:53 am on March 12, 2017:
    Pass the CTransactionRef by reference? That will avoid an atomic inc/dec when copying/destroing the CTransactionRef.
  28. in src/txmempool.h: in 2863901070 outdated
    739+    // no exact formula for boost::multi_index_contained is implemented.
    740+    size_t DynamicMemoryUsage() const {
    741+        return memusage::MallocUsage(sizeof(CTransactionRef) + 6 * sizeof(void*)) * queuedTx.size() + cachedInnerUsage;
    742+    }
    743+
    744+    void addTransaction(CTransactionRef tx)
    


    sipa commented at 5:13 am on March 12, 2017:
    const CTransactionRef& here as well, or queuedTx.insert(std::move(tx)) below.
  29. sdaftuar commented at 11:03 am on March 13, 2017: member
    @sipa Thanks for reviewing (yikes, I think I must have done my second rebase on Friday using an older branch). All the comments should be addressed in the last two fixup commits; let me know if I should go ahead and squash.
  30. in src/validation.cpp: in 2863901070 outdated
    2443@@ -2402,6 +2444,10 @@ static bool ActivateBestChainStep(CValidationState& state, const CChainParams& c
    2444         }
    2445     }
    2446 
    2447+    // If any blocks were disconnected, disconnectpool may be non empty.  Add
    2448+    // any disconnected transactions back to the mempool.
    2449+    ReacceptToMemoryPool(disconnectpool);
    


    morcos commented at 3:25 pm on March 13, 2017:
    nit: Any reason to put ReacceptToMemoryPool here and not under if (fBlocksDisconnected)? It might make the logic slightly clearer.

    sdaftuar commented at 7:16 pm on March 28, 2017:
    Fixed
  31. morcos commented at 3:27 pm on March 13, 2017: member
    fast review utACK
  32. in src/validation.cpp: in 10853b3517 outdated
    2388@@ -2351,9 +2389,13 @@ static bool ActivateBestChainStep(CValidationState& state, const CChainParams& c
    2389 
    2390     // Disconnect active blocks which are no longer in the best chain.
    2391     bool fBlocksDisconnected = false;
    2392+    DisconnectedBlockTransactions disconnectpool;
    2393     while (chainActive.Tip() && chainActive.Tip() != pindexFork) {
    2394-        if (!DisconnectTip(state, chainparams))
    2395+        if (!DisconnectTip(state, chainparams, disconnectpool)) {
    2396+            // The mempool will be inconsistent, but we're about to shutdown
    


    TheBlueMatt commented at 3:46 pm on March 13, 2017:
    I dont think this is always true. Callers dont always enforce this (eg ActivateBestChains in net_processing, PreciousBlock from RPC, etc,e tc) and callees dont either (eg DisconnectBlock’s UndoReadFromDisk doesnt appear to generate an AbortNode). These might actually also be bugs, but I think you’re introducing a stronger assumption than previously here.

    sdaftuar commented at 7:16 pm on March 28, 2017:
    Fixed. I do think that a failure in DisconnectTip() should always cause shutdown, as we can no longer be sure we’re in consensus, but no need to worry about that here.
  33. in src/validation.cpp: in 10853b3517 outdated
    3750@@ -3697,7 +3751,9 @@ bool RewindBlockIndex(const CChainParams& params)
    3751             // of the blockchain).
    3752             break;
    3753         }
    3754-        if (!DisconnectTip(state, params, true)) {
    3755+        DisconnectedBlockTransactions dummypool;
    3756+        // fBare=true means nothing will actually be added to dummypool
    


    TheBlueMatt commented at 4:10 pm on March 13, 2017:
    Hmm, thats a super confusing API now….maybe either pass in a pointer to a DisconnectedBlockTransactions as both fBare and the pool?

    sdaftuar commented at 7:16 pm on March 28, 2017:
    Fixed.
  34. in src/validation.h: in 10853b3517 outdated
    68@@ -69,6 +69,8 @@ static const unsigned int DEFAULT_DESCENDANT_LIMIT = 25;
    69 static const unsigned int DEFAULT_DESCENDANT_SIZE_LIMIT = 101;
    70 /** Default for -mempoolexpiry, expiration time for mempool transactions in hours */
    71 static const unsigned int DEFAULT_MEMPOOL_EXPIRY = 336;
    72+/** Maximum kilobytes for transactions to store for processing during reorg */
    73+static const unsigned int MAX_DISCONNECTED_TX_POOL_SIZE = 75000;
    


    TheBlueMatt commented at 4:11 pm on March 13, 2017:
    nit: Yeesh, isnt that a bit high? Maybe like 10MB - if we reorg something larger than 2/3 blocks throwing away tons of txn is fine by me.

    sdaftuar commented at 7:16 pm on March 28, 2017:
    Dropping it to 20 (segwit blocks can be bigger).
  35. TheBlueMatt commented at 4:14 pm on March 13, 2017: member
    Strong concept ACK
  36. dcousens commented at 7:05 am on March 24, 2017: contributor
    Needs rebase?
  37. sdaftuar force-pushed on Mar 28, 2017
  38. sdaftuar commented at 7:17 pm on March 28, 2017: member

    First, I rebased https://github.com/sdaftuar/bitcoin/commits/fd-rebased.2 -> https://github.com/sdaftuar/bitcoin/commits/fd-rebased.3, no actual conflicts.

    Then I added 4 fixup commits to address the nits mentioned so far. I also added an assertion that a DisconnectedBlockTransactions object can’t be destroyed if it’s non-empty… Not sure how good an idea that is, but I thought it might catch code bugs in the future related to mempool consistency?

  39. sdaftuar force-pushed on Mar 28, 2017
  40. in src/validation.cpp:2469 in 5685657564 outdated
    2428@@ -2379,7 +2429,7 @@ static bool ActivateBestChainStep(CValidationState& state, const CChainParams& c
    2429 
    2430         // Connect new blocks.
    2431         BOOST_REVERSE_FOREACH(CBlockIndex *pindexConnect, vpindexToConnect) {
    2432-            if (!ConnectTip(state, chainparams, pindexConnect, pindexConnect == pindexMostWork ? pblock : std::shared_ptr<const CBlock>(), connectTrace)) {
    2433+            if (!ConnectTip(state, chainparams, pindexConnect, pindexConnect == pindexMostWork ? pblock : std::shared_ptr<const CBlock>(), connectTrace, disconnectpool)) {
    


    TheBlueMatt commented at 0:54 am on March 30, 2017:
    We’ll now assert() on L2445 when we return false if ConnectTip fails, instead of gracefully AbortNode()ing.

    sdaftuar commented at 6:07 pm on March 30, 2017:

    I don’t know what the best answer here is. I could call ReacceptToMemoryPool(), but doing so after we’ve invoked AbortNode() seems either annoying (trying to lookup UTXOs from disk when we’re potentially having disk trouble seems bad), or potentially problematic (is it okay to trigger SyncTransaction() callbacks when our on-disk state is potentially corrupt?).

    One idea: check to see if fShutdownRequested is set, and if so, just wipe the disconnectpool and the mempool so that any observers of the mempool won’t see a potentially inconsistent state (and if fShutdownRequested is not set, call ReacceptToMemoryPool() – this should never happen). Then the mempool saving won’t work in the event of an AbortNode(), but maybe that’s okay, we’re probably out of disk space anyway. @sipa Any suggestions on how to handle this?


    TheBlueMatt commented at 8:31 pm on March 30, 2017:
    For posterity: meeting conclusion, broadly, appeared to be that we should move forward with AbortNode() eventually migrating towards being used only for non-critical errors that result in a “we should shutdown soonish”, and critical errors such as on-disk corruption should be an assert(false), ie that we should move forward here ensuring that the mempool is consistent when we return.

    sdaftuar commented at 9:13 pm on March 30, 2017:
    @TheBlueMatt I pushed up a commit that addresses the edge-cases better, I think – it now will just erase from the mempool to become consistent again. Also patches up the missing mempool.removeForReorg calls in a couple places, and moves all the reorg consistency handling into one function to get rid of some code duplication.
  41. TheBlueMatt commented at 10:26 pm on April 12, 2017: member
    utACK 65a4dcc46f1ef942d52d8445cdc4a2ef385c816b, though please squash it with e4a0440fab759a0a3c2d4d03f9bd7463d84dde9a
  42. sdaftuar force-pushed on Apr 13, 2017
  43. sdaftuar commented at 2:03 am on April 13, 2017: member
    Squashed 65a4dcc -> 688417b
  44. in src/txmempool.h:753 in 688417b932 outdated
    758+    }
    759+
    760+    // Remove entries based on txid_index, and update memory usage.
    761+    void removeForBlock(const std::vector<CTransactionRef>& vtx)
    762+    {
    763+        for (auto const &tx : vtx) {
    


    morcos commented at 3:05 pm on April 13, 2017:
    probably should shortcut to skip this loop if disconnectpool is empty which is the common case in connecttip

    sdaftuar commented at 3:54 pm on April 13, 2017:
    Agreed, fixed in 15b46d94b53af6429f808bb47ebc2a40da9dc791
  45. morcos commented at 3:12 pm on April 13, 2017: member
    utACK 688417b modulo performance nit
  46. sdaftuar force-pushed on Apr 13, 2017
  47. sdaftuar commented at 3:55 pm on April 13, 2017: member
    Squashed 15b46d94b53af6429f808bb47ebc2a40da9dc791 -> 75b072df512442829bf288f82ed8ce0830512763
  48. morcos commented at 5:25 pm on April 17, 2017: member
    utACK 75b072d
  49. morcos approved
  50. in src/core_memusage.h:68 in 75b072df51 outdated
    62@@ -63,4 +63,9 @@ static inline size_t RecursiveDynamicUsage(const CBlockLocator& locator) {
    63     return memusage::DynamicUsage(locator.vHave);
    64 }
    65 
    66+template<typename X>
    67+static inline size_t RecursiveDynamicUsage(const std::shared_ptr<X>& p) {
    68+    return RecursiveDynamicUsage(*p) + memusage::DynamicUsage(p);
    


    theuni commented at 5:12 pm on April 20, 2017:

    Watch the null deref:

    0return p ? memusage::DynamicUsage(p) + RecursiveDynamicUsage(*p) : 0;
    

    sdaftuar commented at 12:52 pm on April 21, 2017:
    Thanks, fixed in ea1b122
  51. sdaftuar force-pushed on Apr 21, 2017
  52. sdaftuar commented at 12:59 pm on April 21, 2017: member
    Fixed @theuni’s nit in ea1b122 (https://github.com/sdaftuar/bitcoin/commits/fd-rebased.7), and squashed in 2cb9509
  53. jonasschnelli added this to the "Blockers" column in a project

  54. [qa] Relax assumptions on mempool behavior during reorg
    Policy limits (such as chain limits and mempool total size) could reasonably
    be enforced more aggressively during a reorg, so use resendwallettransactions
    to repopulate the mempool to avoid mined blocks being too small, and increase
    the chain limits from the default for this test.
    
    This is in preparation for a change in mempool behavior during a reorg.
    9decd648ac
  55. Store disconnected block transactions outside mempool during reorg
    Rather than re-add disconnected block transactions back to the mempool
    immediately, store them in a separate disconnectpool for later processing,
    because we expect most such transactions to reappear in the chain that is
    still to be connected (and thus we can avoid the work of reprocessing those
    transactions through the mempool altogether).
    71f1903353
  56. Add RecursiveDynamicUsage overload for std::shared_ptr
    This simplifies a few usage expressions.
    c1235e3f2d
  57. sdaftuar force-pushed on May 4, 2017
  58. sdaftuar commented at 9:12 pm on May 4, 2017: member
    Rebased again due to merge conflict in pruning.py
  59. sdaftuar commented at 1:25 am on May 12, 2017: member
    Anything else needed here?
  60. gmaxwell approved
  61. gmaxwell commented at 2:05 pm on May 22, 2017: contributor
    tested ACK (I previously utACKed, I know longer remember the code but I’ve used it a bunch and beat on it a bit)
  62. TheBlueMatt commented at 9:26 pm on May 29, 2017: member
    re-utACK c1235e3f2dd5b01b63b020d1b8f7283e8badaf09
  63. morcos commented at 3:20 pm on May 30, 2017: member
    re-utACK c1235e3
  64. laanwj merged this on May 30, 2017
  65. laanwj closed this on May 30, 2017

  66. laanwj referenced this in commit acd9957b72 on May 30, 2017
  67. laanwj removed this from the "Blockers" column in a project

  68. PastaPastaPasta referenced this in commit d686d1c8f4 on Jun 20, 2019
  69. PastaPastaPasta referenced this in commit c64f2c635c on Jun 22, 2019
  70. PastaPastaPasta referenced this in commit 923dc3db00 on Jun 22, 2019
  71. PastaPastaPasta referenced this in commit 04de684675 on Jun 22, 2019
  72. PastaPastaPasta referenced this in commit 33387d52a2 on Jun 22, 2019
  73. PastaPastaPasta referenced this in commit 08fcd38f4c on Jun 22, 2019
  74. PastaPastaPasta referenced this in commit fc63960a13 on Jun 22, 2019
  75. PastaPastaPasta referenced this in commit b7c941a1b5 on Jun 22, 2019
  76. PastaPastaPasta referenced this in commit c3d5c66440 on Jun 22, 2019
  77. PastaPastaPasta referenced this in commit 03780c7932 on Jun 22, 2019
  78. PastaPastaPasta referenced this in commit d2638d7823 on Jun 22, 2019
  79. PastaPastaPasta referenced this in commit a8566404be on Jun 24, 2019
  80. barrystyle referenced this in commit f378afe693 on Jan 22, 2020
  81. random-zebra referenced this in commit 1f655a649a on Mar 22, 2021
  82. random-zebra referenced this in commit 425f72f00c on Mar 25, 2021
  83. random-zebra referenced this in commit b33d4596f9 on Mar 25, 2021
  84. random-zebra referenced this in commit 2a2fb4b168 on Mar 25, 2021
  85. random-zebra referenced this in commit b80ec44f3b on Mar 30, 2021
  86. random-zebra referenced this in commit 2ae953af3d on Apr 4, 2021
  87. random-zebra referenced this in commit af984e1780 on Apr 6, 2021
  88. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-06 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me