Change UpdateForDescendants to use Epochs #18120

pull JeremyRubin wants to merge 1 commits into bitcoin:master from JeremyRubin:epoch-mempool-clean-split-updatefordescendants-pt1 changing 2 files +58 −35
  1. JeremyRubin commented at 8:10 pm on February 11, 2020: contributor

    Refactors UpdateForDescendants to use Epochs instead of sets to de-duplicate traversal, and replaces the cache entries with a vector instead of a set. This is a straightforward win. The algorithm is a bit clever so as not to require two separate vectors for txiters that need expanding and txiters that have already been processed, but I think it’s relatively easy to reason about.

    extracted from #18063 as a standalone because it turns out the DoS issues with this code perhaps merit more extensive review http://gnusha.org/bitcoin-core-dev/2020-02-11.log. That PR will stay open for review, but this PR is designed as an “obviously better” win that can be merged now to keep progress up on the mempool project.

  2. Change UpdateForDescendants to use Epochs 057c8b4ce8
  3. JeremyRubin added the label Refactoring on Feb 11, 2020
  4. JeremyRubin added the label Mempool on Feb 11, 2020
  5. JeremyRubin requested review from ajtowns on Feb 11, 2020
  6. JeremyRubin requested review from sdaftuar on Feb 11, 2020
  7. DrahtBot commented at 2:32 am on February 12, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  8. fanquake added this to the "Epoch MemPool" column in a project

  9. in src/txmempool.cpp:70 in 057c8b4ce8
    81+    // already de-duplicated in case multiple outputs of ours are spent in one
    82+    // transaction)
    83+    vecEntries update_cache;
    84+    update_cache.reserve(direct_children.size());
    85+    // mark every direct_child as visited so that we don't accidentally re-add them
    86+    // to the cache in the grandchild is child case
    


    sdaftuar commented at 4:29 pm on February 19, 2020:

    nit: This comment could be improved, for code readers not as familiar with the issues here. Perhaps something like:

    0mark every direct child as visited, so that if we encounter one of these transactions again 
    1(as the descendant of some other direct child), we don't re-add them to the cache.
    

    JeremyRubin commented at 6:56 pm on February 19, 2020:

    Maybe adding an example helps?

    0mark every direct child as visited, so that if we encounter one of these transactions again 
    1(as the descendant of some other direct child), we don't re-add them to the cache. e.g.,
    2
    3Tx A: L1 -> {O1, O2}
    4Tx B: {O1} -> {P1}
    5Tx C: {P1, O2} -> {Q1}
    6
    7In this case transaction C is both a direct child of A and a grandchild via B.
    

    I can improve the documentation with this in the follow up PR where we clean this up more (because maybe we’d just end up removing the caching anyways).

  10. sdaftuar approved
  11. sdaftuar commented at 4:46 pm on February 19, 2020: member

    ACK, this looks correct to me.

    I tried to benchmark some sort of “worst-case” scenario, where all of a block’s transactions have all of the mempool as descendants, making the number of in-mempool transactions and number of in-block transactions as large as possible. I came up with a scenario of a block that has 250 transactions and a mempool of 77,500 transactions, and measured how long UpdateTransactionsFromBlock takes on my machine after disconnecting that block. In the 0.19 branch, this took about 54 seconds. In master, this takes about 48 seconds, and after this commit it takes about 25 seconds.

    (25 seconds is still terrible, of course, and we should continue to find ways to make structural improvements so that we don’t have long delays in the critical path of block acceptance, but this change makes sense to me as a significant algorithmic improvement.)

  12. JeremyRubin commented at 1:06 am on March 3, 2020: contributor
    Dismaying but I noticed another flaw here; I don’t think we ever check if the direct children of a transaction are in the cache. I’ll fix this in the follow up PR for review there.
  13. JeremyRubin closed this on Mar 6, 2020

  14. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me