validation: improve performance of CheckBlockIndex #28339

pull mzumsande wants to merge 3 commits into bitcoin:master from mzumsande:202308_speedup_checkblockindex changing 8 files +73 −19
  1. mzumsande commented at 6:18 pm on August 24, 2023: contributor

    CheckBlockIndex() are consistency checks that are currently enabled by default on regtest.

    The function is rather slow, which is annoying if you

    • attempt to run it on other networks, especially if not fully synced
    • want to generate a long chain on regtest and see block generation slow down because you forgot to disable -checkblockindex or don’t know it existed.

    One reason why it’s slow is that in order to be able to traverse the block tree depth-first from genesis, it inserts pointers to all block indices into a std::multimap - for which inserts and lookups become slow once there are hundred thousands of entries. However, typically the block index is mostly chain-like with just a few forks so a multimap isn’t really needed for the most part. This PR suggests to store the block indices of the chain ending in the best header in a vector instead, and store only the rest of the indices in a multimap. This does not change the actual consistency checks that are being performed for each index, just the way the block index tree is stored and traversed.

    This adds a bit of complication to make sure each block is visited (note that there are asserts that check it), making sure that the two containers are traversed correctly, but it speeds up the function considerably:

    On master, a single invocation of CheckBlockIndex takes ~1.4s on mainnet for me (4.9s on testnet which has >2.4 million blocks). With this branch, the runtime goes down to ~0.27s (0.85s on testnet).This is a speedup by a factor ~5.

  2. DrahtBot commented at 6:18 pm on August 24, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK furszy, ryanofsky, achow101
    Concept ACK jonatack

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. DrahtBot added the label Validation on Aug 24, 2023
  4. mzumsande marked this as a draft on Aug 24, 2023
  5. mzumsande marked this as ready for review on Aug 25, 2023
  6. fanquake commented at 2:22 pm on August 25, 2023: member
  7. martinus commented at 7:51 pm on August 25, 2023: contributor
    Hi, I wonder how you measured the runtime, is there a benchmark? Another optimization that might be worth it, it is to replace the std::multimap<CBlockIndex*,CBlockIndex*> forward with an std::unordered_map<CBlockIndex*, std::vector<CBlockIndex*>> forward.
  8. in src/validation.cpp:5019 in 3e0f001898 outdated
    4702+    // Build forward-pointing data structure for the entire block tree.
    4703+    // For performance reasons, indexes of the best header chain are stored in a vector.
    4704+    // All remaining blocks are stored in a multimap.
    4705+    CBlockIndex* index_walk{m_best_header};
    4706+    const int best_hdr_height{m_best_header->nHeight};
    4707+    std::vector<CBlockIndex*> best_hdr_chain(best_hdr_height + 1);
    


    sipa commented at 2:56 pm on August 28, 2023:
    Instead of explicitly building your own vector, can you reuse Chainstate::m_chain, which effectively contains pointers to all current-mainchain block index entries in a vector already?

    mzumsande commented at 3:14 pm on August 28, 2023:
    I didn’t do that so that the speedup is independent from the chain, and also applies to situations where we have all the headers, but m_chain is behind (IBD, reindex). Which seemed important because these are the exact situations where CheckBlockIndex will be called a lot and slows bitcoind down most noticeably.

    sipa commented at 3:18 pm on August 28, 2023:
    Makes sense!
  9. mzumsande commented at 3:17 pm on August 28, 2023: contributor

    Hi, I wonder how you measured the runtime, is there a benchmark?

    So far only manually, by adding log statements to the beginning and end of CheckBlockIndex and running with -logtimemicros. I will add a benchmark!

    Another optimization that might be worth it, it is to replace the std::multimap<CBlockIndex*,CBlockIndex*> forward with an std::unordered_map<CBlockIndex*, std::vector<CBlockIndex*» forward.

    Thanks, I’ll look into that!

  10. DrahtBot added the label CI failed on Sep 3, 2023
  11. DrahtBot removed the label CI failed on Sep 5, 2023
  12. mzumsande marked this as a draft on Sep 11, 2023
  13. Sjors commented at 7:24 pm on September 14, 2023: member
    While you’re at it, I noticed that the help text for -checkblockindex says “occasionally” (added in #7290), but afaik it happens every block (and every header during the initial header sync). We should probably drop that word, or alternatively provide an interval argument.
  14. mzumsande commented at 6:02 pm on September 15, 2023: contributor

    While you’re at it, I noticed that the help text for -checkblockindex says “occasionally” (added in #7290), but afaik it happens every block (and every header during the initial header sync). We should probably drop that word, or alternatively provide an interval argument.

    Good point, I think that an interval argument similar to addrman or mempool consistency checks makes sense. Will try that in addition to the other points above (it might take a few weeks though, therefore put the PR into draft).

  15. DrahtBot added the label CI failed on Oct 7, 2023
  16. DrahtBot removed the label CI failed on Oct 15, 2023
  17. mzumsande force-pushed on Dec 22, 2023
  18. mzumsande commented at 4:18 pm on December 22, 2023: contributor

    I added a commit with a benchmark, and another one that allow to specify the frequency for -checkblockindex (analogous to the way -checkaddrman and -checkmempool work).

    Another optimization that might be worth it, it is to replace the std::multimap<CBlockIndex*,CBlockIndex*> forward with an std::unordered_map<CBlockIndex*, std::vector<CBlockIndex*» forward.

    I tried that in https://github.com/mzumsande/bitcoin/commit/51aae54cfafc9700581e69dc129866cff76925ae (directly on master without the changes of this PR), however it doesn’t seem to improve performance - I’ll post benchmark results below.

  19. mzumsande commented at 4:43 pm on December 22, 2023: contributor

    Benchmark results:

    ns/op op/s err% total benchmark
    331,165.67 3,019.64 0.3% 0.01 CheckBlockIndex
    320,603.00 3,119.12 1.7% 0.01 CheckBlockIndex
    315,683.67 3,167.73 0.4% 0.01 CheckBlockIndex
    329,077.33 3,038.80 0.3% 0.01 CheckBlockIndex
    315,345.33 3,171.13 0.4% 0.01 CheckBlockIndex
    ns/op op/s err% total benchmark
    127,273.00 7,857.13 1.8% 0.01 CheckBlockIndex
    123,687.22 8,084.91 0.1% 0.01 CheckBlockIndex
    136,637.17 7,318.65 0.1% 0.01 CheckBlockIndex
    123,805.67 8,077.17 0.3% 0.01 CheckBlockIndex
    124,162.62 8,053.95 0.2% 0.01 CheckBlockIndex

    Uses std::unordered_map<CBlockIndex*, std::vector<CBlockIndex*>>, see https://github.com/mzumsande/bitcoin/commit/51aae54cfafc9700581e69dc129866cff76925ae for my implementation

    ns/op op/s err% total benchmark
    384,395.00 2,601.49 0.5% 0.01 CheckBlockIndex
    380,674.67 2,626.92 0.3% 0.01 CheckBlockIndex
    384,337.33 2,601.88 1.1% 0.01 CheckBlockIndex
    410,836.50 2,434.06 3.4% 0.01 CheckBlockIndex
    384,461.33 2,601.04 2.0% 0.01 CheckBlockIndex

    The benchmark currently uses a linear chain of 1100 headers. In general, the efficiency gain becomes larger with a longer chain, for example with just 100 headers the speedup is less.

    So there is a speedup of a facto >2.5 with the benchmark (that becomes larger on larger chains, see results for mainnet / testnet in OP). The suggestion by @martinus to use std::unordered_map<CBlockIndex*, std::vector<CBlockIndex*>> doesn’t seem to improve performance (though it’s also very possible that I might have not implemented it in a great away).

  20. mzumsande marked this as ready for review on Dec 22, 2023
  21. jonatack commented at 10:41 pm on January 6, 2024: contributor
    Concept ACK
  22. DrahtBot added the label CI failed on Jan 12, 2024
  23. DrahtBot added the label Needs rebase on Jan 31, 2024
  24. mzumsande force-pushed on Feb 5, 2024
  25. mzumsande commented at 8:19 pm on February 5, 2024: contributor
    87d6155 to efdc8e3: rebased
  26. DrahtBot removed the label Needs rebase on Feb 5, 2024
  27. DrahtBot removed the label CI failed on Feb 5, 2024
  28. in src/validation.cpp:5041 in efdc8e3a0e outdated
    4816@@ -4817,6 +4817,13 @@ void ChainstateManager::LoadExternalBlockFile(
    4817     LogPrintf("Loaded %i blocks from external file in %dms\n", nLoaded, Ticks<std::chrono::milliseconds>(SteadyClock::now() - start));
    4818 }
    4819 
    4820+bool ChainstateManager::ShouldCheckBlockIndex() const
    4821+{
    4822+    if (!*Assert(m_options.check_block_index)) return false;
    4823+    if (GetRand(*m_options.check_block_index) >= 1) return false;
    


    maflcko commented at 9:34 am on February 6, 2024:
    May be better to do this deterministically to avoid non-determinism in tests?

    mzumsande commented at 4:30 pm on February 6, 2024:
    I think it would be nice to have consistency among -checkaddrman, -checkmempool and -checkblockindex. Since non-determinism would only occur for values other than 0 and 1, maybe it’d be best not to use other values than that in tests? Or change all three do be deterministic?
  29. furszy commented at 2:24 pm on February 8, 2024: member
    Concept ACK. Will start reviewing next week.
  30. in src/bench/checkblockindex.cpp:16 in 94f2da6c62 outdated
    11+    auto testing_setup{MakeNoLogFileContext<TestChain100Setup>()};
    12+    // Mine some more blocks
    13+    testing_setup->mineBlocks(1000);
    14+    bench.run([&] {
    15+        testing_setup->m_node.chainman->CheckBlockIndex();
    16+    });
    


    furszy commented at 9:28 pm on February 27, 2024:
    Have you thought on making check_block_index configurable during runtime? (at least at the code level). It would speedup this and other bench setups greatly.

    mzumsande commented at 8:12 pm on March 8, 2024:
    no, I hadn’t thought about it, but that sounds like a good idea (which seems independent from this PR though)
  31. in src/validation.cpp:5017 in 0f7963e0b3 outdated
    4832@@ -4833,18 +4833,31 @@ void ChainstateManager::CheckBlockIndex()
    4833         return;
    4834     }
    4835 
    4836-    // Build forward-pointing map of the entire block tree.
    4837+    // Build forward-pointing data structure for the entire block tree.
    4838+    // For performance reasons, indexes of the best header chain are stored in a vector.
    4839+    // All remaining blocks are stored in a multimap.
    4840+    CBlockIndex* index_walk{m_best_header};
    


    furszy commented at 1:49 am on March 4, 2024:
    Haven’t gone deeper yet but this reminds me to #26245 (and #26260).
  32. in src/validation.cpp:5348 in 0f7963e0b3 outdated
    5095             } else {
    5096                 // Move up further.
    5097                 pindex = pindexPar;
    5098                 nHeight--;
    5099                 continue;
    5100             }
    


    furszy commented at 11:28 pm on March 10, 2024:

    tiny nit:

     0diff --git a/src/validation.cpp b/src/validation.cpp
     1--- a/src/validation.cpp	(revision 7ae0a3cf9eba43a0132e99f33ad58990866c587e)
     2+++ b/src/validation.cpp	(date 1710113242384)
     3@@ -5206,12 +5206,11 @@
     4                     finished = true;
     5                     break;
     6                 }
     7-            } else {
     8-                // Move up further.
     9-                pindex = pindexPar;
    10-                nHeight--;
    11-                continue;
    12-            }
    13+            }
    14+
    15+            // Move up further
    16+            pindex = pindexPar;
    17+            nHeight--;
    18         }
    19         if (finished) break;
    20     }
    

    mzumsande commented at 8:59 pm on March 20, 2024:
    done!
  33. in src/node/chainstatemanager_args.cpp:27 in efdc8e3a0e outdated
    23@@ -24,7 +24,7 @@
    24 namespace node {
    25 util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManager::Options& opts)
    26 {
    27-    if (auto value{args.GetBoolArg("-checkblockindex")}) opts.check_block_index = *value;
    28+    if (auto value{args.GetIntArg("-checkblockindex")}) opts.check_block_index = *value;
    


    ryanofsky commented at 9:37 pm on March 12, 2024:

    In commit “validation: allow to specify frequency for -checkblockindex” (efdc8e3a0ee81b143ee17a578f1e27b1c427a912)

    A problem with this change is that it will now interpret bare -checkblockindex arguments as -checkblockindex=0 instead of -checkblockindex=1. Could fix this with:

    0if (auto value{args.GetIntArg("-checkblockindex")}) {
    1    // Interpret bare -checkblockindex argument as 1 instead of 0.
    2   opts.check_block_index = args.GetArg("-checkblockindex")->empty() ? 1 : *value;
    3}
    

    mzumsande commented at 9:02 pm on March 18, 2024:
    Good catch! Took your suggestion. I copied this from the -checkmempool parsing code, which will still interpret a bare -checkmempool as -checkmempool=0 (even though that was a boolean arg originally as well). Maybe that should be changed as well.
  34. in src/validation.cpp:5023 in 0f7963e0b3 outdated
    4841+    const int best_hdr_height{m_best_header->nHeight};
    4842+    std::vector<CBlockIndex*> best_hdr_chain(best_hdr_height + 1);
    4843+    while (index_walk) {
    4844+        best_hdr_chain[index_walk->nHeight] = index_walk;
    4845+        index_walk = index_walk->pprev;
    4846+    }
    


    ryanofsky commented at 9:43 pm on March 12, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (0f7963e0b306257ae5d547ba979712156c9d3a30)

    This seems like this is reimplementing what’s already implemented in the CChain class, and all this initialization could be replaced with:

    0CChain best_headers;
    1if (m_best_header) best_headers.SetTip(m_best_header});
    

    Also code below could be simplified.


    mzumsande commented at 8:56 pm on March 18, 2024:
    nice! Changed it to use CChain.
  35. in src/validation.cpp:5028 in 0f7963e0b3 outdated
    4847+
    4848     std::multimap<CBlockIndex*,CBlockIndex*> forward;
    4849     for (auto& [_, block_index] : m_blockman.m_block_index) {
    4850-        forward.emplace(block_index.pprev, &block_index);
    4851+        // Only save indexes in forward that are not part of the best header chain.
    4852+        if (block_index.nHeight > best_hdr_height || best_hdr_chain[block_index.nHeight]->GetBlockHash() != block_index.GetBlockHash()) {
    


    ryanofsky commented at 9:50 pm on March 12, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (0f7963e0b306257ae5d547ba979712156c9d3a30)

    Calling GetBlockHash() here is unnecessary, right? It seems like it would be more consistent with rest of code to just compare pointers not hashes.


    mzumsande commented at 8:58 pm on March 18, 2024:
    Makes sense, but since I replaced then entire condition with CChain::Contains(), this is no longer relevant.
  36. ryanofsky commented at 9:54 pm on March 12, 2024: contributor

    Code review efdc8e3a0ee81b143ee17a578f1e27b1c427a912.

    Concept ACK. Seems like this gives a significant speedup from a simple code change.

    I left some questions below and didn’t ack yet because I think the current implementation might inadvertently break passing a -checkblockindex argument with no value, but otherwise this looks good.

  37. mzumsande force-pushed on Mar 18, 2024
  38. mzumsande commented at 6:34 pm on March 18, 2024: contributor
    just rebased, will address feedback soon (I have a local merge conflict which apparently doesn’t bother Github)
  39. mzumsande force-pushed on Mar 18, 2024
  40. mzumsande commented at 9:04 pm on March 18, 2024: contributor
    08a9b35 to 4f51a4c: Addressed feedback by @ryanofsky, thanks!
  41. furszy commented at 3:08 pm on March 19, 2024: member

    Code review ACK 4f51a4c

    Would be nice to cover this function properly with a good number of unit tests. It shouldn’t be hard, the validation_block_tests.cpp file contain all the util functions to create different chain topologies.

  42. DrahtBot requested review from jonatack on Mar 19, 2024
  43. DrahtBot requested review from ryanofsky on Mar 19, 2024
  44. DrahtBot added the label Needs rebase on Mar 20, 2024
  45. bench: add benchmark for checkblockindex 32c80413fd
  46. mzumsande force-pushed on Mar 20, 2024
  47. mzumsande commented at 9:07 pm on March 20, 2024: contributor

    4f51a4c to b51eb42: Rebased, addressed nit.

    Would be nice to cover this function properly with a good number of unit tests. It shouldn’t be hard, the validation_block_tests.cpp file contain all the util functions to create different chain topologies.

    I agree, might take me a few weeks to get to that thoughs since I have a few other things on my list. Although just mentioning that quite a few constellations are already covered in random unit / functional tests that are targeted at something else - when I implemented this originally and didn’t get some details right, I got several failures.

  48. mzumsande force-pushed on Mar 20, 2024
  49. DrahtBot removed the label Needs rebase on Mar 20, 2024
  50. in src/validation.cpp:5072 in 8cf5b83a86 outdated
    5075-    CBlockIndex *pindex = rangeGenesis.first->second;
    5076-    rangeGenesis.first++;
    5077-    assert(rangeGenesis.first == rangeGenesis.second); // There is only one index entry with parent nullptr.
    5078+    CBlockIndex* pindex = best_hdr_chain[0];
    5079+    assert(pindex);
    5080+    bool in_best_hdr_chain{true};
    


    ryanofsky commented at 1:22 pm on March 28, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (8cf5b83a86efa55b95d81f86f4f467277791dc62)

    I think you could consider getting rid of the in_best_hdr_chain variable and just writing best_hdr_chain.Contains(pindex) below. IMO, this would make code easier to understand and eliminate a potential source of bugs (having an extra state variable that needs to be updated correctly in all code paths). Maybe keeping the bool would be slightly faster but I wouldn’t expect the difference to significant.


    mzumsande commented at 2:50 pm on April 1, 2024:
    I changed it as suggested, and didn’t see a noticeable impact on performance.
  51. in src/validation.cpp:5335 in 8cf5b83a86 outdated
    5336-                continue;
    5337+            } else if (pindexPar->nHeight <= best_hdr_chain.Height() && pindexPar == best_hdr_chain[nHeight - 1]) {
    5338+                // The parent is part of the best header chain, all forks have been processed.
    5339+                in_best_hdr_chain = true;
    5340+                if (nHeight <= m_best_header->nHeight) {
    5341+                    // If possible, continue by processing a sibling of the best header chain.
    


    ryanofsky commented at 1:38 pm on March 28, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (8cf5b83a86efa55b95d81f86f4f467277791dc62)

    Seems like this should say “sibling on” not “sibling of”?


    mzumsande commented at 2:50 pm on April 1, 2024:
    fixed
  52. in src/validation.cpp:5305 in 8cf5b83a86 outdated
    5302         }
    5303         // This is a leaf node.
    5304+        if (in_best_hdr_chain) break; // we are finished, since the best header chain is alway processed last
    5305+
    5306         // Move upwards until we reach a node of which we have not yet visited the last child.
    5307+        bool finished{false};
    


    ryanofsky commented at 1:42 pm on March 28, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (8cf5b83a86efa55b95d81f86f4f467277791dc62)

    IMO, it would be clearer to drop the finished variable and just write pindex = nullptr instead of finished = true to keep the same end condition.


    mzumsande commented at 2:52 pm on April 1, 2024:
    Done.
  53. in src/validation.cpp:5331 in 8cf5b83a86 outdated
    5332-            } else {
    5333-                // Move up further.
    5334-                pindex = pindexPar;
    5335-                nHeight--;
    5336-                continue;
    5337+            } else if (pindexPar->nHeight <= best_hdr_chain.Height() && pindexPar == best_hdr_chain[nHeight - 1]) {
    


    ryanofsky commented at 1:46 pm on March 28, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (8cf5b83a86efa55b95d81f86f4f467277791dc62)

    Could write this more concisely as pindexPar == best_hdr_chain[nHeight - 1] since operator[] is already implemented to return nullptr if height is out of bounds.


    mzumsande commented at 2:55 pm on April 1, 2024:
    True, changed.
  54. in src/validation.cpp:5347 in 8cf5b83a86 outdated
    5348+                    break;
    5349+                }
    5350             }
    5351+            // Move up further.
    5352+            pindex = pindexPar;
    5353+            nHeight--;
    


    ryanofsky commented at 1:49 pm on March 28, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (8cf5b83a86efa55b95d81f86f4f467277791dc62)

    Diff could be a little smaller if this code was not moving. I believe this could just be written as:

     0if (rangePar.first != rangePar.second) {
     1    // Move to pindex's sibling not on the best-chain.
     2     pindex = rangePar.first->second;
     3     break;
     4} else if (pindexPar == best_hdr_chain[nHeight - 1])
     5     // Move to pindex's sibling on the best-chain, if it has one.
     6     pindex = best_hdr_chain[nHeight];
     7     assert(pindex || pindexPar == m_best_header);
     8     break;
     9} else {
    10    // Move up further.
    11    pindex = pindexPar;
    12    nHeight--;
    13}
    

    mzumsande commented at 2:55 pm on April 1, 2024:
    Nice! I took that suggestion.
  55. in src/node/chainstatemanager_args.cpp:28 in b51eb422d6 outdated
    23@@ -24,8 +24,10 @@
    24 namespace node {
    25 util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManager::Options& opts)
    26 {
    27-    if (auto value{args.GetBoolArg("-checkblockindex")}) opts.check_block_index = *value;
    28-
    


    ryanofsky commented at 1:57 pm on March 28, 2024:

    In commit “validation: allow to specify frequency for -checkblockindex” (b51eb422d657bb1d0e6f9c7e71eb8a87a8c74f42)

    Probably should keep this line break to keep style consistent within the function


    mzumsande commented at 2:47 pm on April 1, 2024:
    done
  56. in src/validation.cpp:5040 in b51eb422d6 outdated
    5033@@ -5034,6 +5034,13 @@ void ChainstateManager::LoadExternalBlockFile(
    5034     LogPrintf("Loaded %i blocks from external file in %dms\n", nLoaded, Ticks<std::chrono::milliseconds>(SteadyClock::now() - start));
    5035 }
    5036 
    5037+bool ChainstateManager::ShouldCheckBlockIndex() const
    5038+{
    5039+    if (!*Assert(m_options.check_block_index)) return false;
    


    ryanofsky commented at 2:05 pm on March 28, 2024:

    In commit “validation: allow to specify frequency for -checkblockindex” (b51eb422d657bb1d0e6f9c7e71eb8a87a8c74f42)

    Note: I was initially confused why this is has an Assert to check if the option is set, instead of just returning a default value when the option is unset, but the reason seems to be that this is a check to make sure that Flatten() has been called and opts.chainparams.DefaultConsistencyChecks() has been applied. Could maybe add a comment above this like “// Assert to verify Flatten() has been called.” to clarify this


    mzumsande commented at 2:49 pm on April 1, 2024:
    I added a comment. By the way, the Assert is also already present on master.
  57. ryanofsky approved
  58. ryanofsky commented at 2:09 pm on March 28, 2024: contributor
    Code review ACK b51eb422d657bb1d0e6f9c7e71eb8a87a8c74f42 but it seems like code could be simpler and clearer (could eliminate finished and in_best_hdr_chain variables and redundant height checks before using CChain::operator[])
  59. DrahtBot requested review from furszy on Mar 28, 2024
  60. mzumsande force-pushed on Apr 1, 2024
  61. mzumsande force-pushed on Apr 1, 2024
  62. mzumsande commented at 2:59 pm on April 1, 2024: contributor

    b51eb42 to e880ee0 Addressed feedback by @ryanofsky - now the diff to master should be a bit smaller than before.

    Still haven’t had the time to write more unit tests for CheckBlockIndex().

  63. DrahtBot added the label CI failed on Apr 1, 2024
  64. DrahtBot removed the label CI failed on Apr 5, 2024
  65. in src/validation.cpp:5057 in 4b05c654c1 outdated
    5049@@ -5050,19 +5050,25 @@ void ChainstateManager::CheckBlockIndex()
    5050         return;
    5051     }
    5052 
    5053-    // Build forward-pointing map of the entire block tree.
    5054+    // Build forward-pointing data structure for the entire block tree.
    5055+    // For performance reasons, indexes of the best header chain are stored in a vector (within CChain).
    5056+    // All remaining blocks are stored in a multimap.
    5057+    CChain best_hdr_chain;
    5058+    if (m_best_header) best_hdr_chain.SetTip(*m_best_header);
    


    ryanofsky commented at 7:27 pm on April 17, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (4b05c654c1a5df294fa32a94c38001ac2b4ff2de)

    It seems like this code cannot work in the case where m_best_header is null because assert(pindex) on line 5071 below will be triggered.

    I think it might make sense to replace if (m_best_header) here with assert(m_best_header); to make the assumption explicit. Otherwise, I think you will need to set the best_hdr_chain tip to something else, like blockman.m_block_index.begin()


    furszy commented at 1:17 pm on April 18, 2024:

    In 4b05c654c1:

    Using m_best_header shouldn’t be working (at least not as you expect) for invalidateblock.

    During InvalidateBlock(), we first disconnect all blocks, then call CheckBlockIndex without updating m_best_header (this is done later on the flow, when we call InvalidChainFound()). So, by the time CheckBlockIndex is called, best_hdr_chain is loaded with a chain that is not the active one.

    Any reason not to use ActiveTip() instead of m_best_header here?


    ryanofsky commented at 1:29 pm on April 18, 2024:

    Any reason not to use ActiveTip() instead of m_best_header here?

    I think the reason is just that we want best_hdr_chain to be the longest chain with the most forks for the optimization to work well. We don’t actually care which chain it is because CheckBlockIndex will traverse all the blocks in any case. If a different chain is chosen it just means the blocks will be traversed in a different order, less or more efficiently.


    furszy commented at 1:53 pm on April 18, 2024:
    ok, yeah. Then it is not an issue but, isn’t error-prone to call best_hdr_chain something that might not be the best chain then? E.g. it seems that if we ever add a check only for the active chain blocks here, the best_hdr_chain.Contains() usage temptation will be there.

    ryanofsky commented at 2:47 pm on April 18, 2024:

    ok, yeah. Then it is not an issue but, isn’t error-prone to call best_hdr_chain something that might not be the best chain then? E.g. it seems that if we ever add a check only for the active chain blocks here, the best_hdr_chain.Contains() usage temptation will be there.

    I don’t love the name best_hdr_chain but don’t think it is too bad. I agree it could potentially be confused with the active chain in new code, and that would be a bug. In my original suggestion #28339 (review), I called it best_headers so at least the name was consistent with m_best_header. I don’t really have other ideas though.


    mzumsande commented at 6:44 pm on April 18, 2024:

    Any reason not to use ActiveTip() instead of m_best_header here

    My main reason is that CheckBlockIndex() being independent from the active chainstate makes the performance improvement applicable to situations where we have all the headers but haven’t validated much of the chain: See this old answer. IBD and reindex seem much more relevant than extreme invalidateblock scenarios.


    mzumsande commented at 4:23 pm on April 22, 2024:

    I agree it could potentially be confused with the active chain in new code, and that would be a bug.

    I added a comment in CheckBlockIndex() describing why these can differ.


    mzumsande commented at 4:23 pm on April 22, 2024:
    changed to an assert.
  66. in src/validation.cpp:5330 in 4b05c654c1 outdated
    5326                 pindex = rangePar.first->second;
    5327                 break;
    5328+            } else if (pindexPar == best_hdr_chain[nHeight - 1]) {
    5329+                // Move to pindex's sibling on the best-chain, if it has one.
    5330+                pindex = best_hdr_chain[nHeight];
    5331+                assert(pindex || pindexPar == m_best_header);
    


    ryanofsky commented at 7:46 pm on April 17, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (4b05c654c1a5df294fa32a94c38001ac2b4ff2de)

    I think this assert might be a little clearer if it replaced m_best_header with best_hdr_chain.Tip(). Both values should be identical but latter doesn’t require remembering how the best_hdr_chain tip was assigned ~300 lines earlier in the code. Also I think the check could be stricter because if pindex is not null, pindexPar cannot be the best header. Would suggest:

    0// There will not be a next block if (and only if) parent block is the best header.
    1assert((pindex == nullptr) == (pindexPar == best_hdr_chain.Tip()));
    

    mzumsande commented at 4:23 pm on April 22, 2024:
    done, thanks!
  67. ryanofsky approved
  68. ryanofsky commented at 7:54 pm on April 17, 2024: contributor
    Code review ACK e880ee0a634f0be5079d39328fa5e118452553a9. Just rebase and suggested simplifications since the last review
  69. DrahtBot requested review from furszy on Apr 18, 2024
  70. furszy commented at 1:19 pm on April 18, 2024: member
    @DrahtBot, I just reviewed it..
  71. in src/validation.cpp:5063 in 4b05c654c1 outdated
    5060     std::multimap<CBlockIndex*,CBlockIndex*> forward;
    5061     for (auto& [_, block_index] : m_blockman.m_block_index) {
    5062-        forward.emplace(block_index.pprev, &block_index);
    5063+        // Only save indexes in forward that are not part of the best header chain.
    5064+        if (!best_hdr_chain.Contains(&block_index)) {
    5065+            // Only genesis, which must be be part of the best header chain, can have a nullptr parent.
    


    furszy commented at 11:30 am on April 19, 2024:

    tiny typo

    0            // Only genesis, which must be part of the best header chain, can have a nullptr parent.
    

    mzumsande commented at 4:20 pm on April 22, 2024:
    fixed
  72. DrahtBot requested review from furszy on Apr 19, 2024
  73. mzumsande force-pushed on Apr 22, 2024
  74. mzumsande commented at 4:24 pm on April 22, 2024: contributor
    e880ee0 to 3c3895c: Addressed review feedback.
  75. in src/validation.cpp:5302 in 7183bc9048 outdated
    5299+            nHeight++;
    5300+            pindex = best_hdr_chain[nHeight];
    5301+            continue;
    5302         }
    5303         // This is a leaf node.
    5304+        if (best_hdr_chain.Contains(pindex)) break; // we are finished, since the best header chain is alway processed last
    


    ryanofsky commented at 1:10 pm on April 24, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (7183bc90483cc9361cfffb157b91fd4245ab77e0)

    This might be a little simpler and clearer since it consolidates the best header chain cases:

     0--- a/src/validation.cpp
     1+++ b/src/validation.cpp
     2@@ -5292,15 +5292,14 @@ void ChainstateManager::CheckBlockIndex()
     3             pindex = range.first->second;
     4             nHeight++;
     5             continue;
     6-        } else if (best_hdr_chain.Contains(pindex) && nHeight < best_hdr_chain.Height()) {
     7+        } else if (best_hdr_chain.Contains(pindex)) {
     8             // Descend further into best header chain.
     9             nHeight++;
    10             pindex = best_hdr_chain[nHeight];
    11+            if (!pindex) break; // we are finished, since the best header chain is always processed last
    12             continue;
    13         }
    14         // This is a leaf node.
    15-        if (best_hdr_chain.Contains(pindex)) break; // we are finished, since the best header chain is alway processed last
    16-
    17         // Move upwards until we reach a node of which we have not yet visited the last child.
    18         while (pindex) {
    19             // We are going to either move to a parent or a sibling of pindex.
    

    (also fixes spelling of “alway”)


    mzumsande commented at 5:37 pm on April 26, 2024:
    done as suggested.
  76. in src/validation.cpp:5334 in 7183bc9048 outdated
    5330                 break;
    5331+            } else if (pindexPar == best_hdr_chain[nHeight - 1]) {
    5332+                // Move to pindex's sibling on the best-chain, if it has one.
    5333+                pindex = best_hdr_chain[nHeight];
    5334+                // There will not be a next block if (and only if) parent block is the best header.
    5335+                assert(pindex || pindexPar == best_hdr_chain.Tip());
    


    ryanofsky commented at 1:25 pm on April 24, 2024:

    In commit “validation: improve performance of CheckBlockIndex” (7183bc90483cc9361cfffb157b91fd4245ab77e0)

    Note: this assert is a little different than what I suggested #28339 (review) because it is not enforcing that if the parent block is the best header, then pindex must be null. So the statement that there will not be a best block if the parent block is the best header is not checked by the assert.

    But I think this is fine. The comment is still true and the assert is still useful even if they don’t align exactly.


    mzumsande commented at 5:36 pm on April 26, 2024:
    That was not on purpose, I just missed the second half of you suggestion. Changed it now, slightly stricter checks can only be helpful.
  77. ryanofsky approved
  78. ryanofsky commented at 1:40 pm on April 24, 2024: contributor

    Code review ACK 3c3895ccfa108e10f4a0b1a78f64ffedade1fd11

    Left another suggestion, but this looks good in its current form.

  79. validation: improve performance of CheckBlockIndex
    by not saving all indexes in a std::multimap, but only
    those that are not part of the best header chain.
    The indexes of the best header chain are stored in a vector,
    which, in the typical case of a mostly linear chain with
    a few forks, results in a much smaller multimap, and increases
    performance noticeably for long chains.
    
    This does not change the actual consistency checks that are being
    performed for each index, just the way the block index tree is
    stored and traversed.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    d5a631b959
  80. validation: allow to specify frequency for -checkblockindex
    This makes it similar to -checkaddrman and -checkmempool, which
    also allow to run the check occasionally instead of always / never.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    5bc2077e8f
  81. mzumsande force-pushed on Apr 26, 2024
  82. mzumsande commented at 5:37 pm on April 26, 2024: contributor
    3c3895c to 5bc2077: Addressed review feedback
  83. furszy commented at 1:15 pm on May 20, 2024: member
    ACK 5bc2077e8f592442b089affdf0b5795fbc053bb8
  84. DrahtBot requested review from ryanofsky on May 20, 2024
  85. ryanofsky approved
  86. ryanofsky commented at 2:49 pm on June 10, 2024: contributor
    Code review ACK 5bc2077e8f592442b089affdf0b5795fbc053bb8. Just added suggested assert and simplification since last review
  87. achow101 commented at 8:33 pm on June 11, 2024: member
    ACK 5bc2077e8f592442b089affdf0b5795fbc053bb8
  88. achow101 merged this on Jun 11, 2024
  89. achow101 closed this on Jun 11, 2024

  90. mzumsande deleted the branch on Jun 11, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me