refactor: Make mapBlocksUnknownParent local, and rename it #19594

pull hebasto wants to merge 1 commits into bitcoin:master from hebasto:200726-lebf changing 4 files +32 −12
  1. hebasto commented at 9:18 pm on July 26, 2020: member

    LarryRuane’s comment in #16981:

    Does it bother anyone else that this is static? This is global state, which, in general, equals evil. It must be static, given that it’s declared here, since its state must persist across calls to this function (very often, a block and its parent are in different blk files). We could eliminate this global state by allocating this map in the caller of this function, src/init.cpp: ThreadImport(), and passing it by reference to LoadExternalBlockFile().

    Another thing that seems missing from the current design is, what happens if there are leftover entries in this map after reindexing has completed? That means we’ve read blocks from disk but never found their parent blocks. In my testing, this map is empty at the end of reindexing, as expected. But what if it wasn’t? That would mean we’re missing one or more blocks/blk00nnn.dat files, or some blocks from those files. Isn’t that worth at least a LogPrintf()? Making this change would also require moving the map out to the caller, because this function doesn’t know if it will be called again.

  2. refactor: Make mapBlocksUnknownParent local, and rename it adc08e1bdc
  3. hebasto commented at 9:18 pm on July 26, 2020: member
    Friendly ping @LarryRuane :)
  4. DrahtBot added the label Refactoring on Jul 26, 2020
  5. DrahtBot added the label Tests on Jul 26, 2020
  6. DrahtBot added the label Validation on Jul 26, 2020
  7. DrahtBot commented at 3:25 am on July 27, 2020: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #20750 ([Bundle 2/n] Prune g_chainman usage in mempool-related validation functions by dongcarl)
    • #20749 ([Bundle 1/n] Prune g_chainman usage related to ::LookupBlockIndex by dongcarl)
    • #20331 (allow -loadblock blocks to be unsorted by LarryRuane)
    • #20158 (tree-wide: De-globalize ChainstateManager by dongcarl)
    • #16981 (Improve runtime performance of –reindex by LarryRuane)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  8. practicalswift commented at 3:26 pm on August 14, 2020: contributor
    Concept ACK: local is easier to reason about :)
  9. in src/validation.cpp:4761 in adc08e1bdc
    4757@@ -4752,15 +4758,17 @@ void LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
    4758 
    4759                 NotifyHeaderTip();
    4760 
    4761+                if (!blocks_with_unknown_parent) continue;
    


    LarryRuane commented at 4:31 pm on August 14, 2020:
    Nice! It was hard to understand the previous code when !dbp.
  10. in src/validation.cpp:4786 in adc08e1bdc
    4782@@ -4775,7 +4783,7 @@ void LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
    4783                             }
    4784                         }
    4785                         range.first++;
    4786-                        mapBlocksUnknownParent.erase(it);
    4787+                        blocks_with_unknown_parent->erase(it);
    


    LarryRuane commented at 5:10 pm on August 14, 2020:
    0                        range.first = blocks_with_unknown_parent->erase(it);
    

    As written this is safe with this container type, but the assignment makes fewer assumptions (and is one less line of code!).


    hebasto commented at 7:06 pm on August 14, 2020:
    Agree. But current diff is minimal :)
  11. in src/validation.cpp:4732 in adc08e1bdc
    4726@@ -4722,8 +4727,9 @@ void LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
    4727                     if (hash != chainparams.GetConsensus().hashGenesisBlock && !LookupBlockIndex(block.hashPrevBlock)) {
    4728                         LogPrint(BCLog::REINDEX, "%s: Out of order block %s, parent %s not known\n", __func__, hash.ToString(),
    4729                                 block.hashPrevBlock.ToString());
    4730-                        if (dbp)
    4731-                            mapBlocksUnknownParent.insert(std::make_pair(block.hashPrevBlock, *dbp));
    4732+                        if (dbp && blocks_with_unknown_parent) {
    4733+                            blocks_with_unknown_parent->emplace(block.hashPrevBlock, *dbp);
    4734+                        }
    


    LarryRuane commented at 6:05 pm on August 14, 2020:
    You could test either variable here (probably blocks_with_unknown_parent would be best), but I assume you’re testing both since the code references both. The only downside is it could lead the casual reader to think that only one of these could be nullptr and not the other. The emplace simplification is nice here.

    hebasto commented at 6:54 pm on August 14, 2020:

    … but I assume you’re testing both since the code references both.

    Correct.

  12. LarryRuane approved
  13. LarryRuane commented at 6:38 pm on August 14, 2020: contributor

    ACK adc08e1bdcd092b07a349908c22a82a9ad7adeea Looks good, very nice PR, better than I could have done! I did some basic testing (ran bitcoind -reindex).

    Here’s a slightly tangential suggestion, as long as you’re changing this code, to eliminate one level of looping:

     0while (!queue.empty()) {
     1    const uint256& head = queue.front();
     2    auto it = blocks_with_unknown_parent->find(head);
     3    if (it == blocks_with_unknown_parent->end()) {
     4        queue.pop_front();
     5        continue;
     6    }
     7    std::shared_ptr<CBlock> pblockrecursive = std::make_shared<CBlock>();
     8    if (ReadBlockFromDisk(*pblockrecursive, it->second, chainparams.GetConsensus()))
     9    {
    10        LogPrint(BCLog::REINDEX, "%s: Processing out of order child %s of %s\n", __func__, pblockrecursive->GetHash().ToString(),
    11                head.ToString());
    12        LOCK(cs_main);
    13        BlockValidationState dummy;
    14        if (::ChainstateActive().AcceptBlock(pblockrecursive, dummy, chainparams, nullptr, true, &it->second, nullptr))
    15        {
    16            nLoaded++;
    17            queue.push_back(pblockrecursive->GetHash());
    18        }
    19    }
    20    blocks_with_unknown_parent->erase(it);
    21    NotifyHeaderTip();
    22}
    
  14. hebasto commented at 7:08 pm on August 14, 2020: member

    @LarryRuane

    Here’s a slightly tangential suggestion, as long as you’re changing this code, to eliminate one level of looping:

    Let’s keep this PR diff minimal and focused :)

  15. MarcoFalke removed the label Tests on Aug 23, 2020
  16. promag commented at 8:13 am on September 7, 2020: member
    Concept ACK.
  17. epson121 commented at 9:50 am on September 7, 2020: none
    Code review ACK adc08e1bdcd092b07a349908c22a82a9ad7adeea
  18. theStack approved
  19. theStack commented at 3:41 pm on October 11, 2020: contributor
    Code Review ACK adc08e1bdcd092b07a349908c22a82a9ad7adeea 🧇
  20. dongcarl commented at 9:17 pm on October 19, 2020: contributor
    Concept ACK. This is a necessary change for 2f2e8a2 (#20158) to work correctly. Sorry, looking at this again now. It doesn’t seem like it’s fixing anything that’s not working. If the concern is “what happens if the map is not empty at the end of reindexing,” then we should write a test case for that and implement the warning.
  21. amitiuttarwar commented at 2:40 am on November 3, 2020: contributor

    code review ACK adc08e1bdcd092b07a349908c22a82a9ad7adeea

    but I think OP should be updated. code currently is a pure refactor to maintain less global state. however, OP also has section that asks “what happens if there are leftover entries in this map after reindexing has completed”, which I don’t believe is addressed in this PR?

    tangential question: is -loadblock still used, or mostly a leftover relic? I was having a hard time understanding the purpose of -loadblock, and why it wouldn’t take advantage of the orphan handling we have available. Looks like it was introduced in #883 and intentionally doesn’t handle orphans (#1695). Seems like it might just be a dated mechanism that we still have around? So I’m interested to hear if there are current use cases.

  22. LarryRuane commented at 5:33 pm on November 3, 2020: contributor

    @amitiuttarwar :

    … I don’t believe is addressed in this PR

    The OP does mention logging if there are entries leftover entries, and I think that would be a good idea. But no other recovery is necessary, because the client will download the missing blocks from peers (I’ve verified this).

    While working on zcash, I recently looked into the -loadblock argument and concluded that it should be removed, see https://github.com/zcash/zcash/issues/4679#issuecomment-678839093, I’ll quote the relevant part (the same 2012 commit hash referenced in this comment also exists in the bitcoin repo here):

    I don’t think -loadblock makes sense at all. I just found this interesting commit where -reindex was added and -loadblock already existed – so -reindex was a nicer way to do what people used -loadblock to do previously. That strengthens my opinion that -loadblock isn’t worth fixing. My guess is that back in 2012, the “headers-first” block download hadn’t been implemented and merged yet, so blocks were in height order in the blk?????.dat files, so -loadblock worked well. Headers-first download broke it, but there’s been no strong reason to fix it since -reindex existed by then.

    I could make a small PR to remove -loadblock, if that would be helpful (that shouldn’t be part of the current PR).

  23. LarryRuane commented at 6:06 pm on November 3, 2020: contributor

    me:

    I could make a small PR to remove -loadblock

    On further thought, we shouldn’t do this, because I can easily imagine that it may be currently used for testing (outside of our repo), for example, with regtest.

  24. sipa commented at 6:07 pm on November 3, 2020: member
    @amitiuttarwar The purpose of -loadblock is loading external blocks from e.g. a bootstrap.dat file. I don’t know if those see much use still, but we distribute the script to produce bootstrap.dat files still (see https://github.com/bitcoin/bitcoin/tree/master/contrib/linearize).
  25. MarcoFalke commented at 6:39 pm on November 3, 2020: member
    It should also be possible to specify all block files of another datadir via multiple -loadblock. (sync via local disk)
  26. LarryRuane commented at 7:09 pm on November 3, 2020: contributor

    It should also be possible to specify all block files of another datadir via multiple -loadblock

    No, because when using -loadblock (as opposed to -reindex), there is no unknown parent map. When -loadblock encounters a block whose parent hasn’t yet been seen, which is very common beginning with headers-first download, it’s silently ignored.

    We could add an unknown parents map to the -loadblock path, but that doesn’t exist currently. I could make a tiny PR to do that, if it would be useful (it would make sense base it on the current PR).

  27. amitiuttarwar commented at 11:00 pm on November 3, 2020: contributor

    @LarryRuane

    The OP does mention logging if there are entries leftover entries, and I think that would be a good idea. But no other recovery is necessary, because the client will download the missing blocks from peers (I’ve verified this).

    right, sure, I think logging would make sense, but I’m talking about the difference between OP and code changes. Unless I’m mistaken, the code doesn’t address this, so its misleading to talk about in the OP. additional logging could be added to this PR or it could be addressed in a separate PR, but the code & description should match :)


    RE: -loadblock:

    • I just posted in irc asking if people are currently using. one person said they still use it
    • if we are trying to import block files from another datadir, wouldn’t we just add them in the right directory and then reindex?

    on irc @sipa said

    i think part is that you don’t want to encourage people copying entire datadirs, as chainstates don’t get re-verified if you copy a chainstate from someone (not just block files), they don’t get re-verified (because your node wouldn’t know it’s operating on an imported copy)

    as Larry said, one reason loadblock wouldn’t work with the blocks dat is because it disables the parent map. but I don’t think we should add functionality to amplify support unless we have concretely known use cases

    but again, I don’t think this is the most relevant to this PR. I current don’t feel any conclusive evidence towards removing loadblock or adding support, but if I did I think opening an issue & continuing conversation there would make sense

  28. DrahtBot commented at 1:10 pm on February 1, 2021: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  29. DrahtBot added the label Needs rebase on Feb 1, 2021
  30. LarryRuane commented at 4:51 am on April 24, 2021: contributor
    @hebasto, I rebased this PR’s commit as part of #20331. Feel free to cherry-pick it back to here. (A review of #20331 would be appreciated, BTW.)
  31. MarcoFalke commented at 10:01 am on November 3, 2021: member
    This will also fix an issue where the fuzz test goes OOM when running too long
  32. DrahtBot commented at 12:17 pm on February 22, 2022: contributor
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  33. hebasto commented at 2:44 pm on April 21, 2022: member

    I won’t be able to focus on this stuff in the near future.

    So closing up for grabs.

  34. hebasto closed this on Apr 21, 2022

  35. hebasto added the label Up for grabs on Apr 21, 2022
  36. MarcoFalke removed the label Up for grabs on Jul 8, 2022
  37. fanquake referenced this in commit 5871b5b5ab on Jul 29, 2022
  38. bitcoin locked this on Jul 8, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me