refactor: Make mapBlocksUnknownParent local, and rename it #25571

pull LarryRuane wants to merge 1 commits into bitcoin:master from LarryRuane:2022-07-mapBlocksUnknownParent changing 4 files +60 −12
  1. LarryRuane commented at 5:01 am on July 8, 2022: contributor

    This PR is a second attempt at #19594. This PR has two motivations:

    • Improve code hygiene by eliminating a global variable, mapBlocksUnknownParent
    • Fix fuzz test OOM when running too long ([see #19594 comment](/bitcoin-bitcoin/19594/#issuecomment-958801638))

    A minor added advantage is to release mapBlocksUnknownParent memory when the reindexing phase is done. The current situation is somewhat similar to a memory leak because this map exists unused for the remaining lifetime of the process. It’s true that this map should be empty of data elements after use, but its internal metadata (indexing structures, etc.) can have non-trivial size because there can be many thousands of simultaneous elements in this map.

    This PR helps our efforts to reduce the use of global variables. This variable isn’t just global, it’s hidden inside a function (it looks like a local variable but has the static attribute).

    This global variable exists because the -reindex processing code calls LoadExternalBlockFile() multiple times (once for each block file), but that function must preserve some state between calls (the mapBlocksUnknownParent map). This PR fixes this by allocating this map as a local variable in the caller’s scope and passing it in on each call. When reindexing completes, the map goes out of scope and is deallocated.

    I tested this manually by reindexing on mainnet and signet. Also, the existing feature_reindex.py functional test passes.

  2. DrahtBot added the label Refactoring on Jul 8, 2022
  3. DrahtBot commented at 9:34 am on July 8, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #25499 (Use steady clock for all millis bench logging by MarcoFalke)
    • #16981 (Improve runtime performance of –reindex by LarryRuane)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. in src/validation.cpp:4264 in 3bd8416bf8 outdated
    4262     AssertLockNotHeld(m_chainstate_mutex);
    4263-    // Map of disk positions for blocks with unknown parent (only used for reindex)
    4264-    static std::multimap<uint256, FlatFilePos> mapBlocksUnknownParent;
    4265+
    4266+    // Either both should be specified (-reindex), or neither (-loadblock).
    4267+    assert((dbp && blocks_with_unknown_parent) || (!dbp && !blocks_with_unknown_parent));
    


    sipa commented at 1:54 pm on July 8, 2022:
    Could be written somewhat shorter as assert(!dbp == !blocks_with_unknown_parent);.
  5. sipa commented at 1:55 pm on July 8, 2022: member
    utACK 3bd8416bf82a57e3aa1b869dc06fe4d55879a20a
  6. shaavan commented at 2:59 pm on July 8, 2022: contributor

    Concept ACK

    • This PR changes the scope of mapBlocksUnknown parent to the local variable of the caller function of LoadExternalBlockFile.

    • I agree with converting the variable from a static global type to a local type.

      The current situation is somewhat similar to a memory leak because this map exists unused for the remaining lifetime of the process. It’s true that this map should be empty of data elements after use, but its internal metadata (indexing structures, etc.) can have non-trivial size because there can be many thousands of simultaneous elements in this map.

      I agree with the reasoning, and this seems to cause an OOM error during fuzz testing.

    • I verified that the code refactoring was concise and clear. The code changes are purely refactoring modifications and do not introduce and change in code functionality.

    • The added documentation for the LoadExternalBlockFile, clearly explains the usage of the function and the reasoning for the used type of the arguments.

    • I was able to compile and run this PR on Ubuntu 22.04 successfully. And I was able to reindex on the signet network using the following command successfully

    0./src/bitcoind -signet -reindex
    

    Since, according to the guidelines, we need to include all the headers directly used in a file, I would suggest two instances of such inclusion.

    1. #include <map> in src/node/blockstorage.cpp and src/test/fuzz/load_external_block_file.cpp file.
    2. #include <cassert> in src/validation.cpp file.
  7. theStack commented at 3:25 pm on July 8, 2022: contributor
    Concept ACK
  8. LarryRuane force-pushed on Jul 8, 2022
  9. LarryRuane commented at 4:30 pm on July 8, 2022: contributor
    Force-pushed to address review comments (thanks @sipa and @shaavan!)
  10. in src/validation.h:608 in d6bbb6519b outdated
    605+     *                                              unknown parent, key is parent block hash
    606+     *                                              (only used for reindex)
    607+     * */
    608+    void LoadExternalBlockFile(
    609+        FILE* fileIn,
    610+        FlatFilePos* dps = nullptr,
    


    theStack commented at 12:53 pm on July 9, 2022:

    nit: seems like this parameter’s name is not consistent with the implementation

    0        FlatFilePos* dbp = nullptr,
    

    shaavan commented at 1:51 pm on July 9, 2022:

    Nice catch, @theStack. I noticed that the variable name is spelled dbp while defining the function, but it had been misspelled here.

    I think dps seems like a more apt name for this variable since it stands for “Disk Position”. So I would favor renaming this variable from dbp -> dps at other places as well.


    LarryRuane commented at 4:26 pm on July 9, 2022:
    That was a typo, good catch, I’m going to preserve the existing naming to minimize the size of the patch. (I think dbp stands for “disk block pointer”)
  11. theStack approved
  12. theStack commented at 1:06 pm on July 9, 2022: contributor

    Code-review ACK d6bbb6519b07b8afbe3950c707dcba6178d5106c (also tested -reindexing on signet)

    Nice improvement! Probably it’s worth it to investigate if the code-base contains more instances of static variables inside functions that don’t need to exist for the whole lifetime of the process and can thus be eliminiated by using a limited scope instead.

  13. LarryRuane force-pushed on Jul 9, 2022
  14. LarryRuane commented at 4:32 pm on July 9, 2022: contributor
    Force-pushed to address review comment (change variable name dps back to the original name dbp), thanks @theStack.
  15. theStack approved
  16. theStack commented at 4:35 pm on July 9, 2022: contributor
    re-ACK e9636158f29cb44f348ac7713294dc17358361f4
  17. shaavan approved
  18. shaavan commented at 2:33 pm on July 11, 2022: contributor

    reACK e9636158f29cb44f348ac7713294dc17358361f4

    Changes since my last review:

    • Renamed dps -> dbp
    • Added the headers <map> and <cassert> where they were used.
    • Rewritten assert condition to make the code simpler.
  19. in src/node/blockstorage.cpp:838 in e9636158f2 outdated
    834@@ -834,6 +835,9 @@ void ThreadImport(ChainstateManager& chainman, std::vector<fs::path> vImportFile
    835         // -reindex
    836         if (fReindex) {
    837             int nFile = 0;
    838+            // Map of disk positions for blocks with unknown parent (only used for reindex);
    


    mzumsande commented at 6:17 pm on July 11, 2022:
    nit (no need to re-push for that): “(only used for reindex)” is obvious and could be dropped now that this got moved into a code block conditional on fReindex.
  20. mzumsande commented at 6:33 pm on July 11, 2022: contributor

    Code Review ACK e9636158f29cb44f348ac7713294dc17358361f4

    I agree that it is good to remove the global map (I think this could also improve memory usage a little bit if the -reindex didn’t finish successfully in case of block file corruption, and we switched to IBD without being able to connect and remove all entries from the map).

    I don’t get how this could fix a fuzzer OOM error, #19594 (comment) also doesn’t elaborate on this.

  21. LarryRuane commented at 10:24 pm on July 11, 2022: contributor

    I don’t get how this could fix a fuzzer OOM error, #19594 (comment) also doesn’t elaborate on this.

    The fuzzer calls LoadExternalBlockFile() many times, and without this PR, each call adds entries to the static multimap that are never removed (they would be in real life, almost always, but not with random input).

    It’s easy to reproduce the OOM problem without causing an actual OOM (which could take a long time depending on your system’s memory resources):

    0$ FUZZ=load_external_block_file src/test/fuzz/fuzz
    

    and while that’s running, in another window, monitor its memory usage. I ran ps -C b-test -o rss= every 10 minutes (for some reason, the fuzz executable is called b-test when running). With the master branch (units are kbyte):

    0104664
    1112532
    2121912
    3156560
    

    (Then I stopped the test, clearly it continuously grows.) With this PR, there’s expected startup growth, then it’s unchanging:

    0104688
    1109440
    2109440
    3109440
    
  22. LarryRuane commented at 1:19 am on July 17, 2022: contributor
    @practicalswift @promag @amitiuttarwar You had previously ACKed PR #19594 by @hebasto that this PR replaces, maybe you would like to review, thanks.
  23. DrahtBot added the label Needs rebase on Jul 18, 2022
  24. refactor: Make mapBlocksUnknownParent local, and rename it
    Co-authored-by: Larry Ruane <larryruane@gmail.com>
    dd065dae9f
  25. LarryRuane force-pushed on Jul 18, 2022
  26. LarryRuane commented at 6:07 pm on July 18, 2022: contributor
    Force-pushed rebase for merge conflict.
  27. DrahtBot removed the label Needs rebase on Jul 18, 2022
  28. theStack approved
  29. theStack commented at 4:51 pm on July 19, 2022: contributor
    re-ACK dd065dae9fcebd6806ff67703ffa8128e80b97cc
  30. shaavan approved
  31. shaavan commented at 12:31 pm on July 22, 2022: contributor

    reACK dd065dae9fcebd6806ff67703ffa8128e80b97cc

    Changes since my last review:

    1. Rebased over the master.
  32. mzumsande commented at 2:11 pm on July 29, 2022: contributor
    re-ACK dd065dae9fcebd6806ff67703ffa8128e80b97cc
  33. fanquake merged this on Jul 29, 2022
  34. fanquake closed this on Jul 29, 2022

  35. sidhujag referenced this in commit 466a45a11a on Jul 29, 2022
  36. LarryRuane deleted the branch on Jul 29, 2022
  37. fanquake referenced this in commit 07ac7a2dbf on Dec 8, 2022
  38. bitcoin locked this on Jul 29, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me