allow -loadblock blocks to be unsorted #20331

pull LarryRuane wants to merge 3 commits into bitcoin:master from LarryRuane:loadblock-unsorted changing 6 files +104 −47
  1. LarryRuane commented at 5:24 am on November 7, 2020: contributor

    This PR was suggested in this comment in #19594, and builds on that PR (the first commit here).

    This PR allows the blocks in files specified by the -loadblock=file command line (configuration) options (there can be multiple) to be unsorted. This is already allowed by -reindex, but currently when -loadblock encounters a block with an unknown parent, it is ignored. This means the blocks must be sorted by height to be loaded successfully. This PR fixes this restriction, allowing the blocks across all the specified blocks files to be in any order. This makes the -loadblock option more useful. Also, the affected code is cleaner and more maintainable.

    (I’ll simplify this description move details to a separate comment before this PR gets merged.)

  2. DrahtBot added the label Validation on Nov 7, 2020
  3. DrahtBot commented at 12:45 pm on November 7, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #22937 (refactor: Forbid calling unsafe fs::path(std::string) constructor and fs::path::string() method by ryanofsky)
    • #16981 (Improve runtime performance of –reindex by LarryRuane)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. LarryRuane force-pushed on Nov 9, 2020
  5. LarryRuane commented at 6:00 pm on November 9, 2020: contributor

    A useful follow-on to this PR, since loading unordered blocks would become possible, might be the addition of a -loadblocksdir= argument that takes a directory path and loads all blocks found in the directory. That way it’s not necessary to specify a very large number of -loadblock arguments. It would then be easy to import an entire blockchain.

    Another way to accomplish approximately the same thing would be to replace the entire blocks directory in the data directory and then -reindex. But using -loadblocksdir would allow you to “merge” blocks into your existing blocks list (suppressing duplicates), which is kind of cool.

  6. adamjonas commented at 0:37 am on November 10, 2020: member

    Appears that the CI picked up a couple of argument issues with the LoadExternalBlockFile fuzz test:

     0test/fuzz/load_external_block_file.cpp:33:9: error: no matching function for call to 'LoadExternalBlockFile'
     1        LoadExternalBlockFile(Params(), fuzzed_block_file, &flat_file_pos, &blocks_with_unknown_parent);
     2        ^~~~~~~~~~~~~~~~~~~~~
     3./validation.h:162:6: note: candidate function not viable: requires 5 arguments, but 4 were provided
     4void LoadExternalBlockFile(
     5     ^
     6test/fuzz/load_external_block_file.cpp:35:9: error: no matching function for call to 'LoadExternalBlockFile'
     7        LoadExternalBlockFile(Params(), fuzzed_block_file);
     8        ^~~~~~~~~~~~~~~~~~~~~
     9./validation.h:162:6: note: candidate function not viable: requires 5 arguments, but 2 were provided
    10void LoadExternalBlockFile(
    11     ^
    122 errors generated.
    
  7. LarryRuane commented at 3:03 am on November 10, 2020: contributor
    I don’t know if anyone has been confused by -loadblock not loading all the blocks, but that did happen on the zcash network (a bitcoin core fork that I also work on): https://github.com/zcash/zcash/issues/4679
  8. LarryRuane force-pushed on Nov 10, 2020
  9. LarryRuane commented at 3:32 am on November 10, 2020: contributor
    Thanks, @adamjonas, force-pushed to fix the fuzz test.
  10. in src/validation.cpp:1189 in 7afedcfc71 outdated
    1158+    }
    1159+    if (offset > 0 && fseek(file, offset, SEEK_SET)) {
    1160+        fclose(file);
    1161+        return error("ReadBlockFromDisk: fseek failed for %s offset %u", path.string(), offset);
    1162+    }
    1163+    CAutoFile filein(file, SER_DISK, CLIENT_VERSION);
    


    mjdietzx commented at 6:45 pm on November 11, 2020:
    Why was this change made? Could you explain how the behavior of this is changing, aside from the function signature taking const fs::path& path, unsigned int offset instead of const FlatFilePos& pos?

    mjdietzx commented at 7:15 pm on November 11, 2020:

    Note: I’m guessing this change was made bc of what you do at line 4810: ReadBlockFromDisk(*pblockrecursive, blk_paths[child_pos.nFile], child_pos.nPos, chainparams.GetConsensus())

    But I’m failing to understand why you can’t just do: ReadBlockFromDisk(*pblockrecursive, child_pos, chainparams.GetConsensus())

    I’m probably missing something here, but I just want to check


    LarryRuane commented at 8:34 pm on November 11, 2020:
    Yes, I reached several dead-ends before arriving at this code factoring. We can’t use ReadBlockFromDisk(..., child_pos, ...) here because the pos argument is used in lower-level code (OpenFlockFile()) to construct and open a pathname like .../.bitcoin/blocks/blknnnnn.dat, which is just what we want for the -reindex case, but not for the -loadblock case (both share this code), because in the latter, we have a list of arbitrary pathnames to read from. Instead of calling the pos variant in the -reindex case and the path, offset variant in the -loadblock case, it’s simpler to just call it in the same way for both cases.
  11. in src/init.cpp:729 in 7afedcfc71 outdated
    752-                return;
    753-            }
    754-        } else {
    755-            LogPrintf("Warning: Could not open blocks file %s\n", path.string());
    756-        }
    757+    if (vImportFiles.size() > 0) {
    


    mjdietzx commented at 6:46 pm on November 11, 2020:
    Do you need the additional code branch here? Would it be preferable to just invoke LoadExternalBlockFiles("Importing", chainparams, vImportFiles, true); no matter what, and the for loop just won’t iterate through anything if vImportFiles is empty?

    LarryRuane commented at 8:14 pm on November 11, 2020:

    You’re right, it’s not really needed, but I think it’s useful to show the reader here that loading blocks this isn’t always being done; otherwise, to know nothing’s being done, the reader has to go look at LoadExternalBlockFiles().

    I also like that the conditional makes this part of the code similar to the if (fReindex) code just above it; they are doing very similar work.

    But I could go either way (no strong preference).

  12. in src/validation.cpp:4672 in 7afedcfc71 outdated
    4697+        LogPrintf("%s block file %s ...\n", action, blk_paths[n_file].filename().string());
    4698+        FILE* file = fsbridge::fopen(blk_paths[n_file], "rb");
    4699+        if (file == nullptr) {
    4700+            LogPrintf("%s: Warning: Could not open blocks file %s\n", __func__, blk_paths[n_file]);
    4701+            return;
    4702+        }
    


    mjdietzx commented at 7:09 pm on November 11, 2020:
    Isn’t this kinda duplication of what ReadBlockFromDisk does? I’m wondering why FlatFilePos pos(n_file, 0); isn’t created, passed to ReadBlockFromDisk (like before, when signature took FlatFilePos), and then you can pass it to LoadExternalBlockFile

    LarryRuane commented at 11:09 pm on November 11, 2020:

    This duplication exists even without this PR. Before this PR, this fopen() happens here.

    To summarize why this happens in two places: There are two ways blocks are “loaded” into memory when reindexing (and, now, with this PR, when using -loadblock). First, a blocks file is opened (here) and read from disk (using large reads) sequentially. At this point, we don’t know the individual blocks’ starting offsets, so we can’t create a FlatFilePos to pass to ReadBlockFromDisk() as you suggest. Besides, doing these large reads is more efficient, typically many blocks per read – plus a partial block at the end (since, again, the block boundaries are unknown). Once in memory, the block boundaries are discovered, and each block is deserialized. But the blocks in these blocks files are mostly out of order, so most of the blocks can’t be accepted (have AcceptBlock() called on it) because its parent has not yet been seen.

    When such a parent is later found, the child block is then re-read from disk using ReadBlockFromDisk(), deserialized a second time, and then passed to AcceptBlock() (allowed because its parent is now present). In this case, we’re reading exactly a single block, because we know where it starts (in the file) and its length (which is what a FlatFilePos encodes).

  13. in src/validation.h:165 in 7afedcfc71 outdated
    163     const CChainParams& chainparams,
    164-    FILE* fileIn,
    165-    FlatFilePos* dbp = nullptr,
    166-    std::multimap<uint256, FlatFilePos>* blocks_with_unknown_parent = nullptr);
    167+    const std::vector<fs::path>& blk_paths,
    168+    size_t n_file,
    


    mjdietzx commented at 7:11 pm on November 11, 2020:
    Given the name LoadExternalBlockFile I’m confused why blk_paths plural is passed as a param, instead of just a single file/path

    LarryRuane commented at 11:20 pm on November 11, 2020:
    That function, LoadExternalBlockFile(), does (attempt to) load the blocks from exactly one block file, but, doing so may cause, as a side effect, the loading of blocks seen earlier (in earlier files) but that could not be loaded then because their parent was missing (and is now present, see my earlier comment). This requires reaching back and reading specific blocks from earlier files, and that requires the full list of block file paths (the blk_paths argument). This basic flow is the same as before this PR. I agree this is somewhat confusing, maybe a comment would help. But maybe not, because perhaps there’s no shortcut to just studying the code, I’m not sure.
  14. in src/validation.cpp:52 in 7afedcfc71 outdated
    48@@ -49,6 +49,7 @@
    49 #include <validationinterface.h>
    50 #include <warnings.h>
    51 
    52+#include <cassert>
    


    LarryRuane commented at 11:22 pm on November 11, 2020:
    This isn’t needed, I’ll remove in a later force-push.
  15. in src/test/fuzz/load_external_block_file.cpp:14 in 7afedcfc71 outdated
    10@@ -11,6 +11,7 @@
    11 #include <validation.h>
    12 
    13 #include <cstdint>
    14+#include <map>
    


    LarryRuane commented at 11:22 pm on November 11, 2020:
    This isn’t needed, I’ll remove in a later force-push.
  16. LarryRuane commented at 11:23 pm on November 11, 2020: contributor
    @mjdietzx, thank you for your review!
  17. LarryRuane commented at 5:00 pm on November 12, 2020: contributor

    Here’s a manual test procedure (I’ll describe testnet because it’s faster, but mainnet works too)

    • testnet node is not running
    • wait for your testnet3 data directory to get out of date (don’t run the client for at least several days)
    • change directory to your .bitcoin data directory
    • rename testnet3 to testnet3-save
    • start the node with -testnet, wait for it to sync (IBD), it will have more blocks (and more blocks files) than testnet3-save
    • stop the node
    • rename testnet3 to testnet3-new
    • rename testnet3-save to testnet3
    • start the node as follows, except replace 191 with the highest-numbered file in the testnet3-save directory:
    0bitcoind -testnet -networkactive=0 $(seq 0 191|xargs printf '-loadblock=testnet3-new/blocks/blk%05u.dat ')
    

    This generates a command line like:

    0bitcoind -testnet -networkactive=0 -loadblock=testnet3-new/blocks/blk00000.dat -loadblock=testnet3-new/blocks/blk00001.dat ...
    
    • observe that, even though not syncing with peers, the node successfully reaches a recent block height (look at the most recent UpdateTip in debug.log).
  18. promag commented at 11:15 am on November 14, 2020: member
    @LarryRuane I guess it’s possible to make a functional test then?
  19. LarryRuane commented at 4:02 pm on November 14, 2020: contributor
    @promag I didn’t write a new functional test, but I did modify the existing loadblock functional test to cover the new behavior (the test fails without the PR). Is that sufficient?
  20. dongcarl commented at 5:26 pm on December 3, 2020: member

    Verified that the test fails without the new behaviour, cool!

    A couple of things that could be done:

    1. Instead of asserting based on the debug log (which is a bit more fragile), perhaps the assertion can be made based on the number of blocks connected (tip height). We may need to disable networking for this to work reliably.
    2. I think the changes in test/functional should be split into its own commit and put as the first commit in this patchset, this way it’s easier for reviewers to verify that the new behavior does indeed do what it’s supposed to do.
  21. LarryRuane force-pushed on Dec 4, 2020
  22. LarryRuane commented at 2:42 am on December 4, 2020: contributor

    Force-pushed to implement both suggestions from @dongcarl (thank you!).

    Reminder, this PR is layered on top of #19594 so please review only the last two commits (of the 3).

    UPDATE Mar 5, 2021: I squashed the last two commits, so only review the last commit.

  23. DrahtBot added the label Needs rebase on Feb 1, 2021
  24. LarryRuane force-pushed on Feb 4, 2021
  25. DrahtBot removed the label Needs rebase on Feb 4, 2021
  26. LarryRuane commented at 3:25 pm on February 4, 2021: contributor
    Rebased, no functional changes.
  27. DrahtBot added the label Needs rebase on Mar 4, 2021
  28. LarryRuane force-pushed on Mar 5, 2021
  29. DrahtBot removed the label Needs rebase on Mar 5, 2021
  30. DrahtBot added the label Needs rebase on Apr 13, 2021
  31. LarryRuane force-pushed on Apr 23, 2021
  32. LarryRuane commented at 4:10 pm on April 23, 2021: contributor
    This required rebase also splits the second commit into two commits to separate refactoring from the functional changes. Ready for review.
  33. DrahtBot removed the label Needs rebase on Apr 23, 2021
  34. LarryRuane force-pushed on Apr 24, 2021
  35. DrahtBot added the label Needs rebase on May 5, 2021
  36. LarryRuane force-pushed on May 7, 2021
  37. LarryRuane commented at 4:38 am on May 7, 2021: contributor
    Rebased. The second commit is now much simpler because of #21727.
  38. DrahtBot removed the label Needs rebase on May 7, 2021
  39. sipa commented at 10:45 pm on June 10, 2021: member
    utACK dbd7f729f42d58a1d47700dc75fb1a5a6d76cdf2
  40. DrahtBot added the label Needs rebase on Jun 12, 2021
  41. LarryRuane force-pushed on Jun 14, 2021
  42. DrahtBot removed the label Needs rebase on Jun 14, 2021
  43. LarryRuane commented at 3:40 am on June 14, 2021: contributor
    Rebased to resolve merge conflicts.
  44. DrahtBot added the label Needs rebase on Jun 29, 2021
  45. refactor: Make mapBlocksUnknownParent local, and rename it f58f1aa36f
  46. refactor: Add a new version of `ReadBlockFromDisk()`
    This lower-level version takes a path and a seek offset as arguments.
    Currently unused but is needed by the next commit.
    130508ebf8
  47. allow -loadblock blocks to be unsorted
    Add LoadExternalBlockFiles() (plural) that accepts a list of paths and
    loads all the blocks found in these paths. The blocks can be completely
    out of order; block C (child) can be seen first and its parent P not
    seen until a later file, then block C will be processed.  This requires
    re-reading block C from the earlier file; block C isn't kept in memory
    during this interval.
    
    The list of paths is either the blocks/blk*.dat files (during reindexing)
    or given with one or more -loadblk arguments.
    
    LoadExternalBlockFiles() calls LoadExternalBlockFile() (singular) to
    read a single file, but out-of-order blocks requires that this function
    be able to read individual blocks from earlier files.
    0c97fa3d22
  48. LarryRuane force-pushed on Jul 27, 2021
  49. DrahtBot removed the label Needs rebase on Jul 28, 2021
  50. DrahtBot commented at 9:40 am on October 15, 2021: member

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  51. DrahtBot added the label Needs rebase on Oct 15, 2021
  52. DrahtBot commented at 10:49 am on February 22, 2022: member
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  53. LarryRuane commented at 9:12 pm on March 8, 2022: contributor
    The -loadblock feature doesn’t seem to be often used, so improvements to it aren’t much in demand, so closing this. If this feature becomes more important, we can resurrect this PR.
  54. LarryRuane closed this on Mar 8, 2022

  55. DrahtBot locked this on Mar 8, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me