blockstorage: Adjust fastprune limit if block exceeds blockfile size #27191

pull mzumsande wants to merge 2 commits into bitcoin:master from mzumsande:202302_fastprune_oom changing 3 files +55 −1
  1. mzumsande commented at 5:13 pm on March 2, 2023: contributor

    The debug-only -fastprune option used in several tests is not always safe to use: If a -fastprune node receives a block larger than the maximum blockfile size of 64kb bad things happen: The while loop in BlockManager::FindBlockPos never terminates, and the node runs oom because memory for m_blockfile_info is allocated in each iteration of the loop. The same would happen if a naive user used -fastprune on anything other than regtest (so this can be tested by syncing on signet for example, the first block that crashes the node is at height 2232).

    Change the approach by raising the blockfile size to the size of the block, if that block otherwise wouldn’t fit (idea by TheCharlatan).

  2. DrahtBot commented at 5:13 pm on March 2, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ryanofsky, TheCharlatan, pinheadmz

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #27125 (refactor, kernel: Decouple ArgsManager from blockstorage by TheCharlatan)
    • #27039 (blockstorage: do not flush block to disk if it is already there by pinheadmz)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Block storage on Mar 2, 2023
  4. in src/node/blockstorage.cpp:604 in fe6d866c03 outdated
    600@@ -601,7 +601,9 @@ bool BlockManager::FindBlockPos(FlatFilePos& pos, unsigned int nAddSize, unsigne
    601 
    602     bool finalize_undo = false;
    603     if (!fKnown) {
    604-        while (m_blockfile_info[nFile].nSize + nAddSize >= (gArgs.GetBoolArg("-fastprune", false) ? 0x10000 /* 64kb */ : MAX_BLOCKFILE_SIZE)) {
    605+        const unsigned int max_blockfile_size{gArgs.GetBoolArg("-fastprune", false) ? 0x10000 : MAX_BLOCKFILE_SIZE};
    


    pinheadmz commented at 6:47 pm on March 2, 2023:
    This cleanup is already addressed in #27039 ;-)

    mzumsande commented at 5:45 am on March 3, 2023:
    Good, should make rebasing easy! Here, it’s not a cleanup but needed to use max_blockfile_size in the added assert.
  5. pinheadmz commented at 6:51 pm on March 2, 2023: member

    concept ACK

    I think assert is the appropriate move here to prevent the infinite memory loop! We mine as well just crash early. It might also make sense to not allow -fastprune on anything besides regtest as well. I’m not sure if that is covered by ArgsManager::DEBUG_ONLY

  6. mzumsande force-pushed on Mar 3, 2023
  7. mzumsande commented at 5:49 am on March 3, 2023: contributor

    It might also make sense to not allow -fastprune on anything besides regtest as well. I’m not sure if that is covered by ArgsManager::DEBUG_ONLY

    Not sure if suppressing options for specific networks is something we usually do (are there any other examples for this?). But I added some more explanation to the -fastprune documentation in init with the latest push.

  8. TheCharlatan commented at 12:10 pm on March 3, 2023: contributor
    Not sure how I feel about this change. It is trivial to create blocks exceeding this limit on regtest. If this happens, it should at least log a message explaining why it failed. The size limit constants used by fastprune seem arbitrary to me. What is their rationale, just fewer resources allocated/used while running the pruning functional tests? If so, why are the revision files not size restricted as well?
  9. mzumsande commented at 4:46 pm on March 3, 2023: contributor

    Not sure how I feel about this change. It is trivial to create blocks exceeding this limit on regtest. If this happens, it should at least log a message explaining why it failed.

    I’ll add a log message with my next push.

    The size limit constants used by fastprune seem arbitrary to me. What is their rationale, just fewer resources allocated/used while running the pruning functional tests? If so, why are the revision files not size restricted as well?

    It was introduced in c286a22f7b63a8bd336d5d7606c339053ee0054b (#15946) without much discussion - I’d say it’s a quick-and-dirty way to allow testing the pruning code in functional tests - tests like feature_index_prune.py would probably take hours without it.

    I stumbled upon this while reviewing #27125, when I had the great idea to quickly sync on signet with -fastprune to make sure that it still works the same, and after my system crashed due to running oom I thought this could be handled a bit better…

  10. in src/node/blockstorage.cpp:605 in d526dbb50f outdated
    600@@ -601,7 +601,9 @@ bool BlockManager::FindBlockPos(FlatFilePos& pos, unsigned int nAddSize, unsigne
    601 
    602     bool finalize_undo = false;
    603     if (!fKnown) {
    604-        while (m_blockfile_info[nFile].nSize + nAddSize >= (gArgs.GetBoolArg("-fastprune", false) ? 0x10000 /* 64kb */ : MAX_BLOCKFILE_SIZE)) {
    605+        const unsigned int max_blockfile_size{gArgs.GetBoolArg("-fastprune", false) ? 0x10000 /* 64kb */ : MAX_BLOCKFILE_SIZE};
    606+        assert(nAddSize < max_blockfile_size); // otherwise (possible in debug-only -fastprune mode) the while loop wouldn't terminate
    


    TheCharlatan commented at 10:03 pm on March 4, 2023:

    Instead of asserting, how about setting the block file size dynamically in case the block to be written exceeds the limit? Something like:

    0-        const unsigned int max_blockfile_size{gArgs.GetBoolArg("-fastprune", false) ? 0x10000 /* 64kb */ : MAX_BLOCKFILE_SIZE};
    1+        unsigned int max_blockfile_size = MAX_BLOCKFILE_SIZE;
    2+        if (gArgs.GetBoolArg("-fastprune", false)) {
    3+            max_blockfile_size = 0x10000; // 64kiB
    4+            if (nAddSize >= max_blockfile_size) {
    5+                max_blockfile_size = nAddSize + 1;
    6+            }
    7+        }
    

    mzumsande commented at 11:27 pm on March 10, 2023:

    Nice idea! I think that would work - I could do a full signet sync with it.

    I’m still a bit undecided on general grounds, i.e. if it’s okay to add more logic to accommodate a test-only mode that doesn’t make sense for anything else but easier testing of pruning in the functional and unit tests. One could also argue that if no one but a developer would ever have a reason to use it, it’d be ok to just tell them more clearly not to create large blocks alongside using this option and abort otherwise? @MarcoFalke : Do you have an opinion on this, or do you remember past discussions about test-only bitcoind options like this one?


    maflcko commented at 11:28 am on March 11, 2023:

    @MarcoFalke : Do you have an opinion on this, or do you remember past discussions

    Both no :)


    mzumsande commented at 7:36 pm on March 21, 2023:
    I switched to your suggestion, removing the assert!
  11. mzumsande force-pushed on Mar 21, 2023
  12. mzumsande commented at 7:31 pm on March 21, 2023: contributor
    I changed the approach to @TheCharlatan’s idea (added you as coauthor) - now we don’t assert anymore but raise the size of the blockfile to the size of the added block if needed.
  13. mzumsande renamed this:
    blockstorage: add an assert to avoid running oom with `-fastprune`
    blockstorage: Adjust fastprune limit if block exceeds blockfile size
    on Mar 21, 2023
  14. TheCharlatan commented at 8:55 pm on March 21, 2023: contributor
    ACK e930814fdb9261f180d5c436cefe52e9cf38fd67 fwiw :P Thank you for following up on this @mzumsande.
  15. pinheadmz approved
  16. pinheadmz commented at 5:45 pm on March 22, 2023: member

    ACK e930814fdb9261f180d5c436cefe52e9cf38fd67

    I also wrote a test for this that freezes (infinite loop) on master but passes on this branch. Take it or leave it, I just wanted to see the mechanism work before acking:

    https://github.com/pinheadmz/bitcoin/commit/c576efa6dc7c32981d3b2fb42c9c8573b8c2e293

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA256
     2
     3ACK e930814fdb9261f180d5c436cefe52e9cf38fd67
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmQbPmoACgkQ5+KYS2KJ
     7yTq1vQ//ddoQ67cVvJRnnTcXybDOjTtHHVMSoWCyujBBCDr54sOVNr0VgqTgkOYB
     8tAHWOVjlVwbIOCVG71GmmbfWAaQ97+3RICvRsAq0WaD8CrkV677nY4d5VYA471kU
     9qKnmRXm4j0Shb+7l4p6kDhX67pWDD1q9kACia823BsMhi1c7AbqYJM9Wte9oVrjM
    10FxZaWjFTcZiFSk9o6kDnAA4Va4H8vQCI5SGAyYynw1LaBJvDT+UjEPsheKGDQ45p
    11JqkASuNDTNzGtLGZsYR2W42zbIImHg8YMVqIaJUtCWZJgGMUlQz4qGRA4tylAE/H
    12IDcyuz+dCRmqlVHWczgNrYrSRASmFzWpvNFBdL67x36OndY/o4QehGU3PF/gazAR
    135o6jIrr6AieuttFwWEF82s7fwCJWtkbx1jTVPCpRXrKPNhQvaVMcVBYx5ah0IG93
    14V3n6q3iayIu8xZ3yY1MSuGtszbxTkxSMH7cwOGNGfj2nAwGdANFmtYzU3lE2SjBd
    15HD1felNhxRcSyjvxLWh5+21O1BuD7sj7+lrULFZR5eYKUzn2dI266HwJNOir8HrT
    16FybjNY1RQThHizZG8iWQjsEBvBVCkZxm0xMVQE8W33TIOBjxouJs2BAX5CVR2RyF
    17NfhriauvkMXJYJh4clbVy814BhCm+aWSmEwl5ODsVtjnC6QgUH0=
    18=b/RR
    19-----END PGP SIGNATURE-----
    

    pinheadmz’s public key is on keybase

  17. fanquake requested review from ryanofsky on Mar 22, 2023
  18. in src/node/blockstorage.cpp:629 in e930814fdb outdated
    606+        // Use smaller blockfiles in test-only -fastprune mode - but avoid
    607+        // the possibility of having a block not fit into the block file.
    608+        if (gArgs.GetBoolArg("-fastprune", false)) {
    609+            max_blockfile_size = 0x10000; // 64kiB
    610+            if (nAddSize >= max_blockfile_size) {
    611+                max_blockfile_size = nAddSize + 1;
    


    ryanofsky commented at 3:42 pm on April 14, 2023:

    In commit “blockstorage: Adjust fastprune limit if block exceeds blockfile size” (e930814fdb9261f180d5c436cefe52e9cf38fd67)

    What is the +1 for? Would be helpful to add a code comment saying what the extra byte is.


    TheCharlatan commented at 9:09 am on April 18, 2023:
    I checked this carefully again and don’t think anymore that the + 1 is necessary.

    mzumsande commented at 3:45 pm on April 18, 2023:

    I think it is necessary, as long as the while condition says while (m_blockfile_info[nFile].nSize + nAddSize >= max_blockfile_size) it won’t terminate if nAddSize == max_blockfile_size, because m_blockfile_info[nFile].nSize will be 0 if we skip to the next block.

    Maybe we could lose the + 1 if we’d change the >= from the while loop into a >, however I really don’t want to change this critical code in this PR because it would mean we’d try to squeeze another byte into all blocks, affecting blockstorage in normal operation as well, not just the test-only -fastprune mode. So I’ll add a comment.


    ryanofsky commented at 5:45 pm on May 1, 2023:
    Makes sense the +1 is needed to be consistent with >= below. It does look like there is a minor bug here that will cause block files to be at most MAX_BLOCKFILE_SIZE-1 bytes and never reach the maximum size. But if this is a bug, it goes back to 5382bcf8cd23c36a435c29080770a79b5e28af42 when this code was introduced. Agree it would not be good to change that behavior here.

    pinheadmz commented at 6:25 pm on May 1, 2023:
    The functional test covers the +1 nicely as well. Without the +1 (and removing the assertion on L632) the test will hang until timed out at 30s
  19. in src/node/blockstorage.cpp:633 in e930814fdb outdated
    609+            max_blockfile_size = 0x10000; // 64kiB
    610+            if (nAddSize >= max_blockfile_size) {
    611+                max_blockfile_size = nAddSize + 1;
    612+            }
    613+        }
    614+        while (m_blockfile_info[nFile].nSize + nAddSize >= max_blockfile_size) {
    


    ryanofsky commented at 3:49 pm on April 14, 2023:

    In commit “blockstorage: Adjust fastprune limit if block exceeds blockfile size” (e930814fdb9261f180d5c436cefe52e9cf38fd67)

    Would be good to add to add assert(nAddSize <= max_blockfile_size); here to make it clear what this function is assuming and catch the problem instead of going into an infinite loop if outside code changes.


    mzumsande commented at 6:37 pm on April 18, 2023:
    done, although I think we should assert nAddSize < max_blockfile_size; - see explanation above.
  20. ryanofsky approved
  21. ryanofsky commented at 3:57 pm on April 14, 2023: contributor
    Code review ACK e930814fdb9261f180d5c436cefe52e9cf38fd67, but I think it would be great to add the test written by @pinheadmz #27191#pullrequestreview-1353171052 because the blockmanager code is fragile and a regression could happen, and because the test setup is very clean so it could probably be used to branch out and test other things.
  22. ryanofsky commented at 4:07 pm on April 14, 2023: contributor

    I think assert is the appropriate move here to prevent the infinite memory loop!

    I think throwing an exception or calling abort() would be ok, but better to reserve the assert macro for conditions which are actually impossible and not use it for error handling, because that would water down the meaning of other asserts.

  23. mzumsande force-pushed on Apr 18, 2023
  24. mzumsande commented at 6:39 pm on April 18, 2023: contributor
    Thanks for the reviews! I pushed an update to address the outstanding comments and added the test by @pinheadmz (thanks!).
  25. mzumsande force-pushed on Apr 19, 2023
  26. maflcko commented at 2:50 pm on April 19, 2023: member
    unrelated: Red CI can be ignored, or if you care a lot, you can rebase
  27. DrahtBot added the label CI failed on Apr 19, 2023
  28. blockstorage: Adjust fastprune limit if block exceeds blockfile size
    If the added block exceeds the blockfile size in test-only
    -fastprune mode, the node would get stuck in an infinite loop and
    run out of memory.
    
    Avoid this by raising the blockfile size to the size of the added block
    in this situation.
    
    Co-authored-by: TheCharlatan <seb.kung@gmail.com>
    271c23e87f
  29. test: cover fastprune with excessive block size 8f14fc8622
  30. mzumsande force-pushed on Apr 19, 2023
  31. ryanofsky approved
  32. ryanofsky commented at 5:49 pm on May 1, 2023: contributor
    Code review ACK 8f14fc86225d8fe77353f61ebd6b0bdda8d13aa9. Added new assert, test, and comment since last review
  33. DrahtBot requested review from pinheadmz on May 1, 2023
  34. DrahtBot requested review from TheCharlatan on May 1, 2023
  35. TheCharlatan approved
  36. TheCharlatan commented at 6:07 pm on May 1, 2023: contributor
    ACK 8f14fc86225d8fe77353f61ebd6b0bdda8d13aa9
  37. pinheadmz approved
  38. pinheadmz commented at 6:25 pm on May 1, 2023: member

    ACK 8f14fc86225d8fe77353f61ebd6b0bdda8d13aa9

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA256
     2
     3ACK 8f14fc86225d8fe77353f61ebd6b0bdda8d13aa9
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmRQBA4ACgkQ5+KYS2KJ
     7yTr14BAAw1BUdgVGe3UKG5epiL1XsijS0stgQRrXVwPJKM/7O4nAy5LRQBx/ASB+
     8+xH0CGkd8dULO9yPrXZHXn2tjlv36Wde+ECHy76BRy6qPYgQYf/KfX0RKl7IUw3D
     9RNTXxYPOopXDO9h9U2jmQpwZDU7Ri3DOD5BujVq+oAu0a+782QvoTMo/Yv9mEiKr
    10JOyNhSzCuy1vHK/bgO+CafNQLW0JwyQVntVDtVj3vC36OG0ChauUfJV7vALJ5KYI
    11/s5F9JA9BUPSNH1a6ruqCyc2pJR1URj1MVu44DQko4ICKQpphcq54r8dtlX4FvK2
    12/1F+czB4BKTmvhJyujynMte1FrgcHeZ/8df4tyvBa3+6j4y1jUjLHKEJaMXTgxC3
    13NdI1OMjskZdbTYnr+kewTemI261SXUOx87bL95BG2Mqk7drKpraQG7PG9JkB94oQ
    14g7W1B0xapGlP+AItM4esikInBKXE17ITxy5pQwznyonHCnAOiT9rbw12RSqXH/le
    15yiDhXsj6BzY6fiR7nOwjbzJuqlg6RGCgmbjz5zk5MYeiW0fuCbh30rbM07MY6+6u
    16CV9BQ0+Hn4RSDYQ4Na8v2dY0s8BbA4gd7swzcjPCveBppb1dQ2qt5CQ7Zk/Y5l21
    17jeG1vtRxvVt3U9QENfdghAj4yabhTSkO6pWc1nBK7DFOvOMFVmw=
    18=k0Pv
    19-----END PGP SIGNATURE-----
    

    pinheadmz’s public key is on keybase

  39. fanquake merged this on May 2, 2023
  40. fanquake closed this on May 2, 2023

  41. mzumsande deleted the branch on May 2, 2023
  42. in test/functional/feature_fastprune.py:36 in 8f14fc8622
    31+        tip = int(self.nodes[0].getbestblockhash(), 16)
    32+        time = self.nodes[0].getblock(self.nodes[0].getbestblockhash())['time'] + 1
    33+        height = self.nodes[0].getblockcount() + 1
    34+        block = create_block(hashprev=tip, ntime=time, txlist=[tx], coinbase=create_coinbase(height=height))
    35+        add_witness_commitment(block)
    36+        block.solve()
    


    maflcko commented at 12:19 pm on May 2, 2023:
    Any reason to make this simple test more complicated than it needs to be? See https://github.com/bitcoin/bitcoin/pull/27553
  43. sidhujag referenced this in commit 0f0ea07a6f on May 4, 2023
  44. Fabcien referenced this in commit b5e5bf5848 on Apr 19, 2024
  45. bitcoin locked this on May 1, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-31 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me