Introduce ‘getblockfileinfo’ RPC command #27770

pull furszy wants to merge 2 commits into bitcoin:master from furszy:2023_rpc_getblockfileinfo changing 5 files +73 −9
  1. furszy commented at 9:14 pm on May 26, 2023: member

    Implemented getblockfileinfo RPC command to obtain block files related data.

    The new RPC could be useful for:

    1. Make manual pruning process more transparent: Currently, users blindly provide a height to pruneblockchain and only realizes up to which block the node was pruned from the RPC command result. This new introduced command lets the user know where to set the pruneblockchain height by knowing the highest and lowest block heights stored on each file. So users will actually be able to remove specific block ranges (with some exceptions, as blocks can be stored out of order in different block files.. but that is a different topic).

    2. Make tests more robust and readable by removing magic numbers like the ones presented in rpc_getblockfrompeer.py (which are actually off by a good degree and not really document anywhere, see comment).

    3. And also for testing AssumeUTXO. Which, per mzumzande comment: it usage could result in block files with a very mixed-up block order. So having a way to analyse this better seems like it could be helpful. Also see a live scenario in theStack comment.

  2. DrahtBot commented at 9:14 pm on May 26, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29418 (rpc: provide per message stats for global traffic via new RPC ‘getnetmsgstats’ by vasild)
    • #19463 (Prune locks by luke-jr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label CI failed on May 26, 2023
  4. furszy force-pushed on May 26, 2023
  5. maflcko commented at 1:57 pm on May 27, 2023: member
    Not sure. My preference would be to figure out why the blocks are of different size and then make that deterministic instead.
  6. furszy commented at 8:53 pm on May 27, 2023: member

    Not sure. My preference would be to figure out why the blocks are of different size and then make that deterministic instead.

    I started there but.. do we really care? (aside from dev curiosity, where I agree that wouldn’t be bad to know why).

    The test goal is to exercise getblockfrompeer and not pruning nor where blocks are stored. Having all those hardcoded heights, expecting to have a precise number of blocks in each block file, looks quite fragile.

    I mean, all the hardcoded pruneheight numbers and pruneblockchain(num) calls looks random at first. It takes a bit to realize that the numbers were picked to prune one block file at time. The pruneblockchain() numbers are also extended a bit just in case block files are bigger. Check how they are called: pruneblockchain(300), pruneblockchain(700) when each block file has ~250 blocks; this is my local getblockfileinfo() output on each of the three block files presented in the test prior the first pruneblockchain call:

    {‘blocks_num’: 249, ’lowest_block’: 0, ‘highest_block’: 248, ‘data_size’: 65319, ‘undo_size’: 10168} {‘blocks_num’: 251, ’lowest_block’: 249, ‘highest_block’: 499, ‘data_size’: 65511, ‘undo_size’: 10291} {‘blocks_num’: 106, ’lowest_block’: 500, ‘highest_block’: 605, ‘data_size’: 27666, ‘undo_size’: 4346}

    At least with the getblockfileinfo() call we are explicitly pruning the first file, then the second one, etc. Without having to remember how many blocks are stored in each file, nor requiring to extend the prune height just in case.

    I also think that getblockfileinfo could be useful in other test contexts as well. Plus, I’m not sure how people call pruneblockchain right now, sounds like they blindly call it with a height and only realize up to which block the node pruned the chain when the command returns.

    (if we choose to go into this direction, can remove the remaining hardcoded prune heights in the test).

  7. fanquake commented at 9:49 am on May 29, 2023: member

    I started there but.. do we really care?

    We definitely care enough to investigate further, before adding a new RPC command to fix an intermittent issue in a functional test.

    Looks like @theStack has done that investigation here: #27749 (comment), and @mzumsande has suggested a potential alternative fix: #27749 (comment):

    add a sync_blocks call between node0 and node2 before letting node0 generate the 400 blocks. This should guarantee that all of these 400 blocks will be processed by compact block reconstruction.

  8. furszy commented at 1:48 pm on May 29, 2023: member

    I started there but.. do we really care?

    We definitely care enough to investigate further, before adding a new RPC command to fix an intermittent issue in a functional test.

    Looks like @theStack has done that investigation here: #27749 (comment), and @mzumsande has suggested a potential alternative fix: #27749 (comment):

    add a sync_blocks call between node0 and node2 before letting node0 generate the 400 blocks. This should guarantee that all of these 400 blocks will be processed by compact block reconstruction.

    Yeah, I know. theStack investigation is awesome!. We kept ping-ponging about it after this PR was pushed :). And also martin’s fix sounds good.

    If you don’t like the first sentence in my previous comment remove it. It was just the intro, no hard feelings. The background rationale there still applies even with the investigation and the possible fix.

    The test is still fragile, the hardcoded pruneblockchain heights are off by a good degree to cover possible pruning diffs, an also not well documented. I don’t think that anyone knew that the pruneblockchain(300) or pruneblockchain(700) test calls are there to remove the first and second block file (when first block file highest block number is 250, and second one is 500..).

    Furthermore, what theStack analysis really says is that pruneblockchain result is not accurate. Which is the real issue for me. The RPC command result description says that it returns the “height of the last block pruned”, which could not be true as blocks could have been stored out of order. So retrieving block height 249 as “last block pruned” doesn’t mean that block 248 has been pruned from disk.

    So, I actually think that adding the getblockfileinfo RPC command makes even more sense now. Not to solve the current intermittent issue, but to make the manual pruning process clearer for users and tests (check the penultimate paragraph in my previous comment). And also as a debugging tool.

  9. mzumsande commented at 2:53 pm on May 29, 2023: contributor

    Furthermore, what theStack analysis really says is that pruneblockchain result is not accurate. Which is the real issue for me. The RPC command result description says that it returns the “height of the last block pruned”, which could not be true as blocks could have been stored out of order. So retrieving block height 249 as “last block pruned” doesn’t mean that block 248 has been pruned from disk.

    But is it really incorrect that the “last block pruned” is 249? Yes, this doesn’t imply that all blocks before have been pruned as well - but the RPC description isn’t claiming that.

    So, I actually think that adding the getblockfileinfo RPC command makes even more sense now.

    Having this RPC could also make sense because of assumeutxo, which could result in block files with a very mixed-up block order. Having a way to analyse this better seems like it could be helpful (ping @jamesob). But I think that introducing any new RPC should primarily have some benefit to users, so that should be the main selling point - the use in tests should be more of a side benefit.

  10. furszy commented at 3:55 pm on May 29, 2023: member

    But is it really incorrect that the “last block pruned” is 249? Yes, this doesn’t imply that all blocks before have been pruned as well - but the RPC description isn’t claiming that.

    That would be the devil’s advocate argument. It works for us, but not for our users. Let’s agree that while it’s not clearly stated, anyone reading the RPC result description would understand that all blocks prior the returned height were pruned.

    But I think that introducing any new RPC should primarily have some benefit to users, so that should be the main selling point - the use in tests should be more of a side benefit.

    Yeah, I agree. But also, I think there is some synergy between the functional tests and other projects building upon core’s API as well. Sometimes commands that are useful for tests, not for end users, and also useful for other projects.

    I think that this one makes the manual pruning process a bit clearer. So there is no blind pruneblockchain(<height>) call. Right now callers only realize up to which block the node pruned the chain when the command returns (and that is not even accurate enough). Great to hear that this has benefits for assumeUTXO as well.

  11. maflcko commented at 6:57 am on May 30, 2023: member
    What about a one-line fix to clear the test issue and then a follow-up with the RPC, which may take longer to review than a trivial one-line patch?
  12. furszy commented at 12:35 pm on May 30, 2023: member

    What about a one-line fix to clear the test issue and then a follow-up with the RPC, which may take longer to review than a trivial one-line patch?

    np for me. @mzumsande do you want to go ahead with your fix? I can just relabel this PR to “Implement getblockfileinfo RPC command” and done.

  13. mzumsande commented at 4:59 pm on May 30, 2023: contributor

    np for me. @mzumsande do you want to go ahead with your fix? I can just relabel this PR to “Implement getblockfileinfo RPC command” and done.

    yes, I’ll open a PR with the one-line fix soon! [Edit]: #27784

  14. theStack commented at 8:28 pm on May 30, 2023: contributor

    I agree with @furszy and @mzumsande that a getblockfileinfo RPC could be quite handy, both for making some of the pruning-related tests more robust and readable (potentially eliminating some magic numbers that can only be explained by the one who introduced them) and also for testing AssumeUTXO. As an illustration, the blocks directory of a mainnet test-run of AssumeUTXO ended up at some point like this on my disk:

     0$ ls -l ./datadir2/blocks/blk*.dat 
     1-rw-------  1 thestack  thestack  133255069 May 21 12:38 ./datadir2/blocks/blk00123.dat
     2-rw-------  1 thestack  thestack  133591927 May 21 20:35 ./datadir2/blocks/blk00226.dat
     3-rw-------  1 thestack  thestack  134063604 May 23 00:47 ./datadir2/blocks/blk00278.dat
     4-rw-------  1 thestack  thestack  133150555 May 23 00:49 ./datadir2/blocks/blk00285.dat
     5-rw-------  1 thestack  thestack  100663296 May 23 02:51 ./datadir2/blocks/blk00286.dat
     6-rw-------  1 thestack  thestack  133995151 May 23 02:51 ./datadir2/blocks/blk00400.dat
     7-rw-------  1 thestack  thestack  133651193 May 23 02:51 ./datadir2/blocks/blk00401.dat
     8-rw-------  1 thestack  thestack  133962542 May 23 02:52 ./datadir2/blocks/blk00402.dat
     9-rw-------  1 thestack  thestack  133673223 May 23 02:52 ./datadir2/blocks/blk00403.dat
    10-rw-------  1 thestack  thestack   50331648 May 23 02:53 ./datadir2/blocks/blk00404.dat
    

    For finding out where the block-file gaps come from, which rough block height ranges were stored in which of the different block files, having getblockfileinfo at hand would have been very nice.

    Though, I’m less convinced that typical users would have a strong need for that RPC, as the block-file structure is something internal that shouldn’t be really relevant for typical use – I doubt users running pruneblockchain care in detail which block files are removed, I’d figure it’s more like signalling “I don’t need blocks up to height N anymore, if you can, please get rid of them”. But at least as I’m aware, introducing test-only RPCs is fine and not an common practice at all (see git grep \"hidden\").

  15. fanquake referenced this in commit 433f17bc3f on May 31, 2023
  16. DrahtBot removed the label CI failed on May 31, 2023
  17. furszy renamed this:
    Fix rpc_getblockfrompeer intermittency by introducing 'getblockfileinfo' RPC command
    Introduce 'getblockfileinfo' RPC command
    on May 31, 2023
  18. sidhujag referenced this in commit 7704bfe509 on May 31, 2023
  19. furszy force-pushed on Jun 1, 2023
  20. furszy commented at 10:05 pm on June 1, 2023: member
    title and description updated. And also went a bit further with the rpc_getblockfrompeer.py changes, added some explanations to it.
  21. glozow added the label RPC/REST/ZMQ on Jun 2, 2023
  22. theStack commented at 9:38 pm on June 6, 2023: contributor

    FWIW, I was curious how hard it would be to implement getblockfileinfo in the functional test framework by parsing the block files: https://github.com/theStack/bitcoin/blob/97ea647ee9b186ff4d3be2ba5ac91b94e5371323/test/functional/test_framework/util.py#L559-L584 (branch https://github.com/theStack/bitcoin/tree/getblockfileinfo_python)

    The result seems to work, but is still kind of ugly and hacky. To get a parsed block’s height (without flooding the node with hundreds of getblockheader RPC requests), we need to analyze its coinbase’s scriptSig (see BIP34), with the need for an exception on the genesis block (or for any other blocks that don’t follow BIP34, if we’d ever generate ones in our tests). It’s probably generally a bad idea to access block files while the node is running, at least for the latest one which is currently still written to by bitcoind. So if we want that functionality, then implementing it as a RPC seems to be reasonable. It might also make sense to get more general information, e.g. about what block files are available (as discussed on IRC via @furszy).

  23. luke-jr referenced this in commit 4a1e08b4c2 on Aug 16, 2023
  24. luke-jr referenced this in commit 9497ecff9e on Aug 16, 2023
  25. DrahtBot added the label Needs rebase on Oct 2, 2023
  26. RPC: introduce 'getblockfileinfo' RPC command
    Simple command to retrieve information about a certain block file
    24bfe4e1a0
  27. test: rpc_getblockfrompeer.py, remove magic numbers usage
    Instead of hardcoding the `pruneblockchain(<height>)` heights,
    use 'getblockfileinfo' to obtain the highest block number of
    each of the block files.
    
    Making the test more robust and readable by stating which file
    is being pruned at every point of time (the goal is to mimic
    how the automatic pruning process work).
    5090771f32
  28. furszy force-pushed on Oct 18, 2023
  29. DrahtBot removed the label Needs rebase on Oct 18, 2023
  30. DrahtBot added the label CI failed on Jan 13, 2024
  31. achow101 requested review from mzumsande on Apr 9, 2024
  32. achow101 requested review from sr-gi on Apr 9, 2024
  33. achow101 commented at 2:44 pm on April 9, 2024: member
    Are you still working on this?
  34. laanwj commented at 2:51 pm on April 9, 2024: member

    I think this information is too low-level. In the past we’ve rejected changes that expose low-level information about our block storage because block files are not an external API, and exposing this kind of info limits our flexibility in regard to changing the block storage model.

    I agree that it can be useful to have more transparency into pruning, however, but I’m not sure adding a RPC giving information per file is the best approach.

  35. furszy commented at 3:40 pm on April 14, 2024: member

    I think this information is too low-level. In the past we’ve rejected changes that expose low-level information about our block storage because block files are not an external API, and exposing this kind of info limits our flexibility in regard to changing the block storage model.

    Conceptually, I agree. But I don’t think that the block storage model flexibility argument is the best for not doing this. We do have procedures tightly coupled to the current block storage model (e.g. the best chain itself or pruning). Removing this new debug-only RPC command would be the least of our problems if we ever seek to change the storage model.

    That being said, no problem on closing this for now. I agree that making pruning more transparent can be orthogonal to the introduction of this command.

  36. furszy closed this on Apr 14, 2024

  37. laanwj commented at 3:54 pm on April 14, 2024: member

    Removing this new debug-only RPC command would be the least of our problems if we ever seek to change the storage model.

    Sure, that’s true. In that sense it’s different than say, adding block file/pos to getblock info, as it doesn’t affect the output of a commonly used RPC.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me