RPC/Blockchain: Optimise getblock for simple disk->hex case #21319

pull luke-jr wants to merge 1 commits into bitcoin:master from luke-jr:getblock_optimise changing 1 files +21 −4
  1. luke-jr commented at 8:15 pm on February 28, 2021: member
    Implemented #17529 (comment)
  2. DrahtBot added the label RPC/REST/ZMQ on Feb 28, 2021
  3. luke-jr force-pushed on Feb 28, 2021
  4. in src/rpc/blockchain.cpp:972 in 33cead5415 outdated
    967+            if (!ReadRawBlockFromDisk(raw_block, pblockindex, Params().MessageStart())) {
    968+                throw JSONRPCError(RPC_MISC_ERROR, "Block not found on disk");
    969+            }
    970+        }
    971+
    972         block = GetBlockChecked(pblockindex);
    


    0xB10C commented at 6:36 pm on March 3, 2021:

    Why read the block from disk twice when verbosity <= 0 ?

    1. in the if (verbosity <= 0 && !RPCSerializationFlags()) {} to raw_block
    2. with block = GetBlockChecked(pblockindex);

    luke-jr commented at 11:36 pm on March 3, 2021:
    Good point. Needs an else here.

    luke-jr commented at 11:37 pm on March 3, 2021:
    Fixed
  5. 0xB10C changes_requested
  6. luke-jr force-pushed on Mar 3, 2021
  7. in src/rpc/blockchain.cpp:1080 in 7b23e32c35 outdated
    958@@ -958,14 +959,30 @@ static RPCHelpMan getblock()
    959             throw JSONRPCError(RPC_INVALID_ADDRESS_OR_KEY, "Block not found");
    960         }
    961 
    962-        block = GetBlockChecked(pblockindex);
    963+        if (IsBlockPruned(pblockindex)) {
    


    0xB10C commented at 8:44 am on March 4, 2021:

    nit: could be moved into the if (verbosity <= 0 && !RPCSerializationFlags()) as we check if the block is pruned in GetBlockChecked() again, but happy with leaving where it is.

    edit: I mean the whole if (IsBlockPruned(pblockindex)) {}. It’s not clear on GH

  8. 0xB10C commented at 10:04 am on March 4, 2021: contributor

    Some notes and context for others looking at this PR:

    The getblock <hash> has different levels of verbosity. A verbosity of 0 returns the block in hex, verbosity 1 in JSON, and further verbosity levels add more transaction details to the JSON.

    When calling the getblock <hash> RPC with a verbosity of 0 we currently (before this PR) de-serialize the raw block stored on disk into a CBlock (happens in GetBlockChecked()) and then serialize it back into a hex string. This isn’t ideal performance wise (block and tx are de-serialized and (I think) all transactions are hashed in CTransaction initialization).

    This PR adds special handling to the verbosity 0 case to skip the CBlock de- and serialization. This works only if the block on disk is serialized the same way as the RPC serialization is configured. If you were to manually set -rpcserialversion=0 (currently version 1 (SegWit) by default), then special case wouldn’t work.

    Something very similar has been implemented in #13151 for P2P block serving. @MarcoFalke opened PR #17529 attempting to improve the performance (10%-25% on Intel CPUs) by leaving out the CTransaction hashing where not needed.

    Comparison:

    0master:   disk -> deserialize -> hash -> serialize -> hex string
    1[#17529](/bitcoin-bitcoin/17529/):   disk -> deserialize -> ____ -> serialize -> hex string
    2this PR:  disk -> ___________ -> ____ -> _________ -> hex string
    

    As @promag mentioned in #13151#pullrequestreview-116907566 in regards to P2P block serving, skipping the de- and serialization here could potentially feed the RPC client corrupted blocks if the data on the disk is corrupted. Can’t really judge how problematic that is for RPC.

  9. DrahtBot commented at 9:39 pm on July 28, 2021: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK andrewtoth

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26415 (rpc,rest: faster getblock and rest_block by reading raw block by andrewtoth)
    • #25232 (rpc: Faster getblock API by AaronDewes)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  10. luke-jr commented at 10:06 pm on July 28, 2021: member
    Closing due to lack of interest. Can reopen if people want to review.
  11. luke-jr closed this on Jul 28, 2021

  12. bitcoin locked this on Aug 18, 2022
  13. bitcoin unlocked this on Oct 30, 2022
  14. luke-jr reopened this on Nov 5, 2022

  15. luke-jr force-pushed on Nov 5, 2022
  16. DrahtBot added the label Needs rebase on Nov 5, 2022
  17. andrewtoth commented at 4:46 pm on November 5, 2022: contributor

    Concept ACK. This RPC is used by a lot of projects and optimizing it would be a big benefit.

    I think the main problem though is serving potentially corrupted blocks. It was suggested in #13151 (comment) that CRC32Cs can be added to the on disk blocks, which I think would be sufficient. How to do that though I’m not sure of. Ideally they would be in the meta header of the on disk block. Adding them to every block on disk would require rewriting the entire blockchain on already synced nodes, which would take a very long time and not allow downgrading. We would also need to update code that assumes a fixed 8 byte header before the block. Maybe we could add a block file version and only do this for blocks added to the new version type.

    Alternatively it could be added to the block index. That would take less time to upgrade but would require a migration that would not allow downgrading.

    I tried adding CRCs as a new index type in #26415, but that requires storing all the 4 byte CRCs along with the 32 byte block hash so it’s a lot more data on disk. It also adds the CRCs after the block is stored on the scheduler thread, so potentially a system that got notified of the new block and called getblock wouldn’t have the CRC yet.

  18. RPC/Blockchain: Optimise getblock for simple disk->hex case 68226673ed
  19. luke-jr force-pushed on Nov 5, 2022
  20. luke-jr commented at 5:07 pm on November 5, 2022: member

    I don’t think we need CRCs to make this change. The code already assumes no corruption in lots of other places.

    Rebased

  21. andrewtoth commented at 5:25 pm on November 5, 2022: contributor
    The only other place using ReadRawBlockFromDisk is in net processing, where we assume the peer will validate correctly and disconnect. The same assumption can’t be made for consumers of the RPC. All other block reads deserialize the block which will fail if the data is corrupt.
  22. luke-jr commented at 5:48 pm on November 5, 2022: member
    What happens if you corrupt the chainstate?
  23. DrahtBot removed the label Needs rebase on Nov 5, 2022
  24. andrewtoth commented at 2:37 pm on November 6, 2022: contributor
    Chainstate is separate from blocksdir. They can live on a different storage device. If an old block file gets corrupted it doesn’t affect the chainstate.
  25. luke-jr commented at 5:55 pm on November 6, 2022: member
    Nevertheless, I don’t think we guarantee anything in the case of disk corruption, nor should or can we.
  26. andrewtoth commented at 6:39 pm on November 6, 2022: contributor
    Well, I think currently we do guarantee we won’t serve a corrupt block when we use ReadBlockFromDisk.
  27. DrahtBot added the label Needs rebase on Dec 8, 2022
  28. DrahtBot commented at 9:58 am on December 8, 2022: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  29. maflcko commented at 11:38 am on December 8, 2022: member

    Well, I think currently we do guarantee we won’t serve a corrupt block when we use ReadBlockFromDisk.

    Do we?

    The block header is checked, but not the transaction payload itself

  30. andrewtoth commented at 4:43 pm on December 8, 2022: contributor

    The block header is checked, but not the transaction payload itself

    Indeed, I tested by modifying https://github.com/bitcoin/bitcoin/pull/26415/files#diff-186034bc1b6a0e662a59c81182b3d4a0318d44dc28810d2b837c92033e0d5630R142 in my PR to use ReadBlockFromDisk and it passes, but if I switch it to a byte inside the header (like 50), it fails. Looks like I was mistaken. The risk is just limited to the header then. Would it be worth it to modify ReadRawBlockFromDIsk to check the header hash at least then, or does that remove enough of the speed benefit here?

  31. luke-jr commented at 6:08 pm on December 11, 2022: member
    I don’t think there’s a benefit to checking the header, or trying to plan for corrupt data in general (aside from startup verification of high-risk/recent data).
  32. in src/rpc/blockchain.cpp:732 in 68226673ed
    728+            throw JSONRPCError(RPC_MISC_ERROR, "Block not available (pruned data)");
    729+        }
    730+
    731+        if (verbosity <= 0 && !RPCSerializationFlags()) {
    732+            // This one case doesn't need to parse the block at all
    733+            if (!node::ReadRawBlockFromDisk(raw_block, pblockindex->GetBlockPos(), Params().MessageStart())) {
    


    maflcko commented at 8:21 am on December 12, 2022:
    nit: Maybe move this into a GetRawBlockChecked to avoid the duplicate pruned check when GetBlockChecked below is called?
  33. maflcko commented at 8:21 am on December 12, 2022: member
    lgtm
  34. fanquake commented at 3:57 pm on February 6, 2023: member
    Still needs a rebase. Maybe someone wants to take this over?
  35. maflcko added the label Up for grabs on Feb 6, 2023
  36. andrewtoth commented at 4:09 pm on February 6, 2023: contributor
    I can add this functionality to #26415.
  37. fanquake commented at 4:28 pm on February 6, 2023: member

    I can add this functionality to #26415. @andrewtoth ok. Will close this for now.

  38. fanquake closed this on Feb 6, 2023

  39. fanquake removed the label Up for grabs on Feb 6, 2023
  40. achow101 referenced this in commit 00bf4a1711 on Jan 2, 2024
  41. bitcoin locked this on Feb 6, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me