RPC/Blockchain: Optimise getblock for simple disk->hex case #21319
pull luke-jr wants to merge 1 commits into bitcoin:master from luke-jr:getblock_optimise changing 1 files +21 −4-
luke-jr commented at 8:15 pm on February 28, 2021: memberImplemented #17529 (comment)
-
DrahtBot added the label RPC/REST/ZMQ on Feb 28, 2021
-
luke-jr force-pushed on Feb 28, 2021
-
in src/rpc/blockchain.cpp:972 in 33cead5415 outdated
967+ if (!ReadRawBlockFromDisk(raw_block, pblockindex, Params().MessageStart())) { 968+ throw JSONRPCError(RPC_MISC_ERROR, "Block not found on disk"); 969+ } 970+ } 971+ 972 block = GetBlockChecked(pblockindex);
0xB10C commented at 6:36 pm on March 3, 2021:Why read the block from disk twice when
verbosity <= 0
?- in the
if (verbosity <= 0 && !RPCSerializationFlags()) {}
toraw_block
- with
block = GetBlockChecked(pblockindex);
luke-jr commented at 11:36 pm on March 3, 2021:Good point. Needs an else here.
luke-jr commented at 11:37 pm on March 3, 2021:Fixed0xB10C changes_requestedluke-jr force-pushed on Mar 3, 2021in src/rpc/blockchain.cpp:1080 in 7b23e32c35 outdated
958@@ -958,14 +959,30 @@ static RPCHelpMan getblock() 959 throw JSONRPCError(RPC_INVALID_ADDRESS_OR_KEY, "Block not found"); 960 } 961 962- block = GetBlockChecked(pblockindex); 963+ if (IsBlockPruned(pblockindex)) {
0xB10C commented at 8:44 am on March 4, 2021:nit: could be moved into the
if (verbosity <= 0 && !RPCSerializationFlags())
as we check if the block is pruned inGetBlockChecked()
again, but happy with leaving where it is.edit: I mean the whole
if (IsBlockPruned(pblockindex)) {}
. It’s not clear on GH0xB10C commented at 10:04 am on March 4, 2021: contributorSome notes and context for others looking at this PR:
The
getblock <hash>
has different levels of verbosity. A verbosity of0
returns the block in hex, verbosity1
in JSON, and further verbosity levels add more transaction details to the JSON.When calling the
getblock <hash>
RPC with a verbosity of0
we currently (before this PR) de-serialize the raw block stored on disk into aCBlock
(happens inGetBlockChecked()
) and then serialize it back into a hex string. This isn’t ideal performance wise (block and tx are de-serialized and (I think) all transactions are hashed inCTransaction
initialization).This PR adds special handling to the verbosity 0 case to skip the
CBlock
de- and serialization. This works only if the block on disk is serialized the same way as the RPC serialization is configured. If you were to manually set-rpcserialversion=0
(currently version1 (SegWit)
by default), then special case wouldn’t work.Something very similar has been implemented in #13151 for P2P block serving. @MarcoFalke opened PR #17529 attempting to improve the performance (10%-25% on Intel CPUs) by leaving out the CTransaction hashing where not needed.
Comparison:
0master: disk -> deserialize -> hash -> serialize -> hex string 1[#17529](/bitcoin-bitcoin/17529/): disk -> deserialize -> ____ -> serialize -> hex string 2this PR: disk -> ___________ -> ____ -> _________ -> hex string
As @promag mentioned in #13151#pullrequestreview-116907566 in regards to P2P block serving, skipping the de- and serialization here could potentially feed the RPC client corrupted blocks if the data on the disk is corrupted. Can’t really judge how problematic that is for RPC.
DrahtBot commented at 9:39 pm on July 28, 2021: contributorThe following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviews
See the guideline for information on the review process.
Type Reviewers Concept ACK andrewtoth If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #26415 (rpc,rest: faster getblock and rest_block by reading raw block by andrewtoth)
- #25232 (rpc: Faster getblock API by AaronDewes)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
luke-jr commented at 10:06 pm on July 28, 2021: memberClosing due to lack of interest. Can reopen if people want to review.luke-jr closed this on Jul 28, 2021
bitcoin locked this on Aug 18, 2022bitcoin unlocked this on Oct 30, 2022luke-jr reopened this on Nov 5, 2022
luke-jr force-pushed on Nov 5, 2022DrahtBot added the label Needs rebase on Nov 5, 2022andrewtoth commented at 4:46 pm on November 5, 2022: contributorConcept ACK. This RPC is used by a lot of projects and optimizing it would be a big benefit.
I think the main problem though is serving potentially corrupted blocks. It was suggested in #13151 (comment) that CRC32Cs can be added to the on disk blocks, which I think would be sufficient. How to do that though I’m not sure of. Ideally they would be in the meta header of the on disk block. Adding them to every block on disk would require rewriting the entire blockchain on already synced nodes, which would take a very long time and not allow downgrading. We would also need to update code that assumes a fixed 8 byte header before the block. Maybe we could add a block file version and only do this for blocks added to the new version type.
Alternatively it could be added to the block index. That would take less time to upgrade but would require a migration that would not allow downgrading.
I tried adding CRCs as a new index type in #26415, but that requires storing all the 4 byte CRCs along with the 32 byte block hash so it’s a lot more data on disk. It also adds the CRCs after the block is stored on the scheduler thread, so potentially a system that got notified of the new block and called
getblock
wouldn’t have the CRC yet.RPC/Blockchain: Optimise getblock for simple disk->hex case 68226673edluke-jr force-pushed on Nov 5, 2022luke-jr commented at 5:07 pm on November 5, 2022: memberI don’t think we need CRCs to make this change. The code already assumes no corruption in lots of other places.
Rebased
andrewtoth commented at 5:25 pm on November 5, 2022: contributorThe only other place usingReadRawBlockFromDisk
is in net processing, where we assume the peer will validate correctly and disconnect. The same assumption can’t be made for consumers of the RPC. All other block reads deserialize the block which will fail if the data is corrupt.luke-jr commented at 5:48 pm on November 5, 2022: memberWhat happens if you corrupt the chainstate?DrahtBot removed the label Needs rebase on Nov 5, 2022andrewtoth commented at 2:37 pm on November 6, 2022: contributorChainstate is separate from blocksdir. They can live on a different storage device. If an old block file gets corrupted it doesn’t affect the chainstate.luke-jr commented at 5:55 pm on November 6, 2022: memberNevertheless, I don’t think we guarantee anything in the case of disk corruption, nor should or can we.andrewtoth commented at 6:39 pm on November 6, 2022: contributorWell, I think currently we do guarantee we won’t serve a corrupt block when we useReadBlockFromDisk
.DrahtBot added the label Needs rebase on Dec 8, 2022DrahtBot commented at 9:58 am on December 8, 2022: contributor🐙 This pull request conflicts with the target branch and needs rebase.
maflcko commented at 11:38 am on December 8, 2022: memberWell, I think currently we do guarantee we won’t serve a corrupt block when we use ReadBlockFromDisk.
Do we?
The block header is checked, but not the transaction payload itself
andrewtoth commented at 4:43 pm on December 8, 2022: contributorThe block header is checked, but not the transaction payload itself
Indeed, I tested by modifying https://github.com/bitcoin/bitcoin/pull/26415/files#diff-186034bc1b6a0e662a59c81182b3d4a0318d44dc28810d2b837c92033e0d5630R142 in my PR to use
ReadBlockFromDisk
and it passes, but if I switch it to a byte inside the header (like 50), it fails. Looks like I was mistaken. The risk is just limited to the header then. Would it be worth it to modifyReadRawBlockFromDIsk
to check the header hash at least then, or does that remove enough of the speed benefit here?luke-jr commented at 6:08 pm on December 11, 2022: memberI don’t think there’s a benefit to checking the header, or trying to plan for corrupt data in general (aside from startup verification of high-risk/recent data).in src/rpc/blockchain.cpp:732 in 68226673ed
728+ throw JSONRPCError(RPC_MISC_ERROR, "Block not available (pruned data)"); 729+ } 730+ 731+ if (verbosity <= 0 && !RPCSerializationFlags()) { 732+ // This one case doesn't need to parse the block at all 733+ if (!node::ReadRawBlockFromDisk(raw_block, pblockindex->GetBlockPos(), Params().MessageStart())) {
maflcko commented at 8:21 am on December 12, 2022:nit: Maybe move this into a GetRawBlockChecked to avoid the duplicate pruned check whenGetBlockChecked
below is called?maflcko commented at 8:21 am on December 12, 2022: memberlgtmfanquake commented at 3:57 pm on February 6, 2023: memberStill needs a rebase. Maybe someone wants to take this over?maflcko added the label Up for grabs on Feb 6, 2023andrewtoth commented at 4:09 pm on February 6, 2023: contributorI can add this functionality to #26415.fanquake commented at 4:28 pm on February 6, 2023: memberI can add this functionality to #26415. @andrewtoth ok. Will close this for now.
fanquake closed this on Feb 6, 2023
fanquake removed the label Up for grabs on Feb 6, 2023achow101 referenced this in commit 00bf4a1711 on Jan 2, 2024bitcoin locked this on Feb 6, 2024
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-22 00:12 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me - in the