If a block was pruned, getblock already returns a specific error: “Block not available (pruned data)”.
But if we haven’t received the full block yet (e.g. in a race with block downloading after a new block was received headers-first, or during IBD) we just return an unspecific “Block not found on disk” error and log
ERROR: ReadBlockFromDisk: OpenBlockFile failed for FlatFilePos(nFile=-1, nPos=0)
which suggest something went wrong even though this is a completely normal and expected situation.
This PR improves the error message and stops calling ReadRawBlockFromDisk(), when we already know from the header that the block is not available on disk.
Similarly, it prevents all other rpcs from calling blockstorage read functions unless we expect the data to be there, so that LogError() will only be thrown when there is an actual file system problem.
I’m not completely sure if the cause is important enough to change the wording of the rpc error, that some scripts may rely on.
If reviewers prefer it, an alternative solution would be to keep returning the current “Block not found on disk” error, but return it immediately instead of calling ReadRawBlockFromDisk, which would at least prevent the log error and also be an improvement in my opinion.
DrahtBot
commented at 4:00 am on July 9, 2024:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Here and in rest.cpp: Structuring it like this should save us the IsBlockPruned() call in the normal case (untested). IsBlockPruned() checks the same condition plus something extra so this should not change behavior.
0if (!(blockindex.nStatus & BLOCK_HAVE_DATA)) {
1if (blockman.IsBlockPruned(blockindex) {
2 throw JSONRPCError(RPC_MISC_ERROR, "Block not available (pruned data)");
3 }
4 throw JSONRPCError(RPC_MISC_ERROR, "Block not available (not fully downloaded)");
5 }
done. Performance doesn’t seem that important, but I also think it’s a bit clearer that way.
fjahr
commented at 11:31 am on July 9, 2024:
contributor
Concept ACK
maflcko
commented at 12:24 pm on July 9, 2024:
member
Thanks for picking this up.
Seems fine to fix it this way.
However, you forgot GetBlockChecked (and all the other functions that can fail in the same way)?
My preference to fix it would be to move the checks to the inside of the block manager. This way,
module-external cs_main locking (and races) are avoided, because a prune call happening after the IsBlockPruned check (and leaving the scope of cs_main) could be detected, if wanted, by doing the check again in the block manager
error case enumeration and handling are done in a single place, as opposed to every call site
It would be easier for call sites to differentiate between read failures caused by logic bugs (what this pull is trying to solve) and read failures caused by IO corruption.
However, this basically requires a full rewrite of the blockmanager read-interface, so I haven’t completed it yet. (Happy to pick it up again, if you think it makes sense as well)
mzumsande force-pushed
on Jul 9, 2024
mzumsande
commented at 10:17 pm on July 9, 2024:
contributor
However, you forgot GetBlockChecked (and all the other functions that can fail in the same way)?
Updated to change all functions I could find. I was only looking at getblock before, not at other RPCs.
My preference to fix it would be to move the checks to the inside of the block manager.
However, this basically requires a full rewrite of the blockmanager read-interface, so I haven’t completed it yet. (Happy to pick it up again, if you think it makes sense as well)
Makes sense to me, so concept a c k to that! Depending on how large this PR is going to be and how long it’s going to take to complete it, I could either close this simple PR, which changes return errors and is not a refactor, or it could be a pure refactor on top of it? Let me know what you prefer.
in
src/rpc/blockchain.cpp:590
in
1d97b4cfdeoutdated
587- throw JSONRPCError(RPC_MISC_ERROR, "Block not available (pruned data)");
588+ if (!(blockindex.nStatus & BLOCK_HAVE_DATA)) {
589+ if (blockman.IsBlockPruned(blockindex)) {
590+ throw JSONRPCError(RPC_MISC_ERROR, "Block not available (pruned data)");
591+ }
592+ throw JSONRPCError(RPC_MISC_ERROR, "Block not available (not fully downloaded)");
nit: The comment below could also be adjusted. The case where the header is available but we don’t have the block yet is covered by this new JSONRPCError.
// Block not found on disk. This could be because we have the block
// header in our index but not yet have the block or did not accept the
// block. Or if the block was pruned right after we released the lock above.
in
src/rpc/blockchain.cpp:614
in
1d97b4cfdeoutdated
611- throw JSONRPCError(RPC_MISC_ERROR, "Block not available (pruned data)");
612+ if (!(blockindex.nStatus & BLOCK_HAVE_DATA)) {
613+ if (blockman.IsBlockPruned(blockindex)) {
614+ throw JSONRPCError(RPC_MISC_ERROR, "Block not available (pruned data)");
615+ }
616+ throw JSONRPCError(RPC_MISC_ERROR, "Block not available (not fully downloaded)");
// Block not found on disk. This could be because we have the block
// header in our index but not yet have the block or did not accept the
// block. Or if the block was pruned right after we released the lock above.
tdb3
commented at 2:42 am on July 10, 2024:
contributor
Concept ACK
Thanks for picking this up. Overall, I think adding clarity for the user is a net positive.
I’m not completely sure if the cause is important enough to change the wording of the rpc error, that some scripts may rely on.
I agree, this may be a borderline case because Block not found on disk seems like it is correct albeit not as descriptive. Since we’re not changing RPC data structure or error codes (just the message), I’m not even sure if implementing deprecatedrpc is fully appropriate.
Looks like we lost coverage for the original Block not found on disk error, so it would be good to add a commit with a test for that (or in a follow up). Maybe something like the following (and possibly move move_block_file() into the framework to allow reuse).
0diff --git a/test/functional/rpc_blockchain.py b/test/functional/rpc_blockchain.py
1index 4807551c0c3..363255ba48f 100755
2--- a/test/functional/rpc_blockchain.py
3+++ b/test/functional/rpc_blockchain.py
4@@ -638,6 +638,11 @@ class BlockchainTest(BitcoinTestFramework):
5 node.submitheader(block["hex"])
6 assert_raises_rpc_error(-1, "Block not available (not fully downloaded)", lambda: node.getblock(block["hash"]))
7 8+ self.log.info("Test getblock when block is missing")
9+ move_block_file('blk00000.dat', 'blk00000.dat.bak')
10+ assert_raises_rpc_error(-1, "Block not found on disk", node.getblock, blockhash)
11+ move_block_file('blk00000.dat.bak', 'blk00000.dat')
12+
1314 if __name__ == '__main__':
15 BlockchainTest().main()
16diff --git a/test/functional/rpc_getblockstats.py b/test/functional/rpc_getblockstats.py
17index bf261befcc8..c4b4de65e80 100755
18--- a/test/functional/rpc_getblockstats.py
19+++ b/test/functional/rpc_getblockstats.py
20@@ -182,5 +182,10 @@ class GetblockstatsTest(BitcoinTestFramework):
21 assert_equal(tip_stats["utxo_increase_actual"], 4)
22 assert_equal(tip_stats["utxo_size_inc_actual"], 300)
2324+ self.log.info('Test when block is missing')
25+ (self.nodes[0].blocks_path / 'blk00000.dat').rename(self.nodes[0].blocks_path / 'blk00000.dat.backup')
26+ assert_raises_rpc_error(-1, 'Block not found on disk', self.nodes[0].getblockstats, hash_or_height=1)
27+ (self.nodes[0].blocks_path / 'blk00000.dat.backup').rename(self.nodes[0].blocks_path / 'blk00000.dat')
28+
29 if __name__ == '__main__':
30 GetblockstatsTest().main()
in
src/rpc/blockchain.cpp:640
in
1d97b4cfdeoutdated
634@@ -629,8 +635,11 @@ static CBlockUndo GetUndoChecked(BlockManager& blockman, const CBlockIndex& bloc
635636 {
637 LOCK(cs_main);
638- if (blockman.IsBlockPruned(blockindex)) {
639- throw JSONRPCError(RPC_MISC_ERROR, "Undo data not available (pruned data)");
640+ if (!(blockindex.nStatus & BLOCK_HAVE_DATA)) {
641+ if (blockman.IsBlockPruned(blockindex)) {
642+ throw JSONRPCError(RPC_MISC_ERROR, "Block not available (pruned data)");
in
src/rpc/blockchain.cpp:642
in
1d97b4cfdeoutdated
639- throw JSONRPCError(RPC_MISC_ERROR, "Undo data not available (pruned data)");
640+ if (!(blockindex.nStatus & BLOCK_HAVE_DATA)) {
641+ if (blockman.IsBlockPruned(blockindex)) {
642+ throw JSONRPCError(RPC_MISC_ERROR, "Block not available (pruned data)");
643+ }
644+ throw JSONRPCError(RPC_MISC_ERROR, "Block not available (not fully downloaded)");
It feels like the rule of the 3 is being hit here and this code could be DRY’d up? Maybe in a helper function that takes the string "Block" or "Undo data" and does string formatting?
maflcko
commented at 7:48 am on July 11, 2024:
member
Makes sense to me, so concept a c k to that! Depending on how large this PR is going to be and how long it’s going to take to complete it, I could either close this simple PR, which changes return errors and is not a refactor, or it could be a pure refactor on top of it? Let me know what you prefer.
Ok, I’ll give it a try.
Another thing to check before closing the issue is the RPC return code: #20978 (comment) to confirm it is accurate (server error vs user error). (Haven’t looked at the changes, just mentioning it so that it is not forgotten).
maflcko
commented at 12:15 pm on July 11, 2024:
member
Ok, I’ll give it a try.
Ok, I am done, but I think it is easier to review this pull request.
However, you forgot GetBlockChecked (and all the other functions that can fail in the same way)?
Updated to change all functions I could find. I was only looking at getblock before, not at other RPCs.
Still missing:
blockToJSON
getrawtransaction
gettxoutproof
I think you can just expose the helper functions and use them everywhere?
mzumsande
commented at 11:39 pm on July 16, 2024:
contributor
Ok, I am done, but I think it is easier to review this pull request.
OK - I’m working on addressing the feedback, plus adding more test coverage for the different types of errors. Will update soon!
BrandonOdiwuor
commented at 3:55 pm on July 17, 2024:
contributor
Concept ACK
mzumsande force-pushed
on Jul 17, 2024
mzumsande force-pushed
on Jul 17, 2024
DrahtBot added the label
CI failed
on Jul 17, 2024
DrahtBot
commented at 6:39 pm on July 17, 2024:
contributor
Make sure to run all tests locally, according to the documentation.
The failure may happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
mzumsande
commented at 6:44 pm on July 17, 2024:
contributor
I have now updated the PR with the feedback above.
split the changes over multiple commits
adjusted all call sites of ReadRawBlockFromDisk / ReadBlockFromDisk / UndoReadFromDisk from rpc / rest to to check the blockindex before whether we expect to have the (undo) data.
Introduced a single function CheckHeaderForData() to throw the rpc errors that is called by multiple rpcs.
added some more test coverage (among them the suggestions by @tdb3)
mzumsande renamed this:
rpc, rest: Improve getblock error when only header is available
rpc, rest: Improve block rpc error handling, check header before attempting to read block data.
on Jul 17, 2024
DrahtBot removed the label
CI failed
on Jul 17, 2024
andrewtoth approved
andrewtoth
commented at 11:10 pm on August 6, 2024:
contributor
utACK02f6a42be3dfc57998304801c0d909bfa96745e4
DrahtBot requested review from TheCharlatan
on Aug 6, 2024
DrahtBot requested review from BrandonOdiwuor
on Aug 6, 2024
DrahtBot requested review from fjahr
on Aug 6, 2024
DrahtBot requested review from tdb3
on Aug 6, 2024
DrahtBot added the label
CI failed
on Aug 22, 2024
mzumsande force-pushed
on Aug 29, 2024
mzumsande
commented at 10:06 am on August 29, 2024:
contributor
02f6a42 to 4fb708c:
Rebased and fixed silent conflict with #30636.
DrahtBot removed the label
CI failed
on Aug 29, 2024
andrewtoth
commented at 2:33 am on September 3, 2024:
contributor
I’m unsure why CI is failing now…
fanquake
commented at 9:12 am on September 3, 2024:
member
@andrewtoth the CI failure in the test-each-commit job is unrelated to the changes (missing zmq package, fixed in #30749).
mzumsande force-pushed
on Sep 3, 2024
mzumsande
commented at 9:21 am on September 3, 2024:
contributor
@andrewtoth the CI failure in the test-each-commit job is unrelated to the changes (missing zmq package, fixed in #30749).
Thanks, I also wasn’t sure! I’ve just rebased again.
in
src/rpc/blockchain.cpp:583
in
856c162fb6outdated
In 856c162fb6cdbb5715cae23d1b690d5bb3812afc: The function is introduced just two commits earlier, I think the static can be dropped there and the header entry added there as well.
mzumsande
commented at 11:06 pm on September 9, 2024:
done
in
src/rpc/blockchain.cpp:583
in
8cfaf6e6efoutdated
Not sure I am happy with the naming of this function. We are handling a blockindex here and we know implicitly that he are trying to catch errors here when the blockindex exists because there is just a header available but not the block. But someone looking at this function might think: “What does this have to do with a header if there is nothing about headers in the function?” It’s not a blocker for me if others don’t agree this is suboptimal but I would be more happy with something along the lines of CheckBlockDataAvailability().
Or adding a one line comment explaining the link to headers would be fine for me as well.
mzumsande
commented at 11:06 pm on September 9, 2024:
renamed to CheckBlockDataAvailability()
in
src/rpc/blockchain.cpp:187
in
041b78ccedoutdated
mzumsande
commented at 11:06 pm on September 9, 2024:
I can break this up, but I think the suggestion doesn’t work because we still need to feed the result into have_undo so that if UndoReadFromDisk fails for an unexpected reason like disk errors, we don’t attempt to access blockUndo below and segfault.
In rpc/rawtransaction.cpp, did you mean to introduce a variable like skip_details or something to break this up?
I think the suggestion doesn’t work because we still need to feed the result into have_undo so that if UndoReadFromDisk fails for an unexpected reason like disk errors, we don’t attempt to access blockUndo below and segfault.
I see, I think in that scenario it would be best to throw an error right there instead of ignoring it silently and giving the user an incomplete response? I mean, we have all indication that we should have the undo data in that scenario and then we can not read it, it seems like an error that the user should know about.
In rpc/rawtransaction.cpp, did you mean to introduce a variable like skip_details or something to break this up?
I think that would work, I don’t remember if I had something explicit in mind when I wrote it. It’s just an awfully long list of conditions already and summing parts of them up in a variable with a concise name would make it easier to parse. I have left two additional suggestions there for shortening it inline and I think there it’s the same question if we really want to silently return a limited result if we fail to read from disk what we are supposed to have. I’ll wait for you feedback on that part and my inline suggestions. If nothing else I think it would also help a lot to just add line breaks after every condition.
mzumsande
commented at 10:26 pm on September 10, 2024:
I think it makes sense to throw an error in this situation. Although it’s not necessarily due to disk corruption, it would also be possible that the block/undo data is pruned away right after the blockindex check (cs_main is released) and before the UndoReadFromDisk call.
We also have a functional test for just this situation that wouldn’t work anymore (here) but that tests doesn’t actually prune a block but deletes an undo file to simulate pruning. So we’d either have to just remove it or actually prune a block… I’ll try this out in the next days.
DrahtBot added the label
CI failed
on Sep 8, 2024
mzumsande force-pushed
on Sep 9, 2024
mzumsande
commented at 11:24 pm on September 9, 2024:
contributor
mzumsande
commented at 6:04 pm on September 12, 2024:
I’ve implemented it with separate errors for block and undo missing.
mzumsande
commented at 6:05 pm on September 12, 2024:
done, good idea!
fjahr
commented at 12:09 pm on September 11, 2024:
contributor
@mzumsande Sorry, I had written the comment below already earlier but failed to submit it because of github being github. But it seems like it’s in line with your response.
in
src/rpc/rawtransaction.cpp:409
in
92236f43ffoutdated
furszy
commented at 11:00 pm on September 11, 2024:
q: if the block is only used to obtain the undo data, shouldn’t this be a || !chainman.m_blockman.ReadBlockFromDisk(block, *blockindex))? (negated)
mzumsande
commented at 1:24 am on September 12, 2024:
We need both block and undo data below if we don’t skip with the non-detailed response here. There is a parenthesis after the first negation, so the logic is not(a and b) which is the same as not(a) or not(b).
furszy
commented at 12:06 pm on September 12, 2024:
ah, missed the extra parenthesis. Good.
Might be good to decouple it as proposed in #30410 (review).
I wish there would be methods for “hasBlockData()” and “hasUndoData()” so we are not forced to use the internal BlockStatus enum everywhere.
furszy
commented at 12:08 pm on September 12, 2024:
member
Changed the last commit such that now it throws an error if our block index indicates we should have block/undo data but we can’t read it.
Also changed the functional test: Before, it would try to imitate a “pruning” situation by just deleting the undo file. This made little sense to me, because when we prune we prune both block and undo data together (and would have failed earlier due to missing block data). The situation with block data and no undo data can occur though, with blocks that haven’t been connected. I’ve added a test for that.
ccee8a2 to a5df908: rebased due to unrelated CI failure and fixed conflict with #30807
mzumsande force-pushed
on Sep 12, 2024
mzumsande force-pushed
on Sep 12, 2024
DrahtBot removed the label
CI failed
on Sep 13, 2024
in
src/rpc/blockchain.cpp:605
in
9c32d906dboutdated
More of a question: when check_for_undo is true do we actually want to check for both? Having undo and not data set here would seem to be something going weirdly wrong though I guess. And it’s definitely not a blocker because from the way it is used I can see that this works fine as is.
0 uint32_t flag = check_for_undo ? BLOCK_HAVE_MASK : BLOCK_HAVE_DATA;
mzumsande
commented at 3:03 pm on September 13, 2024:
I agree that it shouldn’t be possible to have undo data but no block data.
On the other hand, callers of this function may not even be interested in the block data, and I’m not sure if a RPC call is the right place to check for this inconsistency. At least it’s checked in CheckBlockIndex()here, so I think I’ll leave this as is for now.
in
src/rpc/rawtransaction.cpp:413
in
a5df908d98outdated
410+ if (tx->IsCoinBase() || !blockindex || WITH_LOCK(::cs_main, return !(blockindex->nStatus & BLOCK_HAVE_MASK))) {
411 TxToJSON(*tx, hash_block, result, chainman.ActiveChainstate());
412 return result;
413 }
414+ if (!chainman.m_blockman.UndoReadFromDisk(blockUndo, *blockindex)) {
415+ throw JSONRPCError(RPC_INTERNAL_ERROR, "Undo data expected but can't be read. This could be due to disk coruption or a conflict with a pruning event.");
mzumsande
commented at 3:03 pm on September 13, 2024:
fixed
in
src/rpc/rawtransaction.cpp:416
in
a5df908d98outdated
413 }
414+ if (!chainman.m_blockman.UndoReadFromDisk(blockUndo, *blockindex)) {
415+ throw JSONRPCError(RPC_INTERNAL_ERROR, "Undo data expected but can't be read. This could be due to disk coruption or a conflict with a pruning event.");
416+ }
417+ if (!chainman.m_blockman.ReadBlockFromDisk(block, *blockindex)) {
418+ throw JSONRPCError(RPC_INTERNAL_ERROR, "Block data expected but can't be read. This could be due to disk coruption or a conflict with a pruning event.");
in
src/rpc/blockchain.cpp:206
in
a5df908d98outdated
200@@ -201,8 +201,10 @@ UniValue blockToJSON(BlockManager& blockman, const CBlock& block, const CBlockIn
201 case TxVerbosity::SHOW_DETAILS_AND_PREVOUT:
202 CBlockUndo blockUndo;
203 const bool is_not_pruned{WITH_LOCK(::cs_main, return !blockman.IsBlockPruned(blockindex))};
204- const bool have_undo{is_not_pruned && blockman.UndoReadFromDisk(blockUndo, blockindex)};
205-
206+ bool have_undo{is_not_pruned && WITH_LOCK(::cs_main, return blockindex.nStatus & BLOCK_HAVE_UNDO)};
207+ if (have_undo && !blockman.UndoReadFromDisk(blockUndo, blockindex)) {
208+ throw JSONRPCError(RPC_INTERNAL_ERROR, "Undo data expected but can't be read. This could be due to disk coruption or a conflict with a pruning event.");
Feel free to ignore typo comments unless you retouch.
Also saw a typo in 8bfb0bd7bd5f8963fb1f58988f9bb36199dc98de commit message: “dowloaded”
DrahtBot requested review from furszy
on Sep 13, 2024
rest: improve error when only header of a block is available.
This avoids calling ReadRawBlockFromDisk() when the block is expected
not to be available because we haven't downloaded it yet and only know
the header.
e5b537bbdf
rpc: Improve getblock / getblockstats error when only header is available.
This improves the error message of the getblock and getblockstats rpc and prevents calls to
ReadRawBlockFromDisk(), which are unnecessary if we know
from the header nStatus field that the block is not available.
5290cbd585
test: add coverage to getblock and getblockstats
also removes an unnecessary newline.
Co-authored-by: tdb3 <106488469+tdb3@users.noreply.github.com>
69fc867ea1
rpc: Improve gettxoutproof error when only header is available.6cbf2e5f81
rpc: check block index before reading block / undo data
This avoids low-level log errors that are supposed to only occur when
there is an actual problem with the block on disk missing unexpectedly,
but not in the case where the block and/or undo data are expected not to be there.
It changes behavior such that in the first case (block index indicates
data is available but retrieving it fails) an error is thrown.
It also adjusts a functional tests that tried to simulate not
having undo data (but having block data) by deleting the undo file.
This situation should occur reality because block and undo data are pruned together.
Instead, test this situation with a block that hasn't been connected.
6a1aa510e3
mzumsande force-pushed
on Sep 13, 2024
mzumsande
commented at 3:04 pm on September 13, 2024:
contributor
in
test/functional/rpc_blockchain.py:617
in
6a1aa510e3
619620 self.log.info("Test getblock with invalid verbosity type returns proper error message")
621 assert_raises_rpc_error(-3, "JSON value of type string is not of expected type number", node.getblock, blockhash, "2")
622623- self.log.info("Test that getblock with verbosity 2 and 3 still works with pruned Undo data")
624+ self.log.info("Test that getblock doesn't work with deleted Undo data")
andrewtoth
commented at 3:17 pm on September 13, 2024:
nit: why remove the extra context that we are testing with verbosity 2 and 3 specifically?
in
test/functional/rpc_blockchain.py:558
in
6a1aa510e3
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-04-05 09:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me