Inspired by #11913 and #26308. cs_main doesn't need to be locked while reading blocks. This removes the locks in net_processing.
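As a rough sketch of the pattern (simplified; names such as inv, m_chainparams, and the surrounding PeerManagerImpl context are assumed from net_processing, not quoted from the PR): copy the FlatFilePos while cs_main is held, then do the disk read after releasing it.

```cpp
// Sketch only: simplified from the diffs below, assuming the PeerManagerImpl context.
FlatFilePos block_pos{};
{
    LOCK(cs_main); // hold cs_main only for the block index lookup
    const CBlockIndex* pindex{m_chainman.m_blockman.LookupBlockIndex(inv.hash)};
    if (!pindex || !(pindex->nStatus & BLOCK_HAVE_DATA)) return;
    block_pos = pindex->GetBlockPos(); // copy what we need while locked
}
// cs_main is released here, so the (slow) disk read no longer blocks other threads.
CBlock block;
if (!ReadBlockFromDisk(block, block_pos, m_chainparams.GetConsensus())) {
    // The block may have been pruned between releasing the lock and reading it.
    LogPrint(BCLog::NET, "Cannot load block from disk. It was likely pruned before we could read it.\n");
    return;
}
```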
Type | Reviewers |
---|---|
ACK | sr-gi, furszy, mzumsande, TheCharlatan, achow101 |
Concept ACK | dergoegge, pablomartin4btc |
Approach ACK | hernanmarino |
+        is_above_blocktxn_depth = pindex->nHeight >= m_chainman.ActiveChain().Height() - MAX_BLOCKTXN_DEPTH;
+    }
+    if (is_above_blocktxn_depth) {
+        CBlock block;
+        if (!ReadBlockFromDisk(block, pindex, m_chainparams.GetConsensus())) {
+            LogPrint(BCLog::NET, "Cannot load block from disk. It was likely pruned before we could read it.\n");
Blocks within this depth can't be pruned due to MIN_BLOCKS_TO_KEEP, which is also enforced in regtest. You could also fetch the FlatFilePos while still holding the lock and pass it to ReadBlockFromDisk, so it doesn't have to reacquire the lock again after releasing it.
@@ -211,7 +211,7 @@ void UnlinkPrunedFiles(const std::set<int>& setFilesToPrune);
 /** Functions for disk access for blocks */
 bool ReadBlockFromDisk(CBlock& block, const FlatFilePos& pos, const Consensus::Params& consensusParams);
 bool ReadBlockFromDisk(CBlock& block, const CBlockIndex* pindex, const Consensus::Params& consensusParams);
-bool ReadRawBlockFromDisk(std::vector<uint8_t>& block, const FlatFilePos& pos, const CMessageHeader::MessageStartChars& message_start);
+bool ReadRawBlockFromDisk(std::vector<uint8_t>& block, const CBlockIndex* pindex, const CMessageHeader::MessageStartChars& message_start);
bool ReadRawBlockFromDisk(std::vector<uint8_t>& block, const CBlockIndex& block_index, const CMessageHeader::MessageStartChars& message_start);
Shouldn't the * be a &, like before? (nullptr is not accepted.)
Can someone add a “needs benchmark” label?
What kind of benchmark would be appropriate here? AFAICT message processing is done on a single thread, so this patch wouldn’t speed up handling requests. It would free up other threads that are waiting on the lock as the message processing thread does lengthy disk IO.
I wrote a tool to allow benchmarking the affected code paths https://github.com/andrewtoth/spam_block_reqs. I ran it on block 768022 using 5000 requests for each of the four paths on both this branch and master. For example:
cargo run --release -- -r block-transactions -b 000000000000000000064b1c6e9606714fd4a96d6ff877e4a93b170d86361e28 -n 5000
I did not benchmark the MSG_FILTERED_BLOCK path because it isn't supported in rust-bitcoin; however, there isn't anything in that path that requires taking the lock other than the code shared with the other paths.
Results show no regression:
Request Type | master (cb32328d1b80d0ccd6eb9532bd8fe4e0a4de385e) | branch |
---|---|---|
witness-block | 22.6s | 22.3s |
compact-block | 60.9s | 59.1s |
block-transactions | 58.5s | 59.0s |
legacy-block | 77.7s | 77.4s |
However, while running the spam-blocks tool with a very high number of requests so that it would not complete, I ran a benchmark with ApacheBench, requesting 2000 headers starting from block 750k, 1000 times on a single core using REST.
ab -n 1000 -c 1 "http://127.0.0.1:8332/rest/headers/0000000000000000000592a974b1b9f087cb77628bb4a097d5c2c11b3476a58e.bin?count=2000"
The mean response times for fetching headers while running the spam-blocks tool for each request type are below:
Request Type | master (cb32328d1b80d0ccd6eb9532bd8fe4e0a4de385e) | branch |
---|---|---|
witness-block | 14ms | 1ms |
compact-block | 19ms | 1ms |
block-transactions | 19ms | 1ms |
legacy-block | 42ms | 1ms |
So it seems like this definitely improves responsiveness for any other threads that need to acquire cs_main.
 }

+    if (!block_pos.IsNull()) {
+        CBlock block;
+        bool ret = ReadBlockFromDisk(block, block_pos, m_chainparams.GetConsensus());
+        assert(ret);
Can't ReadBlockFromDisk return false here if the block is not on disk anymore?
Blocks within MAX_BLOCKTXN_DEPTH of the tip can't be pruned, so we should retain the current behavior if this read fails.
@@ -2205,23 +2210,29 @@ void PeerManagerImpl::ProcessGetBlockData(CNode& pfrom, Peer& peer, const CInv&
         if (!(pindex->nStatus & BLOCK_HAVE_DATA)) {
             return;
         }
+        can_direct_fetch = CanDirectFetch();
+        block_pos = pindex->GetBlockPos();
+    }
+    const CNetMsgMaker msgMaker(pfrom.GetCommonVersion());
@@ -2172,16 +2172,20 @@ void PeerManagerImpl::ProcessGetBlockData(CNode& pfrom, Peer& peer, const CInv&
         }
     }

+    const CBlockIndex* pindex;
+    const CBlockIndex* tip;
+    bool can_direct_fetch;
diff-ACK fae1fd61
Have to admit that I'm not a fan of the first commit. It forces me to remember where the indentation changes were located when I'm reviewing the other commits.
         std::shared_ptr<CBlock> pblockRead = std::make_shared<CBlock>();
-        if (!m_chainman.m_blockman.ReadBlockFromDisk(*pblockRead, *pindex)) {
-            assert(!"cannot load block from disk");
+        if (!m_chainman.m_blockman.ReadBlockFromDisk(*pblockRead, block_pos)) {
+            LogPrint(BCLog::NET, "Cannot load block from disk. It was likely pruned before we could read it.\n");
+            return;
Last commit: Not sure. This looks like a p2p protocol change?
For the local node, this change means that the node will continue functioning instead of crashing, which I think is good. The node shouldn't crash because it received a p2p message that it cannot answer.
For the remote peer, this means that the connection will remain active for ~10 more minutes until the block request times out and the peer closes the socket, which isn't ideal, but it is how we currently handle unanswered block requests.
Are you seeing something else?
For the local node, this change means that the node will continue functioning instead of crashing.
The node wouldn’t crash here, because cs_main was being held, so no pruning could happen in-between.
My point is that it would be good to explain behavior changes, especially if they are p2p protocol changes. Hiding this under a "reduce cs_main" commit, which looks like a refactor, doesn't seem ideal.
For the local node, this change means that the node will continue functioning instead of crashing.
The node wouldn’t crash here, because cs_main was being held, so no pruning could happen in-between.
Could also be that the requested block data is not on disk or not accessible at the time this happens. Something external to our software.
Still, leaving external causes out of the equation, the only diff here is that we allow a tiny extra pruning window in between the index lookup and the block data read. We currently return early here, without logging anything, if the block is pruned; after this PR, we could either return early on that same no-logging path (if the block is already pruned) or log "Cannot load block from disk. It was likely pruned before we could read it." if the block was pruned right after the index lookup.
But this is all internal to the node; I'm not seeing any p2p protocol change.
Maybe @andrewtoth could take this to the PR description?
For the local node, this change means that the node will continue functioning instead of crashing.
The node wouldn’t crash here, because cs_main was being held, so no pruning could happen in-between.
Could also be that the requested block data is not on disk or not accessible at the time this happens. Something external to our software.
If your datadir was corrupted from outside, I think a crash is acceptable. You likely wouldn’t be able to re-org or otherwise stay in consensus anyway.
I’m not seeing any p2p protocol change.
Maybe “protocol change” was the wrong word? I guess “behavior change” may be better?
If your datadir was corrupted from outside, I think a crash is acceptable. You likely wouldn’t be able to re-org or otherwise stay in consensus anyway.
You know, if the block in conflict is old enough, then the node should be "mostly" OK. The problem would be for new peers performing IBD, which would disconnect from the node for lack of a response.
I’m not seeing any p2p protocol change.
Maybe “protocol change” was the wrong word? I guess “behavior change” may be better?
Sounds good.
If a block is pruned before we lock cs_main, this function returns early. This behavior is not changed in this PR.
If a block is not pruned before we lock cs_main, previously we were guaranteed to send the block to the peer.
This PR introduces a short window, after unlocking cs_main but before opening the fd, in which, if the block is pruned and unlinked by another thread, this function also returns early.
I think we all agree this is not a p2p protocol change? I also believe I documented this behavior change in the PR description and in the debug logs.
So the other behavior change is that if we fail to read a block for reasons other than being pruned, previously the node would crash. Now we will log the failure and continue.
If your datadir was corrupted from outside, I think a crash is acceptable. You likely wouldn’t be able to re-org or otherwise stay in consensus anyway.
I suppose if the read fails, we can retake cs_main and check if the block was pruned out from under us. If so, we log and continue, otherwise we can keep crashing on an assert? Do we really want to keep crashing here if we fail to read a block? I’m not sure.
if the read fails, we can retake cs_main and check if the block was pruned out from under us. If so, we log and continue, otherwise we can keep crashing on an assert
@maflcko I updated the code to do this. Thus I don't think there is any behavior change. If the block requested is pruned, we return early before and after this PR. If the block requested is unpruned and we fail to read it, we will crash before and after this PR.
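A condensed sketch of the fallback described above, assuming block_pos was copied under cs_main earlier and that pindex is still in scope; the actual diff appears further down in the thread and may differ in details:

```cpp
// Sketch of the failure handling discussed above (not the exact PR code).
CBlock block;
if (!m_chainman.m_blockman.ReadBlockFromDisk(block, block_pos)) {
    // Re-take cs_main only on the failure path to find out why the read failed.
    if (WITH_LOCK(m_chainman.GetMutex(), return m_chainman.m_blockman.IsBlockPruned(*pindex))) {
        // Pruned out from under us: a benign race, so log and give up on this request.
        LogPrint(BCLog::NET, "Block was pruned before it could be read, peer=%d\n", pfrom.GetId());
        return;
    }
    // Not pruned: something is genuinely wrong (e.g. disk failure), keep the old assert.
    assert(!"cannot load block from disk");
}
```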
Ah sorry for being unclear. I am not asking to add back the assert. Just adding the comment from #26326 (review) to the commit description would be great.
However, I wonder what the point is of forcing the remote peer into a 10 minute delay+disconnect. Why not disconnect immediately? Also, if the remote has set a permission flag to avoid the disconnect (not sure if this exists), I wonder what the correct behavior would be.
I am not asking to add back the assert.
+1 on not re-adding the assert.
However, I wonder what the point is of forcing the remote peer into a 10 minute delay+disconnect. Why not disconnect immediately? Also, if the remote has set a permission flag to avoid the disconnect (not sure if this exists), I wonder what the correct behavior would be.
If the remote peer has a permission flag to avoid the disconnection, maybe it could respond with a NOTFOUND message? Same as we do with tx getdata requests.
But probably better to discuss this in a follow-up or in an issue?
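For concreteness, a hypothetical sketch of such a reply, reusing the existing notfound plumbing; this is not part of the PR and, as noted below, peers could not rely on it without a protocol-level change:

```cpp
// Hypothetical sketch only: answer an unserviceable block getdata with notfound,
// mirroring how unanswered tx getdata requests are handled today.
std::vector<CInv> vNotFound{inv};
const CNetMsgMaker msgMaker(pfrom.GetCommonVersion());
m_connman.PushMessage(&pfrom, msgMaker.Make(NetMsgType::NOTFOUND, vNotFound));
return;
```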
Updated to log and return instead of hitting an assert. Also updated the commit description to clarify this behavior.
However, I wonder what the point is of forcing the remote peer into a 10 minute delay+disconnect. Why not disconnect immediately?
We could, but I think this is very unlikely to occur in practice. We already disconnect above on line 2293 if we are past the NODE_NETWORK_LIMITED_MIN_BLOCKS threshold, which is MIN_BLOCKS_TO_KEEP (288), so we won't prune below that anyway.
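For context, a rough, simplified rendering of the depth check being referred to; the exact condition and log text in net_processing differ (it also depends on which services we advertise), so treat this as an approximation:

```cpp
// Simplified sketch (paraphrased, not the exact code): when we only serve
// NODE_NETWORK_LIMITED, peers requesting blocks deeper than
// NODE_NETWORK_LIMITED_MIN_BLOCKS (== MIN_BLOCKS_TO_KEEP == 288) plus a
// two-block buffer are disconnected, so anything we still serve cannot
// have been pruned.
if (tip->nHeight - pindex->nHeight > static_cast<int>(NODE_NETWORK_LIMITED_MIN_BLOCKS) + 2) {
    LogPrint(BCLog::NET, "Ignoring block request below the NODE_NETWORK_LIMITED threshold, disconnect peer=%d\n", pfrom.GetId());
    pfrom.fDisconnect = true;
    return;
}
```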
 }

+    if (!block_pos.IsNull()) {
+        CBlock block;
+        const bool ret{m_chainman.m_blockman.ReadBlockFromDisk(block, block_pos)};
+        assert(ret);
This relies on MAX_BLOCKTXN_DEPTH < MIN_BLOCKS_TO_KEEP. Would it be worth having a static assert for this when defining MAX_BLOCKTXN_DEPTH?
static_assert(MAX_BLOCKTXN_DEPTH < MIN_BLOCKS_TO_KEEP, "MIN_BLOCKS_TO_KEEP too low");
This was not obvious to me either, but I'm not too familiar with this part of net_processing.
Concept ACK
I agree with @furszy in #26326 (review) that we should follow this up with sending NotFounds for all the blocks that we are unable (or unwilling) to provide.
we should follow this up with sending NotFounds for all the blocks that we are unable (or unwilling) to provide
We can't just change the protocol like that; software other than Bitcoin Core might not expect notfound for block messages. But I do believe it would be useful to have some way of signalling (whether that's notfound or something else) that a block is unavailable, and of using that information. It would need a short BIP, and a way to negotiate support for it, but it would mean that over time we may also be able to prefer downloading from peers which commit to supporting such a message, and perhaps have shorter timeouts for non-responses from those.
@sr-gi thank you for your review!
that we should follow this up with sending NotFounds for all the blocks that we are unable (or unwilling) to provide
After #28120 I don't think it is very likely that this will occur with updated nodes. For older nodes and other software that request blocks past the prune threshold, it would require a protocol change, as @sipa mentioned.
         // as the network format matches the format on disk
         std::vector<uint8_t> block_data;
-        if (!m_chainman.m_blockman.ReadRawBlockFromDisk(block_data, pindex->GetBlockPos())) {
-            assert(!"cannot load block from disk");
+        if (!m_chainman.m_blockman.ReadRawBlockFromDisk(block_data, block_pos)) {
+            LogPrint(BCLog::NET, "Cannot load block from disk. It was likely pruned before we could read it.\n");
As written in #26326 (review), we never prune blocks > 288 blocks from the tip, and if pruning, we should have disconnected any peer requesting older blocks - so it should be impossible to trigger this log even with pruning (unless there are extreme scenarios like major reorgs). On the other hand, it could still be hit while not pruning due to disk problems - e.g. while helping other peers with IBD. Searching for this assert gives me multiple matches, e.g.
It apparently also crashed forks like Bitcoin Unlimited back in the day.
I’m not against removing it, but I think that if we do, we should at least have an error message that mentions the possibility of disk failures, and that is also unconditional so it doesn’t get hidden in the spammy NET log that very few people enable.
so it should be impossible to trigger this log even with pruning
Since we have a 2 block buffer, peers can request blocks up to 290 deep. So one way this can be triggered: a peer requests a block at depth 290 and we get the filepos, we release the lock, the pruneblockchain RPC is called, the RPC thread acquires the lock, and the block is pruned and unlinked before this thread can acquire the fd, so the read now fails.
I don’t think this is really likely to occur though.
Indeed, I think that's the best way. I did that before (see #26326 (review)), but other reviewers thought removing the assert was better. But I think it makes it easier to merge this PR if there is strictly no behavior change, and other patches can be proposed to modify the assertion behavior or to disconnect before returning early. This PR is just meant to reduce the lock scope when reading blocks.
So, if reading the block fails, we retake cs_main and determine if the block was pruned out from under us. If so, we log and return early. Otherwise, we assert as before.
@maflcko @furszy what do you think?
determine if the block was pruned out from under us
That sounds like work, not sure how easy it is - I just meant to check whether we’re running in pruning mode, so the block could have been pruned theoretically. I don’t have a strong opinion between asserting / logging an error unconditionally, I just think that a conditional NET log is not sufficient in a case where something is seriously wrong on our end.
I just meant to check whether we’re running in pruning mode, so the block could have been pruned theoretically.
Just in case, it should also check that the block is further than the pruning threshold.
So, if reading the block fails, we retake cs_main and determine if the block was pruned out from under us. If so, we log and return early. Otherwise, we assert as before.
For now (at least in this PR), we should maintain the current network behavior if the assert is replaced. This means disconnecting the peer immediately when it happens (the same behavior as if the node crashed on the assertion: the socket gets closed).
Also adds a static assertion that MAX_BLOCKTXN_DEPTH <= MIN_BLOCKS_TO_KEEP.
This also changes behavior if ReadBlockFromDisk or
ReadRawBlockFromDisk fails. Previously, the node would crash
due to an assert. This has been replaced with logging the error,
disconnecting the peer, and returning early.
Rebased and updated:
- Added static_assert(MAX_BLOCKTXN_DEPTH <= MIN_BLOCKS_TO_KEEP, "MAX_BLOCKTXN_DEPTH too high"); in the first commit.
- If reading the block fails, we now retake cs_main and check if the block has been pruned. If so, we conditionally log to the NET category; otherwise we unconditionally log an error. In either case, we also disconnect the peer before returning.

Thank you for your reviews @furszy @sr-gi @mzumsande!
ACK 75d27fe
The only differences are the ones mentioned, which align with the approach after considering #26326 (review).
+        if (WITH_LOCK(m_chainman.GetMutex(), return m_chainman.m_blockman.IsBlockPruned(*pindex))) {
+            LogPrint(BCLog::NET, "Block was pruned before it could be read, disconnect peer=%s\n", pfrom.GetId());
+        } else {
+            LogError("Cannot load block from disk, disconnect peer=%d\n", pfrom.GetId());
+        }
+        pfrom.fDisconnect = true;
-        assert(!"cannot load block from disk");
+        if (!m_chainman.m_blockman.ReadRawBlockFromDisk(block_data, block_pos)) {
+            if (WITH_LOCK(m_chainman.GetMutex(), return m_chainman.m_blockman.IsBlockPruned(*pindex))) {
+                LogPrint(BCLog::NET, "Block was pruned before it could be read, disconnect peer=%s\n", pfrom.GetId());
+            } else {
+                LogError("Cannot load block from disk, disconnect peer=%d\n", pfrom.GetId());