net: move cs_main up in FetchBlock to fix rpc assert crash #35498

pull Crypt-iQ wants to merge 1 commits into bitcoin:master from Crypt-iQ:06092026/fetchblock_fix changing 1 files +6 −2
  1. Crypt-iQ commented at 6:12 PM on June 9, 2026: contributor

    A benign, racy assert can fail when calling FetchBlock and the peer is being cleaned up.

    1. FetchBlock runs in a http worker thread. It acquires a PeerRef, locks cs_main, then may later call BlockRequested which asserts that CNodeState exists for the peer.

    2. FinalizeNode may run in either the bitcoind or b-net threads. It locks cs_main, fetches a PeerRef from RemovePeer, fetches a CNodeState, and later removes it from m_node_states.

    Because of the lock placement in FetchBlock, the http worker thread in 1) can acquire a valid PeerRef and block while the b-net thread in 2) is cleaning up the peer in FinalizeNode. When the worker thread later acquires cs_main, it may crash in BlockRequested since no CNodeState exists. Fix this by acquiring the lock earlier in FetchBlock.

    I tested the assert can be hit and the fix works by adding sleeps. Was introduced in #25514 which moved the lock down.

  2. DrahtBot added the label P2P on Jun 9, 2026
  3. DrahtBot commented at 6:12 PM on June 9, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35498.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #35113 (net: introduce block tracker to retry to download blocks after failure by optout21)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  4. in src/net_processing.cpp:1980 in e3f8fac9b8 outdated
    1976 | @@ -1977,15 +1977,15 @@ util::Expected<void, std::string> PeerManagerImpl::FetchBlock(NodeId peer_id, co
    1977 |  {
    1978 |      if (m_chainman.m_blockman.LoadingBlocks()) return util::Unexpected{"Loading blocks ..."};
    1979 |  
    1980 | +    LOCK(cs_main);
    


    maflcko commented at 6:47 PM on June 9, 2026:

    Maybe add a comment here, why this is needed? Otherwise someone will move this down again the next time this function is touched.

    Also, I guess an alternative fix would be to move the fields from CNodeState to Peer? Probably too tedious to do here, but if true, it could be mentioned in the comment as well, so that it is known when the cs_main lock can be removed.


    Crypt-iQ commented at 7:52 PM on June 9, 2026:

    Maybe add a comment here, why this is needed? Otherwise someone will move this down again the next time this function is touched.

    Good point, added in 359680b74db840c2424b944af358ed215aff3019 and left a note in the commit message.

    Also, I guess an alternative fix would be to move the fields from CNodeState to Peer? Probably too tedious to do here, but if true, it could be mentioned in the comment as well, so that it is known when the cs_main lock can be removed.

    I thought the same, but I think this doesn't work. If fields are moved from CNodeState to Peer and you replace the lock with a net-specific lock without moving it up, the assert may still get hit if a lock is not held the whole time in FetchBlock (though it depends what the fix looks like). Even though refactoring CNodeState to Peer doesn't fix it, I still think it should be done since trying to manage CNodeState and Peer is harder to reason about.


    maflcko commented at 1:56 PM on June 10, 2026:

    I thought the same, but I think this doesn't work. If fields are moved from CNodeState to Peer and you replace the lock with a net-specific lock without moving it up, the assert may still get hit if a lock is not held the whole time in FetchBlock

    The assert is only hit when the peer lookup fails. So I was thinking about moving the fields to Peer and then pass a Peer& ref instead of NodeId to BlockRequested? This way, there is no lookup and thus the lookup can't fail.


    Crypt-iQ commented at 3:59 PM on June 10, 2026:

    The assert is only hit when the peer lookup fails. So I was thinking about moving the fields to Peer and then pass a Peer& ref instead of NodeId to BlockRequested? This way, there is no lookup and thus the lookup can't fail.

    Oh yes, this will work. I think vBlocksInFlight, m_downloading_since, and m_stalling_since need to be moved to Peer. I'll think more on it since this logic confuses me, I like your suggestion over this PR since it prunes down CNodeState a little more.

    Also, I think currently an assert can be hit in RemoveBlockRequest as well if there is an in-flight block (didn't test).

  5. Crypt-iQ force-pushed on Jun 9, 2026
  6. net: move cs_main up in FetchBlock to fix rpc assert crash
    1. FetchBlock runs in a http worker thread. It acquires a PeerRef,
    locks cs_main, then may later call BlockRequested which asserts
    that CNodeState exists for the peer.
    
    2. FinalizeNode may run in either the bitcoind or b-net threads. It
    locks cs_main, fetches a PeerRef from RemovePeer, fetches a CNodeState,
    and later removes it from m_node_states.
    
    Because of the lock placement in FetchBlock, the http worker thread in 1)
    can acquire a valid PeerRef and block while the b-net thread in 2) is
    cleaning up the peer in FinalizeNode. When the worker thread later acquires
    cs_main, it may crash in BlockRequested since no CNodeState exists. Fix
    this by acquiring the lock earlier in FetchBlock.
    
    The lock can be replaced with a net-specific lock when the remaining
    CNodeState fields are moved to Peer.
    359680b74d
  7. Crypt-iQ force-pushed on Jun 9, 2026
  8. DrahtBot added the label CI failed on Jun 9, 2026
  9. DrahtBot removed the label CI failed on Jun 9, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-06-11 10:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me