rpc: getblockfrompeer, introduce fetch from “any” functionality #27836

pull furszy wants to merge 3 commits into bitcoin:master from furszy:2023_rpc_fetchblock_improvements changing 6 files +108 āˆ’20
  1. furszy commented at 6:43 pm on June 7, 2023: member

    Coming from #27652. Implementing the first part of it.

    The idea is to let users call getblockfrompeer without providing any peer id. The node will internally select one peer at random and make the getdata request.

    This also fixes a bug where the user is allowed to call getblockfrompeer providing an id of a peer that signals a “limited” service. As limited peers cannot provide historical blocks, it is not correct to allow the user to do that.

  2. DrahtBot commented at 6:43 pm on June 7, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Approach ACK ismaelsadeeq

    If your review is incorrectly listed, please react with šŸ‘Ž to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28055 (Bugfix: net_processing: Restore “Already requested” error for FetchBlock by luke-jr)
    • #27837 (net: introduce block tracker to retry to download blocks after failure by furszy)
    • #25722 (refactor: Use util::Result class for wallet loading by ryanofsky)
    • #25665 (refactor: Add util::Result failure values, multiple error and warning messages by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label RPC/REST/ZMQ on Jun 7, 2023
  4. DrahtBot added the label CI failed on Jun 7, 2023
  5. net: FetchBlock, disallow requesting an old block from a network limited peer
    Limited peers only store the last 288 blocks, not all historical blocks.
    e743efe691
  6. furszy force-pushed on Jun 10, 2023
  7. DrahtBot removed the label CI failed on Jun 11, 2023
  8. in src/net_processing.cpp:1784 in e743efe691 outdated
    1779@@ -1780,13 +1780,24 @@ std::optional<std::string> PeerManagerImpl::FetchBlock(NodeId peer_id, const CBl
    1780 {
    1781     if (m_chainman.m_blockman.LoadingBlocks()) return "Loading blocks ...";
    1782 
    1783+    // Only allow fetching from limited peers if the block height is within the allowed window
    1784+    bool allow_limited_peers = false;
    


    ismaelsadeeq commented at 4:33 pm on June 16, 2023:

    nit: feel free to ignore

    0    bool allow_limited_peers{false};
    
  9. in test/functional/rpc_getblockfrompeer.py:166 in e743efe691 outdated
    161+        pruned_node.add_p2p_connection(P2PInterface(), services=NODE_NETWORK_LIMITED | NODE_WITNESS)
    162+        limited_peer_id = next(peer for peer in pruned_node.getpeerinfo() if peer["servicesnames"] == ['WITNESS', 'NETWORK_LIMITED'])["id"]
    163+
    164+        pruned_block_10 = self.nodes[0].getblockhash(10)
    165+        assert_raises_rpc_error(-1, "Cannot fetch old block from a limited peer", pruned_node.getblockfrompeer, pruned_block_10, limited_peer_id)
    166+
    


    ismaelsadeeq commented at 4:49 pm on June 16, 2023:

    It would be nice to include a test specifically targeting the scenario of retrieving a block within the last 288 blocks from a Network Limited peer.

    The ‘prune’ option cannot be set below 550 MiB, it is unclear whether it allows pruning to the last 288 blocks. ( That is the possibility of a node not having any of the last 288 blocks ). We could just exclude limited peers from fetching the blocks?


    furszy commented at 8:12 pm on June 18, 2023:

    The ‘prune’ option cannot be set below 550 MiB, it is unclear whether it allows pruning to the last 288 blocks. ( That is the possibility of a node not having any of the last 288 blocks ).

    Have you seen FindFilesToPrune?, MIN_BLOCKS_TO_KEEP prevents pruning from deleting files that contain a block within the last 288 range.


    ismaelsadeeq commented at 10:11 am on June 19, 2023:
    Oh yeah, the MIN_BLOCKS_TO_KEEP are protected from pruning, all pruned nodes retain the last 288 blocks. Limited peers should just be excluded from fetching blocks, unless there are any cases where the last 288 blocks might not be available?

    furszy commented at 12:54 pm on June 19, 2023:

    Limited peers should just be excluded from fetching blocks, unless there are any cases where the last 288 blocks might not be available?

    Not yet. Basically, one of the features that have locally for the on-going #21267 working path, is to be able to sync-up the node on-demand by doing getblockfrompeer requests instead of using the node’s automatic sync process. Which is later used for the filters matching block requests too.

    Said that, no problem on excluding limited peers for now. We could enable them later, when the new feature is introduced.

  10. in src/rpc/blockchain.cpp:435 in 31ae2d20da outdated
    434@@ -435,7 +435,8 @@ static RPCHelpMan getblockfrompeer()
    435         "Returns an empty JSON object if the request was successfully scheduled.",
    


    ismaelsadeeq commented at 4:56 pm on June 16, 2023:
    0        "Returns the peer id we are requesting the block from if the request was successfully scheduled.",
    

    furszy commented at 8:15 pm on June 18, 2023:
    Updated, thanks
  11. ismaelsadeeq commented at 5:08 pm on June 16, 2023: member

    Approach ACK

    Nice to have the option to call getpeerinfo without explicitly passing a peer id. Often, I get block from a random peer, and the current implementation’s random selection of a peer when no id is provided aligns with that.

    Tests fail on the master pass on 31ae2d20da0800201376be71d41f19064edd3f92. I’ve left a few suggestions and a question.

  12. RPC: getblockfrompeer, return id of peer from where block is being fetched
    groundwork for introducing the fetch block from "any" peer functionality.
    911a8f320b
  13. RPC: getblockfrompeer, introduce fetch from "any" peer functionality
    If no 'peer_id' is provided, 'getblockfrompeer' will pick a random
    full node to fetch the block from.
    94e6eeabd9
  14. furszy force-pushed on Jun 18, 2023
  15. sipa commented at 12:32 pm on June 19, 2023: member

    I would much prefer an approach where this is handled by the existing background block fetches, because that one is aware of which peers have which blocks, and can load balance across this, and re-issue on timeout etc.

    It would make the functionality also usable for e.g. redownloading for wallet rescan after pruning or so.

  16. furszy commented at 2:23 pm on June 19, 2023: member

    I would much prefer an approach where this is handled by the existing background block fetches, because that one is aware of which peers have which blocks, and can load balance across this, and re-issue on timeout etc.

    Guess that you are talking about the random peer selection part? Like, you are expecting to have a set of “retry” blocks in net_processing, so FindNextBlocksToDownload can take one of those, mark it as in-flight, request it to a peer, and track the timeout etc (like what you suggested in #27652)

    Because, that is implemented in the follow-up #27837, in a slightly different way. With a BlockRequestTracker class, which does a bit more; with a pending requests queue instead of the retry set (I coded your suggestion first and IIRC, what didn’t convince me was the need to be continually traversing the retry set to see if any of those timed out).

    The purpose of this PR was to limit the scope of the initial changes to getblockfrompeer so that the user is aware of the source from which the block is initially being fetched, without introducing additional functionality beyond that. Then, in #27837, the tracking aspect is implemented, along with a fallback mechanism to handle cases where the initial fetch fails, and other related functionalities.

    But.. thinking further, what if instead of returning the initial peer id, we return the tracking request id from getblockfrompeer, then extend the command to return information about tracked block requests. With this, we wouldn’t need the random peer selection changes (nor most of the changes here), everything will be contained inside the background fetching logic. Which means that focus would be directly on #27837. The only drawback of this approach is that the “fetch from any” functionality will not be available until the entire BlockRequestTracker class is introduced. Which I think that is fine.

    What do you think?


    Side note about the tracker class: My idea is to start with this getblockfrompeer scenario, so we can add proper test coverage for the class, and then expand it to track what is currently being tracked by mapBlocksInFlight (which involves few more changes like the peer stalling detection).

    It would make the functionality also usable for e.g. redownloading for wallet rescan after pruning or so.

    Yeah. That one, and the rescan by using compact block filters, are my long-term goals.

  17. sipa commented at 2:53 pm on June 19, 2023: member

    @furszy Thanks for giving some background, I do see better where this is coming from now.

    But yes, my thinking is that it’s the wrong approach to assign block requests to random peers, for multiple reasons:

    • It doesn’t take into account which peer has what.
    • If it gets assigned incorrectly, there is no way to recover by retrying from another peer (on timeout or disconnect).
    • It ignores the existing information we have about currently outstanding block requests to peers (e.g. we may wish to avoid requesting from peers which already have many unanswered block requests).

    And it seemed to me that tacking something on, without reusing the existing logic for downloading and tracking blocks, seemed both unnecessary and inferior to me.

    However, thinking about it more, I realize there is perhaps something I missed, if that’s an intended use case: downloading blocks we don’t have a block header for. All the existing logic in FindNextBlockToDownload, the reasoning about which peers have what, and the timeout logic/tracking around block downloading in general is based on identifying blocks using their entry in the block index. For downloading blocks just by hash, very little of that is reusable. So perhaps the right (eventual) approach is actually treating these two separately: redownloading of known blocks using the existing logic, but downloading of unknown blocks using a new mechanism.

    Regarding identification of tracking requests: is there a need for that? For normal IBD/sync, there is no such functionality either. Perhaps I’m missing the use case for this. If you do need an identifier to track (re)download requests, what about using the block hash itself? Multiple redownload requests for the same block don’t make sense anyway.

  18. furszy commented at 4:18 pm on June 19, 2023: member

    Thanks sipa for all the feedback!

    @furszy Thanks for giving some background, I do see better where this is coming from now.

    But yes, my thinking is that it’s the wrong approach to assign block requests to random peers, for multiple reasons:

    • It doesn’t take into account which peer has what.
    • If it gets assigned incorrectly, there is no way to recover by retrying from another peer (on timeout or disconnect).
    • It ignores the existing information we have about currently outstanding block requests to peers (e.g. we may wish to avoid requesting from peers which already have many unanswered block requests).

    And it seemed to me that tacking something on, without reusing the existing logic for downloading and tracking blocks, seemed both unnecessary and inferior to me.

    Ok yeah, now I see the full picture, and agree. For manually requested blocks, #27837 introduces most of those points while re-using the existing downloading logic. It just need few adjustments.

    However, thinking about it more, I realize there is perhaps something I missed, if that’s an intended use case: downloading blocks we don’t have a block header for. All the existing logic in FindNextBlockToDownload, the reasoning about which peers have what, and the timeout logic/tracking around block downloading in general is based on identifying blocks using their entry in the block index. For downloading blocks just by hash, very little of that is reusable. So perhaps the right (eventual) approach is actually treating these two separately: redownloading of known blocks using the existing logic, but downloading of unknown blocks using a new mechanism.

    Hmm, yeah. Right now, it is not possible to call getblockfrompeer with a hash of an unknown block. But, might be useful to enable it in the future. The tracking mechanism being introduced in #27837 can help with the timeouts, disconnections and even to set a max number of download attempts or a request TTL.

    Regarding identification of tracking requests: is there a need for that? For normal IBD/sync, there is no such functionality either. Perhaps I’m missing the use case for this. If you do need an identifier to track (re)download requests, what about using the block hash itself? Multiple redownload requests for the same block don’t make sense anyway.

    Yeah, no.. I had one of those fake epiphanies, probably originated by a lack of caffeine in my system.. block requests are tracked by hash in #27837. So we can easily retrieve their tracking information with it.


    So, to summarize:

    Iā€™m going to decouple the commit that prevents users from calling getblockfrompeer with a network limited peer id into another PR (bugfix). Close this one, and adjust #27837 so it also handles the initial peer selection (removing the random selection). This is because the existing download logic is used in 27837 when the initial download attempt fails.

  19. furszy commented at 12:28 pm on August 2, 2023: member
    Work implemented in #27837. Closing.
  20. furszy closed this on Aug 2, 2023

  21. bitcoin locked this on Aug 1, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me