p2p: Increase tx relay rate #28592

pull ajtowns wants to merge 1 commits into bitcoin:master from ajtowns:202310-txrelayrate changing 1 files +1 −1
  1. ajtowns commented at 1:59 am on October 5, 2023: contributor

    In the presence of smaller transactions on the network, blocks can sustain a higher relay rate than 7tx/second. In this event, the per-peer inventory queues can grow too large.

    This commit bumps the rate up to 14 tx/s (for inbound peers), increasing the safety margin by a factor of 2.

    Outbound peers continue to receive relayed transactions at 2.5x the rate of inbound peers, for a rate of 35tx/second.

  2. DrahtBot commented at 1:59 am on October 5, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Stale ACK naumenkogs

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label P2P on Oct 5, 2023
  4. ajtowns commented at 2:02 am on October 5, 2023: contributor

    Reopens #27630 with a reduced rate (14tx/s vs 22tx/s). Here’s a graph showing the rise in the tx rate over the past year or so, generated from getchaintxstats:

    image

    Spreadsheet with the detailed numbers (and command line for reproducing) at: https://docs.google.com/spreadsheets/d/1RSWOmT_fWOdDERuI7yM_v_57C6DkCBojebyILT4eVZw/edit?usp=sharing

    Note that these numbers will be an underestimate as it ignores RBFed transactions, and transaction bursts.

    (144 blocks is about 24 hours, 36 blocks is about 6 hours)

  5. ajtowns commented at 2:10 am on October 5, 2023: contributor
    Perhaps worth explicitly noting: this is simplified compared to #27630 since #27675 removed INVENTORY_MAX_RECENT_RELAY (and m_recently_announced_invs for which that constant was needed). If it’s backported without that PR, then that constant will also need to be updated, which will increase per-peer memory usage. Backporting shouldn’t be necessary though, as #27610 (see also #27613, #27614) should be sufficient.
  6. Sjors commented at 2:38 pm on October 5, 2023: member
    Seems reasonable. I’ll run with 80489ba6e8430177db5c1d0c9148e83e410bd704 for a while.
  7. ariard commented at 7:30 pm on October 5, 2023: contributor
    If this is same than #27630 at least one benefit for second-layer and Bitcoin applications can be too document tx relay rate, see #27630 (comment)
  8. naumenkogs commented at 8:11 am on October 9, 2023: member

    ACK 80489ba6e8430177db5c1d0c9148e83e410bd704

    The justification above makes sense. With this change, going through m_tx_inventory_to_send will be more predictable and reactive, making the tx relay smoother.

    I also checked that the catch up code introduced in 5b3406094f2679dfb3763de4414257268565b943 still makes sense. (I think the commit message from there better be in the code because the division by 1000 is not trivial). I think catching up will become slower in the case when m_tx_inventory_to_send is around 1,000,000; but I think it’s fine.


    I also want to note that it’s not exactly the relay rate, it’s the rate of going through the queue. Because some transactions won’t be relayed e.g. due to the fee filter.

  9. luke-jr referenced this in commit 87cffd6789 on Oct 19, 2023
  10. ajtowns commented at 5:28 pm on November 28, 2023: contributor
    @Sjors did you want to comment on any results re #28592 (comment) ?
  11. Sjors commented at 6:10 pm on November 28, 2023: member
    @ajtowns it did not crash my node. That’s about the extend of monitoring I did on it, not sure what to measure.
  12. ajtowns commented at 3:35 pm on November 30, 2023: contributor
    <instagibbs> _aj_ might be helpful to know what the potential downsides are to the PR
                 (I have not thought about this stuff deeply)
    <_aj_> instagibbs: ie, why there's a limit at all?
    

    The code comment explanation of this is:

    0/** Maximum rate of inventory items to send per second.
    1 *  Limits the impact of low-fee transaction floods. */
    2static constexpr unsigned int INVENTORY_BROADCAST_PER_SECOND = 7;
    

    My understanding of this limit is that our tx relay works like this:

    • we have perhaps 100 peers, who send INV messages to us at whatever rate they like
    • our first limit applies if they send us huge bursts of INV messages that we/they can’t keep up to with GETDATA/TX: in that case AddTxAnnouncement will hit the MAX_PEER_TX_ANNOUNCEMENTS limit (5000, overridden by “relay” perm), and drop INV messages until we catch up (due to sending a GETDATA for the corresponding tx and getting a reply from any node; or a NOTFOUND or timeout by this node)
    • we then send GETDATA for the new txs: the second limit applies here - if we already have more than MAX_PEER_TX_REQUEST_IN_FLIGHT GETDATA requests outstanding with this peer (100), we’ll wait an extra OVERLOADED_PEER_DELAY seconds (2) in the hopes we can get the tx from another peer first
    • then we get the TX response, and add it to our mempool and relay it to everyone else
    • but the final rate limit applies here: we limit our outbound INV announcemnets to only happen every INBOUND_INVENTORY_BROADCAST_INTERVAL (5s; 2s for outbound connections) on average, and aim to only include INVENTORY_BROADCAST_TARGET (7*5 = 35) txs in each announcement; which is bumped up by an additional 5 txs for every 1000 txs of backlog we have (see #27610).

    The first two rate limits have no effect if our peers are high-bandwidth well-behaved peers: they can send us 500 tx/s every second indefinitely, and we’ll accept them all into our mempool provided they’re valid/obey rbf rules/etc. That’s a huge waste of bandwidth though: we can only put txs into blocks at much lower rates, so perhaps 450 or more of those 500 txs are going to either be replaced before being mined or just expire. We can’t really stop our peers from wasting bandwidth by sending tx data to us, but we can be kind to our peers, and only relay tx data that’s likely to be mined: so we pick the top set of txs (CompareInvMempoolOrder) trim the txs that the peer has announced to us, and then limit that to a rate somewhat around what can reasonably be confirmed.

    (Note that the we’ll send txs faster to our outbounds by a factor of 2.5x (35tx/2s instead of 35tx/5s), and that we don’t announce txs that have been announced to us also has a multiplying effect)

    The drawback of having this number be too low is thus (a) that txs may not propagate well, and this may affect txs that should go in the next blocks if feerates are rising, and (b) that our internal queues of what to send (m_tx_inventory_to_send) may grow large. The drawback of having this number be too high is that we will waste our peers’ bandwidth and cpu by giving them more txs than can possibly go into blocks, which they’ll then have to validate and may in turn relay onwards.

  13. ajtowns commented at 3:39 pm on November 30, 2023: contributor

    @ajtowns it did not crash my node. That’s about the extend of monitoring I did on it, not sure what to measure.

    I have a couple of patches at https://github.com/ajtowns/bitcoin/tree/202305-reportnetdata which can be used to track if the inv_to_send queues grow large and how quickly they empty out. With those patches, this script can be interesting:

    0bitcoin-cli getmempoolinfo | jq -j '"loaded:", .loaded, " ", .size, " txs ", (.bytes/1000000|floor), "MvB ", (.usage/.maxmempool*1000|floor)/10, "% used ", (.mempoolminfee*1e6|floor)/10, "s/vB minfee\n"'
    1echo
    2bitcoin-cli getnetworkinfo | jq -j '(.internal | to_entries | .[] | (.value, " ", .key, "\n"))' | fmt
    3echo
    4bitcoin-cli getpeerinfo | jq -j '.[] | (.inv_to_send, " ", .connection_type, " ", .id, " ", .relaytxes, " ", .synced_blocks, " ", (.network|sub("not_publicly_routable";"onion")), " inv-rx:", ((.bytesrecv_per_msg.inv//0)/1024|floor), "kB inv-tx:", ((.bytessent_per_msg.inv//0)/1024|floor), "kB gd-rx:", ((.bytesrecv_per_msg.getdata//0)/1024|floor), "kB nf-tx:", ((.bytessent_per_msg.notfound//0)/1024|floor), "kB minfee:", (.minfeefilter*1e6|floor)/10, "s/vb ", (.subver|gsub(" |(^$)"; "_")), " ", .addr, "\n")' | sort -rn | column -t
    
  14. amitiuttarwar commented at 9:25 pm on December 14, 2023: contributor
    posted a topic to delving bitcoin to brainstorm using warnet to evaluate the impacts of this change
  15. DrahtBot added the label CI failed on Jan 17, 2024
  16. p2p: Increase tx relay rate
    In the presence of smaller transactions on the network, blocks can sustain a
    higher relay rate than 7tx/second. In this event, the per-peer inventory queues
    can grow too large.
    
    This commit bumps the rate up to 14 tx/s (for inbound peers), increasing the
    safety margin by a factor of 2.
    
    Outbound peers continue to receive relayed transactions at 2.5x the rate of
    inbound peers, for a rate of 35tx/second.
    
    Co-Authored-By: Suhas Daftuar <sdaftuar@gmail.com>
    22c2b52c12
  17. ajtowns force-pushed on Feb 13, 2024
  18. Americason1 approved
  19. DrahtBot removed the label CI failed on Feb 13, 2024
  20. sdaftuar commented at 11:46 am on February 13, 2024: member

    The drawback of having this number be too high is that we will waste our peers’ bandwidth and cpu by giving them more txs than can possibly go into blocks, which they’ll then have to validate and may in turn relay onwards.

    Just to expand on this a bit more, a benefit of this number not being too high is that we reserve some of the network’s relay capacity for higher feerate transactions that might show up after, say, a flood of low feerate transactions arrives.

    If we had no limit at all, then you could think of relay as being much more like a big FIFO queue, where the first transactions announced are the ones that end up being relayed to everyone. By buffering transactions, we allow higher feerate ones that are announced later to jump the queue.

    See also #7805, where gmaxwell originally proposed this behavior (before it was adopted in #7840).

  21. Retropex referenced this in commit c247609a9d on Mar 28, 2024
  22. achow101 requested review from glozow on Apr 9, 2024
  23. DrahtBot added the label CI failed on May 12, 2024
  24. DrahtBot removed the label CI failed on May 13, 2024
  25. DrahtBot added the label CI failed on Sep 12, 2024
  26. DrahtBot removed the label CI failed on Sep 15, 2024
  27. Sjors commented at 8:08 am on September 18, 2024: member
    I’ve been running this patch for almost a year now. Didn’t do very extensive monitoring, but at least it didn’t crash the node or noticeably upset the machine.
  28. DrahtBot added the label CI failed on Oct 24, 2024
  29. DrahtBot commented at 10:00 am on October 25, 2024: contributor
    Could rebase for fresh CI after 8 months?
  30. DrahtBot removed the label CI failed on Oct 31, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me