net: fix premature stale flagging of unpicked private broadcast txs #34873

pull Mccalabrese wants to merge 2 commits into bitcoin:master from Mccalabrese:fix/private-broadcast-epoch changing 5 files +60 −22
  1. Mccalabrese commented at 4:22 am on March 20, 2026: none

    Motivation Currently, freshly added transactions in private_broadcast are almost immediately flagged and logged as stale by the resend-stale job.

    The Bug m_transactions maps a transaction to a std::vector<SendStatus>. When try_emplace adds a new transaction, this vector is empty. When GetStale() runs, DerivePriority() evaluates the empty vector and returns a default Priority struct where last_confirmed evaluates to the Unix Epoch (Jan 1, 1970). The stale checker sees a 50-year-old timestamp and flags it on the next resend-stale cycle.

    The Fix Rather than modifying the transient Priority struct or creating a “Zombie Transaction” edge case by ignoring transactions with 0 picks, this PR modifies the state container:

    • Wraps the SendStatus vector in a new TxSendStatus struct inside private_broadcast.h.
    • TxSendStatus automatically captures time_added upon emplace.
    • GetStale() now checks p.num_confirmed == 0 to measure age against time_added using a new 5-minute INITIAL_STALE_DURATION grace period, falling back to last_confirmed and the standard 1-minute STALE_DURATION once network interaction begins.

    Additional Polish

    • Exposed time_added via the getprivatebroadcastinfo RPC endpoint so users can see when a transaction entered the queue.
    • Added a dedicated stale_unpicked_tx test case and updated private_broadcast_tests.cpp to properly mock the passage of time for the new grace period. Closes #34862
  2. DrahtBot added the label P2P on Mar 20, 2026
  3. DrahtBot commented at 4:22 am on March 20, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK andrewtoth
    Concept ACK w0xlt
    Approach ACK vasild

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34707 (net: keep finished private broadcast txs in memory by andrewtoth)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. Mccalabrese marked this as ready for review on Mar 20, 2026
  5. in src/test/private_broadcast_tests.cpp:124 in 6b69f048f2
    120@@ -114,8 +121,9 @@ BOOST_AUTO_TEST_CASE(basic)
    121         BOOST_CHECK(!peers[0].received.has_value());
    122     }
    123 
    124-    BOOST_CHECK_EQUAL(pb.GetStale().size(), 1);
    125-    BOOST_CHECK_EQUAL(pb.GetStale()[0], tx_for_recipient2);
    126+    const auto stale_after_one_confirm{pb.GetStale()};
    


    optout21 commented at 10:44 am on March 20, 2026:
    Nit naming: This name is rather specific, prone to go out of sync when test is changed or code is copied. Suggestion: stale_state. BTW, this change is not necessary.

    vasild commented at 1:22 pm on March 20, 2026:

    Yeah, drop the unnecessary change. It burdens review now and later when people* dig into git blame and git log.

    * or robots


    Mccalabrese commented at 9:15 pm on March 21, 2026:
    Updated the PR
  6. in src/private_broadcast.cpp:100 in 6b69f048f2
     94@@ -95,9 +95,11 @@ std::vector<CTransactionRef> PrivateBroadcast::GetStale() const
     95     LOCK(m_mutex);
     96     const auto stale_time{NodeClock::now() - STALE_DURATION};
     97     std::vector<CTransactionRef> stale;
     98-    for (const auto& [tx, send_status] : m_transactions) {
     99-        const Priority p{DerivePriority(send_status)};
    100-        if (p.last_confirmed < stale_time) {
    101+    for (const auto& [tx, state] : m_transactions) {
    102+        const Priority p{DerivePriority(state.node_attempts)};
    103+        const bool is_unconfirmed{p.num_confirmed == 0};
    


    optout21 commented at 10:46 am on March 20, 2026:
    Putting this in its own variable doesn’t save code, it should be just used directly.
  7. in src/private_broadcast.cpp:101 in 6b69f048f2
     99-        const Priority p{DerivePriority(send_status)};
    100-        if (p.last_confirmed < stale_time) {
    101+    for (const auto& [tx, state] : m_transactions) {
    102+        const Priority p{DerivePriority(state.node_attempts)};
    103+        const bool is_unconfirmed{p.num_confirmed == 0};
    104+        const auto time_to_check{is_unconfirmed ? state.time_added : p.last_confirmed};
    


    optout21 commented at 10:47 am on March 20, 2026:
    I find the time_to_check name wrong/misleading, something like last_action_time would make more sense.

    vasild commented at 1:17 pm on March 20, 2026:

    So, when a transaction is being send we:

    1. pick a random peer from addrman, try to connect to it, if not then find another one and eventually send the transaction to the peer
    2. they send the transaction to their peers and
    3. eventually it comes back to us from somebody else (round-trip)

    Before this PR we consider a transaction as stale if the duration between 2. and 3. is too long - 1 minute. Because that is normally done in a few seconds.

    However with this PR we will also take into account the duration between 1. and 3. and that does not usually complete in seconds. It could take longer - having to timeout to an offline peer candidate, retry a few times.

    I think all this warrants a longer tolerance for transactions with p.num_confirmed == 0.

  8. in src/private_broadcast.h:180 in 6b69f048f2
    176@@ -177,9 +177,12 @@ class PrivateBroadcast
    177      */
    178     std::optional<TxAndSendStatusForNode> GetSendStatusByNode(const NodeId& nodeid)
    179         EXCLUSIVE_LOCKS_REQUIRED(m_mutex);
    180-
    181+    struct TxState {
    


    optout21 commented at 10:49 am on March 20, 2026:
    Naming: TxState is not a descriptive/good name, it’s too generic. I suggest TxSendStatus.
  9. in src/private_broadcast.h:182 in 6b69f048f2
    176@@ -177,9 +177,12 @@ class PrivateBroadcast
    177      */
    178     std::optional<TxAndSendStatusForNode> GetSendStatusByNode(const NodeId& nodeid)
    179         EXCLUSIVE_LOCKS_REQUIRED(m_mutex);
    180-
    181+    struct TxState {
    182+        NodeClock::time_point time_added{NodeClock::now()};
    183+        std::vector<SendStatus> node_attempts;
    


    optout21 commented at 10:51 am on March 20, 2026:
    Naming: please improve naming. E.g. node_status, peer_status.

    vasild commented at 1:18 pm on March 20, 2026:
    I think this should be called send_statuses.
  10. optout21 commented at 10:54 am on March 20, 2026: contributor
    • The change seems to accomplish the goal.
    • The test appears to correctly check for the problem; it fails without the change.
    • I’ve tried without the new time_added field, just extending the condition in GetStale():
    0    // Pick transactions that were confirmed long ago, but were picked long ago or not at all
    1    if (p.last_confirmed < stale_time && (p.num_picked == 0 || p.last_picked < stale_time)) {
    

    This solution also achieves the same result, but it’s simpler. However, the time_added-based is more robust (against error during the initial pick).

    • An extra test case testing staleness of a num_picked = 0 transaction would be helpful.
    • I feel the descriptiveness could be improved, e.g. by better naming.
    • Please take some extra care in reviewing/polishing LLM-generated code.
  11. in src/private_broadcast.h:181 in 6b69f048f2 outdated
    176@@ -177,9 +177,12 @@ class PrivateBroadcast
    177      */
    178     std::optional<TxAndSendStatusForNode> GetSendStatusByNode(const NodeId& nodeid)
    179         EXCLUSIVE_LOCKS_REQUIRED(m_mutex);
    180-
    181+    struct TxState {
    182+        NodeClock::time_point time_added{NodeClock::now()};
    


    andrewtoth commented at 1:10 pm on March 20, 2026:
    Should we return this data in getprivatebroadcastinfo as well? Could be interesting to the user.

    Mccalabrese commented at 11:42 pm on March 20, 2026:
    Added to PR
  12. vasild commented at 1:23 pm on March 20, 2026: contributor
    Approach ACK 6b69f048f2b50a237ae4027ab4cfc878e3c5b282
  13. vasild commented at 1:24 pm on March 20, 2026: contributor
    Could add Fixes: [#34862](/bitcoin-bitcoin/34862/) to the OP and to the commit message.
  14. Mccalabrese force-pushed on Mar 20, 2026
  15. Mccalabrese commented at 3:19 pm on March 20, 2026: none

    @vasild Good call, added the Fixes link to both the OP and the commit message

    Also updated the PR with the following changes based on the review feedback:

    • Grace Period: Added INITIAL_STALE_DURATION (5min) to give freshly added, unconfirmed transactions more time to find a peer before being flagged as stale, plus a new stale_unpicked_tx test case to cover it.
    • RPC Output: time_added is now correctly exposed to the user in getprivatebroadcastinfo (and fixed the structured binding in net_processing that tripped when expanding the info struct).
    • Naming: Renamed the wrapper to TxSendStatus and the vector to send_statuses.
    • Tests & Git Blame: Reverted the unnecessary BOOST_REQUIRE changes in the basic test case to keep git blame clean.

    On the test lines, directly indexing the temporary vector returned by GetStale() inside the BOOST macro was causing test instability. I stored it in stale_state first to ensure memory safety.

    I believe this is ready for another look

  16. andrewtoth approved
  17. andrewtoth commented at 3:33 pm on March 20, 2026: contributor
    ACK 50f0731926b035a23f0b0a590fcdf2e0da7dccb7
  18. DrahtBot requested review from vasild on Mar 20, 2026
  19. optout21 commented at 9:45 am on March 21, 2026: contributor

    Some administration issues:

    • Please split the single commit into two: the introduction of TxSendStatus that is easy to review but creates more diff lines, and the logic change in GetStale()
    • Please improve the PR title. “1970 epoch stale tx bug” is hard to understand (1970 is not very relevant). Derive a concise and specific description from the issue title&description. Also, remove the issue number from the title.
    • Add Closes [#34862](/bitcoin-bitcoin/34862/) to the description.
  20. Mccalabrese renamed this:
    net: fix 1970 epoch stale tx bug in private_broadcast - Issue #34862
    net: fix premature stale flagging of unpicked private broadcast txs
    on Mar 21, 2026
  21. net: introduce TxSendStatus to track private broadcast timestamps cf44e2419c
  22. net: delay stale evaluation for unconfirmed private broadcast txs 198a49e248
  23. Mccalabrese force-pushed on Mar 21, 2026
  24. Mccalabrese commented at 12:07 pm on March 21, 2026: none

    @optout21 Thanks for the review!

    Administrative updates are in: -changes are split into two distinct commits -Title updated to be more descriptive -Added Closes to the description

  25. andrewtoth commented at 11:28 pm on March 21, 2026: contributor

    ACK 198a49e24872ce28c9f88e12719850b07cdeda7f

    About the “Additional polish” section of the PR description:

    Applied const and uniform initialization {} to the updated GetStale() loop to align with modern style guidelines.

    I don’t see how you changed anything like this in the GetStale() loop.

    Swapped BOOST_CHECK_EQUAL for BOOST_REQUIRE_EQUAL in the test suite when evaluating vector sizes to prevent a known out-of-bounds 0x0 memory access violation if the size check fails.

    I don’t see this being done for some existing BOOST_CHECK_EQUAL. Only in the new unit test.

  26. Mccalabrese commented at 0:10 am on March 22, 2026: none

    @andrewtoth good catch, I forgot to edit the description after updating the commits based on feedback.

    I have edited the OP so it accurately reflects the diff. Thanks for the ACK!

  27. w0xlt commented at 8:05 am on March 22, 2026: contributor
    Concept ACK
  28. DrahtBot added the label CI failed on Mar 23, 2026
  29. DrahtBot commented at 5:46 am on March 23, 2026: contributor

    🚧 At least one of the CI tasks failed. Task test ancestor commits: https://github.com/bitcoin/bitcoin/actions/runs/23379165994/job/68126883006 LLM reason (✨ experimental): CI failed due to a build/compile error while compiling private_broadcast.cpp (CMake/gmake returned non-zero).

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-23 09:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me