Package Relay 1/3: Introduce TxDownloadManager and improve orphan-handling #28031

pull glozow wants to merge 17 commits into bitcoin:master from glozow:orphan-resolution-module changing 18 files +849 −293
  1. glozow commented at 3:30 pm on July 5, 2023: member

    See #27463 for full project tracking. Please see #27742 for how this PR fits into the big picture. This PR is based on a more recent commit than that one.

    This branch includes: (1) (now merged, see #30110) Introduces TxDownloadManager, which handles all transaction downloading. Adds tests for TxDownloadManager. (2) Adds an “orphan resolution module”. It adds all announcers of an orphan as potential resolution candidates, in a tracker implemented as a TxRequestTracker. In this PR, “orphan resolution” means requesting missing parents by getdata(MSG_TX | MSG_WITNESS_FLAG, missing_txid). In a future PR, we’ll add another resolution method, requesting ancestor wtxids using getdata(MSG_ANCPKGINFO, orphan_wtxid). (3) Makes TxDownloadManager internally thread-safe

  2. glozow added the label P2P on Jul 5, 2023
  3. DrahtBot commented at 3:30 pm on July 5, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28031.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31397 (p2p: track and use all potential peers for orphan resolution by glozow)
    • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
    • #27052 (test: rpc: add last block announcement time to getpeerinfo result by LarryRuane)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. glozow force-pushed on Jul 5, 2023
  5. DrahtBot added the label CI failed on Jul 5, 2023
  6. in src/net_processing.cpp:2966 in 543273d96e outdated
    2959@@ -2960,9 +2960,10 @@ bool PeerManagerImpl::ProcessOrphanTx(Peer& peer)
    2960         const MempoolAcceptResult result = m_chainman.ProcessTransaction(porphanTx);
    2961         const TxValidationState& state = result.m_state;
    2962         const uint256& orphanHash = porphanTx->GetHash();
    2963+        const uint256 orphan_wtxid = porphanTx->GetWitnessHash();
    2964 
    2965         if (result.m_result_type == MempoolAcceptResult::ResultType::VALID) {
    2966-            LogPrint(BCLog::MEMPOOL, "   accepted orphan tx %s\n", orphanHash.ToString());
    2967+            LogPrint(BCLog::MEMPOOL, "   accepted orphan tx %s\n", orphan_wtxid.ToString());
    


    instagibbs commented at 5:50 pm on July 5, 2023:
    make these be TXPACKAGES? then you get the entire “story” with a single log type (which helped me diagnose the Timeout issue)

    glozow commented at 3:04 pm on July 24, 2023:

    Done :+1: This now has 2 logs:

    • TXPACKAGES “accepted orphan tx (wtxid)”
    • MEMPOOL “AcceptToMemoryPool … (txid)” which matches the one in ProcessMessage for a tx message

    achow101 commented at 6:22 pm on July 31, 2023:

    In 543273d96e896adf5531ed961856aa0eb70cbe57 “[log] log ProcessOrphanTx() events by wtxid”

    Perhaps log both txid and wtxid?


    glozow commented at 9:13 pm on August 14, 2023:
    Logging both :+1:
  7. in src/node/txpackagetracker.cpp:30 in 116378efc1 outdated
    25+
    26+    /** Tracks orphans for which we need to request ancestor information. All hashes stored are
    27+     * wtxids, i.e., the wtxid of the orphan. However, the Announcement::m_is_wtxid field is used to
    28+     * indicate whether we would request the ancestor information by wtxid (via package relay) or by
    29+     * txid (via prevouts of the missing inputs). */
    30+    TxRequestTracker orphan_request_tracker GUARDED_BY(m_mutex);
    


    instagibbs commented at 6:02 pm on July 5, 2023:
    m_orphan_request_tracker :pray:

    glozow commented at 3:37 pm on July 24, 2023:
    Done
  8. in src/node/txpackagetracker.cpp:75 in 116378efc1 outdated
    70+    size_t OrphanageSize() { return m_orphanage.Size(); }
    71+    void MempoolAcceptedTx(const CTransactionRef& ptx) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    72+    {
    73+        LOCK(m_mutex);
    74+        m_orphanage.AddChildrenToWorkSet(*ptx);
    75+        m_orphanage.EraseTx(ptx->GetWitnessHash());
    


    instagibbs commented at 6:05 pm on July 5, 2023:

    seems wrong/very noisy without this? e.g., this line https://github.com/bitcoin/bitcoin/pull/28031/files#diff-ece439372a3e31da3141ed8fda99b37381e32cdab17ca26fffd5dfd916c300c8R124 will fire constantly

    0        m_orphanage.EraseTx(ptx->GetWitnessHash());
    1        orphan_request_tracker.ForgetTxHash(ptx->GetWitnessHash());
    

    glozow commented at 9:09 pm on August 14, 2023:
    Thanks, fixed
  9. in src/node/txpackagetracker.cpp:124 in 116378efc1 outdated
    119+        AssertLockNotHeld(m_mutex);
    120+        LOCK(m_mutex);
    121+        std::vector<std::pair<NodeId, GenTxid>> expired;
    122+        auto tracker_requestable = orphan_request_tracker.GetRequestable(nodeid, current_time, &expired);
    123+        for (const auto& entry : expired) {
    124+            LogPrint(BCLog::TXPACKAGES, "\nTimeout of inflight %s %s from peer=%d\n", entry.second.IsWtxid() ? "ancpkginfo" : "orphan parent",
    


    instagibbs commented at 6:08 pm on July 5, 2023:
    any principle on prefixing and postfixing \n to everything in these logs?

    mzumsande commented at 10:20 pm on July 12, 2023:
    I don’t think there should ever be a \n prefix, why would we want to separate the meta information (timestamp, threadname etc.) from the actual log entry?

    glozow commented at 3:31 pm on July 24, 2023:
    Removed the prefix
  10. in src/node/txpackagetracker.cpp:136 in 116378efc1 outdated
    131+            if (!ptx) {
    132+                // We can't request ancpkginfo and we have no way of knowing what the missing
    133+                // parents are (it could also be that the orphan has already been resolved).
    134+                // Give up.
    135+                orphan_request_tracker.ForgetTxHash(gtxid.GetHash());
    136+                LogPrint(BCLog::TXPACKAGES, "\nForgetting orphan %s from peer=%d\n", gtxid.GetHash().ToString(), nodeid);
    


    instagibbs commented at 6:10 pm on July 5, 2023:
    0                LogPrint(BCLog::TXPACKAGES, "\nForgetting orphan request %s from peer=%d\n", gtxid.GetHash().ToString(), nodeid);
    
  11. in src/node/txpackagetracker.cpp:80 in 116378efc1 outdated
    75+        m_orphanage.EraseTx(ptx->GetWitnessHash());
    76+    }
    77+    void MempoolRejectedTx(const uint256& wtxid) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    78+    {
    79+        LOCK(m_mutex);
    80+        m_orphanage.EraseTx(wtxid);
    


    instagibbs commented at 6:26 pm on July 5, 2023:

    trying to fetch after failure seems wrong

    0        m_orphanage.EraseTx(wtxid);
    1        orphan_request_tracker.ForgetTxHash(wtxid);
    

    glozow commented at 9:09 pm on August 14, 2023:
    Fixed
  12. in src/txorphanage.cpp:21 in 29d9d326d5 outdated
    13@@ -15,6 +14,16 @@ static constexpr int64_t ORPHAN_TX_EXPIRE_TIME = 20 * 60;
    14 /** Minimum time between orphan transactions expire time checks in seconds */
    15 static constexpr int64_t ORPHAN_TX_EXPIRE_INTERVAL = 5 * 60;
    16 
    17+void TxOrphanage::SubtractOrphanBytes(unsigned int size, NodeId peer)
    18+{
    19+    AssertLockHeld(m_mutex);
    20+    Assume(m_peer_bytes_used.count(peer) > 0);
    21+    Assume(m_peer_bytes_used.at(peer) >= size);
    


    instagibbs commented at 8:37 pm on July 5, 2023:
    since we’re using size_t for these fields, do we want to continue on with UB for release builds or do an assert?

    glozow commented at 9:10 pm on August 14, 2023:
    Changed these to if(!Assume(...)) return;
  13. in src/txorphanage.cpp:31 in 6cd8be8a4c outdated
    27@@ -28,6 +28,10 @@ void TxOrphanage::SubtractOrphanBytes(unsigned int size, NodeId peer)
    28 bool TxOrphanage::AddTx(const CTransactionRef& tx, NodeId peer)
    29 {
    30     LOCK(m_mutex);
    31+    if (tx == nullptr) {
    


    instagibbs commented at 8:41 pm on July 5, 2023:

    [txorphanage] handle AddTx(nullptr)

    Could we motivate this change in the commit message?


    glozow commented at 4:26 pm on July 24, 2023:
    Dropped the commit instead
  14. in src/net_processing.cpp:5926 in b5ab45e595 outdated
    5922@@ -5926,6 +5923,22 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
    5923             }
    5924         }
    5925 
    5926+        auto requestable_orphans = m_txpackagetracker->GetOrphanRequests(pto->GetId(), current_time);
    


    instagibbs commented at 8:51 pm on July 5, 2023:
    requestable_orphans is more like requestable_parents?

    glozow commented at 3:21 pm on July 24, 2023:
    (this is deleted now)
  15. in src/net_processing.cpp:5928 in b5ab45e595 outdated
    5922@@ -5926,6 +5923,22 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
    5923             }
    5924         }
    5925 
    5926+        auto requestable_orphans = m_txpackagetracker->GetOrphanRequests(pto->GetId(), current_time);
    5927+        for (const auto& gtxid : requestable_orphans) {
    5928+            if (AlreadyHaveTx(gtxid, /*include_orphanage=*/false)) {
    


    instagibbs commented at 9:00 pm on July 5, 2023:

    Suggested comment, something like:

    “We do not presume the parent will still be in the orphanage by the time a response is received, so we exclude the orphanage from the check when deciding what to request.”

    If that’s wrong, then it needs better explanation than what exists in the commit message :)


    glozow commented at 3:29 pm on July 24, 2023:

    Good point. The reason for excluding orphanage is actually not applicable yet so I have dropped it for now.

    This is really only applicable when we are requesting the ancpkginfo for a tx. We want to exclude orphanage because otherwise, AlreadyHaveTx will return true and we will never request ancpkginfos.

  16. in src/node/txpackagetracker.cpp:54 in b5ab45e595 outdated
    52+        for (const CTransactionRef& ptx : block.vtx) {
    53+            block_wtxids.insert(ptx->GetWitnessHash());
    54+        }
    55+        for (const auto& wtxid : wtxids_erased) {
    56+            if (block_wtxids.count(wtxid) == 0) {
    57+                conflicted_wtxids.insert(wtxid);
    


    instagibbs commented at 1:48 pm on July 6, 2023:

    why not just ForgetTxHash here? Why gate it on not being in the block?

    I’m presuming following commits will reveal the answer…


    glozow commented at 9:21 pm on August 14, 2023:
    Removed the special casing. I can’t remember why it comes into play later, but if I do, I’ll add it when it’s needed.
  17. in src/node/txpackagetracker.h:74 in b5ab45e595 outdated
    69+     */
    70+    void AddOrphanTx(NodeId nodeid, const uint256& wtxid, const CTransactionRef& tx, bool is_preferred, std::chrono::microseconds reqtime);
    71+
    72+    /** Number of packages we are working on with this peer. Includes any entries in the orphan
    73+     * tracker and in-flight requests. */
    74+    size_t Count(NodeId nodeid) const;
    


    instagibbs commented at 2:03 pm on July 6, 2023:
    in other words, it includes anything they’ve told us about, even if we haven’t acted on it in any way?

    glozow commented at 9:12 pm on August 14, 2023:
    Replaced the exposure of Count to CheckIsEmpty() functions
  18. in src/txorphanage.h:153 in 116378efc1 outdated
    123@@ -98,6 +124,9 @@ class TxOrphanage {
    124      *  transactions using their witness ids. */
    125     std::map<uint256, OrphanMap::iterator> m_wtxid_to_orphan_it GUARDED_BY(m_mutex);
    126 
    127+    /** Map from nodeid to the amount of orphans provided by this peer, in bytes. */
    


    instagibbs commented at 3:15 pm on July 6, 2023:
    Might be good to note that this will “multi-count” a single known orphan, counting the bytes for each node. Was wondering early in the PR why the update for this field and m_total_orphan_bytes wasn’t atomic

    glozow commented at 4:17 pm on July 24, 2023:
    Added in comment
  19. DrahtBot commented at 3:25 pm on July 6, 2023: contributor

    Looks like the CI fails:

    0�[0m�[0;31mp2p_orphan_handling.py                                 | ✖ Failed  | 2407 s
    
  20. glozow commented at 3:32 pm on July 6, 2023: member
    Investigating, thanks @DrahtBot
  21. in src/txorphanage.cpp:198 in d92b017f68 outdated
    194@@ -181,9 +195,15 @@ void TxOrphanage::AddChildrenToWorkSet(const CTransaction& tx)
    195         const auto it_by_prev = m_outpoint_to_orphan_it.find(COutPoint(tx.GetHash(), i));
    196         if (it_by_prev != m_outpoint_to_orphan_it.end()) {
    197             for (const auto& elem : it_by_prev->second) {
    198+                Assume(elem->second.announcers.size() >= 1);
    


    instagibbs commented at 3:37 pm on July 6, 2023:
    you allow this to be empty one line below with a break; just belt and suspenders?

    glozow commented at 4:17 pm on July 24, 2023:
    Yes, belt and suspenders - commented
  22. in src/txorphanage.cpp:201 in d92b017f68 outdated
    194@@ -181,9 +195,15 @@ void TxOrphanage::AddChildrenToWorkSet(const CTransaction& tx)
    195         const auto it_by_prev = m_outpoint_to_orphan_it.find(COutPoint(tx.GetHash(), i));
    196         if (it_by_prev != m_outpoint_to_orphan_it.end()) {
    197             for (const auto& elem : it_by_prev->second) {
    198+                Assume(elem->second.announcers.size() >= 1);
    199+                if (elem->second.announcers.empty()) break;
    200+                // Pick a random peer from announcers set.
    201+                FastRandomContext rng;
    


    instagibbs commented at 3:38 pm on July 6, 2023:
    initialize outside of loop?

    glozow commented at 4:17 pm on July 24, 2023:
    Done thanks
  23. in src/test/orphanage_tests.cpp:366 in b43033cfb8 outdated
    212+        BOOST_CHECK_EQUAL(orphanage.Size(), expected_total_count);
    213+        BOOST_CHECK_EQUAL(orphanage.TotalOrphanBytes(), expected_total_size);
    214+        BOOST_CHECK_EQUAL(orphanage.BytesFromPeer(node0), expected_node0_size);
    215+        BOOST_CHECK_EQUAL(orphanage.BytesFromPeer(node1), expected_node1_size);
    216+        // if EraseForPeer is called for an orphan with multiple announcers, the orphanage should only
    217+        // decrement the number of bytes for that peer.
    


    instagibbs commented at 3:50 pm on July 6, 2023:
    also check that EraseTxNoLock wasn’t called, i.e. m_orphans still has the orphan tx

    glozow commented at 4:21 pm on July 24, 2023:
    Added
  24. in src/test/orphanage_tests.cpp:383 in b43033cfb8 outdated
    227+        expected_total_size -= tx_size;
    228+        expected_node1_size -= tx_size;
    229+        BOOST_CHECK_EQUAL(orphanage.Size(), expected_total_count);
    230+        BOOST_CHECK_EQUAL(orphanage.TotalOrphanBytes(), expected_total_size);
    231+        BOOST_CHECK_EQUAL(orphanage.BytesFromPeer(node0), expected_node0_size);
    232+        BOOST_CHECK_EQUAL(orphanage.BytesFromPeer(node1), expected_node1_size);
    


    instagibbs commented at 3:50 pm on July 6, 2023:
    check that the orphan tx is now missing from m_orphans?

    glozow commented at 4:21 pm on July 24, 2023:
    Added
  25. in test/functional/p2p_segwit.py:2057 in 0d945feaf6 outdated
    2057@@ -2058,7 +2058,7 @@ def received_wtxidrelay():
    2058         test_transaction_acceptance(self.nodes[0], self.wtx_node, tx2, with_witness=True, accepted=False)
    2059 
    2060         # Expect a request for parent (tx) by txid despite use of WTX peer
    2061-        self.wtx_node.wait_for_getdata([tx.sha256], 60)
    


    instagibbs commented at 4:36 pm on July 6, 2023:
    can we programatically justify this magic :)
  26. instagibbs commented at 4:39 pm on July 6, 2023: member

    some initial comments through https://github.com/bitcoin/bitcoin/pull/28031/commits/116378efc1c9c1fe0d26cb42e2bdbb5770815c35

    Log changes suggested are helpful for tracing what’s happening in the orphanage on my node I’m testing.

  27. in src/txorphanage.cpp:34 in 116378efc1 outdated
    29 {
    30     LOCK(m_mutex);
    31+    if (tx == nullptr) {
    32+        Assume(false);
    33+        return false;
    34+    }
    


    dergoegge commented at 4:04 pm on July 12, 2023:

    Would be cleaner to write:

    0    if (!Assume(tx)) {
    1        return false;
    2    }
    

    glozow commented at 9:12 pm on August 14, 2023:
    Done
  28. in src/net_processing.cpp:1448 in 116378efc1 outdated
    1440@@ -1434,6 +1441,49 @@ void PeerManagerImpl::PushNodeVersion(CNode& pnode, const Peer& peer)
    1441     }
    1442 }
    1443 
    1444+void PeerManagerImpl::AddOrphanAnnouncer(NodeId nodeid, const uint256& orphan_wtxid, const CTransactionRef& tx, std::chrono::microseconds current_time)
    1445+{
    1446+    AssertLockHeld(::cs_main); // For m_txrequest
    1447+    const bool connected = m_connman.ForNode(nodeid, [](CNode* node) { return node->fSuccessfullyConnected && !node->fDisconnect; });
    1448+    if (!connected) return;
    


    dergoegge commented at 4:05 pm on July 12, 2023:
    This can be dropped in favor of just checking that the CNodeState exists

    glozow commented at 9:21 pm on August 14, 2023:
    This is gone now, replaced with registration of ConnectionInfo when the node first connects.
  29. in src/net_processing.cpp:1449 in 116378efc1 outdated
    1440@@ -1434,6 +1441,49 @@ void PeerManagerImpl::PushNodeVersion(CNode& pnode, const Peer& peer)
    1441     }
    1442 }
    1443 
    1444+void PeerManagerImpl::AddOrphanAnnouncer(NodeId nodeid, const uint256& orphan_wtxid, const CTransactionRef& tx, std::chrono::microseconds current_time)
    1445+{
    1446+    AssertLockHeld(::cs_main); // For m_txrequest
    1447+    const bool connected = m_connman.ForNode(nodeid, [](CNode* node) { return node->fSuccessfullyConnected && !node->fDisconnect; });
    1448+    if (!connected) return;
    1449+    if (m_txpackagetracker->Count(nodeid) + m_txrequest.Count(nodeid) >= MAX_PEER_TX_ANNOUNCEMENTS) {
    


    dergoegge commented at 4:09 pm on July 12, 2023:

    (I’ve mentioned this offline as well)

    The locking assumptions around this check are weird because m_txpackagetracker has its own internal mutex where as m_txrequest is guarded by cs_main. There is nothing stopping m_txpackagetracker->Count() form returning something different right after this check.


    glozow commented at 9:50 am on July 13, 2023:

    Yeah good point.

    It seems that, given the need to synchronize between TxRequestTracker and the package tracking stuff, we should have them both guarded by 1 lock. Looking at #26151#pullrequestreview-1116661944 it seems like we could have a m_txrequest GUARDED_BY(tx_download_mutex), m_txpackagetracker GUARDED_BY(tx_download_mutex), and lock it from these peerman functions?

    Alternatively, I wonder if it makes sense to take it a step further and put this all in a TxDownloadManager module that wraps orphanage, txrequest, and package tracking.

  30. in src/net_processing.cpp:1444 in 116378efc1 outdated
    1440@@ -1434,6 +1441,49 @@ void PeerManagerImpl::PushNodeVersion(CNode& pnode, const Peer& peer)
    1441     }
    1442 }
    1443 
    1444+void PeerManagerImpl::AddOrphanAnnouncer(NodeId nodeid, const uint256& orphan_wtxid, const CTransactionRef& tx, std::chrono::microseconds current_time)
    


    dergoegge commented at 4:10 pm on July 12, 2023:
    0void PeerManagerImpl::AddOrphanAnnouncer(NodeId nodeid, const CTransactionRef& tx, std::chrono::microseconds current_time)
    
  31. in src/txorphanage.cpp:26 in 116378efc1 outdated
    13@@ -15,14 +14,35 @@ static constexpr int64_t ORPHAN_TX_EXPIRE_TIME = 20 * 60;
    14 /** Minimum time between orphan transactions expire time checks in seconds */
    15 static constexpr int64_t ORPHAN_TX_EXPIRE_INTERVAL = 5 * 60;
    16 
    17+void TxOrphanage::SubtractOrphanBytes(unsigned int size, NodeId peer)
    


    dergoegge commented at 4:11 pm on July 12, 2023:
    Imo, it would be nicer to do the orphanage changes separately, including amended functional/unit/fuzz tests.

    glozow commented at 3:12 pm on September 14, 2023:

    Imo, it would be nicer to do the orphanage changes separately, including amended functional/unit/fuzz tests.

    Opened #28481 for orphanage changes

  32. in src/txorphanage.cpp:21 in 116378efc1 outdated
    13@@ -15,14 +14,35 @@ static constexpr int64_t ORPHAN_TX_EXPIRE_TIME = 20 * 60;
    14 /** Minimum time between orphan transactions expire time checks in seconds */
    15 static constexpr int64_t ORPHAN_TX_EXPIRE_INTERVAL = 5 * 60;
    16 
    17+void TxOrphanage::SubtractOrphanBytes(unsigned int size, NodeId peer)
    18+{
    19+    AssertLockHeld(m_mutex);
    20+    Assume(m_peer_bytes_used.count(peer) > 0);
    21+    Assume(m_peer_bytes_used.at(peer) >= size);
    


    dergoegge commented at 4:13 pm on July 12, 2023:

    Not a big fan of using Assume like this because we can’t test these conditions but they can happen in production (if the caller is doing something wrong).

    I’d suggest just returning if these assumptions don’t hold.


    glozow commented at 4:25 pm on July 24, 2023:
    Agree - returning when the assumption doesn’t hold
  33. in src/node/txpackagetracker.h:19 in 116378efc1 outdated
    14+class CBlock;
    15+class TxOrphanage;
    16+namespace node {
    17+static constexpr bool DEFAULT_ENABLE_PACKAGE_RELAY{false};
    18+
    19+class TxPackageTracker {
    


    dergoegge commented at 4:16 pm on July 12, 2023:
    Would be nice to add unit and fuzz tests for this
  34. in src/node/txpackagetracker.h:56 in 116378efc1 outdated
    51+    size_t OrphanageSize();
    52+
    53+    /** Should be called when a transaction is accepted to the mempool. If it was an orphan we were
    54+     * trying to resolve, remove its entries from the orphanage and other data structures. If it is
    55+     * the ancestor of an orphan, add the orphan to its associated peer's workset. */
    56+    void MempoolAcceptedTx(const CTransactionRef& ptx);
    


    dergoegge commented at 4:16 pm on July 12, 2023:
    0    void MempoolAcceptedTx(const CTransactionRef& tx);
    

    glozow commented at 4:19 pm on July 24, 2023:
    I actually think ptx is better, since it’s a pointer?
  35. in src/node/txpackagetracker.cpp:149 in 116378efc1 outdated
    144+            for (const auto& txin : ptx->vin) {
    145+                // We start with all parents, and then remove duplicates below.
    146+                unique_parents.push_back(txin.prevout.hash);
    147+            }
    148+            std::sort(unique_parents.begin(), unique_parents.end());
    149+            unique_parents.erase(std::unique(unique_parents.begin(), unique_parents.end()), unique_parents.end());
    


    dergoegge commented at 4:31 pm on July 12, 2023:
    0            // Add the orphan's parents. Net processing will filter out what we already have.
    1            // Deduplicate parent txids, so that we don't have to loop over
    2            // the same parent txid more than once down below.
    3            std::set<uint256> unique_parents;
    4            auto to_prevout = [](const CTxIn& in) { return in.prevout.hash; };
    5            std::transform(ptx->vin.begin(), ptx->vin.end(), std::inserter(unique_parents, unique_parents.begin()), to_prevout);
    
  36. in src/node/txpackagetracker.cpp:128 in 116378efc1 outdated
    123+        for (const auto& entry : expired) {
    124+            LogPrint(BCLog::TXPACKAGES, "\nTimeout of inflight %s %s from peer=%d\n", entry.second.IsWtxid() ? "ancpkginfo" : "orphan parent",
    125+                entry.second.GetHash().ToString(), entry.first);
    126+        }
    127+        std::vector<GenTxid> results;
    128+        for (const auto& gtxid : tracker_requestable) {
    


    dergoegge commented at 4:35 pm on July 12, 2023:

    Would you mind adding some spacing around logical blocks in your code? I find a lot of the new code hard to read with a lack of spacing

    e.g.:

     0    {
     1        AssertLockNotHeld(m_mutex);
     2        LOCK(m_mutex);
     3
     4        std::vector<std::pair<NodeId, GenTxid>> expired;
     5        auto tracker_requestable = orphan_request_tracker.GetRequestable(nodeid, current_time, &expired);
     6        for (const auto& entry : expired) {
     7            LogPrint(BCLog::TXPACKAGES, "\nTimeout of inflight %s %s from peer=%d\n", entry.second.IsWtxid() ? "ancpkginfo" : "orphan parent",
     8                entry.second.GetHash().ToString(), entry.first);
     9        }
    10
    11        std::vector<GenTxid> results;
    12        for (const auto& gtxid : tracker_requestable) {
    
  37. in test/functional/p2p_orphan_handling.py:160 in 116378efc1 outdated
    155+        orphan_wtxid, orphan_tx, parent_tx = self.create_package()
    156+        orphan_inv = CInv(t=MSG_WTX, h=int(orphan_wtxid, 16))
    157+
    158+        peer_inbound = node.add_p2p_connection(PeerTxRelayer())
    159+        peer_inbound.send_and_ping(msg_inv([orphan_inv]))
    160+        self.fastforward(TXREQUEST_TIME_SKIP)
    


    mzumsande commented at 6:56 pm on July 12, 2023:
    There should be a setmocktime call with the initial time somewhere before the first fastforward, otherwise this can fail intermittently.

    glozow commented at 4:42 pm on July 24, 2023:
    Thanks, added at the top of run_test
  38. in src/net_processing.cpp:2963 in 543273d96e outdated
    2959@@ -2960,9 +2960,10 @@ bool PeerManagerImpl::ProcessOrphanTx(Peer& peer)
    2960         const MempoolAcceptResult result = m_chainman.ProcessTransaction(porphanTx);
    2961         const TxValidationState& state = result.m_state;
    2962         const uint256& orphanHash = porphanTx->GetHash();
    2963+        const uint256 orphan_wtxid = porphanTx->GetWitnessHash();
    


    mzumsande commented at 7:35 pm on July 12, 2023:
    commit 543273d96e896adf5531ed961856aa0eb70cbe57: nit: uint256& like in the line above?

    glozow commented at 3:20 pm on July 24, 2023:
    Done, thanks
  39. in src/net_processing.cpp:5967 in 116378efc1 outdated
    5962+        for (const auto& gtxid : requestable_orphans) {
    5963+            if (AlreadyHaveTx(gtxid, /*include_orphanage=*/false)) {
    5964+                // We don't know that the transaction was rejected by mempool. But if the
    5965+                // transaction was added to mempool, we would have already called
    5966+                // MempoolAcceptedTx().
    5967+                m_txpackagetracker->MempoolRejectedTx(gtxid.GetHash());
    


    instagibbs commented at 7:39 pm on July 12, 2023:
    hmmm, this is a txid, not a wtxid, which is essentially a no-op unless it’s a non-segwit tx. What cases do we expect this MempoolRejectedTx is even needed? I added logging and will see if this case ever actually would be needed.

    instagibbs commented at 3:18 pm on July 20, 2023:
    also, adding wtxid logging at MEMPOOLREJ log is done would be <3

    glozow commented at 9:17 pm on August 14, 2023:
    Added MEMPOOLREJ log and successful AcceptToMemoryPool log for both orphan and non-orphan tx results.
  40. mzumsande commented at 10:26 pm on July 12, 2023: contributor
    some more comments, not finished yet though.
  41. in src/net_processing.cpp:2969 in 4cbb631753 outdated
    2965@@ -2966,7 +2966,7 @@ bool PeerManagerImpl::ProcessOrphanTx(Peer& peer)
    2966             LogPrint(BCLog::MEMPOOL, "   accepted orphan tx %s\n", orphan_wtxid.ToString());
    2967             RelayTransaction(orphanHash, porphanTx->GetWitnessHash());
    2968             m_txpackagetracker->AddChildrenToWorkSet(*porphanTx);
    2969-            m_txpackagetracker->EraseOrphanTx(orphanHash);
    2970+            m_txpackagetracker->EraseOrphanTx(porphanTx->GetWitnessHash());
    


    mzumsande commented at 10:31 pm on July 12, 2023:
    nit: could use orphan_wtxid introduced in the previous commit, here and in various other places.
  42. mzumsande commented at 8:47 pm on July 14, 2023: contributor

    Some thoughts about logging:

    After testing this on mainnet for a bit, I think it would be nice to be able to follow the fate of each individual orphan in BCLog::TXPACKAGES: We currently have a tx-level based log entry for addition (“stored orphan…"), but nothing for removal, so having additional log entries in MempoolAcceptedTx() for succesful resolution and MempoolRejectedTx() for failure would be nice, maybe also a tx-level message for removal due to expiration and overflow.

    If this became too spammy, we could use different severities, but I think it shouldn’t be too bad except if you just started with an empty mempool?

    Also, could we log wtxids whenever possible? There is currently a bit of a mix.

  43. in src/node/txpackagetracker.cpp:87 in b5ab45e595 outdated
    88+        LOCK(m_mutex);
    89+        // Skip if we weren't provided the tx and can't find the wtxid in the orphanage.
    90+        if (tx == nullptr && !m_orphanage.HaveTx(GenTxid::Wtxid(wtxid))) return;
    91+
    92+        // Even though this stores the orphan wtxid, GenTxid::Txid instead of Wtxid because we will be requesting the parents via txid.
    93+        orphan_request_tracker.ReceivedInv(nodeid, GenTxid::Txid(wtxid), is_preferred, reqtime);
    


    mzumsande commented at 9:23 pm on July 17, 2023:
    (moved over from 27742): The approach to add a wtxid disguised as a txid to distinguish between legacy orphan processing and package relay seems a bit like a hack to me. I’m not strictly against it, but I guess I don’t completely understand yet why it’s necessary - the information whether a peer supports package relay does not change, so why can’t we just always use GenTxid::Wtxid(wtxid) and check again in GetOrphanRequests() and elsewhere whether we do package relay with the peer instead of checking whether it’s a txid / wtxid?

    glozow commented at 1:16 pm on July 18, 2023:
    Ah good point, it’s not necessary, we could just use GenTxid::Wtxid for all of them. :+1:
  44. in src/node/txpackagetracker.h:24 in 116378efc1 outdated
    19+class TxPackageTracker {
    20+    class Impl;
    21+    const std::unique_ptr<Impl> m_impl;
    22+public:
    23+    struct Options {
    24+        unsigned int m_max_orphanage_count;
    


    ariard commented at 1:16 am on July 21, 2023:
    A comment can be added to precise this is the global not per-peer limit, at least on how it is used by LimitOrphans() (DEFAULT_MAX_ORPHAN_TRANSACTIONS).

    glozow commented at 4:49 pm on July 24, 2023:
    Added comment
  45. in src/node/txpackagetracker.cpp:156 in 116378efc1 outdated
    151+                results.emplace_back(GenTxid::Txid(txid));
    152+            }
    153+            // Mark the orphan as requested
    154+            orphan_request_tracker.RequestedTx(nodeid, gtxid.GetHash(), current_time + ORPHAN_ANCESTOR_GETDATA_INTERVAL);
    155+        }
    156+        if (!results.empty()) LogPrint(BCLog::TXPACKAGES, "\nRequesting %u items from peer=%d\n", results.size(), nodeid);
    


    instagibbs commented at 1:22 pm on July 21, 2023:
    This logging should be handled by the caller, since the caller sometimes isn’t actually requesting the items!
  46. DrahtBot added the label Needs rebase on Jul 25, 2023
  47. in src/txorphanage.h:81 in 29d9d326d5 outdated
    76+        return peer_bytes_it == m_peer_bytes_used.end() ? 0 : peer_bytes_it->second;
    77+    }
    78+
    79+    /** Remove a peer from an orphan's announcers list, erasing the orphan if this is the only peer
    80+     * who announced it. If the orphan doesn't exist or does not list this peer as an announcer, do nothing. */
    81+    void EraseOrphanOfPeer(const uint256& wtxid, NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
    


    achow101 commented at 6:39 pm on July 31, 2023:

    In 29d9d326d5193bb9a410a8881eabc93de5dd6266 “[txorphanage] track size of stored orphans, total and by peer”

    This function is unimplemented.


    glozow commented at 9:14 pm on August 14, 2023:
    Deleted from that commit
  48. in src/net_processing.cpp:4288 in 974d864419 outdated
    4284@@ -4286,6 +4285,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    4285                 if (RecursiveDynamicUsage(*ptx) < 100000) {
    4286                     AddToCompactExtraTransactions(ptx);
    4287                 }
    4288+                m_txpackagetracker->MempoolRejectedTx(tx.GetWitnessHash());
    


    achow101 commented at 6:53 pm on July 31, 2023:

    In 974d864419dd98be6e32dec3ee11f5082b060b1b “[refactor] make TxPackageTracker responsible for EraseTx and AddChildrenToWorkset”

    This line seems a bit unrelated to this commit as there is no pre-existing EraseTx.


    glozow commented at 9:14 pm on August 14, 2023:
    Rearranged the commits so it’s more clear where the erasure is new
  49. in src/node/txpackagetracker.cpp:15 in b5ab45e595 outdated
    10+#include <txrequest.h>
    11+#include <util/hasher.h>
    12 
    13 namespace node {
    14+    /** How long to wait before requesting orphan ancpkginfo/parents from an additional peer. */
    15+    static constexpr auto ORPHAN_ANCESTOR_GETDATA_INTERVAL{60s};
    


    achow101 commented at 6:58 pm on July 31, 2023:

    In b5ab45e595bbcedbd602b6385b83e9ffd983f216 “[p2p/refactor] make TxPackageTracker responsible for orphan resolution”

    nit: indentation


    glozow commented at 9:14 pm on August 14, 2023:
    fixed
  50. glozow marked this as a draft on Aug 3, 2023
  51. glozow commented at 2:09 pm on August 3, 2023: member
    Opened #28199 with tests that may help us ensure this PR isn’t changing specific behaviors. Marking this as draft as it depends on that one.
  52. glozow force-pushed on Aug 14, 2023
  53. glozow commented at 9:15 pm on August 14, 2023: member
    Reworked this into a TxDownloadManager (https://github.com/bitcoin/bitcoin/pull/28031#discussion_r1262316909) and addressed most comments.
  54. DrahtBot removed the label Needs rebase on Aug 14, 2023
  55. glozow force-pushed on Aug 15, 2023
  56. DrahtBot added the label Needs rebase on Aug 21, 2023
  57. achow101 referenced this in commit 5aa67eb365 on Aug 22, 2023
  58. glozow force-pushed on Aug 29, 2023
  59. DrahtBot removed the label Needs rebase on Aug 29, 2023
  60. glozow commented at 1:35 pm on August 29, 2023: member

    Some thoughts about logging:

    After testing this on mainnet for a bit, I think it would be nice to be able to follow the fate of each individual orphan in BCLog::TXPACKAGES: We currently have a tx-level based log entry for addition (“stored orphan…"), but nothing for removal, so having additional log entries in MempoolAcceptedTx() for succesful resolution and MempoolRejectedTx() for failure would be nice, maybe also a tx-level message for removal due to expiration and overflow.

    If this became too spammy, we could use different severities, but I think it shouldn’t be too bad except if you just started with an empty mempool?

    Also, could we log wtxids whenever possible? There is currently a bit of a mix.

    also, adding wtxid logging at MEMPOOLREJ log is done would be <3 @mzumsande @instagibbs I’ve split the commits adding new logs and wtxids to the logging to its own small PR, #28364

    (Also, rebased.)

  61. in src/txorphanage.cpp:229 in b34c7ba883 outdated
    233@@ -226,7 +234,7 @@ void TxOrphanage::EraseForBlock(const CBlock& block)
    234             if (itByPrev == m_outpoint_to_orphan_it.end()) continue;
    235             for (auto mi = itByPrev->second.begin(); mi != itByPrev->second.end(); ++mi) {
    236                 const CTransaction& orphanTx = *(*mi)->second.tx;
    237-                const uint256& orphanHash = orphanTx.GetHash();
    


    instagibbs commented at 3:31 pm on August 29, 2023:
    let’s rename orphanHash while we’re here to something that doesn’t sound like txid

    glozow commented at 3:02 pm on September 14, 2023:
    Renamed to orphan_wtxid in #28481
  62. in src/net_processing.cpp:4009 in 1ea66e8c2c outdated
    4247-            std::vector<uint256> unique_parents;
    4248-            unique_parents.reserve(tx.vin.size());
    4249-            for (const CTxIn& txin : tx.vin) {
    4250-                // We start with all parents, and then remove duplicates below.
    4251-                unique_parents.push_back(txin.prevout.hash);
    4252+            for (const CTransactionRef& removedTx : result.m_replaced_transactions.value()) {
    


    instagibbs commented at 3:48 pm on August 29, 2023:

    in 1ea66e8c2ce9e05e1346c1363196073eec554292 [refactor] consolidate invalid ATMP processing

    this loop is already taken care of in ProcessValidTx, no?

  63. in src/net_processing.cpp:2957 in 1ea66e8c2c outdated
    3106+                orphan_wtxid.ToString(),
    3107+                peer.m_id,
    3108+                state.ToString());
    3109+            // Ignoring the return value. Within orphan processing, we do not make
    3110+            // additional orphan resolution requests when a transaction is missing inputs.
    3111+            ProcessInvalidTx(porphanTx, peer.m_id, state);
    


    instagibbs commented at 3:56 pm on August 29, 2023:

    note: this is now being called in the case of TX_MISSING_INPUTS where it wasn’t prior

    is this a behavior change? Looks like if it has rejected parents additional things can happen.


    glozow commented at 3:27 pm on September 14, 2023:

    Yes, behavior change. To make more concrete:

    1. We get an orphan C with 2 missing parents, A and B
    2. We reject A
    3. We get B and accept it
    4. We add C to workset and process it in ProcessOrphanTx.
    5. Here, since we find A in recent_rejects, we also add C to recent_rejects when we wouldn’t have before.

    I think this is good actually… :thinking:


    instagibbs commented at 5:37 pm on September 15, 2023:
    IIRC I was fine with it, just making sure I understood!
  64. in src/txorphanage.h:71 in af6602e7ff outdated
    66+    unsigned int TotalOrphanBytes() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    67+    {
    68+        LOCK(m_mutex);
    69+        return m_total_orphan_bytes;
    70+    }
    71+    /** Return total amount of orphans stored by this transaction, in bytes. */
    


    instagibbs commented at 4:07 pm on August 29, 2023:
    0    /** Return total amount of orphans stored by this peer, in bytes. */
    

    glozow commented at 3:07 pm on September 14, 2023:
    haha, fixed in #28481
  65. in src/txorphanage.cpp:313 in 0e27954b71 outdated
    298@@ -266,3 +299,24 @@ void TxOrphanage::EraseForBlock(const CBlock& block)
    299         LogPrint(BCLog::TXPACKAGES, "Erased %d orphan tx included or conflicted by block\n", nErased);
    300     }
    301 }
    302+void TxOrphanage::EraseOrphanOfPeer(const uint256& wtxid, NodeId peer)
    303+{
    304+    AssertLockNotHeld(m_mutex);
    305+    LOCK(m_mutex);
    306+    // Nothing to do if this peer isn't storing any orphans.
    


    instagibbs commented at 4:08 pm on August 29, 2023:
    is this a necessary optimization? the next check would catch this case as well

    glozow commented at 3:01 pm on September 14, 2023:
    You’re right, removed in #28481
  66. in src/txorphanage.h:52 in ed557d07c6 outdated
    44@@ -45,13 +45,13 @@ class TxOrphanage {
    45 
    46     /** Maybe erase all orphans announced by a peer (eg, after that peer disconnects). If an orphan
    47      * has been announced by another peer, don't erase, just remove this peer from the list of announcers. */
    48-    void EraseForPeer(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
    49+    std::vector<uint256> EraseForPeer(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
    


    instagibbs commented at 4:23 pm on August 29, 2023:
    I don’t see it being used anywhere?

    glozow commented at 3:00 pm on September 14, 2023:
    Removed, I think in a past implementation I was using it but I’m not anymore
  67. in src/txorphanage.h:99 in ed557d07c6 outdated
    103@@ -101,6 +104,8 @@ class TxOrphanage {
    104         int64_t nTimeExpire;
    105         size_t list_pos;
    106         std::set<NodeId> announcers;
    107+        /** Txids of the missing parents to request. Determined by peerman. */
    


    instagibbs commented at 4:26 pm on August 29, 2023:

    in [txorphanage] store parent txids in OrphanTx

    Is this just the deduplicated set of txids to possibly fetch, if still in orphanage?


    glozow commented at 2:57 pm on September 14, 2023:
    Yes
  68. in src/net_processing.cpp:1883 in 94f9de3eac outdated
    1879@@ -1880,6 +1880,15 @@ void PeerManagerImpl::BlockConnected(const std::shared_ptr<const CBlock>& pblock
    1880         }
    1881     }
    1882 
    1883+    if (m_chainman.ActiveChain().Tip()->GetBlockHash() != hashRecentRejectsChainTip) {
    


    instagibbs commented at 4:52 pm on August 29, 2023:
    if a block is being connected, why don’t we just unconditionally wipe m_recent_rejects? Could delete hashRecentRejectsChainTip as well
  69. in src/node/txdownloadman.h:67 in 5d380c3b88 outdated
    67@@ -68,6 +68,13 @@ class TxDownloadManager {
    68 
    69     /** Should be called when a notfound for a tx has been received. */
    70     void ReceivedNotFound(NodeId nodeid, const std::vector<uint256>& txhashes) { m_impl->ReceivedNotFound(nodeid, txhashes); }
    71+
    72+    /** Add a new orphan transaction. Returns whether this orphan is going to be processed and the
    


    instagibbs commented at 5:22 pm on August 29, 2023:

    can already exist in orphanage

    0    /** Add a potentially new orphan transaction. Returns whether this orphan is going to be processed and the
    

    glozow commented at 3:43 pm on September 14, 2023:
    added
  70. in src/net_processing.cpp:4016 in 5d380c3b88 outdated
    4038-                    AddKnownTx(*peer, parent_txid);
    4039-                    m_txdownloadman.ReceivedTxInv(pfrom.GetId(), gtxid, current_time);
    4040-                }
    4041-
    4042-                if (m_txdownloadman.GetOrphanageRef().AddTx(ptx, pfrom.GetId(), unique_parents)) {
    4043+                const auto& [will_process_orphan, unique_parents] = m_txdownloadman.NewOrphanTx(ptx, pfrom.GetId(), current_time);
    


    instagibbs commented at 5:28 pm on August 29, 2023:

    I find the should_process/will_process interaction a bit non-obvious on the surface

    It’s detecting an orphan that you may want to fetch, then attempting to add to the orphanage, and checking if it’s newly-entered?


    glozow commented at 3:55 pm on September 14, 2023:
    Yep. We only want to AddToCompactExtraTransactions if this is a new orphan that we’re actually keeping. We might fail to add it to orphanage if it’s full.
  71. in src/net_processing.cpp:690 in f6e0d98b1f outdated
    687@@ -690,8 +688,7 @@ class PeerManagerImpl final : public PeerManager
    688     CTxMemPool& m_mempool;
    689 
    690     /** Protects tx download, rejection filter. */
    


    instagibbs commented at 5:38 pm on August 29, 2023:
    nothing being protected here anymore

    glozow commented at 3:53 pm on September 14, 2023:
    Removed comment
  72. in src/txrequest.h:205 in 711419ffc0 outdated
    200@@ -195,6 +201,9 @@ class TxRequestTracker {
    201     /** Count how many announcements are being tracked in total across all peers and transaction hashes. */
    202     size_t Size() const;
    203 
    204+    /** For some tx hash (either txid or wtxid), return all peers with non-COMPLETED announcements. */
    205+    std::vector<NodeId> GetCandidatePeers(const uint256& txhash) const;
    


    instagibbs commented at 5:41 pm on August 29, 2023:
    would it make sense to have the arg be a GenTxid

    glozow commented at 3:44 pm on September 14, 2023:
    Probably not?
  73. in src/node/txdownload_impl.cpp:35 in c437c48e60 outdated
    31 void TxDownloadImpl::BlockConnected(const CBlock& block, const uint256& tiphash)
    32     EXCLUSIVE_LOCKS_REQUIRED(!m_tx_download_mutex)
    33 {
    34     LOCK(m_tx_download_mutex);
    35-    m_orphanage.EraseForBlock(block);
    36+    const auto erased_wtxids = m_orphanage.EraseForBlock(block);
    


    instagibbs commented at 5:44 pm on August 29, 2023:
    can you push it down to where erased_wtxids is used since it’s only used once?

    glozow commented at 3:40 pm on September 14, 2023:
    Done
  74. in src/node/txdownload_impl.cpp:239 in c437c48e60 outdated
    229@@ -216,6 +230,39 @@ std::vector<GenTxid> TxDownloadImpl::GetRequestsToSend(NodeId nodeid, std::chron
    230     EXCLUSIVE_LOCKS_REQUIRED(!m_tx_download_mutex)
    231 {
    232     LOCK(m_tx_download_mutex);
    233+    // First process orphan resolution so that the tx requests can be sent asap
    234+    std::vector<std::pair<NodeId, GenTxid>> expired_orphan_resolution;
    235+    const auto orphans_ready = m_orphan_resolution_tracker.GetRequestable(nodeid, current_time, &expired_orphan_resolution);
    236+    // Expire orphan resolution attempts
    237+    for (const auto& [nodeid, orphan_gtxid] : expired_orphan_resolution) {
    238+        LogPrintf("timeout of in-flight orphan resolution %s for peer=%d\n", orphan_gtxid.GetHash().ToString(), nodeid);
    


    instagibbs commented at 5:50 pm on August 29, 2023:
    do we really want to log unconditionally?

    glozow commented at 3:42 pm on September 14, 2023:
    oop changed
  75. in src/node/txdownload_impl.cpp:262 in c437c48e60 outdated
    255+            }
    256+            m_orphan_resolution_tracker.RequestedTx(nodeid, orphan_gtxid.GetHash(),
    257+                                                    current_time + ORPHAN_ANCESTOR_GETDATA_INTERVAL);
    258+        } else {
    259+            LogPrint(BCLog::TXPACKAGES, "couldn't find parent txids to resolve orphan %s with peer=%d\n",
    260+                     nodeid, orphan_gtxid.GetHash().ToString());
    


    instagibbs commented at 6:00 pm on August 29, 2023:
    print args are flipped

    glozow commented at 3:43 pm on September 14, 2023:
    Fixed
  76. in src/node/txdownload_impl.cpp:289 in 7f9db92ba3 outdated
    254+                         txid.ToString(), nodeid, orphan_gtxid.GetHash().ToString());
    255+            }
    256+            m_orphan_resolution_tracker.RequestedTx(nodeid, orphan_gtxid.GetHash(),
    257+                                                    current_time + ORPHAN_ANCESTOR_GETDATA_INTERVAL);
    258+        } else {
    259+            LogPrint(BCLog::TXPACKAGES, "couldn't find parent txids to resolve orphan %s with peer=%d\n",
    


    instagibbs commented at 6:35 pm on August 29, 2023:
    under what circumstances is this expected? Eviction of the orphan in the orphanage, but outstanding resolution request? A comment to the expected conditions would be helpful for documentation.

    instagibbs commented at 8:05 pm on August 29, 2023:

    ok, it’s happening because of: https://github.com/bitcoin/bitcoin/pull/28031/files#r1309300862

    see logs:

    02023-08-29T19:19:00.066912Z [mempoolrej] 259b016ce9258c536a0a222728795aa6c22de47dbd6e166f4330e33ce075fe31 (wtxid=d6dc1525169700841c605385514a6069959f86bceccd33ede3fce714fa699219) from peer=2 was not accepted: bad-txns-inputs-missingorspent
    12023-08-29T19:19:00.066970Z [txpackages] stored orphan tx 259b016ce9258c536a0a222728795aa6c22de47dbd6e166f4330e33ce075fe31 (wtxid=d6dc1525169700841c605385514a6069959f86bceccd33ede3fce714fa699219) (mapsz 4 outsz 5)
    22023-08-29T19:19:00.066990Z [txpackages] adding peer=2 as a candidate for resolving orphan d6dc1525169700841c605385514a6069959f86bceccd33ede3fce714fa699219
    32023-08-29T19:19:00.067015Z [txpackages] adding peer=6 as a candidate for resolving orphan d6dc1525169700841c605385514a6069959f86bceccd33ede3fce714fa699219
    42023-08-29T19:19:00.067027Z [txpackages] added peer=6 as announcer of orphan tx d6dc1525169700841c605385514a6069959f86bceccd33ede3fce714fa699219
    52023-08-29T19:19:04.361970Z [txpackages] adding peer=21 as a candidate for resolving orphan 259b016ce9258c536a0a222728795aa6c22de47dbd6e166f4330e33ce075fe31
    62023-08-29T19:19:06.399524Z [txpackages] couldn't find parent txids to resolve orphan 21 with peer=259b016ce9258c536a0a222728795aa6c22de47dbd6e166f4330e33ce075fe31
    72023-08-29T19:20:02.183893Z timeout of in-flight orphan resolution d6dc1525169700841c605385514a6069959f86bceccd33ede3fce714fa699219 for peer=6
    82023-08-29T19:21:02.206171Z timeout of in-flight orphan resolution d6dc1525169700841c605385514a6069959f86bceccd33ede3fce714fa699219 for peer=2
    92023-08-29T19:21:02.206223Z [txpackages]    removed orphan tx 259b016ce9258c536a0a222728795aa6c22de47dbd6e166f4330e33ce075fe31 (wtxid=d6dc1525169700841c605385514a6069959f86bceccd33ede3fce714fa699219)
    
  77. instagibbs commented at 6:47 pm on August 29, 2023: member

    went over new-er implementation through 7f9db92ba3f483f78bd248406d921bd73b1e7054

    dropping some comments

    fuzzer fail:

    0node/txdownload_impl.cpp:18 ConnectedPeer: Assertion `m_peer_info.count(nodeid) == 0' failed.
    
  78. in src/node/txdownload_impl.cpp:199 in 7f9db92ba3 outdated
    194+
    195+void TxDownloadImpl::AddTxAnnouncement(NodeId peer, const GenTxid& gtxid, std::chrono::microseconds now)
    196+    EXCLUSIVE_LOCKS_REQUIRED(m_tx_download_mutex)
    197+{
    198+    if (!Assume(m_peer_info.count(peer) > 0)) return;
    199+    if (m_orphanage.HaveTx(gtxid)) AddOrphanAnnouncer(peer, gtxid.GetHash(), now);
    


    instagibbs commented at 8:05 pm on August 29, 2023:
    this adds an announcer even if the INV is txid type, even though the assumption is wtxid arg

    instagibbs commented at 3:54 pm on August 30, 2023:
    I think the simplest solution is just filter this addition based on wtxid-ness, since this only helps in the multi-announcer case, which wouldn’t be a regression, iiuc

    glozow commented at 11:57 am on September 4, 2023:
    ooh thanks :+1:

    glozow commented at 7:57 am on September 13, 2023:
    Fixed, just did a gate on IsWtxid. Don’t think that many non-wtxidrelay nodes are still out there anyway.
  79. in src/node/txdownload_impl.cpp:250 in 7f9db92ba3 outdated
    245+        const auto parent_txids{m_orphanage.GetParentTxids(orphan_gtxid.GetHash())};
    246+        if (parent_txids.has_value()) {
    247+            if (!Assume(m_peer_info.count(nodeid) > 0)) continue;
    248+            const auto& info = m_peer_info.at(nodeid).m_connection_info;
    249+            for (const auto& txid : *parent_txids) {
    250+                // Schedule with no delay. It should be requested immediately
    


    instagibbs commented at 3:53 pm on August 30, 2023:
    0                // Schedule with no delay, i.e. not using ReceivedTxInv(). It should be requested immediately
    

    glozow commented at 7:56 am on September 13, 2023:
    Added
  80. achow101 referenced this in commit 5666966dff on Aug 31, 2023
  81. DrahtBot added the label Needs rebase on Aug 31, 2023
  82. Frank-GER referenced this in commit 52cce72c8f on Sep 8, 2023
  83. instagibbs commented at 7:35 pm on September 12, 2023: member
    think this can be rebased since tests are merged
  84. glozow force-pushed on Sep 13, 2023
  85. glozow commented at 7:56 am on September 13, 2023: member
    Rebased but not all comments addressed yet, still working on it.
  86. DrahtBot removed the label Needs rebase on Sep 13, 2023
  87. glozow force-pushed on Sep 14, 2023
  88. glozow renamed this:
    Package Relay 1/3: Introduce TxPackageTracker as Orphan Resolution Module
    Package Relay 1/3: Introduce TxDownloadManager and improve orphan-handling
    on Sep 15, 2023
  89. glozow force-pushed on Sep 29, 2023
  90. DrahtBot added the label Needs rebase on Oct 2, 2023
  91. glozow force-pushed on Oct 3, 2023
  92. glozow force-pushed on Oct 3, 2023
  93. DrahtBot removed the label Needs rebase on Oct 3, 2023
  94. glozow force-pushed on Oct 12, 2023
  95. in src/node/txdownload_impl.cpp:209 in 36f4d0d544 outdated
    270+            const auto& info = m_peer_info.at(nodeid).m_connection_info;
    271+            for (const auto& txid : *parent_txids) {
    272+                // Schedule with no delay instead of using ReceivedTxInv. This means it's scheduled
    273+                // for request immediately unless there is already a request out for the same txhash
    274+                // (e.g. if there is another orphan that needs this parent).
    275+                m_txrequest.ReceivedInv(nodeid, GenTxid::Txid(txid), info.m_preferred, current_time);
    


    instagibbs commented at 8:56 pm on October 12, 2023:

    caught by your new fuzzer: Note that the parent txid is already in the orphanage by this line, we try to re-request something we already have locally in our orphanage.

    0                if (m_orphanage.HaveTxAndPeer(GenTxid::Txid(txid), nodeid)) continue;
    1                m_txrequest.ReceivedInv(nodeid, GenTxid::Txid(txid), info.m_preferred, current_time);
    
  96. in src/test/fuzz/txorphan.cpp:97 in 36f4d0d544 outdated
    92@@ -91,8 +93,11 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
    93                     {
    94                         CTransactionRef ref = orphanage.GetTxToReconsider(peer_id);
    95                         if (ref) {
    96-                            bool have_tx = orphanage.HaveTx(GenTxid::Txid(ref->GetHash())) || orphanage.HaveTx(GenTxid::Wtxid(ref->GetHash()));
    97-                            Assert(have_tx);
    98+                            Assert(orphanage.HaveTx(GenTxid::Txid(ref->GetHash())) ||
    99+                                   orphanage.HaveTx(GenTxid::Wtxid(ref->GetHash())));
    


    instagibbs commented at 4:29 pm on October 13, 2023:

    Looks like this happens in a few places? wouldn’t type safety for the constructor be nice

    0                                   orphanage.HaveTx(GenTxid::Wtxid(ref->GetWitnessHash())));
    

    glozow commented at 4:20 pm on October 16, 2023:
    hahahahaha Unfortunately #28107 doesn’t help in this case since this function takes a gentxid, but I’ve rebased on top of that PR.
  97. in src/txorphanage.cpp:157 in 36f4d0d544 outdated
    219         size_t randompos = rng.randrange(m_orphan_list.size());
    220-        EraseTxNoLock(m_orphan_list[randompos]->first);
    221+        const auto& wtxid = m_orphan_list[randompos]->second.tx->GetWitnessHash();
    222+        expired_and_evicted.emplace_back(wtxid);
    223+        EraseTxNoLock(wtxid);
    224         ++nEvicted;
    


    instagibbs commented at 6:10 pm on October 13, 2023:
    we should += like the above call or do ++ for both I think

    glozow commented at 1:56 pm on October 17, 2023:
    done
  98. in src/txorphanage.cpp:331 in 36f4d0d544 outdated
    329     if (vOrphanErase.size()) {
    330         int nErased = 0;
    331-        for (const uint256& orphanHash : vOrphanErase) {
    332-            nErased += EraseTxNoLock(orphanHash);
    333+        for (const uint256& orphan_wtxid : vOrphanErase) {
    334+            nErased += EraseTxNoLock(orphan_wtxid);
    


    instagibbs commented at 6:28 pm on October 13, 2023:

    for fuzz/txorphan

    0echo Kv//////////////lJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSU//////////////////////////////+A////////kEEKJf//ev///////8P///////8AAIH//wkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkB/wD8///SQP///////////////////////////////////4D///////+QQQol//96////////w/////////8KJf//ev//////////kEEKJf//1v///wP/////kEH///+QQQo1//8BPwAAv67Kf/////8u//////////////////////////////////////////////////////////////////////////96AP/1///////S0P//IP96AP////////+A////////kEEKJf//ev//////////kEEKJf//1v///wP/////kEH///+QQQo1//96/////P////+QQQol/yX//3r//////////wol///W//////////97e3t7e3t7e3t7e3t7e3t7e3t7e/+QQQol//96//////////////8s/yb/ | base64 -d > crash.txt
    

    These call EraseTxNoLock without doing m_peer_work_set management: TxOrphanage::EraseForBlock TxOrphanage::LimitOrphans

    This results in fuzzer crashes where it’s (still) in the peer’s work set, but not counted as an announcer since it’s been evicted(then later sent).

    Any time EraseTx{NoLock} is called we need to make sure that m_peer_work_set is appropriately being cleaned.

    Two other spots I haven’t thought through yet but I suspect need the same fix: TxDownloadImpl::MempoolAcceptedTx TxDownloadImpl::MempoolRejectedTx

    edit:

    latter two mempool things are called; cleanup needs to happen! working on it


    instagibbs commented at 2:46 pm on October 16, 2023:

    pushed another couple fixes to https://github.com/instagibbs/bitcoin/tree/orphan-resolution-module-oct-12 , was same issue as before just in

    TxDownloadImpl::MempoolAcceptedTx TxDownloadImpl::MempoolRejectedTx

    as noted before


    glozow commented at 8:56 am on October 17, 2023:

    Is it a problem to lazily handle transactions in peer work sets that have been deleted from the orphanage? It’s cleaner to remove_work_from_all_sets of course, but it iterates through every entry in m_peer_work_set so there is a performance difference but the external outcome is the same.

    I think erasing proactively in EraseOrphanOfPeer is necessary because the tx isn’t deleted. Trying to avoid returning the tx in GetTxToReconsider(peer) after we’ve already decided to give up on (tx, peer).


    glozow commented at 9:11 am on October 17, 2023:

    TxOrphanage::EraseForBlock TxOrphanage::LimitOrphans

    This results in fuzzer crashes where it’s (still) in the peer’s work set, but not counted as an announcer since it’s been evicted(then later sent).

    Question: is it crashing because the tx is being returned from GetTxToReconsider? Seems weird as it should return a nullptr if it gets a tx that isn’t there anymore…

    OOhhhhhhhhh could it be that it’s re-added? My guess:

    1. AddTx(tx_child, peer)
    2. AddChildrenToWorkSet(tx_parent) -> peer workset contains tx_child
    3. EraseTx(tx_child) -> peer workset still contains tx_child but we expect orphanage to lazily delete that when we call GetTxToReconsider
    4. AddTx(tx_child, other_peer) (makes no sense but we’re fuzzing)
    5. GetTxToReconsider(peer) returns tx_child even though it’s not in orphan_resolution_tracker

    instagibbs commented at 1:43 pm on October 17, 2023:

    Is it a problem to lazily handle transactions in peer work sets that have been deleted from the orphanage?

    Since GetTxToReconsider appears to be used in a single spot, and it really doesn’t matter that it once was announced, then erased, then reconsidered for a peer, I suppose the fuzz test should just be modified to not crash :)


    glozow commented at 1:59 pm on October 17, 2023:
    Removed Assert(HaveTxAndPeer()) in the fuzzer, and added comments for when we clean up the work sets and when we don’t
  99. instagibbs commented at 6:47 pm on October 13, 2023: member

    Some not snapshot-ready fixes for fuzz crashes I worked through:

    https://github.com/instagibbs/bitcoin/commit/5388830c3c5bb0ed9a1d1379ec94b285de68aa9a

  100. in src/test/fuzz/txdownloadman.cpp:428 in 36f4d0d544 outdated
    381+                // nullptr, as the transaction could have been removed from orphanage without being
    382+                // removed from the peer's workset.
    383+                if (ptx) {
    384+                    // However, if there was a non-null tx in the workset, HaveMoreWork should have
    385+                    // returned true.
    386+                    Assert(expect_work);
    


    instagibbs commented at 2:21 pm on October 16, 2023:
    should be able to assert that there also exists an announcement for this node?

    glozow commented at 8:45 am on October 17, 2023:
    an entry in orphan_resolution_tracker?

    instagibbs commented at 1:46 pm on October 17, 2023:
    Was going to suggest adding a check for OrphanTx.announcers, but it may not actually exist as discussed elsewhere.

    glozow commented at 2:00 pm on October 17, 2023:
    That, and we can’t check announcers because it’s private
  101. glozow force-pushed on Oct 16, 2023
  102. DrahtBot added the label Needs rebase on Oct 16, 2023
  103. glozow force-pushed on Oct 17, 2023
  104. instagibbs commented at 2:04 pm on October 17, 2023: member

    worked through https://github.com/bitcoin/bitcoin/pull/28031/commits/e2e262fc1f6ea41d6f46b8db713c12df035464f6

    All suggestions taken(or withdrawn by myself).

  105. glozow commented at 2:44 pm on October 20, 2023: member
    Note I know there’s a “needs rebase” label but that’s because I rebased on top of 28107
  106. glozow force-pushed on Dec 18, 2023
  107. DrahtBot removed the label Needs rebase on Dec 18, 2023
  108. glozow force-pushed on Jan 2, 2024
  109. DrahtBot added the label Needs rebase on Mar 9, 2024
  110. achow101 referenced this in commit d813ba1bc4 on Apr 30, 2024
  111. glozow force-pushed on Jul 31, 2024
  112. DrahtBot removed the label CI failed on Jul 31, 2024
  113. DrahtBot removed the label Needs rebase on Jul 31, 2024
  114. DrahtBot added the label Needs rebase on Aug 1, 2024
  115. glozow force-pushed on Aug 1, 2024
  116. DrahtBot removed the label Needs rebase on Aug 1, 2024
  117. DrahtBot added the label Needs rebase on Aug 5, 2024
  118. glozow force-pushed on Aug 12, 2024
  119. glozow force-pushed on Aug 28, 2024
  120. DrahtBot removed the label Needs rebase on Aug 28, 2024
  121. DrahtBot commented at 2:41 am on August 29, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29377989202

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  122. DrahtBot added the label CI failed on Aug 29, 2024
  123. DrahtBot added the label Needs rebase on Sep 2, 2024
  124. glozow force-pushed on Sep 6, 2024
  125. achow101 referenced this in commit 7b66815b16 on Oct 29, 2024
  126. [txrequest] GetCandidatePeers and ResetRequestTimeout 370e7708fb
  127. [refactor] change type of unique_parents to Txid 938659fbd8
  128. [txorphanage] store parent txids in OrphanTx 7badc7e004
  129. [txorphanage] support multiple announcers
    Add ability to add and track multiple announcers per orphan transaction,
    erasing announcers but not the entire orphan.
    
    The tx creation code in orphanage_tests needs to be updated so that each
    tx is unique, because the CountOrphans() check assumes that calling
    EraseForPeer necessarily means its orphans are deleted.
    acaa7c13fa
  130. [txorphanage] return wtxids from EraseForBlock d8bdb69652
  131. [txorphanage] return erased wtxids from LimitOrphans 83d276c7dd
  132. [unit test] multiple announcers fa8e543278
  133. [fuzz] orphanage multiple announcer functions a0b1953d4f
  134. glozow force-pushed on Nov 28, 2024
  135. DrahtBot removed the label Needs rebase on Nov 28, 2024
  136. glozow force-pushed on Nov 28, 2024
  137. glozow force-pushed on Nov 28, 2024
  138. [txdownload] remove unique_parents that we already have
    This means we no longer return parents we already have in the
    m_unique_parents result from MempoolRejectedTx.
    
    We need to separate the loop that checks AlreadyHave parents from the
    loop that adds parents as announcments, because we may do the latter
    loop multiple times for different peers.
    b64be9ed2c
  139. [p2p] try multiple peers for orphan resolution c04d1a876c
  140. [functional test] orphan handling with multiple announcers 780e28cf9a
  141. [p2p] only attempt 1p1c when both txns provided by the same peer
    Now that we track all announcers of an orphan, it's not helpful to
    consider an orphan provided by a peer that didn't send us this parent.
    It can only hurt our chances of finding the right orphan when there are
    multiple candidates.
    
    Adapt the 2 tests in p2p_opportunistic_1p1c.py that looked at 1p1c
    packages from different peers. Instead of checking that the right peer
    is punished, we now check that the package is not submitted. We can't
    use the functional test to see that the package was not considered
    because the behavior is indistinguishable (except for the logs).
    3e16a36959
  142. [p2p] grab m_tx_download_mutex only when needed
    The places where m_tx_download_mutex is held in between calls to
    txdownloadman are unnecessary.
    
    For example, m_tx_download_mutex was previously held throughout these steps:
      1. ReceivedTx(), receive should_validate=true
      2. MempoolRejectedTx(tx), receive package_to_validate={parent, child}
      3. MempoolRejectedPackage(parent, child)
      4. MempoolRejectedTx(parent)
      5. MempoolRejectedTx(child)
    
    But this is not necessary for correctness. If, for example,
    BlockConnected is called in between any of the steps, txdownloadman's
    internal state will remain consistent, even if the relevant
    transaction(s) confirmed. See the txdownloadman fuzzers.
    
    This sets up the next commit which moves locking into TxDownloadManagerImpl.
    51b7d4d052
  143. [refactor] make Find1P1CPackage internal 87783ac69a
  144. [refactor] split AlreadyHaveTx into internal and external versions 3fbc0021f3
  145. add internal Mutex to TxDownloadManagerImpl
    It is named slightly differently to make it easy to check that
    m_tx_download_mutex is fully removed in the next commit.
    214d893822
  146. remove m_tx_download_mutex in net_processing 0a85e99511
  147. glozow force-pushed on Nov 28, 2024
  148. DrahtBot removed the label CI failed on Nov 28, 2024
  149. glozow commented at 5:27 pm on December 1, 2024: member
    Rebased. I have split the first 12 commits (the majority of this PR) into #31397. The other 5 commits are for making the module internally thread-safe, and can be opened in another PR afterward.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me