p2p: Fill reconciliation sets (Erlay) attempt 2 #30116

pull sr-gi wants to merge 10 commits into bitcoin:master from sr-gi:2023-11-erlay2.1 changing 5 files +949 −26
  1. sr-gi commented at 8:01 pm on May 15, 2024: member

    This is a re-attempt of #28765

    The main differences from it are:

    • Most outstanding comments have been addressed (or responded to on the original PR)
    • The description of how a node is picked in IsFanoutTarget has been updated to reflect what the algorithm is doing (not how it is doing it)
    • The way hash_key is seeded in IsFanoutTarget has changed (from m_k0 to node_id). This is to prevent using m_k0 for something it is not intended to, given what data we feed to the randomizer should not matter for this
    • The cache in IsFanoutTarget was only being fed data to, but never checked. This has been reworked to make use of it
    • destinations/target has been renamed to n in ShouldFanoutTo/IsFanoutTarget
    • The PR has been rebased

    Also, the following things have been added/changed to the approach:

    • Deal with short id collisions by fanning out both transactions
    • Deal with ancestors being present on the mempool when a child needs to be placed in the fanout or reconciliation sets (as opposed to dealing with parents, which was the wrong approach)
    • Move fanout/reconcile logic from INV creation to RelayTransaction
      • Create a delayed set where transactions that are added for reconciliation but still not offered to peers (for privacy reasons) live. This is analogous to how fanout works, where transactions are only made available on trickle intervals

    Erlay Project Tracking: https://github.com/bitcoin/bitcoin/issues/30249

  2. DrahtBot commented at 8:01 pm on May 15, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30116.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31001 (refactor: ensure type safety for txid and wtxid in RelayTransaction by marcofleon)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on May 15, 2024
  4. sr-gi commented at 8:05 pm on May 15, 2024: member
    I’ve talked to @naumenkogs about picking this up and he was happy about it. I’m happy to close this if he changes his mind.
  5. sr-gi renamed this:
    p2p: Fill reconciliation sets (Erlay) attempt: 2
    p2p: Fill reconciliation sets (Erlay) attempt 2
    on May 15, 2024
  6. in src/node/txreconciliation.h:127 in 7c047d30cb outdated
    80+     * Returns whether the transaction appears in the set.
    81+     */
    82+    bool AddToSet(NodeId peer_id, const Wtxid& wtxid);
    83+
    84+    /**
    85+     * Before Step 2, we might want to remove a wtxid from the reconciliation set, for example if
    


    brunoerg commented at 10:03 pm on May 15, 2024:
    In “p2p: Functions to add/remove wtxids to tx reconciliation sets” 7c047d30cb0eadc8a424cb01de2fcd0978e22206: What is the step 2? This comment seems a little confuse because I couldn’t find it as I see for Step 1.

    sr-gi commented at 1:44 pm on May 16, 2024:
    I think this refers to the steps listed at the beginning of the file. Not all of them are covered yet. This PR leaves this right before 2 AFAICT

    brunoerg commented at 3:02 pm on May 16, 2024:
    My bad, missed that. Thanks.
  7. in src/node/txreconciliation.cpp:180 in 7c047d30cb outdated
    176+                          wtxid.ToString(), peer_id, peer_state->m_local_set.size());
    177+        }
    178+        return true;
    179+    }
    180+
    181+    bool TryRemovingFromSet(NodeId peer_id, const Wtxid& wtxid_to_remove) EXCLUSIVE_LOCKS_REQUIRED(!m_txreconciliation_mutex)
    


    brunoerg commented at 10:06 pm on May 15, 2024:
    In “p2p: Functions to add/remove wtxids to tx reconciliation sets” https://github.com/bitcoin/bitcoin/commit/7c047d30cb0eadc8a424cb01de2fcd0978e22206: Perhaps adding a log when removing from set?

    sr-gi commented at 2:51 pm on May 16, 2024:
    Updated to track both successful and unsuccessful deletions (in debug log, to prevent unnecessarily spamming the logs)
  8. in src/bench/txreconciliation.cpp:7 in 3f59bf83a4 outdated
    0@@ -0,0 +1,40 @@
    1+// Copyright (c) 2020-2022 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#include <bench/bench.h>
    6+
    7+#include <net.h>
    


    brunoerg commented at 1:02 pm on May 16, 2024:
    In “add bench for ShouldFanoutTo” 3f59bf83a41b7787f9c43c277b5e62f327a72c3e: net.h include seems unnecessary.

    sr-gi commented at 2:57 pm on May 16, 2024:
    Removed
  9. sr-gi force-pushed on May 16, 2024
  10. sr-gi force-pushed on May 16, 2024
  11. sr-gi force-pushed on May 16, 2024
  12. in src/test/txreconciliation_tests.cpp:110 in 2e29e9beee outdated
    105+    NodeId peer_id1 = 1;
    106+    tracker.PreRegisterPeer(peer_id1);
    107+    BOOST_REQUIRE_EQUAL(tracker.RegisterPeer(peer_id1, true, 1, 1), ReconciliationRegisterResult::SUCCESS);
    108+    BOOST_CHECK(tracker.IsPeerRegistered(peer_id1));
    109+
    110+    for (size_t i = 0; i < 3000; ++i)
    


    brunoerg commented at 2:51 pm on May 22, 2024:

    In “p2p: Functions to add/remove wtxids to tx reconciliation sets” 2e29e9beeebcadfc7afc56a28791456d40551bed: I think MAX_SET_SIZE could be in txreconciliation.h then we can use it in the tests.

     0-/**
     1- * Maximum number of wtxids stored in a peer local set, bounded to protect the memory use of
     2- * reconciliation sets and short ids mappings, and CPU used for sketch computation.
     3- */
     4-constexpr size_t MAX_SET_SIZE = 3000;
     5 
     6 /**
     7  * Maximum number of transactions for which we store assigned fanout targets.
     8diff --git a/src/node/txreconciliation.h b/src/node/txreconciliation.h
     9index c2cdf9875f..648b6369c0 100644
    10--- a/src/node/txreconciliation.h
    11+++ b/src/node/txreconciliation.h
    12@@ -14,6 +14,12 @@
    13 /** Supported transaction reconciliation protocol version */
    14 static constexpr uint32_t TXRECONCILIATION_VERSION{1};
    15 
    16+/**
    17+ * Maximum number of wtxids stored in a peer local set, bounded to protect the memory use of
    18+ * reconciliation sets and short ids mappings, and CPU used for sketch computation.
    19+ */
    20+constexpr size_t MAX_SET_SIZE = 3000;
    21+
    22 enum class ReconciliationRegisterResult {
    23     NOT_FOUND,
    24     SUCCESS,
    25diff --git a/src/test/txreconciliation_tests.cpp b/src/test/txreconciliation_tests.cpp
    26index a0071e4388..8befe0d36e 100644
    27--- a/src/test/txreconciliation_tests.cpp
    28+++ b/src/test/txreconciliation_tests.cpp
    29@@ -115,7 +115,7 @@ BOOST_AUTO_TEST_CASE(AddToSetTest)
    30     BOOST_REQUIRE_EQUAL(tracker.RegisterPeer(peer_id1, true, 1, 1), ReconciliationRegisterResult::SUCCESS);
    31     BOOST_CHECK(tracker.IsPeerRegistered(peer_id1));
    32 
    33-    for (size_t i = 0; i < 3000; ++i)
    34+    for (size_t i = 0; i < MAX_SET_SIZE; ++i)
    35         BOOST_REQUIRE(tracker.AddToSet(peer_id1, Wtxid::FromUint256(frc.rand256())));
    36     BOOST_REQUIRE(!tracker.AddToSet(peer_id1, Wtxid::FromUint256(frc.rand256())));
    

    sr-gi commented at 2:59 pm on May 22, 2024:
    I’ve been talking to @sipa about this recently and I think the limit may not be necessary if the approach for when to compute the reconciliation sets is changed. I’m may get rid of it, but if not, I’ll take this

    sr-gi commented at 3:05 pm on May 30, 2024:

    Ended up taking this, since the limit will be kept in the end.

    Thanks!

  13. sr-gi force-pushed on May 30, 2024
  14. sr-gi force-pushed on May 31, 2024
  15. sr-gi commented at 4:29 pm on May 31, 2024: member

    I’ve slightly extended the approach adding 3 commits to deal with short id collisions, which were not taken into account. Some of this may be squashable, I’ve added them separately for now so they are easy to diff/review.

    This would be missing an additional commit/amend to deal with #28765 (review), which I overlooked when addressing the outstanding comments

  16. sr-gi force-pushed on May 31, 2024
  17. DrahtBot added the label CI failed on May 31, 2024
  18. DrahtBot commented at 4:32 pm on May 31, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/25659971810

  19. sr-gi force-pushed on Jun 3, 2024
  20. sr-gi force-pushed on Jun 3, 2024
  21. sr-gi force-pushed on Jun 3, 2024
  22. sr-gi force-pushed on Jun 3, 2024
  23. sr-gi force-pushed on Jun 3, 2024
  24. DrahtBot removed the label CI failed on Jun 3, 2024
  25. sr-gi force-pushed on Jun 7, 2024
  26. sr-gi marked this as a draft on Jun 7, 2024
  27. sr-gi commented at 6:21 pm on June 7, 2024: member

    I added two more commits, moving the fanout/reconciling logic to RelayTransaction instead of send message, plus dealing with ancestors in mempool, instead of descendants (which seemed to be the wrong approach).

    I’m moving this to draft for now until I clean it a bit, plus get some feedback on the approach

  28. DrahtBot added the label CI failed on Jun 8, 2024
  29. DrahtBot commented at 5:17 am on June 8, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/25955152377

  30. Saraeutsza approved
  31. sr-gi force-pushed on Jun 10, 2024
  32. sr-gi force-pushed on Jun 10, 2024
  33. in src/node/txreconciliation.cpp:488 in f71575cf43 outdated
    424@@ -390,7 +425,13 @@ TxReconciliationTracker::~TxReconciliationTracker() = default;
    425 
    426 uint64_t TxReconciliationTracker::PreRegisterPeer(NodeId peer_id)
    427 {
    428-    return m_impl->PreRegisterPeer(peer_id);
    429+    const uint64_t local_salt{GetRand(UINT64_MAX)};
    430+    return m_impl->PreRegisterPeer(peer_id, local_salt);
    


    Prabhat1308 commented at 11:58 am on June 11, 2024:
    What is the intention behind moving salt calculation out of PreRegisterPeer ? Now if PreRegisterPeer is not executed due to lock being held we are doing meaningless computation .

    sr-gi commented at 5:09 pm on June 11, 2024:
    The purpose is for PreRegisterPeer to be callable with a fixed salt (via PreRegisterPeerWithSalt), so collisions can be tested. This is just moving it out of the PImpl, the effects on the external caller should be the same
  34. sr-gi force-pushed on Jun 11, 2024
  35. sr-gi force-pushed on Jun 11, 2024
  36. DrahtBot removed the label CI failed on Jun 12, 2024
  37. sr-gi force-pushed on Jun 13, 2024
  38. sr-gi force-pushed on Jun 13, 2024
  39. sr-gi force-pushed on Jun 13, 2024
  40. sr-gi marked this as ready for review on Jun 13, 2024
  41. sr-gi commented at 8:01 pm on June 13, 2024: member
    This should be ready for review now.
  42. in src/net_processing.cpp:2166 in 9de4233d00 outdated
    2366+                if (!parents_refs.empty()) {
    2367+                    for (const auto &tx : parents_refs) {
    2368+                        parents.emplace_back(tx.get().GetTx().GetWitnessHash());
    2369+                    }
    2370+                    fanout_with_ancestors = m_txreconciliation->SortPeersByFewestParents(parents);
    2371+                    fanout_with_ancestors.resize(2); // FIXME: Resize to 2 for now
    


    sr-gi commented at 8:02 pm on June 13, 2024:
    This is pending. Happy get get feedback on how many to pick (either hardcoded or based on what)

    Prabhat1308 commented at 3:26 pm on June 16, 2024:
    The number of parents per peer should mostly be in the range of (0-2/3) , This should include a significant numbers of peers to reconcile with . A constant number might not be ideal considering if the array returned is smaller than that constant number , we might be reconciling with non-existent peers .

    sr-gi commented at 2:53 pm on June 18, 2024:
    I’m curious to know where do you get that parent count range from 🤔

    Prabhat1308 commented at 7:57 pm on June 19, 2024:
    I got some data for about 4100+ transactions from a source (cant be sure of the credibility) for bitcoin transactions in a mempool and i ran a script to count the avg. parents which came to approx. 1.6 . Another way I thought of this is I got the avg. number of inputs per transaction . For 2023-2024 the average is somewhat between 2-3 . If we make a tree like dependency structure for transaction considering the transactions which had their inputs as utxo’s , the probability of having higher ancestories diminishes rapidly since any transaction missing will eliminate the ancestory above it . Considering the mempool are in general highly updated , ancestories beyond grandparent transactions are not very likely to be there . So if I consider this , there are supposed to be 6 ancestors for a transactions on an average and we can expect 2-3 range to have a significant number of transactions.

    sr-gi commented at 2:46 pm on June 20, 2024:

    Notice this does not apply to all transactions though. The exception here applies to transactions that have some ancestors in the mempool. In that case, we need to be consistent with the way we share this set of transactions, otherwise, it could be the case that the parents are reconciled at time T and the children are fanout at T-n, potentially making the children orphan.

    Under these conditions, we are checking how many peers have the ancestors in their reconciliation sets, and selecting a subset of the ones with fewer of them, so the amount of transactions to be fanout is the smallest possible.


    brunoerg commented at 2:28 pm on September 25, 2024:
    Shouldn’t we decrease the number of inbounds_fanout_tx_relay/outbounds_fanout_tx_relay by the number of peers we’re going to fanout with ancestors?

    sr-gi commented at 10:11 am on September 26, 2024:

    We could. I left it as for simplicity, given this case should not be too frequent, so having a higher fanout under these conditions shouldn’t be too bad.

    I’ll consider adding it if it does not make the code much harder to read, or if there is a clear counterargument in favor of it.


    naumenkogs commented at 9:54 am on November 1, 2024:

    This function could easily return two numbers: how many of these peers are inbound and outbound. And then just pass this pair along to ShouldFanoutTo. Should be not that complex code. A cheap cost to preserve the efficiency model.

    I might try if you don’t get to it earlier.


  43. sr-gi force-pushed on Jun 14, 2024
  44. sr-gi force-pushed on Jun 14, 2024
  45. sr-gi force-pushed on Jun 18, 2024
  46. DrahtBot added the label CI failed on Jun 19, 2024
  47. DrahtBot removed the label CI failed on Jun 19, 2024
  48. DrahtBot added the label Needs rebase on Jun 20, 2024
  49. sr-gi force-pushed on Jun 20, 2024
  50. DrahtBot removed the label Needs rebase on Jun 20, 2024
  51. glozow commented at 11:03 am on July 4, 2024: member
    Needs rebase (for #29625)
  52. DrahtBot added the label Needs rebase on Jul 4, 2024
  53. in src/test/txreconciliation_tests.cpp:391 in 2e0b6742b8 outdated
    262+        total_fanouted2 += tracker2.ShouldFanoutTo(wtxid, i,
    263+                                                   /*inbounds_fanout_tx_relay=*/0, /*outbounds_fanout_tx_relay=*/0);
    264+    }
    265+    BOOST_CHECK_EQUAL(total_fanouted1, 3);
    266+    BOOST_CHECK_EQUAL(total_fanouted2, 4);
    267+
    


    Prabhat1308 commented at 7:52 pm on July 7, 2024:
    0for (int i = 1; i < inbound_peers; ++i) {
    1        // create 4000 random wtxids so that FANOUT_TARGETS_PER_TX_CACHE_SIZE is breached
    2        for (int j = 0; j < 4000; ++j) {
    3            tracker.ShouldFanoutTo(Wtxid::FromUint256(frc.rand256()), i,
    4                                   /*inbounds_fanout_tx_relay=*/0, /*outbounds_fanout_tx_relay=*/0);
    5        }
    6    }
    

    increases coverage of the units tests to check the cache_size check in IsFanoutTarget .


    sr-gi commented at 8:22 pm on September 13, 2024:
    I haven’t taken this yet, given I’m working on a last commit reworking part of how the fanout selection works. I may consider it after that. In any case, looks like this is only filling the cache but not testing anything after it

    sr-gi commented at 8:00 pm on November 1, 2024:
    Resolving since we don’t have a cache anymore
  54. sr-gi commented at 1:40 pm on July 8, 2024: member

    Needs rebase (for #29625)

    Currently working on some simulations, will rebase soon

  55. sr-gi force-pushed on Sep 13, 2024
  56. sr-gi force-pushed on Sep 13, 2024
  57. DrahtBot added the label CI failed on Sep 13, 2024
  58. DrahtBot commented at 7:59 pm on September 13, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/30127266270

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  59. DrahtBot removed the label Needs rebase on Sep 13, 2024
  60. sr-gi force-pushed on Sep 13, 2024
  61. DrahtBot removed the label CI failed on Sep 13, 2024
  62. in src/net_processing.cpp:2453 in 3879f550d3 outdated
    2451+            if (!result.m_succeeded) {
    2452+                // If the transactions fails to get into the set, we fanout
    2453+                invs_to_send.push_back(wtxid);
    2454+                // If the transaction fails because it collides with an existing one,
    2455+                // we also remove and fanout the conflict and all its descendants.
    2456+                // This is because our peer may have added the conflicting tranaction
    


    brunoerg commented at 12:47 pm on September 21, 2024:
    0                // This is because our peer may have added the conflicting transaction
    
  63. Saraeutsza approved
  64. sr-gi force-pushed on Oct 2, 2024
  65. sr-gi force-pushed on Oct 2, 2024
  66. sr-gi force-pushed on Oct 2, 2024
  67. sr-gi commented at 3:35 pm on October 2, 2024: member
    I’ve added a commit introducing a delayed set to store transactions that are being reconciled, but won’t be added to the sketch should it be constructed at this time. This is to mimic the fanout trickling logic where transactions are only made available to peers if they would have been INVed to them. This was already part of @naumenkogs approach, but was scrapped out when reworking when data is added to the reconciliation set in a22b2364f5942266596c78ad306b1e32aa89aa0f
  68. sr-gi force-pushed on Oct 2, 2024
  69. sr-gi force-pushed on Oct 3, 2024
  70. sr-gi force-pushed on Oct 3, 2024
  71. sr-gi force-pushed on Oct 3, 2024
  72. sr-gi force-pushed on Oct 3, 2024
  73. in src/node/txreconciliation.cpp:457 in 20772c0778 outdated
    452+        // We first decide how many reconciling peers of a given direction we want to flood to,
    453+        // and then generate a list of peers of that size for a given transaction. We then see
    454+        // whether the given peer falls into this list.
    455+        double n;
    456+        if (peer_state->m_we_initiate) {
    457+            n = OUTBOUND_FANOUT_DESTINATIONS - outbounds_fanout_tx_relay;
    


    marcofleon commented at 3:46 pm on October 12, 2024:
    Can outbounds_fanout_tx_relay ever be greater than 1 here? I think if we have two (or more) unregistered outbound peers and at least one registered peer, then this results in an unsigned integer overflow. Which then overflows targets_size in IsFanoutTarget.

    naumenkogs commented at 3:03 pm on October 23, 2024:
    working on a test and fix for this.

    sr-gi commented at 3:44 pm on October 23, 2024:
    Oh sorry, I forgot to reply to this. We discussed this in person last week and this is indeed an issue. Happy to take your fix @naumenkogs

    naumenkogs commented at 10:06 am on October 24, 2024:

    Okay, see the last two commits here.

    I also dropped the <0.001 check, it’s kinda pointless anyway…


    sr-gi commented at 7:42 pm on October 28, 2024:

    The commit with the casting works, but the test one is actually not a good way of testing this; it would pass both with and without the change.

    Turns out that passing a huge value to IsFanoutTarget would result in targets_size being 0, which results in the method returning false. I don’t think there is really a good way of testing this.

    I’ll take the code commit but skip the addition to the test.


    sr-gi commented at 7:48 pm on October 28, 2024:
    I squashed the change in here: ad95b2c0e21f1e865f967aa9463ef99bc252550a

    naumenkogs commented at 9:57 am on October 29, 2024:

    Interesting. So, for me, the first commit (with a new test) takes forever to execute.

    More specifically, the execution hangs on best_peers.resize(targets_size);, with targets_size being a huge number.

    I guess your system handles this gracefully instead?

    A clear failure could be triggered by e.g. Assume(targets_size <= m_states.size());


    naumenkogs commented at 10:19 am on October 29, 2024:
    feel free to resolve this and let’s talk in the new commit.
  74. marcofleon commented at 4:19 pm on October 12, 2024: contributor

    I fuzzed the txreconciliation class with -DSANITIZERS=fuzzer,undefined,integer and two related runtime errors occurred:

     0../../src/node/txreconciliation.cpp:457:46: runtime error: unsigned integer overflow: 1 - 3 cannot be represented in type 'size_t' (aka 'unsigned long')
     1    [#0](/bitcoin-bitcoin/0/) 0x55ddab3b7d99 in TxReconciliationTracker::Impl::ShouldFanoutTo(transaction_identifier<true> const&, long, unsigned long, unsigned long) erlayfuzzbuild/src/../../src/node/txreconciliation.cpp:457:46
     2    [#1](/bitcoin-bitcoin/1/) 0x55ddaa5521c7 in txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_7::operator()() const erlayfuzzbuild/src/test/fuzz/../../../../src/test/fuzz/txreconciliation.cpp:118:46
     3    [#2](/bitcoin-bitcoin/2/) 0x55ddaa5521c7 in unsigned long CallOneOf<txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_0, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_1, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_2, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_3, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_4, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_5, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_6, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_7, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_8>(FuzzedDataProvider&, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_0, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_1, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_2, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_3, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_4, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_5, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_6, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_7, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_8) erlayfuzzbuild/src/test/fuzz/../../../../src/test/fuzz/util.h:42:27
     4    [#3](/bitcoin-bitcoin/3/) 0x55ddaa5521c7 in txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>) erlayfuzzbuild/src/test/fuzz/../../../../src/test/fuzz/txreconciliation.cpp:68:9
     5    [#4](/bitcoin-bitcoin/4/) 0x55ddaa6bf726 in std::function<void (std::span<unsigned char const, 18446744073709551615ul>)>::operator()(std::span<unsigned char const, 18446744073709551615ul>) const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:591:9
     6    [#5](/bitcoin-bitcoin/5/) 0x55ddaa6bf726 in LLVMFuzzerTestOneInput erlayfuzzbuild/src/test/fuzz/util/../../../../../src/test/fuzz/fuzz.cpp:209:5
     7    [#6](/bitcoin-bitcoin/6/) 0x55ddaa04a840 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19e6840) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
     8    [#7](/bitcoin-bitcoin/7/) 0x55ddaa049f75 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19e5f75) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
     9    [#8](/bitcoin-bitcoin/8/) 0x55ddaa04b705 in fuzzer::Fuzzer::MutateAndTestOne() (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19e7705) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    10    [#9](/bitcoin-bitcoin/9/) 0x55ddaa04c305 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19e8305) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    11    [#10](/bitcoin-bitcoin/10/) 0x55ddaa03a52f in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19d652f) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    12    [#11](/bitcoin-bitcoin/11/) 0x55ddaa0638e2 in main (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19ff8e2) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    13    [#12](/bitcoin-bitcoin/12/) 0x7fcd32ab6249  (/lib/x86_64-linux-gnu/libc.so.6+0x27249) (BuildId: 58254ca972028402bc40624f81388d85ec95f70d)
    14    [#13](/bitcoin-bitcoin/13/) 0x7fcd32ab6304 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x27304) (BuildId: 58254ca972028402bc40624f81388d85ec95f70d)
    15    [#14](/bitcoin-bitcoin/14/) 0x55ddaa02f920 in _start (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19cb920) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    16SUMMARY: UndefinedBehaviorSanitizer: unsigned-integer-overflow ../../src/node/txreconciliation.cpp:457:46 
    17
    18../../src/node/txreconciliation.cpp:401:100: runtime error: 7.92282e+28 is outside the range of representable values of type 'unsigned long'
    19    [#0](/bitcoin-bitcoin/0/) 0x55ddab3c704a in TxReconciliationTracker::Impl::IsFanoutTarget(transaction_identifier<true> const&, long, bool, double) erlayfuzzbuild/src/../../src/node/txreconciliation.cpp:401:100
    20    [#1](/bitcoin-bitcoin/1/) 0x55ddab3b7b83 in TxReconciliationTracker::Impl::ShouldFanoutTo(transaction_identifier<true> const&, long, unsigned long, unsigned long) erlayfuzzbuild/src/../../src/node/txreconciliation.cpp:471:16
    21    [#2](/bitcoin-bitcoin/2/) 0x55ddaa5521c7 in txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_7::operator()() const erlayfuzzbuild/src/test/fuzz/../../../../src/test/fuzz/txreconciliation.cpp:118:46
    22    [#3](/bitcoin-bitcoin/3/) 0x55ddaa5521c7 in unsigned long CallOneOf<txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_0, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_1, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_2, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_3, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_4, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_5, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_6, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_7, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_8>(FuzzedDataProvider&, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_0, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_1, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_2, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_3, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_4, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_5, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_6, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_7, txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_8) erlayfuzzbuild/src/test/fuzz/../../../../src/test/fuzz/util.h:42:27
    23    [#4](/bitcoin-bitcoin/4/) 0x55ddaa5521c7 in txreconciliation_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>) erlayfuzzbuild/src/test/fuzz/../../../../src/test/fuzz/txreconciliation.cpp:68:9
    24    [#5](/bitcoin-bitcoin/5/) 0x55ddaa6bf726 in std::function<void (std::span<unsigned char const, 18446744073709551615ul>)>::operator()(std::span<unsigned char const, 18446744073709551615ul>) const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:591:9
    25    [#6](/bitcoin-bitcoin/6/) 0x55ddaa6bf726 in LLVMFuzzerTestOneInput erlayfuzzbuild/src/test/fuzz/util/../../../../../src/test/fuzz/fuzz.cpp:209:5
    26    [#7](/bitcoin-bitcoin/7/) 0x55ddaa04a840 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19e6840) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    27    [#8](/bitcoin-bitcoin/8/) 0x55ddaa049f75 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19e5f75) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    28    [#9](/bitcoin-bitcoin/9/) 0x55ddaa04b705 in fuzzer::Fuzzer::MutateAndTestOne() (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19e7705) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    29    [#10](/bitcoin-bitcoin/10/) 0x55ddaa04c305 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19e8305) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    30    [#11](/bitcoin-bitcoin/11/) 0x55ddaa03a52f in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19d652f) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    31    [#12](/bitcoin-bitcoin/12/) 0x55ddaa0638e2 in main (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19ff8e2) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    32    [#13](/bitcoin-bitcoin/13/) 0x7fcd32ab6249  (/lib/x86_64-linux-gnu/libc.so.6+0x27249) (BuildId: 58254ca972028402bc40624f81388d85ec95f70d)
    33    [#14](/bitcoin-bitcoin/14/) 0x7fcd32ab6304 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x27304) (BuildId: 58254ca972028402bc40624f81388d85ec95f70d)
    34    [#15](/bitcoin-bitcoin/15/) 0x55ddaa02f920 in _start (/root/bitcoin/erlayfuzzbuild/src/test/fuzz/fuzz+0x19cb920) (BuildId: 9429992cf566016ee2d3d51625e57700f09635fc)
    35SUMMARY: UndefinedBehaviorSanitizer: float-cast-overflow ../../src/node/txreconciliation.cpp:401:100 
    36==222068== ERROR: libFuzzer: out-of-memory (used: 4046Mb; limit: 2048Mb)
    37   To change the out-of-memory limit use -rss_limit_mb=<N>
    38
    39MS: 4 ChangeBinInt-ChangeBit-CrossOver-CrossOver-; base unit: 0cbe4c9fe667ead352437139e9ea437783de5c0c
    400xad,0xad,0xad,0x60,0x0,0x0,0x0,0x0,0x0,0x0,0x37,0xff,0xff,0xff,0xff,0xd2,0xff,0xff,0xff,0xd2,0xd2,0xd2,0xd2,0xd2,0xd2,0xd2,0xd2,0x3b,0xd2,0xd2,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xdf,0xff,0x0,0xff,0xff,0x9,0x0,0x0,0x0,0x0,0x0,0xff,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x75,0xff,0xff,0xff,0xd2,0xd2,0xd2,0xd2,0xd2,0xd2,0xbf,0xff,0x0,0x0,0x0,0x52,0xad,0xad,0xbf,0x6a,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xa0,0xad,0x1,0x0,0x0,
    41\255\255\255`\000\000\000\000\000\0007\377\377\377\377\322\377\377\377\322\322\322\322\322\322\322\322;\322\322\377\377\377\377\377\377\377\377\337\377\000\377\377\011\000\000\000\000\000\377\000\000\000\000\000\000\000u\377\377\377\322\322\322\322\322\322\277\377\000\000\000R\255\255\277j\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\240\255\001\000\000
    42artifact_prefix='./'; Test unit written to ./oom-b217ee9df2670caf13a47abe1623c2f80e6d1f02
    43Base64: ra2tYAAAAAAAADf/////0v///9LS0tLS0tLSO9LS///////////f/wD//wkAAAAAAP8AAAAAAAAAdf///9LS0tLS0r//AAAAUq2tv2qgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgrQEAAA==
    44SUMMARY: libFuzzer: out-of-memory
    

    oom-b217ee9df2670caf13a47abe1623c2f80e6d1f02.txt

    Left a question about it below.

  75. in src/node/txreconciliation.cpp:234 in 50c19dd21f outdated
    208+        }
    209+
    210+        return false;
    211+    }
    212+
    213+    bool HasCollision(NodeId peer_id, const Wtxid& wtxid, Wtxid& collision, uint32_t &short_id) EXCLUSIVE_LOCKS_REQUIRED(!m_txreconciliation_mutex)
    


    naumenkogs commented at 8:16 am on October 28, 2024:
    50c19dd21f6e437cd8e3d9047a89d793d097c5e5 Not sure it’s worth exposing this external method if the only reason to have it is testing.

    sr-gi commented at 2:34 pm on November 1, 2024:
    Happy to consider an alternative approach that allows us to check there are no collisions without having to expose a public method if you have something in mind

    naumenkogs commented at 9:07 am on November 4, 2024:
    Yeah just dropping HasCollision, keeping HasCollisionInternal. The former is used only in tests. And since its implementation is trivial, i think testing for collisions through AddToSet is sufficient.

    sr-gi commented at 7:12 pm on November 13, 2024:
    Sure, looks like tests can be rewritten for it not to be needed. Will address in the next push
  76. in src/net_processing.cpp:6276 in 50c19dd21f outdated
    6272+                            auto result = m_txreconciliation->AddToSet(pto->GetId(), wtxid);
    6273+                            if (!result.m_succeeded) {
    6274+                                vInv.push_back(inv);
    6275+                                // If the transaction fails because it collides with an existing one,
    6276+                                // we also remove and fanout the conflict and all its descendants.
    6277+                                // This is because our peer may have added the conflicting tranaction
    


    naumenkogs commented at 8:20 am on October 28, 2024:

    50c19dd21f6e437cd8e3d9047a89d793d097c5e5

    worth adding a comment on what would happen if there are 3 colliding transactions (probably fine but some text would be helpful) :)


    sr-gi commented at 6:58 pm on October 30, 2024:

    Being this part of RelayTransaction, which processed transactions one at a time, and given that if a collision happens the existing transaction is removed from the set, there cannot be a three-way collision. If a third transaction that happens to have the same short id as the two previous ones is added to the set later, it would be added normally since there won’t be a transaction to collide with.

    I could add a comment for three, but someone may then mention what happens if there are four 😅


    naumenkogs commented at 8:22 am on October 31, 2024:

    I was looking for something like “As a general rule, we keep the transaction that appeared first”.

    Currently, the AddToSet documentation seems outdated—it basically claims that the only way to fail is to reach a set limit.

    How about the following replacement:

    0    /**
    1     * Step 1. Add a to-be-announced transaction to the local reconciliation set of the target peer.
    2     * Returns false if the set is at capacity, or if the set contains a colliding transaction.
    3     * Returns true if the transaction appears in the set (whether it was already there or just was added).
    4     */
    

    naumenkogs commented at 10:23 am on October 31, 2024:

    Apparently I was wrong.

    0                // If the transaction fails because it collides with an existing one,
    1                // we also remove and fanout the conflict and all its descendants.
    2                // This is because our peer may have added the conflicting transaction
    3                // to its set, in which reconciliation of these two would fail
    

    Say A, B, C arrive.

    • First A added to the set.
    • then B doesn’t make it to the set, but also drops A from it.
    • then C takes a place in the set.

    sr-gi commented at 8:05 pm on November 1, 2024:
    Agreed the docs for TxReconciliationTracker::AddToSet could be improved. I’ll update it with a version of your suggestion
  77. in src/net_processing.cpp:6286 in 50c19dd21f outdated
    6282+                                    CTxMemPool::setEntries descendants;
    6283+                                    m_mempool.CalculateDescendants(m_mempool.get_iter_from_wtxid(collision.value()), descendants);
    6284+                                    for (const auto &txit: descendants) {
    6285+                                        auto wtxid = txit->GetTx().GetWitnessHash();
    6286+                                        m_txreconciliation->TryRemovingFromSet(pto->GetId(), wtxid);
    6287+                                        vInv.emplace_back(MSG_WTX, wtxid);
    


    naumenkogs commented at 8:22 am on October 28, 2024:
    50c19dd21f6e437cd8e3d9047a89d793d097c5e5 what would happen if a descendant was already flooded?

    sr-gi commented at 3:14 pm on November 1, 2024:

    That’s a good point. It may be the case that the selected peer has a parent -> child dependency but was not picked as one of the fanout_with_ancestors peers, the parent and the child may have ended up in different sets.

    We should check the return of TryRemovingFromSet before adding it to vInv


    sr-gi commented at 8:39 pm on November 1, 2024:
    I just realized this is also the case for the ancestors. When relaying with ancestors, we should also check that the ancestor we are trying to relay is part of the reconciliation set (i.e. that it has not been previously flooded)

    naumenkogs commented at 9:47 am on November 4, 2024:
    The fix should be done in p2p: Deal with shortid collisions for reconciliation sets, please move it if you retouch.
  78. in src/net_processing.cpp:6285 in 50c19dd21f outdated
    6281+                                    Assume(peer->m_wtxid_relay);
    6282+                                    CTxMemPool::setEntries descendants;
    6283+                                    m_mempool.CalculateDescendants(m_mempool.get_iter_from_wtxid(collision.value()), descendants);
    6284+                                    for (const auto &txit: descendants) {
    6285+                                        auto wtxid = txit->GetTx().GetWitnessHash();
    6286+                                        m_txreconciliation->TryRemovingFromSet(pto->GetId(), wtxid);
    


    naumenkogs commented at 8:32 am on October 28, 2024:
    50c19dd21f6e437cd8e3d9047a89d793d097c5e5 could this be used to game collisions (for wtxid you have here)?

    naumenkogs commented at 8:33 am on October 28, 2024:

    say a descendant was never considered for reconciliation (somehow), but now is used to removefromset some other transaction (it collides with)

    or maybe it was considered, but then was dropped in favor of something else (that’s not how we currently handle collisions, but that would be a hard assumption)


    sr-gi commented at 3:28 pm on November 1, 2024:

    Collisions of this kind could happen in theory, but I don’t think they can be really gamed:

    short ids are peer specific, they are generated using SipHashUint256 which are salted using m_k0 and m_k1. Therefore, while a collision of this kind is theoretically possible, an adversarial peer should not be able to exploit them


    naumenkogs commented at 9:08 am on November 4, 2024:
    Right, please resolve this issue.
  79. in src/net_processing.cpp:509 in 392facfce1 outdated
    522@@ -523,7 +523,7 @@ class PeerManagerImpl final : public PeerManager
    523     PeerManagerInfo GetInfo() const override EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex);
    524     void SendPings() override EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex);
    525     std::pair<size_t, size_t> GetFanoutPeersCount() override EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex);
    526-    void RelayTransaction(const uint256& txid, const uint256& wtxid) override EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex);
    527+    void RelayTransaction(const uint256& txid, const uint256& wtxid, bool force_relay) override EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex);
    


    naumenkogs commented at 8:37 am on October 28, 2024:

    392facfce1be6d0395abecc2e049ff39635bd1a5

    why do you think force relay should influence whether we reconcile or flood? The transaction is relayed in either case (the original meaning of force relay)


    sr-gi commented at 7:23 pm on November 14, 2024:
    I misunderstood the use of ForceRelay here. Will amend it so it applied to both

    naumenkogs commented at 6:27 am on November 26, 2024:
    What’s the plan? I still think force_relay is unnecessary here.

    sr-gi commented at 4:07 pm on November 26, 2024:
    Yep, I’m planning to drop it

    sr-gi commented at 9:32 pm on November 26, 2024:
    Covered in the last push
  80. in src/net_processing.cpp:2129 in 392facfce1 outdated
    2369@@ -2370,8 +2370,8 @@ std::pair<size_t, size_t> PeerManagerImpl::GetFanoutPeersCount()
    2370     if (m_txreconciliation) {
    2371         LOCK(m_peer_mutex);
    2372         for(const auto& [peer_id, peer] : m_peer_map) {
    2373-            if (auto tx_relay = peer->GetTxRelay()) {
    2374-                bool peer_relays_txs = WITH_LOCK(tx_relay->m_bloom_filter_mutex, return tx_relay->m_relay_txs);
    2375+            if (const auto tx_relay = peer->GetTxRelay()) {
    


    naumenkogs commented at 8:38 am on October 28, 2024:

    392facfce1be6d0395abecc2e049ff39635bd1a5

    these kind of refactors we better apply where the code was introduced, in the older commits of this PR


    sr-gi commented at 6:45 pm on October 30, 2024:
    You’re right, should have been added to the previous commit. Will fix
  81. in src/node/txreconciliation.h:158 in 392facfce1 outdated
    138@@ -139,6 +139,12 @@ class TxReconciliationTracker
    139      */
    140     bool ShouldFanoutTo(const Wtxid& wtxid, NodeId peer_id,
    141                         size_t inbounds_fanout_tx_relay, size_t outbounds_fanout_tx_relay);
    142+
    143+    /**
    144+     * Returns a collections of node ids sorted by how many parents the peer has in its reconciliation set
    


    naumenkogs commented at 8:43 am on October 28, 2024:

    392facfce1be6d0395abecc2e049ff39635bd1a5

    you can technically feed it arbitrary transactions. I think the comment should say it, and then later say “we normally pass a list of parents to determine whether [… … …]”.


    sr-gi commented at 6:26 pm on November 14, 2024:
    I’ve reworked the docs for this
  82. in src/net_processing.cpp:6220 in 392facfce1 outdated
    6283@@ -6216,27 +6284,6 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
    6284                     // No reason to drain out at many times the network's capacity,
    6285                     // especially since we have many peers and some will draw much shorter delays.
    6286                     unsigned int nRelayedTransactions = 0;
    6287-
    6288-                    size_t inbounds_fanout_tx_relay = 0, outbounds_fanout_tx_relay = 0;
    


    naumenkogs commented at 8:44 am on October 28, 2024:

    392facfce1be6d0395abecc2e049ff39635bd1a5

    why not modifying the former commit that introduces this code?


    sr-gi commented at 7:31 pm on October 30, 2024:
    Do you mean squashing these changes instead of having a “move” commit?

    naumenkogs commented at 10:25 am on October 31, 2024:
    yes

    sr-gi commented at 2:30 pm on November 1, 2024:
    Because it is doing more than just moving the code around, so I thought it may be easier for reviewers already familiar with the original approach to understand the changes. I wouldn’t mind squashing it once all the discussion around RelayTranaction is resolved

    naumenkogs commented at 9:10 am on November 4, 2024:
    My impression was that there are very-very few reviewers already familiar with the original approach at the moment, so it’s an extra burden for most to understand that former code you drop anyway. But yeah it’s ok. Perhaps a commit message saying it will be dropped will work.

    sr-gi commented at 9:36 pm on November 4, 2024:
    I’ll squash in the next force push
  83. in src/net_processing.cpp:2188 in 392facfce1 outdated
    2429+            // to reconcile, fanout the transaction an all its ancestors. We just add the parents here and leave fanout as true
    2430+            auto it = std::find(fanout_with_ancestors.begin(), fanout_with_ancestors.end(), peer_id);
    2431+            if (it != fanout_with_ancestors.end()) {
    2432+                for (const auto wtxid: parents) {
    2433+                    m_txreconciliation->TryRemovingFromSet(peer_id, wtxid);
    2434+                    invs_to_send.push_back(wtxid);
    


    naumenkogs commented at 8:54 am on October 28, 2024:

    392facfce1be6d0395abecc2e049ff39635bd1a5

    what about the order of the parents here? Deserves a comment at least. With the current non-cluster mempool it might break something?

    UPD:

    Say the following happens: A -> B - > C

    • A arrives, we add it to some sets.
    • B arrives, triggers a flood from half the sets, but the other half has A+B now.
    • C arrives, triggers a flood from A+B. But then, can we guarantee A is flooded before B?

    UPD2: topo sort should work. Have a common place where the INV vector is handled?


    naumenkogs commented at 10:17 am on October 31, 2024:

    another question: say a child arrives a few minutes after a parent (parent still in the mempool, but it’s been announced to peers a while ago). Why bother with this inv? Perhaps we should do it only if TryRemovingFromSet actually have removed it?

    (same applies to handling a collision below)

    i know m_tx_inventory_known_filter will probably catch it, but i think it’s better to do it here.


    sr-gi commented at 7:52 pm on November 14, 2024:
    Not sure I follow how the first comment is an issue, but I do agree with the second. This needs to be conditional to TryRemovingFromSet succeeding.

    sr-gi commented at 4:08 pm on November 26, 2024:
    @naumenkogs can this be resolved?
  84. in src/net_processing.cpp:2202 in 392facfce1 outdated
    2442+        if (fanout) {
    2443+            // We fanout if force relay is set, if the peer does not reconcile transactions, or if it does but it has been picked for fanout.
    2444+            invs_to_send.push_back(peer->m_wtxid_relay ? wtxid : txid);
    2445+        } else {
    2446+            // Otherwise, we try to add the transaction to the peer's reconciliation set.
    2447+            // If the transaction has in mempool descendants, we will fanout all the descendant
    


    naumenkogs commented at 9:11 am on October 28, 2024:

    392facfce1be6d0395abecc2e049ff39635bd1a5

    where do you do this (consider in-mempool descendants and decide to fanout everything)? I only see you do it for the collision


    sr-gi commented at 2:52 pm on November 1, 2024:

    I don’t think this comment is accurate since this can never be the case. A transaction considered for relay cannot have in-mempool descendants by definition, since they would have been orphans at the time of considering them.

    I cannot recall if this is a re-word of the previous approach, which tried to consider descendants, or I tried to refer to the collisions and it didn’t make much sense.

    I’ll remove the comment

  85. in src/net_processing.cpp:5847 in 20772c0778 outdated
    6269@@ -6270,6 +6270,10 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
    6270 
    6271                 // Determine transactions to relay
    6272                 if (fSendTrickle) {
    6273+                    if (m_txreconciliation && m_txreconciliation->IsPeerRegistered(pto->GetId())) {
    6274+                        // Make transactions added to the reconciliation set during the last interval available
    6275+                        m_txreconciliation->ReadyDelayedTransactions(pto->GetId());
    


    naumenkogs commented at 9:19 am on October 28, 2024:

    20772c077832592265bd6a2876aa6b4cb7dbde7d

    with this code, and especially the 5/2 trickle delay (very long), previous performance measurements become outdated.

    My idea was to delay the responses. And by this time, the hope was flooding does enough job so that scanning the sets through many connections does not work.

    Also, adding to a set already happens on a trickle… so that’s another level of protection


    naumenkogs commented at 11:35 am on October 30, 2024:

    Just talked to Bruno a bit more about this.

    I think it’s fair to compare this delayed set mechanism to what i have in the code there. I like my solution more because:

    • its synced across peers (handles better spying across multiple conns i think?)
    • simpler code
    • more flexibility (doesn’t have to be same trickling as adding to sets or fanouting)

    Interested in what other people think though.


    sr-gi commented at 3:07 pm on October 30, 2024:

    I think the current approach mimics what you were doing with your old one. I haven’t picked up the trickle reduction commit not the delayed response, but I was planning to in the next PR.

    Your approach used to add transactions to the reconciliation set during INV building, which happens on trickle intervals. So data was added to to_be_announced (delayed) on RelayTransaction and then made available on trickle. The current approach adds data to the delayed set on RelayTransaction and makes it available on trickle too.

    So I don’t think the previous performance measurements become outdated, building the whole Erlay PR on this should yield similar results.


    naumenkogs commented at 10:14 am on October 31, 2024:

    I agree now, it’s equivalent. Still wrapping my head around RelayTransaction refactoring :)


    A difference could be in handling dependencies: during the trickle window after the first transaction arrives.

    Say, a parent arrives, and then a child arrives in a second. In my approach we may notice the child when handling (announcement-wise) the parent (because handling the parent is postponed via trickle). In your approach, you handle the parent independently.

    Is there any use of knowing about the child while handling (announcement-wise) the parent? I don’t think so. You probably could save some efforts of adding the parent to reconset, and then removing it, but that should be cheap anyway.


    (the child-before-parent case is entirely different, as the child-to-arrive-early doesn’t normally hit the announcement step while in orphanage)


    naumenkogs commented at 12:22 pm on November 1, 2024:

    Found another relevant thing! (here)

    009:42:53<@gmaxwell> Existing relay logic checks that transactions are still in the mempool before relaying them. I think the issue there would go away if instead of keeping around some erlay datastructure you just keep growing the queue of the peers transactions to relay until its time to reconcile, enh?  then the existing logic to check if things are still in the mempool is sufficient.
    

    “Existing” refers to the legacy code in my branch.


    sr-gi commented at 4:07 pm on November 26, 2024:
    I’m planning to simulate both cases and report back

    sr-gi commented at 2:03 am on December 22, 2024:

    I simulated this to see if there was any difference between doing the decision-making before or after trickle.

    I didn’t do it for dependant transactions, just for a single one, trying to answer whether choosing fanout peers at scheduling time or at relay time made any difference. The answer is no, given the number of peers that know about the transaction at both times is almost the same.

    The approach, simulation and results can be found here: https://srgi.notion.site/Select-fanout-candidates-at-relay-time-instead-of-at-relay-scheduling-time-1617b3fef97780b9aa9bc52ec6fb4049

    It is worth mentioning that this applies to fully deployed erlay, which I think is what we should be designing for.

  86. in src/node/txreconciliation.cpp:213 in d2f2605072 outdated
    132@@ -115,10 +133,66 @@ class TxReconciliationTracker::Impl
    133                       peer_id, is_peer_inbound);
    134 
    135         const uint256 full_salt{ComputeSalt(local_salt, remote_salt)};
    136-        recon_state->second = TxReconciliationState(!is_peer_inbound, full_salt.GetUint64(0), full_salt.GetUint64(1));
    137+
    138+        auto new_state = TxReconciliationState(!is_peer_inbound, full_salt.GetUint64(0), full_salt.GetUint64(1));;
    139+        m_states.erase(recon_state);
    140+        m_states.emplace(peer_id, std::move(new_state));
    


    naumenkogs commented at 9:28 am on October 28, 2024:
    d2f260507279b1edd6c0f55d02c7ae2e598b3585 nit: Assume(m_states.emplace(peer_id, std::move(new_state)).second);

    sr-gi commented at 8:46 pm on November 14, 2024:
    Done
  87. in src/node/txreconciliation.cpp:325 in d2f2605072 outdated
    180+        auto peer_state = GetRegisteredPeerState(peer_id);
    181+        if (!peer_state) return false;
    182+
    183+        auto removed = peer_state->m_local_set.erase(wtxid) > 0;
    184+        if (removed) {
    185+            LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, "Removed %s from the reconciliation set for peer=%d. " /* Continued */
    


    naumenkogs commented at 9:29 am on October 28, 2024:

    https://github.com/bitcoin/bitcoin/commit/d2f260507279b1edd6c0f55d02c7ae2e598b3585

    nit: in three places in this commit /* Continued */ should be dropped, it’s no longer necessary (been a linter thing)

  88. in src/net_processing.cpp:6211 in 873e8dd60b outdated
    6206+                                // When we consider to which (and how many) Erlay peers
    6207+                                // we should fanout a tx, we must know to how
    6208+                                // many peers we would certainly announce this tx
    6209+                                // (non-Erlay peers).
    6210+                                if (peer_tx_relay->m_relay_txs && !m_txreconciliation->IsPeerRegistered(cur_peer_id)) {
    6211+                                    inbounds_fanout_tx_relay += cur_peer->m_is_inbound;
    


    naumenkogs commented at 9:31 am on October 28, 2024:

    873e8dd60b93f740330e9b0fbae4b3200204653a

    nit: cur_peer->m_is_inbound ? ++inbounds_fanout_tx_relay : ++outbounds_fanout_tx_relay

  89. naumenkogs changes_requested
  90. sr-gi force-pushed on Oct 28, 2024
  91. sr-gi commented at 7:50 pm on October 28, 2024: member
    Squashed @naumenkogs fix on @marcofleon comment in ad95b2c0e21f1e865f967aa9463ef99bc252550a, plus added a commit replacing IsFanoutTarget with GetFanoutTargets so the cache for fanout can be dropped. After this, the fanout targets are computed just once per transaction
  92. sr-gi force-pushed on Oct 28, 2024
  93. DrahtBot added the label CI failed on Oct 28, 2024
  94. DrahtBot commented at 8:02 pm on October 28, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/32180649974

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  95. sr-gi force-pushed on Oct 28, 2024
  96. DrahtBot removed the label CI failed on Oct 28, 2024
  97. in src/node/txreconciliation.cpp:389 in 4234f5ebe1 outdated
    393+        const double inbound_targets = (inbounds_fanout_tx_relay + m_inbounds_count) * INBOUND_FANOUT_DESTINATIONS_FRACTION;
    394+        double n = std::max(inbound_targets - inbounds_fanout_tx_relay, 0.0);
    395+
    396+        // Being this a fraction, we need to round it either up or down. We do this deterministically at random based on the
    397+        // transaction we are picking the peers for.
    398+        CSipHasher deterministic_randomizer_in{m_deterministic_randomizer};
    


    naumenkogs commented at 10:06 am on October 29, 2024:

    4234f5ebe133b949080aaa56da8cbdd18b650ff2

    Originally, the code forced me to make this call for every peer, that’s why i needed it to be deterministic. It’s not the case anymore. Do we still need determinism then?


    naumenkogs commented at 10:08 am on October 29, 2024:
    Otherwise we can drop m_deterministic_randomizer altogether and use some cheap FastRandomContext instead.

    sr-gi commented at 7:34 pm on October 29, 2024:
    Well, this really depends. GetFanoutTargets is called by RelayTransaction, which, in turn, may be called by ProcessMessage for a peer that has NetPermissionFlags::ForceRelay. So if force relay is used for both fanout and set reconciliation, as suggested in #30116 (review), then this could be called twice leading to two different sorting if we don’t do so deterministically

    naumenkogs commented at 9:15 am on October 30, 2024:

    This could be ReattemptInitialBroadcast, BroadcastTransaction (rpc/wallet), ProcessValidTx, or earlier upon handling NetMsgType::TX.

    I think it’s odd we have two calls to RelayTransaction in the two latter cases. I’d rather change this code, e.g. pass already_relayed (duplicating force) to RelayTransaction. Perhaps too much for this PR and we should do it elsewhere (especially now that the package stuff happens in-between). I understand why it was ok when RelayTransaction was simple.

    And while we’re stuck with these two calls, let’s say we drop the determinism.

    Why RelayTransaction doesn’t look at m_tx_inventory_to_send very early (and maybe m_tx_inventory_known_filter too)? That would catch the transactions we already set for flooding, and just pass on them.

    Now, the transactions in the sets. If we add them to the set again, nothing is done. If we try to flood them by a different choice, nothing is done still because we hit m_tx_inventory_to_send already having it.

    Thus, for now, i think we should repeat the m_tx_inventory_to_send check above, rather than having the determinism. It kinda makes sense anyway, no?


    sr-gi commented at 7:41 pm on December 5, 2024:

    Why RelayTransaction doesn’t look at m_tx_inventory_to_send very early (and maybe m_tx_inventory_known_filter too)? That would catch the transactions we already set for flooding, and just pass on them.

    I don’t think this really works(?).

    Imagine we call this once, and peer p (who has ForceRelay` permissions) is selected for reconciliation using a non-deterministic approach.

    Now, the transaction is sent again by p shortly after, and now he is selected for flooding instead. The filter won’t catch this, and the transaction would end up in both sets.

    This makes me recall that even though the process of selecting peers is deterministic at the moment, it is dependent on our peer list. Calling this twice with a different set of peers can result in a different ordering, so maybe we should not assume that transactions can be freely added to any of the two sets, and we should make sure that the opposite set does not contain the transaction before adding it to our desired one?

  98. in src/node/txreconciliation.cpp:374 in 4234f5ebe1 outdated
    385+        LOCK(m_txreconciliation_mutex);
    386 
    387+        // We decide whether a particular peer is a low-fanout flood target differently based on its connection direction:
    388+        // - for outbounds we have a fixed number of flood destinations.
    389+        // - for inbounds we use a fraction of all inbound peers supporting tx relay.
    390+        auto outbounds_target_size = (double)OUTBOUND_FANOUT_DESTINATIONS - outbounds_fanout_tx_relay;
    


    naumenkogs commented at 10:17 am on October 29, 2024:

    There is no need to cast it to double anymore.

    What we should do is something like

    0        size_t outbounds_target_size = 0;
    1        if (OUTBOUND_FANOUT_DESTINATIONS > outbounds_fanout_tx_relay) outbounds_target_size = OUTBOUND_FANOUT_DESTINATIONS - outbounds_fanout_tx_relay;
    

    Perhaps there is a less ugly way to do this. But casting to double seems worse.


    naumenkogs commented at 10:27 am on October 29, 2024:

    The following test would break the code if we forget the double cast (discussion here). I suggest adding it to the tests.

    0        std::tie(in_fanout_targets, out_fanout_targets) = tracker.GetFanoutTargets(Wtxid::FromUint256(frc.rand256()), /*inbounds_fanout_tx_relay=*/0, /*outbounds_fanout_tx_relay=*/100);
    1        BOOST_CHECK(!tracker.ShouldFanoutTo(peer_id0, in_fanout_targets, out_fanout_targets));
    

    naumenkogs commented at 10:30 am on October 29, 2024:
    (technically the code you have is correct, it’s just confusing for no reason i think)

    sr-gi commented at 9:06 pm on October 29, 2024:
    I’ve added it to the initial commit and reworked it on the last one
  99. in src/node/txreconciliation.cpp:420 in 4234f5ebe1 outdated
    440+            for_each(sorted_outbounds.begin(), sorted_outbounds.end(),
    441+                    [&out_fanout_targets](auto& keyed_peer) { out_fanout_targets.push_back(keyed_peer.second); });
    442+            out_fanout_targets.resize(outbounds_target_size);
    443+        }
    444+
    445+        return std::make_tuple(in_fanout_targets, out_fanout_targets);
    


    naumenkogs commented at 10:33 am on October 29, 2024:

    4234f5ebe133b949080aaa56da8cbdd18b650ff2

    We can just merge the two vectors, no? There is no use for this separation.


    sr-gi commented at 7:38 pm on October 29, 2024:
    Yeah, you’re right. Given how small the two vectors are, calling ShouldFanoutTo with a merged vector should not incur any major overhead, so merging will make it easier to read
  100. DrahtBot added the label Needs rebase on Oct 29, 2024
  101. sr-gi force-pushed on Oct 29, 2024
  102. sr-gi force-pushed on Oct 30, 2024
  103. sr-gi commented at 2:59 pm on October 30, 2024: member
    Rebasing to deal with merge conflicts
  104. DrahtBot removed the label Needs rebase on Oct 30, 2024
  105. in src/node/txreconciliation.cpp:110 in ccd695851c outdated
    106+        return m_local_set.size() + m_delayed_local_set.size();
    107+    }
    108+
    109+    bool RemoveFromSet(const Wtxid& wtxid)
    110+    {
    111+        auto r = m_local_set.erase(wtxid) + m_delayed_local_set.erase(wtxid);
    


    naumenkogs commented at 11:17 am on October 31, 2024:

    Nit

    0        const size_t removed = m_local_set.erase(wtxid) + m_delayed_local_set.erase(wtxid);
    1        // Data must be in one of the sets at most
    2        Assume(removed <= 1);
    3        return removed == 1;
    

    sr-gi commented at 8:27 pm on November 1, 2024:

    Not too sure about the return removed == 1

    The assume is there to make sure that if the code happens to be wrong, it is catched on testing. In the current approach, if the code moving data from delayed_set to m_local_set was broken somehow, this method will return that data was actually removed. If were to return removed == 1 in the aforementioned case, it would return false, and the caller’s logic may break.

    I don’t think any of these cases is likely to happen given the assume will catch it before getting merged, but the proposed change makes me anxious.

    PS: I agree on doing the variable renaming if you feel that makes the code more readable though


    naumenkogs commented at 10:08 am on November 4, 2024:

    I thought size_t 0 will be cast to bool false on return, and just suggested the exact same behavior (1==true, 0==false) but without forcing the reader to think about casting and cause this exact confusion between us. Am i wrong?


    Ahhh, so you saying 2==true is also important? Then I’ll be fine with explicit return removed >= 1, and comment that 2 should not normally happen.


    sr-gi commented at 9:10 pm on November 13, 2024:

    2==true would mean that we had the transaction in both places, which should never happen (I’d mean that a transaction is both ready and delayed at the same time).

    If that were to happen in production for whatever reason, this would return false with the suggested changes, which may break something?


    naumenkogs commented at 7:33 am on November 14, 2024:

    Probably a taste thing. I think auto should be used only when it’s obvious what type it is.

    Here there are two non-obvious occasions:

    • auto x = bool + bool resulting in x=2.
    • casting 2 to bool for the result.

    I won’t insist but i think verbose types are better here. Feel free to close.


    sr-gi commented at 8:20 pm on November 14, 2024:
    That is fair. I’ve updated it to make it more explicit (and readable)
  106. in src/node/txreconciliation.cpp:436 in fd7c35a226 outdated
    376@@ -376,6 +377,38 @@ class TxReconciliationTracker::Impl
    377 
    378         return IsFanoutTarget(wtxid, peer_id, peer_state->m_we_initiate, n);
    379     }
    380+
    381+    std::vector<NodeId> SortPeersByFewestParents(std::vector<Wtxid> parents) EXCLUSIVE_LOCKS_REQUIRED(!m_txreconciliation_mutex)
    


    naumenkogs commented at 9:49 am on November 1, 2024:

    fd7c35a22669d440afe182a153bed28fd326b51f

    Often, there will be no parents (e.g., a parent is in the mempool but has already been flooded/reconciled). In that case, it would be nice to give a random order. And then comment it please.

    (currently it’s ordered by m_states i think).


    sr-gi commented at 8:42 pm on November 1, 2024:
    This is only called if there are in mempol parents though, the ordering doesn’t matter much otherwise

    naumenkogs commented at 9:57 am on November 4, 2024:
    There could be mempool parents, but none are in reconciliation sets (say, they were already reconciled). For all these cases, it would be nice to not choose the peer sitting first in m_states?

    sr-gi commented at 9:51 pm on November 13, 2024:

    In that case, no transaction would be moved from the reconciliation set to invs_to_send, given no transaction would be removed. Therefore, the order wouldn’t matter:

    0auto it = std::find(fanout_with_ancestors.begin(), fanout_with_ancestors.end(), peer_id);
    1if (it != fanout_with_ancestors.end()) {
    2    for (const auto wtxid: parents) {
    3        m_txreconciliation->TryRemovingFromSet(peer_id, wtxid);
    4        invs_to_send.push_back(wtxid);
    5    }
    6} else {
    7    // If the peer is registered for set reconciliation, maybe pick it as fanout
    8    fanout = m_txreconciliation->ShouldFanoutTo(peer_id, fanout_targets);
    9}
    

    This may be a bit counterintuitive. I’ll give some though and potentially re-design so it clearer


    naumenkogs commented at 7:12 am on November 14, 2024:

    Say there is an in-mempool parent announced earlier, so it’s not in any of the sets. SortPeersByFewestParents will return the same order as m_states (since all parent_count are 0), and fanout_with_ancestors will take first two after resize.

    Then for two peers (specifically, the first two from m_states) this code is executed:

    0                for (const auto wtxid: parents) {
    1                    m_txreconciliation->TryRemovingFromSet(peer_id, wtxid);
    2                    invs_to_send.push_back(wtxid);
    3                }
    

    TryRemovingFromSet will do nothing. But then invs_to_send.push_back(wtxid); is executed anyway. And always for the same two first peers from m_states. In other words, the first two are guaranteed to take the fanout in this case.


    sr-gi commented at 7:49 pm on November 14, 2024:

    Sorry, you’re right. I’ve been looking at this being conditional to m_txreconciliation->TryRemovingFromSet(peer_id, wtxid), which is what happens with the descendants, but here it is unconditional.

    It should be added only if TryRemovingFromSet succeeds, and therefore the ordering in case there are no ancestors won’t matter. I’ll also add some comments regarding that.

    Notice the point of this is not to add something to invs_to_send but rather to move it from m_txreconciliation to invs_to_send, otherwise things would be sent more than once (or potentially captured by the bloom filters before hand, but still not optimal)


    sr-gi commented at 8:49 pm on November 14, 2024:
    Made the move dependant on TryRemovingFromSet
  107. in src/net_processing.cpp:2232 in fd7c35a226 outdated
    2212+                // we also remove and fanout the conflict and all its descendants.
    2213+                // This is because our peer may have added the conflicting transaction
    2214+                // to its set, in which reconciliation of these two would fail
    2215+                if (const auto collision = result.m_conflict; collision.has_value()) {
    2216+                    CTxMemPool::setEntries descendants;
    2217+                    WITH_LOCK(m_mempool.cs, m_mempool.CalculateDescendants(m_mempool.get_iter_from_wtxid(collision.value()), descendants));
    


    naumenkogs commented at 10:06 am on November 1, 2024:
    Why having a collision means we should fanout the children? They may get reconciled in their own time.

    sr-gi commented at 2:20 pm on November 1, 2024:
    Because if you remove the parent from the reconciliation set but not the children, it could be the case that you reconcile with that peer before fanning out the parent, and all those children will be orphan

    naumenkogs commented at 9:54 am on November 4, 2024:
    Something worth documenting too :) Again, at least one place where we talk about handling dependencies.

    sr-gi commented at 10:01 pm on November 13, 2024:
    Done
  108. refactor: remove legacy comments
    These comments became irrelevant in one of the previous code changes.
    They simply don't make sense anymore.
    7e42a50631
  109. sr-gi force-pushed on Nov 1, 2024
  110. sr-gi commented at 8:44 pm on November 1, 2024: member

    Rebased to make CI green, plus covered (and masked as resolved) some outstanding comments:

    #30116 (review) #30116 (review) #30116 (review) #30116 (review) #30116 (review) #30116 (review)

  111. sr-gi force-pushed on Nov 1, 2024
  112. DrahtBot added the label CI failed on Nov 1, 2024
  113. sr-gi force-pushed on Nov 1, 2024
  114. DrahtBot removed the label CI failed on Nov 2, 2024
  115. in src/node/txreconciliation.cpp:261 in e942f3a749 outdated
    151+
    152+        // Check if the reconciliation set is not at capacity for two reasons:
    153+        // - limit sizes of reconciliation sets and short id mappings;
    154+        // - limit CPU use for sketch computations.
    155+        //
    156+        // Since we reconcile frequently, reaching capacity either means:
    


    naumenkogs commented at 9:16 am on November 4, 2024:

    e942f3a7493908988eea08640d87b2ae420c5eef

    This entire paragraph currently seems to me more confusing than useful. I suggest dropping it altogether.


    sr-gi commented at 8:52 pm on November 14, 2024:
    The whole thing from L151-L160?

    naumenkogs commented at 6:13 am on November 26, 2024:
    This comment is related to L156-160. L151-155 could be dropped for a different reason (redundancy with the MAX_RECONSET_SIZE comment)

    sr-gi commented at 9:31 pm on November 26, 2024:
    Covered in the last push
  116. in src/node/txreconciliation.cpp:257 in e942f3a749 outdated
    147+        AssertLockNotHeld(m_txreconciliation_mutex);
    148+        LOCK(m_txreconciliation_mutex);
    149+        auto peer_state = GetRegisteredPeerState(peer_id);
    150+        if (!peer_state) return false;
    151+
    152+        // Check if the reconciliation set is not at capacity for two reasons:
    


    naumenkogs commented at 9:18 am on November 4, 2024:

    e942f3a7493908988eea08640d87b2ae420c5eef

    Duplicates the comment around MAX_RECONSET_SIZE. One of them we better dop.


    sr-gi commented at 8:53 pm on November 14, 2024:
    I’m guessing this is related to #30116 (review), in which case I’ll drop that
  117. in src/net_processing.cpp:5827 in 9005953037 outdated
    5823+                            auto result = m_txreconciliation->AddToSet(pto->GetId(), wtxid);
    5824+                            if (!result.m_succeeded) {
    5825+                                vInv.push_back(inv);
    5826+                                // If the transaction fails because it collides with an existing one,
    5827+                                // we also remove and fanout the conflict and all its descendants.
    5828+                                // This is because our peer may have added the conflicting tranaction
    


    naumenkogs commented at 9:44 am on November 4, 2024:

    90059530379cf99cf4b82d664ad35c1ddefc3d07

    “Would fail” is very confusing. “Failed reconciliation” will have a technical meaning in the latter commits. Here, i’d rather be more explicit “two transactions mapped to the same short-id won’t be announced between these two peers”.


    sr-gi commented at 10:02 pm on November 13, 2024:
    Done
  118. in src/net_processing.cpp:5826 in 9005953037 outdated
    5822+                            // If the transactions fails to get into the set, we fanout
    5823+                            auto result = m_txreconciliation->AddToSet(pto->GetId(), wtxid);
    5824+                            if (!result.m_succeeded) {
    5825+                                vInv.push_back(inv);
    5826+                                // If the transaction fails because it collides with an existing one,
    5827+                                // we also remove and fanout the conflict and all its descendants.
    


    naumenkogs commented at 9:45 am on November 4, 2024:
    Let’s provide a hint why fanouting the descendants (or at least send a reader to where the dependency issue is explained better — we should have a one place explaining it)

    sr-gi commented at 10:02 pm on November 13, 2024:
    Done
  119. in src/node/txreconciliation.cpp:62 in c790aa0e7e outdated
    55@@ -56,6 +56,13 @@ class TxReconciliationState
    56      */
    57     uint64_t m_k0, m_k1;
    58 
    59+    /**
    60+    * Set of transactions to be added to the reconciliation set on the next trickle. These are still unrequestable for
    61+    * privacy reasons (to prevent transaction probing), transactions became available (moved to m_local_set) once they
    62+    * would have been announced via fanout.
    


    naumenkogs commented at 10:04 am on November 4, 2024:

    c790aa0e7ebfcf0b03b4d0ef5fcc0ddfa4199943

    I think “upon trickle” or “see m_next_inv_send_time” is much less confusing.

  120. in src/node/txreconciliation.cpp:279 in c790aa0e7e outdated
    285+        if (peer_state->m_delayed_local_set.insert(wtxid).second) {
    286             peer_state->m_short_id_mapping.emplace(short_id, wtxid);
    287             LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, "Added %s to the reconciliation set for peer=%d. "
    288                                                                         "Now the set contains %i transactions.\n",
    289-                          wtxid.ToString(), peer_id, peer_state->m_local_set.size());
    290+                          wtxid.ToString(), peer_id, set_size + 1);
    


    naumenkogs commented at 10:12 am on November 4, 2024:
    c790aa0e7ebfcf0b03b4d0ef5fcc0ddfa4199943 This is probably the most important log in the entire file, and I think it’s better to distinguish two set sizes here (preferably other places the size is logged too).

    sr-gi commented at 9:16 pm on November 14, 2024:

    I changed ReconSetSize to return a tuple, and updated this to read:

    0 "Now the set contains %i reconcilable transactions (plus %i delayed transactions).\n",
    

    sr-gi commented at 9:22 pm on November 14, 2024:
    I have also applied the same logic when removing

    sr-gi commented at 4:08 pm on November 26, 2024:
    Can this be resolved?

    naumenkogs commented at 8:01 am on November 27, 2024:
    yes
  121. bitcoin deleted a comment on Nov 4, 2024
  122. sr-gi force-pushed on Nov 13, 2024
  123. in src/net_processing.cpp:2186 in ee46e21a5c outdated
    2184+            // to reconcile, fanout the transaction an all its ancestors. We just add the parents here and leave fanout as true
    2185+            auto it = std::find(fanout_with_ancestors.begin(), fanout_with_ancestors.end(), peer_id);
    2186+            if (it != fanout_with_ancestors.end()) {
    2187+                for (const auto wtxid: parents) {
    2188+                    m_txreconciliation->TryRemovingFromSet(peer_id, wtxid);
    2189+                    invs_to_send.push_back(wtxid);
    


    naumenkogs commented at 7:16 am on November 14, 2024:

    ee46e21a5c913175415a192661b406b328209800

    First I thought why don’t fanout=true here and handle it below. We can’t do it because txid/wtxid difference (although we could have a txid_or_wtxid variable above….). But then, you certainly want to at least set fanout=false here, otherwise it remains true and added to invs_to_send twice (it’s not critical but a bug still).


    sr-gi commented at 9:32 pm on November 14, 2024:

    Not sure if this comment only applied before #30116 (review). But I think this is not currently the case.

    Here we are only adding the parents. Leaving fanout=true triggers adding the actual transaction RelayTransaction was called with in the following if (fanout) block.

    We could also add it here, but it would involve adding another boolean to skip the if (fanout) block, since we need to populate m_tx_inventory_to_send with data from invs_to_send which happens right after


    sr-gi commented at 9:30 pm on November 26, 2024:
    Is this still an issue?

    naumenkogs commented at 8:04 am on November 27, 2024:
    The issue is re-defining wtxid (for parent_wtxid) while it’s already defined as a transaction-in-question :) That was the source in my confusion originally

    sr-gi commented at 5:46 pm on November 27, 2024:
    Ohh I see, I’ll rename that to parent_wtxid for the internal context

  124. sr-gi force-pushed on Nov 14, 2024
  125. sr-gi force-pushed on Nov 14, 2024
  126. in src/node/txreconciliation.cpp:236 in 1dce9aedc5 outdated
    231+        }
    232+
    233+        for (auto &[parent_count, peer_id]: parents_by_peer) {
    234+            const auto state = std::get<TxReconciliationState>(m_states.find(peer_id)->second);
    235+            for (const auto& wtxid: parents) {
    236+                if (auto found = state.m_local_set.find(wtxid); found != state.m_local_set.end()) {
    


    naumenkogs commented at 6:23 am on November 26, 2024:

    1dce9aedc59eba2b623c7785f59430a586fbd833

    the variable found could be dropped


    sr-gi commented at 4:31 pm on November 26, 2024:
    That commit is from an old revision

    naumenkogs commented at 7:21 am on November 27, 2024:
     0        std::multimap<uint16_t, NodeId> parents_by_peer;
     1        for (const auto &[peer_id, state_or_salt]: m_states) {
     2            if (const auto state = std::get_if<TxReconciliationState>(&state_or_salt)) {
     3                const size_t parent_count = std::count_if(parents.begin(), parents.end(),
     4                       [=](const auto& wtxid){return state->m_local_set.find(wtxid) != state->m_local_set.end();});
     5                parents_by_peer.insert(std::make_pair(parent_count, peer_id));
     6            }
     7        }
     8
     9        std::vector<NodeId> sorted_peers;
    10        sorted_peers.reserve(parents_by_peer.size());
    11        for (const auto &[_, peer_id]: parents_by_peer) {
    12            sorted_peers.emplace_back(peer_id);
    13        }
    14        return sorted_peers;
    

    naumenkogs commented at 7:25 am on November 27, 2024:

    Multimap is ordered by key, and for same-key (same parent_count) since c++11 the order is consistent (documented here). Feel free to include this as a comment, although I’m not even sure we care about the same exact output of multiple SortPeersByFewestParents calls.

    (also it depends on the consistent ordering of m_states iterator too, which could be documented along the same lines, here and elsewhere it is iterated over)


    naumenkogs commented at 7:58 am on November 27, 2024:
    (this could also be done in 4b4d99fb2df2c60a2214487cec627bc560f50f53 GetFanoutTargets, although it’s not that much an improvement over there)

    sr-gi commented at 8:32 pm on December 4, 2024:

    I’ve been playing a bit with this, plus adding a test for SortPeersByFewestParents, which was missing. The multimap does have a consistent ordering, but it does reverse order of insertion instead of order of insertion for ties. Not really like it matters to be honest, if that is something we care about we can always start from a shuffled copy of m_states.

    nvm, the order was changing because this is iterating over m_states, which is an unordered_map.

    I’ll take the suggestion, but I’m thinking about modifying it a bit so the order does not have to be so reliant on m_states, which is something you also pointed out in an older review IIRC. What about something along the line of:

     0std::vector<NodeId> registered_peers;
     1for (const auto &[peer_id, _]: m_states) {
     2    registered_peers.push_back(peer_id);
     3}
     4std::shuffle(registered_peers.begin(), registered_peers.end(), FastRandomContext());
     5
     6std::multimap<uint16_t, NodeId> parents_by_peer;
     7for (const auto &peer_id: registered_peers) {
     8    if (const auto state = GetRegisteredPeerState(peer_id)) {
     9        const size_t parent_count = std::count_if(parents.begin(), parents.end(),
    10               [state](const auto& wtxid){return state->m_local_set.find(wtxid) != state->m_local_set.end();});
    11        parents_by_peer.emplace(parent_count, peer_id);
    12    }
    13}
    14
    15std::vector<NodeId> sorted_peers;
    16sorted_peers.reserve(parents_by_peer.size());
    17for (const auto &[_, node_id]: parents_by_peer) {
    18    sorted_peers.emplace_back(node_id);
    19}
    20
    21return sorted_peers;
    

    naumenkogs commented at 9:02 am on December 5, 2024:

    I think an extra shuffle is a bit too much, because then for consistency you’d do it every time we access m_states? I think a comment in those places where order might matter is sufficient.

    Im fine if you do this extra shuffle i guess.


    sr-gi commented at 6:38 pm on December 5, 2024:
    I added a comment on the header explaining how the order of ties work and dropped the shuffle
  127. in src/net_processing.cpp:2216 in 0092f70314 outdated
    2194+                fanout = m_txreconciliation->ShouldFanoutTo(Wtxid::FromUint256(wtxid), peer_id, inbounds_fanout_tx_relay, outbounds_fanout_tx_relay);
    2195+            }
    2196         }
    2197+
    2198+        if (fanout || !m_txreconciliation->AddToSet(peer_id, Wtxid::FromUint256(wtxid)).m_succeeded) {
    2199+            invs_to_send.push_back(peer->m_wtxid_relay ? wtxid : txid);
    


    naumenkogs commented at 6:37 am on November 26, 2024:

    0092f70314b8d59b41ff4bd2bc15a71f954668d4

    Consider fanout=false because the peer has too many parents (cut the tail from fanout_with_ancestors). Now, we fail to add a transaction to the set (whatever reason, say set too full), and we’re now going to flood a child while reconciling the parents again?

    Perhaps AddToSet result could be handled more carefully instead (or have a todo/comment, this is a corner case of a corner case of course).


    naumenkogs commented at 6:51 am on November 26, 2024:
    This is slightly changed later in 5996d473434d364cff937ae89ffd95cd09c5d1a7. AddToSetResult::Failed still not handled though?

    sr-gi commented at 7:19 pm on November 26, 2024:

    Yes, that’s right. In the current revision, if the set is full and the given transaction has some ancestors, the children will be flooded while the parents will be reconciled, which may cause orphans.

    I think we could handle this at the same level that we handled fanout_with_ancestors:

    • Fanout to peers that have ancestors and have a full set
    • Fanout to peers that have the least amount of ancestors

    However, this will still be an issue in some cases, given we only select a subset of peers to fanout to when these conditions hold (currently 2, but let’s say n). If we do not fanout to all peers that fall under these conditions, there will always be some that will receive parents and children using different approaches, which may cause orphans. However, I think it is undesirable to fanout to all matches in this situation.

    This raises the question, is it worth the added complexity, or should we add a comment acknowledging that this can be the case, but that we expect it to happen really infrequently?


    sr-gi commented at 9:45 pm on December 3, 2024:

    I added a comment for this instead of making a full set part of the sorting criteria. This should not happen under normal circumstances provided the set size is defined to account for way over the expected traffic between reconciliations. A peer hitting the limit is likely to be either broken or an attacker, and I don’t think we should be catering to them (nor making the logic more complex based on that)

    https://github.com/bitcoin/bitcoin/pull/30116/commits/f51e36057e509356890a70ced3a7209f98e8db4f

  128. in src/net_processing.cpp:2204 in 0092f70314 outdated
    2202+        for (const auto& hash : invs_to_send) {
    2203+            if (!tx_relay->m_tx_inventory_known_filter.contains(hash)) {
    2204+                tx_relay->m_tx_inventory_to_send.insert(hash);
    2205+            }
    2206+        }
    2207+        invs_to_send.clear();
    


    naumenkogs commented at 6:39 am on November 26, 2024:

    https://github.com/bitcoin/bitcoin/commit/0092f70314b8d59b41ff4bd2bc15a71f954668d4

    consider defining this variable inside the loop rather than cleaning it after each iteration?

  129. in src/node/txreconciliation.cpp:432 in 840a4e14f2 outdated
    444@@ -381,9 +445,9 @@ class TxReconciliationTracker::Impl
    445         }
    446 
    447         for (auto &[parent_count, peer_id]: parents_by_peer) {
    448-            const auto state = std::get<TxReconciliationState>(m_states.find(peer_id)->second);
    449+            auto state = std::get<TxReconciliationState>(m_states.find(peer_id)->second);
    450             for (const auto& wtxid: parents) {
    451-                if (auto found = state.m_local_set.find(wtxid); found != state.m_local_set.end()) {
    452+                if (state.ContainsTx(wtxid, /*include_delayed=*/true)) {
    


    naumenkogs commented at 7:05 am on November 26, 2024:

    840a4e14f2d833717c87a54ec2707a92d5131ada

    been thinking about the use of a delayed set here.

    We apply the delay only on addition, but not on fanouting parents (or counting them), neither on removing stuff from the sets (say at receiving a transaction).

    For me the choice is pretty arbitrary (and probably fine to save the code complexity for this rare corner case), but perhaps we would benefit from better documentation around it.


    sr-gi commented at 8:24 pm on November 26, 2024:
    I think I’m not following here. We do apply the delayed set both on addition (AddToSet), deletion (RemoveFromSet), and when counting parents (this bit of code is part of the parent sorting).

    naumenkogs commented at 6:45 am on November 27, 2024:
    I mean the delay specifically, not the set. If we talk about an abstract “reconciliation set” (which is a combination of two), you apply the delay for an addition, but not for the removal action.

    sr-gi commented at 5:47 pm on December 4, 2024:

    I see your point. The reasoning for this is so we can use ContainsTx in testing to check that things have been added/removed to/from the proper collection. Throughout the rest of the codebase, we always call this with true, and we do an implicit check on deletion.

    I’m happy to consider an alternative that covers the same functionality if you have one

  130. in src/node/txreconciliation.cpp:416 in fc06786f32 outdated
    454-     * due to caching. Thus, we can save us adding that extra function for now.
    455-     */
    456-    bool ShouldFanoutTo(const Wtxid& wtxid, NodeId peer_id,
    457-                        size_t inbounds_fanout_tx_relay, size_t outbounds_fanout_tx_relay)
    458-        EXCLUSIVE_LOCKS_REQUIRED(!m_txreconciliation_mutex)
    459+    bool ShouldFanoutTo(NodeId peer_id, std::vector<NodeId> &fanout_targets) EXCLUSIVE_LOCKS_REQUIRED(!m_txreconciliation_mutex)
    


    naumenkogs commented at 7:10 am on November 26, 2024:

    fc06786f32b2fc82c9571ebde67b4ff54f4ad825

    this wrapper is not needed anymore. The Registered check already happens at the caller site. Just drop it and expose GetFanoutTargets?


    sr-gi commented at 8:29 pm on November 26, 2024:
    Sure. This also implies dropping the bench for it, which honestly I do not think is as useful anymore now that we do not have a cache

    naumenkogs commented at 8:00 am on November 27, 2024:
    Yeah, and perhaps good time to squash this into eb2b19d2c237f2cc7b4ff3e7cefec52a37eb24b3, it got much stale by now.

    sr-gi commented at 9:47 pm on December 3, 2024:
    Done in the last force push
  131. in src/node/txreconciliation.cpp:417 in fc06786f32 outdated
    423-        std::sort(best_peers.begin(), best_peers.end());
    424-
    425-        auto it = best_peers.begin();
    426-        for (size_t i = 0; i < targets_size && it != best_peers.end(); ++i, ++it) {
    427-            if (it->second == peer_id) return true;
    428+        // Sort the peers based on their assigned random value, extract the node_ids and trim the collections to size
    


    naumenkogs commented at 10:44 am on November 26, 2024:
     0        // TODO: now rename sorted_inbounds to weighted_inbounds
     1        std::vector<NodeId> fanout_targets(inbounds_target_size + outbounds_target_size);
     2        auto collect_fanout_targets = [&fanout_targets](std::vector<std::pair<uint64_t, NodeId>> weighted_peers, const size_t target_size) {
     3            const size_t result_size = std::min(target_size, weighted_peers.size());
     4            if (result_size == 0) return;
     5            std::nth_element(weighted_peers.begin(), weighted_peers.begin() + result_size, weighted_peers.end());
     6            for_each(weighted_peers.begin(), weighted_peers.begin() + result_size,
     7                    [&fanout_targets](auto& keyed_peer) { fanout_targets.push_back(keyed_peer.second); });
     8        };
     9
    10        collect_fanout_targets(sorted_inbounds, inbounds_target_size);
    11        collect_fanout_targets(sorted_outbounds, outbounds_target_size);
    12        return fanout_targets;
    

    sr-gi commented at 9:00 pm on November 26, 2024:
    I think this is slightly harder to follow, but I like that the collections are only sorted up to some point, so I’ll take it. Instead of using the std::min, I’ve added two Assume when defining {outbounds, inbounds}_target_size to ensure that the targets are never bigger than the sets the pull from, since this should never be the case

    sr-gi commented at 9:29 pm on November 26, 2024:
    Covered in 4b4d99f
  132. in src/node/txreconciliation.cpp:394 in fc06786f32 outdated
    408-            if (peer_state && peer_state->m_we_initiate == we_initiate) {
    409-                uint64_t hash_key = CSipHasher(deterministic_randomizer).Write(node_id).Finalize();
    410-                best_peers.emplace_back(hash_key, node_id);
    411+            if (peer_state) {
    412+                if (peer_state->m_we_initiate) {
    413+                    uint64_t hash_key = CSipHasher(deterministic_randomizer_out).Write(node_id).Finalize();
    


    naumenkogs commented at 11:33 am on November 26, 2024:
     0        auto assign_key = [](const TxReconciliationState& peer_state, NodeId node_id, CSipHasher randomizer, std::vector<std::pair<uint64_t, NodeId>>& weighted_peers) {
     1            uint64_t hash_key = randomizer.Write(node_id).Finalize();
     2            weighted_peers.emplace_back(hash_key, node_id);
     3        };
     4
     5        for (const auto& [node_id, op_peer_state]: m_states) {
     6            const auto peer_state = std::get_if<TxReconciliationState>(&op_peer_state);
     7            if (peer_state) {
     8                if (peer_state->m_we_initiate) {
     9                    assign_key(*peer_state, node_id, deterministic_randomizer_out, sorted_outbounds);
    10                } else {
    11                    assign_key(*peer_state, node_id, deterministic_randomizer_in, sorted_inbounds);
    12                }
    13            }
    14        }
    

    very subjectively a little cleaner :)


    sr-gi commented at 9:29 pm on November 26, 2024:
    I don’t think it makes a huge difference, but I’ll take it -> 4b4d99f
  133. in src/net_processing.cpp:2166 in 0092f70314 outdated
    2154+                if (!parents_refs.empty()) {
    2155+                    for (const auto &tx : parents_refs) {
    2156+                        parents.emplace_back(tx.get().GetTx().GetWitnessHash());
    2157+                    }
    2158+                    fanout_with_ancestors = m_txreconciliation->SortPeersByFewestParents(parents);
    2159+                    fanout_with_ancestors.resize(2); // FIXME: Resize to 2 for now
    


    naumenkogs commented at 11:42 am on November 26, 2024:
    Better check if the size is not 1 already? It’s currently safe, but could cause bugs in the future.

    sr-gi commented at 9:33 pm on November 26, 2024:
    This is just temporary. It should not be merged like this. I’ll leave it open once we decide on a clear way to trim
  134. sr-gi force-pushed on Nov 26, 2024
  135. sr-gi force-pushed on Nov 26, 2024
  136. in src/node/txreconciliation.cpp:334 in 8a73b1ca91 outdated
    219+        if (registered && !registered->m_we_initiate) {
    220+            Assert(m_inbounds_count > 0);
    221+            --m_inbounds_count;
    222         }
    223+
    224+        m_states.erase(peer);
    


    naumenkogs commented at 7:35 am on November 27, 2024:

    8a73b1ca91b1ff47b56451ca4074f73ef71b9f79

    Mind handling the result of erasure here? Either Assert/Assume or if


    sr-gi commented at 4:29 pm on December 4, 2024:
    Covered in 9918f3b08768dece8d8daf73b8144ada60c4ae5a
  137. in src/node/txreconciliation.cpp:400 in 4b4d99fb2d outdated
    420-        for (size_t i = 0; i < targets_size && it != best_peers.end(); ++i, ++it) {
    421-            if (it->second == peer_id) return true;
    422-        }
    423-        return false;
    424+        // Make sure we never select more targets than we can
    425+        Assert(outbounds_target_size <= weighed_outbounds.size());
    


    naumenkogs commented at 7:48 am on November 27, 2024:

    4b4d99fb2df2c60a2214487cec627bc560f50f53

    this could be moved into the lambda :)


    sr-gi commented at 4:28 pm on December 4, 2024:
    That means capturing 4 additional arguments in the lambda just for these two asserts, which seems overkill

    sr-gi commented at 7:05 pm on December 4, 2024:
    TIL about [=] for lambdas, I may take it on the next push

    sr-gi commented at 6:38 pm on December 5, 2024:
    Added in 4d66110
  138. DrahtBot added the label CI failed on Dec 3, 2024
  139. sr-gi force-pushed on Dec 3, 2024
  140. DrahtBot removed the label CI failed on Dec 3, 2024
  141. sr-gi force-pushed on Dec 4, 2024
  142. sr-gi force-pushed on Dec 5, 2024
  143. sr-gi force-pushed on Dec 5, 2024
  144. DrahtBot added the label CI failed on Dec 5, 2024
  145. DrahtBot commented at 7:04 pm on December 5, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/33993403729

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  146. DrahtBot removed the label CI failed on Dec 5, 2024
  147. in src/node/txreconciliation.cpp:422 in 4b9f83a8b7 outdated
    398+
    399+        // Sort the peers based on their assigned random value, extract the node_ids and trim the collections to size
    400+        std::vector<NodeId> fanout_targets(inbounds_target_size + outbounds_target_size);
    401+        auto collect_fanout_targets = [&](std::vector<std::pair<uint64_t, NodeId>> weighted_peers, const size_t target_size) {
    402+            // Make sure we never select more targets than we can
    403+            Assert(outbounds_target_size <= weighed_outbounds.size());
    


    marcofleon commented at 5:21 pm on December 10, 2024:

    Fuzzed the class for a bit (fuzz test here: https://github.com/marcofleon/bitcoin/commits/erlay30116-fuzztest/) and this assertion failed.

    I think having a preregistered peer and one inbound peer in m_states causes outbound_target_size to be 1 while weighed_outbounds.size() is 0 because assign_key is never called.

    The equivalent unit test (which fails when I run it) would be something like

     0CSipHasher hasher(0x0706050403020100ULL, 0x0F0E0D0C0B0A0908ULL);
     1TxReconciliationTracker tracker(TXRECONCILIATION_VERSION, hasher);
     2NodeId peer_id0 = 0;
     3NodeId peer_id1 = 1;
     4FastRandomContext frc{/*fDeterministic=*/true};
     5std::vector<NodeId> fanout_targets;
     6
     7tracker.PreRegisterPeer(peer_id0);
     8tracker.PreRegisterPeer(peer_id1);
     9BOOST_REQUIRE_EQUAL(tracker.RegisterPeer(peer_id1, /*is_peer_inbound=*/true, 1, 1), ReconciliationRegisterResult::SUCCESS);
    10fanout_targets = tracker.GetFanoutTargets(Wtxid::FromUint256(frc.rand256()), /*inbounds_fanout_tx_relay=*/0, /*outbounds_fanout_tx_relay=*/0);
    

    Maybe it makes sense to have separate maps for preregistered vs registered peers?


    naumenkogs commented at 7:52 am on December 11, 2024:

    Great fuzzing!

    ahhhh the sloppy part is m_states.size() - m_inbounds_count; , it ignores the presence of preregistered peers.

    Maybe it makes sense to have separate maps for preregistered vs registered peers?

    I’d rather have m_outbounds_count cached value (all similar to m_inbounds_count), but no big deal.


    sr-gi commented at 7:24 pm on January 2, 2025:

    I’ve cached m_outbounds_count to prevent this from happening.

    This made me realize that fanout_targets was incorrectly initialized, instead of reserving space for inbounds_target_size + outbounds_target_size, it was being initialized with inbounds_target_size + outbounds_target_size pre-existing elements initialized to 0, meaning that the returned vector contained more elements than expected


    sr-gi commented at 7:29 pm on January 2, 2025:
    Addressed in 804ccf030362fe5b232fe5c4213b77899a425f2b
  148. in src/node/txreconciliation.cpp:383 in 4b9f83a8b7 outdated
    378+        std::vector<std::pair<uint64_t, NodeId>> weighted_inbounds, weighed_outbounds;
    379+        weighted_inbounds.reserve(m_inbounds_count);
    380+        Assume(m_states.size() >= m_inbounds_count);
    381+        weighed_outbounds.reserve(m_states.size() - m_inbounds_count);
    382+
    383+        auto assign_key = [](const TxReconciliationState& peer_state, NodeId node_id, CSipHasher randomizer, std::vector<std::pair<uint64_t, NodeId>>& weighted_peers) {
    


    marcofleon commented at 5:22 pm on December 10, 2024:
    Does peer_state need to be passed in here? I don’t think it’s used.

    sr-gi commented at 7:20 pm on January 2, 2025:
    It does not. I’ll remove it

    sr-gi commented at 7:29 pm on January 2, 2025:
    Addressed in 804ccf030362fe5b232fe5c4213b77899a425f2b
  149. in src/node/txreconciliation.cpp:217 in 131c28bfd9 outdated
    132@@ -115,10 +133,57 @@ class TxReconciliationTracker::Impl
    133                       peer_id, is_peer_inbound);
    134 
    135         const uint256 full_salt{ComputeSalt(local_salt, remote_salt)};
    136-        recon_state->second = TxReconciliationState(!is_peer_inbound, full_salt.GetUint64(0), full_salt.GetUint64(1));
    137+
    138+        auto new_state = TxReconciliationState(!is_peer_inbound, full_salt.GetUint64(0), full_salt.GetUint64(1));;
    139+        m_states.erase(recon_state);
    140+        bool emplaced = m_states.emplace(peer_id, std::move(new_state)).second;
    


    ariard commented at 0:22 am on December 22, 2024:

    I think this snippet of code could be documented in the following fashion

    0// If we receive many `SENDTXRCNCL` from the same peer, before to receive a
    1// single `VERACK`, re-register the peer id state with potentially different local
    2// and remote salts.
    

    As far as I can see, at SENDTXRCNCL reception we do not check for duplicate SENDTXRCNCL message (we check if we have txreconciliation enabled, then that it’s happening before verack reception and then that transaction-relay is enabled on the link) and we can call RegisterPeer multiple times.


    sr-gi commented at 8:43 pm on January 2, 2025:

    Calling RegisterPeer with the same peer_id multiple times will result in ReconciliationRegisterResult::ALREADY_REGISTERED.

    The docs of both PreRegisterPeer and RegisterPeer specify that they should called only once.

  150. in src/node/txreconciliation.cpp:153 in 131c28bfd9 outdated
    149+        LOCK(m_txreconciliation_mutex);
    150+        auto peer_state = GetRegisteredPeerState(peer_id);
    151+        if (!peer_state) return false;
    152+
    153+        // Transactions which don't make it to the set due to the limit are announced via fan-out.
    154+        if (peer_state->m_local_set.size() >= MAX_RECONSET_SIZE) return false;
    


    ariard commented at 0:29 am on December 22, 2024:

    Logging that a given peer’s local reconciliation set max size has been reached can be useuful info for future debug.

     0diff --git a/src/node/txreconciliation.cpp b/src/node/txreconciliation.cpp
     1index d4d6bdf1ea..b99bb41681 100644
     2--- a/src/node/txreconciliation.cpp
     3+++ b/src/node/txreconciliation.cpp
     4@@ -150,7 +150,11 @@ public:
     5         if (!peer_state) return false;
     6 
     7         // Transactions which don't make it to the set due to the limit are announced via fan-out.
     8-        if (peer_state->m_local_set.size() >= MAX_RECONSET_SIZE) return false;
     9+        if (peer_state->m_local_set.size() >= MAX_RECONSET_SIZE) {
    10+            LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, "Reconciliation set maximum size reachedfor peer=%d. ",
    11+                    peer_id);
    12+            return false;
    13+        }
    14 
    15         // The caller currently keeps track of the per-peer transaction announcements, so it
    16         // should not attempt to add same tx to the set twice. However, if that happens, we will
    

    sr-gi commented at 8:44 pm on January 2, 2025:
    Added in b84348a
  151. in src/node/txreconciliation.cpp:215 in 8811fd28e1 outdated
    208@@ -206,6 +209,27 @@ class TxReconciliationTracker::Impl
    209     }
    210 };
    211 
    212+AddToSetResult::AddToSetResult(bool succeeded, std::optional<Wtxid> conflict)
    213+{
    214+    m_succeeded = succeeded;
    215+    m_conflict = conflict;
    


    ariard commented at 0:45 am on December 22, 2024:
    I think the variable naming could be enhanced, e.g m_collision, to dissociate from mempool conflicts. Here, it’s a collision from SipHash outputs that can arise from 2 distinct transactions, not even spending the same set of UTXOs.

    sr-gi commented at 8:46 pm on January 2, 2025:
    Addressed in 0e290996874200317ba4d3f1b3e12992b40db4d1
  152. in src/net_processing.h:112 in 478137aa82 outdated
    107@@ -108,6 +108,9 @@ class PeerManager : public CValidationInterface, public NetEventsInterface
    108     /** Relay transaction to all peers. */
    109     virtual void RelayTransaction(const uint256& txid, const uint256& wtxid) = 0;
    110 
    111+    /** Get the amount on inbounds and outbounds fanout peers. */
    112+    virtual std::pair<size_t, size_t> GetFanoutPeersCount() = 0;
    


    ariard commented at 0:48 am on December 22, 2024:
    nit: s/Get the amount on/Get the amount of/g. And can say inbound is given first.

    sr-gi commented at 8:46 pm on January 2, 2025:
    Addressed in a461a1204109762ea38b21d9b6c445e9b9ba570f
  153. ariard commented at 1:05 am on December 22, 2024: contributor
    Reviewed up to commit 478137aa (“p2p: Add PeerManager method to count…”). Just minor comments so far and I still have to answer back comments on #28765.
  154. in src/node/txreconciliation.h:159 in c86f4f3359 outdated
    129+     * can be found in their reconciliation sets (from less to more). If two values tie, they will be
    130+     * sorted in order or addition, which depends on the internal order of [m_states].
    131+     * This is meant to be called with a collection of ancestor ids for a given transaction in order to
    132+     * decide whether the transaction should be fanout with ancestors or not.
    133+    */
    134+    std::vector<NodeId> SortPeersByFewestParents(std::vector<Wtxid> parents);
    


    ariard commented at 10:13 pm on January 1, 2025:
    I think the rationale for introducing SortPeersByFewestParents could be more fleshed out, i.e iiuc explaining that eligible peers with the lowest number of parents for a target transaction in their reconciliation sets are favored for fan-out announcements, as this can be a hint of better connectivity in the overall transaction-relay graph and we wish to propagate the target transaction faster (..?).

    sr-gi commented at 7:58 pm on January 2, 2025:

    The rationale for who this method is used is provided in RelayTransaction when the method is used

    https://github.com/bitcoin/bitcoin/blob/804ccf030362fe5b232fe5c4213b77899a425f2b/src/net_processing.cpp#L2154-L2159

    I think the docs give enough rationale as to what this is intended for without being excessively narrow as to what the use cases could be.

  155. in src/net_processing.cpp:2245 in feb8c98db1 outdated
    2219+            // This should not happen under normal conditions, given the set size should be well over the number of transactions received
    2220+            // between reconciling intervals. A peer hitting the limit is likely to be either a broken implementation or an attacker.
    2221+            invs_to_send.push_back(peer->m_wtxid_relay ? wtxid : txid);
    2222+        }
    2223+
    2224+        for (const auto& hash : invs_to_send) {
    


    ariard commented at 10:46 pm on January 1, 2025:
    If the peer is m_wtxid_relay=false should m_tx_inven tory_known_filter be checked by txid ? Note, given the parent identifiers are retrieved from GetMemPoolParents() we can have the identifiers easily. This could avoid some duplicated INV flooding.

    sr-gi commented at 7:48 pm on January 2, 2025:
    This is already the case. The transaction that is passed to RelayTransaction is added to invs_to_send using wtxid/txid based on m_wtxid_relay (that is part of the code you quoted already). For other transactions that may be in inv_to_send (e.g. ancestors and/or descendants), this only apply to Erlay, and Erlay enforces WTXID_RELAY
  156. ariard commented at 10:58 pm on January 1, 2025: contributor
    Reviewed up to commit feb8c98db (“p2p: Add transactions to reconciliation sets…”).
  157. sr-gi force-pushed on Jan 2, 2025
  158. p2p: Functions to add/remove wtxids to tx reconciliation sets
    They will be used later on.
    b84348a8fc
  159. p2p: Make short id collision detectable when adding wtxids to tx reconciliation sets 0e29099687
  160. p2p: Add PeerManager method to count the amount of inbound/outbounds fanout peers a461a12041
  161. p2p: Adds a method to sort peers by fewest parent
    When receiving a transaction that depends on an unconfirmed transactions, and needing to
    decide what peer to fanout to, those with fewer parents already in the reconciliation set
    will be prioritized
    8384cadb96
  162. p2p: Cache inbound reconciling peers count
    It helps to avoid recomputing every time we consider
    a transaction for fanout/reconciliation.
    
    Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
    ea0f6d9d0d
  163. p2p: Add transactions to reconciliation sets
    Transactions eligible for reconciliation are added to the reconciliation sets. For the remaining txs, low-fanout is used.
    
    Co-authored-by:  Gleb Naumenko <naumenko.gs@gmail.com>
    bcdccad41b
  164. p2p: Add helper to compute reconciliation tx short ids and a cache of short ids to wtxids d5404c2dc0
  165. p2p: Deal with shortid collisions for reconciliation sets
    If a wtxid to be added to a peer's recon set has a shot id collisions (a previously
    added wtxid maps to the same short id), both transaction should be fanout, given
    our peer may have added the opposite transaction to our recon set, and reconciliation
    will fail. Moreover, all descendants of the previously added transaction that were in
    the recon set should also be removed and fanout.
    bbdc84ca0f
  166. p2p: Makes transactions available for reconciliation on trickle
    Splits the reconciliation set in two, a delayed set and an available set.
    Transactions are added to the delayed set and made available when on the next
    trickle interval for the peer. This prevents adversarial nodes from proving
    our reconciliation set by spamming reconciliation requests, in an equivalent
    manner to how fanout delays work
    b1b85105be
  167. sr-gi force-pushed on Jan 2, 2025
  168. in src/node/txreconciliation.cpp:227 in bbdc84ca0f outdated
    224+        // Make sure there is no short id collision between the wtxid we are trying to add
    225+        // and any existing one in the reconciliation set
    226+        Wtxid collision;
    227+        uint32_t short_id;
    228+        if (HasCollision(peer_state, wtxid, collision, short_id)) {
    229+            return AddToSetResult::Collision(collision);
    


    ariard commented at 10:44 pm on January 2, 2025:
    nit: The Collision here could be LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, “…”).
  169. ariard commented at 10:56 pm on January 2, 2025: contributor
    Reviewed up to 4b9f83a (“p2p: Makes transactions available…”), all the code changes. I’ll check the test coverage.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me