The new approach differs from the previous in how peers are selected for fanout:
For Erlay-enabled peers, fanout targets are chosen for each transaction individually. Once the fanout target is reached, the transaction is reconciled with the rest of our peers. Whether a peer is selected for fanout depends on factors like peer type (inbound or outbound), how the transaction was received (via fanout or reconciliation), and how many of our peers are already aware of it.
For outbound peers:
If the transaction was received through fanout (or we originated it), we set a fanout rate threshold OUTBOUND_FANOUT_THRESHOLD{4}. As long as the transaction has been announced to (or received from) fewer than this number of peers, we continue fanning out based on the order in which peers’ Poisson timers go off.
If the transaction was received via reconciliation, we simply reconcile it with the rest of our peers.
For inbound peers:
We select a subset of inbound peers as fanout targets and send all transactions to them within a set time interval. Once the interval ends, the selection is randomly rotated.
This approach helps scale the transaction fanout rate for outbound peers by approximating how widely a transaction has already been propagated. For inbound peers, we use a random selection strategy to prevent them from easily exploiting our heuristic.
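The selection logic described above can be sketched as a toy Python model. The function names and the inbound rotation helper are illustrative only; the PR implements this in C++ with different identifiers, and only the OUTBOUND_FANOUT_THRESHOLD{4} value comes from the text above.

```python
import random

OUTBOUND_FANOUT_THRESHOLD = 4  # fanout rate threshold from the description above

def should_fanout_outbound(received_via_fanout: bool, peers_already_aware: int) -> bool:
    """Keep flooding to outbound peers (in Poisson-timer order) only while the
    tx came via fanout (or was originated locally) and fewer than the threshold
    number of peers have been announced to / received from."""
    return received_via_fanout and peers_already_aware < OUTBOUND_FANOUT_THRESHOLD

def rotate_inbound_targets(inbound_peers, subset_size, rng):
    """Pick a fresh random subset of inbound peers as fanout targets; the PR
    rotates this selection once per time interval so inbound peers cannot
    predict or game the heuristic."""
    pool = sorted(inbound_peers)
    return set(rng.sample(pool, min(subset_size, len(pool))))
```

For reconciliation-received transactions, `should_fanout_outbound` returns false regardless of the count, matching the "simply reconcile with the rest of our peers" rule above.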
You may notice 398078a8a92725f33ac28107f8fa045e02b2c5bd is not currently being used. It is likely to be dropped and brought back in a follow-up. Currently, we are picking Erlay fanout targets independently of whether we already have other non-Erlay peers (which are always fanned out to). We may want to scale the selection based on non-Erlay connections, but I think that can be addressed in its own PR.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#30277 ([DO NOT MERGE] Erlay: bandwidth-efficient transaction relay protocol (Full implementation) by sr-gi)
#28690 (build: Introduce internal kernel library by TheCharlatan)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
LLM Linter (✨ experimental)
Possible typos and grammar issues:
In txreconciliation.h: “In case of a collision, both transactions should be fanout.” -> “should be fanned out” [‘fanout’ is a noun; as a verb form it should be ‘fanned out’]
In net_processing.cpp near “This in only relevant for Erlay nodes…” -> “This is only relevant for Erlay nodes…” [missing “is”]
In net_processing.cpp comment “into a private function We would have to pass…” -> insert period: “…private function. We would have to pass…” [run-on sentence; needs a period for clarity]
DrahtBot added the label
P2P
on May 15, 2024
sr-gi
commented at 8:05 pm on May 15, 2024:
member
I’ve talked to @naumenkogs about picking this up and he was happy about it. I’m happy to close this if he changes his mind.
sr-gi renamed this:
p2p: Fill reconciliation sets (Erlay) attempt: 2
p2p: Fill reconciliation sets (Erlay) attempt 2
on May 15, 2024
in src/node/txreconciliation.h:131 (commit 7c047d30cb, outdated)
+ * Returns whether the transaction appears in the set.
+ */
+ bool AddToSet(NodeId peer_id, const Wtxid& wtxid);
+
+ /**
+ * Before Step 2, we might want to remove a wtxid from the reconciliation set, for example if
In “p2p: Functions to add/remove wtxids to tx reconciliation sets” 7c047d30cb0eadc8a424cb01de2fcd0978e22206: What is Step 2? This comment seems a little confusing because I couldn’t find a Step 2 the way I can see a Step 1.
Updated to track both successful and unsuccessful deletions (in debug log, to prevent unnecessarily spamming the logs)
in src/bench/txreconciliation.cpp:7 (commit 3f59bf83a4, outdated)
@@ -0,0 +1,40 @@
+// Copyright (c) 2020-2022 The Bitcoin Core developers
+// Distributed under the MIT software license, see the accompanying
+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
+
+#include <bench/bench.h>
+
+#include <net.h>
In “p2p: Functions to add/remove wtxids to tx reconciliation sets” 2e29e9beeebcadfc7afc56a28791456d40551bed: I think MAX_SET_SIZE could be in txreconciliation.h then we can use it in the tests.
-/**
- * Maximum number of wtxids stored in a peer local set, bounded to protect the memory use of
- * reconciliation sets and short ids mappings, and CPU used for sketch computation.
- */
-constexpr size_t MAX_SET_SIZE = 3000;
-
 /**
  * Maximum number of transactions for which we store assigned fanout targets.
diff --git a/src/node/txreconciliation.h b/src/node/txreconciliation.h
index c2cdf9875f..648b6369c0 100644
--- a/src/node/txreconciliation.h
+++ b/src/node/txreconciliation.h
@@ -14,6 +14,12 @@
 /** Supported transaction reconciliation protocol version */
 static constexpr uint32_t TXRECONCILIATION_VERSION{1};
 
+/**
+ * Maximum number of wtxids stored in a peer local set, bounded to protect the memory use of
+ * reconciliation sets and short ids mappings, and CPU used for sketch computation.
+ */
+constexpr size_t MAX_SET_SIZE = 3000;
+
 enum class ReconciliationRegisterResult {
     NOT_FOUND,
     SUCCESS,
diff --git a/src/test/txreconciliation_tests.cpp b/src/test/txreconciliation_tests.cpp
index a0071e4388..8befe0d36e 100644
--- a/src/test/txreconciliation_tests.cpp
+++ b/src/test/txreconciliation_tests.cpp
@@ -115,7 +115,7 @@ BOOST_AUTO_TEST_CASE(AddToSetTest)
     BOOST_REQUIRE_EQUAL(tracker.RegisterPeer(peer_id1, true, 1, 1), ReconciliationRegisterResult::SUCCESS);
     BOOST_CHECK(tracker.IsPeerRegistered(peer_id1));
 
-    for (size_t i = 0; i < 3000; ++i)
+    for (size_t i = 0; i < MAX_SET_SIZE; ++i)
         BOOST_REQUIRE(tracker.AddToSet(peer_id1, Wtxid::FromUint256(frc.rand256())));
     BOOST_REQUIRE(!tracker.AddToSet(peer_id1, Wtxid::FromUint256(frc.rand256())));
I’ve been talking to @sipa about this recently, and I think the limit may not be necessary if the approach for when to compute the reconciliation sets is changed. I may get rid of it, but if not, I’ll take this
Ended up taking this, since the limit will be kept in the end.
Thanks!
sr-gi force-pushed
on May 30, 2024
sr-gi force-pushed
on May 31, 2024
sr-gi
commented at 4:29 pm on May 31, 2024:
member
I’ve slightly extended the approach, adding three commits to deal with short id collisions, which were not taken into account. Some of this may be squashable; I’ve added the commits separately for now so they are easy to diff/review.
This would be missing an additional commit/amend to deal with #28765 (review), which I overlooked when addressing the outstanding comments
sr-gi force-pushed
on May 31, 2024
DrahtBot added the label
CI failed
on May 31, 2024
DrahtBot
commented at 4:32 pm on May 31, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot removed the label
CI failed
on Jun 3, 2024
sr-gi force-pushed
on Jun 7, 2024
sr-gi marked this as a draft
on Jun 7, 2024
sr-gi
commented at 6:21 pm on June 7, 2024:
member
I added two more commits, moving the fanout/reconciliation logic to RelayTransaction instead of SendMessages, plus dealing with ancestors in the mempool instead of descendants (which seemed to be the wrong approach).
I’m moving this to draft for now until I clean it up a bit, plus get some feedback on the approach
DrahtBot added the label
CI failed
on Jun 8, 2024
DrahtBot
commented at 5:17 am on June 8, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
Prabhat1308
commented at 11:58 am on June 11, 2024:
What is the intention behind moving the salt calculation out of PreRegisterPeer? Now, if PreRegisterPeer is not executed due to the lock being held, we are doing meaningless computation.
The purpose is for PreRegisterPeer to be callable with a fixed salt (via PreRegisterPeerWithSalt), so collisions can be tested. This is just moving it out of the PImpl; the effects on the external caller should be the same
sr-gi force-pushed
on Jun 11, 2024
sr-gi force-pushed
on Jun 11, 2024
DrahtBot removed the label
CI failed
on Jun 12, 2024
sr-gi force-pushed
on Jun 13, 2024
sr-gi force-pushed
on Jun 13, 2024
sr-gi force-pushed
on Jun 13, 2024
sr-gi marked this as ready for review
on Jun 13, 2024
sr-gi
commented at 8:01 pm on June 13, 2024:
member
This should be ready for review now.
in src/net_processing.cpp:2166 (commit 9de4233d00, outdated)
+ if (!parents_refs.empty()) {
+     for (const auto &tx : parents_refs) {
+         parents.emplace_back(tx.get().GetTx().GetWitnessHash());
+     }
+     fanout_with_ancestors = m_txreconciliation->SortPeersByFewestParents(parents);
+     fanout_with_ancestors.resize(2); // FIXME: Resize to 2 for now
This is pending. Happy to get feedback on how many to pick (either hardcoded or based on something else)
Prabhat1308
commented at 3:26 pm on June 16, 2024:
The number of parents per peer should mostly be in the range of (0-2/3), which should include a significant number of peers to reconcile with. A constant number might not be ideal considering that, if the array returned is smaller than that constant, we might be reconciling with non-existent peers.
I’m curious to know where you get that parent count range from 🤔
Prabhat1308
commented at 7:57 pm on June 19, 2024:
I got some data for about 4100+ transactions from a source (can’t be sure of its credibility) for Bitcoin transactions in a mempool, and I ran a script to count the average number of parents, which came to approx. 1.6. Another way I thought about this: I got the average number of inputs per transaction. For 2023–2024, the average is somewhere between 2–3. If we make a tree-like dependency structure for transactions, considering the transactions that had their inputs as UTXOs, the probability of having deeper ancestries diminishes rapidly, since any missing transaction will eliminate the ancestry above it. Considering that mempools are, in general, highly updated, ancestries beyond grandparent transactions are not very likely to be there. So, by this reasoning, there are supposed to be 6 ancestors per transaction on average, and we can expect the 2–3 range to cover a significant number of transactions.
Notice this does not apply to all transactions, though. The exception applies to transactions that have some ancestors in the mempool. In that case, we need to be consistent in the way we share this set of transactions; otherwise, it could be the case that the parents are reconciled at time T while the children are fanned out at T-n, potentially making the children orphans.
Under these conditions, we check how many peers have the ancestors in their reconciliation sets, and select a subset of the ones with the fewest of them, so the number of transactions to be fanned out is the smallest possible.
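A minimal sketch of that selection, as a toy Python stand-in for the PR's SortPeersByFewestParents (the real code is C++ operating on per-peer reconciliation state; names here are illustrative):

```python
def sort_peers_by_fewest_parents(recon_sets, parents):
    """Order peer ids by how many of the transaction's parents already sit in
    their reconciliation set. Peers holding fewer parents come first: picking
    them for fanout-with-ancestors means the fewest transactions have to be
    pulled out of reconciliation sets and flooded instead."""
    return sorted(recon_sets, key=lambda peer: sum(p in recon_sets[peer] for p in parents))
```

Picking a prefix of this ordering yields the subset with the smallest fanout cost described above.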
brunoerg
commented at 2:28 pm on September 25, 2024:
Shouldn’t we decrease the number of inbounds_fanout_tx_relay/outbounds_fanout_tx_relay by the number of peers we’re going to fanout with ancestors?
sr-gi
commented at 10:11 am on September 26, 2024:
We could. I left it as is for simplicity, given this case should not be too frequent, so having a higher fanout under these conditions shouldn’t be too bad.
I’ll consider adding it if it does not make the code much harder to read, or if there is a clear argument in favor of it.
naumenkogs
commented at 9:54 am on November 1, 2024:
This function could easily return two numbers: how many of these peers are inbound and how many are outbound. And then just pass this pair along to ShouldFanoutTo. The code should not be that complex. A cheap cost to preserve the efficiency model.
I haven’t taken this yet, given I’m working on a last commit reworking part of how the fanout selection works. I may consider it after that. In any case, looks like this is only filling the cache but not testing anything after it
Make sure to run all tests locally, according to the documentation.
The failure may happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot removed the label
Needs rebase
on Sep 13, 2024
sr-gi force-pushed
on Sep 13, 2024
DrahtBot removed the label
CI failed
on Sep 13, 2024
in src/net_processing.cpp:2453 (commit 3879f550d3, outdated)
+ if (!result.m_succeeded) {
+     // If the transactions fails to get into the set, we fanout
+     invs_to_send.push_back(wtxid);
+     // If the transaction fails because it collides with an existing one,
+     // we also remove and fanout the conflict and all its descendants.
+     // This is because our peer may have added the conflicting tranaction
brunoerg
commented at 12:47 pm on September 21, 2024:
// This is because our peer may have added the conflicting transaction
sr-gi force-pushed
on Oct 2, 2024
sr-gi force-pushed
on Oct 2, 2024
sr-gi force-pushed
on Oct 2, 2024
sr-gi
commented at 3:35 pm on October 2, 2024:
member
I’ve added a commit introducing a delayed set to store transactions that are being reconciled but won’t be added to the sketch should it be constructed at this time. This is to mimic the fanout trickling logic, where transactions are only made available to peers if they would have been INVed to them. This was already part of @naumenkogs’ approach, but was scrapped when reworking when data is added to the reconciliation set in a22b2364f5942266596c78ad306b1e32aa89aa0f
sr-gi force-pushed
on Oct 2, 2024
sr-gi force-pushed
on Oct 3, 2024
sr-gi force-pushed
on Oct 3, 2024
sr-gi force-pushed
on Oct 3, 2024
sr-gi force-pushed
on Oct 3, 2024
in src/node/txreconciliation.cpp:457 (commit 20772c0778, outdated)
+ // We first decide how many reconciling peers of a given direction we want to flood to,
+ // and then generate a list of peers of that size for a given transaction. We then see
+ // whether the given peer falls into this list.
+ double n;
+ if (peer_state->m_we_initiate) {
+     n = OUTBOUND_FANOUT_DESTINATIONS - outbounds_fanout_tx_relay;
marcofleon
commented at 3:46 pm on October 12, 2024:
Can outbounds_fanout_tx_relay ever be greater than 1 here? I think if we have two (or more) unregistered outbound peers and at least one registered peer, then this results in an unsigned integer overflow. Which then overflows targets_size in IsFanoutTarget.
naumenkogs
commented at 3:03 pm on October 23, 2024:
The commit with the casting works, but the test one is actually not a good way of testing this; it would pass both with and without the change.
Turns out that passing a huge value to IsFanoutTarget would result in targets_size being 0, which results in the method returning false. I don’t think there is really a good way of testing this.
I’ll take the code commit but skip the addition to the test.
Happy to consider an alternative approach that allows us to check there are no collisions without having to expose a public method if you have something in mind
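The wraparound marcofleon describes is easy to model: with unsigned 64-bit arithmetic, subtracting a larger count from the constant produces a huge value rather than a negative one. A hedged Python illustration (Python ints don't wrap, so the modulo emulates a uint64; names are illustrative):

```python
def remaining_fanout_targets(destinations, already_flooding, width=64):
    """Contrast the unsigned subtraction with a clamped (signed) version.
    'wrapped' is what a uint64 computes when already_flooding > destinations;
    'clamped' is the intended value: no further fanout targets are needed."""
    wrapped = (destinations - already_flooding) % (1 << width)  # uint64 behaviour
    clamped = max(destinations - already_flooding, 0)           # signed + clamp
    return wrapped, clamped
```

The huge wrapped value is what then propagated into targets_size downstream.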
naumenkogs
commented at 9:07 am on November 4, 2024:
Yeah, just dropping HasCollision and keeping HasCollisionInternal. The former is used only in tests. And since its implementation is trivial, I think testing for collisions through AddToSet is sufficient.
Sure, looks like tests can be rewritten for it not to be needed. Will address in the next push
in src/net_processing.cpp:6276 (commit 50c19dd21f, outdated)
+ auto result = m_txreconciliation->AddToSet(pto->GetId(), wtxid);
+ if (!result.m_succeeded) {
+     vInv.push_back(inv);
+     // If the transaction fails because it collides with an existing one,
+     // we also remove and fanout the conflict and all its descendants.
+     // This is because our peer may have added the conflicting tranaction
naumenkogs
commented at 8:20 am on October 28, 2024:
50c19dd21f6e437cd8e3d9047a89d793d097c5e5
worth adding a comment on what would happen if there are 3 colliding transactions (probably fine but some text would be helpful) :)
Since this is part of RelayTransaction, which processes transactions one at a time, and given that when a collision happens the existing transaction is removed from the set, there cannot be a three-way collision. If a third transaction that happens to have the same short id as the two previous ones is added to the set later, it will be added normally, since there won’t be a transaction to collide with.
I could add a comment for three, but someone may then ask what happens if there are four 😅
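The invariant being argued here (at most a two-way collision, because the resident transaction is evicted) can be shown with a toy model; `add_to_set` and `short_id_of` are illustrative names, not the PR's API:

```python
def add_to_set(local_set, short_id_of, wtxid):
    """Toy collision rule: transactions arrive one at a time; on a short-id
    collision the resident transaction is evicted and both are fanned out.
    The set therefore never holds two entries with the same short id, so a
    third transaction with that short id later finds nothing to collide with."""
    sid = short_id_of(wtxid)
    for other in list(local_set):
        if short_id_of(other) == sid:
            local_set.remove(other)  # evict the conflict; caller fans out both
            return False, other
    local_set.add(wtxid)
    return True, None
```

Three same-short-id arrivals thus resolve as: first added, second evicts the first (both fanned out), third added normally.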
naumenkogs
commented at 8:22 am on October 31, 2024:
I was looking for something like “As a general rule, we keep the transaction that appeared first”.
Currently, the AddToSet documentation seems outdated—it basically claims that the only way to fail is to reach a set limit.
How about the following replacement:
/**
 * Step 1. Add a to-be-announced transaction to the local reconciliation set of the target peer.
 * Returns false if the set is at capacity, or if the set contains a colliding transaction.
 * Returns true if the transaction appears in the set (whether it was already there or just was added).
 */
naumenkogs
commented at 10:23 am on October 31, 2024:
Apparently I was wrong.
// If the transaction fails because it collides with an existing one,
// we also remove and fanout the conflict and all its descendants.
// This is because our peer may have added the conflicting transaction
// to its set, in which reconciliation of these two would fail
Say A, B, C arrive.
First, A is added to the set.
Then B doesn’t make it into the set, but also drops A from it.
That’s a good point. If the selected peer has a parent -> child dependency but was not picked as one of the fanout_with_ancestors peers, the parent and the child may have ended up in different sets.
We should check the return of TryRemovingFromSet before adding it to vInv
I just realized this is also the case for the ancestors. When relaying with ancestors, we should also check that the ancestor we are trying to relay is part of the reconciliation set (i.e. that it has not been previously flooded)
naumenkogs
commented at 9:47 am on November 4, 2024:
The fix should be done in p2p: Deal with shortid collisions for reconciliation sets, please move it if you retouch.
in src/net_processing.cpp:6285 (commit 50c19dd21f, outdated)
+ Assume(peer->m_wtxid_relay);
+ CTxMemPool::setEntries descendants;
+ m_mempool.CalculateDescendants(m_mempool.get_iter_from_wtxid(collision.value()), descendants);
+ for (const auto &txit: descendants) {
+     auto wtxid = txit->GetTx().GetWitnessHash();
+     m_txreconciliation->TryRemovingFromSet(pto->GetId(), wtxid);
naumenkogs
commented at 8:32 am on October 28, 2024:
50c19dd21f6e437cd8e3d9047a89d793d097c5e5
could this be used to game collisions (for wtxid you have here)?
naumenkogs
commented at 8:33 am on October 28, 2024:
say a descendant was never considered for reconciliation (somehow), but is now used to remove from the set some other transaction (that it collides with)
or maybe it was considered, but then was dropped in favor of something else (that’s not how we currently handle collisions, but that would be a hard assumption)
Collisions of this kind could happen in theory, but I don’t think they can really be gamed:
short ids are peer specific; they are generated using SipHashUint256, salted with m_k0 and m_k1. Therefore, while a collision of this kind is theoretically possible, an adversarial peer should not be able to exploit it
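The "not gameable" argument rests on the short id being keyed per peer. A sketch of the idea in Python; keyed BLAKE2 stands in for SipHash (which the Python stdlib does not expose), and the salt names are illustrative:

```python
import hashlib

def short_id(wtxid: bytes, peer_salt: bytes) -> int:
    """Per-peer salted short id: the same wtxid maps to different short ids
    for different peers, so an attacker who finds a collision against one
    peer's mapping learns nothing about any other peer's mapping."""
    digest = hashlib.blake2b(wtxid, key=peer_salt, digest_size=4).digest()
    return int.from_bytes(digest, "little")
```

Without knowledge of the negotiated salts, crafting two transactions that collide for a specific victim peer is no easier than a brute-force search.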
naumenkogs
commented at 9:08 am on November 4, 2024:
Right, please resolve this issue.
in src/net_processing.cpp:509 (commit 392facfce1, outdated)
naumenkogs
commented at 8:37 am on October 28, 2024:
392facfce1be6d0395abecc2e049ff39635bd1a5
why do you think force relay should influence whether we reconcile or flood? The transaction is relayed in either case (the original meaning of force relay)
You’re right, should have been added to the previous commit. Will fix
in src/node/txreconciliation.h:158 (commit 392facfce1, outdated)
@@ -139,6 +139,12 @@ class TxReconciliationTracker
  */
 bool ShouldFanoutTo(const Wtxid& wtxid, NodeId peer_id,
                     size_t inbounds_fanout_tx_relay, size_t outbounds_fanout_tx_relay);
+
+/**
+ * Returns a collections of node ids sorted by how many parents the peer has in its reconciliation set
naumenkogs
commented at 8:43 am on October 28, 2024:
392facfce1be6d0395abecc2e049ff39635bd1a5
you can technically feed it arbitrary transactions. I think the comment should say it, and then later say “we normally pass a list of parents to determine whether [… … …]”.
in src/net_processing.cpp:6220 (commit 392facfce1, outdated)
@@ -6216,27 +6284,6 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
     // No reason to drain out at many times the network's capacity,
     // especially since we have many peers and some will draw much shorter delays.
     unsigned int nRelayedTransactions = 0;
-
-    size_t inbounds_fanout_tx_relay = 0, outbounds_fanout_tx_relay = 0;
naumenkogs
commented at 8:44 am on October 28, 2024:
392facfce1be6d0395abecc2e049ff39635bd1a5
why not modifying the former commit that introduces this code?
Because it is doing more than just moving the code around, I thought it may be easier for reviewers already familiar with the original approach to understand the changes. I wouldn’t mind squashing it once all the discussion around RelayTransaction is resolved
naumenkogs
commented at 9:10 am on November 4, 2024:
My impression was that there are very few reviewers already familiar with the original approach at the moment, so it’s an extra burden for most to understand the former code you drop anyway.
But yeah, it’s OK. Perhaps a commit message saying it will be dropped would work.
in src/net_processing.cpp:2188 (commit 392facfce1, outdated)
+ // to reconcile, fanout the transaction an all its ancestors. We just add the parents here and leave fanout as true
+ auto it = std::find(fanout_with_ancestors.begin(), fanout_with_ancestors.end(), peer_id);
+ if (it != fanout_with_ancestors.end()) {
+     for (const auto wtxid: parents) {
+         m_txreconciliation->TryRemovingFromSet(peer_id, wtxid);
+         invs_to_send.push_back(wtxid);
naumenkogs
commented at 8:54 am on October 28, 2024:
392facfce1be6d0395abecc2e049ff39635bd1a5
what about the order of the parents here? Deserves a comment at least. With the current non-cluster mempool it might break something?
UPD:
Say the following happens:
A -> B -> C
A arrives, we add it to some sets.
B arrives, triggers a flood from half the sets, but the other half has A+B now.
C arrives, triggers a flood from A+B. But then, can we guarantee A is flooded before B?
UPD2: topo sort should work. Have a common place where the INV vector is handled?
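The suggested topological sort over the pending INVs would guarantee parents are always announced before children, covering the A -> B -> C case above. A hedged sketch (illustrative names, not the PR's code):

```python
def topo_sort_invs(invs, parents_of):
    """Order announcements so every parent precedes its child. parents_of maps
    a wtxid to its direct in-mempool parents; only parents that are themselves
    being announced in this batch are pulled forward."""
    invs_set = set(invs)
    ordered, seen = [], set()

    def visit(tx):
        if tx in seen:
            return
        seen.add(tx)
        for p in parents_of.get(tx, ()):
            if p in invs_set:
                visit(p)  # announce parents first
        ordered.append(tx)

    for tx in invs:
        visit(tx)
    return ordered
```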
naumenkogs
commented at 10:17 am on October 31, 2024:
another question: say a child arrives a few minutes after a parent (the parent is still in the mempool, but it’s been announced to peers a while ago). Why bother with this inv? Perhaps we should do it only if TryRemovingFromSet has actually removed it?
(the same applies to handling a collision below)
I know m_tx_inventory_known_filter will probably catch it, but I think it’s better to do it here.
in src/net_processing.cpp:2202 (commit 392facfce1, outdated)
+ if (fanout) {
+     // We fanout if force relay is set, if the peer does not reconcile transactions, or if it does but it has been picked for fanout.
+     invs_to_send.push_back(peer->m_wtxid_relay ? wtxid : txid);
+ } else {
+     // Otherwise, we try to add the transaction to the peer's reconciliation set.
+     // If the transaction has in mempool descendants, we will fanout all the descendant
naumenkogs
commented at 9:11 am on October 28, 2024:
392facfce1be6d0395abecc2e049ff39635bd1a5
where do you do this (consider in-mempool descendants and decide to fanout everything)? I only see you do it for the collision
I don’t think this comment is accurate, since this can never be the case. A transaction considered for relay cannot have in-mempool descendants by definition, since they would have been orphans at the time of considering them.
I cannot recall whether this is a re-wording of the previous approach, which tried to consider descendants, or whether I tried to refer to the collisions and it didn’t make much sense.
I’ll remove the comment
in src/net_processing.cpp:5847 (commit 20772c0778, outdated)
@@ -6270,6 +6270,10 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
 
     // Determine transactions to relay
     if (fSendTrickle) {
+        if (m_txreconciliation && m_txreconciliation->IsPeerRegistered(pto->GetId())) {
+            // Make transactions added to the reconciliation set during the last interval available
+            m_txreconciliation->ReadyDelayedTransactions(pto->GetId());
naumenkogs
commented at 9:19 am on October 28, 2024:
20772c077832592265bd6a2876aa6b4cb7dbde7d
with this code, and especially the 5/2 trickle delay (very long), previous performance measurements become outdated.
My idea was to delay the responses. And by this time, the hope was flooding does enough job so that scanning the sets through many connections does not work.
Also, adding to a set already happens on a trickle… so that’s another level of protection
naumenkogs
commented at 11:35 am on October 30, 2024:
Just talked to Bruno a bit more about this.
I think it’s fair to compare this delayed set mechanism to what i have in the code there.
I like my solution more because:
it’s synced across peers (handles spying across multiple connections better, I think)
simpler code
more flexibility (it doesn’t have to be the same trickling as adding to sets or fanning out)
I think the current approach mimics what you were doing with your old one. I haven’t picked up the trickle reduction commit nor the delayed response, but I was planning to in the next PR.
Your approach used to add transactions to the reconciliation set during INV building, which happens on trickle intervals. So data was added to to_be_announced (delayed) on RelayTransaction and then made available on trickle. The current approach adds data to the delayed set on RelayTransaction and makes it available on trickle too.
So I don’t think the previous performance measurements become outdated, building the whole Erlay PR on this should yield similar results.
naumenkogs
commented at 10:14 am on October 31, 2024:
I agree now, it’s equivalent. Still wrapping my head around RelayTransaction refactoring :)
A difference could be in handling dependencies: during the trickle window after the first transaction arrives.
Say, a parent arrives, and then a child arrives in a second. In my approach we may notice the child when handling (announcement-wise) the parent (because handling the parent is postponed via trickle).
In your approach, you handle the parent independently.
Is there any use in knowing about the child while handling (announcement-wise) the parent? I don’t think so. You could probably save the effort of adding the parent to the reconset and then removing it, but that should be cheap anyway.
(the child-before-parent case is entirely different, as the child-to-arrive-early doesn’t normally hit the announcement step while in orphanage)
naumenkogs
commented at 12:22 pm on November 1, 2024:
09:42:53 <@gmaxwell> Existing relay logic checks that transactions are still in the mempool before relaying them. I think the issue there would go away if instead of keeping around some erlay datastructure you just keep growing the queue of the peers transactions to relay until its time to reconcile, enh? then the existing logic to check if things are still in the mempool is sufficient.
“Existing” refers to the legacy code in my branch.
I simulated this to see if there was any difference between doing the decision-making before or after trickle.
I didn’t do it for dependent transactions, just for a single one, trying to answer whether choosing fanout peers at scheduling time or at relay time made any difference. The answer is no, given that the number of peers that know about the transaction at both times is almost the same.
in src/node/txreconciliation.cpp:325 (commit d2f2605072, outdated)
+ auto peer_state = GetRegisteredPeerState(peer_id);
+ if (!peer_state) return false;
+
+ auto removed = peer_state->m_local_set.erase(wtxid) > 0;
+ if (removed) {
+     LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, "Removed %s from the reconciliation set for peer=%d. " /* Continued */
naumenkogs
commented at 9:29 am on October 28, 2024:
nit: in three places in this commit /* Continued */ should be dropped, it’s no longer necessary (been a linter thing)
in src/net_processing.cpp:6211 (commit 873e8dd60b, outdated)
+ // When we consider to which (and how many) Erlay peers
+ // we should fanout a tx, we must know to how
+ // many peers we would certainly announce this tx
+ // (non-Erlay peers).
+ if (peer_tx_relay->m_relay_txs && !m_txreconciliation->IsPeerRegistered(cur_peer_id)) {
+     inbounds_fanout_tx_relay += cur_peer->m_is_inbound;
naumenkogs
commented at 9:31 am on October 28, 2024:
sr-gi
commented at 7:50 pm on October 28, 2024:
member
Squashed @naumenkogs’ fix for @marcofleon’s comment in ad95b2c0e21f1e865f967aa9463ef99bc252550a, plus added a commit replacing IsFanoutTarget with GetFanoutTargets so the cache for fanout targets can be dropped. After this, the fanout targets are computed just once per transaction
sr-gi force-pushed
on Oct 28, 2024
DrahtBot added the label
CI failed
on Oct 28, 2024
DrahtBot
commented at 8:02 pm on October 28, 2024:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sr-gi force-pushed
on Oct 28, 2024
DrahtBot removed the label
CI failed
on Oct 28, 2024
in src/node/txreconciliation.cpp:389 (commit 4234f5ebe1, outdated)
+ const double inbound_targets = (inbounds_fanout_tx_relay + m_inbounds_count) * INBOUND_FANOUT_DESTINATIONS_FRACTION;
+ double n = std::max(inbound_targets - inbounds_fanout_tx_relay, 0.0);
+
+ // Being this a fraction, we need to round it either up or down. We do this deterministically at random based on the
+ // transaction we are picking the peers for.
+ CSipHasher deterministic_randomizer_in{m_deterministic_randomizer};
naumenkogs
commented at 10:06 am on October 29, 2024:
4234f5ebe133b949080aaa56da8cbdd18b650ff2
Originally, the code forced me to make this call for every peer, that’s why i needed it to be deterministic. It’s not the case anymore. Do we still need determinism then?
naumenkogs
commented at 10:08 am on October 29, 2024:
Otherwise we can drop m_deterministic_randomizer altogether and use some cheap FastRandomContext instead.
Well, this really depends. GetFanoutTargets is called by RelayTransaction, which, in turn, may be called by ProcessMessage for a peer that has NetPermissionFlags::ForceRelay. So if force relay is used for both fanout and set reconciliation, as suggested in #30116 (review), then this could be called twice, leading to two different orderings if we don't do it deterministically
naumenkogs
commented at 9:15 am on October 30, 2024:
This could be ReattemptInitialBroadcast, BroadcastTransaction (rpc/wallet), ProcessValidTx, or earlier upon handling NetMsgType::TX.
I think it’s odd we have two calls to RelayTransaction in the two latter cases. I’d rather change this code, e.g. pass already_relayed (duplicating force) to RelayTransaction. Perhaps too much for this PR and we should do it elsewhere (especially now that the package stuff happens in-between). I understand why it was ok when RelayTransaction was simple.
And while we’re stuck with these two calls, let’s say we drop the determinism.
Why RelayTransaction doesn’t look at m_tx_inventory_to_send very early (and maybe m_tx_inventory_known_filter too)? That would catch the transactions we already set for flooding, and just pass on them.
Now, the transactions in the sets. If we add them to the set again, nothing is done. If we try to flood them by a different choice, nothing is done still because we hit m_tx_inventory_to_send already having it.
Thus, for now, i think we should repeat the m_tx_inventory_to_send check above, rather than having the determinism. It kinda makes sense anyway, no?
Why RelayTransaction doesn’t look at m_tx_inventory_to_send very early (and maybe m_tx_inventory_known_filter too)? That would catch the transactions we already set for flooding, and just pass on them.
I don’t think this really works(?).
Imagine we call this once, and peer p (who has `ForceRelay` permissions) is selected for reconciliation using a non-deterministic approach.
Now, the transaction is sent again by p shortly after, and this time it is selected for flooding instead. The filter won't catch this, and the transaction would end up in both sets.
This makes me recall that even though the process of selecting peers is deterministic at the moment, it is dependent on our peer list. Calling this twice with a different set of peers can result in a different ordering, so maybe we should not assume that transactions can be freely added to any of the two sets, and we should make sure that the opposite set does not contain the transaction before adding it to our desired one?
in src/node/txreconciliation.cpp:374
in 4234f5ebe1 (outdated)
LOCK(m_txreconciliation_mutex);

// We decide whether a particular peer is a low-fanout flood target differently based on its connection direction:
// - for outbounds we have a fixed number of flood destinations.
// - for inbounds we use a fraction of all inbound peers supporting tx relay.
auto outbounds_target_size = (double)OUTBOUND_FANOUT_DESTINATIONS - outbounds_fanout_tx_relay;
naumenkogs
commented at 10:17 am on October 29, 2024:
Yeah, you’re right. Given how small the two vectors are, calling ShouldFanoutTo with a merged vector should not incur any major overhead, so merging will make it easier to read
DrahtBot added the label
Needs rebase
on Oct 29, 2024
sr-gi force-pushed
on Oct 29, 2024
sr-gi force-pushed
on Oct 30, 2024
sr-gi
commented at 2:59 pm on October 30, 2024:
member
Rebasing to deal with merge conflicts
DrahtBot removed the label
Needs rebase
on Oct 30, 2024
in src/node/txreconciliation.cpp:110
in ccd695851c (outdated)
naumenkogs
commented at 11:17 am on October 31, 2024:
Nit
const size_t removed = m_local_set.erase(wtxid) + m_delayed_local_set.erase(wtxid);
// Data must be in one of the sets at most
Assume(removed <= 1);
return removed == 1;
The Assume is there to make sure that, if the code happens to be wrong, it is caught in testing. In the current approach, if the code moving data from m_delayed_local_set to m_local_set were broken somehow, this method would still report that data was removed. If we were to return removed == 1 in the aforementioned case, it would return false, and the caller's logic may break.
I don't think either of these cases is likely to happen given the Assume will catch it before getting merged, but the proposed change makes me anxious.
PS: I agree on doing the variable renaming if you feel that makes the code more readable though
naumenkogs
commented at 10:08 am on November 4, 2024:
I thought size_t 0 will be cast to bool false on return, and just suggested the exact same behavior (1==true, 0==false) but without forcing the reader to think about casting and cause this exact confusion between us.
Am i wrong?
Ahhh, so you're saying 2==true is also important? Then I'll be fine with an explicit return removed >= 1, and a comment that 2 should not normally happen.
2==true would mean that we had the transaction in both places, which should never happen (it would mean that a transaction is both ready and delayed at the same time).
If that were to happen in production for whatever reason, this would return false with the suggested changes, which may break something?
naumenkogs
commented at 7:33 am on November 14, 2024:
Probably a taste thing. I think auto should be used only when it’s obvious what type it is.
Here there are two non-obvious occasions:
auto x = bool + bool resulting in x=2.
casting 2 to bool for the result.
I won’t insist but i think verbose types are better here. Feel free to close.
naumenkogs
commented at 9:49 am on November 1, 2024:
fd7c35a22669d440afe182a153bed28fd326b51f
Often, there will be no parents (e.g., a parent is in the mempool but has already been flooded/reconciled). In that case, it would be nice to give a random order. And then comment it please.
This is only called if there are in-mempool parents though; the ordering doesn't matter much otherwise
naumenkogs
commented at 9:57 am on November 4, 2024:
There could be mempool parents, but none are in reconciliation sets (say, they were already reconciled). For all these cases, it would be nice to not choose the peer sitting first in m_states?
In that case, no transaction would be moved from the reconciliation set to invs_to_send, given no transaction would be removed. Therefore, the order wouldn’t matter:
auto it = std::find(fanout_with_ancestors.begin(), fanout_with_ancestors.end(), peer_id);
if (it != fanout_with_ancestors.end()) {
    for (const auto wtxid : parents) {
        m_txreconciliation->TryRemovingFromSet(peer_id, wtxid);
        invs_to_send.push_back(wtxid);
    }
} else {
    // If the peer is registered for set reconciliation, maybe pick it as fanout
    fanout = m_txreconciliation->ShouldFanoutTo(peer_id, fanout_targets);
}
This may be a bit counterintuitive. I'll give it some thought and potentially re-design so it's clearer
naumenkogs
commented at 7:12 am on November 14, 2024:
Say there is an in-mempool parent announced earlier, so it’s not in any of the sets.
SortPeersByFewestParents will return the same order as m_states (since all parent_count are 0),
and fanout_with_ancestors will take first two after resize.
Then for two peers (specifically, the first two from m_states) this code is executed:
TryRemovingFromSet will do nothing. But then invs_to_send.push_back(wtxid); is executed anyway. And always for the same two first peers from m_states. In other words, the first two are guaranteed to take the fanout in this case.
Sorry, you’re right. I’ve been looking at this being conditional to m_txreconciliation->TryRemovingFromSet(peer_id, wtxid), which is what happens with the descendants, but here it is unconditional.
It should be added only if TryRemovingFromSet succeeds, and therefore the ordering in case there are no ancestors won’t matter. I’ll also add some comments regarding that.
Notice the point of this is not to add something to invs_to_send but rather to move it from m_txreconciliation to invs_to_send, otherwise things would be sent more than once (or potentially captured by the bloom filters beforehand, which is still not optimal)
in src/net_processing.cpp:2232
in fd7c35a226 (outdated)
// we also remove and fanout the conflict and all its descendants.
// This is because our peer may have added the conflicting transaction
// to its set, in which case reconciliation of these two would fail
if (const auto collision = result.m_conflict; collision.has_value()) {
    CTxMemPool::setEntries descendants;
    WITH_LOCK(m_mempool.cs, m_mempool.CalculateDescendants(m_mempool.get_iter_from_wtxid(collision.value()), descendants));
naumenkogs
commented at 10:06 am on November 1, 2024:
Why having a collision means we should fanout the children? They may get reconciled in their own time.
Because if you remove the parent from the reconciliation set but not the children, it could be the case that you reconcile with that peer before fanning out the parent, and all those children will be orphans
naumenkogs
commented at 9:54 am on November 4, 2024:
Something worth documenting too :) Again, at least one place where we talk about handling dependencies.
DrahtBot removed the label
CI failed
on Nov 2, 2024
in src/node/txreconciliation.cpp:261
in e942f3a749 (outdated)
// Check if the reconciliation set is not at capacity for two reasons:
// - limit sizes of reconciliation sets and short id mappings;
// - limit CPU use for sketch computations.
//
// Since we reconcile frequently, reaching capacity either means:
naumenkogs
commented at 9:16 am on November 4, 2024:
e942f3a7493908988eea08640d87b2ae420c5eef
This entire paragraph currently seems to me more confusing than useful. I suggest dropping it altogether.
in src/node/txreconciliation.cpp:257
in e942f3a749 (outdated)
AssertLockNotHeld(m_txreconciliation_mutex);
LOCK(m_txreconciliation_mutex);
auto peer_state = GetRegisteredPeerState(peer_id);
if (!peer_state) return false;

// Check if the reconciliation set is not at capacity for two reasons:
naumenkogs
commented at 9:18 am on November 4, 2024:
e942f3a7493908988eea08640d87b2ae420c5eef
Duplicates the comment around MAX_RECONSET_SIZE. We'd better drop one of them.
I’m guessing this is related to #30116 (review), in which case I’ll drop that
in src/net_processing.cpp:5827
in 9005953037 (outdated)
auto result = m_txreconciliation->AddToSet(pto->GetId(), wtxid);
if (!result.m_succeeded) {
    vInv.push_back(inv);
    // If the transaction fails because it collides with an existing one,
    // we also remove and fanout the conflict and all its descendants.
    // This is because our peer may have added the conflicting transaction
naumenkogs
commented at 9:44 am on November 4, 2024:
90059530379cf99cf4b82d664ad35c1ddefc3d07
“Would fail” is very confusing. “Failed reconciliation” will have a technical meaning in the later commits. Here, I'd rather be more explicit: “two transactions mapped to the same short-id won't be announced between these two peers”.
in src/net_processing.cpp:5826
in 9005953037 (outdated)
// If the transaction fails to get into the set, we fanout
auto result = m_txreconciliation->AddToSet(pto->GetId(), wtxid);
if (!result.m_succeeded) {
    vInv.push_back(inv);
    // If the transaction fails because it collides with an existing one,
    // we also remove and fanout the conflict and all its descendants.
naumenkogs
commented at 9:45 am on November 4, 2024:
Let's provide a hint why we fanout the descendants (or at least send the reader to where the dependency issue is explained better; we should have one place explaining it)
in src/node/txreconciliation.cpp:62
in c790aa0e7e (outdated)
@@ -56,6 +56,13 @@ class TxReconciliationState
 */
uint64_t m_k0, m_k1;

/**
 * Set of transactions to be added to the reconciliation set on the next trickle. These are still unrequestable for
 * privacy reasons (to prevent transaction probing); transactions become available (moved to m_local_set) once they
 * would have been announced via fanout.
naumenkogs
commented at 10:04 am on November 4, 2024:
c790aa0e7ebfcf0b03b4d0ef5fcc0ddfa4199943
I think “upon trickle” or “see m_next_inv_send_time” is much less confusing.
in src/node/txreconciliation.cpp:279
in c790aa0e7e (outdated)
 if (peer_state->m_delayed_local_set.insert(wtxid).second) {
     peer_state->m_short_id_mapping.emplace(short_id, wtxid);
     LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, "Added %s to the reconciliation set for peer=%d. "
                   "Now the set contains %i transactions.\n",
-                  wtxid.ToString(), peer_id, peer_state->m_local_set.size());
+                  wtxid.ToString(), peer_id, set_size + 1);
naumenkogs
commented at 10:12 am on November 4, 2024:
c790aa0e7ebfcf0b03b4d0ef5fcc0ddfa4199943
This is probably the most important log in the entire file, and I think it’s better to distinguish two set sizes here (preferably other places the size is logged too).
naumenkogs
commented at 8:01 am on November 27, 2024:
yes
bitcoin deleted a comment
on Nov 4, 2024
sr-gi force-pushed
on Nov 13, 2024
in src/net_processing.cpp:2186
in ee46e21a5c (outdated)
// to reconcile, fanout the transaction and all its ancestors. We just add the parents here and leave fanout as true
auto it = std::find(fanout_with_ancestors.begin(), fanout_with_ancestors.end(), peer_id);
if (it != fanout_with_ancestors.end()) {
    for (const auto wtxid : parents) {
        m_txreconciliation->TryRemovingFromSet(peer_id, wtxid);
        invs_to_send.push_back(wtxid);
naumenkogs
commented at 7:16 am on November 14, 2024:
ee46e21a5c913175415a192661b406b328209800
First I thought why don’t fanout=true here and handle it below. We can’t do it because txid/wtxid difference (although we could have a txid_or_wtxid variable above….). But then, you certainly want to at least set fanout=false here, otherwise it remains true and added to invs_to_send twice (it’s not critical but a bug still).
Not sure if this comment only applied before #30116 (review). But I think this is not currently the case.
Here we are only adding the parents. Leaving fanout=true triggers adding the actual transaction RelayTransaction was called with in the following if (fanout) block.
We could also add it here, but it would involve adding another boolean to skip the if (fanout) block, since we need to populate m_tx_inventory_to_send with data from invs_to_send which happens right after
naumenkogs
commented at 8:04 am on November 27, 2024:
The issue is re-defining wtxid (for parent_wtxid) while it's already defined as the transaction in question :)
That was the source of my confusion originally
in src/node/txreconciliation.cpp:236
in 1dce9aedc5 (outdated)
}

for (auto& [parent_count, peer_id] : parents_by_peer) {
    const auto state = std::get<TxReconciliationState>(m_states.find(peer_id)->second);
    for (const auto& wtxid : parents) {
        if (auto found = state.m_local_set.find(wtxid); found != state.m_local_set.end()) {
naumenkogs
commented at 6:23 am on November 26, 2024:
naumenkogs
commented at 7:25 am on November 27, 2024:
Multimap is ordered by key, and for same-key (same parent_count) since c++11 the order is consistent (documented here). Feel free to include this as a comment, although I’m not even sure we care about the same exact output of multiple SortPeersByFewestParents calls.
(also it depends on the consistent ordering of m_states iterator too, which could be documented along the same lines, here and elsewhere it is iterated over)
naumenkogs
commented at 7:58 am on November 27, 2024:
(this could also be done in GetFanoutTargets in 4b4d99fb2df2c60a2214487cec627bc560f50f53, although it's not that much of an improvement over there)
I’ve been playing a bit with this, plus adding a test for SortPeersByFewestParents, which was missing. The multimap does have a consistent ordering, but it does reverse order of insertion instead of order of insertion for ties. Not really like it matters to be honest, if that is something we care about we can always start from a shuffled copy of m_states.
nvm, the order was changing because this is iterating over m_states, which is an unordered_map.
I’ll take the suggestion, but I’m thinking about modifying it a bit so the order does not have to be so reliant on m_states, which is something you also pointed out in an older review IIRC. What about something along the line of:
naumenkogs
commented at 9:02 am on December 5, 2024:
I think an extra shuffle is a bit too much, because then for consistency you’d do it every time we access m_states? I think a comment in those places where order might matter is sufficient.
naumenkogs
commented at 6:37 am on November 26, 2024:
0092f70314b8d59b41ff4bd2bc15a71f954668d4
Consider fanout=false because the peer has too many parents (cut the tail from fanout_with_ancestors).
Now, we fail to add a transaction to the set (whatever reason, say set too full), and we’re now going to flood a child while reconciling the parents again?
Perhaps AddToSet result could be handled more carefully instead (or have a todo/comment, this is a corner case of a corner case of course).
naumenkogs
commented at 6:51 am on November 26, 2024:
This is slightly changed later in 5996d473434d364cff937ae89ffd95cd09c5d1a7. AddToSetResult::Failed still not handled though?
Yes, that’s right. In the current revision, if the set is full and the given transaction has some ancestors, the children will be flooded while the parents will be reconciled, which may cause orphans.
I think we could handle this at the same level that we handled fanout_with_ancestors:
Fanout to peers that have ancestors and have a full set
Fanout to peers that have the least amount of ancestors
However, this will still be an issue in some cases, given we only select a subset of peers to fanout to when these conditions hold (currently 2, but let’s say n). If we do not fanout to all peers that fall under these conditions, there will always be some that will receive parents and children using different approaches, which may cause orphans. However, I think it is undesirable to fanout to all matches in this situation.
This raises the question, is it worth the added complexity, or should we add a comment acknowledging that this can be the case, but that we expect it to happen really infrequently?
I added a comment for this instead of making a full set part of the sorting criteria. This should not happen under normal circumstances provided the set size is defined to account for way over the expected traffic between reconciliations. A peer hitting the limit is likely to be either broken or an attacker, and I don’t think we should be catering to them (nor making the logic more complex based on that)
consider defining this variable inside the loop rather than cleaning it after each iteration?
in src/node/txreconciliation.cpp:432
in 840a4e14f2 (outdated)
@@ -381,9 +445,9 @@ class TxReconciliationTracker::Impl
 }

 for (auto& [parent_count, peer_id] : parents_by_peer) {
-    const auto state = std::get<TxReconciliationState>(m_states.find(peer_id)->second);
+    auto state = std::get<TxReconciliationState>(m_states.find(peer_id)->second);
     for (const auto& wtxid : parents) {
-        if (auto found = state.m_local_set.find(wtxid); found != state.m_local_set.end()) {
+        if (state.ContainsTx(wtxid, /*include_delayed=*/true)) {
naumenkogs
commented at 7:05 am on November 26, 2024:
840a4e14f2d833717c87a54ec2707a92d5131ada
been thinking about the use of a delayed set here.
We apply the delay only on addition, but not on fanouting parents (or counting them), neither on removing stuff from the sets (say at receiving a transaction).
For me the choice is pretty arbitrary (and probably fine to save the code complexity for this rare corner case), but perhaps we would benefit from better documentation around it.
I think I’m not following here. We do apply the delayed set both on addition (AddToSet), deletion (RemoveFromSet), and when counting parents (this bit of code is part of the parent sorting).
naumenkogs
commented at 6:45 am on November 27, 2024:
I mean the delay specifically, not the set. If we talk about an abstract “reconciliation set” (which is a combination of two), you apply the delay for an addition, but not for the removal action.
I see your point. The reasoning for this is so we can use ContainsTx in testing to check that things have been added/removed to/from the proper collection. Throughout the rest of the codebase, we always call this with true, and we do an implicit check on deletion.
I’m happy to consider an alternative that covers the same functionality if you have one
in src/node/txreconciliation.cpp:416
in fc06786f32 (outdated)
- * due to caching. Thus, we can save us adding that extra function for now.
- */
-bool ShouldFanoutTo(const Wtxid& wtxid, NodeId peer_id,
-                    size_t inbounds_fanout_tx_relay, size_t outbounds_fanout_tx_relay)
-    EXCLUSIVE_LOCKS_REQUIRED(!m_txreconciliation_mutex)
+bool ShouldFanoutTo(NodeId peer_id, std::vector<NodeId> &fanout_targets) EXCLUSIVE_LOCKS_REQUIRED(!m_txreconciliation_mutex)
naumenkogs
commented at 7:10 am on November 26, 2024:
fc06786f32b2fc82c9571ebde67b4ff54f4ad825
this wrapper is not needed anymore. The Registered check already happens at the caller site. Just drop it and expose GetFanoutTargets?
in src/node/txreconciliation.cpp:417
in fc06786f32 (outdated)
-std::sort(best_peers.begin(), best_peers.end());
-
-auto it = best_peers.begin();
-for (size_t i = 0; i < targets_size && it != best_peers.end(); ++i, ++it) {
-    if (it->second == peer_id) return true;
+// Sort the peers based on their assigned random value, extract the node_ids and trim the collections to size
naumenkogs
commented at 10:44 am on November 26, 2024:
I think this is slightly harder to follow, but I like that the collections are only sorted up to some point, so I'll take it. Instead of using std::min, I've added two Assume when defining {outbounds, inbounds}_target_size to ensure that the targets are never bigger than the sets they pull from, since this should never be the case
I don’t think it makes a huge difference, but I’ll take it -> 4b4d99f
in src/net_processing.cpp:2166
in 0092f70314 (outdated)
if (!parents_refs.empty()) {
    for (const auto& tx : parents_refs) {
        parents.emplace_back(tx.get().GetTx().GetWitnessHash());
    }
    fanout_with_ancestors = m_txreconciliation->SortPeersByFewestParents(parents);
    fanout_with_ancestors.resize(2); // FIXME: Resize to 2 for now
naumenkogs
commented at 11:42 am on November 26, 2024:
Better check if the size is not 1 already? It’s currently safe, but could cause bugs in the future.
Covered in 9918f3b08768dece8d8daf73b8144ada60c4ae5a
in src/node/txreconciliation.cpp:400
in 4b4d99fb2d (outdated)
-for (size_t i = 0; i < targets_size && it != best_peers.end(); ++i, ++it) {
-    if (it->second == peer_id) return true;
-}
-return false;
+// Make sure we never select more targets than we can
+Assert(outbounds_target_size <= weighed_outbounds.size());
naumenkogs
commented at 7:48 am on November 27, 2024:
DrahtBot removed the label
CI failed
on Dec 5, 2024
in src/node/txreconciliation.cpp:422
in 4b9f83a8b7 (outdated)
// Sort the peers based on their assigned random value, extract the node_ids and trim the collections to size
std::vector<NodeId> fanout_targets(inbounds_target_size + outbounds_target_size);
auto collect_fanout_targets = [&](std::vector<std::pair<uint64_t, NodeId>> weighted_peers, const size_t target_size) {
    // Make sure we never select more targets than we can
    Assert(outbounds_target_size <= weighed_outbounds.size());
marcofleon
commented at 5:21 pm on December 10, 2024:
I think having a preregistered peer and one inbound peer in m_states causes outbound_target_size to be 1 while weighed_outbounds.size() is 0 because assign_key is never called.
The equivalent unit test (which fails when I run it) would be something like
I’ve cached m_outbounds_count to prevent this from happening.
This made me realize that fanout_targets was incorrectly initialized, instead of reserving space for inbounds_target_size + outbounds_target_size, it was being initialized with inbounds_target_size + outbounds_target_size pre-existing elements initialized to 0, meaning that the returned vector contained more elements than expected
As far as I can see, at SENDTXRCNCL reception we do not check for duplicate SENDTXRCNCL message (we check if we have txreconciliation enabled, then that it’s happening before verack reception and then that transaction-relay is enabled on the link) and we can call RegisterPeer multiple times.
Calling RegisterPeer with the same peer_id multiple times will result in ReconciliationRegisterResult::ALREADY_REGISTERED.
The docs of both PreRegisterPeer and RegisterPeer specify that they should be called only once.
in src/node/txreconciliation.cpp:153
in 131c28bfd9 (outdated)
LOCK(m_txreconciliation_mutex);
auto peer_state = GetRegisteredPeerState(peer_id);
if (!peer_state) return false;

// Transactions which don't make it to the set due to the limit are announced via fan-out.
if (peer_state->m_local_set.size() >= MAX_RECONSET_SIZE) return false;
Logging that a given peer's local reconciliation set max size has been reached can be useful info for future debugging.
diff --git a/src/node/txreconciliation.cpp b/src/node/txreconciliation.cpp
index d4d6bdf1ea..b99bb41681 100644
--- a/src/node/txreconciliation.cpp
+++ b/src/node/txreconciliation.cpp
@@ -150,7 +150,11 @@ public:
     if (!peer_state) return false;

     // Transactions which don't make it to the set due to the limit are announced via fan-out.
-    if (peer_state->m_local_set.size() >= MAX_RECONSET_SIZE) return false;
+    if (peer_state->m_local_set.size() >= MAX_RECONSET_SIZE) {
+        LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, "Reconciliation set maximum size reached for peer=%d.",
+                      peer_id);
+        return false;
+    }

     // The caller currently keeps track of the per-peer transaction announcements, so it
     // should not attempt to add same tx to the set twice. However, if that happens, we will
I think the variable naming could be enhanced, e.g. m_collision, to dissociate it from mempool conflicts. Here, it's a collision between SipHash outputs, which can arise from two distinct transactions that do not even spend the same set of UTXOs.
Addressed in a461a1204109762ea38b21d9b6c445e9b9ba570f
ariard
commented at 1:05 am on December 22, 2024:
none
Reviewed up to commit 478137aa (“p2p: Add PeerManager method to count…”).
Just minor comments so far and I still have to answer back comments on #28765.
in src/node/txreconciliation.h:159
in c86f4f3359 (outdated)
 * can be found in their reconciliation sets (from fewest to most). If two values tie, they will be
 * sorted in order of addition, which depends on the internal order of [m_states].
 * This is meant to be called with a collection of ancestor ids for a given transaction in order to
 * decide whether the transaction should be fanned out with its ancestors or not.
 */
std::vector<NodeId> SortPeersByFewestParents(std::vector<Wtxid> parents);
I think the rationale for introducing SortPeersByFewestParents could be more fleshed out, i.e iiuc explaining that eligible peers with the lowest number of parents for a target transaction in their reconciliation sets are favored for fan-out announcements, as this can be a hint of better connectivity in the overall transaction-relay graph and we wish to propagate the target transaction faster (..?).
I think the docs give enough rationale as to what this is intended for without being excessively narrow as to what the use cases could be.
in src/net_processing.cpp:2245
in feb8c98db1 (outdated)
// This should not happen under normal conditions, given the set size should be well over the number of transactions received
// between reconciling intervals. A peer hitting the limit is likely to be either a broken implementation or an attacker.
invs_to_send.push_back(peer->m_wtxid_relay ? wtxid : txid);
}

for (const auto& hash : invs_to_send) {
If the peer has m_wtxid_relay=false, should m_tx_inventory_known_filter be checked by txid? Note, given the parent identifiers are retrieved from GetMemPoolParents(), we can get the identifiers easily. This could avoid some duplicated INV flooding.
This is already the case. The transaction that is passed to RelayTransaction is added to invs_to_send using wtxid/txid based on m_wtxid_relay (that is part of the code you quoted already). For other transactions that may be in invs_to_send (e.g. ancestors and/or descendants), this only applies to Erlay, and Erlay enforces WTXID_RELAY
ariard
commented at 10:58 pm on January 1, 2025:
none
Reviewed up to commit feb8c98db (“p2p: Add transactions to reconciliation sets…”).
sr-gi force-pushed
on Jan 2, 2025
sr-gi force-pushed
on Jan 2, 2025
in src/node/txreconciliation.cpp:223
in bbdc84ca0f (outdated)
// Make sure there is no short id collision between the wtxid we are trying to add
// and any existing one in the reconciliation set
Wtxid collision;
uint32_t short_id;
if (HasCollision(peer_state, wtxid, collision, short_id)) {
    return AddToSetResult::Collision(collision);
nit: The Collision here could be LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, “…”).
ariard
commented at 10:56 pm on January 2, 2025:
none
Reviewed up to 4b9f83a (“p2p: Makes transactions available…”), all the code changes.
I’ll check the test coverage.
sr-gi force-pushed
on Mar 25, 2025
sr-gi force-pushed
on Mar 25, 2025
sr-gi
commented at 8:36 pm on March 25, 2025:
member
Updated the approach based on the results from the simulations @naumenkogs and I have been running.
sr-gi force-pushed
on Apr 1, 2025
DrahtBot added the label
CI failed
on Apr 1, 2025
sr-gi force-pushed
on Apr 2, 2025
sr-gi force-pushed
on Apr 2, 2025
DrahtBot removed the label
CI failed
on Apr 2, 2025
sr-gi force-pushed
on Apr 3, 2025
DrahtBot added the label
CI failed
on Apr 3, 2025
sr-gi force-pushed
on Apr 8, 2025
DrahtBot removed the label
CI failed
on Apr 8, 2025
sr-gi force-pushed
on Apr 15, 2025
sr-gi force-pushed
on Jun 18, 2025
DrahtBot added the label
Needs rebase
on Jul 11, 2025
sr-gi force-pushed
on Jul 16, 2025
DrahtBot removed the label
Needs rebase
on Jul 16, 2025
sr-gi force-pushed
on Jul 23, 2025
sr-gi force-pushed
on Jul 23, 2025
DrahtBot added the label
CI failed
on Jul 23, 2025
DrahtBot
commented at 9:09 pm on July 23, 2025:
contributor
🚧 At least one of the CI tasks failed.
Task CentOS, depends, gui: https://github.com/bitcoin/bitcoin/runs/46596699781
LLM reason (✨ experimental): The CI failure is caused by an assertion failure in txreconciliation.cpp during the “AddToSetTest” case, leading to test subprocess abortion.
sr-gi force-pushed
on Jul 24, 2025
sr-gi force-pushed
on Jul 24, 2025
sr-gi force-pushed
on Jul 24, 2025
DrahtBot removed the label
CI failed
on Jul 25, 2025
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
DrahtBot added the label
CI failed
on Jul 28, 2025
DrahtBot
commented at 2:58 pm on July 28, 2025:
contributor
🚧 At least one of the CI tasks failed.
Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/46868137690
LLM reason (✨ experimental): The CI failure is caused by unresolved version control conflict markers remaining in the source code file.
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
DrahtBot removed the label
CI failed
on Jul 29, 2025
DrahtBot added the label
Needs rebase
on Jul 30, 2025
refactor: redesigns txreconciliation file split and namespace
Splits the txreconciliation logic into three files instead of two, allowing
TxReconciliationState to be properly tested instead of being internal to
txreconciliation.cpp.
Also places everything in the node namespace instead of an anonymous one.
63cd3b1dad
refactor: remove legacy comments
These comments became irrelevant in one of the previous code changes.
They simply don't make sense anymore.
326e3db51a
refactor: use LogDebug instead of LogPrintLevel in txreconciliation
1a972f3712
p2p: Functions to add/remove wtxids to tx reconciliation sets
They will be used later on.
9b86119e33
p2p: Make short id collision detectable when adding wtxids to tx reconciliation sets
1316feae61
p2p: Add PeerManager method to count the number of inbound/outbound fanout peers
35163fb1a0
p2p: Cache inbound reconciling peers count
It helps avoid recomputing the count every time we consider
a transaction for fanout/reconciliation.
Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
aec58de788
p2p: Add method to decide whether to fanout or reconcile a transaction
Fanout or reconciliation is decided on a per-transaction basis, based on the following criteria:
If the peer is inbound, we fan out to a pre-defined subset of peers (which is rotated periodically).
If the peer is outbound, we reconcile the transaction if we received it via reconciliation, or
defer the decision to relay time otherwise. At relay time, we fan out to outbounds until a threshold is met
(selecting peers in the order their timers go off) and reconcile with the rest.
With this approach we try to fan out when we estimate we are early in the propagation of the transaction,
and reconcile otherwise. Notice these heuristics don't apply to inbound peers, since they would be easily
exploitable. For inbounds we just aim for a target subset picked at random.
b364a0b9e4
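As a rough illustration of the outbound heuristic described in the commit message above (not the PR's actual code; the function name and signature are hypothetical, and only OUTBOUND_FANOUT_THRESHOLD{4} comes from the PR description):

```cpp
#include <cstdint>

// Threshold from the PR description: stop fanning out to outbound peers
// once this many peers are already aware of the transaction.
constexpr uint32_t OUTBOUND_FANOUT_THRESHOLD{4};

// Hypothetical per-transaction decision for an outbound peer whose Poisson
// timer just went off: fan out only if the transaction arrived via fanout
// (or we originated it) and it is still early in its propagation.
bool ShouldFanoutToOutbound(bool received_via_recon, uint32_t peers_aware)
{
    if (received_via_recon) return false; // reconcile with all outbounds
    return peers_aware < OUTBOUND_FANOUT_THRESHOLD;
}
```

Inbound peers are handled separately: a randomly rotated subset is always fanned out to, so this estimate is never exposed to inbound connections.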
p2p: Add transactions to reconciliation sets
Transactions eligible for reconciliation are added to the reconciliation sets. For the remaining txs, low-fanout is used.
Co-authored-by: Gleb Naumenko <naumenko.gs@gmail.com>
c4be2b5717
p2p: Add helper to compute reconciliation tx short ids and a cache of short ids to wtxids
05e4afd1f7
p2p: Deal with short id collisions for reconciliation sets
If a transaction to be added to a peer's recon set has a short id collision (a previously
added wtxid maps to the same short id), both transactions should be fanned out, given
our peer may have added the opposite transaction to our recon set, and these two
transactions won't be reconciled.
3eab0e84b6
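A minimal sketch of the collision rule from this commit message, under assumed names (ReconSet, Add) and with a stand-in integer wtxid type; the real set keys 32-bit short ids derived per BIP 330:

```cpp
#include <cstdint>
#include <map>
#include <optional>

using Wtxid = uint64_t; // stand-in for the real 256-bit wtxid type

struct ReconSet {
    std::map<uint32_t, Wtxid> by_short_id;

    // Try to add a (short id, wtxid) pair. On a collision, return the
    // previously stored wtxid and drop it from the set, so the caller can
    // fan out both transactions instead of reconciling them.
    std::optional<Wtxid> Add(uint32_t short_id, Wtxid wtxid)
    {
        auto [it, inserted] = by_short_id.try_emplace(short_id, wtxid);
        if (inserted) return std::nullopt;
        Wtxid collided{it->second};
        by_short_id.erase(it); // neither tx stays in the recon set
        return collided;
    }
};
```

Evicting both entries is the conservative choice: the peer may hold the other colliding transaction in its own set, and a sketch match on the short id could otherwise hide a real difference.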
sr-gi force-pushed
on Aug 1, 2025
sr-gi
commented at 9:40 pm on August 1, 2025:
member
I’ve rebased to account for the recent GenTxid changes by @marcofleon.
I’ve also decided to clean up the design while I was at it, to make things easier to follow and test. As part of this, the file structure has been changed from two files (txreconciliation.h and txreconciliation.cpp) to three (txreconciliation.h, txreconciliation_impl.h and txreconciliation_impl.cpp), following a similar approach to txdownloadman.
The code is split as follows:
txreconciliation.h contains the general reconciliation logic and everything that is not implementation-dependent (coming from BIP 330). This also includes TxReconciliationState, which was previously defined in txreconciliation.cpp and was therefore neither exposed nor tested.
txreconciliation_impl.h defines the reconciliation tracker using PIMPL.
txreconciliation_impl.cpp has the actual implementation of the tracker.
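Condensed into one translation unit, the PIMPL split described above could look roughly like this (class members are illustrative placeholders, and the tracker name follows Bitcoin Core's TxReconciliationTracker):

```cpp
#include <memory>

// txreconciliation.h: protocol-level state, now exposed and testable.
struct TxReconciliationState {
    bool we_initiate{false}; // placeholder member for illustration
};

// txreconciliation_impl.h: tracker exposing only a forward-declared impl.
class TxReconciliationTracker {
    class Impl;                  // defined out-of-line (the _impl.cpp in the PR)
    std::unique_ptr<Impl> m_impl;
public:
    TxReconciliationTracker();
    ~TxReconciliationTracker(); // must be out-of-line: Impl is incomplete here
};

// txreconciliation_impl.cpp: the actual implementation.
class TxReconciliationTracker::Impl {
    // ... reconciliation sets, salts, short id cache ...
};
TxReconciliationTracker::TxReconciliationTracker() : m_impl{std::make_unique<Impl>()} {}
TxReconciliationTracker::~TxReconciliationTracker() = default;
```

The payoff is the one the comment describes: callers include only the headers, the heavy implementation lives behind the pointer, and TxReconciliationState is visible to unit tests.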
DrahtBot removed the label
Needs rebase
on Aug 1, 2025
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-08-08 18:13 UTC
This site is hosted by @0xB10C. More mirrored repositories can be found on mirror.b10c.me.