The new approach differs from the previous in how peers are selected for fanout:
For Erlay-enabled peers, fanout targets are chosen for each transaction individually. Once the fanout target is reached, the transaction is reconciled with the rest of our peers. Whether a peer is selected for fanout depends on factors like peer type (inbound or outbound), how the transaction was received (via fanout or reconciliation), and how many of our peers are already aware of it.
For outbound peers:
If the transaction was received through fanout (or we originated it), we set a fanout rate threshold OUTBOUND_FANOUT_THRESHOLD{4}. As long as the transaction has been announced to (or received from) fewer than this number of peers, we continue fanning out based on the order in which peers’ Poisson timers go off.
If the transaction was received via reconciliation, we simply reconcile it with the rest of our peers.
For inbound peers:
We select a subset of inbound peers as fanout targets and send all transactions to them within a set time interval. Once the interval ends, the selection is randomly rotated.
This approach helps scale the transaction fanout rate for outbound peers by approximating how widely a transaction has already been propagated. For inbound peers, we use a random selection strategy to prevent them from easily exploiting our heuristic.
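The selection logic described above can be sketched as a toy Python model. The function names and the inbound rotation helper are illustrative only; the PR implements this in C++ with different identifiers, and only the OUTBOUND_FANOUT_THRESHOLD{4} value comes from the text above.

```python
import random

OUTBOUND_FANOUT_THRESHOLD = 4  # fanout rate threshold from the description above

def should_fanout_outbound(received_via_fanout: bool, peers_already_aware: int) -> bool:
    """Keep flooding to outbound peers (in Poisson-timer order) only while the
    tx came via fanout (or was originated locally) and fewer than the threshold
    number of peers have been announced to / received from."""
    return received_via_fanout and peers_already_aware < OUTBOUND_FANOUT_THRESHOLD

def rotate_inbound_targets(inbound_peers, subset_size, rng):
    """Pick a fresh random subset of inbound peers as fanout targets; the PR
    rotates this selection once per time interval so inbound peers cannot
    predict or game the heuristic."""
    pool = sorted(inbound_peers)
    return set(rng.sample(pool, min(subset_size, len(pool))))
```

For reconciliation-received transactions, `should_fanout_outbound` returns false regardless of the count, matching the "simply reconcile with the rest of our peers" rule above.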
You may notice 398078a8a92725f33ac28107f8fa045e02b2c5bd is not currently being used. It is likely to be dropped and brought back in a follow-up. Currently, we are picking Erlay fanout targets independently of whether we already have other non-Erlay peers (which are always fanned out to). We may want to scale the selection based on non-Erlay connections, but I think that can be addressed in its own PR.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#30277 ([DO NOT MERGE] Erlay: bandwidth-efficient transaction relay protocol (Full implementation) by sr-gi)
#28690 (build: Introduce internal kernel library by TheCharlatan)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
LLM Linter (✨ experimental)
Possible typos and grammar issues:
In txreconciliation.h: “In case of a collision, both transactions should be fanout.” -> “should be fanned out” [‘fanout’ is a noun; as a verb form it should be ‘fanned out’]
In net_processing.cpp near “This in only relevant for Erlay nodes…” -> “This is only relevant for Erlay nodes…” [missing “is”]
In net_processing.cpp comment “into a private function We would have to pass…” -> insert period: “…private function. We would have to pass…” [run-on sentence; needs a period for clarity]
DrahtBot added the label
P2P
on May 15, 2024
sr-gi
commented at 8:05 pm on May 15, 2024:
member
I’ve talked to @naumenkogs about picking this up and he was happy about it. I’m happy to close this if he changes his mind.
sr-gi renamed this:
p2p: Fill reconciliation sets (Erlay) attempt: 2
p2p: Fill reconciliation sets (Erlay) attempt 2
on May 15, 2024
in src/node/txreconciliation.h:131 (commit 7c047d30cb, outdated)
+ * Returns whether the transaction appears in the set.
+ */
+ bool AddToSet(NodeId peer_id, const Wtxid& wtxid);
+
+ /**
+ * Before Step 2, we might want to remove a wtxid from the reconciliation set, for example if
In “p2p: Functions to add/remove wtxids to tx reconciliation sets” 7c047d30cb0eadc8a424cb01de2fcd0978e22206: What is Step 2? This comment seems a little confusing because I couldn’t find a Step 2 the way I can see a Step 1.
Updated to track both successful and unsuccessful deletions (in debug log, to prevent unnecessarily spamming the logs)
in src/bench/txreconciliation.cpp:7 (commit 3f59bf83a4, outdated)
@@ -0,0 +1,40 @@
+// Copyright (c) 2020-2022 The Bitcoin Core developers
+// Distributed under the MIT software license, see the accompanying
+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
+
+#include <bench/bench.h>
+
+#include <net.h>
In “p2p: Functions to add/remove wtxids to tx reconciliation sets” 2e29e9beeebcadfc7afc56a28791456d40551bed: I think MAX_SET_SIZE could be in txreconciliation.h then we can use it in the tests.
-/**
- * Maximum number of wtxids stored in a peer local set, bounded to protect the memory use of
- * reconciliation sets and short ids mappings, and CPU used for sketch computation.
- */
-constexpr size_t MAX_SET_SIZE = 3000;
-
 /**
  * Maximum number of transactions for which we store assigned fanout targets.
diff --git a/src/node/txreconciliation.h b/src/node/txreconciliation.h
index c2cdf9875f..648b6369c0 100644
--- a/src/node/txreconciliation.h
+++ b/src/node/txreconciliation.h
@@ -14,6 +14,12 @@
 /** Supported transaction reconciliation protocol version */
 static constexpr uint32_t TXRECONCILIATION_VERSION{1};
 
+/**
+ * Maximum number of wtxids stored in a peer local set, bounded to protect the memory use of
+ * reconciliation sets and short ids mappings, and CPU used for sketch computation.
+ */
+constexpr size_t MAX_SET_SIZE = 3000;
+
 enum class ReconciliationRegisterResult {
     NOT_FOUND,
     SUCCESS,
diff --git a/src/test/txreconciliation_tests.cpp b/src/test/txreconciliation_tests.cpp
index a0071e4388..8befe0d36e 100644
--- a/src/test/txreconciliation_tests.cpp
+++ b/src/test/txreconciliation_tests.cpp
@@ -115,7 +115,7 @@ BOOST_AUTO_TEST_CASE(AddToSetTest)
     BOOST_REQUIRE_EQUAL(tracker.RegisterPeer(peer_id1, true, 1, 1), ReconciliationRegisterResult::SUCCESS);
     BOOST_CHECK(tracker.IsPeerRegistered(peer_id1));
 
-    for (size_t i = 0; i < 3000; ++i)
+    for (size_t i = 0; i < MAX_SET_SIZE; ++i)
         BOOST_REQUIRE(tracker.AddToSet(peer_id1, Wtxid::FromUint256(frc.rand256())));
     BOOST_REQUIRE(!tracker.AddToSet(peer_id1, Wtxid::FromUint256(frc.rand256())));
I’ve been talking to @sipa about this recently, and I think the limit may not be necessary if the approach for when to compute the reconciliation sets is changed. I may get rid of it, but if not, I’ll take this
Ended up taking this, since the limit will be kept in the end.
Thanks!
sr-gi force-pushed
on May 30, 2024
sr-gi force-pushed
on May 31, 2024
sr-gi
commented at 4:29 pm on May 31, 2024:
member
I’ve slightly extended the approach, adding three commits to deal with short id collisions, which were not taken into account. Some of this may be squashable; I’ve added the commits separately for now so they are easy to diff/review.
This would be missing an additional commit/amend to deal with #28765 (review), which I overlooked when addressing the outstanding comments
sr-gi force-pushed
on May 31, 2024
DrahtBot added the label
CI failed
on May 31, 2024
DrahtBot
commented at 4:32 pm on May 31, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot removed the label
CI failed
on Jun 3, 2024
sr-gi force-pushed
on Jun 7, 2024
sr-gi marked this as a draft
on Jun 7, 2024
sr-gi
commented at 6:21 pm on June 7, 2024:
member
I added two more commits, moving the fanout/reconciliation logic to RelayTransaction instead of SendMessages, plus dealing with ancestors in the mempool instead of descendants (which seemed to be the wrong approach).
I’m moving this to draft for now until I clean it up a bit, plus get some feedback on the approach
DrahtBot added the label
CI failed
on Jun 8, 2024
DrahtBot
commented at 5:17 am on June 8, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
Prabhat1308
commented at 11:58 am on June 11, 2024:
What is the intention behind moving the salt calculation out of PreRegisterPeer? Now, if PreRegisterPeer is not executed due to the lock being held, we are doing meaningless computation.
The purpose is for PreRegisterPeer to be callable with a fixed salt (via PreRegisterPeerWithSalt), so collisions can be tested. This is just moving it out of the PImpl; the effects on the external caller should be the same
sr-gi force-pushed
on Jun 11, 2024
sr-gi force-pushed
on Jun 11, 2024
DrahtBot removed the label
CI failed
on Jun 12, 2024
sr-gi force-pushed
on Jun 13, 2024
sr-gi force-pushed
on Jun 13, 2024
sr-gi force-pushed
on Jun 13, 2024
sr-gi marked this as ready for review
on Jun 13, 2024
sr-gi
commented at 8:01 pm on June 13, 2024:
member
This should be ready for review now.
in src/net_processing.cpp:2166 (commit 9de4233d00, outdated)
+ if (!parents_refs.empty()) {
+     for (const auto &tx : parents_refs) {
+         parents.emplace_back(tx.get().GetTx().GetWitnessHash());
+     }
+     fanout_with_ancestors = m_txreconciliation->SortPeersByFewestParents(parents);
+     fanout_with_ancestors.resize(2); // FIXME: Resize to 2 for now
This is pending. Happy to get feedback on how many to pick (either hardcoded or based on something else)
Prabhat1308
commented at 3:26 pm on June 16, 2024:
The number of parents per peer should mostly be in the range of (0-2/3), which should include a significant number of peers to reconcile with. A constant number might not be ideal considering that, if the array returned is smaller than that constant, we might be reconciling with non-existent peers.
I’m curious to know where you get that parent count range from 🤔
Prabhat1308
commented at 7:57 pm on June 19, 2024:
I got some data for about 4100+ transactions from a source (can’t be sure of its credibility) for Bitcoin transactions in a mempool, and I ran a script to count the average number of parents, which came to approx. 1.6. Another way I thought about this: I got the average number of inputs per transaction. For 2023–2024, the average is somewhere between 2–3. If we make a tree-like dependency structure for transactions, considering the transactions that had their inputs as UTXOs, the probability of having deeper ancestries diminishes rapidly, since any missing transaction will eliminate the ancestry above it. Considering that mempools are, in general, highly updated, ancestries beyond grandparent transactions are not very likely to be there. So, by this reasoning, there are supposed to be 6 ancestors per transaction on average, and we can expect the 2–3 range to cover a significant number of transactions.
Notice this does not apply to all transactions, though. The exception applies to transactions that have some ancestors in the mempool. In that case, we need to be consistent in the way we share this set of transactions; otherwise, it could be the case that the parents are reconciled at time T while the children are fanned out at T-n, potentially making the children orphans.
Under these conditions, we check how many peers have the ancestors in their reconciliation sets, and select a subset of the ones with the fewest of them, so the number of transactions to be fanned out is the smallest possible.
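A minimal sketch of that selection, as a toy Python stand-in for the PR's SortPeersByFewestParents (the real code is C++ operating on per-peer reconciliation state; names here are illustrative):

```python
def sort_peers_by_fewest_parents(recon_sets, parents):
    """Order peer ids by how many of the transaction's parents already sit in
    their reconciliation set. Peers holding fewer parents come first: picking
    them for fanout-with-ancestors means the fewest transactions have to be
    pulled out of reconciliation sets and flooded instead."""
    return sorted(recon_sets, key=lambda peer: sum(p in recon_sets[peer] for p in parents))
```

Picking a prefix of this ordering yields the subset with the smallest fanout cost described above.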
brunoerg
commented at 2:28 pm on September 25, 2024:
Shouldn’t we decrease the number of inbounds_fanout_tx_relay/outbounds_fanout_tx_relay by the number of peers we’re going to fanout with ancestors?
sr-gi
commented at 10:11 am on September 26, 2024:
We could. I left it as is for simplicity, given this case should not be too frequent, so having a higher fanout under these conditions shouldn’t be too bad.
I’ll consider adding it if it does not make the code much harder to read, or if there is a clear argument in favor of it.
naumenkogs
commented at 9:54 am on November 1, 2024:
This function could easily return two numbers: how many of these peers are inbound and how many are outbound. And then just pass this pair along to ShouldFanoutTo. The code should not be that complex. A cheap cost to preserve the efficiency model.
I haven’t taken this yet, given I’m working on a last commit reworking part of how the fanout selection works. I may consider it after that. In any case, looks like this is only filling the cache but not testing anything after it
Make sure to run all tests locally, according to the documentation.
The failure may happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot removed the label
Needs rebase
on Sep 13, 2024
sr-gi force-pushed
on Sep 13, 2024
DrahtBot removed the label
CI failed
on Sep 13, 2024
in src/net_processing.cpp:2453 (commit 3879f550d3, outdated)
+ if (!result.m_succeeded) {
+     // If the transactions fails to get into the set, we fanout
+     invs_to_send.push_back(wtxid);
+     // If the transaction fails because it collides with an existing one,
+     // we also remove and fanout the conflict and all its descendants.
+     // This is because our peer may have added the conflicting tranaction
brunoerg
commented at 12:47 pm on September 21, 2024:
// This is because our peer may have added the conflicting transaction
sr-gi force-pushed
on Oct 2, 2024
sr-gi force-pushed
on Oct 2, 2024
sr-gi force-pushed
on Oct 2, 2024
sr-gi
commented at 3:35 pm on October 2, 2024:
member
I’ve added a commit introducing a delayed set to store transactions that are being reconciled but won’t be added to the sketch should it be constructed at this time. This is to mimic the fanout trickling logic, where transactions are only made available to peers if they would have been INVed to them. This was already part of @naumenkogs’ approach, but was scrapped when reworking when data is added to the reconciliation set in a22b2364f5942266596c78ad306b1e32aa89aa0f
sr-gi force-pushed
on Oct 2, 2024
sr-gi force-pushed
on Oct 3, 2024
sr-gi force-pushed
on Oct 3, 2024
sr-gi force-pushed
on Oct 3, 2024
sr-gi force-pushed
on Oct 3, 2024
in src/node/txreconciliation.cpp:457 (commit 20772c0778, outdated)
+ // We first decide how many reconciling peers of a given direction we want to flood to,
+ // and then generate a list of peers of that size for a given transaction. We then see
+ // whether the given peer falls into this list.
+ double n;
+ if (peer_state->m_we_initiate) {
+     n = OUTBOUND_FANOUT_DESTINATIONS - outbounds_fanout_tx_relay;
marcofleon
commented at 3:46 pm on October 12, 2024:
Can outbounds_fanout_tx_relay ever be greater than 1 here? I think if we have two (or more) unregistered outbound peers and at least one registered peer, then this results in an unsigned integer overflow. Which then overflows targets_size in IsFanoutTarget.
naumenkogs
commented at 3:03 pm on October 23, 2024:
The commit with the casting works, but the test one is actually not a good way of testing this; it would pass both with and without the change.
Turns out that passing a huge value to IsFanoutTarget would result in targets_size being 0, which results in the method returning false. I don’t think there is really a good way of testing this.
I’ll take the code commit but skip the addition to the test.
Happy to consider an alternative approach that allows us to check there are no collisions without having to expose a public method if you have something in mind
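The wraparound marcofleon describes is easy to model: with unsigned 64-bit arithmetic, subtracting a larger count from the constant produces a huge value rather than a negative one. A hedged Python illustration (Python ints don't wrap, so the modulo emulates a uint64; names are illustrative):

```python
def remaining_fanout_targets(destinations, already_flooding, width=64):
    """Contrast the unsigned subtraction with a clamped (signed) version.
    'wrapped' is what a uint64 computes when already_flooding > destinations;
    'clamped' is the intended value: no further fanout targets are needed."""
    wrapped = (destinations - already_flooding) % (1 << width)  # uint64 behaviour
    clamped = max(destinations - already_flooding, 0)           # signed + clamp
    return wrapped, clamped
```

The huge wrapped value is what then propagated into targets_size downstream.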
naumenkogs
commented at 9:07 am on November 4, 2024:
Yeah, just dropping HasCollision and keeping HasCollisionInternal. The former is used only in tests. And since its implementation is trivial, I think testing for collisions through AddToSet is sufficient.
Sure, looks like tests can be rewritten for it not to be needed. Will address in the next push
in src/net_processing.cpp:6276 (commit 50c19dd21f, outdated)
+ auto result = m_txreconciliation->AddToSet(pto->GetId(), wtxid);
+ if (!result.m_succeeded) {
+     vInv.push_back(inv);
+     // If the transaction fails because it collides with an existing one,
+     // we also remove and fanout the conflict and all its descendants.
+     // This is because our peer may have added the conflicting tranaction
naumenkogs
commented at 8:20 am on October 28, 2024:
50c19dd21f6e437cd8e3d9047a89d793d097c5e5
worth adding a comment on what would happen if there are 3 colliding transactions (probably fine but some text would be helpful) :)
Since this is part of RelayTransaction, which processes transactions one at a time, and given that when a collision happens the existing transaction is removed from the set, there cannot be a three-way collision. If a third transaction that happens to have the same short id as the two previous ones is added to the set later, it will be added normally, since there won’t be a transaction to collide with.
I could add a comment for three, but someone may then ask what happens if there are four 😅
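The invariant being argued here (at most a two-way collision, because the resident transaction is evicted) can be shown with a toy model; `add_to_set` and `short_id_of` are illustrative names, not the PR's API:

```python
def add_to_set(local_set, short_id_of, wtxid):
    """Toy collision rule: transactions arrive one at a time; on a short-id
    collision the resident transaction is evicted and both are fanned out.
    The set therefore never holds two entries with the same short id, so a
    third transaction with that short id later finds nothing to collide with."""
    sid = short_id_of(wtxid)
    for other in list(local_set):
        if short_id_of(other) == sid:
            local_set.remove(other)  # evict the conflict; caller fans out both
            return False, other
    local_set.add(wtxid)
    return True, None
```

Three same-short-id arrivals thus resolve as: first added, second evicts the first (both fanned out), third added normally.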
naumenkogs
commented at 8:22 am on October 31, 2024:
I was looking for something like “As a general rule, we keep the transaction that appeared first”.
Currently, the AddToSet documentation seems outdated—it basically claims that the only way to fail is to reach a set limit.
How about the following replacement:
/**
 * Step 1. Add a to-be-announced transaction to the local reconciliation set of the target peer.
 * Returns false if the set is at capacity, or if the set contains a colliding transaction.
 * Returns true if the transaction appears in the set (whether it was already there or just was added).
 */
naumenkogs
commented at 10:23 am on October 31, 2024:
Apparently I was wrong.
// If the transaction fails because it collides with an existing one,
// we also remove and fanout the conflict and all its descendants.
// This is because our peer may have added the conflicting transaction
// to its set, in which reconciliation of these two would fail
Say A, B, C arrive.
First, A is added to the set.
Then B doesn’t make it into the set, but also drops A from it.
That’s a good point. If the selected peer has a parent -> child dependency but was not picked as one of the fanout_with_ancestors peers, the parent and the child may have ended up in different sets.
We should check the return of TryRemovingFromSet before adding it to vInv
I just realized this is also the case for the ancestors. When relaying with ancestors, we should also check that the ancestor we are trying to relay is part of the reconciliation set (i.e. that it has not been previously flooded)
naumenkogs
commented at 9:47 am on November 4, 2024:
The fix should be done in p2p: Deal with shortid collisions for reconciliation sets, please move it if you retouch.
in src/net_processing.cpp:6285 (commit 50c19dd21f, outdated)
+ Assume(peer->m_wtxid_relay);
+ CTxMemPool::setEntries descendants;
+ m_mempool.CalculateDescendants(m_mempool.get_iter_from_wtxid(collision.value()), descendants);
+ for (const auto &txit: descendants) {
+     auto wtxid = txit->GetTx().GetWitnessHash();
+     m_txreconciliation->TryRemovingFromSet(pto->GetId(), wtxid);
naumenkogs
commented at 8:32 am on October 28, 2024:
50c19dd21f6e437cd8e3d9047a89d793d097c5e5
could this be used to game collisions (for wtxid you have here)?
naumenkogs
commented at 8:33 am on October 28, 2024:
say a descendant was never considered for reconciliation (somehow), but is now used to remove from the set some other transaction (that it collides with)
or maybe it was considered, but then was dropped in favor of something else (that’s not how we currently handle collisions, but that would be a hard assumption)
Collisions of this kind could happen in theory, but I don’t think they can really be gamed:
short ids are peer specific; they are generated using SipHashUint256, salted with m_k0 and m_k1. Therefore, while a collision of this kind is theoretically possible, an adversarial peer should not be able to exploit it
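The "not gameable" argument rests on the short id being keyed per peer. A sketch of the idea in Python; keyed BLAKE2 stands in for SipHash (which the Python stdlib does not expose), and the salt names are illustrative:

```python
import hashlib

def short_id(wtxid: bytes, peer_salt: bytes) -> int:
    """Per-peer salted short id: the same wtxid maps to different short ids
    for different peers, so an attacker who finds a collision against one
    peer's mapping learns nothing about any other peer's mapping."""
    digest = hashlib.blake2b(wtxid, key=peer_salt, digest_size=4).digest()
    return int.from_bytes(digest, "little")
```

Without knowledge of the negotiated salts, crafting two transactions that collide for a specific victim peer is no easier than a brute-force search.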
naumenkogs
commented at 9:08 am on November 4, 2024:
Right, please resolve this issue.
in src/net_processing.cpp:509 (commit 392facfce1, outdated)
naumenkogs
commented at 8:37 am on October 28, 2024:
392facfce1be6d0395abecc2e049ff39635bd1a5
why do you think force relay should influence whether we reconcile or flood? The transaction is relayed in either case (the original meaning of force relay)
You’re right, should have been added to the previous commit. Will fix
in src/node/txreconciliation.h:158 (commit 392facfce1, outdated)
@@ -139,6 +139,12 @@ class TxReconciliationTracker
  */
 bool ShouldFanoutTo(const Wtxid& wtxid, NodeId peer_id,
                     size_t inbounds_fanout_tx_relay, size_t outbounds_fanout_tx_relay);
+
+/**
+ * Returns a collections of node ids sorted by how many parents the peer has in its reconciliation set
naumenkogs
commented at 8:43 am on October 28, 2024:
392facfce1be6d0395abecc2e049ff39635bd1a5
you can technically feed it arbitrary transactions. I think the comment should say it, and then later say “we normally pass a list of parents to determine whether [… … …]”.
in src/net_processing.cpp:6220 (commit 392facfce1, outdated)
@@ -6216,27 +6284,6 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
     // No reason to drain out at many times the network's capacity,
     // especially since we have many peers and some will draw much shorter delays.
     unsigned int nRelayedTransactions = 0;
-
-    size_t inbounds_fanout_tx_relay = 0, outbounds_fanout_tx_relay = 0;
naumenkogs
commented at 8:44 am on October 28, 2024:
392facfce1be6d0395abecc2e049ff39635bd1a5
why not modifying the former commit that introduces this code?
Because it is doing more than just moving the code around, I thought it may be easier for reviewers already familiar with the original approach to understand the changes. I wouldn’t mind squashing it once all the discussion around RelayTransaction is resolved
naumenkogs
commented at 9:10 am on November 4, 2024:
My impression was that there are very few reviewers already familiar with the original approach at the moment, so it’s an extra burden for most to understand the former code you drop anyway.
But yeah, it’s OK. Perhaps a commit message saying it will be dropped would work.
in src/net_processing.cpp:2188 (commit 392facfce1, outdated)
+ // to reconcile, fanout the transaction an all its ancestors. We just add the parents here and leave fanout as true
+ auto it = std::find(fanout_with_ancestors.begin(), fanout_with_ancestors.end(), peer_id);
+ if (it != fanout_with_ancestors.end()) {
+     for (const auto wtxid: parents) {
+         m_txreconciliation->TryRemovingFromSet(peer_id, wtxid);
+         invs_to_send.push_back(wtxid);
naumenkogs
commented at 8:54 am on October 28, 2024:
392facfce1be6d0395abecc2e049ff39635bd1a5
what about the order of the parents here? Deserves a comment at least. With the current non-cluster mempool it might break something?
UPD:
Say the following happens:
A -> B -> C
A arrives, we add it to some sets.
B arrives, triggers a flood from half the sets, but the other half has A+B now.
C arrives, triggers a flood from A+B. But then, can we guarantee A is flooded before B?
UPD2: topo sort should work. Have a common place where the INV vector is handled?
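The suggested topological sort over the pending INVs would guarantee parents are always announced before children, covering the A -> B -> C case above. A hedged sketch (illustrative names, not the PR's code):

```python
def topo_sort_invs(invs, parents_of):
    """Order announcements so every parent precedes its child. parents_of maps
    a wtxid to its direct in-mempool parents; only parents that are themselves
    being announced in this batch are pulled forward."""
    invs_set = set(invs)
    ordered, seen = [], set()

    def visit(tx):
        if tx in seen:
            return
        seen.add(tx)
        for p in parents_of.get(tx, ()):
            if p in invs_set:
                visit(p)  # announce parents first
        ordered.append(tx)

    for tx in invs:
        visit(tx)
    return ordered
```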
naumenkogs
commented at 10:17 am on October 31, 2024:
another question: say a child arrives a few minutes after a parent (the parent is still in the mempool, but it’s been announced to peers a while ago). Why bother with this inv? Perhaps we should do it only if TryRemovingFromSet has actually removed it?
(the same applies to handling a collision below)
I know m_tx_inventory_known_filter will probably catch it, but I think it’s better to do it here.
in src/net_processing.cpp:2202 (commit 392facfce1, outdated)
+ if (fanout) {
+     // We fanout if force relay is set, if the peer does not reconcile transactions, or if it does but it has been picked for fanout.
+     invs_to_send.push_back(peer->m_wtxid_relay ? wtxid : txid);
+ } else {
+     // Otherwise, we try to add the transaction to the peer's reconciliation set.
+     // If the transaction has in mempool descendants, we will fanout all the descendant
naumenkogs
commented at 9:11 am on October 28, 2024:
392facfce1be6d0395abecc2e049ff39635bd1a5
where do you do this (consider in-mempool descendants and decide to fanout everything)? I only see you do it for the collision
I don’t think this comment is accurate, since this can never be the case. A transaction considered for relay cannot have in-mempool descendants by definition, since they would have been orphans at the time of considering them.
I cannot recall whether this is a re-wording of the previous approach, which tried to consider descendants, or whether I tried to refer to the collisions and it didn’t make much sense.
I’ll remove the comment
in src/net_processing.cpp:5847 (commit 20772c0778, outdated)
@@ -6270,6 +6270,10 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
 
     // Determine transactions to relay
     if (fSendTrickle) {
+        if (m_txreconciliation && m_txreconciliation->IsPeerRegistered(pto->GetId())) {
+            // Make transactions added to the reconciliation set during the last interval available
+            m_txreconciliation->ReadyDelayedTransactions(pto->GetId());
naumenkogs
commented at 9:19 am on October 28, 2024:
20772c077832592265bd6a2876aa6b4cb7dbde7d
with this code, and especially the 5/2 trickle delay (very long), previous performance measurements become outdated.
My idea was to delay the responses. And by this time, the hope was flooding does enough job so that scanning the sets through many connections does not work.
Also, adding to a set already happens on a trickle… so that’s another level of protection
naumenkogs
commented at 11:35 am on October 30, 2024:
Just talked to Bruno a bit more about this.
I think it’s fair to compare this delayed set mechanism to what i have in the code there.
I like my solution more because:
it’s synced across peers (handles spying across multiple connections better, I think)
simpler code
more flexibility (it doesn’t have to be the same trickling as adding to sets or fanning out)
I think the current approach mimics what you were doing with your old one. I haven’t picked up the trickle reduction commit nor the delayed response, but I was planning to in the next PR.
Your approach used to add transactions to the reconciliation set during INV building, which happens on trickle intervals. So data was added to to_be_announced (delayed) on RelayTransaction and then made available on trickle. The current approach adds data to the delayed set on RelayTransaction and makes it available on trickle too.
So I don’t think the previous performance measurements become outdated, building the whole Erlay PR on this should yield similar results.
naumenkogs
commented at 10:14 am on October 31, 2024:
I agree now, it’s equivalent. Still wrapping my head around RelayTransaction refactoring :)
A difference could be in handling dependencies: during the trickle window after the first transaction arrives.
Say, a parent arrives, and then a child arrives in a second. In my approach we may notice the child when handling (announcement-wise) the parent (because handling the parent is postponed via trickle).
In your approach, you handle the parent independently.
Is there any use in knowing about the child while handling (announcement-wise) the parent? I don’t think so. You could probably save the effort of adding the parent to the reconset and then removing it, but that should be cheap anyway.
(the child-before-parent case is entirely different, as the child-to-arrive-early doesn’t normally hit the announcement step while in orphanage)
naumenkogs
commented at 12:22 pm on November 1, 2024:
09:42:53 <@gmaxwell> Existing relay logic checks that transactions are still in the mempool before relaying them. I think the issue there would go away if instead of keeping around some erlay datastructure you just keep growing the queue of the peers transactions to relay until its time to reconcile, enh? then the existing logic to check if things are still in the mempool is sufficient.
“Existing” refers to the legacy code in my branch.
I simulated this to see if there was any difference between doing the decision-making before or after trickle.
I didn’t do it for dependent transactions, just for a single one, trying to answer whether choosing fanout peers at scheduling time or at relay time made any difference. The answer is no, given that the number of peers that know about the transaction at both times is almost the same.
in src/node/txreconciliation.cpp:325 (commit d2f2605072, outdated)
+ auto peer_state = GetRegisteredPeerState(peer_id);
+ if (!peer_state) return false;
+
+ auto removed = peer_state->m_local_set.erase(wtxid) > 0;
+ if (removed) {
+     LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, "Removed %s from the reconciliation set for peer=%d. " /* Continued */
naumenkogs
commented at 9:29 am on October 28, 2024:
nit: in three places in this commit /* Continued */ should be dropped, it’s no longer necessary (been a linter thing)
in src/net_processing.cpp:6211 (commit 873e8dd60b, outdated)
+ // When we consider to which (and how many) Erlay peers
+ // we should fanout a tx, we must know to how
+ // many peers we would certainly announce this tx
+ // (non-Erlay peers).
+ if (peer_tx_relay->m_relay_txs && !m_txreconciliation->IsPeerRegistered(cur_peer_id)) {
+     inbounds_fanout_tx_relay += cur_peer->m_is_inbound;
naumenkogs
commented at 9:31 am on October 28, 2024:
sr-gi
commented at 7:50 pm on October 28, 2024:
member
Squashed @naumenkogs’ fix for @marcofleon’s comment in ad95b2c0e21f1e865f967aa9463ef99bc252550a, plus added a commit replacing IsFanoutTarget with GetFanoutTargets so the cache for fanout targets can be dropped. After this, the fanout targets are computed just once per transaction
sr-gi force-pushed
on Oct 28, 2024
DrahtBot added the label
CI failed
on Oct 28, 2024
DrahtBot
commented at 8:02 pm on October 28, 2024:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sr-gi force-pushed
on Oct 28, 2024
DrahtBot removed the label
CI failed
on Oct 28, 2024
in src/node/txreconciliation.cpp:389 (commit 4234f5ebe1, outdated)
+ const double inbound_targets = (inbounds_fanout_tx_relay + m_inbounds_count) * INBOUND_FANOUT_DESTINATIONS_FRACTION;
+ double n = std::max(inbound_targets - inbounds_fanout_tx_relay, 0.0);
+
+ // Being this a fraction, we need to round it either up or down. We do this deterministically at random based on the
+ // transaction we are picking the peers for.
+ CSipHasher deterministic_randomizer_in{m_deterministic_randomizer};
naumenkogs
commented at 10:06 am on October 29, 2024:
4234f5ebe133b949080aaa56da8cbdd18b650ff2
Originally, the code forced me to make this call for every peer, that’s why i needed it to be deterministic. It’s not the case anymore. Do we still need determinism then?
naumenkogs
commented at 10:08 am on October 29, 2024:
Otherwise we can drop m_deterministic_randomizer altogether and use some cheap FastRandomContext instead.
Well, this really depends. GetFanoutTargets is called by RelayTransaction, which, in turn, may be called by ProcessMessage for a peer that has NetPermissionFlags::ForceRelay. So if force relay is used for both fanout and set reconciliation, as suggested in #30116 (review), then this could be called twice, leading to two different orderings if we don't do it deterministically
naumenkogs
commented at 9:15 am on October 30, 2024:
This could be ReattemptInitialBroadcast, BroadcastTransaction (rpc/wallet), ProcessValidTx, or earlier upon handling NetMsgType::TX.
I think it’s odd we have two calls to RelayTransaction in the two latter cases. I’d rather change this code, e.g. pass already_relayed (duplicating force) to RelayTransaction. Perhaps too much for this PR and we should do it elsewhere (especially now that the package stuff happens in-between). I understand why it was ok when RelayTransaction was simple.
And while we’re stuck with these two calls, let’s say we drop the determinism.
Why RelayTransaction doesn’t look at m_tx_inventory_to_send very early (and maybe m_tx_inventory_known_filter too)? That would catch the transactions we already set for flooding, and just pass on them.
Now, the transactions in the sets. If we add them to the set again, nothing is done. If we try to flood them by a different choice, nothing is done still because we hit m_tx_inventory_to_send already having it.
Thus, for now, i think we should repeat the m_tx_inventory_to_send check above, rather than having the determinism. It kinda makes sense anyway, no?
Why RelayTransaction doesn’t look at m_tx_inventory_to_send very early (and maybe m_tx_inventory_known_filter too)? That would catch the transactions we already set for flooding, and just pass on them.
I don’t think this really works(?).
Imagine we call this once, and peer p (who has `ForceRelay` permissions) is selected for reconciliation using a non-deterministic approach.
Now, the transaction is sent again by p shortly after, and this time it is selected for flooding instead. The filter won't catch this, and the transaction would end up in both sets.
This makes me recall that even though the process of selecting peers is deterministic at the moment, it is dependent on our peer list. Calling this twice with a different set of peers can result in a different ordering, so maybe we should not assume that transactions can be freely added to any of the two sets, and we should make sure that the opposite set does not contain the transaction before adding it to our desired one?
in src/node/txreconciliation.cpp:374
in 4234f5ebe1 (outdated)
LOCK(m_txreconciliation_mutex);

// We decide whether a particular peer is a low-fanout flood target differently based on its connection direction:
// - for outbounds we have a fixed number of flood destinations.
// - for inbounds we use a fraction of all inbound peers supporting tx relay.
auto outbounds_target_size = (double)OUTBOUND_FANOUT_DESTINATIONS - outbounds_fanout_tx_relay;
naumenkogs
commented at 10:17 am on October 29, 2024:
Yeah, you’re right. Given how small the two vectors are, calling ShouldFanoutTo with a merged vector should not incur any major overhead, so merging will make it easier to read
DrahtBot added the label
Needs rebase
on Oct 29, 2024
sr-gi force-pushed
on Oct 29, 2024
sr-gi force-pushed
on Oct 30, 2024
sr-gi
commented at 2:59 pm on October 30, 2024:
member
Rebasing to deal with merge conflicts
DrahtBot removed the label
Needs rebase
on Oct 30, 2024
in src/node/txreconciliation.cpp:110
in ccd695851c (outdated)
naumenkogs
commented at 11:17 am on October 31, 2024:
Nit
const size_t removed = m_local_set.erase(wtxid) + m_delayed_local_set.erase(wtxid);
// Data must be in one of the sets at most
Assume(removed <= 1);
return removed == 1;
The Assume is there to make sure that, if the code happens to be wrong, it is caught in testing. In the current approach, if the code moving data from m_delayed_local_set to m_local_set were broken somehow, this method would still report that data was removed. If we were to return removed == 1 in the aforementioned case, it would return false, and the caller's logic may break.
I don't think either of these cases is likely to happen given the Assume will catch it before getting merged, but the proposed change makes me anxious.
PS: I agree on doing the variable renaming if you feel that makes the code more readable though
naumenkogs
commented at 10:08 am on November 4, 2024:
I thought size_t 0 will be cast to bool false on return, and just suggested the exact same behavior (1==true, 0==false) but without forcing the reader to think about casting and cause this exact confusion between us.
Am i wrong?
Ahhh, so you're saying 2==true is also important? Then I'll be fine with an explicit return removed >= 1, and a comment that 2 should not normally happen.
2==true would mean that we had the transaction in both places, which should never happen (it would mean that a transaction is both ready and delayed at the same time).
If that were to happen in production for whatever reason, this would return false with the suggested changes, which may break something?
naumenkogs
commented at 7:33 am on November 14, 2024:
Probably a taste thing. I think auto should be used only when it’s obvious what type it is.
Here there are two non-obvious occasions:
auto x = bool + bool resulting in x=2.
casting 2 to bool for the result.
I won’t insist but i think verbose types are better here. Feel free to close.
naumenkogs
commented at 9:49 am on November 1, 2024:
fd7c35a22669d440afe182a153bed28fd326b51f
Often, there will be no parents (e.g., a parent is in the mempool but has already been flooded/reconciled). In that case, it would be nice to give a random order. And then comment it please.
This is only called if there are in-mempool parents though; the ordering doesn't matter much otherwise
naumenkogs
commented at 9:57 am on November 4, 2024:
There could be mempool parents, but none are in reconciliation sets (say, they were already reconciled). For all these cases, it would be nice to not choose the peer sitting first in m_states?
In that case, no transaction would be moved from the reconciliation set to invs_to_send, given no transaction would be removed. Therefore, the order wouldn’t matter:
auto it = std::find(fanout_with_ancestors.begin(), fanout_with_ancestors.end(), peer_id);
if (it != fanout_with_ancestors.end()) {
    for (const auto wtxid : parents) {
        m_txreconciliation->TryRemovingFromSet(peer_id, wtxid);
        invs_to_send.push_back(wtxid);
    }
} else {
    // If the peer is registered for set reconciliation, maybe pick it as fanout
    fanout = m_txreconciliation->ShouldFanoutTo(peer_id, fanout_targets);
}
This may be a bit counterintuitive. I'll give it some thought and potentially re-design so it's clearer
naumenkogs
commented at 7:12 am on November 14, 2024:
Say there is an in-mempool parent announced earlier, so it’s not in any of the sets.
SortPeersByFewestParents will return the same order as m_states (since all parent_count are 0),
and fanout_with_ancestors will take first two after resize.
Then for two peers (specifically, the first two from m_states) this code is executed:
TryRemovingFromSet will do nothing. But then invs_to_send.push_back(wtxid); is executed anyway. And always for the same two first peers from m_states. In other words, the first two are guaranteed to take the fanout in this case.
Sorry, you’re right. I’ve been looking at this being conditional to m_txreconciliation->TryRemovingFromSet(peer_id, wtxid), which is what happens with the descendants, but here it is unconditional.
It should be added only if TryRemovingFromSet succeeds, and therefore the ordering in case there are no ancestors won’t matter. I’ll also add some comments regarding that.
Notice the point of this is not to add something to invs_to_send but rather to move it from m_txreconciliation to invs_to_send, otherwise things would be sent more than once (or potentially captured by the bloom filters beforehand, which is still not optimal)
in src/net_processing.cpp:2232
in fd7c35a226 (outdated)
// we also remove and fanout the conflict and all its descendants.
// This is because our peer may have added the conflicting transaction
// to its set, in which case reconciliation of these two would fail
if (const auto collision = result.m_conflict; collision.has_value()) {
    CTxMemPool::setEntries descendants;
    WITH_LOCK(m_mempool.cs, m_mempool.CalculateDescendants(m_mempool.get_iter_from_wtxid(collision.value()), descendants));
naumenkogs
commented at 10:06 am on November 1, 2024:
Why having a collision means we should fanout the children? They may get reconciled in their own time.
Because if you remove the parent from the reconciliation set but not the children, it could be the case that you reconcile with that peer before fanning out the parent, and all those children will be orphans
naumenkogs
commented at 9:54 am on November 4, 2024:
Something worth documenting too :) Again, at least one place where we talk about handling dependencies.
DrahtBot removed the label
CI failed
on Nov 2, 2024
in src/node/txreconciliation.cpp:261
in e942f3a749 (outdated)
// Check if the reconciliation set is not at capacity for two reasons:
// - limit sizes of reconciliation sets and short id mappings;
// - limit CPU use for sketch computations.
//
// Since we reconcile frequently, reaching capacity either means:
naumenkogs
commented at 9:16 am on November 4, 2024:
e942f3a7493908988eea08640d87b2ae420c5eef
This entire paragraph currently seems to me more confusing than useful. I suggest dropping it altogether.
in src/node/txreconciliation.cpp:257
in e942f3a749 (outdated)
AssertLockNotHeld(m_txreconciliation_mutex);
LOCK(m_txreconciliation_mutex);
auto peer_state = GetRegisteredPeerState(peer_id);
if (!peer_state) return false;

// Check if the reconciliation set is not at capacity for two reasons:
naumenkogs
commented at 9:18 am on November 4, 2024:
e942f3a7493908988eea08640d87b2ae420c5eef
Duplicates the comment around MAX_RECONSET_SIZE. We'd better drop one of them.
I’m guessing this is related to #30116 (review), in which case I’ll drop that
in src/net_processing.cpp:5827
in 9005953037 (outdated)
auto result = m_txreconciliation->AddToSet(pto->GetId(), wtxid);
if (!result.m_succeeded) {
    vInv.push_back(inv);
    // If the transaction fails because it collides with an existing one,
    // we also remove and fanout the conflict and all its descendants.
    // This is because our peer may have added the conflicting transaction
naumenkogs
commented at 9:44 am on November 4, 2024:
90059530379cf99cf4b82d664ad35c1ddefc3d07
“Would fail” is very confusing. “Failed reconciliation” will have a technical meaning in the later commits. Here, I'd rather be more explicit: “two transactions mapped to the same short-id won't be announced between these two peers”.
in src/net_processing.cpp:5826
in 9005953037 (outdated)
// If the transaction fails to get into the set, we fanout
auto result = m_txreconciliation->AddToSet(pto->GetId(), wtxid);
if (!result.m_succeeded) {
    vInv.push_back(inv);
    // If the transaction fails because it collides with an existing one,
    // we also remove and fanout the conflict and all its descendants.
naumenkogs
commented at 9:45 am on November 4, 2024:
Let's provide a hint why we fanout the descendants (or at least send the reader to where the dependency issue is explained better; we should have one place explaining it)
in src/node/txreconciliation.cpp:62
in c790aa0e7e (outdated)
@@ -56,6 +56,13 @@ class TxReconciliationState
 */
uint64_t m_k0, m_k1;

/**
 * Set of transactions to be added to the reconciliation set on the next trickle. These are still unrequestable for
 * privacy reasons (to prevent transaction probing); transactions become available (moved to m_local_set) once they
 * would have been announced via fanout.
naumenkogs
commented at 10:04 am on November 4, 2024:
c790aa0e7ebfcf0b03b4d0ef5fcc0ddfa4199943
I think “upon trickle” or “see m_next_inv_send_time” is much less confusing.
in src/node/txreconciliation.cpp:279
in c790aa0e7e (outdated)
 if (peer_state->m_delayed_local_set.insert(wtxid).second) {
     peer_state->m_short_id_mapping.emplace(short_id, wtxid);
     LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, "Added %s to the reconciliation set for peer=%d. "
                   "Now the set contains %i transactions.\n",
-                  wtxid.ToString(), peer_id, peer_state->m_local_set.size());
+                  wtxid.ToString(), peer_id, set_size + 1);
naumenkogs
commented at 10:12 am on November 4, 2024:
c790aa0e7ebfcf0b03b4d0ef5fcc0ddfa4199943
This is probably the most important log in the entire file, and I think it’s better to distinguish two set sizes here (preferably other places the size is logged too).
naumenkogs
commented at 8:01 am on November 27, 2024:
yes
bitcoin deleted a comment
on Nov 4, 2024
sr-gi force-pushed
on Nov 13, 2024
in src/net_processing.cpp:2186
in ee46e21a5c (outdated)
// to reconcile, fanout the transaction and all its ancestors. We just add the parents here and leave fanout as true
auto it = std::find(fanout_with_ancestors.begin(), fanout_with_ancestors.end(), peer_id);
if (it != fanout_with_ancestors.end()) {
    for (const auto wtxid : parents) {
        m_txreconciliation->TryRemovingFromSet(peer_id, wtxid);
        invs_to_send.push_back(wtxid);
naumenkogs
commented at 7:16 am on November 14, 2024:
ee46e21a5c913175415a192661b406b328209800
First I thought why don’t fanout=true here and handle it below. We can’t do it because txid/wtxid difference (although we could have a txid_or_wtxid variable above….). But then, you certainly want to at least set fanout=false here, otherwise it remains true and added to invs_to_send twice (it’s not critical but a bug still).
Not sure if this comment only applied before #30116 (review). But I think this is not currently the case.
Here we are only adding the parents. Leaving fanout=true triggers adding the actual transaction RelayTransaction was called with in the following if (fanout) block.
We could also add it here, but it would involve adding another boolean to skip the if (fanout) block, since we need to populate m_tx_inventory_to_send with data from invs_to_send which happens right after
naumenkogs
commented at 8:04 am on November 27, 2024:
The issue is re-defining wtxid (for parent_wtxid) while it's already defined as the transaction in question :)
That was the source of my confusion originally
in src/node/txreconciliation.cpp:236
in 1dce9aedc5 (outdated)
}

for (auto& [parent_count, peer_id] : parents_by_peer) {
    const auto state = std::get<TxReconciliationState>(m_states.find(peer_id)->second);
    for (const auto& wtxid : parents) {
        if (auto found = state.m_local_set.find(wtxid); found != state.m_local_set.end()) {
naumenkogs
commented at 6:23 am on November 26, 2024:
naumenkogs
commented at 7:25 am on November 27, 2024:
Multimap is ordered by key, and for same-key (same parent_count) since c++11 the order is consistent (documented here). Feel free to include this as a comment, although I’m not even sure we care about the same exact output of multiple SortPeersByFewestParents calls.
(also it depends on the consistent ordering of m_states iterator too, which could be documented along the same lines, here and elsewhere it is iterated over)
naumenkogs
commented at 7:58 am on November 27, 2024:
(this could also be done in GetFanoutTargets in 4b4d99fb2df2c60a2214487cec627bc560f50f53, although it's not that much of an improvement over there)
I’ve been playing a bit with this, plus adding a test for SortPeersByFewestParents, which was missing. The multimap does have a consistent ordering, but it does reverse order of insertion instead of order of insertion for ties. Not really like it matters to be honest, if that is something we care about we can always start from a shuffled copy of m_states.
nvm, the order was changing because this is iterating over m_states, which is an unordered_map.
I’ll take the suggestion, but I’m thinking about modifying it a bit so the order does not have to be so reliant on m_states, which is something you also pointed out in an older review IIRC. What about something along the line of:
naumenkogs
commented at 9:02 am on December 5, 2024:
I think an extra shuffle is a bit too much, because then for consistency you’d do it every time we access m_states? I think a comment in those places where order might matter is sufficient.
naumenkogs
commented at 6:37 am on November 26, 2024:
0092f70314b8d59b41ff4bd2bc15a71f954668d4
Consider fanout=false because the peer has too many parents (cut the tail from fanout_with_ancestors).
Now, we fail to add a transaction to the set (whatever reason, say set too full), and we’re now going to flood a child while reconciling the parents again?
Perhaps AddToSet result could be handled more carefully instead (or have a todo/comment, this is a corner case of a corner case of course).
naumenkogs
commented at 6:51 am on November 26, 2024:
This is slightly changed later in 5996d473434d364cff937ae89ffd95cd09c5d1a7. AddToSetResult::Failed still not handled though?
Yes, that’s right. In the current revision, if the set is full and the given transaction has some ancestors, the children will be flooded while the parents will be reconciled, which may cause orphans.
I think we could handle this at the same level that we handled fanout_with_ancestors:
Fanout to peers that have ancestors and have a full set
Fanout to peers that have the least amount of ancestors
However, this will still be an issue in some cases, given we only select a subset of peers to fanout to when these conditions hold (currently 2, but let’s say n). If we do not fanout to all peers that fall under these conditions, there will always be some that will receive parents and children using different approaches, which may cause orphans. However, I think it is undesirable to fanout to all matches in this situation.
This raises the question, is it worth the added complexity, or should we add a comment acknowledging that this can be the case, but that we expect it to happen really infrequently?
I added a comment for this instead of making a full set part of the sorting criteria. This should not happen under normal circumstances provided the set size is defined to account for way over the expected traffic between reconciliations. A peer hitting the limit is likely to be either broken or an attacker, and I don’t think we should be catering to them (nor making the logic more complex based on that)
consider defining this variable inside the loop rather than cleaning it after each iteration?
in src/node/txreconciliation.cpp:432
in 840a4e14f2 (outdated)
@@ -381,9 +445,9 @@ class TxReconciliationTracker::Impl
 }

 for (auto& [parent_count, peer_id] : parents_by_peer) {
-    const auto state = std::get<TxReconciliationState>(m_states.find(peer_id)->second);
+    auto state = std::get<TxReconciliationState>(m_states.find(peer_id)->second);
     for (const auto& wtxid : parents) {
-        if (auto found = state.m_local_set.find(wtxid); found != state.m_local_set.end()) {
+        if (state.ContainsTx(wtxid, /*include_delayed=*/true)) {
naumenkogs
commented at 7:05 am on November 26, 2024:
840a4e14f2d833717c87a54ec2707a92d5131ada
been thinking about the use of a delayed set here.
We apply the delay only on addition, but not on fanouting parents (or counting them), neither on removing stuff from the sets (say at receiving a transaction).
For me the choice is pretty arbitrary (and probably fine to save the code complexity for this rare corner case), but perhaps we would benefit from better documentation around it.
I think I’m not following here. We do apply the delayed set both on addition (AddToSet), deletion (RemoveFromSet), and when counting parents (this bit of code is part of the parent sorting).
naumenkogs
commented at 6:45 am on November 27, 2024:
I mean the delay specifically, not the set. If we talk about an abstract “reconciliation set” (which is a combination of two), you apply the delay for an addition, but not for the removal action.
I see your point. The reasoning for this is so we can use ContainsTx in testing to check that things have been added/removed to/from the proper collection. Throughout the rest of the codebase, we always call this with true, and we do an implicit check on deletion.
I’m happy to consider an alternative that covers the same functionality if you have one
in src/node/txreconciliation.cpp:416
in fc06786f32 (outdated)
- * due to caching. Thus, we can save us adding that extra function for now.
- */
-bool ShouldFanoutTo(const Wtxid& wtxid, NodeId peer_id,
-                    size_t inbounds_fanout_tx_relay, size_t outbounds_fanout_tx_relay)
-    EXCLUSIVE_LOCKS_REQUIRED(!m_txreconciliation_mutex)
+bool ShouldFanoutTo(NodeId peer_id, std::vector<NodeId> &fanout_targets) EXCLUSIVE_LOCKS_REQUIRED(!m_txreconciliation_mutex)
naumenkogs
commented at 7:10 am on November 26, 2024:
fc06786f32b2fc82c9571ebde67b4ff54f4ad825
this wrapper is not needed anymore. The Registered check already happens at the caller site. Just drop it and expose GetFanoutTargets?
in src/node/txreconciliation.cpp:417
in fc06786f32 (outdated)
-std::sort(best_peers.begin(), best_peers.end());
-
-auto it = best_peers.begin();
-for (size_t i = 0; i < targets_size && it != best_peers.end(); ++i, ++it) {
-    if (it->second == peer_id) return true;
+// Sort the peers based on their assigned random value, extract the node_ids and trim the collections to size
naumenkogs
commented at 10:44 am on November 26, 2024:
I think this is slightly harder to follow, but I like that the collections are only sorted up to some point, so I'll take it. Instead of using std::min, I've added two Assume when defining {outbounds, inbounds}_target_size to ensure that the targets are never bigger than the sets they pull from, since this should never be the case
I don’t think it makes a huge difference, but I’ll take it -> 4b4d99f
in src/net_processing.cpp:2166
in 0092f70314 (outdated)
if (!parents_refs.empty()) {
    for (const auto& tx : parents_refs) {
        parents.emplace_back(tx.get().GetTx().GetWitnessHash());
    }
    fanout_with_ancestors = m_txreconciliation->SortPeersByFewestParents(parents);
    fanout_with_ancestors.resize(2); // FIXME: Resize to 2 for now
naumenkogs
commented at 11:42 am on November 26, 2024:
Better check if the size is not 1 already? It’s currently safe, but could cause bugs in the future.
Covered in 9918f3b08768dece8d8daf73b8144ada60c4ae5a
in src/node/txreconciliation.cpp:400
in 4b4d99fb2d (outdated)
-for (size_t i = 0; i < targets_size && it != best_peers.end(); ++i, ++it) {
-    if (it->second == peer_id) return true;
-}
-return false;
+// Make sure we never select more targets than we can
+Assert(outbounds_target_size <= weighed_outbounds.size());
naumenkogs
commented at 7:48 am on November 27, 2024:
DrahtBot removed the label
CI failed
on Dec 5, 2024
in src/node/txreconciliation.cpp:422
in 4b9f83a8b7 (outdated)
// Sort the peers based on their assigned random value, extract the node_ids and trim the collections to size
std::vector<NodeId> fanout_targets(inbounds_target_size + outbounds_target_size);
auto collect_fanout_targets = [&](std::vector<std::pair<uint64_t, NodeId>> weighted_peers, const size_t target_size) {
    // Make sure we never select more targets than we can
    Assert(outbounds_target_size <= weighed_outbounds.size());
marcofleon
commented at 5:21 pm on December 10, 2024:
I think having a preregistered peer and one inbound peer in m_states causes outbound_target_size to be 1 while weighed_outbounds.size() is 0 because assign_key is never called.
The equivalent unit test (which fails when I run it) would be something like
I’ve cached m_outbounds_count to prevent this from happening.
This made me realize that fanout_targets was incorrectly initialized, instead of reserving space for inbounds_target_size + outbounds_target_size, it was being initialized with inbounds_target_size + outbounds_target_size pre-existing elements initialized to 0, meaning that the returned vector contained more elements than expected
As far as I can see, at SENDTXRCNCL reception we do not check for duplicate SENDTXRCNCL message (we check if we have txreconciliation enabled, then that it’s happening before verack reception and then that transaction-relay is enabled on the link) and we can call RegisterPeer multiple times.
Calling RegisterPeer with the same peer_id multiple times will result in ReconciliationRegisterResult::ALREADY_REGISTERED.
The docs of both PreRegisterPeer and RegisterPeer specify that they should be called only once.
in src/node/txreconciliation.cpp:153
in 131c28bfd9 (outdated)
LOCK(m_txreconciliation_mutex);
auto peer_state = GetRegisteredPeerState(peer_id);
if (!peer_state) return false;

// Transactions which don't make it to the set due to the limit are announced via fan-out.
if (peer_state->m_local_set.size() >= MAX_RECONSET_SIZE) return false;
Logging that a given peer's local reconciliation set max size has been reached can be useful info for future debugging.
diff --git a/src/node/txreconciliation.cpp b/src/node/txreconciliation.cpp
index d4d6bdf1ea..b99bb41681 100644
--- a/src/node/txreconciliation.cpp
+++ b/src/node/txreconciliation.cpp
@@ -150,7 +150,11 @@ public:
     if (!peer_state) return false;

     // Transactions which don't make it to the set due to the limit are announced via fan-out.
-    if (peer_state->m_local_set.size() >= MAX_RECONSET_SIZE) return false;
+    if (peer_state->m_local_set.size() >= MAX_RECONSET_SIZE) {
+        LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, "Reconciliation set maximum size reached for peer=%d.",
+                      peer_id);
+        return false;
+    }

     // The caller currently keeps track of the per-peer transaction announcements, so it
     // should not attempt to add same tx to the set twice. However, if that happens, we will
I think the variable naming could be enhanced, e.g. m_collision, to dissociate it from mempool conflicts. Here, it's a collision between SipHash outputs, which can arise from two distinct transactions that do not even spend the same set of UTXOs.
Addressed in a461a1204109762ea38b21d9b6c445e9b9ba570f
ariard
commented at 1:05 am on December 22, 2024:
none
Reviewed up to commit 478137aa (“p2p: Add PeerManager method to count…”).
Just minor comments so far and I still have to answer back comments on #28765.
in src/node/txreconciliation.h:159
in c86f4f3359 (outdated)
 * can be found in their reconciliation sets (from fewest to most). If two values tie, they will be
 * sorted in order of addition, which depends on the internal order of [m_states].
 * This is meant to be called with a collection of ancestor ids for a given transaction in order to
 * decide whether the transaction should be fanned out with its ancestors or not.
 */
std::vector<NodeId> SortPeersByFewestParents(std::vector<Wtxid> parents);
I think the rationale for introducing SortPeersByFewestParents could be more fleshed out, i.e iiuc explaining that eligible peers with the lowest number of parents for a target transaction in their reconciliation sets are favored for fan-out announcements, as this can be a hint of better connectivity in the overall transaction-relay graph and we wish to propagate the target transaction faster (..?).
I think the docs give enough rationale as to what this is intended for without being excessively narrow as to what the use cases could be.
in src/net_processing.cpp:2245
in feb8c98db1 (outdated)
// This should not happen under normal conditions, given the set size should be well over the number of transactions received
// between reconciling intervals. A peer hitting the limit is likely to be either a broken implementation or an attacker.
invs_to_send.push_back(peer->m_wtxid_relay ? wtxid : txid);
}

for (const auto& hash : invs_to_send) {
If the peer has m_wtxid_relay=false, should m_tx_inventory_known_filter be checked by txid? Note, given the parent identifiers are retrieved from GetMemPoolParents(), we can get the identifiers easily. This could avoid some duplicated INV flooding.
This is already the case. The transaction that is passed to RelayTransaction is added to invs_to_send using wtxid/txid based on m_wtxid_relay (that is part of the code you quoted already). For other transactions that may be in invs_to_send (e.g. ancestors and/or descendants), this only applies to Erlay, and Erlay enforces WTXID_RELAY
ariard
commented at 10:58 pm on January 1, 2025:
none
Reviewed up to commit feb8c98db (“p2p: Add transactions to reconciliation sets…”).
sr-gi force-pushed
on Jan 2, 2025
sr-gi force-pushed
on Jan 2, 2025
in src/node/txreconciliation.cpp:223
in bbdc84ca0f (outdated)
// Make sure there is no short id collision between the wtxid we are trying to add
// and any existing one in the reconciliation set
Wtxid collision;
uint32_t short_id;
if (HasCollision(peer_state, wtxid, collision, short_id)) {
    return AddToSetResult::Collision(collision);
nit: The Collision here could be LogPrintLevel(BCLog::TXRECONCILIATION, BCLog::Level::Debug, “…”).
ariard
commented at 10:56 pm on January 2, 2025:
none
Reviewed up to 4b9f83a (“p2p: Makes transactions available…”), all the code changes.
I’ll check the test coverage.
sr-gi force-pushed
on Mar 25, 2025
sr-gi force-pushed
on Mar 25, 2025
sr-gi
commented at 8:36 pm on March 25, 2025:
member
Updated the approach based on the results from the simulations @naumenkogs and I have been running.
sr-gi force-pushed
on Apr 1, 2025
DrahtBot added the label
CI failed
on Apr 1, 2025
sr-gi force-pushed
on Apr 2, 2025
sr-gi force-pushed
on Apr 2, 2025
DrahtBot removed the label
CI failed
on Apr 2, 2025
sr-gi force-pushed
on Apr 3, 2025
DrahtBot added the label
CI failed
on Apr 3, 2025
sr-gi force-pushed
on Apr 8, 2025
DrahtBot removed the label
CI failed
on Apr 8, 2025
sr-gi force-pushed
on Apr 15, 2025
sr-gi force-pushed
on Jun 18, 2025
DrahtBot added the label
Needs rebase
on Jul 11, 2025
sr-gi force-pushed
on Jul 16, 2025
DrahtBot removed the label
Needs rebase
on Jul 16, 2025
sr-gi force-pushed
on Jul 23, 2025
sr-gi force-pushed
on Jul 23, 2025
DrahtBot added the label
CI failed
on Jul 23, 2025
DrahtBot
commented at 9:09 pm on July 23, 2025:
contributor
🚧 At least one of the CI tasks failed.
Task CentOS, depends, gui: https://github.com/bitcoin/bitcoin/runs/46596699781
LLM reason (✨ experimental): The CI failure is caused by an assertion failure in txreconciliation.cpp during the “AddToSetTest” case, leading to test subprocess abortion.
sr-gi force-pushed
on Jul 24, 2025
sr-gi force-pushed
on Jul 24, 2025
sr-gi force-pushed
on Jul 24, 2025
DrahtBot removed the label
CI failed
on Jul 25, 2025
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
DrahtBot added the label
CI failed
on Jul 28, 2025
DrahtBot
commented at 2:58 pm on July 28, 2025:
contributor
🚧 At least one of the CI tasks failed.
Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/46868137690
LLM reason (✨ experimental): The CI failure is caused by unresolved version control conflict markers remaining in the source code file.
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
sr-gi force-pushed
on Jul 28, 2025
DrahtBot removed the label
CI failed
on Jul 29, 2025
DrahtBot added the label
Needs rebase
on Jul 30, 2025
refactor: redesigns txreconciliation file split and namespace
Splits the txreconciliation logic into three files instead of two, allowing
TxReconciliationState to be properly tested instead of being internal to
txreconciliation.cpp.
Also places everything in the node namespace instead of an anonymous one.
63cd3b1dad
refactor: remove legacy comments
These comments became irrelevant in one of the previous code changes.
They simply don't make sense anymore.
326e3db51a
refactor: use LogDebug instead of LogPrintLevel in txreconciliation
1a972f3712
p2p: Functions to add/remove wtxids to tx reconciliation sets
They will be used later on.
9b86119e33
p2p: Make short id collision detectable when adding wtxids to tx reconciliation sets
1316feae61
p2p: Add PeerManager method to count the number of inbound/outbound fanout peers
35163fb1a0
p2p: Cache inbound reconciling peers count
It helps avoid recomputing the count every time we consider
a transaction for fanout/reconciliation.
Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
aec58de788
p2p: Add method to decide whether to fanout or reconcile a transaction
Fanout or reconciliation is decided on a per-transaction basis, based on the following criteria:
If the peer is inbound, we fan out to a pre-defined subset of peers (which is rotated periodically).
If the peer is outbound, we reconcile the transaction if we received it via reconciliation, or
defer the decision to relay time otherwise. At relay time, we fan out to outbounds until a threshold is met
(selecting peers in the order their timers go off) and reconcile with the rest.
With this approach we try to fan out when we estimate we are early in the propagation of the transaction,
and reconcile otherwise. Notice these heuristics don't apply to inbound peers, since they would be easily
exploitable. For inbounds we just aim for a target subset picked at random.
b364a0b9e4
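As a rough illustration of the outbound heuristic described in the commit message above (not the PR's actual code; the function name and signature are hypothetical, and only OUTBOUND_FANOUT_THRESHOLD{4} comes from the PR description):

```cpp
#include <cstdint>

// Threshold from the PR description: stop fanning out to outbound peers
// once this many peers are already aware of the transaction.
constexpr uint32_t OUTBOUND_FANOUT_THRESHOLD{4};

// Hypothetical per-transaction decision for an outbound peer whose Poisson
// timer just went off: fan out only if the transaction arrived via fanout
// (or we originated it) and it is still early in its propagation.
bool ShouldFanoutToOutbound(bool received_via_recon, uint32_t peers_aware)
{
    if (received_via_recon) return false; // reconcile with all outbounds
    return peers_aware < OUTBOUND_FANOUT_THRESHOLD;
}
```

Inbound peers are handled separately: a randomly rotated subset is always fanned out to, so this estimate is never exposed to inbound connections.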
p2p: Add transactions to reconciliation sets
Transactions eligible for reconciliation are added to the reconciliation sets. For the remaining txs, low-fanout is used.
Co-authored-by: Gleb Naumenko <naumenko.gs@gmail.com>
c4be2b5717
p2p: Add helper to compute reconciliation tx short ids and a cache of short ids to wtxids
05e4afd1f7
p2p: Deal with short id collisions for reconciliation sets
If a transaction to be added to a peer's recon set has a short id collision (a previously
added wtxid maps to the same short id), both transactions should be fanned out, given
our peer may have added the opposite transaction to our recon set, and these two
transactions won't be reconciled.
3eab0e84b6
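A minimal sketch of the collision rule from this commit message, under assumed names (ReconSet, Add) and with a stand-in integer wtxid type; the real set keys 32-bit short ids derived per BIP 330:

```cpp
#include <cstdint>
#include <map>
#include <optional>

using Wtxid = uint64_t; // stand-in for the real 256-bit wtxid type

struct ReconSet {
    std::map<uint32_t, Wtxid> by_short_id;

    // Try to add a (short id, wtxid) pair. On a collision, return the
    // previously stored wtxid and drop it from the set, so the caller can
    // fan out both transactions instead of reconciling them.
    std::optional<Wtxid> Add(uint32_t short_id, Wtxid wtxid)
    {
        auto [it, inserted] = by_short_id.try_emplace(short_id, wtxid);
        if (inserted) return std::nullopt;
        Wtxid collided{it->second};
        by_short_id.erase(it); // neither tx stays in the recon set
        return collided;
    }
};
```

Evicting both entries is the conservative choice: the peer may hold the other colliding transaction in its own set, and a sketch match on the short id could otherwise hide a real difference.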
sr-gi force-pushed
on Aug 1, 2025
sr-gi
commented at 9:40 pm on August 1, 2025:
member
I’ve rebased to account for the recent GenTxid changes by @marcofleon.
I’ve also decided to clean up the design while I was at it, to make things easier to follow and test. As part of this, the file structure has been changed from two files (txreconciliation.h and txreconciliation.cpp) to three (txreconciliation.h, txreconciliation_impl.h and txreconciliation_impl.cpp), following a similar approach to txdownloadman.
The code is split as follows:
txreconciliation.h contains the general reconciliation logic and everything that is not implementation-dependent (coming from BIP 330). This also includes TxReconciliationState, which was previously defined in txreconciliation.cpp and was therefore neither exposed nor tested.
txreconciliation_impl.h defines the reconciliation tracker using PIMPL.
txreconciliation_impl.cpp has the actual implementation of the tracker.
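Condensed into one translation unit, the PIMPL split described above could look roughly like this (class members are illustrative placeholders, and the tracker name follows Bitcoin Core's TxReconciliationTracker):

```cpp
#include <memory>

// txreconciliation.h: protocol-level state, now exposed and testable.
struct TxReconciliationState {
    bool we_initiate{false}; // placeholder member for illustration
};

// txreconciliation_impl.h: tracker exposing only a forward-declared impl.
class TxReconciliationTracker {
    class Impl;                  // defined out-of-line (the _impl.cpp in the PR)
    std::unique_ptr<Impl> m_impl;
public:
    TxReconciliationTracker();
    ~TxReconciliationTracker(); // must be out-of-line: Impl is incomplete here
};

// txreconciliation_impl.cpp: the actual implementation.
class TxReconciliationTracker::Impl {
    // ... reconciliation sets, salts, short id cache ...
};
TxReconciliationTracker::TxReconciliationTracker() : m_impl{std::make_unique<Impl>()} {}
TxReconciliationTracker::~TxReconciliationTracker() = default;
```

The payoff is the one the comment describes: callers include only the headers, the heavy implementation lives behind the pointer, and TxReconciliationState is visible to unit tests.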
DrahtBot removed the label
Needs rebase
on Aug 1, 2025
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-08-08 18:13 UTC
This site is hosted by @0xB10C. More mirrored repositories can be found on mirror.b10c.me.