Erlay: bandwidth-efficient transaction relay protocol

naumenkogs commented at 8:58 pm on March 23, 2021: member

Erlay Project Tracking: #28646

This is an implementation of Erlay , using primitives in the BIP-330 (see the updated spec here ). Please refer to these two to understand the design. My talk is here.

Abstract

Erlay uses both flooding (announcing using INV messages to all peers) and reconciliation to announce transactions. Flooding is expensive, so Erlay seeks to use it sparingly and in strategic locations - only well-connected publicly reachable nodes flood transactions to other publicly reachable nodes via outbound connections. Since every unreachable node is directly connected to several reachable nodes, this policy ensures that a transaction is quickly propagated to be within one hop from most of the nodes in the network.

All transactions not propagated through flooding are propagated through efficient set reconciliation. To do this, every node keeps a reconciliation set for each peer, in which transactions are placed which would have been announced using INV messages absent this protocol. Every 2 seconds every node chooses a peer from its outbound connections in a predetermined order to reconcile with, resulting in both sides learning the transactions known to the other side. After every reconciliation round, the corresponding reconciliation set is cleared.

I think both paper and the BIP motives the changes, but I’ll mention them briefly once again here:

save 40% of the bandwidth consumed by a node
increase network connectivity for almost no bandwidth or latency cost
improves privacy as a side-effect

How to review

I suggest doing make clean && autogen.sh && configure before you try building it locally.

naumenkogs marked this as a draft on Mar 23, 2021

DrahtBot added the label Build system on Mar 23, 2021

DrahtBot added the label P2P on Mar 23, 2021

DrahtBot added the label Utils/log/libs on Mar 23, 2021

in src/txreconciliation.h:12 in 8110da8b1b outdated

 6+#define BITCOIN_TXRECONCILIATION_H
 7+
 8+#include <net.h>
 9+#include <sync.h>
10+
11+#include <tuple>

promag commented at 9:34 pm on March 23, 2021:

8110da8b1b8bc83fa7fd2b76f9cc4688a7a02623

nit #include <memory> for std::unique_ptr.

in src/txreconciliation.h:12 in 8110da8b1b outdated

 7+
 8+#include <net.h>
 9+#include <sync.h>
10+
11+#include <tuple>
12+#include <unordered_map>

promag commented at 9:39 pm on March 23, 2021:

8110da8b1b8bc83fa7fd2b76f9cc4688a7a02623

Move to txreconciliation.cpp

in src/txreconciliation.cpp:45 in 8110da8b1b outdated

40+            we_initiate_recon = true;
41+            we_respond_recon = false;
42+        }
43+
44+        uint64_t m_local_recon_salt(GetRand(UINT64_MAX));
45+        WITH_LOCK(m_mutex, m_local_salts.emplace(peer_id, m_local_recon_salt));

promag commented at 9:47 pm on March 23, 2021:

8110da8b1b8bc83fa7fd2b76f9cc4688a7a02623

Looks like insertion should always happen? If so you could assert .second of emplace() result. If not, then GetRand() could be avoided in case it exists, and below it should return the existing salt.

promag commented at 9:50 pm on March 23, 2021: contributor

Just noticed it’s marked as draft, I’ll resume once you mark it ready for review.

naumenkogs force-pushed on Mar 23, 2021

naumenkogs force-pushed on Mar 24, 2021

DrahtBot commented at 9:59 am on March 24, 2021: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept NACK	ghost, rebroad
Concept ACK	glozow
Stale ACK	promag

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#28912 (refactor: VectorWriter and SpanReader without nVersion by maflcko)
#28892 (refactor: P2P transport without serialize version and type by maflcko)
#28765 (p2p: Fill reconciliation sets (Erlay) by naumenkogs)
#28463 (p2p: Increase inbound capacity for block-relay only connections by mzumsande)
#28451 (refactor: Remove unused SER_DISK, SER_NETWORK, SER_GETHASH by maflcko)
#28429 (Do not log p2p bip61 reject messages, improve log, add tests by jonatack)
#27826 (validation: log which peer sent us a header by Sjors)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

naumenkogs force-pushed on Mar 24, 2021

naumenkogs force-pushed on Mar 25, 2021

naumenkogs force-pushed on Mar 26, 2021

naumenkogs force-pushed on Mar 27, 2021

naumenkogs closed this on Mar 28, 2021

naumenkogs reopened this on Mar 28, 2021

naumenkogs marked this as ready for review on Mar 28, 2021

naumenkogs force-pushed on Mar 28, 2021

in src/txreconciliation.cpp:60 in 3e4022a4a8 outdated

51@@ -52,6 +52,14 @@ class TxReconciliationTracker::Impl {
52 
53         return std::make_tuple(we_initiate_recon, we_respond_recon, RECON_VERSION, m_local_recon_salt);
54     }
55+
56+    void RemovePeer(NodeId peer_id)
57+    {
58+        LOCK(m_mutex);
59+        m_local_salts.erase(peer_id);
60+        LogPrint(BCLog::NET, "Stop tracking reconciliation state for peer=%d.\n", peer_id);

promag commented at 8:35 pm on March 28, 2021:

3e4022a4a87c91b4b671ab4b71ccbd6d7136bade

Could make this log conditional to whether peer_id was erased from m_local_salts.

naumenkogs force-pushed on Mar 28, 2021

in src/txreconciliation.cpp:126 in 8cdaf385ae outdated

121+        if (recon_state != m_states.end()) return;
122+
123+        recon_version = std::min(recon_version, RECON_VERSION);
124+        if (recon_version < 1) return;
125+
126+        // Must match SuggestReconciliation logic.

promag commented at 8:50 pm on March 28, 2021:

8cdaf385aee34a9a10406eef73d6dcae74b5a2c4

Should be “SuggestReconciling”?

in src/txreconciliation.cpp:172 in 49108e3894 outdated

167+        LOCK(m_mutex);
168+        auto recon_state = m_states.find(peer_id);
169+        if (recon_state == m_states.end()) {
170+            return std::nullopt;
171+        }
172+        return (*recon_state).second.m_flood_to;

promag commented at 10:26 pm on March 28, 2021:

49108e38941f7136fafdbfc64a989aba0f1cc99f

nit, ->second.

in src/net_processing.cpp:2665 in 99906e0c33 outdated

2546@@ -2539,6 +2547,28 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
2547                       pfrom.nVersion.load(), peer->m_starting_height,
2548                       pfrom.GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom.addr.ToString()) : ""),
2549                       pfrom.ConnectionTypeAsString());
2550+
2551+            // Keep track of the number of outbound flooding peers.
2552+            bool we_may_flood_to = false;
2553+            if (pfrom.MightSupportTransactionRelay()) {
2554+                if (m_reconciliation.IsPeerRegistered(pfrom.GetId())) {
2555+                    if (*m_reconciliation.IsPeerChosenForFlooding(pfrom.GetId())) {

promag commented at 10:39 pm on March 28, 2021:

99906e0c3399225ee361ce85291074971d2025e1

IsPeerRegistered is not needed since IsPeerChosenForFlooding returns nullopt. Maybe:

0if (pfrom.MightSupportTransactionRelay()) {
1    we_may_flood_to = m_reconciliation.IsPeerChosenForFlooding(pfrom.GetId()).value_or(true);
2}

naumenkogs commented at 9:27 am on March 29, 2021:

I agree this is way cleaner, but I think the current code is easier to understand, so I’m leaning towards my approach.

promag commented at 9:29 am on March 29, 2021:

The problem is that you can hit undefined behavior with the current code - assuming peer can be disconnected between IsPeerRegistered and IsPeerChosenForFlooding.

Edit: Looks like the above is not possible.

promag commented at 10:43 pm on March 28, 2021: contributor

Code review ACK up to 99906e0c3399225ee361ce85291074971d2025e1.

naumenkogs force-pushed on Mar 29, 2021

in src/net_processing.cpp:4557 in 3692db74f8 outdated

4553@@ -4552,15 +4554,19 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
4554             }
4555             peer->m_blocks_for_inv_relay.clear();
4556 
4557+            const bool supports_recon = m_reconciliation.IsPeerRegistered(pto->GetId());

promag commented at 11:01 am on March 29, 2021:

3692db74f86186789c9dad4005f7b768721d5afe

nit, could move to inner scope.

in src/txreconciliation.cpp:656 in 83a08c5ea0 outdated

177@@ -154,6 +178,19 @@ class TxReconciliationTracker::Impl {
178             full_salt.GetUint64(1), we_initiate, flood_to));
179     }
180 
181+    void AddToReconSet(NodeId peer_id, const std::vector<uint256>& txs_to_reconcile)
182+    {
183+        LOCK(m_mutex);

promag commented at 11:04 am on March 29, 2021:

83a08c5ea07f7f7e3e67501ac8666521427ae9d3

nit, assert(txs_to_reconcile.size() > 0)

in src/txreconciliation.cpp:190 in 83a08c5ea0 outdated

185+        assert(recon_state != m_states.end());
186+        for (auto& wtxid: txs_to_reconcile) {
187+            recon_state->second.m_local_set.m_wtxids.insert(wtxid);
188+        }
189+
190+        LogPrint(BCLog::NET, "Added %i transactions to the reconciliation set for peer=%d.\n",

promag commented at 11:06 am on March 29, 2021:

83a08c5ea07f7f7e3e67501ac8666521427ae9d3

Could m_local_set already have some of txs_to_reconcile? If so maybe we should count successful .insert() and log that?

in src/net_processing.cpp:4632 in 3692db74f8 outdated

4628@@ -4622,6 +4629,7 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
4629                         // Fetch the top element from the heap
4630                         std::pop_heap(vInvTx.begin(), vInvTx.end(), compareInvMempoolOrder);
4631                         uint256 hash = vInvTx.back();
4632+                        bool flood_tx = pto->m_tx_relay->m_txs_to_announce.at(hash);

promag commented at 11:12 am on March 29, 2021:

3692db74f86186789c9dad4005f7b768721d5afe

Move to inner scope, make const.

naumenkogs commented at 7:26 pm on April 2, 2021:

Making const. Can’t move because it’s erased in two lines after, so it won’t be accessible in the inner scope. And I can’t call .second on .erase, because it returns a number of erased elements.

One alternative could be to use txid_to_ann_protocol in vInvTx (instead of uint256). But then I won’t be able to reuse CompareInvMempoolOrder in AnnounceTxs.

in src/txreconciliation.cpp:126 in 05ac9dc21e outdated

119@@ -120,6 +120,13 @@ class TxReconciliationTracker::Impl {
120      */
121     std::unordered_map<NodeId, ReconciliationState> m_states GUARDED_BY(m_mutex);
122 
123+    /**
124+     * Maintains a queue of reconciliations we should initiate. To achieve higher bandwidth
125+     * conservation and avoid overflows, we should reconcile in the same order, because then it’s
126+     * easier to estimate set differene size.

promag commented at 11:13 am on March 29, 2021:

05ac9dc21e5697a1e0261909b64de4f7d6e21eb1

typo “differene”

in src/txreconciliation.cpp:142 in e854bcf539 outdated

134@@ -127,6 +135,16 @@ class TxReconciliationTracker::Impl {
135      */
136     std::deque<NodeId> m_queue GUARDED_BY(m_mutex);
137 
138+    /**
139+     * Reconciliations are requested periodically:
140+     * every RECON_REQUEST_INTERVAL seconds we pick a peer from the queue.
141+     */
142+    std::chrono::microseconds m_next_recon_request{0};

promag commented at 11:17 am on March 29, 2021:

e854bcf5391d02a28690f7eedad8382cbb05f85b

Add TSAN annotation?

in src/txreconciliation.cpp:443 in e854bcf539 outdated

140+     * every RECON_REQUEST_INTERVAL seconds we pick a peer from the queue.
141+     */
142+    std::chrono::microseconds m_next_recon_request{0};
143+    void UpdateNextReconRequest(std::chrono::microseconds now) EXCLUSIVE_LOCKS_REQUIRED(m_mutex)
144+    {
145+        m_next_recon_request = now + RECON_REQUEST_INTERVAL / m_queue.size();

promag commented at 11:19 am on March 29, 2021:

e854bcf5391d02a28690f7eedad8382cbb05f85b

nit assert(m_queue.size() > 0)

in src/txreconciliation.cpp:40 in 476ced6df2 outdated

33@@ -27,6 +34,14 @@ constexpr uint32_t MAX_OUTBOUND_FLOOD_TO = 8;
34  */
35 constexpr std::chrono::microseconds RECON_REQUEST_INTERVAL{16s};
36 
37+/**
38+ * Represents phase of the current reconciliation round with a peer.
39+ */
40+enum ReconciliationPhase {

promag commented at 11:25 am on March 29, 2021:

476ced6df27c13fd8d1dae46d09b91549977cac1

nit, enum class and use as ReconciliationPhase::NONE?

in src/net_processing.h:56 in 272b3b2db9 outdated

52+    /**
53+     * Relay transaction to every node.
54+     * The announcement protocol (flooding or reconciliation) will be chosen based on from which
55+     * peer the transaction came to us. For transactions originated locally, pass from=nullopt.
56+     */
57+    virtual void RelayTransaction(const uint256& txid, const uint256& wtxid, std::optional<NodeId> from=std::nullopt)

jnewbery commented at 11:34 am on March 29, 2021:

There’s no need to update this public interface method. The only users of this method outside net_processing will always call RelayTransaction() without a from parameter.

(it’s also unnecessary to wrap an optional parameter in a std::optional since the fact that it’s an optional parameter already gives you a way to indicate no value)

naumenkogs commented at 7:39 pm on April 2, 2021:

Sorry I don’t understand this comment, despite your elaborate explanation. Probably my lack of C++ expertise.

If I remove just this change in the virtual void line, I get a compilation error.

jnewbery commented at 9:50 am on April 15, 2021:

Yes, you’ll need to change the function signature in PeerManagerImpl() as well. An override function must have the same signature as the function it’s overriding.

in src/net_processing.cpp:376 in 272b3b2db9 outdated

371@@ -365,6 +372,9 @@ class PeerManagerImpl final : public PeerManager
372     /** Number of peers with wtxid relay. */
373     int m_wtxid_relay_peers GUARDED_BY(cs_main) = 0;
374 
375+    /** Number of outbound peers we may flood transactions to. */
376+    int m_out_flood_to GUARDED_BY(cs_main) = 0;

jnewbery commented at 11:38 am on March 29, 2021:

No need to protect this with cs_main. You can just make it atomic.

in src/net_processing.cpp:611 in 272b3b2db9 outdated

606@@ -597,6 +607,9 @@ struct CNodeState {
607     //! Whether this peer relays txs via wtxid
608     bool m_wtxid_relay{false};
609 
610+    //! Whether this peer may be used by us for flooding txs.
611+    bool m_we_may_flood_to{false};

jnewbery commented at 11:38 am on March 29, 2021:

Again, no need for this to be protected by main. I think it can easily live in Peer.

promag commented at 11:38 am on March 29, 2021: contributor

Code review up to f842ec7839e36adf24782228934f4fff223279a5.

in src/net_processing.cpp:2068 in 272b3b2db9 outdated

2062@@ -2017,8 +2063,9 @@ void PeerManagerImpl::ProcessHeadersMessage(CNode& pfrom, const Peer& peer,
2063  *                                  orphan will be reconsidered on each call of this function. This set
2064  *                                  may be added to if accepting an orphan causes its children to be
2065  *                                  reconsidered.
2066+ * @param           from            The peer for which we are processing the orphan set.
2067  */
2068-void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set)
2069+void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set, NodeId from)

jnewbery commented at 11:41 am on March 29, 2021:

New from parameter is unused.

naumenkogs force-pushed on Mar 29, 2021

DrahtBot added the label Needs rebase on Apr 1, 2021

naumenkogs force-pushed on Apr 2, 2021

DrahtBot removed the label Needs rebase on Apr 2, 2021

in src/txreconciliation.h:37 in baa461151d outdated

26+ *    required because the initiator knows only short IDs of those transactions.
27+ * 5. Sometimes reconciliation fails if the difference is larger than the parties estimated,
28+ *    then there is one sketch extension round, in which the initiator requests for extra data.
29+ * 6. If extension succeeds, go to step 4.
30+ * 7. If extension fails, the initiator notifies the peer
31+ */

ariard commented at 9:21 pm on April 14, 2021:

Thanks for the clear write-up, may I suggest to insert a stable link to the paper ?

in src/txreconciliation.h:38 in baa461151d outdated

27+ * 5. Sometimes reconciliation fails if the difference is larger than the parties estimated,
28+ *    then there is one sketch extension round, in which the initiator requests for extra data.
29+ * 6. If extension succeeds, go to step 4.
30+ * 7. If extension fails, the initiator notifies the peer
31+ */
32+class TxReconciliationTracker {

ariard commented at 9:28 pm on April 14, 2021:

I think one obvious extension of the test coverage would be to join basic fuzz coverage of TxReconciliationTracker interface to assert its robustness. Have a look on src/test/fuzz/addrman.cpp or src/test/fuzz/banman.cpp.

Fuzz coverage might test further sanity of TxReconciliationTracker after each call or in function of m_we_initiate or m_flood_to. Still thinking about concrete suggestions there.

ariard commented at 9:32 pm on April 14, 2021: none

I think PR description might incorporate a test plan describing in details functional/unit/fuzz coverage/manual testing coverage. Working on a suggestion.

At least for manual testing, we should start to run nodes on a signet with this branch. But should add a preferential peering to find erlay peers on the default signet network or should we encourage signet node operators to upgrade their peers with this branch ? Or even boostrap a new signet ?

in src/net_processing.cpp:139 in baa461151d outdated

130+static constexpr auto INBOUND_INVENTORY_BROADCAST_INTERVAL = 2s;
131 /** Average delay between trickled inventory transmissions for outbound peers.
132  *  Use a smaller delay as there is less privacy concern for them.
133  *  Blocks and peers with noban permission bypass this. */
134-static constexpr auto OUTBOUND_INVENTORY_BROADCAST_INTERVAL = 2s;
135+static constexpr auto OUTBOUND_INVENTORY_BROADCAST_INTERVAL = 1s;

ariard commented at 3:21 pm on April 15, 2021:

Surprised that modifying those 2 values doesn’t lead to functional test changes. Apparently, they’re not covered in p2p_tx_download.py. I can extend coverage there in another PR.

in src/net_processing.cpp:1596 in baa461151d outdated

1550+        // inbounds* (see above), so non-reachable nodes would never flood any transactions.
1551+        // Now, if they flood their local transactions, it will be obvious for their immediate peers
1552+        // that those transactions belong to those non-reachable nodes.
1553+        //
1554+        // On the other hand, if non-reachable nodes reconcile those transactions, they would look
1555+        // like any other transactions of non-reachable nodes.

ariard commented at 4:18 pm on April 15, 2021:

Do we have privacy leaks for the following cases not mentioned in this comment ?

A non-reachable node reconciliate with reconciliation-supporting flooding outbound peers ? As it would be a deviation from the expected announcement policy.

A reachable node reconciliate with reconciliation-supporting flooding inbound/outbound peers ? Likewise, it would be a deviation from the expected announcement policy on those links.

I think those cases are handled well at tx-announcement in SendMessages. Though I believe this comment would be better if it said that reconciliation is downgraded to flooding if this the announcement previously selected for the to-be-announced-to peer.

naumenkogs commented at 10:52 am on May 3, 2021:

Let’s continue this same discussion in one place: #21515 (review)

in src/net_processing.cpp:3312 in baa461151d outdated

2388+    std::vector<CInv> remote_missing_invs;
2389+    remote_missing_invs.reserve(std::min<size_t>(remote_missing_wtxids.size(), MAX_INV_SZ));
2390+
2391+    while (!remote_missing_wtxids.empty()) {
2392+        // No need to add transactions to peer's filter or do checks
2393+        // because it was already done when adding to the reconciliation set.

ariard commented at 4:37 pm on April 15, 2021:

In theory, peer’s feefilter might have increased between transaction selection for announcement at INV sending and receiving a SKETCH or RECONCILDIFF messages. Though AVG_FEEFILTER_BROADCAST_INTERVAL is 10min so it should be really okay.

Is this gap could be exploited by an off-path attacker to interfere with tx-relay ? I don’t think so, or at least without costly manipulation of victim’s mempools.

naumenkogs commented at 9:45 am on April 21, 2021:

Accidentally some nodes might ignore fee filter, but we don’t punish, so it’s safe in that sense. Some useless transactions could be sent, but I think it’s too hard to exploit in a real way.

ariard commented at 3:39 pm on April 21, 2021:

Yes I recognize that’s hard to exploit but even further any mempool manipulation comes up with a priced cost for the attacker as you need utxo value to fan out in victim(s) mempools.

in src/net_processing.cpp:2406 in baa461151d outdated

2401+            remote_missing_invs.clear();
2402+        }
2403+    }
2404+
2405+    if (remote_missing_invs.size() != 0) {
2406+        m_connman.PushMessage(&pto, msgMaker.Make(NetMsgType::INV, remote_missing_invs));

ariard commented at 4:44 pm on April 15, 2021:

This is logically equivalent ?

 0@@ -2386,15 +2390,11 @@ void PeerManagerImpl::AnnounceTxs(std::vector<uint256> remote_missing_wtxids, CN
 1         remote_missing_wtxids.pop_back();
 2 
 3         remote_missing_invs.push_back(CInv(MSG_WTX, wtxid));
 4-        if (remote_missing_invs.size() == MAX_INV_SZ) {
 5+        if (remote_missing_invs.size() == MAX_INV_SZ || remote_missing_wtxids.empty()) {
 6             m_connman.PushMessage(&pto, msgMaker.Make(NetMsgType::INV, remote_missing_invs));
 7             remote_missing_invs.clear();
 8         }
 9     }
10-
11-    if (remote_missing_invs.size() != 0) {
12-        m_connman.PushMessage(&pto, msgMaker.Make(NetMsgType::INV, remote_missing_invs));
13-    }
14 }

naumenkogs force-pushed on Apr 15, 2021

in test/functional/p2p_erlay.py:38 in a9270714e5 outdated

33+MAX_OUTBOUND_FLOOD_TO = 8
34+
35+BYTES_PER_SKETCH_CAPACITY = FIELD_BITS / 8
36+
37+#################################
38+######## General helpers ########

adamjonas commented at 3:37 pm on April 16, 2021:

The linter is not happy with this comment style.

brunoerg commented at 0:52 am on April 18, 2021:

Not only with that. I ran pycodestyle and got many warnings about it.

naumenkogs commented at 4:11 pm on April 21, 2021:

@brunoerg I think following best practices beyond what our linter does is not necessary, but still I tried to resolve most of those complaints from pycodestyle where it makes sense.

in src/net_processing.cpp:2797 in a9270714e5 outdated

2792+        uint32_t recon_version;
2793+        uint64_t remote_salt;
2794+        vRecv >> they_initiator >> they_responder >> recon_version >> remote_salt;
2795+
2796+        // Since this is called before VERACK, m_out_flooding_peers doesn't include this peer
2797+        // being considered for reconciliation support, so no need for substraction.

adamjonas commented at 3:45 pm on April 16, 2021:

s/substraction/subtraction/

in src/net_processing.cpp:4003 in a9270714e5 outdated

3998+                // we want to request extension).
3999+                m_connman.PushMessage(&pfrom, msgMaker.Make(NetMsgType::RECONCILDIFF,
4000+                    *recon_result, txs_to_request));
4001+            } else {
4002+                // No final result means we should request sketch extension to make another
4003+                // reconciliation attempt without loosing the initial data.

adamjonas commented at 3:46 pm on April 16, 2021:

s/loosing/losing/

in test/functional/p2p_erlay.py:556 in a9270714e5 outdated

475+
476+        self.test_node = self.nodes[0].add_p2p_connection(TestP2PConn(recon_version=1,
477+            mininode_salt=MININODE_SALT, be_recon_requestor=False, be_recon_responder=True), node_outgoing=True)
478+        self.test_node.wait_for_verack()
479+
480+        self.blocks = self.nodes[0].generate(nblocks=1024)

brunoerg commented at 0:26 am on April 18, 2021:

Why 1024? Is it an arbitrary number? I tested it using 500 blocks and it seems to work. Is there an explanation?

naumenkogs commented at 7:14 am on April 19, 2021:

Yeah, pretty arbitrary. I could generate them in generate_transactions function, but then it would force block sync across nodes, which is not ideal to test that transactions are not lost between scenarios (block sync could hide this effect by forcing tx sync)

DrahtBot added the label Needs rebase on Apr 19, 2021

in src/net_processing.cpp:2910 in a9270714e5 outdated

2780+            return;
2781+        }
2782+
2783+        LOCK(cs_main);
2784+        if (!State(pfrom.GetId())->m_wtxid_relay) {
2785+            // Disconnect peers that send a SENDRECON message before/without WTXIDRELAY.

ariard commented at 3:45 pm on April 20, 2021:

I think we should update BIP330 to mark the dependency on BIP339. Also, should we bump the protocol version ? Otherwise a bip330 peer will send SENDRECON for nothing to bip339-only peer (core version 20+).

naumenkogs commented at 9:25 am on April 21, 2021:

I think we should update BIP330 to mark the dependency on BIP339.

Agree.

Also, should we bump the protocol version ? Otherwise a bip330 peer will send SENDRECON for nothing to bip339-only peer (core version 20+).

I don’t understand what you mean here.

ariard commented at 3:46 pm on April 21, 2021:

A bip330 peer sends a SENDRECON message to a wtxid-relay peer but not supporting bip330. The wtxid-relay peer doesn’t understand SENDRECON and will ignore the message. The bip330 wasted SENDRECON message bandwidth.

If the bip330 peer would have been aware that the wtxid-relay peer didn’t support bip330 it could have avoid sending SENDRECON. Right now, I think your check implies that wtxid-relay peers also support bip330, I believe a protocol version bump would avoid this minor issue.

in src/net_processing.cpp:2672 in a9270714e5 outdated

2653+                        we_may_flood_to = true;
2654+                    }
2655+                } else {
2656+                    // All non-reconciling peers which pass the MightSupportTransactionRelay and
2657+                    // !IsInboundConn checks are outbound flooding peers.
2658+                    we_may_flood_to = true;

ariard commented at 4:24 pm on April 20, 2021:

I don’t think what said in comment “which pass… !IsInboundConn” is enforced and actually inbound peers might take outbound flooding peers slots ?

Turning this flag to false break test_outgoing_recon as all added peers L475-479 aren’t occupying anymore flooding peers slots and as such the tested peers is flood to. IIRC add_p2p_connection’s peers are seen as inbound ?

naumenkogs commented at 9:02 am on April 21, 2021:

I don’t think what said in comment “which pass… !IsInboundConn” is enforced and actually inbound peers might take outbound flooding peers slots ?

See line above: if (!pfrom.IsInboundConn()) {, that’s where it’s enforced.

IIRC add_p2p_connection’s peers are seen as inbound ?

Nope, I added a hack for that: 108b0233be7735b793e488ba5bda836d24b75d4e

ariard commented at 4:05 pm on April 21, 2021:

Ah sorry didn’t see the patch was the reason of my confusion. So yes it’s tested and the logic works as documented.

Maybe add a friendly note L475 in p2p_erlay.py “Contrary to default mininode setting, we mark those peers as node_outgoing=true to simulate them as outgoing connections and thus have them consume regular flood tx forwarding” ?

in src/net_processing.cpp:2818 in a9270714e5 outdated

2794+        vRecv >> they_initiator >> they_responder >> recon_version >> remote_salt;
2795+
2796+        // Since this is called before VERACK, m_out_flooding_peers doesn't include this peer
2797+        // being considered for reconciliation support, so no need for substraction.
2798+        m_reconciliation.EnableReconciliationSupport(pfrom.GetId(), pfrom.IsInboundConn(),
2799+            they_initiator, they_responder, recon_version, remote_salt, m_out_flood_to);

ariard commented at 4:35 pm on April 20, 2021:

Is the following scenario possible ?

current outbound flooding peers=MAX_OUTBOUND_FLOOD_TO - 1
outbound peer 1 connects
outbound peer 2 connects
outbound peer 1 send a SENDRECON is marked as outbound flooding peers as a slot is still available
outbound peer 2 send a SENDRECON is marked as outbound flooding peers as a slot is still available
outbound peer 1 send a VERACK, current outbound flooding peers = MAX_OUTBOUND_FLOOD_TO
outbound peer 2 send a VERACK, current outbound flooding peers = MAX_OUTBOUND_FLOOD_TO +1

What do you think about moving outbound flooding peers accouting from VERACK to SENDRECON ? Independently of my scenario truthiness, I believe it would even avoid to reason on how message ordering is influencing accounting.

naumenkogs commented at 10:01 am on April 27, 2021:

Yes, this scenario is possible.

What do you think about moving outbound flooding peers accouting from VERACK to SENDRECON ?

The problem with your suggestion is that we also need to account for peers which will never send SENDRECON (legacy flooding peers). There are 2 possible solutions:

Handle legacy peers as a special case, so have this accounting twice (in VERACK for legacy, in SENDRECON for reconciliation).
Always initialize ReconciliationState with m_flood_to=false, and update it to true on VERACK if there are still slots (along with tracking out peers in VERACK)

naumenkogs commented at 12:24 pm on April 27, 2021:

I will implement the latter.

in src/txreconciliation.cpp:47 in a9270714e5 outdated

42+/**
43+ * When considering whether we should flood to an outbound connection supporting reconciliation,
44+ * see how many outbound connections are already used for flooding. Flood only if the limit is not reached.
45+ * It helps to save bandwidth and reduce the privacy leak.
46+ */
47+constexpr uint32_t MAX_OUTBOUND_FLOOD_TO = 8;

ariard commented at 4:43 pm on April 20, 2021:

I think this variable belong more to our transaction relay policy rather than the reconciliation module in itself. Its effect on the reconciliation module are function of the connection state, which is located outside this module. I’m leaning towards removing any “flooding” mention from this module, it should be agnostic about the existence of another transaction-announcement protocol.

I guess “flood” could be a flag in CNodeState/Peer and allow to remove IsPeerChosenForFlooding here.

What do you think ?

in src/net_processing.cpp:4826 in a9270714e5 outdated

4780+                            // We don't care about a laggy peer (1) because we probably can't help them even if we flood transactions.
4781+                            // However, exploiting (2) should not prevent us from relaying certain transactions.
4782+                            //
4783+                            // Transactions which don't make it to the set due to the limit are announced via flooding.
4784+                            const size_t recon_set_size = *m_reconciliation.GetPeerSetSize(pto->GetId());
4785+                            if (txs_to_reconcile.size() + recon_set_size < MAX_PEER_TX_ANNOUNCEMENTS) {

ariard commented at 5:32 pm on April 20, 2021:

I don’t think MAX_PEER_TX_ANNOUNCEMENTS is covered by p2p_erlay.py ? This diff doesn’t yell an error.

 0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
 1index 689b899f1..0703fcea7 100644
 2--- a/src/net_processing.cpp
 3+++ b/src/net_processing.cpp
 4@@ -81,7 +81,7 @@ static constexpr int32_t MAX_PEER_TX_REQUEST_IN_FLIGHT = 100;
 5  *  the actual transaction (from any peer) in response to requests for them.
 6  *  Also limits a maximum number of elements to store in the reconciliation set.
 7  */
 8-static constexpr int32_t MAX_PEER_TX_ANNOUNCEMENTS = 5000;
 9+static constexpr int32_t MAX_PEER_TX_ANNOUNCEMENTS = 4999;
10 /** How long to delay requesting transactions via txids, if we have wtxid-relaying peers */
11 static constexpr auto TXID_RELAY_DELAY = std::chrono::seconds{2};
12 /** How long to delay requesting transactions from non-preferred peers */

naumenkogs commented at 7:14 am on May 5, 2021:

I tried to make this test, but signing 5000 transactions takes minutes (that’s expected, right?), and that’s the only way to test this.

Instead, I’m adding tests for MAX_SKETCH_CAPACITY = 2 << 12

in src/net_processing.cpp:4787 in a9270714e5 outdated

4740@@ -4525,12 +4741,12 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
4741                     while (!vInvTx.empty() && nRelayedTransactions < INVENTORY_BROADCAST_MAX) {
4742                         // Fetch the top element from the heap
4743                         std::pop_heap(vInvTx.begin(), vInvTx.end(), compareInvMempoolOrder);
4744-                        std::set<uint256>::iterator it = vInvTx.back();
4745+                        uint256 hash = vInvTx.back();
4746+                        const bool flood_tx = pto->m_tx_relay->m_txs_to_announce.at(hash);

ariard commented at 5:40 pm on April 20, 2021:

IIUC, erlay is introducing an interdependency between transaction’s source and selection of outgoing announcement-protocol (cf RelayTransaction), but I’m not if this is covered by p2p_erlay.py ? Following diff doesn’t yell a failure. Same we turning to always true.

 0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
 1index 1818fc0f2..a4e847110 100644
 2--- a/src/net_processing.cpp
 3+++ b/src/net_processing.cpp
 4@@ -4732,7 +4732,8 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
 5                         // Fetch the top element from the heap
 6                         std::pop_heap(vInvTx.begin(), vInvTx.end(), compareInvMempoolOrder);
 7                         uint256 hash = vInvTx.back();
 8-                        bool flood_tx = pto->m_tx_relay->m_txs_to_announce.at(hash);
 9+                        //bool flood_tx = pto->m_tx_relay->m_txs_to_announce.at(hash);
10+                        bool flood_tx = false;
11                         vInvTx.pop_back();
12                         CInv inv(state.m_wtxid_relay ? MSG_WTX : MSG_TX, hash);
13                         // Remove it from the to-be-sent set

in src/net_processing.cpp:4787 in a9270714e5 outdated

4785+                            if (txs_to_reconcile.size() + recon_set_size < MAX_PEER_TX_ANNOUNCEMENTS) {
4786+                                // Check that:
4787+                                // 1) the peer isn't set for flooding OR
4788+                                // 2) the transaction isn't set for flooding
4789+                                const bool recon_peer_flood_to = *m_reconciliation.IsPeerChosenForFlooding(pto->GetId());
4790+                                if (!recon_peer_flood_to || !flood_tx) {

ariard commented at 5:42 pm on April 20, 2021:

I think it should be AND not an OR here. If recon_peer_flood_to= false, even if transaction is marked for flooding, it will be added to reconciliation set. Is it the semantic intended ?

naumenkogs commented at 8:57 am on April 21, 2021:

Is it the semantic intended?

Yes. Whether the transaction is marked for flooding only matters for reconciliation peers which are set for flooding. So, if flood_tx=true but recon_peer_flood_to= false, we should reconcile.

Perhaps this comment covers it already, no:

0    /**
1     * Per BIP-330, we may want to flood certain transactions to a subset of peers with whom we
2     * reconcile.
3     * If the peer was not previously registered for reconciliations, returns nullopt.
4     */
5    std::optional<bool> IsPeerChosenForFlooding(NodeId peer_id) const;

Should I repeat it here?

ariard commented at 4:49 pm on April 21, 2021:

So, if flood_tx=true but recon_peer_flood_to= false, we should reconcile.

This case I understand. The case I don’t get the rational for is recon_peer_flood_to=true and flood_tx=false.

AFAICT, it’s motivated to preserve privacy of transactions originated locally as detailed in RelayTransaction ? I don’t think it’s achieving its goal as this announcement (i.e reconciliation) will diverge from the announcement-protocol expected on the link (i.e flooding) and this is observable by the peer on the other side of the link ?

Overall, yes the comment could be better as it’s describing code behavior but not it’s rational (or pointing where the rational is laid out). Instead, what do you think of ?

0
1// Unless sketch capacity has been reached, expected announcement policy should not diverge
2// for reconciling peers.
3// Outbound flooding peers may the expected announcement policy overridden in function of
4// the transaction source. Transactions originated locally should be always announced through
5// reconciliation, see RelayTransaction().

naumenkogs commented at 8:43 am on April 27, 2021:

This case I understand. The case I don’t get the rational for is recon_peer_flood_to=true and flood_tx=false.

Ok, what exactly are you suggesting? Always flood if recon_peer_flood_to=true and drop flood_tx at all?

If this is applied, non-reachable nodes will flood transactions to 7 (erlay enabled) peers after receiving transactions from the 8th. No bandwidth savings because almost all transactions are flooded, bad.
How erlay solves this? Adding a policy: “flood only if a transaction is received from reconciliation initiator [or legacy peer]”. Now non-reachable nodes will never flood, because they have no recon initiator peers (because they have no inbounds). Assuming no legacy peers.
Now, what to do with local transactions? If non-reachable nodes flood them, their reconciling peers will easily tell those are local transactions, because this is the only case non-reachable node floods. For this special case, we also use reconciliation, to provide privacy.

——————-––––——-

Now, I assume you can change step (2) or even step (1) here and have some different logic. Feel free to suggest, but this is my thinking for now, and Erlay is based on it.

Let me know if it’s clear.

ariard commented at 6:28 pm on April 28, 2021:

If this is applied, non-reachable nodes will flood transactions to 7 (erlay enabled) peers after receiving transactions from the 8th. No bandwidth savings because almost all transactions are flooded, bad.

Can you point me where as a non-reachable node we necessary mark our peers as reconciling ones ? AFAICT, it’s decided there without evaluating node reachability : https://github.com/naumenkogs/bitcoin/blob/5b8c65d32c142e7d91aef13772cf8c771020bee7/src/txreconciliation.cpp#L654

That said, I agree with your laid out reasoning on flooding-vs-reconciling peers selection for non-reachable nodes and how to protect their local transaction privacy.

But I don’t think current code logic hinders well reachable nodes’s local transactions as you will reconciliate (flood_tx=false) with your outbound flooding peers.

Let me write a functional test to sustain (or not) my thinking :)

naumenkogs commented at 8:41 am on May 3, 2021:

Can you point me where as a non-reachable node we necessary mark our peers as reconciling ones ? AFAICT, it’s decided there without evaluating node reachability : https://github.com/naumenkogs/bitcoin/blob/5b8c65d32c142e7d91aef13772cf8c771020bee7/src/txreconciliation.cpp#L654

Yes, that exact line: bool flood_to = !inbound && outbound_flooders < MAX_OUTBOUND_FLOOD_TO;

For a non-reachable node, all the peers are inbound=false…. and I just realized that my logic only works if connectivity=MAX_OUTBOUND_FLOOD_TO. Is that understandable?

Now, we should think what to do when we bump the connectivity… Non-reachable nodes will have both flood_to false AND true peers.

naumenkogs commented at 9:26 pm on May 4, 2021:

Now, we should think what to do when we bump the connectivity… Non-reachable nodes will have both flood_to false AND true peers.

Ok, so this is fine, because again, local transactions and transactions from outbounds are always reconciled, so it doesn’t matter if a non-reachable node has some peers with flood_to=true.

I will improve the docs.

Keeping this open, it’s likely gonna change a bit after I do some experiments.

in src/net_processing.cpp:4724 in a9270714e5 outdated

4678+                    if (!supports_recon && pto->IsInboundConn()) {
4679                         pto->m_tx_relay->nNextInvSend = m_connman.PoissonNextSendInbound(current_time, INBOUND_INVENTORY_BROADCAST_INTERVAL);
4680                     } else {
4681+                        // Use half the delay for outbound peers, as there is less privacy concern for them.
4682+                        // Also use half delay for inbound peers if they use reconciliations,
4683+                        // because we have an additional delay for sending out requested sketches.

ariard commented at 5:51 pm on April 20, 2021:

Not sure if the following change on how reconciliation is influencing broadcast interval is covered by p2p_erlay.py

 0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
 1index 1818fc0f2..28641f576 100644
 2--- a/src/net_processing.cpp
 3+++ b/src/net_processing.cpp
 4@@ -4664,7 +4664,7 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
 5                 bool fSendTrickle = pto->HasPermission(PF_NOBAN);
 6                 if (pto->m_tx_relay->nNextInvSend < current_time) {
 7                     fSendTrickle = true;
 8-                    if (!supports_recon && pto->IsInboundConn()) {
 9+                    if (pto->IsInboundConn()) {
10                         pto->m_tx_relay->nNextInvSend = m_connman.PoissonNextSendInbound(current_time, INBOUND_INVENTORY_BROADCAST_INTERVAL);
11                     } else {
12                         // Use half the delay for outbound peers, as there is less privacy concern for them.

naumenkogs force-pushed on Apr 21, 2021

DrahtBot removed the label Needs rebase on Apr 21, 2021

in src/txreconciliation.cpp:493 in eede5dc518 outdated

488+            recon_state->second.FinalizeInitByUs(true, DEFAULT_RECON_Q);
489+            recon_state->second.m_state_init_by_us.m_phase = Phase::NONE;
490+
491+            result = false;
492+
493+            LogPrint(BCLog::NET, "Reconciliation we initiated with peer=%d terminated due to empty sketch." /* Continue */

adamjonas commented at 2:52 pm on April 21, 2021:

linter not happy with this line:

All calls to LogPrintf() and LogPrint() should be terminated with \n

naumenkogs force-pushed on Apr 21, 2021

in src/net_processing.cpp:4946 in 0b142cce89 outdated

3966@@ -3800,6 +3967,68 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
3967         return;
3968     }
3969 
3970+    if (msg_type == NetMsgType::REQRECON) {
3971+        uint16_t peer_recon_set_size, peer_q;

ariard commented at 5:58 pm on April 21, 2021:

If your peer has been marked as a responder but it sends you anyway a REQRECON, should we severe the connection ? I think it’s qualify as a violation of protocol roles (sender, responder) as negotiated in SENDRECON.

Same reasoning for other Erlay messages, should we enforce protocol flow ?

naumenkogs commented at 6:55 am on April 22, 2021:

It is enforced in txreconciliation.cpp, but currently these requests are just ignored. I’m not sure if we should disconnect.

in src/net_processing.cpp:4976 in 0b142cce89 outdated

3996+            }
3997+            AnnounceTxs(txs_to_announce, pfrom);
3998+        } else {
3999+            // Disconnect peers that send reconciliation sketch violating the protocol.
4000+            LogPrint(BCLog::NET, "sketch from peer=%d violates reconciliation protocol; disconnecting\n", pfrom.GetId());
4001+            pfrom.fDisconnect = true;

ariard commented at 6:13 pm on April 21, 2021:

Mutating this flag to false doesn’t seem to break p2p_erlay.py ?

 0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
 1index bbab3f92c..1dfdc3017 100644
 2--- a/src/net_processing.cpp
 3+++ b/src/net_processing.cpp
 4@@ -3998,7 +3998,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
 5         } else {
 6             // Disconnect peers that send reconciliation sketch violating the protocol.
 7             LogPrint(BCLog::NET, "sketch from peer=%d violates reconciliation protocol; disconnecting\n", pfrom.GetId());
 8-            pfrom.fDisconnect = true;
 9+            pfrom.fDisconnect = false;
10             return;
11         }
12         return;

in src/net_processing.cpp:4126 in 0b142cce89 outdated

3966@@ -3800,6 +3967,68 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
3967         return;
3968     }
3969 
3970+    if (msg_type == NetMsgType::REQRECON) {
3971+        uint16_t peer_recon_set_size, peer_q;
3972+        vRecv >> peer_recon_set_size >> peer_q;
3973+        m_reconciliation.HandleReconciliationRequest(pfrom.GetId(), peer_recon_set_size, peer_q);

ariard commented at 6:28 pm on April 21, 2021:

Should we sanitize peer_recon_set_size ?

What if a peer send you peer_recon_set_size=UINT16_MAX, may this provoke issues in EstimateSketchCapacity or ComputeBaseSketch, which are both consuming this value or product of it ?

naumenkogs commented at 6:56 am on April 22, 2021:

What if a peer send you peer_recon_set_size=UINT16_MAX, may this provoke issues in EstimateSketchCapacity or ComputeBaseSketch, which are both consuming this value or product of it ?

This is taken care by capacity = std::min(capacity, MAX_SKETCH_CAPACITY); in ComputeBaseSketch, so this shouldn’t cause any harm.

in src/txreconciliation.cpp:805 in 0b142cce89 outdated

806+        // returning values
807+        std::vector<uint32_t>& txs_to_request, std::vector<uint256>& txs_to_announce, std::optional<bool>& result)
808+    {
809+        uint16_t remote_sketch_capacity = uint16_t(skdata.size() / BYTES_PER_SKETCH_CAPACITY);
810+        // Protocol violation: our peer exceeded the sketch capacity, or sent a malformed sketch.
811+        if (remote_sketch_capacity > MAX_SKETCH_CAPACITY) {

ariard commented at 6:38 pm on April 21, 2021:

Should MAX_SKETCH_CAPACITY be exposed in the BIP ?

Otherwise I think other implementation can’t verify when they’re not in violation of your reconciliation bounds.

in src/net_processing.cpp:5154 in 0b142cce89 outdated

4847+        //
4848+        // Message: reconciliation response
4849+        //
4850+        {
4851+            std::vector<uint8_t> skdata;
4852+            bool respond = m_reconciliation.RespondToReconciliationRequest(pto->GetId(), skdata);

ariard commented at 6:54 pm on April 21, 2021:

How do we handle failure if the responder never replies to our reconciliation request ?

I don’t think this is handled by TxReconciliationTracker for now. After RECON_REQUEST_INTERVAL expires again, is MaybeRequestReconciliation going to generate a new reconciliation request ?

naumenkogs commented at 6:58 am on April 22, 2021:

is MaybeRequestReconciliation going to generate a new reconciliation request ?

No, because if (recon_state->second.m_state_init_by_us.m_phase != Phase::NONE) return std::nullopt; takes care of that. Basically, an initiator will wait forever.

in src/net_processing.cpp:5001 in 0b142cce89 outdated

4021+        if (valid_finalization) {
4022+            AnnounceTxs(remote_missing, pfrom);
4023+        } else {
4024+            // Disconnect peers that send reconciliation finalization violating the protocol.
4025+            LogPrint(BCLog::NET, "reconcildiff from peer=%d violates reconciliation protocol; disconnecting\n", pfrom.GetId());
4026+            pfrom.fDisconnect = true;

ariard commented at 7:00 pm on April 21, 2021:

Mutating this flag doesn’t seem to break p2p_erlay.py

 0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
 1index bbab3f92c..62afc0eb3 100644
 2--- a/src/net_processing.cpp
 3+++ b/src/net_processing.cpp
 4@@ -4023,7 +4023,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
 5         } else {
 6             // Disconnect peers that send reconciliation finalization violating the protocol.
 7             LogPrint(BCLog::NET, "reconcildiff from peer=%d violates reconciliation protocol; disconnecting\n", pfrom.GetId());
 8-            pfrom.fDisconnect = true;
 9+            pfrom.fDisconnect = false;
10             return;
11         }
12         return;

naumenkogs force-pushed on Apr 22, 2021

in test/functional/p2p_erlay.py:20 in 5b8c65d32c outdated

15+    msg_inv, msg_getdata, msg_wtxidrelay,
16+    msg_verack, msg_sendrecon, msg_reqrecon,
17+    msg_sketch, msg_reqsketchext, msg_reconcildiff,
18+    MSG_WTX, CTransaction, CInv,
19+)
20+from test_framework.p2p import P2PDataStore, MIN_P2P_VERSION_SUPPORTED

adamjonas commented at 2:32 pm on April 22, 2021:

linter turned up: test/functional/p2p_erlay.py:20:1: F401 'test_framework.p2p.MIN_P2P_VERSION_SUPPORTED' imported but unused

naumenkogs commented at 7:42 pm on April 22, 2021:

oops, sorry I keep missing these issues. Linter broke on my machine for some reason, I better fix it before pushing again.

in src/net_processing.cpp:3611 in 5b8c65d32c outdated

2803+        }
2804+
2805+        if (pfrom.GetCommonVersion() < RECONCILIATION_RELAY_VERSION) {
2806+            LogPrint(BCLog::NET, "ignoring sendrecon due to old common version=%d from peer=%d\n", pfrom.GetCommonVersion(), pfrom.GetId());
2807+            return;
2808+        }

ariard commented at 5:10 pm on April 28, 2021:

Thanks for implementing the suggestion.

Further, should we disconnect sendrecon sender with a protocol version inferior to RECONCILIATION_RELAY_VERSION ?

Good point to keep the check on WTXID_RELAY_VERSION support, I thought at first it should be removed but you might have sendrecon sender who don’t signal wtxid-relay.

naumenkogs commented at 8:32 am on May 3, 2021:

Further, should we disconnect sendrecon sender with a protocol version inferior to RECONCILIATION_RELAY_VERSION ?

Yeah seems right.

in src/txreconciliation.h:61 in 5b8c65d32c outdated

55+     * - whether we agree to respond to reconciliation requests (send our sketches)
56+     * - reconciliation protocol version
57+     * - salt used for short ID computation required for reconciliation
58+     * A peer can't be registered for future reconciliations without this call.
59+     */
60+    std::tuple<bool, bool, uint32_t, uint64_t> SuggestReconciling(NodeId peer_id, bool inbound);

ariard commented at 4:29 pm on April 29, 2021:

Should interface documentation say that peer must not be registered more than once ?

I think that’s a property enforced by the assert in the implementation.

naumenkogs commented at 9:59 am on April 30, 2021:

I think there is a confusion about “register”. Register for me is when Enable[…] is called and when ReconciliationState is created Here, we just generate the salt.
Ok, adding the mention.

in src/test/txreconciliation_tests.cpp:136 in 5b8c65d32c outdated

129+{
130+    TxReconciliationTracker tracker;
131+
132+    auto [we_initiate_recon, we_respond_recon, recon_version, recon_salt] = tracker.SuggestReconciling(0, true);
133+    assert(!we_initiate_recon);
134+    assert(we_respond_recon);

ariard commented at 4:32 pm on April 29, 2021:

assert(recon_version == RECON_VERSION) ?

naumenkogs commented at 9:57 am on April 30, 2021:

Not sure this is really useful.
RECON_VERSION is currently hidden in the module, it it worth moving it out?

in src/test/txreconciliation_tests.cpp:146 in 5b8c65d32c outdated

138+    assert(!we_respond_recon);
139+}
140+
141+BOOST_AUTO_TEST_CASE(EnableReconciliationSupportTest)
142+{
143+    TxReconciliationTracker tracker;

ariard commented at 4:40 pm on April 29, 2021:

What’s purpose serves the TxReconciliationTrackerTest wrapper given in that unit test you’re using directly the reference implementation of TxReconciliationTracker ?

naumenkogs commented at 9:56 am on April 30, 2021:

The purpose is to deduplicate the repeatable valid handshake (Suggest/Enable) by hiding it under the TxReconciliationTrackerTest

in src/txreconciliation.cpp:628 in 5b8c65d32c outdated

625+        // for the second time should not happen
626+        LOCK(m_mutex);
627+        if (m_states.find(peer_id) != m_states.end()) return;
628+
629+        recon_version = std::min(recon_version, RECON_VERSION);
630+        if (recon_version < 1) return;

ariard commented at 5:07 pm on April 29, 2021:

W.r.t to future reconciliation upgrades, IIUC if peer support a lower version than us we downgrade to the lowest common version ? I think it’s interesting to document in the EnableReconciliationSupport declaration how reconciliation version conflicts are negotiated. Further, why RECON_VERSION=1 and not just 0 ?

Really likely we’ll have to upgrade reconciliation if we introduce package_ids as tx-relay announcements in a near future.

naumenkogs commented at 9:54 am on April 30, 2021:

W.r.t to future reconciliation upgrades, IIUC if peer support a lower version than us we downgrade to the lowest common version ? I think it’s interesting to document in the EnableReconciliationSupport declaration how reconciliation version conflicts are negotiated.?

Agree.

Further, why RECON_VERSION=1 and not just 0 ?

I think this is about personal taste, 0 might have special meaning or something. Gonna keep it 1.

Really likely we’ll have to upgrade reconciliation if we introduce package_ids as tx-relay announcements in a near future.

Then we have v2 :)

in src/test/txreconciliation_tests.cpp:20 in 5b8c65d32c outdated

143+    TxReconciliationTracker tracker;
144+    const uint64_t salt = 0;
145+
146+    // Test inbound peers.
147+    bool inbound = true;
148+

ariard commented at 5:32 pm on April 29, 2021:

Just test peer has been registered ?

 0diff --git a/src/test/txreconciliation_tests.cpp b/src/test/txreconciliation_tests.cpp
 1index e7f20d7de..8dbad9461 100644
 2--- a/src/test/txreconciliation_tests.cpp
 3+++ b/src/test/txreconciliation_tests.cpp
 4@@ -146,6 +146,10 @@ BOOST_AUTO_TEST_CASE(EnableReconciliationSupportTest)
 5     // Test inbound peers.
 6     bool inbound = true;
 7 
 8+    NodeId non_registered_peer = 100;
 9+    // Do nothing if peer is not registered
10+    tracker.EnableReconciliationSupport(non_registered_peer, inbound, false, false, 1, salt, 0);
11+
12     NodeId peer_id0 = 0;
13     tracker.SuggestReconciling(peer_id0, inbound);
14     // Both roles are false, don't register.

It’s maybe covered in other tests but otherwise asserting on

 0diff --git a/src/txreconciliation.cpp b/src/txreconciliation.cpp
 1index d4cfc8ee4..a325e0a2a 100644
 2--- a/src/txreconciliation.cpp
 3+++ b/src/txreconciliation.cpp
 4@@ -632,7 +632,9 @@ class TxReconciliationTracker::Impl {
 5         auto local_salt = m_local_salts.find(peer_id);
 6 
 7         // This function should be called only after generating the local salt.
 8-        if(local_salt == m_local_salts.end()) return;
 9+        if(local_salt == m_local_salts.end()) {
10+            assert(false);
11+        }
12 
13         // Must match SuggestReconciling logic.
14         bool we_may_initiate = !inbound, we_may_respond = inbound;

Doesn’t break EnableReconciliationSupportTest.

in src/txreconciliation.cpp:644 in 5b8c65d32c outdated

641+        bool we_initiate = we_may_initiate && they_may_respond;
642+        // If we ever announce we_initiate && we_may_respond, this will need tie-breaking. For now,
643+        // this is mutually exclusive because both are based on the inbound flag.
644+        assert(!(they_initiate && we_initiate));
645+
646+        if (!(they_initiate || we_initiate)) return;

ariard commented at 5:40 pm on April 29, 2021:

IIUC, you can only hit this return if the peer is sending us both they_may_initiate=false and they_may_respond=false. This is def a pathological case and maybe we should disconnect such buggy peer ?

in src/test/txreconciliation_tests.cpp:106 in 5b8c65d32c outdated

262+    SetMockTime(start_time + 20000);
263+    // Still don't request because the other peer is earlier in the queue.
264+    assert(!tracker.MaybeRequestReconciliation(peer_id2));
265+    tracker.RemovePeer(peer_id1);
266+    assert(tracker.MaybeRequestReconciliation(peer_id2));
267+}

ariard commented at 6:52 pm on April 29, 2021:

Maybe add some transactions to the set, to exercise that the correct local_set_size is returned

 0@@ -264,6 +270,25 @@ BOOST_AUTO_TEST_CASE(MaybeRequestReconciliationTest)
 1     assert(!tracker.MaybeRequestReconciliation(peer_id2));
 2     tracker.RemovePeer(peer_id1);
 3     assert(tracker.MaybeRequestReconciliation(peer_id2));
 4+    tracker.RemovePeer(peer_id2);
 5+
 6+    NodeId peer_id3 = 3;
 7+    tracker.SuggestReconciling(peer_id3, false);
 8+    tracker.EnableReconciliationSupport(peer_id3, false, false, true, 1, 0, 0);
 9+
10+    size_t count = 10;
11+    for (size_t i = 0; i < count; i++) {
12+        tracker.AddToReconSet(peer_id0, std::vector<uint256>{GetRandHash()});
13+    }
14+
15+    SetMockTime(start_time + 20000);
16+    // Make a request.
17+    auto request_data = tracker.MaybeRequestReconciliation(peer_id3);
18+    if (request_data) {
19+        const auto [local_set_size, local_q_formatted] = (*request_data);
20+        assert(local_set_size == 10);
21+        assert(local_q_formatted == uint16_t(DEFAULT_RECON_Q * Q_PRECISION));
22+    }
23 }

in src/txreconciliation.cpp:723 in 5b8c65d32c outdated

713+    {
714+        LOCK(m_mutex);
715+        auto recon_state = m_states.find(peer_id);
716+        if (recon_state == m_states.end()) return;
717+        if (recon_state->second.m_state_init_by_them.m_phase != Phase::NONE) return;
718+        if (recon_state->second.m_we_initiate) return;

ariard commented at 6:58 pm on April 29, 2021:

This doesn’t seem to break HandleReconciliationRequestTest ?

 0diff --git a/src/txreconciliation.cpp b/src/txreconciliation.cpp
 1index d4cfc8ee4..65dcc2675 100644
 2--- a/src/txreconciliation.cpp
 3+++ b/src/txreconciliation.cpp
 4@@ -715,7 +715,9 @@ class TxReconciliationTracker::Impl {
 5         auto recon_state = m_states.find(peer_id);
 6         if (recon_state == m_states.end()) return;
 7         if (recon_state->second.m_state_init_by_them.m_phase != Phase::NONE) return;
 8-        if (recon_state->second.m_we_initiate) return;
 9+        if (recon_state->second.m_we_initiate) {
10+            assert(false);
11+        }
12 
13         double peer_q_converted = peer_q * 1.0 / Q_PRECISION;
14         recon_state->second.m_state_init_by_them.m_remote_q = peer_q_converted;

ariard commented at 6:58 pm on April 29, 2021: none

Currently looking through unit tests.

naumenkogs force-pushed on Apr 30, 2021

ariard commented at 3:11 pm on April 30, 2021: none

See comment here if you think replacing assert with exception is better to exercise coverage in unit tests : #16700 (comment)

in src/test/txreconciliation_tests.cpp:442 in 96c261630c outdated

310+    // The node receives a reconciliation request noting that the initiator has an empty set.
311+    // The node should terminate reconciliation by sending an empty sketch.
312+    TxReconciliationTrackerTest tracker_test(true);
313+    tracker_test.HandleReconciliationRequest(1);
314+    SetMockTime(start_time + 2000);
315+    assert(tracker_test.RespondToReconciliationRequest(skdata));

ariard commented at 3:41 pm on April 30, 2021:

Though embedding the peer in TxReconciliationTrackerTest doesn’t let you test the no-registered peer or initiation direction for this method ?

DrahtBot added the label Needs rebase on May 3, 2021

DrahtBot commented at 9:32 am on May 3, 2021: contributor

🕵️ @hebasto @sipa @practicalswift have been requested to review this pull request as specified in the REVIEWERS file.

naumenkogs force-pushed on May 5, 2021

DrahtBot removed the label Needs rebase on May 5, 2021

DrahtBot added the label Needs rebase on May 5, 2021

in src/test/txreconciliation_tests.cpp:283 in 1d8d43ed09 outdated

278+    assert(!tracker.RespondToReconciliationRequest(peer_id0, skdata));
279+
280+    // Request from a non-initiating peer should be ignored.
281+    TxReconciliationTrackerTest tracker_test1(false);
282+    tracker_test1.HandleReconciliationRequest();
283+    SetMockTime(start_time + 4000);

ariard commented at 2:33 pm on May 7, 2021:

What’s the unit of the mocktime bump here, seconds or microseconds ? If it’s second I would recommend to replace the bumping value by RECON_RESPOND_INTERVAL, I think it can be taken verbatim from txreconciliation.cpp like you’re already doing for few values.

That way you can assert that’s the module is binding to the interval expectation. Otherwise, slightly increasing RECON_RESPOND_INTERVAL doesn’t break the test.

in src/test/txreconciliation_tests.cpp:425 in 1d8d43ed09 outdated

310+    assert(tracker.EnableReconciliationSupport(peer_id0, true, true, false, 1, 0));
311+    tracker.HandleReconciliationRequest(peer_id0, 1, 1);
312+    SetMockTime(start_time + 1000);
313+    tracker.RemovePeer(peer_id0);
314+    assert(!tracker.RespondToReconciliationRequest(peer_id0, skdata));
315+

ariard commented at 2:39 pm on May 7, 2021:

I think it’s missing a small case coverage if node is the initiator. It should be ignored.

 0diff --git a/src/txreconciliation.cpp b/src/txreconciliation.cpp
 1index 0218d21ad..9cac2eba5 100644
 2--- a/src/txreconciliation.cpp
 3+++ b/src/txreconciliation.cpp
 4@@ -786,7 +786,7 @@ class TxReconciliationTracker::Impl {
 5         LOCK(m_mutex);
 6         auto recon_state = m_states.find(peer_id);
 7         if (recon_state == m_states.end()) return false;
 8-        if (recon_state->second.m_we_initiate) return false;
 9+        //if (recon_state->second.m_we_initiate) return false;
10 
11         Phase incoming_phase = recon_state->second.m_state_init_by_them.m_phase;

in src/test/txreconciliation_tests.cpp:324 in 1d8d43ed09 outdated

319+    tracker_test.HandleReconciliationRequest(1);
320+    SetMockTime(start_time + 2000);
321+    assert(tracker_test.RespondToReconciliationRequest(skdata));
322+    assert(skdata.size() == 0);
323+
324+    // The node receives a reconciliation request noting that their local set is empty.

ariard commented at 3:31 pm on May 7, 2021:

I think this comment should be reverse with the test case above ?

Local set is increased by AddTransaction(5) though initiator set is empty HandleReconciliationRequest

in src/test/txreconciliation_tests.cpp:329 in 1d8d43ed09 outdated

324+    // The node receives a reconciliation request noting that their local set is empty.
325+    // The node should terminate reconciliation by sending an empty sketch.
326+    TxReconciliationTrackerTest tracker_test2(true);
327+    tracker_test2.AddTransactions(5);
328+    tracker_test2.HandleReconciliationRequest(0);
329+    SetMockTime(start_time + 3000);

ariard commented at 3:33 pm on May 7, 2021:

Also I don’t understand why you bump mocktime by +1000 at each new test case ? You’re renewing tracker_test everytime so it shouldn’t matter

in src/txreconciliation.cpp:284 in 1d8d43ed09 outdated

315+     */
316+    Minisketch ComputeBaseSketch(uint32_t& capacity)
317+    {
318+        Minisketch sketch;
319+        // Avoid serializing/sending an empty sketch.
320+        if (m_local_set.GetSize() == 0 || capacity == 0) return sketch;

ariard commented at 3:45 pm on May 7, 2021:

If this helper is called in RespondToInitialRequest I believe the check on local set size is duplicated.

To avoid this, what do you think about uplifting this check to HandleInitialSketch and returning true and empty skdata there ?

in src/test/txreconciliation_tests.cpp:469 in 1d8d43ed09 outdated

340+    uint32_t expected_capacity = (10 - 1) + q * 1 + 1;
341+    Minisketch expected_sketch = tracker_test3.ComputeSketch(tracker_test3.m_transactions, expected_capacity);
342+    assert(skdata == expected_sketch.Serialize());
343+    // Then respond with an extension sketch.
344+    tracker_test3.HandleExtensionRequest();
345+    assert(tracker_test3.RespondToReconciliationRequest(skdata));

ariard commented at 3:53 pm on May 7, 2021:

You can verify that PrepareForExtensionRequest did clear up local_set for local_snapshot.

 0diff --git a/src/test/txreconciliation_tests.cpp b/src/test/txreconciliation_tests.cpp
 1index 2e12d55dc..35e3db5e5 100644
 2--- a/src/test/txreconciliation_tests.cpp
 3+++ b/src/test/txreconciliation_tests.cpp
 4@@ -343,344 +343,345 @@ BOOST_AUTO_TEST_CASE(RespondToReconciliationRequestTest)
 5     // Then respond with an extension sketch.
 6     tracker_test3.HandleExtensionRequest();
 7     assert(tracker_test3.RespondToReconciliationRequest(skdata));
 8+    assert(tracker_test3.GetPeerSetSize() == 0);
 9     std::vector<uint8_t> extended_sketch = tracker_test3.ComputeSketch(tracker_test3.m_transactions, expected_capacity * 2).Serialize();
10     std::vector<uint8_t> sketch_extension(extended_sketch.begin() + extended_sketch.size() / 2, extended_sketch.end());
11     assert(skdata == sketch_extension);

naumenkogs commented at 2:25 pm on June 29, 2021:

Adding a check for local_set, but it’s hard to get into local_set_snapshot because it’s a private variable with no getter method. I could possibly check it implicitly, but that’s more of a functional test thing.

in src/txreconciliation.cpp:747 in 1d8d43ed09 outdated

765+        // Update local reconciliation state for the peer.
766+        recon_state->second.m_state_init_by_them.m_phase = Phase::EXT_RESPONDED;
767+
768+        // Local extension sketch can be null only if initial sketch or initial capacity was 0,
769+        // in which case we would have terminated reconciliation already.
770+        uint32_t extended_capacity = std::min(uint32_t(recon_state->second.m_capacity_snapshot * 2), MAX_SKETCH_CAPACITY * 2);

ariard commented at 4:04 pm on May 7, 2021:

Verify but I believe that module’s previous steps guarantee that m_capacity_snapshot can’t be superior to MAX_SKETCH_CAPCITY.

At least following diff seems to pass unit tests:

 0diff --git a/src/txreconciliation.cpp b/src/txreconciliation.cpp
 1index 0218d21ad..d035569af 100644
 2--- a/src/txreconciliation.cpp
 3+++ b/src/txreconciliation.cpp
 4@@ -762,12 +762,13 @@ class TxReconciliationTracker::Impl {
 5     void RespondToExtensionRequest(std::unordered_map<NodeId, ReconciliationState>::iterator& recon_state, std::vector<uint8_t>& skdata)
 6     {
 7         assert(recon_state->second.m_capacity_snapshot > 0);
 8+        assert(MAX_SKETCH_CAPACITY * 2 > recon_state->second.m_capacity_snapshot * 2);
 9         // Update local reconciliation state for the peer.
10         recon_state->second.m_state_init_by_them.m_phase = Phase::EXT_RESPONDED;
11 
12         // Local extension sketch can be null only if initial sketch or initial capacity was 0,
13         // in which case we would have terminated reconciliation already.
14-        uint32_t extended_capacity = std::min(uint32_t(recon_state->second.m_capacity_snapshot * 2), MAX_SKETCH_CAPACITY * 2);
15+        uint32_t extended_capacity = uint32_t(recon_state->second.m_capacity_snapshot * 2);
16         Minisketch sketch = recon_state->second.ComputeExtendedSketch(extended_capacity);
17         assert(sketch);
18         skdata = sketch.Serialize();

in src/test/txreconciliation_tests.cpp:519 in 1d8d43ed09 outdated

389+    std::vector<uint32_t> txs_to_request;
390+    std::vector<uint256> txs_to_announce;
391+    std::optional<bool> recon_result;
392+
393+    // Protocol violation: unknown peer.
394+    assert(!tracker_test.HandleSketch(std::vector<uint8_t>(), txs_to_request, txs_to_announce, recon_result, true));

ariard commented at 4:11 pm on May 7, 2021:

nit: verify if we’re reconciliation responder and not initiator.

naumenkogs commented at 2:22 pm on June 29, 2021:

No way to test this. HandleSketch() can be legally called only after MaybeRequestSketch (m_state_init_by_us.m_phase = Phase::INIT_REQUESTED) , which is not possible for responders in the first place.

in src/test/txreconciliation_tests.cpp:520 in 1d8d43ed09 outdated

390+    std::vector<uint256> txs_to_announce;
391+    std::optional<bool> recon_result;
392+
393+    // Protocol violation: unknown peer.
394+    assert(!tracker_test.HandleSketch(std::vector<uint8_t>(), txs_to_request, txs_to_announce, recon_result, true));
395+

ariard commented at 4:13 pm on May 7, 2021:

Also I don’t think this unit test is covering sketch reception during wrong reconciliation phase, the final branch of HandleSketch ?

in src/txreconciliation.cpp:459 in 1d8d43ed09 outdated

472+        }
473+
474+        Minisketch local_sketch = recon_state->second.ComputeBaseSketch(remote_sketch_capacity);
475+        Minisketch remote_sketch;
476+        if (remote_sketch_capacity != 0) {
477+            remote_sketch = Minisketch(RECON_FIELD_SIZE, 0, remote_sketch_capacity).Deserialize(skdata);

ariard commented at 4:58 pm on May 7, 2021:

I would verify returned pointer before to try Deserialize(). Beyond capacity == 0, minisketch API will also return a nullptr in case of memory allocation failure inside library. I don’t know how to tread it, but definitively not crashing on it :/

in src/txreconciliation.cpp:515 in 1d8d43ed09 outdated

510+            recon_state->second.m_local_set.GetRelevantIDsFromShortIDs(differences, txs_to_request, txs_to_announce);
511+
512+            // Update local reconciliation state for the peer.
513+            size_t local_set_size = recon_state->second.m_local_set.GetSize();
514+            recon_state->second.FinalizeInitByUs(true,
515+                RecomputeQ(local_set_size, txs_to_request.size(), txs_to_announce.size()));

ariard commented at 5:12 pm on May 7, 2021:

Note, coefficient recomputation doesn’t seem covered by unit tests but sounds to be by functional tests. I would favor covering it unit tests too. An unrelated change in p2p_erlay.py in the future might remove coverage, up to you.

in src/txreconciliation.cpp:39 in 1d8d43ed09 outdated

34+*/
35+constexpr unsigned int RECON_FALSE_POSITIVE_COEF = 16;
36+static_assert(RECON_FALSE_POSITIVE_COEF <= 256,
37+    "Reducing reconciliation false positives beyond 1 in 2**256 is not supported");
38+/** Default value for the coefficient used to estimate reconciliation set differences. */
39+constexpr double DEFAULT_RECON_Q = 0.02;

ariard commented at 5:27 pm on May 7, 2021:

Interesting, default value sounds to diverge from Erlay paper recommendation :

“The coefficient q characterizes earlier reconciliation, so before the very first reconciliation round it is set to zero.”

You find empirically that a small coefficient was better to start with?

naumenkogs commented at 10:34 am on June 29, 2021:

For future conversations, I will refer to my findings here.

ariard commented at 5:35 pm on May 7, 2021: none

Good with unit tests for now, pretty fine coverage :)

sipa commented at 2:58 am on May 17, 2021: member

@naumenkogs #21859 should be sufficiently ready to rebase on, if you want CI-testable code. Note the src/minisketchwrapper module, which automatically benchmarks.

naumenkogs force-pushed on May 31, 2021

naumenkogs force-pushed on Jun 28, 2021

DrahtBot removed the label Needs rebase on Jun 28, 2021

naumenkogs force-pushed on Jun 29, 2021

rebroad commented at 11:07 pm on June 29, 2021: contributor

How can I help test this? I’ve been running it for a few hours so far on mainnet, but haven’t noticed much difference compared to the master branch.

naumenkogs commented at 4:43 pm on June 30, 2021: member

@rebroad It only makes a difference if your erlay-enabled node connects to other erlay-enabled nodes :) I’m planning to invite people to a such sub-network soon (where we can connect to each other by –addnode manually). I’ll let you know once I start that :)

You can obviously run several nodes with this patch by yourself too, just make sure to connect them.

naumenkogs force-pushed on Jun 30, 2021

naumenkogs force-pushed on Jul 1, 2021

naumenkogs force-pushed on Jul 2, 2021

naumenkogs force-pushed on Jul 4, 2021

in doc/README.md:33 in 1a0868c90d outdated

29@@ -30,8 +30,8 @@ Drag Bitcoin Core to your applications folder, and then run Bitcoin Core.
30 
31 * See the documentation at the [Bitcoin Wiki](https://en.bitcoin.it/wiki/Main_Page)
32 for help and more information.
33-* Ask for help on the [Bitcoin StackExchange](https://bitcoin.stackexchange.com)
34-* Ask for help on [#bitcoin](https://webchat.freenode.net/#bitcoin) on Freenode. If you don't have an IRC client, use [webchat here](https://webchat.freenode.net/#bitcoin).
35+* Ask for help on [Bitcoin StackExchange](https://bitcoin.stackexchange.com).

kiminuo commented at 8:53 am on July 5, 2021:

nit: Why is this change here?

naumenkogs commented at 2:51 pm on July 5, 2021:

oops, definitely not indented. Probably a rebasing mistake.

in test/functional/test_runner.py:330 in 1a0868c90d outdated

338-    parser.add_argument('--keepcache', '-k', action='store_true', help='the default behavior is to flush the cache directory on startup. --keepcache retains the cache from the previous testrun.')
339-    parser.add_argument('--quiet', '-q', action='store_true', help='only print dots, results summary and failure logs')
340-    parser.add_argument('--tmpdirprefix', '-t', default=tempfile.gettempdir(), help="Root directory for datadirs")
341-    parser.add_argument('--failfast', action='store_true', help='stop execution after the first test failure')
342-    parser.add_argument('--filter', help='filter scripts to run by regular expression')
343+    parser.add_argument('--ansi', action='store_true', default=sys.stdout.isatty(),

kiminuo commented at 8:57 am on July 5, 2021:

Nit: Is it a good idea to reformat this file? Maybe that can be done in a separate PR.

naumenkogs commented at 2:51 pm on July 5, 2021:

oops, definitely not indented. Probably a rebasing mistake.

naumenkogs force-pushed on Jul 6, 2021

DrahtBot added the label Needs rebase on Jul 7, 2021

naumenkogs force-pushed on Jul 15, 2021

naumenkogs force-pushed on Jul 20, 2021

DrahtBot removed the label Needs rebase on Jul 20, 2021

rebroad commented at 4:07 pm on July 23, 2021: contributor

I’ve been testing this, and so far I’ve found that the efficiency is reduced with this patch - i.e. a lower PERCENTAGE of the bandwidth is used for ACCEPTED transactions compared to not using this patch. Therefore it is better NOT to use this patch. Therefore, NACK, for now, perhaps premature.

I’ll continue testing, and share my data in the near future.

sipa commented at 7:29 pm on July 23, 2021: member

@rebroad That’s interesting, and surprising. Can you share your data? Which other Erlay-capable nodes have you been testing with? Substantial improvements are only expected once significant subgraphs adopt it.

ghost commented at 8:34 pm on July 23, 2021: none

rebroad:

How can I help test this? I’ve been running it for a few hours so far on mainnet, but haven’t noticed much difference compared to the master branch.

I’ve been testing this, and so far I’ve found that the efficiency is reduced with this patch - i.e. a lower PERCENTAGE of the bandwidth is used for ACCEPTED transactions compared to not using this patch. Therefore it is better NOT to use this patch. Therefore, NACK.

sipa:

Can you share your data?

+1 I love NACK data. Please share else NACK is meaningless.

naumenkogs commented at 2:23 pm on July 24, 2021: member

@sipa @prayank23 @rebroad I’ve spent last couple days running my own subnetwork of Erlay nodes connected to mainnet. I was slightly tuning implementation/configurations and observing savings of 20-50% of overall bandwidth.

I can imagine this PR with no modifications yeilding suboptimal results. I will update the PR with good actual-peformance (according to my real-time experiments) in the next few days, and bring that up at the meeting.

I think judging/nacking this based on the perf is premature.

Sorry for spending your time testing this commit. At this point, the PR was open mainly for concept/code review rather than pefr (maybe it could have been a draft PR, yeah).

naumenkogs marked this as a draft on Jul 24, 2021

naumenkogs force-pushed on Aug 3, 2021

rebroad commented at 10:19 am on August 4, 2021: contributor

I’m now running the latest erlay patch on 2 nodes, and created a manual connection between them. Compared to non-erlay nodes, the percentage of mempool-successful receive traffic seems below average, and the upload bps seems significantly higher than for the other nodes.

Update (5th August 2021) - still getting very poor TX percentage compared to regular nodes - and a lot of wasted bandwidth - far more than any other connection. I get better percentage with an onion connection even though it has a much larger minimum ping - usually the lower the ping the higher the TX percentage, but not with erlay!

naumenkogs force-pushed on Aug 5, 2021

naumenkogs force-pushed on Aug 7, 2021

naumenkogs force-pushed on Aug 8, 2021

naumenkogs force-pushed on Aug 11, 2021

jonatack commented at 5:54 pm on August 17, 2021: member

Debug build is clean. I’ll try to start looking at this soon.

naumenkogs force-pushed on Aug 19, 2021

naumenkogs commented at 6:37 pm on August 19, 2021: member

I created 2 discussion boards, please discuss not-code-related topics there:

rebroad commented at 3:04 pm on August 20, 2021: contributor

After running this for a few minutes, I’m so far seeing better performance from non-erlay nodes, but perhaps we need more erlay nodes for testing to really see if there’s really an improvement.

naumenkogs commented at 5:17 pm on August 20, 2021: member

@rebroad no, the bandwidth savings should come immediately for your node. Maybe the difference is that you run too many conns (I always tested 8/12 erlay conns and that’s it)…

We could also try to understand why it eats more bandwidth in your case, but perhaps we should stick to the realistic case? E.g., 4 erlay conns + 4 legacy conns for.

Anyway, could we continue the bandwidth-related discussion here?

naumenkogs force-pushed on Sep 17, 2021

naumenkogs marked this as ready for review on Sep 20, 2021

rebroad commented at 9:10 am on September 20, 2021: contributor

Maybe this is an “the emperor has no clothes moment”, but I shall say what’s on my mind without fear of appearing stupid.

To me, this change looks like the improvements are simply due to reducing the delay between sending TXs. With this change I see greater bandwidth, greater TX throughput but reduced efficiency (i.e. percentage of useful bandwidth). I’ve been seeing this since I started testing months ago.

For example, there’s a node in Toronto currently that is managing to be the main provider of TXs, and is not using erlay. I suspect they have changed the code to reduce the delay between TXs:-

This seems easily achieveable without erlay. Erlay also reduces the delay between sending TXs, so it’s an easy hack to get better TX throughput, but the difference with erlay is that it also reduces efficiency (in all my tests so far).

Erlay does well with throughput:-

but badly with efficency:-

I’d like to see some test results from others that contradict mine though.

DrahtBot added the label Needs rebase on Sep 30, 2021

naumenkogs force-pushed on Nov 5, 2021

DrahtBot removed the label Needs rebase on Nov 5, 2021

DrahtBot added the label Needs rebase on Nov 10, 2021

fanquake referenced this in commit c1fb30633b on Nov 12, 2021

jnewbery commented at 10:59 am on November 12, 2021: contributor

Minisketch is merged! Please rebase :)

maflcko commented at 11:07 am on November 12, 2021: member

#23491 will conflict with this pull after rebase. Let me know if this acceptable or if you prefer to defer 23491 until after all other changes are merged.

fanquake referenced this in commit d0923098c6 on Nov 16, 2021

naumenkogs force-pushed on Nov 17, 2021

DrahtBot removed the label Needs rebase on Nov 17, 2021

rebroad commented at 1:20 pm on November 25, 2021: contributor

what is the 2021-11-erlay1 branch please? confused as it looks more recent (from the name) but this one is also getting updated.

rebroad commented at 7:26 pm on November 25, 2021: contributor

This does not compile:-

 08d4354499f39a49bbb8675494c532802ef034b03 is the first bad commit
 1commit 8d4354499f39a49bbb8675494c532802ef034b03
 2Author: Gleb Naumenko <naumenko.gs@gmail.com>
 3Date:   Sat Mar 20 17:36:08 2021 +0200
 4
 5    Respond to a reconciliation request
 6
 7    When the time comes, we should send a sketch of our
 8    local reconciliation set to the reconciliation initiator.
 9
10 src/net_processing.cpp   | 15 +++++++++++++++
11 src/protocol.cpp         |  2 ++
12 src/protocol.h           |  6 ++++++
13 src/txreconciliation.cpp | 50 +++++++++++++++++++++++++++++++++++++++++++++++-
14 src/txreconciliation.h   |  8 ++++++++
15 5 files changed, 80 insertions(+), 1 deletion(-)
168d4354499f39a49bbb8675494c532802ef034b03 is the first bad commit

naumenkogs commented at 9:53 am on November 29, 2021: member

@rebroad hey

what is the 2021-11-erlay1 branch please? confused as it looks more recent (from the name) but this one is also getting updated.

That branch is a first batch of this PR commits, forked to be merged separately within that PR. I try to keep the OP PR up-to-date with that one, but also not too often to avoid too many notifications here.

8d4354499f39a49bbb8675494c532802ef034b03 is the first bad commit

Yeah I somehow can’t reproduce CI compiler errors neither on my mac nor on my linux, are you referring to those as well? Or can you share your compilation output?

naumenkogs force-pushed on Nov 29, 2021

glozow commented at 2:13 pm on November 29, 2021: member

Concept ACK, thanks for splitting the big PR up :) Will review soon.

naumenkogs commented at 2:57 pm on November 29, 2021: member

Fixing new CI issues here, for now please review the first batch separately here :)

naumenkogs force-pushed on Dec 1, 2021

DrahtBot added the label Needs rebase on Jan 6, 2022

in src/txreconciliation.cpp:32 in 5728fac4d3 outdated

27+constexpr unsigned int BYTES_PER_SKETCH_CAPACITY = RECON_FIELD_SIZE / 8;
28+/**
29+ * Limit sketch capacity to avoid DoS. This applies only to the original sketches,
30+ * and implies that extended sketches could be at most twice the size.
31+ */
32+constexpr uint32_t MAX_SKETCH_CAPACITY = 2 << 12;

dergoegge commented at 1:27 pm on February 13, 2022:

The value chosen here seems too large based on some tests that i did.

On my laptop (with a decent CPU) a sketch with a capacity of 2 << 13 and 2 << 13 differences (possible if an extension is requested) took ~20 seconds to decode. On lower end devices this will likely be worse. This is not super bad since sketches are only requested on outbound connections but having the msghand thread block like this is undesirable in any case.

I am not entirely sure what an appropriate value would be but given the expected average set difference of 7 i would assume that 256 - 1024 as the max. sketch capacity would suffice.

naumenkogs commented at 8:11 am on February 15, 2022:

Excellent point. I will address this once I have to rebase these changes.

in src/Makefile.test.include:156 in 5728fac4d3 outdated

139@@ -140,6 +140,7 @@ BITCOIN_TESTS =\
140   test/transaction_tests.cpp \
141   test/txindex_tests.cpp \
142   test/txpackage_tests.cpp \
143+  test/txreconciliation_tests.cpp \

jonatack commented at 3:28 pm on May 7, 2022:

This line should be in the earlier commit, “test: Add unit tests for reconciliation negotiation” (with it, the tests do pass).

0$ ./src/test/test_bitcoin -t txreconciliation_tests
1Test setup error: no test cases matching filter or all test cases were disabled

0--- a/src/Makefile.test.include
1+++ b/src/Makefile.test.include
2@@ -144,6 +144,7 @@ BITCOIN_TESTS =\
3   test/transaction_tests.cpp \
4   test/txindex_tests.cpp \
5   test/txpackage_tests.cpp \
6+  test/txreconciliation_tests.cpp \
7   test/txrequest_tests.cpp \

0$ ./src/test/test_bitcoin -t txreconciliation_tests
1Running 4 test cases...
2
3*** No errors detected

jonatack commented at 4:00 pm on May 7, 2022: member

Here is a rebase of the pull at https://github.com/jonatack/bitcoin/commits/erlay-PR21515-rebase, if helpful. It may be good to keep this parent PR up to date to provide an overall view of the proposed changes. At first glance it also looks like there are some changes in #23443 that are not present here.

Due to the net_processing renamings and refactoring changes in the interim, there were quite a few updates and some silent merge conflicts that only show up when building each commit, so please reverify: git range-diff 59ac8ba 5728fac 9d93cbe

Doing this helped become acquainted with the changes. Next will begin review of #23443.

Independently of the rebase, most of the commits have one or several build -Wunused-function warnings and re-organizing is needed to add functions in the same commit where they are first used (which facilitates review as well). In order:

Beginning with commit “Add helper to compute reconciliation salt”:

0txreconciliation.cpp:21:16: warning: unused function 'ComputeSalt' [-Wunused-function]
1static uint256 ComputeSalt(uint64_t local_salt, uint64_t remote_salt)
2               ^
31 warning generated.

Beginning with commit “Add helper to store transactions in recon set”:

0txreconciliation.cpp:40:10: warning: unused member function 'Clear' [-Wunused-member-function]
1    void Clear() {
2         ^
31 warning generated.

Beginning with commit “Add helper to see whether we should respond to recon request”:

0txreconciliation.cpp:77:10: warning: unused member function 'Clear' [-Wunused-member-function]
1    void Clear() {
2         ^
3txreconciliation.cpp:118:10: warning: unused member function 'ConsiderInitResponseAndTrack' [-Wunused-member-function]
4    bool ConsiderInitResponseAndTrack() {
5         ^
62 warnings generated.

Beginning with commit “Add helper to compute reconciliation tx short id”:

0txreconciliation.cpp:77:10: warning: unused member function 'Clear' [-Wunused-member-function]
1    void Clear() {
2         ^
3txreconciliation.cpp:118:10: warning: unused member function 'ConsiderInitResponseAndTrack' [-Wunused-member-function]
4    bool ConsiderInitResponseAndTrack() {
5         ^
6txreconciliation.cpp:172:14: warning: unused member function 'ComputeShortID' [-Wunused-member-function]
7    uint32_t ComputeShortID(const uint256 wtxid) const
8             ^
93 warnings generated.

Beginning with commit “Add helper to compute sketches for tx reconciliation”:

 0txreconciliation.cpp:103:10: error: unused member function 'Clear' [-Werror,-Wunused-member-function]
 1    void Clear() {
 2         ^
 3txreconciliation.cpp:145:10: error: unused member function 'ConsiderInitResponseAndTrack' [-Werror,-Wunused-member-function]
 4    bool ConsiderInitResponseAndTrack() {
 5         ^
 6txreconciliation.cpp:161:14: error: unused member function 'EstimateSketchCapacity' [-Werror,-Wunused-member-function]
 7    uint32_t EstimateSketchCapacity(size_t local_set_size) const
 8             ^
 9txreconciliation.cpp:224:16: error: unused member function 'ComputeSketch' [-Werror,-Wunused-member-function]
10    Minisketch ComputeSketch(uint32_t& capacity)               ^

Beginning with commit “Add a function to identify local/remote missing txs”:

0txreconciliation.cpp:108:10: error: unused member function 'GetRelevantIDsFromShortIDs' [-Werror,-Wunused-member-function]
1    void GetRelevantIDsFromShortIDs(const std::vector<uint64_t>& diff,
2         ^

Beginning with commit “Add a function to announce transactions after reconciliation”:

0net_processing.cpp:2640:23: error: unused member function 'AnnounceTxs' [-Werror,-Wunused-member-function]
1void PeerManagerImpl::AnnounceTxs(std::vector<uint256> remote_missing_wtxids, CNode& pto)
2                      ^
31 error generated.

Beginning with commit “Be ready to receive sketch extension”:

0  CXX      libbitcoin_node_a-validation.o
1txreconciliation.cpp:304:16: warning: unused member function 'ComputeExtendedSketch' [-Wunused-member-function]
2    Minisketch ComputeExtendedSketch(uint32_t extended_capacity)
3               ^
41 warning generated.

Beginning with commit “Add a function to get wtxids by shortids”:

0txreconciliation.cpp:137:26: warning: unused member function 'GetWTXIDsFromShortIDs' [-Wunused-member-function]
1    std::vector<uint256> GetWTXIDsFromShortIDs(const std::vector<uint32_t>& remote_missing_short_ids) const
2                         ^
31 warning generated.

naumenkogs force-pushed on May 11, 2022

DrahtBot removed the label Needs rebase on May 11, 2022

naumenkogs force-pushed on May 12, 2022

DrahtBot added the label Needs rebase on May 19, 2022

glozow commented at 6:28 pm on October 12, 2022: member

Should this be marked as draft since it’s a parent/meta PR?

naumenkogs marked this as a draft on Oct 13, 2022

glozow referenced this in commit e7a0e96271 on Oct 17, 2022

naumenkogs force-pushed on Nov 16, 2022

DrahtBot removed the label Needs rebase on Nov 16, 2022

DrahtBot added the label Needs rebase on Nov 30, 2022

refactor: Add a pre-mutexed version of IsPeerRegistered

The pre-mutexed version is useful for external calls, while
the regular version will be used internally.

101ed15efa

p2p: Functions to add/remove wtxids to tx reconciliation sets

They will be used later on.

e7479e9b02

p2p: Add transactions to reconciliation sets

Transactions eligible for reconciliation are added to the
reconciliation sets. For the remaining txs, low-fanout is used.

5b886367c8

p2p: Introduce reconciliation-fanout tx broadcast interval

For a subset of reconciling peers we announce transactions
via low fanout. We need to set lower intervals for that to
achieve lower relay latency.

Note that for privacy reasons the ratio between inbound and outbound
delays matter much more than the actual delays. That ratio is preserved
here, so it is not a privacy degradation.

89de0f71dc

p2p: Add peers to reconciliation queue on negotiation

When we're finalizing negotiation, we should add the peers
for which we will initiate reconciliations to the queue.

0ec1b427ec

p2p: Track reconciliation requests schedule

We initiate reconciliation by looking at the queue periodically
with equal intervals between peers to achieve efficiency.

This will be later used to see whether it's time to initiate.

4b57aef2f1

p2p: Initiate reconciliation round

When the time comes for the peer, we send a
reconciliation request with the parameters which
will help the peer to construct a (hopefully) sufficient
reconciliation sketch for us. We will then use that
sketch to find missing transactions.

272f0d5fa8

test: functional test for reqtxrcncl e710451051

p2p: Handle reconciliation request

Store the parameters the peer sent us inside the
reconciliation request.

9e0c8cd0a3

Add helper to see whether we should respond to recon request

For initial reconciliatin requests we track whether was the last time
we responded to them, and if it was too recent, we respond after
a small delay.

f99fa46911

Add helper to compute reconciliation tx short id b68cada9db

build: Link minisketch libraries e5d8e4a5be

Add helper to compute sketches for tx reconciliation 9d27b63575

naumenkogs force-pushed on Sep 11, 2023

DrahtBot added the label CI failed on Sep 11, 2023

DrahtBot removed the label Needs rebase on Sep 11, 2023

naumenkogs force-pushed on Sep 12, 2023

Respond to a reconciliation request

When the time comes, we should send a sketch of our
local reconciliation set to the reconciliation initiator.

e2fbf20193

p2p: Add a function to identify local/remote missing txs

When the sketches from both sides are combined successfully,
the diff is produced. Then this diff can (together with the local txs)
be used to identified which transactions are missing locally and remotely.

89312ab824

Use txid/uint256 in CompareInvMempoolOrder

This will help to reuse the code later on in the function
to announce transactions.

dab16090ea

p2p: Handle reconciliation sketch and successful decoding 2edc141a2a

p2p: Request extension if decoding failed

If after decoding a reconciliation sketch it turned out
to be insufficient to find set difference, request extension.

c227ba23f7

Be ready to receive sketch extension

Store the initial sketches so that we are able to process
extension sketch while avoiding transmitting the same data.

8dc4cd2238

p2p: Prepare for sketch extension request

To be ready to respond to a sketch extension request
from our peer, we should store a snapshot of our state
and capacity of the initial sketch, so that we compute
extension of the same size and over the exact same
transactions.

Transactions arriving during this reconciliation will
be instead stored in the regular set.

a6e4a13372

p2p: Keep track of announcements during txrcncl extension 41b3325d16

p2p: Handle reconciliation extension request

If peer failed to reconcile based on our initial response sketch,
they will ask us for a sketch extension. Store this request to respond later.

db671c63cc

p2p: Respond to sketch extension request

Sending an extension may allow the peer to reconcile
transactions, because now the full sketch has twice
as much capacity.

44b83dbbc8

Handle sketch extension

If a peer sent us an extension sketch, we should
reconstruct a full sketch from it with the snapshot
we stored initially, and attempt to decode the difference.

p2p: Handle sketch extension

If a peer sent us an extension sketch, we should
reconstruct a full sketch from it with the snapshot
we stored initially, and attempt to decode the difference.

14b2eac934

p2p: Add a finalize incoming reconciliation function

This currently unused function is supposed to be used once
a reconciliation round is done. It cleans the state corresponding
to the passed reconciliation.

b11a05219a

Handle reconciliation finalization message

Once a peer tells us reconciliation is done, we should behave as follows:
- if it was successful, just respond them with the transactions they asked
  by short ID.
- if it was a full failure, respond with all local transactions from the reconciliation
  set snapshot
- if it was a partial failure (only low or high part was failed after a bisection),
  respond with all transactions which were asked for by short id,
  and announce local txs which belong to the failed chunk.

62647d9a3c

naumenkogs force-pushed on Sep 13, 2023

p2p, test: Add tx reconciliation functional tests b4824ec6c4

test: Update mempool_packages.py test for reconciliation

First, extend the timeout to facilitate reconciliation
relay which might take longer than previously.

Second, since reconciliation can't guarantee that
descendants of same-degree (from the parent) are
relayed in the same order they were received, loosen
the check for descendant limit.
This is the case because, if the parent is flooded,
the children might be both flooded and reconciled,
and the decision is made independently. So even if
one arrived earlier, it can be relayed later because
flooding.
Note, it's possible to make txreconciliation.cpp be aware
of the order of adding transactions to the set, but this
won't help to solve the above issue.

3e08dab56b

naumenkogs force-pushed on Sep 14, 2023

DrahtBot added the label Needs rebase on Nov 28, 2023

DrahtBot commented at 12:44 pm on November 28, 2023: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

DrahtBot commented at 2:02 am on February 26, 2024: contributor

⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.

DrahtBot commented at 0:19 am on May 25, 2024: contributor

⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.

fanquake closed this on Jun 21, 2024

bitcoin locked this on Jun 21, 2025

Erlay: bandwidth-efficient transaction relay protocol #21515

Abstract

How to review

Reviews

Conflicts