naumenkogs
commented at 8:01 pm on March 4, 2020:
member
This is an implementation of Erlay, using primitives from BIP-330 (see the updated spec here). Please refer to these two to understand the design. My talk on the topic is here.
Erlay uses both flooding (announcing using INV messages to all peers) and reconciliation to announce transactions. Flooding is expensive, so Erlay seeks to use it sparingly and in strategic locations - only well-connected publicly reachable nodes flood transactions to other publicly reachable nodes via outbound connections. Since every unreachable node is directly connected to several reachable nodes, this policy ensures that a transaction is quickly propagated to be within one hop from most of the nodes in the network.
All transactions not propagated through flooding are propagated through efficient set reconciliation. To do this, every node keeps a reconciliation set for each peer, into which it places the transactions that would have been announced via INV messages absent this protocol. Every 2 seconds every node chooses a peer from its outbound connections, in a predetermined order, to reconcile with, resulting in both sides learning the transactions known to the other side. After every reconciliation round, the corresponding reconciliation set is cleared.
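The per-peer bookkeeping described above can be sketched roughly like this. This is a toy model, not the PR's actual data structures; the container choices and integer peer IDs are illustrative:

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>
#include <unordered_map>

// Toy model of Erlay's per-peer state: transactions that would have been
// INVed are instead queued into a per-peer reconciliation set, and peers
// are reconciled with in a fixed round-robin order.
struct ReconRelay {
    std::unordered_map<int, std::set<std::string>> recon_sets; // peer id -> wtxids
    std::deque<int> recon_queue;                               // round-robin order

    void AddPeer(int peer)
    {
        recon_queue.push_back(peer);
        recon_sets[peer]; // create an empty set for the peer
    }

    // Instead of sending an INV, store the announcement for later reconciliation.
    void QueueAnnouncement(int peer, const std::string& wtxid)
    {
        recon_sets[peer].insert(wtxid);
    }

    // Called every reconciliation interval (2 s in the PR): pick the next peer
    // in the predetermined order, reconcile, and clear its set afterwards.
    std::set<std::string> ReconcileNext()
    {
        int peer = recon_queue.front();
        recon_queue.pop_front();
        recon_queue.push_back(peer); // keep the repeating order
        std::set<std::string> announced;
        announced.swap(recon_sets[peer]); // the set is cleared after the round
        return announced;
    }
};
```

The fixed order matters because, as discussed later in the thread, the efficiency gain is highest when the reconciliation set difference is predictable.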
I think both the paper and the BIP motivate the changes, but I'll briefly mention them once again here:
save 40% of the bandwidth consumed by a node
increase network connectivity for almost no bandwidth or latency cost
improve privacy as a side-effect
Obviously looking for review; let's try to start with high-level concerns, and keep nits for later.
P.S.
Please don’t be scared of 8,000 LOC added. 7,000 of them are minisketch, added as a subtree.
P.P.S.
My experiments running this code live (slightly outdated), with a script to replicate them: here1 and here2.
luke-jr
commented at 8:04 pm on March 4, 2020:
member
How does it react to diverse network policies?
Empact
commented at 8:13 pm on March 4, 2020:
member
Concept ACK
DrahtBot added the label
Needs rebase
on Mar 4, 2020
in
src/net_processing.cpp:3440
in
b93eb384cd (outdated)
When it comes to network policies, I’m using the same code originally used by regular gossip (“Determine transactions to relay” in net_processing.cpp). So nothing should be lost or sent wastefully as a result of policy discrepancies.
gmaxwell
commented at 10:57 pm on March 8, 2020:
contributor
@luke-jr I’m understanding your question as being inspired by earlier ‘mempool sync’ ideas that would bleed bandwidth if there was a long lasting mempool discrepancy.
Erlay isn’t a mempool sync. It uses a way of communicating lists of things you want to relay which only takes bandwidth proportional to the difference rather than the total size. So there is no ongoing bandwidth usage due to differences in mempool content. Bandwidth used is roughly A*num_tx_relayed + B*num_peers*(difference_in_tx_relayed) + C*num_peers, for some constants A, B, C.
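A toy version of that cost model makes the scaling visible (the constants and names are placeholders, not measured values): bandwidth tracks the per-peer difference, not total mempool size.

```cpp
#include <cassert>

// Toy cost model for the paragraph above: total bandwidth is linear in the
// number of transactions relayed and in the per-peer relay-set *difference*.
// A, B, C are arbitrary placeholder constants, not calibrated values.
double EstimateBandwidth(double n_tx_relayed, double n_peers, double diff_per_peer,
                         double A = 1.0, double B = 1.0, double C = 1.0)
{
    return A * n_tx_relayed + B * n_peers * diff_per_peer + C * n_peers;
}
```

Note that a peer with a radically different relay policy only increases the middle term (the per-peer difference); it does not add a cost proportional to mempool size.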
If a peer has a radically different relay policy than you, it works fine and continues to keep bandwidth below what usage would be without Erlay, even though the Erlay savings itself comes largely from eliminating data that both sides would send.
Does that answer your question?
practicalswift
commented at 7:57 pm on March 9, 2020:
contributor
@naumenkogs When trying out this PR I ran in to two small testing issues:
The suffix of the functional test p2p_erlay is .p2p (p2p_erlay.p2p) instead of the expected .py (p2p_erlay.py) :)
It seems like make check runs the minisketch binaries test-exhaust and test-exhaust-verify. The running times of these are quite substantial - is there some way to do a quick sanity check as part of make check instead of exhaustive testing? :)
sipa
commented at 8:48 pm on March 9, 2020:
member
@practicalswift The unit test minisketch binaries actually run forever. I need to fix that.
practicalswift
commented at 10:10 pm on March 9, 2020:
contributor
I did some robustness testing of this code by pulling in PRs #17989 (ProcessMessage(…) fuzzer) and #18288 (MemorySanitizer), and found a use of uninitialized memory (UUM) that is remotely triggerable.
You can reproduce the issue by pulling in the commits from those PRs and simply running:
$ src/test/fuzz/process_message
…
The issue will be hit within a few seconds: libFuzzer is amazing :)
Notice also how libFuzzer will automatically find the newly added message names (wtxidrelay, sendrecon, reqrecon, sketch, reqbisec and reconcildiff) and probe using them all! The fuzzer harness does not need to be taught about the existence of those :)
Perhaps this UUM is the reason for the intermittent test failure you’re seeing?
I encourage everybody to review (or at least Concept ACK :)) #17989 (ProcessMessage(…) fuzzer) and #18288 (MemorySanitizer): having them merged would be great for robustness/security.
naumenkogs force-pushed
on Mar 10, 2020
naumenkogs force-pushed
on Mar 11, 2020
practicalswift
commented at 4:45 pm on March 11, 2020:
contributor
Needs rebase :)
naumenkogs force-pushed
on Mar 11, 2020
naumenkogs
commented at 7:26 pm on March 11, 2020:
member
I made some last changes to make sure it can be plugged into a real network.
Please help with testing this by running a couple of inter-connected Erlay nodes and observing bandwidth (and ideally tx relay latency).
@practicalswift I was planning to rebase once #18044 is merged.
naumenkogs force-pushed
on Mar 14, 2020
naumenkogs
commented at 5:41 pm on March 15, 2020:
member
I ran 2 non-reachable nodes for around 24 hours on the same host: one a regular wtxid node connected to legacy nodes, the other an Erlay node connected to reconciliation-supporting nodes.
Bandwidth breakdown (in megabytes):

| Message | Legacy (sent) | Erlay (sent) | Legacy (received) | Erlay (received) |
|---|---|---|---|---|
| inv | 38 | 0.4 | 22 | 5.3 |
| getdata | 6 | 5.7 | 1 | 0 |
| sketch | - | - | - | 1.2 |
| reqrecon, reqbisec, reconcildiff | - | 0.7 | - | - |
| tx, blocktxn | 3 | 0.3 | 75 | 75 |
| total (incl. other) | 48 | 7.1 | 103 | 84 |
Notice the overall 40% bandwidth saving.
Please help by running similar experiments and sharing bandwidth results :)
Here’s the script I hacked together for bandwidth analysis (run nodes with debug=1)
Please sanitize your results before publishing: sometimes there’s noisy bandwidth, like extra blocks, for reasons I’m not aware of.
naumenkogs
commented at 2:12 am on March 18, 2020:
member
More experiments: I ran the same 2 nodes for another 24 hours, but connected to 16 instead of 8 nodes (yes, all 16 peers of the Erlay node support reconciliation).
The legacy wtxid-relay node spent 150 megabytes on announcements, while the Erlay node spent 24 megabytes. Since these 2 days might have had different activity, it makes sense to compare the growth: legacy grew 2.23x (I guess due to more INVs in total), Erlay grew 1.8x. So, as expected, not only are the savings over legacy substantial, they also grow more slowly with connectivity.
Based on the following facts I also guess that there’s no point in tuning Erlay further for better performance:
only 0.0016 of reconciliations failed (fully or partially), meaning we don’t underestimate much
only 15% of the announcement bandwidth is sketches, meaning we don’t overestimate much
only 15% of the announcement bandwidth is outbound INVs (which are much more likely to still be a little wasteful, because those peers are better connected than us and likely don’t need those announcements)
Finally, I measured the delay in receiving transactions. I take the earliest time a transaction was received anywhere, and measure latency from there, comparing:
the average receiving time at all legacy nodes: 0.92s
at the Erlay node: 1.47s
I think this is expected and fine, as we discuss in the paper, but let me know if you think differently.
naumenkogs force-pushed
on Mar 19, 2020
DrahtBot removed the label
Needs rebase
on Mar 20, 2020
DrahtBot
commented at 1:01 am on March 20, 2020:
member
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#21148 (Split orphan handling from net_processing into txorphanage by ajtowns)
#21015 (Make all of net_processing (and some of net) use std::chrono types by dhruv)
#20892 (tests: Run both descriptor and legacy tests within a single test invocation by achow101)
#20758 (net-processing refactoring – lose globals, move implementation details from .h to .cpp by ajtowns)
#20726 (p2p: Add DISABLETX message for negotiating block-relay-only connections by sdaftuar)
#20721 (Net: Move ping data to net_processing by jnewbery)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
DrahtBot added the label
Needs rebase
on Mar 30, 2020
mzumsande
commented at 8:32 pm on April 19, 2020:
member
Several existing functional tests are failing for me while synchronizing mempools via Erlay. I looked at the log of wallet_backup.py, which has a rather simple setup: some nodes create transactions, and all nodes are then expected to synchronize.
In my run, one node would behave strangely during reconciliations, sending GETDATA for a transaction with a wtxid 000000003f946000000000003f947ae147ae147b000000000000000100000000 even though its peer had created a tx with a different (and actually random) wtxid. As a result, NOTFOUND is sent and the synchronization never happens.
I haven’t been able to dig in deep enough yet to understand the cause, but there still seems to be a bug somewhere.
naumenkogs
commented at 9:42 pm on April 19, 2020:
member
@mzumsande thank you for taking a look! I’ll do my best to find the cause.
Some external eyes looking at the code would definitely help to both confirm the approach and troubleshoot issues like this :)
in
src/net.cpp:760
in
1b695eabc2 (outdated)
749+ for (size_t i = 0; i < remote_missing_short_ids.size(); ++i) {
750+ auto local_tx = m_recon_state->local_short_id_mapping.find(remote_missing_short_ids[i]);
751+ if (local_tx == m_recon_state->local_short_id_mapping.end()) {
752+ LogPrint(BCLog::NET, "Unknown transaction requested by short id=%d by peer=%d\n", remote_missing_short_ids[i], GetId());
753+ }
754+ remote_missing.push_back(local_tx->second);
I think that this is UB (if no wtxid for a shortid is found, there is a LogPrint, but the item is still dereferenced).
While this should explain the strange wtxids with lots of 0s, it doesn’t explain why no mapping is found to begin with.
naumenkogs
commented at 5:17 pm on April 21, 2020:
I found an issue, will commit the fix later today!
Basically, sometimes the reconciliation difference computation gives a false positive: it looks like a success, but some difference elements are missing.
This is just a natural property of the math in minisketch. I’ll submit a workaround.
naumenkogs
commented at 6:13 pm on April 21, 2020:
Still not sure why it would be UB here? It just accesses the existing array element..
I may be off, but if local_tx == m_recon_state->local_short_id_mapping.end(), local_tx->second is still accessed after the LogPrint - isn’t that illegal?
naumenkogs
commented at 6:46 pm on April 21, 2020:
Oh, I see, you are definitely right, it’s a bug!
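For readers following along, a sketch of the fix under discussion might look like this. The map type and names are stand-ins for the PR's local_short_id_mapping, not the actual patch:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Sketch of the fix: if a requested short id has no local mapping, skip it
// instead of dereferencing the end() iterator (which is the UB found above).
// Here wtxids are modeled as strings for self-containment.
std::vector<std::string> LookupMissing(
    const std::map<uint32_t, std::string>& short_id_mapping,
    const std::vector<uint32_t>& remote_missing_short_ids)
{
    std::vector<std::string> remote_missing;
    for (uint32_t short_id : remote_missing_short_ids) {
        auto local_tx = short_id_mapping.find(short_id);
        if (local_tx == short_id_mapping.end()) {
            // Unknown short id: log-and-skip; dereferencing local_tx here
            // would be undefined behavior.
            continue;
        }
        remote_missing.push_back(local_tx->second);
    }
    return remote_missing;
}
```

The key change from the quoted diff is the `continue` after the failed lookup, so the iterator is only dereferenced when the mapping exists.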
naumenkogs force-pushed
on Apr 21, 2020
naumenkogs force-pushed
on Apr 21, 2020
DrahtBot removed the label
Needs rebase
on Apr 21, 2020
naumenkogs force-pushed
on Apr 21, 2020
naumenkogs
commented at 8:31 pm on April 21, 2020:
member
Alright, the tests should be passing now; also rebased.
There’s one test that still fails consistently: p2p_segwit.py.
I’ll take a look at that, and upgrade the code for recent minisketch (with reconciliation capacity checksum thing) soon.
One good idea for the future: test that legacy nodes talk OK to Erlay nodes (I checked it only manually).
sipa
commented at 8:40 pm on April 21, 2020:
member
in
src/net_processing.cpp:3492
in
0a262c092f (outdated)
3446@@ -3417,6 +3447,274 @@ bool ProcessMessage(CNode* pfrom, const std::string& msg_type, CDataStream& vRec
3447 return true;
3448 }
3449
3450+ std::chrono::microseconds current_time = GetTime<std::chrono::microseconds>();
3451+
3452+ // Record an (expected) reconciliation request with parameters to respond when time comes.
3453+ // All initial reconciliation responses will be done at the same time to prevent tx-related privacy leaks.
3454+ if (strCommand == NetMsgType::REQRECON) {
3455+ if (!pfrom->m_recon_state->sender) return true;
One more potential issue: If I understand it correctly, m_recon_state is null if our peer does not signal Erlay support. So we should check if m_recon_state is non-null before accessing its members here.
Otherwise an attacker could decline to signal Erlay but then send Erlay-specific messages to crash our node. The same applies to the other new messages (NetMsgType::SKETCH etc.) below.
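A minimal model of the suggested guard (toy types; the real check would live in ProcessMessage, and the names mirror the PR but are illustrative):

```cpp
#include <cassert>
#include <memory>

// Erlay-specific messages from a peer that never negotiated reconciliation
// (m_recon_state == nullptr) must be ignored, not dereferenced.
struct ReconState {
    bool sender = false;
};
struct Peer {
    std::unique_ptr<ReconState> m_recon_state; // null until sendrecon received
};

// Returns true iff a REQRECON from this peer should be processed further.
bool ShouldProcessReqRecon(const Peer& peer)
{
    if (!peer.m_recon_state) return false;         // peer never signaled Erlay
    if (!peer.m_recon_state->sender) return false; // peer is not a recon sender
    return true;
}
```

This is the "2 checks" naumenkogs mentions below: one for the null state, one for the sender flag.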
naumenkogs
commented at 0:50 am on April 22, 2020:
Definitely an issue, will address!
I had asserts all over and was replacing them with returns, but this one requires 2 checks: for m_recon_state and for sender :)
DrahtBot added the label
Needs rebase
on Apr 25, 2020
naumenkogs force-pushed
on Apr 27, 2020
naumenkogs force-pushed
on Apr 27, 2020
naumenkogs force-pushed
on Apr 27, 2020
naumenkogs force-pushed
on Apr 27, 2020
DrahtBot removed the label
Needs rebase
on Apr 27, 2020
naumenkogs force-pushed
on Apr 27, 2020
naumenkogs force-pushed
on Apr 27, 2020
naumenkogs force-pushed
on Apr 28, 2020
DrahtBot added the label
Needs rebase
on Apr 29, 2020
naumenkogs
commented at 2:15 pm on May 7, 2020:
member
This should be in reviewable state now: I split commits more, fixed all the tests, made it more stable, addressed feedback by @mzumsande, updated with new helpers from minisketch.
Travis failed last time because of a little issue with the minisketch release build, but it should work just fine for local testing and experiments!
Will rebase again when #18044 is rebased to not create much discrepancy there.
naumenkogs referenced this in commit
311c085cab
on Jun 6, 2020
in
src/protocol.h:410
in
5f81325bf1 (outdated)
407 {
408 UNDEFINED = 0,
409 MSG_TX = 1,
410 MSG_BLOCK = 2,
411- // The following can only occur in getdata. Invs always use TX or BLOCK.
412+ MSG_WTX = 5, // Defined in BIP XXX
233@@ -234,6 +234,46 @@ extern const char *GETBLOCKTXN;
234 * @since protocol version 70014 as described by BIP 152
235 */
236 extern const char *BLOCKTXN;
237+/**
238+ * Indicates that a node prefers to relay transactions via wtxid, rather than
239+ * txid.
240+ * @since protocol version 70016 as described by BIP XXX.
DrahtBot removed the label
Needs rebase
on Jul 26, 2020
naumenkogs
commented at 7:35 am on July 28, 2020:
member
Rebased on #18044, but we will probably get #19184 in before Erlay, so please review that first.
Although this PR is in a reviewable state right now!
This PR currently doesn’t pass Travis tests because of a minisketch build issue introduced by a recent patch. It should work locally and shouldn’t stop people from reviewing this work :)
DrahtBot added the label
Needs rebase
on Jul 30, 2020
naumenkogs force-pushed
on Aug 27, 2020
naumenkogs force-pushed
on Aug 27, 2020
DrahtBot removed the label
Needs rebase
on Aug 27, 2020
DrahtBot added the label
Needs rebase
on Sep 2, 2020
practicalswift
commented at 9:08 am on October 2, 2020:
contributor
It seems like the tests are failing and that a rebase is needed.
I’m super excited about Erlay! :) What is the plan ahead?
naumenkogs
commented at 11:26 am on October 2, 2020:
member
@practicalswift we kinda agreed that #19988 is a prerequisite, I hope Erlay becomes a priority right after :)
naumenkogs marked this as a draft
on Oct 5, 2020
naumenkogs force-pushed
on Nov 16, 2020
naumenkogs force-pushed
on Nov 16, 2020
naumenkogs force-pushed
on Nov 16, 2020
naumenkogs force-pushed
on Nov 16, 2020
naumenkogs force-pushed
on Nov 16, 2020
naumenkogs force-pushed
on Nov 17, 2020
naumenkogs force-pushed
on Nov 17, 2020
naumenkogs marked this as ready for review
on Nov 17, 2020
naumenkogs
commented at 1:34 pm on November 17, 2020:
member
Rebased, added a bunch of comments and restructured commits for better review.
Marking it as reviewable now, although my fight with Travis builds is not over yet :)
DrahtBot removed the label
Needs rebase
on Nov 17, 2020
naumenkogs force-pushed
on Nov 18, 2020
naumenkogs force-pushed
on Nov 19, 2020
DrahtBot added the label
Needs rebase
on Nov 19, 2020
naumenkogs
commented at 5:49 pm on November 19, 2020:
member
@sipa and I decided to switch to “extension sketches”: instead of requesting a bisection, someone who failed to decode the difference from the initial sketch may ask for an extension of the initial sketch.
A responder then has to send only syndromes of higher order.
The main motivation is that bisection seems to add a lot of complexity (see b5c92a41e4cc0599504cf838d20212f1a403e573).
Extensions have all the same properties, but they’re more CPU-costly. We think it’s fine to ignore this disadvantage for sketches of small capacity (which is what we expect to have here).
So, for now, please review up to a30e861c9cada8fbc400d07255516839480d9e9d (but also feel free to review further if you’re curious).
I will replace bisection with extensions and update the BIP soon.
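To illustrate the extension idea without the GF(2^m) machinery minisketch actually uses, here is a toy analogue with integer power sums: the initial capacity-1 "sketch" is just the first syndrome, and the extension adds only the next-order syndrome, enough to solve for a difference of two elements. This assumes, for simplicity, that the remote set is a subset of the local one; it is a conceptual sketch, not minisketch's algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy sketch: syndromes are integer power sums of the set elements.
// Capacity 1 = {sum x}; the "extension" adds {sum x^2} only.
struct ToySketch {
    long long s1 = 0, s2 = 0;
    void Add(long long x)
    {
        s1 += x;     // first-order syndrome (initial sketch)
        s2 += x * x; // second-order syndrome (sent only as an extension)
    }
};

// Decode a difference of exactly two elements from syndrome differences:
// p1 = x + y, p2 = x^2 + y^2  =>  x, y are the roots of
// t^2 - p1*t + (p1^2 - p2)/2.
std::vector<long long> DecodeTwo(long long p1, long long p2)
{
    long long prod = (p1 * p1 - p2) / 2;                     // x*y
    long long disc = p1 * p1 - 4 * prod;                     // (x - y)^2
    long long root = std::llround(std::sqrt((double)disc));
    return {(p1 - root) / 2, (p1 + root) / 2};
}
```

If the difference is a single element, the first syndrome alone reveals it (p1 = x), so the initial sketch suffices; only when decoding fails does the requestor ask for the higher-order syndrome, mirroring "the responder then has to send only syndromes of higher order".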
naumenkogs force-pushed
on Nov 19, 2020
DrahtBot removed the label
Needs rebase
on Nov 19, 2020
DrahtBot added the label
Needs rebase
on Nov 20, 2020
naumenkogs
commented at 9:39 pm on November 21, 2020:
member
Okay, now the BIP PR is up-to-date with this code, and this stuff is ready for full review again.
naumenkogs force-pushed
on Nov 21, 2020
DrahtBot removed the label
Needs rebase
on Nov 21, 2020
DrahtBot added the label
Needs rebase
on Nov 25, 2020
naumenkogs force-pushed
on Nov 25, 2020
DrahtBot removed the label
Needs rebase
on Nov 25, 2020
naumenkogs force-pushed
on Nov 30, 2020
ariard
commented at 1:41 pm on December 1, 2020:
member
Starting the review by digging into the BIP :)
A few remarks:
“improves privacy as a side-effect”: it would be great to explain and document this claim; maybe add a section in Rationale?
“truncated transaction IDs”: AFAIU, Erlay introduces new shortened transaction identifiers for regular tx-relay; maybe it would be better to call them “truncated WTXIDs”? If BIP 330 implies BIP 339, it could be better to remove references to transaction ID, or at least where it’s ambiguous
You mentioned switching to extension sketches and said the updated BIP was ready for review; I think you need to update the “intended protocol flow” schema, remove reqbisec and update Rationale?
You might move the “Local State” section before the messages; it could be easier to explain the q-coefficient before using it in New messages
naumenkogs force-pushed
on Dec 1, 2020
DrahtBot added the label
Needs rebase
on Dec 2, 2020
naumenkogs force-pushed
on Dec 4, 2020
DrahtBot removed the label
Needs rebase
on Dec 4, 2020
DrahtBot added the label
Needs rebase
on Dec 7, 2020
in
src/net_processing.cpp:2379
in
a6f979194a (outdated)
2374+ bool be_recon_requestor, be_recon_responder;
2375+ // Currently reconciliation requests flow only in one direction inbound->outbound.
2376+ if (pfrom.IsFullOutboundConn()) {
2377+ be_recon_requestor = true;
2378+ be_recon_responder = false;
2379+ } else {
sdaftuar
commented at 8:12 pm on December 10, 2020:
We should probably not send a sendrecon message to block-relay only peers? Here we should just check to see if it’s an inbound peer, right?
in
src/protocol.h:268
in
a6f979194a (outdated)
259@@ -260,6 +260,14 @@ extern const char* CFCHECKPT;
260 * @since protocol version 70016 as described by BIP 339.
261 */
262 extern const char* WTXIDRELAY;
263+/**
264+ * Contains 2 1-byte bools, a 4-byte version number and an 8-byte salt.
265+ * Indicates that a node is willing to participate in transaction reconciliation,
266+ * either as a sender or a receiver.
267+ * The salt is used to compute short txids needed for efficient reconciliation.
268+ * @since protocol version 80001 as described by BIP 330
sdaftuar
commented at 8:15 pm on December 10, 2020:
I believe BIP 330 doesn’t reference protocol version 80001, and the code in this commit seems to send it as long as the peer has version 70016 or higher.
in
src/net_processing.cpp:162
in
2ac6f49703 (outdated)
145@@ -146,6 +146,10 @@ static constexpr uint32_t MAX_GETCFILTERS_SIZE = 1000;
146 static constexpr uint32_t MAX_GETCFHEADERS_SIZE = 2000;
147 /** the maximum percentage of addresses from our addrman to return in response to a getaddr message. */
148 static constexpr size_t MAX_PCT_ADDR_TO_SEND = 23;
149+/** Static component of the salt used to compute short txids for transaction reconciliation. */
150+static const std::string RECON_STATIC_SALT = "Tx Relay Salting";
151+/** Default coefficient used to estimate set difference for tx reconciliation. */
152+static constexpr double DEFAULT_RECON_Q = 0.02;
sdaftuar
commented at 8:17 pm on December 10, 2020:
Just a note, I think the BIP draft refers to a default value of q=0.1. Perhaps if we end up leaving our default at 0.02, then we could update the BIP to have the same default value suggestion.
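For intuition on what q does, here is a toy version of the set-difference estimate it feeds into: the expected difference is the absolute size gap plus q times the smaller set, so q captures how much two equally-sized sets still disagree. The exact formula and the +1 floor here are illustrative; BIP 330 is normative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Toy estimate of the reconciliation set difference, used to size the
// initial sketch. q weighs the residual disagreement between sets of
// similar size; this is an illustration, not the BIP's exact formula.
uint32_t EstimateDifference(uint32_t local_set_size, uint32_t remote_set_size, double q)
{
    uint32_t gap = local_set_size > remote_set_size
                       ? local_set_size - remote_set_size
                       : remote_set_size - local_set_size;
    double weighted = q * std::min(local_set_size, remote_set_size);
    // +1 so we never request a capacity-0 sketch.
    return gap + (uint32_t)std::ceil(weighted) + 1;
}
```

With q = 0.02 (this PR's default), two equally-sized 100-element sets get a small capacity budget; a larger q, like the BIP draft's 0.1, would pad the sketch more against decode failures at the cost of bandwidth.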
in
src/net_processing.cpp:2620
in
2ac6f49703 (outdated)
2616@@ -2559,6 +2617,62 @@ void PeerManager::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDat
2617 return;
2618 }
26192620+ // Received from an inbound peer planning to reconcilie transactions with us, or
sdaftuar
commented at 8:19 pm on December 10, 2020:
spelling nit: should say “reconcile”
in
src/net_processing.cpp:2944
in
2ac6f49703 (outdated)
2622+ // If received from outgoing, adds the peer to the reconciliation queue.
2623+ // Feature negotiation of tx reconciliation should happen between VERSION and
2624+ // VERACK, to avoid relay problems from switching after a connection is up.
2625+ if (msg_type == NetMsgType::SENDRECON) {
2626+ if (!pfrom.m_tx_relay) return;
2627+ if (peer->m_recon_state != nullptr) return; // Do not support reconciliation salt/version updates.
sdaftuar
commented at 8:21 pm on December 10, 2020:
Should we only consider negotiation to have succeeded if we’ve already received a wtxidrelay message from this peer? If so, I think we need to specify the order in which wtxidrelay and sendrecon get sent during handshaking in the BIP.
in
src/net_processing.cpp:2987
in
2ac6f49703 (outdated)
2633+ if (recon_version != 1) return;
2634+
2635+ // Do not flood through inbound connections which support reconciliation to save bandwidth.
2636+ bool flood_to = false;
2637+ if (pfrom.IsInboundConn()) {
2638+ if (!recon_sender) return;
sdaftuar
commented at 8:25 pm on December 10, 2020:
A comment here explaining this setup for an inbound peer that sends us a SENDRECON with sender=false would be helpful. Here, it seems that we don’t set up any reconciliation to occur with an inbound peer that doesn’t want to reconcile with us, because we only send reconciliations to our outbound peers, and require that inbounds who want to reconcile be the requestor.
Perhaps a comment that reminds the reader that if we give up on setting up reconciliation with a peer, that we default to flooding to that peer?
in
src/net_processing.cpp:2990
in
2ac6f49703 (outdated)
2666+ peer->m_recon_state->m_salt = full_salt;
2667+
2668+ // Reconcile with all outbound peers supporting reconciliation (even if we flood to them),
2669+ // to not miss transactions they have for us but won't flood.
2670+ if (peer->m_recon_state->m_responder) {
2671+ m_recon_queue.push_back(&pfrom);
sdaftuar
commented at 8:40 pm on December 10, 2020:
I think if an inbound peer advertised support for both requesting sketches and responding to sketch requests, that we would then add them to our recon queue, but the comment here says that we are only doing this for outbound peers. Can you clarify the correct behavior in that situation?
naumenkogs
commented at 5:19 pm on December 11, 2020:
Probably gonna be restrictive for now.
naumenkogs
commented at 5:20 pm on December 11, 2020:
Updating the code and the BIP
in
src/net_processing.cpp:836
in
5db8069b56 (outdated)
sdaftuar
commented at 8:51 pm on December 10, 2020:
If I read correctly, even in the final version of this PR this function is only invoked with flooding=true. Can we simplify this function a bit to reflect that?
in
src/net_processing.cpp:840
in
5db8069b56 (outdated)
sdaftuar
commented at 8:52 pm on December 10, 2020:
Style comment: I’ve noticed in a few places that you have one-line if statements; our style guide requires that if the body of the if isn’t on the same line, then we must use curly braces. So that’s something to fix up throughout the new code.
naumenkogs
commented at 5:10 pm on December 11, 2020:
Fixed this in the first batch of commits, will also fix for the rest.
in
src/net_processing.cpp:843
in
5db8069b56 (outdated)
838+ size_t result = 0;
839+ m_connman.ForEachNode([flooding, &result](CNode* pnode) {
840+ if (!pnode->m_tx_relay)
841+ return;
842+ if (!pnode->IsFullOutboundConn() && !pnode->IsManualConn())
843+ return;
sdaftuar
commented at 8:55 pm on December 10, 2020:
I haven’t quite thought through exactly how flooding works in Erlay, but I believe this logic is written so that if we configure a node to have 8 manual outbound peers in addition to the 8 full outbound relay peers, and the 8 manual peers are all assigned to be the flood peers, and then they all go offline, we would be left with 0 flood peers? Is it a problem if we are using fewer than our 8 flood peers?
If it is, perhaps we should be considering whether to enable flooding on a connection after an existing flood-peer disconnects?
naumenkogs
commented at 8:29 am on December 11, 2020:
Haha, I was planning to bring this up later in the review cycle. I didn’t expect someone to find it that fast :)
Yeah, what you’re describing is possible.
Perhaps we should have a loop checking that we have enough (8) flood outbound peers, and if not, switch some existing recon peers to flooding.
Will add a commit for this.
in
src/net_processing.cpp:2378
in
9347a73f01 (outdated)
2369@@ -2370,6 +2370,28 @@ static void ProcessGetCFCheckPt(CNode& peer, CDataStream& vRecv, const CChainPar
2370 connman.PushMessage(&peer, std::move(msg));
2371 }
23722373+/**
2374+ * Announce transactions a peer is missing after reconciliation is done.
2375+ * No need to add transactions to peer's filter or do checks
2376+ * because it was already done when adding to the reconciliation set.
2377+ */
2378+void static AnnounceTxs(std::vector<uint256> remote_missing_wtxids, CNode* pto, CNetMsgMaker msgMaker, CConnman* connman)
sdaftuar
commented at 9:00 pm on December 10, 2020:
I assume you could pass this vector by (const) reference? (And maybe all the other stuff can be passed by reference too…).
in
src/net_processing.cpp:2694
in
9347a73f01 (outdated)
sdaftuar
commented at 9:10 pm on December 10, 2020:
nit: Here and in many of the loops in this PR, I think you could make these const uint256& wtxid : ..., to save a copy. Won’t point it out in every spot but probably could be cleaned up throughout the PR at some point.
Or even const auto& wtxid : ... to be sure no copy is created, regardless of the type of the container.
sdaftuar
commented at 9:14 pm on December 10, 2020:
member
Concept ACK on Erlay – thanks for working on this.
My review approach is to focus on the non-minisketch portions of this PR for now. I’ve read through the BIP draft and gone through the first 5 commits. As this is a big PR I think it’ll take me a while to go through the rest, so I’m leaving these comments for now and will come back to this later when I have time for additional review.
naumenkogs force-pushed
on Dec 11, 2020
in
src/net_processing.cpp:1064
in
e8fd1cb590 (outdated)
sdaftuar
commented at 1:27 pm on December 11, 2020:
Here, if a peer doesn’t support reconciliation, then the flood bool doesn’t matter, we’ll always flood – right? If that’s correct, perhaps we could either set flood=true for all non-reconciliation peers, or leave a comment that explains that the parameter doesn’t matter in this case.
in
src/net_processing.cpp:2298
in
e8fd1cb590 (outdated)
sdaftuar
commented at 1:29 pm on December 11, 2020:
We track where we got the orphan from; should we use that to determine whether to flood (rather than setting flood=true for all orphans)?
naumenkogs
commented at 12:40 pm on December 16, 2020:
I think you’re right, although we should be extra careful to make sure no privacy leak is introduced w.r.t topology.
in
src/net_processing.cpp:523
in
ed736b0c38 (outdated)
458+ * Salt is generated randomly per-connection to prevent linking of
459+ * connections belonging to the same physical node.
460+ * Also, salts should be different per-connection to prevent halting
461+ * of relay of particular transactions due to collisions in short IDs.
462+ */
463+ uint64_t m_local_recon_salt;
sdaftuar
commented at 1:37 pm on December 11, 2020:
Can we make this const and just always initialize it when a Peer object is created? Then we could avoid having to worry about locking issues with this (right now it seems like there should be some kind of lock that guards it).
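A sketch of that suggestion: a const member initialized at Peer construction can never change afterwards, so reads need no lock. GetRand64 here is a stand-in for Bitcoin Core's RNG, not the real API:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Stand-in for the codebase's randomness source.
static uint64_t GetRand64()
{
    static std::mt19937_64 rng{std::random_device{}()};
    return rng();
}

struct Peer {
    // const: set once on construction, immutable (and lock-free to read)
    // for the lifetime of the connection.
    const uint64_t m_local_recon_salt;
    Peer() : m_local_recon_salt(GetRand64()) {}
};
```

Since the salt is per-connection (to prevent linking connections of the same node and to avoid persistent short-id collisions), generating it in the constructor also guarantees it is never left uninitialized.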
in
src/net_processing.cpp:806
in
489c98c077 (outdated)
516+ */
517+ double m_local_q;
518+ };
519+
520+ /// nullptr if we're not reconciling (neither passively nor actively) with this peer.
521+ std::unique_ptr<ReconState> m_recon_state;
sdaftuar
commented at 1:40 pm on December 11, 2020:
I think this probably should have a lock that protects access to this object?
in
src/net_processing.cpp:535
in
489c98c077 (outdated)
474+ * asymmetrical, there is always a requestor and a responder. At the end of the
475+ * sequence, nodes are supposed to exchange transactions, so that both of them
476+ * have all relevant transactions. For more protocol details, refer to BIP-0330.
477+ */
478+ struct ReconState {
479+ ReconState(){};
sdaftuar
commented at 1:42 pm on December 11, 2020:
nit: perhaps the constructor can take initial values for all the members, so there’s no risk that the members go uninitialized? It looks like the way you are currently using this is to instantiate the object and immediately set everything, so I think this would be an easy change.
in
src/net_processing.h:162
in
489c98c077 (outdated)
148+ * Transaction reconciliation should happen with peers in the same order,
149+ * because the efficiency gain is the highest when reconciliation set difference
150+ * is predictable. This queue is used to maintain the order of
151+ * peers chosen for reconciliation.
152+ */
153+ std::deque<CNode*> m_recon_queue;
sdaftuar
commented at 1:43 pm on December 11, 2020:
I think this probably needs a lock to synchronize access to this?
In commit “Handle reconciliation support announcement”:
The comment about m_recon_queue needing a lock doesn’t seem resolved?
in
src/net_processing.cpp:4948
in
e8fd1cb590 (outdated)
4527@@ -4504,11 +4528,11 @@ bool PeerManager::SendMessages(CNode* pto)
4528 bool fSendTrickle = pto->HasPermission(PF_NOBAN);
4529 if (pto->m_tx_relay->nNextInvSend < current_time) {
4530 fSendTrickle = true;
4531- if (pto->IsInboundConn()) {
4532- pto->m_tx_relay->nNextInvSend = std::chrono::microseconds{m_connman.PoissonNextSendInbound(count_microseconds(current_time), INVENTORY_BROADCAST_INTERVAL)};
4533- } else {
4534- // Use half the delay for outbound peers, as there is less privacy concern for them.
4535+ if (peer->m_recon_state || !pto->IsInboundConn()) {
sdaftuar
commented at 3:14 pm on December 11, 2020:
So inbound peers that are reconciling with us effectively get their own poisson timer – can you explain why this is an ok change?
It seems like an inbound peer that constantly tries to reconcile with us could be a more effective spy than before this change, but maybe I’m missing something.
sdaftuar
commented at 3:17 pm on December 11, 2020:
Also, it might be helpful for code readers if the way we check to see if we’re reconciling with a peer is to call a function that explains that better (peer->ReconEnabled() or something), than checking to see if the m_recon_state object is a nullptr or not. But that’s just a style nit, you can take it or leave it.
naumenkogs
commented at 2:25 pm on March 15, 2021:
It seems like an inbound peer that constantly tries to reconcile with us could be a more effective spy than before this change, but maybe I’m missing something.
We don’t respond to reconciliations right away, there is a shared timer for those responses. That’s why I thought it’s fine to reduce the delay here.
in
src/net_processing.cpp:5043
in
e8fd1cb590outdated
4630+ // Always flood to non-reconciliation nodes supporting tx relay.
4631+ // For reconciliation nodes, flood if flood_to is set up and transaction is meant for flooding.
4632+ vInv.push_back(inv);
4633+ else {
4634+ // Storing to populate the reconciliation set.
4635+ txs_to_reconcile.push_back(txinfo.tx->GetWitnessHash());
sdaftuar
commented at 3:24 pm on December 11, 2020:
If we’re reconciling with a peer, does it make sense to limit this loop to INVENTORY_BROADCAST_MAX transactions? That is primarily a bandwidth throttling behavior, not a privacy behavior – and since reconciliation is already bandwidth reducing itself, it might be beneficial to throw more things into the sketch sooner?
On the other hand, maybe it works out just fine if both parties are throttling like this and will have similarly sized sketches anyway… Depends a bit on how much variation there might be between the number of times each side’s poisson timer could fire in the window between reconciliations?
in
src/net_processing.cpp:4880
in
e8fd1cb590outdated
4680+ // We don't care about a laggy peer (1) because we probably can't help them even if we flood transactions.
4681+ // However, exploiting (2) should not prevent us from relaying certain transactions.
4682+ // Since computing a sketch over too many elements is too expensive, we will just flood some transactions here.
4683+ std::vector<uint256> txs_to_flood = std::vector<uint256>(txs_to_reconcile.end() - recon_set_overflow, txs_to_reconcile.end());
4684+ txs_to_reconcile.resize(txs_to_reconcile.size() - recon_set_overflow);
4685+ AnnounceTxs(txs_to_flood, *pto, msgMaker, m_connman);
sdaftuar
commented at 3:29 pm on December 11, 2020:
Would it make more sense perhaps to announce the earliest transactions in the queue, rather than the most recent ones?
It’s possible this logic is fairly optimal, but here is what I was concerned about:
- an inbound adversary could negotiate erlay support but then never send a reconciliation request, and so this queue fills up until we start just inv’ing them everything new anyway. Since we reduce the poisson timer for peers we think we’re reconciling with, this could be a way to game a quicker INV time to be a better spy node.
- this doesn’t seem to actually work because they’d have to wait for 2 poisson timers to go off before getting the prior batch of announcements, which is approximately the same (other than integer division) as the same inv timer that inbound peers normally get. Though perhaps this would get them their own poisson timer, rather than having to use the bucket that all inbounds share? That seems not great – an adversary could use multiple inbound connections this way to effectively get a small poisson timer (on average).
- Countervailing idea is that if the peer is just held up for some reason, the older transactions are more likely to reconcile successfully than the newest ones, so it’s probably bandwidth minimizing in the honest case to hang on to the older ones in the reconciliation set.
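The two overflow policies being weighed here can be sketched in a few lines. This is an illustrative toy (integers stand in for wtxids, and `split_for_flooding` is a hypothetical helper, not the PR’s API): the diff floods the tail of the queue (newest entries), while the alternative raised above would flood the head (oldest entries).

```python
def split_for_flooding(txs_to_reconcile, overflow, flood_newest=True):
    """Return (kept_for_reconciliation, flooded) after trimming `overflow` items."""
    if flood_newest:
        # Diff's behavior: keep the older entries in the reconciliation set.
        return txs_to_reconcile[:-overflow], txs_to_reconcile[-overflow:]
    # Alternative: flood the oldest, keep the newest for reconciliation.
    return txs_to_reconcile[overflow:], txs_to_reconcile[:overflow]

queue = [1, 2, 3, 4, 5]

kept, flooded = split_for_flooding(queue, 2)
assert (kept, flooded) == ([1, 2, 3], [4, 5])

kept, flooded = split_for_flooding(queue, 2, flood_newest=False)
assert (kept, flooded) == ([3, 4, 5], [1, 2])
```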
in
src/net_processing.cpp:135
in
9832017da2outdated
122@@ -123,12 +123,12 @@ static constexpr std::chrono::hours AVG_LOCAL_ADDRESS_BROADCAST_INTERVAL{24};
123 static constexpr std::chrono::seconds AVG_ADDRESS_BROADCAST_INTERVAL{30};
124 /** Average delay between trickled inventory transmissions in seconds.
125 * Blocks and peers with noban permission bypass this, outbound peers get half this delay. */
126-static const unsigned int INVENTORY_BROADCAST_INTERVAL = 5;
127+static const unsigned int INVENTORY_BROADCAST_INTERVAL = 2;
sdaftuar
commented at 3:32 pm on December 11, 2020:
FYI, I believe that because this number used to be odd, the ratio of announcement delay to inbound versus outbound peers will change from 2.5 in the old code to 2 in the new code. Not sure if that is cause for any concern though.
naumenkogs
commented at 1:44 pm on December 16, 2020:
Yeah, so I think the ratio matters most. It was 2.5 and now it’s 2, so a bit different.
But I’m also not 100% if this difference matters, while preserving 2.5 would require operating with floats now, so I decided to not, at least yet.
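The integer-division effect behind this ratio change can be checked directly (delay values taken from the diff: the inbound interval was 5s and is now 2s, with outbound peers getting half):

```python
# Old constants: outbound delay 5 // 2 == 2s, so inbound/outbound ratio is 5 / 2 == 2.5.
# New constants: outbound delay 2 // 2 == 1s, so the ratio drops to 2 / 1 == 2.0.
old_inbound, new_inbound = 5, 2

assert old_inbound // 2 == 2 and old_inbound / (old_inbound // 2) == 2.5
assert new_inbound // 2 == 1 and new_inbound / (new_inbound // 2) == 2.0
```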
in
src/net_processing.cpp:140
in
9832017da2outdated
128 /** Maximum rate of inventory items to send per second.
129 * Limits the impact of low-fee transaction floods. */
130 static constexpr unsigned int INVENTORY_BROADCAST_PER_SECOND = 7;
131 /** Maximum number of inventory items to send per transmission. */
132-static constexpr unsigned int INVENTORY_BROADCAST_MAX = INVENTORY_BROADCAST_PER_SECOND * INVENTORY_BROADCAST_INTERVAL;
133+static constexpr unsigned int INVENTORY_BROADCAST_MAX = INVENTORY_BROADCAST_PER_SECOND * INVENTORY_BROADCAST_INTERVAL * 4;
sdaftuar
commented at 3:34 pm on December 11, 2020:
Can you explain why this needs to change, and how you arrived at multiplying by 4?
naumenkogs
commented at 8:57 am on December 17, 2020:
INVENTORY_BROADCAST_PER_SECOND is probably derived from the block size, so it shouldn’t be changed.
INVENTORY_BROADCAST_INTERVAL is just how long we wait between reconciliations with different peers (say 2 seconds).
Now, imagine, a non-reachable node (which only learns from reconciliations), reconciled with a dysfunctional peer. So the stack of what it should learn during the next successful reconciliation is twice the regular size.
And INVENTORY_BROADCAST_MAX should accommodate these cases better, because otherwise the result is, at the least, relay inefficiency.
“*4” is just an intuitive amortization for these cases.
I probably should add the explanation to a commit message once we agree this makes sense.
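For reference, the arithmetic behind the constant as described above (values quoted from the diff; the “*4” is the amortization factor for peers that missed earlier reconciliation rounds):

```python
INVENTORY_BROADCAST_PER_SECOND = 7  # roughly derived from block size; unchanged
INVENTORY_BROADCAST_INTERVAL = 2    # seconds between reconciliation rounds
AMORTIZATION = 4                    # intuitive slack for missed/dysfunctional rounds

INVENTORY_BROADCAST_MAX = (INVENTORY_BROADCAST_PER_SECOND
                           * INVENTORY_BROADCAST_INTERVAL
                           * AMORTIZATION)
assert INVENTORY_BROADCAST_MAX == 56
```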
in
src/net_processing.cpp:177
in
3fc23c90edoutdated
175@@ -176,6 +176,12 @@ static const unsigned int MAX_RECON_SET = 10000;
176 * Less frequent reconciliations would introduce high transaction relay latency.
177 */
178 static constexpr std::chrono::microseconds RECON_REQUEST_INTERVAL{std::chrono::seconds{16}};
179+/**
180+ * Interval between responding to peers' reconciliation requests.
181+ * We don't respond to reconciliation requests right away because that would enable monitoring
182+ * when we receive transactions (privacy leak).
sdaftuar
commented at 3:46 pm on December 11, 2020:
Ah, I didn’t realize we respond to these asynchronously. If this isn’t already in the BIP, perhaps it would be helpful to specify there that reconciliation responses are not guaranteed to be sent out in sync with other messages peers might send (such as pings, getdata, etc)?
in
src/net_processing.cpp:4181
in
3fc23c90edoutdated
sdaftuar
commented at 3:51 pm on December 11, 2020:
Should we snapshot our reconciliation set as of when we receive this request? If we have a fixed delay and no snapshot, then I’m not sure how much privacy we gain by the 2 second delay (other than from throttling reconciliation requests from a single peer).
naumenkogs
commented at 9:20 am on December 17, 2020:
We gain privacy by not allowing 2 peers to monitor our mempool, because we respond to them at the same time (every 2 seconds).
Snapshotting earlier would hide the transactions received during the last 2 seconds and disable this kind of monitoring, but then reconciliations would not be so “fresh” (txs received during the last 2s go into the next one).
Not sure this kind of monitoring is that dangerous, especially since we also have a Poisson on adding to recon sets.
in
src/net_processing.cpp:197
in
43f06991b5outdated
194+* but the result will be nonsense (false-positive decoding).
195+* Given this coef, a false positive probability will be of 1 in 2**coef.
196+*/
197+static constexpr unsigned int RECON_FALSE_POSITIVE_COEF = 16;
198+static_assert(RECON_FALSE_POSITIVE_COEF <= 256,
199+ "Reducing reconciliation false positives beyond 1 in 2**256 is not supported");
sdaftuar
commented at 3:57 pm on December 11, 2020:
We should try to figure out a better way to organize all these constants and the complexity of the logic away from the rest of net_processing – not sure yet what to suggest but wanted to flag that this is now quite a lot of esoteric stuff we’re adding to the top of this file.
naumenkogs
commented at 9:22 am on December 17, 2020:
Yeah, definitely, I have the same feel. Perhaps it better belongs to ReconState{}.
in
src/net_processing.cpp:5180
in
5162519249outdated
4873+ // Message: reconciliation response
4874+ //
4875+ if (peer->m_recon_state) {
4876+ // Respond to a requested reconciliation to enable efficient transaction exchange.
4877+ // Respond only periodically to a) limit CPU usage for sketch computation,
4878+ // and, b) limit transaction posesssion privacy leak.
sdaftuar
commented at 4:14 pm on December 11, 2020:
spelling nit: possession
in
src/net_processing.cpp:4888
in
5162519249outdated
4883+ if (response_sketch != nullptr) {
4884+ // It is possible to create a sketch if we had at least one transaction to send to the peer.
4885+ size_t ser_size = minisketch_serialized_size(response_sketch);
4886+ uint8_t skdata[MAX_SKETCH_CAPACITY * BYTES_PER_SKETCH_CAPACITY];
4887+ minisketch_serialize(response_sketch, skdata);
4888+ response_skdata.resize(ser_size);
sdaftuar
commented at 4:16 pm on December 11, 2020:
Do we need to invoke minisketch_destroy on the response_sketch to avoid leaking memory?
I’m also confused about why we need to serialize the sketch into a temporary array and then move it into another vector before sending our response.
in
src/net_processing.cpp:4894
in
5162519249outdated
sdaftuar
commented at 4:20 pm on December 11, 2020:
I thought we needed to hang on to a snapshot of this in order to respond to sketch-extension requests?
EDIT: Ah, this happens in a later commit.
in
src/net_processing.cpp:742
in
e0627a7473outdated
689+ */
690+ void FinalizeReconciliation(bool clear_local_set, LocalQAction action, size_t actual_local_missing, size_t actual_remote_missing)
691+ {
692+ // According to the erlay spec, reconciliation is initiated by inbound peers.
693+ if (m_sender) {
694+ assert(m_incoming_recon != ReconPhase::NONE);
sdaftuar
commented at 5:42 pm on December 11, 2020:
This assert worried me a bit, since I figured that the FinalizeReconciliation could be triggered by a peer with a reconcildiff message without having sent a previous reqreconcil. It looks like there is a guard to prevent that from happening, but it’s not clear to me that assert() is what we want here.
naumenkogs force-pushed
on Dec 11, 2020
in
src/net_processing.cpp:2970
in
60a723ee67outdated
2650+ // We currently don't support reconciliations with outbound peers which
2651+ // don't want to be reconciliation responders (send us their sketches),
2652+ // or want to be reconciliation senders (request our sketches).
2653+ // Just ignore SENDRECON and use normal flooding for transaction relay with them.
2654+ if (recon_sender) return;
2655+ if (!recon_responder) return;
sdaftuar
commented at 5:47 pm on December 11, 2020:
So – this logic means that we won’t do any reconciliation with peers if their sender/responder settings don’t match exactly what we expect inbound and outbound peers to do. Should we add some sort of acknowledgement message so that both sides know whether reconciliation is enabled on the link?
Our own logic seems to change based on whether we think we’re reconciling with a peer; if some other implementation turns on both sending and responding (for instance) and then we silently ignore all their reconciliation requests, that seems suboptimal for everyone. Likewise, if we think we’ve enabled reconciliation with a peer but it has a different policy, it would be better for our software to know that reconciliation isn’t going to actually happen.
naumenkogs
commented at 3:22 pm on December 16, 2020:
Right, this implementation prioritized simplicity here.
Perhaps, I should add sending RECONACK, and also think whether at least some of the non-matching scenarios are compatible.
I think it’s possible to specify the behavior exactly without the need for an ACK message, but perhaps this is a detail you don’t actually want in the BIP:
- We send (us_sender, us_responder, us_version).
- We receive (they_sender, they_responder, they_version).
- Define out_recon = us_sender && they_responder.
- Define in_recon = they_sender && us_responder.
- If !(out_recon || in_recon), reconciliation is disabled.
- If exactly one of out_recon or in_recon holds, reconciliation (with us as requestor if out_recon, as responder otherwise) is enabled with protocol version min(us_version, they_version).
- If both out_recon and in_recon hold, reconciliation is enabled with version min(us_version, they_version), in some tie-breaking direction (prefer outbound->inbound for example, or pick the one with the lowest nonce, …)?
- If no SENDRECON is received by the time VERACK happens, reconciliation is disabled.
This also means a new recon protocol can be introduced in a backward-compatible way. If the other party announced recon protocol 2, things will just default to protocol 1 without problems (which means forcing any client that supports protocol 2 to also support protocol 1). If a change would be made that is so incompatible that we don’t expect clients to support the old one anymore, a different negotiation message would be used, or 0 could be sent.
In commit “Handle reconciliation support announcement”:
The code would be something like this to accommodate that:
/* must match announcement logic */
static constexpr uint32_t RECON_VERSION = 1;

bool us_sender = !IsInBound(), us_responder = IsInBound();

bool they_sender, they_responder;
vRecv >> they_sender >> they_responder >> recon_version >> remote_salt;
recon_version = std::min(recon_version, RECON_VERSION);
if (recon_version < 1) return;

bool recon_sender = us_sender && they_responder, recon_responder = us_responder && they_sender;
// If we ever announce us_sender && us_responder, this will need tie-breaking (picking the outbound side as sender)
assert(!(recon_sender && recon_responder));
naumenkogs
commented at 11:19 am on March 16, 2021:
sdaftuar
commented at 6:11 pm on December 11, 2020:
I’m wondering if this might be better as a poisson-timer, but I don’t quite understand the implications of all this yet to reach a conclusion. Intuition: with a fixed delay like this, an adversary with 2 inbound connections can use set reconciliation to measure the set difference between the node’s transactions-to-announce in known and arbitrarily precise time slices.
naumenkogs
commented at 3:18 pm on December 16, 2020:
I also don’t have a good answer, but it’s not that simple of an attack: we have Poisson on adding to reconciliation sets.
Having another Poisson here, well, I don’t think it makes anything worse. Maybe you’re right.
in
src/net_processing.cpp:688
in
bcade239e2outdated
sdaftuar
commented at 6:16 pm on December 11, 2020:
Should the m_local_short_id_mapping be cleared out in this function?
naumenkogs
commented at 3:12 pm on December 16, 2020:
Why so early? The point of it is to refer to it later, when we are requested something by short ID.
That’s why it should be kept until FinalizeReconciliation.
Although as currently implemented, I notice there’s a bug, because we may lose short IDs of the transactions received during extension when calling this after the extension is over. I need to fix this part.
in
src/net_processing.cpp:3987
in
19065cb401outdated
4219+
4220+ std::vector<uint8_t> skdata;
4221+ vRecv >> skdata;
4222+
4223+ // Attempt to decode the received sketch with a local sketch.
4224+ if (skdata.size() / BYTES_PER_SKETCH_CAPACITY > MAX_SKETCH_CAPACITY) return;
sdaftuar
commented at 6:19 pm on December 11, 2020:
I think this is a protocol violation and leaves the link in a permanently broken reconciliation state, since a reconcildiff would never be sent. Perhaps we should disconnect the peer if this happens?
naumenkogs
commented at 3:00 pm on December 16, 2020:
Not really permanently broken, they still have a chance to send us a proper sketch. But yeah, either finalizing or disconnecting would work. Making it disconnect for now.
in
src/net_processing.cpp:4232
in
19065cb401outdated
4227+ uint8_t remote_sketch_serialized[MAX_SKETCH_CAPACITY * BYTES_PER_SKETCH_CAPACITY];
4228+ std::copy(skdata.begin(), skdata.end(), remote_sketch_serialized);
4229+ minisketch_deserialize(remote_sketch, remote_sketch_serialized);
4230+
4231+ minisketch* working_sketch; // Contains sketch of the set difference
4232+ minisketch* local_sketch = peer->m_recon_state->ComputeSketch(peer->m_recon_state->m_local_set, remote_sketch_capacity);
sdaftuar
commented at 6:21 pm on December 11, 2020:
I think we need to call minisketch_destroy on these somewhere to avoid a memory leak right?
in
src/net_processing.cpp:4269
in
c14e819d5foutdated
4261@@ -4252,6 +4262,16 @@ void PeerManager::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDat
4262 AnnounceTxs(remote_missing, pfrom, msgMaker, m_connman);
4263 m_connman.PushMessage(&pfrom, msgMaker.Make(NetMsgType::RECONCILDIFF, uint8_t(RECON_SUCCESS), local_missing));
4264 peer->m_recon_state->FinalizeReconciliation(true, Peer::ReconState::LocalQAction::Q_RECOMPUTE, local_missing.size(), remote_missing.size());
4265+ } else {
4266+ // Initial reconciliation failed.
4267+ // Store the received sketch and the local sketch, request extension.
4268+
4269+ assert(remote_sketch != nullptr);
sdaftuar
commented at 6:37 pm on December 11, 2020:
Is it possible for the peer to send us a message that causes remote_sketch to be nullptr here?
naumenkogs
commented at 1:59 pm on December 16, 2020:
Adding protection for this case.
naumenkogs force-pushed
on Dec 17, 2020
in
src/net_processing.cpp:2672
in
9da045b6f5outdated
BIP comment, and commit “Handle reconciliation support announcement”:
Should we use a BIP340 tagged hash here (see TaggedHash() in src/hash.h)?
in
src/net_processing.cpp:490
in
9da045b6f5outdated
485+
486+ /// Whether this peer will respond to reconciliation requests.
487+ bool m_responder;
488+
489+ /**
490+ * Whether we should flood transactions to the peer.
In commit “Handle reconciliation support announcement”:
This comment isn’t entirely clear to me. Flood transactions to the peer, as opposed to what? Given that in the next line you say it’s not in conflict with reconciliation, it’s not clear to me whether this means instead of, or in addition to, reconciliation.
in
src/net_processing.cpp:508
in
9da045b6f5outdated
503+ * These short IDs will be salted so that they are not the same
504+ * across all pairs of peers, because otherwise it would enable network-wide
505+ * collisions which may (intentionally or not) halt relay of certain transactions.
506+ * Both of the peers contribute to the salt.
507+ */
508+ uint256 m_salt;
In commit “Handle reconciliation support announcement”
Is it necessary to use a recursive mutex here? If you can avoid them, Mutex is more efficient and easier to reason about.
naumenkogs
commented at 9:44 am on December 22, 2020:
I’m not 100% sure how to do it right, but simple Mutex doesn’t seem to work when GetFloodingOutboundsCount is called in handling NetMsgType::SENDRECON. It just hangs.
in
src/net_processing.cpp:2663
in
9da045b6f5outdated
2658+ // TODO: Flood only through a limited number of outbound connections.
2659+ flood_to = true;
2660+ }
2661+
2662+ uint64_t local_salt = peer->m_local_recon_salt;
2663+ std::string salt1, salt2;
in
src/net_processing.cpp:171
in
49a18d73dfoutdated
163+ * Sets the bound on the following objects:
164+ * - reconciliation set
165+ * - reconciliation set snapshot
166+ * - reconciliation short-full id mapping
167+ */
168+static const unsigned int MAX_RECON_SET = 10000;
In commit “Distinguish transactions to flood and to reconcile”
Would it make sense to reuse MAX_PEER_TX_ANNOUNCEMENTS here? It was reduced to 5000 in #19988.
in
src/net_processing.cpp:2942
in
9da045b6f5outdated
2625+ // from an outgoing peer demonstrating readiness to do reconciliations.
2626+ // If received from outgoing, adds the peer to the reconciliation queue.
2627+ // Feature negotiation of tx reconciliation should happen between VERSION and
2628+ // VERACK, to avoid relay problems from switching after a connection is up.
2629+ if (msg_type == NetMsgType::SENDRECON) {
2630+ if (!pfrom.m_tx_relay) return;
In commit “Distinguish transactions to flood and to reconcile”:
Nit: coding style (use { … } for multi-line ifs).
in
src/net.h:1022
in
49a18d73dfoutdated
1008@@ -1009,9 +1009,10 @@ class CNode
10091010 mutable RecursiveMutex cs_tx_inventory;
1011 CRollingBloomFilter filterInventoryKnown GUARDED_BY(cs_tx_inventory){50000, 0.000001};
1012- // Set of transaction ids we still have to announce.
1013- // They are sorted by the mempool before relay, so the order is not important.
1014- std::set<uint256> setInventoryTxToSend;
1015+ // Set of transaction ids we still have to announce, and whether we may flood them
1016+ // in case if peer is meant to receive flooding, as opposed to reconcile.
1017+ // Transactions are sorted by the mempool before relay, so the order is not important.
In commit “Distinguish transactions to flood and to reconcile”
Perhaps it’s useful to comment on the exact semantics for the bool parameter here. IIRC it is:
- false: use reconciliation if negotiated, flood otherwise
- true: flood unless reconciliation negotiated and this is not a flooding peer
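Restating those two cases as a truth table makes the asymmetry easier to check. This is a hedged sketch of the semantics as recalled above; `should_flood` is an illustrative helper, not the PR’s API:

```python
def should_flood(flag, recon_negotiated, flooding_peer):
    if not flag:
        # false: use reconciliation if negotiated, flood otherwise
        return not recon_negotiated
    # true: flood unless reconciliation negotiated and this is not a flooding peer
    return not (recon_negotiated and not flooding_peer)

assert should_flood(False, recon_negotiated=False, flooding_peer=False)       # no recon -> flood
assert not should_flood(False, recon_negotiated=True, flooding_peer=False)    # recon negotiated -> reconcile
assert should_flood(True, recon_negotiated=False, flooding_peer=False)        # no recon -> flood
assert should_flood(True, recon_negotiated=True, flooding_peer=True)          # flooding peer -> flood
assert not should_flood(True, recon_negotiated=True, flooding_peer=False)     # reconcile instead
```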
in
src/net_processing.cpp:160
in
80bde2b1aaoutdated
146@@ -146,6 +147,65 @@ static constexpr uint32_t MAX_GETCFILTERS_SIZE = 1000;
147 static constexpr uint32_t MAX_GETCFHEADERS_SIZE = 2000;
148 /** the maximum percentage of addresses from our addrman to return in response to a getaddr message. */
149 static constexpr size_t MAX_PCT_ADDR_TO_SEND = 23;
150+/** Static component of the salt used to compute short txids for transaction reconciliation. */
151+static const std::string RECON_STATIC_SALT = "Tx Relay Salting";
152+/** Used to convert a floating point reconciliation coefficient q to an int for transmission. */
153+static constexpr double Q_PRECISION{2 << 12};
BIP comment: why 12 bits? You could have 16 bits of accuracy with the 2 bytes that it takes.
naumenkogs
commented at 6:23 pm on December 24, 2020:
q is computed as (total_missing - abs(local_missing - remote_missing)) / min_size
So the bounds on q are [0…2] I think?
if abs = 0, total_missing = 2 * min_size (at most) is an upper bound.
If abs !=0, well. Ideally, we need to solve this equation properly :)
So it should be (2 << 14) - 1 I think, assuming [0..2] for now.
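The formula and its fixed-point encoding can be sanity-checked numerically. A hedged sketch following the formula quoted above (Q_PRECISION is `2 << 12` from the diff; `compute_q` and the wire encoding are illustrative, not the PR’s exact API):

```python
Q_PRECISION = 2 << 12  # 8192, as in the diff

def compute_q(local_missing, remote_missing, min_set_size):
    total_missing = local_missing + remote_missing
    return (total_missing - abs(local_missing - remote_missing)) / min_set_size

# Upper bound: abs(...) == 0 with total_missing == 2 * min_size gives q == 2.
assert compute_q(10, 10, 10) == 2.0

# Round-tripping q through a 16-bit wire integer loses at most 1 / Q_PRECISION.
q = compute_q(3, 5, 16)            # (8 - 2) / 16 == 0.375
wire = round(q * Q_PRECISION)      # fits in a uint16 since q <= 2
assert wire <= 0xFFFF
assert abs(wire / Q_PRECISION - q) <= 1 / Q_PRECISION
```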
in
src/net_processing.cpp:188
in
80bde2b1aaoutdated
183+ * when we receive transactions (privacy leak).
184+ */
185+static constexpr std::chrono::microseconds RECON_RESPONSE_INTERVAL{std::chrono::seconds{2}};
186+/** The size of the field, used to compute sketches to reconcile transactions (see BIP-330). */
187+static constexpr unsigned int RECON_FIELD_SIZE = 32;
188+static_assert(RECON_FIELD_SIZE % 8 == 0, "Field size should be divisible by 8");
I’ve PR’ed https://github.com/sipa/minisketch/pull/28 that adds a safer C++ wrapper around the minisketch* type. You may want to use it instead of doing manual memory management for them. It also adds some convenience functions for dealing with fpbits (CreateFP and DecodeFP).
in
test/functional/p2p_erlay.py:45
in
80bde2b1aaoutdated
I’ve PR’ed https://github.com/sipa/minisketch/pull/26 which adds a pure Python (slow) full reimplementation of minisketch. You may want to use that instead (it also supports decoding).
sipa
commented at 1:46 am on December 20, 2020:
member
First set of comments.
naumenkogs force-pushed
on Dec 21, 2020
naumenkogs force-pushed
on Dec 22, 2020
naumenkogs force-pushed
on Dec 23, 2020
naumenkogs force-pushed
on Dec 24, 2020
DrahtBot removed the label
Needs rebase
on Dec 24, 2020
naumenkogs force-pushed
on Dec 27, 2020
DrahtBot added the label
Needs rebase
on Jan 2, 2021
naumenkogs force-pushed
on Jan 9, 2021
DrahtBot removed the label
Needs rebase
on Jan 9, 2021
naumenkogs force-pushed
on Jan 10, 2021
DrahtBot added the label
Needs rebase
on Jan 11, 2021
DrahtBot
commented at 9:34 pm on January 11, 2021:
member
🐙 This pull request conflicts with the target branch and needs rebase.
Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.
in
src/net.h:559
in
f472f3077eoutdated
555@@ -556,9 +556,11 @@ class CNode
556557 mutable RecursiveMutex cs_tx_inventory;
558 CRollingBloomFilter filterInventoryKnown GUARDED_BY(cs_tx_inventory){50000, 0.000001};
559- // Set of transaction ids we still have to announce.
560- // They are sorted by the mempool before relay, so the order is not important.
561- std::set<uint256> setInventoryTxToSend;
562+ // Set of transaction ids we still have to announce, and whether we may flood them:
jnewbery
commented at 10:24 pm on January 17, 2021:
This is no longer a set.
in
src/net_processing.cpp:173
in
f472f3077eoutdated
168+ * This value allows to reconcile ~100 transactions (7 tx/s * 16s) during normal system operation at capacity.
169+ * More frequent reconciliations would cause significant constant bandwidth overhead due to
170+ * reconciliation metadata (sketch sizes etc.), which would nullify the efficiency.
171+ * Less frequent reconciliations would introduce high transaction relay latency.
172+ */
173+static constexpr std::chrono::microseconds RECON_REQUEST_INTERVAL{std::chrono::seconds{16}};
jnewbery
commented at 10:25 pm on January 17, 2021:
in
src/net_processing.cpp:182
in
f472f3077eoutdated
177+ * when we receive transactions (privacy leak).
178+ */
179+static constexpr std::chrono::microseconds RECON_RESPONSE_INTERVAL{std::chrono::seconds{2}};
180+
181+/** Represents a reconciliation result, used to decide what to do when the reconciliation is over. */
182+enum ReconResult {
jnewbery
commented at 10:26 pm on January 17, 2021:
Any reason not to use a bool here?
in
src/net_processing.cpp:2363
in
f472f3077eoutdated
2354@@ -2267,6 +2355,29 @@ static void ProcessGetCFCheckPt(CNode& peer, CDataStream& vRecv, const CChainPar
2355 connman.PushMessage(&peer, std::move(msg));
2356 }
23572358+/**
2359+ * Announce transactions a peer is missing after reconciliation is done.
2360+ * No need to add transactions to peer's filter or do checks
2361+ * because it was already done when adding to the reconciliation set.
2362+ */
2363+void static AnnounceTxs(const std::vector<uint256>& remote_missing_wtxids, CNode& pto, const CNetMsgMaker& msgMaker, CConnman& connman)
jnewbery
commented at 10:29 pm on January 17, 2021:
It’s easy enough to instantiate a new CNetMsgMaker from the CNode, rather than passing it as an argument. If you make this function a member of PeerManagerImpl you can also avoid passing the CConnman&.
in
src/net_processing.cpp:2365
in
f472f3077eoutdated
2360+ * No need to add transactions to peer's filter or do checks
2361+ * because it was already done when adding to the reconciliation set.
2362+ */
2363+void static AnnounceTxs(const std::vector<uint256>& remote_missing_wtxids, CNode& pto, const CNetMsgMaker& msgMaker, CConnman& connman)
2364+{
2365+ if (remote_missing_wtxids.size() != 0) {
jnewbery
commented at 10:30 pm on January 17, 2021:
Consider inverting this if condition to make this a guard clause and avoid deep indentations below.
if (remote_missing_wtxids.size() == 0) return;
in
src/net_processing.cpp:2366
in
f472f3077eoutdated
2361+ * because it was already done when adding to the reconciliation set.
2362+ */
2363+void static AnnounceTxs(const std::vector<uint256>& remote_missing_wtxids, CNode& pto, const CNetMsgMaker& msgMaker, CConnman& connman)
2364+{
2365+ if (remote_missing_wtxids.size() != 0) {
2366+ std::vector<CInv> remote_missing_invs;
jnewbery
commented at 10:31 pm on January 17, 2021:
Perhaps reserve max(size of remote_missing_wtxids, MAX_INV_SIZE) to avoid reallocations.
naumenkogs
commented at 11:25 am on January 19, 2021:
I think this should be min, because the vector never exceeds MAX_INV_SIZE.
jnewbery
commented at 11:54 am on January 19, 2021:
yes, you’re right. Should be min.
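The reserve bound being discussed is a one-liner; Core’s cap on entries per inv message is MAX_INV_SZ (50000), and `min()` is the right clamp because the vector never grows past that bound (a `max()` would just over-allocate). Illustrative sketch, not the PR’s code:

```python
MAX_INV_SZ = 50000  # Core's limit on entries per inv message

def reserve_size(n_missing):
    """How many inv entries to reserve for a remote-missing announcement."""
    return min(n_missing, MAX_INV_SZ)

assert reserve_size(120) == 120              # small batches reserve exactly what's needed
assert reserve_size(1_000_000) == MAX_INV_SZ # never over-reserve past the inv cap
```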
in
src/net_processing.cpp:4182
in
f472f3077eoutdated
3942@@ -3751,6 +3943,165 @@ void PeerManager::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDat
3943 return;
3944 }
39453946+ std::chrono::microseconds current_time = GetTime<std::chrono::microseconds>();
3947+
3948+ // Record an (expected) reconciliation request with parameters to respond when time comes.
3949+ // All initial reconciliation responses will be done at the same time to prevent tx-related privacy leaks.
3950+ if (msg_type == NetMsgType::REQRECON) {
jnewbery
commented at 10:32 pm on January 17, 2021:
A bit of space/comments in this code block would make it easier to read.
in
src/net_processing.cpp:4307
in
f472f3077eoutdated
4080+ return;
4081+ }
4082+
4083+ // Among transactions requested by short ID here, we should send only those transactions
4084+ // sketched (stored in local set snapshot), because otherwise we would leak privacy (mempool content).
4085+ if (msg_type == NetMsgType::RECONCILDIFF) {
jnewbery
commented at 10:33 pm on January 17, 2021:
Again, spacing/comments would make this more legible.
jnewbery
commented at 10:34 pm on January 17, 2021:
sort
in
src/net_processing.h:37
in
f472f3077eoutdated
31@@ -31,6 +32,19 @@ static const bool DEFAULT_PEERBLOOMFILTERS = false;
32 static const bool DEFAULT_PEERBLOCKFILTERS = false;
33 /** Threshold for marking a node to be discouraged, e.g. disconnected and added to the discouragement filter. */
34 static const int DISCOURAGEMENT_THRESHOLD{100};
35+/** The size of the field, used to compute sketches to reconcile transactions (see BIP-330). */
36+static constexpr unsigned int RECON_FIELD_SIZE = 32;
37+static constexpr unsigned int BYTES_PER_SKETCH_CAPACITY = RECON_FIELD_SIZE / 8;
jnewbery
commented at 10:35 pm on January 17, 2021:
This doesn’t need to be in the header, since it’s only used inside net_processing.cpp.
in
src/net_processing.h:280
in
f472f3077eoutdated
277+ /**
278+ * Reconciliation involves computing a space-efficient representation of transaction identifiers (a sketch).
279+ * A sketch has a capacity meaning it allows reconciling at most a certain number of elements. (see BIP-330).
280+ * Considering whether we are going to send a sketch to a peer or use locally, we estimate the set difference.
281+ */
282+ Minisketch ComputeSketch(const std::set<uint256> local_set, uint16_t& capacity)
jnewbery
commented at 10:38 pm on January 17, 2021:
Inside a structure inside Peer feels like the wrong place for a lot of this complex logic. I think ideally, Peer would continue to be a struct (i.e. data members only) and the logic would be contained in a separate module.
jnewbery
commented at 10:39 pm on January 17, 2021:
Don’t use old cs nomenclature for mutexes:
RecursiveMutex m_recon_state_mutex;
Also prefer to use a Mutex over a RecursiveMutex in new code.
naumenkogs
commented at 3:24 pm on January 19, 2021:
I think I can’t make it non-recursive because of the call inside GetFloodingOutboundsCount. How would you suggest to overcome this issue?
in
src/net_processing.h:571
in
f472f3077eoutdated
566+ * Transaction reconciliation should happen with peers in the same order,
567+ * because the efficiency gain is the highest when reconciliation set difference
568+ * is predictable. This queue is used to maintain the order of
569+ * peers chosen for reconciliation.
570+ */
571+ RecursiveMutex cs_recon_queue;
jnewbery
commented at 10:40 pm on January 17, 2021:
Again, prefer a non-reenrant mutex and don’t use the cs nomenclature.
naumenkogs
commented at 3:00 pm on January 19, 2021:
I can switch to non-recursive if UpdateNextReconRequest() takes m_recon_queue.size() as an argument, instead of accessing it inside. Do you think that would be preferable?
jnewbery
commented at 4:30 pm on January 19, 2021:
I haven’t looked into your branch in great detail, but it looks like UpdateNextReconRequest() is only called in one place, which already holds m_recon_queue_mutex. Could you make the function EXCLUSIVE_LOCKS_REQUIRED(m_recon_queue_mutex) and not retake the mutex inside?
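The pattern being suggested here — have the helper document that its caller already holds the lock, so a plain non-reentrant mutex suffices — can be sketched as follows. This is a hypothetical Python analogue (class and method names are mine, not the PR's C++), where the comment in `_update_next_recon_request` plays the role of Clang's `EXCLUSIVE_LOCKS_REQUIRED` annotation, and the rotating deque mimics the predictable peer order described in the diff above:

```python
import threading
from collections import deque

class ReconQueue:
    """Round-robin queue of peers to reconcile with.

    The helper never takes the lock itself; callers must hold it. This is
    the discipline EXCLUSIVE_LOCKS_REQUIRED enforces at compile time in
    C++, and it lets us use a plain Lock instead of an RLock.
    """
    def __init__(self, peers):
        self._lock = threading.Lock()   # non-reentrant, like Mutex vs RecursiveMutex
        self._queue = deque(peers)
        self.next_peer = None

    def _update_next_recon_request(self):
        # Caller must hold self._lock; re-acquiring it here would deadlock.
        self._queue.rotate(-1)
        self.next_peer = self._queue[0]

    def pick_peer_for_reconciliation(self):
        with self._lock:
            self._update_next_recon_request()
            return self.next_peer

q = ReconQueue(["peer0", "peer1", "peer2"])
assert q.pick_peer_for_reconciliation() == "peer1"
assert q.pick_peer_for_reconciliation() == "peer2"
assert q.pick_peer_for_reconciliation() == "peer0"
```

Passing `m_recon_queue.size()` as an argument (as suggested above) is the other workable shape of the same fix: either way, only the outermost caller acquires the mutex.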
in
test/functional/p2p_erlay.py:6
in
f472f3077e (outdated)
0@@ -0,0 +1,595 @@
1+#!/usr/bin/env python3
2+# Copyright (c) 2016-2017 The Bitcoin Core developers
3+# Distributed under the MIT software license, see the accompanying
4+# file COPYING or http://www.opensource.org/licenses/mit-license.php.
5+"""Test reconciliation-based transaction relay protocol.
6+
jnewbery
commented at 10:41 pm on January 17, 2021:
Why this space?
in
test/functional/p2p_erlay.py:25
in
f472f3077e (outdated)
20+from enum import IntEnum
21+from io import BytesIO
22+import random
23+import hashlib
24+import time
25+import struct
jnewbery
commented at 10:42 pm on January 17, 2021:
Please sort, with standard library imports first.
in
test/functional/p2p_erlay.py:159
in
f472f3077e (outdated)
jnewbery
commented at 10:44 pm on January 17, 2021:
sort
jnewbery
commented at 10:53 pm on January 17, 2021:
member
Just some style comments so far. A couple of high-level points:
I agree with the comment here #18261 (review) that we should aim to separate this from the rest of net_processing. This PR currently has about 850 LOC change in net_processing.{cpp|h}, out of a total of around 5700 lines. That means ~15% of net_processing is erlay code after this PR (and over half of net_processing.h is erlay specific). Would it be possible to split the erlay logic into its own subcomponent? One immediate benefit for you is that there would be far fewer disruptive rebases if you did that.
Could the minisketch code and tests be split into their own PR? It seems like there is pretty wide support for incorporating erlay. Reviewing and merging minisketch first would make this PR a lot smaller and more focused on just the p2p changes.
naumenkogs force-pushed
on Jan 19, 2021
naumenkogs force-pushed
on Jan 21, 2021
in
src/net_processing.cpp:246
in
c452e5dbd6 (outdated)
241+ * of relay of particular transactions due to collisions in short IDs.
242+ */
243+ const uint64_t m_local_recon_salt;
244+
245+ Mutex m_recon_state_mutex;
246+ /// nullptr if we're not reconciling (neither passively nor actively) with this peer.
jnewbery
commented at 9:02 am on January 25, 2021:
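The `m_local_recon_salt` in the snippet above exists so that short IDs are keyed per link: an attacker who crafts two transactions whose short IDs collide on one link cannot make them collide on every link, so a collision only delays relay locally. A hedged sketch of the idea — BIP-330 actually derives a SipHash key from both peers' salts, while SHA256 is used here purely as a dependency-free stand-in, and the function and salt names are mine:

```python
import hashlib

def short_id(wtxid: bytes, link_salt: bytes) -> int:
    """Salted 32-bit short transaction ID (toy version).

    BIP-330 specifies SipHash keyed by a salt combined from both peers;
    SHA256 here is only a stand-in to keep the example self-contained.
    """
    digest = hashlib.sha256(link_salt + wtxid).digest()
    return int.from_bytes(digest[:4], "little")

wtxid = bytes.fromhex("aa" * 32)
# Each link has its own salt, so a short-ID collision engineered for one
# link will (with overwhelming probability) not hold on another link.
assert short_id(wtxid, b"link-salt-1") != short_id(wtxid, b"link-salt-2")
```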