Several performance and privacy improvements to inv/mempool handling #7840

pull sipa wants to merge 5 commits into bitcoin:master from sipa:splitinvtxblock changing 6 files +189 −86
  1. sipa commented at 1:39 pm on April 8, 2016: member

    As we’re increasingly changing how block invs and tx invs are treated, bite the bullet and separate them into two separate queues. This makes sense, as they are subject to very different behaviour: we want blocks to relay as fast as possible, while transactions are subject to rate limiting, deduplication, and source obscurity.

    Once this is done, we can turn the tx queue into a set, and use it to prevent the mempool command from leaking unannounced transactions. The handling of such commands is also moved to the send loop, sorted by dependency order, and made subject to trickling.
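    As a rough illustration of the queue split described above (the member names follow the net.h diff quoted later in this thread; the surrounding struct and helpers are simplified stand-ins, not the actual CNode code):

        #include <set>
        #include <vector>
        #include "uint256.h"

        // Simplified sketch of the per-peer announcement state after the split.
        struct PeerInvQueues {
            // Transactions to announce: a std::set, so duplicates are removed for
            // free and a "mempool" response can be checked against (and erased
            // from) the queue without leaking which entries are still unannounced.
            std::set<uint256> setInventoryTxToSend;
            // Blocks to announce: a plain vector, because blocks bypass trickling,
            // are relayed immediately, and their announcement order matters.
            std::vector<uint256> vInventoryBlockToSend;

            void PushTxInventory(const uint256& hash) { setInventoryTxToSend.insert(hash); }
            void PushBlockInventory(const uint256& hash) { vInventoryBlockToSend.push_back(hash); }
        };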

    Based on top of #7805.

  2. sipa force-pushed on Apr 8, 2016
  3. sipa force-pushed on Apr 8, 2016
  4. sipa force-pushed on Apr 8, 2016
  5. in src/main.cpp: in 906e5c7fdb outdated
    5809+                        if (!pto->pfilter->IsRelevantAndUpdate(tx)) continue;
    5810+                    }
    5811+                    if (pto->minFeeFilter) {
    5812+                        CFeeRate feeRate;
    5813+                        mempool.lookupFeeRate(hash, feeRate);
    5814+                        LOCK(pto->cs_feeFilter);
    


    gmaxwell commented at 6:01 pm on April 8, 2016:
    Perhaps the fee filter access should be hoisted out of this inner loop on transactions, and a local copy of the filter-limit made.

    sipa commented at 1:38 pm on April 10, 2016:
    Done.
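    The change boils down to taking a single locked snapshot of the peer's fee filter before the per-transaction loop, roughly like this (a sketch; only minFeeFilter, cs_feeFilter and lookupFeeRate come from the quoted diff, the rest is illustrative):

        // Read the peer's fee filter once, outside the loop, instead of taking
        // cs_feeFilter for every transaction.
        CAmount filterrate = 0;
        {
            LOCK(pto->cs_feeFilter);
            filterrate = pto->minFeeFilter;
        }
        for (const uint256& hash : vtxToAnnounce) { // illustrative container name
            if (filterrate) {
                CFeeRate feeRate;
                mempool.lookupFeeRate(hash, feeRate);
                if (feeRate.GetFeePerK() < filterrate)
                    continue; // below the peer's advertised minimum; skip it
            }
            // ... bloom filter check and INV queuing ...
        }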
  6. gmaxwell commented at 6:06 pm on April 8, 2016: contributor

    This is a very good idea– much better than my earlier ones around avoiding mempool requests breaking privacy.

    Your commit message should explain that by eliminating queued entries from the mempool response and responding only at trickle time, this makes the mempool no longer leak transaction arrival order information (as the mempool itself is also sorted)– at least no more than relay itself leaks it.

  7. in src/main.cpp: in 906e5c7fdb outdated
    5824+                    }
    5825+                }
    5826             }
    5827+
    5828+            // Determine transactions to relay
    5829             if (fSendTrickle) {
    


    gmaxwell commented at 7:06 pm on April 8, 2016:
    “&& vInv.size() < INVENTORY_BROADCAST_MAX” perhaps? otherwise when a mempool request has prefilled the INV it will waste time heapifying for no reason.

    sipa commented at 1:39 pm on April 10, 2016:
    Done.
  8. gmaxwell commented at 0:32 am on April 9, 2016: contributor
    I have a test harness now that’s suitable for testing this, a little 13-node network. Seems to work; it caught that the conversion to use the STL heap broke the behavior: the STL heap is a max-heap.
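    (For context, a generic C++ note rather than the PR's code: the std:: heap algorithms build a max-heap with respect to the supplied comparator, so the comparison has to be inverted if the "smallest" element should end up on top.)

        #include <algorithm>
        #include <cstdio>
        #include <functional>
        #include <vector>

        int main()
        {
            std::vector<int> v{3, 1, 4, 1, 5};

            // Default comparator (std::less): a MAX-heap, front() is the largest.
            std::make_heap(v.begin(), v.end());
            std::printf("max-heap top: %d\n", v.front()); // 5

            // Inverted comparator: the smallest element ends up on top instead.
            std::make_heap(v.begin(), v.end(), std::greater<int>());
            std::printf("min-heap top: %d\n", v.front()); // 1
        }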
  9. sipa force-pushed on Apr 9, 2016
  10. gmaxwell commented at 3:15 am on April 9, 2016: contributor

    Thanks for fixing, looking good so far.

    We still get a tiny number of orphans in a network that consists of nothing but this code. Here is one reason why: node A sends you Tx1. You getdata Tx1. Node B sends you Tx1 and Tx2, where Tx2 is a child of Tx1. You getdata Tx2 from node B. Node B responds faster, so Tx2 arrives before its parent and is now an orphan.

  11. sipa force-pushed on Apr 10, 2016
  12. sipa renamed this:
    Split and optimize inv queues and improve mempool privacy
    Several performance and privacy improvements to inv/mempool handling
    on Apr 10, 2016
  13. sipa commented at 1:41 pm on April 10, 2016: member
    I added a commit to sort the response of mempool requests in dependency order, and renamed the PR to something more general.
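    A loose sketch of what that commit does (using the DepthAndScoreComparator that appears in the txmempool.cpp diff further down; details of the real function may differ): CTxMemPool::queryHashes now returns the txids sorted so that parents precede children.

        // Sketch: collect all txids, then sort them so that ancestors come before
        // descendants (and, at equal depth, higher-score transactions come first).
        void CTxMemPool::queryHashes(std::vector<uint256>& vtxid)
        {
            LOCK(cs);
            vtxid.clear();
            vtxid.reserve(mapTx.size());
            for (auto it = mapTx.begin(); it != mapTx.end(); ++it)
                vtxid.push_back(it->GetTx().GetHash());
            std::sort(vtxid.begin(), vtxid.end(), DepthAndScoreComparator(this));
        }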
  14. dcousens commented at 6:48 am on April 11, 2016: contributor
    concept ACK
  15. laanwj added the label P2P on Apr 11, 2016
  16. laanwj added the label Privacy on Apr 11, 2016
  17. in src/main.cpp: in 0cdc0fa69b outdated
    5852-                if (vInv.size() >= 1000)
    5853-                {
    5854-                    pto->PushMessage(NetMsgType::INV, vInv);
    5855-                    vInv.clear();
    5856+            // Determine transactions to relay
    5857+            if (fSendTrickle && vInv.size() < INVENTORY_BROADCAST_MAX) {
    


    sdaftuar commented at 7:07 pm on April 11, 2016:
    INVENTORY_BROADCAST_MAX is supposed to apply to rate-limited transaction relay, while vInv at this point can be holding blocks, and/or any leftovers from the mempool command. This seems like it’s not the right comparison.
  18. in src/net.h: in 0cdc0fa69b outdated
    395@@ -396,14 +396,17 @@ class CNode
    396 
    397     // inventory based relay
    398     CRollingBloomFilter filterInventoryKnown;
    399-    std::vector<CInv> vInventoryToSend;
    400+    std::set<uint256> vInventoryTxToSend;
    401+    std::vector<uint256> vInventoryBlockToSend;
    


    sdaftuar commented at 7:27 pm on April 11, 2016:
    nit: Perhaps add a comment explaining why the vector is needed for blocks?
  19. sdaftuar commented at 7:34 pm on April 11, 2016: member
    Concept ACK, will test.
  20. in src/main.h: in 0cdc0fa69b outdated
    103- *  Blocks, whitelisted receivers, and a random 25% of transactions bypass this. */
    104-static const unsigned int AVG_INVENTORY_BROADCAST_INTERVAL = 5;
    105+/** Average delay between trickled inventory transmissions in seconds.
    106+ *  Blocks and whitelisted receivers bypass this, outbound peers get half this delay. */
    107+static const unsigned int INVENTORY_BROADCAST_INTERVAL = 5;
    108+/** Maximum number of inventor items to send per transmission.
    


    sdaftuar commented at 7:36 pm on April 11, 2016:
    “inventor” -> “inventory”
  21. sipa commented at 2:40 pm on April 12, 2016: member

    Added two commits to address the comments.

    Mempool requests are now not filtered by vInventoryTxToSend, but instead every mempool inv sent as a response is removed from vInventoryTxToSend.
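    A sketch of that interaction (names as in the diff at this point; the fSendMempool flag here just stands for "a mempool request is pending"): while building the mempool response in the send loop, every txid that gets announced is also erased from the pending-announcement set, so it cannot be INVed a second time at the next trickle.

        if (fSendTrickle && pto->fSendMempool) {
            std::vector<uint256> vtxid;
            mempool.queryHashes(vtxid);   // dependency-ordered txids
            pto->fSendMempool = false;
            for (const uint256& hash : vtxid) {
                // ... bloom and feerate filtering elided ...
                // Anything announced here no longer needs a later trickled INV.
                pto->vInventoryTxToSend.erase(hash);
                pto->filterInventoryKnown.insert(hash);
                vInv.push_back(CInv(MSG_TX, hash));
                if (vInv.size() == MAX_INV_SZ) {
                    pto->PushMessage(NetMsgType::INV, vInv);
                    vInv.clear();
                }
            }
        }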

  22. gmaxwell commented at 9:30 pm on April 12, 2016: contributor
    In a test topology over 48 hours this reduced the addition of orphan transactions from 436/hr to 2.3/hr.
  23. gmaxwell commented at 9:40 pm on April 12, 2016: contributor
    Also, utACK latest improvements.
  24. dcousens commented at 2:32 am on April 13, 2016: contributor
    concept ACK, utACK 98305df
  25. gmaxwell commented at 7:05 am on April 19, 2016: contributor
    @sipa: any squashing interest?
  26. sipa force-pushed on Apr 19, 2016
  27. sipa commented at 5:54 pm on April 19, 2016: member
    Rebased and squashed.
  28. sipa force-pushed on Apr 19, 2016
  29. sipa commented at 6:07 pm on April 19, 2016: member
    Also renamed the vInventoryTxToSend to setInventoryTxToSend, and restructured the tx send loop a bit to be less deeply nested.
  30. gmaxwell commented at 4:58 am on April 20, 2016: contributor
    Travis CI timeout, doesn’t look like a real failure.
  31. Eliminate TX trickle bypass, sort TX invs for privacy and priority.
    Previously Bitcoin would send 1/4 of transactions out to all peers
     instantly.  This causes high overhead because it makes >80% of
     INVs size 1.  Doing so harms privacy, because it limits the
     amount of source obscurity a transaction can receive.
    
    These randomized broadcasts also disobeyed transaction dependencies
     and required use of the orphan pool.  Because the orphan pool is
     so small this leads to poor propagation for dependent transactions.
    
    When the bypass wasn't in effect, transactions were sent in the
     order they were received.  This avoided creating orphans but
     undermines privacy fairly significantly.
    
    This commit:
     Eliminates the bypass. The bypass is replaced by halving the
      average delay for outbound peers.
    
     Sorts candidate transactions for INV by their topological
      depth then by their feerate (then hash); removing the
      information leakage and providing priority service to
      higher fee transactions.
    
     Limits the amount of transactions sent in a single INV to
      7tx/sec (and twice that for outbound); this limits the
      harm of low fee transaction floods, gives faster relay
      service to higher fee transactions. The 7 sounds lower
      than it really is because received advertisements need
     not be sent, and because the aggregate rate is multiplied
      by the number of peers.
    f2d3ba7386
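    (The 7 tx/sec figure follows from the constants in the main.h hunk quoted earlier. The definition of INVENTORY_BROADCAST_MAX is cut off in that hunk; 7 * INVENTORY_BROADCAST_INTERVAL is what matches the commit message, so take the sketch below as a reconstruction.)

        static const unsigned int INVENTORY_BROADCAST_INTERVAL = 5;  // average seconds between trickled INVs
        static const unsigned int INVENTORY_BROADCAST_MAX = 7 * INVENTORY_BROADCAST_INTERVAL;  // 35 items per INV

        // Average per-peer rate: 35 items every 5 seconds = 7 tx/sec for inbound
        // peers, and twice that for outbound peers, whose interval is halved.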
  32. sipa force-pushed on Apr 20, 2016
  33. sipa force-pushed on Apr 20, 2016
  34. sipa commented at 7:26 pm on April 20, 2016: member

    Added one more commit by @gmaxwell to move all inv filtering to the send stage (which avoids announcing things that are no longer in our mempool, and avoids bloom filter updates for those).

    I’m sorry for the increasingly expanding scope this PR has had, so I’m not going to touch functionality any further unless bugs in it are found.

  35. in src/main.cpp: in 66a5885a9e outdated
    5823+                        if (!fInMemPool) continue; // another thread removed since queryHashes, maybe...
    5824+                        if (!pto->pfilter->IsRelevantAndUpdate(tx)) continue;
    5825+                    }
    5826+                    pto->filterInventoryKnown.insert(hash);
    5827+                    vInv.push_back(inv);
    5828+                    if (vInv.size() == MAX_INV_SZ) {
    


    jonasnick commented at 8:26 pm on April 20, 2016:
    It seems that in general it can happen that vInv.size() > MAX_INV_SZ, namely when pto->vInventoryBlockToSend.size() >= MAX_INV_SZ. Are we assuming that this isn’t the case?

  36. in src/main.cpp: in 1a16f859e4 outdated
    5855+                        if (!fInMemPool) continue; // another thread removed since queryHashes, maybe...
    5856+                        if (!pto->pfilter->IsRelevantAndUpdate(tx)) continue;
    5857+                    }
    5858+                    pto->filterInventoryKnown.insert(hash);
    5859+                    vInv.push_back(inv);
    5860+                    if (vInv.size() == MAX_INV_SZ) {
    


    sipa commented at 8:29 pm on April 20, 2016:
    Entries are added one by one to vInv, and all call sites where that happens are followed by a max size test, so I don’t think vInv can exceed MAX_INV_SZ.

    sipa commented at 8:49 pm on April 20, 2016:
    Ah, good point. There may not be other mechanisms that prevent over 5000 blocks being inved at once.

    sipa commented at 10:36 pm on April 20, 2016:
    Fixed.
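    The fix is essentially to flush the accumulated message whenever it fills up while draining the block queue too, so vInv can never exceed MAX_INV_SZ regardless of how many blocks are queued (a sketch, not the verbatim diff):

        for (const uint256& hash : pto->vInventoryBlockToSend) {
            vInv.push_back(CInv(MSG_BLOCK, hash));
            if (vInv.size() == MAX_INV_SZ) {
                pto->PushMessage(NetMsgType::INV, vInv);
                vInv.clear();
            }
        }
        pto->vInventoryBlockToSend.clear();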
  37. sipa force-pushed on Apr 20, 2016
  38. Split up and optimize transaction and block inv queues dc13dcd2be
  39. Handle mempool requests in send loop, subject to trickle
    By eliminating queued entries from the mempool response and responding only at
    trickle time, this makes the mempool no longer leak transaction arrival order
    information (as the mempool itself is also sorted)-- at least no more than
    relay itself leaks it.
    ed7068302c
  40. Return mempool queries in dependency order 4578215e7f
  41. Move bloom and feerate filtering to just prior to tx sending.
    This avoids sending more pointless INVs around updates, and
     prevents using filter updates to timetag transactions.
    
    Also adds locking for fRelayTxes.
    b559914753
  42. sipa force-pushed on Apr 20, 2016
  43. dcousens commented at 4:08 am on April 21, 2016: contributor
    @sipa could you update your original PR description with the full scope of this PR now? (for others)
  44. gmaxwell commented at 7:04 pm on April 21, 2016: contributor
    ACK.
  45. jonasnick commented at 8:08 am on April 22, 2016: contributor
    Ack
  46. in src/txmempool.cpp: in b559914753
    773+    CTxMemPool *mp;
    774+public:
    775+    DepthAndScoreComparator(CTxMemPool *mempool) : mp(mempool) {}
    776+    bool operator()(const uint256& a, const uint256& b) { return mp->CompareDepthAndScore(a, b); }
    777+};
    778+}
    


    sdaftuar commented at 8:18 pm on April 22, 2016:
    nit: What do you think of exposing the DepthAndScoreComparator defined here, and eliminating the CompareInvMempoolOrder object declared in main.cpp by moving the comparator it defines (which reverses the comparison) into this class?

    sipa commented at 8:54 pm on April 22, 2016:

    The comparator over there sorts iterators to set items, so there is more difference.

    I do think the comparator (or at least a base comparator) should be exposed here, which grabs the mempool lock etc. Perhaps the other comparator can then take a ref to the first one as an argument to add the reversing and set iterator dereferencing.


    sdaftuar commented at 9:08 pm on April 22, 2016:
    I was thinking that because the main.cpp comparator acts on different objects, it’d be easy to just move that function into this class (and explain in comments what the two comparators were for and why they are subtly different!), but your suggestion seems reasonable to me as well. I suppose it’s not awesome for the implementation details in main.cpp to leak into txmempool…

    sipa commented at 9:36 pm on April 22, 2016:
    Even better would be for txmempool.h to expose an opaque SortedMempoolSet object, which takes a list of hashes as inputs, and maintains a sorted heap of iterators to the mempool entries themselves, in addition to the cs lock.
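    For reference, the relationship being discussed (the txmempool.cpp class is from the diff above; the main.cpp class is a sketch of the CompareInvMempoolOrder mentioned here): both delegate to CTxMemPool::CompareDepthAndScore, but the main.cpp comparator works on set iterators and swaps the arguments, because std::make_heap builds a max-heap and the entry with the fewest ancestors / highest score has to compare as the "largest" to end up on top.

        // txmempool.cpp: compares txids directly, used to sort queryHashes() output.
        class DepthAndScoreComparator
        {
            CTxMemPool *mp;
        public:
            DepthAndScoreComparator(CTxMemPool *mempool) : mp(mempool) {}
            bool operator()(const uint256& a, const uint256& b) { return mp->CompareDepthAndScore(a, b); }
        };

        // main.cpp (sketch): compares iterators into setInventoryTxToSend, with the
        // arguments reversed so the heap pops the preferred transaction first.
        class CompareInvMempoolOrder
        {
            CTxMemPool *mp;
        public:
            CompareInvMempoolOrder(CTxMemPool *mempool) : mp(mempool) {}
            bool operator()(std::set<uint256>::iterator a, std::set<uint256>::iterator b)
            {
                return mp->CompareDepthAndScore(*b, *a);
            }
        };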
  47. sdaftuar commented at 8:24 pm on April 22, 2016: member

    ACK, apart from my question about whether we should combine the two comparators that are being introduced.

    Couple notes:

    • RelayTransaction no longer requires a feerate, and AcceptToMemoryPool no longer needs to fill a feerate in. We can clean this up in a future PR.
    • I noticed while testing that CTxMemPool::queryHashes is what we use in the RPC call getrawmempool. It’s hard for me to imagine anyone would care (and in fact the new sort order on the results seems handy!), but perhaps we should flag the change in the release notes?
  48. gmaxwell commented at 6:00 am on April 26, 2016: contributor
    I’ve had this running in its various forms for two weeks on a couple of public nodes, as well as a separate test topology that consisted of nothing but nodes running this code. What do we need to move this forward? I’m concerned that if I move to testing other PRs this will be forgotten.
  49. laanwj merged this on May 5, 2016
  50. laanwj closed this on May 5, 2016

  51. laanwj referenced this in commit 3b9a0bf41f on May 5, 2016
  52. codablock referenced this in commit 994eb30518 on Sep 16, 2017
  53. codablock referenced this in commit d8a279004c on Sep 19, 2017
  54. codablock referenced this in commit 5d8e94a26f on Dec 21, 2017
  55. codablock referenced this in commit d4b8abf4ff on Dec 21, 2017
  56. MarkLTZ referenced this in commit a1e585bd60 on Apr 28, 2019
  57. furszy referenced this in commit 1f010c0969 on Jan 23, 2021
  58. zkbot referenced this in commit 1b5f17c900 on Apr 1, 2021
  59. zkbot referenced this in commit 80e66e7daa on Apr 2, 2021
  60. zkbot referenced this in commit 12af999d42 on Aug 10, 2021
  61. zkbot referenced this in commit 44f7d7e4ea on Aug 11, 2021
  62. zkbot referenced this in commit 7fa7d5700b on Aug 12, 2021
  63. zkbot referenced this in commit fd462fd8c4 on Aug 13, 2021
  64. DrahtBot locked this on Sep 8, 2021
