Mempool: rework rebroadcast logic to improve privacy

amitiuttarwar commented at 4:48 pm on August 23, 2019: contributor

The current rebroadcast logic is terrible for privacy because only the source wallet will rebroadcast transactions, and does so quite frequently. This PR aims to improve privacy dramatically while preserving the benefits of rebroadcasting that ensure txns are successfully propagated through the network.

This PR introduces changes so nodes will resend transactions that it believes should have already been mined. It extracts the logic from the wallet into the mempool, so nodes will rebroadcast txns regardless of the originating wallet. Txns are defined as “should have been mined” by using the block assembler logic, and excluding txns that were recently added to the mempool. The wallet will resubmit txns to the mempool on a regular cadence to ensure those txns aren’t dropped (due to eviction, expiry, etc..) before being confirmed.

For more information, see: https://gist.github.com/amitiuttarwar/b592ee410e1f02ac0d44fcbed4621dba

DrahtBot added the label Mempool on Aug 23, 2019

DrahtBot added the label Mining on Aug 23, 2019

DrahtBot added the label P2P on Aug 23, 2019

DrahtBot added the label RPC/REST/ZMQ on Aug 23, 2019

DrahtBot added the label Tests on Aug 23, 2019

DrahtBot added the label TX fees and policy on Aug 23, 2019

DrahtBot added the label Wallet on Aug 23, 2019

DrahtBot commented at 6:45 pm on August 23, 2019: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#17805 (test: Add test for rpcwhitelistdefault by emilengler)
#17786 (refactor: Nuke policy/fees->mempool circular dependencies by hebasto)
#17693 (rpc: Add generateblock to mine a custom set of transactions by andrewtoth)
#17484 (wallet: add cached m_is_ibd to remove Chain::isInitialBlockDownload by ariard)
#17303 (p2p: Stop relaying non-mempool txs, improve tx privacy slightly by MarcoFalke)
#17261 (Make ScriptPubKeyMan an actual interface and the wallet to have multiple by achow101)
#16910 (wallet: reduce loading time by using unordered maps by achow101)
#16528 ([WIP] Native Descriptor Wallets (take 2) by achow101)
#15719 (Drop Chain::requestMempoolTransactions method by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

amitiuttarwar force-pushed on Aug 23, 2019

in test/functional/wallet_resendwallettransactions.py:9 in c383bdeea4 outdated

 5@@ -6,11 +6,10 @@
 6 from collections import defaultdict
 7 import time
 8 
 9-from test_framework.blocktools import create_block, create_coinbase
10-from test_framework.messages import ToHex
11-from test_framework.mininode import P2PInterface, mininode_lock
12+from test_framework.blocktools import create_coinbase

MarcoFalke commented at 9:45 pm on August 23, 2019:

0test/functional/wallet_resendwallettransactions.py:9:1: F401 'test_framework.blocktools.create_coinbase' imported but unused

amitiuttarwar force-pushed on Aug 23, 2019

amitiuttarwar commented at 10:23 pm on August 23, 2019: contributor

This code is ready for initial review.

there are still some to dos before it would be ready for merge:

identify expected rebroadcast traffic & worst case bandwidth usage.
persist setUnbroadcastTxIDs to mempool.dat
add functionality to run an automated job to cache min fee rate for txns to be included in block, then apply that filter to exclude txns with fee rate < min from rebroadcast set. this will reduce rebroadcast noise in scenarios where the mempool is emptying out.

there are also some follow-ups that can be addressed in separate PRs:

m_best_block_time is no longer used & can be removed & the wallet no longer needs to subscribe to UpdatedBlockTip() validation interface method
functionality to mark a peer (as “local” or such) so the mempool would still enforce initial broadcast for transactions received from one of these peers.

jnewbery commented at 10:24 pm on August 23, 2019: member

Concept ACK

amitiuttarwar force-pushed on Aug 23, 2019

meshcollider removed the label Mining on Aug 24, 2019

meshcollider removed the label RPC/REST/ZMQ on Aug 24, 2019

meshcollider removed the label TX fees and policy on Aug 24, 2019

meshcollider removed the label Tests on Aug 24, 2019

meshcollider removed the label Wallet on Aug 24, 2019

meshcollider removed the label P2P on Aug 24, 2019

meshcollider added the label Wallet on Aug 24, 2019

fanquake added the label P2P on Aug 24, 2019

amitiuttarwar force-pushed on Aug 24, 2019

in src/net.h:953 in 25a3b0ac6a outdated

891-
892 /** Return a timestamp in the future (in microseconds) for exponentially distributed events. */
893 int64_t PoissonNextSend(int64_t now, int average_interval_seconds);
894 
895+/** Wrapper to return mockable type */
896+inline std::chrono::seconds PoissonNextSend(std::chrono::seconds now, int average_interval_seconds)

MarcoFalke commented at 3:44 pm on August 25, 2019:

in commit 25a3b0ac6aea95845c48c0a345325af8ad15c3ca:

The legacy signature of this function takes as argument type and return type microseconds. Can you explain why this one is different? Note that you are allowed to pass in std::chrono::seconds when the function takes std::chrono::microseconds.

amitiuttarwar commented at 0:08 am on October 20, 2019:

Fixed.

in src/net.h:956 in 25a3b0ac6a outdated

894 
895+/** Wrapper to return mockable type */
896+inline std::chrono::seconds PoissonNextSend(std::chrono::seconds now, int average_interval_seconds)
897+{
898+    int64_t now_micros = (std::chrono::duration_cast<std::chrono::microseconds>(now)).count();
899+    return std::chrono::duration_cast<std::chrono::seconds>(std::chrono::microseconds{PoissonNextSend(now_micros, average_interval_seconds)});

MarcoFalke commented at 3:46 pm on August 25, 2019:

same commit:

Can you explain how the cast to seconds has an effect on the distribution? It appears that the most likely return value will be exactly 0? Alternatively, I’d rather return microseconds just like the existing function.

amitiuttarwar commented at 8:11 pm on August 26, 2019:

ah my bad, will fix

based on this tip

Note that you are allowed to pass in std::chrono::seconds when the function takes std::chrono::microseconds.

I’m thinking of updating the function signature to both take in & return microseconds. And the caller can pass through seconds when needed. I’m interested in moving all the poisson invocations to the chrono in a follow up PR. Its less of a gotcha since chrono requires the duration to be explicit, but it would be nice for it to be consistent.

in src/miner.h:138 in fd92d22540 outdated

134@@ -135,6 +135,7 @@ class BlockAssembler
135     bool fIncludeWitness;
136     unsigned int nBlockMaxWeight;
137     CFeeRate blockMinFeeRate;
138+    int64_t nMaxTxTime;

MarcoFalke commented at 3:48 pm on August 25, 2019:

In commit fd92d22540f97924bd73301dc061005b401d7472:

I’d prefer if new members are prefixed with m_ and used snake case according to the dev notes. This makes it easier to guess from the variable name if something is a member, a global, or just a symbol in the local scope. Additionally, I’d prefer to use std::chrono::seconds (or whatever the type is) for those reasons:

It documents the type for reviewers
It enforces the type at compile time and prevents unwanted casts
It documents that the time is mockable. (I know that the memepool currently is mockable and uses the legacy GetTime() function and types, but at least new code should use the new GetTime<>() functions and types.)

amitiuttarwar commented at 0:53 am on October 20, 2019:

Fixed.

in src/txmempool.cpp:118 in af38c70497 outdated

112+    // use CreateNewBlock to get set of transaction candidates
113+    std::unique_ptr<CBlockTemplate> pblocktemplate = BlockAssembler(Params(), options).CreateNewBlock(scriptDummy);
114+
115+    LOCK(cs);
116+    for (const auto& tx : pblocktemplate->block.vtx) {
117+        if (mapTx.find(tx->GetHash()) == mapTx.end()) continue;

MarcoFalke commented at 3:57 pm on August 25, 2019:

in commit af38c70497575c9ad33901a19db01a7f104ffaeb

Can you add a comment why this would ever be true, otherwise remove the dead code.

amitiuttarwar commented at 9:39 pm on August 26, 2019:

I added it as a safeguard. Do you feel confident it would never happen? If so, I will take another careful look at the logic to build my own confidence & remove.

MarcoFalke commented at 9:46 pm on August 26, 2019:

Even if it would, any follow-up logic that deals with the returned set needs to be robust against missing txs anyway.

MarcoFalke commented at 9:47 pm on August 26, 2019:

Note that ::mempool.cs is released as soon as this method returns.

amitiuttarwar commented at 8:09 pm on October 21, 2019:

ok my understanding is.. without the check, if the txn was no longer in the mempool, it would..

get added to setRebroadcastTxs in this function.
returned to the caller here & insert into setInventoryTxToSend
setInventoryTxToSend gets copied into vInvTx here
checks if txn is in mempool otherwise skips it here

which all seems fine so I’ll remove the check.

MarcoFalke commented at 4:02 pm on August 25, 2019: member

Concept ACK. Some questions about why microseconds are truncated to seconds among other stuff.

in src/net_processing.cpp:3811 in 4bda14245c outdated

3828@@ -3823,6 +3829,10 @@ bool PeerLogicValidation::SendMessages(CNode* pto)
3829                     }
3830 
3831                     pto->setInventoryTxToSend.insert(setRebroadcastTxs.begin(), setRebroadcastTxs.end());
3832+
3833+                    // also ensure inclusion of wallet txns that haven't been successfully broadcast yet
3834+                    // since set elements are unique, this will be a no-op if the txns are already in setInventoryTxToSend
3835+                    pto->setInventoryTxToSend.insert(mempool.setUnbroadcastTxIDs.begin(), mempool.setUnbroadcastTxIDs.end());

mzumsande commented at 9:20 pm on August 26, 2019:

If we add wallet transactions to the rebroadcast INV that have a smaller feerate than the top of our mempool, wouldn’t there be a feerate gap to the other INVs of the message, making the wallet transactions easily identifiable as such and reducing privacy?

amitiuttarwar commented at 9:52 pm on August 26, 2019:

correct. I will add a comment to document.

this logic should only trigger when a user submits a txn locally and it doesn’t get initially relayed.. an example where this would be needed.. a user submits a low fee rate txn with p2p disabled, inspects it in the local mempool, and enables p2p. currently, the txn would get initially relayed due to the wallet rebroadcast logic. with the proposed changes (w/out this additional mempool-force-relays-unbroadcast-txns mechansim), the node could have to be running at a pretty specific time (when the mempool is clearing out and mining low fee rate txns) in order for the txn to ever initially get broadcast. thus, setInventoryTxToSend.

In terms of privacy guarantees (or lack thereof) it mimics the current behavior. If you have any suggestions for how we could improve while preserving the propagation guarantees, I’m all ears :)

MarcoFalke commented at 3:29 pm on September 3, 2019:

making the wallet transactions easily identifiable as such and reducing privacy?

I don’t think this is true (at least no worse than today). While it is possible to guess whether a list of INVs is a rebroadcast inv or not, it shouldn’t be possible to trivially find the source of the low feerate txs in that inv. Txs broadcast for the first time from our node should be no different from txs broadcast for the first time from another node to us and then relayed by us. It is know to not be perfect (see dandelion tx relay), but improving that seems like a separate issue to me.

amitiuttarwar commented at 10:14 pm on November 25, 2019:

Want to clarify my expectations of the behavior-

We add transactions to setInventoryTxToSend when we expect to relay the transaction to a peer. This includes if we are the source node for the transaction as well as if we are relaying it. With these changes, both the transactions to rebroadcast and the “unbroadcast” transactions are added to setInventoryTxToSend on the rebroadcast poisson timer.

Later, all transactions from setInventoryTxToSend get added to vInvTx, passed through some checks, then relayed to peers.

So, what this means…

if they are keeping track, peers can detect rebroadcast transactions because INV messages have been sent for that transaction before.
INVs for unbroadcast transactions have probably not been sent out before (unless all the peers who received the message did not send a GETDATA back).
In the normal case, the peer would not be able to distinguish a transaction sent from the unbroadcast set from a transaction being submitted initially, or the initial relay. This is known to be imperfect, but these changes should not make privacy worse than current behavior.
The only new potential for privacy leak that I see based on these changes- A node disables p2p, creates a transaction, enables p2p. At that time, that transaction gets propagated out to the network due to being on the unbroadcast set. The first peer the node sends an INV to is the attacker. The attacker is able to send back a GETDATA before any other INV messages are sent out. The attacker does not relay the transaction to any other nodes. Time passes, the node calculates the transaction “should have been mined” by now, puts it into the rebroadcast set, relays it out to its peers. The attacker has another connection open to the node & is able to identify that it is a rebroadcast. The attacker would also have to have many other nodes & connections to the rest of the network to identify that this is the only node rebroadcasting that transaction. This seems pretty difficult to execute, definitely harder to discern source node based on rebroadcast logic than in the existing implementation.
In this case, the much bigger concern would be the ability of an attacker to stop the propagation of a transaction by sending the GETDATA back very quickly, thus removing it from the node’s unbroadcast set. I’m trying to better understand the risk of this & brainstorm possible better solutions (see #16698 (review)). If any reviewers have thoughts or questions about this mechanism, let’s continue the convo on that thread. @mzumsande to conclude this long explanation, hopefully this resolves your original question? The unbroadcast set should have (pretty much) the same guarantees as initial relay. LMK if you have any outstanding questions or recommendations for improved documentation.

in test/functional/wallet_resendwallettransactions.py:56 in 1053c0e915 outdated

100-        wait_until(lambda: node.p2ps[1].tx_invs_received[txid] >= 1, lock=mininode_lock)
101+        node.setmocktime(mocktime + 2 * one_day_in_seconds)
102+
103+        # confirm that its back in the mempool
104+        self.log.info("Transaction should be resubmitted to mempool")
105+        wait_until(lambda: txhsh in node.getrawmempool(), timeout=30)

mzumsande commented at 9:25 pm on August 26, 2019:

This goes into timeout for me, causing wallet_resendwallettransactions to fail when I run the entire test suite (same as in AppVeyor build here) but the test succeeds if I run it in isolation. If you can’t reproduce this I could look deeper into why it fails.

amitiuttarwar commented at 9:53 pm on August 26, 2019:

oh interesting. I haven’t been able to reproduce locally. I would love some help!

mzumsande commented at 4:55 pm on August 27, 2019:

I think that this is due to a bug with the time: RESEND_TXS_FREQUENCY is in microseconds. It is added to GetTime() which is in seconds, so an actual resend doesn’t ever happen in test. Yet the test sometimes succeeds because there is an initial resend (nNextResend is initialized to 0) which in some runs happens after the mocktime is set to two weeks and the tx has been expired (test passes), in some runs happens earlier (test fails).

amitiuttarwar commented at 0:14 am on August 29, 2019:

thanks for digging in!! I will fix the bug and ensure the times are consistent in microseconds.

could you tell me more about how you debugged? were you able to isolate the failure, or was it always when run in the entire suite? I’m curious why this behavior wouldn’t manifest as a flaky test when run in isolation.

mzumsande commented at 10:26 pm on August 29, 2019:

On my other computer it also sometimes failed in the single run, which made testing easier. I found this the bug adding some debug statements in ResendWalletTransactions.

Fixing the times might not solve this completely, because the initial rebroadcast after startup also happened for me in a few runs right before assert txhsh not in node.getrawmempool(), which then fails. Adding a sleep before you jump ahead in time could help.

mzumsande commented at 9:33 pm on August 26, 2019: member

Txns are defined as “should have been mined” by using the block assembler logic

I don’t understand 1) the concept “should have been mined”, and 2) how the block assembler logic achieves this. As to 1) do you mean the txns should have been mined in a specific block/range of blocks, but weren’t? Should no rebroadcasts happen in an ideal world where miners have an identical mempool to ours and mine rationally? As to 2) from what I understand, BlockAssembler creates a block template with 3/4*MAX_BLOCK_WEIGHT including the top feerate packages of our current mempool. Wouldn’t it always fill this block template with txns if our mempool is large enough, and therefore rather include 75% of the txns that we expect to be mined in the next block, instead of txns that should have been mined in the past?

amitiuttarwar commented at 10:20 pm on August 26, 2019: contributor

thanks for the review @mzumsande !

I don’t understand 1) the concept “should have been mined" […] do you mean the txns should have been mined in a specific block/range of blocks, but weren’t? Should no rebroadcasts happen in an ideal world where miners have an identical mempool to ours and mine rationally?

based on the local mempool, we are attempting to answer the question of what txns we think should have been mined. And saying if it wasn’t, maybe there was an issue with relay. You are correct- in a world where all mempools are consistent there wouldn’t be rebroadcasts.

As to 2) from what I understand, BlockAssembler creates a block template with 3/4*MAX_BLOCK_WEIGHT including the top feerate packages of our current mempool. Wouldn’t it always fill this block template with txns if our mempool is large enough, and therefore rather include 75% of the txns that we expect to be mined in the next block, instead of txns that should have been mined in the past?

yes. you will start with txns you expect to be mined in the next block. the recency filter will (likely) remove some of those transactions. however, in the case of a mempool thats emptying out, the recency filter might not have much effect. for that I have this todo before the PR would be ready for merge:

add functionality to run an automated job to cache min fee rate for txns to be included in block, then apply that filter to exclude txns with fee rate < min from rebroadcast set

mzumsande commented at 10:51 pm on August 26, 2019: member

based on the local mempool, we are attempting to answer the question of what txns we think should have been mined.

What confuses me is how we can answer that without actually looking into recent blocks. Considering that txns are removed from the mempool once they are included in a valid block, it is possible that the previous blocks removed the respective top of our previous versions of the mempool, so we are left with txns that were not at the top of the mempool earlier but are now, which we wouldn’t want to rebroadcast. Or the miners could have left out several high-fee-rate txns in favor of lower ones, in which case the higher ones are still present in our mempool and we would like to rebroadcast them. Or there just might have been no new blocks found in the hour since the last rebroadcast, in which case we wouldn’t need to rebroadcast.

How could we distinguish between these cases by just looking at our current mempool?

amitiuttarwar commented at 7:10 pm on August 29, 2019: contributor

great questions @mzumsande. I’ve thought about this a lot, so let me share some of my reasoning-

What confuses me is how we can answer that without actually looking into recent blocks.

We can’t. And further, even if we do look at the recent blocks, we still cannot answer exactly what “should” have been included. The two main pieces of information we are missing are 1. what the miner’s mempool looked like and 2. any weight applied through prioritisetransaction. By looking at a block, it is difficult to extrapolate the exact minimum fee rate for transactions to be included. So instead, the approach here is for a node to look at its local mempool and work towards the picture of what it believes should have already been included.

so we are left with txns that were not at the top of the mempool earlier but are now, which we wouldn’t want to rebroadcast.

yup, specifically in the case of the emptying out mempool, we would currently rebroadcast a full set of txns, thats why I want to build a cache of min fee rate and apply an extra filter to reduce noise in this circumstance ( from #16698 (comment) ):

add functionality to run an automated job to cache min fee rate for txns to be included in block, then apply that filter to exclude txns with fee rate < min from rebroadcast set. this will reduce rebroadcast noise in scenarios where the mempool is emptying out.

–

The crux of the thinking is that we are not going to get the rebroadcast set perfect, but that is ok. When a node rebroadcasts a txn, it sends an INV message to new connections (see filterInventoryKnown). Since INV messages are relatively small & can incorporate many transactions, we have some leeway.

All these different mechanisms are to reduce noise. I want to ensure the parameters chosen allow the worst case scenario (rebroadcasting the full set) to not be disruptive to the network. And these mechanisms should make the worst case infrequent.

Does this make sense to you? Let me know if you still have questions.

mzumsande commented at 7:04 pm on August 31, 2019: member

Thanks for the answer! I think that your approach makes sense and am looking forward to the traffic/bandwidth analysis. I am still not so sure if your approach is best described with the notion of “should have been mined”, but to a degree that’s just semantics.

DrahtBot added the label Needs rebase on Sep 7, 2019

laanwj commented at 12:32 pm on September 16, 2019: member

Concept ACK

MarcoFalke commented at 5:51 pm on September 18, 2019: member

Are you still working on this? This missed the 0.19 feature freeze, but I’d like to see it in 0.20 (hopefully early in the cycle)

MarcoFalke added this to the milestone 0.20.0 on Sep 18, 2019

amitiuttarwar commented at 8:44 pm on September 19, 2019: contributor

@MarcoFalke yes will resume soon

amitiuttarwar force-pushed on Oct 9, 2019

DrahtBot removed the label Needs rebase on Oct 9, 2019

amitiuttarwar force-pushed on Oct 9, 2019

amitiuttarwar force-pushed on Oct 10, 2019

amitiuttarwar force-pushed on Oct 12, 2019

amitiuttarwar force-pushed on Oct 14, 2019

amitiuttarwar force-pushed on Oct 19, 2019

amitiuttarwar commented at 2:01 am on October 19, 2019: contributor

an update on my current thinking for anyone interested-

next steps

some misc small cleanup (make PoissonNextSend interface consistent, pull out 7f8056c50d047d12201ad5d0f75e00103f8a0bd6 into separate PR, etc.)
cache the min fee rate for a txn to be included in a block & add as filter on rebroadcast set

then this PR would be ready for code review & I’d want to observe & gather data on bandwidth usage

in follow up PRs

persist the unbroadcast txn set to mempool.dat
remove m_best_block_time
fix circular dependency introduced between txmempool & miner

LMK if you have any questions or feedback!

amitiuttarwar force-pushed on Oct 20, 2019

in test/functional/mempool_rebroadcast.py:27 in 5b5151dd64 outdated

22+import time
23+
24+# Constant from txmempool.h
25+MAX_REBROADCAST_WEIGHT = 3000000
26+
27+class P2PStoreTxInvs(P2PInterface):

amitiuttarwar commented at 5:33 pm on October 20, 2019:

this is a temporary parking spot for this code. All three tests use it, so I’d like to pull it out into some shared utility, but haven’t figured out where makes sense yet. Any suggestions?

MarcoFalke commented at 1:33 pm on October 22, 2019:

It doesn’t depend on any outside modules except mininode, so it could be moved there or to a new module.

amitiuttarwar force-pushed on Oct 21, 2019

in src/net.h:953 in e768ad10fb outdated

952 
953+/** Wrapper to return mockable type */
954+inline std::chrono::microseconds PoissonNextSend(std::chrono::microseconds now, std::chrono::microseconds average_interval)
955+{
956+    int64_t interval_seconds = (std::chrono::duration_cast<std::chrono::seconds>(average_interval)).count();
957+    return std::chrono::microseconds{PoissonNextSend(now.count(), interval_seconds)};

MarcoFalke commented at 9:17 pm on October 22, 2019:

in commit e768ad10fbd24929059a1ada67cefaa3bc570b0d:

I don’t think casting (truncating) microseconds to seconds is acceptable, as it degrades everything smaller than a second to a delay of zero.

You could just pass in std::chrono::seconds average_interval and then call here:

0     return std::chrono::microseconds{PoissonNextSend(now.count(), interval_seconds.count())};

amitiuttarwar commented at 6:51 pm on October 23, 2019:

ok I get it. fixed.

amitiuttarwar force-pushed on Oct 23, 2019

instagibbs commented at 8:01 pm on October 24, 2019: member

is this ready for review?

DrahtBot added the label Needs rebase on Oct 24, 2019

amitiuttarwar commented at 0:15 am on October 25, 2019: contributor

@instagibbs - working on it ! building out one more piece of functionality (described #16698 (comment)), then will be ready for review. I’ll remove the WIP from the title & leave a comment when it is :)

MarcoFalke referenced this in commit c7e6b3b343 on Nov 5, 2019

amitiuttarwar force-pushed on Nov 20, 2019

DrahtBot removed the label Needs rebase on Nov 20, 2019

amitiuttarwar force-pushed on Nov 20, 2019

amitiuttarwar renamed this:
~~[WIP] Mempool: rework rebroadcast logic to improve privacy~~
Mempool: rework rebroadcast logic to improve privacy
on Nov 20, 2019

amitiuttarwar commented at 11:15 pm on November 20, 2019: contributor

🎉 this PR is ready for review!

in src/net_processing.cpp:1567 in 229f6b2cc6 outdated

1562@@ -1563,6 +1563,11 @@ void static ProcessGetData(CNode* pfrom, const CChainParams& chainparams, CConnm
1563             if (mi != mapRelay.end()) {
1564                 connman->PushMessage(pfrom, msgMaker.Make(nSendFlags, NetMsgType::TX, *mi->second));
1565                 push = true;
1566+                // Once the first peer requests GETDATA for a txn, we deem initial broadcast a success

amitiuttarwar commented at 0:15 am on November 21, 2019:

Not sure if one GETDATA is the best solution to indicate the txn has been successfully broadcast to the network.

The worst case is highlighted by the tests- a node receives the getdata back from the first peer before announcing to any other peers. so I have to disconnect the other peers to send to the new connection. (see here)

in practice, this means a user who submits a txn with p2p disabled, then re-enables, might only get the txn out to one other peer before marking it as “broadcast”

amitiuttarwar commented at 2:45 am on November 26, 2019:

some relevant info on how an attack could potentially be carried out here: #16698 (review). but relevant parts reiterated here-

the attack seems simple to execute, but rare to get the right conditions where it succeeds, and the incentive seems weak.

execution: attacker sends GETDATAs back for every transaction it receives & never propagate them out to the network.
conditions needed:

node would be relying on unbroadcast set to broadcast txn to network (attacker couldn’t discern)
node would have to relay to attacker first
timing would have to be fast enough that node receives & processes GETDATA before sending out other INVs.

outcome: the transaction doesn’t propagate until the rebroadcast logic picks it up & sends it out to all peers. the attacker delayed the initial propagation of the transaction.

doesn’t seem like a very worthwhile attack to me…. seems much more likely to occur under accidental conditions.

but I was thinking it through & am sharing incase reviewers want to weigh in.

ariard commented at 7:32 pm on November 27, 2019:

I find the mechanism a bit weird as it leaks a wallet concern in the P2P stack. If you have multiple wallets connected to your node or even accept tx from third-party through your RPC command are you going to ensure success for all of them ?

amitiuttarwar commented at 9:06 pm on December 2, 2019:

I agree that its not an ideal separation of concerns, but I think it’s reasonable. Essentially, when you submit a txn to the mempool/node, it guarantees that it will 1. validate the txn & accept into its mempool locally & 2. propagate it out to the network (assuming validity).

The normal way this is achieved is by bundling the actions together. However, if you have p2p disabled when you initially submit the txn, you no longer get # 2.

Previously, this wasn’t an issue because the rebroadcast logic was so over-eager, that initial relay would automatically occur within a short amount of time after enabling p2p. With these changes, that is no longer a guarantee, so we need to explicitly keep track of the transactions that have passed step # 1, but not yet # 2.

The other option would be that the mempool returns indication when the txn was accepted locally & then when the txn was propagated to the network. Then the entity submitting the txn (wallet, third-party RPC, whatever) would have to keep track of txns in the in-between state. I think thats a much messier interaction between the layers.

ariard commented at 5:13 pm on December 3, 2019:

validate the txn & accept into its mempool locally

Assuming tx succeed fee checks.

The normal way this is achieved is by bundling the actions together. However, if you have > p2p disabled when you initially submit the txn, you no longer get # 2.

As a user if you deactivate p2p functionality at tx mempool-insertion broadcast should be then a best-effort on the timing side. I think this point is really linked with wallet timer rebroadcast, if you reduce it to be under the mempool one, won’t you be sure than your txn is going to be initially broadcast at least within 1 hour ?

The other option would be that the mempool returns indication when the txn was accepted locally & then when the txn was propagated to the network. Then the entity submitting the txn (wallet, third-party RPC, whatever) would have to keep track of txns in the in-between state. I think thats a much messier interaction between the layers.

If a wallet gets notification of transaction mempool insertion it should rely on it for network propagation but mempool may fail for its own internal reasons (size, conflicts) so for reliability your wallet should ensure periodic resend/rebroadcast.

amitiuttarwar commented at 6:23 pm on January 3, 2020:

@ariard do you still have an outstanding concern here?

I find the mechanism a bit weird as it leaks a wallet concern in the P2P stack. the first comment talks about the layer separation so for reliability your wallet should ensure periodic resend/rebroadcast. but your second comment seems to talk more about the wallet & mempool rebroadcast timers.

the thread was started in regards to the unbroadcast logic. which is different than the rebroadcast logic. I explained the reason that it makes sense for the node to keep track of txns that are being initially relayed, let me know if you have further questions. I believe I have addressed your concerns around the timers elsewhere on this PR.

I’m going to resolve this conversation. If there’s further questions or concerns about the unbroadcast logic I can re-open.

in test/functional/mempool_rebroadcast.py:103 in 6b70e7af82 outdated

129@@ -100,59 +130,6 @@ def test_simple_rebroadcast(self):
130         # check that node1 got txns bc rebroadcasting
131         wait_until(lambda: len(node1.getrawmempool()) == 3, timeout=30)
132 
133-    def test_rebroadcast_top_txns(self):

amitiuttarwar commented at 0:31 am on November 21, 2019:

I removed this test because it was converging with test_fee_rate_cache.

amitiuttarwar force-pushed on Nov 21, 2019

in src/wallet/wallet.cpp:1921 in 38599640a1 outdated

1925-// mined in the most recent block. Any transaction that wasn't in the top
1926-// blockweight of transactions in the mempool shouldn't have been mined,
1927-// and so is probably just sitting in the mempool waiting to be confirmed.
1928-// Rebroadcasting does nothing to speed up confirmation and only damages
1929-// privacy.
1930+// Once a day, resbumit all wallet transactions to the node,

instagibbs commented at 8:00 pm on November 21, 2019:

s/resbumit/resubmit/

JeremyRubin commented at 2:58 am on November 22, 2019:

Isn’t this a more obvious privacy leak now? Because you’d see the tx hash appear in two updates?

amitiuttarwar commented at 2:49 am on November 26, 2019:

Not sure I follow. This method would be a no-op if the transaction is in your mempool. If its not in your mempool, it would call through to BroadcastTransaction with relay=false, so it would go through ATMP, but not be immediately relayed. Another INV will only be sent if the transaction is selected for rebroadcast.

I’m sure there’s some small privacy leak here based on timing of re-entrance to mempool. Haven’t fully thought it through yet. But it should be much less than before? What two updates are you referring to?

in src/wallet/wallet.cpp:1958 in 38599640a1 outdated

1963-            // Attempt to rebroadcast all txes more than 5 minutes older than
1964-            // the last block. SubmitMemoryPoolAndRelay() will not rebroadcast
1965-            // any confirmed or conflicting txs.
1966-            if (wtx.nTimeReceived > m_best_block_time - 5 * 60) continue;
1967             std::string unused_err_string;
1968-            if (wtx.SubmitMemoryPoolAndRelay(unused_err_string, true)) ++submitted_tx_count;

instagibbs commented at 8:02 pm on November 21, 2019:

suggested bikeshedding of name for new readers:

SubmitMemoryPoolAndMaybeRelay or SubmitMemoryPool since relay is an argument…

amitiuttarwar commented at 5:09 am on November 26, 2019:

yeah I agree. seems unrelated to this PR though? I can easily make it as a separate PR, but not sure on the etiquette around that since it would be a pure refactor. thoughts?

instagibbs commented at 1:54 pm on November 26, 2019:

It’s fine to not change it. I might do it myself since it’s just bothering me :)

in src/net_processing.cpp:116 in 91eefef240 outdated

110@@ -111,6 +111,8 @@ static constexpr unsigned int INVENTORY_BROADCAST_MAX = 7 * INVENTORY_BROADCAST_
111 static constexpr unsigned int AVG_FEEFILTER_BROADCAST_INTERVAL = 10 * 60;
112 /** Maximum feefilter broadcast delay after significant change. */
113 static constexpr unsigned int MAX_FEEFILTER_CHANGE_DELAY = 5 * 60;
114+/** Average delay between rebroadcasts */
115+static const std::chrono::seconds TX_REBROADCAST_INTERVAL = std::chrono::seconds{60 * 60};

JeremyRubin commented at 2:47 am on November 22, 2019:

Does it make sense to hardcode a smaller MIN_TX_REBROADCAST_INTERVAL (5 minutes?), and then a TX_REBROADCAST_INTERVAL which is either 1 hour or a custom parameter?

JeremyRubin commented at 2:47 am on November 22, 2019:

Can be addressed in a follow up PR.

amitiuttarwar commented at 2:08 am on November 26, 2019:

are you proposing the minimum to support a user configurable frequency, or for the current case (since its possible bc of poisson distribution)?

amitiuttarwar commented at 2:17 am on November 26, 2019:

would be happy to support customizable rebroadcast interval in a future PR if that’s of interest. request for anyone coming across this who’d like that feature to leave a 👍response

JeremyRubin commented at 2:44 am on November 26, 2019:

I think the min can be hard-coded, and that it shouldn’t happen more frequently even when drawing poisson time (e.g., something like: min_time + poisson(avg_time - min_time)).

So as to impose a strict cap on how frequently this is happening, because if you draw out of a poisson a large number of times eventually you’ll see a run of small interval windows.

You also want something to ensure that there is always some entropy, even if min_time == avg_time (maybe enforce that avg_time >= min_time*2).

Then, I think that the actual rebroadcast interval should be configurable, based on a user parameter. E.g., default is 30 minutes, but can be changes down to the min_time. This can happen in a future PR.

jnewbery commented at 3:35 am on November 26, 2019:

I don’t think a customizable rebroadcast interval is a good idea:

we should avoid adding command line options that have minimal impact on user experience. Take a look at the 10s of options in init.cpp. I’m sure there are many of them that hardly anyone uses, and they just add unnecessary complexity/interactions to the code.
we should try to avoid adding unnecessary fingerprints to nodes by keeping their p2p behaviour as uniform as possible.

JeremyRubin commented at 7:18 am on November 26, 2019:

Then that’s an argument against doing this at all, as it increases bandwidth requirements for low-resourced nodes who need to rebroadcast transactions.

JeremyRubin commented at 7:20 am on November 26, 2019:

To be clear; I’m in favor of this change :)

I just think if you’re a low resourced node you’d want to be able to decrease the frequency at which your rebroadcasts are occurring if you have to send out an entire block worth of data.

ajtowns commented at 8:15 am on November 26, 2019:

I think there’s a case for reviewing the worst case resource usage, but I think in the normal case the bandwidth for rebroadcast will be very low and not really something that needs to be throttled even for low powered nodes. (Reducing your mempool size would be one way of reducing your outgoing rebroadcast traffic, but I think that’s mostly just a tradeoff against incoming rebroadcast traffic)

JeremyRubin commented at 1:43 am on November 28, 2019:

Reducing mempool does not obviously help with resource reduction for a few reasons:

because of compact blocks (slower to get new blocks, so if you’re peak-bandwidth capped it’s bad).
If you’re bandwidth constrained, you aren’t necessarily memory constrained either (why throw away a transaction if it seems good?)
It stands to reason that when fees are going up, you’ll evict a lot of stuff from your mempool as newer txns go in, and when the fees go back down you’ll need stuff re-relayed.

I think I’d just like to understand how this does impact bandwidth usage, it seemed to not be a non trivial amount more as it’s 1/f * block size * period * #peers more data volume? If it’s actually small then so be it. But it does seem, to me, to be non-negligible

amitiuttarwar commented at 1:04 am on January 7, 2020:

Seems like the customizable rebroadcast interval is a proposed solution for concerns around bandwidth usage for low resourced nodes.

I agree that we should ensure these rebroadcast changes don’t impose a significant bandwidth requirement.

I believe the current changes require little additional bandwidth, which my initial monitoring from running this patch supports. More info posted in the main comment thread.

But this is not conclusive for determining that bandwidth impact is low in all conditions. If further monitoring leads to deciding we need to restrict resource usage further, we have options. For the fingerprint reasons mentioned above I don’t think a customizable rebroadcast interval makes the most sense. However, we could consider a kill switch to disable rebroadcast all together (along the lines of current implementation walletbroadcast=0). Also available (and probably the first line of experimentation) would be tinkering with the params to tighten the txns selected for rebroadcast.

I’ve updated my gist with this open questions & possible solutions and moving forward will maintain it with the options & latest thinking.

Since this comment thread is about customizable rebroadcast intervals, I’m going to resolve the conversation. But let’s continue to discuss bandwidth impact in the main PR thread as I continue to monitor & share data. @JeremyRubin let me know what you think. esp if you disagree or have any questions.

in src/txmempool.cpp:118 in 91eefef240 outdated

113+
114+    // use CreateNewBlock to get set of transaction candidates
115+    std::unique_ptr<CBlockTemplate> pblocktemplate = BlockAssembler(Params(), options).CreateNewBlock(scriptDummy);
116+
117+    LOCK(cs);
118+    for (const auto& tx : pblocktemplate->block.vtx) {

JeremyRubin commented at 2:51 am on November 22, 2019:

There’s no real point to using a set here – the block is already guaranteed to be de-duplicated, so you can make this interface just use a vector (or move the vtx out of the block template, and insert directly into setRebroadcastTxs later)

amitiuttarwar commented at 11:27 pm on December 2, 2019:

updated to vector. I started reading up on move, but don’t fully understand it and it seems like there are some gotchas? going to resolve this convo for now, but we can re-open if you think move is better.

in src/txmempool.h:43 in 91eefef240 outdated

34@@ -35,6 +35,10 @@ extern CCriticalSection cs_main;
35 /** Fake height value used in Coin to signify they are only in the memory pool (since 0.8) */
36 static const uint32_t MEMPOOL_HEIGHT = 0x7FFFFFFF;
37 
38+// we rebroadcast 3/4 of max block weight (defined in consensus.h)
39+// to reduce noise due to circumstances such as miners mining priority txns
40+static const unsigned int MAX_REBROADCAST_WEIGHT = 3000000;

JeremyRubin commented at 2:52 am on November 22, 2019:

Can this be re-written in terms of the consensus.h parameters?

instagibbs commented at 7:00 pm on November 22, 2019:

I’d also like a slightly stronger justification for 3/4 if there is one. What is “noise” in this context? Any simulations/estimations to show that 1/4 cut off is a lot of bandwidth in practice?

amitiuttarwar commented at 0:42 am on December 3, 2019:

updated to be defined in terms of consensus.h param.

RE justification for 3/4 & bandwidth implications -> agreed more data & justification is important. I’m working on running this patch live & observing bandwidth usage and will report back with that data. I’m going to resolve this comment & continue the conversation in the main thread at that time.

in src/txmempool.h:39 in 54e0e87e21 outdated

34@@ -35,6 +35,9 @@ extern CCriticalSection cs_main;
35 /** Fake height value used in Coin to signify they are only in the memory pool (since 0.8) */
36 static const uint32_t MEMPOOL_HEIGHT = 0x7FFFFFFF;
37 
38+// Default minimum age for a txn to be rebroadcast in seconds - 30 min

JeremyRubin commented at 2:54 am on November 22, 2019:

Why 30 minutes?

instagibbs commented at 7:03 pm on November 22, 2019:

probably want it a value related to the consensus.nPowTargetSpacing ?

in src/miner.cpp:276 in 54e0e87e21 outdated

272@@ -269,6 +273,7 @@ int BlockAssembler::UpdatePackagesForAdded(const CTxMemPool::setEntries& already
273 bool BlockAssembler::SkipMapTxEntry(CTxMemPool::txiter it, indexed_modified_transaction_set &mapModifiedTx, CTxMemPool::setEntries &failedTx)
274 {
275     assert (it != mempool.mapTx.end());
276+    if (it->GetTime() > m_max_tx_time) return true; // txn too recent

JeremyRubin commented at 2:55 am on November 22, 2019:

technically, for the name to be accurate should be >= (as we won’t allow a tx with the max time).

JeremyRubin commented at 2:56 am on November 22, 2019:

Could either do >= or rename it tx_time_bound

instagibbs commented at 7:06 pm on November 22, 2019:

I suggested rename eslewhere to be self-describing(something like m_skip_inclusion_until ?) and then you can drop the comment on this line because it’s obvious what the meaning is.

mzumsande commented at 9:57 pm on November 26, 2019:

I think that this commit should have more detailed documentation, both in the header and here where txes are filtered out. Since the code is intended for actual mining, most readers won’t expect this kind of “creative” indirect use and will wonder why we would ever exclude transactions from a block for being too recent.

amitiuttarwar commented at 11:34 pm on December 2, 2019:

updated name to m_skip_inclusion_until & added documentation to this function definition & in the header

in src/wallet/wallet.cpp:1926 in 38599640a1 outdated

1932 void CWallet::ResendWalletTransactions()
1933 {
1934     // During reindex, importing and IBD, old wallet transactions become
1935-    // unconfirmed. Don't resend them as that would spam other nodes.
1936+    // unconfirmed. Don't need to resubmit to our node.
1937     if (!chain().isReadyToBroadcast()) return;

JeremyRubin commented at 3:06 am on November 22, 2019:

Maybe this line should “go away”, and then it should be ensured that GetBroadcastTransactions is false during reindex/import/IBD (which would block the calls below in SubmitMemoryPoolAndRelay).

“Go away” in scare quotes, so that way we’re only using this line as an optimization to prevent having to lock the chain. below, but it’s not required for any correctness reasons.

in src/wallet/wallet.cpp:1923 in 38599640a1 outdated

1927-// and so is probably just sitting in the mempool waiting to be confirmed.
1928-// Rebroadcasting does nothing to speed up confirmation and only damages
1929-// privacy.
1930+// Once a day, resbumit all wallet transactions to the node,
1931+// in case it has been dropped from your mempool.
1932 void CWallet::ResendWalletTransactions()

JeremyRubin commented at 3:07 am on November 22, 2019:

Perhaps rename this because now we’re not resending?

amitiuttarwar commented at 4:52 am on November 26, 2019:

we’re resending (or attempting) to the mempool. I could rename to ResubmitWalletTransactions, but that doesn’t seem much different?

JeremyRubin commented at 3:26 am on November 22, 2019: contributor

This is some really cool work! :clap:

I did some basic review, left comments below.

It also seems that (maybe?) GitHub is screwing you on topological ordering, if I recall git rebase --ignore-date can make github show commits in proper order again (at the expense of the actual time you did the work). If you end up needing to rebase for whatever reason, may be worth doing that.

Generally, this gets a concept ack from me, it does seem to be a logical improvement.

Open questions for me are:

Testing – I didn’t review the python tests. I’m not sure what are the edgiest conditions that might come up
I don’t quite like using the mining code as a dependency for figuring out what to rebroadcast. Prior to this it would be possible to disable the ancestor-scored mapTx if you were a non-mining node, after this you will need to keep that index around. If we’re only rebroadcasting once an hour or so though, how bad would it be to rebuild this index as a bounded-size priority queue (drop low scoring values) when needed (e.g., for future non-mining nodes)?
I don’t find the constants chosen unreasonable, but would be good to hear a bit more on why they were selected
Would this PR be better if we used a “thin blocks” style relay? We already form a block template, but then we don’t take advantage of the compact-blocks relay code which would minimize the data sent, assuming the neighbor node does have most of what we rebroadcast…

in test/lint/lint-circular-dependencies.sh:29 in 91eefef240 outdated

30@@ -31,6 +31,7 @@ EXPECTED_CIRCULAR_DEPENDENCIES=(
31     "qt/guiutil -> qt/walletmodel -> qt/optionsmodel -> qt/guiutil"
32     "txmempool -> validation -> validationinterface -> txmempool"
33     "wallet/scriptpubkeyman -> wallet/wallet -> wallet/scriptpubkeyman"
34+    "miner -> txmempool -> miner"

instagibbs commented at 6:46 pm on November 22, 2019:

how much work is it to avoid this?

in src/net_processing.cpp:3804 in 91eefef240 outdated

3815+
3816+                // Check for rebroadcasts
3817+                if (pto->m_next_rebroadcast < current_time) {
3818+                    LogPrint(BCLog::NET, "Rebroadcast timer triggered\n");
3819+                    // schedule next rebroadcast
3820+                    bool fFirst = (pto->m_next_rebroadcast.count() == 0);

instagibbs commented at 6:51 pm on November 22, 2019:

you could re-arrange this to just check the count directly and use in below conditional, then set the new value for pto->m_next_rebroadcast after the conditional block. Gets rid of fFirst.

amitiuttarwar commented at 11:44 pm on December 2, 2019:

I’d rather leave it in for legibility since the compiler should optimize it away regardless- (simple example reproduced here). If invoked, the fSkipRun logic needs to overwrite the next rebroadcast time, so that logic would also need to be shuffled. Doable but makes the code more opaque.

in src/net_processing.cpp:3831 in 91eefef240 outdated

3829+                            LogPrint(BCLog::NET, "Rebroadcast tx=%s peer=%d\n", hash.GetHex(), pto->GetId());
3830+                        }
3831+
3832+                        // add rebroadcast txns
3833+                        pto->m_tx_relay->setInventoryTxToSend.insert(setRebroadcastTxs.begin(), setRebroadcastTxs.end());
3834+                }

instagibbs commented at 6:52 pm on November 22, 2019:

Think these brackets are off by a tab?

instagibbs commented at 7:07 pm on November 22, 2019:

oh I see in 38599640a1afae2bcf2aa46b0009bc0b94a46434 it’s fixed anyways, not worth messing around…

amitiuttarwar commented at 11:46 pm on December 2, 2019:

whitespace strugs 🙈

in src/txmempool.cpp:112 in 91eefef240 outdated

107+    if (::ChainstateActive().IsInitialBlockDownload() || ::fImporting || ::fReindex) return;
108+
109+    BlockAssembler::Options options;
110+    options.nBlockMaxWeight = MAX_REBROADCAST_WEIGHT;
111+    options.m_max_tx_time = std::chrono::seconds(GetTime()) - REBROADCAST_MIN_TX_AGE;
112+    CScript scriptDummy = CScript() << OP_TRUE;

instagibbs commented at 6:56 pm on November 22, 2019:

I don’t think it needs to be a non-empty script, blank is fine for this.

also try using snake_case for new variables aka dummy_script.

amitiuttarwar commented at 11:47 pm on December 2, 2019:

fixed

in src/net_processing.cpp:3826 in 91eefef240 outdated

3821+                    pto->m_next_rebroadcast = PoissonNextSend(current_time, TX_REBROADCAST_INTERVAL);
3822+
3823+
3824+                    if (!fFirst) {
3825+                        std::set<uint256> setRebroadcastTxs;
3826+                        mempool.GetRebroadcastTransactions(setRebroadcastTxs);

instagibbs commented at 6:59 pm on November 22, 2019:

very high level question: how much compute time does creating a unique block template for each peer roughly every 10 minutes take?

in src/miner.h:138 in 54e0e87e21 outdated

134@@ -135,6 +135,7 @@ class BlockAssembler
135     bool fIncludeWitness;
136     unsigned int nBlockMaxWeight;
137     CFeeRate blockMinFeeRate;
138+    std::chrono::seconds m_max_tx_time;

instagibbs commented at 7:05 pm on November 22, 2019:

this name is pretty ambiguous.

m_skip_inclusion_until or something in that vicinity?

amitiuttarwar commented at 11:47 pm on December 2, 2019:

fixed

in src/net_processing.cpp:3827 in 38599640a1 outdated

3830@@ -3831,8 +3831,13 @@ bool PeerLogicValidation::SendMessages(CNode* pto)
3831 
3832                         // add rebroadcast txns
3833                         pto->m_tx_relay->setInventoryTxToSend.insert(setRebroadcastTxs.begin(), setRebroadcastTxs.end());
3834+
3835+                        // also include wallet txns that haven't been successfully broadcast yet
3836+                        LogPrint(BCLog::NET, "Force initial broadcast of %lu transactions \n", mempool.m_unbroadcast_txids.size());

instagibbs commented at 7:14 pm on November 22, 2019:

this commit 38599640a1afae2bcf2aa46b0009bc0b94a46434 introduces m_unbroadcast_txids before it’s defined anywhere.

I’d also suggest a rename if it’s wallet-related only. m_unbroadcast_wallet_txids

amitiuttarwar commented at 11:49 pm on December 2, 2019:

cleaned up my commits, they should be good now.

as you’ve mentioned, its wallet & rpc-submitted txns, so I could rename to m_unbroadcast_local_txids, but I don’t think that helps clarify so I left as is. lmk if you disagree.

in src/node/transaction.cpp:81 in ffd154d064 outdated

77@@ -78,6 +78,9 @@ TransactionError BroadcastTransaction(NodeContext& node, const CTransactionRef t
78     }
79 
80     if (relay) {
81+        // the mempool explicitly keeps track of wallet txns to ensure successful initial broadcast

instagibbs commented at 7:18 pm on November 22, 2019:

This also adds anything submitted via sendrawtransaction, at least, so it’s not explicitly wallet-related.

I think it’s wallet-or-sendrawtransaction transactions only :)

confirmed: src/node/transaction.cpp:19: // BroadcastTransaction can be called by either sendrawtransaction RPC or wallet RPCs.

this should get a test!

amitiuttarwar commented at 2:57 am on November 26, 2019:

Yup. Totally. I think I use the language of “local” transactions elsewhere. Do you find that clear or should I just spell it all out every time?

Also, revisiting my mempool_wallet_transactions test, I realized I actually end up creating the “wallet” transaction with sendrawtransaction. Oops. I’ll add another transaction in that test that uses sendtoaddress then it will cover both :)

Thanks !

instagibbs commented at 1:56 pm on November 26, 2019:

{node, locally}-submitted transactions is fine, just noting the comments are wrong in certain places

amitiuttarwar commented at 11:50 pm on December 2, 2019:

fixed

instagibbs commented at 7:28 pm on November 22, 2019: member

gave some opening suggestions while I digest the general strategy, feel free to not engage in the suggestions until enough concept/approach ACKs come in

in test/functional/mempool_wallet_transactions.py:61 in 42c460a24f outdated

56+            utxo = utxos.pop()
57+            inputs = [{'txid': utxo['txid'], 'vout': utxo['vout']}]
58+            raw_tx_hex = node.createrawtransaction(inputs, outputs)
59+            signed_tx = node.signrawtransactionwithwallet(raw_tx_hex)
60+            node.sendrawtransaction(hexstring=signed_tx['hex'], maxfeerate=0)
61+            self.nodes[1].sendrawtransaction(hexstring=signed_tx['hex'], maxfeerate=0)

MarcoFalke commented at 0:29 am on November 26, 2019:

You are creating blocks on node[0], that this node (node[1]) might not have. This will lead to a race.

You can avoid it by calling sync_blocks or sync_all after you have generated the blocks.

amitiuttarwar commented at 1:05 am on November 26, 2019:

ah, got it now. thank you!

amitiuttarwar commented at 4:38 am on November 27, 2019: contributor

thanks for the reviews @JeremyRubin & @instagibbs. I’m making my way through them, but want to leave a high level comment- the biggest concern with these changes is the bandwidth impact to the node & network. The parameters I have currently chosen are fairly arbitrary, with the intent of keeping the size of the rebroadcast set small but meaningful. I’m working on running a node with some extra logging, and will share my findings once I have some real world data. Also can tweak parameters based on that info if needed.

in src/txmempool.cpp:111 in 91eefef240 outdated

106+    // accidentally spam our peers with old transactions.
107+    if (::ChainstateActive().IsInitialBlockDownload() || ::fImporting || ::fReindex) return;
108+
109+    BlockAssembler::Options options;
110+    options.nBlockMaxWeight = MAX_REBROADCAST_WEIGHT;
111+    options.m_max_tx_time = std::chrono::seconds(GetTime()) - REBROADCAST_MIN_TX_AGE;

adamjonas commented at 4:01 pm on November 27, 2019:

91eefef24 isn’t compiling for me due to no member named 'm_max_tx_time' in 'BlockAssembler::Options'

amitiuttarwar commented at 11:57 pm on December 2, 2019:

fixed. all commits should now build.

in src/txmempool.cpp:119 in 91eefef240 outdated

114+    // use CreateNewBlock to get set of transaction candidates
115+    std::unique_ptr<CBlockTemplate> pblocktemplate = BlockAssembler(Params(), options).CreateNewBlock(scriptDummy);
116+
117+    LOCK(cs);
118+    for (const auto& tx : pblocktemplate->block.vtx) {
119+        // add to rebroadcast set

ariard commented at 4:15 pm on November 27, 2019:

nit: you could print log here instead of in net_processing to avoid iterating on the set/vector again like “Submitting tx for rebroadcast to net processing”

amitiuttarwar commented at 10:20 pm on December 2, 2019:

hm that’s true, but then I wouldn’t be able to supplement with which peer. Leaving it for now.

ariard commented at 4:56 pm on December 3, 2019:

Or you can also add a printer just for the peer in the “check for rebroadcasts” SendMessages branch :) ?

in test/functional/mempool_rebroadcast.py:171 in 42c460a24f outdated

166+            # create a recent transaction
167+            new_tx = node1.sendtoaddress(node1.getnewaddress(), 2)
168+            new_tx_id = int(new_tx, 16)
169+
170+            # ensure node0 has the transaction
171+            wait_until(lambda: new_tx in node.getrawmempool())

jonatack commented at 5:02 pm on November 27, 2019:

This assertion timed out for me (“not true after 60 seconds”) on the first run of 375 secs. Passed on a second run, then timed out in the same place again in the third run. Passed a few more runs, then failed again.

jonatack commented at 5:58 pm on November 27, 2019:

Both at 42c460a24f6d9d9c2c65c30287965acbce096ecf and rebased to current master.

 0~/projects/bitcoin/bitcoin (pr/16698) $ test/functional/mempool_rebroadcast.py 
 12019-11-27T17:51:11.955000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_g9hdr2e_
 22019-11-27T17:51:13.096000Z TestFramework (INFO): Test simplest rebroadcast case
 32019-11-27T17:51:14.928000Z TestFramework (INFO): Test recent txns don't get rebroadcast
 42019-11-27T17:56:31.453000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
 5            wait_until(lambda: new_tx in node.getrawmempool())
 6'''
 72019-11-27T17:56:31.467000Z TestFramework (ERROR): Assertion failed
 8Traceback (most recent call last):
 9  File "/home/jon/projects/bitcoin/bitcoin/test/functional/test_framework/test_framework.py", line 111, in main
10    self.run_test()
11  File "test/functional/mempool_rebroadcast.py", line 44, in run_test
12    self.test_recency_filter()
13  File "test/functional/mempool_rebroadcast.py", line 171, in test_recency_filter
14    wait_until(lambda: new_tx in node.getrawmempool())
15  File "/home/jon/projects/bitcoin/bitcoin/test/functional/test_framework/util.py", line 229, in wait_until
16    raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
17AssertionError: Predicate ''''
18            wait_until(lambda: new_tx in node.getrawmempool())
19''' not true after 60 seconds
202019-11-27T17:56:31.519000Z TestFramework (INFO): Stopping nodes

adamjonas commented at 6:03 pm on November 27, 2019:

This on my first run as well, but then passed six in a row so didn’t report.

amitiuttarwar commented at 6:42 pm on November 30, 2019:

I’m unable to reproduce this locally. Asked @jonatack for logs & looks like the failure occurs after going through the while loop >150 times, which means the if new_conn.get_invs() conditional is never hitting. Theoretically it would be possible for the condition to not hit, but that should be an extreme edge case, vs the frequency of failures being seen. I’ll try to run this test in different environments to see if I can reproduce.

jonatack commented at 7:35 pm on November 30, 2019:

FWIW this was on a recent Debian and building with --enable-debug --enable-bench. Could be good to have other data points in the form of people running this test. I hope to dig into these tests soon.

amitiuttarwar commented at 7:22 pm on January 6, 2020:

I was able to reproduce this failure on an ubuntu machine. latest push fixed it for me & jonatack. going to resolve.

jonatack commented at 7:56 pm on January 6, 2020:

I was able to reproduce this failure on an ubuntu machine. latest push fixed it for me & jonatack. going to resolve.

SGTM

in test/functional/mempool_rebroadcast.py:139 in cadd572afd outdated

134+        wait_until(lambda: len(conn1.tx_invs_received) == 90)
135+
136+        # confirm txns are more than max rebroadcast amount
137+        assert_greater_than(node.getmempoolinfo()['bytes'], MAX_REBROADCAST_WEIGHT)
138+
139+        self.sync_all()

adamjonas commented at 5:21 pm on November 27, 2019:

I am unable to get this to pass in cadd572afd8b1468aee8a0a3ba65b5e38fc021dd or ffd154d06. Getting an ‘Mempool sync timed out` AssertionError.

amitiuttarwar commented at 11:57 pm on December 2, 2019:

should be fixed

adamjonas commented at 5:49 pm on November 27, 2019: member

A few issues with the builds (https://github.com/bitcoin/bitcoin/pull/16698#discussion_r351371480 and #16698 (review)) and functional tests. Overall, found the code well commented.

jonatack commented at 5:54 pm on November 27, 2019: member

Interesting ideas!

Built and ran tests both on your branch and rebased to current master while looking over the code.

At a high level, I think it would help reviewers and future contributors to:

give more detail in the PR description (e.g. the specific parameters chosen and why with justification, links to mailing list discussions, research and benchmarking)
update the gist that is provided for more information (the implementation plan appears to be out of date?) or remove it
add more supporting explanation/documentation in the code

Kudos the for the functional tests. So far mempool_rebroadcast.py seems flakey for me, see my comment below.

Overall my first impression (which could be wrong!) is that these changes may need substantive research and benchmarking to help with reasoning about them and to demonstrate rigorously that they are a worthwhile improvement without introducing x-order effects, but I need to think more deeply about them and look more into the tests.

in src/miner.cpp:459 in 42c460a24f outdated

451@@ -406,6 +452,12 @@ void BlockAssembler::addPackageTxs(int &nPackagesSelected, int &nDescendantsUpda
452         // This transaction will make it in; reset the failed counter.
453         nConsecutiveFailed = 0;
454 
455+        CFeeRate newFeeRate(packageFees, packageSize * WITNESS_SCALE_FACTOR);
456+
457+        if (newFeeRate < minPackageFeeRate) {

emilengler commented at 6:05 pm on November 27, 2019:

Nit: Wouldn’t it be better to use the one line if coding style?

in src/rpc/mining.cpp:271 in 0723e61274 outdated

261@@ -262,7 +262,9 @@ static UniValue getmininginfo(const JSONRPCRequest& request)
262 static UniValue prioritisetransaction(const JSONRPCRequest& request)
263 {
264             RPCHelpMan{"prioritisetransaction",
265-                "Accepts the transaction into mined blocks at a higher (or lower) priority\n",
266+                "Accepts the transaction into mined blocks at a higher (or lower) priority.\n"
267+                "\nNote that prioritizing a transaction could leak privacy, through both\n"

ariard commented at 6:13 pm on November 27, 2019:

“could LEAK PRIVACY if tx is new and hasn’t previously flooded on the network”

amitiuttarwar commented at 8:26 pm on December 2, 2019:

hm, there could be a privacy leak even if the txn isn’t new. Even if it has propagated the network, if you prioritisetransaction it could cause the rebroadcast logic to pick up the txn & rebroadcast it sooner than it would have otherwise. this would indicate to a spy node that this txn is of special interest to you, thus leaking privacy.

in src/wallet/wallet.cpp:47 in 38599640a1 outdated

43@@ -44,6 +44,8 @@ const std::map<uint64_t,std::string> WALLET_FLAG_CAVEATS{
44 };
45 
46 static const size_t OUTPUT_GROUP_MAX_ENTRIES = 10;
47+// frequency of resubmitting txns to mempool- 24 hours

ariard commented at 6:14 pm on November 27, 2019:

rational of every 24h? I wondering if we shouldn’t be more aggressive to have a chance of wallet rebroadcast being staggered in next mempool rebroadcast happening every 10min so resend like 1 min

amitiuttarwar commented at 8:48 pm on December 2, 2019:

if the txn is already in your mempool, this function will be a no-op, which should be the majority of the time. this is only relevant if it has gotten dropped from your mempool (expired, evicted, etc.) the rebroadcast logic will look at the contents of the mempool to decide what to rebroadcast. so, the majority of the time, this function will not be relevant to determining if a wallet txn will be rebroadcast. does that make sense?

also, where are you getting the 10 min number from? rebroadcasts will occur ~1x / hour per peer.

ariard commented at 5:17 pm on December 3, 2019:

Ooops, yes 1 hour, dunno from where I drawn it, my question is you want wallet rebroadcast to have the highest change possible to be selected for mempool rebroadcast, to do so your wallet rebroadcast batch should be resubmitted in each mempool rebroadcast period, so wallet_timer < mempool_timer ?

amitiuttarwar commented at 11:22 pm on January 6, 2020:

the wallet resubmission timer is only incase the txn gets dropped (expired, evicted, etc.) from the local mempool before being confirmed. thus, its unnecessary for the wallet to attempt to resubmit to the mempool every hour. @ariard based on our conversation offline, I believe you have reached this understanding & are now comfortable with this timer. can you confirm?

instagibbs commented at 6:16 pm on November 27, 2019: member

I think we should try to hold off nits until further concept acks and development

On Wed, Nov 27, 2019, 1:05 PM Emil Engler notifications@github.com wrote:

@emilengler commented on this pull request.

In src/miner.cpp https://github.com/bitcoin/bitcoin/pull/16698#discussion_r351431222:

@@ -406,6 +452,12 @@ void BlockAssembler::addPackageTxs(int &nPackagesSelected, int &nDescendantsUpda // This transaction will make it in; reset the failed counter. nConsecutiveFailed = 0;
   CFeeRate newFeeRate(packageFees, packageSize * WITNESS_SCALE_FACTOR);
   if (newFeeRate < minPackageFeeRate) {
Nit: Wouldn’t it be better to use the one line if coding style?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/pull/16698?email_source=notifications&email_token=ABMAFU2BUVGFY7OHVHV7D2LQV2ZGDA5CNFSM4IPBPTD2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCNG3RIY#pullrequestreview-323860643, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMAFU4O72DQV3NMFNPPUFTQV2ZGDANCNFSM4IPBPTDQ .

in src/miner.cpp:280 in 54e0e87e21 outdated

272@@ -269,6 +273,7 @@ int BlockAssembler::UpdatePackagesForAdded(const CTxMemPool::setEntries& already
273 bool BlockAssembler::SkipMapTxEntry(CTxMemPool::txiter it, indexed_modified_transaction_set &mapModifiedTx, CTxMemPool::setEntries &failedTx)

ariard commented at 6:21 pm on November 27, 2019:

need to update comment

amitiuttarwar commented at 0:02 am on December 3, 2019:

updated

in src/miner.cpp:321 in 42c460a24f outdated

314+    assert(pindexPrev != nullptr);
315+
316+    pblock->nTime = GetAdjustedTime();
317+    const int64_t nMedianTimePast = pindexPrev->GetMedianTimePast();
318+
319+    nLockTimeCutoff = (STANDARD_LOCKTIME_VERIFY_FLAGS & LOCKTIME_MEDIAN_TIME_PAST)

ariard commented at 7:33 pm on November 27, 2019:

IMO just set the nLocktimeCutoff to nMedianTimePast, as we care about up to date mempool all of them are far ahead BIP 113 activation.

amitiuttarwar commented at 1:21 am on December 3, 2019:

is there a downside to leaving as is? I’d rather have this code mimic the normal mining path as closely as possible.

ariard commented at 4:54 pm on December 3, 2019:

The downside is more complex code to review in the future.

Given current value STANDARD_LOCKTIME_VERIFY_FLAGS, nLockTimeCutoff is always going to be nMedianTimePast. And this rule being not only a standard one but a consensus one it’s not going to change. We may have future different rules to compute transaction timelock but I think linking to standardness is a confusion here. Normal mining code should also do that.

amitiuttarwar commented at 8:39 pm on December 3, 2019:

ok you’ve convinced me :) I’ll update.

amitiuttarwar commented at 9:16 pm on January 4, 2020:

fixed

in src/net_processing.cpp:3836 in 42c460a24f outdated

3850@@ -3844,6 +3851,12 @@ bool PeerLogicValidation::SendMessages(CNode* pto)
3851                     }
3852                 }
3853 
3854+                // cache the min fee rate for a txn to be included in a block
3855+                // applied as rebroadcast filter above
3856+                if (mempool.m_next_min_fee_cache < current_time){
3857+                    mempool.CacheMinRebroadcastFee();

ariard commented at 7:36 pm on November 27, 2019:

Did you benchmark CacheMinRebroadcastFee ? Not sure if it’s the best way to encumber ThreadMessageHandler with mempool computation. I think it could be scheduled by the scheduler like we are doing for DumpBanList or DumpAddresses.

ariard commented at 7:45 pm on November 27, 2019: member

I think this PR is going in the right direction.

Some open questions:

wallet resend timer, would it be a real issue it’s scaled under the rebroadcast one to be sure than wallet txn have at least an attempt at every rebroadcast selection ?
on GETDATA/m_unbroadcast_txids logic: we don’t have a delivery insurance right now for our wallet txn so this PR makes it not worse? maybe this part could be splitted for a future PR which would also explore the aforementioned “thin blocks” idea
privacy-side: I think it’s better even it’s still lower bounded by initial relay

amitiuttarwar force-pushed on Dec 2, 2019

amitiuttarwar force-pushed on Dec 3, 2019

jonatack commented at 7:16 pm on December 3, 2019: member

Tested 5335439a8c0cc591d717665d9dcd57d6adfb3fa8, if helpful. Same test issue:

 0wallet_watchonly.py                    | ✓ Passed  | 2 s
 1wallet_watchonly.py --usecli           | ✓ Passed  | 3 s
 2wallet_zapwallettxes.py                | ✓ Passed  | 6 s
 3mempool_rebroadcast.py                 | ✖ Failed  | 338 s
 4
 5ALL                                    | ✖ Failed  | 3064 s (accumulated) 
 6Runtime: 833 s
 7
 8
 9bitcoin/bitcoin ((HEAD detached at origin/pr/16698))$ test/functional/mempool_rebroadcast.py 
102019-12-03T19:04:39.827000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_qb0fjic5
112019-12-03T19:04:40.899000Z TestFramework (INFO): Test simplest rebroadcast case
122019-12-03T19:04:42.876000Z TestFramework (INFO): Test recent txns don't get rebroadcast
132019-12-03T19:09:52.917000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
14            wait_until(lambda: new_tx in node.getrawmempool())
15'''
162019-12-03T19:09:52.918000Z TestFramework (ERROR): Assertion failed
17Traceback (most recent call last):
18  File "/home/jon/projects/bitcoin/bitcoin/test/functional/test_framework/test_framework.py", line 111, in main
19    self.run_test()
20  File "test/functional/mempool_rebroadcast.py", line 44, in run_test
21    self.test_recency_filter()
22  File "test/functional/mempool_rebroadcast.py", line 171, in test_recency_filter
23    wait_until(lambda: new_tx in node.getrawmempool())
24  File "/home/jon/projects/bitcoin/bitcoin/test/functional/test_framework/util.py", line 229, in wait_until
25    raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
26AssertionError: Predicate ''''
27            wait_until(lambda: new_tx in node.getrawmempool())
28''' not true after 60 seconds
292019-12-03T19:09:52.974000Z TestFramework (INFO): Stopping nodes

andrewtoth referenced this in commit e265fd2fbe on Dec 4, 2019

ariard commented at 4:35 am on December 8, 2019: member

After talking with Amiti out-of-band, I’m concept ACK on the privacy fix. Turning relay flag of SubmitMemoryPoolAndRelay to false in ResendWalletTransactions means than wallet rebroadcast logic cascade in mempool one and so shouldn’t be observable by a connected peer. Wallet tx is hidden in mempool “anonymity set”.

DrahtBot added the label Needs rebase on Dec 13, 2019

fjahr commented at 1:04 am on December 27, 2019: member

Concept ACK on improving privacy here. I still need to put more work in going into the details of the approach but I took down some notes while looking over the code and previous discussions:

Implications of empty blocks were mentioned in PR Review Club and I just want to echo here that the behavior should be investigated and documented here. In addition to that, I can also see cases arise of blocks that are full but have a much lower fee rate than the mempool. Miners could have exclusive deals with parties that do not want to broadcast their transactions to the network and pay the miners privately to mine their transactions. Also, I thought of a case where a miner was sweeping a lot of dust outputs from publicly known private keys in a huge transaction a few months ago (did not find the link right now).
One heuristic that came to mind to detect the former case is comparing fee rate ranges, i.e. if the fee rate of the latest block was just flat 1 sat/byte but our block template says we would expect a range of 20 - 50 sat/byte, probably something is going on that is out of our control and a rebroadcast would not change this.
I am not so happy with the changes in the miner code as they are not concerns of the miner. Naively, I would suggest a more “functional” design within GetRebroadcastTransactions: A block template would be requested from unchanged miner code and then run through a series of filters that kick out rebroadcast candidates for different reasons. IMO it would make code easier to reason about, filters could be added/removed as the behavior of different combinations are tested and they should be easily unit testable. In pseudo-code:

0candidates = getBlockTemplate();
1filterRecentTransactions(candidates);
2filterOverMinFee(candidates, minFeeCache);
3...
4return candidates;

My suggestion in the latter point also includes a behavior change: recent transactions are filtered after the template is constructed and not before. And I think this also makes sense on a conceptual level. Let’s say we have a full block of up to 20 sat/byte txs that are older and should be rebroadcast per your logic and we also have just received over a block full of new 30 sat/byte txs. We would rebroadcast txs of <= 20 sat/byte although we know that they actually will not make it into the next 1-2 blocks because there are these 30 sat/byte txs, which well-connected miners probably received even earlier than we did. So we might as well hold off from rebroadcasting them right now. I think it’s right to not rebroadcast recent transactions but we should not ignore their existence either. This may be a controversial point but I think we can save network traffic by deferring rebroadcasting in this case.

FWIW, I am also experiencing the test failures mentioned by others but only when running them through the test harness. In that case, wallet_resendwallettransactions.py fails consistently for me and mempool_rebroadcast.py fails about 80% of the time.

amitiuttarwar force-pushed on Jan 3, 2020

[p2p] implement mempool rebroadcast functionality

Update rebroadcast logic to live in the mempool & have two
fundamental differences from the existing wallet logic:
1. Apply to all transactions (not just mine)
2. Rebroadcast txns expected to have been mined by now.

The main reasoning behind these changes are to improve privacy.

9e9b185186

[docs] updates to prioritisetransaction help man 0b2c36e6a3

[mining] add recency filter on block creation to get rebroadcast set

To reduce noise, apply a filter to ensure recent txns are not rebroadcast

24ff7de788

[test] Test rebroadcast functionality

Add functional tests to ensure that rebroadcast behavior works
as expected, namely that only txns at the top of the mempool
(according to block assembler logic) are rebroadcast, and txns
that are recent are excluded from the set.

d3154f1160

[wallet] update wallet to resubmit to node rather than rebroadcast

With the new functionality added to the mempool, the wallet
no longer needs to rebroadcast transactions to peers directly.
Instead, it resubmits txns to the nodes once a day in case they
were dropped (expired, evicted, etc.) from the local mempool
before being confirmed.

Rename `ResendWalletTransactions` method to
`ResubmitWalletTransactionsToMempool` for clarity.

baf7de2210

[mempool] mempool tracks wallet txns & ensures successful initial broadcast

Since the newly implemented functionality only rebroadcasts txns at
the top of the mempool, its possible to hit a case where if a txn was
submitted locally & not immediately relayed, it would take a long time
to be included in the rebroadcast set & ever succesfully be initially
broadcast.

To prevent this case, the mempool keeps track of `m_unbroadcast_txids`,
and deems the txn relay successful when an associated GETDATA message is
received. On the rebroadcast timer, txns from this set are added to the
txns being relayed.

f8835c989b

[mempool] add fee rate filter on rebroadcast set

Periodically (~20 mins) construct a block, identify the minimum
fee rate for a package to be included & cache the result. Only
rebroadcast transactions with higher fee rate than cached value.
This is to reduce rebroadcast noise.

We add logic to ensure a new block between last cache run &
rebroadcast, because otherwise the cache doesn't eliminate any
transactions from the set.

At the time of rebroadcast, if the cache has run within the past
30 mins, it will work together with the recency filter to remove
txns from the rebroadcast set.

939e220602

amitiuttarwar force-pushed on Jan 3, 2020

DrahtBot removed the label Needs rebase on Jan 4, 2020

amitiuttarwar commented at 2:42 am on January 5, 2020: contributor

first off, thank you all for the thoughtful reviews! here is an update:

TLDR; overview of this PR & open threads here. You can help me move this PR forward by

approach ACKs or explaining unaddressed concerns
running the mempool_rebroadcast test and sharing your results with me.

–

I pushed changes that …

should fix the test failures -> there was an uninitialized variable that only failed sporadically due to shared memory space, causing the test to have different results on different operating systems. Got the test to succeed on ubuntu & it passes on travis, but looks like appveyor is failing. Might be a different issue, looks like I need to debug further 😬. I’d love some reviewers to run the tests to see if it passes locally / share failure logs with me.
address most misc review comments (a couple small ones are still open)
rebased
renamed ResendWalletTransactions to ResubmitWalletTransactionsToMempool to make it very clear that the wallet submits txns to the local mempool, not the network.

By my count, this PR currently has 7 concept ACKS & 0 NACKs. To move these changes forward I’m currently monitoring the patch on the network & am working on improving the monitoring mechanisms to make it easy for reviewers to see the results of & run themselves if willing. Simultaneously I’m working to address conceptual concerns & hope to begin receiving approach ACKs.

I’ve updated the gist in OP (link) with an overview of the changes, open threads & concerns, monitoring results, etc. My intention is to capture all the branches in an understandable way and make it easier for reviewers to follow. I’ve tried to capture all open conceptual concerns there.

The biggest question we all have about these changes is the bandwidth impact. While I have implemented various mechanisms to mitigate, its important to monitor the patch live and observe results.

I’m keeping the gist updated with the results of my monitoring. Copy pasting the latest results:

0after running the patch for..
110 days, node has only outbound connections -> 30 additional invs sent to peers (28, 2, 1)
26 days, node also accepts incoming connections -> 186 additional invs sent to peers (22, 29, 28, 2, 3, 24, 35, 7, 5, 3, 28)
3
4Since each inv message is 36 bytes, this means...
5~1 kb of data sent in 10 days with only outbound connections
6~6.5 kb of data sent in 6 days when also accepting incoming connections

– responses to individual conceptual concerns @JeremyRubin - I’ve updated the gist with these topics, but answering here as well.

mining code as a dependency -> if reviewers agree with the concern around introducing a dependency on the mining code for non-mining nodes, I think the best way to address would be a kill switch to disable rebroadcasting. creating a bounded-size priority queue might be viable, but would significantly increase the complexity of this diff so I hope to avoid. A switch to disable would have some fingerprint leaks but much less than a customizable rebroadcast interval, and the ability to disable addresses any bandwidth issues for low resourced nodes. Thoughts?
using compact-blocks relay code - seems like a suggestion to address concerns around bandwidth usage, which I believe can be addressed in simpler ways. to use compact-blocks, we’d have to introduce a different P2P message to indicate these are mempool transactions. @fjahr - thank you for your thoughtful review. here are my responses-
changes to the miner code -> initially I tried to take the approach of getting the block template then applying the filters. the issue that arose with the recency filter has to do with processing packages. consider a CPFP case. If the child arrives in the past 5 minutes and causes the parent to be in the block template, the parent should also be filtered out by the recency logic (aka not rebroadcast). Updating the miner logic takes care of this automatically. If we are to apply filters after retrieving the template, we would have to reconstruct the dependency tree to handle appropriately. I agree with you that its not an ideal separation of concerns, but the flip would be unnecessary code complexity and computation that I believe would be harder to reason about and an undesirable tradeoff in processing time. Let me know if that makes sense or if you have further concerns / considerations.
comparing fee rate ranges -> I’ve thought a lot about the idea of comparing fee-rate expectations based on our mempool vs the block, and I’ve been entirely unable to come up with any mechanisms robust for all situations. While maybe we would be able to look at a block and identify that a miner is intentionally choosing low fee rate transactions over high fee rate ones, I haven’t been able to figure out a way to identify that algorithmically. To allow some room for manual priority is why I chose the rebroadcast set to begin with 3/4 block weight instead of a full block. Similarly, its possible to have a case where txns have successfully propagated the network but are intentionally being censored. While that would mean a lot of not useful transactions being rebroadcast, the network would have bigger issues. I’m happy to discuss this further (maybe in a DM convo & we can share a summary) if you’d like to hear more about strategies I have thought through, or if you have ideas for potentially viable strategies for automated rules to detect.
my understanding is that your example with 20 vs 30 sat/byte txns is trying to capture a situation where the miner receives transactions that you don’t & they are higher fee rate. Since it only takes a max of a few seconds for txns to propagate the network, I don’t think we would run into a dramatic race condition here. so that means the situation you’ve described here converges on the one you’re talking about earlier where a miner picks up txns that are unseen to the network. Again, I haven’t figured out any sensical way of algorithmically determining such case. Its possible but doesn’t seem worth the tradeoffs (storing a lot of extra data). In the current implementation, if this case does occasionally occur, I think it would be ok because of all the other bandwidth mitigation (max 3/4 block, inv messages only sent to peers that didn’t receive recently because of filterInventoryKnown).
empty blocks -> still have to think this one fully through. I’ll do so soon & update the gist.

jonatack commented at 9:44 am on January 5, 2020: member

2. running the `mempool_rebroadcast` test and sharing your results with me.

Ran tests at 939e2206021fd00cfd56c4de090434e4152bae8a repeatedly on Debian 4.19 x86/64 and they appear to be much more stable, both when called directly (10 runs) and when running the test suite (twice) via the runner. :rocket: The resubmit txs test takes 2-3 seconds; the mempool tests each take 2-3 minutes.

mempool_rebroadcast.py and wallet_resubmit_txs.py both appear to be passing reliably
mempool_local_transactions.py timed out only once out of ten solo runs (log 168 lines, consolidated log last ~1000 lines) with

0  File "test/functional/mempool_local_transactions.py", line 110, in run_test
1    wait_until(lambda: wallet_tx_id in conn.get_invs())

I could not reproduce the mempool_local_transactions timeout on further runs and provide the log details only in case they are helpful.

gmaxwell commented at 2:05 am on January 7, 2020: contributor

I think the best way to address would be a kill switch to disable rebroadcasting.

There is one, -walletbroadcast=0.

How should this be handling dependencies?

If you are merely filtering on feerate will you end up with a situation where you rebroadcast a child transaction that has a high feerate even when the low feerate of its parent means it has no chance of being included?

Has anyone considered significantly increasing the time between rebroadcasts as an additional change (and one that could go in separately much faster)? Particularly once you’ve performed one successful broadcast per execution additional rebroadcasts usually do nothing except leak data to attackers that frequently reconnect to you– you won’t resend to a host you’ve sent to before, and its not usually useful to do so. This wouldn’t obviate the need to be more intelligent, but it would reduce the bleeding and also make it more acceptable for the rest of the change to be more approximate.

Miners could have exclusive deals with parties that do not want to broadcast their transactions to the network and pay the miners privately to mine their transactions.

This sort of concern would be substantially mitigated by not rebroadcasting as soon as the transaction is ‘missed’ but only after it is missed by N blocks (say 3) blocks where, according to your mempool they should have included the transaction. Then you would only rebroadcast inappropriately if all 3 miners should have included it but skipped it for some reason.

The biggest question we all have about these changes is the bandwidth impact.

Bandwidth impact should be somewhat mitigated by the fact that the known filters should block many of the rebroadcasts to long running peers.

FWIW, I missed on the first pass that this change makes rebroadcasting apply to all mempool transactions. I think that is conceptually a very good move but it has considerable implications.

Looking at your average bandwidth usage isn’t enough, you have to worry about cases like what if all nodes are running into their mempool limits under high load– will this change cause the network to start DOS attacking itself?

Would this PR be better if we used a “thin blocks” style relay? We already form a block template, but then we don’t take advantage of the compact-blocks relay code which would minimize the data sent, assuming the neighbor node does have most of what we rebroadcast…

Compact blocks would be entirely the wrong mechanism. The transactions are unordered, unlike a block. And they are almost entirely known. The mechanism you want is the erlay reconcillation mechanism because it uses bandwidth equal to the size of the actual set difference, not the size of the mempool.

In fact with that mechanism, in theory there is no need to do any additional mempool feerate tracking, comparisons with blocks or whatever. Just use erlay to send IDs for the entire mempool (filtered only by the peer’s current relay minimum feerate) after every block. This takes bandwidth equal to 4 bytes times the number of differences, no bytes per entry in common.

However, in practice it isn’t that simple. While designing erlay we specifically consider and discarded the approach of making it work by synchronizing the mempools (which was how I approached the relay efficiency problem in my original writeup). The reason for this is that difference in mempool policy (standardness, minfees, maximum mempool sizes, ancestor depth limits, softforks) or even just simple transaction ordering causes persistent network-wide bandwidth leaks if you relay by syncing the mempool.

All these problems also apply here, because this is essentially a mempool syncing proposal, but even worse because compared to the erlay reconciliation this is an extremely inefficient way of syncing: it spends bandwidth for transactions the peer already knows.

Consider an attacker that monitors the network and finds nodes close to miners. Then he sends the near-miner nodes a bunch of very low feerate transactions which won’t get mined soon . He concurrently sends conflicting high feerate transactions to every other node. The high feerate transactions don’t get mined, the other nodes have no idea why, and they bleed bandwidth continually uselessly re-advertising transactions. (if the higher feerate txn gets mined by accident he just tries again)

If erlay style relay is used, the bandwidth is really only wasted at the low/high feerate boundary… but unfortunately the attacker can make that boundary arbitrarily large (e.g. give half the nodes the low feerate txn in additional to all the miner-proximal nodes).

gmaxwell commented at 3:02 am on January 7, 2020: contributor

Question: Is it a goal to rebroadcast transactions to peers that have already had them sent to them?

If yes: the current PR (and wallet rebroadcasting) fails to do that (reliably) because of the per peer bloom filters on broadcasts.

If no: great that reduces bandwidth, but the current bloomfiltering approach doesn’t successfully do that reliably– because it forgets. The existing bloom filters are slow and use a lot of memory (they’re gigantic to avoid false positives that would break txn relay). For other reasons its been previously proposed to replace them with data in each mempool transaction that indicates which peers its been relayed to.

E.g. a possible approach is each mempool txn gets a

0NodeId lowest_connected_unsent_peer;
1vector<bool> sent_peers_above_lowest_unsent;

Or something analogous but with less allocation overhead. :)

I suspect that for reasonable numbers of peers and transactions this could take less memory than the bloom filter (which I seem to vaguely recall is something like 1MB per peer?).

amitiuttarwar commented at 5:51 am on January 9, 2020: contributor

thank you for your review @gmaxwell !

RE: kill switch & -walletbroadcast=0

This mechanism previously disabled rebroadcasts, since that was a purely wallet focused responsibility. With these changes (in the current state), setting this flag to false will not disable the node rebroadcasting transactions.
Conceptually I think this makes sense. with these changes rebroadcasting transactions is more of a p2p/mempool responsibility. Disabling wallet broadcast shouldn’t disable rebroadcasting.
-blocksonly mode would disable rebroadcast because there would be no txns in the mempool (unless you submit locally)
I’m planning to add another flag to disable rebroadcast specifically. This can also be used for rollout, and nodes can choose to enable the new logic to verify reasonable behavior.

RE: handling dependencies for the reasons you mentioned I use addPackageTx for calculating top of mempool.

Has anyone considered significantly increasing the time between rebroadcasts as an additional change (and one that could go in separately much faster)

I have not. That’s a great idea. I’m going to create a smaller PR with the mempool logic that guarantees delivery of wallet transactions & reduce frequency of rebroadcasts in the existing implementation. Thank you!

This sort of concern would be substantially mitigated by not rebroadcasting as soon as the transaction is ‘missed’ but only after it is missed by N blocks (say 3) blocks …

Yeah. For these sorts of reason there are impositions on the rebroadcast set (mainly txn must be >30 mins old. block must have arrived between last cache run and rebroadcast ). Still an open question whether the params chosen are the most reasonable.

RE: bandwidth

Yup and yup. filterInventoryKnown hugely decreases the inv messages sent to peers in the normal cases. But average bandwidth usage is insufficient for accepting these changes, simply just one aspect.

The attack you’ve portrayed here is in line with the open question I’ve been pondering about mempools with different policies - the non-adversarial case you’ve mentioned. The solution I’m thinking about is adding another data structure, max_rebroadcast_count that would maintain txids, expiry times, and count (number of times I have rebroadcasted). Over a certain count, it would serve as a rebroadcast blacklist. If the txn is mined into a block it would be removed from the list, still have to think through the other expiry conditions. I’m still hammering out the details but would be interested in hearing your feedback.

Question: Is it a goal to rebroadcast transactions to peers that have already had them sent to them?

Hm, somewhere in between? The goal of rebroadcasting transactions is to propagate a transaction out to the network in your mempool that you suspect the miners might not have. If you’ve recently sent a txn to a peer, you don’t have reason to believe resending it to them would help solve this issue, and would just waste bandwidth. On the other hand, you would want to rebroadcast to the same peers in eg. a situation where you have a large mempool, your peers have small mempools & a competitive fee rate market caused them to drop a transaction. After the fee pressure reduces, re-announcing the txn to them could help it get mined. In its current form, the bloom filter is conducive to this behavior by preventing repeat sends to a peer within a ~2-6 hr time period (including txns dropped from the mempool).

RE: alternative peer-transaction tracking system

I’m definitely interested in thinking through this approach you’re proposing / other alternatives to the current bloom filter, but think its tangential to this PR? Would be happy to continue the convo on a new issue if you’re interested in opening.

JeremyRubin added this to the "Rebroadcasting" column in a project

DrahtBot added the label Needs rebase on Jan 30, 2020

DrahtBot commented at 6:08 am on January 30, 2020: member

MarcoFalke referenced this in commit 36f42e1bf4 on Feb 18, 2020

sidhujag referenced this in commit 2fc510301e on Feb 18, 2020

MarcoFalke removed this from the milestone 0.20.0 on Mar 30, 2020

amitiuttarwar commented at 0:25 am on March 31, 2020: contributor

closing this PR until #18038 gets merged.

if you’re interested in moving this project forward, #18038 is ready for review :) thanks in advance!

amitiuttarwar closed this on Mar 31, 2020

codablock referenced this in commit 1cd6797373 on Apr 9, 2020

codablock referenced this in commit a26a2d82cf on Apr 12, 2020

codablock referenced this in commit debd93a84c on Apr 12, 2020

codablock referenced this in commit 02ee99718a on Apr 12, 2020

codablock referenced this in commit c52c1e1ce5 on Apr 14, 2020

fanquake referenced this in commit 0ef0d33f75 on Apr 29, 2020

sidhujag referenced this in commit 7fc0f22e3d on Apr 29, 2020

ComputerCraftr referenced this in commit 285bce192a on Jun 10, 2020

ComputerCraftr referenced this in commit 69e7eee7a1 on Jun 10, 2020

sidhujag referenced this in commit 990a3ee8c8 on Nov 10, 2020

gmaxwell commented at 2:30 pm on January 10, 2021: contributor

Is this work going to continue now that #18038 is done?

amitiuttarwar commented at 2:34 am on January 12, 2021: contributor

@gmaxwell yes! I’m currently trying to work out a couple more kinks before continuing the review process on github & seeking feedback on approach.

Fabcien referenced this in commit 674a222fdb on Jan 21, 2021

amitiuttarwar commented at 5:55 pm on February 8, 2021: contributor

this work is being continued in #21061 :)

ckti referenced this in commit fb46609fd9 on Mar 28, 2021

malbit referenced this in commit 602513f825 on Nov 5, 2021

gades referenced this in commit ff57a3caff on Mar 17, 2022

vijaydasmp referenced this in commit 877d0a1d2a on Apr 5, 2022

vijaydasmp referenced this in commit 5ce7ce2c4d on May 19, 2022

DrahtBot locked this on Aug 16, 2022

Mempool: rework rebroadcast logic to improve privacy #16698

Conflicts

@emilengler commented on this pull request.