P2P: Mempool tracks locally submitted transactions to improve wallet privacy

amitiuttarwar commented at 5:52 am on January 31, 2020: contributor

This PR introduces mempool tracking of unbroadcast transactions and periodic reattempts at initial broadcast. This is a part of the rebroadcast project, and a standalone privacy win.

The current rebroadcast logic is terrible for privacy because 1. only the source wallet rebroadcasts transactions and 2. it does so quite frequently. In the current system, if a user submits a transaction that does not immediately get broadcast to the network (eg. they are offline), this “rebroadcast” behavior is the safety net that can actually serve as the initial broadcast. So, keeping the attempts frequent is important for initial delivery within a reasonable timespan.

This PR aims to improve # 2 by reducing the wallet rebroadcast frequency to ~1/day from ~1/15 min. It achieves this by separating the notion of initial broadcast from rebroadcasts. With these changes, the mempool tracks locally submitted transactions & periodically reattempts initial broadcast. Transactions submitted via the wallet or RPC are added to an “unbroadcast” set & are removed when a peer sends a getdata request, or the transaction is removed from the mempool. Every 10-15 minutes, the node reattempts an initial broadcast. This enables reducing the wallet rebroadcast frequency while ensuring the transactions will be propagated to the network.

For privacy improvements around # 1, please see #16698. Thank you to gmaxwell for the idea of how to break out this subset of functionality (https://github.com/bitcoin/bitcoin/pull/16698#issuecomment-571399346)

fanquake added the label P2P on Jan 31, 2020

DrahtBot commented at 8:39 am on January 31, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#18807 ([WIP] unbroadcast follow-ups by amitiuttarwar)
#18781 (Add templated GetRandDuration<> by MarcoFalke)
#18764 (refactor: test: replace inv type magic numbers by constants by theStack)
#18446 (test: Add test for wtxid transaction relay by fjahr)
#18261 (Erlay: bandwidth-efficient transaction relay protocol by naumenkogs)
#18044 (Use wtxid for transaction relay by sdaftuar)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

amitiuttarwar force-pushed on Feb 5, 2020

JeremyRubin added this to the "Package Relay" column in a project

JeremyRubin moved this from the "Package Relay" to the "Rebroadcasting" column in a project

MarcoFalke referenced this in commit 36f42e1bf4 on Feb 18, 2020

DrahtBot added the label Needs rebase on Feb 18, 2020

sidhujag referenced this in commit 2fc510301e on Feb 18, 2020

amitiuttarwar force-pushed on Feb 26, 2020

DrahtBot removed the label Needs rebase on Feb 27, 2020

amitiuttarwar commented at 8:14 pm on February 27, 2020: contributor

travis failure seems unrelated- feature_fee_estimation.py

MarcoFalke commented at 8:43 pm on February 27, 2020: member

It could be a crash, because node1 stops to log anything at all

in src/txmempool.h:539 in 66ddec88f5 outdated

533@@ -534,6 +534,10 @@ class CTxMemPool
534     const setEntries & GetMemPoolParents(txiter entry) const EXCLUSIVE_LOCKS_REQUIRED(cs);
535     const setEntries & GetMemPoolChildren(txiter entry) const EXCLUSIVE_LOCKS_REQUIRED(cs);
536     uint64_t CalculateDescendantMaximum(txiter entry) const EXCLUSIVE_LOCKS_REQUIRED(cs);
537+
538+    // track locally submitted transactions & periodically retry initial broadcast
539+    std::set<uint256> m_unbroadcast_txids;

MarcoFalke commented at 9:00 pm on February 27, 2020:

Why is this not saved to disk? It seems that on restart this set is cleared regardless of whether the tx was sent out.

amitiuttarwar commented at 5:15 pm on February 28, 2020:

great point. originally I was thinking of doing as a follow up PR (see here), but now that this unbroadcast logic is broken out into its own PR I agree it makes sense here. I’ll implement.

ajtowns commented at 3:30 am on March 4, 2020:

m_unbroadcast_txids declaration is missing the GUARDED_BY(cs) thread safety annotation, and I think the errors adding that turns up indicates there’s more locking needed which probably explains the node segfaulting in travis.

ariard commented at 8:31 pm on March 16, 2020:

Have you weighted implications of wtxid-relay (#18044) ? I think you will need to track both txid/wtxid for backward-compability with non-upgraded nodes.

amitiuttarwar commented at 8:39 pm on March 17, 2020:

added logic to save unbroadcast set to disk
added thread safety annotation & missing locks from lots of call sites 😛

thank you both for this feedback 🙏🏽 would love further review on the added logic

amitiuttarwar commented at 10:27 pm on March 17, 2020:

ah, good point. will think this through. thanks!

amitiuttarwar commented at 0:14 am on March 31, 2020:

from #18038 (comment)

RE: wtxid - there is some additional code needed to support unbroadcast set & wtxid relay. I’ll work on the patch & depending on timing around which PR is ready for merge first, include the patch here or offer it on the other PR. Thanks again for pointing that out :)

in src/net_processing.cpp:1593 in 66ec5571ad outdated

1589@@ -1590,6 +1590,11 @@ void static ProcessGetData(CNode* pfrom, const CChainParams& chainparams, CConnm
1590             if (mi != mapRelay.end()) {
1591                 connman->PushMessage(pfrom, msgMaker.Make(nSendFlags, NetMsgType::TX, *mi->second));
1592                 push = true;
1593+                // Once a peer requests GETDATA for a txn, we deem initial broadcast a success

MarcoFalke commented at 4:31 pm on February 28, 2020:

The comment is incorrect, since a tx is not removed when it is not sent, e.g. because it was not found.

amitiuttarwar commented at 8:43 pm on March 17, 2020:

I don’t fully understand what you mean here. Are you pointing out that there are other ways the transaction gets removed from the set?

I updated to: “We intepret processing a GETDATA for a transaction as a successful initial broadcast and remove it from our unbroadcast set.”

Does that seem reasonable? Or am I still missing the point?

in src/net_processing.cpp:1594 in 66ec5571ad outdated

1589@@ -1590,6 +1590,11 @@ void static ProcessGetData(CNode* pfrom, const CChainParams& chainparams, CConnm
1590             if (mi != mapRelay.end()) {
1591                 connman->PushMessage(pfrom, msgMaker.Make(nSendFlags, NetMsgType::TX, *mi->second));
1592                 push = true;
1593+                // Once a peer requests GETDATA for a txn, we deem initial broadcast a success
1594+                int num = mempool.m_unbroadcast_txids.erase(inv.hash);

MarcoFalke commented at 4:32 pm on February 28, 2020:

Why is the tx not removed when it was initially sent out after 15 minutes, e.g. when it was removed from mapRelay and then relayed from the mempool.

I think this code should go down into the if(push) branch

amitiuttarwar commented at 3:33 am on March 3, 2020:

you are right, this is a mistake. thanks 🙏🏽

amitiuttarwar commented at 8:43 pm on March 17, 2020:

Fixed. Thanks again.

in src/txmempool.cpp:420 in 66ec5571ad outdated

415@@ -416,6 +416,10 @@ void CTxMemPool::removeUnchecked(txiter it, MemPoolRemovalReason reason)
416     for (const CTxIn& txin : it->GetTx().vin)
417         mapNextTx.erase(txin.prevout);
418 
419+    if (mempool.m_unbroadcast_txids.erase(hash)) {
420+        LogPrint(BCLog::NET, "Removed %i from m_unbroadcast_txids before GETDATA was received. \n", hash.GetHex());

MarcoFalke commented at 4:33 pm on February 28, 2020:

0        LogPrint(BCLog::NET, "Removed %i from m_unbroadcast_txids before tx was sent out. \n", hash.GetHex());

amitiuttarwar commented at 8:46 pm on March 17, 2020:

I updated it to “from m_unbroadcast_txids before confirmation that txn was sent out”, but I feel uncertain… saying “before GETDATA was received” seems more clear & specific than “confirmation that txn was sent out”. Could you explain further why you didn’t like it?

is this related to this?

MarcoFalke commented at 4:34 pm on February 28, 2020: member

Concept ACK. Some design comments on the first commit 66ec5571ad49b68366e4607cf0b671231083757e

in src/net.cpp:2336 in 1a1a30cc4a outdated

2331@@ -2332,6 +2332,9 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2332     // Dump network addresses
2333     scheduler.scheduleEvery(std::bind(&CConnman::DumpAddresses, this), DUMP_PEERS_INTERVAL * 1000);
2334 
2335+    // Attempt to broadcast transactions that have not yet been acknowledged by the network
2336+    scheduler.scheduleEvery(std::bind(&CConnman::AttemptInitialBroadcast, this), REATTEMPT_BROADCAST_INTERVAL.count() );

MarcoFalke commented at 4:38 pm on February 28, 2020:

0    scheduler.scheduleEvery([]{this->AttemptInitialBroadcast();}, REATTEMPT_BROADCAST_INTERVAL.count() );

amitiuttarwar commented at 10:00 pm on March 17, 2020:

updated. although I was surprised to find out I had to pass “this” through in the capture list.

in src/wallet/wallet.cpp:1985 in 66ddec88f5 outdated

1981@@ -1982,7 +1982,7 @@ void CWallet::ResendWalletTransactions()
1982     // that these are our transactions.
1983     if (GetTime() < nNextResend || !fBroadcastTransactions) return;
1984     bool fFirst = (nNextResend == 0);
1985-    nNextResend = GetTime() + GetRand(30 * 60);
1986+    nNextResend = GetTime() + GetRand(24 * 60 * 60);

naumenkogs commented at 6:44 pm on February 28, 2020:

This is not frequency 1/day (as the PR post says). It’s a value between 0 and 24 hours, with an average of 12 hours. You probably want to do 126060 + GetRand(126060) or something like that?

ajtowns commented at 0:27 am on March 3, 2020:

12*60*60 + GetRand(12*60*60) would get an average of 24 hours with a range of 12-36

naumenkogs commented at 4:07 pm on March 3, 2020:

@ajtowns yeah, that’s what would be closer to 1/day, right? Not sure what you meant to say.

ajtowns commented at 3:08 am on March 4, 2020:

Oh my god. 12*60*60 + GetRand(24*60*60) was what I was trying to type; min of 12h, max of 36h, average of 24h.

naumenkogs commented at 2:35 pm on March 4, 2020:

Agree, I think we should do what you just suggested.

amitiuttarwar commented at 8:49 pm on March 17, 2020:

updated.

funny to update the code to match the comment language rather than vice versa 😛 but I don’t have a strong preference around 12-36hrs vs 0-24hrs, both have small tradeoffs.

in src/net_processing.cpp:795 in 66ddec88f5 outdated

779@@ -780,6 +780,14 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) {
780         PushNodeVersion(pnode, connman, GetTime());
781 }
782 
783+void PeerLogicValidation::QueueUnbroadcastTxs(CNode* pnode)
784+{
785+    for(const uint256& txid : mempool.m_unbroadcast_txids){
786+        CInv inv(MSG_TX, txid);
787+        pnode->PushInventory(inv);

naumenkogs commented at 6:54 pm on February 28, 2020:

How do you know that it won’t be rejected due to “filterInventoryKnown”? What is the rationale here?

amitiuttarwar commented at 7:50 pm on March 2, 2020:

It might. Which upholds the same guarantees as the previous logic it’s replacing.

On current master, wallet transactions get rebroadcast through a scheduler flow that makes its way to ResendWalletTransactions, which selects transactions to rebroadcast then invokes SubmitMemoryPoolAndRelay. This goes to BroadcastTransaction, which checks ATMP then relays via RelayTransaction, which uses the CNode#PushInventory function that only adds the hash if its not contained in filterInventoryKnown.

This PR changes the initial parts of the flow… so BroadcastTransaction adds the transaction to mempool.m_unbroadcast_txids, then there is a scheduled job to periodically AttemptInitialBroadcast, but the behavior is matched where the locally-submitted transactions would be reattempted to send to the network, filtered for the connections that were already tried, and sent out to new connections.

Changing that behavior could be desirable for robustness but of course reduces privacy. So out of scope for this PR. If you think it may be desirable, I’d be happy to think it through / talk about it more, we can open a new issue. (I’m working on the assumption that current delivery guarantees are sufficient).

naumenkogs commented at 4:17 pm on March 3, 2020:

Ah I see. Just want to be clear that if the network is not really used, the bloom filter doesn’t roll and transaction may be not rebroadcast in days. If that’s what you have in mind when designing the whole thing, let’s just leave a comment that this can happen.

ajtowns commented at 3:28 am on March 4, 2020:

The filterInventoryKnown is per-peer though, so it just won’t be rebroadcast to a peer that already knew about it somehow; but it’ll stay in the unbroadcast set and be retried for other peers until a getdata actually makes its way through (or until it gets dropped from the mempool)? If all of your peers ignore your INV and don’t request a GETDATA, you’ll not send more INVs until you find a new peer, but… even if you did, those peers would probably still not send a GETDATA anyway, so what else would you do?

naumenkogs commented at 3:40 pm on March 4, 2020:

Maybe it’s the naming that confuses me. It’s called “rebroadcast”, not “consider to rebroadcast”.

even if you did, those peers would probably still not send a GETDATA anyway

Not sure about that. If transaction expires from a mempool, and there’s not much activity on-chain, filter would still prevent us from re-announcing it. The correspondence between the two is questionable.

But yeah, if that’s only my own confusion, I don’t care about just keeping it this way without any comments.

amitiuttarwar commented at 9:01 pm on March 17, 2020:

Maybe it’s the naming that confuses me. It’s called “rebroadcast”, not “consider to rebroadcast”.

the language I’m introducing is “unbroadcast”, which means consider it as initial broadcast. but there are these interactions with filterInventoryKnown to consider in different cases.

I added some comments here in an attempt to make this more apparent.

going to resolve this conversation, but @naumenkogs let me know if you think this is something we need to more carefully consider in regards to this PR.

naumenkogs commented at 6:55 pm on February 28, 2020: member

Concept ACK, reviewed code and left comments. Will test once we resolve the discussions.

in src/net.cpp:52 in 66ddec88f5 outdated

47@@ -48,6 +48,9 @@ static_assert(MINIUPNPC_API_VERSION >= 10, "miniUPnPc API version >= 10 assumed"
48 // Dump addresses to peers.dat every 15 minutes (900s)
49 static constexpr int DUMP_PEERS_INTERVAL = 15 * 60;
50 
51+// Attempt initial broadcast of locally submitted txns every 10 minutes.
52+static constexpr std::chrono::milliseconds REATTEMPT_BROADCAST_INTERVAL = std::chrono::milliseconds(10 * 60 * 1000);

MarcoFalke commented at 1:24 pm on March 4, 2020:

While I couldn’t come up with a privacy-leaking attack based on this being a constant, I’d still suggest to make the delay random to avoid network-wide synchronized events that don’t need to be synchronized

amitiuttarwar commented at 10:20 pm on March 17, 2020:

ah, I thought through the possible privacy leaks but didn’t think about weird circumstances that could lead to synchronized network-wide events.

updated reattempts to be between 10-15 minutes, chosen every time a node starts up.

DrahtBot added the label Needs rebase on Mar 5, 2020

MarcoFalke commented at 6:37 pm on March 5, 2020: member

Some comments for the second commit

in src/txmempool.h:537 in 66ec5571ad outdated

533@@ -534,6 +534,10 @@ class CTxMemPool
534     const setEntries & GetMemPoolParents(txiter entry) const EXCLUSIVE_LOCKS_REQUIRED(cs);
535     const setEntries & GetMemPoolChildren(txiter entry) const EXCLUSIVE_LOCKS_REQUIRED(cs);
536     uint64_t CalculateDescendantMaximum(txiter entry) const EXCLUSIVE_LOCKS_REQUIRED(cs);
537+
538+    // track locally submitted transactions & periodically retry initial broadcast

ariard commented at 8:40 pm on March 16, 2020:

“periodically (REATTEMPT_BROADCAST_INTERVAL)”

amitiuttarwar commented at 10:28 pm on March 17, 2020:

going to pass on this one, especially since I removed the constant for some variability in intervals.

ariard commented at 8:58 pm on March 16, 2020: member

If I understand well this PR, it introduces a initial-broadcast reattempt mechanism through mempool tracking and GETDATA monitoring. Compare to wallet rebroadcast, the safety catch of new one isn’t confirmation-based but network-based so once you have reasonable propagation (reasonable=1) you stop initial-broadcast. But I don’t grasp where the privacy win is in itself, I would say that’s a reliability improvement against network unknowns (maybe even a slight worsening because now a spying node only have to be connected to you for REATTEMPT_BROADCAST_INTERVAL instead of ~1/30min, which decrease attacker deployment cost ?)

META: can you keep your commit message line to 74 characters :) ?

amitiuttarwar force-pushed on Mar 17, 2020

in src/validation.cpp:5053 in 3d185eda3c outdated

5045@@ -5045,12 +5046,30 @@ bool LoadMempool(CTxMemPool& pool)
5046         for (const auto& i : mapDeltas) {
5047             pool.PrioritiseTransaction(i.first, i.second);
5048         }
5049+
5050+        // check if there are unbroadcast transactions
5051+        bool has_unbroadcast_txns = false;
5052+
5053+        try {

amitiuttarwar commented at 8:22 pm on March 17, 2020:

this is the best option I’ve been able to identify for how to check if we are at the end of the file. would love to hear about any cleaner solutions.

DrahtBot removed the label Needs rebase on Mar 17, 2020

amitiuttarwar commented at 8:33 pm on March 17, 2020: contributor

just pushed some changes:

added functionality to persist unbroadcast set across restarts using mempool.dat & added some tests
updates to locking when accessing unbroadcast set
addressed misc review comments

I’m looking for review / feedback, particularly in the following areas:

that my locking logic makes sense
preference of handling the mempool.dat update. this current implementation doesn’t update the mempool.dat version. Instead LoadMempool checks if we are at the end of the file. If not, reads the next line into the mempool unbroadcast set. this allows for compatibility with any current mempool.dat formats.

if we bump the version, it may cause some friction when users upgrade, but that might be negligible, and I don’t expect persisting the unbroadcast set to be something that’s super frequently used, especially in current network conditions. then, we could have DumpMempool add a bool value that indicates whether or not there are unbroadcast txids, and LoadMempool could use a true to read the next line into the unbroadcast set.

I implemented the first way because I thought version interoperability would be simpler, but after seeing this implementation (and how annoyingly difficult it is to identify “are we at the end of the file?”), I am leaning towards bumping the version. But since I have it built out, thought I’d post the current version.

amitiuttarwar commented at 9:14 pm on March 17, 2020: contributor

looks like I have severely aggravated Travis. whoops 🙈. looking into it.

in src/net_processing.cpp:1633 in 3d185eda3c outdated

1629+
1630+                // We intepret processing a GETDATA for a transaction as a successful initial broadcast
1631+                // and remove it from our unbroadcast set.
1632+                {
1633+                    LOCK(mempool.cs);
1634+                    num = mempool.m_unbroadcast_txids.erase(inv.hash);

MarcoFalke commented at 9:30 pm on March 17, 2020:

Needs rebase to make the const CTxMemPool& non-const

amitiuttarwar commented at 10:24 pm on March 17, 2020:

thanks!

amitiuttarwar force-pushed on Mar 17, 2020

amitiuttarwar commented at 0:08 am on March 18, 2020: contributor

@ariard

If I understand well this PR, it introduces a initial-broadcast reattempt mechanism through mempool tracking and GETDATA monitoring. Compare to wallet rebroadcast, the safety catch of new one isn’t confirmation-based but network-based so once you have reasonable propagation (reasonable=1) you stop initial-broadcast.

Yup.

But I don’t grasp where the privacy win is in itself, I would say that’s a reliability improvement against network unknowns

The privacy improvement is due to wallet rebroadcasts being less frequent. Updated to ~1/day (between 12-36 hrs) instead of ~1/15 mins (between 0-30mins). The current wallet rebroadcast logic deanonymizes the sender, but is important to ensure the initial delivery in some cases. So, we extract the two functions.

Also updated OP to try to make this clearer. @gmaxwell summarizes the motivation pretty well here:

Has anyone considered significantly increasing the time between rebroadcasts as an additional change (and one that could go in separately much faster)? Particularly once you’ve performed one successful broadcast per execution additional rebroadcasts usually do nothing except leak data to attackers that frequently reconnect to you– you won’t resend to a host you’ve sent to before, and its not usually useful to do so. This wouldn’t obviate the need to be more intelligent, but it would reduce the bleeding and also make it more acceptable for the rest of the change to be more approximate.

and in regards to the attack vector,

(maybe even a slight worsening because now a spying node only have to be connected to you for REATTEMPT_BROADCAST_INTERVAL instead of ~1/30min, which decrease attacker deployment cost ?)

Because of filterInventoryKnown logic, its not quite that the attacker needs to be connected to you for that long, but rather that they would need to open a new connection to you during that window of time. Also, my understanding is that the conditions would need to be such that you didn’t receive any GETDATAs back from any peers in regards to this txn, because otherwise it would be removed from the set. So we’d need: 1. victim sends txn to attacker, 2. attacker doesn’t acknowledge with GETDATA but notes down txn hash, 3. somehow (either intentional or unintentional) ensure no other peers send back GETDATA for the txn 4. open a new connection before next broadcast attempt. Then the attacker can confirm that the victim is the source for that transaction.

Does that logic make sense to you? Please poke holes at it :) Or let me know if you have any questions.

Also, relevant conversation in this comment thread here, which I know you’ve seen but might be worth re-visiting.

MarcoFalke renamed this:
~~P2P: Mempool tracks locally submitted transactions to improve privacy~~
P2P: Mempool tracks locally submitted transactions to improve wallet privacy
on Mar 18, 2020

MarcoFalke commented at 3:18 pm on March 18, 2020: member

Changed title to indicate that this only improves wallet privacy

in src/net_processing.cpp:1631 in 77748da226 outdated

1615+                {
1616+                    LOCK(mempool.cs);
1617+                    num = mempool.m_unbroadcast_txids.erase(inv.hash);
1618+                }
1619+
1620+                if (num) {

MarcoFalke commented at 3:36 pm on March 18, 2020:

in commit 77748da2269bd7f2304ebc3b8616d3a584e7c65f:

Can be written shorter as

0                if (WITH_LOCK(mempool.cs, return mempool.m_unbroadcast_txids.erase(inv.hash);)) {

amitiuttarwar commented at 4:41 pm on March 26, 2020:

pass on this one. the num var gets compiled out regardless, and I personally think its much clearer to have it broken out, and helps indicate that erasing returns a number thats evaluated for the if statement

in src/node/transaction.cpp:83 in 77748da226 outdated

77@@ -78,6 +78,10 @@ TransactionError BroadcastTransaction(NodeContext& node, const CTransactionRef t
78     }
79 
80     if (relay) {
81+        // the mempool tracks locally submitted transactions to make a
82+        // best-effort of initial broadcast
83+        WITH_LOCK(mempool.cs, mempool.m_unbroadcast_txids.insert(hashTx));

MarcoFalke commented at 3:40 pm on March 18, 2020:

in commit 77748da:

Please use node.mempool and not the global

amitiuttarwar commented at 4:41 pm on March 26, 2020:

fixed

in src/net.cpp:2348 in 2f3f449dc9 outdated

2341@@ -2345,6 +2342,11 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2342     // Dump network addresses
2343     scheduler.scheduleEvery([this] { DumpAddresses(); }, DUMP_PEERS_INTERVAL);
2344 
2345+    // Periodically attempt to broadcast transactions that have not yet been acknowledged by the network
2346+    // To ensure impossibility of network-wide synchronization event, stagger retry intervals between 10-15 minutes.
2347+    std::chrono::seconds reattempt_broadcast_interval = std::chrono::seconds(10*60 + GetRand(5*60));
2348+    scheduler.scheduleEvery([this]{this->AttemptInitialBroadcast();}, std::chrono::milliseconds(reattempt_broadcast_interval));

MarcoFalke commented at 3:55 pm on March 18, 2020:

in commit 2f3f449dc9c89420a444cc8263c5a5fe537a475b:

I still haven’t figured out a way to exploit this for privacy leaks, but making each node pick a unique fingerprint (heartbeat between 10-15 minutes) doesn’t seem too great. Sometimes nodes offer a tor p2p interface and a clearnet p2p interface and ideally those should be uncorrelated. Obviously horribly broken in the current p2p implementation, but we shouldn’t make it worse, I think.

So what do you think about just scheduling this once here and then reschedule this at the end of the AttemptInitialBroadcast function?

Something like

0void CConnman::ScheduleNextInitialBroadcast(CScheduler& s) const
1{
2 const auto delta = std::chrono::minutes{10} + GetRandMicros(std::chrono::minutes{5});
3 s.schedule([&]{this->AttemptInitialBroadcast(s);}, delta);
4}

naumenkogs commented at 5:01 pm on March 18, 2020:

Although this is supposed to happen extremely rare, I think Marco’s comment is very valid. I agree with his suggestion.

ariard commented at 6:52 pm on March 19, 2020:

See https://github.com/ariard/bitcoin/commit/2d8ae9a838f5c6ec43db15a7bd36554c351b7442, I find a bit weird to schedule a mempool task from connection manager instead of message processing layer. It should achieve the same goal while removing one new method (or have you already considered this and turn away for a good reason) ?

amitiuttarwar commented at 0:05 am on March 25, 2020:

yeah great point. p2p fingerprints are never good. thanks for pointing out.

having AttemptInitialBroadcast call ScheduleNextInitialBroadcast to schedule the next AttemptInitialBroadcast seems a bit much to me (and I’d have to pass the scheduler through anyways), so I’ll just schedule the next invocation at the end of the function itself.

amitiuttarwar commented at 9:52 pm on March 25, 2020:

this is awesome! thanks for the patch. I was also realizing there’s a simpler flow via RelayTransaction, but I agree that kicking off the process in PeerLogicValidation constructor makes more sense conceptually than CConnman::Start. Also I like your rename to ReattemptInitialBroadcast, small change but makes intentions more obvious. Thank you! 🙏🏽

MarcoFalke commented at 2:56 pm on March 26, 2020:

Indeed nice patch @ariard

amitiuttarwar commented at 4:42 pm on March 26, 2020:

fixed. thanks!

amitiuttarwar commented at 4:52 pm on March 26, 2020:

incorporated. thanks again!

in src/net_processing.cpp:790 in 2f3f449dc9 outdated

785+//
786+// Passing txns through the bloom filter has a stronger privacy model, but weaker
787+// delivery guarantees. Under certain network & node conditions, it may still
788+// take an extended period of time for the transaction to be initially broadcast
789+// to the network.
790+void PeerLogicValidation::QueueUnbroadcastTxs(CNode* pnode)

MarcoFalke commented at 3:59 pm on March 18, 2020:

in commit 2f3f449dc9c89420a444cc8263c5a5fe537a475b:

style nit. Can be a reference because it is never null

0void PeerLogicValidation::QueueUnbroadcastTxs(CNode& pnode)

amitiuttarwar commented at 4:44 pm on March 26, 2020:

method no longer exists, but pass by reference for ReattemptInitialBroadcast

in src/net_processing.cpp:793 in 2f3f449dc9 outdated

788+// take an extended period of time for the transaction to be initially broadcast
789+// to the network.
790+void PeerLogicValidation::QueueUnbroadcastTxs(CNode* pnode)
791+{
792+    LOCK(mempool.cs);
793+    for(const uint256& txid : mempool.m_unbroadcast_txids){

MarcoFalke commented at 4:00 pm on March 18, 2020:

in commit 2f3f449dc9c89420a444cc8263c5a5fe537a475b:

Please use m_mempool and not the global.

amitiuttarwar commented at 4:44 pm on March 26, 2020:

method no longer exists, but fixed in ReattemptInitialBroadcast

in src/net_processing.cpp:789 in 2f3f449dc9 outdated

784+// filterInventoryKnown.
785+//
786+// Passing txns through the bloom filter has a stronger privacy model, but weaker
787+// delivery guarantees. Under certain network & node conditions, it may still
788+// take an extended period of time for the transaction to be initially broadcast
789+// to the network.

MarcoFalke commented at 4:14 pm on March 18, 2020:

in commit 2f3f449:

I don’t understand this comment, why would it have weaker delivery guarantees? Is it the bloom filter false positive rate?

amitiuttarwar commented at 7:57 pm on March 23, 2020:

no, simpler.. just mean some transactions will get filtered & not relayed because of filterInventoryKnown. this was a result of the confusion & conversation in #18038 (review).

but maybe without that context this comment is misleading? appreciate any feedback on how to clarify. right now I’m thinking of just deleting the second paragraph.

MarcoFalke commented at 7:59 pm on March 23, 2020:

Why would filterInventoryKnown return a match when the transaction was initially never sent out to that peer?

amitiuttarwar commented at 10:13 pm on March 23, 2020:

eg. if you disconnect wifi & submit a txn to your mempool. code will add to the bloom filters but the transaction never actually went anywhere

amitiuttarwar commented at 4:46 pm on March 26, 2020:

updated this so comment above ReattemptInitialBroadcast just notes that unbroadcast txs will be passed through filterInventoryKnown. Leaving the tradeoff analysis for reviewers / future people writing code.

in src/net_processing.h:79 in 2f3f449dc9 outdated

74@@ -75,7 +75,8 @@ class PeerLogicValidation final : public CValidationInterface, public NetEventsI
75     void CheckForStaleTipAndEvictPeers(const Consensus::Params &consensusParams);
76     /** If we have extra outbound peers, try to disconnect the one with the oldest block announcement */
77     void EvictExtraOutboundPeers(int64_t time_in_seconds) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
78-
79+    /** Retrieve unbroadcast transactions from the mempool and possibly send to pnode */
80+    void QueueUnbroadcastTxs(CNode* pnode) override;

MarcoFalke commented at 4:15 pm on March 18, 2020:

in commit 2f3f449:

Can be const?

0    void QueueUnbroadcastTxs(CNode& node) const override;

amitiuttarwar commented at 4:47 pm on March 26, 2020:

Method no longer exists but made ReattemptInitialBroadcast const.

in test/functional/mempool_unbroadcast.py:21 in f40f6875f5 outdated

16+        assert_equal,
17+        disconnect_nodes,
18+        connect_nodes,
19+        create_confirmed_utxos,
20+        wait_until,
21+        hex_str_to_bytes

MarcoFalke commented at 4:20 pm on March 18, 2020:

in commit f40f6875f5570cb25882d8a04a6cbeb61ee7366d:

Style nit. Could add trailing comma for symmetry, git blame, and code formatters?

0        hex_str_to_bytes,

practicalswift commented at 5:26 pm on March 18, 2020:

@amitiuttarwar Since this is a new file, you might want to run black on the entire file to get proper formatting which addresses MarcoFalke’s nit and all other formatting nits :)

MarcoFalke commented at 6:01 pm on March 18, 2020:

@practicalswift Style is subjective and we allow the pull request author to pick the style, as long as it is reasonable. We don’t require a specific python formatter. While black or yapf (https://github.com/bitcoin/bitcoin/blob/master/.style.yapf) or the built-in formatter of your editor can be used, they are not mandatory.

practicalswift commented at 8:05 pm on March 18, 2020:

@MarcoFalke Have I claimed anything else? :) This is your style nit - I’m just suggesting a way to address it automatically :)

amitiuttarwar commented at 4:48 pm on March 26, 2020:

fixed! thanks for the style tips

in test/functional/mempool_unbroadcast.py:62 in f40f6875f5 outdated

57+        rpc_tx_hsh = node.sendrawtransaction(txFS['hex'])  # txhsh in hex
58+
59+        # check that second node doesn't have these two txns
60+        mempool = self.nodes[1].getrawmempool()
61+        assert(rpc_tx_hsh not in mempool)
62+        assert(wallet_tx_hsh not in mempool)

MarcoFalke commented at 4:22 pm on March 18, 2020:

In commit f40f6875f5570cb25882d8a04a6cbeb61ee7366d:

I think assert is a statement, not a function, so should be without ()

0        assert wallet_tx_hsh not in mempool

practicalswift commented at 5:27 pm on March 18, 2020:

@amitiuttarwar Same goes for this one: black on the entire file to get proper formatting for free :)

amitiuttarwar commented at 4:48 pm on March 26, 2020:

fixed

in test/functional/mempool_unbroadcast.py:64 in f40f6875f5 outdated

59+        # check that second node doesn't have these two txns
60+        mempool = self.nodes[1].getrawmempool()
61+        assert(rpc_tx_hsh not in mempool)
62+        assert(wallet_tx_hsh not in mempool)
63+
64+        self.log.info("Reconnect nodes & check if they are sent to node 1")

MarcoFalke commented at 4:23 pm on March 18, 2020:

in commit f40f6875f5570cb25882d8a04a6cbeb61ee7366d:

Could do the same test again with a self.nodes[0].restart_node() added here?

This way it will test to persist txs on disk.

amitiuttarwar commented at 4:48 pm on March 26, 2020:

good idea! updated.

in test/functional/mempool_unbroadcast.py:69 in f40f6875f5 outdated

64+        self.log.info("Reconnect nodes & check if they are sent to node 1")
65+        connect_nodes(node, 1)
66+
67+        # fast forward into the future & ensure that the second node has the txns
68+        node.mockscheduler(15*60) # 15 min in seconds
69+        wait_until(lambda: len(self.nodes[1].getrawmempool()) == 2, timeout=30)

MarcoFalke commented at 4:27 pm on March 18, 2020:

in commit f40f6875f5570cb25882d8a04a6cbeb61ee7366d:

Can be shorter, because self.sync_mempools() does the same, I think

0        self.sync_mempools()

amitiuttarwar commented at 4:49 pm on March 26, 2020:

updated

in test/functional/mempool_unbroadcast.py:107 in f40f6875f5 outdated

102+        tx.deserialize(BytesIO(hex_str_to_bytes(signed_tx['hex'])))
103+        block.vtx.append(tx)
104+        block.rehash()
105+        block.hashMerkleRoot = block.calc_merkle_root()
106+        block.solve()
107+        node.submitblock(ToHex(block))

MarcoFalke commented at 4:29 pm on March 18, 2020:

in commit f40f687:

Does the block need to be created manually? What about a simple two-liner?

0self.nodes[0].sendtoaddress()
1self.nodes[0].generate(1)

Alternatively, if you want to test both wallet txs and rpc txs, you could factor out the tx-creation code from the previous test case into a new helper function and call it here.

amitiuttarwar commented at 4:51 pm on March 26, 2020:

ahahhahaha omg I’m glad its been a while since I wrote this test & I don’t remember how long I spent figuring out how to manually make the txn and block 😝

two-liner is much cleaner. I feel comfortable with the coverage of wallet & rpc from previous test.

thanks!

in src/validation.cpp:5061 in b0f6a23833 outdated

5056+            // eof gets set when we try to read the end of the file
5057+            // so specifically catch expected error and move on.
5058+            if (!file.eof()) throw e;
5059+        }
5060+
5061+        if (has_unbroadcast_txns) {

MarcoFalke commented at 4:33 pm on March 18, 2020:

in commit b0f6a238335d96c9f5f7c2b83aabd56ab4f47a71:

Why is this needed?

I think you can just go ahead and assume that the set was serialized. If it wasn’t it will result in a one-time LogPrintf that the mempool data couldn’t be deserialized. I think this is fine if it is mentioned in the release notes or as a code comment or not at all.

amitiuttarwar commented at 9:38 pm on March 24, 2020:

so, are you proposing

remove writing the boolean has_unbroadcast_txns to the file here
remove try block above, just leave the file>>pool.m_unbroadcast_txids; code below
if there are no m_unbroadcast_txids, there will be a LogPrintf that “Failed to deserialize mempool data on disk. Continuing anyway”, and code moves on?

if I am interpreting correctly…. I find that to be very misleading user output. And I’d expect it to happen frequently since the unbroadcast set is more of a safety mechanism and hopefully nodes aren’t often shutting down without their local transactions being broadcast to the network. So I’m not really seeing the reasoning / benefits. But maybe I’m misunderstanding your suggestion.

MarcoFalke commented at 0:11 am on March 25, 2020:

The logprint should only happen on node update. A serialized empty set is hopefully different from no serialized set.

MarcoFalke commented at 0:12 am on March 25, 2020:

See here:

https://github.com/bitcoin/bitcoin/blob/5236b2e267a58870239673c7ec85e5df0cb8fc8e/src/serialize.h#L972

naumenkogs commented at 0:55 am on March 25, 2020:

Yeah, so in normal case we’ll get this error once: when the node is updated to use this code, and attempts to use the file from previous runs. Which is, yeah, not ideal, but probably fine. If you’d really want to avoid that, I’d rather add versioning of the file, rather than having bools like that you suggest. (See how we do versioning in peers.dat AddrMan serialization if interested)

MarcoFalke commented at 2:49 pm on March 25, 2020:

This file is already versioned, but the version shouldn’t be bumped because that will silently drop all transactions of the mempool

naumenkogs commented at 3:58 pm on March 25, 2020:

Why does it have to drop all transactions? The new code can just be aware of the previous version, and not attempt to read unbroadcast_txs in that case, but just carefully read all old transactions (the advantage over this current PR is that no error is thrown/logged).

MarcoFalke commented at 4:14 pm on March 25, 2020:

Why does it have to drop all transactions?

When bumping the version here and assuming the code is merged into 0.21.0, then reading a mempool.dat created by Bitcoin Core 0.21.0 will fail completely on a Bitcoin Core 0.20.0, because the code is already released and the logic of that code is to drop all transactions when the version is not equal to 1.

naumenkogs commented at 4:19 pm on March 25, 2020:

True. But who would use a mempool.dat from 0.21 on 0.20 node? I would say the most common scenario is opposite, upgrading from 0.20 to 0.21.

MarcoFalke commented at 4:22 pm on March 25, 2020:

Miners might want to upgrade, test the new version for a bit and then downgrade temporarily because they noticed an RPC breaking change. They should be able to continue mining on the txs they had before. For many users, a version upgrade/downgrade is the only time they restart the node. So this is the only time when this code is run and can provide them with value. If the code fails to be useful in the only use-case we might as well remove it altogether. ;)

MarcoFalke commented at 4:24 pm on March 25, 2020:

If you guys think that a try-catch is useful to avoid the debug LogPrint, fine. I am ok with that too. But I don’t see a reason for this boolean.

naumenkogs commented at 4:31 pm on March 25, 2020:

I also don’t see a good reason for this boolean.

amitiuttarwar commented at 9:47 pm on March 25, 2020:

ok! I’m following now. I wasn’t aware of the serialization logic. I’ll remove the bool & try-catch as well. thanks!

amitiuttarwar commented at 4:52 pm on March 26, 2020:

fixed! thanks again

jnewbery commented at 10:44 pm on April 15, 2020:

but the version shouldn’t be bumped because that will silently drop all transactions of the mempool @MarcoFalke - this is interesting. It means if we ever have to bump the version number we’d have to do something like:

Bitcoin Core version X only understands mempool.dat v1
add mempool.dat v2 parsing logic in Bitcoin Core version X+1
add mempoo.dat v2 writing logic in Bitcoin Core version X+2

I think to avoid the debug LogPrint we can wrap the m_unbroadcast_txids parsing in its own try-except. Something like:

0        try {
1            LOCK(pool.cs);
2            file >> pool.m_unbroadcast_txids;
3            unbroadcast = pool.m_unbroadcast_txids.size();
4        } catch (const std::exception& e) {
5            // mempool.dat files created prior to v0.21 will not have an
6            // unbroadcast set. No need to log a failure if parsing fails
7            // here.
8        }

amitiuttarwar commented at 5:58 pm on April 23, 2020:

I think to avoid the debug LogPrint we can wrap the m_unbroadcast_txids parsing in its own try-except

I considered doing this but decided not to because a one-time error message (with functionality working fine) seems more desirable than a nested try / catch statement that would probably live in the code for a very long time.

amitiuttarwar commented at 6:04 pm on April 29, 2020:

ah, based on another review comment or two, ended up implementing here 191fc04e9bb99589f204ccfceece7d715a82f4c0 (PR #18807)

MarcoFalke approved

MarcoFalke commented at 4:35 pm on March 18, 2020: member

ACK. Looks ready to merge, just a ton of style nits.

ariard commented at 7:15 pm on March 19, 2020: member

Does that logic make sense to you? Please poke holes at it :) Or let me know if you have any questions.

I think we agree, missed inventory filter implications at first, maybe should get some message in your first commit to track goal aimed:

“Previously, in case of rebroadcast, a spying node may learn transaction origin if it was connected to the node during a ~1/15min period.

Now, mempool is tracking locally submitted transactions. At servicing GETDATA for an announced local transaction for at least one peer, tracking is erased. If no GETDATA is processed before mempool initial-broadcast reattempt period expires, a new announcement is made to every peer. Due to inventory filter, this is only effective for new peers since last period expiration. Transaction origin privacy would only be worsened if a) among new nodes there is a spying one b) no honest/reliable peers reported back a GETDATA. Given the mempool rebroadcast period length, if b) occurs, that would be a clue of bigger concern.

This doesn’t improve initial-broadcast privacy where a spying node is already connected to the node before transaction first announcement.”

(but if b) occurs that’s a reliability improvement IMO)

laanwj added the label Feature on Mar 25, 2020

MarcoFalke added this to the milestone 0.21.0 on Mar 26, 2020

amitiuttarwar force-pushed on Mar 26, 2020

amitiuttarwar commented at 10:35 pm on March 26, 2020: contributor

thank you for the thoughtful reviews! pushed changes incorporating the feedback.

currently looking into CI failures. appveyor is failing on feature_versionbits_warning.py, but travis is failing on mempool_persist, so I’m investigating.

MarcoFalke commented at 0:32 am on March 27, 2020: member

Previous runs:

I think they are intermittent failures, so I am going to re-run this pull

Open-Close to re-run ci. See #15847 (comment)

MarcoFalke closed this on Mar 27, 2020

MarcoFalke reopened this on Mar 27, 2020

amitiuttarwar commented at 6:28 pm on March 30, 2020: contributor

the travis valgrind build is failing on p2p_permissions test.

trying to make sense of the logs… looks like node1 has an unclean shutdown, possibly because the RPC is attempted to be stopped more than once? I’m a bit confused because it looks like test_framework/authproxy.py is throwing a ConnectionResetError from _request, but that method should be catching the error? but I think its thrown for a second time in the except block.

trying to figure out if this is related to my changes. earlier in this PR what first looked like an unrelated failure to me was actually related because of issues with my lock logic. this time doesn’t look like either node is crashing. from what I can discern, looks like node1 having an unclean shutdown because of a failure in an http request. which leads me to thinking its test flakiness. @MarcoFalke - thoughts?

amitiuttarwar commented at 7:18 pm on March 30, 2020: contributor

@ariard - addressing your 2 outstanding review comments

RE: wtxid - there is some additional code needed to support unbroadcast set & wtxid relay. I’ll work on the patch & depending on timing around which PR is ready for merge first, include the patch here or offer it on the other PR. Thanks again for pointing that out :)

RE: filterInventoryKnown - to make this easier to realize, I mention in a comment above ReattemptInitialBroadcast. However, I’m hesitant to impose too much interpretation in the code comments or commit messages, because I want to encourage independent thinking that might conflict with or challenge my own interpretation. Also because there are so many interactions it’s kind of impossible to encapsulate them all, and they will also change over time.

To be more specific..

If no GETDATA is processed before mempool initial-broadcast reattempt period expires, a new announcement is made to every peer. Due to inventory filter, this is only effective for new peers since last period expiration.

This is the normal condition, but iirc filterInventoryKnown is cleared out based on number of transactions, so maybe under super high traffic network conditions, an announcement is resent to the same peer?

And in regards to the second part of your comment, I don’t quite follow..

Transaction origin privacy would only be worsened if a) among new nodes there is a spying one

Say you have a set of unbroadcast transactions. You attempt initial broadcast, but your wifi is disconnected so they don’t go anywhere, but get inserted into filterInventoryKnown for your existing connections. A malicious node connects to you. ReattemptInitialBroadcast is triggered & you send your wallet transaction to this node. At that time, the node cannot identify that is your transaction with certainty. For it to carry out an attack, it needs to be the ONLY node you submitted the transaction to. Which is possible, but a specific subset of this condition.

b) no honest/reliable peers reported back a GETDATA. Given the mempool rebroadcast period length, if b) occurs, that would be a clue of bigger concern. (but if b) occurs that’s a reliability improvement IMO)

so, there are two main reasons a node wouldn’t report back a GETDATA. Either network issues (it didn’t receive announcement, you didn’t receive the GETDATA). Or it is malicious. If it is occurring because of network issues, it makes sense to try again. Is this the reliability improvement that you are talking about? because thats the main point of this PR :)

amitiuttarwar commented at 0:16 am on March 31, 2020: contributor

ok! travis is green, all current review comments are addressed.

this PR is ready for another round of review! thanks in advance :)

sipa commented at 2:39 am on April 2, 2020: member

Do we have any numbers on how important wallet rebroadcasting is? I have a small concern that stopping the “fast” initial broadcast after a single successful transmission may sometimes not be enough (e.g. the peer it’s sent to goes offline before forwarding it further). This seems unlikely to happen (both we only submitting once, and them failing immediately afterwards), and even if it happens, it wouldn’t be a disaster as the once per day rebroadcast would still occur.

Besides that, concept and code review ACK.

naumenkogs commented at 7:19 pm on April 2, 2020: member

Code review re-ACK

amitiuttarwar commented at 10:17 pm on April 2, 2020: contributor

thanks for the review @sipa!

unfortunately, I don’t have any real-life numbers of wallet rebroadcast usage for getting a transaction initially broadcast. I’d love relevant data. (anyone reading this… ) please let me know if you have any ideas of how I could get relevant info.

I’ve mostly been operating in theoretical possibility space & have come to the same conclusion- the likelihood seems slim & there’s an additional safety net. in this implementation, it’d be possible to increase the guarantee (eg. ensure we fulfill two GETDATA requests instead of one), but the tradeoff hasn’t seemed worth it.

in src/net_processing.cpp:1614 in e82fdd329c outdated

1610+            if (push) {
1611+                // We intepret processing a GETDATA for a transaction as a successful initial broadcast
1612+                // and remove it from our unbroadcast set.
1613+                int num = WITH_LOCK(mempool.cs, return mempool.m_unbroadcast_txids.erase(inv.hash););
1614+                if (num) {
1615+                    LogPrint(BCLog::NET, "Removed %i from m_unbroadcast_txids \n", inv.hash.GetHex());

MarcoFalke commented at 6:47 pm on April 13, 2020:

e82fdd329c812623c6e8a9022322fe43e1000c5b

0                    LogPrint(BCLog::NET, "Removed %i from the set of unbroadcast txs\n", inv.hash.GetHex());

Any reason for the trailing whitespace?

Also, leaking source code names into the debug log might cause parse issues when the symbol is renamed

amitiuttarwar commented at 5:32 pm on April 23, 2020:

fixed

in src/txmempool.cpp:421 in e82fdd329c outdated

416@@ -417,6 +417,10 @@ void CTxMemPool::removeUnchecked(txiter it, MemPoolRemovalReason reason)
417     for (const CTxIn& txin : it->GetTx().vin)
418         mapNextTx.erase(txin.prevout);
419 
420+    if (mempool.m_unbroadcast_txids.erase(hash)) {
421+        LogPrint(BCLog::NET, "Removed %i from m_unbroadcast_txids before confirmation that txn was sent out. \n", hash.GetHex());

MarcoFalke commented at 6:48 pm on April 13, 2020:

0        LogPrint(BCLog::NET, "Removed %i from set of unbroadcast txs before confirmation that txn was sent out\n", hash.GetHex());

Same

jnewbery commented at 9:41 pm on April 15, 2020:

Is there a reason this is categorized as a NET log? I think it makes more sense as a MEMPOOL, or even uncategorized as a LogPrintf().

jnewbery commented at 9:42 pm on April 15, 2020:

Agree that it’s better to avoid using variable names in logs.

amitiuttarwar commented at 5:32 pm on April 23, 2020:

fixed

amitiuttarwar commented at 5:43 pm on April 23, 2020:

historic that I didn’t notice. fixed now

in src/txmempool.h:537 in e82fdd329c outdated

532@@ -533,6 +533,10 @@ class CTxMemPool
533     const setEntries & GetMemPoolParents(txiter entry) const EXCLUSIVE_LOCKS_REQUIRED(cs);
534     const setEntries & GetMemPoolChildren(txiter entry) const EXCLUSIVE_LOCKS_REQUIRED(cs);
535     uint64_t CalculateDescendantMaximum(txiter entry) const EXCLUSIVE_LOCKS_REQUIRED(cs);
536+
537+    // track locally submitted transactions & periodically retry initial broadcast

MarcoFalke commented at 6:49 pm on April 13, 2020:

This should probably be a doxygen comment. See the developer notes on how to get one.

0    /** track locally submitted transactions & periodically retry initial broadcast */

amitiuttarwar commented at 5:37 pm on April 23, 2020:

fixed & added doxygen comments for new methods I introduced into txmempool

in src/net_processing.cpp:782 in a11968c878 outdated

778@@ -779,6 +779,21 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) {
779         PushNodeVersion(pnode, connman, GetTime());
780 }
781 
782+// Note: unbroadcast transactions will be filtered through filterInventoryKnown

MarcoFalke commented at 6:56 pm on April 13, 2020:

a11968c8780424ddb0fd01341101bb28ff6953a8

Sorry, I still don’t understand why this comment is relevant to rebroadcasting. I remember in an earlier discussion you mentioned that a user might disconnect their wifi. A tx might get added to filterInventoryKnown, but then never sent out.

I don’t think this is true. Either, we reconnect the wifi and be still connected to the peer, in which case we can push the inv out and wait for them to request the tx. Or, we drop the peer (either because the wifi isn’t turned on again, or it took us too long to do so), in which case a new peer will pop up and they will receive the inv (after 10-15 minutes).

jnewbery commented at 10:01 pm on April 15, 2020:

I agree with Marco that this comment can just be removed.

amitiuttarwar commented at 5:38 pm on April 23, 2020:

we reconnect the wifi and be still connected to the peer, in which case we can push the inv out and wait for them to request the tx.

ah, I didn’t realize. thanks!

removed the comment.

in src/net_processing.cpp:793 in a11968c878 outdated

788+    for (const uint256& txid : unbroadcast_txids) {
789+        RelayTransaction(txid, *connman);
790+    }
791+
792+    // schedule next run for 10-15 minutes in the future
793+    const std::chrono::milliseconds delta = std::chrono::milliseconds(10 * 60 * 1000 + GetRand(5 * 60 * 1000));

MarcoFalke commented at 7:00 pm on April 13, 2020:

0    const std::chrono::milliseconds delta = std::chrono::minutes{10} + GetRandMillis(std::chrono::minutes{5});

You might have to add GetRandMillis by copying from GetRandMicros.

jnewbery commented at 10:17 pm on April 15, 2020:

nit: could declare these two times as constants.

amitiuttarwar commented at 5:40 pm on April 23, 2020:

added the method GetRandMillis & updated here. I was thinking might make sense to have a template for the different chrono types? I haven’t looked into it yet but you mentioned that it wouldn’t work?

amitiuttarwar commented at 5:49 pm on April 23, 2020:

pass on this one, I pulled delta out for readability, but breaking into two constants seems a bit overkill to me

MarcoFalke commented at 8:24 pm on April 23, 2020:

@amitiuttarwar Done here: #18751

in src/validation.cpp:5030 in 6307582b4d outdated

5023@@ -5023,12 +5024,19 @@ bool LoadMempool(CTxMemPool& pool)
5024         for (const auto& i : mapDeltas) {
5025             pool.PrioritiseTransaction(i.first, i.second);
5026         }
5027+
5028+        {
5029+            LOCK(pool.cs);
5030+            file >> pool.m_unbroadcast_txids;

MarcoFalke commented at 7:18 pm on April 13, 2020:

6307582b4d30167a931678d12e99a00a0e75e81a

This is not safe. You are erasing the existing content of the m_unbroadcast_txids

amitiuttarwar commented at 5:42 pm on April 23, 2020:

ah, thanks for catching. fixed via new mempool methods

in src/validation.cpp:5087 in 6307582b4d outdated

5079@@ -5072,6 +5080,13 @@ bool DumpMempool(const CTxMemPool& pool)
5080         }
5081 
5082         file << mapDeltas;
5083+
5084+        {
5085+            LOCK(pool.cs);
5086+            LogPrintf("Writing %d unbroadcast transactions to disk.\n", pool.m_unbroadcast_txids.size());
5087+            file << pool.m_unbroadcast_txids;

MarcoFalke commented at 7:21 pm on April 13, 2020:

This is racy and you might end up with zombie txids here that are never removed because the corresponding mempool tx has not been written to disk

jnewbery commented at 10:50 pm on April 15, 2020:

I agree. All of the data should be brought out of the mempool under one lock. Make a local std::set<uint256> and populate it under the same lock as mapDeltas and vinfo get populated.

amitiuttarwar commented at 5:42 pm on April 23, 2020:

oops. thanks for catching. fixed

MarcoFalke approved

MarcoFalke commented at 7:21 pm on April 13, 2020: member

ACK 6307582b4d30167a931678d12e99a00a0e75e81a 🐫

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3ACK 6307582b4d30167a931678d12e99a00a0e75e81a 🐫
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUg7fQwAs3+uIjFusxFMhvNvRgjY+JbpGN8/UQWcXAe9bRFc2+zDMqZVrDlwOJyQ
 864QnpWf16aDzuKZdL7iLh574+jBJuFioMfq0ogPqctgVF9+PamnI9sRWEuZwjVVD
 94G+haPYyPi9uR/Ip5f4xbRT62Aq3oi7OY9necxeL2Aev3T/pHbLWyl8hh6nBt24y
10skq6k31A+5rOGaQY8VhMaBnp1p6irM1CH+6/XdkVeLNZi+7PGYBcbxPWnc6OLh7r
115bUl9XiMdg1LfVjPg601rQEEvU95spj5Xcm2YE4JblELgewYydF0PpCJcbgww7H5
12YW2BA2ubzBOXJTjzOUsU43NH3l5dAIMwGcHmNEmLTKzq1wbf0EmDngI/SgUv0VCU
13YBz6FExO8WSf42c2SDqNNEr4JoqkhERt/25nQ1MCQ9F9M4pEUZmZsedM40jp81LM
14Ua+f7rQJ9ML3+Y8/kjwE8AuU/m4+uBJEzA71qGmyACw2rRnLPp5LyDG89JDxJSpH
154bCE2dPF
16=qFW/
17-----END PGP SIGNATURE-----

Timestamp of file with hash 081497859ee990ba5d941ec43c32052fbadbcdd5237c93dca46251989de90d45 -

in src/net.cpp:1647 in 6307582b4d outdated

1643@@ -1644,17 +1644,6 @@ void CConnman::ThreadDNSAddressSeed()
1644     LogPrintf("%d addresses found from DNS seeds\n", found);
1645 }
1646 
1647-

jnewbery commented at 9:45 pm on April 15, 2020:

I also think this whitespace is ugly, but I think you can leave removing it for another time!

MarcoFalke commented at 11:10 pm on April 15, 2020:

Our guidelines say that whitespace may be fixed up when the line is touched for other reasons. I doubt anyone is going to touch an empty line any time soon. It seems fine to remove an empty line in net.cpp in a pull request that modifies p2p code.

jnewbery commented at 7:01 pm on April 16, 2020:

Touching a file in a commit for no other reason than to remove the whitespace doesn’t make any sense! I’m surprised that you would think otherwise.

amitiuttarwar commented at 5:44 pm on April 23, 2020:

😢

unfortunately I can’t disagree that its silly to touch a file just to remove whitespace.

empty lines in net.cpp… I will be back for you!

in test/functional/test_framework/mininode.py:613 in 6307582b4d outdated

605@@ -606,3 +606,19 @@ def send_txs_and_test(self, txs, node, *, success=True, expect_disconnect=False,
606                 # Check that none of the txs are now in the mempool
607                 for tx in txs:
608                     assert tx.hash not in raw_mempool, "{} tx found in mempool".format(tx.hash)
609+
610+class P2PTxInvStore(P2PInterface):

jnewbery commented at 10:08 pm on April 15, 2020:

I prefer all classes in the framework to have a docstring, even if it’s very simple. This one would be something like:

“A P2PInterface which stores a count of how many times each txid has been announced”

amitiuttarwar commented at 5:44 pm on April 23, 2020:

done

amitiuttarwar commented at 6:02 pm on April 23, 2020:

also added to the docstring at the top of the file.

in test/functional/test_framework/mininode.py:618 in 6307582b4d outdated

613+        self.tx_invs_received = defaultdict(int)
614+
615+    def on_inv(self, message):
616+        # Store how many times invs have been received for each tx.
617+        for i in message.inv:
618+            if i.type == 1:

jnewbery commented at 10:11 pm on April 15, 2020:

Can we import MSG_TX from .messages and if i.type == MSG_TX to remove the magic 1 number?

amitiuttarwar commented at 5:47 pm on April 23, 2020:

turns out, MSG_TX was already imported. fixed.

in test/functional/mempool_unbroadcast.py:14 in 6307582b4d outdated

 9+
10+from test_framework.mininode import P2PTxInvStore
11+from test_framework.test_framework import BitcoinTestFramework
12+from test_framework.util import (
13+    assert_equal,
14+    disconnect_nodes,

jnewbery commented at 10:13 pm on April 15, 2020:

nit: sort :stuck_out_tongue:

amitiuttarwar commented at 5:48 pm on April 23, 2020:

fixed

in test/functional/mempool_unbroadcast.py:87 in 6307582b4d outdated

81+    def test_txn_removal(self):
82+        self.log.info("Test that transactions removed from mempool are removed from unbroadcast set")
83+        node = self.nodes[0]
84+        disconnect_nodes(node, 1)
85+
86+        # since the node doesn't have any connections, it will not receive

jnewbery commented at 10:20 pm on April 15, 2020:

I don’t think this is true. You added a P2PTxInvStore in test_broadcast() which I think is still connected.

amitiuttarwar commented at 5:54 pm on April 23, 2020:

good point! looks like since P2PTxInvStore overwrites the on_inv function, it doesn’t send back a GETDATA, so the relevant piece of functionality from a connection isn’t triggered.

for this PR, I added a line to disconnect_p2ps so the expectations make more sense (making the comment true).

but I do think its confusing that P2PTxInvStore inherits from P2PInterface but abandons this one piece of functionality. I’m thinking as a small followup I can have P2PTxInvStore call through to the parent on_inv after it stores the txid to behave more as expected. what do you think?

fjahr commented at 5:07 pm on April 24, 2020:

You are missing the parenthesis behind the function call: node.disconnect_p2ps() so this line is not doing the disconnecting at the moment. So the comment is indeed wrong and it just “accidentally” works because P2PTxInvStore doesn’t send a GETDATA. I think I would expect it to send a GETDATA because it is part of the framework now, but in general, I think the P2PInterface is there so you can override the behavior to be what you need so I don’t have a strong feeling about it. If it was part of the test I would say it doesn’t matter but because it moves to the framework, slight preference to call the super behavior.

EDIT: For reference, I mean this line: https://github.com/bitcoin/bitcoin/pull/18038/files#diff-5e6f5e6aab04f82ec4b58148c4bdec72R85 the linking isn’t working properly because I commented on an outdated conversation.

amitiuttarwar commented at 8:23 pm on April 24, 2020:

🤦‍♀️ sigh. thanks for catching!

ya I agree with the reasoning of in test vs framework. I’ll do a small followup to add parens & have P2PTxInvStore call through super.

amitiuttarwar commented at 9:48 pm on April 28, 2020:

fixed here: https://github.com/bitcoin/bitcoin/pull/18807/commits/97f15b487f064cf359e8042afa25a5d0fb88bf8c

in test/functional/mempool_unbroadcast.py:97 in 6307582b4d outdated

92+        # add a connection to node
93+        conn = node.add_p2p_connection(P2PTxInvStore())
94+
95+        # the transaction should have been removed from the unbroadcast set
96+        # since it was removed from the mempool for MemPoolRemovalReason::BLOCK.
97+        # verify by checking it isn't broadcast to the node's new connection.

jnewbery commented at 10:23 pm on April 15, 2020:

You can also verify this by using assert_debug_log() to check that the “Removed %i from m_unbroadcast_txids before confirmation that txn was sent out” log has been printed. That might be simpler than connected a new peer.

amitiuttarwar commented at 5:54 pm on April 23, 2020:

seems more straightforward so I updated

jnewbery commented at 11:09 pm on April 15, 2020: member

Looks great. I’ve added a bunch of nits, but overall this change set is very nice. Strong concept ACK.

Architecturally, my only comment would be to make m_unbroadcast_txids private, and introduce three access functions AddUnbroadcastTx(), RemoveUnbroadcastTx() and GetUnbroadcastTxs(), so validation/net_processing aren’t reaching into CTxMempool, and if we decide to have more sophisticated unbroadcast logic in future and the internal data structures change, hopefully those clients don’t need to change their calling code as much.

Other things I was looking out for in this review:

this PR adds local data that identifies your own transactions (mempool.dat unbroadcast set and debug logs). I think that’s fine - if your bitcoin datadir is compromised you probably have bigger problems.
I initially thought that mempool.dat should be versioned, but this conversation: #18038 (review) has convinced me that I don’t need to worry. mempool.dat is such a short-lived file that versioning seems minimally useful.

One final comment: for your git commit logs, I recommend you wrap the body lines at 80 chars.

MarcoFalke commented at 11:13 pm on April 15, 2020: member

this PR adds local data that identifies your own transactions (mempool.dat unbroadcast set and debug logs). I think that’s fine - if your bitcoin datadir is compromised you probably have bigger problems.

All the information should already be accessible today in wallet.dat, and debug.log, so this is not getting worse.

jnewbery commented at 2:23 am on April 16, 2020: member

All the information should already be accessible today in wallet.dat, and debug.log, so this is not getting worse.

wallet.dat could be stored externally, and without wallet logging enabled I don’t think debug.log will indicate which transactions are yours. But like I said, I don’t think it’s a big concern.

MarcoFalke commented at 11:46 am on April 16, 2020: member

With wallet logging disabled, the tx source is still leaked through ATMP logging, validationinterface logging, RPC logging and NET logging. You’d probably have to pass -nodebuglogfile to make sure nothing is leaked.

fjahr commented at 12:56 pm on April 23, 2020: member

The underlying assumption here seems to be that all nodes are online 24/7 (or at least several days after a tx broadcast). I have heard from some users that they have a node on their laptop that is not online often but when they want to send a tx they launch it, wait for it to sync, and then broadcast the tx before shutting it down again. Should such a usage pattern be taken into consideration or should it be ignored?

MarcoFalke commented at 1:05 pm on April 23, 2020: member

@fjahr Can you explain why this is the underlying assumption? A node on a laptop should work fine wiht this. In fact this improves the user experience on exactly such systems.

fjahr commented at 1:10 pm on April 23, 2020: member

@fjahr Can you explain why this is the underlying assumption? A node on a laptop should work fine wiht this. In fact this improves the user experience on exactly such systems.

The way I understand the logic it seems like if a first attempt at broadcasting the tx fails, the user would need to remember that and relaunch their node ~24h later to retry which may be inconvenient compared to keeping the node running for another hour for 1-2 more retries. But maybe I am getting it wrong, I have not reviewed the code yet.

MarcoFalke commented at 1:21 pm on April 23, 2020: member

I don’t think we can trivially solve the case where you have only connections to malicious nodes that pretend to want your transaction, but then drop it on the floor.

Are you saying that waiting another hour after start-up could improve the peers you have and they’d be less like to drop your txs on the floor?

fjahr commented at 2:04 pm on April 23, 2020: member

I guess I just misunderstood the unbroadcast logic and it’s differences to the old wallet rebroadcast, now that I am reviewing the code it makes more sense and I see how it’s not an issue for these types of users.

amitiuttarwar force-pushed on Apr 23, 2020

amitiuttarwar commented at 6:23 pm on April 23, 2020: contributor

thanks for the reviews @MarcoFalke, @jnewbery & @fjahr

I’ve addressed all review comments. Fixed the issues with raciness around dumping & loading unbroadcast set to mempool.dat. Updated so m_unbroadcast_txids is private & exposes public methods to add / remove / get the set. And a bunch of small nit fixes. @fjahr

The underlying assumption here seems to be that all nodes are online 24/7

it seems like if a first attempt at broadcasting the tx fails, the user would need to remember that and relaunch their node ~24h later to retry

now that I am reviewing the code it makes more sense

ok, maybe you’re clear now but just incase / for other reviewers..

if we were to simply bump the wallet rebroadcast from ~1/30 min to ~1/day, we would have exactly the problem you describe. so we introduce the idea of the “unbroadcast” set to the mempool. instead of using wallet “rebroadcast” logic to do best effort of first delivery, we now use the mempool “unbroadcast” logic to serve this function. it does work a bit differently (tries more frequently & keeps trying until a GETDATA is received or txn is removed from mempool for a different reason), so reviewers should be critical about the changes.

in test/functional/mempool_unbroadcast.py:95 in c365159b89 outdated

90+        txhsh = node.sendtoaddress(addr, 0.0001)
91+
92+        # check transaction was removed from unbroadcast set due to presence in
93+        # a block
94+        removal_reason = "Removed {} from set of unbroadcast txns before confirmation that txn was sent out.".format(txhsh)
95+        with node.assert_debug_log(expected_msgs=removal_reason):

MarcoFalke commented at 7:01 pm on April 23, 2020:

This doesn’t work. You can try by modifying the string above

The type needs to be an array

amitiuttarwar commented at 9:48 pm on April 23, 2020:

good catch. fixed

in src/net_processing.cpp:1145 in c365159b89 outdated

1140@@ -1128,6 +1141,9 @@ PeerLogicValidation::PeerLogicValidation(CConnman* connmanIn, BanMan* banman, CS
1141     // timer.
1142     static_assert(EXTRA_PEER_CHECK_INTERVAL < STALE_CHECK_INTERVAL, "peer eviction timer should be less than stale tip check timer");
1143     scheduler.scheduleEvery([this, consensusParams] { this->CheckForStaleTipAndEvictPeers(consensusParams); }, std::chrono::seconds{EXTRA_PEER_CHECK_INTERVAL});
1144+
1145+    const std::chrono::milliseconds delta = std::chrono::milliseconds(10 * 60 * 1000 + GetRand(5 * 60 * 1000));

MarcoFalke commented at 7:03 pm on April 23, 2020:

Please use the same for consistency. Either by copy-pasting from above or by putting the two lines into a function

amitiuttarwar commented at 9:48 pm on April 23, 2020:

fixed

in src/txmempool.h:712 in c365159b89 outdated

707+        LOCK(cs);
708+        m_unbroadcast_txids.insert(txid);
709+    }
710+
711+    /** Removes a transaction from the unbroadcast set */
712+    inline void RemoveUnbroadcastTx(const uint256& txid, const bool unchecked = false) {

MarcoFalke commented at 7:12 pm on April 23, 2020:

Any reason why these are inline? I think we don’t that elsewhere.

Also, it might be good to move the implementation of this one to the cpp file, so that logging doesn’t have to be included in the header and compiled on every include.

amitiuttarwar commented at 7:28 pm on April 23, 2020:

thought it was more efficient since these functions are so simple.

we don’t have inline functions in txmempool, but there are plenty scattered around the codebase.

👍 RE logging. I can def move this function to the cpp. Do you prefer I move them all?

MarcoFalke commented at 7:35 pm on April 23, 2020:

The others can stay, but I think the inline keyword is still not needed? (With “inline” I meant “inline” (the keyword))

amitiuttarwar commented at 8:01 pm on April 23, 2020:

oh interesting.. reading around, seems like inline keyword is just a compiler hint & usually the compiler will deduce when necessary? the code compiles fine when I remove them. is there any harm in leaving it in? on the flip, is there any time they should be used?

sipa commented at 8:10 pm on April 23, 2020:

@amitiuttarwar inline has two independent effects:

A hint to the compiler that it should try inlining the function (both in C and C++). Modern compilers will do this automatically in many cases, with or without the keyword.
Indicating that this symbol may appear in multiple modules (but they will all be identical). This is only in C++, and makes it possible to have code in header files (as it will end up in every module that includes it). Member functions defined in classes always have this implicitly, no need to add the “inline” keyword for it (unless you also want to hinting effect above).

So the only time you have to use inline is for functions in headers files that are not member functions.

amitiuttarwar commented at 9:18 pm on April 23, 2020:

ah okay, this is very helpful. I think I’m understanding- its a clue to the compiler to try expanding in line, as well as an indicator that the symbol defined in more than one translation unit are the same, which the compiler can enforce. when a member function is defined, not just declared, within the class definition- the compiler automatically interprets this as an inline member function. so, the keyword inline in the implementation I have here is redundant.

amitiuttarwar commented at 9:24 pm on April 23, 2020:

but, since I’m pulling out the definition of RemoveUnbroadcastTx to txmempool.cpp, it will no longer be automatically inlined. So, adding inline there would be a hint to the compiler to inline the function at the call sites, but the compiler might do it anyway / ignore.

is this sounding correct?

sipa commented at 9:25 pm on April 23, 2020:

that the symbol defined in more than one translation unit are the same

Not just that they’re the same; that they may occur in multiple translation units in the first place. Without inline (or a member being defined in a class body), you will get a linker error if you define the same symbol in multiple translation units.

Unless (there’s always an unless…), it’s in an anonymous namespace as well (or static for globals). In that case, every translation unit will get its own individual copy, which won’t be seen as conflicting by the linker at all - but they also won’t get deduplicated.

sipa commented at 9:30 pm on April 23, 2020:

Once you’ve moved the function definition to the .cpp file, it can only get inlined anymore inside the txmempool.cpp translation unit (until we enable LTO, which we currently don’t do by default or in releases). With or without inline, it will still get inlined there, and with or without inline it won’t get inlined elsewhere. Adding the keyword may still make it a stronger hint to the compiler to do so - unsure.

amitiuttarwar commented at 9:38 pm on April 23, 2020:

ok ya, got the part where inline indicates permission for same symbol in multiple translation units. the part about inline in .cpp being limited to the file unless the optimization is enabled makes sense. and the special case for anonymous namespace is 🤯 and I’m going to go have to learn / tinker more 😛

but first I’m going to push fixups. thanks so much for these explanations! I’m learning a lot :)

amitiuttarwar commented at 9:50 pm on April 23, 2020:

removed inline keywords, moved RemoveUnbroadcastTx function definition to txmempool.cpp & removed including logging.h in the header.

MarcoFalke commented at 7:12 pm on April 23, 2020: member

ACK c365159b8922320ba542db97a43ae72a177380bb 💎

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3ACK c365159b8922320ba542db97a43ae72a177380bb 💎
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUi5TQv/UfFNEB6lDNi7ZjPTVQhYPj6JQsYc9pN6r8vGARGWt22PX9t2epW4/WRQ
 8GWyfjg+uSl3LQYBc1+GZZe2l0uxzH4bbLEN7fL0JxWqnTTboretkD/XEFO36jjGD
 9XOfzExoOKprAXx1TA1GQqqPyWQy3B+TCIxEHYpJxuQJ2iHKYoBwuqMGV1HHQHRiM
10q84fIho0WpgB7hNr56f7IqgdSEhSzIMA7UCJGdo2cnlQJf0CYspPmnx0biRkbZa9
118rYul+/cVI9EICVL/ZZ5Ab61QrqtVncm7fBo+zG/KkXIHKrTEmDfnrLsB0YCtwhM
12C9prYyQDLAJ5PreYZZbC7sdzn8IbC/wsdKVK+z2gx7fmGFRp+Zuxtag70+LSxNCn
138vAwGow0d2QIxBDUKHAGpkjg98XAMP5ZWPHDCM2Ky22wvoPSpIV8c+Q0uEf1eSMF
14E45fpdbik5tUi2ntm0TXdTwunnWQIJdlY4TocFdF+9ZzyM3vOgcEOrRDYzh47xzQ
15MIWA8UcU
16=Um+Y
17-----END PGP SIGNATURE-----

Timestamp of file with hash 40fb7d6151b1989018740bdc8a2582d53f1b63f76e24b76af6cb200fb664ce36 -

[mempool] Track "unbroadcast" transactions

- Mempool tracks locally submitted transactions (wallet or rpc)
- Transactions are removed from set when the node receives a GETDATA request
  from a peer, or if the transaction is removed from the mempool.

89eeb4a333

[util] Add method that returns random time in milliseconds 7e93eecce3

[p2p] Reattempt initial send of unbroadcast transactions

Every 10-15 minutes, the scheduler kicks off a job that queues unbroadcast
transactions onto each node.

e25e42f20a

[wallet] Update the rebroadcast frequency to be ~1/day.

Since the mempool unbroadcast mechanism handles the reattempts for initial
broadcast, the wallet rebroadcast attempts can be much less frequent
(previously ~1/30 min)

dc1da48dc5

[refactor/test] Extract P2PTxInvStore into test framework 6851502472

[test] Integration tests for unbroadcast functionality

Check that...
- mempool tracks & reattempts delivery of a transaction where a GETDATA hasn't
  been requested by a peer yet.
- transaction delivery is not attempted again after GETDATA is received.
- transaction is removed from the unbroadcast set when its removed from the
  mempool.

297a178536

[mempool] Persist unbroadcast set to mempool.dat

Ensure that the unbroadcast set will still be meaningful if the node is
restarted.

50fc4df6c4

amitiuttarwar force-pushed on Apr 23, 2020

MarcoFalke commented at 10:22 pm on April 23, 2020: member

ACK 50fc4df6c4e8a84bdda13ade7bed7a2131796f00, I think this is ready for merge now 👻

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3ACK 50fc4df6c4e8a84bdda13ade7bed7a2131796f00, I think this is ready for merge now 👻
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUirugv7BJubk3aTH2GX+7nvWrieZDFLtNjEb/auwRu/Jbr65RbmVDuMXLwYoOpJ
 8GYTvWwFz24vA1SUvduymfoYx2wm2IoDlscWld3BHRQ7CLWN0pl5QZMmmaSEcIQ73
 9GN+RxCSHC2rUlfHNlRa0z9ftsmxj4NXvITuC267Icm8ncxXRBUqAUI52Tdhoebhi
10ajCe4O0ELW8bm+QfhRQAZLDBEn9ozIkr1PF07GGpWK0l0ocXjJXvx9C6kh0Dk6/K
11Tjag22/C9USANwlNQM5ylZQAFjIvg4NrCW5YNsosD9fGG48CwHEGGIsjZdK83HLp
12+b/RiVqdgUEy0xIMVA/h8rzg9WIrw1wF30N8ojifoIXksBy6xiOacNNWbjMyZl5k
13EwQe39TciWyiC4SavgApLaNEm2O89F1ioFdEOBSuD1zTvk0NhESUcBP2c0t/QA55
14X2EG6E1+QqXh2QHD4FxUY2Ilc3FYLoxjXtZn0hXeYvVVRt5WVjZ8lKCmHAsSQQSD
15uEtIKPHC
16=fzbr
17-----END PGP SIGNATURE-----

Timestamp of file with hash a0ea0b9710aa41df23da3ac2e0fb074613017836932762a951df4f9384b87eae -

in src/validation.cpp:5028 in 50fc4df6c4

5023@@ -5023,12 +5024,21 @@ bool LoadMempool(CTxMemPool& pool)
5024         for (const auto& i : mapDeltas) {
5025             pool.PrioritiseTransaction(i.first, i.second);
5026         }
5027+
5028+        std::set<uint256> unbroadcast_txids;

sipa commented at 10:42 pm on April 23, 2020:

Perhaps this section for deserializing unbroadcast txids can be moved to a separate try … catch block, so that the normal “Imported …” message appears rather than a deserialization error?

amitiuttarwar commented at 11:41 pm on April 23, 2020:

is this in regards to preventing the one-time error when upgrading? or more general possible errors?

sipa commented at 11:53 pm on April 23, 2020:

Ah, it seems the return value of LoadMempool is actually ignored in case of failure, so it’s indeed just a warning issue. It would seem cleaner that LoadMempool returns either returns true for successfully loaded (whether unbroadcast_txids was present or not), and false + empty mempool otherwise. That’s just code esthetics though, no need to address it here.

sipa commented at 1:12 am on April 24, 2020: member

utACK 50fc4df6c4e8a84bdda13ade7bed7a2131796f00

MarcoFalke commented at 12:01 pm on April 24, 2020: member

@fjahr @naumenkogs @jnewbery @ariard Maybe one more re-ACK to push this over the finish line? :pray:

Merging this early gives more time to test while it is in master until the release of 0.21.0

in src/net_processing.cpp:1626 in 50fc4df6c4

1621@@ -1605,7 +1622,13 @@ void static ProcessGetData(CNode* pfrom, const CChainParams& chainparams, CConnm
1622                     push = true;
1623                 }
1624             }
1625-            if (!push) {
1626+
1627+            if (push) {

naumenkogs commented at 1:28 pm on April 24, 2020:

Imagine in future we will not care about privacy here (e.g., use Tor or whatnot), and allow push for unsolicited GETDATA. Then, if the condition for rebroadcast is still if (push) it would be trivial to censor a transaction by sending unsolicited GETDATA. It means that with this code, effectiveness of rebroadcast depend on this privacy behaviour we have. I’m afraid that this non-obvious relation might screw us up in future. I wish we documented this more, maybe in the comment right above. I’m fine with keeping it this way too, just want to raise some awareness here.

MarcoFalke commented at 1:39 pm on April 24, 2020:

Sorry, I don’t follow. How would a node send GETDATA without knowing the txid? Also, how would it censor (prevent you other connections from sending a GETDATA) by just sending a GETDATA?

amitiuttarwar commented at 8:20 pm on April 24, 2020:

I’m not fully following either, but a couple of points to highlight:

lets distinguish unbroadcast from rebroadcast. unbroadcast is extra attempts at successful initial broadcast. rebroadcast is repeated broadcasts (from the POV of the node, which cannot know full picture).
in this implementation, if victim sends INV to only one peer, peer is adversarial, replies back with GETDATA but doesn’t propagate it out to the network -> victim removes from unbroadcast set & doesn’t retry until ~1 day later when ResendWalletTransactions is triggered. so, prevents txn from making it out to mempool for a day.
@naumenkogs do you think unsolicited GETDATA messages would change this?

naumenkogs commented at 9:13 pm on April 24, 2020:

Imagine a lightning node submits a tx, but it is offline, and will be online in 30 minutes to relay it from Unbroadcast. Now an attacker connect to the victim once they got online, and send GETDATA for that transaction (before a victim broadcasts from unbroadcast).

Right now, it will be ignored, because it’s unsolicited (for privacy reasons!). If we discard those privacy reasons, and now forget about this inherent unbroadcast behaviour, and allow unsolicited GETDATA, an attacker will be able to prevent us from broadcasting a tx, by removing a tx from unbroadcast.

And yeah, one day for lightning is pretty bad, you will simply loose money in most of the implementations today.

ariard commented at 11:14 pm on April 24, 2020:

Yes that’s a good point, in LN there is class of transaction where txid is known from remote party due to the fact they counter-sign them (HTLC-timeout) but honest party have to broadcast with time-constrain to avoid any risk of money-loss.

Right now most of implementations are pretty aggressive at triggering rebroadcast at every block, i.e calling again sendrawtransaction which isn’t great for privacy and this may change in the future. That said, often offline nodes, like mobile, won’t use core as a backend and broadcast directly their transaction on the p2p layer or submit to a third-party server. Routing nodes assumed to run core, should rebroadcast at every block so it would override some attacker messing with their unbroadcast logic.

I think adopting wtxid would solve this issue because counterparty won’t know about witness, at least for LN. But I agree, we should document this to keep track (add a comment commit on top or as a follow-up)

MarcoFalke commented at 11:21 pm on April 24, 2020:

A simpler fix would be to add a one-line check that pfrom is outbound, no?

ariard commented at 0:31 am on April 25, 2020:

I think that’s okay as a fix, it won’t degrade the reliability improvement of aimed by this PR because we already announce sooner to outbound. But we should document better these maybe-cases of time-sensitive tx propagation obstruction (both at p2p/mempool layers) like last thread on ml shows.

amitiuttarwar commented at 2:19 am on April 29, 2020:

ok, updating to check pfrom is from outbound sounds good to me. increases delivery guarantee for the general case too. I added this to the todo list on follow ups pr #18807#issue-410374742. code is just 1 line change, but I want to make sure to update other relevant comments to clarify its removed via outbound getdata & hopefully add a test case.

I’m going to resolve this comment here. thanks!

amitiuttarwar commented at 7:27 pm on May 1, 2020:

on further thought, I don’t think it makes sense to only remove unbroadcast txns on receiving GETDATA from outbound peers.

this relies on the expectation that outbound peers get the txn before inbound peers (otherwise they might relay), which isn’t necessarily true in edge cases. if we were to implement, I’d want to add logic to remove from unbroadcast set if we receive an inv from a peer. However, doing all this seems unnecessary right now, especially considering the greater direction of this project (#16698).

naumenkogs commented at 1:35 pm on April 24, 2020: member

utACK 50fc4df

fjahr commented at 5:11 pm on April 24, 2020: member

Code review ACK 50fc4df6c4e8a84bdda13ade7bed7a2131796f00

minus the disconnect_p2ps() thing. I would be ok if you save ACKs and fix it in a follow-up since you have more work that builds on this anyway.

Now that I understood it, very nice change! :)

in src/txmempool.h:705 in 89eeb4a333 outdated

700@@ -698,6 +701,21 @@ class CTxMemPool
701 
702     size_t DynamicMemoryUsage() const;
703 
704+    /** Adds a transaction to the unbroadcast set */
705+    void AddUnbroadcastTx(const uint256& txid) {

ariard commented at 11:42 pm on April 24, 2020:

89eeb4a

Should we annotate this new methods EXCLUSIVE_LOCKS_REQUIRED(cs_main, cs) to increase confidence that lock order is preserved ? (like some of them are called from node helpers code and we may refactor lock tacking there at some point)

jonatack commented at 12:45 pm on April 27, 2020:

89eeb4a

Should we annotate this new methods EXCLUSIVE_LOCKS_REQUIRED(cs_main, cs) to increase confidence that lock order is preserved ? (like some of them are called from node helpers code and we may refactor lock tacking there at some point)

Was asking myself this as well.

MarcoFalke commented at 12:49 pm on April 27, 2020:

Enforcing lock order with annotations has never been done in the past. This is generally a layer violation, see also #18458 (comment)

amitiuttarwar commented at 0:57 am on April 29, 2020:

we don’t acquire cs_main lock for this function, so I don’t understand how this comment is applicable?

MarcoFalke commented at 1:25 am on April 29, 2020:

Maybe the comments were not about lock order, but whether the cs_main lock is needed as per #14963? :thinking:

Haven’t thought about this, but if I had to guess, I’d say no.

in src/net_processing.cpp:790 in e25e42f20a outdated

785+
786+    for (const uint256& txid : unbroadcast_txids) {
787+        RelayTransaction(txid, *connman);
788+    }
789+
790+    // schedule next run for 10-15 minutes in the future

ariard commented at 11:49 pm on April 24, 2020:

e25e42f

You could have comment that rational of recursively calling scheduleFromNow is to get a different delta every time and that way avoid an observable reattempt-period fingerprint.

amitiuttarwar commented at 9:42 pm on April 28, 2020:

👍 added in https://github.com/bitcoin/bitcoin/pull/18807/commits/b5bb038f0eae2eefb923343fdb05b3992c32c208

in src/wallet/wallet.cpp:1982 in dc1da48dc5 outdated

1977@@ -1978,7 +1978,8 @@ void CWallet::ResendWalletTransactions()
1978     // that these are our transactions.
1979     if (GetTime() < nNextResend || !fBroadcastTransactions) return;
1980     bool fFirst = (nNextResend == 0);
1981-    nNextResend = GetTime() + GetRand(30 * 60);
1982+    // resend 12-36 hours from now, ~1 day on average.
1983+    nNextResend = GetTime() + (12 * 60 * 60) + GetRand(24 * 60 * 60);

ariard commented at 11:52 pm on April 24, 2020:

dc1da48 @MarcoFalke should we add a release note for this, I mean that’s a substantial wallet behavior change and people may have stuff depending on this ?

amitiuttarwar commented at 1:20 am on April 25, 2020:

yeah definitely. I’m planning to add a release note as a follow up.

jnewbery commented at 6:20 pm on April 28, 2020:

For follow-up PR: I think it should now be possible to remove the nLastResend logic, since we’d always expect there to be a block in the 12-36 hours since the last rebroadcast attempt.

amitiuttarwar commented at 9:43 pm on April 28, 2020:

added here: https://github.com/bitcoin/bitcoin/pull/18807/commits/bff4e9a5005409811e3f48c8d8877d63b32c9054

amitiuttarwar commented at 9:55 pm on April 28, 2020:

yes good point. noted here #18807#issue-410374742 and will incorporate into the follow-up. resolving this comment in favor of tracking there.

thanks!

ariard commented at 0:23 am on April 25, 2020: member

Code Review ACK 50fc4df (minor points no need to invalid other ACKs)

unfortunately, I don’t have any real-life numbers of wallet rebroadcast usage for getting a transaction initially broadcast. I’d love relevant data. (anyone reading this… ) please let me know if you have any ideas of how I could get relevant info.

Unless you run a spying node, that’s going to be hard because if tx is still in network mempools it won’t propagate again through the network, and even if you do you will still have to dissociate Core wallet from other ones. It’s sadly doable by observing tx construction pattern but a lot of work.

wallet.dat could be stored externally, and without wallet logging enabled I don’t think debug.log will indicate which transactions are yours. But like I said, I don’t think it’s a big concern.

You’re likely shrinkdebugfile or trim your log regularly for host maintenance but that’s not the same for mempool.dat, unbroadcasted is assumed to be short-lived should be a minor leak. There is still the edge case of you using a new mempool.dat with old node and unbroadcasted never going to be dry-up?

in src/net_processing.cpp:792 in 50fc4df6c4

787+        RelayTransaction(txid, *connman);
788+    }
789+
790+    // schedule next run for 10-15 minutes in the future
791+    const std::chrono::milliseconds delta = std::chrono::minutes{10} + GetRandMillis(std::chrono::minutes{5});
792+    scheduler.scheduleFromNow([&] { ReattemptInitialBroadcast(scheduler); }, delta);

jonatack commented at 6:58 pm on April 26, 2020:

In e25e42f, would suggest replacing the two instances of const std::chrono::milliseconds delta = std::chrono::minutes{10} + GetRandMillis(std::chrono::minutes{5}); and their associated comments, with a well-named function that can be called directly within each scheduleFromNow. The function could be Doxygen-commented with @ariard’s #18038 (review) above.

amitiuttarwar commented at 9:53 pm on April 28, 2020:

thanks for the suggestion but I am a pass. I tried it out and it comes out to be about double the code overall, and a lot of passing the scheduler around.

in test/functional/mempool_unbroadcast.py:68 in 50fc4df6c4

63+
64+        self.log.info("Reconnect nodes & check if they are sent to node 1")
65+        connect_nodes(node, 1)
66+
67+        # fast forward into the future & ensure that the second node has the txns
68+        node.mockscheduler(15 * 60)  # 15 min in seconds

jonatack commented at 10:46 am on April 27, 2020:

297a178 suggestion

 0+MAX_INITIAL_BROADCAST_DELAY = 15 * 60  # 15 minutes in seconds
 1 
 2 class MempoolUnbroadcastTest(BitcoinTestFramework):
 3     def set_test_params(self):
 4@@ -65,7 +66,7 @@ class MempoolUnbroadcastTest(BitcoinTestFramework):
 5         connect_nodes(node, 1)
 6 
 7         # fast forward into the future & ensure that the second node has the txns
 8-        node.mockscheduler(15 * 60)  # 15 min in seconds
 9+        node.mockscheduler(MAX_INITIAL_BROADCAST_DELAY)  # 15 min in seconds
10         self.sync_mempools(timeout=30)
11         mempool = self.nodes[1].getrawmempool()
12         assert rpc_tx_hsh in mempool
13@@ -74,7 +75,7 @@ class MempoolUnbroadcastTest(BitcoinTestFramework):
14         self.log.info("Add another connection & ensure transactions aren't broadcast again")
15 
16         conn = node.add_p2p_connection(P2PTxInvStore())
17-        node.mockscheduler(15 * 60)
18+        node.mockscheduler(MAX_INITIAL_BROADCAST_DELAY)
19         time.sleep(5)

amitiuttarwar commented at 9:49 pm on April 28, 2020:

taken in https://github.com/bitcoin/bitcoin/pull/18807/commits/97f15b487f064cf359e8042afa25a5d0fb88bf8c (PR #18807)

in test/functional/mempool_unbroadcast.py:78 in 50fc4df6c4

73+
74+        self.log.info("Add another connection & ensure transactions aren't broadcast again")
75+
76+        conn = node.add_p2p_connection(P2PTxInvStore())
77+        node.mockscheduler(15 * 60)
78+        time.sleep(5)

jonatack commented at 10:48 am on April 27, 2020:

297a178 is the 5 second sleep necessary? It adds 50% to the run time of this file which appears to pass reliably without it. If it is needed, could add a comment explaining why?

jnewbery commented at 6:32 pm on April 28, 2020:

I think it’s to give the scheduler thread time to wake and call ReattemptInitialBroadcast() again. If you remove the sleep, then the transaction could potentially be broadcast after the assert_equal below and the test wouldn’t test that the tx isn’t being unbroadcast.

I think the sleep could be a bit shorter, and a comment wouldn’t harm, but only if the PR is being retouched.

amitiuttarwar commented at 9:52 pm on April 28, 2020:

yup. reduced sleep and added comment here: 97f15b487f064cf359e8042afa25a5d0fb88bf8c (PR #18807)

in test/functional/mempool_unbroadcast.py:85 in 50fc4df6c4

80+
81+    def test_txn_removal(self):
82+        self.log.info("Test that transactions removed from mempool are removed from unbroadcast set")
83+        node = self.nodes[0]
84+        disconnect_nodes(node, 1)
85+        node.disconnect_p2ps

jonatack commented at 10:50 am on April 27, 2020:

297a178 this looks like a bug?

0-        node.disconnect_p2ps
1+        node.disconnect_p2ps()

amitiuttarwar commented at 9:49 pm on April 28, 2020:

yes, see #18038 (review). fixed in https://github.com/bitcoin/bitcoin/pull/18807/commits/97f15b487f064cf359e8042afa25a5d0fb88bf8c.

in test/functional/wallet_resendwallettransactions.py:60 in 50fc4df6c4

58-        # Use mocktime and give an extra 5 minutes to be sure.
59-        rebroadcast_time = int(time.time()) + 41 * 60
60+        self.log.info("Bump time & check that transaction is rebroadcast")
61+        # Transaction should be rebroadcast approximately 24 hours in the future,
62+        # but can range from 12-36. So bump 36 hours to be sure.
63+        rebroadcast_time = int(time.time()) + 36 * 60 * 60

jonatack commented at 11:58 am on April 27, 2020:

In dc1da48 I think a test is missing here to check that the transaction is not rebroadcast the first 12 hours.

 0        # Check transaction is rebroadcast in 12-36 hour time interval.
 1        time_now = int(time.time())
 2
 3        self.log.info("Bump time 12 hours from now, check transaction is not yet rebroadcast")
 4        rebroadcast_time = time_now  + (12 * 60 * 60)
 5        node.setmocktime(rebroadcast_time)
 6        assert_equal(node.p2ps[1].tx_invs_received[txid], 0)
 7
 8        self.log.info("Bump time 36 hours from now, check transaction is rebroadcast")
 9        rebroadcast_time = time_now + (36 * 60 * 60)
10        node.setmocktime(rebroadcast_time)
11        wait_until(lambda: node.p2ps[1].tx_invs_received[txid] >= 1, lock=mininode_lock)

amitiuttarwar commented at 9:53 pm on April 28, 2020:

thanks! added test here: e77984eaeb1fc10d66e64574f006c2781b38fec7 (PR#18807)

jonatack commented at 10:00 pm on April 28, 2020:

thanks @amitiuttarwar

jnewbery commented at 3:31 pm on April 29, 2020:

I’m not sure this is actually testing anything. If you bump the mocktime and then immediately assert that you haven’t received the tx, you haven’t given the scheduler any time to do anything.

jonatack commented at 12:38 pm on April 27, 2020: member

Concept ACK

I think this is close. Thoughts:

ISTM #16698 (comment) should be mentioned up-front in the PR description as well as a co-author mention in the relevant commit
if the work in @ariard’s commit 2d8ae9a838f5c6ec43db15a7bd36554c351b7442 was used, it would be kind to pull in the commit
the bug in the functional test cited by @fjahr is trivial to fix and re-ACK the diff and should probably be fixed without waiting for a follow-up
other comments follow below, feel free to pick and choose

Last but not least, the resend timer has not been changed since e89b9f6a2ab e.g. way back to 2011 or earlier. It seems best to not rush, and to document and preserve the most salient points of the discussion in the code and tests.

in test/functional/mempool_unbroadcast.py:84 in 50fc4df6c4

79+        assert_equal(len(conn.get_invs()), 0)
80+
81+    def test_txn_removal(self):
82+        self.log.info("Test that transactions removed from mempool are removed from unbroadcast set")
83+        node = self.nodes[0]
84+        disconnect_nodes(node, 1)

jnewbery commented at 6:36 pm on April 28, 2020:

When you’re retouching this, I suggest moving the disconnection logic to the end of test_broadcast(), so that each sub-test starts with the same state (no connections). It’s not a problem now, but in larger test files where there are many subtests, lingering state between subtests makes debugging more difficult.

amitiuttarwar commented at 9:56 pm on April 28, 2020:

makes sense. updated here: 97f15b487f064cf359e8042afa25a5d0fb88bf8c (PR #18807) and will keep in mind when writing future tests.

in test/functional/mempool_unbroadcast.py:94 in 50fc4df6c4

89+        addr = node.getnewaddress()
90+        txhsh = node.sendtoaddress(addr, 0.0001)
91+
92+        # check transaction was removed from unbroadcast set due to presence in
93+        # a block
94+        removal_reason = "Removed {} from set of unbroadcast txns before confirmation that txn was sent out".format(txhsh)

jnewbery commented at 6:38 pm on April 28, 2020:

(for a future PR) it’d be a nice enhancement to update:

the getmempoolinfo RPC to show an unbroadcast count
the getrawmempool RPCs to show an unbroadcast flag for each transaction

That’d make it easier to test this directly rather than through log-parsing.

amitiuttarwar commented at 10:02 pm on April 28, 2020:

good idea! def seems useful to expose some unbroadcast info via the RPCs. I’ve added it to the follow up todos here: #18807#issue-410374742.

but not sure how what you’re proposing would lead to being able to change this test? because this test is verifying the reason the txn was removed from unbroadcast set. If we just check that its no longer in the set, we don’t know if its because of another reason like processing a GETDATA

jnewbery commented at 1:20 am on April 29, 2020:

If you disconnect all P2P connections, then there’s no peer to send a GETDATA.

amitiuttarwar commented at 1:48 am on April 29, 2020:

in that case yes (assuming the test actually works and you don’t accidentally write a bug like I did :))

what I’m trying to get at is that the log parsing tests the expectation more directly by confirming the reason for removal. the removal reasons / code pathways are simple enough right now, but over time they might change and if we were just checking for removal the test might not fail even though the expectations have changed.

anyway, I don’t think this is a big deal? we could even test both in the follow-up =P I’m going to resolve this conversation, but feel free to lmk if you feel strongly.

in src/net_processing.cpp:785 in 50fc4df6c4

778@@ -779,6 +779,19 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) {
779         PushNodeVersion(pnode, connman, GetTime());
780 }
781 
782+void PeerLogicValidation::ReattemptInitialBroadcast(CScheduler& scheduler) const
783+{
784+    std::set<uint256> unbroadcast_txids = m_mempool.GetUnbroadcastTxs();
785+

jnewbery commented at 7:18 pm on April 28, 2020:

For a follow-up, I think we should sanity check that the transaction still exists in the mempool before rebroadcasting. If not, delete the entry from the unbroadcast set.

If we don’t do this, then a logic error where the entry isn’t removed from the unbroadcast set when the tx is removed from the mempool turns into a memory leak and a network spam.

(Removal logic in removeUnchecked() is pretty straightforward and looks fine in this PR, but adding this extra check does no harm. It’d prevent a bug like #18038 (review) from causing eternal network spam)

amitiuttarwar commented at 9:58 pm on April 28, 2020:

yeah totally agree. also was planning this for follow up. I’ve noted as a todo here: #18807#issue-410374742. Resolving this conversation and will track there.

in src/txmempool.h:553 in 50fc4df6c4

548@@ -549,6 +549,9 @@ class CTxMemPool
549 
550     std::vector<indexed_transaction_set::const_iterator> GetSortedDepthAndScore() const EXCLUSIVE_LOCKS_REQUIRED(cs);
551 
552+    /** track locally submitted transactions to periodically retry initial broadcast */
553+    std::set<uint256> m_unbroadcast_txids GUARDED_BY(cs);

jnewbery commented at 7:19 pm on April 28, 2020:

I think this could be an unordered_set for constant-time insertion/lookup, but the size is going to be so small that it doesn’t matter.

amitiuttarwar commented at 10:20 pm on April 28, 2020:

ah, that makes sense. enforces the same uniqueness guarantees that I need, but the ordering is unnecessary.

jnewbery commented at 7:33 pm on April 28, 2020: member

utACK 50fc4df6c4e8a84bdda13ade7bed7a2131796f00.

I’ve left a few small comments, but they’re all just observations or suggestions for a follow-up PR. I think this is ready for merge.

amitiuttarwar commented at 10:11 pm on April 28, 2020: contributor

thank you for the reviews everyone! 🙏

The current tip 50fc4df currently has 6 ACKs on it, so I’ve opened #18807 to address the last bits. @jonatack

ISTM #16698 (comment) should be mentioned up-front in the PR description as well as a co-author mention in the relevant commit

I’ve updated the PR description to reflect this but I disagree that a co-author mention is necessary in this case. Crediting in-depth conceptual review would be quite a tall task given how helpful people are around here :)

if the work in @ariard’s commit 2d8ae9a was used, it would be kind to pull in the commit

Agree here. I reached out to @ariard yesterday to ask if he’d like to be added as a co-author & expressed preferring not to invalidate the ACKs.

Last but not least, the resend timer has not been changed since e89b9f6 e.g. way back to 2011 or earlier. It seems best to not rush, and to document and preserve the most salient points of the discussion in the code and tests.

I agree that we should not rush. We are changing a core behavior of the wallet and P2P around how transactions are being broadcast. Having critical review analyze different possible repercussions of this functionality is extremely important. So many thanks to all of the reviewers who have thought critically about these changes 🙌

I also agree in the value of documentation and testing. I have incorporated the outstanding feedback in the #18807 . I would strongly prefer not to consume reviewers’ time with a re-ACK so hoping further clean-up items can go there.

MarcoFalke added the label Review club on Apr 28, 2020

in test/functional/mempool_persist.py:89 in 50fc4df6c4

82@@ -80,6 +83,11 @@ def run_test(self):
83         assert_greater_than_or_equal(tx_creation_time, tx_creation_time_lower)
84         assert_greater_than_or_equal(tx_creation_time_higher, tx_creation_time)
85 
86+        # disconnect nodes & make a txn that remains in the unbroadcast set.
87+        disconnect_nodes(self.nodes[0], 2)
88+        self.nodes[0].sendtoaddress(self.nodes[2].getnewaddress(), Decimal("12"))
89+        connect_nodes(self.nodes[0], 2)

robot-visions commented at 6:31 am on April 29, 2020:

If I add time.sleep(10) before or after connect_nodes, then the test fails locally for me (the mempool for node 2 has 6, instead of the expected 5 transactions).

Does this test depend on particular timings (e.g. node 1 doesn’t have time to relay the transaction to node 2 before the nodes are all shut down)? If so, in a future change would it make sense to either (i) document the necessary timing assumptions, or (ii) update the test to depend less on particular timings?

amitiuttarwar commented at 9:55 pm on May 1, 2020:

good catch! this is a bug. line 87 isn’t actually disconnecting the two nodes because its passing node_num of 2 rather than 1

I’ve added commit 41fd9c1b7b372d7769e005db1d92000c0373651e on the follow-up PR #18807 to fix this issue.

thank you!

robot-visions commented at 3:33 am on May 2, 2020:

Confirmed that with 41fd9c1b7b372d7769e005db1d92000c0373651e the test passes even with random time.sleep(10) thrown. Thanks for updating!

robot-visions commented at 6:54 am on April 29, 2020: contributor

ACK 50fc4df6c4e8a84bdda13ade7bed7a2131796f00

Great idea, thanks for working on this. Also, thanks for structuring your commits in such an organized way; it made the PR incredibly nice to review!

fanquake merged this on Apr 29, 2020

fanquake closed this on Apr 29, 2020

sidhujag referenced this in commit 7fc0f22e3d on Apr 29, 2020

MarcoFalke commented at 7:56 pm on April 29, 2020: member

The rebroadcast behaviour changes for own wallets connected on the p2p interface, as opposed to local wallets or wallets connected on the rpc interface.

For context, see the IRC log: https://bitcoincore.reviews/18038.html

fanquake referenced this in commit ad3a61c5f5 on May 21, 2020

sidhujag referenced this in commit 2c5b7d1147 on May 22, 2020

MarcoFalke referenced this in commit 826fe9c667 on May 30, 2020

sidhujag referenced this in commit e89c4bd9d2 on May 31, 2020

ComputerCraftr referenced this in commit 285bce192a on Jun 10, 2020

ComputerCraftr referenced this in commit 6d07be106f on Jun 10, 2020

ComputerCraftr referenced this in commit d2fddf6f43 on Jun 10, 2020

ComputerCraftr referenced this in commit 69e7eee7a1 on Jun 10, 2020

ComputerCraftr referenced this in commit 4abc92194d on Jun 10, 2020

ComputerCraftr referenced this in commit 1cf6ab1d17 on Jun 10, 2020

MarcoFalke referenced this in commit 0afbeb73cc on Jun 16, 2020

sidhujag referenced this in commit 7dda7c0a8c on Jul 7, 2020

sidhujag referenced this in commit 990a3ee8c8 on Nov 10, 2020

MarcoFalke referenced this in commit c37600777e on Jan 5, 2021

sidhujag referenced this in commit 80b94f53e0 on Jan 5, 2021

Fabcien referenced this in commit 674a222fdb on Jan 21, 2021

Fabcien referenced this in commit 9a88bcf461 on Jan 21, 2021

Fabcien referenced this in commit 0ea6595322 on Jan 21, 2021

Fabcien referenced this in commit 232939a301 on Jan 21, 2021

deadalnix referenced this in commit 1167285f91 on Jan 21, 2021

DrahtBot locked this on Feb 15, 2022

P2P: Mempool tracks locally submitted transactions to improve wallet privacy #18038

Conflicts