Use wtxid for transaction relay

sdaftuar commented at 4:42 pm on January 31, 2020: member

Using txids (a transaction’s hash, without witness) for transaction relay is problematic, post-segwit – if a peer gives us a segwit transaction that fails policy checks, it could be because the txid associated with the transaction is definitely unacceptable to our node (regardless of the witness), or it could be that the transaction was malleated and with a different witness, the txid could be accepted to our mempool.

We have a bloom filter of recently rejected transactions, whose purpose is to help us avoid redownloading and revalidating transactions that fail to be accepted, but because of this potential for witness malleability to interfere with relay of valid transactions, we do not use the filter for segwit transactions. This issue is discussed at some length in #8279. The effect of this is that whenever a segwit transaction that fails policy checks is relayed, a node would download that transaction from every peer announcing it, because it has no way presently to cache failure. Historically this hasn’t been a big problem, but if/when policy for accepting segwit transactions were to change (eg taproot, or any other change), we could expect older nodes talking to newer nodes to be wasting bandwidth because of this.

As discussed in that issue, switching to wtxid-based relay solves this problem – by using an identifier for a transaction that commits to all the data in our relay protocol, we can be certain if a transaction that a peer is announcing is one that we’ve already tried to process, or if it’s something new. This PR introduces support for wtxid-based relay with peers that support it (and remains backwards compatible with peers that use txids for relay, of course).

Apart from code correctness, one issue to be aware of is that by downloading from old and new peers alike, we should expect there to be some bandwidth wasted, because sometimes we might download the same transaction via txid-relay as well as wtxid-relay. The last commit in this PR implements a heuristic I want to analyze, which is to just delay relay from txid-relay peers by 2 seconds, if we have at least 1 wtxid-based peer. I’ve just started running a couple nodes with this heuristic so I can measure how well it works, but I’m open to other ideas for minimizing that issue. In the long run, I think this will be essentially a non-issue, so I don’t think it’s too big a concern, we just need to bite the bullet and deal with it during upgrade.

Finally, this proposal would need a simple BIP describing the changes, which I haven’t yet drafted. However, review and testing of this code in the interim would be welcome.

To do items:

Write BIP explaining the spec here (1 new p2p message for negotiating wtxid-based relay, along with a new INV type)
Measure and evaluate a heuristic for minimizing how often a node downloads the same transaction twice, when connected to old and new nodes.

DrahtBot added the label Mempool on Jan 31, 2020

DrahtBot added the label P2P on Jan 31, 2020

DrahtBot commented at 5:55 pm on January 31, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#19556 (Remove mempool global by MarcoFalke)
#19498 (Tidy up ProcessOrphanTx by jnewbery)
#19488 (Refactor mempool.dat to be extensible, and store missing info by luke-jr)
#19478 (Remove CTxMempool::mapLinks data structure member by JeremyRubin)
#19306 (refactor: Replace RecursiveMutex with Mutex in CTxMemPool by hebasto)
#19184 (Overhaul transaction request logic by sipa)
#19134 (test: Replace global wait_until with BitcoinTestFramework.wait_until and mininode.wait_until by dboures)
#19031 (Implement ADDRv2 support (part of BIP155) by vasild)
#18086 (Accurately account for mempool index memory by sipa)
#12677 (RPC: Add ancestor{count,size,fees} to listunspent output by luke-jr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

sdaftuar force-pushed on Jan 31, 2020

in src/net_processing.cpp:1641 in 310aafa76b outdated

1637             if (mi != mapRelay.end()) {
1638                 connman->PushMessage(pfrom, msgMaker.Make(nSendFlags, NetMsgType::TX, *mi->second));
1639                 push = true;
1640             } else {
1641-                auto txinfo = mempool.info(inv.hash);
1642+                auto txinfo = mempool.info(inv.hash, it->type == MSG_WTX);

sipa commented at 0:02 am on February 1, 2020:

You’re dereferencing it here after it has been incremented (see Travis failure). I think you want to use inv.type instead.

sdaftuar commented at 11:37 am on February 1, 2020:

Thanks, fixed.

in src/net_processing.cpp:2180 in ee29e7debc outdated

2174@@ -2175,6 +2175,10 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2175             nCMPCTBLOCKVersion = 1;
2176             connman->PushMessage(pfrom, msgMaker.Make(NetMsgType::SENDCMPCT, fAnnounceUsingCMPCTBLOCK, nCMPCTBLOCKVersion));
2177         }
2178+
2179+        if (pfrom->nVersion >= WTXID_RELAY_VERSION) {
2180+            connman->PushMessage(pfrom, msgMaker.Make(NetMsgType::WTXIDRELAY));

sdaftuar commented at 11:39 am on February 1, 2020:

I wonder if this should be sent immediately in response to a VERSION message, rather than in response to a VERACK. It looks like the way I’ve done it here creates a race condition where a peer could send a txid-based inv to us before it gets this message, which I think would cause relay of that transaction to fail.

https://travis-ci.org/bitcoin/bitcoin/jobs/644609638?utm_medium=notification&utm_source=github_status

sdaftuar force-pushed on Feb 3, 2020

in src/net_processing.cpp:4139 in fd2c3f257e outdated

4131@@ -4057,7 +4132,7 @@ bool PeerLogicValidation::SendMessages(CNode* pto)
4132             // Erase this entry from tx_process_time (it may be added back for
4133             // processing at a later time, see below)
4134             tx_process_time.erase(tx_process_time.begin());
4135-            CInv inv(MSG_TX | GetFetchFlags(pto), txid);
4136+            CInv inv(state.m_wtxid_relay ? MSG_WTX : (MSG_TX | GetFetchFlags(pto)), txid);

jonatack commented at 10:04 am on February 4, 2020:

Seeing a build warning here:

0net_processing.cpp: In member function ‘virtual bool PeerLogicValidation::SendMessages(CNode*)’:
1net_processing.cpp:4135:42: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
2             CInv inv(state.m_wtxid_relay ? MSG_WTX : (MSG_TX | GetFetchFlags(pto)), txid);
3                      ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

sdaftuar commented at 2:21 pm on February 4, 2020:

Yeah I saw that too, but I think the code is correct?

ajtowns commented at 3:12 am on February 7, 2020:

I think it’s treating MSG_WTX as enum GetDataMsg and MSG_TX | GetFetchFlags() as uint32_t (match GetFetchFlags return type). Changing GetFetchFlags() to return GetDataMsg (and casting its result correspondingly), and making it enum GetDataMsg : uint32_t makes the warning go away. Actually, just saying enum GetDataMsg : uint32_t makes the warning go away too by the looks.

sdaftuar commented at 6:52 pm on February 26, 2020:

Updated with the enum GetDataMsg : uint32_t fix, thanks.

sdaftuar commented at 2:23 pm on February 4, 2020: member

Wrote up a short BIP draft here: https://github.com/sdaftuar/bips/blob/2020-02-wtxid-relay/bip-wtxid-relay.mediawiki, which this PR implements.

I’ll wait for some concept ACKs here before I send to the mailing list.

in src/net_processing.cpp:4020 in fd2c3f257e outdated

3996@@ -3927,6 +3997,11 @@ bool PeerLogicValidation::SendMessages(CNode* pto)
3997                             if (ret.second) {
3998                                 vRelayExpiration.push_back(std::make_pair(nNow + std::chrono::microseconds{RELAY_TX_CACHE_TIME}.count(), ret.first));
3999                             }
4000+                            // Add wtxid-based lookup into mapRelay as well, so that peers can request by wtxid
4001+                            auto ret2 = mapRelay.insert(std::make_pair(ret.first->second->GetWitnessHash(), ret.first->second));
4002+                            if (ret2.second) {
4003+                                vRelayExpiration.push_back(std::make_pair(nNow + std::chrono::microseconds{RELAY_TX_CACHE_TIME}.count(), ret2.first));

jonatack commented at 2:24 pm on February 4, 2020:

Would it be a good idea to fetch nNow + std::chrono::microseconds{RELAY_TX_CACHE_TIME}.count() once and use the same value for both relay expirations?

jonatack commented at 2:38 pm on February 4, 2020: member

Nice – will add a link to the BIP draft in the review club notes at https://bitcoincore.reviews/18044. Currently reviewing. Can someone add a Review club label? It helps let people know that there is a review club resource they can use.

rajarshimaitra commented at 6:27 pm on February 4, 2020: contributor

Cocnept ACK. Ran standard tests. All passing. Overall agreed with the idea of using wtxid for everything. Will try to try out the huristic and report some results.

in src/net_processing.cpp:2613 in fb1b69dba6 outdated

2609         } else {
2610-            if (!tx.HasWitness() && state.GetResult() != TxValidationResult::TX_WITNESS_MUTATED) {
2611+            if (tx.HasWitness() || state.GetResult() != TxValidationResult::TX_WITNESS_MUTATED) {
2612                 // Do not use rejection cache for witness transactions or
2613                 // witness-stripped transactions, as they can have been malleated.
2614                 // See https://github.com/bitcoin/bitcoin/issues/8279 for details.

rajarshimaitra commented at 6:29 pm on February 4, 2020:

Doesn’t this comment needs update in light of the changed condition? We are now accepting txs containg witness in recentRejects.

sdaftuar force-pushed on Feb 5, 2020

JeremyRubin added this to the "Unrelated Refactors (Project Conflicts)" column in a project

adamjonas commented at 7:47 pm on February 6, 2020: member

Reference IRC discussion for further context around the VERSION/VERACK messages.

JeremyRubin moved this from the "Unrelated Refactors (Project Conflicts)" to the "Package Relay" column in a project

sdaftuar force-pushed on Feb 7, 2020

ajtowns commented at 7:36 pm on February 9, 2020: member

Concept ACK. First take: doing extra messages in between VERSION and VERACK sounds better to me than extending VERSION or potentially doing txid relay briefly in between VERACK and WTXIDRELAY, the code changes seem really nicely laid out, and the main thing that caught my eye looks like it’s probably fixed by the WITNESS_STRIPPED addition.

in src/txmempool.h:723 in cf8c70fac6 outdated

719+        return (mapTx.get<index_by_wtxid>().count(hash) != 0);
720+    }
721+
722     CTransactionRef get(const uint256& hash) const;
723-    TxMempoolInfo info(const uint256& hash) const;
724+    indexed_transaction_set::const_iterator get_iter_from_wtxid(const uint256& wtxid) const

TheBlueMatt commented at 11:22 pm on February 12, 2020:

How is indexed_transaction_set::const_iterator different than txindex?

sdaftuar commented at 6:53 pm on February 26, 2020:

Fixed.

in src/net.h:957 in 9377e308a7 outdated

953@@ -954,11 +954,11 @@ class CNode
954     }
955 
956 
957-    void AddInventoryKnown(const CInv& inv)
958+    void AddInventoryKnown(const uint256& hash)

TheBlueMatt commented at 11:30 pm on February 12, 2020:

Can we rename the function, then?

sdaftuar commented at 6:53 pm on February 26, 2020:

Done.

in src/protocol.h:372 in cf8c70fac6 outdated

368@@ -363,7 +369,8 @@ enum GetDataMsg
369     UNDEFINED = 0,
370     MSG_TX = 1,
371     MSG_BLOCK = 2,
372-    // The following can only occur in getdata. Invs always use TX or BLOCK.
373+    MSG_WTX = 5, // Defined in BIP XXX

TheBlueMatt commented at 11:34 pm on February 12, 2020:

WITNESS_TX and WTX read a little bit too close for comfort. I don’t think the extra characters to write a useful name cost us much :p.

sdaftuar commented at 1:14 pm on February 13, 2020:

Suggestions?

ariard commented at 11:54 pm on February 27, 2020:

Would be okay with MSG_WTXID to mark we only deal with tx identifier not a specific new data structure, we already have a GetDataMsg for the transaction and its witness.

naumenkogs commented at 2:46 pm on March 10, 2020:

I think MSG_WTXID is even more confusing than MSG_WTX. The suffix here represents an object being fetched. So TX or BLOCK with some type. MSG_WTXID would mean we are fetching an ID. The only thing I can thing of is MSG_TX_BY_WTXID. But I’m also fine with MSG_WTX.

jonatack commented at 3:05 pm on March 10, 2020:

Maybe I’ve been staring at this for too long but I agree with @naumenkogs. I confused MSG_TX and MSG_WTX once during the first pass at reviewing a few weeks ago, but then I got used to it and I like its brevity. I think it’s fine for now.

ajtowns commented at 3:16 pm on June 15, 2020:

TX (by txid, don’t send witness), WTX_TXID (by txid, do send witness), WTX_WTXID (by wtxid, do send witness)` might be clearer?

in src/net_processing.cpp:1205 in cf8c70fac6 outdated

1203@@ -1184,6 +1204,7 @@ void PeerLogicValidation::BlockConnected(const std::shared_ptr<const CBlock>& pb
1204         LOCK(g_cs_recent_confirmed_transactions);
1205         for (const auto ptx : pblock->vtx) {

TheBlueMatt commented at 11:58 pm on February 12, 2020:

Not (this pr’s) your bug, but clang complains that you’re copying shared_ptrs on every iteration and you should add a &.

ajtowns commented at 2:22 am on February 18, 2020:

(this is already fixed on master, so a rebase would solve it)

in src/net_processing.cpp:156 in cf8c70fac6 outdated

93@@ -92,6 +94,7 @@ struct COrphanTx {
94 };
95 RecursiveMutex g_cs_orphans;
96 std::map<uint256, COrphanTx> mapOrphanTransactions GUARDED_BY(g_cs_orphans);
97+std::map<uint256, std::map<uint256, COrphanTx>::iterator> g_orphans_by_wtxid GUARDED_BY(g_cs_orphans);

TheBlueMatt commented at 0:00 am on February 13, 2020:

You should probably wipe this in ~CNetProcessingCleanup() like we do the other orphan fields, but, also, why the hell is that thing even there…

sdaftuar commented at 6:54 pm on February 26, 2020:

Fixed, thanks.

Not sure exactly what you’re asking but it seemed easier to add an additional map, rather than change the existing data structure to support wtxid-lookup.

ariard commented at 11:23 pm on February 27, 2020:

What would be the challenges to switch mapOrphanTransactions to wtxid-only? It’s just an internal structure, do we use it to interact with our peers ? (and even you can translate backward there)

in src/net_processing.cpp:1946 in cf8c70fac6 outdated

1942@@ -1910,12 +1943,18 @@ void static ProcessOrphanTx(CConnman* connman, std::set<uint256>& orphan_work_se
1943             // Has inputs but not accepted to mempool
1944             // Probably non-standard or insufficient fee
1945             LogPrint(BCLog::MEMPOOL, "   removed orphan tx %s\n", orphanHash.ToString());
1946-            if (!orphanTx.HasWitness() && orphan_state.GetResult() != TxValidationResult::TX_WITNESS_MUTATED) {
1947-                // Do not use rejection cache for witness transactions or
1948-                // witness-stripped transactions, as they can have been malleated.
1949-                // See https://github.com/bitcoin/bitcoin/issues/8279 for details.
1950+            if (orphanTx.HasWitness() || orphan_state.GetResult() != TxValidationResult::TX_WITNESS_STRIPPED) {

TheBlueMatt commented at 0:16 am on February 13, 2020:

Would this be any different if it were just orphan_state.GetResult() != TxValidationResult::TX_WITNESS_STRIPPED?

sdaftuar commented at 6:54 pm on February 26, 2020:

Nope, would not be different. Fixed.

in src/net_processing.cpp:1954 in cf8c70fac6 outdated

1953+                // adding such txids to the reject filter would potentially
1954+                // interfere with relay of valid transactions from peers that
1955+                // do not support wtxid-based relay. See
1956+                // https://github.com/bitcoin/bitcoin/issues/8279 for details.
1957+                // We can remove this restriction (and always add wtxids to
1958+                // the filter even for witness tripped transactions) once

TheBlueMatt commented at 0:17 am on February 13, 2020:

s/tripped/stripped/, but, more importantly, can we? Wouldn’t that allow an attacker to see an 0.19 client’s inv, request the tx without delay (unlike its other peers), then spam the network with the witness-stripped version, preventing relay of that transaction. At least as long as we dont ban for TX_WITNESS_STRIPPED.

sdaftuar commented at 1:29 pm on February 13, 2020:

From a single node’s perspective, as long as you have at least 1 honest wtxid-relay peer, such an attack would not work, because the wtxid for the honest version of the transaction would not be added to your filter (only the txid), and so you would eventually request the valid version of the transaction from your honest wtxid-relay peer.

Currently we preference outbound peers for transaction download, so once we believe that some sufficient majority of listening nodes support wtxid-relay, such that the chances of having 0 or just a few wtxid-relay-outbound-peers is pretty small, then I think we can just get rid of this.

I don’t think our threat model has ever really been too concerned with every node getting every transaction, anyway – we just don’t want the network as a whole to not relay a transaction altogether because of something like this. So exactly how long we should wait before dropping this logic is debatable IMO; anyway we should defer the debate to the future when someone thinks this is ready.

TheBlueMatt commented at 0:27 am on February 13, 2020: member

Didn’t-get-a-chance-to-go-through-the-whole-thing-and-am-too-tired-to-be-reviewing-anyway Concept ACK

(that’s how we’re supposed to ACK now, right, with lots of context?)

TheBlueMatt commented at 0:28 am on February 13, 2020: member

FWIW, travis error is " node0 2020-02-12T14:39:48.835709Z [msghand] Error: Disk space is too low! "

fjahr commented at 4:58 pm on February 14, 2020: member

Concept ACK on the BIP draft and the implementation of it

I am not so sure about the 2-second delay, however, because it is not necessarily an improvement, it can also make matters worse afaict. Example: At least one of our txid peers is better connected than all of our wtxid peers. We get a txid inv first and wait for 2 seconds. Meanwhile, we receive the wtxid inv and request it, but before we get the response we also request the transaction from the txid peer. I am not sure how realistic this is and it will certainly not happen frequently once large parts of the network have upgraded to wtxid invs. But I also think the PR is compelling enough without the 2-second delay optimization and it could be left as a follow-up so it can be discussed separately.

Do you plan to add additional tests? Otherwise, I might be interested to look into that.

sdaftuar closed this on Feb 18, 2020

sdaftuar reopened this on Feb 18, 2020

instagibbs commented at 8:03 pm on February 18, 2020: member

concept ACK

jnewbery added the label Review club on Feb 23, 2020

naumenkogs commented at 7:46 pm on February 25, 2020: member

Concept ACK

sdaftuar force-pushed on Feb 26, 2020

sdaftuar commented at 6:57 pm on February 26, 2020: member

Updated to fix the comments left so far. I also changed the 2 second delay heuristic to only delay initial request of a transaction from a txid-relay peer, and not subsequent requests – @naumenkogs pointed out to me that the original behavior I proposed would have opened us up to an “InvBlock” attack if we get a bunch of inbound peers that have wtxid-relay turned on, but don’t yet have any outbound peers. @fjahr Please feel free to write some tests, thank you! As for your comment about delaying transaction relay, yes I think that is the tradeoff – reduce bandwidth at the expense of latency.

in src/net_processing.cpp:2517 in 1cdf17837b outdated

2201         pfrom->fSuccessfullyConnected = true;
2202         return true;
2203     }
2204 
2205+    // Feature negotiation of wtxidrelay should happen between VERSION and
2206+    // VERACK, to avoid relay problems from switching after a connection is up

naumenkogs commented at 1:17 am on February 27, 2020:

This comment is slightly confusing to me. If you setting this expectation here, maybe you want to enforce the rule in the code? Track whether VERACK was not received or something.

sdaftuar commented at 6:40 pm on March 4, 2020:

What to do in that case? We could disconnect the peer for violating the spec, or ignore the message and not switch to wtxid relay, but I thought trying to do wtxid-relay was the most practical, as it should mostly work (just things that were announced earlier might not be successfully requested). Maybe it’s better to not tolerate any kind of bugs though, so that implementations get all this stuff right?

naumenkogs commented at 6:31 pm on March 5, 2020:

Disconnecting is too much, but ignoring the wtxid_relay message may be reasonable.

Alternatively (and probably better), we can just make the comment more clear. Something like “currently this is not enforced, because at most we get slight inefficiency if not following this rule”.

And then the responsibility would be on the future developer to introduce the disconnect/ignore behavior if they choose to fundamentally rely on this assumption.

ajtowns commented at 3:45 am on June 29, 2020:

Disconnecting seems fine to me – it’s not like there are old peers sending wtxidrelay messages around that we need to worry about supporting, and if we disconnect immediately, that should prevent anyone creating any new peers that behave weirdly? Mildly prefer ignoring the wtxidrelay if proposed post-verack, than just trying to do what they intend.

sdaftuar commented at 9:20 pm on June 29, 2020:

Ok made it disconnect. Failing to make any connections at all is a good way for software to get fixed I guess…

naumenkogs commented at 1:38 am on February 27, 2020: member

Code review ACK. Did some light testing making sure we don’t break the existing behaviour.

in src/txmempool.cpp:920 in 1d1fddb979 outdated

911@@ -912,8 +912,8 @@ bool CCoinsViewMemPool::GetCoin(const COutPoint &outpoint, Coin &coin) const {
912 
913 size_t CTxMemPool::DynamicMemoryUsage() const {
914     LOCK(cs);
915-    // Estimate the overhead of mapTx to be 12 pointers + an allocation, as no exact formula for boost::multi_index_contained is implemented.
916-    return memusage::MallocUsage(sizeof(CTxMemPoolEntry) + 12 * sizeof(void*)) * mapTx.size() + memusage::DynamicUsage(mapNextTx) + memusage::DynamicUsage(mapDeltas) + memusage::DynamicUsage(mapLinks) + memusage::DynamicUsage(vTxHashes) + cachedInnerUsage;
917+    // Estimate the overhead of mapTx to be 15 pointers + an allocation, as no exact formula for boost::multi_index_contained is implemented.

ariard commented at 8:47 pm on February 27, 2020:

54d58e7

3 new pointers for boost::multi_index::tag<index_by_wtxid>, mempoolentry_wtxid, SaltedTxidHasher, not one for new boost::multi_index::hashed_unique ?

sipa commented at 11:20 pm on July 6, 2020:

@ariard I don’t understand this comment (https://github.com/bitcoin/bitcoin/pull/18044/commits/1c382b500b2b6025f9a4cf4cd37af678172c6ae1#r385362270).

I think 3 pointers for an extra index is probably close to correct, but #18086 may make these kinds of heuristics unnecessary, so that’s probably a better approach long term.

in src/protocol.h:266 in 021c5d392b outdated

229@@ -230,6 +230,12 @@ extern const char *GETBLOCKTXN;
230  * @since protocol version 70014 as described by BIP 152
231  */
232 extern const char *BLOCKTXN;
233+/**
234+ * Indicates that a node prefers to relay transactions via wtxid, rather than
235+ * txid.

ariard commented at 11:59 pm on February 27, 2020:

d0f03e9

Is this rule a best-effort right now? At tx reception, we may have to request parent based only on the input. If so is the BIP clear enough on the 5) ? Precise also there is no current sanction of the rule by receiver as of today

sdaftuar commented at 5:23 pm on March 5, 2020:

Yes, you made a good observation – our fetching of parents of orphan transactions must be by txid, rather than wtxid. I did try to carefully word the BIP to avoid referencing this situation:

After a node has sent and received a “wtxidrelay” message to/from a given peer, the node is required to use the MSG_WTX inv-type when announcing transactions to that peer, or requesting announced transactions from that peer.

So the BIP says that newly announced transactions must be by wtxid. And if your peer announces a transaction by wtxid, you must fetch it by wtxid (and not txid). The BIP is silent on requests for transactions that are not announced – we could make this explicit but this has never been documented behavior, and I’m not sure we should commit to a certain handling of this kind of thing anyway… But open to suggestions.

ariard commented at 7:12 pm on March 5, 2020:

I would amend the BIP to make its scope explicitly restrained. Add to 5) “Note: this rule doesn’t cover cases where a peer needs to request transaction for validation purposes but don’t know its wtxid yet (e.g missing parents of a wtxid-announced child)” ? At least to avoid someone raising same concern in the future and saying we don’t commit on anything wrt to this issue.

in src/net_processing.cpp:2054 in 6e7290f8f8 outdated

1954+                // interfere with relay of valid transactions from peers that
1955+                // do not support wtxid-based relay. See
1956+                // https://github.com/bitcoin/bitcoin/issues/8279 for details.
1957+                // We can remove this restriction (and always add wtxids to
1958+                // the filter even for witness stripped transactions) once
1959+                // wtxid-based relay is broadly deployed.

ariard commented at 0:06 am on February 28, 2020:

cf8c70f

For context-tracking, would be fine to even further comment, “right now if we reject witness transactions based on their txids, relying on wtxid-based relay peers might lead to a stripped-witness tx starvation due to the low number of wtxid-relay peers at protocol deployment”

Also, once we reach a deployment meaningful enough, should we merge together TX_WITNESS_MUTATED and TX_WITNESS_STRIPPED ?

sdaftuar commented at 5:24 pm on March 5, 2020:

In the future, I think I’d like to eventually get rid of WITNESS_STRIPPED as a reason that validation returns, because we incur non-trivial CPU usage in order to make the determination of whether a transaction might be WITNESS_STRIPPED.

ariard commented at 7:13 pm on March 5, 2020:

Yes that what I understood, you’re thinking about the 2 CheckInputScripts in what is currently PolicyScriptChecks?

sdaftuar commented at 7:46 pm on March 6, 2020:

Yes that’s right.

ariard commented at 0:24 am on February 28, 2020: member

Concept & light-code-review ACK

Historically this hasn’t been a big problem, but if/when policy for accepting segwit transactions were to change (eg taproot, or any other change), we could expect older nodes talking to newer nodes to be wasting bandwidth because of this.

What taproot is making worst here than current tx-relay of segwit v0 txn ?

Wrt to IRC discussion on multi-party transactions & tx-relay interference, at least for Lightning I can’t see this as an issue, worst-case your counterparty can resign to produce another valid witness and you redownload twice ? Or is the attack finding a wtxid-collision with an invalid tx to make propagation of a valid time-sensitive tx near impossible?

sdaftuar commented at 5:18 pm on March 5, 2020: member

What taproot is making worst here than current tx-relay of segwit v0 txn ?

Explained this offline, but the issue is that in a normal, non-adversarial situation where new nodes are relaying segwit transactions that violate old nodes’ policy, those old nodes will be wasting bandwidth repeatedly downloading such transactions, because we have no caching of acceptance failure for segwit transactions.

Wrt to IRC discussion on multi-party transactions & tx-relay interference, at least for Lightning I can’t see this as an issue, worst-case your counterparty can resign to produce another valid witness and you redownload twice ? Or is the attack finding a wtxid-collision with an invalid tx to make propagation of a valid time-sensitive tx near impossible?

I think you’re referencing an IRC conversation about a proposal to change wtxid-based relay to use some shorter, truncated hash instead (128 bits instead of 256 bits), which one version of the Erlay proposal called for. I’ve opted not to pursue that, because the bandwidth savings seem small after Erlay. If someone is going to advocate for truncated hashes, we can revisit the concerns from that conversation.

ariard commented at 7:23 pm on March 5, 2020: member

Explained this offline, but the issue is that in a normal, non-adversarial situation where new nodes are relaying segwit transactions that violate old nodes’ policy, those old nodes will be wasting bandwidth repeatedly downloading such transactions, because we have no caching of acceptance failure for segwit transactions.

Yes thanks for explanation, what I didn’t get was with a segwit upgrade deployment, transaction download waste wasn’t encumbered only in case of adversarial but also just for normal situation (without change any segwit v1 would be redownload at every announcement by non-upgrades nodes).

I think you’re referencing an IRC conversation about a proposal to change wtxid-based relay to use some shorter, truncated hash instead (128 bits instead of 256 bits), which one version of the Erlay proposal called for. I’ve opted not to pursue that, because the bandwidth savings seem small after Erlay. If someone is going to advocate for truncated hashes, we can revisit the concerns from that conversation.

Attack scenario is knowing a wtxid, a counterparty may find transaction identifier collision for an invalid transaction to stuck a time-sensitive tx in the reject filter, and keep broadcasting such malicious transaction at every block tip until timelock expires. That’s not a concern for LN today (because your counterparty doesn’t know your witness for your local commitment tx) but that would change with something like Eltoo (because symmetric transaction). So Eltoo-style offchain contracts + short-bit transaction identifier is a security concern, something we should document somewhere (maybe add comment around recentRejects)

in src/net_processing.cpp:2047 in 1cdf17837b outdated

1947-                // Do not use rejection cache for witness transactions or
1948-                // witness-stripped transactions, as they can have been malleated.
1949-                // See https://github.com/bitcoin/bitcoin/issues/8279 for details.
1950+            if (orphan_state.GetResult() != TxValidationResult::TX_WITNESS_STRIPPED) {
1951+                // Do not add txids of witness transactions or witness-stripped
1952+                // transactions to the filter, as they can have been malleated;

jonatack commented at 2:18 pm on March 6, 2020:

s/the filter/the recentRejects filter/ in 6e7290f to aid git grepping (in which case can drop “reject” in the next line).

Idem for the section in ProcessMessage()::L2684 containing the same paragraph.

sdaftuar commented at 7:46 pm on March 6, 2020: member

So Eltoo-style offchain contracts + short-bit transaction identifier is a security concern, something we should document somewhere (maybe add comment around recentRejects) @ariard This PR does not include a “short-bit transaction identifier” so I don’t think any additional documentation is needed.

ariard commented at 5:35 am on March 7, 2020: member

@ariard This PR does not include a “short-bit transaction identifier” so I don’t think any additional documentation is needed.

Yeah I know it’s more a safeguard against someone proposing the idea in the future and this issue not being remembered.

in src/protocol.h:367 in 1cdf17837b outdated

363@@ -358,12 +364,13 @@ const uint32_t MSG_TYPE_MASK    = 0xffffffff >> 2;
364  * These numbers are defined by the protocol. When adding a new value, be sure
365  * to mention it in the respective BIP.
366  */
367-enum GetDataMsg
368+enum GetDataMsg : uint32_t

jonatack commented at 5:59 pm on March 9, 2020:

Curious why the uint32_t enum type is being added; good practice?

sdaftuar commented at 8:43 pm on March 17, 2020:

This was aj’s proposed fix to the build warning you reported before: #18044 (review)

in src/net_processing.cpp:78 in 1cdf17837b outdated

68@@ -69,6 +69,8 @@ static constexpr int HISTORICAL_BLOCK_AGE = 7 * 24 * 60 * 60;
69 static constexpr int32_t MAX_PEER_TX_IN_FLIGHT = 100;
70 /** Maximum number of announced transactions from a peer */
71 static constexpr int32_t MAX_PEER_TX_ANNOUNCEMENTS = 2 * MAX_INV_SZ;
72+/** How many microseconds to delay requesting transactions via txids, if we have wtxid-relaying peers */

jonatack commented at 8:55 pm on March 9, 2020:

s/microseconds/seconds/ in 068bf9f – that said, there are several other static constexprs that are also described as μs but expressed in s.

naumenkogs commented at 2:38 pm on March 10, 2020:

Because of this confusion I am generally not a fan of mentioning the type in the comment. I don’t see why it’s useful since std::chrono::microseconds definition is clear enough.

sdaftuar commented at 8:28 pm on March 17, 2020:

I see @naumenkogs’s point that this is redundant, but I don’t understand why @jonatack is saying this is incorrect? TXID_RELAY_DELAY is the number of microseconds that we wait before requesting a transaction, no?

jonatack commented at 9:02 pm on March 17, 2020:

The value in the definition is expressed in seconds (albeit with microsecond precision), e.g. a delay of 2 seconds and not 2 microseconds, unless I am confused. There are several similar cases; I think this can be ignored and addressed (or not) separately.

sipa commented at 1:34 am on March 18, 2020:

The comment is technically correct because it’s referring to the TXID_RELAY_DELAY constant, not how it is computed. Compare it with how before it might have been static const int TXID_RELAY_DELAY = 2 * 1000000, where 2 was still a value in seconds.

It’s a silly semantics discussion though. With the std::chrono types everywhere the comment could just be “Duration of delay …”.

jonatack commented at 9:15 am on March 18, 2020:

the comment could just be “Duration of delay …”.

Yes.

in src/txmempool.h:490 in 1cdf17837b outdated

483@@ -467,6 +484,12 @@ class CTxMemPool
484         boost::multi_index::indexed_by<
485             // sorted by txid
486             boost::multi_index::hashed_unique<mempoolentry_txid, SaltedTxidHasher>,
487+            // sorted by wtxid

jonatack commented at 0:56 am on March 10, 2020:

Perhaps update the documentation L403-404 above to mention sorting the mempool by both txid and wtxid transaction hashes (and 5 criteria rather than 4, depending on how you count it).

sdaftuar commented at 8:42 pm on March 17, 2020:

Thanks, fixed.

jonatack commented at 12:07 pm on March 10, 2020: member

ACK 1cdf178

I reviewed the code extensively and built/ran tests cleanly for each commit. Have also been running two nodes (one Debian, one macOS) with these changes since 1-2 weeks. Thanks for the well-structured, hygienic changes. I’m still thinking about what I might have overlooked, and test coverage and updated logging seem needed as follow-ups… supposing (optimistically, to be sure) this could be merged after the 0.20 branch-off with half a year of testing and progress in master for 0.21. Some comments follow but none worth losing review for unless you need to retouch.

jonatack commented at 2:52 pm on March 10, 2020: member

Also, I believe #18261 (the current implementation of Erlay) depends on these changes.

in src/txmempool.h:725 in 1d1fddb979 outdated

721+
722     CTransactionRef get(const uint256& hash) const;
723-    TxMempoolInfo info(const uint256& hash) const;
724+    txiter get_iter_from_wtxid(const uint256& wtxid) const
725+    {
726+        LOCK(cs);

sipa commented at 8:58 pm on March 13, 2020:

It seems that all call sites of get_iter_from_wtxid hold cs already. Perhaps it’s better to have an AssertLockHeld / EXCLUSIVE_LOCKS_REQUIRED here instead?

sdaftuar commented at 8:42 pm on March 17, 2020:

Done.

in src/net_processing.cpp:3931 in b91a879d2c outdated

3926@@ -3927,6 +3927,11 @@ bool PeerLogicValidation::SendMessages(CNode* pto)
3927                             if (ret.second) {
3928                                 vRelayExpiration.push_back(std::make_pair(nNow + std::chrono::microseconds{RELAY_TX_CACHE_TIME}.count(), ret.first));
3929                             }
3930+                            // Add wtxid-based lookup into mapRelay as well, so that peers can request by wtxid
3931+                            auto ret2 = mapRelay.insert(std::make_pair(ret.first->second->GetWitnessHash(), ret.first->second));

sipa commented at 9:59 pm on March 13, 2020:

Slightly more efficient and C++11ish:

0                            auto ret2 = mapRelay.emplace(ret.first->second->GetWitnessHash(), ret.first->second);

sdaftuar commented at 8:41 pm on March 17, 2020:

Done.

in src/net_processing.cpp:3933 in b91a879d2c outdated

3926@@ -3927,6 +3927,11 @@ bool PeerLogicValidation::SendMessages(CNode* pto)
3927                             if (ret.second) {
3928                                 vRelayExpiration.push_back(std::make_pair(nNow + std::chrono::microseconds{RELAY_TX_CACHE_TIME}.count(), ret.first));
3929                             }
3930+                            // Add wtxid-based lookup into mapRelay as well, so that peers can request by wtxid
3931+                            auto ret2 = mapRelay.insert(std::make_pair(ret.first->second->GetWitnessHash(), ret.first->second));
3932+                            if (ret2.second) {
3933+                                vRelayExpiration.push_back(std::make_pair(nNow + std::chrono::microseconds{RELAY_TX_CACHE_TIME}.count(), ret2.first));

sipa commented at 9:59 pm on March 13, 2020:

0                                vRelayExpiration.emplace_back(nNow + std::chrono::microseconds{RELAY_TX_CACHE_TIME}.count(), ret2.first);

sdaftuar commented at 8:41 pm on March 17, 2020:

Done.

in src/net_processing.cpp:874 in 591931a2b0 outdated

869@@ -869,6 +870,8 @@ bool AddOrphanTx(const CTransactionRef& tx, NodeId peer) EXCLUSIVE_LOCKS_REQUIRE
870     auto ret = mapOrphanTransactions.emplace(hash, COrphanTx{tx, peer, GetTime() + ORPHAN_TX_EXPIRE_TIME, g_orphan_list.size()});
871     assert(ret.second);
872     g_orphan_list.push_back(ret.first);
873+    // Allow for lookups in the orphan pool by wtxid, as well as txid
874+    g_orphans_by_wtxid.insert(std::make_pair(tx->GetWitnessHash(), ret.first));

sipa commented at 10:01 pm on March 13, 2020:

0    g_orphans_by_wtxid.emplace(tx->GetWitnessHash(), ret.first);

sdaftuar commented at 8:41 pm on March 17, 2020:

Thanks, done.

in src/net_processing.cpp:1919 in 0d3d656ab5 outdated

1915@@ -1916,12 +1916,12 @@ void static ProcessOrphanTx(CConnman* connman, std::set<uint256>& orphan_work_se
1916             // Has inputs but not accepted to mempool
1917             // Probably non-standard or insufficient fee
1918             LogPrint(BCLog::MEMPOOL, "   removed orphan tx %s\n", orphanHash.ToString());
1919-            if (!orphanTx.HasWitness() && orphan_state.GetResult() != TxValidationResult::TX_WITNESS_MUTATED) {
1920+            if (orphanTx.HasWitness() || orphan_state.GetResult() != TxValidationResult::TX_WITNESS_MUTATED) {

sipa commented at 10:19 pm on March 13, 2020:

The comment on the next line here is outdated now.

naumenkogs commented at 1:22 pm on June 30, 2020:

To other reviewers: this issue is addressed in one of the following commits. It would make more sense to update it here, but it’s probably fine.

in src/net_processing.cpp:2610 in 0d3d656ab5 outdated

2603@@ -2604,14 +2604,15 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2604                 // We will continue to reject this tx since it has rejected
2605                 // parents so avoid re-requesting it from other peers.
2606                 recentRejects->insert(tx.GetHash());
2607+                recentRejects->insert(tx.GetWitnessHash());
2608             }
2609         } else {
2610-            if (!tx.HasWitness() && state.GetResult() != TxValidationResult::TX_WITNESS_MUTATED) {
2611+            if (tx.HasWitness() || state.GetResult() != TxValidationResult::TX_WITNESS_MUTATED) {

sipa commented at 10:25 pm on March 13, 2020:

Next line’s comment is outdated here as well.

in src/net_processing.cpp:1364 in 44ea27c61d outdated

1359@@ -1356,7 +1360,8 @@ bool static AlreadyHave(const CInv& inv) EXCLUSIVE_LOCKS_REQUIRED(cs_main)
1360 
1361             {
1362                 LOCK(g_cs_orphans);
1363-                if (mapOrphanTransactions.count(inv.hash)) return true;
1364+                if (inv.type != MSG_WTX && mapOrphanTransactions.count(inv.hash)) return true;
1365+                else if (inv.type == MSG_WTX && g_orphans_by_wtxid.count(inv.hash)) return true;

sipa commented at 10:27 pm on March 13, 2020:

Can you use braces and indentation here? Combination of a line ending with ‘;’ followed by else on the next line may be confusing.

sdaftuar commented at 8:41 pm on March 17, 2020:

Done

in src/net_processing.cpp:2200 in 021c5d392b outdated

2187+    // Feature negotiation of wtxidrelay should happen between VERSION and
2188+    // VERACK, to avoid relay problems from switching after a connection is up
2189+    if (strCommand == NetMsgType::WTXIDRELAY) {
2190+        if (pfrom->nVersion >= WTXID_RELAY_VERSION) {
2191+            LOCK(cs_main);
2192+            if (!State(pfrom->GetId())->m_wtxid_relay) {

sipa commented at 10:55 pm on March 13, 2020:

The conditional seems superfluous here.

sdaftuar commented at 8:25 pm on March 17, 2020:

We use it to properly count the number of wtxid-relay peers.

sipa commented at 8:30 pm on March 17, 2020:

Oh, of course.

sipa commented at 11:09 pm on March 13, 2020: member

Concept ACK, and rough code review ACK (left a number of comments though none are serious blockers). This certainly needs a functional p2p test, though.

DrahtBot added the label Needs rebase on Mar 16, 2020

sdaftuar force-pushed on Mar 17, 2020

sdaftuar commented at 8:46 pm on March 17, 2020: member

There are a couple review notes about comments that I still need to go back and look at but this is now rebased to get rid of the conflicts with master after the refactor done in #17997, and most of the non-comment review feedback has been incorporated.

I also still need to measure the memory usage of mapTx to see if the estimate I’m using is correct.

If anyone feels like contributing a test, I think that might help get this merged.

DrahtBot removed the label Needs rebase on Mar 17, 2020

in src/txmempool.h:725 in 782a0b0449 outdated

716@@ -693,8 +717,19 @@ class CTxMemPool
717         return (mapTx.count(hash) != 0);
718     }
719 
720+    bool wtxid_exists(const uint256& hash) const

jnewbery commented at 6:22 pm on March 28, 2020:

Is there a reason you’ve added a new function here, instead of extending exists() with a boolean wtxid parameter (in the same way that you’ve extended CompareDepthAndScore() and info()?

sdaftuar commented at 7:40 pm on March 30, 2020:

I think this is just a style choice, trying to balance different tradeoffs:

if you change a function’s arguments, then you have to change all call sites, unless you use an optional argument
changing lots of call sites is not great as it adds noise to PR review
optional arguments are not great as they can make it harder to maintain code and do review in the future

But sometimes optional arguments are the easier way to go, to avoid having to embed hairy logic at a bunch of call sites to decide which function to call, which can also harm readability. So I think it’s just a balance, and I tried to strike the balance I thought made sense when I was writing all this which wasn’t always the same for every example, but maybe I could have done better.

Anyway at this point I’d prefer not to make stylistic changes to this PR.

ajtowns commented at 2:58 am on June 29, 2020:

FWIW, I think mempool.exists(hash, by_wtxid) is more convenient/readable in AlreadyHave (the only place wtxid_exists() is used); it would let you say:

0    const bool by_wtxid = (inv.type == MSG_WTX); 
1    return mempool.exists(inv.hash, by_wtxid));

instead of

0    return (inv.type != MSG_WTX && mempool.exists(inv.hash)) ||
1           (inv.type == MSG_WTX && mempool.wtxid_exists(inv.hash));

in src/txmempool.h:729 in 782a0b0449 outdated

723+        return (mapTx.get<index_by_wtxid>().count(hash) != 0);
724+    }
725+
726     CTransactionRef get(const uint256& hash) const;
727-    TxMempoolInfo info(const uint256& hash) const;
728+    txiter get_iter_from_wtxid(const uint256& wtxid) const EXCLUSIVE_LOCKS_REQUIRED(cs)

jnewbery commented at 6:23 pm on March 28, 2020:

As above, I think it would be clearer if this was combined with GetIter()

sipa commented at 7:57 am on July 8, 2020:

In commit “Add a wtxid-index to the mempool”

I think this would be a nice improvement.

in src/net_processing.cpp:2957 in 782a0b0449 outdated

2616+        // for witness malleation.
2617+        if (!AlreadyHave(CInv(MSG_WTX, wtxid), mempool) &&
2618             AcceptToMemoryPool(mempool, state, ptx, &lRemovedTxn, false /* bypass_limits */, 0 /* nAbsurdFee */)) {
2619             mempool.check(&::ChainstateActive().CoinsTip());
2620-            RelayTransaction(tx.GetHash(), *connman);
2621+            RelayTransaction(tx.GetHash(), tx.GetWitnessHash(), *connman);

jnewbery commented at 6:25 pm on March 28, 2020:

Is there a good reason not to use txid and wtxid here instead of looking them up again?

ajtowns commented at 12:58 pm on March 30, 2020:

GetHash() and GetWitnessHash() are inline and just return pre-cached results anyway so it should make no practical difference. Probably clearer to use the shorter aliases though.

sdaftuar commented at 7:43 pm on March 30, 2020:

Looking at the diff, I note that the code change I have here looks clearly correct if you believe the old code was correct. Given the lack of performance sensitivity I’m not inclined to change this.

in src/net_processing.cpp:2647 in 782a0b0449 outdated

2643@@ -2568,8 +2644,13 @@ bool ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStream& vR
2644                 const auto current_time = GetTime<std::chrono::microseconds>();
2645 
2646                 for (const CTxIn& txin : tx.vin) {
2647+                    // Here, we only have the txid (and not wtxid) of the

jnewbery commented at 7:50 pm on March 28, 2020:

I don’t understand how this is supposed to work. You add the parent’s txid to the node’s m_tx_process_time via RequestTx() here, but when we process it in SendMessages() getdata logic, we always construct a CInv with type MSG_WTX. The receiving node will fail to find the tx because it searches in the mempool for a tx with that wtxid.

ajtowns commented at 12:54 pm on March 30, 2020:

This also happens if a buggy node sends CInv with MSG_WTX without having announced wtxidrelay first, or MSG_TX after having announced wtxidrelay I think.

sdaftuar commented at 7:35 pm on March 30, 2020:

This logic of requesting unconfirmed parent transactions when an orphan is received should eventually go away, in favor of some sort of package relay scheme, where orphan transactions trigger discovery of all unconfirmed ancestors – so I think the question is what to do in the meantime. I think wtxid-based relay is more important than optimizing this behavior when we encounter an orphan.

Note that we only just recently (#16851, merged last October) fixed things so that these requests worked for transactions that are not in mapRelay. Also note that for transactions that are still in mapRelay, or for non-segwit transactions where txid and wtxid are the same, these requests will still work. However this is admittedly a regression in behavior, undoing the benefit of #16851. However given how long that behavior was broken (and that the recent fix was largely at my urging, I believe), it’s hard for me to think that anyone is really too concerned with this.

Anyway it is possible to improve this by keeping extra bookkeeping so that requests generated this way go out as MSG_TX, but I don’t think it’s worth it – it’s a lot of additional complexity for very little benefit, and I think effort spent on improving orphan transaction relay should be devoted to a p2p protocol extension that supports it directly.

jnewbery commented at 10:35 pm on March 30, 2020:

I disagree with the argument “this was broken for a long time and we only recently fixed it, so it’s ok to break it again”

it is possible to improve this by keeping extra bookkeeping … it’s a lot of additional complexity.

This doesn’t seem like too much additional complexity. We store in m_tx_process_time whether the request should be for a txid or a wtxid, and set the GETDATA INV based on that. That’s one additional bit of information, which could be a bool or a enum GetDataMsg.

for transactions that are still in mapRelay […] these requests will still work

Oh, I hadn’t realised that we’d respond to a MSG_WTX GETDATA request for a wtxid with a transaction that has that txid. The draft BIP doesn’t specify that we should do that, and it seems extremely unintuitive. Could we have two mapRelays - one based on txid and one based on wtxid so we don’t do this?

(I also think that we should remove mapRelay entirely (#17303), which depends on responding to NOTFOUNDs (#18238), which currently is broken and seems to be waiting for this PR (https://github.com/bitcoin/bitcoin/pull/18238#discussion_r398318859))

sdaftuar commented at 11:37 pm on March 30, 2020:

I disagree with the argument “this was broken for a long time and we only recently fixed it, so it’s ok to break it again”

Please, that is not the entirety of the argument; the substance of what is being “broken” – and how significant it is – is relevant too. It’s not like transaction relay was horribly broken until the last release.

For context: fixing the behavior as we did was something that I suggested we do in order to make another improvement I wanted to make – package relay – something that we could introduce in two parts, first as a pure client-side behavior integrated with orphan handling, and then as a p2p protocol extension. Since then, I concluded that splitting those two things up doesn’t make sense, as getting the orphan handling part right is complicated enough that it isn’t worth the review burden for something that would just be ripped out later (and so I think we just go straight to package relay with a p2p protocol extension).

I think it’d be perfectly reasonable to just rip out this behavior of trying to request parents when we receive an orphan – it is fundamentally unreliable and is already just an opportunistic way to improve relay in one edge case situation. Adding complexity solely for the purpose of supporting something that flaky is something we should all be skeptical about; this part of the codebase is already very difficult for most people to understand.

Oh, I hadn’t realised that we’d respond to a MSG_WTX GETDATA request for a wtxid with a transaction that has that txid. The draft BIP doesn’t specify that we should do that, and it seems extremely unintuitive. Could we have two mapRelays - one based on txid and one based on wtxid so we don’t do this?

I don’t see the downside – the behavior I’ve implemented seems strictly better, though non-essential and not something I think we need to specify in a BIP as behavior we promise to maintain.

ajtowns commented at 0:08 am on March 31, 2020:

I don’t see the downside

Just my opinion, but: the p2p code is already really complicated and not really well understood, adding new behaviours and having the implementation and the spec differ in ways that matter in practice and aren’t obvious seems like it makes that a fair bit worse. This makes the review burden for both this PR and future p2p changes much higher, since you can’t just read the BIPs to understand how things should work, or the local bits of code to see how they do work to see if a change makes sense, but you have to deeply understand how many things interact to even get started.

I was going to suggest adding code to explicitly ignore INVs that come in with the wrong TX/WTX marker for the peer sending them, and to not request orphan txs’ parents when we’re in WTX mode – I do agree that behaviour’s not too crucial to keep even without package relay; I just don’t think mixing wtxid/txid values and having functionality relying on weird undocumented special cases like that working sometimes (only while the tx is in mapRelay but not after expiry?) is okay.

(Not a nack, just my opinion; I’m open to being convinced otherwise. But I think I’ll go back to poking at the notfound pr rather than holding off for this one getting merged for now)

sdaftuar commented at 2:27 pm on March 31, 2020:

@ajtowns My view on this is that what goes into a BIP are the essential behaviors that all implementations should abide by in order to make an idea work, but non-essential implementation details are left for projects to individually figure out.

This makes it easier for new protocol extensions to get adopted by the whole community (the fewer requirements, the better), and for our own software to stay compatible with the BIPs we claim to implement, as otherwise we’d have to constantly deprecate/redraft/repropose, as we make minor changes to our own implementation.

I’d say that how we handle a getdata request for a value that we never announced is one of those non-essential behaviors – ignoring it is certainly fair, and returning something useful is obviously better still. No one should assume, or would assume, that this is something that ought to work, so I don’t think any software would be relying on this.

Also, if we had a set of design docs for how our own project implements particular BIPs, I’d agree that those should reflect more exactly what our code does – but that would be a new initiative for us.

jnewbery commented at 3:44 pm on March 31, 2020:

I find this behavior very confusing. Can you help me understand if I’m getting any of this wrong:

We have a peer that has negotiated wtxid relay
That peer relays us a transaction whose parents we don’t have
We want to request the parent transaction, so we put the txid in the m_tx_process_time and m_tx_announced for this peer (https://github.com/bitcoin/bitcoin/pull/18044/files#diff-eff7adeaec73a769788bb78858815c91R2654). That txid will later be added to the global g_already_asked_for (https://github.com/bitcoin/bitcoin/pull/18044/files#diff-eff7adeaec73a769788bb78858815c91R4151)
In SendMessages() we request that transaction with a MSG_WTX GETDATA, but with the hash set to the txid (https://github.com/bitcoin/bitcoin/pull/18044/files#diff-eff7adeaec73a769788bb78858815c91R3990).
The peer responds to that GETDATA with a TX because it looks in its mapRelay for transactions with a txid OR wtxid that match (https://github.com/bitcoin/bitcoin/pull/18044/files#diff-eff7adeaec73a769788bb78858815c91R1628)
When we receive the TX, we try to remove the wtxid from the peer’s m_tx_announced and m_tx_in_flight, and the global g_already_asked_for (https://github.com/bitcoin/bitcoin/pull/18044/files#diff-eff7adeaec73a769788bb78858815c91R2587-R2594)

The thing I don’t understand is that we seem to be adding the txid to these per-peer and global structures, and then trying to remove the wtxid from them. Will the txid ever be cleared? Do we end up in a request loop with the peer?

jnewbery commented at 8:47 pm on March 31, 2020:

I also can’t quite follow the logic here: https://github.com/bitcoin/bitcoin/pull/18044/files#diff-eff7adeaec73a769788bb78858815c91R4139-R4140

It looks to me like the txid has been added to m_tx_process_time, but when we call into AlreadyHave(inv, m_mempool), the inv is a MSG_WTX, so we’ll check in our mempool for a transaction with that wtxid (which we won’t find). We’ll then rerequest the transaction from our peer.

ajtowns commented at 3:08 pm on April 2, 2020:

@sdaftuar “No one should assume, or would assume, that this is something that ought to work” – but… aren’t we assuming that it will work when we send a WTX request with a txid instead of a wtxid?

Aside from any bugs, I think this adds three lots of technical debt:

we’re replying to GETDATA WTX <txid> and GETDATA TX <wtxid> if the tx is still in mapRelay – this will go away if/when mapRelay goes away
we’re asking for WTX <txid> when looking for parents of orphan transactions – if we figure out how to replace orphan handling with package relay, this will go away too; but in the meantime it means mapRelay going away breaks orphan handling, and I think it breaks orphan handling when the parent’s >15m old immediately
in the functional tests, as of #18446 at least, we’re still sending TX INVs/GETDATAs etc despite claiming to be of a version that supports wtxid relay

One alternative might be to embrace the ambiguity: make the BIP be “if you signal wtxidrelay, then we’ll send INV TX wtxid instead of INV TX txid, and also allow you to request txs by either GETDATA TX txid or GETDATA TX wtxid” ?

sdaftuar commented at 7:04 pm on April 2, 2020:

I think I’m just going to eliminate the logic for requesting orphan parents from wtxid-relay peers, as you suggested above – I think that is the simplest approach, and should eliminate the bugs reported by @jnewbery.

ajtowns commented at 9:31 am on April 6, 2020:

Seems okay. I think it could make sense to explicitly skip INVs offering a TX/WTX that we’d incorrectly GETDATA as a WTX/TX (respectively) as well:

 0--- a/src/net_processing.cpp
 1+++ b/src/net_processing.cpp
 2@@ -2323,6 +2323,13 @@ bool ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStream& vR
 3             if (interruptMsgProc)
 4                 return true;
 5 
 6+            // ignore INVs that don't match wtxidrelay setting
 7+            if (State(pfrom->GetId())->m_wtxid_relay) {
 8+                if (inv.type == MSG_TX) continue;
 9+            } else {
10+                if (inv.type == MSG_WTX) continue;
11+            }
12+
13             bool fAlreadyHave = AlreadyHave(inv, mempool);
14             LogPrint(BCLog::NET, "got inv: %s  %s peer=%d\n", inv.ToString(), fAlreadyHave ? "have" : "new", pfrom->GetId());

I had a go at updating fjahr’s tests, and it seems to make reasonable sense (either with or without the change I suggest here)

(This still leaves us with the tech debt of our functional tests mostly testing tx relay when we’ll be moving towards doing wtxid relay most of the time in the real world, but that’s not really avoidable)

fjahr commented at 4:28 pm on April 21, 2020:

(This still leaves us with the tech debt of our functional tests mostly testing tx relay when we’ll be moving towards doing wtxid relay most of the time in the real world, but that’s not really avoidable)

I think that should be addressed with this update: https://github.com/fjahr/bitcoin/commit/becdb68ab37fdcbc1dcf743b19ea8629f52677bb

fjahr commented at 5:28 pm on April 21, 2020:

I think it could make sense to explicitly skip INVs offering a TX/WTX that we’d incorrectly GETDATA as a WTX/TX (respectively) as well:

And I agree with @ajtowns , this behavior makes intuitive sense to me and it matches my interpretation of the current description in the BIP: “After a node has sent and received a “wtxidrelay” message to/from a given peer, the node is required to use the MSG_WTX inv-type when announcing transactions to that peer, or requesting announced transactions from that peer.”

I think it would even be reasonable to argue about banning peers for this behavior if there is an agreement that this behavior is wrong. The node is unnecessarily spamming the network and just silently dropping the message might let it take longer until an issue with this is discovered by the node user.

jnewbery commented at 7:53 pm on March 28, 2020: member

Concept ACK. I’ve had a quick look through the commits and have one question about orphan tx relay a few minor comments.

commit Rename AddInventoryKnown() to AddKnownTx() (782a0b04496738030e3e51a6bf44aa5b426870b9) should be squashed into Just pass a hash to AddInventoryKnown (ed5eebb4739d347bb5de783aa92ebac8eecf87a4)
Commit log for Add p2p message “wtxidrelay” should s/wtxid’s/wtxids/
#18446 should be added to this PR.

sdaftuar commented at 7:48 pm on March 30, 2020: member

@fjahr Thanks for starting work on a test in #18446! Once things converge in the review there I’d be happy to include those commits here as well.

DrahtBot added the label Needs rebase on Apr 29, 2020

ariard commented at 6:28 am on April 30, 2020: member

@amitiuttarwar follow-up on this #18038 (review), now 18038 has been merged it would be really kind to have a patch for mempool intial-broadcast reattempt support and avoid wtxid taking too much delay :)

amitiuttarwar commented at 4:06 pm on April 30, 2020: contributor

@ariard yup! thanks for flagging. already working on the patch & will post it here once its ready for review

amitiuttarwar commented at 10:37 pm on May 1, 2020: contributor

initial attempt here: https://github.com/amitiuttarwar/bitcoin/commits/pr18044

rebased this branch on master, turned unbroadcast set into a map of txid -> wtxid, updated logic to play nice with changes to RelayTransaction. existing tests pass
can revisit and add tests as #18807 and #18446 make progress

fjahr commented at 10:44 am on June 19, 2020: member

initial attempt here: amitiuttarwar/bitcoin@pr18044 (commits)

rebased this branch on master, turned unbroadcast set into a map of txid -> wtxid, updated logic to play nice with changes to RelayTransaction. existing tests pass

can revisit and add tests as #18807 and #18446 make progress

Which progress are you looking for in #18446?

fjahr commented at 11:56 am on June 19, 2020: member

@sdaftuar Sorry, I may be missing context but is this work blocked somehow? How can people help bring this closer to the finish line?

sdaftuar force-pushed on Jun 19, 2020

sdaftuar commented at 7:08 pm on June 19, 2020: member

@fjahr Thanks for the reminder, I’ve gone ahead and rebased this. There were quite a few merge conflicts, so I hope reviewers will try to review these changes from scratch!

DrahtBot removed the label Needs rebase on Jun 19, 2020

adamjonas commented at 1:39 am on June 20, 2020: member

I’m seeing a few post-rebase build warnings that weren’t on 9eb584e1ffefdcbf7db0c3ca592f914c9ff46f68 (macOS clang v10):

0validation.cpp:5102:49: warning: loop variable 'elem' has type 'const std::pair<uint256, uint256> &' but is initialized with type 'std::__1::__map_iterator<std::__1::__tree_iterator<std::__1::__value_type<uint256, uint256>, std::__1::__tree_node<std::__1::__value_type<uint256, uint256>, void *> *, long> >::value_type' (aka 'pair<const uint256, uint256>') resulting in a copy [-Wrange-loop-analysis]
1        for (const std::pair<uint256, uint256>& elem : unbroadcast_txids) {
2                                                ^
3validation.cpp:5102:14: note: use non-reference type 'std::pair<uint256, uint256>' to keep the copy or type 'const std::__1::__map_iterator<std::__1::__tree_iterator<std::__1::__value_type<uint256, uint256>, std::__1::__tree_node<std::__1::__value_type<uint256, uint256>, void *> *, long> >::value_type &' (aka 'const pair<const uint256, uint256> &') to prevent copying
4        for (const std::pair<uint256, uint256>& elem : unbroadcast_txids) {
5             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
61 warning generated.

0net_processing.cpp:835:45: warning: loop variable 'elem' has type 'const std::pair<uint256, uint256> &' but is initialized with type 'std::__1::__map_iterator<std::__1::__tree_iterator<std::__1::__value_type<uint256, uint256>, std::__1::__tree_node<std::__1::__value_type<uint256, uint256>, void *> *, long> >::value_type' (aka 'pair<const uint256, uint256>') resulting in a copy [-Wrange-loop-analysis]
1    for (const std::pair<uint256, uint256>& elem : unbroadcast_txids) {
2                                            ^
3net_processing.cpp:835:10: note: use non-reference type 'std::pair<uint256, uint256>' to keep the copy or type 'const std::__1::__map_iterator<std::__1::__tree_iterator<std::__1::__value_type<uint256, uint256>, std::__1::__tree_node<std::__1::__value_type<uint256, uint256>, void *> *, long> >::value_type &' (aka 'const pair<const uint256, uint256> &') to prevent copying
4    for (const std::pair<uint256, uint256>& elem : unbroadcast_txids) {
5         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6net_processing.cpp:838:13: warning: calling function 'RelayTransaction' requires holding mutex 'cs_main' exclusively [-Wthread-safety-analysis]
7            RelayTransaction(elem.first, elem.second, *connman);
8            ^
92 warnings generated.

These two were first seen on 9f2dfa674aa54315f5c51a37463974019d091eb2.

0node/transaction.cpp:85:9: warning: calling function 'RelayTransaction' requires holding mutex 'cs_main' exclusively [-Wthread-safety-analysis]
1        RelayTransaction(hashTx, tx->GetWitnessHash(), *node.connman);
2        ^
31 warning generated.

This one seems to be introduced in 40825853e.

naumenkogs commented at 9:24 am on June 20, 2020: member

Code review ACK a7844da61d3f9eaca9d9d35337d463c64ac541a4, I think everything looks correct.

I confirm the build warnings above, would be nice to resolve them.
Also would be nice to address these two issues regarding outdated comments.
The code should be updated with BIP number once that’s merged (this PR has a couple “BIP XXX”).

in src/net_processing.cpp:835 in a7844da61d outdated

829@@ -818,14 +830,14 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) {
830 
831 void PeerLogicValidation::ReattemptInitialBroadcast(CScheduler& scheduler) const
832 {
833-    std::set<uint256> unbroadcast_txids = m_mempool.GetUnbroadcastTxs();
834+    std::map<uint256, uint256> unbroadcast_txids = m_mempool.GetUnbroadcastTxs();
835 
836-    for (const uint256& txid : unbroadcast_txids) {
837+    for (const std::pair<uint256, uint256>& elem : unbroadcast_txids) {

amitiuttarwar commented at 7:41 pm on June 20, 2020:

from a quick look, seems like using const and reference declarator together on this std::pair<uint256,uint256> type is causing the build warnings. I don’t yet understand why.

leaving & but removing the const throws an error:

0net_processing.cpp:835:39: error: non-const lvalue reference to type 'pair<uint256, [...]>' cannot bind to a value of unrelated type 'pair<const uint256, [...]>'
1    for (std::pair<uint256, uint256>& elem : unbroadcast_txids) {

but removing the reference declarator allows it to build with no errors / warnings. with or without the const.

sipa commented at 8:00 pm on June 20, 2020:

The reason is that the element type of unbroadcast_txids is std::pair<const uint256, uint256> which can’t be converted to std::pair<uint256, uint256>. If you want a const reference to such a thing, it will be copied, and you’ll get a reference to the copy.

The correct solution is either (const) std::pair<const uint256, uint256>&, or just const auto&.

sdaftuar commented at 2:53 pm on June 22, 2020:

Fixed with const auto&.

in src/net_processing.cpp:428 in 40825853e2 outdated

398@@ -399,6 +399,9 @@ struct CNodeState {
399     //! Whether this peer is a manual connection
400     bool m_is_manual_connection;
401 
402+    //! Whether this peer relays txs via wtxid
403+    bool m_wtxid_relay{false};

ariard commented at 8:21 am on June 22, 2020:

As a follow-up, I think we may scope this under a connection type, like #19316.

Side-note, @amitiuttarwar, I had only a super-quick overview on connection type, it’s a great move but IMO I would adopt a more fine-grained typology. Right now it sounds to confuse type of traffic (block, addr, tx) and type of connection selection (outbound/inbound, normal/anchor, …)

sdaftuar commented at 12:08 pm on June 23, 2020:

Marking this as resolved – nothing more to do here as far as I understand.

amitiuttarwar commented at 8:33 pm on June 24, 2020:

@ariard lets continue the conversation here: #19316 (comment)

in src/net_processing.cpp:2957 in 40825853e2 outdated

2866+        // with a different witness, but this has limited downside:
2867+        // mempool validation does its own lookup of whether we have the txid
2868+        // already; and an adversary can already relay us old transactions
2869+        // (older than our recency filter) if trying to DoS us, without any need
2870+        // for witness malleation.
2871+        if (!AlreadyHave(CInv(MSG_WTX, wtxid), mempool) &&

ariard commented at 8:36 am on June 22, 2020:

AFAICT, if we have sighash_noinput+Eltoo channel design in the future, commitment transactions are going to be symmetric but witness likely asymmetric (one per-party). So you would observe wtxidA != wtxidB but txidA == txidB which means in case of concurrent broadcast by different parties you will spuriously re-download transactions and only gets one version in your mempool.

Assuming such class of transactions traffic (multiple-parties, asymmetric witnesses, high-rate of concurrent broadcasts) get few digits of overall traffic it may be a serious bandwidth increase and may not be compatible with a wtxid-based tx-relay. Of course, as of today it’s really hypothetical but you may find such pattern with vault protocols (multiple-parties, concurrent broadcast, asymmetric witnesses).

ajtowns commented at 9:50 am on June 22, 2020:

“spuriously re-download transactions and only gets one version in your mempool” already happens with RBF; I don’t think this is meaningfully different?

(If you’re just varying the witness you don’t need SIGHASH_NOINPUT, though, so I think there’s a good chance the txid would be different too for eltoo transactions if that comes into play)

sdaftuar commented at 3:16 pm on June 22, 2020:

I don’t understand the comment here – are you saying wtxid-relay is somehow incompatible with a future change to the protocol? It seems to me that (a) wtxid-relay is more likely to support the ability to relay transactions with same underlying txid but different wtxid than our existing relay protocol, and (b) whether we adopt changes to the mempool acceptance rules to allow for this is otherwise orthogonal to this PR.

If you’re asking for a change here, can you indicate what that is?

ariard commented at 7:22 pm on June 22, 2020:

“spuriously re-download transactions and only gets one version in your mempool” already happens with RBF; I don’t think this is meaningfully different?

I think you’re right we do already have this kind of case due to single-party RBF. After thoughts, qualifying them of spurious was a fallacy as a smaller valid witness may increase transaction feerate and such increase its probabilities of propagating and confirming. So we have to download them anyway.

(I agree SIGHASH_NOINPUT doesn’t force you to vary witness, it’s more likely we would do it, fyi another generalized channel proposal leveraging per-party adaptor sigs to get symmetric states)

I’m not advocating on any change here, I was more brooding on the fact that fancier on-top-of-Bitcoin protocols and automatic dynamic-fee bumping as currently gauged by LN implementations will have for side-effect to increase bandwidth demand in case of mempool congestion.

and (b) whether we adopt changes to the mempool acceptance rules to allow for this is otherwise orthogonal to this PR.

Yes, I was wrong that’s orthogonal to tx-relay, it’s more being sure that these potential bandwidth demand surges are priced “correctly” with regards to BIP 125 rule 4. Not a discussion to have in this PR.

in src/net_processing.cpp:797 in 9f0f4809e5 outdated

772@@ -768,6 +773,9 @@ std::chrono::microseconds CalculateTxGetDataTime(const uint256& txid, std::chron
773     // We delay processing announcements from inbound peers
774     if (use_inbound_delay) process_time += INBOUND_PEER_TX_DELAY;
775 
776+    // We delay processing announcements from peers that use txid-relay (instead of wtxid)

ariard commented at 8:51 am on June 22, 2020:

The trade-off bandwidth saving against tx-relay propagation latency may not be only the one. Investigating around transaction pinning (https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2020-April/017757.html), there is this idea that an attacker, by bypassing tx-relay delay timers will always win the propagation race and that way be first in every miners full-node mempools and succeed the pin. I think against some malicious party willingly to override protocol rules at its advantage we can do nothing but I’m not yet confident on this.

sdaftuar commented at 12:09 pm on June 23, 2020:

Marking this as resolved, as nothing more to do here as far as I understand.

in src/net_processing.cpp:3035 in 247d67486a outdated

2972+                // interfere with relay of valid transactions from peers that
2973+                // do not support wtxid-based relay. See
2974+                // https://github.com/bitcoin/bitcoin/issues/8279 for details.
2975+                // We can remove this restriction (and always add wtxids to
2976+                // the filter even for witness stripped transactions) once
2977+                // wtxid-based relay is broadly deployed.

ariard commented at 9:07 am on June 22, 2020:

I think we should be really careful about including TX_WITNESS_STRIPPED to the rejection filter.

In LN, your channel counterparty knows your own version of the commitment transaction as it has to build it to sign it. It means a malicious counterparty would be able to broadcast a witness stripped version of it at any moment and that way get it added to the rejection filters of any p2p peers. If the full-node used by your LN-node is non-upgraded it will announce broadcast of a witness transaction (the local commitment) and it won’t be fetched by the lured peers thus effectively censoring your time-sensitive transactions.

Is this understanding of wtxid tx-relay correct ? I don’t think it changes anything that your topology is well-established with wtxid-relay peers, the issue is you not being upgraded and subject to intentional reject filters collisions.

sdaftuar commented at 3:11 pm on June 22, 2020:

Can you explain what you think needs to change in the code here?

I think the documentation is trying to explain a necessary (but not sufficient) condition for changing the code at some unspecified point in the future. If you want to argue that we should never change this code because other ecosystem software might break, then that’s something worthy of discussion, but I don’t know that we need to have that discussion in this PR, given that we’re not proposing actually doing that now (nor even proposing a timeline for it).

ariard commented at 7:46 pm on June 22, 2020:

Can you reference this comment/issue directly around the code ?

I think that’s a bit different than Matt’s previous comment here where we only considered censorship of a wtixd-relay requester, which should be fine if you have at least 1 honest wtxid-relay peer as you pointed.

The scenario I’m describing is I think different, it’s censorship of an non-upgraded wtxid-relay initial sender being only connected to reject-witness-stripped wtxid-relay peers. This sender announcing through txid only won’t have its transaction fetched and propagated due to an already existent collision in the rejection filter. Such topology configuration sounds realistic for private nodes.

I don’t want to argue that we should never change this but when we do it we give a clear warning and time buffer to the rest of the ecosystem to be sure they upgrade. Because by switching p2p policy rules you would effectively introduce a critical vulnerability for any higher-layer software pending on a non-upgraded full-node. LN security model fundamentally reverse the double-spend problem to a private matter, it’s the duty of your individual LN-node and full-node to ensure that your transactions propagate well and confirm quickly.

And I don’t think we have guidelines for this today, it sounds more like an illustration of the lack of a set of design docs for a) how we implement BIPs b) what assumptions ecosystem software can do on stability of such implemented rules that you’re describing here.

sdaftuar commented at 12:04 pm on June 23, 2020:

Do you mean just add a link to your github comment, or something else?

Here’s the current comment, which I think captures the issue you’re describing (as far as I understand, what you’re talking about is fundamentally the same as the issue we flagged years ago when we first deployed segwit, see #8279):

0                // Do not add txids of witness transactions or witness-stripped
1                // transactions to the filter, as they can have been malleated;
2                // adding such txids to the reject filter would potentially
3                // interfere with relay of valid transactions from peers that
4                // do not support wtxid-based relay. See
5                // [#8279](/bitcoin-bitcoin/8279/) for details.
6                // We can remove this restriction (and always add wtxids to
7                // the filter even for witness stripped transactions) once
8                // wtxid-based relay is broadly deployed.

But if there’s specific different language or a link you’d like to see in here, please just provide alternate text.

ariard commented at 0:43 am on June 27, 2020:

Right the scenario I’m describing is leveraging the same original issue from #8279, just concerning censorship of a non-upgraded initial sender.

0               // Note, any on-top-of-base-layer software (eg LN-node) relying on a non-upgraded to wtxid-relay full-node 
1              //  would be subject to censor of its time-sensitive transactions. See [#18044 (review)](/bitcoin-bitcoin/18044/#discussion_r443419034)
2             //  Removing this restriction should be done considering ecosystem coordination.

I think “broadly” is to dim, ideally you want 99% of upgrade among LN-node’s full-nodes.

sdaftuar commented at 10:09 pm on June 27, 2020:

I will update the comment to incorporate your suggestion somewhat, including providing a link to the comments as you request, but I have a philosophical difference of opinion about the implications here of ecosystem software that is dependent on particular p2p behavior (specifically with regard to your “ideally you want 99% of upgrade among LN-node’s full nodes” comment).

We have anti-CPU DoS reasons for wanting to get rid of the code that tries to determine whether a transaction is witness stripped (as well as generally simplifying the code that handles transactions that fail mempool acceptance). Requiring that ecosystem software eventually update – which can be as simple as putting txid-relay nodes behind wtxid-relay border nodes – in order to be protected against an esoteric attack, with many years of notice, seems completely reasonable to me.

In particular, if we see that the vast majority of full nodes upgrade to wtxid-relay support over the next few years, but a handful of stubborn lightning implementations have not, I think at some point we could say, too bad, find your own workaround for transaction relay if you’re worried about this kind of attack. For these kinds of non-consensus security issues, I think it’s reasonable for this project to provide software and solutions (including documenting our behavior and communicating that to our users) and expect that people worried about those kinds of problems will take care of it themselves. At some point it should be okay for us to have shipped solutions and given people plenty of time to upgrade, and it’s not our responsibility if there are people who have chosen not to follow our advice.

ariard commented at 9:23 am on June 22, 2020: member

I think this point is concerning and should be documented, others are far more loose.

With regards to BIP lack of specification around wtxid-relay peers who may have to fetch through MSG_TX for getting orphan parents, I think that’s okay to remove this feature for wtxid tx-relay, I can’t envision any application relying on it for their correctness/security (like no one is doing reverse-topological transaction announcements with CPFP-first as we still evaluate transaction 1-by-1).

As other blockers I see:

bandwidth measure to assess https://github.com/bitcoin/bitcoin/pull/18044/commits/9f0f4809e516b7c2393b667cb039701684e1c7b6 worthiness
publishing to the mailing list the BIP
addressing naming : #18044 (review) (it has been judged confusing by at least few people)
testing coverage : #18446
integrating mempool initial-broadcast reattempt wtxid-support : https://github.com/amitiuttarwar/bitcoin/commits/pr18044 ?

What else ?

As follow-ups, this implementation duplicate some index on our data structures (orphan, mempool, …) we may clean then by encapsulating behavior : #18044 (review) (I think it’s okay to add a bit of technical debt here, implementation is already complex enough).

in src/net_processing.h:103 in 40825853e2 outdated

 96@@ -97,6 +97,6 @@ struct CNodeStateStats {
 97 bool GetNodeStateStats(NodeId nodeid, CNodeStateStats &stats);
 98 
 99 /** Relay transaction to every node */
100-void RelayTransaction(const uint256&, const CConnman& connman);
101+void RelayTransaction(const uint256& txid, const uint256& wtxid, const CConnman& connman) EXCLUSIVE_LOCKS_REQUIRED(cs_main);

ajtowns commented at 9:55 am on June 22, 2020:

Looks like this means cs_main needs to be locked in ReattemptInintialBroadcast() (net_processing) and BroadcastTransaction() (node/transaction.cpp)

sdaftuar commented at 3:05 pm on June 22, 2020:

Thanks for catching those, fixed.

sdaftuar force-pushed on Jun 22, 2020

sdaftuar commented at 3:28 pm on June 22, 2020: member

Thanks all for flagging the build warnings and the missing LOCK(cs_main) calls (at least 1 was lost inadvertently in the rebase). Those should all be fixed now. @naumenkogs With regards to the outdated comments you referenced, I believe those were already addressed – is there something else you wanted to see? @ariard

bandwidth measure to assess 9f0f480 worthiness

FYI – I did some testing with an older version of this PR, with one node connected to another using this patch (both running locally on the same machine), and each was also connected out to the network. I was measuring how many times I downloaded the same transaction twice, and the number I saw over the last few months is about 0.5%. My setup of having two nodes on the same (fast) machine is certainly not representative, so I would encourage others to do their own tests of this.

publishing to the mailing list the BIP

The BIP was already posted to the mailing list in February (https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2020-February/017648.html). I suppose I should get a BIP number…

addressing naming : #18044 (comment) (it has been judged confusing by at least few people)

There doesn’t seem to be any clear agreement on what better names would look like, so I’d be inclined to let someone try to address this as a follow-up PR, if the enum I introduced really bothers anyone.

testing coverage : #18446

I’ll try to cherry-pick those testing commits later today.

integrating mempool initial-broadcast reattempt wtxid-support : https://github.com/amitiuttarwar/bitcoin/commits/pr18044 ?

Can you explain what is missing here?

in src/validation.cpp:5102 in d5d55edb1f outdated

5103         } catch (const std::exception&) {
5104           // mempool.dat files created prior to v0.21 will not have an
5105           // unbroadcast set. No need to log a failure if parsing fails here.
5106         }
5107-
5108+        for (const std::pair<uint256, uint256>& elem : unbroadcast_txids) {

adamjonas commented at 3:53 pm on June 22, 2020:

Believe the same use of const auto& is needed here as used in net_processing.

0        for (const auto& elem : unbroadcast_txids) {

sdaftuar commented at 4:33 pm on June 22, 2020:

Doh, thanks! Fixed.

sdaftuar force-pushed on Jun 22, 2020

ariard commented at 9:59 pm on June 22, 2020: member

The BIP was already posted to the mailing list in February (https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2020-February/017648.html). I suppose I should get a BIP number…

Right, I forgot about that. Reading about the mail again it doesn’t mention the issue with wtixd-relay which may have to GETDATA(INV(MSG_TX) in case of TX_MISSING_INPUTS but as we agree to drop this behavior for wtixd-relay (note: not done as of tip commit da72e51 ?) I think BIP doesn’t have to change. Maybe a release note if the really-unlikely-case some application was relying on it ? Is BIP number attribution considered as needed before merging this ?

Can you explain what is missing here?

AFAICT, it sounds to be exhaustive with mempool initial-rebroadcast support, just wasn’t pick up on main branch.

I think it’s just pending on removal of orphan parents fetching through txid announcement before another round of Code Review ?

sdaftuar commented at 11:54 am on June 23, 2020: member

Reading about the mail again it doesn’t mention the issue with wtixd-relay which may have to GETDATA(INV(MSG_TX) in case of TX_MISSING_INPUTS but as we agree to drop this behavior for wtixd-relay (note: not done as of tip commit da72e51 ?)

This should have already been fixed here: https://github.com/bitcoin/bitcoin/pull/18044/commits/af8e570be4a059d254492b304f60bb0039052094#diff-eff7adeaec73a769788bb78858815c91R2902

in src/net_processing.cpp:4552 in bc565ee3bd outdated

4458+                    // txid-relay peers is to save bandwidth on initial
4459+                    // announcement of a transaction, and doesn't make sense
4460+                    // for a followup request if our first peer times out (and
4461+                    // would open us up to an attacker using inbound
4462+                    // wtxid-relay to prevent us from requesting transactions
4463+                    // from outbound txid-relay peers).

ariard commented at 0:24 am on June 27, 2020:

I think you may have a slight InvBlock until wtxid-relay is double-digit deployed, a wtxid-relay-signaling inbound peer can announce a MSG_TX and thus being processed with same time priority than txid-relay outbound peers. By holding the TX and assume your victim only have txid-relay peers you block transaction propagation

Maybe CalculateTxGetDataTime should use the inv type instead of peer signaling to apply the delay ?

sdaftuar commented at 0:31 am on June 28, 2020:

I don’t think this is very significant but perhaps we can include the suggestion that I believe aj (or others) made of ignoring MSG_TX inv’s from wtxid-relay peers (and vice versa).

ariard commented at 0:28 am on June 27, 2020: member

This should have already been fixed here: af8e570#diff-eff7adeaec73a769788bb78858815c91R2902

Okay, I thought you would drop behavior even for txid-relay peers. It maybe cleaner to remove it completely to avoid some types of peers intentionally not-signaling wtxid-relay to leverage this behavior.

sdaftuar force-pushed on Jun 28, 2020

sdaftuar commented at 12:02 pm on June 28, 2020: member

This PR has been updated to include the tests from #18446 (thank you @fjahr and @ajtowns!), and to reflect the bip number assignment. I’ve also included logic from @ajtowns to ignore MSG_TX invs from wtxid-relay peers, and vice versa, to address concerns about (1) intentional misbehavior from wtxid-relay peers to interfere with relay, and (2) making the logic around what kind of data we process from each peer simpler to understand.

ariard commented at 0:28 am on June 29, 2020: member

Code Review ACK 00cd6e3. Enforce inv-compliance and integrate BIP number since last review.

in src/txmempool.h:755 in fec2cba022 outdated

753     /** Removes a transaction from the unbroadcast set */
754     void RemoveUnbroadcastTx(const uint256& txid, const bool unchecked = false);
755 
756     /** Returns transactions in unbroadcast set */
757-    std::set<uint256> GetUnbroadcastTxs() const {
758+    const std::map<uint256, uint256>& GetUnbroadcastTxs() const {

ajtowns commented at 2:05 am on June 29, 2020:

This was making a copy of the set, which could then be accessed without holding the mempool.cs lock; returning a const reference instead is unsafe, isn’t it? (In particular, if I understand correctly, map<uint256,uint256> x = mempool.GetUnbroadcastTxs(); will do the copy from *(&mempool.m_unbroadcast_txids) after releasing the lock with this code, where previously it would have either copied to a temporary first or done copy elision and copied directly to x while holding the lock)

sdaftuar commented at 9:16 pm on June 29, 2020:

Thanks, fixed.

in src/net_processing.cpp:1664 in b4779bbb0a outdated

1640@@ -1625,24 +1641,24 @@ void static ProcessGetBlockData(CNode& pfrom, const CChainParams& chainparams, c
1641 }
1642 
1643 //! Determine whether or not a peer can request a transaction, and return it (or nullptr if not found or not allowed).
1644-CTransactionRef static FindTxForGetData(CNode& peer, const uint256& txid, const std::chrono::seconds mempool_req, const std::chrono::seconds longlived_mempool_time) LOCKS_EXCLUDED(cs_main)
1645+CTransactionRef static FindTxForGetData(CNode& peer, const uint256& txid_or_wtxid, bool use_wtxid, const std::chrono::seconds mempool_req, const std::chrono::seconds longlived_mempool_time) LOCKS_EXCLUDED(cs_main)

ajtowns commented at 3:20 am on June 29, 2020:

Might be clearer just to pass the CInv in instead of separating the hash and precomputing inv.type==MSG_WTX at the caller.

sdaftuar commented at 9:19 pm on June 29, 2020:

I kind of don’t like passing the CInv around, as it makes more places in the code have to know about the different enums that we use in there (and more places to potentially change in the code if we add new enums in the future), so leaving this as-is.

sipa commented at 11:31 pm on July 6, 2020:

Agree with @sdaftuar. It’s annoying that it adds variables in many places, but the various CInv states really don’t correspond to what we need here either.

It may be useful to have an encapsulated (use_wtxid, txid_or_wtxid) pair type, but let’s keep that for later (I think it would make sense for combining with #19184 as well).

ajtowns commented at 6:27 am on June 29, 2020: member

894c35aa4fc22c2b048ebfec6503164b3aadf592 looks pretty good, but I think the unbroadcast stuff has a minor bug?

in src/net_processing.cpp:2046 in 7d742ab911 outdated

2007@@ -2007,10 +2008,19 @@ void static ProcessOrphanTx(CConnman* connman, CTxMemPool& mempool, std::set<uin
2008             // Has inputs but not accepted to mempool
2009             // Probably non-standard or insufficient fee
2010             LogPrint(BCLog::MEMPOOL, "   removed orphan tx %s\n", orphanHash.ToString());
2011-            if (orphanTx.HasWitness() || orphan_state.GetResult() != TxValidationResult::TX_WITNESS_MUTATED) {
2012-                // Do not use rejection cache for witness transactions or
2013-                // witness-stripped transactions, as they can have been malleated.
2014-                // See https://github.com/bitcoin/bitcoin/issues/8279 for details.
2015+            if (orphan_state.GetResult() != TxValidationResult::TX_WITNESS_STRIPPED) {
2016+                // Do not add txids of witness transactions or witness-stripped

instagibbs commented at 2:01 pm on June 29, 2020:

Isn’t this block about tracking witness txid not txid?

sdaftuar commented at 9:17 pm on June 29, 2020:

Ok I guess recentRejects is pretty confusing. Made one more attempt to add comments around how this works.

instagibbs commented at 2:12 pm on June 30, 2020:

I understand how they work(in master), I didn’t understand why it was talking about something it’s not doing. With the added comment it’s clearer, thanks.

sdaftuar force-pushed on Jun 29, 2020

sdaftuar commented at 9:23 pm on June 29, 2020: member

Addressed @ajtowns’ comments. Note the behavior change: we now disconnect a peer if they send a wtxidrelay message after VERACK.

Also fixed the bug I introduced in the mempool unbroadcast stuff, gave in to dropping the wtxid_exists function, and made yet another attempt to improve the comments explaining how we use recentRejects.

in src/net_processing.cpp:3021 in 6787a3dc2a outdated

2982@@ -2877,15 +2983,30 @@ void ProcessMessage(
2983                 LogPrint(BCLog::MEMPOOL, "not keeping orphan with rejected parents %s\n",tx.GetHash().ToString());
2984                 // We will continue to reject this tx since it has rejected
2985                 // parents so avoid re-requesting it from other peers.
2986+                // Here we add both the txid and the wtxid, as we know that
2987+                // regardless of what witness is provided, we will not accept
2988+                // this, so we don't need to allow for redownload of this txid
2989+                // from any of our non-wtxidrelay peers.
2990                 recentRejects->insert(tx.GetHash());

naumenkogs commented at 1:49 pm on June 30, 2020:

I’m not very familiar with alternative (deployed) SIGHASH flags, but I was wondering if this can possible screw us somehow… I don’t have a particular attack scenario, just throwing this idea here.

sdaftuar commented at 2:07 pm on June 30, 2020:

I don’t know what SIGHASH flags has to do with this, can you elaborate on the scenario you have in mind?

My understanding of the logic here is that at this point in the code, we know that a parent of this orphan transaction will definitely not be accepted by our node (as it is already in our reject filter). Since a txid is a commitment to the inputs of a transaction, this means that we can definitively know that we’ll never accept any transaction with the same txid as this orphan, regardless of what witness it comes with.

naumenkogs commented at 4:03 pm on June 30, 2020:

Nevermind, I was worried that something would happen to xxx | ANYONECANPAY, but now I think there are no issues there.

in test/functional/p2p_segwit.py:1324 in 5c7c3f44bd outdated

1295@@ -1296,9 +1296,9 @@ def test_tx_relay_after_segwit_activation(self):
1296 
1297         # Node will not be blinded to the transaction
1298         self.std_node.announce_tx_and_wait_for_getdata(tx3)
1299-        test_transaction_acceptance(self.nodes[1], self.std_node, tx3, True, False, 'tx-size')
1300+        test_transaction_acceptance(self.nodes[1], self.std_node, tx3, True, False)

instagibbs commented at 3:57 pm on June 30, 2020:

why are these string checks removed?

sdaftuar commented at 8:25 pm on July 18, 2020:

Took me forever to remember the issue here – the first string check can stay, but the second one has to be removed because now that we cache failure of wtxid’s in the recentRejects filter, the reason for failure on the second validation attempt will be different.

sdaftuar commented at 10:57 pm on July 18, 2020:

Reinstating this first string check (and the second line was removed in a later commit anyway).

naumenkogs commented at 4:03 pm on June 30, 2020: member

utACK 6787a3dc2abb04f0b75aa2f8996e4bbd6c468ddd

instagibbs commented at 4:06 pm on June 30, 2020: member

utACK https://github.com/bitcoin/bitcoin/pull/18044/commits/6787a3dc2abb04f0b75aa2f8996e4bbd6c468ddd

in src/net_processing.cpp:1240 in 95d729ad02 outdated

1236@@ -1236,6 +1237,7 @@ void PeerLogicValidation::BlockConnected(const std::shared_ptr<const CBlock>& pb
1237         LOCK(g_cs_recent_confirmed_transactions);
1238         for (const auto& ptx : pblock->vtx) {
1239             g_recent_confirmed_transactions->insert(ptx->GetHash());
1240+            g_recent_confirmed_transactions->insert(ptx->GetWitnessHash());

sipa commented at 11:28 pm on July 6, 2020:

In commit “Add wtxids of confirmed transactions to bloom filter”

It may be worth doing this only in case the txid and wtxid differ (inserting the same thing twice still counts towards expiration of old entries). In the worst case it doesn’t matter (it maintains the last 24000 inserted txid/wtxid pairs always), but in the average case, as long as there are non-witness transactions doing so would increase that to somewhere between 24000 and 48000.

sdaftuar commented at 10:58 pm on July 18, 2020:

Done.

in src/net_processing.cpp:2995 in 5c7c3f44bd outdated

2902-                    CInv _inv(MSG_TX | nFetchFlags, txin.prevout.hash);
2903-                    pfrom.AddInventoryKnown(txin.prevout.hash);
2904-                    if (!AlreadyHave(_inv, mempool)) RequestTx(State(pfrom.GetId()), _inv.hash, current_time);
2905+                if (!State(pfrom.GetId())->m_wtxid_relay) {
2906+                    for (const CTxIn& txin : tx.vin) {
2907+                        // Here, we only have the txid (and not wtxid) of the

sipa commented at 2:15 am on July 7, 2020:

This will only request parents of orphan transactions when received from non-wtxid peers? What if all peers support wtxid?

Can’t we still request them, but with MSG_TX instead of MSG_WTX? That’d require keeping track of wtxidness per request, unfortunately.

ajtowns commented at 4:30 am on July 7, 2020:

Earlier discussion of this proposed handling this via package relay

ajtowns commented at 6:31 am on July 7, 2020:

I think this will be an issue even when there’s only a small number of wtxid peers – in that case we’ll delay requesting transactions from non-wtxid peers, so most of the txs (and hence most of the orphans) we receive will be from wtxid peers. I guess I’d still rather see this merged now and fixed later than further delayed though?

sipa commented at 6:58 am on July 7, 2020:

Ok, I read back some of the history there. I see the complexity it introduces, but I’m still really uncomfortable with just breaking orphan parent-fetching. I’ll instrument my node to see how frequently that actually happens.

ajtowns commented at 8:14 am on July 7, 2020:

Hmm, it just doesn’t seem like a big deal to me. I think https://github.com/ajtowns/bitcoin/commits/202007-wtxid-orphans should be all that’s needed to fix it.

sipa commented at 6:16 pm on July 7, 2020:

@ajtowns That does not look bad at all, indeed. I guess we can move ahead with this and re-instate orphan fetching with a patch like that afterwards.

ajtowns commented at 1:18 am on July 8, 2020:

Seems like it would fit in well with the tx relay overhaul, if you stole another bit from m_sequence for the txid vs wtxid flag to me.

sipa commented at 2:51 am on July 8, 2020:

@ajtowns Exactly!

in src/protocol.h:405 in 5c7c3f44bd outdated

395+enum GetDataMsg : uint32_t {
396     UNDEFINED = 0,
397     MSG_TX = 1,
398     MSG_BLOCK = 2,
399+    MSG_WTX = 5,                                      //!< Defined in BIP 339
400     // The following can only occur in getdata. Invs always use TX or BLOCK.

sipa commented at 2:47 am on July 7, 2020:

Nit: this comment is out of date now.

sdaftuar commented at 10:58 pm on July 18, 2020:

Fixed.

ajtowns commented at 6:33 am on July 7, 2020: member

utACK 6787a3dc2abb04f0b75aa2f8996e4bbd6c468ddd ; checked changes since previous review

in src/net_processing.cpp:1708 in 5c7c3f44bd outdated

1704+        CTransactionRef tx = FindTxForGetData(pfrom, inv.hash, inv.type == MSG_WTX, mempool_req, longlived_mempool_time);
1705         if (tx) {
1706+            // WTX and WITNESS_TX imply we serialize with witness
1707             int nSendFlags = (inv.type == MSG_TX ? SERIALIZE_TRANSACTION_NO_WITNESS : 0);
1708             connman->PushMessage(&pfrom, msgMaker.Make(nSendFlags, NetMsgType::TX, *tx));
1709             mempool.RemoveUnbroadcastTx(inv.hash);

sipa commented at 2:47 am on July 8, 2020:

In commit “Add support for tx-relay via wtxid”

I think RemoveUnbroadcastTx will always take txids, not wtxids, so this should be changed to tx->GetHash().

sdaftuar commented at 6:40 pm on July 8, 2020:

Fixed, thanks!

sipa commented at 8:06 am on July 8, 2020: member

Concept and code review ACK. I found one issue that should be fixed; the rest are nits.

I think it would be good to introduce a {bool use_wtxid, uint256 hash} type in a follow-up, as it looks like a lot of code could be made cleaner using that. It seems that could also be a relatively easy way to re-enable orphan fetching of unconfirmed parents from wtxid peers (see https://github.com/bitcoin/bitcoin/pull/18044/files#r450687076), which I’d like to see in the same release as wtxid relay.

sdaftuar force-pushed on Jul 8, 2020

DrahtBot added the label Needs rebase on Jul 11, 2020

sipa commented at 0:20 am on July 14, 2020: member

I’ve been running this branch on a number of nodes, in various configurations (including one with only wtxid peers, and mixed reachable/unreachable ones).

The primary thing I wanted to know is if there was any noticable impact on the number of compact block reconstructions that need transaction requests (as this PR impacts orphan processing). At first glance everything looks fine - nearly all blocks are reconstructed without requesting anything else.

Something that did surprise me, is that there doesn’t seem to be a strong bias towards wtxid peers in terms of downloaded transactions. I have one wtxid peer that is connected to both a wtxid and a normal peer, but around 60% of tx bandwidth is from the txid one. This may be due to the wtxid node just being slower, or having worse latency, or just being further away from common tx sources in the transaction graph - but I was expecting a strong effect.

laanwj added this to the "Blockers" column in a project

ajtowns commented at 11:20 pm on July 17, 2020: member

Something that did surprise me, is that there doesn’t seem to be a strong bias towards wtxid peers in terms of downloaded transactions. I have one wtxid peer that is connected to both a wtxid and a normal peer, but around 60% of tx bandwidth is from the txid one. This may be due to the wtxid node just being slower, or having worse latency, or just being further away from common tx sources in the transaction graph - but I was expecting a strong effect.

I think this is behaving something like:

node A and B do wtxid relay and are peered to each other, but no other wtxid relay peers
node A receives INV of tx t from node C, delays requesting it for 1+2s
node B receives INV of tx t from node D, delays requesting it for 1+2s
node A requests t from C, receives it
node B receives INV of tx t from node A, delays requesting it for 1s
node B requests t from D, receives it

that is, B’s willingness to wait to receive from A is exhausted in the same time as A spend hoping to receive from B.

So I think this will likely remain a small effect for transactions that are initially broadcast to/via non-wtxid-relay nodes? Might be something to reconsider with #19184 which I think would allow prioritising from wtxid-relay peers more directly. (EDIT: on second thoughts, the approach I was thinking of would only help if you could prioritise relay-by-wtxid over relay-by-txid of the same txn, which you obviously can’t do. Some sort of random delay triggered on node-secret/txid so each wtxid node has a small chance of introducing the tx to the wtxid-relay-network early could work, but probably not until there’s a decent number of wtxid relay nodes out there)

sdaftuar force-pushed on Jul 18, 2020

Add a wtxid-index to the mempool 2b4b90aa8f

Add wtxid to mempool unbroadcast tracking c7eb6b4f1f

Just pass a hash to AddInventoryKnown

Since it's only used for transactions, there's no need to pass in an inv type.

60f0acda71

Add a wtxid-index to mapRelay 08b39955ec

Add wtxid-index to orphan map 85c78d54af

Add wtxids of confirmed transactions to bloom filter

This is in preparation for wtxid-based invs (we need to be able to tell whether
we AlreadyHave() a transaction based on either txid or wtxid).

This also double the size of the bloom filter, which is overkill, but still
uses a manageable amount of memory.

144c385820

Add wtxids to recentRejects instead of txids

Previously, we only added txids to recentRejects if we were sure that the
transaction couldn't have had the wrong witness (either because the witness was
malleated or stripped).

In preparation for wtxid-based relay, we can observe that txid == wtxid for
transactions that have no witness, and add the wtxid of rejected transactions,
provided the transaction wasn't a witness-stripped one. This means that we now
add more data to the filter (as prior to this commit, any transaction with a
witness that failed to be accepted was being skipped for inclusion in the
filter) but witness malleation should still not interfere with relay of a valid
segwit transaction, because the txid of a segwit transaction would not be added
to the filter after failing validation.

In the future, having wtxids in the recent rejects filter will allow us to
skip downloading the same wtxid multiple times, once our peers use wtxids for
transaction relay.

8e68fc246d

Add support for tx-relay via wtxid

This adds a field to CNodeState that tracks whether to relay transactions with
that peer via wtxid, instead of txid. As of this commit the field will always
be false, but in a later commit we will add a way to negotiate turning this on
via p2p messages exchanged with the peer.

ac88e2eb61

ignore non-wtxidrelay compliant invs 2d282e0cba

Add p2p message "wtxidrelay"

When sent to and received from a given peer, enables using wtxid's for
announcing and fetching transactions with that peer.

46d78d47de

Delay getdata requests from peers using txid-based relay

Using both txid and wtxid-based relay with peers means that we could sometimes
download the same transaction twice, if announced via two different hashes from
different peers.

Use a heuristic of delaying txid-peer-getdata requests by 2 seconds, if we have
at least one wtxid-based peer.

97141ca442

Make TX_WITNESS_STRIPPED its own rejection reason

Previously, TX_WITNESS_MUTATED could be returned during transaction validation
for either transactions that had a witness that was non-standard, or for
transactions that had no witness but were invalid due to segwit validation
rules.

However, for txid/wtxid-relay considerations, net_processing distinguishes the
witness stripped case separately, because it affects whether a wtxid should be
able to be added to the reject filter. It is safe to add the wtxid of a
witness-mutated transaction to the filter (as that wtxid shouldn't collide with
the txid, and hence it wouldn't interfere with transaction relay from
txid-relay peers), but it is not safe to add the wtxid (== txid) of a
witness-stripped transaction to the filter, because that would interfere with
relay of another transaction with the same txid (but different wtxid) when
relaying from txid-relay peers.

Also updates the comment explaining this logic, and explaining that we can get
rid of this complexity once there's a sufficient deployment of wtxid-relaying
peers on the network.

4eb515574e

Rename AddInventoryKnown() to AddKnownTx() dd78d1d641

test: Update test framework p2p protocol version to 70016

This new p2p protocol version allows to use WTXIDs for tx relay.

9a5392fdf6

test: Add tests for wtxid tx relay in segwit test

Also cleans up some doublicate lines in the rest of the test.

co-authored-by: Anthony Towns <aj@erisian.com.au>

8d8099e97a

test: Use wtxid relay generally in functional tests cacd85209e

Disconnect peers sending wtxidrelay message after VERACK 0e20cfedb7

Further improve comments around recentRejects 0a4f1422cd

sdaftuar force-pushed on Jul 19, 2020

DrahtBot removed the label Needs rebase on Jul 19, 2020

adamjonas commented at 1:26 pm on July 19, 2020: member

To summarize the state of review before rebase of #19109 and #19474:

concept ACK - jnewbery, TheBlueMatt, rajarshimaitra code review ACK - sipa, ariard, jonatack utACK - ajtowns, naumenkogs, instagibbs

sdaftuar commented at 9:24 pm on July 19, 2020: member

When I rebased this, I had to think about the interaction between #19109 and wtxid-based relay. One change I made to this PR is to add both the txid and wtxid of an announced transaction to the filterInventoryKnown for a peer; this prevents clogging the filter with txid entries for that were already announced via wtxid when child transactions are relayed.

Also I found a bug with insertions into mapRelay; the intention was to add lookups by txid and wtxid whenever a transaction is relayed, but I think for wtxid-relay peers we were attempting to insert a lookup by wtxid twice instead.

I also took this opportunity to address a few outstanding nits.

in src/net_processing.cpp:2911 in 8e68fc246d outdated

2907@@ -2908,14 +2908,15 @@ void ProcessMessage(
2908                 // We will continue to reject this tx since it has rejected
2909                 // parents so avoid re-requesting it from other peers.
2910                 recentRejects->insert(tx.GetHash());
2911+                recentRejects->insert(tx.GetWitnessHash());

ariard commented at 0:38 am on July 21, 2020:

We set fRejectedParents based on parent’s txid in recentRejects. If parent has been previously announced and rejected based on wtxid only we won’t found it. So we won’t add this orphan and may fetch it again. I don’t see this case covered in new test ?

ariard commented at 1:04 am on July 21, 2020:

Also if we fix it, and parent txA is rejected due to a mutated witness, child txB with a valid witness will be pinned in recentRejects. If parent txA is re-announced and accepted with a new witness, propagation of child txB will still fail until next block, assuming there is a unique valid witness known. So in multi-party, competing scenario, you introduce a slight delay vector. I think that’s okay if an attacker can’t repeat this game at each block.

sdaftuar commented at 9:02 pm on July 21, 2020:

We set fRejectedParents based on parent’s txid in recentRejects. If parent has been previously announced and rejected based on wtxid only we won’t found it. So we won’t add this orphan and may fetch it again. I don’t see this case covered in new test ?

Yes, this is an unfortunate problem, but I don’t see a way to fix the redownloading of transactions with failed parent without implementing a new p2p protocol to support efficiently communicating the unconfirmed ancestor wtxid’s of a given transaction (in order to know whether to reprocess a given transaction).

Note that before this PR, we have this same problem not just for child transactions of a reject parent, but for all transactions that have witnesses which fail our policy checks.

sipa commented at 11:15 pm on July 21, 2020:

In theory we could distinguish whether failure is due to a definitely-not-witness-related reason, and in that case, add both the txid and wtxid to the recentRejects filter. I don’t think this accomplishes much, because it won’t occur in non-adverserial scenarios, and in adverserial ones, the attacker can just use invalid witnesses if he wants to cause harm with invalid transactions.

Edit: seems the code is already doing that for transactions spending rejected ones.

sdaftuar commented at 1:43 am on July 22, 2020:

I think we could try to special case the common scenario that we expect to happen at some point: some transaction spending a segwit version 1 output is rejected by (older) software as nonstandard, and then child transactions are relayed and redownloaded from each upgraded peer that announces it because the parent transaction’s txid wouldn’t be found in the reject filter. We could improve this behavior by treating failures in AreInputsStandard() as indicative of the txid being definitely bad, and therefore add the txid to the reject filter (even though the transaction has a witness).

I think my preference would be to defer this kind of somewhat-tricky-to-reason-about change to a future PR, since this PR seems ready to me and is already an improvement, but if others feel differently we should discuss.

ariard commented at 0:59 am on July 23, 2020:

This issue is only a bandwidth waste and this PR is already a huge improvement on this side, that’s all good to defer this to future work, orphan parent handling is already an area known to be improved.

in src/txmempool.h:744 in c7eb6b4f1f outdated

741+    void AddUnbroadcastTx(const uint256& txid, const uint256& wtxid) {
742         LOCK(cs);
743         // Sanity Check: the transaction should also be in the mempool
744         if (exists(txid)) {
745-            m_unbroadcast_txids.insert(txid);
746+            m_unbroadcast_txids[txid] = wtxid;

ariard commented at 4:27 pm on July 21, 2020:

I think std::map::operator[] returns a reference to the value if the key already exists ? You might broadcast a transaction with different witnesses and ideally you should track all of them. I don’t know any application relying on this behavior though so maybe not worthy to fix.

sdaftuar commented at 8:58 pm on July 21, 2020:

Leaving this as-is. There are a whole host of other places in our code that would need to change in order to support tracking of multiple transactions with the same txid (wallet, relay logic, etc).

naumenkogs commented at 8:52 am on July 22, 2020: member

utACK 0a4f1422cd1c20e12a05d7ff1a2ef1d5e7c654bb

laanwj commented at 6:58 pm on July 22, 2020: member

utACK 0a4f1422cd1c20e12a05d7ff1a2ef1d5e7c654bb

laanwj merged this on Jul 22, 2020

laanwj closed this on Jul 22, 2020

sipa commented at 7:09 pm on July 22, 2020: member

Posthumous re-ACK 0a4f1422cd1c20e12a05d7ff1a2ef1d5e7c654bb

Some follow-ups I’d like:

List BIP339 support in doc/bips.md
Re-enable fetching of orphans from wtxid peers (e.g. https://github.com/bitcoin/bitcoin/pull/18044/files#r450687076, see https://github.com/ajtowns/bitcoin/commits/202007-wtxid-orphans).
Possibly streamline things a bit by having an explicit pair type (bool is_wtxid, uint256 txid_or_wtxid) - possibly just for tx requesting like above, or for more places (finding in the mempool and lookups in various structures would be cleaner with this as well, I think).

I’ll work on that if nobody does it instead, as it’ll be hard to rebase #19184 on top of this otherwise.

JeremyRubin commented at 8:21 pm on July 22, 2020: contributor

Can the hashes get a typed wrapper so that the unit256’s are e.g. subclassed to WTXID and TXID?

sipa commented at 8:23 pm on July 22, 2020: member

@JeremyRubin I think that’s hard, because a portion of the code relies on the fact that the wtxid of a non-witness tx is also its txid.

laanwj removed this from the "Blockers" column in a project

ariard commented at 1:04 am on July 23, 2020: member

Posthumous Code Review ACK 0a4f142

Another follow-up:

update tracking of multiple transactions with same txid, especially BroadcastTransaction, see #18044 (review)

sidhujag referenced this in commit bdb259705d on Jul 24, 2020

in src/txmempool.h:580 in 0a4f1422cd

577-    std::set<uint256> m_unbroadcast_txids GUARDED_BY(cs);
578+    /**
579+     * track locally submitted transactions to periodically retry initial broadcast
580+     * map of txid -> wtxid
581+     */
582+    std::map<uint256, uint256> m_unbroadcast_txids GUARDED_BY(cs);

jnewbery commented at 8:07 am on July 26, 2020:

I don’t think this change is necessary (or desirable). I guess it was prompted by this comment: #18038 (review) , but storing this as a map from txid->wtxid is unnecessary and confusing, since it obfuscates the purpose of this data, which is a set of transactions that may need to be rebroadcast.

The only place that the wtxid is read from this map is in ReattemptInitialBroadcast(), at which point we can fetch the transaction from the mempool (we very nearly do that with m_mempool.exists()), and get the wtxid from there.

cc @amitiuttarwar @ariard

ariard commented at 4:23 pm on July 26, 2020:

My mistake here, effectively you need to announce transactions from the unbroadcast set by wtxid to upgraded peers but as you point out it doesn’t mean we need to cache them as a solution.

Maybe we should precise unbroadcast set semantic wrt same txid, different wtxids, we only guarantee reattempt to the best mempool candidate known for a given txid at the time we call ReattemptInitialBroadcast, not insertion (AddUnbroadcastTx). #18044 (review) isn’t an issue.

jnewbery commented at 7:45 am on July 27, 2020:

I really think this “same txid, different wtxid” thing is a complete red herring. The mempool can only ever be aware of one witness for a transaction, so any attempt to announce the transaction via a different wtxid would fail anyway. As Suhas pointed out earlier (https://github.com/bitcoin/bitcoin/pull/18044#discussion_r458383811), to support tracking multiple witnesses we’d need to make significant changes many of our components. Besides, the unbroadcast set is transactions that we created ourselves, so we’re not expecting the witnesses to change.

sipa commented at 8:02 am on July 27, 2020:

Agree with @jnewbery. There is no real way that we can reasonable start tracking multiple witnesses for the same transaction (either in the mempool or the wallet, …), so unless this map’s storing of wtxid is needed for efficiency, it seems just a set of txid should be sufficient here.

amitiuttarwar commented at 5:22 am on July 28, 2020:

😮 wow that’s a great point @jnewbery. thank you!!

I thought during implementation I ran into something that prevented me from exclusively having txids on the set, but that evaluation is simply wrong. I’ve taken a first pass at reverting unbroadcast to a set here: https://github.com/amitiuttarwar/bitcoin/commit/f51cccdd6c97d1b1e45163d56ebd6b7d29a2f587. I’ll need to revisit with fresh eyes and then will PR.

I agree set makes more sense than a map- its simpler, communicates the intent better, and there’s no noticeable efficiency gain with the map. Since we check the transaction is the mempool, the difference between map vs set is the difference between calling CTxMemPool::get vs CTxMemPool::exists. This boils down to mapTx.find(hash) vs mapTx.count(hash), which both have log(n) complexity (according to the boost docs I found).

JeremyRubin commented at 5:46 am on July 28, 2020:

@amitiuttarwar mapTX is a multiindex and we use a hashed index not a ordered index so it’s O(1) lookup. Not sure exactly how mapTX relates to m_unboardcast_txids though…

jnewbery commented at 6:56 am on July 28, 2020:

@amitiuttarwar that patch looks correct from a quick skim. I’ll review fully once you open a PR.

As @JeremyRubin points out, we’re using a boost::multi_index::hashed_unique index for the wtxid index, which has constant time search/retrieval.

amitiuttarwar commented at 10:29 pm on September 4, 2020:

reverted to set in https://github.com/bitcoin/bitcoin/pull/19879/commits/cb79b9dbf4cd06e17c8c65b36bf15c3ea2641de4, #19879

in src/net_processing.cpp:1476 in 0a4f1422cd

1474-    connman.ForEachNode([&txid](CNode* pnode)
1475+    connman.ForEachNode([&txid, &wtxid](CNode* pnode)
1476     {
1477-        pnode->PushTxInventory(txid);
1478+        AssertLockHeld(cs_main);
1479+        CNodeState &state = *State(pnode->GetId());

jnewbery commented at 8:15 am on July 26, 2020:

We shouldn’t be indiscriminately taking the address of a return value which may be nullptr. I know the same pattern exists in a few other places, but really we should check for existence here before dereferencing the pointer.

amitiuttarwar commented at 10:28 pm on September 4, 2020:

addressed in https://github.com/bitcoin/bitcoin/pull/19879/commits/fc66d0a65cdc52a3b259effe0c29b5eafb1b5ff5, #19879

laanwj referenced this in commit 17de75b028 on Jul 30, 2020

sidhujag referenced this in commit fc7030c8fe on Jul 31, 2020

in src/net_processing.cpp:3025 in 0a4f1422cd

3024         } else {
3025-            if (!tx.HasWitness() && state.GetResult() != TxValidationResult::TX_WITNESS_MUTATED) {
3026-                // Do not use rejection cache for witness transactions or
3027-                // witness-stripped transactions, as they can have been malleated.
3028-                // See https://github.com/bitcoin/bitcoin/issues/8279 for details.
3029+            if (state.GetResult() != TxValidationResult::TX_WITNESS_STRIPPED) {

jnewbery commented at 10:27 am on July 31, 2020:

The else if (tx.HasWitness() && RecursiveDynamicUsage(*ptx) < 100000) block below is now dead code and should be removed (anything that doesn’t go in this if branch is TX_WITNESS_STRIPPED and therefore doesn’t have a witness)

amitiuttarwar commented at 10:29 pm on September 4, 2020:

addressed in: https://github.com/bitcoin/bitcoin/pull/19879/commits/125c0381266e0e05a408f8e1818501ab73d29110, #19879

fanquake referenced this in commit f7c73b03d9 on Jul 31, 2020

sidhujag referenced this in commit 1c6711468d on Jul 31, 2020

in src/validation.cpp:5091 in c7eb6b4f1f outdated

5087         try {
5088-          std::set<uint256> unbroadcast_txids;
5089           file >> unbroadcast_txids;
5090           unbroadcast = unbroadcast_txids.size();
5091-
5092-          for (const auto& txid : unbroadcast_txids) {

JeremyRubin commented at 8:28 pm on July 31, 2020:

Clarification: does moving this out of the try block have any behavior change? Is it possible that a semi-corrupt unbroadcast_txids would partially deserialize and then have only some entries?

amitiuttarwar commented at 0:34 am on August 25, 2020:

the try block was introduced to specifically catch the error when upgrading around not having an unbroadcast set written to disk. so I think having the initialization & iteration separate actually captures that intent better.

but lmk if you think of any specific ways this logic could be problematic

in src/validation.cpp:5091 in c7eb6b4f1f outdated

5091-
5092-          for (const auto& txid : unbroadcast_txids) {
5093-            pool.AddUnbroadcastTx(txid);
5094-          }
5095         } catch (const std::exception&) {
5096           // mempool.dat files created prior to v0.21 will not have an

JeremyRubin commented at 8:29 pm on July 31, 2020:

Suggest: clearing unbroadcast_txids here in case has garbage data.

amitiuttarwar commented at 0:53 am on August 25, 2020:

looking into this- is there really a way that the CAutoFile >> operator could cause a silent/partial failure? as I mentioned previously the idea of this try catch is just for a smooth upgrade. To elaborate further: so we don’t print Failed to deserialize mempool data on disk: %s. Continuing anyway." when everything actually went fine. the current intention is to remove this in 0.22, so if there’s a possibility for unbroadcast_txids to have garbage data written, I’d like to think this through more carefully to find a long term solution.

JeremyRubin commented at 1:12 am on August 25, 2020:

Yes I believe so? Fortunately for std::map we deserialize an element and then insert it into the map, but is a possibility that we say that there should be 10 elements and there are only 5 elements, or there are 5 elements and their are actually 10. In the event the file is corrupt, we should likely treat it as if the entire thing is corrupt, rather than continuing to process.

Perhaps we should add an exception to throw from Unserialize if there are not the correct amount of entries?

I’m not sure to the degree we need to worry about files on our own local being corrupt, and there are other forms of corruption that can occur. But because it is a behavior change, I’m noting it.

JeremyRubin commented at 1:14 am on August 25, 2020:

to be clear the issue only comes up when you don’t have enough elements or when you have a half element presently.

in src/validation.cpp:5096 in c7eb6b4f1f outdated

5097           // unbroadcast set. No need to log a failure if parsing fails here.
5098         }
5099-
5100+        for (const auto& elem : unbroadcast_txids) {
5101+            // Don't add unbroadcast transactions that didn't get back into the
5102+            // mempool.

JeremyRubin commented at 8:32 pm on July 31, 2020:

Is this correct? Should this transaction get queued somewhere for rebroadcasting? Or is it a known precondition that because it failed to get into the mempool before this line it will fail again?

JeremyRubin commented at 8:33 pm on July 31, 2020:

I can’t think of why we’d want it in the unbroadcast if it didn’t make it back in the mempool, but just double checking.

amitiuttarwar commented at 0:46 am on August 25, 2020:

the unbroadcast set should always be a subset of the mempool. it just maintains txids, so we don’t have the capability to do much if its not found.

Should this transaction get queued somewhere for rebroadcasting?

do you mean re-attempt mempool submission? bc we def wouldn’t want to broadcast or rebroadcast a transaction we don’t have in our mempool!

JeremyRubin commented at 1:21 am on August 25, 2020:

yes. it’s possible that we may want to resubmit it to the mempool if it was previously in our mempool. What changed to cause it to no longer be accepted?

in src/net_processing.cpp:4250 in 08b39955ec outdated

4245@@ -4246,6 +4246,11 @@ bool PeerLogicValidation::SendMessages(CNode* pto)
4246                             if (ret.second) {
4247                                 vRelayExpiration.push_back(std::make_pair(nNow + std::chrono::microseconds{RELAY_TX_CACHE_TIME}.count(), ret.first));
4248                             }
4249+                            // Add wtxid-based lookup into mapRelay as well, so that peers can request by wtxid
4250+                            auto ret2 = mapRelay.emplace(ret.first->second->GetWitnessHash(), ret.first->second);

JeremyRubin commented at 8:45 pm on July 31, 2020:

This line relies on the idea that it is cryptographically hard to find a TXID such that another WTXID (for a different txid) is the same. It’s not clear to me that this is provably true because there may be serialization ambiguity.

My preference would be to have two different mapRelays which have a strongly typed key.

It’s probably low risk, but maybe worth documenting or at least somehow guaranteeing there can never be ambiguity. I don’t think there can be? Because

0Definition of txid remains unchanged: the double SHA256 of the traditional serialization format:
1
2  [nVersion][txins][txouts][nLockTime]
3  
4
5A new wtxid is defined: the double SHA256 of the new serialization with witness data:
6
7  [nVersion][marker][flag][txins][txouts][witness][nLockTime]

but it’s not clear to me that with some setting of marker and flag you could cause an overlap.

JeremyRubin commented at 9:20 pm on July 31, 2020:

confirmed that not exists tx tx2. txid(tx) == wtxid(tx2) && tx != tx2

 0[7/31/20 14:09] <jeremyrubin> sipa: is this proven: not exists tx tx2. txid(tx) == wtxid(tx2) && tx != tx2
 1[7/31/20 14:10] <sipa> jeremyrubin: yes, assuming double-sha256 collision resistance that statement is equivalent to ser_without_witness(tx) == ser_with_witness(tx2)
 2[7/31/20 14:11] <sipa> either both tx and tx2 don't have a witness, and the statement is obviously false
 3[7/31/20 14:11] <sipa> or tx2 has a witness, in which case the serialization definitely differs, due to the flag byte
 4[7/31/20 14:13] <jeremyrubin> but it's not possible in tx2 that the flag + marker can be read as the length field for the txins?
 5[7/31/20 14:13] <jeremyrubin> because a leading 0 byte means it can't be a length right?
 6[7/31/20 14:13] <jeremyrubin> in which case it would be marker and not flag that guarantees uniqueness?
 7[7/31/20 14:15] <sipa> a valid transactions cannot have 0 inputs
 8[7/31/20 14:15] <sipa> the witness serialization is intended to be unambiguously different from the old serialization
 9[7/31/20 14:15] <sipa> (except that we later discovered that people do pass around transactions with 0 inputs, where things break... but that's not relevant for valid network transactions)(
10[7/31/20 14:17] <jeremyrubin> gotcha. and the unambiguity is because it's a compactsize field preceding the vIn
11[7/31/20 14:18] <jeremyrubin> so 0 is always read as a 1 byte number in any context
12[7/31/20 14:18] <-- proofofkeags (~proofofke@174-29-8-246.hlrn.qwest.net) has left this server (Remote host closed the connection).
13[7/31/20 14:19] <sipa> indeed

in src/net_processing.cpp:153 in 85c78d54af outdated

150@@ -151,6 +151,7 @@ struct COrphanTx {
151 };
152 RecursiveMutex g_cs_orphans;
153 std::map<uint256, COrphanTx> mapOrphanTransactions GUARDED_BY(g_cs_orphans);

JeremyRubin commented at 8:51 pm on July 31, 2020:

Because presumably WTXID becomes the primary lookup method, someone may prefer to in the future switch the roles of mapOrphanTransactions and g_orphans_by_wtxid so that the g_orphans_by_wtxid is the owning map. This is a small performance win avoiding an extra iterator lookup.

JeremyRubin commented at 9:06 pm on July 31, 2020: contributor

Post merge review up to ac88e2eb619821ad7ae1d45d4b40be69051d3999

Couple questions

in test/functional/p2p_segwit.py:1300 in 8d8099e97a outdated

1320@@ -1297,8 +1321,6 @@ def test_tx_relay_after_segwit_activation(self):
1321         # Node will not be blinded to the transaction
1322         self.std_node.announce_tx_and_wait_for_getdata(tx3)
1323         test_transaction_acceptance(self.nodes[1], self.std_node, tx3, True, False, 'tx-size')
1324-        self.std_node.announce_tx_and_wait_for_getdata(tx3)

instagibbs commented at 1:03 pm on August 2, 2020:

I got really confused reviewing master’s test. The surrounding test is dead, with incorrect comments asserting that we are checking for blinding. Seems this was deleted just to make the test pass :grimacing:

instagibbs referenced this in commit bcf5d91568 on Aug 3, 2020

fanquake referenced this in commit 1c4f59728c on Sep 15, 2020

sidhujag referenced this in commit aae58c49b8 on Sep 16, 2020

laanwj referenced this in commit c2c4dbaebd on Oct 14, 2020

sidhujag referenced this in commit 7c94367064 on Oct 16, 2020

laanwj referenced this in commit a339289c2e on Nov 5, 2020

deadalnix referenced this in commit 145650dc37 on Nov 24, 2020

deadalnix referenced this in commit 8ea2e27baa on Sep 22, 2021

DrahtBot locked this on Feb 15, 2022

Use wtxid for transaction relay #18044

Conflicts