Halt processing of unrequested transactions v2

ariard commented at 10:27 am on August 2, 2024: contributor

This is a re-work of #21224.

Motivation is still the same to limit the DoS exposure in face of some DoSy behaviors from tx-relay peers.

Difference w.r.t 21224 is adding a protocol upgrade (REJECT_UNSOLICITED_TX_VERSION) for the rejection of unsolicited txn solely to happen for new upgraded peers. That way no disruption of all the undocumented clients and node currently participating in the transaction-relay network.

Opening as a draft - There should be indeed test, there a bunch of old on 21224 branch.

Add REJECT_UNSOLICITED_TX_VERSION version

This commit does not upgrade the negotiated PROTOCOL_VERSION.

fd5abfaad8

DrahtBot commented at 10:27 am on August 2, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30572.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept NACK	vasild

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30110 (refactor: TxDownloadManager + fuzzing by glozow)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

ariard marked this as a draft on Aug 2, 2024

DrahtBot added the label CI failed on Aug 2, 2024

DrahtBot commented at 12:16 pm on August 2, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28260785704

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

glozow added the label P2P on Aug 2, 2024

gmaxwell commented at 4:33 pm on August 3, 2024: contributor

The search is being done for every received transaction, even when code unconditionally won’t reject it due to the protocol versions. This could be very simply avoided by moving the “!pfrom.HasPermission(NetPermissionFlags::Relay) && pfrom.GetCommonVersion() >= REJECT_UNSOLICITED_TX_VERSION” test up to surround the expected.

Alternatively, the search could also be gated on NET LOGGING enabled (e.g. net logging || (version && no permission)), and then it could log the unexpected TXN even if it will not ignore it (the disposition could be in the log message). This would make it easier to monitor the behavior (when I used to monitor for this I carried a local patch).

Given that the check now only applies to a newer protocol version could it be made to disconnect the peer? This may cause a more rapid discovery of bugs in users code since being disconnected is much more obvious than having the txn silently dropped.

I’m somewhat dubious of the value of this behind a version check: the abusive hosts that intentionally do this will just continue to be an older version and I don’t think it’s realistic to cut off older versions completely. But I think this PR is vastly superior to doing nothing, and perhaps once the new behavior is widely deployed the test could be extended to older versions (presumably in ignore rather than disconnect form, if this is changed to disconnect).

gmaxwell commented at 7:27 pm on August 3, 2024: contributor

Having gone and looked at the old discussion, I see some skepticism for the value of this and I don’t see anyone pointing out the reason I think it’s important.

At times there have been parties that mass connect to every node on the network and blast them with transactions, presumably in an effort to make sure the transactions get propagated and accepted. This is abusive particularly because it wastes bandwidth when it gives nodes transactions they already know or have already requested from someone else. It’s also abusive because it wastes connection slots… a mass connector uses thousands of times the slot capacity of an ordinary nodes. If the conduct doesn’t work (or just gets the peer disconnected) then it will be of no value and will likely stop (or not be started again in the future, if it’s not happening right now).

The same issue also previously existed for full blocks with some misguided miners blindly spamming every node they could connect with a full block every time they solved one, effectively DOSing the whole network (and probably working contrary to their intent). The best and often only way to correct conduct like this is to just make it not even conjecturally provide a benefit.

Perhaps one way to hasten the deployment is to enforce this on P2Pv2 connections immediately in addition to or instead of the protocol version check, custom announcement code isn’t likely to use v2 yet. As far as I know all widely deployed software already does the right thing.

mzumsande commented at 1:08 am on August 4, 2024: contributor

Just wanted to mention that this change does not go well together with #29415 which suggests for bitcoin core to send unrequested transaction in the new “private broadcast” mode.

gmaxwell commented at 6:07 am on August 4, 2024: contributor

@mzumsande Good catch, but that’s just a flaw in PR29415– a gratuitous protocol violation, since the only cost of complying with the protocol is some tens of milliseconds of round trip. This is sort of error that wouldn’t be made if nodes enforced the behavior by ignoring unsolicited transactions.

ariard commented at 6:49 pm on August 5, 2024: contributor

Indeed, preventing mass connection as it gives an edge in some other attacks is one of the motivation behind this change. This is distinct from limiting CPU DoS (whatever the DoS margin already existent from MAX_PEER_TX_ANNOUNCEMENT at full steam, reducing that maximum number is irrelevant if you can bypass it with unsolicited txn) and other p2p DoS.

Pushed a new commit to discourage and disconnect upgraded peers (>= REJECT_UNSOLICITED_TX_VERSION), though non upgraded peers are non affected. There is still the risk to be over strict with deployed softwares not doing the right things, i.e there is a lot of weird behaviors in other p2p bitcoin stacks, breaking the tx broadcast of some obscure lightning wallet ain’t a small bug. On the other hand, being liberal with non upgraded peers all the DoS vectors can be still be abused. A better approach could be to revert ae60d485, i.e gradual increase of the discouragement score were non upgraded peers would be allow a number of score points before to be disconnected (e.g for each unsolicited tx 10 discouragement score points bump).

W.r.t “private broadcast” mode, I think it’s better to have it fixed to follow the INV / GETDATA / TX flow, otherwise it sounds a loophole for an attacker to trigger DoS / connection slots mass waste if “private peers” are not punished for unsolicited traffic. It’s better to do it now, rather to introduce a node fingerprint between hypothetical version of “private broadcast” mode.

in src/txrequest.cpp:680 in 83c9cd8d3d outdated

674@@ -675,6 +675,19 @@ class TxRequestTracker::Impl {
675         if (it != m_index.get<ByPeer>().end()) MakeCompleted(m_index.project<ByTxHash>(it));
676     }
677 
678+    bool ExpectedTx(NodeId peer, const uint256& txhash)
679+    {
680+        auto it_hash = m_index.get<ByTxHash>().lower_bound(ByTxHashView{txhash, State::REQUESTED, 0});
681+        // We need to traverse all the REQUEST / COMPLETED announcements to verify we effectively

sipa commented at 7:23 pm on August 5, 2024:

You can do this more efficiently by looking for ByPeerView{peer, false, txhash}, I think (as opposed to iterating)?

ajtowns commented at 0:46 am on August 15, 2024:

ReceivedResponse already does this, could just have it return true in that case, and false otherwise (instead of void), and issue the error if neither the txid or wtxid resulted in a true result?

ariard commented at 7:23 am on August 17, 2024:

The ExpectedTx have been updated to look by peer {peer, false, txhash} rather than iterating as we’re asking information for a given peer. I’ve not changed ReceiveResponse function call type, to avoid having unused value compiler errors, neither benchmarked yet to compare more the 2 approach.

gmaxwell commented at 5:23 pm on August 6, 2024: contributor

gradual increase of the discouragement score were non upgraded peers would be allow a number of score points before to be disconnected

That’s like not-found, – an idea that sounds attractive on it’s face but on examination causes problems. For behavior that can’t be prevented (like, say, inv-ing a transaction to you that already know), then it should obviously never disconnect for that. For behavior that can be prevented like sending a txn before an inv, sending an invalid block, etc.., peers are either compatible or they’re not. If they’re not you want them disconnected ASAP so they can each replace the connection with a compatible peer, and if one is buggy you want the operator finding out ASAP too, ideally as soon as they deploy the buggy code. letting something rack up a dos score hides the issue temporarily but then under different transaction load its suddenly getting disconnected, but since nothing changed for the user they may not be paying much attention.

Is anyone running this currently with the code set to always enforce or just log? it would be useful to know at what rate nodes are currently seeing unsolicited. In the past, I’d see waves of them… months with absolutely none and then periods where a single IP was mass connecting the whole network and bursting them out. I don’t think I ever observed any widely deployed software doing this, though it easily could have changed. I’m particularly interested if any protocol v2 nodes engage in it, since I think that’s a viable way to immediately enforce without breaking anything.

in src/net_processing.cpp:4561 in 5d86b2bf51 outdated

4553@@ -4554,6 +4554,19 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
4554 
4555         LOCK2(cs_main, m_tx_download_mutex);
4556 
4557+        // Do not process unrequested transactions to mitigate potential DoS risks.
4558+        // We check both identifiers as txid mode may happen with wtxidrelay peers
4559+        // due to parent-orphan fetching.
4560+        if (!pfrom.HasPermission(NetPermissionFlags::Relay) && pfrom.GetCommonVersion() >= REJECT_UNSOLICITED_TX_VERSION) {
4561+            bool is_expected = tx.HasWitness() ? m_txrequest.ExpectedTx(pfrom.GetId(), wtxid) ||

gmaxwell commented at 5:26 pm on August 6, 2024:

trailing whitespace, btw.

ariard commented at 7:19 am on August 17, 2024:

fixed on latest push - good to have people not relying on github for code review.

maflcko commented at 7:53 am on August 7, 2024: member

I’m particularly interested if any protocol v2 nodes engage in it, since I think that’s a viable way to immediately enforce without breaking anything.

Just for context, there is an idea for a “BIP324 proxy” for legacy light clients: https://delvingbitcoin.org/t/bip324-proxy-easy-integration-of-v2-transport-protocol-for-light-clients-poc/678 . However, it is experimental, only works for outbound connections, requires the light client to be patched right now (and may have other design issues). Just wanted to mention it for context, because a legacy light client behind a BIP324 proxy will likely not be visible on the network today, but theoretically, it may be in the future. In any case, it seems reasonable to assume anyone opting to run the proxy first made sure that their software does not send unrequested transactions. (If they didn’t, they’ll hopefully notice the relay issue quickly)

andrewtoth commented at 2:16 pm on August 7, 2024: contributor

See also https://github.com/alfred-hodler/pushtx/ which sends unsolicited TXs.

gmaxwell commented at 3:00 pm on August 7, 2024: contributor

This PR exposes that nodes are currently making duplicate transaction requests not too infrequently. ::facepalm:: What happens is that when Peer1 announces {parent,child} and Peer2 announces {parent,child} and we request parent from 1 and child from 2, but 2 responds faster, then orphan handling will request parent by txid from 2. Then it gets the parent by txid from 2. Then when peer 1 replies to the request for parent (sending a duplicate transaction), the current behavior of this PR falsely thinks that was an unrequested txn because its tracking was removed after the transaction was from peer 2 was accepted.

For this PR it’s probably just best to silently ignore AlreadyHave transactions (more precise handling of things that are requested but outstanding would be more correct but requires care in memory management), but the duplicate requests should separately get fixed. At first blush it seems hard to fix them completely but they could be made less common by significantly delaying the requests made on behalf of the orphan resolution, so they get satisfied by the already in flight request for the same txn by wtxid when it comes in… which should almost always be the case outside of a recent startup except when the parent was never offered because it was below the nodes fee filter.

mzumsande commented at 8:40 pm on August 7, 2024: contributor

This PR exposes that nodes are currently making duplicate transaction requests not too infrequently. (…)

I think that this will also happen with just one peer and chains of transactions:

1.) orphan X is received, put into orphanage, parent P requested 2.) a child Y of X is received -> It’s also an orphan, parent X is requested again (by txid!) because AlreadyHaveTx does not check the orphanage by txid (reason) 3.) P is received and accepted 4.) orphan resolution accepts X, removes the tx request for X (in ProcessValidTx()) even though there is an outstanding request 5.) X is received again, it’s seen as unrequested. The peer would be disconnected even though it did nothing wrong.

For this PR it’s probably just best to silently ignore AlreadyHave transactions

+1

gmaxwell commented at 6:43 pm on August 10, 2024: contributor

@mzumsande Good point– though the logic in the comment ought not apply when the transaction in the orphan pool comes from the same peer, in that case the extra request would be purely redundant.

vasild commented at 12:12 pm on August 13, 2024: contributor

INV+GETDATA is now implemented in #29415 so that it does not stand in the way of this PR, but:

#21224 was frowned upon for a few reasons and one of them was breaking existent tools. This PR addresses just that by introducing a new protocol version. It does not address the other concerns.

On the new protocol idea - we don’t expect that an attacker will voluntarily switch to the new protocol so that their attack can be thwarted, right? If this PR is merged and at some point in the future, we drop unsolicited transactions from old peers, then that is the same as going back to #21224, without this PR because that would break existent tools.

The OP mentions “undocumented clients” and in the discussions there seems to creep the idea that sending unsolicited transactions is a violation of the protocol, undocumented behavior, an error or wrong in some way without providing any reference to back those claims. On the contrary, the oppotite seems to be the case: sending unsolicited transactions is explicitly documented and even if it was not, in the absence of documentation then the source code is the spec and Bitcoin Core accepts unsolicited transactions.

Correct me if I am wrong but even if this is implemented, an attacker can send up to 5000 (MAX_PEER_TX_ANNOUNCEMENTS) announcements to which the victim will respond (after a few seconds) by requesting 5000 transactions at once. If one transaction takes 1 second to reject, then the attack is successful. The motivation behind MAX_PEER_TX_ANNOUNCEMENTS is to limit the amount of memory used for announcements. I guess it is too high to stop CPU DoS with slow-to-reject transactions… this is already repeating the arguments from #21224.

In summary, requiring INV+GETDATA does not solve the problem and breaks existent tools. Concept NACK because of this.

It would be better to throttle the processing of newly arrived transactions, globally and/or per-peer, regardless of whether they are solicited or unsolicited.

gmaxwell commented at 5:42 pm on August 13, 2024: contributor

@vasild you appear to have disregarded my above message about this change is desirable. Was it unclear?

With respect to breakage– it’s not uncommon that fixing bugs requires fixing bugs in other software. Fortunately, it’s usually possible to balance the issues. One way to do so is to phase in changes so that older software has an opportunity to fix itself, and once the fixed behavior is ubiquitous then the compatibility issue no longer exists. But if you over narrowly analyze the first step in such a transition you might mistakenly argue that it was pointless because the old behavior still exists.

vasild commented at 1:18 pm on August 14, 2024: contributor

@vasild you appear to have disregarded my above message about this change is desirable. Was it unclear?

You mean this one: “At times there have been parties that mass connect…”? If that would be the driving force behind this, then maybe it would be better to put it in the PR description.

So this is not about CPU DoS attack by slow-invalid transactions, but about somebody doing something stupid and using the Bitcoin network in an ineffective way, sending 500 bytes instead of 30? Requiring INV+GETDATA will probably stop this, but is it such a problem to be worth the effort? How often does it happen? Is it something that happened once 7 years ago, or is it happening regularly, e.g. every week? What % of the entire network traffic is wasted due to that per month?

You also mentioned wasting inbound slots, but requiring INV+GETDATA will not solve that - the mass connect tool will be updated to do INV+GETDATA+TX instead of just TX. Those are short-lived connections anyway.

fixing bugs requires fixing bugs in other software

I do not think that sending or accepting unsolicited transactions is a bug - it is documented, has been working like this forever and the network is just fine with it.

ajtowns commented at 0:31 am on August 15, 2024: contributor

but requiring INV+GETDATA will not solve that - the mass connect tool will be updated to do INV+GETDATA+TX instead of just TX

Sending INV and waiting for a GETDATA does mitigate the bandwidth involved substantially: when the txs are duplicates, you’re now only receiving the w/txid instead of the full transaction, reducing the wasted bandwidth by perhaps a factor of at least ~4x (32B vs ~128B for parent txid/sig/output), and probably substantially more. I think the inbound slots issue is just “if we know we don’t want this peer, lets discourage immediately so the slot is free”, so an optimisation rather than addressing a DoS concern per se.

vasild commented at 6:35 am on August 15, 2024: contributor

Another aspect to consider wrt wasting bandwidth: if the sender knows the recipient does not have the transaction (e.g. when a wallet sends its newly created transaction), then doing the extra INV+GETDATA round-trip is a waste.

sipa commented at 10:55 am on August 15, 2024: member

@vasild Yes, but that gratuitously reveals to its peers that it is the transaction originator too (not really an issue in your private broadcast setting, but in nearly all others, this would be undesirable for that reason).

[p2p]: Stop unrequested transactions processing

To mitigate potential DoS risks, stop unrequested transaction
processing early for upgraded peers (`REJECT_UNSOLICITED_TX_VERSION`),
unless the peer has `NetPermissionsFlags::Relay`.

Unrequested transactions is defined as one for which we have not
issued GETDATAs on this connection link.

32e6bc8964

[draft] disconnect upgraded peers b29303719b

[perf] make is_expected check conditional dd0d06d3ea

[perf] iterate by peer rather than status c7bc444d02

ariard force-pushed on Aug 17, 2024

ariard commented at 7:19 am on August 17, 2024: contributor

Addressed all code comments at c7bc444d02

—————————————— @gmaxwell

That’s like not-found, – an idea that sounds attractive on it’s face but on examination causes problems. For behavior that can’t be prevented (like, say, inv-ing a transaction to you that already know), then it should obviously never disconnect for that. For behavior that can be prevented like sending a txn before an inv, sending an invalid block, etc.., peers are either compatible or they’re not. If they’re not you want them disconnected ASAP so they can each replace the connection with a compatible peer, and if one is buggy you want the operator finding out ASAP too, ideally as soon as they deploy the buggy code. letting something rack up a dos score hides the issue temporarily but then under different transaction load its suddenly getting disconnected, but since nothing changed for the user they may not be paying much attention.

Surely, in the ideal you would like your whole set of inbound / outbound peers to be fully composed of non-buggy peers, respecting behaviors like sending a txn before an inv or not sending an invalid block. In practice, a full-node is a network-exposed daemon, so it’s good to accomodate a wide variety of network peers to allow to roll out multiple versions of the protocol at the same time. On the internet, it’s not like we have 2 versions of the IP protocol used in production…

On the silent rack up you’re describing, this can be indeed an issue as suddenly you’re getting a buggy peer disconnected, while this connection slot could have been used for a reliable peer. I believe it’s more making the point that bad discouragement dos score could be encompassed in the automated inbound slot eviction heuristics.

Is anyone running this currently with the code set to always enforce or just log? it would be useful to know at what rate nodes are currently seeing unsolicited. In the past, I’d see waves of them… months with absolutely none and then periods where a single IP was mass connecting the whole network and bursting them out. I don’t think I ever observed any widely deployed software doing this, though it easily could have changed. I’m particularly interested if any protocol v2 nodes engage in it, since I think that’s a viable way to immediately enforce without breaking anything.

No, I’ve not been running a full-node with the old patch to monitor unsolicited transactions. As you’re pointing out whatever buggy lightweight clients doing that behavior, there is still the situation of a single IP mass connecting the whole network, and potentially doing so for deanonymization purposes.

In any way, as you’re suggesting I think too upgrading to a protocol v2 like TX_PROTOCOL_V2 can be a practical way to introduce this more restrictive behavior on inv-txn flow. So far other examples of p2p behavior deprecation have been things like NODE_BLOOM version bits. @maflcko

Just for context, there is an idea for a “BIP324 proxy” for legacy light clients: https://delvingbitcoin.org/t/bip324-proxy-easy-integration-of-v2-transport-protocol-for-light-clients-poc/678 . However, it is experimental, only works for outbound connections, requires the light client to be patched right now (and may have other design issues). Just wanted to mention it for context, because a legacy light client behind a BIP324 proxy will likely not be visible on the network today, but theoretically, it may be in the future. In any case, it seems reasonable to assume anyone opting to run the proxy first made sure that their software does not send unrequested transactions. (If they didn’t, they’ll hopefully notice the relay issue quickly)

Yes, I think that means if the bip324 proxy got more used by legacy light clients, version agent and protocol spoken it is harder to rely on to monitor solely on the legacy clients characteristics. @gmaxwell

This PR exposes that nodes are currently making duplicate transaction requests not too infrequently. ::facepalm:: What happens is that when Peer1 announces {parent,child} and Peer2 announces {parent,child} and we request parent from 1 and child from 2, but 2 responds faster, then orphan handling will request parent by txid from 2. Then it gets > the parent by txid from 2. Then when peer 1 replies to the request for parent (sending a duplicate transaction), the current behavior of this PR falsely thinks that was an unrequested txn because its tracking was removed after the t> ransaction was from peer 2 was accepted.

Of course, orphaness is terrible for this type of restrictiosn on the inv-tx flow, I already pointed out so in my original PR.

“I think we have a fundemental issue with orphans processing. Orphanness source isn’t strictly assignable. I.e, Mallory might provoke mempool conflicts between Alice and Bob, sending Y to Alice, Y’ to Bob. From then, Mallory can send a set of Y childrens to Alice, she will accept them and relay forward to Bob. Alice will be the one swalloing the DoS penalty. And now you have a vector of tx-relay jamming…”

For this PR it’s probably just best to silently ignore AlreadyHave transactions (more precise handling of things that are requested but outstanding would be more correct but requires care in memory management), but the duplicate requests should separately get fixed. At first blush it seems hard to fix them completely but they could be made less common by significantly delaying the requests made on behalf of the orphan resolution, so they get satisfied by the already in flight request for the same txn by wtxid when it comes in… which should almost always be the case outside of a recent startup except when the parent was never offered because it was below the nodes fee filter.

The AlreadyHave can be a good idea, but I think now you’re introducing a bandwidth free relay vector at last, where a single parent can be concurrently replayed through many transaction-relay links. I’m not sure either delayed orphan resolution is necessarily a working way as now you can have slow propagation of pre-signed low-fees transactions that are only interesting with high-fee child. And it’s the simple chain of transactions situations, you can have a same child txid, different wtxid / different feerates rebinding on the same parent… @mzumsande

1.) orphan X is received, put into orphanage, parent P requested 2.) a child Y of X is received -> It’s also an orphan, parent X is requested again (by txid!) because AlreadyHaveTx does not check the orphanage by txid (reason) 3.) P is received and accepted 4.) orphan resolution accepts X, removes the tx request for X (in ProcessValidTx()) even though there is an outstanding request 5.) X is received again, it’s seen as unrequested. The peer would be disconnected even though it did nothing wrong.

If I’m understanding correctly, you’re pointing to sequential orphan resolution of a chain of transaction P <- X <- Y, yes I think you could have X the intermediary transaction element seen as unrequested. See my point above to gmaxwell, as I think the AlreadyHave presents already issues with adding a free relay vector on simple 2-depth chain of transactions. @vasild

#21224 was frowned upon for a few reasons and one of them was breaking existent tools. This PR addresses just that by introducing a new protocol version. It does not address the other concerns.

Addressing all the other concerns mentioned in the old PR means purely re-writing the ~10k of LoC of the transaction-relay stack from scratch in cpp or rust…It’s an interesting exercise and certainly one that someone could actually do so thanks to the multiprocess interfaces, but in the meantime it will leave forever exposed current bitcoin core full-nodes to DoS vectors arising from hard to verify transactions.

On the new protocol idea - we don’t expect that an attacker will voluntarily switch to the new protocol so that their attack can be thwarted, right? If this PR is merged and at some point in the future, we drop unsolicited transactions from old peers, then that is the same as going back to #21224, without this PR because that would break existent tools.

Of course we don’t live in the world of Care Bears so we cannot expect attackers running potentially modified software to adhere to the new protocol, period. Though here you’re missing all the salt of the past conversations on the old PR, where there is a trade-off between breaking either existent tools or letting exposed current nodes.

The OP mentions “undocumented clients” and in the discussions there seems to creep the idea that sending unsolicited transactions is a violation of the protocol, undocumented behavior, an error or wrong in some way without providing any reference to back those claims. On the contrary, the oppotite seems to be the case: sending unsolicited transactions is explicitly documented and even if it was not, in the absence of documentation then the source code is the spec and Bitcoin Core accepts unsolicited transactions.

There is no BIP standard documenting this behavior about transaction-relay and there is no guarantee that other full-nodes softwares are respecting the non-reject of unsolicited transactions. In the absence of documentation, it’s indeed wise to refer to the code, though in matters of transaction-relay messages usually it has been something fleshout from iterations between full-nodes softwares and application clients (e.g the MEMPOOL message), and this I think since the early days of lightweight clients deployment in bitcoin circa ~2011.

Correct me if I am wrong but even if this is implemented, an attacker can send up to 5000 (MAX_PEER_TX_ANNOUNCEMENTS) announcements to which the victim will respond (after a few seconds) by requesting 5000 transactions at once. If one transaction takes 1 second to reject, then the attack is successful. The motivation behind MAX_PEER_TX_ANNOUNCEMENTS is to limit the amount of memory used for announcements. I guess it is too high to stop CPU DoS with slow-to-reject transactions… this is already repeating the arguments from #21224.

This is correct that the current value of 5000 (MAX_PEER_TX_ANNOUNCEMENTS) can still leave some DOS exposure, and that reducing it could be considered in future PRs, however without that initial step of rejecting the unsolicited transactions over the MAX_PEER_TX_ANNOUNCEMENTS limit, this is quite useless to consider at first the default value of MAX_PEER_TX_ANNOUNCEMENTS.

In summary, requiring INV+GETDATA does not solve the problem and breaks existent tools. Concept NACK because of this.

It would be better to throttle the processing of newly arrived transactions, globally and/or per-peer, regardless of whether they are solicited or unsolicited.

Randomly throttling the processing of newly arrived solicited and unsolicited transactions could make the mitigation worst than the DoS exposure reduction, as this could a transaction propagation censorship vector.

You mean this one: “At times there have been parties that mass connect…”? If that would be the driving force behind this, then maybe it would be better to put it in the PR description.

Generally, this is not the job of bitcoin core contributors to write a full description on how transactions relay behaviors could be used by incompetent chainalysis companies to conduct deanonymization attacks on the network…

So this is not about CPU DoS attack by slow-invalid transactions, but about somebody doing something stupid and using the Bitcoin network in an ineffective way, sending 500 bytes instead of 30? Requiring INV+GETDATA will probably stop this, but is it such a problem to be worth the effort? How often does it happen? Is it something that happened once 7 years ago, or is it happening regularly, e.g. every week? What % of the entire network traffic is wasted due to that per month?

Leveling up the bar w.r.t to potential deanonymization is one concern. A full-node being feeded unsolicited hard to verify transactions that could be DoSy can be another concern.

You also mentioned wasting inbound slots, but requiring INV+GETDATA will not solve that - the mass connect tool will be updated to do INV+GETDATA+TX instead of just TX. Those are short-lived connections anyway.

A mass connect tool might not open short-lived connections, all depends the purpose of the tool.

I do not think that sending or accepting unsolicited transactions is a bug - it is documented, has been working like this forever and the network is just fine with it.

Not really, there is no such documentation about unsolicited transaction in the codebase at commit 4405b78d, and times have proven this version of the software had significant and major bugs such as BIP30. @ajtowns

Sending INV and waiting for a GETDATA does mitigate the bandwidth involved substantially: when the txs are duplicates, you’re now only receiving the w/txid instead of the full transaction, reducing the wasted bandwidth by perhaps a factor of at least ~4x (32B vs ~128B for parent txid/sig/output), and probably substantially more. I think the inbound slots issue> is just “if we know we don’t want this peer, lets discourage immediately so the slot is free”, so an optimisation rather than addressing a DoS concern per se.

About the inbound slots issue, see one of the comment of gmaxwell above, I think there is an argument about the robustness of your local transaction-relay topology by not dropping misbehaving peers at first protocol violation. IMHO, I think this can be improved by adding a discouragement score, or whatever other marker as inbound peers eviction heuristics. @vasild

Another aspect to consider wrt wasting bandwidth: if the sender knows the recipient does not have the transaction (e.g. when a wallet sends its newly created transaction), then doing the extra INV+GETDATA round-trip is a waste.

The sender can never be sure that the recipient already have the transaction, recipient full-node could have shutdown and reboot or the transaction could have temporarily fallen under the mempool min fee at the TX message reception.

ariard commented at 7:38 am on August 17, 2024: contributor

I think the main open problem of this approach, and iiuc as pointed out by gmaxwell and aj, if it’s applied to upgraded peers only (REJECT_UNSOLICITED_TX_VERSION) is the occupation of inbound slots by buggy peers. Slots that could be rather used by more non-buggy peers to preserve robustness of the transaction-relay topology.

I think one way to alleviate is to introduce a new versioning of the transaction-relay protocol itself, which has not been really existent so far and that way have an equilibrated transaction-relay topology. The node version bits could be used for that as that way more selective peering policy can be build on top.

 0diff --git a/src/protocol.h b/src/protocol.h
 1index fd7cfddf3b..495a2e3a21 100644
 2--- a/src/protocol.h
 3+++ b/src/protocol.h
 4@@ -329,6 +329,9 @@ enum ServiceFlags : uint64_t {
 5     // NODE_P2P_V2 means the node supports BIP324 transport
 6     NODE_P2P_V2 = (1 << 11),
 7 
 8+    // NODE_TXRELAY_V2 means the node supports BIPXXX transaction-relay v2 protocol
 9+    NODE_TXRELAY_V2 = (1 << 12),
10+
11     // Bits 24-31 are reserved for temporary experiments. Just pick a bit that
12     // isn't getting used, or one not being used much, and notify the
13     // bitcoin-development mailing list. Remember that service bits are just

Introducing a transaction-relay versioning as another advantage, independent of this PR’s concerns, as ConnectionType and expected node p2p messages support could be looked-on before any connection attempt. The service bits are within addr messages already.

DrahtBot removed the label CI failed on Aug 17, 2024

gmaxwell commented at 11:29 pm on August 18, 2024: contributor

@vasild I don’t think ariaid’s initial justification for the PR was the most important one. Sometimes people propose doing the right things for less than the most important reasons. If one were to judge things only by their initial proposals then it would open a very easy attack on development– just keep proposing good things with bad justification to make sure they won’t get done. :)

The amount of CPU-work per byte sent might only be increased by 50% by forcing INV use. E.g. say an attack scriptpubkey in the utxo set causes a lot of work, and for whatever reason the attack is limited to single input transactions. The minimum serialized transaction size is about 63 bytes. So without inv the attacker sends 63 bytes per unit of work, but with the inv the attacker sends 32+63=95 bytes per unit of work. So for a given amount of bandwidth the attacker causes 50% more work in this contrived and extreme case.

I don’t think that’s a particularly interesting reason to motivate the PR. But I think not having an incentive to send unexpected data generally is a very important reason, as a single mass connector can easily end up using a very large fraction (as in a majority, depending on how many txn they send) of the networks’ total bandwidth usage from doing so.

I think the mass connector concern would be potentially addressable by doing something like, “if a peer sends an unsolicited txn stop all processing on the peer for N seconds” (h/t to Pieter for mentioning that in an off-thread discussion). Then go ahead and process it. It would be backwards compatible too since some wallet relaying their txn wouldn’t be meaningfully harmed by an N second delay. The primary downside I see is that the mass connectors may continue because without the disconnect they don’t realize their “optimization” pessimizes instead of optimizes. We certainly saw that with block relay, e.g. viabtc for years spamming every node they could connect to with 900kb witness stripped blocks that just got totally ignored.

Aside, in the case of private broadcast– I think it might eventually be good for nodes to periodically “private broadcast” transactions that aren’t their own at a very low rate. This is because anonymity networks like tor have extremely low resistance to traffic analysis. It seems likely to me that ordinary bitcoin p2p will have better traffic analysis resistance than “private broadcast” in the future (e.g. after doing a very small amount of work on using p2p encryption padding and rate limiting). I’ll go make some comments on the private broadcast PR to discuss that. In any case, INV in that case then helps mitigate traffic even for private broadcast. @ariard

but I think now you’re introducing a bandwidth free relay vector at last,

Huh? Nothing here changes at all what transactions a node relays. And someone can always waste bandwidth to you if thats their only goal (e.g. sending you IP ping packets ).

In any case, I’m only concerned about the shortest path to deploy– since the behavior change could be made initially lax and then tightened over time. Right now, existing bitcoin core nodes are foolishly making many duplicate requests. It’s clear that this could be improved, but it seems like some will remain. More precise tracking can be done but would need analysis from a worst case memory usage perspective.

DrahtBot added the label CI failed on Sep 4, 2024

DrahtBot commented at 2:33 am on September 4, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28889097062

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

ariard commented at 1:15 am on September 5, 2024: contributor

@gmaxwell

Huh? Nothing here changes at all what transactions a node relays. And someone can always waste bandwidth to you if thats their only goal (e.g. sending you IP ping packets ).

In any case, I’m only concerned about the shortest path to deploy– since the behavior change could be made initially lax and then tightened over time. Right now, existing bitcoin core nodes are foolishly making many duplicate reque> sts. It’s clear that this could be improved, but it seems like some will remain. More precise tracking can be done but would need analysis from a worst case memory usage perspective.

Back from travel – I looked more to implement AlreadyHaveTx() at the reception of NetMsgTypeTx as an exemption of calling Misbehaving, though after lokking back on TxRequestTracker code this concurrency behavior due to orphan handling has very low odds to happen as the announcements should be ordered by CANDIDATE and REQUESTED status. See https://github.com/bitcoin/bitcoin/blob/master/src/txrequest.h#L152

Given we’re taking the cs_main lock to deal with TxRequestTracker, if we have concurrent requests that can be issued, whatever the level of peer preferredness, it should be fix at that level.

Beyond, I maintain that you’re naively adding AlreadyHaveTx() on this current branch, you would add a bandwidth free relay vector, as peer1 and peer2 can collude to send a unique child towards the same node (the parent not existing at all for any other node), and have the GETDATA for the parent duplicated by the peer.

For this behavior change, and without digging more about precise tracking and memory / CPU usage, I think current branch approach is the best that can be done…? Though we can always introduce a NODE_TXRELAY_V2 bit service flag to signal disconnection for protocol violating peers, either buggy or malicious.

ariard commented at 1:31 am on September 5, 2024: contributor

I think the mass connector concern would be potentially addressable by doing something like, “if a peer sends an unsolicited txn stop all processing on the peer for N seconds” (h/t to Pieter for mentioning that in an off-thread disc> ussion).

I don’t think it is that easy…a peer can be the fastest to announce a seen-by-all network txid, get requests from the node and then inject an unsolicited tx to stop all processing, to force the node to fallback to another peer to fetch the seen-by-all network txid which is a delay… If it’s a miner node, and it’s a high fee transaction this can be an interesting trick to play with.

Add NODE_TXRELAY_V2.

This does not add default signaling of NODE_TXRELAY_V2 as node local
services.

23551efc91

ariard commented at 2:37 am on September 5, 2024: contributor

Updated at [23551ef] with the NODE_TXRELAY_V2 approach.

Submitted a BIP draft in that sense, in the lack for now of a better approach for now on how to implement reject of unrequested transactions in a minimally-disruptive fashion for the bitcoin peer-to-peer network. I think I should open a distinct PR for the service signaling support itself.

mzumsande commented at 11:07 am on September 5, 2024: contributor

Beyond, I maintain that you’re naively adding AlreadyHaveTx() on this current branch, you would add a bandwidth free relay vector, as peer1 and peer2 can collude to send a unique child towards the same node (the parent not existing at all for any other node), and have the GETDATA for the parent duplicated by the peer.

I don’t get this at all. The suggestion above was, if you receive an unrequested TX, and you already have it, ignore it but don’t disconnect the peer. Which is the same as current behavior on master.

Free relay is a network-level concept, all of what you describe is local between nodes that are connected. In order for anything to be a free relay vector, resources of non-involved nodes must be wasted.

and have the GETDATA for the parent duplicated by the peer.

If a tx is received a second time, and that tx is in the orphanage, it is ignored - there will be no second GETDATA for the parent on either master or this branch.

though after lokking back on TxRequestTracker code this concurrency behavior due to orphan handling has very low odds to happen as the announcements should be ordered by CANDIDATE and REQUESTED status

When I tested it (simply start a node and log when an unrequested tx is received),unrequested transactions arrived frequently and within seconds of starting the node. I would recommend to just test it for yourself…

ariard commented at 10:36 pm on September 5, 2024: contributor

@mzumsande

I don’t get this at all. The suggestion above was, if you receive an unrequested TX, and you already have it, ignore it but don’t disconnect the peer. Which is the same as current behavior on master.

Let’s say you have Peer1 and Peer2 announcing the Transaction ABC, if ABC is concurrently fetch from both Peer1 and Peer2, the GETDATA (32-bytes) for ABC would have been wasted as outgoing bandwidth by Local Node. This 32 byte leak can be amplified if batch of transactions are concurrently fetched (bounded by _INVENTORY_BROADCAST_INTERVAL).

Free relay is a network-level concept, all of what you describe is local between nodes that are connected. In order for anything to be a free relay vector, resources of non-involved nodes must be wasted.

I’m considering the bandwidth ressources of the local node here (though easy to have the single UTXO used for the leak being re-used against a N number of topology edges) . A topology Peer1 <-> Local Node <-> Peer2 is enough to be a network.

If a tx is received a second time, and that tx is in the orphanage, it is ignored - there will be no second GETDATA for the parent on either master or this branch.

… ? For AlreadyHaveTx() the HaveTx in TxOrphanage is done on wtxid and AFAICT, one can still announce a transaction by wtxid and then the same transaction with another wtxid (use the OP_ELSE instead of an OP_IF spending path in witness). I think the AlreadyHaveTx L4585 should return false at the second TX reception and code flow execution should land in TX_MISSING_INPUTS branch.

When I tested it (simply start a node and log when an unrequested tx is received),unrequested transactions arrived frequently and within seconds of starting the node. I would recommend to just test it for yourself…

Sure, I’m aware that unrequested transactions is an allowed behavior since commit 5b788bb from my read of the code. This doesn’t mean it cannot be exploited under adversarial scenarios, e.g mass connectors for deanonymization purposes.

mzumsande commented at 8:29 am on September 6, 2024: contributor

Let’s say you have Peer1 and Peer2 announcing the Transaction ABC, if ABC is concurrently fetch from both Peer1 and Peer2, the GETDATA (32-bytes) for ABC would have been wasted as outgoing bandwidth by Local Node. This 32 byte leak can be amplified if batch of transactions are concurrently fetched (bounded by _INVENTORY_BROADCAST_INTERVAL).

That has nothing to do with free relay, because it doesn’t propagate to the rest of the network (hence the “relay” in the name). It’s also not a interesting attack because there are dozens of ways for an attacker to waste both its own and its peer’s bandwidth when it is on one end of the connection: For example send messages of unknown type (that are ignored), request a block repeatedly, spam INVs with random txids and get GETDATA in return, sending IP ping packets (see gmaxwell above) etc.

maflcko commented at 9:44 am on September 6, 2024: member

Would be nice to change the title from “… transactions v2” to something else. Otherwise, it may be a bit confusing in the context of “v2 transactions” (transactions with version number set to 2, not 3), or “v2 transport” (the P2P thing).

ariard commented at 6:47 pm on September 6, 2024: contributor

@mzumsande

That has nothing to do with free relay, because it doesn’t propagate to the rest of the network (hence the “relay” in the name). It’s also not a interesting attack because there are dozens of ways for an attacker to waste both its own and its peer’s bandwidth when it is on one end of the connection:

Sure, technically this GETDATA duplication alone isn’t a free relay bandwidth, in the sense that a transaction is relayed many times over the network with no feerate / absolute fee compensation.

Yet, I think this can be easily escalated as a free relay vector with the following sequence of events:

consensus-valid parent txid ABC wtxid BAC is forward to Local Node
consensus-valid parent txid ABC is accepted with its child as valid by Local Node: https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L4631
consensus-valid parent txid is announced as BAC to all wtxid-relay peers
concurrently, a conflicting transaction spending a common input of the child is forward to Local Node : https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1338
min fee consensus-valid parent txid ABC wtxid BAC is evicted from its mempool by Local Node: https://github.com/bitcoin/bitcoin/blob/master/src/txmempool.cpp#L1128
the duplicated GETDATA is received and the same parent fetched by its txid ABC again
consensus-valid parent txid ABC wtixd CAB is forward to Local Node again
consensus-valid parent txid ABC is accepted as valid by Local Node again
consensus-valid parent is re-announced as CAB to all wtxid-relay peers again: https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L2383

Replacement penalty has been indeed paid on the conflicting child, but not for the vbytes of the parent at second-relay to wtixd-relay peers.

For example send messages of unknown type (that are ignored), request a block repeatedly, spam INVs with random txids and get GETDATA in return, sending IP ping packets (see gmaxwell above) etc.

This is missing the point…there is no common measure between messages of unknown types that will be disregarded siltently (“Ignore unknown commands for extensibility”)and messages that will trigger a data fetch from the Local Node such as transaction relay. I got gmaxwell point above, though it’s not the purpose of this code change to discuss here with gmaxwell all what is wrong with core p2p stack (though we could …?).

I think there have been no answer on the point I was raising above, re-quoting:

… ? For AlreadyHaveTx() the HaveTx in TxOrphanage is done on wtxid and AFAICT, one can still announce a transaction by wtxid and then the same transaction with another wtxid (use the OP_ELSE instead of an OP_IF spending path in witn> ess). I think the AlreadyHaveTx L4585 should return false at the second TX reception and code flow execution should land in TX_MISSING_INPUTS branch.

ariard commented at 6:49 pm on September 6, 2024: contributor

Would be nice to change the title from “… transactions v2” to something else. Otherwise, it may be a bit confusing in the context of “v2 transactions” (transactions with version number set to 2, not 3), or “v2 transport” (the P2P thing).

See #30837, as the signaling support for transaction V2 can be discussed separately from halting the processing of unrequested transactions.

ajtowns commented at 4:47 am on September 8, 2024: contributor

Motivation is still the same to limit the DoS exposure in face of some DoSy behaviors from tx-relay peers.

If this is just something a peer can opt-in to, I don’t think it mitigates any DoS behaviours: spammers/attackers will just use the old version.

I’m pretty skeptical that anything here can really prevent attacks – an attacker can run multiple or low-latency connections to end up with pretty much the same result in most cases, I think. As far as I can see the main benefit is to make low-effort mass broadcast a bit less redundant, so that we only end up with (w)txids getting spammed unnecessarily, not the complete transaction. That seems a worthwhile improvement to me, but not one that should be mistaken for a DoS fix.

But I don’t see how we get even that benefit if this is an opt-in behaviour that isn’t applied to existing software, though. Going beyond that and introducing a service flag for this doesn’t make any sense to me.

ariard commented at 4:47 am on September 13, 2024: contributor

@ajtowns

If this is just something a peer can opt-in to, I don’t think it mitigates any DoS behaviours: spammers/attackers will just use the old version.

Yes, indeed this is a quite obvious point about a node service bit. It was raised during the mempoolfullrbf discussion years ago (see #25600), that effectively node service bit network mechanism cannot be effectively checked for, without actually connecting to and triggering the logic. About the present case, this observation has been already made by gmax above: “I’m somewhat dubious of the value of this behind a version check: the abusive hosts that intentionally do this will just continue to be an older version and I don’t think it’s realistic to cut off older versions completely.”

One code improvement to throttle out spammers / attacker could be to favor inbound slots for upgraded peers when the eviction logic is triggered (AttemptToEvictConnection) when the inbound slots peers are full, that can be added in a patch of its own.

I’m pretty skeptical that anything here can really prevent attacks – an attacker can run multiple or low-latency connections to end up with pretty much the same result in most cases, I think. As far as I can see the main benefit is to make low-effort mass broadcast a bit less redundant, so that we only end up with (w)txids getting spammed unnecessarily, not the complete transaction. That seems a worthwhile improvement to me, but not one that should be mistaken for a DoS fix.

As usual, when you’re evaluating an attacker, you have to think in terms of ressources available both for the attacker and defender. From the defender viewpoint, there is only at max 125 connections to different peers, so if this assumption is correct (let’s waive at network-level attack that most of the time demands corresponding capabilities…) an attacker cannot have an unbounded number of low-latency connections. If spam nodes are disfavored by the peering mechanism, it’s reducing at least the overall spam transactions that a local node has to accept or reject for a time duration, e.g the one driving the tx fetch done by the TxRequestTracker. Though yes, I think too with you that it should make low-effort mass broadcast a bit less redundant, and hypothetical save global network bandwidth, even in non adversarial situations.

But I don’t see how we get even that benefit if this is an opt-in behaviour that isn’t applied to existing software, though. Going beyond that and introducing a service flag for this doesn’t make any sense to me.

This is leveling up the bar for mass connector wishing to deanonymize transaction propagation over the network, even if they’re running non-upgraded peers, their non-solicited transactions will be rejected. See deanonymization heuristics like tx probe where the peering topology can be inferred from transactions propagations.

All that said, if you can think about a mechanism to drop unsolicited transactions, with no invasive effects on non-upgraded softwares, with no downsides, I’ll have a look and if I think it’s technically better, I’ll withdraw this code change.

ajtowns commented at 10:58 am on September 23, 2024: contributor

an attacker cannot have an unbounded number of low-latency connections.

You only need one low-latency connection to run a pretty effective tx spamming attack, as low-latency means that the INV/GETDATA/TX cycle is fast, and you’re not rate-limited by the maximum number of pending GETDATA requests. (And even if you don’t have a low-latency connection, you can open many connections to get a multiple of that rate-limit)

ariard commented at 7:22 pm on September 23, 2024: contributor

You only need one low-latency connection to run a pretty effective tx spamming attack, as low-latency means that the INV/GETDATA/TX cycle is fast, and you’re not rate-limited by the maximum number of pending GETDATA requests. (And even if you don’t have a low-latency connection, you can open many connections to get a multiple of that rate-limit)

I don’t deny it that you need only one low-latency connection to run a pretty effective tx spamming attack. Though, in my understanding, you’re pointing yourself one of the aspect that this code change aims to improve, namely the lack of effective rate limitation of the maximum number of TX validation cycle that one node can have to process, independently of the INV/GETDATA flow.

And indeed, as you’re paraphrasing, even if an attacker does not have a low-latency connection, one can open many connections to get a multiple of the per-connection currently not respected rate-limit processing). Yet, as I was pointing in practice bitcoin core already has limitations on the max number of connections, i.e 125.

Leveling-up in face of tx spamming attacks also should make it harder to deanonymize transaction propagation over the transaction-relay network. If you have another technical approach, achieving the same effect and under the same minimalism, than the proposed current code change, it’s definitively worthy to think more about it.

0xB10C commented at 8:15 am on October 9, 2024: contributor

Is anyone running this currently with the code set to always enforce or just log? it would be useful to know at what rate nodes are currently seeing unsolicited.

I switched one of my monitoring nodes to run https://github.com/0xB10C/bitcoin/commit/ba39837d999407a55c3784059f7cf07bdbdfce76 to collect some data on this. Having a glance at the logs since yesterday, I’ve mostly seen the same few peers sending me unsolicited transactions - at a rate of a few per minute.

I’m particularly interested if any protocol v2 nodes engage in it, since I think that’s a viable way to immediately enforce without breaking anything.

One peer (peer=3755) that has sent a few hundred unsolicited transactions today is an outbound connection using P2P transport version 2 to a node with a /Satoshi:27.0.0/ user agent. Also note that my node requests about 97% of transactions this peer INVs (which is unusually high).

However, looking at the logs in more detail, it seems many of these unsolicited transactions are false-positives and not actually unsolicited. See for example the log below - I grepped for a handful of txids from peer=3755 and all seemed to be false positives. I haven’t looked closer into why these happen.

02024-10-09T07:06:00.821953Z [msghand] [ProcessMessage] [net] got inv: wtx 75139145a46e1f911ea0c4956d8d2d08349b956e0682900a5caab2527b51b154  new peer=4332
12024-10-09T07:06:00.998299Z [msghand] [ProcessMessage] [net] got inv: wtx 75139145a46e1f911ea0c4956d8d2d08349b956e0682900a5caab2527b51b154  new peer=3755
22024-10-09T07:06:00.998312Z [msghand] [SendMessages] [net] Requesting wtx 75139145a46e1f911ea0c4956d8d2d08349b956e0682900a5caab2527b51b154 peer=3755
32024-10-09T07:06:01.071619Z [msghand] [ProcessMessage] [net] unrequested transaction 75139145a46e1f911ea0c4956d8d2d08349b956e0682900a5caab2527b51b154 (wtxid=75139145a46e1f911ea0c4956d8d2d08349b956e0682900a5caab2527b51b154) from peer=3755 ([redacted]:8333)
4...

mzumsande commented at 3:40 pm on October 9, 2024: contributor

I haven’t looked closer into why these happen.

Two mechanisms have been pointed out above, both are due to timing issues in orphan resolving: #30572 (comment) and #30572 (comment) - I wouldn’t be surprised if there were others.

Obviously fixing these issues (whether by avoiding the timing issues in the first place, or by ignoring unrequested transactions for which AlreadyHave is true instead of disconnecting) is a blocker for any further progress, whether with a BIP or not. If this PR was deployed as is, any two upgraded peers would disconnect each other very quickly. @0xB10C : I think it would be helpful to have data about how often unrequested transactions are received that the node doesn’t already know about (AlreadyHave) at the time the unrequested tx is received.

ariard commented at 11:42 pm on October 13, 2024: contributor

@0xB10C Very interesting to have the logs about unsolicited transactions from network peers

I’ve mostly seen the same few peers sending me unsolicited transactions - at a rate of a few per minute.

User agent might not be verbose that much if they are lightweight clients (e.g ones based on bitcoinj lib) doing transaction broadcast, with no INV before. It could be interesting to get volume of the unsolicited transactions, if there are more likely to come from lightweight clients doing few broadcast a day, or some big bitcoin infra like an exchange.

Also note that my node requests about 97% of transactions this peer INVs (which is unusually high).

This is unusually high, as if a node is running with inbound / outbound peers for its topology, and it’s somehow located with some centrality in the topological graph, the ingress and egress transaction traffic shouldn’t diverge that much in quantitative measurements.

One hypothesis to explain that 97% of observed difference, if it’s not some deanonymization trick of the transaction-relay topology, it can be a full-node tied to an exchange or a lightning node, with a high-volume of issued transactions, and your monitoring node is one of its first vertice on the topology.

Two mechanisms have been pointed out above, both are due to timing issues in orphan resolving: #30572 (comment) and #30572 (comment) - I wouldn’t be surprised if there were others.

Yes, there can be obviously other issues in the way orphans are currently resolved, e.g #30572 (comment) @mzumsande

Obviously fixing these issues (whether by avoiding the timing issues in the first place, or by ignoring unrequested transactions for which AlreadyHave is true instead of disconnecting) is a blocker for any further progress, whether with a BIP or not. If this PR was deployed as is, any two upgraded peers would disconnect each other very quickly.

See the latest state of the code (23551ef), which I think have been pushed before your comment publication. There is a ExpectedTx() method checking in the TxRequestTracker if the transaction has been requested either by wtxid or txid, orphan or not. Currently, Bitcoin Core is doing parent orphan fetching by txid: https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L4682 so there should no parent orphan received as an unsolicited transaction, and not found as REQUESTED in the TxRequestTracker. Parent wtxid is not announced in TX and BIP331 is explicit on ancpkginfo, so there should be no unsolicited parent received. I don’t think your comment about two upgraded peers would disconnect each other very quickly is correct.

If there is anything that demands to be improved in the 2 related BIPs, you’re free to comment there.

I think it would be helpful to have data about how often unrequested transactions are received that the node doesn’t already know about (AlreadyHave) at the time the unrequested tx is received

Even more than AlreadyHave(), which is only about orphanage, recent rejects, reconsiderable packages txn, the recently confirmed txn, the recently rejected transaction and the mempool, one have to check if a transaction is in the REQUESTED state, as done by the `TxRequestTracker.

mzumsande commented at 10:55 am on October 16, 2024: contributor

See the latest state of the code (23551ef), which I think have been pushed before your comment publication. There is a ExpectedTx() method checking in the TxRequestTracker if the transaction has been requested either by wtxid or txid, orphan or not. Currently, Bitcoin Core is doing parent orphan fetching by txid: https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L4682 so there should no parent orphan received as an unsolicited transaction, and not found as REQUESTED in the TxRequestTracker.

Again, the issues mentioned in #30572 (comment) and #30572 (comment) point out that unrequested transactions are received today when an orphan resolution clears outstanding tx requests both by txid and wtxid here, but there is still a tx request in flight (we already sent GETDATA but haven’t received the TX yet), which is seen as unrequested when the TX arrives. So the transactions were requested before, but aren’t seen as requested anymore at the point the tx is received. How is this addressed by your answer / the current code?

I don’t think your comment about two upgraded peers would disconnect each other very quickly is correct.

Have you actually tested this, e.g. by running on mainnet for a while, logging when an unrequested tx is received?

There is a ExpectedTx() method checking in the TxRequestTracker if the transaction has been requested either by wtxid or txid, orphan or not. Currently, Bitcoin Core is doing parent orphan fetching by txid

Are you sure it works correctly? If I comment out the P2P version restrictions to actually enable this,

0@@ -4558,7 +4558,8 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
1         // We check both identifiers as txid mode may happen with wtxidrelay peers
2         // due to parent-orphan fetching. For backward-compatibility reasons, we only
3         // apply this transaction relay policy to node signaling NODE_TXRELAY_V2.
4-        if ((peer->m_our_services & NODE_TXRELAY_V2) && !pfrom.HasPermission(NetPermissionFlags::Relay) && pfrom.GetCommonVersion() >= REJECT_UNSOLICITED_TX_VERSION) {
5+        //if ((peer->m_our_services & NODE_TXRELAY_V2) && !pfrom.HasPermission(NetPermissionFlags::Relay) && pfrom.GetCommonVersion() >= REJECT_UNSOLICITED_TX_VERSION) {
6+        if (!pfrom.HasPermission(NetPermissionFlags::Relay)) {
7             bool is_expected = tx.HasWitness() ? m_txrequest.ExpectedTx(pfrom.GetId(), wtxid) ||
8                 m_txrequest.ExpectedTx(pfrom.GetId(), txid) : m_txrequest.ExpectedTx(pfrom.GetId(), txid);
9             if (!is_expected) {

dozens of functional tests fail - and many of these don’t seem to be sending unrequested transactions.

ariard commented at 0:04 am on October 18, 2024: contributor

@mzumsande

Again, the issues mentioned in #30572 (comment) and #30572 (comment) point out that unrequested transactions are received today when an orphan resolution clears outstanding tx requests both by txid and wtxid here, but there is still> a tx request in flight (we already sent GETDATA but haven’t received the TX yet), which is seen as unrequested when the TX arrives. So the transactions were requested before, but aren’t seen as requested anymore at the point the tx > is received. How is this addressed by your answer / the current code?

In my understanding, what you’re describing is the following:

a peer Alice is announcing an INV(TX_A) to our local node
a peer Bob is announcing an INV(TX_B) to our local node
our local node is doing a GETDATA of (TX_A) to Alice
Alice is buggy or withholding the TX_A message
after TxRequestTracker timer expirates, our local node is doing a GETDATA of (TX_A) to Bob
our local node receives TX_B from Bob
after a while, Alice is forwarding TX_A to our local node
transaction A is considered as an unrequested transaction

I was pointing out this issue about TxRequestTracker in my above comments. Without having checked more it might already happen today as ProcessValidTx clears up early in net_processing.cpp, even if it’s not an orphan resolution. This issue should be fixed orthogonally of the current reception of unsolicited transactions, unsolicited in the sense of “never send an INV for this txid, whatever the peer” before. My code and this proposal does not address issue, as (a) it’s already existent and (b) the effects are only for upgraded peers, there is no effect for non-upgraded peers with the current defect you’re pointing too in orphan resolution or TxRequestTracker timing issues.

Though yes to be clear, if you don’t understand, I think it should be fixed, should I work on another code fix, or do you think proposing yourself one ? I’ll review it.

Have you actually tested this, e.g. by running on mainnet for a while, logging when an unrequested tx is received?

I checked the software of old lightclients backend, who are actually forwarding unrequested transactions, so there is no need of further monitoring to guess that unrequested transactions is a reality today. Though if you have more logging data on this aspect you’re free to share it here.

Are you sure it works correctly? If I comment out the P2P version restrictions to actually enable this,

…Let’s be a bit a minimum professional…I can go to any of your PRs too, change the code with one line, and say the code is failing dozen of tests. Taking a step back this code change is open as a draft, to indeed look for more conceptual review before testing it more. It’s not like there has been a previous PR a while back, and no kind of converging thoughts on how to address that.

I don’t have the merge rights on the bitcoin core project, never been interested by that when it was proposed to me, so calm down it’s only a draft code change.

mzumsande commented at 2:06 pm on October 22, 2024: contributor

In my understanding, what you’re describing is the following:

This looks like a separate issue. I was talking about orphan resolution removing the tx, you are talking about TxRequestTracker expiration removing it.

I can go to any of your PRs too, change the code with one line, and say the code is failing dozen of tests

I wasn’t changing a random line of code: the added code is guarded by a future P2P version upgrade and therefore unreachable and untestable as is - one way to expose this code to testing is to remove this version restriction. I was just trying to point out that there might be a bug somewhere - I don’t know if so and where exactly and haven’t analyzed this in detail but it seemed to me that pre-segwit txns are often considered unrequested, so these might be false positives.

I think it would be helpful to fix this, not because this is going to be merged anytime soon, but because running this branch with logging would help to judge how often unrequested transctions (actual or perceived) are received today, in particular by other bitcoin core nodes, that would then disconnect each other if they ran this branch and upgraded the p2p version. For me this would be a logical next step before any further conceptual review, but up to you.

ariard commented at 2:02 am on October 25, 2024: contributor

This looks like a separate issue. I was talking about orphan resolution removing the tx, you are talking about TxRequestTracker expiration removing it.

Yes, see my comments above. I got it there are current issues in the orphan resolution removing the transaction with a ForgetTxHash before we received it again, and wrongly see it as unsolicited.

I wasn’t changing a random line of code: the added code is guarded by a future P2P version upgrade and therefore unreachable and untestable as is - one way to expose this code to testing is to remove this version restriction.

Sure, but if you remove the P2P version upgrade guard, and this without consequently having upgraded all the corresponding transaction-relay tests, there is nothing surprising in getting failures. Note this PR, is a re-work of an old PR linked in the OP, and last time I went to fix all the tests before there was a convergence on how to implement unsolicited transactions restrictions in a minimally disruptive approach.

not because this is going to be merged anytime soon, but because running this branch with logging would help to judge how often unrequested transctions (actual or perceived) are received today, in particular by other bitcoin core nodes, that would then disconnect each other if they ran this branch and upgraded the p2p version.

There are very likely unsolicited or unrequested transactions that must been received by bitcoin core nodes, whatever the version, see this code comment in bitcoinj: https://github.com/bitcoinj/bitcoinj/blob/master/core/src/main/java/org/bitcoinj/core/TransactionBroadcast.java#L178

I’ll add a commit to add option logic to configure a bitcoin core node with or without -v2txrelay on top of this branch. Beyond it’s unclear to me what exactly is wished more to run logging nodes in the most agnostic fashion on the current network ? @mzumsande @0xB10C

DrahtBot added the label Needs rebase on Oct 29, 2024

DrahtBot commented at 8:04 pm on October 29, 2024: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

0xB10C commented at 5:57 pm on November 1, 2024: contributor

I was just trying to point out that there might be a bug somewhere - I don’t know if so and where exactly and haven’t analyzed this in detail but it seemed to me that pre-segwit txns are often considered unrequested, so these might be false positives. @mzumsande and I had a look at my logs and the “rate of a few [unsolicited] per minute” I was seeing were mostly non-segwit transactions - not unsolicited transactions. The problematic commit seems to be https://github.com/bitcoin/bitcoin/pull/30572/commits/c7bc444d02d34273a0a42806745e6a32b3146143.

I’ve been running https://github.com/0xb10c/bitcoin/commit/a6310b2b7257a32c8164f553e857ab6e96812469 for about two weeks now (without https://github.com/bitcoin/bitcoin/pull/30572/commits/c7bc444d02d34273a0a42806745e6a32b3146143) and included an !AlreadyHaveTx() check before logging. The https://github.com/0xB10C/bitcoin/commit/a6310b2b7257a32c8164f553e857ab6e96812469 logs now suggests a lot fewer unsolicited tx than a few per minute - often less than one unsolicited transaction per minute.

date	on this day	mean per hour	mean per minute
2024-10-20	1550	64.5	1.076
2024-10-21	2309	96.2	1.603
2024-10-22	582	24.2	0.404
2024-10-23	404	16.8	0.280
2024-10-24	770	32.0	0.534
2024-10-25	245	10.2	0.170
2024-10-26	213	8.87	0.147
2024-10-27	306	12.75	0.212
2024-10-28	113	4.70	0.078
2024-10-29	26589	1107.87	18.464
2024-10-30	2495	103.95	1.732

Between 2024-10-29T22:50:51 and 2024-10-29T23:54 the node received nearly 26k unsolicited transactions from different peers on all networks the node has connections to/from (IPv4, Onion, I2P).

Full logs are here https://gist.githubusercontent.com/0xB10C/659b5e4ad76774032e6dc1dab5a1d1f2

ariard commented at 7:34 pm on November 12, 2024: contributor

I was seeing were mostly non-segwit transactions - not unsolicited transactions. The problematic commit seems to be https://github.com/bitcoin/bitcoin/commit/c7bc444d02d34273a0a42806745e6a32b3146143. @0xB10C Thanks, that’s interesting. I’ll dig into the TxRequester, if due to switching from iterating by status to iterate by peers, if the processing for non-segwit transactions and segwit transactions is always the same.

—————

Between 2024-10-29T22:50:51 and 2024-10-29T23:54 the node received nearly 26k unsolicited transactions from different peers on all networks the node has connections to/from (IPv4, Onion, I2P).

Receiving quantity of unsolicited transactions is not surprising to me, given that few lightweight clients backends such as bitcoinj are not sending an INV first. If you have the global traffic, out of which the 26k unsolicited transactions have been seen, that would give some empirical points if unsolicited transactions falls under some distribution: < 1%, >1% && <5%, >5%. More than that the upper bound of this distribution would be surprising. Like there would be a good number of p2p transaction-relay clients that don’t follow the INV ->, <- GETDATA, TX -> flow.

ariard commented at 4:01 pm on January 29, 2025: contributor

Working on a re-write of the 2 bips for this PR + #30837 and updating PRs now more context on the rational of the change is available: https://bitcoinops.org/en/newsletters/2024/12/06/

I should give an update in the coming weeks.

ariard commented at 4:06 am on February 19, 2025: contributor

Forwarded an update on the mailing list entitled “Transaction Relay V2” to give an overview on the topic.

Working on a re-write of the 2 bips for this PR + #30837 and updating PRs now more context on the rational of the change is available: https://bitcoinops.org/en/newsletters/2024/12/06/

I prefer go to close this draft PR, not because I think the problem has been solved, neither because I don’t think it’s a serious problem (I believe it’s a serious problem) though because I prefer to go to maintain my own full-node in Rust and fix the pointed transaction-relay issues there. So apart of reviewing the code changes for the consensus logic and finishing Erlay deployment, I don’t plan to contribute anymore to Bitcoin Core.

After having done some tests (https://github.com/ariard/standalone-bitcoinkernel), I found libbitcoinkernel mature enough to re-write from scratch on top of it tx-relay, block-relay and other significant components while only sharing the crux of the consensus logic. With good experience in embedded systems and C language, maintaining a bitcoin full-node is somehow a reasonable endeavor, it’s only a userspace program after all.

I’ll still share bugs and vulnerabilities found to the bitcoin core contributors I trust.

Thanks for all the reviews on this PR, this is appreciated. Feel free to grab it if it does interest you.

ariard closed this on Feb 19, 2025

Halt processing of unrequested transactions v2 #30572

Code Coverage & Benchmarks

Reviews

Conflicts