Halt processing of unrequested transactions v2 #30572

pull ariard wants to merge 6 commits into bitcoin:master from ariard:reject-unsolicited-txn changing 5 files +40 −0
  1. ariard commented at 10:27 am on August 2, 2024: contributor

    This is a re-work of #21224.

    Motivation is still the same to limit the DoS exposure in face of some DoSy behaviors from tx-relay peers.

    Difference w.r.t 21224 is adding a protocol upgrade (REJECT_UNSOLICITED_TX_VERSION) for the rejection of unsolicited txn solely to happen for new upgraded peers. That way no disruption of all the undocumented clients and node currently participating in the transaction-relay network.

    Opening as a draft - There should be indeed test, there a bunch of old on 21224 branch.

  2. Add REJECT_UNSOLICITED_TX_VERSION version
    This commit does not upgrade the negotiated PROTOCOL_VERSION.
    fd5abfaad8
  3. DrahtBot commented at 10:27 am on August 2, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept NACK vasild

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30110 (refactor: TxDownloadManager + fuzzing by glozow)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. ariard marked this as a draft on Aug 2, 2024
  5. DrahtBot added the label CI failed on Aug 2, 2024
  6. DrahtBot commented at 12:16 pm on August 2, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28260785704

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. glozow added the label P2P on Aug 2, 2024
  8. gmaxwell commented at 4:33 pm on August 3, 2024: contributor

    The search is being done for every received transaction, even when code unconditionally won’t reject it due to the protocol versions. This could be very simply avoided by moving the “!pfrom.HasPermission(NetPermissionFlags::Relay) && pfrom.GetCommonVersion() >= REJECT_UNSOLICITED_TX_VERSION” test up to surround the expected.

    Alternatively, the search could also be gated on NET LOGGING enabled (e.g. net logging || (version && no permission)), and then it could log the unexpected TXN even if it will not ignore it (the disposition could be in the log message). This would make it easier to monitor the behavior (when I used to monitor for this I carried a local patch).

    Given that the check now only applies to a newer protocol version could it be made to disconnect the peer? This may cause a more rapid discovery of bugs in users code since being disconnected is much more obvious than having the txn silently dropped.

    I’m somewhat dubious of the value of this behind a version check: the abusive hosts that intentionally do this will just continue to be an older version and I don’t think it’s realistic to cut off older versions completely. But I think this PR is vastly superior to doing nothing, and perhaps once the new behavior is widely deployed the test could be extended to older versions (presumably in ignore rather than disconnect form, if this is changed to disconnect).

  9. gmaxwell commented at 7:27 pm on August 3, 2024: contributor

    Having gone and looked at the old discussion, I see some skepticism for the value of this and I don’t see anyone pointing out the reason I think it’s important.

    At times there have been parties that mass connect to every node on the network and blast them with transactions, presumably in an effort to make sure the transactions get propagated and accepted. This is abusive particularly because it wastes bandwidth when it gives nodes transactions they already know or have already requested from someone else. It’s also abusive because it wastes connection slots… a mass connector uses thousands of times the slot capacity of an ordinary nodes. If the conduct doesn’t work (or just gets the peer disconnected) then it will be of no value and will likely stop (or not be started again in the future, if it’s not happening right now).

    The same issue also previously existed for full blocks with some misguided miners blindly spamming every node they could connect with a full block every time they solved one, effectively DOSing the whole network (and probably working contrary to their intent). The best and often only way to correct conduct like this is to just make it not even conjecturally provide a benefit.

    Perhaps one way to hasten the deployment is to enforce this on P2Pv2 connections immediately in addition to or instead of the protocol version check, custom announcement code isn’t likely to use v2 yet. As far as I know all widely deployed software already does the right thing.

  10. mzumsande commented at 1:08 am on August 4, 2024: contributor
    Just wanted to mention that this change does not go well together with #29415 which suggests for bitcoin core to send unrequested transaction in the new “private broadcast” mode.
  11. gmaxwell commented at 6:07 am on August 4, 2024: contributor
    @mzumsande Good catch, but that’s just a flaw in PR29415– a gratuitous protocol violation, since the only cost of complying with the protocol is some tens of milliseconds of round trip. This is sort of error that wouldn’t be made if nodes enforced the behavior by ignoring unsolicited transactions.
  12. ariard commented at 6:49 pm on August 5, 2024: contributor

    Indeed, preventing mass connection as it gives an edge in some other attacks is one of the motivation behind this change. This is distinct from limiting CPU DoS (whatever the DoS margin already existent from MAX_PEER_TX_ANNOUNCEMENT at full steam, reducing that maximum number is irrelevant if you can bypass it with unsolicited txn) and other p2p DoS.

    Pushed a new commit to discourage and disconnect upgraded peers (>= REJECT_UNSOLICITED_TX_VERSION), though non upgraded peers are non affected. There is still the risk to be over strict with deployed softwares not doing the right things, i.e there is a lot of weird behaviors in other p2p bitcoin stacks, breaking the tx broadcast of some obscure lightning wallet ain’t a small bug. On the other hand, being liberal with non upgraded peers all the DoS vectors can be still be abused. A better approach could be to revert ae60d485, i.e gradual increase of the discouragement score were non upgraded peers would be allow a number of score points before to be disconnected (e.g for each unsolicited tx 10 discouragement score points bump).

    W.r.t “private broadcast” mode, I think it’s better to have it fixed to follow the INV / GETDATA / TX flow, otherwise it sounds a loophole for an attacker to trigger DoS / connection slots mass waste if “private peers” are not punished for unsolicited traffic. It’s better to do it now, rather to introduce a node fingerprint between hypothetical version of “private broadcast” mode.

  13. in src/txrequest.cpp:680 in 83c9cd8d3d outdated
    674@@ -675,6 +675,19 @@ class TxRequestTracker::Impl {
    675         if (it != m_index.get<ByPeer>().end()) MakeCompleted(m_index.project<ByTxHash>(it));
    676     }
    677 
    678+    bool ExpectedTx(NodeId peer, const uint256& txhash)
    679+    {
    680+        auto it_hash = m_index.get<ByTxHash>().lower_bound(ByTxHashView{txhash, State::REQUESTED, 0});
    681+        // We need to traverse all the REQUEST / COMPLETED announcements to verify we effectively
    


    sipa commented at 7:23 pm on August 5, 2024:
    You can do this more efficiently by looking for ByPeerView{peer, false, txhash}, I think (as opposed to iterating)?

    ajtowns commented at 0:46 am on August 15, 2024:
    ReceivedResponse already does this, could just have it return true in that case, and false otherwise (instead of void), and issue the error if neither the txid or wtxid resulted in a true result?

    ariard commented at 7:23 am on August 17, 2024:
    The ExpectedTx have been updated to look by peer {peer, false, txhash} rather than iterating as we’re asking information for a given peer. I’ve not changed ReceiveResponse function call type, to avoid having unused value compiler errors, neither benchmarked yet to compare more the 2 approach.
  14. gmaxwell commented at 5:23 pm on August 6, 2024: contributor

    gradual increase of the discouragement score were non upgraded peers would be allow a number of score points before to be disconnected

    That’s like not-found, – an idea that sounds attractive on it’s face but on examination causes problems. For behavior that can’t be prevented (like, say, inv-ing a transaction to you that already know), then it should obviously never disconnect for that. For behavior that can be prevented like sending a txn before an inv, sending an invalid block, etc.., peers are either compatible or they’re not. If they’re not you want them disconnected ASAP so they can each replace the connection with a compatible peer, and if one is buggy you want the operator finding out ASAP too, ideally as soon as they deploy the buggy code. letting something rack up a dos score hides the issue temporarily but then under different transaction load its suddenly getting disconnected, but since nothing changed for the user they may not be paying much attention.

    Is anyone running this currently with the code set to always enforce or just log? it would be useful to know at what rate nodes are currently seeing unsolicited. In the past, I’d see waves of them… months with absolutely none and then periods where a single IP was mass connecting the whole network and bursting them out. I don’t think I ever observed any widely deployed software doing this, though it easily could have changed. I’m particularly interested if any protocol v2 nodes engage in it, since I think that’s a viable way to immediately enforce without breaking anything.

  15. in src/net_processing.cpp:4561 in 5d86b2bf51 outdated
    4553@@ -4554,6 +4554,19 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    4554 
    4555         LOCK2(cs_main, m_tx_download_mutex);
    4556 
    4557+        // Do not process unrequested transactions to mitigate potential DoS risks.
    4558+        // We check both identifiers as txid mode may happen with wtxidrelay peers
    4559+        // due to parent-orphan fetching.
    4560+        if (!pfrom.HasPermission(NetPermissionFlags::Relay) && pfrom.GetCommonVersion() >= REJECT_UNSOLICITED_TX_VERSION) {
    4561+            bool is_expected = tx.HasWitness() ? m_txrequest.ExpectedTx(pfrom.GetId(), wtxid) || 
    


    gmaxwell commented at 5:26 pm on August 6, 2024:
    trailing whitespace, btw.

    ariard commented at 7:19 am on August 17, 2024:
    fixed on latest push - good to have people not relying on github for code review.
  16. maflcko commented at 7:53 am on August 7, 2024: member

    I’m particularly interested if any protocol v2 nodes engage in it, since I think that’s a viable way to immediately enforce without breaking anything.

    Just for context, there is an idea for a “BIP324 proxy” for legacy light clients: https://delvingbitcoin.org/t/bip324-proxy-easy-integration-of-v2-transport-protocol-for-light-clients-poc/678 . However, it is experimental, only works for outbound connections, requires the light client to be patched right now (and may have other design issues). Just wanted to mention it for context, because a legacy light client behind a BIP324 proxy will likely not be visible on the network today, but theoretically, it may be in the future. In any case, it seems reasonable to assume anyone opting to run the proxy first made sure that their software does not send unrequested transactions. (If they didn’t, they’ll hopefully notice the relay issue quickly)

  17. andrewtoth commented at 2:16 pm on August 7, 2024: contributor
    See also https://github.com/alfred-hodler/pushtx/ which sends unsolicited TXs.
  18. gmaxwell commented at 3:00 pm on August 7, 2024: contributor

    This PR exposes that nodes are currently making duplicate transaction requests not too infrequently. ::facepalm:: What happens is that when Peer1 announces {parent,child} and Peer2 announces {parent,child} and we request parent from 1 and child from 2, but 2 responds faster, then orphan handling will request parent by txid from 2. Then it gets the parent by txid from 2. Then when peer 1 replies to the request for parent (sending a duplicate transaction), the current behavior of this PR falsely thinks that was an unrequested txn because its tracking was removed after the transaction was from peer 2 was accepted.

    For this PR it’s probably just best to silently ignore AlreadyHave transactions (more precise handling of things that are requested but outstanding would be more correct but requires care in memory management), but the duplicate requests should separately get fixed. At first blush it seems hard to fix them completely but they could be made less common by significantly delaying the requests made on behalf of the orphan resolution, so they get satisfied by the already in flight request for the same txn by wtxid when it comes in… which should almost always be the case outside of a recent startup except when the parent was never offered because it was below the nodes fee filter.

  19. mzumsande commented at 8:40 pm on August 7, 2024: contributor

    This PR exposes that nodes are currently making duplicate transaction requests not too infrequently. (…)

    I think that this will also happen with just one peer and chains of transactions:

    1.) orphan X is received, put into orphanage, parent P requested 2.) a child Y of X is received -> It’s also an orphan, parent X is requested again (by txid!) because AlreadyHaveTx does not check the orphanage by txid (reason) 3.) P is received and accepted 4.) orphan resolution accepts X, removes the tx request for X (in ProcessValidTx()) even though there is an outstanding request 5.) X is received again, it’s seen as unrequested. The peer would be disconnected even though it did nothing wrong.

    For this PR it’s probably just best to silently ignore AlreadyHave transactions

    +1

  20. gmaxwell commented at 6:43 pm on August 10, 2024: contributor
    @mzumsande Good point– though the logic in the comment ought not apply when the transaction in the orphan pool comes from the same peer, in that case the extra request would be purely redundant.
  21. vasild commented at 12:12 pm on August 13, 2024: contributor

    INV+GETDATA is now implemented in #29415 so that it does not stand in the way of this PR, but:

    #21224 was frowned upon for a few reasons and one of them was breaking existent tools. This PR addresses just that by introducing a new protocol version. It does not address the other concerns.

    On the new protocol idea - we don’t expect that an attacker will voluntarily switch to the new protocol so that their attack can be thwarted, right? If this PR is merged and at some point in the future, we drop unsolicited transactions from old peers, then that is the same as going back to #21224, without this PR because that would break existent tools.

    The OP mentions “undocumented clients” and in the discussions there seems to creep the idea that sending unsolicited transactions is a violation of the protocol, undocumented behavior, an error or wrong in some way without providing any reference to back those claims. On the contrary, the oppotite seems to be the case: sending unsolicited transactions is explicitly documented and even if it was not, in the absence of documentation then the source code is the spec and Bitcoin Core accepts unsolicited transactions.


    Correct me if I am wrong but even if this is implemented, an attacker can send up to 5000 (MAX_PEER_TX_ANNOUNCEMENTS) announcements to which the victim will respond (after a few seconds) by requesting 5000 transactions at once. If one transaction takes 1 second to reject, then the attack is successful. The motivation behind MAX_PEER_TX_ANNOUNCEMENTS is to limit the amount of memory used for announcements. I guess it is too high to stop CPU DoS with slow-to-reject transactions… this is already repeating the arguments from #21224.

    In summary, requiring INV+GETDATA does not solve the problem and breaks existent tools. Concept NACK because of this.

    It would be better to throttle the processing of newly arrived transactions, globally and/or per-peer, regardless of whether they are solicited or unsolicited.

  22. gmaxwell commented at 5:42 pm on August 13, 2024: contributor

    @vasild you appear to have disregarded my above message about this change is desirable. Was it unclear?

    With respect to breakage– it’s not uncommon that fixing bugs requires fixing bugs in other software. Fortunately, it’s usually possible to balance the issues. One way to do so is to phase in changes so that older software has an opportunity to fix itself, and once the fixed behavior is ubiquitous then the compatibility issue no longer exists. But if you over narrowly analyze the first step in such a transition you might mistakenly argue that it was pointless because the old behavior still exists.

  23. vasild commented at 1:18 pm on August 14, 2024: contributor

    @vasild you appear to have disregarded my above message about this change is desirable. Was it unclear?

    You mean this one: “At times there have been parties that mass connect…”? If that would be the driving force behind this, then maybe it would be better to put it in the PR description.

    So this is not about CPU DoS attack by slow-invalid transactions, but about somebody doing something stupid and using the Bitcoin network in an ineffective way, sending 500 bytes instead of 30? Requiring INV+GETDATA will probably stop this, but is it such a problem to be worth the effort? How often does it happen? Is it something that happened once 7 years ago, or is it happening regularly, e.g. every week? What % of the entire network traffic is wasted due to that per month?

    You also mentioned wasting inbound slots, but requiring INV+GETDATA will not solve that - the mass connect tool will be updated to do INV+GETDATA+TX instead of just TX. Those are short-lived connections anyway.

    fixing bugs requires fixing bugs in other software

    I do not think that sending or accepting unsolicited transactions is a bug - it is documented, has been working like this forever and the network is just fine with it.

  24. ajtowns commented at 0:31 am on August 15, 2024: contributor

    but requiring INV+GETDATA will not solve that - the mass connect tool will be updated to do INV+GETDATA+TX instead of just TX

    Sending INV and waiting for a GETDATA does mitigate the bandwidth involved substantially: when the txs are duplicates, you’re now only receiving the w/txid instead of the full transaction, reducing the wasted bandwidth by perhaps a factor of at least ~4x (32B vs ~128B for parent txid/sig/output), and probably substantially more. I think the inbound slots issue is just “if we know we don’t want this peer, lets discourage immediately so the slot is free”, so an optimisation rather than addressing a DoS concern per se.

  25. vasild commented at 6:35 am on August 15, 2024: contributor
    Another aspect to consider wrt wasting bandwidth: if the sender knows the recipient does not have the transaction (e.g. when a wallet sends its newly created transaction), then doing the extra INV+GETDATA round-trip is a waste.
  26. sipa commented at 10:55 am on August 15, 2024: member
    @vasild Yes, but that gratuitously reveals to its peers that it is the transaction originator too (not really an issue in your private broadcast setting, but in nearly all others, this would be undesirable for that reason).
  27. [p2p]: Stop unrequested transactions processing
    To mitigate potential DoS risks, stop unrequested transaction
    processing early for upgraded peers (`REJECT_UNSOLICITED_TX_VERSION`),
    unless the peer has `NetPermissionsFlags::Relay`.
    
    Unrequested transactions is defined as one for which we have not
    issued GETDATAs on this connection link.
    32e6bc8964
  28. [draft] disconnect upgraded peers b29303719b
  29. [perf] make is_expected check conditional dd0d06d3ea
  30. [perf] iterate by peer rather than status c7bc444d02
  31. ariard force-pushed on Aug 17, 2024
  32. ariard commented at 7:19 am on August 17, 2024: contributor

    Addressed all code comments at c7bc444d02

    —————————————— @gmaxwell

    That’s like not-found, – an idea that sounds attractive on it’s face but on examination causes problems. For behavior that can’t be prevented (like, say, inv-ing a transaction to you that already know), then it should obviously never disconnect for that. For behavior that can be prevented like sending a txn before an inv, sending an invalid block, etc.., peers are either compatible or they’re not. If they’re not you want them disconnected ASAP so they can each replace the connection with a compatible peer, and if one is buggy you want the operator finding out ASAP too, ideally as soon as they deploy the buggy code. letting something rack up a dos score hides the issue temporarily but then under different transaction load its suddenly getting disconnected, but since nothing changed for the user they may not be paying much attention.

    Surely, in the ideal you would like your whole set of inbound / outbound peers to be fully composed of non-buggy peers, respecting behaviors like sending a txn before an inv or not sending an invalid block. In practice, a full-node is a network-exposed daemon, so it’s good to accomodate a wide variety of network peers to allow to roll out multiple versions of the protocol at the same time. On the internet, it’s not like we have 2 versions of the IP protocol used in production…

    On the silent rack up you’re describing, this can be indeed an issue as suddenly you’re getting a buggy peer disconnected, while this connection slot could have been used for a reliable peer. I believe it’s more making the point that bad discouragement dos score could be encompassed in the automated inbound slot eviction heuristics.

    Is anyone running this currently with the code set to always enforce or just log? it would be useful to know at what rate nodes are currently seeing unsolicited. In the past, I’d see waves of them… months with absolutely none and then periods where a single IP was mass connecting the whole network and bursting them out. I don’t think I ever observed any widely deployed software doing this, though it easily could have changed. I’m particularly interested if any protocol v2 nodes engage in it, since I think that’s a viable way to immediately enforce without breaking anything.

    No, I’ve not been running a full-node with the old patch to monitor unsolicited transactions. As you’re pointing out whatever buggy lightweight clients doing that behavior, there is still the situation of a single IP mass connecting the whole network, and potentially doing so for deanonymization purposes.

    In any way, as you’re suggesting I think too upgrading to a protocol v2 like TX_PROTOCOL_V2 can be a practical way to introduce this more restrictive behavior on inv-txn flow. So far other examples of p2p behavior deprecation have been things like NODE_BLOOM version bits. @maflcko

    Just for context, there is an idea for a “BIP324 proxy” for legacy light clients: https://delvingbitcoin.org/t/bip324-proxy-easy-integration-of-v2-transport-protocol-for-light-clients-poc/678 . However, it is experimental, only works for outbound connections, requires the light client to be patched right now (and may have other design issues). Just wanted to mention it for context, because a legacy light client behind a BIP324 proxy will likely not be visible on the network today, but theoretically, it may be in the future. In any case, it seems reasonable to assume anyone opting to run the proxy first made sure that their software does not send unrequested transactions. (If they didn’t, they’ll hopefully notice the relay issue quickly)

    Yes, I think that means if the bip324 proxy got more used by legacy light clients, version agent and protocol spoken it is harder to rely on to monitor solely on the legacy clients characteristics. @gmaxwell

    This PR exposes that nodes are currently making duplicate transaction requests not too infrequently. ::facepalm:: What happens is that when Peer1 announces {parent,child} and Peer2 announces {parent,child} and we request parent from 1 and child from 2, but 2 responds faster, then orphan handling will request parent by txid from 2. Then it gets > the parent by txid from 2. Then when peer 1 replies to the request for parent (sending a duplicate transaction), the current behavior of this PR falsely thinks that was an unrequested txn because its tracking was removed after the t> ransaction was from peer 2 was accepted.

    Of course, orphaness is terrible for this type of restrictiosn on the inv-tx flow, I already pointed out so in my original PR.

    “I think we have a fundemental issue with orphans processing. Orphanness source isn’t strictly assignable. I.e, Mallory might provoke mempool conflicts between Alice and Bob, sending Y to Alice, Y’ to Bob. From then, Mallory can send a set of Y childrens to Alice, she will accept them and relay forward to Bob. Alice will be the one swalloing the DoS penalty. And now you have a vector of tx-relay jamming…”

    For this PR it’s probably just best to silently ignore AlreadyHave transactions (more precise handling of things that are requested but outstanding would be more correct but requires care in memory management), but the duplicate requests should separately get fixed. At first blush it seems hard to fix them completely but they could be made less common by significantly delaying the requests made on behalf of the orphan resolution, so they get satisfied by the already in flight request for the same txn by wtxid when it comes in… which should almost always be the case outside of a recent startup except when the parent was never offered because it was below the nodes fee filter.

    The AlreadyHave can be a good idea, but I think now you’re introducing a bandwidth free relay vector at last, where a single parent can be concurrently replayed through many transaction-relay links. I’m not sure either delayed orphan resolution is necessarily a working way as now you can have slow propagation of pre-signed low-fees transactions that are only interesting with high-fee child. And it’s the simple chain of transactions situations, you can have a same child txid, different wtxid / different feerates rebinding on the same parent… @mzumsande

    1.) orphan X is received, put into orphanage, parent P requested 2.) a child Y of X is received -> It’s also an orphan, parent X is requested again (by txid!) because AlreadyHaveTx does not check the orphanage by txid (reason) 3.) P is received and accepted 4.) orphan resolution accepts X, removes the tx request for X (in ProcessValidTx()) even though there is an outstanding request 5.) X is received again, it’s seen as unrequested. The peer would be disconnected even though it did nothing wrong.

    If I’m understanding correctly, you’re pointing to sequential orphan resolution of a chain of transaction P <- X <- Y, yes I think you could have X the intermediary transaction element seen as unrequested. See my point above to gmaxwell, as I think the AlreadyHave presents already issues with adding a free relay vector on simple 2-depth chain of transactions. @vasild

    #21224 was frowned upon for a few reasons and one of them was breaking existent tools. This PR addresses just that by introducing a new protocol version. It does not address the other concerns.

    Addressing all the other concerns mentioned in the old PR means purely re-writing the ~10k of LoC of the transaction-relay stack from scratch in cpp or rust…It’s an interesting exercise and certainly one that someone could actually do so thanks to the multiprocess interfaces, but in the meantime it will leave forever exposed current bitcoin core full-nodes to DoS vectors arising from hard to verify transactions.

    On the new protocol idea - we don’t expect that an attacker will voluntarily switch to the new protocol so that their attack can be thwarted, right? If this PR is merged and at some point in the future, we drop unsolicited transactions from old peers, then that is the same as going back to #21224, without this PR because that would break existent tools.

    Of course we don’t live in the world of Care Bears so we cannot expect attackers running potentially modified software to adhere to the new protocol, period. Though here you’re missing all the salt of the past conversations on the old PR, where there is a trade-off between breaking either existent tools or letting exposed current nodes.

    The OP mentions “undocumented clients” and in the discussions there seems to creep the idea that sending unsolicited transactions is a violation of the protocol, undocumented behavior, an error or wrong in some way without providing any reference to back those claims. On the contrary, the oppotite seems to be the case: sending unsolicited transactions is explicitly documented and even if it was not, in the absence of documentation then the source code is the spec and Bitcoin Core accepts unsolicited transactions.

    There is no BIP standard documenting this behavior about transaction-relay and there is no guarantee that other full-nodes softwares are respecting the non-reject of unsolicited transactions. In the absence of documentation, it’s indeed wise to refer to the code, though in matters of transaction-relay messages usually it has been something fleshout from iterations between full-nodes softwares and application clients (e.g the MEMPOOL message), and this I think since the early days of lightweight clients deployment in bitcoin circa ~2011.

    Correct me if I am wrong but even if this is implemented, an attacker can send up to 5000 (MAX_PEER_TX_ANNOUNCEMENTS) announcements to which the victim will respond (after a few seconds) by requesting 5000 transactions at once. If one transaction takes 1 second to reject, then the attack is successful. The motivation behind MAX_PEER_TX_ANNOUNCEMENTS is to limit the amount of memory used for announcements. I guess it is too high to stop CPU DoS with slow-to-reject transactions… this is already repeating the arguments from #21224.

    This is correct that the current value of 5000 (MAX_PEER_TX_ANNOUNCEMENTS) can still leave some DOS exposure, and that reducing it could be considered in future PRs, however without that initial step of rejecting the unsolicited transactions over the MAX_PEER_TX_ANNOUNCEMENTS limit, this is quite useless to consider at first the default value of MAX_PEER_TX_ANNOUNCEMENTS.

    In summary, requiring INV+GETDATA does not solve the problem and breaks existent tools. Concept NACK because of this.

    It would be better to throttle the processing of newly arrived transactions, globally and/or per-peer, regardless of whether they are solicited or unsolicited.

    Randomly throttling the processing of newly arrived solicited and unsolicited transactions could make the mitigation worst than the DoS exposure reduction, as this could a transaction propagation censorship vector.

    You mean this one: “At times there have been parties that mass connect…”? If that would be the driving force behind this, then maybe it would be better to put it in the PR description.

    Generally, this is not the job of bitcoin core contributors to write a full description on how transactions relay behaviors could be used by incompetent chainalysis companies to conduct deanonymization attacks on the network…

    So this is not about CPU DoS attack by slow-invalid transactions, but about somebody doing something stupid and using the Bitcoin network in an ineffective way, sending 500 bytes instead of 30? Requiring INV+GETDATA will probably stop this, but is it such a problem to be worth the effort? How often does it happen? Is it something that happened once 7 years ago, or is it happening regularly, e.g. every week? What % of the entire network traffic is wasted due to that per month?

    Leveling up the bar w.r.t to potential deanonymization is one concern. A full-node being feeded unsolicited hard to verify transactions that could be DoSy can be another concern.

    You also mentioned wasting inbound slots, but requiring INV+GETDATA will not solve that - the mass connect tool will be updated to do INV+GETDATA+TX instead of just TX. Those are short-lived connections anyway.

    A mass connect tool might not open short-lived connections, all depends the purpose of the tool.

    I do not think that sending or accepting unsolicited transactions is a bug - it is documented, has been working like this forever and the network is just fine with it.

    Not really, there is no such documentation about unsolicited transaction in the codebase at commit 4405b78d, and times have proven this version of the software had significant and major bugs such as BIP30. @ajtowns

    Sending INV and waiting for a GETDATA does mitigate the bandwidth involved substantially: when the txs are duplicates, you’re now only receiving the w/txid instead of the full transaction, reducing the wasted bandwidth by perhaps a factor of at least ~4x (32B vs ~128B for parent txid/sig/output), and probably substantially more. I think the inbound slots issue> is just “if we know we don’t want this peer, lets discourage immediately so the slot is free”, so an optimisation rather than addressing a DoS concern per se.

    About the inbound slots issue, see one of the comment of gmaxwell above, I think there is an argument about the robustness of your local transaction-relay topology by not dropping misbehaving peers at first protocol violation. IMHO, I think this can be improved by adding a discouragement score, or whatever other marker as inbound peers eviction heuristics. @vasild

    Another aspect to consider wrt wasting bandwidth: if the sender knows the recipient does not have the transaction (e.g. when a wallet sends its newly created transaction), then doing the extra INV+GETDATA round-trip is a waste.

    The sender can never be sure that the recipient already have the transaction, recipient full-node could have shutdown and reboot or the transaction could have temporarily fallen under the mempool min fee at the TX message reception.

  33. ariard commented at 7:38 am on August 17, 2024: contributor

    I think the main open problem of this approach, and iiuc as pointed out by gmaxwell and aj, if it’s applied to upgraded peers only (REJECT_UNSOLICITED_TX_VERSION) is the occupation of inbound slots by buggy peers. Slots that could be rather used by more non-buggy peers to preserve robustness of the transaction-relay topology.

    I think one way to alleviate is to introduce a new versioning of the transaction-relay protocol itself, which has not been really existent so far and that way have an equilibrated transaction-relay topology. The node version bits could be used for that as that way more selective peering policy can be build on top.

     0diff --git a/src/protocol.h b/src/protocol.h
     1index fd7cfddf3b..495a2e3a21 100644
     2--- a/src/protocol.h
     3+++ b/src/protocol.h
     4@@ -329,6 +329,9 @@ enum ServiceFlags : uint64_t {
     5     // NODE_P2P_V2 means the node supports BIP324 transport
     6     NODE_P2P_V2 = (1 << 11),
     7 
     8+    // NODE_TXRELAY_V2 means the node supports BIPXXX transaction-relay v2 protocol
     9+    NODE_TXRELAY_V2 = (1 << 12),
    10+
    11     // Bits 24-31 are reserved for temporary experiments. Just pick a bit that
    12     // isn't getting used, or one not being used much, and notify the
    13     // bitcoin-development mailing list. Remember that service bits are just
    

    Introducing a transaction-relay versioning as another advantage, independent of this PR’s concerns, as ConnectionType and expected node p2p messages support could be looked-on before any connection attempt. The service bits are within addr messages already.

  34. DrahtBot removed the label CI failed on Aug 17, 2024
  35. gmaxwell commented at 11:29 pm on August 18, 2024: contributor

    @vasild I don’t think ariaid’s initial justification for the PR was the most important one. Sometimes people propose doing the right things for less than the most important reasons. If one were to judge things only by their initial proposals then it would open a very easy attack on development– just keep proposing good things with bad justification to make sure they won’t get done. :)

    The amount of CPU-work per byte sent might only be increased by 50% by forcing INV use. E.g. say an attack scriptpubkey in the utxo set causes a lot of work, and for whatever reason the attack is limited to single input transactions. The minimum serialized transaction size is about 63 bytes. So without inv the attacker sends 63 bytes per unit of work, but with the inv the attacker sends 32+63=95 bytes per unit of work. So for a given amount of bandwidth the attacker causes 50% more work in this contrived and extreme case.

    I don’t think that’s a particularly interesting reason to motivate the PR. But I think not having an incentive to send unexpected data generally is a very important reason, as a single mass connector can easily end up using a very large fraction (as in a majority, depending on how many txn they send) of the networks’ total bandwidth usage from doing so.

    I think the mass connector concern would be potentially addressable by doing something like, “if a peer sends an unsolicited txn stop all processing on the peer for N seconds” (h/t to Pieter for mentioning that in an off-thread discussion). Then go ahead and process it. It would be backwards compatible too since some wallet relaying their txn wouldn’t be meaningfully harmed by an N second delay. The primary downside I see is that the mass connectors may continue because without the disconnect they don’t realize their “optimization” pessimizes instead of optimizes. We certainly saw that with block relay, e.g. viabtc for years spamming every node they could connect to with 900kb witness stripped blocks that just got totally ignored.

    Aside, in the case of private broadcast– I think it might eventually be good for nodes to periodically “private broadcast” transactions that aren’t their own at a very low rate. This is because anonymity networks like tor have extremely low resistance to traffic analysis. It seems likely to me that ordinary bitcoin p2p will have better traffic analysis resistance than “private broadcast” in the future (e.g. after doing a very small amount of work on using p2p encryption padding and rate limiting). I’ll go make some comments on the private broadcast PR to discuss that. In any case, INV in that case then helps mitigate traffic even for private broadcast. @ariard

    but I think now you’re introducing a bandwidth free relay vector at last,

    Huh? Nothing here changes at all what transactions a node relays. And someone can always waste bandwidth to you if thats their only goal (e.g. sending you IP ping packets ).

    In any case, I’m only concerned about the shortest path to deploy– since the behavior change could be made initially lax and then tightened over time. Right now, existing bitcoin core nodes are foolishly making many duplicate requests. It’s clear that this could be improved, but it seems like some will remain. More precise tracking can be done but would need analysis from a worst case memory usage perspective.

  36. DrahtBot added the label CI failed on Sep 4, 2024
  37. DrahtBot commented at 2:33 am on September 4, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28889097062

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  38. ariard commented at 1:15 am on September 5, 2024: contributor

    @gmaxwell

    Huh? Nothing here changes at all what transactions a node relays. And someone can always waste bandwidth to you if thats their only goal (e.g. sending you IP ping packets ).

    In any case, I’m only concerned about the shortest path to deploy– since the behavior change could be made initially lax and then tightened over time. Right now, existing bitcoin core nodes are foolishly making many duplicate reque> sts. It’s clear that this could be improved, but it seems like some will remain. More precise tracking can be done but would need analysis from a worst case memory usage perspective.

    Back from travel – I looked more to implement AlreadyHaveTx() at the reception of NetMsgTypeTx as an exemption of calling Misbehaving, though after lokking back on TxRequestTracker code this concurrency behavior due to orphan handling has very low odds to happen as the announcements should be ordered by CANDIDATE and REQUESTED status. See https://github.com/bitcoin/bitcoin/blob/master/src/txrequest.h#L152

    Given we’re taking the cs_main lock to deal with TxRequestTracker, if we have concurrent requests that can be issued, whatever the level of peer preferredness, it should be fix at that level.

    Beyond, I maintain that you’re naively adding AlreadyHaveTx() on this current branch, you would add a bandwidth free relay vector, as peer1 and peer2 can collude to send a unique child towards the same node (the parent not existing at all for any other node), and have the GETDATA for the parent duplicated by the peer.

    For this behavior change, and without digging more about precise tracking and memory / CPU usage, I think current branch approach is the best that can be done…? Though we can always introduce a NODE_TXRELAY_V2 bit service flag to signal disconnection for protocol violating peers, either buggy or malicious.

  39. ariard commented at 1:31 am on September 5, 2024: contributor

    I think the mass connector concern would be potentially addressable by doing something like, “if a peer sends an unsolicited txn stop all processing on the peer for N seconds” (h/t to Pieter for mentioning that in an off-thread disc> ussion).

    I don’t think it is that easy…a peer can be the fastest to announce a seen-by-all network txid, get requests from the node and then inject an unsolicited tx to stop all processing, to force the node to fallback to another peer to fetch the seen-by-all network txid which is a delay… If it’s a miner node, and it’s a high fee transaction this can be an interesting trick to play with.

  40. Add NODE_TXRELAY_V2.
    This does not add default signaling of NODE_TXRELAY_V2 as node local
    services.
    23551efc91
  41. ariard commented at 2:37 am on September 5, 2024: contributor

    Updated at [23551ef] with the NODE_TXRELAY_V2 approach.

    Submitted a BIP draft in that sense, in the lack for now of a better approach for now on how to implement reject of unrequested transactions in a minimally-disruptive fashion for the bitcoin peer-to-peer network. I think I should open a distinct PR for the service signaling support itself.

  42. mzumsande commented at 11:07 am on September 5, 2024: contributor

    Beyond, I maintain that you’re naively adding AlreadyHaveTx() on this current branch, you would add a bandwidth free relay vector, as peer1 and peer2 can collude to send a unique child towards the same node (the parent not existing at all for any other node), and have the GETDATA for the parent duplicated by the peer.

    I don’t get this at all. The suggestion above was, if you receive an unrequested TX, and you already have it, ignore it but don’t disconnect the peer. Which is the same as current behavior on master.  

    Free relay is a network-level concept, all of what you describe is local between nodes that are connected. In order for anything to be a free relay vector, resources of non-involved nodes must be wasted.

    and have the GETDATA for the parent duplicated by the peer.

    If a tx is received a second time, and that tx is in the orphanage, it is ignored - there will be no second GETDATA for the parent on either master or this branch.

    though after lokking back on TxRequestTracker code this concurrency behavior due to orphan handling has very low odds to happen as the announcements should be ordered by CANDIDATE and REQUESTED status

    When I tested it (simply start a node and log when an unrequested tx is received),unrequested transactions arrived frequently and within seconds of starting the node. I would recommend to just test it for yourself…

  43. ariard commented at 10:36 pm on September 5, 2024: contributor

    @mzumsande

    I don’t get this at all. The suggestion above was, if you receive an unrequested TX, and you already have it, ignore it but don’t disconnect the peer. Which is the same as current behavior on master.

    Let’s say you have Peer1 and Peer2 announcing the Transaction ABC, if ABC is concurrently fetch from both Peer1 and Peer2, the GETDATA (32-bytes) for ABC would have been wasted as outgoing bandwidth by Local Node. This 32 byte leak can be amplified if batch of transactions are concurrently fetched (bounded by _INVENTORY_BROADCAST_INTERVAL).

    Free relay is a network-level concept, all of what you describe is local between nodes that are connected. In order for anything to be a free relay vector, resources of non-involved nodes must be wasted.

    I’m considering the bandwidth ressources of the local node here (though easy to have the single UTXO used for the leak being re-used against a N number of topology edges) . A topology Peer1 <-> Local Node <-> Peer2 is enough to be a network.

    If a tx is received a second time, and that tx is in the orphanage, it is ignored - there will be no second GETDATA for the parent on either master or this branch.

    … ? For AlreadyHaveTx() the HaveTx in TxOrphanage is done on wtxid and AFAICT, one can still announce a transaction by wtxid and then the same transaction with another wtxid (use the OP_ELSE instead of an OP_IF spending path in witness). I think the AlreadyHaveTx L4585 should return false at the second TX reception and code flow execution should land in TX_MISSING_INPUTS branch.

    When I tested it (simply start a node and log when an unrequested tx is received),unrequested transactions arrived frequently and within seconds of starting the node. I would recommend to just test it for yourself…

    Sure, I’m aware that unrequested transactions is an allowed behavior since commit 5b788bb from my read of the code. This doesn’t mean it cannot be exploited under adversarial scenarios, e.g mass connectors for deanonymization purposes.

  44. mzumsande commented at 8:29 am on September 6, 2024: contributor

    Let’s say you have Peer1 and Peer2 announcing the Transaction ABC, if ABC is concurrently fetch from both Peer1 and Peer2, the GETDATA (32-bytes) for ABC would have been wasted as outgoing bandwidth by Local Node. This 32 byte leak can be amplified if batch of transactions are concurrently fetched (bounded by _INVENTORY_BROADCAST_INTERVAL).

    That has nothing to do with free relay, because it doesn’t propagate to the rest of the network (hence the “relay” in the name). It’s also not a interesting attack because there are dozens of ways for an attacker to waste both its own and its peer’s bandwidth when it is on one end of the connection: For example send messages of unknown type (that are ignored), request a block repeatedly, spam INVs with random txids and get GETDATA in return, sending IP ping packets (see gmaxwell above) etc.

  45. maflcko commented at 9:44 am on September 6, 2024: member
    Would be nice to change the title from “… transactions v2” to something else. Otherwise, it may be a bit confusing in the context of “v2 transactions” (transactions with version number set to 2, not 3), or “v2 transport” (the P2P thing).
  46. ariard commented at 6:47 pm on September 6, 2024: contributor

    @mzumsande

    That has nothing to do with free relay, because it doesn’t propagate to the rest of the network (hence the “relay” in the name). It’s also not a interesting attack because there are dozens of ways for an attacker to waste both its own and its peer’s bandwidth when it is on one end of the connection:

    Sure, technically this GETDATA duplication alone isn’t a free relay bandwidth, in the sense that a transaction is relayed many times over the network with no feerate / absolute fee compensation.

    Yet, I think this can be easily escalated as a free relay vector with the following sequence of events:

    Replacement penalty has been indeed paid on the conflicting child, but not for the vbytes of the parent at second-relay to wtixd-relay peers.

    For example send messages of unknown type (that are ignored), request a block repeatedly, spam INVs with random txids and get GETDATA in return, sending IP ping packets (see gmaxwell above) etc.

    This is missing the point…there is no common measure between messages of unknown types that will be disregarded siltently (“Ignore unknown commands for extensibility”)and messages that will trigger a data fetch from the Local Node such as transaction relay. I got gmaxwell point above, though it’s not the purpose of this code change to discuss here with gmaxwell all what is wrong with core p2p stack (though we could …?).

    I think there have been no answer on the point I was raising above, re-quoting:

    … ? For AlreadyHaveTx() the HaveTx in TxOrphanage is done on wtxid and AFAICT, one can still announce a transaction by wtxid and then the same transaction with another wtxid (use the OP_ELSE instead of an OP_IF spending path in witn> ess). I think the AlreadyHaveTx L4585 should return false at the second TX reception and code flow execution should land in TX_MISSING_INPUTS branch.

  47. ariard commented at 6:49 pm on September 6, 2024: contributor

    Would be nice to change the title from “… transactions v2” to something else. Otherwise, it may be a bit confusing in the context of “v2 transactions” (transactions with version number set to 2, not 3), or “v2 transport” (the P2P thing).

    See #30837, as the signaling support for transaction V2 can be discussed separately from halting the processing of unrequested transactions.

  48. ajtowns commented at 4:47 am on September 8, 2024: contributor

    Motivation is still the same to limit the DoS exposure in face of some DoSy behaviors from tx-relay peers.

    If this is just something a peer can opt-in to, I don’t think it mitigates any DoS behaviours: spammers/attackers will just use the old version.

    I’m pretty skeptical that anything here can really prevent attacks – an attacker can run multiple or low-latency connections to end up with pretty much the same result in most cases, I think. As far as I can see the main benefit is to make low-effort mass broadcast a bit less redundant, so that we only end up with (w)txids getting spammed unnecessarily, not the complete transaction. That seems a worthwhile improvement to me, but not one that should be mistaken for a DoS fix.

    But I don’t see how we get even that benefit if this is an opt-in behaviour that isn’t applied to existing software, though. Going beyond that and introducing a service flag for this doesn’t make any sense to me.

  49. ariard commented at 4:47 am on September 13, 2024: contributor

    @ajtowns

    If this is just something a peer can opt-in to, I don’t think it mitigates any DoS behaviours: spammers/attackers will just use the old version.

    Yes, indeed this is a quite obvious point about a node service bit. It was raised during the mempoolfullrbf discussion years ago (see #25600), that effectively node service bit network mechanism cannot be effectively checked for, without actually connecting to and triggering the logic. About the present case, this observation has been already made by gmax above: “I’m somewhat dubious of the value of this behind a version check: the abusive hosts that intentionally do this will just continue to be an older version and I don’t think it’s realistic to cut off older versions completely.

    One code improvement to throttle out spammers / attacker could be to favor inbound slots for upgraded peers when the eviction logic is triggered (AttemptToEvictConnection) when the inbound slots peers are full, that can be added in a patch of its own.

    I’m pretty skeptical that anything here can really prevent attacks – an attacker can run multiple or low-latency connections to end up with pretty much the same result in most cases, I think. As far as I can see the main benefit is to make low-effort mass broadcast a bit less redundant, so that we only end up with (w)txids getting spammed unnecessarily, not the complete transaction. That seems a worthwhile improvement to me, but not one that should be mistaken for a DoS fix.

    As usual, when you’re evaluating an attacker, you have to think in terms of ressources available both for the attacker and defender. From the defender viewpoint, there is only at max 125 connections to different peers, so if this assumption is correct (let’s waive at network-level attack that most of the time demands corresponding capabilities…) an attacker cannot have an unbounded number of low-latency connections. If spam nodes are disfavored by the peering mechanism, it’s reducing at least the overall spam transactions that a local node has to accept or reject for a time duration, e.g the one driving the tx fetch done by the TxRequestTracker. Though yes, I think too with you that it should make low-effort mass broadcast a bit less redundant, and hypothetical save global network bandwidth, even in non adversarial situations.

    But I don’t see how we get even that benefit if this is an opt-in behaviour that isn’t applied to existing software, though. Going beyond that and introducing a service flag for this doesn’t make any sense to me.

    This is leveling up the bar for mass connector wishing to deanonymize transaction propagation over the network, even if they’re running non-upgraded peers, their non-solicited transactions will be rejected. See deanonymization heuristics like tx probe where the peering topology can be inferred from transactions propagations.

    All that said, if you can think about a mechanism to drop unsolicited transactions, with no invasive effects on non-upgraded softwares, with no downsides, I’ll have a look and if I think it’s technically better, I’ll withdraw this code change.

  50. ajtowns commented at 10:58 am on September 23, 2024: contributor

    an attacker cannot have an unbounded number of low-latency connections.

    You only need one low-latency connection to run a pretty effective tx spamming attack, as low-latency means that the INV/GETDATA/TX cycle is fast, and you’re not rate-limited by the maximum number of pending GETDATA requests. (And even if you don’t have a low-latency connection, you can open many connections to get a multiple of that rate-limit)

  51. ariard commented at 7:22 pm on September 23, 2024: contributor

    You only need one low-latency connection to run a pretty effective tx spamming attack, as low-latency means that the INV/GETDATA/TX cycle is fast, and you’re not rate-limited by the maximum number of pending GETDATA requests. (And even if you don’t have a low-latency connection, you can open many connections to get a multiple of that rate-limit)

    I don’t deny it that you need only one low-latency connection to run a pretty effective tx spamming attack. Though, in my understanding, you’re pointing yourself one of the aspect that this code change aims to improve, namely the lack of effective rate limitation of the maximum number of TX validation cycle that one node can have to process, independently of the INV/GETDATA flow.

    And indeed, as you’re paraphrasing, even if an attacker does not have a low-latency connection, one can open many connections to get a multiple of the per-connection currently not respected rate-limit processing). Yet, as I was pointing in practice bitcoin core already has limitations on the max number of connections, i.e 125.

    Leveling-up in face of tx spamming attacks also should make it harder to deanonymize transaction propagation over the transaction-relay network. If you have another technical approach, achieving the same effect and under the same minimalism, than the proposed current code change, it’s definitively worthy to think more about it.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me