net: Block v2->v1 transport downgrade if !fNetworkActive #32073

pull hodlinator wants to merge 1 commits into bitcoin:master from hodlinator:2025/02/avoid_reconnect changing 1 files +3 −2
  1. hodlinator commented at 9:30 pm on March 14, 2025: contributor

    We might have just set CNode::fDisconnect in the first loop because of !CConnman::fNetworkActive.

    Attempting to reconnect using v1 transport just because fNetworkActive was set to false at the “right” stage in the v2 handshake does not make sense.

    Issue discovered by davidgumberg.

  2. DrahtBot commented at 9:31 pm on March 14, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32073.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK davidgumberg, vasild, mabu44, stratospher

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32061 ([draft] Replace libevent with our own HTTP and socket-handling implementation by pinheadmz)
    • #32015 (net: replace manual reference counting of CNode with shared_ptr by vasild)
    • #30988 (Split CConnman by vasild)
    • #30951 (net: option to disallow v1 connection on ipv4 and ipv6 peers by stratospher)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Mar 14, 2025
  4. hodlinator commented at 9:32 pm on March 14, 2025: contributor

    How I tested

    On master, run mainnet node:

    0bitcoind -debug=net -daemonwait
    

    Tail log for relevant info:

    0tail -f ~/.bitcoin/debug.log | grep -E "SetNetworkActive|Network not active|retrying"
    

    In another console, spam:

    0build/bin/bitcoin-cli setnetworkactive false && sleep 0.5 && build/bin/bitcoin-cli setnetworkactive true
    

    Expected grep results:

    02025-03-14T21:12:17Z SetNetworkActive: false
    12025-03-14T21:12:17Z [net] Network not active, disconnecting peer=9
    22025-03-14T21:12:17Z SetNetworkActive: true
    32025-03-14T21:12:18Z SetNetworkActive: false
    4...
    52025-03-14T21:12:23Z SetNetworkActive: true
    62025-03-14T21:12:23Z SetNetworkActive: false
    72025-03-14T21:12:24Z [net] Network not active, disconnecting peer=10
    82025-03-14T21:12:24Z [net] retrying with v1 transport protocol for peer=10
    
  5. hodlinator renamed this:
    net: Only try downgrade v2->v1 transport if online
    net: Only attempt v2->v1 transport downgrade if online
    on Mar 15, 2025
  6. hodlinator force-pushed on Mar 15, 2025
  7. fanquake requested review from mzumsande on Mar 17, 2025
  8. fanquake requested review from stratospher on Mar 17, 2025
  9. stratospher commented at 8:43 am on March 17, 2025: contributor

    fNetworkActive doesn’t indicate whether our node is online/offline - it’s just a node specific option configured using RPC/GUI to enable/disable P2P network activity. So even with this patch, we’d still attempt for a v1 reconnection if we’re trying to establish a v2 connection during the time period when our internet goes off :)

    (Could you update the online/offline terminology in the PR?)

    If we’re attempting a v2 connection during the time period when the fNetworkActive option is toggled, we would have the 1 extra connection attempt but less probability of this happening/OpenNetworkConnection immediately returns anyways.

    Problem would be if something like #13038 happened, but i don’t think that’s the case?

  10. in src/net.cpp:1935 in 0f4e20728d outdated
    1931@@ -1932,7 +1932,7 @@ void CConnman::DisconnectNodes()
    1932                 // Add to reconnection list if appropriate. We don't reconnect right here, because
    1933                 // the creation of a connection is a blocking operation (up to several seconds),
    1934                 // and we don't want to hold up the socket handler thread for that long.
    1935-                if (pnode->m_transport->ShouldReconnectV1()) {
    1936+                if (fNetworkActive && pnode->m_transport->ShouldReconnectV1()) {
    


    vasild commented at 10:46 am on March 17, 2025:

    fNetworkActive will be read 2 times in CConnman::DisconnectNodes(). Given that it can change any time, it could happen that it has one value in the first read and another value here in the second read (this is actually inside a loop so it is many second reads, one for each node). This makes it a bit involved to reason all possible outcomes and could produce unexpected results.

    I think it would be better if the variable is read just once in the CConnman::DisconnectNodes() method. This will make the method atomic wrt to that - it will behave as if the variable is changed either before it starts or after it ends but never while it is executing.

    0const bool network_active{fNetworkActive};
    1// and then use network_active in both places.
    

    hodlinator commented at 4:30 pm on March 17, 2025:
    Also felt it was a bit brittle, fixed in latest push, with added note in commit message.

    davidgumberg commented at 11:49 pm on March 17, 2025:

    Not blocking, but this seemed slightly more correct to me before, since now if fNetworkActive == false when caching the result, but true when disconnecting, we might not attempt v1 reconnect when we should have.

    Enumerating the possible states in the old version where fNetworkActive is checked “twice”, first to decide whether or not we should flag all m_nodes for disconnect, and the “second time” where for each node flagged for disconnect, we check whether or not we should retry v1 before actually disconnecting them:

    1. If fNetworkActive is true in the first check and false in the second check, this is fine since we should never be attempting v1 reconnect when fNetworkActive == false

    2. If fNetworkActive == false in the first check, causing us to flag all peers for disconnect, but becomes true for the second check, behavior is unchanged from today.

    3. If fNetworkActive == true both times, this is the happy path, no change.

    4. Same if fNetworkActive == false both times, nothing changes.


    hodlinator commented at 2:06 pm on March 18, 2025:

    Not blocking, but this seemed slightly more correct to me before, since now if fNetworkActive == false when caching the result, but true when disconnecting, we might not attempt v1 reconnect when we should have.

    The probability that someone will be able to toggle fNetworkActive false -> true between the first loop starting and the second loop, when one also happens to have a peer in the right step in the handshake.. just doesn’t feel like it should be something we need to support. By keeping it “atomic” we compress down from 4 to 2 permutations.

    Counter-example - Running V1 with V2 capable node

    On master, one could contrive the possibility of having a V2 peer which we are handshaking with,

    1. User toggles fNetworkActive true -> false.
    2. Making CConnman::DisconnectNodes(): a. Set pnode->fDisconnect = true. b. Queue a v1 reconnection attempt.
    3. User toggles fNetworkActive false -> true before we process reconnections.
    4. We now end up using V1 with a peer capable of V2.

    Behavior with current version of this PR in this case will be to drop the V2 peer and not attempt immediate reconnection.

    (If we instead read fNetworkActive a second time when checking for reconnection like this PR did initially, on could reorder 2.b. and 3. in the above example to run into the same issue).

  11. vasild commented at 10:46 am on March 17, 2025: contributor
    Approach ACK 0f4e20728d231935f0140845370090a760ae93af
  12. net: Block v2->v1 transport downgrade if !CConnman::fNetworkActive
    We might have just set CNode::fDisconnect in the first loop because of being offline.
    
    Also caches CConnman::fNetworkActive in case it's changed concurrently with our own thread.
    6869fb4170
  13. hodlinator force-pushed on Mar 17, 2025
  14. hodlinator renamed this:
    net: Only attempt v2->v1 transport downgrade if online
    net: Block v2->v1 transport downgrade if !fNetworkActive
    on Mar 17, 2025
  15. hodlinator commented at 4:27 pm on March 17, 2025: contributor

    Could you update the online/offline terminology in the PR?

    Fixed to use fNetworkActive instead, agree it was a bit of a stretch.

    If we’re attempting a v2 connection during the time period when the fNetworkActive option is toggled, we would have the 1 extra connection attempt but less probability of this happening/OpenNetworkConnection immediately returns anyways.

    Yes, seems like we wouldn’t get very far with reconnection anyway, but good to avoid the log message.

  16. hodlinator commented at 4:44 pm on March 17, 2025: contributor

    Windows CI failure is unrelated:

    0Downloading https://gitlab.freedesktop.org//freetype/freetype/-/archive/VER-2-13-3/freetype-VER-2-13-3.tar.gz -> freetype-freetype-VER-2-13-3.tar.gz
    1error: https://gitlab.freedesktop.org//freetype/freetype/-/archive/VER-2-13-3/freetype-VER-2-13-3.tar.gz: failed: status code 503
    

    Currently tracked as #32085.

  17. davidgumberg commented at 2:56 am on March 18, 2025: contributor

    Tested and Reviewed ACK 6869fb417096b43ba7f7

    I could be wrong, but I think https://github.com/bitcoin/bitcoin/commit/0f4e20728d231935f0140845370090a760ae93af was more correct (see #32073 (review)), but I don’t think it’s that important. I was wrong, see #32073 (review)


    Anecdotally, the issue this fixes is more likely to occur on startup, probably because of how many new connections we’re making, so I used this script to reproduce:

    0bitcoind -debug=net -daemonwait  && sleep 20 \
    1  && bitcoin-cli setnetworkactive false && sleep 0.5 \
    2  && bitcoin-cli setnetworkactive true && sleep 0.5 \
    3  && bitcoin-cli  stop
    

    Message about retrying v1 connection doesn’t occur on this branch.

  18. DrahtBot requested review from vasild on Mar 18, 2025
  19. vasild approved
  20. vasild commented at 12:14 pm on March 19, 2025: contributor
    ACK 6869fb417096b43ba7f74bf767ca3e41b9894899
  21. mabu44 commented at 8:57 pm on March 19, 2025: none
    ACK 6869fb417096b43ba7f74bf767ca3e41b9894899 Reviewed the code and tested it with the command provided in #32073 (comment). I was able to reproduce the problem on master only once and never on this branch.
  22. stratospher commented at 6:34 am on March 20, 2025: contributor
    ACK 6869fb4. I’ve reviewed the code but don’t have strong preference for this branch vs master since only functional change is just a single log not being printed in a low probability scenario (we happen to be attempting v2 connection when P2P network activity is being turned off).
  23. ryanofsky assigned ryanofsky on Mar 24, 2025
  24. ryanofsky merged this on Mar 24, 2025
  25. ryanofsky closed this on Mar 24, 2025

  26. hodlinator deleted the branch on Mar 24, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-28 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me