ThreadI2PAcceptIncoming temporarily bypasses 125 connection ceiling #27843

issue Crypt-iQ openend this issue on June 9, 2023
  1. Crypt-iQ commented at 2:52 pm on June 9, 2023: contributor
    • The i2p accept code is in ThreadI2PAcceptIncoming, instead of in ThreadSocketHandler like the non-i2p accept code.
    • When accepting non-i2p connections, a connection is accepted in AcceptConnection and if there are more than maxInbound outstanding connections, one is marked for disconnect. The next iteration of the loop in ThreadSocketHandler will call DisconnectNodes and remove it.
    • Since the i2p code resides in its own thread, it’s possible for i2p connections to stack up past 125 (I was able to hit 190 outstanding connections when running a local script to connect to my bitcoind i2p address). I think the higher numbered fd’s could cause a slight slowdown in select.
    • How many connections it can go past 125 is a race between DisconnectNodes and ThreadI2PAcceptIncoming.
     0diff --git a/src/test/i2p_tests.cpp b/src/test/i2p_tests.cpp
     1index b2e1ae43b..f1f0b2a60 100644
     2--- a/src/test/i2p_tests.cpp
     3+++ b/src/test/i2p_tests.cpp
     4@@ -10,14 +10,52 @@
     5 #include <test/util/net.h>
     6 #include <test/util/setup_common.h>
     7 #include <util/threadinterrupt.h>
     8+#include <netbase.h>
     9+#include <protocol.h>
    10+
    11+#include <chrono>
    12+#include <thread>
    13 
    14 #include <boost/test/unit_test.hpp>
    15 
    16 #include <memory>
    17 #include <string>
    18+#include <stdio.h>
    19 
    20 BOOST_FIXTURE_TEST_SUITE(i2p_tests, BasicTestingSetup)
    21 
    22+BOOST_AUTO_TEST_CASE(spam)
    23+{
    24+    CThreadInterrupt interrupt;
    25+
    26+    in_addr ipv4Addr;
    27+    ipv4Addr.s_addr = inet_addr("127.0.0.1");
    28+
    29+    Proxy p = Proxy(CService(ipv4Addr, 7656), false);
    30+
    31+    auto i2p_transient_session = std::make_unique<i2p::sam::Session>(p.proxy, &interrupt);
    32+
    33+    const char* pszDest = "tfshhokzifn2g46dc6dxwrwwxwbeo4l5eeepequ52kpvrflm3foq.b32.i2p";
    34+
    35+    const std::vector<CService> resolved{Lookup(pszDest, 0, false, 256)};
    36+
    37+    CAddress addrConnect = CAddress(resolved[0], NODE_NONE);
    38+
    39+    std::vector<std::unique_ptr<Sock>> v;
    40+    v.reserve(1000);
    41+
    42+    for (int i = 0; i < 1000; i++) {
    43+        i2p::Connection conn;
    44+        bool proxy_error;
    45+        auto connected = i2p_transient_session->Connect(addrConnect, conn, proxy_error);
    46+        if (connected) {
    47+            v.push_back(std::move(conn.sock));
    48+        }
    49+    }
    50+
    51+    std::this_thread::sleep_for(std::chrono::milliseconds(50000));
    52+}
    53+
    54 BOOST_AUTO_TEST_CASE(unlimited_recv)
    55 {
    56     const auto prev_log_level{LogInstance().LogLevel()};
    

    I then called it from a script:

    0#!/bin/bash
    1#
    2for i in {1..10}
    3do
    4	./src/test/test_bitcoin --log_level=all --run_test=i2p_tests/spam &
    5done
    

    When testing this, I also ran into a reproducible error when simultaneously using bitcoin-cli:

    0error: timeout on transient error: Could not connect to the server 127.0.0.1:18443 (error code 1 - "EOF reached")
    1
    2Make sure the bitcoind server is running and that you are connecting to the correct RPC port.
    

    Wireshark showed that a TCP port was reused which led to a RST being sent back to bitcoin-cli. It happened when the i2p sam bridge ran out of memory, but as far as I can tell the rpc server should be totally unaffected by that. This error message also appeared in bitcoind’s logs at the same time: 2023-06-09T15:02:58Z connect() to 127.0.0.1:7656 failed: Can't assign requested address (49)

  2. willcl-ark commented at 11:21 am on June 12, 2023: member

    I can’t test this right now, as I uninstalled i2pd and now don’t seem to be able to re-connect to any of the reseed servers and get connected :(

    A cursory look at the code reads to me that they should behave similarly; although as you state non-I2P connections are done in AcceptConnection, the check on open connections is actually in CreateNodeFromAcceptedSocket, which is called from both sites, i.e. also from ThreadI2PAcceptIncoming:

    https://github.com/bitcoin/bitcoin/blob/fbe48f97dfec3138b06b5f00b75655da0c985008/src/net.cpp#L2117-L2118

    Are the connections stacking up on the i2pd side before they are accepted into bitcoind? How were you measuring accepted connections in bitcoind?

  3. Crypt-iQ commented at 11:40 am on June 12, 2023: contributor

    The connections will get marked for disconnect in AttemptToEvictConnection but linger until DisconnectNodes is called. I was calling the getconnectioncount RPC, so these are connections that bitcoind has. Again not sure if this matters as I couldn’t exhaust file descriptors, but figured I would point it out since the original behavior was to:

    • connect
    • mark for disconnect
    • disconnect

    all in the same thread.

  4. jonatack commented at 4:38 pm on June 14, 2023: member

    @Crypt-iQ did you see this using the i2pd router? With a debug build, or a regular one? (Also make sure that conf file options like checkaddrman and checkmempool are off).

    I attempted to reproduce with your script using the Java I2P router on mainnet with a regular build on master (after changing const char* pszDest = "tfshhokzifn2g46dc6dxwrwwxwbeo4l5eeepequ52kpvrflm3foq.b32.i2p"; in your test code to my i2p address).

    With -netinfo I saw an initial spike to 128 connections that quickly went back to normal, then 130 on the second run (with a temporary additional block-relay-only conn). Given that there were 4 addnode connections, for a ceiling of 125 default max + 4 manual + 1 temporary block-relay-only peer = 129-130 total, this didn’t seem too bad. Will have a look with i2pd.

    Just before launching the script

    0        onion     i2p   cjdns   total   block  manual
    1in         52      10       0      62
    2out        12       1       1      15       3       4
    3total      64      11       1      77
    

    Initial spike to 128 connections

    0        onion     i2p   cjdns   total   block  manual
    1in         51      63       0     114
    2out        11       1       1      14       2       4
    3total      62      64       1     128
    

    A few seconds later

    0        onion     i2p   cjdns   total   block  manual
    1in         54       8       0      62
    2out        11       1       1      14       2       4
    3total      65       9       1      76
    

    On a second run, saw 130

    0        onion     i2p   cjdns   total   block  manual
    1in         53      62       0     115
    2out        12       1       1      15       3       4
    3total      65      63       1     130
    

    (Note that the totals are off by one because I have a manual ipv4 connection that is not being counted by -netinfo due to ipv4 not being “reachable” in getnetworkinfo; am fixing.)

    • bitcoin-cli -netinfo continued to function without an error
    • I did see two listening errors in the debug log
    02023-06-14T16:02:01.006225Z [i2paccept] [i2p.cpp:257] [Log] [i2p] Error listening: Cannot connect to 127.0.0.1:7656
    12023-06-14T16:14:44.067120Z [i2paccept] [i2p.cpp:257] [Log] [i2p] Error listening: Cannot connect to 127.0.0.1:7656
    

    Will look further with i2pd + mitigation.

  5. Crypt-iQ commented at 5:33 pm on June 14, 2023: contributor
    I used the default-install i2p from here: https://geti2p.net/en/ and no extra bitcoind options. I also ran the script in multiple terminals to make even more connections at a time
  6. Crypt-iQ commented at 5:49 pm on June 14, 2023: contributor

    2023-06-14T16:14:44.067120Z [i2paccept] [i2p.cpp:257] [Log] [i2p] Error listening: Cannot connect to 127.0.0.1:7656

    I experienced two issues that this log line might be related to:

    • the i2p SAM bridge crashed due to OOM
    • the number of ephemeral ports were exhausted for connections with 127.0.0.1:7656 (may be worth checking the output of netstat -a | grep TCP
  7. jonatack commented at 6:30 pm on June 14, 2023: member

    Ok, we both tested with the same router.

    Tried this time with i2pd (latest stable 2.48.0). I saw a similar max number of connections, except that i2pd then crashed and I had to restart it manually.

    0        onion     i2p   cjdns   total   block  manual
    1in         19      98       0     117
    2out        10       2       1      14       2       4
    3total      29     100       1     131
    
  8. jonatack commented at 6:34 pm on June 14, 2023: member

    I also ran the script in multiple terminals to make even more connections at a time

    Running it in two terminals got the race to happen clearly. Then i2pd crashed again.

    0        onion     i2p   cjdns   total   block  manual
    1in         21     117       0     138
    2out        11       1       1      14       2       4
    3total      32     118       1     152
    
  9. Crypt-iQ commented at 6:39 pm on June 14, 2023: contributor
    FWIW I wasn’t able to crash i2pd or get the bitcoin-cli error due to ephemeral port exhaustion when the scripts ran on a separate machine, but was still able to get the high connection count.
  10. jonatack commented at 6:44 pm on June 14, 2023: member
    That seems good, if we can just worry about mitigating the connection count race. (Nice find!)
  11. willcl-ark commented at 1:19 pm on June 15, 2023: member

    @Crypt-iQ would you be willing to re-test with the rather crude patch on this branch: https://github.com/bitcoin/bitcoin/compare/master...willcl-ark:bitcoin:27843-i2p-thread It just adds another DisconnectNodes() (and NotifyNumConnectionsChanged()) call to the I2P thread. I am still battling to do an initial sync of i2pd still after wiping my computer recently so I can’t test myself :(

    You should be able to fetch only the single branch with: git remote add -t 27843-i2p-thread willcl-ark https://github.com/willcl-ark/bitcoin.git
    And then check out the branch as normal: git checkout 27843-i2p-thread

  12. Crypt-iQ commented at 1:28 pm on June 15, 2023: contributor
    Sure thing @willcl-ark
  13. Crypt-iQ commented at 1:50 pm on June 15, 2023: contributor
    I think that patch will race on m_disconnected_nodes if both threads are calling it simultaneously because m_disconnected_nodes.remove(...) isn’t covered by the m_nodes_mutex. I also think both threads could call m_disconnected_nodes.remove(...) on the same element
  14. willcl-ark commented at 12:07 pm on June 16, 2023: member

    Oh right. I added a mutex on DisconnectNodes() to pretect against that.

    I’m not planning on opening these changes as a pull here (at least not in their current ill-thought-out state), but am curious to know if the updated branch does indeed fix the race you observed?

  15. Crypt-iQ commented at 3:16 am on June 17, 2023: contributor
    The patch fixes it @willcl-ark, the number of conns evens out at 114
  16. tuanggo commented at 8:57 am on July 1, 2023: none
    ok

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 13:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me