tracing: network connection tracepoints #25832

pull 0xB10C wants to merge 7 commits into bitcoin:master from 0xB10C:2022-05-connection-tracepoints changing 6 files +520 −5
  1. 0xB10C commented at 5:18 pm on August 12, 2022: contributor

    This adds five new tracepoints with documentation and tests for network connections:

    • established connections with net:inbound_connection and net:outbound_connection
    • closed connections (both closed by us or by the peer) with net:closed_connnection
    • inbound connections that we choose to evict with net:evicted_inbound_connection
    • connections that are misbehaving and punished with net:misbehaving_connection

    I’ve been using these tracepoints for a few months now to monitor connection lifetimes, re-connection frequency by IP and netgroup, misbehavior, peer discouragement, and eviction and more. Together with the two existing P2P message tracepoints they allow for a good overview of local P2P network activity. Also sort-of addresses #22006 (review).

    I’ve been back and forth on which arguments to include. For example, net:evicted_connection could also include some of the eviction metrics like e.g. last_block_time, min_ping_time, … but I’ve left them out for now. If wanted, this can be added here or in a follow-up. I’ve tried to minimize a potential performance impact by measuring executed instructions with gdb where possible (method described here). I don’t think a few hundred extra instructions are too crucial, as connection opens/closes aren’t too frequent (compared to e.g. P2P messages). Note: e.g. CreateNodeFromAcceptedSocket() usually executes between 80k and 90k instructions for each new inbound connection.

    tracepoint instructions
    net:inbound_connection 390 ins
    net:outbound_connection between 700 and 1000 ins
    net:closed_connnection 473 ins
    net:evicted_inbound_connection not measured; likely similar to net:closed_connnection
    net:misbehaving_connection not measured

    Also added a bpftrace (tested with v0.14.1) log_p2p_connections.bt example script that produces output similar to:

    0Attaching 6 probes...
    1Logging opened, closed, misbehaving, and evicted P2P connections
    2OUTBOUND conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, total_out=1
    3INBOUND conn from 127.0.0.1:45324: id=1, type=inbound, network=0, total_in=1
    4MISBEHAVING conn id=1, message='getdata message size = 50001'
    5CLOSED conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, established=1231006505
    6EVICTED conn to 127.0.0.1:45324: id=1, type=inbound, network=0, established=1612312312
    
  2. jb55 commented at 10:54 pm on August 12, 2022: contributor
    nice, utACK 7252c60a7ef3b86b52aba110e2f808171d952577
  3. kouloumos referenced this in commit 6f3480e204 on Aug 17, 2022
  4. russeree commented at 10:10 am on August 18, 2022: contributor

    Just reporting in that everything is functional. Console log looks correct.

    Tests Passed image

  5. glozow added the label Utils/log/libs on Aug 19, 2022
  6. 0xB10C referenced this in commit c4b03c092a on Sep 12, 2022
  7. 0xB10C referenced this in commit 89918949e9 on Sep 12, 2022
  8. 0xB10C referenced this in commit b167ccf1c6 on Sep 12, 2022
  9. 0xB10C referenced this in commit c317f09f1e on Sep 12, 2022
  10. DrahtBot commented at 6:16 pm on September 13, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/25832.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK dergoegge, i-am-yuvi
    Approach ACK vasild
    Stale ACK jb55, virtu, maflcko

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30988 (Split CConnman by vasild)
    • #30381 ([WIP] net: return result from addnode RPC by willcl-ark)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  11. 0xB10C referenced this in commit f0e8758415 on Sep 14, 2022
  12. 0xB10C commented at 1:08 pm on November 16, 2022: contributor
    Note to self: might be worth checking if we can also pass AS information and not only the net_group.
  13. 0xB10C force-pushed on Jan 20, 2023
  14. 0xB10C commented at 4:28 pm on January 20, 2023: contributor

    Note to self: might be worth checking if we can also pass AS information and not only the net_group.

    This is possible by accessing the CNodeStats, but don’t think we should do that in the tracepoint. We have the IP and can lookup the AS in a tracing scirpt, if necessary.


    Fixed up the linter complains and rebased. This is ready for further review.

  15. dergoegge commented at 4:36 pm on March 9, 2023: member
    Concept ACK
  16. DrahtBot added the label Needs rebase on Mar 20, 2023
  17. 0xB10C force-pushed on Mar 21, 2023
  18. DrahtBot removed the label Needs rebase on Mar 21, 2023
  19. 0xB10C force-pushed on Mar 21, 2023
  20. 0xB10C commented at 11:45 am on March 21, 2023: contributor

    The linter started complaining about “E275 missing whitespace after keyword”. Fixed this up in the individual commits.

    ac438b10918ef05df2e564734d879699c8aa18f9 -> 6de4fc73c935baa2cc6f54dd57c8e57a5df3c7ae (2022-05-connection-tracepoints.pr.1 -> 2022-05-connection-tracepoints.pr.2; compare)

  21. in src/net.cpp:948 in 6de4fc73c9 outdated
    943+                pnode->GetId(),
    944+                pnode->m_addr_name.c_str(),
    945+                pnode->ConnectionTypeAsString().c_str(),
    946+                pnode->ConnectedThroughNetwork(),
    947+                pnode->nKeyedNetGroup,
    948+                std::chrono::duration<uint64_t>(pnode->m_connected).count());
    


    dergoegge commented at 12:20 pm on March 21, 2023:

    Thinking a little bit about what data should go into the tracepoints.

    m_addr_name: would it make more sense to put addr in binary form (or both)? I’m not sure they are equivalent all the time. nKeyedNetGroup: is derived deterministically from addr using random seeds that are created with every CConnman. I think that means that across restarts of a node previous net groups loose meaning (i.e. you can’t compare them to netgroups of new connections). Maybe NetGroupManager::GetGroup(addr) would be better?


    0xB10C commented at 9:25 am on March 22, 2023:

    Thanks, helpful comments!

    • m_addr_name: my reasoning for passing m_addr_name as string instead of binary is that we likely don’t want to implement the binary decoding in each tracing script. I think the string is closer to what users need. However, not opposing to add the binary if there’s need.
    • nKeyedNetGroup: I added the netgroup as argument to better identify connections coming from the same network. For my use-case, it’s not problematic if nKeyedNetGroup isn’t consistent across restarts and doesn’t have a meaning besides being an ID. I had a quick look at NetGroupManager::GetGroup(). This looks like a good alternative to nKeyedNetGroup as consumers can choose to decode it if they want. I plan to play around with it a bit to see if it works.

    virtu commented at 1:24 pm on March 24, 2023:
    I can imagine some use cases where having consistent netgroup identifiers would be advantageous. Unless NetGroupManager::GetGroup() adds too much overhead, they seem preferable.

    0xB10C commented at 12:00 pm on April 14, 2023:

    Looking into GetGroup() I saw it returns a C++ std::vector<unsigned char>. We can’t pass C++ std::vector’s in the tracepoints as the eBPF code, written in C, won’t be able to read it. We can call data() and pass this in the tracepoint as pointer to unsigned chars (we do this with e.g. the raw P2P messages too). To read from this pointer, we also need to pass the of the length of underlying array.

    I’ll give this a shot in a separate branch.


    0xB10C commented at 12:20 pm on April 14, 2023:
    An alternative is to pass GetMappedAS() which is an uint32_t that will be !=0 in an ASMap future. A net group similar to Bitcoin Core’s can be calculated from the peer address by consumers that need it.
  22. in src/net.cpp:599 in 6de4fc73c9 outdated
    594+            GetId(),
    595+            m_addr_name.c_str(),
    596+            ConnectionTypeAsString().c_str(),
    597+            ConnectedThroughNetwork(),
    598+            nKeyedNetGroup,
    599+            std::chrono::duration<uint64_t>(m_connected).count());
    


    maflcko commented at 9:31 am on March 22, 2023:
    0            Ticks<std::chrono::seconds>(m_connected);
    

    nit, use Ticks, or count_seconds (if you want the compile failure if m_connected is changed to higher precision than seconds)?

  23. virtu commented at 1:43 pm on March 24, 2023: contributor

    ACK 6de4fc7

    Code reviewed tracepoints, tests, and demo script. Tracepoint placement looks good, so do tests. Successfully ran functional tests and the demo script.

    Some points:

    • Any reason why only inbound_connection and outbound_connection explicitly cast all arguments in the demo script?
    • I think some of the format specifiers in the demo script are lightly off, leading to outputs with negative netgroup values on my system. I believe it should be %lld for int64 and %llu for uint64.
    • As brought up previously, it would be nice to switch from nKeyedNetGroup to netgroup ids that are consistent over time
  24. DrahtBot requested review from jb55 on Mar 24, 2023
  25. in src/net.cpp:942 in 6de4fc73c9 outdated
    937@@ -930,6 +938,15 @@ bool CConnman::AttemptToEvictConnection()
    938     for (CNode* pnode : m_nodes) {
    939         if (pnode->GetId() == *node_id_to_evict) {
    940             LogPrint(BCLog::NET, "selected %s connection for eviction peer=%d; disconnecting\n", pnode->ConnectionTypeAsString(), pnode->GetId());
    941+
    942+            TRACE6(net, evicted_connection,
    


    dergoegge commented at 3:05 pm on March 29, 2023:
    0            TRACE6(net, evicted_inbound_connection,
    

    The logic in net exclusively evicts inbound peers and there is more eviction logic in net_processing for outbound connections. Maybe adding separate trace points for both makes sense?

  26. 0xB10C commented at 3:57 pm on April 25, 2023: contributor
    Still think this is relevant, marking this as in draft until I address the feedback.
  27. 0xB10C marked this as a draft on Apr 25, 2023
  28. 0xB10C force-pushed on Sep 11, 2023
  29. 0xB10C marked this as ready for review on Sep 11, 2023
  30. 0xB10C commented at 11:24 am on September 11, 2023: contributor

    This is ready for review again. Changes:

    • Rebased: master now includes #27831 and #27944. The tests in this PR have been updated to match the changes in these PRs.
    • #25832 (review): I’ve dropped nKeyedNetGroup from all tracepoints and didn’t add a GetGroup() call. The net group can be calculated from the passed address. Passing NetGroupManager::GetGroup() seems impractical as it’s a std::vector<unsigned char>.
    • #25832 (review): Renamed evicted_connection to evicted_inbound_connection to clarify that this only covers inbound evictions. Outbound eviction is happening in multiple places in net_processsing.cpp and would need multiple tracepoints. If needed, these can be added in a separate pull-request.
    • #25832 (review): using Ticks<std::chrono::seconds>(m_connected); now
    • adapted the bpftrace demo script and tested with bpftrace v0.18
  31. jb55 commented at 5:56 pm on September 11, 2023: contributor

    On Mon, May 02, 2022 at 04:30:15PM +0200, 0xb10c wrote:


    doc/tracing.md | 12 ++++ src/net_processing.cpp | 10 +++- test/functional/interface_usdt_net.py | 84 ++++++++++++++++++++++++++-

    diff –git a/src/net_processing.cpp b/src/net_processing.cpp index 6b415b3a1ea63..ca7e2c8ca3be0 100644 — a/src/net_processing.cpp +++ b/src/net_processing.cpp @@ -1662,13 +1662,21 @@ void PeerManagerImpl::Misbehaving(Peer& peer, int howmuch, const std::string& me const std::string message_prefixed = message.empty() ? "" : (": " + message); std::string warning;

    • if (score_now >= DISCOURAGEMENT_THRESHOLD && score_before < DISCOURAGEMENT_THRESHOLD) {
    • const bool discurage_threshold_exceeded = score_now >= DISCOURAGEMENT_THRESHOLD && score_before < DISCOURAGEMENT_THRESHOLD;
    • if (discurage_threshold_exceeded) {

    nit: should this be discourage_threshold_exceeded ? 😅

  32. 0xB10C requested review from virtu on Sep 11, 2023
  33. 0xB10C force-pushed on Sep 11, 2023
  34. 0xB10C force-pushed on Oct 16, 2023
  35. 0xB10C commented at 8:56 am on October 16, 2023: contributor
    rebased to inlcude #28629 and added the compiler flags here too.
  36. DrahtBot added the label CI failed on Oct 16, 2023
  37. DrahtBot removed the label CI failed on Oct 16, 2023
  38. jb55 commented at 1:22 am on October 19, 2023: contributor

    crACK af207077578265964674618b9e2b0e40c087ae35

    (although I noticed the mispelling is still there)

  39. DrahtBot removed review request from jb55 on Oct 19, 2023
  40. DrahtBot removed review request from virtu on Oct 19, 2023
  41. DrahtBot requested review from virtu on Oct 19, 2023
  42. DrahtBot requested review from dergoegge on Oct 19, 2023
  43. DrahtBot removed review request from virtu on Oct 19, 2023
  44. DrahtBot requested review from virtu on Oct 19, 2023
  45. DrahtBot added the label CI failed on Jan 14, 2024
  46. 0xB10C force-pushed on Jan 18, 2024
  47. 0xB10C commented at 10:03 am on January 18, 2024: contributor

    Hm the CI failure of previous releases, qt5 dev package and depends packages, DEBUG (CI job used to find silent-merge conflicts (?)) four days ago seems weird. Didn’t see any conflicts when rebasing.

    Anyway, rebased. Sorry for invalidating your ACK again, @jb55. Fixed the discurage -> discourage misspelling. I thought I did it in an earlier push.

  48. jb55 commented at 7:42 pm on January 22, 2024: contributor

    On Thu, Jan 18, 2024 at 02:03:35AM -0800, 0xB10C wrote:

    Hm the CI failure of previous releases, qt5 dev package and depends packages, DEBUG (CI job used to find silent-merge conflicts (?)) four days ago seems weird. Didn’t see any conflicts when rebasing.

    Anyway, rebased. Sorry for invalidating your ACK again, @jb55. Fixed the discurage -> discourage misspelling. I thought I did it in an earlier push.

    sorry I haven’t been able to sit down and actually test this yet. The changes look good though!

  49. 0xB10C commented at 10:51 am on March 7, 2024: contributor
    Can we re-trigger the Cirrus CI / ASan + LSan + UBSan + integer, no depends, USDT job? The failing wallet_send.py --descriptors test looks unrelated.
  50. DrahtBot removed the label CI failed on Mar 7, 2024
  51. Jess25051980 approved
  52. in doc/tracing.md:108 in 996029b1fd outdated
    100+
    101+Arguments passed:
    102+1. Peer ID as `int64`
    103+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
    104+3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
    105+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
    


    laanwj commented at 2:26 pm on April 8, 2024:
    It’s slightly inconsistent to pass this parameter as int, while the other enum (Connection Type) gets passed as string. Might similarly pass GetNetworkName(...).c_str(). Especially after #26593 which would remove the call overhead in the happy path. No strong opinion though, I think the integer value is easier to use if you’re going to make a histogram or such. My personal preference is also string-less APIs. But it’s less consistent.

    0xB10C commented at 7:36 am on May 10, 2024:
    I fully agree what you’ve said. I’m leaving as is for now. If #26593 is merged and I retouch this, I’ll reconsider changing this. I’ll leave this as “unresolved” for now.
  53. in contrib/tracing/README.md:357 in 996029b1fd outdated
    352+Logging opened, closed, misbehaving, and evicted P2P connections
    353+OUTBOUND conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, net_group=245974404224840066, total_out=1
    354+INBOUND conn from 127.0.0.1:45324: id=1, type=inbound, network=0, net_group=90061749176390714, total_in=1
    355+MISBEHAVING conn id=1, score_before=0, score_increase=20, message='getdata message size = 50001', threshold_exceeded=false
    356+CLOSED conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, net_group=245974404224840066, established=1231006505
    357+EVICTED conn to 127.0.0.1:45324: id=1, type=inbound, network=0, net_group=90061749176390714, established=1612312312
    


    virtu commented at 8:53 am on April 29, 2024:
    0OUTBOUND conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, total_out=1
    1INBOUND conn from 127.0.0.1:45324: id=1, type=inbound, network=0, total_in=1
    2MISBEHAVING conn id=1, score_before=0, score_increase=20, message='getdata message size = 50001', threshold_exceeded=false
    3CLOSED conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, established=1231006505
    4EVICTED conn to 127.0.0.1:45324: id=1, type=inbound, network=0, established=1612312312
    

    0xB10C commented at 8:25 am on May 10, 2024:
    dropped in 8109319e102c41d3aeed0ecfbc3a0e23b7fea807
  54. virtu commented at 9:26 am on April 29, 2024: contributor

    ACK 996029b (modulo comment and this)

    Successfully ran log_p2p_connections.bt over the weekend on both x86_64/clang and aarch64/gcc.

    Extended functional test (interface_usdt_net.py) equally passed on both machines.

  55. 0xB10C force-pushed on May 10, 2024
  56. virtu commented at 7:41 am on May 21, 2024: contributor
    LGTM! ACK 8109319
  57. 0xB10C commented at 9:43 am on May 21, 2024: contributor
    note that #29575 removes the misbehavior score (which is exposed via the net:misbehaving_connection tracepoint). If/when #29575 is merged, the misbehaving_connection tracepoint will change.
  58. DrahtBot added the label Needs rebase on Jun 20, 2024
  59. 0xB10C force-pushed on Jun 21, 2024
  60. 0xB10C closed this on Jun 21, 2024

  61. 0xB10C force-pushed on Jun 21, 2024
  62. 0xB10C reopened this on Jun 21, 2024

  63. 0xB10C commented at 11:38 am on June 21, 2024: contributor

    Must have closed this while force pushing. That was not my intention.

    I’ve rebased this with #29575 merged. The net:misbehaving_connection changed from

    0TRACE5(net, misbehaving_connection,
    1  peer.m_id,
    2  score_before,
    3  howmuch,
    4  message.c_str(),
    5  discourage_threshold_exceeded);
    

    to

    0TRACE2(net, misbehaving_connection,
    1  peer.m_id,
    2  message.c_str());
    

    I’ve updates the tests and the bpftrace demo script accordingly.

  64. DrahtBot removed the label Needs rebase on Jun 21, 2024
  65. 0xB10C requested review from laanwj on Jul 25, 2024
  66. 0xB10C requested review from Jess25051980 on Jul 25, 2024
  67. 0xB10C requested review from virtu on Jul 25, 2024
  68. in src/net.cpp:1828 in eca0750b8d outdated
    1827@@ -1813,6 +1828,13 @@ void CConnman::CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    1828 
    1829     LogPrint(BCLog::NET, "connection from %s accepted\n", addr.ToStringAddrPort());
    1830 
    1831+    TRACE5(net, inbound_connection,
    


    maflcko commented at 10:47 am on July 26, 2024:

    nit: (coming from one of the conflicting pull requests):

    It probably doesn’t matter much, but as a tracepoint user I’d find it a bit odd to receive the trace, then call getpeerinfo and get an empty reply, as the RPC thread may still have the view on the previous m_nodes state.

    I understand that this can never be race-free for the user, but for consistency, I’d recommend to move this after the next LOCK scope. See also commit 55555574d105f6c529c5c966c3c971c9db5ab786 / https://github.com/bitcoin/bitcoin/pull/30512/commits

  69. in src/net.cpp:2950 in eca0750b8d outdated
    2941@@ -2920,6 +2942,13 @@ void CConnman::OpenNetworkConnection(const CAddress& addrConnect, bool fCountFai
    2942         return;
    2943     pnode->grantOutbound = std::move(grant_outbound);
    2944 
    2945+    TRACE5(net, outbound_connection,
    2946+        pnode->GetId(),
    2947+        pnode->m_addr_name.c_str(),
    2948+        pnode->ConnectionTypeAsString().c_str(),
    2949+        pnode->ConnectedThroughNetwork(),
    2950+        GetNodeCount(ConnectionDirection::Out));
    


    maflcko commented at 10:48 am on July 26, 2024:
    nit: Probably the same here? Or at least for consistency, both should happen consistently before InitializeNode/m_nodes.push_back, or after?
  70. maflcko commented at 10:53 am on July 26, 2024: member
    left two style nits, feel free to ignore
  71. 0xB10C commented at 1:37 pm on August 5, 2024: contributor
    Thanks @maflcko. I’ll incorporate these suggestions once #30512 is merged and I rebase.
  72. DrahtBot added the label Needs rebase on Aug 5, 2024
  73. 0xB10C force-pushed on Aug 5, 2024
  74. DrahtBot removed the label Needs rebase on Aug 5, 2024
  75. in test/functional/interface_usdt_net.py:355 in 8e2a7d78df outdated
    244+        testnodes = list()
    245+        for _ in range(EXPECTED_INBOUND_CONNECTIONS):
    246+            testnode = P2PInterface()
    247+            self.nodes[0].add_p2p_connection(testnode)
    248+            testnodes.append(testnode)
    249+        bpf.perf_buffer_poll(timeout=200)
    


    maflcko commented at 8:15 am on August 8, 2024:
    As a follow-up idea, it could make sense to reduce the timeout here and instead scale it with the self.timeout_factor, so that the timeout is magically adjusted to the speed of the test-env it is running in
  76. in doc/tracing.md:129 in 07e7bd873b outdated
    124+Arguments passed:
    125+1. Peer ID as `int64`
    126+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
    127+3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
    128+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
    129+5. Connection established UNIX epoch timestamp as `uint64`.
    


    maflcko commented at 8:18 am on August 8, 2024:
    style nit in 07e7bd873bd512c9d488efbaf452e54efc92ac0a: Could clarify that this is seconds, not nano- or other seconds (like in other tracepoints)?
  77. in test/functional/interface_usdt_net.py:50 in 16ea431f1f outdated
    46     MAX_PEER_ADDR_LENGTH,
    47     MAX_PEER_CONN_TYPE_LENGTH,
    48     MAX_MSG_TYPE_LENGTH,
    49-    MAX_MSG_DATA_LENGTH
    50+    MAX_MSG_DATA_LENGTH,
    51+    MAX_MISBEHAVING_MESSAGE_LENGTH
    


    maflcko commented at 8:21 am on August 8, 2024:
    style nit in 16ea431f1f3a4ae57729ba265ff080b64b711c5d: Could add the missing trailing comma now? Otherwise this line will have to be touched again in the future, if something is appended.
  78. in doc/tracing.md:150 in d07d558da9 outdated
    145+Arguments passed:
    146+1. Peer ID as `int64`
    147+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
    148+3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
    149+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
    150+5. Connection established UNIX epoch timestamp as `uint64`.
    


    maflcko commented at 8:22 am on August 8, 2024:
    nit in d07d558da945a47069b91bfce58b13831da61f23: Same?
  79. maflcko approved
  80. maflcko commented at 8:23 am on August 8, 2024: member

    Left a minor follow-up idea and a few style nits, but nothing blocking. lgtm

    review ACK edef8c78e3a436e7187937b6e1bc906392ef9891 💽

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: review ACK edef8c78e3a436e7187937b6e1bc906392ef9891 💽
    3E1bat9/3mlxuNJ9mxUBYhUxNSOWw8eIFqjqBo86KKWEUbfBUe85vUgW0Kv+VOjkL/7yPvxAdFPVdZ/LWpOP8CA==
    
  81. DrahtBot added the label Needs rebase on Sep 2, 2024
  82. in contrib/tracing/log_p2p_connections.bt:46 in edef8c78e3 outdated
    41+  $conn_type = str(arg2);
    42+  $network = (int32) arg3;
    43+  printf("EVICTED conn to %s: id=%ld, type=%s, network=%d, established=%ld\n", $addr, $id, $conn_type, $network, arg4);
    44+}
    45+
    46+usdt:./src/bitcoind:net:misbehaving_connection
    


    0xB10C commented at 11:31 pm on September 2, 2024:
    note to self: update the binary path to new cmake location
  83. 0xB10C force-pushed on Sep 3, 2024
  84. 0xB10C commented at 1:43 pm on September 3, 2024: contributor
    rebased, addresses some of @maflcko’s nits and #25832 (review)
  85. DrahtBot removed the label Needs rebase on Sep 3, 2024
  86. DrahtBot added the label CI failed on Sep 4, 2024
  87. DrahtBot removed the label CI failed on Sep 4, 2024
  88. maflcko removed review request from dergoegge on Sep 4, 2024
  89. maflcko removed review request from laanwj on Sep 4, 2024
  90. maflcko removed review request from Jess25051980 on Sep 4, 2024
  91. maflcko removed review request from virtu on Sep 4, 2024
  92. maflcko commented at 3:31 pm on September 4, 2024: member

    re-ACK 5b59cfc944f9eac73e1e69bcc660280b6809724a 🌞

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: re-ACK 5b59cfc944f9eac73e1e69bcc660280b6809724a 🌞
    3+fwdnyZ2go0FfCZLGQhHmGEVETFdRHYk006fB55POSqqbbyiX3tXZFbuDFztmXiPE57ThJyQ/JJcQ8s5WOuHCA==
    
  93. DrahtBot requested review from dergoegge on Sep 4, 2024
  94. DrahtBot requested review from virtu on Sep 4, 2024
  95. i-am-yuvi commented at 2:16 pm on September 18, 2024: none

    tested ACK 🔥

    Successfully ran demo scripts and functional tests!!

    I have been using these tracepoints for 3-4 months from now and I find it very useful. They help you keep track of your peers, also I am working on a tool that detects spy peers and for that these tracepoints are very useful.

    Great work @0xB10C.

  96. in doc/tracing.md:103 in 5b59cfc944 outdated
     98+Is called when a new inbound connection is opened to us. Passes information about
     99+the peer and the number of inbound connections including the newly opened connection.
    100+
    101+Arguments passed:
    102+1. Peer ID as `int64`
    103+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
    


    vasild commented at 12:42 pm on September 30, 2024:

    That 68 probably comes from here:

    https://github.com/bitcoin/bitcoin/blob/a0473442d1c22043f5a288bd9255c006fd85d947/test/functional/interface_usdt_net.py#L22-L23

    but I just did this:

    Added to /etc/hosts:

    024.116.163.227  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    

    (that is 70 as)

    Then addnode aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:8333 and CNode::m_addr_name gets assigned that 70-chars long string.

    Less importantly, that 68 includes space for : and for the port, whereas here we don’t have a port.


    0xB10C commented at 2:39 pm on November 13, 2024:
    Good catch. I confirmed that the a’s actually show up in the tracing scripts and are cut off. I think it’s fine to leave 68 in the tests, but I’ll add a footnote in the docs in a separate commit.
  97. in test/functional/interface_usdt_net.py:120 in 5b59cfc944 outdated
    111@@ -77,12 +112,112 @@
    112     outbound_messages.perf_submit(ctx, &msg, sizeof(msg));
    113     return 0;
    114 };
    115+
    116+BPF_PERF_OUTPUT(inbound_connections);
    117+int trace_inbound_connection(struct pt_regs *ctx) {
    118+    struct NewConnection inbound = {};
    119+    bpf_usdt_readarg(1, ctx, &inbound.conn.id);
    120+    bpf_usdt_readarg_p(2, ctx, &inbound.conn.addr, MAX_PEER_ADDR_LENGTH);
    


    vasild commented at 2:30 pm on October 2, 2024:
    I couldn’t find the doc of bpf_usdt_readarg_p(). What would it do if the argument is longer than the supplied buffer? I presume the last arg is the length of the buffer, so it is not going to overflow, I guess. But would it add a trailing '\0'? That is - could it copy MAX_PEER_ADDR_LENGTH from the input to the output and leave it like that? If yes, would that be a problem - a string without a terminating '\0'?

    0xB10C commented at 2:04 pm on November 13, 2024:

    What would it do if the argument is longer than the supplied buffer? I presume the last arg is the length of the buffer, so it is not going to overflow, I guess.

    Correct, it would cut it off.

    But would it add a trailing ‘\0’?

    No. However there is bpf_probe_read_user_str() which does that. This generally seems to be the better suited methodology here. I’ll use that for the changes in this PR. There are other places in our code we can update in a potential follow up.

    That is - could it copy MAX_PEER_ADDR_LENGTH from the input to the output and leave it like that? If yes, would that be a problem - a string without a terminating ‘\0’?

    I think, for us, in this test here, this isn’t a problem. The Python ctype mapping for Connection defined below has the same size for an address (("addr", ctypes.c_char * MAX_PEER_ADDR_LENGTH)).

  98. in doc/tracing.md:117 in 5b59cfc944 outdated
    112+
    113+Arguments passed:
    114+1. Peer ID as `int64`
    115+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
    116+3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
    117+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
    


    vasild commented at 2:40 pm on October 2, 2024:

    For outbound connections, the peer does not “connect from”.

    04. Network of the peer as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
    

    or “Network to which the peer’s address belongs”.

  99. in doc/tracing.md:125 in 5b59cfc944 outdated
    117+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
    118+5. Number of existing outbound connections as `uint64` including the newly opened outbound connection.
    119+
    120+#### Tracepoint `net:evicted_inbound_connection`
    121+
    122+Is called when a inbound connection is evicted by us. Passes information about the evicted peer and the time at connection establishment.
    


    vasild commented at 3:44 pm on October 2, 2024:

    nit: currently we evict only inbounds, but this may change in the future. Maybe drop the “inbound” from here so that this does not get outdated:

    0Is called when a connection is evicted by us. Passes information about the evicted peer and the time at connection establishment.
    

    0xB10C commented at 7:40 am on October 3, 2024:

    There are multiple places in which we evict outbound connections. For example, https://github.com/bitcoin/bitcoin/blob/cfb59da4b3bb34afae467691a3e901f2b5a186f3/src/net_processing.cpp#L6291-L6293

    Based on earlier review, I renamed the tracepoint to explicitly mention “inbound” - I think the same is fine for the documentation. If we were going to add (an) outbound eviction tracepoint(s), then they’d get their own tracepoint documentation.

    See also discussion in #25832 (review) & #25832 (comment)

  100. in src/net_processing.cpp:1758 in 5b59cfc944 outdated
    1944@@ -1945,6 +1945,10 @@ void PeerManagerImpl::Misbehaving(Peer& peer, const std::string& message)
    1945     const std::string message_prefixed = message.empty() ? "" : (": " + message);
    1946     peer.m_should_discourage = true;
    1947     LogDebug(BCLog::NET, "Misbehaving: peer=%d%s\n", peer.m_id, message_prefixed);
    1948+    TRACE2(net, misbehaving_connection,
    1949+        peer.m_id,
    1950+        message.c_str()
    


    vasild commented at 4:34 pm on October 2, 2024:

    I am not sure how serious is violating the expectation that this is going to be max 128 chars, but if that could cause troubles, maybe here:

    0        message.substr(0, 127).c_str() // 128 including the terminating '\0'
    

    0xB10C commented at 2:52 pm on November 13, 2024:

    I think this can be handled on the user side. The bpftrace tool cuts of strings after 64 chars by default and bcc and libbpf both have bpf_probe_read_user_str(), which would replace the last char with \0 on overflow.

    I haven’t had any problems with this using the tracepoints in Python and Rust. I did a quick check with a C program, and using bpf_probe_read_user_str() seems fine: https://github.com/0xB10C/libbpf-bootstrap/commit/d94496936dec11699736abce0d303a94f815f59b

  101. in test/functional/interface_usdt_net.py:156 in 5b59cfc944 outdated
    151+
    152+BPF_PERF_OUTPUT(misbehaving_connections);
    153+int trace_misbehaving_connection(struct pt_regs *ctx) {
    154+    struct MisbehavingConnection misbehaving = {};
    155+    bpf_usdt_readarg(1, ctx, &misbehaving.id);
    156+    bpf_usdt_readarg_p(2, ctx, &misbehaving.message, 64);
    


    vasild commented at 4:36 pm on October 2, 2024:
    64? Shouldn’t that be MAX_MISBEHAVING_MESSAGE_LENGTH (which is 128)?
  102. in test/functional/interface_usdt_net.py:510 in 5b59cfc944 outdated
    486+            testnodes.append(testnode)
    487+        for node in testnodes:
    488+            node.peer_disconnect()
    489+        bpf.perf_buffer_poll(timeout=400)
    490+
    491+        assert_equal(EXPECTED_CLOSED_CONNECTIONS, len(closed_connections))
    


    vasild commented at 4:45 pm on October 2, 2024:

    What happens if the disconnections do not happen within the timeout? Is that 400 seconds? Would it then go to the assert and fail?

    Here we can deterministically wait to have 0 connected peers:

    0self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == 0)
    
  103. in contrib/tracing/log_p2p_connections.bt:15 in 5b59cfc944 outdated
    10+  $id = (int64) arg0;
    11+  $addr = str(arg1);
    12+  $conn_type = str(arg2);
    13+  $network = (int32) arg3;
    14+  $existing = (uint64) arg4;
    15+  printf("INBOUND conn from %s: id=%ld, type=%s, network=%d, total=%d\n", $addr, $id, $conn_type, $network, $existing + 1);
    


    vasild commented at 4:49 pm on October 2, 2024:
    I got the impression that $existing should be already the number of connections, including this one, why + 1? (same question for the code below)

    0xB10C commented at 2:08 pm on November 13, 2024:
    Correct, this is an oversight from an older version of this PR. Likely related to #25832 (review).
  104. vasild commented at 4:51 pm on October 2, 2024: contributor
    Approach ACK 5b59cfc944f9eac73e1e69bcc660280b6809724a
  105. 0xB10C commented at 10:54 am on November 11, 2024: contributor
    Thanks for the comments @vasild. I plan to address your comments when I rebase on #26593.
  106. DrahtBot added the label CI failed on Nov 11, 2024
  107. DrahtBot commented at 10:57 am on November 11, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29613964632

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  108. tracing: add inbound connection tracepoint f1bfebcaf8
  109. tracing: add outbound connection tracepoint 4519b89a31
  110. tracing: add inbound connection eviction tracepoint 182ff193de
  111. tracing: add misbehaving conn tracepoint dbf5509572
  112. tracing: connection closed tracepoint a094a39787
  113. tracing: log_p2p_connections.bt example
    A bpftrace script that logs information from the
    net:*_connection tracepoints.
    
    I've tested this script with bpftrace version 0.14.1 and v0.20.2.
    3ff54572e8
  114. 0xB10C referenced this in commit f52ccaa34d on Nov 13, 2024
  115. tracing: document that peer addrs can be >68 chars
    A v3 onion address with a `:` and a five digit port has a length of
    68 chars. As noted in
    https://github.com/bitcoin/bitcoin/pull/25832#discussion_r1781040991
    peers e.g. added via hostname might have a longer CNode::m_addr_name.
    These might be cut off in tracing tools.
    15dfd72f8f
  116. 0xB10C force-pushed on Nov 13, 2024
  117. DrahtBot removed the label CI failed on Nov 13, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me