tracing: network connection tracepoints #25832

pull 0xB10C wants to merge 6 commits into bitcoin:master from 0xB10C:2022-05-connection-tracepoints changing 6 files +494 −3
  1. 0xB10C commented at 5:18 pm on August 12, 2022: contributor

    This adds five new tracepoints with documentation and tests for network connections:

    • established connections with net:inbound_connection and net:outbound_connection
    • closed connections (both closed by us or by the peer) with net:closed_connnection
    • inbound connections that we choose to evict with net:evicted_inbound_connection
    • connections that are misbehaving and punished with net:misbehaving_connection

    I’ve been using these tracepoints for a few months now to monitor connection lifetimes, re-connection frequency by IP and netgroup, misbehavior, peer discouragement, and eviction and more. Together with the two existing P2P message tracepoints they allow for a good overview of local P2P network activity. Also sort-of addresses #22006 (review).

    I’ve been back and forth on which arguments to include. For example, net:evicted_connection could also include some of the eviction metrics like e.g. last_block_time, min_ping_time, … but I’ve left them out for now. If wanted, this can be added here or in a follow-up. I’ve tried to minimize a potential performance impact by measuring executed instructions with gdb where possible (method described here). I don’t think a few hundred extra instructions are too crucial, as connection opens/closes aren’t too frequent (compared to e.g. P2P messages). Note: e.g. CreateNodeFromAcceptedSocket() usually executes between 80k and 90k instructions for each new inbound connection.

    tracepoint instructions
    net:inbound_connection 390 ins
    net:outbound_connection between 700 and 1000 ins
    net:closed_connnection 473 ins
    net:evicted_inbound_connection not measured; likely similar to net:closed_connnection
    net:misbehaving_connection not measured

    Also added a bpftrace (tested with v0.14.1) log_p2p_connections.bt example script that produces output similar to:

    0Attaching 6 probes...
    1Logging opened, closed, misbehaving, and evicted P2P connections
    2OUTBOUND conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, total_out=1
    3INBOUND conn from 127.0.0.1:45324: id=1, type=inbound, network=0, total_in=1
    4MISBEHAVING conn id=1, message='getdata message size = 50001'
    5CLOSED conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, established=1231006505
    6EVICTED conn to 127.0.0.1:45324: id=1, type=inbound, network=0, established=1612312312
    
  2. jb55 commented at 10:54 pm on August 12, 2022: contributor
    nice, utACK 7252c60a7ef3b86b52aba110e2f808171d952577
  3. kouloumos referenced this in commit 6f3480e204 on Aug 17, 2022
  4. russeree commented at 10:10 am on August 18, 2022: contributor

    Just reporting in that everything is functional. Console log looks correct.

    Tests Passed image

  5. glozow added the label Utils/log/libs on Aug 19, 2022
  6. 0xB10C referenced this in commit c4b03c092a on Sep 12, 2022
  7. 0xB10C referenced this in commit 89918949e9 on Sep 12, 2022
  8. 0xB10C referenced this in commit b167ccf1c6 on Sep 12, 2022
  9. 0xB10C referenced this in commit c317f09f1e on Sep 12, 2022
  10. DrahtBot commented at 6:16 pm on September 13, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK dergoegge
    Stale ACK jb55, virtu

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)
    • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  11. 0xB10C referenced this in commit f0e8758415 on Sep 14, 2022
  12. 0xB10C commented at 1:08 pm on November 16, 2022: contributor
    Note to self: might be worth checking if we can also pass AS information and not only the net_group.
  13. 0xB10C force-pushed on Jan 20, 2023
  14. 0xB10C commented at 4:28 pm on January 20, 2023: contributor

    Note to self: might be worth checking if we can also pass AS information and not only the net_group.

    This is possible by accessing the CNodeStats, but don’t think we should do that in the tracepoint. We have the IP and can lookup the AS in a tracing scirpt, if necessary.


    Fixed up the linter complains and rebased. This is ready for further review.

  15. dergoegge commented at 4:36 pm on March 9, 2023: member
    Concept ACK
  16. DrahtBot added the label Needs rebase on Mar 20, 2023
  17. 0xB10C force-pushed on Mar 21, 2023
  18. DrahtBot removed the label Needs rebase on Mar 21, 2023
  19. 0xB10C force-pushed on Mar 21, 2023
  20. 0xB10C commented at 11:45 am on March 21, 2023: contributor

    The linter started complaining about “E275 missing whitespace after keyword”. Fixed this up in the individual commits.

    ac438b10918ef05df2e564734d879699c8aa18f9 -> 6de4fc73c935baa2cc6f54dd57c8e57a5df3c7ae (2022-05-connection-tracepoints.pr.1 -> 2022-05-connection-tracepoints.pr.2; compare)

  21. in src/net.cpp:948 in 6de4fc73c9 outdated
    943+                pnode->GetId(),
    944+                pnode->m_addr_name.c_str(),
    945+                pnode->ConnectionTypeAsString().c_str(),
    946+                pnode->ConnectedThroughNetwork(),
    947+                pnode->nKeyedNetGroup,
    948+                std::chrono::duration<uint64_t>(pnode->m_connected).count());
    


    dergoegge commented at 12:20 pm on March 21, 2023:

    Thinking a little bit about what data should go into the tracepoints.

    m_addr_name: would it make more sense to put addr in binary form (or both)? I’m not sure they are equivalent all the time. nKeyedNetGroup: is derived deterministically from addr using random seeds that are created with every CConnman. I think that means that across restarts of a node previous net groups loose meaning (i.e. you can’t compare them to netgroups of new connections). Maybe NetGroupManager::GetGroup(addr) would be better?


    0xB10C commented at 9:25 am on March 22, 2023:

    Thanks, helpful comments!

    • m_addr_name: my reasoning for passing m_addr_name as string instead of binary is that we likely don’t want to implement the binary decoding in each tracing script. I think the string is closer to what users need. However, not opposing to add the binary if there’s need.
    • nKeyedNetGroup: I added the netgroup as argument to better identify connections coming from the same network. For my use-case, it’s not problematic if nKeyedNetGroup isn’t consistent across restarts and doesn’t have a meaning besides being an ID. I had a quick look at NetGroupManager::GetGroup(). This looks like a good alternative to nKeyedNetGroup as consumers can choose to decode it if they want. I plan to play around with it a bit to see if it works.

    virtu commented at 1:24 pm on March 24, 2023:
    I can imagine some use cases where having consistent netgroup identifiers would be advantageous. Unless NetGroupManager::GetGroup() adds too much overhead, they seem preferable.

    0xB10C commented at 12:00 pm on April 14, 2023:

    Looking into GetGroup() I saw it returns a C++ std::vector<unsigned char>. We can’t pass C++ std::vector’s in the tracepoints as the eBPF code, written in C, won’t be able to read it. We can call data() and pass this in the tracepoint as pointer to unsigned chars (we do this with e.g. the raw P2P messages too). To read from this pointer, we also need to pass the of the length of underlying array.

    I’ll give this a shot in a separate branch.


    0xB10C commented at 12:20 pm on April 14, 2023:
    An alternative is to pass GetMappedAS() which is an uint32_t that will be !=0 in an ASMap future. A net group similar to Bitcoin Core’s can be calculated from the peer address by consumers that need it.
  22. in src/net.cpp:599 in 6de4fc73c9 outdated
    594+            GetId(),
    595+            m_addr_name.c_str(),
    596+            ConnectionTypeAsString().c_str(),
    597+            ConnectedThroughNetwork(),
    598+            nKeyedNetGroup,
    599+            std::chrono::duration<uint64_t>(m_connected).count());
    


    maflcko commented at 9:31 am on March 22, 2023:
    0            Ticks<std::chrono::seconds>(m_connected);
    

    nit, use Ticks, or count_seconds (if you want the compile failure if m_connected is changed to higher precision than seconds)?

  23. virtu commented at 1:43 pm on March 24, 2023: contributor

    ACK 6de4fc7

    Code reviewed tracepoints, tests, and demo script. Tracepoint placement looks good, so do tests. Successfully ran functional tests and the demo script.

    Some points:

    • Any reason why only inbound_connection and outbound_connection explicitly cast all arguments in the demo script?
    • I think some of the format specifiers in the demo script are lightly off, leading to outputs with negative netgroup values on my system. I believe it should be %lld for int64 and %llu for uint64.
    • As brought up previously, it would be nice to switch from nKeyedNetGroup to netgroup ids that are consistent over time
  24. DrahtBot requested review from jb55 on Mar 24, 2023
  25. in src/net.cpp:942 in 6de4fc73c9 outdated
    937@@ -930,6 +938,15 @@ bool CConnman::AttemptToEvictConnection()
    938     for (CNode* pnode : m_nodes) {
    939         if (pnode->GetId() == *node_id_to_evict) {
    940             LogPrint(BCLog::NET, "selected %s connection for eviction peer=%d; disconnecting\n", pnode->ConnectionTypeAsString(), pnode->GetId());
    941+
    942+            TRACE6(net, evicted_connection,
    


    dergoegge commented at 3:05 pm on March 29, 2023:
    0            TRACE6(net, evicted_inbound_connection,
    

    The logic in net exclusively evicts inbound peers and there is more eviction logic in net_processing for outbound connections. Maybe adding separate trace points for both makes sense?

  26. 0xB10C commented at 3:57 pm on April 25, 2023: contributor
    Still think this is relevant, marking this as in draft until I address the feedback.
  27. 0xB10C marked this as a draft on Apr 25, 2023
  28. 0xB10C force-pushed on Sep 11, 2023
  29. 0xB10C marked this as ready for review on Sep 11, 2023
  30. 0xB10C commented at 11:24 am on September 11, 2023: contributor

    This is ready for review again. Changes:

    • Rebased: master now includes #27831 and #27944. The tests in this PR have been updated to match the changes in these PRs.
    • #25832 (review): I’ve dropped nKeyedNetGroup from all tracepoints and didn’t add a GetGroup() call. The net group can be calculated from the passed address. Passing NetGroupManager::GetGroup() seems impractical as it’s a std::vector<unsigned char>.
    • #25832 (review): Renamed evicted_connection to evicted_inbound_connection to clarify that this only covers inbound evictions. Outbound eviction is happening in multiple places in net_processsing.cpp and would need multiple tracepoints. If needed, these can be added in a separate pull-request.
    • #25832 (review): using Ticks<std::chrono::seconds>(m_connected); now
    • adapted the bpftrace demo script and tested with bpftrace v0.18
  31. jb55 commented at 5:56 pm on September 11, 2023: contributor

    On Mon, May 02, 2022 at 04:30:15PM +0200, 0xb10c wrote:


    doc/tracing.md | 12 ++++ src/net_processing.cpp | 10 +++- test/functional/interface_usdt_net.py | 84 ++++++++++++++++++++++++++-

    diff –git a/src/net_processing.cpp b/src/net_processing.cpp index 6b415b3a1ea63..ca7e2c8ca3be0 100644 — a/src/net_processing.cpp +++ b/src/net_processing.cpp @@ -1662,13 +1662,21 @@ void PeerManagerImpl::Misbehaving(Peer& peer, int howmuch, const std::string& me const std::string message_prefixed = message.empty() ? "" : (": " + message); std::string warning;

    • if (score_now >= DISCOURAGEMENT_THRESHOLD && score_before < DISCOURAGEMENT_THRESHOLD) {
    • const bool discurage_threshold_exceeded = score_now >= DISCOURAGEMENT_THRESHOLD && score_before < DISCOURAGEMENT_THRESHOLD;
    • if (discurage_threshold_exceeded) {

    nit: should this be discourage_threshold_exceeded ? 😅

  32. 0xB10C requested review from virtu on Sep 11, 2023
  33. 0xB10C force-pushed on Sep 11, 2023
  34. 0xB10C force-pushed on Oct 16, 2023
  35. 0xB10C commented at 8:56 am on October 16, 2023: contributor
    rebased to inlcude #28629 and added the compiler flags here too.
  36. DrahtBot added the label CI failed on Oct 16, 2023
  37. DrahtBot removed the label CI failed on Oct 16, 2023
  38. jb55 commented at 1:22 am on October 19, 2023: contributor

    crACK af207077578265964674618b9e2b0e40c087ae35

    (although I noticed the mispelling is still there)

  39. DrahtBot removed review request from jb55 on Oct 19, 2023
  40. DrahtBot removed review request from virtu on Oct 19, 2023
  41. DrahtBot requested review from virtu on Oct 19, 2023
  42. DrahtBot requested review from dergoegge on Oct 19, 2023
  43. DrahtBot removed review request from virtu on Oct 19, 2023
  44. DrahtBot requested review from virtu on Oct 19, 2023
  45. DrahtBot added the label CI failed on Jan 14, 2024
  46. 0xB10C force-pushed on Jan 18, 2024
  47. 0xB10C commented at 10:03 am on January 18, 2024: contributor

    Hm the CI failure of previous releases, qt5 dev package and depends packages, DEBUG (CI job used to find silent-merge conflicts (?)) four days ago seems weird. Didn’t see any conflicts when rebasing.

    Anyway, rebased. Sorry for invalidating your ACK again, @jb55. Fixed the discurage -> discourage misspelling. I thought I did it in an earlier push.

  48. jb55 commented at 7:42 pm on January 22, 2024: contributor

    On Thu, Jan 18, 2024 at 02:03:35AM -0800, 0xB10C wrote:

    Hm the CI failure of previous releases, qt5 dev package and depends packages, DEBUG (CI job used to find silent-merge conflicts (?)) four days ago seems weird. Didn’t see any conflicts when rebasing.

    Anyway, rebased. Sorry for invalidating your ACK again, @jb55. Fixed the discurage -> discourage misspelling. I thought I did it in an earlier push.

    sorry I haven’t been able to sit down and actually test this yet. The changes look good though!

  49. 0xB10C commented at 10:51 am on March 7, 2024: contributor
    Can we re-trigger the Cirrus CI / ASan + LSan + UBSan + integer, no depends, USDT job? The failing wallet_send.py --descriptors test looks unrelated.
  50. DrahtBot removed the label CI failed on Mar 7, 2024
  51. Jess25051980 approved
  52. in doc/tracing.md:105 in 996029b1fd outdated
    100+
    101+Arguments passed:
    102+1. Peer ID as `int64`
    103+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
    104+3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
    105+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
    


    laanwj commented at 2:26 pm on April 8, 2024:
    It’s slightly inconsistent to pass this parameter as int, while the other enum (Connection Type) gets passed as string. Might similarly pass GetNetworkName(...).c_str(). Especially after #26593 which would remove the call overhead in the happy path. No strong opinion though, I think the integer value is easier to use if you’re going to make a histogram or such. My personal preference is also string-less APIs. But it’s less consistent.

    0xB10C commented at 7:36 am on May 10, 2024:
    I fully agree what you’ve said. I’m leaving as is for now. If #26593 is merged and I retouch this, I’ll reconsider changing this. I’ll leave this as “unresolved” for now.
  53. in contrib/tracing/README.md:357 in 996029b1fd outdated
    352+Logging opened, closed, misbehaving, and evicted P2P connections
    353+OUTBOUND conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, net_group=245974404224840066, total_out=1
    354+INBOUND conn from 127.0.0.1:45324: id=1, type=inbound, network=0, net_group=90061749176390714, total_in=1
    355+MISBEHAVING conn id=1, score_before=0, score_increase=20, message='getdata message size = 50001', threshold_exceeded=false
    356+CLOSED conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, net_group=245974404224840066, established=1231006505
    357+EVICTED conn to 127.0.0.1:45324: id=1, type=inbound, network=0, net_group=90061749176390714, established=1612312312
    


    virtu commented at 8:53 am on April 29, 2024:
    0OUTBOUND conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, total_out=1
    1INBOUND conn from 127.0.0.1:45324: id=1, type=inbound, network=0, total_in=1
    2MISBEHAVING conn id=1, score_before=0, score_increase=20, message='getdata message size = 50001', threshold_exceeded=false
    3CLOSED conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, established=1231006505
    4EVICTED conn to 127.0.0.1:45324: id=1, type=inbound, network=0, established=1612312312
    

    0xB10C commented at 8:25 am on May 10, 2024:
    dropped in 8109319e102c41d3aeed0ecfbc3a0e23b7fea807
  54. virtu commented at 9:26 am on April 29, 2024: contributor

    ACK 996029b (modulo comment and this)

    Successfully ran log_p2p_connections.bt over the weekend on both x86_64/clang and aarch64/gcc.

    Extended functional test (interface_usdt_net.py) equally passed on both machines.

  55. 0xB10C force-pushed on May 10, 2024
  56. virtu commented at 7:41 am on May 21, 2024: contributor
    LGTM! ACK 8109319
  57. 0xB10C commented at 9:43 am on May 21, 2024: contributor
    note that #29575 removes the misbehavior score (which is exposed via the net:misbehaving_connection tracepoint). If/when #29575 is merged, the misbehaving_connection tracepoint will change.
  58. DrahtBot added the label Needs rebase on Jun 20, 2024
  59. 0xB10C force-pushed on Jun 21, 2024
  60. 0xB10C closed this on Jun 21, 2024

  61. 0xB10C force-pushed on Jun 21, 2024
  62. tracing: add inbound connection tracepoint fce46fda40
  63. tracing: add outbound connection tracepoint 941272c970
  64. tracing: add inbound connection eviction tracepoint 85be643c28
  65. tracing: add misbehaving conn tracepoint f91084af3f
  66. tracing: connection closed tracepoint 52e0914ff5
  67. tracing: log_p2p_connections.bt example
    A bpftrace script that logs information from the
    net:*_connection tracepoints.
    
    I've tested this script with bpftrace version 0.14.1 and v0.20.2.
    eca0750b8d
  68. 0xB10C reopened this on Jun 21, 2024

  69. 0xB10C commented at 11:38 am on June 21, 2024: contributor

    Must have closed this while force pushing. That was not my intention.

    I’ve rebased this with #29575 merged. The net:misbehaving_connection changed from

    0TRACE5(net, misbehaving_connection,
    1  peer.m_id,
    2  score_before,
    3  howmuch,
    4  message.c_str(),
    5  discourage_threshold_exceeded);
    

    to

    0TRACE2(net, misbehaving_connection,
    1  peer.m_id,
    2  message.c_str());
    

    I’ve updates the tests and the bpftrace demo script accordingly.

  70. DrahtBot removed the label Needs rebase on Jun 21, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me