0xB10C
commented at 5:18 pm on August 12, 2022:
contributor
This adds five new tracepoints with documentation and tests for network connections:
established connections with net:inbound_connection and net:outbound_connection
closed connections (both closed by us or by the peer) with net:closed_connnection
inbound connections that we choose to evict with net:evicted_inbound_connection
connections that are misbehaving and punished with net:misbehaving_connection
I’ve been using these tracepoints for a few months now to monitor connection lifetimes, re-connection frequency by IP and netgroup, misbehavior, peer discouragement, and eviction and more. Together with the two existing P2P message tracepoints they allow for a good overview of local P2P network activity. Also sort-of addresses #22006 (review).
I’ve been back and forth on which arguments to include. For example, net:evicted_connection could also include some of the eviction metrics like e.g. last_block_time, min_ping_time, … but I’ve left them out for now. If wanted, this can be added here or in a follow-up. I’ve tried to minimize a potential performance impact by measuring executed instructions with gdb where possible (method described here). I don’t think a few hundred extra instructions are too crucial, as connection opens/closes aren’t too frequent (compared to e.g. P2P messages). Note: e.g. CreateNodeFromAcceptedSocket() usually executes between 80k and 90k instructions for each new inbound connection.
tracepoint
instructions
net:inbound_connection
390 ins
net:outbound_connection
between 700 and 1000 ins
net:closed_connnection
473 ins
net:evicted_inbound_connection
not measured; likely similar to net:closed_connnection
net:misbehaving_connection
not measured
Also added a bpftrace (tested with v0.14.1) log_p2p_connections.bt example script that produces output similar to:
#30381 ([WIP] net: return result from addnode RPC by willcl-ark)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
0xB10C referenced this in commit
f0e8758415
on Sep 14, 2022
0xB10C
commented at 1:08 pm on November 16, 2022:
contributor
Note to self: might be worth checking if we can also pass AS information and not only the net_group.
0xB10C force-pushed
on Jan 20, 2023
0xB10C
commented at 4:28 pm on January 20, 2023:
contributor
Note to self: might be worth checking if we can also pass AS information and not only the net_group.
This is possible by accessing the CNodeStats, but don’t think we should do that in the tracepoint. We have the IP and can lookup the AS in a tracing scirpt, if necessary.
Fixed up the linter complains and rebased. This is ready for further review.
dergoegge
commented at 4:36 pm on March 9, 2023:
member
Concept ACK
DrahtBot added the label
Needs rebase
on Mar 20, 2023
0xB10C force-pushed
on Mar 21, 2023
DrahtBot removed the label
Needs rebase
on Mar 21, 2023
0xB10C force-pushed
on Mar 21, 2023
0xB10C
commented at 11:45 am on March 21, 2023:
contributor
The linter started complaining about “E275 missing whitespace after keyword”. Fixed this up in the individual commits.
dergoegge
commented at 12:20 pm on March 21, 2023:
Thinking a little bit about what data should go into the tracepoints.
m_addr_name: would it make more sense to put addr in binary form (or both)? I’m not sure they are equivalent all the time.
nKeyedNetGroup: is derived deterministically from addr using random seeds that are created with every CConnman. I think that means that across restarts of a node previous net groups loose meaning (i.e. you can’t compare them to netgroups of new connections). Maybe NetGroupManager::GetGroup(addr) would be better?
m_addr_name: my reasoning for passing m_addr_name as string instead of binary is that we likely don’t want to implement the binary decoding in each tracing script. I think the string is closer to what users need. However, not opposing to add the binary if there’s need.
nKeyedNetGroup: I added the netgroup as argument to better identify connections coming from the same network. For my use-case, it’s not problematic if nKeyedNetGroup isn’t consistent across restarts and doesn’t have a meaning besides being an ID. I had a quick look at NetGroupManager::GetGroup(). This looks like a good alternative to nKeyedNetGroup as consumers can choose to decode it if they want. I plan to play around with it a bit to see if it works.
I can imagine some use cases where having consistent netgroup identifiers would be advantageous. Unless NetGroupManager::GetGroup() adds too much overhead, they seem preferable.
Looking into GetGroup() I saw it returns a C++ std::vector<unsigned char>. We can’t pass C++ std::vector’s in the tracepoints as the eBPF code, written in C, won’t be able to read it. We can call data() and pass this in the tracepoint as pointer to unsigned chars (we do this with e.g. the raw P2P messages too). To read from this pointer, we also need to pass the of the length of underlying array.
An alternative is to pass GetMappedAS() which is an uint32_t that will be !=0 in an ASMap future. A net group similar to Bitcoin Core’s can be calculated from the peer address by consumers that need it.
Code reviewed tracepoints, tests, and demo script. Tracepoint placement looks good, so do tests. Successfully ran functional tests and the demo script.
Some points:
Any reason why only inbound_connection and outbound_connection explicitly cast all arguments in the demo script?
I think some of the format specifiers in the demo script are lightly off, leading to outputs with negative netgroup values on my system. I believe it should be %lld for int64 and %llu for uint64.
As brought up previously, it would be nice to switch from nKeyedNetGroup to netgroup ids that are consistent over time
DrahtBot requested review from jb55
on Mar 24, 2023
The logic in net exclusively evicts inbound peers and there is more eviction logic in net_processing for outbound connections. Maybe adding separate trace points for both makes sense?
0xB10C
commented at 3:57 pm on April 25, 2023:
contributor
Still think this is relevant, marking this as in draft until I address the feedback.
0xB10C marked this as a draft
on Apr 25, 2023
0xB10C force-pushed
on Sep 11, 2023
0xB10C marked this as ready for review
on Sep 11, 2023
0xB10C
commented at 11:24 am on September 11, 2023:
contributor
This is ready for review again. Changes:
Rebased: master now includes #27831 and #27944. The tests in this PR have been updated to match the changes in these PRs.
#25832 (review): I’ve dropped nKeyedNetGroup from all tracepoints and didn’t add a GetGroup() call. The net group can be calculated from the passed address. Passing NetGroupManager::GetGroup() seems impractical as it’s a std::vector<unsigned char>.
#25832 (review): Renamed evicted_connection to evicted_inbound_connection to clarify that this only covers inbound evictions. Outbound eviction is happening in multiple places in net_processsing.cpp and would need multiple tracepoints. If needed, these can be added in a separate pull-request.
#25832 (review): using Ticks<std::chrono::seconds>(m_connected); now
adapted the bpftrace demo script and tested with bpftrace v0.18
jb55
commented at 5:56 pm on September 11, 2023:
contributor
On Mon, May 02, 2022 at 04:30:15PM +0200, 0xb10c wrote:
nit: should this be discourage_threshold_exceeded ? 😅
0xB10C requested review from virtu
on Sep 11, 2023
0xB10C force-pushed
on Sep 11, 2023
0xB10C force-pushed
on Oct 16, 2023
0xB10C
commented at 8:56 am on October 16, 2023:
contributor
rebased to inlcude #28629 and added the compiler flags here too.
DrahtBot added the label
CI failed
on Oct 16, 2023
DrahtBot removed the label
CI failed
on Oct 16, 2023
jb55
commented at 1:22 am on October 19, 2023:
contributor
crACKaf207077578265964674618b9e2b0e40c087ae35
(although I noticed the mispelling is still there)
DrahtBot removed review request from jb55
on Oct 19, 2023
DrahtBot removed review request from virtu
on Oct 19, 2023
DrahtBot requested review from virtu
on Oct 19, 2023
DrahtBot requested review from dergoegge
on Oct 19, 2023
DrahtBot removed review request from virtu
on Oct 19, 2023
DrahtBot requested review from virtu
on Oct 19, 2023
DrahtBot added the label
CI failed
on Jan 14, 2024
0xB10C force-pushed
on Jan 18, 2024
0xB10C
commented at 10:03 am on January 18, 2024:
contributor
Hm the CI failure of previous releases, qt5 dev package and depends packages, DEBUG (CI job used to find silent-merge conflicts (?)) four days ago seems weird. Didn’t see any conflicts when rebasing.
Anyway, rebased. Sorry for invalidating your ACK again, @jb55. Fixed the discurage -> discourage misspelling. I thought I did it in an earlier push.
jb55
commented at 7:42 pm on January 22, 2024:
contributor
On Thu, Jan 18, 2024 at 02:03:35AM -0800, 0xB10C wrote:
Hm the CI failure of previous releases, qt5 dev package and depends packages, DEBUG (CI job used to find silent-merge conflicts (?)) four
days ago seems weird. Didn’t see any conflicts when rebasing.
Anyway, rebased. Sorry for invalidating your ACK again, @jb55. Fixed
the discurage -> discourage misspelling. I thought I did it in an
earlier push.
sorry I haven’t been able to sit down and actually test this yet. The
changes look good though!
0xB10C
commented at 10:51 am on March 7, 2024:
contributor
Can we re-trigger the Cirrus CI / ASan + LSan + UBSan + integer, no depends, USDT job? The failing wallet_send.py --descriptors test looks unrelated.
DrahtBot removed the label
CI failed
on Mar 7, 2024
Jess25051980 approved
in
doc/tracing.md:108
in
996029b1fdoutdated
100+
101+Arguments passed:
102+1. Peer ID as `int64`
103+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
104+3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
105+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
It’s slightly inconsistent to pass this parameter as int, while the other enum (Connection Type) gets passed as string. Might similarly pass GetNetworkName(...).c_str(). Especially after #26593 which would remove the call overhead in the happy path.
No strong opinion though, I think the integer value is easier to use if you’re going to make a histogram or such. My personal preference is also string-less APIs. But it’s less consistent.
I fully agree what you’ve said. I’m leaving as is for now. If #26593 is merged and I retouch this, I’ll reconsider changing this. I’ll leave this as “unresolved” for now.
0xB10C
commented at 9:43 am on May 21, 2024:
contributor
note that #29575 removes the misbehavior score (which is exposed via the net:misbehaving_connection tracepoint). If/when #29575 is merged, the misbehaving_connection tracepoint will change.
DrahtBot added the label
Needs rebase
on Jun 20, 2024
0xB10C force-pushed
on Jun 21, 2024
0xB10C closed this
on Jun 21, 2024
0xB10C force-pushed
on Jun 21, 2024
0xB10C reopened this
on Jun 21, 2024
0xB10C
commented at 11:38 am on June 21, 2024:
contributor
Must have closed this while force pushing. That was not my intention.
I’ve rebased this with #29575 merged. The net:misbehaving_connection changed from
nit: (coming from one of the conflicting pull requests):
It probably doesn’t matter much, but as a tracepoint user I’d find it a bit odd to receive the trace, then call getpeerinfo and get an empty reply, as the RPC thread may still have the view on the previous m_nodes state.
I understand that this can never be race-free for the user, but for consistency, I’d recommend to move this after the next LOCK scope. See also commit 55555574d105f6c529c5c966c3c971c9db5ab786 / https://github.com/bitcoin/bitcoin/pull/30512/commits
As a follow-up idea, it could make sense to reduce the timeout here and instead scale it with the self.timeout_factor, so that the timeout is magically adjusted to the speed of the test-env it is running in
in
doc/tracing.md:129
in
07e7bd873boutdated
124+Arguments passed:
125+1. Peer ID as `int64`
126+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
127+3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
128+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
129+5. Connection established UNIX epoch timestamp as `uint64`.
style nit in 16ea431f1f3a4ae57729ba265ff080b64b711c5d: Could add the missing trailing comma now? Otherwise this line will have to be touched again in the future, if something is appended.
in
doc/tracing.md:150
in
d07d558da9outdated
145+Arguments passed:
146+1. Peer ID as `int64`
147+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
148+3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
149+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
150+5. Connection established UNIX epoch timestamp as `uint64`.
DrahtBot requested review from dergoegge
on Sep 4, 2024
DrahtBot requested review from virtu
on Sep 4, 2024
i-am-yuvi
commented at 2:16 pm on September 18, 2024:
contributor
tested ACK 🔥
Successfully ran demo scripts and functional tests!!
I have been using these tracepoints for 3-4 months from now and I find it very useful. They help you keep track of your peers, also I am working on a tool that detects spy peers and for that these tracepoints are very useful.
98+Is called when a new inbound connection is opened to us. Passes information about
99+the peer and the number of inbound connections including the newly opened connection.
100+
101+Arguments passed:
102+1. Peer ID as `int64`
103+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
vasild
commented at 12:42 pm on September 30, 2024:
Then addnode aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:8333 and CNode::m_addr_name gets assigned that 70-chars long string.
Less importantly, that 68 includes space for : and for the port, whereas here we don’t have a port.
Good catch. I confirmed that the a’s actually show up in the tracing scripts and are cut off. I think it’s fine to leave 68 in the tests, but I’ll add a footnote in the docs in a separate commit.
in
test/functional/interface_usdt_net.py:120
in
5b59cfc944outdated
I couldn’t find the doc of bpf_usdt_readarg_p(). What would it do if the argument is longer than the supplied buffer? I presume the last arg is the length of the buffer, so it is not going to overflow, I guess. But would it add a trailing '\0'? That is - could it copy MAX_PEER_ADDR_LENGTH from the input to the output and leave it like that? If yes, would that be a problem - a string without a terminating '\0'?
What would it do if the argument is longer than the supplied buffer? I presume the last arg is the length of the buffer, so it is not going to overflow, I guess.
Correct, it would cut it off.
But would it add a trailing ‘\0’?
No. However there is bpf_probe_read_user_str() which does that. This generally seems to be the better suited methodology here. I’ll use that for the changes in this PR. There are other places in our code we can update in a potential follow up.
That is - could it copy MAX_PEER_ADDR_LENGTH from the input to the output and leave it like that? If yes, would that be a problem - a string without a terminating ‘\0’?
I think, for us, in this test here, this isn’t a problem. The Python ctype mapping for Connection defined below has the same size for an address (("addr", ctypes.c_char * MAX_PEER_ADDR_LENGTH)).
in
doc/tracing.md:117
in
5b59cfc944outdated
112+
113+Arguments passed:
114+1. Peer ID as `int64`
115+2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
116+3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
117+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
117+4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
118+5. Number of existing outbound connections as `uint64` including the newly opened outbound connection.
119+
120+#### Tracepoint `net:evicted_inbound_connection`
121+
122+Is called when a inbound connection is evicted by us. Passes information about the evicted peer and the time at connection establishment.
Based on earlier review, I renamed the tracepoint to explicitly mention “inbound” - I think the same is fine for the documentation. If we were going to add (an) outbound eviction tracepoint(s), then they’d get their own tracepoint documentation.
I think this can be handled on the user side. The bpftrace tool cuts of strings after 64 chars by default and bcc and libbpf both have bpf_probe_read_user_str(), which would replace the last char with \0 on overflow.
Is bpf.perf_buffer_poll(timeout=400) needed now that there is self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == 0)? If not needed, then better drop it.
in
contrib/tracing/log_p2p_connections.bt:15
in
5b59cfc944outdated
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
0xB10C referenced this in commit
f52ccaa34d
on Nov 13, 2024
0xB10C referenced this in commit
15dfd72f8f
on Nov 13, 2024
0xB10C force-pushed
on Nov 13, 2024
DrahtBot removed the label
CI failed
on Nov 13, 2024
DrahtBot added the label
CI failed
on Dec 13, 2024
DrahtBot
commented at 0:43 am on December 13, 2024:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
HiHat referenced this in commit
7914d1fd6f
on Dec 16, 2024
in
test/functional/interface_usdt_net.py:127
in
15dfd72f8foutdated
Non blocker comment: this cast from uint64_t to (void*) looks odd, maybe it will produce compiler warning in some cases? Also, on 32 bit systems that should be uint32_t. Can these variables be declared as void* instead? Like:
A bpftrace script that logs information from the
net:*_connection tracepoints.
I've tested this script with bpftrace version 0.14.1 and v0.20.2.
b19b526758
tracing: document that peer addrs can be >68 chars
A v3 onion address with a `:` and a five digit port has a length of
68 chars. As noted in
https://github.com/bitcoin/bitcoin/pull/25832#discussion_r1781040991
peers e.g. added via hostname might have a longer CNode::m_addr_name.
These might be cut off in tracing tools.
e3622a9692
0xB10C force-pushed
on Feb 4, 2025
0xB10C
commented at 9:26 am on February 4, 2025:
contributor
The BIP155 approach is a bit more involved with making GetBIP155Network() public, introducing a ConnectedThroughBIP155Network() helper, and passing 0 (or something else?) for NET_INTERNAL as there is no BIP155 equivalent. With this approach, NET_UNROUTABLE isn’t exposed - e.g. 127.0.0.1 will get the BIP155 NetworkID for IPv4.
The network name as string approach exposes NET_UNROUTABLE as not_publicly_routable and NET_INTERNAL as internal. We have these in -netinfo too.
I have a slight preference on BIP155 NetworkIDs over keeping-as-is or network name as string.
laanwj
commented at 2:50 pm on February 9, 2025:
member
I have a slight preference on BIP155 NetworkIDs over keeping-as-is or network name as string.
Agree. Strings are sensitive to typos, and require extra effort for matching when doing statistics. i think even the current enumeration is better than that. BIP155 is a good suggestion, but as you have to make up non-standard values, and handle NET_UNROUTABLE differently i’m not entirely sure.
The purpose of tracing is to gain insight into internal state so logging the actual values makes more sense than thinking up a new convention. The internal enumeration is likely stable enough to warrant being a (semi-)stable interface.
sipa
commented at 3:22 pm on February 9, 2025:
member
The BIP155 approach is a bit more involved with making GetBIP155Network() public, introducing a ConnectedThroughBIP155Network() helper, and passing 0 (or something else?) for NET_INTERNAL as there is no BIP155 equivalent.
Given that BIP155 network IDs are uint8_t’s, I think it suffices to pick a value outside of that range for unreachable. For example, 0x100, or 0xffffffff for a uint32_t, or -1 for a signed integer type.
Using 0 works too of course, and I guess it won’t get assigned in future extensions, but using something outside of the range might be even more future-proof.
0xB10C
commented at 2:46 pm on February 12, 2025:
contributor
Agree. Strings are sensitive to typos, and require extra effort for matching when doing statistics. i think even the current enumeration is better than that. BIP155 is a good suggestion, but as you have to make up non-standard values, and handle NET_UNROUTABLE differently i’m not entirely sure.
The purpose of tracing is to gain insight into internal state so logging the actual values makes more sense than thinking up a new convention. The internal enumeration is likely stable enough to warrant being a (semi-)stable interface.
I tend to agree. A few toughs:
Mapping the internal Network enum to other values makes it harder to go from tracepoint output/observation to Bitcoin Core internals again
Using a third (1. Network enum, 2. BIP155) protocol “BIP155 extended for tracepoint interface to handle Bitcoin Core internal values not present in BIP155” seems weird, confusing, and not worth it (?). If mapping to BIP155 would have been more straight forward, I’d have been happy to use it.
Personally, I’d favor to avoid changes/functions (like ConnectedThroughBIP155Network(), making GetBIP155Network() public) that are only used by or useful for tracepoints.
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-06-03 09:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me