
How similar are addrman of different nodes

An archive of bnoc.xyz

stratospher · #1 ·

recently chatted with a few people about how similar the addrman of 2 nodes might be - that is, how many of the ip addresses stored in the addrmen of different nodes overlap - and we decided to actually measure it. this post compares the addrman of 9 nodes.

TLDR

addrman of 2 nodes tend to be more similar if they support the same networks.

the addrmen under consideration

all the addrmen were downloaded around the same time on may 19. the getrawaddrman RPC lets you dump a node’s addrman in json format. i downloaded some of b10c’s addrmen using the endpoint on peer.observer and compared them with my addrman.
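a minimal python sketch of turning a getrawaddrman dump into per-table sets of IPs, plus a per-network breakdown like the composition table below. it assumes the JSON shape {table: {"bucket/position": {"address", "port", "network", ...}}} that getrawaddrman returns in recent Bitcoin Core versions; the function names and the example dump are made up for illustration.

```python
import json
from collections import Counter


def ips_by_table(raw):
    """Map each table ("new", "tried") to the set of IP addresses it contains.

    Ports are dropped, so one IP stored under two ports counts once.
    """
    return {table: {entry["address"] for entry in entries.values()}
            for table, entries in raw.items()}


def network_mix(raw, table):
    """Rounded % of entries per network (ipv4, ipv6, onion, i2p, cjdns) in one table."""
    counts = Counter(entry["network"] for entry in raw[table].values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {net: round(100 * n / total) for net, n in counts.items()}


# Made-up dump in the assumed getrawaddrman shape:
# {table: {"bucket/position": {"address": ..., "port": ..., "network": ...}}}
example = json.loads("""
{"new":   {"0/1": {"address": "100.10.90.1", "port": 8333, "network": "ipv4"},
           "5/7": {"address": "100.10.90.1", "port": 8339, "network": "ipv4"},
           "9/2": {"address": "2001:db8::1", "port": 8333, "network": "ipv6"}},
 "tried": {"3/4": {"address": "100.10.90.1", "port": 8333, "network": "ipv4"}}}
""")

print(ips_by_table(example))
print(network_mix(example, "new"))
```

for a real dump, load the file you saved from the RPC with json.load instead of the inline example. note that network_mix counts entries, not distinct IPs.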

summarising node descriptions from peer.observer (sorted in descending order of total number of entries):

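addrman consists of 2 tables: the new table, which holds addresses we’ve heard about (e.g. via ADDR gossip) but haven’t successfully connected to, and the tried table, which holds addresses we’ve successfully connected to.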

here is the composition of the new and tried tables for the peers mentioned above, for reference:

peer | new table entries | new table composition | tried table entries | tried table composition
alice | 65,534 | ipv4 79% / ipv6 21% | 9,837 | ipv4 100%
bob | 65,533 | ipv4 79% / ipv6 21% | 8,251 | ipv4 100%
frank | 65,535 | ipv4 79% / ipv6 21% | 8,299 | ipv4 100%
mike | 65,482 | ipv4 80% / ipv6 20% | 10,920 | ipv4 100%
nico | 65,530 | ipv4 78% / ipv6 22% | 8,647 | ipv4 100%
dave | 65,534 | ipv4 61% / ipv6 12% / onion 24% / i2p 3% | 12,429 | ipv4 49% / onion 38% / i2p 13%
erin | 65,532 | ipv4 61% / ipv6 11% / onion 24% / i2p 3% | 12,650 | ipv4 48% / onion 39% / i2p 13%
kane | 62,535 | ipv4 38% / ipv6 6% / onion 42% / i2p 14% | 10,550 | ipv4 19% / onion 54% / i2p 28%
my node | 65,450 | ipv4 66% / ipv6 14% / onion 20% | 7,726 | ipv4 67% / ipv6 2% / onion 32%

*a handful of cjdns entries in tried tables are not shown above - frank has 7, mike has 1, erin has 3, kane has 2.
*kane’s addrman contains very old clearnet entries probably from an old configuration?
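for context on the sizes above, a sketch of the table capacities using the bucket constants from Bitcoin Core’s addrman (1024 new buckets, 256 tried buckets, 64 slots each). the helper name is made up; the entry counts are copied from the table. by this arithmetic every new table is essentially full at 65,536 slots, while the tried tables range from roughly 47% (my node) to 77% (erin) of their 16,384 slots.

```python
# addrman bucket geometry, from the *_LOG2 constants in Bitcoin Core's addrman code
NEW_BUCKET_COUNT = 1 << 10    # ADDRMAN_NEW_BUCKET_COUNT_LOG2 = 10 -> 1024 buckets
TRIED_BUCKET_COUNT = 1 << 8   # ADDRMAN_TRIED_BUCKET_COUNT_LOG2 = 8 -> 256 buckets
BUCKET_SIZE = 1 << 6          # ADDRMAN_BUCKET_SIZE_LOG2 = 6 -> 64 slots per bucket

NEW_CAPACITY = NEW_BUCKET_COUNT * BUCKET_SIZE      # 65,536 slots
TRIED_CAPACITY = TRIED_BUCKET_COUNT * BUCKET_SIZE  # 16,384 slots


def fill_pct(entries, capacity):
    """Share of a table's slots that are occupied, in %."""
    return 100 * entries / capacity


# (new entries, tried entries) copied from the composition table
sizes = {"alice": (65_534, 9_837), "bob": (65_533, 8_251), "frank": (65_535, 8_299),
         "mike": (65_482, 10_920), "nico": (65_530, 8_647), "dave": (65_534, 12_429),
         "erin": (65_532, 12_650), "kane": (62_535, 10_550), "my node": (65_450, 7_726)}

for node, (new, tried) in sizes.items():
    print(f"{node}: new {fill_pct(new, NEW_CAPACITY):.1f}% full, "
          f"tried {fill_pct(tried, TRIED_CAPACITY):.1f}% full")
```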

similarity metrics

% similarity is calculated by comparing only the ip addresses. e.g. 100.10.90.1:8333 and 100.10.90.1:8339 are considered the same since they have the same ip address, even though their ports differ.
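if your addresses come as "ip:port" strings (getrawaddrman itself already reports address and port separately), a small helper can drop the port before comparing. a sketch using python’s ipaddress module to canonicalise literal IPs; the name ip_key is hypothetical, and onion/i2p names are passed through unchanged apart from the port.

```python
import ipaddress


def ip_key(endpoint):
    """Strip the port from an "ip:port" string so entries are compared by IP only.

    Handles "1.2.3.4:8333" and bracketed IPv6 "[2001:db8::1]:8333".
    """
    host, _, _port = endpoint.rpartition(":")
    host = host.strip("[]")
    try:
        # Canonicalise so different spellings of one IPv6 address compare equal.
        return str(ipaddress.ip_address(host))
    except ValueError:
        return host  # not a literal IP (e.g. an .onion or .i2p name)


print(ip_key("100.10.90.1:8333"), ip_key("100.10.90.1:8339"))
```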

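if you compare alice’s addrman with bob’s, alice ∩ bob is the set of IP addresses present in both. the % similarity alice sees when comparing with bob (the [alice, bob] cell in the tables below) is |alice ∩ bob| / |alice| × 100, i.e. the share of alice’s addresses that bob also has.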
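the directional metric as a python sketch, assuming each node’s table has already been reduced to a set of IP strings: similarity(X, Y) = |X ∩ Y| / |X| × 100, so the [row, column] cell divides by the row node’s table size. function names and the example sets are illustrative.

```python
def similarity(mine, other):
    """Directional % similarity: share of `mine` that also appears in `other`."""
    if not mine:
        return 0.0
    return 100 * len(mine & other) / len(mine)


def similarity_matrix(tables):
    """tables: {node_name: set_of_ips} -> {(row, col): %} for every row != col."""
    return {(a, b): similarity(tables[a], tables[b])
            for a in tables for b in tables if a != b}


# Toy sets: 2 shared IPs, alice has 4 in total, bob has 3.
alice = {"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"}
bob = {"192.0.2.3", "192.0.2.4", "198.51.100.9"}
matrix = similarity_matrix({"alice": alice, "bob": bob})
print(matrix)
```

the same overlap (2 IPs) gives 50% from alice’s side but about 66.7% from bob’s, which is the asymmetry discussed for the tried tables.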

since the new table size is roughly the same for all peers, the similarity alice sees when comparing her new table with bob’s ([alice, bob] cell in the new table similarity matrix below) is about the same as the similarity bob sees when comparing his new table with alice’s ([bob, alice] cell).
so the new table similarity matrix below is roughly symmetric.

however, tried table sizes differ a lot between nodes, so the tried table similarity matrix below is not symmetric at all! maybe we should measure it some other way.
since bob’s tried table is smaller than alice’s, the same overlap is a larger share of his table - bob finds his tried table ~54% similar to alice’s, whereas alice finds hers only ~45% similar to bob’s (she has a larger tried table!).
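one possible symmetric alternative (not what this post uses) is the jaccard index, |X ∩ Y| / |X ∪ Y| × 100, which gives the same number for [alice, bob] and [bob, alice]. the sketch below also redoes the alice/bob tried-table arithmetic; the ~4,460 shared IPs is back-computed from the 54.13% and 45.33% cells and the entry counts, so it is approximate, not a measured overlap. function names are made up.

```python
def jaccard(a, b):
    """Symmetric similarity in %: |a ∩ b| / |a ∪ b| * 100."""
    union = a | b
    return 100 * len(a & b) / len(union) if union else 0.0


def from_sizes(size_a, size_b, overlap):
    """Directional scores (a's view, b's view) and Jaccard from sizes and an overlap count."""
    return (100 * overlap / size_a,
            100 * overlap / size_b,
            100 * overlap / (size_a + size_b - overlap))


# bob: 8,251 tried entries, alice: 9,837; ~4,460 shared is an approximate,
# back-computed overlap (not measured).
bob_view, alice_view, j = from_sizes(8251, 9837, 4460)
print(f"bob->alice {bob_view:.2f}%, alice->bob {alice_view:.2f}%, jaccard {j:.2f}%")
```

with those approximate inputs the two directional scores come out around 54% and 45%, and the jaccard index around 33%.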

1. new table similarity

node under consideration \ compared with | alice | bob | dave | erin | frank | kane | mike | nico | my node
alice | x | 48.61% | 38.23% | 38.68% | 48.72% | 8.84% | 48.57% | 43.94% | 43.74%
bob | 48.66% | x | 38.84% | 38.96% | 49.55% | 9.22% | 48.72% | 44.10% | 44.30%
dave | 38.19% | 38.77% | x | 49.62% | 38.74% | 29.69% | 38.83% | 34.20% | 49.08%
erin | 38.66% | 38.90% | 49.64% | x | 38.99% | 29.34% | 38.73% | 34.44% | 48.90%
frank | 48.82% | 49.61% | 38.85% | 39.08% | x | 9.17% | 48.82% | 44.20% | 44.27%
kane | 10.28% | 10.71% | 34.54% | 34.13% | 10.64% | x | 10.65% | 8.85% | 29.91%
mike | 48.80% | 48.91% | 39.06% | 38.94% | 48.96% | 9.21% | x | 43.94% | 44.69%
nico | 43.87% | 43.99% | 34.18% | 34.41% | 44.05% | 7.61% | 43.66% | x | 38.60%
my node | 44.17% | 44.69% | 49.61% | 49.41% | 44.61% | 25.98% | 44.91% | 39.04% | x

2. tried table similarity

node under consideration \ compared with | alice | bob | dave | erin | frank | kane | mike | nico | my node
alice | x | 45.33% | 37.06% | 37.25% | 45.42% | 12.54% | 55.73% | 50.46% | 32.37%
bob | 54.13% | x | 36.86% | 36.99% | 49.37% | 12.09% | 54.62% | 50.61% | 31.58%
dave | 29.19% | 24.32% | x | 37.82% | 24.90% | 27.04% | 30.57% | 27.16% | 23.58%
erin | 28.83% | 23.98% | 37.15% | x | 24.41% | 27.39% | 29.72% | 26.77% | 23.26%
frank | 54.05% | 49.21% | 37.61% | 37.54% | x | 12.09% | 54.82% | 50.41% | 32.00%
kane | 11.61% | 9.37% | 31.78% | 32.76% | 9.41% | x | 12.29% | 10.04% | 14.31%
mike | 50.20% | 41.21% | 34.96% | 34.59% | 41.49% | 11.96% | x | 46.02% | 29.61%
nico | 57.53% | 48.32% | 39.31% | 39.45% | 48.30% | 12.37% | 58.25% | x | 35.61%
my node | 41.04% | 33.53% | 37.95% | 38.10% | 34.09% | 19.60% | 41.68% | 39.60% | x

interpretation

new table is shaped by ADDR relay gossip (a network phenomenon), so some similarity makes sense, though i’d have guessed higher.

tried table is shaped by each node’s unique connection history. similarity ranges from ~9% to ~58% and is asymmetric because tried table sizes differ a lot!

i’d have guessed a lower % for tried table similarity, though maybe the ~50% within the clearnet-only cluster makes sense because there is a limited pool of clearnet nodes we can connect to?

would be curious about what people think about the addrman similarity stats!

b10c · #2 ·

Thanks for posting!

I think your comparison matrices could also be plotted as a heatmap with matplotlib/seaborn, with color showing higher/lower similarity.

Maycon Fabio · #3 · · in reply to #2

I was going to comment the same thing; it would be much easier to read as a visualization. So to add to this post, I asked Claude to do it, since it’s a simple thing; the hard work was already done by @stratospher :slight_smile:.

[heatmaps: 1. new table similarity, 2. tried table similarity]

stratospher · #4 · · in reply to #3

wow so pretty! it looks so much better than just numbers! thanks @m4ycon! and thanks @b10c for the idea!

I also wanted to reorder the rows and columns so that the similarity degradation can hopefully be seen better.

basically “alice, bob, frank, mike, nico, my node, dave, erin, kane” as the row and column order instead of the current one.
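A sketch of that reordering, assuming each matrix is stored as a list of rows in the original order (alice, bob, dave, erin, frank, kane, mike, nico, my node); the reorder helper is hypothetical. Rows and columns are permuted together so the diagonal stays on the diagonal, and the result can then be fed to whatever heatmap tool you use.

```python
def reorder(matrix, names, order):
    """Return the square matrix with rows AND columns permuted into `order`.

    matrix[i][j] is the [names[i], names[j]] cell.
    """
    idx = [names.index(n) for n in order]
    return [[matrix[i][j] for j in idx] for i in idx]


names = ["alice", "bob", "dave", "erin", "frank", "kane", "mike", "nico", "my node"]
order = ["alice", "bob", "frank", "mike", "nico", "my node", "dave", "erin", "kane"]

# Placeholder matrix whose cells just record (row, column) so the permutation is visible.
labels = [[(r, c) for c in names] for r in names]
reordered = reorder(labels, names, order)
print(reordered[0][:3])
```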

EDIT: so I asked Claude to make the heatmaps as well! they didn’t look nice at all, so I gave it @m4ycon’s great colours and aesthetics.

but I’m not able to edit the original post and replace the numbers with the nice heatmaps and maybe remove more numbers.


Daniela Brozzoni · #5 ·

Thanks for sharing!

I also expected the similarity for the new tables to be higher, and for the tried tables to be lower. I’m quite surprised!

It’s interesting to see that the similarity for the new table is around 50%. I expected it to be higher, since I thought every node would have pretty much every address on the network in its addrman. However, I didn’t consider that the size of the new table is capped, so maybe that’s why there is so much discrepancy.
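A deliberately crude model of the cap effect: assume two nodes each keep an independent, uniformly random sample of k addresses out of a pool of N known addresses. Each of my k addresses is then in the other node’s table with probability k/N, so the expected similarity is k/N. Real addrman bucketing (by address group and source) and differing gossip make this unrealistic, so treat it only as intuition; the function names are made up.

```python
import random


def expected_similarity(pool_size, table_size):
    """Expected directional similarity (%) under the toy uniform-sampling model."""
    return 100 * min(table_size, pool_size) / pool_size


def implied_pool(similarity_pct, table_size):
    """Invert the toy model: pool size that would produce this similarity."""
    return table_size / (similarity_pct / 100)


def simulate(pool_size, table_size, seed=0):
    """Monte Carlo check: two independent uniform samples of table_size from pool_size."""
    rng = random.Random(seed)
    mine = set(rng.sample(range(pool_size), table_size))
    other = set(rng.sample(range(pool_size), table_size))
    return 100 * len(mine & other) / table_size


print(expected_similarity(10_000, 5_000), simulate(10_000, 5_000))
print(implied_pool(50, 65_536))
```

With k = 65,536 (the new table capacity), a similarity near 50% would, under this toy model only, correspond to a pool of about 131k addresses.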

Is there any way you can share the raw addrman data with me? I would like to do the same experiment, but also compare timestamps; it would help with the discussion in Fingerprinting nodes: Possible Solutions (#6 by naiyoma, Protocol Design, Delving Bitcoin). I would create a thread asking for people’s getrawaddrman output, but sadly I need to gather data from various nodes at about the same time for it to be meaningful :frowning:
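A sketch of the timestamp comparison, assuming each getrawaddrman entry carries a UNIX-epoch "time" field next to "address"; the helper names and sample entries are invented. For IPs present in both snapshots it reports how far apart the two nodes’ timestamps are.

```python
from statistics import median


def latest_times(raw, table):
    """{ip: newest "time"} for one table; an IP stored under several ports keeps its latest time."""
    out = {}
    for entry in raw[table].values():
        ip = entry["address"]
        out[ip] = max(out.get(ip, entry["time"]), entry["time"])
    return out


def time_deltas(raw_a, raw_b, table="new"):
    """Absolute difference (seconds) of the "time" field for IPs present in both dumps."""
    ta, tb = latest_times(raw_a, table), latest_times(raw_b, table)
    return {ip: abs(ta[ip] - tb[ip]) for ip in ta.keys() & tb.keys()}


a = {"new": {"0/1": {"address": "100.10.90.1", "time": 1_700_000_000},
             "0/2": {"address": "198.51.100.7", "time": 1_700_000_000}}}
b = {"new": {"4/9": {"address": "100.10.90.1", "time": 1_700_003_600},
             "6/1": {"address": "203.0.113.5", "time": 1_700_000_000}}}

deltas = time_deltas(a, b)
print(deltas, median(deltas.values()))
```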

stratospher · #6 · · in reply to #5

you should reach out to @b10c! he has very cool infra for this. also related: Historical Bitcoin Core IP address manager snapshots (via getrawaddrman)

note to self: I do want to run the similarity comparison again after removing the super old addresses + also check similarity of alice’s addrman over the years just for fun.
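a sketch of the "remove super old addresses" step, assuming the same getrawaddrman entry layout with a unix-epoch "time" field. filtering relative to the snapshot’s own capture time (not the current time) keeps the may 19 dumps comparable. the 30-day cutoff in the example mirrors Bitcoin Core’s ADDRMAN_HORIZON (the age after which addrman treats an entry as terrible), but any cutoff is a judgment call; drop_old is a made-up name.

```python
DAY = 24 * 60 * 60


def drop_old(raw, max_age_days, snapshot_time):
    """Copy of a getrawaddrman dump keeping only entries seen within max_age_days.

    Age is measured from snapshot_time, the UNIX time the dump was captured.
    """
    cutoff = snapshot_time - max_age_days * DAY
    return {table: {pos: e for pos, e in entries.items() if e["time"] >= cutoff}
            for table, entries in raw.items()}


snapshot = {"new": {"0/1": {"address": "100.10.90.1", "time": 1_700_000_000},
                    "0/2": {"address": "198.51.100.7", "time": 1_600_000_000}},
            "tried": {}}

fresh = drop_old(snapshot, max_age_days=30, snapshot_time=1_700_000_000)
print(fresh)
```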

b10c · #7 · · in reply to #5

Welcome @danielabrozzoni!

I think @deadmanoz’s addrman snapshots from 2026-03-05 till about now should be a good starting point. In the README, he mentions that these are all captured at about the same time.

Additionally, last week I set up daily getrawaddrman snapshots at 00:00 UTC on all my monitoring nodes: add: daily getrawaddrman snapshots by 0xB10C (peer-observer/infra-library Pull Request #160) and change: enable addrman snapshots by default by 0xB10C (peer-observer/infra-library Pull Request #174). So we have data from now on.