See: #16070
Let’s continue the conversation here though.
Instead of limiting by number of total IPs, we now limit by family, which seems more sane, and doesn’t bias against IPv6.
Concept ACK
The quoted discussion mentions that a maximum 31 IP:s fit in a single DNS response packet over UDP. Would it make sense to go with a nMaxIPs
of 31
instead of the 32
?
The quoted discussion mentions that a maximum 31 IP:s fit in a single DNS response packet over UDP. Would it make sense to go with a
nMaxIPs
of31
instead of the32
?
I actually don’t remember how I calculated that or if I even calculated it correctly… Someone should verify… :eyes: @EthanHeilman
Update: Ah, so we have 512 bytes to spend, and the DNS response should look something like: Header = 96 bits Query = 32 bits (assuming “.” as question) Answer = ( 96 bits + 32 bits ) * numIPv4 + ( 96 bits + 128 bits ) * numIPv6
Note that this assumes we use DNS packet compression in the Answer section, no Authority or Additional sections, and “.” as the question (empty QNAME).
So in this case we can get 31 IPv4 addresses at the most, or 17 IPv6 addresses (with some room to spare).
This is just a rough calculation, we have yet to decide whether it is even a good reference point or not.
Reference: https://www2.cs.duke.edu/courses/fall16/compsci356/DNS/DNS-primer.pdf
If I’m reading the code correctly even if you set nMaxIPs=1
that will not stop a DNS seeder or misconfigured DNS resolver from sending you far more than 1
IP address over UDP or TCP. Instead Bitcoin will receive all of those IPs and then select 1
of them and throw away the rest. So sadly this change won’t fix DNS over TCP grossness.
It seems reasonable to reduce this number to 64
, 32
, or 16
but beyond that point I start to worry that there is a chance a node will get all offline IP addresses.
Here is my back of the envelope calculation. Assume that U is the fraction of IP addresses returned by seeders that are offline and numSeeders is the number of DNSseeders. The probability a node gets all unreachable IP addresses is:
U^(numSeeders*nMaxIPs)
U=0.9
, numSeeders=8
and nMaxIPs = 8
then the probability is 0.001.
0.00117=0.9^(8*8)
That is we will see a failure to connect on average once each thousand times a node calls the seeders. Assuming seeders go down from time to time or incoming connection exhaustion attacks we probably want a number greater than nMaxIPs=8
.
The blog post I wrote that counted IP addresses in a DNS query focuses on TCP based responses. How many IP addresses can a DNS query return?
Instead Bitcoin will receive all of those IPs and then select
1
of them and throw away the rest. So sadly this change won’t fix DNS over TCP grossness.
Yes, that’s a good point. Perhaps it should be recommended that Bitcoin DNS seeder implementations limit the number of returned addresses to keep the response in one UDP packet and thus avoiding fragile DNS-over-TCP (fragile due middleboxes filtering TCP/53).
Are there any BIPs around how seeders should behave?
There’s doc/dnsseed-policy.md
which all DNS seed operators need to agree with (this is particular to this project, not bitcoin in general).
@EthanHeilman yeah I think we can definitely do 32, one thing that I realized after reading TheBlueMatt’s response (in the other thread :roll_eyes:) was that: because we set ai_family = AF_UNSPEC
, we actually get back both the A
and AAAA
records, and the total would be more than 32.
Something that I want to discuss is: after we get the names from getaddrinfo
, should we randomize the returned linked list first before truncating to 32 entries? Mostly because getaddrinfo
is usually implemented in such a way that the IPv4
addresses come before the IPv6
addresses. As it currently stands, IPv6
addresses are already disadvantaged by the fact that less of them fit within a 512byte payload, with this change, we’d be truncating mostly IPv6
addresses…
It also seems kind of weird that we’re basically disadvantaging the IPv6
network by nature of DNS’s 512 byte limit (which remains true even with randomization)… I’m not sure what’s best here, but perhaps a limit per-family is what we want?
It also seems kind of weird that we’re basically disadvantaging the IPv6 network by nature of DNS’s 512 byte limit (which remains true even with randomization)… I’m not sure what’s best here, but perhaps a limit per-family is what we want?
I’m not sure I understand, but I maybe I missed something what is limiting DNS to 512 bytes? Neither Bitcoin’s DNS local resolver nor the DNS system have 512 Byte limit. I suppose @sipa’s DNSseeder limits responses to 512 Bytes but other seeders don’t always.
Something that I want to discuss is: after we get the names from getaddrinfo, should we randomize the returned linked list first before truncating to 32 entries? Mostly because getaddrinfo is usually implemented in such a way that the IPv4 addresses come before the IPv6 addresses.
I’m worried that we now adding complexity to the DNS seeding process and I don’t see any benefit that justifies this added complex. Is middleware boxes filtering TCP/53 a serious ongoing problem impacting Bitcoin?
On the other hand it seems simple to reduce the number of IP addresses accepted from 256 to 64. Given none of the seeders return more than 50 IP addresses no IPv6 addresses would be dropped in favor of IPv4 addresses even if the response isn’t randomized.
@sipa What benefit justifies making this code more complex? The current code doesn’t have any negative effects on seeders that use your implementation.
I see it as a choice between:
Setting nMaxIPs=16
or nMaxIPs=32
which would cause a preference for IPv4 over IPv6 in some circumstances, This might require writing additional code to remove this preference.
and setting nMaxIPs=64
which requires no additional code or added complexity.
@EthanHeilman See my patch here: https://github.com/bitcoin/bitcoin/compare/master...dongcarl:2019-12-rework-seeding
It doesn’t radomize (so less complexity), but it does pick 16 addresses from each address family (eliminating family bias). Once the seeders use domain name compression properly, they should be able to fit 16 IPv6 addresses in a single packet. Let me know if that seems reasonable, I’d be happy to reach out to seed owners.
@dongcarl I like your change of pulling seed IPs by family. I think that is a good improvement and I would support merging it. I was mostly responding to the earlier comments about randomization as a solution however this solves that problem nicely.
It is satisfying that someone could write a DNS seeder that uses DNS compression to fit 16 IPv4 and 16 IPv6 addresses in one DNS packet. I think this should be recommended behavior.
I like your change of pulling seed IPs by family. I think that is a good improvement and I would support merging it. I was mostly responding to the earlier comments about randomization as a solution however this solves that problem nicely.
Great! I’ll push it to this PR :-)
It is satisfying that someone could write a DNS seeder that uses DNS compression to fit 16 IPv4 and 16 IPv6 addresses in one DNS packet. I think this should be recommended behavior.
Just to be clear, I believe this will be two DNS packets, one for A
and another for AAAA
, we unfortunately can’t fit both 16 IPv4 and 16 IPv6 addresses in one single DNS packet.
What’s interesting here is that when I try to observe the outgoing packets using:
0sudo tcpdump -ni <iface> ip src <iface-ip> and udp and port 53
and run this script:
0import socket
1import time
2
3seeds = [ "seed.bitcoin.sipa.be",
4 "dnsseed.bluematt.me",
5 "dnsseed.bitcoin.dashjr.org",
6 "seed.bitcoinstats.com",
7 "seed.bitcoin.jonasschnelli.ch",
8 "seed.btc.petertodd.org",
9 "seed.bitcoin.sprovoost.nl",
10 "dnsseed.emzy.de" ]
11
12for seed in seeds:
13 for family in [(socket.AF_UNSPEC, "UNSPEC"), (socket.AF_INET, "INET"), (socket.AF_INET6, "INET6")]:
14 result = len(socket.getaddrinfo(seed, None, family=family[0], type=socket.SOCK_STREAM, proto=socket.IPPROTO_TCP, flags=socket.AI_ADDRCONFIG))
15 print("Got {:>4} results for:\t{}\t{}".format(result, family[1], seed))
16 input("Press Enter to continue...")
I see that the UNSPEC
getaddrinfo
sends out 4 packets, 2 for A and 2 for AAAA. I think that’s twice the number of packets we expect.
The INET
and INET6
getaddrinfo
only send out 1 each, which is expected.
What’s crazier is that even though the UNSPEC
one sends out twice the expected number of packets and receives twice the expected number… Its number of results is exactly the sum of the number of results for the INET
and INET6
ones…
Maybe I’m missing something obvious, but this seems strange.
Both A and AAAA responses can fit 16 addresses if domain name
compression is properly used.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
1625+ CAddress addr = CAddress(CService(ip, Params().GetDefaultPort()), requiredServiceBits);
1626+ addr.nTime = GetTime() - 3*nOneDay - rng.randrange(4*nOneDay); // use a random age between 3 and 7 days old
1627+ vAdd.push_back(addr);
1628+ found++;
1629+ }
1630+ addrman.Add(vAdd, resolveSource);
addrinfo_family
, vAdd
here would also contain records from the first iteration?
🐙 This pull request conflicts with the target branch and needs rebase.
Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.