net: Limit # of IPs learned from a DNS seed by family #17602

pull dongcarl wants to merge 1 commits into bitcoin:master from dongcarl:2019-11-lower-nMaxIPs changing 3 files +20 −15
  1. dongcarl commented at 8:56 pm on November 25, 2019: member

    See: #16070

    Let’s continue the conversation here though.

    Instead of limiting by number of total IPs, we now limit by family, which seems more sane, and doesn’t bias against IPv6.

  2. fanquake added the label P2P on Nov 25, 2019
  3. practicalswift commented at 9:26 pm on November 25, 2019: contributor

    Concept ACK

    The quoted discussion mentions that a maximum 31 IP:s fit in a single DNS response packet over UDP. Would it make sense to go with a nMaxIPs of 31 instead of the 32?

  4. dongcarl commented at 5:37 pm on November 26, 2019: member

    The quoted discussion mentions that a maximum 31 IP:s fit in a single DNS response packet over UDP. Would it make sense to go with a nMaxIPs of 31 instead of the 32?

    I actually don’t remember how I calculated that or if I even calculated it correctly… Someone should verify… :eyes: @EthanHeilman

    Update: Ah, so we have 512 bytes to spend, and the DNS response should look something like: Header = 96 bits Query = 32 bits (assuming “.” as question) Answer = ( 96 bits + 32 bits ) * numIPv4 + ( 96 bits + 128 bits ) * numIPv6

    Note that this assumes we use DNS packet compression in the Answer section, no Authority or Additional sections, and “.” as the question (empty QNAME).

    So in this case we can get 31 IPv4 addresses at the most, or 17 IPv6 addresses (with some room to spare).

    This is just a rough calculation, we have yet to decide whether it is even a good reference point or not.

    Reference: https://www2.cs.duke.edu/courses/fall16/compsci356/DNS/DNS-primer.pdf

  5. EthanHeilman commented at 8:27 pm on November 26, 2019: contributor

    If I’m reading the code correctly even if you set nMaxIPs=1 that will not stop a DNS seeder or misconfigured DNS resolver from sending you far more than 1 IP address over UDP or TCP. Instead Bitcoin will receive all of those IPs and then select 1 of them and throw away the rest. So sadly this change won’t fix DNS over TCP grossness.

    https://github.com/bitcoin/bitcoin/blob/73b26e38d7a174d5409dda8aa1fe9804e7779a1f/src/netbase.cpp#L194-L209

    It seems reasonable to reduce this number to 64, 32, or 16 but beyond that point I start to worry that there is a chance a node will get all offline IP addresses.

    Here is my back of the envelope calculation. Assume that U is the fraction of IP addresses returned by seeders that are offline and numSeeders is the number of DNSseeders. The probability a node gets all unreachable IP addresses is:

    U^(numSeeders*nMaxIPs)

    U=0.9, numSeeders=8 and nMaxIPs = 8 then the probability is 0.001.

    0.00117=0.9^(8*8)

    That is we will see a failure to connect on average once each thousand times a node calls the seeders. Assuming seeders go down from time to time or incoming connection exhaustion attacks we probably want a number greater than nMaxIPs=8.

    The blog post I wrote that counted IP addresses in a DNS query focuses on TCP based responses. How many IP addresses can a DNS query return?

  6. practicalswift commented at 9:52 pm on November 26, 2019: contributor

    Instead Bitcoin will receive all of those IPs and then select 1 of them and throw away the rest. So sadly this change won’t fix DNS over TCP grossness.

    Yes, that’s a good point. Perhaps it should be recommended that Bitcoin DNS seeder implementations limit the number of returned addresses to keep the response in one UDP packet and thus avoiding fragile DNS-over-TCP (fragile due middleboxes filtering TCP/53).

  7. EthanHeilman commented at 5:18 pm on November 27, 2019: contributor
    @practicalswift Are there any BIPs around how seeders should behave? I don’t think we should specify implementation details, but having a rule saying something like don’t return 3000 IPs would be helpful.
  8. laanwj commented at 12:06 pm on November 28, 2019: member

    Are there any BIPs around how seeders should behave?

    There’s doc/dnsseed-policy.md which all DNS seed operators need to agree with (this is particular to this project, not bitcoin in general).

  9. dongcarl commented at 3:59 pm on December 2, 2019: member
    Some discussion from the other thread: #16070 (comment)
  10. dongcarl commented at 6:35 pm on December 2, 2019: member

    @EthanHeilman yeah I think we can definitely do 32, one thing that I realized after reading TheBlueMatt’s response (in the other thread :roll_eyes:) was that: because we set ai_family = AF_UNSPEC, we actually get back both the A and AAAA records, and the total would be more than 32.

    Something that I want to discuss is: after we get the names from getaddrinfo, should we randomize the returned linked list first before truncating to 32 entries? Mostly because getaddrinfo is usually implemented in such a way that the IPv4 addresses come before the IPv6 addresses. As it currently stands, IPv6 addresses are already disadvantaged by the fact that less of them fit within a 512byte payload, with this change, we’d be truncating mostly IPv6 addresses…


    It also seems kind of weird that we’re basically disadvantaging the IPv6 network by nature of DNS’s 512 byte limit (which remains true even with randomization)… I’m not sure what’s best here, but perhaps a limit per-family is what we want?

  11. sipa commented at 7:29 pm on December 2, 2019: member
    FWIW, https://github.com/sipa/bitcoin-seeder responds with as many addresses as fit in a UDP packet. DNS supports a weird compression scheme that lets record names point to earlier records in the same packet, which IIRC bitcoin-seeder does not use. As a result, the number of addresses that fit actually depends on the length of the domain name. For seed.bitcoin.sipa.be it’s 25 IPv4 addresses or 15 IPv6 addresses.
  12. dongcarl commented at 10:29 pm on December 2, 2019: member
    Here’s a branch that limits by family, which I think is more sane: https://github.com/bitcoin/bitcoin/compare/master...dongcarl:2019-12-rework-seeding, note that if the seeders use domain name compression properly, they should be able to fit 16 IPv6 addresses in a single packet.
  13. EthanHeilman commented at 3:28 pm on December 3, 2019: contributor

    @dongcarl

    It also seems kind of weird that we’re basically disadvantaging the IPv6 network by nature of DNS’s 512 byte limit (which remains true even with randomization)… I’m not sure what’s best here, but perhaps a limit per-family is what we want?

    I’m not sure I understand, but I maybe I missed something what is limiting DNS to 512 bytes? Neither Bitcoin’s DNS local resolver nor the DNS system have 512 Byte limit. I suppose @sipa’s DNSseeder limits responses to 512 Bytes but other seeders don’t always.

    Something that I want to discuss is: after we get the names from getaddrinfo, should we randomize the returned linked list first before truncating to 32 entries? Mostly because getaddrinfo is usually implemented in such a way that the IPv4 addresses come before the IPv6 addresses.

    I’m worried that we now adding complexity to the DNS seeding process and I don’t see any benefit that justifies this added complex. Is middleware boxes filtering TCP/53 a serious ongoing problem impacting Bitcoin?

    On the other hand it seems simple to reduce the number of IP addresses accepted from 256 to 64. Given none of the seeders return more than 50 IP addresses no IPv6 addresses would be dropped in favor of IPv4 addresses even if the response isn’t randomized.

  14. sipa commented at 9:20 pm on December 3, 2019: member
    @EthanHeilman Traditionally, DNS UDP responses are limited to 512 bytes. The EDNS0 extension increases that limit, but isn’t implemented by my seeder.
  15. EthanHeilman commented at 11:08 pm on December 3, 2019: contributor

    @sipa What benefit justifies making this code more complex? The current code doesn’t have any negative effects on seeders that use your implementation.

    I see it as a choice between:

    1. Setting nMaxIPs=16 or nMaxIPs=32 which would cause a preference for IPv4 over IPv6 in some circumstances, This might require writing additional code to remove this preference.

    2. and setting nMaxIPs=64 which requires no additional code or added complexity.

  16. sipa commented at 0:35 am on December 4, 2019: member
    I have no opinion on the PR here, just giving potentially relevant information about my seeder.
  17. dongcarl commented at 4:50 pm on December 4, 2019: member

    @EthanHeilman See my patch here: https://github.com/bitcoin/bitcoin/compare/master...dongcarl:2019-12-rework-seeding

    It doesn’t radomize (so less complexity), but it does pick 16 addresses from each address family (eliminating family bias). Once the seeders use domain name compression properly, they should be able to fit 16 IPv6 addresses in a single packet. Let me know if that seems reasonable, I’d be happy to reach out to seed owners.

  18. EthanHeilman commented at 5:39 pm on December 4, 2019: contributor

    @dongcarl I like your change of pulling seed IPs by family. I think that is a good improvement and I would support merging it. I was mostly responding to the earlier comments about randomization as a solution however this solves that problem nicely.

    It is satisfying that someone could write a DNS seeder that uses DNS compression to fit 16 IPv4 and 16 IPv6 addresses in one DNS packet. I think this should be recommended behavior.

  19. dongcarl commented at 6:27 pm on December 4, 2019: member

    I like your change of pulling seed IPs by family. I think that is a good improvement and I would support merging it. I was mostly responding to the earlier comments about randomization as a solution however this solves that problem nicely.

    Great! I’ll push it to this PR :-)

    It is satisfying that someone could write a DNS seeder that uses DNS compression to fit 16 IPv4 and 16 IPv6 addresses in one DNS packet. I think this should be recommended behavior.

    Just to be clear, I believe this will be two DNS packets, one for A and another for AAAA, we unfortunately can’t fit both 16 IPv4 and 16 IPv6 addresses in one single DNS packet.


    What’s interesting here is that when I try to observe the outgoing packets using:

    0sudo tcpdump -ni <iface> ip src <iface-ip> and udp and port 53
    

    and run this script:

     0import socket
     1import time
     2
     3seeds = [ "seed.bitcoin.sipa.be",
     4          "dnsseed.bluematt.me",
     5          "dnsseed.bitcoin.dashjr.org",
     6          "seed.bitcoinstats.com",
     7          "seed.bitcoin.jonasschnelli.ch",
     8          "seed.btc.petertodd.org",
     9          "seed.bitcoin.sprovoost.nl",
    10          "dnsseed.emzy.de" ]
    11
    12for seed in seeds:
    13    for family in [(socket.AF_UNSPEC, "UNSPEC"), (socket.AF_INET, "INET"), (socket.AF_INET6, "INET6")]:
    14        result = len(socket.getaddrinfo(seed, None, family=family[0], type=socket.SOCK_STREAM, proto=socket.IPPROTO_TCP, flags=socket.AI_ADDRCONFIG))
    15        print("Got {:>4} results for:\t{}\t{}".format(result, family[1], seed))
    16        input("Press Enter to continue...")
    

    I see that the UNSPEC getaddrinfo sends out 4 packets, 2 for A and 2 for AAAA. I think that’s twice the number of packets we expect.

    The INET and INET6 getaddrinfo only send out 1 each, which is expected.

    What’s crazier is that even though the UNSPEC one sends out twice the expected number of packets and receives twice the expected number… Its number of results is exactly the sum of the number of results for the INET and INET6 ones…

    Maybe I’m missing something obvious, but this seems strange.

  20. dongcarl force-pushed on Dec 4, 2019
  21. dongcarl force-pushed on Dec 4, 2019
  22. dongcarl renamed this:
    net: Lower # of IPs learned from a DNS seed
    net: Limit # of IPs learned from a DNS seed by family
    on Dec 4, 2019
  23. net: Limit # of IPs learned from a DNS seed by family
    Both A and AAAA responses can fit 16 addresses if domain name
    compression is properly used.
    48a45119d8
  24. dongcarl force-pushed on Dec 4, 2019
  25. DrahtBot commented at 3:51 pm on December 16, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #17754 (net: Don’t allow resolving of std::string with embedded NUL characters. Add tests. by practicalswift)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  26. fanquake requested review from naumenkogs on Jun 6, 2020
  27. fanquake requested review from ajtowns on Jun 6, 2020
  28. fanquake requested review from amitiuttarwar on Jun 6, 2020
  29. in src/net.cpp:1622 in 48a45119d8
    1625+                        CAddress addr = CAddress(CService(ip, Params().GetDefaultPort()), requiredServiceBits);
    1626+                        addr.nTime = GetTime() - 3*nOneDay - rng.randrange(4*nOneDay); // use a random age between 3 and 7 days old
    1627+                        vAdd.push_back(addr);
    1628+                        found++;
    1629+                    }
    1630+                    addrman.Add(vAdd, resolveSource);
    


    naumenkogs commented at 2:15 pm on June 9, 2020:
    In the second iteration of addrinfo_family, vAdd here would also contain records from the first iteration?
  30. naumenkogs commented at 2:50 pm on June 9, 2020: member
    I tried to go through the discussion, but I might be missing some of the points. Is the goal here is to avoid getting all IPv6 records, because they all might be Sybils (since IPv6 are cheaper)?
  31. DrahtBot commented at 11:19 pm on December 13, 2021: member

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  32. DrahtBot added the label Needs rebase on Dec 13, 2021
  33. dongcarl commented at 5:37 pm on March 1, 2022: member
    Not planning on working on this for a while. Closing
  34. dongcarl closed this on Mar 1, 2022

  35. laanwj added the label Up for grabs on Mar 1, 2022
  36. DrahtBot locked this on Mar 1, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me