Add policy for DNS seeds #4566

pull laanwj wants to merge 1 commit into bitcoin:master from laanwj:2014_07_dnsseed_policy changing 1 file +52 −0
  1. laanwj commented at 8:05 AM on July 21, 2014: member

    @gmaxwell wrote down a few rules to make it clear what is expected of DNS seeds and their operators.

    Rendered document: https://github.com/laanwj/bitcoin/blob/2014_07_dnsseed_policy/doc/dnsseed-policy.md

  2. laanwj commented at 8:12 AM on July 21, 2014: member

    TODO:

    • Add link to bitcoin-seeder as reference implementation
    • Maybe expand on what is meant by 'fair selection' in rule 1
  3. in doc/dnsseed-policy.md:None in 9e40114596 outdated
      10 | +
      11 | +0. A DNSseed operating organization or person is expected
      12 | +to follow good host security practices and maintain control of
      13 | +their serving infrastructure and not sell or transfer control of their
      14 | +infrastructure. Any hosting services contracted by the operator are
      15 | +equally expected to uphold these expectations.
    


    luke-jr commented at 12:44 PM on July 21, 2014:

    It seems unreasonable to forbid usage of datacenters without a commitment from them not to be bought out...


    gmaxwell commented at 4:31 PM on July 21, 2014:

    Yea, sorry, that's not the intent. I mentioned hosting services at all because the first sentence of 0 could have been read as precluding the use of hosting services. Can you suggest a rephrasing that is consistent with your own expectations?


    luke-jr commented at 4:35 PM on July 21, 2014:

    I would think a clause prohibiting sale of the server may be problematic to any organisationally-hosted seeds, even aside from hosting companies. What is the goal of prohibiting such transfers (without knowing, I can't come up with any alternative ideas)?


    gmaxwell commented at 4:42 PM on July 21, 2014:

    Oh I see how you're reading this. In light of that it should say instead:

    their serving infrastructure and not sell or transfer control of their DNSseed.

    The expectations here are largely not directly enforceable by technology (or we wouldn't need to ask for them, they'd just be enforced), so there is a degree of reliance on honest behavior by operators. What I want to address here is the risk that some anonymous party comes up with a way to exploit the position of being a DNS seed for a dishonest end and they offer to buy control of a DNSseed from an existing operator (without mentioning their intended attack). An honest operator should, per these expectations (if not common sense first), refuse such a request.


    luke-jr commented at 4:48 PM on July 21, 2014:

    Ah, so the goal is to stop selling/leasing of the DNS seed by itself, but doing so as part of a larger transfer (company sale) is okay? I can't think of a good way to phrase this better :(


    gmaxwell commented at 5:09 PM on July 21, 2014:

    Yep. Well at least we can make it clear that it's the dnsseed and not the underlying server. :)

  4. in doc/dnsseed-policy.md:None in 9e40114596 outdated
      25 | +3. The results may not be served with a DNS TTL of less than one minute.
      26 | +
      27 | +4. Any logging of DNS queries should be only that which is necessary
      28 | +for the operation of the service or urgent health of the Bitcoin
      29 | +network and must not be retained longer than necessary or disclosed
      30 | +to any third party.
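
    Rule 3 in the hunk above puts a floor of one minute on the TTL of served results. A minimal sketch of how a seed implementation might enforce that floor follows; the names `MIN_TTL_SECONDS` and `effective_ttl` are hypothetical and are not taken from bitcoin-seeder or any other existing seeder.

```python
# Hypothetical helper, not part of any existing seeder implementation.
# Rule 3 of the proposed policy: results may not be served with a TTL
# of less than one minute.
MIN_TTL_SECONDS = 60


def effective_ttl(configured_ttl: int) -> int:
    """Return the TTL to serve, raised to the policy floor if it is too low."""
    return max(configured_ttl, MIN_TTL_SECONDS)
```

    Clamping silently is only one option; an operator could equally refuse to start with a configured TTL below the floor.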
    


    luke-jr commented at 12:51 PM on July 21, 2014:

    How about anonymous statistics?


    gmaxwell commented at 4:35 PM on July 21, 2014:

    It's very easy to mess up anonymous statistics and leak information. Can you suggest a way that would be better defined? E.g. what level of statistics you think would be sensible? I don't see any big issue with raw traffic amounts.


    cdecker commented at 5:24 PM on July 21, 2014:

    How about using a similar level of granularity as the EDNS Client Subnet extensions (http://tools.ietf.org/html/draft-vandergaast-edns-client-subnet-02)? The originating IP address is truncated to the BGP prefix in logs used for statistical analysis. This would prevent any analysis from identifying an individual user, which is the rationale behind point 4 if I'm not mistaken, and would still allow research based on the query logs.

    As it stands right now, point 4 simply is too restrictive.
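
    To make the proposal above concrete, here is a rough sketch of truncating a client address before it is written to a log. It only illustrates the idea under discussion; other comments in this thread argue against logging at this granularity. A real BGP prefix would have to be looked up in a routing table, so the fixed prefix lengths below (`V4_PREFIX`, `V6_PREFIX`) are assumptions, as is the function name `truncate_address`.

```python
import ipaddress

# Assumed fixed prefix lengths standing in for a real BGP prefix lookup.
V4_PREFIX = 24
V6_PREFIX = 48


def truncate_address(addr: str) -> str:
    """Return the enclosing network of addr with all host bits zeroed."""
    ip = ipaddress.ip_address(addr)
    prefix = V4_PREFIX if ip.version == 4 else V6_PREFIX
    # strict=False lets ip_network() accept an address with host bits set
    # and mask them off instead of raising ValueError.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network)
```

    With these assumed lengths, `truncate_address("203.0.113.77")` yields `203.0.113.0/24`. Note that a prefix can still point at a single organization, so truncation alone does not make a log anonymous.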


    gmaxwell commented at 5:31 PM on July 21, 2014:

    The purpose of the DNSseed infrastructure is to facilitate fast introduction to the network for hosts. Can you explain how (4) is at odds with this goal?

    Identifying organizations which are using Bitcoin may still be harmful for those users and a violation of their privacy.

    Bitcoin users are human beings and should not be made subjects of research without informed consent. Use of the reference software is not consent. The privileged position to observe DNSseed queries should not be used for research purposes.


    cdecker commented at 5:37 PM on July 21, 2014:

    While I do agree on the main purpose of the DNSSeed infrastructure, I do also believe that valuable insights can be gained from analyzing the queries, that might ultimately benefit Bitcoin itself.

    I'm proposing the BGP prefix as a lower bound on the granularity of the collected data, which I believe is far more reasonable than an outright ban on any collection and use of this data. Data that by the way is still available to anyone with enough time to build a crawler of the network.

    According to your stance any research on the Bitcoin system would require the consent of all participants. This includes all research into message propagation and into protocol optimizations. Without the ground truth gathered by measurements we cannot improve the system.


    gmaxwell commented at 6:27 PM on July 21, 2014:

    A crawler returns nothing on clients which are either not listening/advertising or on full nodes which are not advertising and are only listening to a subset of the network. The document already makes an effort to separate data produced from crawling without the aid of a privileged position in the network.

    Without reaching any conclusions on the merits of collecting data on users, what you're proposing is equivalent to an explicit phone-home feature, but less transparent and less subject to user choice and privacy controls because it silently piggybacks on existing infrastructure.

    If it were desirable to have phone-home user monitoring it should be done as an explicit feature so that its privacy implications can be carefully considered and so that it could be disabled independently of the rest of the system.

    I had no idea that anyone would have thought it even remotely acceptable to utilize the privileged position of operating a DNS seed to track users, but the purpose of this document is to minimize those sorts of surprises.


    cdecker commented at 6:55 PM on July 21, 2014:

    The address of a node will be forwarded independently of whether it is listening to the network or not, you can collect the IP addresses of all nodes simply by listening for incoming addr messages.

    I am trying to make it possible to make anonymized information available to the public in a controlled fashion in order to level the playing field. It is likely that some people will make use of the information they collect, ignoring the policy because it cannot be enforced by technical means. Without an agreed upon way to make this information available we actually increase the disparity between seed providers that have the privileged access and the rest that do not.

    That being said, my seed will adhere to the policy, should the other seed providers agree to do so. I would however prefer if a fair-use policy were put in place that allows anonymized data to be used for research.


    sipa commented at 7:02 PM on July 21, 2014:

    Just to be clear: this section is about logging of received DNS queries, not about crawling or data gathered through crawling.

    For that purpose, DNS seeds are in a privileged position, as they get information about who is running bitcoin nodes, rather than only about who is running a reachable full node.


    gmaxwell commented at 7:31 PM on July 21, 2014:

    Just a point on the behavior of the software,

    > The address of a node will be forwarded independently of whether it is listening to the network or not, you can collect the IP addresses of all nodes simply by listening for incoming addr messages.

    It will be forwarded if the node announces itself. If the node does not announce, there will be nothing to forward. Setting listen=0, for example, disables announcements... or just being a non-full node client will also result in no announcements.


    cdecker commented at 7:35 PM on July 21, 2014:

    Hm, didn't know that. That is assuming that all nodes behave like Bitcoin Core though; if a node decides to send an addr on behalf of one of its peers then that will be forwarded, right? That would explain some strange behavior I saw a few weeks ago, but that's probably off-topic, sorry :-)


    gmaxwell commented at 9:19 PM on July 21, 2014:

    > on behalf of one of its peers then that will be forwarded, right?

    It will be, but that's broken. It isn't how the protocol works; any implementation doing that would be broken, and it would be mildly harmful to the network. Of course, nodes can do malicious things and the system is generally robust against them.


    laanwj commented at 8:00 AM on July 24, 2014:

    OK - so anonymous statistics (collected from DNS queries) are not allowed either. Do we need to reword anything to make that clear?


    cdecker commented at 9:59 PM on July 24, 2014:

    How about rephrasing the first sentence to "Any logging of DNS queries, or derivatives thereof, must be limited to the scope necessary for the operation of the service." to explicitly include statistics and aggregations?

    I find the part about urgent health of the network confusing, and it might create loopholes.


    sipa commented at 10:06 PM on July 24, 2014:

    I think that aggregation/logging of total queries (nothing broken down by any IP range) may be useful in the longer term to monitor service health.


    cdecker commented at 10:26 PM on July 24, 2014:

    I guess for the operation of the seeds we have a limit on the retention time in place.

    I started asking about statistics and aggregate data because like you I believe there is a use for some statistics, if handled carefully.

    We need to define two granularity levels:

    • granularity for logging for operational purposes
    • granularity for information to be released to the general public for network health monitoring

    What I gathered so far from the discussion is that any logging is ok as long as it is strictly needed for the operation of the seed (still a bit vague for my taste) and we have a limited retention time. For the granularity of data to be released to the public some are pushing for total silence.

    I'm ok with these if everybody agrees, but the current formulation is ambiguous as it concentrates on logs of individual queries, which is why I started asking about aggregated data.


    gmaxwell commented at 11:46 PM on July 24, 2014:

    The intention is to absolutely prohibit using this as a privileged position to monitor users. "Aggregates" have a long history of surprising outcomes, especially because the user's threat model might not be what the aggregator is thinking about.

    If we wanted to enable user-monitoring we would do so with a separate service specifically designed and intended for that so it could be transparent about its operation, so that the risks could be maximally mitigated, and so that users could opt out of it without otherwise degrading the operation of the software.

    "Necessary for the operation" was intended to cover things like measuring traffic levels for capacity planning or investigating high load (e.g. to figure out how to block a DOS attack), or dumping queries that fail for software troubleshooting. Since it seems to be enabling some confusion here I'll think up some other language.


    laanwj commented at 10:24 AM on July 25, 2014:

    Aggregates that are OK: total number of queries, bandwidth up/down, CPU load, I/O load - these don't discriminate clients in any way, and require no logging of possibly identifying (meta)data.

    Aggregates that are not OK: counting requests per country, client OS/program, number of unique IPs - these require acting on request contents or metadata.


    cdecker commented at 11:44 AM on July 25, 2014:

    Ack, sounds reasonable, thanks for clarifying. I would however allow counting queries by type (A, AAAA, SRV, ...) since at least in my case that helped debug quite a few issues with my DNS seed.
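
    As a rough illustration of the split described above, the sketch below keeps only counters that never see a client address: total queries, bytes in and out, and the per-type count suggested in this comment. The class name `SeedStats` and its fields are hypothetical, and whether per-type counting ends up permitted by the policy text is not settled at this point in the thread.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SeedStats:
    """Aggregate counters that do not discriminate between clients."""

    total_queries: int = 0
    bytes_in: int = 0
    bytes_out: int = 0
    # Per-record-type counts (A, AAAA, SRV, ...): the exception proposed above.
    by_qtype: Counter = field(default_factory=Counter)

    def record(self, qtype: str, request_size: int, response_size: int) -> None:
        # Deliberately takes no client address, so nothing per-IP,
        # per-country, or per-client can be derived from these counters.
        self.total_queries += 1
        self.bytes_in += request_size
        self.bytes_out += response_size
        self.by_qtype[qtype] += 1
```

    Keeping the client address out of the function signature is the design point here: the "not OK" aggregates listed earlier all require it.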


    sipa commented at 12:12 PM on July 25, 2014:

    That all sounds reasonable to me too. I'd prefer to see that spelled out explicitly in the document though.

  5. in doc/dnsseed-policy.md:None in 9e40114596 outdated
      14 | +infrastructure. Any hosting services contracted by the operator are
      15 | +equally expected to uphold these expectations.
      16 | +
      17 | +1. The DNSseed results must consist exclusively of fairly selected and
      18 | +functioning Bitcoin nodes from the public network to the best of the
      19 | +operators understanding and capability.
    


    luke-jr commented at 12:53 PM on July 21, 2014:

    "Fairly selected" might be good to define, but maybe not very easy to. Some seed nodes may out of necessity only index IPv4 or IPv6, so discrimination based on IP/address in general can't be prohibited. OTOH, leaving this undefined may be better just to avoid loopholes around common sense...
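
    Since the policy leaves "fairly selected" undefined, the following is only one possible reading: a uniform random draw from the nodes the seed's crawler currently considers working. The function name `select_results`, the default `count`, and the shape of the input are all assumptions made for illustration.

```python
import random


def select_results(working_nodes, count=25, rng=None):
    """Pick up to `count` distinct nodes uniformly at random."""
    # SystemRandom draws from the OS entropy source; a seeded random.Random
    # can be passed in for reproducible tests.
    rng = rng or random.SystemRandom()
    # Remove duplicates while keeping order, so no node is weighted twice.
    pool = list(dict.fromkeys(working_nodes))
    return rng.sample(pool, min(count, len(pool)))
```

    A seed that indexes only IPv4 or only IPv6, as mentioned above, would still fit this reading as long as the draw within its pool is unbiased.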

  6. in doc/dnsseed-policy.md:None in 9e40114596 outdated
       0 | @@ -0,0 +1,49 @@
       1 | +Expectations for DNSSeed operators
       2 | +====================================
       3 | +
       4 | +Bitcoin Core attempts to minimize the level of trust in DNS seeds,
       5 | +but DNS seeds still pose a small amount of risk for the network.
       6 | +Other implementations of Bitcoin software may also use the same
       7 | +seeds and may be more exposed. In light of this exposure this
       8 | +document establishes some basic expectations for the expectations
       9 | +for the operation of dnsseeds.
    


    Diapolo commented at 1:14 PM on July 21, 2014:

    Nit: If you call them a DNSSeed, you should write DNSSeeds here, IMO.


    laanwj commented at 1:32 PM on July 21, 2014:

    Yeah that's my fault. I find 'DNSseed' difficult to read, and changed it in the initial sentence, then saw it was used all over the place and forgot to change it back here.


    sipa commented at 7:03 PM on July 21, 2014:

    I vote for just 'DNS seed'.


    petertodd commented at 8:38 PM on July 22, 2014:

    @sipa +1


    davecgh commented at 10:55 PM on July 22, 2014:

    I also vote for DNS seed.


    laanwj commented at 7:59 AM on July 24, 2014:

    Changed to DNS seed everywhere

  7. luke-jr commented at 8:00 AM on July 24, 2014: member

    ACK

  8. laanwj added the label Docs and Output on Jul 31, 2014
  9. doc: Add new DNSseed policy 0a0878d43a
  10. laanwj merged this on Aug 4, 2014
  11. laanwj closed this on Aug 4, 2014

  12. laanwj referenced this in commit c3029052b7 on Aug 4, 2014
  13. BitcoinPullTester commented at 8:29 AM on August 4, 2014: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4566_0a0878d43a2e7db9c41b20ba1d3eb714fd6806c4/ for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.

  14. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:15 UTC
