net: Add Tor v3 hardcoded seeds #21560

pull laanwj wants to merge 4 commits into bitcoin:master from laanwj:2021-03-torv3-hardcoded-seeds changing 7 files +1306 −1240
  1. laanwj commented at 11:50 am on March 31, 2021: member

    Closes #20239 and mitigates my node’s problem in #21351.

    • Add a few hardcoded seeds for TorV3

      • As the bitcoin-seeder doesn’t collect TorV3 addresses yet, I have extracted these from my own node using a script and added them manually. This is intended to be a temporary stopgap until 22.0’s seeds update.
    • Change hardcoded seeds to variable length BIP155 binary format.

      • It is stored as a single serialized blob in a byte array, instead of pseudo-IPv6 address slots. This is more flexible and, assuming most of the list is IPv4, more compact.
      • Only the (networkID, addr, port) subset (CService) is stored. Services and time are constructed on the fly as before.
    • Change input format for nodes_*.txt.

      • Drop legacy 0xAABBCCDD format for IPv4. It is never generated by makeseeds.py.
      • Stop interpreting lack of port as default port; interpret it as ’no port’ to accommodate I2P and other port-less protocols (not handled in this PR). An explicit port is always generated by makeseeds.py, so in practice this makes no difference right now.

    A follow-up to this PR could do the same for I2P.
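
A minimal sketch of the blob layout described above, assuming each entry is a one-byte network ID, a compact-size address length, the raw address bytes, and a two-byte big-endian port. The names `Net`, `serialize_entry`, and `build_blob` are hypothetical and not taken from the PR; the network ID values are the ones assigned by BIP155.

```python
import ipaddress
import struct
from enum import Enum

class Net(Enum):
    # Network IDs as assigned by BIP155.
    IPV4 = 1
    IPV6 = 2
    TORV2 = 3
    TORV3 = 4
    I2P = 5
    CJDNS = 6

def serialize_entry(net, addr, port):
    """Serialize one (networkID, addr, port) tuple (hypothetical helper)."""
    # The address lengths of the networks above are 4 to 32 bytes, so the
    # compact-size length prefix is a single byte in this sketch.
    assert len(addr) < 253
    return (struct.pack('B', net.value)
            + struct.pack('B', len(addr))
            + addr
            + struct.pack('>H', port))

def build_blob(entries):
    """Concatenate all entries into one variable-length byte blob."""
    return b''.join(serialize_entry(*e) for e in entries)

entries = [
    (Net.IPV4, ipaddress.IPv4Address('1.2.3.4').packed, 8333),
    (Net.TORV3, bytes(32), 8333),  # placeholder key, not a real seed
]
blob = build_blob(entries)
```

In this sketch an IPv4 entry takes 8 bytes and a TorV3 entry 36 bytes, which illustrates why a list dominated by IPv4 ends up smaller than one built from fixed-size IPv6 slots.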

  2. laanwj added the label P2P on Mar 31, 2021
  3. laanwj added this to the milestone 22.0 on Mar 31, 2021
  4. jonatack commented at 11:58 am on March 31, 2021: member
    Concept ACK, will review
  5. laanwj force-pushed on Mar 31, 2021
  6. in contrib/seeds/nodes_main.txt:1166 in 1007a73402 outdated
    1161@@ -1162,3 +1162,19 @@ zuytrfevzjcpizli.onion:8333
    1162 zvq6dpt3i2ofdp3g.onion:8333
    1163 zwwm6ga7u2hqe2sd.onion:8333
    1164 zyqb4lenfspntj5m.onion:8333
    1165+
    1166+# manually added 2021-03 for minimal torv3 bootstrap support
    


    vasild commented at 1:19 pm on March 31, 2021:

    I ran https://gist.github.com/laanwj/b3d7b01ef61ce07c2eff0a72a6b90183 on my node:

    • Extra addresses I have that are not on this list:
    2g5qfdkn2vvcbqhzcyvyiitg4ceukybxklraxjnu7atlhd22gdwywaid.onion:8333
    2jmtxvyup3ijr7u6uvu7ijtnojx4g5wodvaedivbv74w4vzntxbrhvad.onion:8333
    37m62wn7dz3uqpathpc4qfmgrbupachj52nt3jbtbjugpbu54kbud7yd.onion:8333
    7cgwjuwi5ehvcay4tazy7ya6463bndjk6xzrttw5t3xbpq4p22q6fyid.onion:8333
    fjdyxicpm4o42xmedlwl3uvk5gmqdfs5j37wir52327vncjzvtpfv7yd.onion:8333
    fzhn4uoxfbfss7h7d6ffbn266ca432ekbbzvqtsdd55ylgxn4jucm5qd.onion:8333
    ifdu5qvbofrt4ekui2iyb3kbcyzcsglazhx2hn4wfskkrx2v24qxriid.onion:8333
    m7cbpjolo662uel7rpaid46as2otcj44vvwg3gccodnvaeuwbm3anbyd.onion:8333
    owjsdxmzla6d7lrwkbmetywqym5cyswpihciesfl5qdv2vrmwsgy4uqd.onion:8333
    vi5bnbxkleeqi6hfccjochnn65lcxlfqs4uwgmhudph554zibiusqnad.onion:8333
    
    • Addresses that are on this list, but not on my node:
    7pyrpvqdhmayxggpcyqn5l3m5vqkw3qubnmgwlpya2mdo6x7pih7r7id.onion:8333
    b64xcbleqmwgq2u46bh4hegnlrzzvxntyzbmucn3zt7cssm7y4ubv3id.onion:8333
    fpz6r5ppsakkwypjcglz6gcnwt7ytfhxskkfhzu62tnylcknh3eq6pad.onion:8333
    gxo5anvfnffnftfy5frkgvplq3rpga2ie3tcblo2vl754fvnhgorn5yd.onion:8333
    itz3oxsihs62muvknc237xabl5f6w6rfznfhbpayrslv2j2ubels47yd.onion:8333
    lrjh6fywjqttmlifuemq3puhvmshxzzyhoqx7uoufali57eypuenzzid.onion:8333
    opnyfyeiibe5qo5a3wbxzbb4xdiagc32bbce46owmertdknta5mi7uyd.onion:8333
    q7kgmd7n7h27ds4fg7wocgniuqb3oe2zxp4nfe4skd5da6wyipibqzqd.onion:8333
    sys54sv4xv3hn3sdiv3oadmzqpgyhd4u4xphv4xqk64ckvaxzm57a7yd.onion:8333
    tddeij4qigtjr6jfnrmq6btnirmq5msgwcsdpcdjr7atftm7cxlqztid.onion:8333
    xqt25cobm5zqucac3634zfght72he6u3eagfyej5ellbhcdgos7t2had.onion:8333
    

    I can connect to all of them except opnyfyeiibe5qo5a3wbxzbb4xdiagc32bbce46owmertdknta5mi7uyd.onion:8333.

    • Addresses on both this list and my node:
    5g72ppm3krkorsfopcm2bi7wlv4ohhs4u4mlseymasn7g7zhdcyjpfid.onion:8333
    ejxefzf5fpst4mg2rib7grksvscl7p6fvjp6agzgfc2yglxnjtxc3aid.onion:8333
    rp7k2go3s5lyj3fnj6zn62ktarlrsft2ohlsxkyd7v3e3idqyptvread.onion:8333
    

    It makes things a bit easier if the list is sorted.


    laanwj commented at 3:06 pm on March 31, 2021:
    Thank you. I pushed a squashme commit to add your nodes and sort the list. I have also adapted the script to print a sorted list.
  7. in contrib/seeds/generate-seeds.py:46 in 213a740fe4 outdated
    55+    CJDNS = 6
    56 
    57-def name_to_ipv6(addr):
    58+def name_to_bip155(addr):
    59+    '''Convert address string to BIP155 (networkID, addr) tuple.'''
    60     if len(addr)>6 and addr.endswith('.onion'):
    


    vasild commented at 5:04 pm on March 31, 2021:
    nit: I guess the len(addr)>6 check can be removed: Python’s endswith() returns False if the string is shorter than the suffix.

    laanwj commented at 5:36 pm on March 31, 2021:
    Yes, that could be removed (done in squashme pr).
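
A small illustration of the point about endswith(), written as a standalone check rather than the script's actual code. The helper name `is_onion` is hypothetical.

```python
def is_onion(addr):
    # str.endswith() returns False when the string is shorter than the
    # suffix, so short inputs need no separate length guard.
    return addr.endswith('.onion')

short_result = is_onion('abc')        # shorter than the suffix
empty_result = is_onion('')           # empty string
bare_result = is_onion('.onion')      # exactly the suffix, nothing before it
normal_result = is_onion('x.onion')
```

One behavioral difference worth keeping in mind: without the length check, the bare string '.onion' now matches, so a later step still has to reject an empty base32 part.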
  8. in contrib/seeds/generate-seeds.py:55 in 213a740fe4 outdated
    66+            assert(vchAddr[34] == 3)
    67+            return (BIP155Network.TORV3, vchAddr[:32])
    68+        else:
    69             raise ValueError('Invalid onion %s' % vchAddr)
    70-        return pchOnionCat + vchAddr
    71     elif '.' in addr: # IPv4
    


    vasild commented at 5:07 pm on March 31, 2021:
    This will break with foo.b32.i2p. I guess out of the scope of this PR.

    laanwj commented at 5:35 pm on March 31, 2021:
    Yes, adding I2P hardcoded seeds can be done in a follow-up PR.
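
For context, a hedged sketch of how a v3 .onion hostname maps to the 32 bytes stored for TORV3: the base32 part decodes to 35 bytes (32-byte public key, 2-byte checksum, 1-byte version 3). The checksum verification here goes beyond what the diff above shows (which only asserts the version byte), and the helper names `torv3_checksum`, `encode_torv3`, and `decode_torv3` are hypothetical.

```python
import hashlib
from base64 import b32decode, b32encode

CHECKSUM_PREFIX = b'.onion checksum'
VERSION = b'\x03'

def torv3_checksum(pubkey):
    # First two bytes of SHA3-256(".onion checksum" || pubkey || version).
    return hashlib.sha3_256(CHECKSUM_PREFIX + pubkey + VERSION).digest()[:2]

def encode_torv3(pubkey):
    """Build a v3 hostname from a 32-byte key (helper for round-trip checks)."""
    assert len(pubkey) == 32
    raw = pubkey + torv3_checksum(pubkey) + VERSION
    return b32encode(raw).decode('ascii').lower() + '.onion'

def decode_torv3(host):
    """Return the 32-byte public key of a v3 .onion hostname."""
    if not host.endswith('.onion'):
        raise ValueError('Not an onion address: %s' % host)
    raw = b32decode(host[:-len('.onion')], casefold=True)
    if len(raw) != 35:
        raise ValueError('Not a v3 onion address: %s' % host)
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    if version != VERSION:
        raise ValueError('Unexpected onion version in %s' % host)
    if checksum != torv3_checksum(pubkey):
        raise ValueError('Bad onion checksum in %s' % host)
    return pubkey

example_key = bytes(range(32))          # placeholder key, not a real node
example_host = encode_torv3(example_key)
```

Because the version byte is the last of the 35 bytes, every v3 hostname ends in the letter d before the .onion suffix, which matches the addresses listed in this thread.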
  9. in contrib/seeds/generate-seeds.py:98 in 213a740fe4 outdated
     99+    return host + (port, )
    100 
    101-def process_nodes(g, f, structname, defaultport):
    102-    g.write('static SeedSpec6 %s[] = {\n' % structname)
    103-    first = True
    104+def ser_compact_size(l):
    


    vasild commented at 5:15 pm on March 31, 2021:
    Can this reuse ser_compact_size() from test/functional/test_framework/messages.py? Or move it to another place so that both can use it?

    laanwj commented at 5:32 pm on March 31, 2021:
    I would like to keep this script self-contained. It is simple enough logic.
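
For readers unfamiliar with the encoding, a self-contained sketch of Bitcoin's compact-size integer format follows. `ser_compact_size` mirrors the function name in the diff; `deser_compact_size` is a hypothetical counterpart added only to show the round trip.

```python
import struct

def ser_compact_size(n):
    """Encode a non-negative integer as a compact size."""
    if n < 253:
        return struct.pack('B', n)
    elif n < 0x10000:
        return struct.pack('<BH', 253, n)
    elif n < 0x100000000:
        return struct.pack('<BI', 254, n)
    else:
        return struct.pack('<BQ', 255, n)

def deser_compact_size(data, pos=0):
    """Decode a compact size at data[pos]; return (value, next_pos)."""
    first = data[pos]
    if first < 253:
        return first, pos + 1
    fmt, size = {253: ('<H', 2), 254: ('<I', 4), 255: ('<Q', 8)}[first]
    (value,) = struct.unpack_from(fmt, data, pos + 1)
    return value, pos + 1 + size
```

The address lengths in the seed list are all small, so in practice only the single-byte branch is exercised there.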
  10. in contrib/seeds/generate-seeds.py:110 in 213a740fe4 outdated
    111+        r = struct.pack("<BI", 254, l)
    112+    else:
    113+        r = struct.pack("<BQ", 255, l)
    114+    return r
    115+
    116+def bip155_serialize(spec):
    


    vasild commented at 5:20 pm on March 31, 2021:
    note: very similar to CAddress::serialize_v2() from test/functional/test_framework/messages.py.

    laanwj commented at 5:33 pm on March 31, 2021:
    Same comment as above. Jumping through hoops to share code between this script and the tests is not worth it for such trivial functionality imo.
  11. in contrib/seeds/generate-seeds.py:18 in 213a740fe4 outdated
    17     <ip>:<port>
    18-    [<ipv6>]
    19     [<ipv6>]:<port>
    20-    <onion>.onion
    21-    0xDDBBCCAA (IPv4 little-endian old pnSeeds format)
    22+    <onion>.onion:<port>
    


    vasild commented at 5:26 pm on March 31, 2021:
    After these changes contrib/seeds/nodes_test.txt needs to have :8333 appended to its addresses.

    laanwj commented at 5:34 pm on March 31, 2021:
    Good catch! s/8333/18333/ofc
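
A rough sketch of splitting the accepted input forms into host and port, under the rule that a missing port means "no port" rather than a default. The function name `split_host_port` and the use of None for "no port" are assumptions of this sketch, not necessarily what the script does.

```python
import re

def split_host_port(spec):
    """Split a nodes_*.txt entry into (host, port); port is None if absent."""
    match = re.match(r'\[([0-9a-fA-F:]+)\](?::([0-9]+))?$', spec)
    if match:
        # Bracketed IPv6, with an optional :port after the bracket.
        host, port = match.group(1), match.group(2)
    elif spec.count(':') > 1:
        # Bare IPv6 address without brackets, so no port can follow.
        host, port = spec, None
    else:
        # IPv4 or hostname such as <onion>.onion, optionally with :port.
        host, _, port = spec.partition(':')
    return host, (int(port) if port else None)
```

With this rule, a testnet entry written without a port no longer silently becomes port 18333, which is why the testnet list needs explicit ports.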
  12. vasild commented at 5:27 pm on March 31, 2021: member
    (reviewed up to f3e26a167)
  13. DrahtBot commented at 6:07 pm on March 31, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  14. in src/net.cpp:154 in 1007a73402 outdated
    156     FastRandomContext rng;
    157-    for (const auto& seed_in : vSeedsIn) {
    158-        struct in6_addr ip;
    159-        memcpy(&ip, seed_in.addr, sizeof(ip));
    160-        CAddress addr(CService(ip, seed_in.port), GetDesirableServiceFlags(NODE_NONE));
    161+    CDataStream s(vSeedsIn, SER_NETWORK, PROTOCOL_VERSION | ADDRV2_FORMAT);
    


    sipa commented at 10:42 pm on March 31, 2021:
    With a VectorReader here you’d avoid copying the encoded bytes.

    laanwj commented at 7:12 am on April 1, 2021:
    Yeah, a lot of optimizations could be made here: it’s a byte array in rodata, so we could return it as span<uint8_t> and deserialize directly from there, making it zero-copy. That said, this happens once in the lifetime of a node (ideally) and it’s not that much data, so I’m not sure it is worth it. But if it’s just a matter of swapping the type used, I’m in :smile:

    laanwj commented at 7:29 am on April 1, 2021:

    I’m not sure how to use it. At least the obvious change gives a lot of compiler errors:

    diff --git a/src/net.cpp b/src/net.cpp
    index 52e35b56077bf6562202461f73a93e03097c91f9..507d61df0cd541e0696ead1c22cc582380bb1521 100644
    --- a/src/net.cpp
    +++ b/src/net.cpp
    @@ -151,8 +151,8 @@ static std::vector<CAddress> convertSeeds(const std::vector<uint8_t> &vSeedsIn)
         const int64_t nOneWeek = 7*24*60*60;
         std::vector<CAddress> vSeedsOut;
         FastRandomContext rng;
    -    CDataStream s(vSeedsIn, SER_NETWORK, PROTOCOL_VERSION | ADDRV2_FORMAT);
    -    while (!s.eof()) {
    +    VectorReader s(SER_NETWORK, PROTOCOL_VERSION | ADDRV2_FORMAT, vSeedsIn, 0);
    +    while (!s.empty()) {
             CService endpoint;
             s >> endpoint;
             CAddress addr(endpoint, GetDesirableServiceFlags(NODE_NONE));
    
    /…/bitcoin/src/netaddress.h:430:15: error: invalid operands to binary expression ('VectorReader' and 'Wrapper<CompactSizeFormatter<true>, unsigned long &>')
                s >> COMPACTSIZE(address_size);
                ~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~
    

    sipa commented at 7:37 am on April 1, 2021:
    Ok, never mind then. We can figure out how to make this work in a follow-up (there are probably several other cases in the codebase where this is done).
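
As a loose Python analogy to the zero-copy idea discussed here (not the C++ VectorReader API), the sketch below walks a serialized seed blob through a memoryview so that address bytes are sliced rather than copied. It assumes each entry is a one-byte network ID, a single-byte length, the address, and a big-endian two-byte port; `iter_entries` is a hypothetical name.

```python
import struct

def iter_entries(blob):
    """Yield (network_id, addr_view, port) without copying address bytes."""
    view = memoryview(blob)
    pos = 0
    while pos < len(view):  # plays the role of `while (!s.eof())`
        net_id = view[pos]
        addr_len = view[pos + 1]
        # Lengths of 253 and above would need a multi-byte compact size,
        # which this sketch does not handle.
        assert addr_len < 253
        addr = view[pos + 2:pos + 2 + addr_len]  # a slice, not a copy
        (port,) = struct.unpack_from('>H', view, pos + 2 + addr_len)
        yield net_id, addr, port
        pos += 2 + addr_len + 2

# Two made-up entries: an IPv4 address and a 32-byte placeholder key.
sample_blob = (bytes([1, 4, 1, 2, 3, 4]) + struct.pack('>H', 8333)
               + bytes([4, 32]) + bytes(32) + struct.pack('>H', 8333))
parsed = list(iter_entries(sample_blob))
```

As noted in the discussion, the data is small and parsed once, so whether avoiding the copy matters is a judgment call.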
  15. sipa commented at 10:52 pm on March 31, 2021: member
    Concept/approach/code review ACK. I’m ok with squashing.
  16. laanwj commented at 7:14 am on April 1, 2021: member

    I’m ok with squashing.

    Right, I’ll squash the squashme commits into the appropriate top-level commits once it seems the (first round of) review is over. I think this is somewhat friendlier on reviewers in progress than squashing all the time.

  17. in src/net.cpp:160 in ede28e76c6 outdated
    162+    while (!s.eof()) {
    163+        CService endpoint;
    164+        s >> endpoint;
    165+        CAddress addr(endpoint, GetDesirableServiceFlags(NODE_NONE));
    166         addr.nTime = GetTime() - rng.randrange(nOneWeek) - nOneWeek;
    167+        LogPrint(BCLog::NET, "Added hardcoded seed: %s\n", addr.ToString());
    


    vasild commented at 11:01 am on April 1, 2021:
    This printed 1188 lines to the log. I think it is ok.

    laanwj commented at 11:36 am on April 1, 2021:
    AHHH good catch, this is just to debug / test, it’s convenient to be able to check if it matches the generated data. I’m not sure there is any point to it for end users :smile: But if you think it’s ok we can leave it in.

    vasild commented at 1:42 pm on April 1, 2021:
    There are also a lot of IP ... mapped to AS0 belongs to new bucket ... messages printed by -debug=net. ~0 on this new printout.
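
The nTime line in the diff above gives each hardcoded seed a random timestamp between one and two weeks in the past. A minimal Python rendering of that arithmetic, with `seed_timestamp` as a hypothetical name:

```python
import random
import time

ONE_WEEK = 7 * 24 * 60 * 60  # seconds

def seed_timestamp(now=None, rng=random):
    """Return a timestamp between one and two weeks before `now`."""
    if now is None:
        now = int(time.time())
    # randrange(ONE_WEEK) is in [0, ONE_WEEK), so the result lies in
    # (now - 2 * ONE_WEEK, now - ONE_WEEK].
    return now - rng.randrange(ONE_WEEK) - ONE_WEEK
```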
  18. in contrib/seeds/generate-seeds.py:115 in ede28e76c6 outdated
    116+def bip155_serialize(spec):
    117+    '''
    118+    Serialize (networkID, addr, port) tuple to BIP155 binary format.
    119+    '''
    120+    r = b""
    121+    r += struct.pack('>B', spec[0].value)
    


    vasild commented at 11:11 am on April 1, 2021:

    nit: the corresponding line in the test framework uses B instead of >B:

    https://github.com/bitcoin/bitcoin/blob/80a699fda9ff1129546cabbf17e955680a1cc705/test/functional/test_framework/messages.py#L273

    since this is duplicated code, I think it is good to keep it “the same” and since byte order is irrelevant for one-byte fields, maybe remove the > from here.


    laanwj commented at 11:37 am on April 1, 2021:

    Sure, will remove it. Myself I tend to use “>” symbolically in any kind of big-endian structure just so I don’t forget it when adding fields later (but there is not really any risk of that here).

    Edit: done in a9ee5683ce4013904f6e53f29e37632969926b0d
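
A quick check of the byte-order point, using only the standard struct module: the prefix makes no difference for a one-byte field, but it does for the two-byte port.

```python
import struct

# One-byte field: byte order is irrelevant, both spellings give one byte.
net_id_plain = struct.pack('B', 4)
net_id_big = struct.pack('>B', 4)

# Two-byte field: byte order matters.
port_be = struct.pack('>H', 8333)  # big-endian
port_le = struct.pack('<H', 8333)  # little-endian, shown for contrast
```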

  19. vasild commented at 11:12 am on April 1, 2021: member

    ede28e76c6ddf5da5518d834c3c3a18ce69cfb6f looks good, modulo the squashing.

    I verified that contrib/seeds/generate-seeds.py generates the same src/chainparamsseeds.h as included in this PR.

    Also, the added seeds (as reported in the newly added log printout) are the same as the ones in contrib/seeds/nodes_main.txt (serialize from py and deserialize in cpp work as expected).

  20. vasild commented at 1:38 pm on April 1, 2021: member
    a9ee5683ce4013904f6e53f29e37632969926b0d looks good, modulo the squashing.
  21. in src/net.cpp:145 in a9ee5683ce outdated
    140@@ -141,22 +141,23 @@ bool GetLocal(CService& addr, const CNetAddr *paddrPeer)
    141     return nBestScore >= 0;
    142 }
    143 
    144-//! Convert the pnSeed6 array into usable address objects.
    145-static std::vector<CAddress> convertSeed6(const std::vector<SeedSpec6> &vSeedsIn)
    146+//! Convert the serialized seeds into usable address objects.
    147+static std::vector<CAddress> convertSeeds(const std::vector<uint8_t> &vSeedsIn)
    


    jonatack commented at 7:16 pm on April 2, 2021:

    1007a73 naming nit, this function only has one caller so could be easily updated to the current style

    static std::vector<CAddress> ConvertSeeds(const std::vector<uint8_t> &vSeedsIn)
    

    laanwj commented at 12:03 pm on April 5, 2021:
    Thanks, done
  22. in src/net.cpp:158 in a9ee5683ce outdated
    160-        CAddress addr(CService(ip, seed_in.port), GetDesirableServiceFlags(NODE_NONE));
    161+    CDataStream s(vSeedsIn, SER_NETWORK, PROTOCOL_VERSION | ADDRV2_FORMAT);
    162+    while (!s.eof()) {
    163+        CService endpoint;
    164+        s >> endpoint;
    165+        CAddress addr(endpoint, GetDesirableServiceFlags(NODE_NONE));
    


    jonatack commented at 7:17 pm on April 2, 2021:

    1007a73 could use braced initialization

            CAddress addr{endpoint, GetDesirableServiceFlags(NODE_NONE)};
    

    laanwj commented at 12:03 pm on April 5, 2021:
    Thanks, done
  23. jonatack commented at 9:18 pm on April 2, 2021: member
    ACK modulo squash. I’m running the gist (had to update the file perms for it to run) and it’s taking a while as I have many (nearly 1835 and increasing) tor v3 addresses; LMK if the output would be useful. A couple nits below to pick/choose/ignore.
  24. jonatack commented at 10:18 am on April 3, 2021: member

    I’m running the gist (had to update the file perms for it to run) and it’s taking a while as I have many (nearly 1835 and increasing) tor v3 addresses

    The gist script finished. It checked 1836 v3 addresses and returned 1523 of them, sorted. I saved it in a text file if useful.

  25. jonatack commented at 11:04 am on April 4, 2021: member

    Update:

    $ ./src/bitcoin-cli -addressinfo
    {
      "addresses known": {
        "ipv4": 185,
        "ipv6": 0,
        "torv2": 5597,
        "torv3": 2220,
        "i2p": 7,
        "total": 8009
      }
    }
    
  26. MarcoFalke commented at 12:08 pm on April 4, 2021: member
    Concept ACK. I’ll pick up #20648 again after this is merged.
  27. laanwj commented at 11:50 am on April 5, 2021: member

    The gist script finished. It checked 1836 v3 addresses and returned 1523 of them, sorted. I saved it in a text file if useful.

    Maybe! The scope of this PR is just to get things started in that regard, not to add as many nodes as possible. We’d have to decide on how many, and which ones to use; as a guide makeseeds.py limits to 512 seeds per net, but how do we want to divide this over TorV2 and TorV3?

    We could do that kind of process to create a list of I2P and Tor seed nodes for 0.22. I’ve opened sipa/bitcoin-seeder#92 but am not sure that is the best approach. I mean it’s not rational for the DNS seeder’s crawler to collect addresses it can’t do anything with.

    The advantage of the dedicated crawler approach over my gist is that it applies some more stringent conditions to the nodes. For example it samples uptime and gets the node’s version information. But maybe that can be added, too.

    ./src/bitcoin-cli -addressinfo

    Nice!

  28. contrib: generate-seeds.py generates output in BIP155 format 06030f7a42
  29. contrib: Add a few TorV3 seed nodes 2a257de113
  30. contrib: Add explicit port numbers for testnet seeds
    This is necessary now due to parsing change.
    9b29d5df7f
  31. net: Deserialize hardcoded seeds from BIP155 blob
    Switch from IPv6 slot-based format to more compact and flexible BIP155
    format.
    b2ee8b207d
  32. laanwj force-pushed on Apr 5, 2021
  33. laanwj commented at 12:08 pm on April 5, 2021: member

    Squashed all the squashme’s into the appropriate commits a9ee5683ce4013904f6e53f29e37632969926b0d..b2ee8b207de78f03356905bd60b7b00b6f49c252

    The only overall differences are @jonatack’s latest comments.

  34. jonatack commented at 5:52 pm on April 5, 2021: member

    ACK b2ee8b207de78f03356905bd60b7b00b6f49c252

    Tested hitting ConvertSeeds() on mainnet and testnet.

  35. laanwj merged this on Apr 6, 2021
  36. laanwj closed this on Apr 6, 2021

  37. sidhujag referenced this in commit d00b95ee5a on Apr 6, 2021
  38. laanwj referenced this in commit ab9a566ab3 on May 4, 2021
  39. laanwj referenced this in commit 811aa24c71 on May 27, 2021
  40. furszy referenced this in commit 62e9993f26 on Aug 16, 2021
  41. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 04:12 UTC