Use addrv2 serialization in anchors.dat #20514

pull sipa wants to merge 1 commits into bitcoin:master from sipa:202011_v2_anchors changing 1 files +7 −7
  1. sipa commented at 10:02 pm on November 26, 2020: member

    This changes anchors.dat to use ADDRV2_FORMAT, so its CAddress entries can use v2 serialization.

    This solves #20511, but is incompatible with #20509.

    This should be compatible with (v1) addresses stored by 0.21.0rc{1,2} as well, as it uses the client version stored in the CAddress records to decide which deserializer to use.

  2. Use addrv2 serialization in anchors.dat 8ed2c72a06
  3. sipa added this to the milestone 0.21.0 on Nov 26, 2020
  4. sipa requested review from hebasto on Nov 26, 2020
  5. sipa requested review from vasild on Nov 26, 2020
  6. naumenkogs commented at 7:13 am on November 27, 2020: member

    utACK.

    I assume this code won’t fail a node if you feed it an older anchors.dat format? At least I think it should not.

    Pushing it to 0.21 solves it at scale, but we probably want to handle even corner cases.

  7. sipa commented at 7:21 am on November 27, 2020: member

    @naumenkogs Yes, it should be as deserialization uses the version number written in the CAddress record. So anchors.dat from rc1 or rc2 should be read correctly.

    Still, it seems there aren’t any tests for the anchoring functionality, so it’d be good if someone wrote one, or it gets tested manually.

  8. MarcoFalke added the label Needs backport (0.21) on Nov 27, 2020
  9. MarcoFalke added the label Block storage on Nov 27, 2020
  10. MarcoFalke removed the label Block storage on Nov 27, 2020
  11. in src/addrdb.cpp:164 in 8ed2c72a06
    160@@ -161,13 +161,13 @@ bool CAddrDB::Read(CAddrMan& addr, CDataStream& ssPeers)
    161 void DumpAnchors(const fs::path& anchors_db_path, const std::vector<CAddress>& anchors)
    162 {
    163     LOG_TIME_SECONDS(strprintf("Flush %d outbound block-relay-only peer addresses to anchors.dat", anchors.size()));
    164-    SerializeFileDB("anchors", anchors_db_path, anchors);
    165+    SerializeFileDB("anchors", anchors_db_path, anchors, ADDRV2_FORMAT);
    


    vasild commented at 10:01 am on November 27, 2020:

    Maybe this comment should be updated, given that now ADDRV2_FORMAT is also used here instead of CLIENT_VERSION:

    https://github.com/bitcoin/bitcoin/blob/50091592dd875a1c94030dbed74112b003732d68/src/netaddress.h#L26-L32

  12. in src/addrdb.cpp:170 in 8ed2c72a06
    167 
    168 std::vector<CAddress> ReadAnchors(const fs::path& anchors_db_path)
    169 {
    170     std::vector<CAddress> anchors;
    171-    if (DeserializeFileDB(anchors_db_path, anchors)) {
    172+    if (DeserializeFileDB(anchors_db_path, anchors, ADDRV2_FORMAT)) {
    


    vasild commented at 10:18 am on November 27, 2020:

    I think this will not deserialize properly anchors.dat from before this PR.

    • CLIENT_VERSION is something like 219900
    • ADDRV2_FORMAT is 0x20000000

    In anchors.dat before this PR we have CAddress entries which contain version=219900 and addresses serialized in addrv1 format.

    With this PR we will set the stream version to ADDRV2_FORMAT during deserialize:

     0    // Notice added comments below
     1    SERIALIZE_METHODS(CAddress, obj)
     2    {
     3        SER_READ(obj, obj.nTime = TIME_INIT);
     4        int nVersion = s.GetVersion(); // nVersion will be assigned ADDRV2_FORMAT (0x20000000)
     5        if (s.GetType() & SER_DISK) {
     6            READWRITE(nVersion); // nVersion will be assigned CLIENT_VERSION (219900)
     7        }
     8...
     9        if (nVersion & ADDRV2_FORMAT) { // this will be false
    10            uint64_t services_tmp;
    11            SER_WRITE(obj, services_tmp = obj.nServices);
    12            READWRITE(Using<CompactSizeFormatter<false>>(services_tmp));
    13            SER_READ(obj, obj.nServices = static_cast<ServiceFlags>(services_tmp));
    14        } else {
    15            // it will read services as 8 bytes (correct, as that is what is actually stored on disk)
    16            READWRITE(Using<CustomUintFormatter<8>>(obj.nServices));
    17        }
    18        // oops, this will deserialize CService, which will deserialize CNetAddr which will check
    19        // s.GetVersion() which is ADDRV2_FORMAT (0x20000000) and will deserialize the
    20        // address as if it is stored in addrv2 format on disk, but it is actually stored in addrv1,
    21        // so this will deserialize garbage
    22        READWRITEAS(CService, obj);
    23    }
    

    jnewbery commented at 10:32 am on November 27, 2020:
    I think in this case the file checksum will mismatch and we won’t load any anchors at all, which is ok (anchors.dat is not in any release and losing the file isn’t a problem).

    vasild commented at 10:44 am on November 27, 2020:

    My point is that it will not be able to read “old” anchors.dat (I am not sure of the exact way it will fail, maybe it will be indeed checksum failure).

    I mention this because the comments above give the impression that “old” anchors.dat would be readable.


    hebasto commented at 10:47 am on November 27, 2020:

    This should be compatible with (v1) addresses stored by 0.21.0rc{1,2} as well…

    Such compatibility shouldn’t be a point for the reason mentioned by @jnewbery.

  13. in src/addrdb.cpp:38 in 8ed2c72a06
    34@@ -35,7 +35,7 @@ bool SerializeDB(Stream& stream, const Data& data)
    35 }
    36 
    37 template <typename Data>
    38-bool SerializeFileDB(const std::string& prefix, const fs::path& path, const Data& data)
    39+bool SerializeFileDB(const std::string& prefix, const fs::path& path, const Data& data, int version = CLIENT_VERSION)
    


    jnewbery commented at 10:46 am on November 27, 2020:
    I’d prefer this (and DeserializeFileDB) to not have a default argument. There are only three places where they’re called, and it seems better to be explicit in those places.
  14. jnewbery commented at 10:54 am on November 27, 2020: member

    Concept ACK, although I’m not sure how urgent this is. If I’m reading the code correctly, then the effect will be:

    (it’s very possible I’ve misread the code - this is split across addrdb/netaddress/net and is quite hard to follow)

    If I’m right, then I don’t think we necessarily need to fix this for 0.21. We didn’t have anchor connections or torv3 before, so the fact that they don’t work together isn’t a regression and can always be fixed up later.

    I think the same problem exists for banlist.dat - we’ll always serialize/deserialize in addrv1 format.

  15. jonatack commented at 10:57 am on November 27, 2020: member
    Concept ACK, could use test coverage.
  16. vasild commented at 11:05 am on November 27, 2020: member

    it’s very possible I’ve misread the code - this is split across addrdb/netaddress/net

    I think you read it correctly (or at least I read it in the same way). It will deserialize 0s from disk which will be invalid addresses which later will be skipped gracefully:

    https://github.com/bitcoin/bitcoin/blob/50091592dd875a1c94030dbed74112b003732d68/src/net.cpp#L1971-L1975

  17. jnewbery commented at 11:06 am on November 27, 2020: member

    it uses the client version stored in the CAddress records to decide which deserializer to use.

    anchors.dat from rc1 or rc2 should be read correctly. @sipa I don’t understand this. With this PR, we always unserialize anchors.dat using a CAutoFile with version ADDRV2_FORMAT. I think the version stored in the CAddress will be used to deserialize the services in the CAddress, but the underlying CNetAddr will always be unserialized as addrv2.

  18. vasild commented at 11:14 am on November 27, 2020: member

    I think the inconsistency in CAddress deserialization should be fixed in one way (#20509) or another:

     0diff --git i/src/protocol.h w/src/protocol.h
     1index 309fac621..634a1a91a 100644
     2--- i/src/protocol.h
     3+++ w/src/protocol.h
     4@@ -367,17 +367,17 @@ public:
     5     CAddress(CService ipIn, ServiceFlags nServicesIn, uint32_t nTimeIn) : CService{ipIn}, nTime{nTimeIn}, nServices{nServicesIn} {};
     6 
     7     SERIALIZE_METHODS(CAddress, obj)
     8     {
     9         SER_READ(obj, obj.nTime = TIME_INIT);
    10         int nVersion = s.GetVersion();
    11-        if (s.GetType() & SER_DISK) {
    12+        const bool is_disk_ser = s.GetType() & SER_DISK;
    13+        if (is_disk_ser) {
    14             READWRITE(nVersion);
    15         }
    16-        if ((s.GetType() & SER_DISK) ||
    17-            (nVersion != INIT_PROTO_VERSION && !(s.GetType() & SER_GETHASH))) {
    18+        if (is_disk_ser || (nVersion != INIT_PROTO_VERSION && !(s.GetType() & SER_GETHASH))) {
    19             // The only time we serialize a CAddress object without nTime is in
    20             // the initial VERSION messages which contain two CAddress records.
    21             // At that point, the serialization version is INIT_PROTO_VERSION.
    22             // After the version handshake, serialization version is >=
    23             // MIN_PEER_PROTO_VERSION and all ADDR messages are serialized with
    24             // nTime.
    25@@ -388,13 +388,16 @@ public:
    26             SER_WRITE(obj, services_tmp = obj.nServices);
    27             READWRITE(Using<CompactSizeFormatter<false>>(services_tmp));
    28             SER_READ(obj, obj.nServices = static_cast<ServiceFlags>(services_tmp));
    29         } else {
    30             READWRITE(Using<CustomUintFormatter<8>>(obj.nServices));
    31         }
    32+        const int stream_version_orig = s.GetVersion();
    33+        SER_READ(obj, if (is_disk_ser) { s.SetVersion(nVersion); });
    34         READWRITEAS(CService, obj);
    35+        SER_READ(obj, if (is_disk_ser) { s.SetVersion(stream_version_orig); });
    36     }
    

    The above patch, on top of this PR will:

    • Fix the problem which #20509 is fixing, so that PR can be closed
    • Make “old” anchors.dat readable (that may not be a concern, other than some obscure/scary message about being unable to read anchors.dat when a user/tester upgrades from rc2 to rc3)
    • Work just fine if there is ever addrv3 format
  19. hebasto commented at 11:22 am on November 27, 2020: member
    • Work just fine if there is ever addrv3 format

    Future proof is good.

  20. vasild commented at 12:21 pm on November 27, 2020: member

    I think the same problem exists for banlist.dat - we’ll always serialize/deserialize in addrv1 format.

    True that we will always ser/deser in addrv1, but that is not a problem because the banlist contains CSubNet objects which must be either IPv4 or IPv6 (no Tor v3 involved):

    https://github.com/bitcoin/bitcoin/blob/e2ff5e7b35d71195278d2a2ed9485f141de33d7a/src/netaddress.cpp#L992-L995

  21. jnewbery commented at 12:25 pm on November 27, 2020: member

    that is not a problem because the banlist contains CSubNet objects which must be either IPv4 or IPv6 (no Tor v3 involved)

    Ah good. Thanks @vasild!

  22. sipa commented at 4:56 pm on November 27, 2020: member
    @vasild Did you see #20516?
  23. sipa commented at 5:03 pm on November 27, 2020: member
    @jnewbery Oops, yes, that is only true when combined with #20516, or with @vasild’s patch above.
  24. sipa commented at 7:01 pm on November 27, 2020: member

    If I’m right, then I don’t think we necessarily need to fix this for 0.21. We didn’t have anchor connections or torv3 before, so the fact that they don’t work together isn’t a regression and can always be fixed up later

    That’s fair, and you’ve shown that this PR as-is actually makes things worse, as it’ll fail to deserialize the CService object.

    To fix it, we’ll either need @vasild’s patch from #20514 (comment), or doing it on top of #20516. I’m going to close this PR, and take the functionality there.

  25. sipa closed this on Nov 27, 2020

  26. MarcoFalke removed the label Needs backport (0.21) on Dec 4, 2020
  27. MarcoFalke removed this from the milestone 0.21.0 on Dec 4, 2020
  28. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me