Well-defined CAddress disk serialization, and addrv2 anchors.dat #20516

pull sipa wants to merge 3 commits into bitcoin:master from sipa:202011_disk_addr changing 3 files +119 −30
  1. sipa commented at 1:21 am on November 27, 2020: member

    Alternative to #20509.

    This makes the CAddress disk serialization format well defined, and uses it to enable addrv2 support in anchors.dat (in a way that’s compatible with older software). The new format is:

    • The first 4 bytes store a format version number. Its low 19 bits are ignored (as those historically stored the CLIENT_VERSION), but its high 13 bits specify the actual serialization:
      • 0x00000000: LE64 encoding for nServices, V1 encoding for CService (like pre-BIP155 network serialization).
      • 0x20000000: CompactSize encoding for nServices, V2 encoding for CService (like BIP155 network serialization).
      • Any other value triggers an unsupported format error on deserialization, and can be used for future format changes.
    • The ADDRV2_FORMAT flag in the stream’s version does not determine the actual serialization format; it only sets whether or not V2 encoding is permitted.
  2. sipa requested review from vasild on Nov 27, 2020
  3. DrahtBot commented at 2:24 am on November 27, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #21483 (p2p: add time when deserialize file db for ReadAnchors by brunoerg)
    • #20966 (banman: save the banlist in a JSON format on disk by vasild)
    • #20509 (net: CAddress deser: use stream’s version, not what’s coming from disk by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. DrahtBot added the label Build system on Nov 27, 2020
  5. DrahtBot added the label P2P on Nov 27, 2020
  6. sipa force-pushed on Nov 27, 2020
  7. MarcoFalke added this to the milestone 22.0 on Nov 27, 2020
  8. sipa force-pushed on Nov 27, 2020
  9. sipa renamed this:
    Introduce well-defined CAddress disk serialization
    Well-defined CAddress disk serialization, and addrv2 anchors.dat support
    on Nov 27, 2020
  10. sipa renamed this:
    Well-defined CAddress disk serialization, and addrv2 anchors.dat support
    Well-defined CAddress disk serialization, and addrv2 anchors.dat
    on Nov 27, 2020
  11. sipa commented at 7:17 pm on November 27, 2020: member
    I merged the anchors.dat addrv2 support from #20514 into this PR, as doing it correctly requires changes the Cservice serialization from stream version based to stored version based.
  12. hebasto commented at 7:49 pm on November 27, 2020: member

    Concept ACK.

    From the PR description and quick code reading it follows that there is no hurry to backport these changes into 0.21, right?

  13. sipa commented at 7:50 pm on November 27, 2020: member
    Indeed, there is probably no hurry, unless we want to support torv3 anchors in 0.21.
  14. in src/addrdb.cpp:26 in 04e3733b8d outdated
    22@@ -23,7 +23,7 @@ bool SerializeDB(Stream& stream, const Data& data)
    23 {
    24     // Write and commit header, data
    25     try {
    26-        CHashWriter hasher(SER_DISK, CLIENT_VERSION);
    27+        CHashWriter hasher(SER_DISK, stream.GetVersion());
    


    hebasto commented at 11:03 am on November 28, 2020:

    501de309e682f2edddb7e51e55665e034e1a86be, while this line is touched, it seems natural to inherit the stream type as well:

    0        CHashWriter hasher(stream.GetType(), stream.GetVersion());
    

    And CHashVerifier uses type of stream.


    sipa commented at 0:31 am on May 25, 2021:
    Done (ages ago).
  15. hebasto commented at 3:38 pm on November 28, 2020: member

    In commit 6d16903c4c384bc15f3f29e5635bb9dccd789cb6 message:

    These do no introduce incompatibilities, as no code versions exist that write any value other than 0 or 0x20000000 in the top 19 bits, and no code paths where the stream’s version differs from the stored version.

    s/top 19/high 13/ ?

  16. in src/protocol.h:386 in 04e3733b8d outdated
    381+    static constexpr uint32_t DISK_VERSION_INIT{220000};
    382+    static constexpr uint32_t DISK_VERSION_IGNORE_MASK{(1 << 19) - 1};
    383+    static_assert((DISK_VERSION_INIT & ~DISK_VERSION_IGNORE_MASK) == 0, "DISK_VERSION_INIT must be covered by DISK_VERSION_IGNORE_MASK");
    384+    static constexpr uint32_t DISK_VERSION_ADDRV2 = 0x20000000;
    385+    static_assert((DISK_VERSION_ADDRV2 & DISK_VERSION_IGNORE_MASK) == 0, "DISK_VERSION_ADDRV2 must not be covered by DISK_VERSION_IGNORE_MASK");
    386+    static_assert(DISK_VERSION_ADDRV2 == ADDRV2_FORMAT, "DISK_VERSION_ADDRV2 must ADDRV2_FORMAT for backward compatibility");
    


    hebasto commented at 3:43 pm on November 28, 2020:

    6d16903c4c384bc15f3f29e5635bb9dccd789cb6

    0    static_assert(DISK_VERSION_ADDRV2 == ADDRV2_FORMAT, "DISK_VERSION_ADDRV2 must be equal to ADDRV2_FORMAT for backward compatibility");
    
  17. in src/protocol.h:384 in 04e3733b8d outdated
    379+     *  embedded version number).
    380+     */
    381+    static constexpr uint32_t DISK_VERSION_INIT{220000};
    382+    static constexpr uint32_t DISK_VERSION_IGNORE_MASK{(1 << 19) - 1};
    383+    static_assert((DISK_VERSION_INIT & ~DISK_VERSION_IGNORE_MASK) == 0, "DISK_VERSION_INIT must be covered by DISK_VERSION_IGNORE_MASK");
    384+    static constexpr uint32_t DISK_VERSION_ADDRV2 = 0x20000000;
    


    hebasto commented at 4:21 pm on November 28, 2020:

    Maybe present DISK_VERSION_ADDRV2 as a bit flag explicitly, i.e.

    0    static constexpr uint32_t DISK_VERSION_ADDRV2 = 1 << 29;
    

    ?


    sipa commented at 1:06 am on May 25, 2021:
    Done.
  18. in src/protocol.h:410 in 04e3733b8d outdated
    409+            if (s.GetVersion() & ADDRV2_FORMAT) stored_format_version |= DISK_VERSION_ADDRV2;
    410+            READWRITE(stored_format_version);
    411+            stored_format_version &= ~DISK_VERSION_IGNORE_MASK; // ignore low bits
    412+            if (stored_format_version == 0) {
    413+                use_v2 = false;
    414+            } else if (stored_format_version == DISK_VERSION_ADDRV2 && (s.GetVersion() & ADDRV2_FORMAT)) {
    


    hebasto commented at 4:27 pm on November 28, 2020:

    When a new bit flag for disk serialization will be introduced in the future, a downgraded (after upgrading) node will throw an exception during deserialization. Maybe:

    0            } else if (stored_format_version & DISK_VERSION_ADDRV2 && (s.GetVersion() & ADDRV2_FORMAT)) {
    

    ?


    sipa commented at 6:23 pm on November 28, 2020:

    No, that would break the ability to add other formats in a compatible way.

    It’s supposed to throw an exception if it can’t be read.

  19. hebasto dismissed
  20. sipa force-pushed on Nov 30, 2020
  21. sipa force-pushed on Nov 30, 2020
  22. sipa commented at 9:41 pm on November 30, 2020: member
    Addressed comments.
  23. DrahtBot added the label Needs rebase on Dec 15, 2020
  24. in src/protocol.h:376 in 80f5c54420 outdated
    371+     *  - The high bits (masked by ~DISK_VERSION_IGNORE_MASK) store actual serialization information.
    372+     *    Only 0 or DISK_VERSION_ADDRV2 (equal to the historical ADDRV2_FORMAT) are valid now, and
    373+     *    any other value triggers a deserialization failure. Other values can be added later if
    374+     *    needed.
    375+     *
    376+     *  For disk deserialization, ADDRV2_FORMAT signals that ADDRV2 deserialization is permitted,
    


    vasild commented at 3:17 pm on December 18, 2020:
    0     *  For disk deserialization, ADDRV2_FORMAT in the stream version signals that ADDRV2 deserialization is permitted,
    

    sipa commented at 1:07 am on May 25, 2021:
    Done.
  25. in src/protocol.h:378 in 80f5c54420 outdated
    373+     *    any other value triggers a deserialization failure. Other values can be added later if
    374+     *    needed.
    375+     *
    376+     *  For disk deserialization, ADDRV2_FORMAT signals that ADDRV2 deserialization is permitted,
    377+     *  but the actual format is determined by the high bits in the stored version field.
    378+     *  For network serialization ADDRV2_FORMAT determines the actual format used (as it has no
    


    vasild commented at 3:17 pm on December 18, 2020:
    0     *  For network serialization ADDRV2_FORMAT in the stream version determines the actual format used (as it has no
    

    sipa commented at 1:07 am on May 25, 2021:
    Done.
  26. in src/protocol.h:382 in 80f5c54420 outdated
    377+     *  but the actual format is determined by the high bits in the stored version field.
    378+     *  For network serialization ADDRV2_FORMAT determines the actual format used (as it has no
    379+     *  embedded version number).
    380+     */
    381+    static constexpr uint32_t DISK_VERSION_INIT{220000};
    382+    static constexpr uint32_t DISK_VERSION_IGNORE_MASK{(1 << 19) - 1};
    


    vasild commented at 3:42 pm on December 18, 2020:

    Nowadays

    0(1 << 19) - 1
    

    can be written as

    00b00000000'00000111'11111111'11111111
    

    (feel free to ignore)


    vasild commented at 3:53 pm on December 18, 2020:
    Shouldn’t this be 18 bytes, instead of 19? CLIENT_VERSION of 219900 is 18 bytes (0b11'01011010'11111100).

    sipa commented at 0:52 am on May 25, 2021:
    I guess I added one slack bit, because the version number is pretty close to needing that many already.

    sipa commented at 1:07 am on May 25, 2021:
    Done.

    vasild commented at 1:51 pm on May 27, 2021:

    s/bytes/bits/ in my comment above

    I see, 18 bits would work until version 26 and 19 bits until version 52 (~15 years from now). Maybe add a few more bits (up to 29 is ok). And/or add some comment on what is that 19 (or another number if you choose to increment it).

  27. in src/protocol.h:384 in 80f5c54420 outdated
    379+     *  embedded version number).
    380+     */
    381+    static constexpr uint32_t DISK_VERSION_INIT{220000};
    382+    static constexpr uint32_t DISK_VERSION_IGNORE_MASK{(1 << 19) - 1};
    383+    static_assert((DISK_VERSION_INIT & ~DISK_VERSION_IGNORE_MASK) == 0, "DISK_VERSION_INIT must be covered by DISK_VERSION_IGNORE_MASK");
    384+    static constexpr uint32_t DISK_VERSION_ADDRV2 = 1 << 29;
    


    vasild commented at 3:49 pm on December 18, 2020:
    I think it is ok to assign ADDRV2_FORMAT here and remove the last static_assert below.

    sipa commented at 0:51 am on May 25, 2021:

    I’d rather not do that. The idea is that ADDRV2_FORMAT is irrelevant in the new code - it’s purely an internal flag that has no impact on the disk format. It could be changed to whatever, or dropped entirely - but if that happens, DISK_VERSION_ADDRV2 must still remain 1 << 29, because that’s what old versions used as disk serialization marker.

    I’ve just removed the assert, and replaced it with a comment.

  28. in src/protocol.h:442 in 80f5c54420 outdated
    445         } else {
    446             READWRITE(Using<CustomUintFormatter<8>>(obj.nServices));
    447+            // Invoke V1 serializer for CService parent object.
    448+            OverrideStream<Stream> os(&s, s.GetType(), 0);
    449+            SerReadWriteMany(os, ser_action, ReadWriteAsHelper<CService>(obj));
    450         }
    


    vasild commented at 4:16 pm on December 18, 2020:

    The repeated code can be moved after the if:

    0        } else {
    1            READWRITE(Using<CustomUintFormatter<8>>(obj.nServices));
    2        }
    3        OverrideStream<Stream> os(&s, s.GetType(), use_v2 ? ADDRV2_FORMAT : 0);
    4        SerReadWriteMany(os, ser_action, ReadWriteAsHelper<CService>(obj));
    

    sipa commented at 1:07 am on May 25, 2021:
    Done.
  29. in src/addrdb.cpp:164 in 80f5c54420 outdated
    160@@ -161,13 +161,13 @@ bool CAddrDB::Read(CAddrMan& addr, CDataStream& ssPeers)
    161 void DumpAnchors(const fs::path& anchors_db_path, const std::vector<CAddress>& anchors)
    162 {
    163     LOG_TIME_SECONDS(strprintf("Flush %d outbound block-relay-only peer addresses to anchors.dat", anchors.size()));
    164-    SerializeFileDB("anchors", anchors_db_path, anchors);
    165+    SerializeFileDB("anchors", anchors_db_path, anchors, CLIENT_VERSION | ADDRV2_FORMAT);
    


    vasild commented at 4:22 pm on December 18, 2020:
    Here and below in deserialize, why add CLIENT_VERSION when it is going to be ignored during ser/deser (they only check s.GetVersion() & ADDRV2_FORMAT)?

    sipa commented at 1:12 am on May 25, 2021:

    That’s just common in the codebase; use CLIENT_VERSION for disk serializations, PROTOCOL_VERSION for network serializations.

    I think this whole idea of streams having a version numbers with magic meanings, and implicit (=bad) compatibility properties for anything relying on it are silly, but getting rid of it isn’t for this PR.

  30. in src/protocol.h:449 in 80f5c54420 outdated
    450         }
    451-        READWRITEAS(CService, obj);
    452     }
    453 
    454     // disk and network only
    455     uint32_t nTime{TIME_INIT};
    


    vasild commented at 4:27 pm on December 18, 2020:

    Maybe this comment warrants an update, since we do not allow anything other than disk and network now.

    Also, it is “network only if s.GetVersion() != INIT_PROTO_VERSION”.


    sipa commented at 1:07 am on May 25, 2021:
    I’ve updated the comments here.
  31. in src/protocol.h:377 in 80f5c54420 outdated
    375+     *
    376+     *  For disk deserialization, ADDRV2_FORMAT signals that ADDRV2 deserialization is permitted,
    377+     *  but the actual format is determined by the high bits in the stored version field.
    378+     *  For network serialization ADDRV2_FORMAT determines the actual format used (as it has no
    379+     *  embedded version number).
    380+     */
    


    vasild commented at 4:49 pm on December 18, 2020:

    Maybe also mention what happens during disk serialization and network deserialization:

    For disk serialization, ADDRV2_FORMAT in the stream version indicates a write in that format For network deserialization ADDRV2_FORMAT in the stream version indicates that the data is expected to be in that format


    sipa commented at 1:08 am on May 25, 2021:
    I feel that’s pretty much already mentioned (just above, and in other places). I’ve tried to improve the comments further.
  32. vasild commented at 4:50 pm on December 18, 2020: member
    Approach ACK 80f5c5442
  33. Introduce well-defined CAddress disk serialization
    Before this commit, CAddress disk serialization was messy. It stored
    CLIENT_VERSION in the first 4 bytes, optionally OR'ed with ADDRV2_FORMAT.
     - All bits except ADDRV2_FORMAT were ignored, making it hard to use for actual
       future format changes.
     - ADDRV2_FORMAT determines whether or not nServices is serialized in LE64
       format or in CompactSize format.
     - Whether or not the embedded CService is serialized in V1 or V2 format is
       determined by the stream's version having ADDRV2_FORMAT (as opposed to the
       nServices encoding, which is determined by the disk version).
    
    To improve the situation, this commit introduces the following disk
    serialization format, compatible with earlier versions, but better defined for
    future changes:
     - The first 4 bytes store a format version number. Its low 19 bits are ignored
       (as it historically stored the CLIENT_VERSION), but its high 13 bits specify
       the serialization exactly:
       - 0x00000000: LE64 encoding for nServices, V1 encoding for CService
       - 0x20000000: CompactSize encoding for nServices, V2 encoding for CService
       - Any other value triggers an unsupported format error on deserialization,
         and can be used for future format changes.
     - The ADDRV2_FORMAT flag in the stream's version does not impact the actual
       serialization format; it only determines whether V2 encoding is permitted;
       whether it's actually enabled depends solely on the disk version number.
    
    Operationally the changes to the deserializer are:
     - Failure when the stored format version number is unexpected.
     - The embedded CService's format is determined by the stored format version
       number rather than the stream's version number.
    
    These do no introduce incompatibilities, as no code versions exist that write
    any value other than 0 or 0x20000000 in the top 13 bits, and no code paths
    where the stream's version differs from the stored version.
    8cd8f37dfe
  34. Use addrv2 serialization in anchors.dat e2f0548b52
  35. Add roundtrip fuzz tests for CAddress serialization f8866e8c32
  36. sipa force-pushed on May 25, 2021
  37. sipa commented at 1:08 am on May 25, 2021: member
    Rebased, and addressed comments.
  38. DrahtBot removed the label Needs rebase on May 25, 2021
  39. vasild approved
  40. vasild commented at 10:01 am on May 28, 2021: member
    ACK f8866e8c324be3322fa507c2ceb1de35d148d0f1
  41. in src/protocol.h:453 in f8866e8c32
    459     uint32_t nTime{TIME_INIT};
    460-
    461+    //! Serialized as uint64_t in V1, and as CompactSize in V2.
    462     ServiceFlags nServices{NODE_NONE};
    463+
    464+    friend bool operator==(const CAddress& a, const CAddress& b)
    


    jonatack commented at 10:25 am on May 28, 2021:
    8cd8f37 perhaps constexpr?

    sipa commented at 9:05 pm on June 12, 2021:

    Seems fine, but also pointless. I don’t think anyone is going to use these comparisons in compile-time initialized objects.

    I’ll do this if I need to retouch.


    sipa commented at 9:11 pm on June 12, 2021:
    Is it possible to drop the “changes requested” marker here?

    jonatack commented at 9:33 pm on June 12, 2021:
    I think that marker may be related to “hebasto requested changes on Nov 28, 2020” above (I don’t use the “requested changes” review type here, at least not intentionally).

    hebasto commented at 9:37 pm on June 12, 2021:
    I don’t know how to drop it. My recent comment has no such an attribute.

    sipa commented at 9:38 pm on June 12, 2021:
    @jonatack My apologies for assuming it was due to your comment!

    jonatack commented at 9:43 pm on June 12, 2021:
    It might be because #20516 (review) wasn’t “resolved”. This has been my superstition leading me to not use it (that, and the bright red flag).

    hebasto commented at 9:49 pm on June 12, 2021:

    I don’t know how to drop it.

    The magic GH button is “Dismiss review” :)

  42. jonatack commented at 10:34 am on May 28, 2021: member
    ACK f8866e8c324be3322fa507c2ceb1de35d148d0f1 tested rebased to master and built/run/restarted with DEBUG_ADDRMAN, peers.dat and anchors ser/deser seems fine
  43. hebasto commented at 9:36 pm on June 12, 2021: member
    Approach ACK
  44. sipa commented at 9:38 pm on June 12, 2021: member
    @hebasto I think you need to leave a new review.
  45. hebasto requested review from hebasto on Jun 12, 2021
  46. in src/protocol.h:407 in f8866e8c32
    408+            // that's part of the serialization itself. ADDRV2_FORMAT in the stream version only determines
    409+            // whether V2 is chosen/permitted at all.
    410+            uint32_t stored_format_version = DISK_VERSION_INIT;
    411+            if (s.GetVersion() & ADDRV2_FORMAT) stored_format_version |= DISK_VERSION_ADDRV2;
    412+            READWRITE(stored_format_version);
    413+            stored_format_version &= ~DISK_VERSION_IGNORE_MASK; // ignore low bits
    


    hebasto commented at 3:54 am on June 14, 2021:

    nit:

    This is the only place (besides two assertions above) where DISK_VERSION_IGNORE_MASK is used, and it is inverted. Isn’t it more clear:

    0            stored_format_version &= DISK_VERSION_MASK; // ignore low bits
    

    ?


    sipa commented at 11:30 pm on June 14, 2021:
    Will do if I retouch.
  47. hebasto approved
  48. hebasto commented at 3:54 am on June 14, 2021: member

    ACK f8866e8c324be3322fa507c2ceb1de35d148d0f1, tested on Linux Mint 20.1 (x86_64).

    Checked that changes in anchors.dat are forward compatible. They are not backward compatible – getting “ERROR: DeserializeDB: Deserialize or I/O error - CAutoFile::read: end of file: iostream error”, but this is expected.

  49. achow101 commented at 7:20 pm on June 14, 2021: member
    ACK f8866e8c324be3322fa507c2ceb1de35d148d0f1
  50. laanwj commented at 3:41 pm on June 17, 2021: member
    Code review ACK f8866e8c324be3322fa507c2ceb1de35d148d0f1
  51. laanwj merged this on Jun 17, 2021
  52. laanwj closed this on Jun 17, 2021

  53. sidhujag referenced this in commit 77b2f7a0f2 on Jun 18, 2021
  54. MarcoFalke commented at 11:37 am on June 21, 2021: member

    review ACK f8866e8c324be3322fa507c2ceb1de35d148d0f1 🕑

    Signature:

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA512
     2
     3review ACK f8866e8c324be3322fa507c2ceb1de35d148d0f1 🕑
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
     7pUipdwwAnD4R+92A+ogj6ZHCw+VUa2T+wnnhhGhfQ54cW2Hldl9pS1qTe7zPaNy8
     8XDn9OzCwU8LHNOWlqZo+e+lRiL09CVW+E7dwwe13ioGrk7wkXw/yO+XfmWNvx/np
     9X3sOCI8sNq+p1UHVhJLaLtPyuTa3/3/s5Vi157KUjqeBRaQZfTuFXbs6QgWzzjFb
    10K7KIhN+LINgI0zPmJU2OBrkTvkg+HDNCDW4DWhFFR3AbAdszEKZtYzz9flZIG3X0
    11LOP/iRBAud+o62YYazGo86iGla7LR9cToVKkOOQdTgPRT5diFmFnHxvRVizobLPb
    12TfmA9BqB/Ws7wg6h9A6kqAN/B1O9S0uAyPoJhdsTZaQTNt7ouB+q/20jwaVwX61r
    130DZ5jhU0VgAkh4II1CH65oYTespndkvDvejjggyK7qFNFfUhKY7zePaMbmUu5eub
    14NKRkX6EN6GrKqP8dD26yMkPSXov0WjIZ89ebTI8joac8brNmX1TZMe70JYUdhgl/
    15IFgK8gnv
    16=Ldtq
    17-----END PGP SIGNATURE-----
    

    Timestamp of file with hash 71c25ee05aade76e5311b5e5ffa61e3c8b280b51a49049dba2025a6d50dfe77b -

  55. in src/test/fuzz/deserialize.cpp:254 in f8866e8c32
    250@@ -251,9 +251,37 @@ FUZZ_TARGET_DESERIALIZE(messageheader_deserialize, {
    251         DeserializeFromFuzzingInput(buffer, mh);
    252         (void)mh.IsCommandValid();
    253 })
    254-FUZZ_TARGET_DESERIALIZE(address_deserialize, {
    255+FUZZ_TARGET_DESERIALIZE(address_deserialize_v1_notime, {
    


    MarcoFalke commented at 11:38 am on June 21, 2021:
    Copied the fuzz inputs in commit https://github.com/bitcoin-core/qa-assets/commit/836513af1edae5987d8d4051b60d96ac4a5b484a, so that the targets have something nice to start with.
  56. gwillen referenced this in commit 3bef51bb0d on Jun 1, 2022
  57. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-06 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me