net: prevent peers.dat corruptions by only serializing once #26909

pull mzumsande wants to merge 2 commits into bitcoin:master from mzumsande:202301_peersdat_corruption changing 4 files +43 −6
  1. mzumsande commented at 6:56 pm on January 17, 2023: contributor

    There have been various reports of corruption of peers.dat recently, see #26599. As explained in this post in more detail, the underlying issue is likely that we currently serialize AddrMan twice - once for the file stream, once for the hasher that helps create the checksum - and if AddrMan changes in between these two calls, the checksum doesn’t match the data and the resulting peers.dat is corrupted.

    This PR attempts to fix this by introducing and using HashedSourceWriter - a class that keeps a running hash while serializing data, similar to the existing CHashVerifier which does the analogous thing while unserializing data. Something like this was suggested before, see #10248 (review).

    Fixes #26599 (not by changing the behavior in case of a crash, but by hopefully fixing the underlying cause of these corruptions).

  2. DrahtBot commented at 6:56 pm on January 17, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sipa, naumenkogs
    Stale ACK MarcoFalke

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #25284 ([WIP] consensus: Remove dependency on net (BIP 155 / ADDRV2_FORMAT) by MarcoFalke)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. mzumsande commented at 6:57 pm on January 17, 2023: contributor
    I also considered extending CHashVerifier instead of creating a new class, but that didn’t work out of the box because of how that class is used in https://github.com/bitcoin/bitcoin/blob/01ec5308bf616740c804247043046b23b483df5c/src/node/blockstorage.cpp#L501 to write data into the hasher without serializing into the file - I’m not too experienced with our streams, so would be interested in opinions on this.
  4. in src/addrdb.cpp:37 in 49f6fb6874 outdated
    33@@ -34,10 +34,9 @@ bool SerializeDB(Stream& stream, const Data& data)
    34 {
    35     // Write and commit header, data
    36     try {
    37-        CHashWriter hasher(stream.GetType(), stream.GetVersion());
    38-        stream << Params().MessageStart() << data;
    39-        hasher << Params().MessageStart() << data;
    40-        stream << hasher.GetHash();
    41+        CHashedSourceWriter<Stream> hashwriter(&stream);
    


    maflcko commented at 7:36 pm on January 17, 2023:

    Nit: Starting with C++17 you don’t need to specify redundant template parameters to constructor calls. Also some other style nits (feel free to ignore):

     0diff --git a/src/addrdb.cpp b/src/addrdb.cpp
     1index cee8d5e66a..900ba41025 100644
     2--- a/src/addrdb.cpp
     3+++ b/src/addrdb.cpp
     4@@ -34,7 +34,7 @@ bool SerializeDB(Stream& stream, const Data& data)
     5 {
     6     // Write and commit header, data
     7     try {
     8-        CHashedSourceWriter<Stream> hashwriter(&stream);
     9+        CHashedSourceWriter hashwriter{stream};
    10         hashwriter << Params().MessageStart() << data;
    11         hashwriter << hashwriter.GetHash();
    12     } catch (const std::exception& e) {
    13diff --git a/src/hash.h b/src/hash.h
    14index 7b2f82c456..31370c301a 100644
    15--- a/src/hash.h
    16+++ b/src/hash.h
    17@@ -6,6 +6,7 @@
    18 #ifndef BITCOIN_HASH_H
    19 #define BITCOIN_HASH_H
    20 
    21+#include <attributes.h>
    22 #include <crypto/common.h>
    23 #include <crypto/ripemd160.h>
    24 #include <crypto/sha256.h>
    25@@ -204,23 +205,22 @@ template <typename Source>
    26 class CHashedSourceWriter : public CHashWriter
    27 {
    28 private:
    29-    Source* source;
    30+    Source& m_source;
    31 
    32 public:
    33-    explicit CHashedSourceWriter(Source* source_) : CHashWriter(source_->GetType(), source_->GetVersion()), source(source_) {}
    34+    explicit CHashedSourceWriter(Source& source LIFETIMEBOUND) : CHashWriter{source.GetType(), source.GetVersion()}, m_source{source} {}
    35 
    36     void write(Span<const std::byte> src)
    37     {
    38-        source->write(src);
    39+        m_source.write(src);
    40         CHashWriter::write(src);
    41     }
    42 
    43     template <typename T>
    44     CHashedSourceWriter<Source>& operator<<(const T& obj)
    45     {
    46-        // Serialize to this stream
    47         ::Serialize(*this, obj);
    48-        return (*this);
    49+        return *this;
    50     }
    51 };
    52 
    53diff --git a/src/test/streams_tests.cpp b/src/test/streams_tests.cpp
    54index 7d2a1b6818..90ad61ca6c 100644
    55--- a/src/test/streams_tests.cpp
    56+++ b/src/test/streams_tests.cpp
    57@@ -503,11 +503,11 @@ BOOST_AUTO_TEST_CASE(streams_buffered_file_rand)
    58 BOOST_AUTO_TEST_CASE(streams_hashed)
    59 {
    60     CDataStream stream(SER_NETWORK, INIT_PROTO_VERSION);
    61-    CHashedSourceWriter<CDataStream> hash_writer(&stream);
    62+    CHashedSourceWriter hash_writer{stream};
    63     const std::string data{"bitcoin"};
    64     hash_writer << data;
    65 
    66-    CHashVerifier<CDataStream> hash_verifier(&stream);
    67+    CHashVerifier hash_verifier{stream};
    68     std::string result;
    69     hash_verifier >> result;
    70     BOOST_CHECK_EQUAL(data, result);
    

    mzumsande commented at 9:19 pm on January 17, 2023:
    Thanks, I took your suggestions (except keeping the & in the last line for CHashVerifier).
  5. maflcko approved
  6. maflcko commented at 7:42 pm on January 17, 2023: member
    review ACK 49f6fb6874e1e89336e94f3764ee236ec3f45537
  7. maflcko commented at 7:43 pm on January 17, 2023: member

    didn’t work out of the box because of how that class is used in bitcoin/src/node/blockstorage.cpp

    This is removed in #26649, but your approach seems fine, since it targets a backport?

  8. maflcko renamed this:
    addrdb: prevent peers.dat corruptions by only serializing once
    net: prevent peers.dat corruptions by only serializing once
    on Jan 17, 2023
  9. DrahtBot added the label P2P on Jan 17, 2023
  10. maflcko added the label Needs backport (24.x) on Jan 17, 2023
  11. maflcko added this to the milestone 24.1 on Jan 17, 2023
  12. maflcko added the label Needs backport (23.x) on Jan 17, 2023
  13. mzumsande force-pushed on Jan 17, 2023
  14. maflcko commented at 9:19 pm on January 17, 2023: member
    reutACK 7def75595f6da036ec79de0e13ac10dcb677afdf
  15. mzumsande commented at 9:22 pm on January 17, 2023: contributor

    This is removed in #26649, but your approach seems fine, since it targets a backport?

    Yes, I agree it would be nice to backport this, maybe HashedSourceWriter and CHashVerifier could be merged at some later point after #26649.

  16. in src/hash.h:220 in c275065262 outdated
    215+        m_source.write(src);
    216+        CHashWriter::write(src);
    217+    }
    218+
    219+    template <typename T>
    220+    CHashedSourceWriter<Source>& operator<<(const T& obj)
    


    sipa commented at 9:26 pm on January 17, 2023:
    I believe you can drop the <Source> here; it’s used as default.

    mzumsande commented at 10:23 pm on January 17, 2023:
    done
  17. in src/addrdb.cpp:39 in 7def75595f outdated
    38-        stream << Params().MessageStart() << data;
    39-        hasher << Params().MessageStart() << data;
    40-        stream << hasher.GetHash();
    41+        CHashedSourceWriter hashwriter{stream};
    42+        hashwriter << Params().MessageStart() << data;
    43+        hashwriter << hashwriter.GetHash();
    


    sipa commented at 9:27 pm on January 17, 2023:
    I think you can use stream << hashwriter.GetHash(); instead here, to avoid hashing the hash itself.

    maflcko commented at 9:33 pm on January 17, 2023:
    I believe this would require another explicit instantiation below, so I didn’t nit that in my review ;)

    sipa commented at 10:12 pm on January 17, 2023:
    @MarcoFalke I don’t think so; stream is just CAutoFile, and hashwriter.GetHash() is a uint256?

    mzumsande commented at 10:23 pm on January 17, 2023:
    yes, it compiles fine, so I changed it.
  18. in src/hash.h:205 in 7def75595f outdated
    199@@ -199,6 +200,30 @@ class CHashVerifier : public CHashWriter
    200     }
    201 };
    202 
    203+/** Writes data to an underlying source stream, while hashing the written data. */
    204+template <typename Source>
    205+class CHashedSourceWriter : public CHashWriter
    


    sipa commented at 9:27 pm on January 17, 2023:
    Nit coding style: no C prefix for new classes.

    mzumsande commented at 10:22 pm on January 17, 2023:
    removed the C
  19. hash: add HashedSourceWriter
    This class is the counterpart to CHashVerifier, in that it
    writes data to an underlying source stream,
    while keeping a hash of the written data.
    da6c7aeca3
  20. addrdb: Only call Serialize() once
    The previous logic would call it once for serializing into the filestream,
    and then again for serializing into the hasher. If AddrMan was changed
    in between these calls by another thread, the resulting peers.dat would
    be corrupt with non-matching checksum and data.
    Fix this by using HashedSourceWriter, which writes the data
    to the underlying stream and keeps track of the hash in one go.
    5eabb61b23
  21. mzumsande force-pushed on Jan 17, 2023
  22. sipa commented at 10:42 pm on January 17, 2023: member

    I think a possibility is adding a CHashWriter& GetHasher() { return *this; } function to the merged HashedSourceWriter + CHashVerifier class, and then for the blockstorage usage use verifier.GetHasher() << pindex->pprev->GetBlockHash();, as it’s data to be fed to just the hash computation rather than the file.

    No need for that in this PR though.

  23. sipa commented at 10:43 pm on January 17, 2023: member
    utACK 5eabb61b2386d00e93e6bbb2f493a56d1b326ad9
  24. naumenkogs commented at 12:00 pm on January 19, 2023: member
    utACK 5eabb61b2386d00e93e6bbb2f493a56d1b326ad9
  25. maflcko merged this on Jan 19, 2023
  26. maflcko closed this on Jan 19, 2023

  27. fanquake removed the label Needs backport (24.x) on Jan 19, 2023
  28. fanquake commented at 3:11 pm on January 19, 2023: member
    Added to #26878 for backporting into 24.x.
  29. fanquake referenced this in commit 2af31f2081 on Jan 19, 2023
  30. fanquake referenced this in commit d064abd8d6 on Jan 19, 2023
  31. maflcko referenced this in commit 591ec4841c on Jan 19, 2023
  32. maflcko referenced this in commit 21b0bc1fb9 on Jan 19, 2023
  33. fanquake referenced this in commit fd94befbc6 on Jan 19, 2023
  34. fanquake referenced this in commit 412cd1a34e on Jan 19, 2023
  35. maflcko removed the label Needs backport (23.x) on Jan 19, 2023
  36. sidhujag referenced this in commit 4ab996b227 on Jan 19, 2023
  37. mzumsande deleted the branch on Jan 19, 2023
  38. fanquake referenced this in commit 0567787f5e on Feb 16, 2023
  39. fanquake referenced this in commit 91f83dbeb1 on Feb 22, 2023
  40. fanquake referenced this in commit 07397cdede on Feb 22, 2023
  41. glozow referenced this in commit c8c85ca16e on Feb 27, 2023
  42. bitcoin locked this on Jan 19, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 19:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me