index: Verify the block filter hash when reading the filter from disk. #24832

pull kcalvinalvin wants to merge 4 commits into bitcoin:master from kcalvinalvin:2020-06-14-blockfilterindex-checksums changing 5 files +80 −28
  1. kcalvinalvin commented at 10:38 am on April 12, 2022: contributor

    This PR picks up the abandoned #19280

    BlockFilterIndex was depending on GolombRiceDecode() during the filter decode to sanity check that the filter wasn’t corrupt. However, we can check for corruption by ensuring that the encoded blockfilter’s hash matches up with the one stored in the index database.

    Benchmarks that were added in #19280 showed that checking the hash is much faster.

    The benchmarks were changed to nanobench and the relevant benchmarks were like below, showing a clear win for the hash check method.

    0|             ns/elem |              elem/s |    err% |        ins/elem |       bra/elem |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|---------------:|--------:|----------:|:----------
    2|              531.40 |        1,881,819.43 |    0.3% |        3,527.01 |         411.00 |    0.2% |      0.01 | `DecodeCheckedGCSFilter`
    3|          258,220.50 |            3,872.66 |    0.1% |    2,990,092.00 |     586,706.00 |    1.7% |      0.01 | `DecodeGCSFilter`
    4|           13,036.77 |           76,706.09 |    0.3% |       64,238.24 |         513.04 |    0.2% |      0.01 | `BlockFilterGetHash`
    
  2. fanquake added the label UTXO Db and Indexes on Apr 12, 2022
  3. in src/blockfilter.cpp:66 in 294a8454e4 outdated
    58@@ -59,9 +59,11 @@ GCSFilter::GCSFilter(const Params& params, std::vector<unsigned char> encoded_fi
    59     }
    60     m_F = static_cast<uint64_t>(m_N) * static_cast<uint64_t>(m_params.m_M);
    61 
    62+    if (filter_checked) return;
    63+
    64     // Verify that the encoded filter contains exactly N elements. If it has too much or too little
    65     // data, a std::ios_base::failure exception will be raised.
    66-    BitStreamReader<SpanReader> bitreader{stream};
    67+    BitStreamReader<SpanReader> bitreader(stream);
    


    jonatack commented at 11:26 am on April 12, 2022:
    No need to change this.

    kcalvinalvin commented at 12:24 pm on April 12, 2022:
    Reverted
  4. jonatack commented at 11:45 am on April 12, 2022: contributor

    Concept ACK

     0diff --git a/src/blockfilter.cpp b/src/blockfilter.cpp
     1index 59f34bc54e..0ab89fdbdb 100644
     2--- a/src/blockfilter.cpp
     3+++ b/src/blockfilter.cpp
     4@@ -47,7 +47,7 @@ GCSFilter::GCSFilter(const Params& params)
     5     : m_params(params), m_N(0), m_F(0), m_encoded{0}
     6 {}
     7 
     8-GCSFilter::GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, const bool filter_checked)
     9+GCSFilter::GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked)
    10     : m_params(params), m_encoded(std::move(encoded_filter))
    11 {
    12     SpanReader stream{GCS_SER_TYPE, GCS_SER_VERSION, m_encoded};
    13@@ -221,7 +221,7 @@ static GCSFilter::ElementSet BasicFilterElements(const CBlock& block,
    14 }
    15 
    16 BlockFilter::BlockFilter(BlockFilterType filter_type, const uint256& block_hash,
    17-                         std::vector<unsigned char> filter, const bool filter_checked)
    18+                         std::vector<unsigned char> filter, bool filter_checked)
    19     : m_filter_type(filter_type), m_block_hash(block_hash)
    20 {
    21     GCSFilter::Params params;
    22diff --git a/src/blockfilter.h b/src/blockfilter.h
    23index af054dbf34..5de5679834 100644
    24--- a/src/blockfilter.h
    25+++ b/src/blockfilter.h
    26@@ -59,7 +59,7 @@ public:
    27     explicit GCSFilter(const Params& params = Params());
    28 
    29     /** Reconstructs an already-created filter from an encoding. */
    30-    GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, const bool filter_checked=false);
    31+    GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked = false);
    32 
    33     /** Builds a new filter from the params and set of elements. */
    34     GCSFilter(const Params& params, const ElementSet& elements);
    35@@ -122,7 +122,7 @@ public:
    36 
    37     //! Reconstruct a BlockFilter from parts.
    38     BlockFilter(BlockFilterType filter_type, const uint256& block_hash,
    39-                std::vector<unsigned char> filter, const bool filter_checked=false);
    40+                std::vector<unsigned char> filter, bool filter_checked = false);
    41 
    42     //! Construct a new BlockFilter of the specified type from a block.
    43     BlockFilter(BlockFilterType filter_type, const CBlock& block, const CBlockUndo& block_undo);
    44diff --git a/src/index/blockfilterindex.cpp b/src/index/blockfilterindex.cpp
    45index bad766ebe1..0a4728f571 100644
    46--- a/src/index/blockfilterindex.cpp
    47+++ b/src/index/blockfilterindex.cpp
    48@@ -159,7 +159,7 @@ bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& f
    49         uint256 result;
    50         CHash256().Write(encoded_filter).Finalize(result);
    51         if (result != hash) return error("Checksum mismatch in filter decode.");
    52-        filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter), true);
    53+        filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter), /*filter_checked=*/true);
    54     }
    
  5. kcalvinalvin force-pushed on Apr 12, 2022
  6. kcalvinalvin commented at 12:27 pm on April 12, 2022: contributor

    A few suggestions regarding filter_checked

    I’ve applied all the suggested changes :)

  7. kcalvinalvin force-pushed on Apr 12, 2022
  8. in src/index/blockfilterindex.h:34 in 1faac19651 outdated
    30@@ -31,7 +31,7 @@ class BlockFilterIndex final : public BaseIndex
    31     FlatFilePos m_next_filter_pos;
    32     std::unique_ptr<FlatFileSeq> m_filter_fileseq;
    33 
    34-    bool ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter) const;
    35+    bool ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter, const uint256& hash) const;
    


    jonatack commented at 3:36 pm on April 12, 2022:

    It might be nice to order the ReadFilterFromDisk() arguments with in-params first, then the out-param. There are only four lines to change of the ones touched in this pull.

    0    bool ReadFilterFromDisk(const FlatFilePos& pos, const uint256& hash, BlockFilter& filter) const;
    

    kcalvinalvin commented at 8:43 am on April 14, 2022:
    Addressed
  9. in src/bench/gcs_filter.cpp:55 in 1faac19651 outdated
    55+    bench.unit("elem").run([&] {
    56+        GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded);
    57+    });
    58+}
    59+
    60+static void BlockFilterGetHash(benchmark::Bench& bench)
    


    jonatack commented at 4:39 pm on April 12, 2022:
    It would be handy to name all of these benchmarks with a common prefix to group them together in the output and that is easy to filter for, i.e. ./src/bench/bench_bitcoin -filter=EvictionProtection*.*

    kcalvinalvin commented at 8:23 am on April 14, 2022:
    All the GCSFIlter benchmarks are now changed to have the prefix GCSFilter.
  10. in src/bench/gcs_filter.cpp:66 in 1faac19651 outdated
    62+    GCSFilter::ElementSet elements;
    63+    for (int i = 0; i < 10000; ++i) {
    64+        GCSFilter::Element element(32);
    65+        element[0] = static_cast<unsigned char>(i);
    66+        element[1] = static_cast<unsigned char>(i >> 8);
    67+        elements.insert(std::move(element));
    


    jonatack commented at 4:41 pm on April 12, 2022:
    Optional: it looks like there is a fair amount of duplicate boilerplate in each of these benchmarks that you could extract out to a common setup helper in a separate commit, if you like. See EvictionProtectionCommon() in src/bench/peer_eviction.cpp for an example.

    kcalvinalvin commented at 4:12 pm on April 13, 2022:

    Since most of the duplicate code is just generating the elements, how’s separating that out to a function like so:

     0static const GCSFilter::ElementSet GenerateGCSTestElements()
     1{
     2    GCSFilter::ElementSet elements;
     3    for (int i = 0; i < 10000; ++i) {
     4        GCSFilter::Element element(32);
     5        element[0] = static_cast<unsigned char>(i);
     6        element[1] = static_cast<unsigned char>(i >> 8);
     7        elements.insert(std::move(element));
     8    }
     9
    10    return elements;
    11}
    

    And having each of the benchmarks call that function?


    kcalvinalvin commented at 8:43 am on April 14, 2022:
    I’ve applied the above change I suggested above.
  11. in src/blockfilter.h:62 in 1faac19651 outdated
    58@@ -59,7 +59,7 @@ class GCSFilter
    59     explicit GCSFilter(const Params& params = Params());
    60 
    61     /** Reconstructs an already-created filter from an encoding. */
    62-    GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter);
    63+    GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked=false);
    


    jonatack commented at 4:42 pm on April 12, 2022:

    Clang format for each of these defaults

    0    GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked = false);
    

    kcalvinalvin commented at 8:42 am on April 14, 2022:
    No longer needed as the default value was removed
  12. jonatack commented at 4:44 pm on April 12, 2022: contributor
    Approach ACK, a few iterative further comments on second look.
  13. in src/bench/gcs_filter.cpp:89 in 1faac19651 outdated
    85+    }
    86+    GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    87+    auto encoded = filter.GetEncoded();
    88+
    89+    bench.unit("elem").run([&] {
    90+        GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, true);
    


    jonatack commented at 4:53 pm on April 12, 2022:
    0        GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, /*filter_checked=*/true);
    

    kcalvinalvin commented at 8:42 am on April 14, 2022:
    Addressed
  14. mzumsande commented at 10:52 pm on April 12, 2022: contributor

    Concept ACK

    As far as I can see, the sanity check that this PR makes optional is now only invoked from BlockFilter::Unserialize(), which is used only in test code because as a full node, bitcoin core doesn’t have the use case of accepting unchecked serialized blockfilters over p2p, only send self-generated ones to peers. So I wonder whether it might make sense to just remove the old sanity check instead of making it optional?

  15. jonatack commented at 2:06 pm on April 13, 2022: contributor

    @mzumsande good point, and I think it also highlights that it may alternatively be better to drop the default false value for filter_checked here as follows to make that more explicit. Then (here or in a follow-up), if BlockFilter::Unserialize() is removed or moved to the test code, the sanity check here could be removed.

     0diff --git a/src/bench/gcs_filter.cpp b/src/bench/gcs_filter.cpp
     1index 89fa07f602..56f3cff0c1 100644
     2--- a/src/bench/gcs_filter.cpp
     3+++ b/src/bench/gcs_filter.cpp
     4@@ -52,7 +52,7 @@ static void DecodeGCSFilter(benchmark::Bench& bench)
     5     auto encoded = filter.GetEncoded();
     6 
     7     bench.unit("elem").run([&] {
     8-        GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded);
     9+        GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, /*filter_checked=*/false);
    10     });
    11 }
    12 
    13@@ -66,7 +66,7 @@ static void BlockFilterGetHash(benchmark::Bench& bench)
    14         elements.insert(std::move(element));
    15     }
    16     GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    17-    BlockFilter block_filter(BlockFilterType::BASIC, {}, filter.GetEncoded());
    18+    BlockFilter block_filter(BlockFilterType::BASIC, {}, filter.GetEncoded(), /*filter_checked=*/false);
    19 
    20     bench.unit("elem").run([&] {
    21         block_filter.GetHash();
    22@@ -86,7 +86,7 @@ static void DecodeCheckedGCSFilter(benchmark::Bench& bench)
    23     auto encoded = filter.GetEncoded();
    24 
    25     bench.unit("elem").run([&] {
    26-        GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, true);
    27+        GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, /*filter_checked=*/true);
    28     });
    29 }
    30 BENCHMARK(BlockFilterGetHash);
    31diff --git a/src/blockfilter.h b/src/blockfilter.h
    32index 21d6a295a2..f823354989 100644
    33--- a/src/blockfilter.h
    34+++ b/src/blockfilter.h
    35@@ -59,7 +59,7 @@ public:
    36     explicit GCSFilter(const Params& params = Params());
    37 
    38     /** Reconstructs an already-created filter from an encoding. */
    39-    GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked=false);
    40+    GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked);
    41 
    42     /** Builds a new filter from the params and set of elements. */
    43     GCSFilter(const Params& params, const ElementSet& elements);
    44@@ -122,7 +122,7 @@ public:
    45 
    46     //! Reconstruct a BlockFilter from parts.
    47     BlockFilter(BlockFilterType filter_type, const uint256& block_hash,
    48-                std::vector<unsigned char> filter, bool filter_checked=false);
    49+                std::vector<unsigned char> filter, bool filter_checked);
    50 
    51     //! Construct a new BlockFilter of the specified type from a block.
    52     BlockFilter(BlockFilterType filter_type, const CBlock& block, const CBlockUndo& block_undo);
    53@@ -164,7 +164,7 @@ public:
    54         if (!BuildParams(params)) {
    55             throw std::ios_base::failure("unknown filter_type");
    56         }
    57-        m_filter = GCSFilter(params, std::move(encoded_filter));
    58+        m_filter = GCSFilter(params, std::move(encoded_filter), /*filter_checked=*/false);
    59     }
    60 };
    
  16. kcalvinalvin commented at 3:42 pm on April 13, 2022: contributor

    Sure I can just get rid of the option. I’ll move BlockFilter::Unserialize() to the test files in a follow-up.

    Quick side-question: my initial thought was that it’d be helpful to leave an option in there in case anyone is using the code as a library. In general, does Bitcoin Core development think about potential use cases of the code being used as a library somewhere else?

  17. jonatack commented at 7:38 pm on April 13, 2022: contributor

    @kcalvinalvin maybe decided case-by-case; I don’t recall a recent specific example of leave-it-in-for-reuse-as-a-library being invoked as a rationale for not moving code used only for tests out to the test code.

    Verified that removing BlockFilter::Unserialize() compiles with the following test-only deletions:

     0diff --git a/src/blockfilter.h b/src/blockfilter.h
     1index 21d6a295a2..764395c370 100644
     2--- a/src/blockfilter.h
     3+++ b/src/blockfilter.h
     4@@ -148,24 +148,6 @@ public:
     5           << m_block_hash
     6           << m_filter.GetEncoded();
     7     }
     8-
     9-    template <typename Stream>
    10-    void Unserialize(Stream& s) {
    11-        std::vector<unsigned char> encoded_filter;
    12-        uint8_t filter_type;
    13-
    14-        s >> filter_type
    15-          >> m_block_hash
    16-          >> encoded_filter;
    17-
    18-        m_filter_type = static_cast<BlockFilterType>(filter_type);
    19-
    20-        GCSFilter::Params params;
    21-        if (!BuildParams(params)) {
    22-            throw std::ios_base::failure("unknown filter_type");
    23-        }
    24-        m_filter = GCSFilter(params, std::move(encoded_filter));
    25-    }
    26 };
    27 
    28 #endif // BITCOIN_BLOCKFILTER_H
    29diff --git a/src/test/blockfilter_tests.cpp b/src/test/blockfilter_tests.cpp
    30index 8eb4dbc592..b99d9a32c3 100644
    31--- a/src/test/blockfilter_tests.cpp
    32+++ b/src/test/blockfilter_tests.cpp
    33@@ -106,23 +106,6 @@ BOOST_AUTO_TEST_CASE(blockfilter_basic_test)
    34     for (const CScript& script : excluded_scripts) {
    35         BOOST_CHECK(!filter.Match(GCSFilter::Element(script.begin(), script.end())));
    36     }
    37-
    38-    // Test serialization/unserialization.
    39-    BlockFilter block_filter2;
    40-
    41-    CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
    42-    stream << block_filter;
    43-    stream >> block_filter2;
    44-
    45-    BOOST_CHECK_EQUAL(block_filter.GetFilterType(), block_filter2.GetFilterType());
    46-    BOOST_CHECK_EQUAL(block_filter.GetBlockHash(), block_filter2.GetBlockHash());
    47-    BOOST_CHECK(block_filter.GetEncodedFilter() == block_filter2.GetEncodedFilter());
    48-
    49-    BlockFilter default_ctor_block_filter_1;
    50-    BlockFilter default_ctor_block_filter_2;
    51-    BOOST_CHECK_EQUAL(default_ctor_block_filter_1.GetFilterType(), default_ctor_block_filter_2.GetFilterType());
    52-    BOOST_CHECK_EQUAL(default_ctor_block_filter_1.GetBlockHash(), default_ctor_block_filter_2.GetBlockHash());
    53-    BOOST_CHECK(default_ctor_block_filter_1.GetEncodedFilter() == default_ctor_block_filter_2.GetEncodedFilter());
    54 }
    55 
    56 BOOST_AUTO_TEST_CASE(blockfilters_json_test)
    57diff --git a/src/test/fuzz/deserialize.cpp b/src/test/fuzz/deserialize.cpp
    58index ed6f172a2a..3793921e86 100644
    59--- a/src/test/fuzz/deserialize.cpp
    60+++ b/src/test/fuzz/deserialize.cpp
    61@@ -95,12 +95,6 @@ void DeserializeFromFuzzingInput(FuzzBufferType buffer, T& obj, const std::optio
    62             throw invalid_fuzzing_input_exception();
    63         }
    64     }
    65-    try {
    66-        ds >> obj;
    67-    } catch (const std::ios_base::failure&) {
    68-        throw invalid_fuzzing_input_exception();
    69-    }
    70-    assert(buffer.empty() || !Serialize(obj).empty());
    71 }
    72 
    73 template <typename T>
    74diff --git a/src/test/fuzz/util.h b/src/test/fuzz/util.h
    75index 6c91844633..e8e8e57394 100644
    76--- a/src/test/fuzz/util.h
    77+++ b/src/test/fuzz/util.h
    78@@ -145,11 +145,6 @@ template <typename T>
    79     const std::vector<uint8_t> buffer = ConsumeRandomLengthByteVector(fuzzed_data_provider, max_length);
    80     CDataStream ds{buffer, SER_NETWORK, INIT_PROTO_VERSION};
    81     T obj;
    82-    try {
    83-        ds >> obj;
    84-    } catch (const std::ios_base::failure&) {
    85-        return std::nullopt;
    86-    }
    87     return obj;
    88 }
    
  18. kcalvinalvin force-pushed on Apr 14, 2022
  19. kcalvinalvin force-pushed on Apr 14, 2022
  20. kcalvinalvin force-pushed on Apr 14, 2022
  21. kcalvinalvin commented at 8:45 am on April 14, 2022: contributor
    Addressed all of the changes requested except for removing BlockFilter::Unserialize() and getting rid of the option to sanity check. I’ll do those in a follow up.
  22. jonatack commented at 1:42 pm on April 14, 2022: contributor

    ACK 3e6d98e2dc98a4305a5bbdf4b6bea10bcf4aca71

    modulo:

    • ordering the gcs_filter benchmark methods in the same order as the benchmarks in that file
    • maybe naming the BlockFilterGetHash bench as GCSBlockFilterGetHash so that ./src/bench/bench_bitcoin -filter=GCS*.* can invoke it with the others in that file

    Here is a patch on top of your last push, if helpful, that removes BlockFilter::Unserialize: https://github.com/jonatack/bitcoin/commits/remove-BlockFilter-Unserialize. Though it may be more prudent to move it to the test code and leave the tests in place.

  23. kcalvinalvin force-pushed on Apr 19, 2022
  24. jonatack commented at 3:51 pm on April 19, 2022: contributor

    ACK 0b976a6dc0003898abaef14254f572b58f83926a

    Sanity check:

    Running ./src/bench/bench_bitcoin -filter=GCS*.* on this branch (debug build and not tuned for benchmarking)

    0|             ns/elem |              elem/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|          179,394.67 |            5,574.30 |    0.5% |      0.01 | `GCSBlockFilterGetHash`
    3|            3,376.66 |          296,150.75 |    2.1% |      0.38 | `GCSFilterConstruct`
    4|        5,163,985.00 |              193.65 |    2.5% |      0.06 | `GCSFilterDecode`
    5|            1,926.40 |          519,102.50 |    1.9% |      0.01 | `GCSFilterDecodeChecked`
    6|          577,446.00 |            1,731.76 |    1.0% |      0.01 | `GCSFilterMatch`
    
     0diff --git a/src/blockfilter.cpp b/src/blockfilter.cpp
     1index 08e2e6f72e..9d76d2b60e 100644
     2--- a/src/blockfilter.cpp
     3+++ b/src/blockfilter.cpp
     4@@ -59,7 +59,7 @@ GCSFilter::GCSFilter(const Params& params, std::vector<unsigned char> encoded_fi
     5     }
     6     m_F = static_cast<uint64_t>(m_N) * static_cast<uint64_t>(m_params.m_M);
     7 
     8-    if (filter_checked) return;
     9+    // if (filter_checked) return;
    10 
    11     // Verify that the encoded filter contains exactly N elements. If it has too much or too little
    12     // data, a std::ios_base::failure exception will be raised.
    13diff --git a/src/index/blockfilterindex.cpp b/src/index/blockfilterindex.cpp
    14index 1471c29a85..f83b883c8f 100644
    15--- a/src/index/blockfilterindex.cpp
    16+++ b/src/index/blockfilterindex.cpp
    17@@ -156,9 +156,6 @@ bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, const uint256&
    18     std::vector<uint8_t> encoded_filter;
    19     try {
    20         filein >> block_hash >> encoded_filter;
    21-        uint256 result;
    22-        CHash256().Write(encoded_filter).Finalize(result);
    23-        if (result != hash) return error("Checksum mismatch in filter decode.");
    24         filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter), /*filter_checked=*/true);
    

    …GCSFilterDecodeChecked is far slower and equivalent to GCSFilterDecode as expected

    0|             ns/elem |              elem/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|          165,774.17 |            6,032.30 |    0.5% |      0.01 | `GCSBlockFilterGetHash`
    3|            3,775.71 |          264,850.84 |    0.5% |      0.42 | `GCSFilterConstruct`
    4|        4,960,766.00 |              201.58 |    0.9% |      0.05 | `GCSFilterDecode`
    5|        4,968,437.00 |              201.27 |    1.4% |      0.06 | `GCSFilterDecodeChecked`
    6|          584,923.00 |            1,709.63 |    3.7% |      0.01 | `GCSFilterMatch`
    
  25. kcalvinalvin force-pushed on Apr 20, 2022
  26. kcalvinalvin commented at 6:48 am on April 20, 2022: contributor

    ACK 0b976a6

    Thank you for the speedy review!

    I pushed 04008e2df6e2bac0cc92a2be05fec257a0fe7d7f on top of 0b976a6dc0003898abaef14254f572b58f83926a which does mostly what you did in https://github.com/jonatack/bitcoin/commits/remove-BlockFilter-Unserialize but it keeps the serialize/unserialize test in src/test/blockfilter_tests.cpp with the function

     0template <typename Stream>
     1static BlockFilter UnserializeBlockFilter(Stream& s) {
     2    std::vector<unsigned char> encoded_filter;
     3    uint8_t filter_type_uint8;
     4    uint256 block_hash;
     5
     6    s >> filter_type_uint8
     7      >> block_hash
     8      >> encoded_filter;
     9
    10    BlockFilterType filter_type = static_cast<BlockFilterType>(filter_type_uint8);
    11    BlockFilter block_filter(filter_type, block_hash, std::move(encoded_filter), /*filter_checked=*/false);
    12    return block_filter;
    13}
    

    I think this is a good tradeoff between keeping the serialization in the unit tests but still keeping the maintenance overhead low.

  27. kcalvinalvin force-pushed on Apr 20, 2022
  28. kcalvinalvin commented at 7:38 am on April 20, 2022: contributor

    Looks like you are removing the fuzz test as well? @MarcoFalke Ah my first instinct was to get rid of the entire test as ConsumeDeserializable would no longer be usable. I’ll push a change with the fuzz tests kept in.

  29. kcalvinalvin force-pushed on Apr 21, 2022
  30. kcalvinalvin force-pushed on Apr 21, 2022
  31. jonatack commented at 6:00 pm on April 21, 2022: contributor
    @kcalvinalvin if you move BlockFilter::Unserialize() to the test code in this pull, it will be a smaller change as the first commit rather than last one, WDYT? Edit: maybe a first commit that adds the check in ReadFilterFromDisk() and rename the commit, then the one that moves Unserialize() to the test code.
  32. kcalvinalvin commented at 7:33 am on April 22, 2022: contributor

    @kcalvinalvin if you move BlockFilter::Unserialize() to the test code in this pull, it will be a smaller change as the first commit rather than last one, WDYT? Edit: maybe a first commit that adds the check in ReadFilterFromDisk() and rename the commit, then the one that moves Unserialize() to the test code.

    I’m ok with that if it’d lower the cost of reviewing. However, I’m thinking if the cost of reviewing would be even lower if I have commit 447174d208bc3fdd32874656344a736edf949e63 be in a separate PR. I’m sorta thinking maybe getting rid of Unserialize() and replacing GolombRiceDecode() check are two separate things. Let me know what you’d think would lower the cost of reviewing :)

  33. jonatack commented at 8:10 am on April 22, 2022: contributor
  34. kcalvinalvin force-pushed on Apr 22, 2022
  35. kcalvinalvin commented at 4:03 pm on April 22, 2022: contributor
    Reverted remove the last commit. Latest commit is now back to 0b976a6dc0003898abaef14254f572b58f83926a. Will make a follow up PR after this one.
  36. in src/blockfilter.cpp:50 in c35df324b1 outdated
    46@@ -47,7 +47,7 @@ GCSFilter::GCSFilter(const Params& params)
    47     : m_params(params), m_N(0), m_F(0), m_encoded{0}
    48 {}
    49 
    50-GCSFilter::GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter)
    51+GCSFilter::GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked)
    


    ryanofsky commented at 4:13 pm on April 28, 2022:

    In commit “Make sanity check in GCSFilter constructor optional” (c35df324b19b94aaecfbcd26111b7d00c5bed4e0)

    Requesting change: Can we rename filter_checked to skip_checks or skip_decode_check. I spent an embarrassing amount of time being confused by the changes in this PR because I interpreted filter_checked to mean “filter should be checked” not “filter should not be checked” which is what it seems to actually mean.


    kcalvinalvin commented at 7:10 am on May 2, 2022:
    Changed filter_checked to skip_decode_check
  37. ryanofsky approved
  38. ryanofsky commented at 4:15 pm on April 28, 2022: contributor
    Code review ACK 0b976a6dc0003898abaef14254f572b58f83926a
  39. fanquake requested review from mzumsande on Apr 29, 2022
  40. Make sanity check in GCSFilter constructor optional
    BlockFilterIndex will perform the cheaper check of verifying the filter
    hash when reading the filter from disk.
    b0a53d50d9
  41. kcalvinalvin force-pushed on May 2, 2022
  42. ryanofsky approved
  43. ryanofsky commented at 8:12 pm on May 2, 2022: contributor
    Code review ACK b97685965c6646b3a8521eb49d1f79b03a1b473f. Just renamed filter_checked variable since last review (thanks!)
  44. fanquake removed review request from mzumsande on May 3, 2022
  45. fanquake requested review from mzumsande on May 3, 2022
  46. jonatack approved
  47. jonatack commented at 3:35 pm on May 3, 2022: contributor
    Diff-review-only ACK b97685965c6646b3a8521eb49d1f79b03a1b473f per git diff 0b976a6 b976859, only change since my previous full review at #24832 (comment) is renaming the bool param added to GCSFilter() from filter_checked to skip_decode_check – this param is expected to go away in the planned follow-up commit 04008e2df6e2bac0cc92a2be05fec257a0fe7d7f to remove BlockFilter::Unserialize()
  48. in src/bench/gcs_filter.cpp:28 in a06ecf3579 outdated
    23+    auto elements = GenerateGCSTestElements();
    24+
    25+    GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    26+    BlockFilter block_filter(BlockFilterType::BASIC, {}, filter.GetEncoded(), /*skip_decode_check=*/false);
    27+
    28+    bench.unit("elem").run([&] {
    


    mzumsande commented at 7:12 pm on May 3, 2022:

    Here and in the other added benchmarks. I don’t think “elem” as a custom unit is correct. My understanding is that it was used in GCSFilterConstruct together with .batch(elements.size()) in order to list the time per element because we expect the operation to scale with the number of elements (10k).

    But using it without .batch() in something like GCSBlockFilterGetHash that doesn’t really scale linearly with the number of elements doesn’t seem correct, the default “op” seems like a better fit.


    kcalvinalvin commented at 7:35 am on May 8, 2022:
    Will change all the units to op except for GCSFilterConstruct as that benchmark still is done with the 10k batch.

    mzumsande commented at 10:31 am on May 8, 2022:
    no need to type “op” (it’s the default), could just remove the .unit(...) as in most other benchmarks.

    kcalvinalvin commented at 11:19 am on May 8, 2022:
    Also fixed to just be bench.run
  49. in src/bench/gcs_filter.cpp:26 in a06ecf3579 outdated
    14@@ -15,29 +15,56 @@ static void ConstructGCSFilter(benchmark::Bench& bench)
    15         elements.insert(std::move(element));
    16     }
    17 
    18+    return elements;
    19+}
    20+
    21+static void GCSBlockFilterGetHash(benchmark::Bench& bench)
    


    mzumsande commented at 8:13 pm on May 3, 2022:
    Could name it GCSFilterGetHash instead of GCSBlockFilterGetHash for consistency with the other benchmarks.
  50. kcalvinalvin force-pushed on May 8, 2022
  51. kcalvinalvin force-pushed on May 8, 2022
  52. mzumsande commented at 4:51 pm on May 8, 2022: contributor
    Code review ACK b32e96578182b9af3283c12829d080fc4bba2be4
  53. jonatack commented at 8:11 pm on May 9, 2022: contributor
     0--- a/src/bench/gcs_filter.cpp
     1+++ b/src/bench/gcs_filter.cpp
     2@@ -5,10 +5,12 @@
     3 #include <bench/bench.h>
     4 #include <blockfilter.h>
     5 
     6+static constexpr int NUM_ELEMENTS = 10'000;
     7+
     8 static const GCSFilter::ElementSet GenerateGCSTestElements()
     9 {
    10     GCSFilter::ElementSet elements;
    11-    for (int i = 0; i < 10000; ++i) {
    12+    for (int i = 0; i < NUM_ELEMENTS; ++i) {
    13         GCSFilter::Element element(32);
    14         element[0] = static_cast<unsigned char>(i);
    15         element[1] = static_cast<unsigned char>(i >> 8);
    16@@ -18,15 +20,27 @@ static const GCSFilter::ElementSet GenerateGCSTestElements()
    17     return elements;
    18 }
    19 
    20+static void GCSBlockFilterGetHash(benchmark::Bench& bench)
    21+{
    22+    auto elements = GenerateGCSTestElements();
    23+
    24+    GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    25+    BlockFilter block_filter(BlockFilterType::BASIC, {}, filter.GetEncoded(), /*skip_decode_check=*/false);
    26+
    27+    bench.batch(NUM_ELEMENTS).run([&] {
    28+        block_filter.GetHash();
    29+    });
    30+}
    31+
    32 static void GCSFilterConstruct(benchmark::Bench& bench)
    33 {
    34     auto elements = GenerateGCSTestElements();
    35 
    36     uint64_t siphash_k0 = 0;
    37-    bench.batch(elements.size()).unit("elem").run([&] {
    38+    bench.batch(NUM_ELEMENTS).run([&] {
    39         GCSFilter filter({siphash_k0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    40 
    41-        siphash_k0++;
    42+        ++siphash_k0;
    43     });
    44 }
    45 
    46@@ -37,7 +51,7 @@ static void GCSFilterDecode(benchmark::Bench& bench)
    47     GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    48     auto encoded = filter.GetEncoded();
    49 
    50-    bench.run([&] {
    51+    bench.batch(NUM_ELEMENTS).run([&] {
    52         GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, /*skip_decode_check=*/false);
    53     });
    54 }
    55@@ -49,35 +63,24 @@ static void GCSFilterDecodeChecked(benchmark::Bench& bench)
    56     GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    57     auto encoded = filter.GetEncoded();
    58 
    59-    bench.run([&] {
    60+    bench.batch(NUM_ELEMENTS).run([&] {
    61         GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, /*skip_decode_check=*/true);
    62     });
    63 }
    64 
    65-static void GCSFilterGetHash(benchmark::Bench& bench)
    66-{
    67-    auto elements = GenerateGCSTestElements();
    68-
    69-    GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    70-    BlockFilter block_filter(BlockFilterType::BASIC, {}, filter.GetEncoded(), /*skip_decode_check=*/false);
    71-
    72-    bench.run([&] {
    73-        block_filter.GetHash();
    74-    });
    75-}
    76-
    77 static void GCSFilterMatch(benchmark::Bench& bench)
    78 {
    79     auto elements = GenerateGCSTestElements();
    80 
    81     GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    82 
    83-    bench.run([&] {
    84+    bench.batch(NUM_ELEMENTS).run([&] {
    85         filter.Match(GCSFilter::Element());
    86     });
    87 }
    88+
    89+BENCHMARK(GCSBlockFilterGetHash);
    90 BENCHMARK(GCSFilterConstruct);
    91 BENCHMARK(GCSFilterDecode);
    92 BENCHMARK(GCSFilterDecodeChecked);
    93-BENCHMARK(GCSFilterGetHash);
    94 BENCHMARK(GCSFilterMatch);
    

    Output of last push

    0|             ns/elem |              elem/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|            2,257.12 |          443,042.46 |    3.1% |      0.25 | `GCSFilterConstruct`
    3
    4|               ns/op |                op/s |    err% |     total | benchmark
    5|--------------------:|--------------------:|--------:|----------:|:----------
    6|        2,800,765.00 |              357.05 |    3.1% |      0.03 | `GCSFilterDecode`
    7|            1,218.41 |          820,739.84 |   11.4% |      0.01 | `GCSFilterDecodeChecked`
    8|           93,894.33 |           10,650.27 |    4.1% |      0.01 | `GCSFilterGetHash`
    9|          315,092.00 |            3,173.68 |    2.7% |      0.01 | `GCSFilterMatch`
    

    Output with proposed diff

    0|               ns/op |                op/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                9.02 |      110,833,467.68 |    0.0% |      0.01 | `GCSBlockFilterGetHash`
    3|            2,183.44 |          457,992.58 |    1.3% |      0.24 | `GCSFilterConstruct`
    4|              264.56 |        3,779,922.34 |    1.7% |      0.03 | `GCSFilterDecode`
    5|                0.11 |    9,386,117,464.60 |    1.6% |      0.01 | `GCSFilterDecodeChecked`
    6|               30.60 |       32,684,901.21 |    1.6% |      0.01 | `GCSFilterMatch`
    

    Other notes:

    • Could be wrong but it seems to me the GCSBlockFilterGetHash name is preferable, as that describes the particularity of that bench.
    • I’m indifferent to “op” or “elem”; “op” for all the benchmarks here is fine.
  54. mzumsande commented at 9:58 pm on May 9, 2022: contributor

    Could be wrong but it seems to me the GCSBlockFilterGetHash name is preferable, as that describes the particularity of that bench.

    No, you are right, I missed before that this benchmark is on the level of BlockFilter, whereas the others are on the level of GCSFilter, so I agree.

    Rationale: compare all operations on the same basis (per operation).

    I’m indifferent to “op” or “elem”; “op” for all the benchmarks here is fine.

    My understanding of the bench framework (correct me if I’m wrong!) is that ns/op=10000 should be read such that one operation (whatever is inside of run()) takes 10000 ns on average. If we then choose .batch(NUM_ELEMENTS), we’d expect this number to be divided by NUM_ELEMENTS and thus go down to ns/elem=1 on average. So this means that it takes 1ns to process 1 filter element (so using the default ns/op would be kind of misleading in combination with .batch()). For this to make sense, there should be a linear relation between the custom unit (“elements”) and the time, so if we used .batch(NUM_ELEMENTS) and set NUM_ELEMENTS=100,000 instead of 10,000 the benchmark results shouldn’t change - which seems not really the case for some of the benchmarks here, at least locally for me. So maybe using the default ns/op and no .batch() for each benchmark would be the best here? Does that make sense?

  55. jonatack commented at 10:21 pm on May 9, 2022: contributor
    I need to verify after tuning for benchmarking. My hypothesis (to be confirmed or not) is that batch appears to behave like a linear scale factor. I agree that it seems good to use batch for all of them (result per op) or for none (result per batched op) to make comparing intra-bench results more intuitive.
  56. jonatack commented at 10:48 pm on May 9, 2022: contributor
    (Also to reverify: lower err% with batch?)
  57. kcalvinalvin commented at 8:39 am on May 10, 2022: contributor
    Changing NUM_ELEMENTS to 100,000 from 10,000 results in different ns/elem with my untuned system. I’ll try running the benchmark with a tuned system.
  58. kcalvinalvin commented at 7:46 pm on May 20, 2022: contributor

    My system

    I got around to doing the benchmarks with a (sorta) tuned system. I wasn’t able to achieve a completely stable benchmark system but the error rates are much lower now than before.

    output of sudo pyperf system:

     0[I] calvin@bitcoin ~/b/c/l/bitcoin (2020-06-14-blockfilterindex-checksums) [1]> sudo pyperf system
     1Show the system configuration
     2
     3System state
     4============
     5
     6CPU: use 6 logical CPUs: 0-5
     7Perf event: Maximum sample rate: 1 per second
     8ASLR: Full randomization
     9Linux scheduler: Isolated CPUs (6/12): 0-5
    10Linux scheduler: RCU disabled on CPUs (6/12): 0-5
    11CPU Frequency: 0-5=min=max=3600 MHz; 6-11=min=2200 MHz, max=3600 MHz
    12IRQ affinity: irqbalance service: inactive
    13IRQ affinity: Default IRQ affinity: CPU 6-11
    14IRQ affinity: IRQ affinity: IRQ 0,2=CPU 0-11; IRQ 1,3-15,25,40-41,73,75,77-78,80-89,91,93=CPU 6-11; IRQ 42,57=CPU 0; IRQ 43,58=CPU 1; IRQ 44,59=CPU 2; IRQ 45,60=CPU 3; IRQ 46,61=CPU 4; IRQ 47,62=CPU 5; IRQ 48,63=CPU 6; IRQ 49,64=CPU 7; IRQ 50,65=CPU 8; IRQ 51,66=CPU 9; IRQ 52,67=CPU 10; IRQ 53,68=CPU 11; IRQ 54-56,69-72=CPU None
    15
    16Warnings
    17========
    18
    19Turbo Boost (MSR): Failed to read MSR 0x1a0 from /dev/cpu/0/msr: [Errno 5] Input/output error
    20
    21OK! System ready for benchmarking
    

    For some reason pyperf doesn’t seem to recognize amd cpus with boost turned off. Boost is manually turned off by writing 0 to /sys/devices/system/cpu/cpufreq/boost.

    cat /sys/devices/system/cpu/cpufreq/boost output:

    0[N] calvin@bitcoin ~/b/c/l/bitcoin (2020-06-14-blockfilterindex-checksums)> cat /sys/devices/system/cpu/cpufreq/boost
    10
    

    I get this output when running the benchmarks:

    0Warning, results might be unstable:
    1* CPU frequency scaling enabled: CPU 6 between 2,200.0 and 3,600.0 MHz
    2* Turbo is enabled, CPU frequency will fluctuate
    3
    4Recommendations
    5* Use 'pyperf system tune' before benchmarking. See https://github.com/psf/pyperf
    

    But I believe pyperf only uses the cpus that are isolated from the kernel scheduler so I believe the warning will not affect the benchmarks. I double checked and also isolated CPU 6 and pyperf repeated the same warning with CPU 7 so I think it’s ok. Pyperf also doesn’t support turning off boost for amd cpus so I believe that warning is an error by pyperf.

    Benchmarks

    I tested to see if the results would be linear across many different NUM_ELEMENTs. Since the original benchmarks were done with 10,000, I took powers of 10 for the NUM_ELEMENTs for my test, ranging from 1-100,000,000.

    The benchmarks were done with batching and without batching. The batched test resulted in a graph like so:

    results_from_benchmarks_ns-elem

    The tests done without batching resulted in a graph like so:

    results_from_benchmarks_ns-op

    From these benchmarks, you can see that there is a linear relation between the custom unit (“elements”) and the time once you have 100,000 elements. Anything below 100,000 doesn’t result in a linear relation.

    ns/elem is easier to read than ns/op since 9.7million is kinda hard to read. However, you can also take a look at op/s which is just as easy to read as ns/elem. Here’s a graph without batch with op/s instead of ns/op.

    results_from_benchmarks_op-s

    Error rates with and without batch

    I also graphed how the error rates compare between calling batch and not calling batch.

    error_rates_for_benchmarks

    Average error rate in % with batch: 0.38888888888888884 Average error rate in % without batch: 0.4

    Calling batch doesn’t seem to effect the error rates in my testing.

    My takeaway

    • Lower NUM_ELEMENT count seem to result in ns/elem that’s same for different NUM_ELEMENT values. Needs a minimum of 100,000 elements.
    • It’d be better to unify everything under ns/op as batch doesn’t provide any less error rates.

    txt file for the raw benchmark results

  59. Add GCSFilterDecode and GCSBlockFilterGetHash benchmarks.
    All of the benchmarks are standardized on the BASIC filter parameters
    so we can compare between all the benchmarks. All the GCS
    benchmarks are renamed to have "GCS" as the prefix.
    299023c1d9
  60. Add GCSFilterDecodeSkipCheck benchmark
    This benchmark allows us to compare the differences between doing the
    sanity check for corruption via GolombRiceDecode() vs checking the hash
    of the encoded block filter.
    aee9a8140b
  61. Update GCSFilter benchmarks
    Element count used in the GCSFilter benchmarks are increased to 100,000
    from 10,000. Testing the benchmarks with different element counts showed
    that a filter with 100,000 elements resulted in the same ns/op. This
    this a desirable thing to have as it allows us to reason about how long
    a single filter element takes to process, letting us easily calculate
    how long a filter with N elements (where N > 100,000) would take to
    process.
    
    GCSFilterConstruct benchmark is now called without batch. This makes
    intra-bench results more intuitive as all benchmarks are in ns/op
    instead of a custom unit. There are no downsides to this change as
    testing showed that there is no observable difference in error rates
    in the benchmarks when calling without batch.
    e734228d85
  62. kcalvinalvin force-pushed on May 22, 2022
  63. kcalvinalvin commented at 5:42 am on May 22, 2022: contributor

    4 things changed in the latest push:

    1. Renamed the GCSFilterGetHash benchmark back to GCSBlockFilterGetHash.
    2. Renamed GCSFilterDecodeChecked to GCSFilterDecodeSkipCheck as filter_checked argument was renamed to skip_decode_check. Changing the name of the benchmark as well seemed fitting.
    3. Added a commit where GenerateGCSTestElements generates 100,000 elements instead of 10,000. The reasoning was based off of my findings in testing the benchmarks. I’ve included an explanation in the code as a comment and in the commit message.
    4. In the same added commit, I’ve changed the GCSFilterConstruct to be called without batch and the custom unit. This was done as it makes comparing intra-bench results more intuitive. Calling with batch didn’t result in lower error rates either so there were no tradeoffs made.
  64. kcalvinalvin requested review from jonatack on May 30, 2022
  65. kcalvinalvin requested review from mzumsande on May 30, 2022
  66. mzumsande commented at 8:44 pm on May 31, 2022: contributor

    Code Review ACK e734228d8585c0870c71ce8ba8c037f8cf8b249a

    I’m not sure if it really matters a lot how big we choose the number of elements used in the test, I just think that if we pick custom measure other than “ops”, the benchmark should be linear in that - so I like that everything uses ns/op now. In any case, the important feature of this PR is the improvement of the filter verification.

  67. in src/index/blockfilterindex.cpp:163 in e734228d85
    160-        filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter));
    161+        uint256 result;
    162+        CHash256().Write(encoded_filter).Finalize(result);
    163+        if (result != hash) return error("Checksum mismatch in filter decode.");
    164+        filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter), /*skip_decode_check=*/true);
    165     }
    


    stickies-v commented at 11:27 pm on June 23, 2022:
    Would you consider moving the hash checking into a separate function for readability and maintainability?
  68. in src/index/blockfilterindex.cpp:147 in e734228d85
    143@@ -143,18 +144,22 @@ bool BlockFilterIndex::CommitInternal(CDBBatch& batch)
    144     return BaseIndex::CommitInternal(batch);
    145 }
    146 
    147-bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter) const
    148+bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, const uint256& hash, BlockFilter& filter) const
    


    stickies-v commented at 11:32 pm on June 23, 2022:

    nit: Since this function also has a block_hash variable, maybe filter_hash would be a clearer disambiguation?

    0bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, const uint256& filter_hash, BlockFilter& filter) const
    
  69. in src/index/blockfilterindex.cpp:159 in e734228d85
    156     uint256 block_hash;
    157     std::vector<uint8_t> encoded_filter;
    158     try {
    159         filein >> block_hash >> encoded_filter;
    160-        filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter));
    161+        uint256 result;
    


    stickies-v commented at 11:33 pm on June 23, 2022:

    calculated_hash might be more intuitive?

    0        uint256 calculated_hash;
    
  70. in src/index/blockfilterindex.cpp:160 in e734228d85
    157     std::vector<uint8_t> encoded_filter;
    158     try {
    159         filein >> block_hash >> encoded_filter;
    160-        filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter));
    161+        uint256 result;
    162+        CHash256().Write(encoded_filter).Finalize(result);
    


    stickies-v commented at 11:38 pm on June 23, 2022:

    I think {} list initialization is generally preferred? Here, and in some other places in the PR.

    0        CHash256{}.Write(encoded_filter).Finalize(result);
    
  71. in src/index/blockfilterindex.cpp:161 in e734228d85
    158     try {
    159         filein >> block_hash >> encoded_filter;
    160-        filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter));
    161+        uint256 result;
    162+        CHash256().Write(encoded_filter).Finalize(result);
    163+        if (result != hash) return error("Checksum mismatch in filter decode.");
    


    stickies-v commented at 11:42 pm on June 23, 2022:
    Would it be helpful to include pos in the error message? Also, not sure if “filter decode” is the most accurate description? What about e.g. “Checksum mismatch for filter loaded from <pos>”?
  72. theStack commented at 1:33 pm on July 4, 2022: contributor
    Concept ACK
  73. theStack approved
  74. theStack commented at 11:23 am on July 6, 2022: contributor

    Code-review ACK e734228d8585c0870c71ce8ba8c037f8cf8b249a

    FWIW, this are the benchmark results on my machine:

    0$ ./src/bench/bench_bitcoin -filter=GCS.*
    
    ns/op op/s err% total benchmark
    719,504.00 1,389.85 2.4% 0.01 GCSBlockFilterGetHash
    28,431,064.00 35.17 4.0% 0.31 GCSFilterConstruct
    3,641,177.00 274.64 3.4% 0.04 GCSFilterDecode
    13,994.04 71,458.99 1.5% 0.01 GCSFilterDecodeSkipCheck
    424,500.50 2,355.71 2.2% 0.01 GCSFilterMatch
  75. fanquake requested review from ryanofsky on Jul 6, 2022
  76. stickies-v approved
  77. stickies-v commented at 4:10 pm on July 6, 2022: contributor

    ACK e734228d8585c0870c71ce8ba8c037f8cf8b249a

    Non-controversial and significant performance improvement when loading block filters from disk, which can be a frequent process depending on peers or RPC users. I’ve left some style/readability suggestions but none of them are blocking for me.

    If BlockFilter::Unserialize() is removed in a follow up, further code cleanup is possible by removing the skip_decode_check parameter and the check branch.

    $ ./src/bench/bench_bitcoin -filter=GCS.* yields

    0|               ns/op |                op/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|          766,333.00 |            1,304.92 |    0.8% |      0.01 | `GCSBlockFilterGetHash`
    3|       60,891,167.00 |               16.42 |    0.3% |      0.67 | `GCSFilterConstruct`
    4|       15,802,125.00 |               63.28 |    0.3% |      0.17 | `GCSFilterDecode`
    5|        1,697,875.00 |              588.97 |    0.2% |      0.02 | `GCSFilterDecodeSkipCheck`
    6|        1,687,000.00 |              592.77 |    0.1% |      0.02 | `GCSFilterMatch`
    
  78. ryanofsky approved
  79. ryanofsky commented at 4:34 pm on July 7, 2022: contributor
    Code review ACK e734228d8585c0870c71ce8ba8c037f8cf8b249a, with caveat that I mostly paid attention to the main code, not the changes to the benchmark. Only changes since last review were changes to the benchmark code.
  80. fanquake commented at 4:59 pm on July 7, 2022: member
    @stickies-v maybe you’d like to follow up here?
  81. fanquake merged this on Jul 7, 2022
  82. fanquake closed this on Jul 7, 2022

  83. jonatack commented at 5:02 pm on July 7, 2022: contributor
    If someone follows up, this comment and patch might be useful: #24832 (comment)
  84. kcalvinalvin commented at 5:08 pm on July 7, 2022: contributor
  85. sidhujag referenced this in commit a42480f022 on Jul 8, 2022
  86. kcalvinalvin referenced this in commit f5563da683 on Jul 19, 2022
  87. kcalvinalvin referenced this in commit 32953718a3 on Jul 27, 2022
  88. jonatack commented at 1:56 pm on July 27, 2022: contributor
    Post-merge re-ACK e734228d8585c0870c71ce8ba8c037f8cf8b249a, thanks for updating.
  89. kcalvinalvin referenced this in commit 727648a515 on Aug 5, 2022
  90. kcalvinalvin referenced this in commit ac4bf1ec5d on Aug 5, 2022
  91. kcalvinalvin referenced this in commit 1204e3d4d3 on Aug 5, 2022
  92. kcalvinalvin referenced this in commit 9b87854cf4 on Aug 30, 2022
  93. kcalvinalvin referenced this in commit 51fd464150 on Aug 31, 2022
  94. kcalvinalvin referenced this in commit 07fdd10ca1 on Aug 31, 2022
  95. bitcoin locked this on Jul 27, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-31 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me