index: Remove unused BlockFilter::Unserialize() #25637

pull kcalvinalvin wants to merge 1 commits into bitcoin:master from kcalvinalvin:2022-07-13-remove-unused-blockfilter-unserialize changing 7 files +37 −72
  1. kcalvinalvin commented at 5:16 am on July 19, 2022: contributor

    BlockFilter::Unserialize is only used in the test code. Moving it to the tests allows for the removal of the optional sanity check in the GCSFilter and BlockFilter constructor, resulting in simpler code.

    This is a follow up to #24832.

  2. DrahtBot added the label UTXO Db and Indexes on Jul 19, 2022
  3. jonatack commented at 10:19 am on July 19, 2022: contributor

    ACK

    Some nits if you retouch:

    • Not sure all the scope brackets in the tests are useful.
    • Prefere prefix iterators (faster due to avoiding a copy), see our developer notes
    • Avoid unneeded temporaries in loop
    • s/insert/emplace/ where it makes sense
    • named params and constify where helpful/clearer
     0diff --git a/src/test/blockfilter_tests.cpp b/src/test/blockfilter_tests.cpp
     1index a17a71bf9f..a20b49e0a7 100644
     2--- a/src/test/blockfilter_tests.cpp
     3+++ b/src/test/blockfilter_tests.cpp
     4@@ -62,7 +62,7 @@ static BlockFilter UnserializeBlockFilter(Stream& s) {
     5       >> block_hash
     6       >> encoded_filter;
     7 
     8-    BlockFilterType filter_type = static_cast<BlockFilterType>(filter_type_uint8);
     9+    const BlockFilterType filter_type = static_cast<BlockFilterType>(filter_type_uint8);
    10     BlockFilter block_filter(filter_type, block_hash, std::move(encoded_filter));
    11     return block_filter;
    12 }
    13diff --git a/src/test/fuzz/blockfilter.cpp b/src/test/fuzz/blockfilter.cpp
    14index db3788d46f..2a1d81d970 100644
    15--- a/src/test/fuzz/blockfilter.cpp
    16+++ b/src/test/fuzz/blockfilter.cpp
    17@@ -22,23 +22,17 @@ FUZZ_TARGET(blockfilter)
    18 
    19     GCSFilter::ElementSet elements;
    20     size_t num_elements = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(1, 100000);
    21-    for (size_t i=0; i < num_elements; i++) {
    22-        std::vector<unsigned char> element = ConsumeRandomLengthIntegralVector<unsigned char>(fuzzed_data_provider, 32);
    23-        elements.insert(element);
    24+    for (size_t i=0; i < num_elements; ++i) {
    25+        elements.emplace(ConsumeRandomLengthIntegralVector<unsigned char>(fuzzed_data_provider, /*max_vector_size=*/32));
    26char>(fuzzed_data_provider, 32));
    27     }
    28 
    29     GCSFilter filter({u256->GetUint64(0), u256->GetUint64(1), BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    30-    BlockFilter block_filter(BlockFilterType::BASIC, u256.value(), filter.GetEncoded());
    31-    {
    32-        (void)block_filter.ComputeHeader(ConsumeUInt256(fuzzed_data_provider));
    33-        (void)block_filter.GetBlockHash();
    34-        (void)block_filter.GetEncodedFilter();
    35-        (void)block_filter.GetHash();
    36-    }
    37-    {
    38-        const BlockFilterType block_filter_type = block_filter.GetFilterType();
    39-        (void)BlockFilterTypeName(block_filter_type);
    40-    }
    41+    const BlockFilter block_filter(BlockFilterType::BASIC, u256.value(), filter.GetEncoded());
    42+    (void)block_filter.ComputeHeader(ConsumeUInt256(fuzzed_data_provider));
    43+    (void)block_filter.GetBlockHash();
    44+    (void)block_filter.GetEncodedFilter();
    45+    (void)block_filter.GetHash();
    46+    (void)BlockFilterTypeName(block_filter.GetFilterType());
    47     {
    48         const GCSFilter gcs_filter = block_filter.GetFilter();
    49         (void)gcs_filter.GetN();
    
  4. shaavan commented at 1:54 pm on July 20, 2022: contributor

    ACK f5563da6836433a08438893c77bd87028e4b026c

    Since it is my first time reviewing a PR concerning filters and streams, I want to detail my understanding of this PR before proceeding with the review. If my account is erroneous, please do correct me. Thank you.

    • Normally, the filters are initialized manually and are not read from a stream. So we can be sure that the encoded_filter always contains the correct number of bits.

    Eg.

    0GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
    
    • And when the encoded_filter is read from a data stream in a function, a preliminary check is done before passing it through the BlockFilter function.

    Eg.

    0filein >> block_hash >> encoded_filter;
    1uint256 result;
    2CHash256().Write(encoded_filter).Finalize(result);
    3if (result != hash) return error("Checksum mismatch in filter decode.");
    
    • The only function which allows getting the encoded_filter from the data stream and passing it through the GCSFilter() function without any preliminary checks is the Unserialize function.
    • Since this function is used only in the testing, it could be moved to the test file, which removes the need for checking encoded_filter correctness in the BlockFilter/GCSFilter functions.
    • In the test file, the correctness of encoded_filter is checked by checking if the encoded_filter remains unchanged after moving block_filter in and out of the stream.
    0stream << block_filter;
    1stream >> block_filter2;
    2block_filter2 = UnserializeBlockFilter(stream);
    3...
    4BOOST_CHECK(block_filter.GetEncodedFilter() == block_filter2.GetEncodedFilter()); 
    

    Review:

    • I agree with moving the Unserialize function to the test file as it simplifies the code’s logic while making it more straightforward.
    • The code changes are clean and logical.
    • And successful passing of CI shows that the fuzz testing was successful on the PR.
    • I would like to suggest addressing @JonAtack suggestions before merging this PR.
  5. DrahtBot commented at 11:09 am on July 22, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26158 (bench: add “priority level” to the benchmark framework by furszy)
    • #25296 (Add DataStream without ser-type and ser-version and use it where possible by MarcoFalke)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  6. kcalvinalvin force-pushed on Jul 27, 2022
  7. kcalvinalvin commented at 7:52 am on July 27, 2022: contributor
    • Not sure all the scope brackets in the tests are useful.
    • Prefere prefix iterators (faster due to avoiding a copy), see our developer notes
    • Avoid unneeded temporaries in loop
    • s/insert/emplace/ where it makes sense
    • named params and constify where helpful/clearer

    Pushed new code with the suggested changes stated here.

  8. shaavan approved
  9. shaavan commented at 1:28 pm on July 27, 2022: contributor

    reACK 32953718a3fb8e2f298b99e1a2ee72af1ea5be8c

    Changes since my last review:

    I verified that the update adequately addresses all the suggestions in this comment.

  10. fanquake commented at 1:31 pm on July 27, 2022: member
    cc @mzumsande @ryanofsky @stickies-v @theStack given you all reviewed #24832.
  11. jonatack commented at 1:40 pm on July 27, 2022: contributor
    ACK 32953718a3fb8e2f298b99e1a2ee72af1ea5be8c edit: modulo removing the GCSFilterDecodeSkipCheck() benchmark as pointed out below
  12. in src/test/fuzz/blockfilter.cpp:18 in 32953718a3 outdated
    14@@ -15,22 +15,26 @@
    15 FUZZ_TARGET(blockfilter)
    16 {
    17     FuzzedDataProvider fuzzed_data_provider(buffer.data(), buffer.size());
    18-    const std::optional<BlockFilter> block_filter = ConsumeDeserializable<BlockFilter>(fuzzed_data_provider);
    19-    if (!block_filter) {
    20+    const std::optional<uint256> u256 = ConsumeDeserializable<uint256>(fuzzed_data_provider);
    


    stickies-v commented at 2:54 pm on July 27, 2022:
    nit: I don’t understand the fuzzing code very well yet, but is there no more descriptive name than u256?

    kcalvinalvin commented at 12:56 pm on August 2, 2022:
    It’s mainly being used as a blockhash so I guess I could rename it to be block_hash
  13. in src/test/blockfilter_tests.cpp:130 in 32953718a3 outdated
    126@@ -112,7 +127,7 @@ BOOST_AUTO_TEST_CASE(blockfilter_basic_test)
    127 
    128     CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
    129     stream << block_filter;
    130-    stream >> block_filter2;
    131+    block_filter2 = UnserializeBlockFilter(stream);
    


    stickies-v commented at 3:01 pm on July 27, 2022:

    nit: I’d just initialize at once:

    0    BlockFilter block_filter2 {UnserializeBlockFilter(stream)};
    
  14. in src/bench/gcs_filter.cpp:70 in 32953718a3 outdated
    66@@ -67,7 +67,7 @@ static void GCSFilterDecodeSkipCheck(benchmark::Bench& bench)
    67     auto encoded = filter.GetEncoded();
    68 
    69     bench.run([&] {
    70-        GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, /*skip_decode_check=*/true);
    


    stickies-v commented at 3:23 pm on July 27, 2022:
    I think the entire GCSFilterDecodeSkipCheck() test can be removed? It’s identical to GCSFilterDecode()

    jonatack commented at 3:51 pm on July 27, 2022:
    Indeed, thanks for noticing :+1:

    kcalvinalvin commented at 12:52 pm on August 2, 2022:
    Good catch. I’ll remove it
  15. stickies-v commented at 3:24 pm on July 27, 2022: contributor

    Approach ACK 32953718a , with the caveat that I don’t understand the fuzzing code very well so count that as unreviewed from my end (even though I left a small nit)

    I think GCSFilterDecodeSkipCheck() should be removed before ACK. Besides that, left 2 style comments but none blocking.

    Summary: #24832 introduced a new file hash-based check to ensure filter integrity. This hash check (in BlockFilterIndex::ReadFilterFromDisk) can not be used when filters are unserialized from a stream because the hash is unknown, but that unserialization only happened in the test suite. As such, this PR moving unserialization logic to /src/test/ and simplifying the non-test code removing the skip_decode_check parameter and path makes sense.

  16. in src/test/fuzz/blockfilter.cpp:21 in 32953718a3 outdated
    14@@ -15,22 +15,26 @@
    15 FUZZ_TARGET(blockfilter)
    16 {
    17     FuzzedDataProvider fuzzed_data_provider(buffer.data(), buffer.size());
    18-    const std::optional<BlockFilter> block_filter = ConsumeDeserializable<BlockFilter>(fuzzed_data_provider);
    19-    if (!block_filter) {
    20+    const std::optional<uint256> u256 = ConsumeDeserializable<uint256>(fuzzed_data_provider);
    21+    if (!u256) {
    22         return;
    23     }
    


    theStack commented at 7:13 am on July 28, 2022:

    I’m not too deep into fuzzing, but IIUC, we just want a random uint256 here and there is not really any benefit by involving a deserialization (that could fail)?

    0    const uint256 u256 = ConsumeUInt256(fuzzed_data_provider);
    

    Could also limit this variable’s scope by moving it down before it’s first use (that is, before GCSFilter is constructed).


    kcalvinalvin commented at 12:57 pm on August 2, 2022:

    we just want a random uint256 here and there is not really any benefit by involving a deserialization (that could fail)?

    Yeah that’s right. I’ll apply the suggested change.

    Could also limit this variable’s scope by moving it down before it’s first use (that is, before GCSFilter is constructed).

    I’m also using the it for entropy with the siphash key. But I could get rid of that and just use the default key and initialize it right before the filter construction.

    Oops this is also done during the filter construction. I’ll move it down.

  17. theStack approved
  18. theStack commented at 7:23 am on July 28, 2022: contributor

    Code-review ACK 32953718a3fb8e2f298b99e1a2ee72af1ea5be8c (modulo potential benchmark/fuzz test improvements)

    Left a nit regarding the fuzz test below. I also agree that the benchmark GCSFilterDecodeSkipCheck can be removed as it’s identical to GCSFilterDecode now, as suggested by stickies-v (good catch!).

  19. ryanofsky commented at 6:17 pm on July 29, 2022: contributor

    Just commenting on this PR since I got a review request above. I am -0 on this change. I don’t think it’s an improvement, but would not object if others want to go ahead with it.

    I think this is worse for code organization and debugability. I think serialize and deserialize methods should live next to each other in the code so they can don’t get out of sync and stay available for understanding and debugging. If there was a large amount of code that was only called by tests and not used by the application, I’d agree it should be moved or removed. But this is a small amount and IMO this change just moves it someplace harder to find.

  20. mzumsande commented at 6:56 pm on July 29, 2022: contributor
    I see this similarly (as expressed here), but I think removing the skip_decode_check logic is orthogonal to this and would definitely be an improvement.
  21. kcalvinalvin force-pushed on Aug 5, 2022
  22. stickies-v commented at 8:12 pm on August 8, 2022: contributor

    code review ACK 1204e3d4d

    Verified that since https://github.com/bitcoin/bitcoin/commit/32953718a3fb8e2f298b99e1a2ee72af1ea5be8c the only changes made are incorporating (all) the suggestions made by myself and @theStack

    I think serialize and deserialize methods should live next to each other in the code so they can don’t get out of sync and stay available for understanding and debugging.

    I’m not sufficiently familiar with how (de)serialization is organized across the codebase, so I’ll refrain from going beyond a code review ACK, even though otherwise I think this PR is in good shape.

  23. theStack approved
  24. theStack commented at 2:34 pm on August 11, 2022: contributor

    re-ACK 1204e3d4d35a1adf7e2160f29b604bb1ba1ac899

    While I agree that to each serialization routine there should also be a deserialization counterpart and it wouldn’t be a good idea to remove it completely, I think just having it in the unit tests is fine. If serialization and deserialization went out of sync, the serialization/deserialization roundtrip test would likely detect this, no?

  25. in src/test/blockfilter_tests.cpp:67 in 1204e3d4d3 outdated
    62+      >> block_hash
    63+      >> encoded_filter;
    64+
    65+    const BlockFilterType filter_type = static_cast<BlockFilterType>(filter_type_uint8);
    66+    BlockFilter block_filter(filter_type, block_hash, std::move(encoded_filter));
    67+    return block_filter;
    


    jonatack commented at 2:48 pm on August 15, 2022:

    Can drop a temporary here if you retouch

    0-    BlockFilter block_filter(filter_type, block_hash, std::move(encoded_filter));
    1-    return block_filter;
    2+    return BlockFilter{filter_type, block_hash, std::move(encoded_filter)};
    
  26. in src/test/blockfilter_tests.cpp:128 in 1204e3d4d3 outdated
    122@@ -108,11 +123,9 @@ BOOST_AUTO_TEST_CASE(blockfilter_basic_test)
    123     }
    124 
    125     // Test serialization/unserialization.
    126-    BlockFilter block_filter2;
    127-
    128     CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
    129     stream << block_filter;
    130-    stream >> block_filter2;
    131+    BlockFilter block_filter2 {UnserializeBlockFilter(stream)};
    


    jonatack commented at 2:49 pm on August 15, 2022:

    clang format

    0    BlockFilter block_filter2{UnserializeBlockFilter(stream)};
    
  27. in src/test/blockfilter_tests.cpp:56 in 1204e3d4d3 outdated
    51@@ -52,6 +52,21 @@ BOOST_AUTO_TEST_CASE(gcsfilter_default_constructor)
    52     BOOST_CHECK_EQUAL(params.m_M, 1U);
    53 }
    54 
    55+template <typename Stream>
    56+static BlockFilter UnserializeBlockFilter(Stream& s) {
    


    jonatack commented at 2:50 pm on August 15, 2022:

    clang format

     0 template <typename Stream>
     1-static BlockFilter UnserializeBlockFilter(Stream& s) {
     2+static BlockFilter UnserializeBlockFilter(Stream& s)
     3+{
     4     std::vector<unsigned char> encoded_filter;
     5     uint8_t filter_type_uint8;
     6     uint256 block_hash;
     7 
     8-    s >> filter_type_uint8
     9-      >> block_hash
    10-      >> encoded_filter;
    11+    s >> filter_type_uint8 >> block_hash >> encoded_filter;
    
  28. in src/test/fuzz/blockfilter.cpp:21 in 1204e3d4d3 outdated
    29-        const BlockFilterType block_filter_type = block_filter->GetFilterType();
    30-        (void)BlockFilterTypeName(block_filter_type);
    31+
    32+    GCSFilter::ElementSet elements;
    33+    size_t num_elements = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(1, 100000);
    34+    for (size_t i=0; i < num_elements; ++i) {
    


    jonatack commented at 2:51 pm on August 15, 2022:

    clang format

    0    for (size_t i = 0; i < num_elements; ++i) {
    
  29. jonatack commented at 2:58 pm on August 15, 2022: contributor

    ACK 1204e3d4d35a1adf7e2160f29b604bb1ba1ac899

    Some nit comments (feel free to ignore), most of which would be pre-empted by running

    0git diff -U0 HEAD~1.. | ./contrib/devtools/clang-format-diff.py -p1 -i -v
    

    to apply clang formatting to the changes; see contrib/devtools/clang-format-diff.py.

  30. kcalvinalvin force-pushed on Aug 30, 2022
  31. kcalvinalvin commented at 8:33 am on August 30, 2022: contributor

    ACK 1204e3d

    Some nit comments (feel free to ignore), most of which would be pre-empted by running

    0git diff -U0 HEAD~1.. | ./contrib/devtools/clang-format-diff.py -p1 -i -v
    

    to apply clang formatting to the changes; see contrib/devtools/clang-format-diff.py.

    Addressed.

  32. jonatack commented at 3:56 pm on August 30, 2022: contributor

    Looks like ae7ae36d311a869b3bda41 added a new check to the Clang-tidy CI (unrelated to changes in the last push) that is red now:

    0/src/test/fuzz/blockfilter.cpp:34:25: error: the const qualified variable 'gcs_filter'
    1is copy-constructed from a const reference; consider making it a const reference
    2[performance-unnecessary-copy-initialization,-warnings-as-errors]
    3        const GCSFilter gcs_filter = block_filter.GetFilter();
    4                        ^
    5                       &
    

    (ACK 9b87854 per git diff 1204e3d 9b87854 otherwise)

  33. kcalvinalvin force-pushed on Aug 31, 2022
  34. index: Remove unused BlockFilter::Unserialize()
    BlockFilter::Unserialize is only used in the test code. Moving it to
    the tests allows for the removal of the optional sanity check in the
    GCSFilter and BlockFilter constructor, resulting in simpler code.
    
    This is a follow up to #24832.
    07fdd10ca1
  35. kcalvinalvin force-pushed on Aug 31, 2022
  36. stickies-v commented at 1:13 pm on August 31, 2022: contributor

    Code Review re-ACK 07fdd10

    Verified that since 1204e3d changes are limited to clang formatting and declaring gcs_filter as a reference variable.

  37. jonatack commented at 1:19 pm on August 31, 2022: contributor
    Diff-only review re-ACK 07fdd10ca1d0bd2bc1d2756f1bc2c069ae97c45a per git diff 1204e3d 07fdd10
  38. maflcko referenced this in commit f821fc9813 on Sep 1, 2022
  39. sidhujag referenced this in commit 502389d546 on Sep 1, 2022
  40. glozow commented at 7:57 pm on October 12, 2022: member
    cc @theStack for re-ACK?
  41. theStack approved
  42. theStack commented at 8:34 pm on October 12, 2022: contributor

    re-ACK 07fdd10ca1d0bd2bc1d2756f1bc2c069ae97c45a

    (verified via git range-diff 1204e3d4...07fdd10c that changes since my last ACK are trivial refactors)

  43. achow101 commented at 2:33 pm on October 13, 2022: member
    @furszy My understanding is that you are working on implementing something that requires block filter download, which would require having this Unserialize method. If that’s the case, then I don’t think it really makes sense to merge this PR now as it would mean that already existing/planned future work will just end up reverting this PR.
  44. furszy commented at 5:00 pm on October 14, 2022: member

    Yeah, the new feature requires to unserialize block filters from the wire. Would be good to not remove this method, so I don’t have to re-introduce it later on.

    Thanks for the ping @achow101.

  45. kcalvinalvin closed this on Oct 15, 2022

  46. kcalvinalvin commented at 2:29 pm on October 15, 2022: contributor
    Talked to andrew and fursy and am convinced that the PR would just have to be reverted in a later commit. Closing.
  47. bitcoin locked this on Oct 15, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-31 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me