Assumeutxo: Sanitize block height in metadata #30516

pull fjahr wants to merge 2 commits into bitcoin:master from fjahr:2024-07-au-blockheight-san changing 3 files +11 −1
  1. fjahr commented at 10:21 pm on July 23, 2024: contributor

    Fixes #30514 which has more context.

    The block height in the assumeutxo metadata was implicitly cast to int which could cause it to be interpreted as signed.

    The first commit shows how to reproduce the error shown in #30514 and how it could be fixed with a minimal change. The second commit adds an explicit sanitizer that will probably be the preferred approach. I can squash the commits but first I would like to hear conceptual feedback since #30514 suggested to remove the block height instead.

  2. Assumeutxo: Fix implicit conversion of block height and add test 98e119da35
  3. Assumeutxo: Sanitize block height in assumeutxo metadata 51f197bd84
  4. DrahtBot commented at 10:21 pm on July 23, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK Sjors

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30598 (assumeutxo: Drop block height from metadata by fjahr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. in src/node/utxo_snapshot.h:104 in 51f197bd84
     98@@ -99,6 +99,9 @@ class SnapshotMetadata
     99         }
    100 
    101         s >> m_base_blockheight;
    102+        if (m_base_blockheight > static_cast<uint32_t>(std::numeric_limits<int>::max())) {
    103+            throw std::ios_base::failure("Block height is out of range.");
    104+        }
    


    maflcko commented at 5:19 am on July 24, 2024:

    The fix isn’t enough. The file may be completely untrusted and corrupt, so just because the block height is a signed 32-bit, but positive integer, doesn’t mean that the height is correct.

    You’ll have to check that the block height corresponds to the block hash, otherwise I fail to see how it can be used without adding bugs or risks or confusion. However, if you do the check, you may as well discard the deserialized value and just use the value from the check.

    So again, my recommendation would be to remove it, because either:

    1. The value is untrusted, thus can not be used
    2. The value is checked, in which case it is redundant with the value from the check

    An alternative would be to write a full file-content hash in the metadata as the second field (after the file magic), covering the whole file after the second field. This way, the file-content hash would be retrieved from the trusted chainparams, then calculated and compared. Afterwards the code can assume that all fields in the checked file are trusted to some extent, which would allow using the block height value.

    However, that is a more involved fix, so my recommendation for now would be to drop the value.

    Suggested diff:

      0diff --git a/src/node/utxo_snapshot.h b/src/node/utxo_snapshot.h
      1index a7c4135787..bd5b68ba30 100644
      2--- a/src/node/utxo_snapshot.h
      3+++ b/src/node/utxo_snapshot.h
      4@@ -28,16 +28,17 @@ class Chainstate;
      5 namespace node {
      6 //! Metadata describing a serialized version of a UTXO set from which an
      7 //! assumeutxo Chainstate can be constructed.
      8+//! All metadata fields come from an untrusted file, so must be validated
      9+//! before being used. Thus, new fields should be added only if needed.
     10 class SnapshotMetadata
     11 {
     12-    const uint16_t m_version{1};
     13-    const std::set<uint16_t> m_supported_versions{1};
     14+    static const uint16_t VERSION{2};
     15+    const std::set<uint16_t> m_supported_versions{VERSION};
     16     const MessageStartChars m_network_magic;
     17 public:
     18     //! The hash of the block that reflects the tip of the chain for the
     19     //! UTXO set contained in this snapshot.
     20     uint256 m_base_blockhash;
     21-    uint32_t m_base_blockheight;
     22 
     23 
     24     //! The number of coins in the UTXO set contained in this snapshot. Used
     25@@ -54,15 +55,13 @@ public:
     26         uint64_t coins_count) :
     27             m_network_magic(network_magic),
     28             m_base_blockhash(base_blockhash),
     29-            m_base_blockheight(base_blockheight),
     30             m_coins_count(coins_count) { }
     31 
     32     template <typename Stream>
     33     inline void Serialize(Stream& s) const {
     34         s << SNAPSHOT_MAGIC_BYTES;
     35-        s << m_version;
     36+        s << VERSION;
     37         s << m_network_magic;
     38-        s << m_base_blockheight;
     39         s << m_base_blockhash;
     40         s << m_coins_count;
     41     }
     42@@ -98,7 +97,6 @@ public:
     43             }
     44         }
     45 
     46-        s >> m_base_blockheight;
     47         s >> m_base_blockhash;
     48         s >> m_coins_count;
     49     }
     50diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
     51index 9899a13a1e..ee72fb2c20 100644
     52--- a/src/rpc/blockchain.cpp
     53+++ b/src/rpc/blockchain.cpp
     54@@ -2862,10 +2862,12 @@ static RPCHelpMan loadtxoutset()
     55         throw JSONRPCError(RPC_INTERNAL_ERROR, strprintf("Unable to load UTXO snapshot: %s. (%s)", util::ErrorString(activation_result).original, path.utf8string()));
     56     }
     57 
     58+    CBlockIndex& snap{*CHECK_NONFATAL(*activation_result)};
     59+
     60     UniValue result(UniValue::VOBJ);
     61     result.pushKV("coins_loaded", metadata.m_coins_count);
     62-    result.pushKV("tip_hash", metadata.m_base_blockhash.ToString());
     63-    result.pushKV("base_height", metadata.m_base_blockheight);
     64+    result.pushKV("tip_hash", snap.GetBlockHash().ToString());
     65+    result.pushKV("base_height", snap.nHeight);
     66     result.pushKV("path", fs::PathToString(path));
     67     return result;
     68 },
     69diff --git a/src/validation.cpp b/src/validation.cpp
     70index 2b8f64e81a..0b101d1551 100644
     71--- a/src/validation.cpp
     72+++ b/src/validation.cpp
     73@@ -5633,31 +5633,31 @@ Chainstate& ChainstateManager::InitializeChainstate(CTxMemPool* mempool)
     74     return destroyed && !fs::exists(db_path);
     75 }
     76 
     77-util::Result<void> ChainstateManager::ActivateSnapshot(
     78+util::Result<CBlockIndex*> ChainstateManager::ActivateSnapshot(
     79         AutoFile& coins_file,
     80         const SnapshotMetadata& metadata,
     81         bool in_memory)
     82 {
     83     uint256 base_blockhash = metadata.m_base_blockhash;
     84-    int base_blockheight = metadata.m_base_blockheight;
     85 
     86     if (this->SnapshotBlockhash()) {
     87         return util::Error{Untranslated("Can't activate a snapshot-based chainstate more than once")};
     88     }
     89 
     90+    CBlockIndex* snapshot_start_block{};
     91+
     92     {
     93         LOCK(::cs_main);
     94 
     95         if (!GetParams().AssumeutxoForBlockhash(base_blockhash).has_value()) {
     96             auto available_heights = GetParams().GetAvailableSnapshotHeights();
     97             std::string heights_formatted = util::Join(available_heights, ", ", [&](const auto& i) { return util::ToString(i); });
     98-            return util::Error{strprintf(Untranslated("assumeutxo block hash in snapshot metadata not recognized (hash: %s, height: %s). The following snapshot heights are available: %s"),
     99+            return util::Error{strprintf(Untranslated("assumeutxo block hash in snapshot metadata not recognized (hash: %s). The following snapshot heights are available: %s"),
    100                 base_blockhash.ToString(),
    101-                base_blockheight,
    102                 heights_formatted)};
    103         }
    104 
    105-        CBlockIndex* snapshot_start_block = m_blockman.LookupBlockIndex(base_blockhash);
    106+        snapshot_start_block = m_blockman.LookupBlockIndex(base_blockhash);
    107         if (!snapshot_start_block) {
    108             return util::Error{strprintf(Untranslated("The base block header (%s) must appear in the headers chain. Make sure all headers are syncing, and call loadtxoutset again"),
    109                           base_blockhash.ToString())};
    110@@ -5668,7 +5668,7 @@ util::Result<void> ChainstateManager::ActivateSnapshot(
    111             return util::Error{strprintf(Untranslated("The base block header (%s) is part of an invalid chain"), base_blockhash.ToString())};
    112         }
    113 
    114-        if (!m_best_header || m_best_header->GetAncestor(base_blockheight) != snapshot_start_block) {
    115+        if (!m_best_header || m_best_header->GetAncestor(snapshot_start_block->nHeight) != snapshot_start_block) {
    116             return util::Error{_("A forked headers-chain with more work than the chain with the snapshot base block header exists. Please proceed to sync without AssumeUtxo.")};
    117         }
    118 
    119@@ -5782,7 +5782,7 @@ util::Result<void> ChainstateManager::ActivateSnapshot(
    120         m_snapshot_chainstate->CoinsTip().DynamicMemoryUsage() / (1000 * 1000));
    121 
    122     this->MaybeRebalanceCaches();
    123-    return {};
    124+    return snapshot_start_block;
    125 }
    126 
    127 static void FlushSnapshotToDisk(CCoinsViewCache& coins_cache, bool snapshot_loaded)
    128diff --git a/src/validation.h b/src/validation.h
    129index 08e672c620..0638bbe983 100644
    130--- a/src/validation.h
    131+++ b/src/validation.h
    132@@ -1098,7 +1098,7 @@ public:
    133     //!   faking nTx* block index data along the way.
    134     //! - Move the new chainstate to `m_snapshot_chainstate` and make it our
    135     //!   ChainstateActive().
    136-    [[nodiscard]] util::Result<void> ActivateSnapshot(
    137+    [[nodiscard]] util::Result<CBlockIndex*> ActivateSnapshot(
    138         AutoFile& coins_file, const node::SnapshotMetadata& metadata, bool in_memory);
    139 
    140     //! Once the background validation chainstate has reached the height which
    

    maflcko commented at 2:57 pm on August 1, 2024:

    I forgot to mention this earlier, but to reply to your comment on IRC that this is only used in “uncritical” places:

    It is currently used in validation.cpp, added as part of #30320.

    This means:

    • The check accidentally serves as a validity check for the untrusted or possibly corrupt metadata value. (I don’t think this was intentional by the author of the pull request; If it was, it would be good to document it)
    • This may be used in new places in the future without proper checks, leading to more issues down the line.
    • In an otherwise fully valid utxo snapshot dump, with a single bit flip in the block height field in the metadata, the error message for the user will not be: “Corrupt metadata, block height corrupt”, but instead “A forked headers-chain with more work than the chain with the snapshot base block header exists. Please proceed to sync without AssumeUtxo.”

    This seems confusing at best.

    Steps to reproduce should be something like:

     0diff --git a/test/functional/feature_assumeutxo.py b/test/functional/feature_assumeutxo.py
     1index da7cabf87c..0c4eb126ed 100755
     2--- a/test/functional/feature_assumeutxo.py
     3+++ b/test/functional/feature_assumeutxo.py
     4@@ -68,8 +68,9 @@ class AssumeutxoTest(BitcoinTestFramework):
     5         parsing_error_code = -22
     6         bad_magic = 0xf00f00f000
     7         with open(bad_snapshot_path, 'wb') as f:
     8-            f.write(bad_magic.to_bytes(5, "big") + valid_snapshot_contents[5:])
     9+            f.write(valid_snapshot_contents[:11]+(1).to_bytes(4, "little")+valid_snapshot_contents[15:])
    10         assert_raises_rpc_error(parsing_error_code, "Unable to parse metadata: Invalid UTXO set snapshot magic bytes. Please check if this is indeed a snapshot file or if you are using an outdated snapshot format.", node.loadtxoutset, bad_snapshot_path)
    11+        assert False
    12 
    13         self.log.info("  - snapshot file with unsupported version")
    14         for version in [0, 2]:
    

    mzumsande commented at 5:12 pm on August 6, 2024:
    • I don’t think this was intentional by the author of the pull request;

    No, it wasn’t intentional, I didn’t consider the possibility of incorrect metadata - might simply use snapshot_start_block->nHeight instead of base_blockheight though I think?

  6. mzumsande commented at 5:14 pm on August 6, 2024: contributor

    From the issue:

    The additional information can be helpful to us in certain failure cases, especially if we discuss issues like in #30288 seriously.

    How? I could see how situations described in #30288 would require to add the block hash in a hypothetical scenario where we only saved the height - but we are in the reverse scenario where we already have the block hash, so I can’t really think of a scenario where haveing the block height would help. I also didn’t find any discussion about the reason for the block height in #29612 on a quick glance (but didn’t expand every comment).

  7. fjahr commented at 10:27 pm on August 6, 2024: contributor

    How?

    The height in the metadata clearly tells us which height the hash is expected at. Not having the same hash at that height clearly tells us that the node is on a different chain than was expected by the metadata. That is better than just saying “we don’t know this hash for whatever reason”.

    I also didn’t find any discussion about the reason for the block height in #29612 on a quick glance (but didn’t expand every comment).

    It was suggested here and when I added it along with the other suggested values, all reviewers ACK-ed it.

    An alternative would be to write a full file-content hash in the metadata as the second field (after the file magic), covering the whole file after the second field. This way, the file-content hash would be retrieved from the trusted chainparams, then calculated and compared. Afterwards, the code can assume that all fields in the checked file are trusted to some extent, which would allow using the block height value.

    Do you mean the serialized tx hash here? That’s different from the file content hash. We do have the serialized hash in the trusted assumeutxo data but not the file hash. I don’t think we want to have it in there because we want to force users of the data to calculate it themselves. And if there is a mismatch it’s just the same as with the block height where it seems to be an issue for you that it may mismatch. Or are you arguing that the file hash should be added to the chainparams as well? It’s unclear to me, honestly.

    You’ll have to check that the block height corresponds to the block hash, otherwise I fail to see how it can be used without adding bugs or risks or confusion. However, if you do the check, you may as well discard the deserialized value and just use the value from the check.

    The block height is intended as additional information that can help users debug what the issue is, like the error message in the suggested change above where it is removed. It doesn’t need to be checked for that. If the file is corrupt or the network is found in an unexpected state, it can be helpful as additional information as described above.

    In an otherwise fully valid utxo snapshot dump, with a single bit flip in the block height field in the metadata, the error message for the user will not be: “Corrupt metadata, block height corrupt”, but instead “A forked headers-chain with more work than the chain with the snapshot base block header exists. Please proceed to sync without AssumeUtxo.”

    I don’t think it’s fair criticism to say “this is confusing when I introduce a magic bitflip in this exact place”. There are literally infinite places all over the code base where that is true and we won’t get anywhere if this is the bar we need to meet. If I can just flip any bit in your block storage or utxo set, we are completely screwed and most importantly you probably wouldn’t even see an error until it’s too late while in the case discussed here, there is a failure right away (if we check) or there is no issue at all because the value is not used for anything critical as outlined before.

  8. fjahr commented at 10:32 pm on August 6, 2024: contributor

    The arguments here don’t convince me yet that the height really has to be dropped necessarily but I also seem to be less passionate about this issue than @maflcko et al. So I opened the suggested approach of removing the block height in #30598 and I pinged other reviewers of the original metadata PR to weigh in shortly.

    Ultimately what matters most to me is that we finally get this feature into the hands of users and I am fine to go along with either approach if it avoids further blocking of that.

  9. mzumsande commented at 11:16 pm on August 6, 2024: contributor

    The height in the metadata clearly tells us which height the hash is expected at. Not having the same hash at that height clearly tells us that the node is on a different chain than was expected by the metadata. That is better than just saying “we don’t know this hash for whatever reason”.

    Ah, ok, my point is that the height is implied by the block hash, together with the block index db of the node. If the block tree db doesn’t contain a header with that block hash, then we abort anyway. If it contains the index but we don’t have a registered snapshot for this height in the chainparams, we can look the attempted height up from the block index, e.g. for a user-facing error message. But it should be impossible to construct an alternative block tree db where a block with the given hash is at a different height (unless you can break sha256…) so providing the height doesn’t add any information in my opinion.

  10. theStack commented at 10:18 am on August 7, 2024: contributor
    Not sure if this is a convincing argument for anyone, but I can at least see a point of keeping the block height for purely informative purposes, in case the snapshot is processed by external tools which don’t have access to the block index, e.g. #27432 (or, at some point even the https://linux.die.net/man/1/file utility, which can not only determine a file type, but also emit a short meta-data summary, if properly implemented in the magic file). Wouldn’t it be an alternative to simply keep the format as-is, but use the block height only for an immediate sanity check in Bitcoin Core, i.e. throw if it doesn’t match the included block hash?
  11. maflcko commented at 10:29 am on August 7, 2024: member

    The arguments here don’t convince me yet that the height really has to be dropped necessarily but I also seem to be less passionate about this issue than @maflcko et al.

    Just to clarify, I am happy to review and ack this approach here, but I haven’t acked the current state of the pull request, because it leaves bugs unfixed. I haven’t tried it, but #30516 (review) (the error message for the user will not be: “Corrupt metadata, block height corrupt”, but instead “A forked headers-chain with more work than the chain with the snapshot base block header exists. Please proceed to sync without AssumeUtxo.”) should still be reproducible on top of this branch currently.

    If you keep the metadata, it should be renamed to m_base_blockheight_untrusted, so that new places don’t use it by accident, like #30516 (review).

    Other than that, I agree this is mostly a bike-shedding discussions whether or not external tools (or error messages) display the height and hash, or just the hash.

  12. theStack commented at 11:17 am on August 7, 2024: contributor

    @maflcko: I think the simplest (and IMHO most obvious) solution would be to just verify the height as well, then all of the problems you described should be solved? E.g. with something like

     0diff --git a/src/validation.cpp b/src/validation.cpp
     1index d8c1c27aae..612bb960e5 100644
     2--- a/src/validation.cpp
     3+++ b/src/validation.cpp
     4@@ -5660,10 +5660,11 @@ util::Result<void> ChainstateManager::ActivateSnapshot(
     5     {
     6         LOCK(::cs_main);
     7 
     8-        if (!GetParams().AssumeutxoForBlockhash(base_blockhash).has_value()) {
     9+        auto maybe_assumeutxo_data = GetParams().AssumeutxoForBlockhash(base_blockhash);
    10+        if (!maybe_assumeutxo_data.has_value() || maybe_assumeutxo_data->height != base_blockheight) {
    11             auto available_heights = GetParams().GetAvailableSnapshotHeights();
    12             std::string heights_formatted = util::Join(available_heights, ", ", [&](const auto& i) { return util::ToString(i); });
    13-            return util::Error{strprintf(Untranslated("assumeutxo block hash in snapshot metadata not recognized (hash: %s, height: %s). The following snapshot heights are available: %s"),
    14+            return util::Error{strprintf(Untranslated("assumeutxo block hash/height in snapshot metadata not recognized (hash: %s, height: %s). The following snapshot heights are available: %s"),
    15                 base_blockhash.ToString(),
    16                 base_blockheight,
    17                 heights_formatted)};
    
  13. maflcko commented at 11:54 am on August 7, 2024: member
    I haven’t tried to compile that suggestion, but it’ll probably fail -Wsign-compare. Also, the problem remains that future code may accidentally use it before it is checked (or after the check failed). So I’d say that a rename to m_base_blockheight_untrusted could still make sense. An alternative may be to move the metadata parsing into the ActivateSnapshot function itself (instead of parsing it outside and passing it to the function, exposing it to a greater scope than needed). Any metadata values that are validated, can then be returned in a struct from ActivateSnapshot, and hopefully it will be clearer to code readers that any values in the metadata need to be checked before they can be used?
  14. ismaelsadeeq commented at 4:17 pm on August 7, 2024: member

    I haven’t tried to compile that suggestion, but it’ll probably fail -Wsign-compare.

    Indeed compiling this emits -Wsign-compare warning

     0
     1Making all in src
     2  CXX      libbitcoin_node_a-validation.o
     3validation.cpp:5664:81: warning: comparison of integers of different signs: 'int' and 'uint32_t' (aka 'unsigned int') [-Wsign-compare]
     4        if (!maybe_assumeutxo_data.has_value() || maybe_assumeutxo_data->height != base_blockheight){
     5                                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~
     61 warning generated.
     7  AR       libbitcoin_node.a
     8  GEN      obj/build.h
     9  CXXLD    bitcoind
    10</details>
    
  15. in test/functional/feature_assumeutxo.py:105 in 51f197bd84
     99@@ -100,14 +100,19 @@ def expected_error(log_msg="", rpc_details=""):
    100         # The height is not used for anything critical currently, so we just
    101         # confirm the manipulation in the error message
    102         bogus_height = 1337
    103-        signed_overflow_height = 3275262676
    104         for bad_block_hash in [bogus_block_hash, prev_block_hash]:
    105-            for bad_height in [bogus_height, signed_overflow_height]:
    


    ismaelsadeeq commented at 4:23 pm on August 7, 2024:
    In 51f197bd84916c494e9250926776b9efc3225100 “Assumeutxo: Sanitize block height in assumeutxo metadata " This change was added in 98e119da35c4f7e74f0c423974d497e350c8e9fa and then removed in this commit. Why not just add it like this in the previous commit with the same error msg string, and then update the msg string in this commit?

    fjahr commented at 9:39 pm on August 7, 2024:
    See the description: “The first commit shows how to reproduce the error shown in #30514 and how it could be fixed with a minimal change. The second commit adds an explicit sanitizer that will probably be the preferred approach. I can squash the commits but first I would like to hear conceptual feedback since #30514 suggested to remove the block height instead.”

    fjahr commented at 9:53 pm on August 7, 2024:
    So I was still hunting for the conceptual consensus and didn’t change this yet. I think we are coming to an end though (see my comment below). It would be great if you could indicate where you stand on this though @ismaelsadeeq . If more support for the approach here I will squash the commits and implement the suggested improvements.

    ismaelsadeeq commented at 12:40 pm on August 8, 2024:
    @fjahr, I reviewed the code primarily to address PRs targeting issues in the v28.0 milestone. I haven’t gone through all the conceptual discussions here, as I’m still getting up to speed with the direction of the assumeutxo project.
  16. in src/validation.cpp:5639 in 98e119da35 outdated
    5638@@ -5639,7 +5639,7 @@ util::Result<void> ChainstateManager::ActivateSnapshot(
    5639         bool in_memory)
    


    ismaelsadeeq commented at 4:39 pm on August 7, 2024:
    AssumeutxoData::height is int, IMO would be better for them to be the same type, so should we consider changing it also to uint32_t? this will avoid the issue #30516 (comment) during comparison or having to explicitly cast one.

    fjahr commented at 9:41 pm on August 7, 2024:
    We would probably have to cast somewhere else then because nHeight is int.
  17. ismaelsadeeq commented at 4:39 pm on August 7, 2024: member
    Tested this on master
  18. fjahr commented at 9:42 pm on August 7, 2024: contributor

    Just to clarify, I am happy to review and ack this approach here, but I haven’t acked the current state of the pull request, because it leaves bugs unfixed

    Thanks, that was not clear to me indeed.

    I haven’t tried it, but #30516 (review) (the error message for the user will not be: “Corrupt metadata, block height corrupt”, but instead “A forked headers-chain with more work than the chain with the snapshot base block header exists. Please proceed to sync without AssumeUtxo.”) should still be reproducible on top of this branch currently.

    I don’t understand how we can fix that issue to work perfectly for all cases. IMO it is unfixable because we can not know if a value that doesn’t match is corrupt or not a match. If there is a bitflip in the version field we will show an error message that the version doesn’t match and not that the version field is corrupt. If there is a bitflip in the network magic we will show an error that the network doesn’t match and not that the network magic is corrupt and so on…

  19. fjahr commented at 9:50 pm on August 7, 2024: contributor

    A little recap so far: While I still don’t find the arguments against having the height included convincing I am currently leaning towards going with the approach in #30598 and removing it for now. My reasoning is as follows:

    • While the arguments against may not be strong I conceive that it would be better if we had a clear plan for how we use the height which would give a stronger argument for keeping it in. We can re-add the height in a new version together with a clear plan of how it is going to be used.
    • Removing the height is a much cleaner change. Due to the lack of a clear plan and the arguments exchanged here in the last ~8 hours it became evident to me that we might get stuck in discussions about how to handle all eventual cases with the approach of keeping the height, while in the removal PR there is not really anything to discuss because the contentious part is, well, removed.
    • Looking at the engagement there seem to be more proponents for removal: Pro removal 3 explicit ACKs in the removal PR (@maflcko , @BrandonOdiwuor , @TheCharlatan ) and I am implying @mzumsande is also for removal based on his comments here (please confirm!). Pro keeping height are @theStack and @Sjors . (@ismaelsadeeq since you have started reviewing this change, could you please indicate if this is because you prefer this change?)

    The primary downside to removing the height seems to be that we need to produce a new torrent but I hope @Sjors would be fine with that and I am currently the only reviewer of #28553 as well.

    I will keep this PR here open for discussion for at least a few more hours at least since there may be some contributors in other timezones that were not able to comment yet. But if no new findings arise I will move forward with #30598 for reasons outlined above.

  20. Sjors commented at 8:54 am on August 8, 2024: member

    I don’t mind making a new torrent.

    I’m still puzzled why all of a sudden a single integer in a serialized file is such a big deal that removing it is simpler than fixing it. But I haven’t reviewed this PR yet.

    It’s true that we can add it back later, but it seems wasteful if we first have to review #30598 only to revert it later and still having to review the changes here.

    Even if we never want it back, it seems very likely we’ll add an integer to some other serialized thing in the future. We should just be able to do that. But again, maybe there’s something really special about the integer here that I missed.

    Approach wise, it makes sense to me to have two commits.

  21. in src/node/utxo_snapshot.h:102 in 51f197bd84
     98@@ -99,6 +99,9 @@ class SnapshotMetadata
     99         }
    100 
    101         s >> m_base_blockheight;
    102+        if (m_base_blockheight > static_cast<uint32_t>(std::numeric_limits<int>::max())) {
    


    Sjors commented at 10:19 am on August 8, 2024:

    51f197bd84916c494e9250926776b9efc3225100 maybe use int32_t instead of int for clarity?

    0std::numeric_limits<int32_t>::max())
    
  22. Sjors commented at 10:47 am on August 8, 2024: member

    Just to clarify, I am happy to review and ack this approach here, but I haven’t acked the current state of the pull request, because it leaves bugs unfixed

    This PR fixes the undefined behavior that the sanitizer picked up in #30514. I think that’s sufficient for now. We can improve handling of corrupt and/or malicious files later, which (unless I missed something) is what @maflcko’s critique is about.

    Afaik any corrupt (accidentally or otherwise) snapshot file will ultimately be rejected by the node because height, block_hash, nChainTx or hash_serialized doesn’t match the hardcode chainparams. That all happens before we activate a snapshot and start syncing to a potentially malicious tip. So worst case you’re wasting ten minutes of user time, plus whatever time they need to deal with the confusing error message.

    Assuming I didn’t miss anything in my reasoning above:

    ACK 51f197bd84916c494e9250926776b9efc3225100

    Using 98e119da35c4f7e74f0c423974d497e350c8e9fa I was able to reproduce the test failure by undoing the change in validation.cpp - without the help of a sanitiser, so that’s nice. I think it’s useful to keep this commit.

    For the second commit maybe add to the commit message:

    0Review hint: `--color-moved=dimmed-zebra`
    

    Though it’s easier if you change the first commit in a way that avoids this code churn in the second commit, i.e. by copy-pasting rather than expanding the loop.

  23. Sjors approved
  24. maflcko commented at 11:56 am on August 8, 2024: member

    This PR fixes the undefined behavior that the sanitizer picked up in #30514.

    I don’t think this is UB. Personally I couldn’t care less about a implicit fuzz integer issue, and I don’t think any production user cares either, because fuzz tests aren’t run in production.

    I don’t see the value in only silencing the fuzz target, but leaving the other reported bugs unfixed. If silencing the fuzz target was a goal, it could be done with a trivial temporary one-line patch to the suppressions file. Once there is a real fix, the temporary suppression can be removed. I just don’t see the point of changing real code temporarily, and then revisit it in a “follow-up”. Either the reported bugs are worthy to fix, in which case they should be fixed, or they are not, in which case they shouldn’t be fixed and this can just be replaced with a sanitizer suppression.

  25. fjahr commented at 12:36 pm on August 8, 2024: contributor
    It doesn’t seem like we can come to an agreement quickly here, so I am closing this in favor of #30598 so we can move forward with the mainnet params. I would appreciate your reviews there, thank you!
  26. fjahr closed this on Aug 8, 2024

  27. achow101 referenced this in commit 9a696397e7 on Aug 9, 2024
  28. Theghost256 approved

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me