implicit-integer-sign-change in ActivateSnapshot #30514

issue maflcko openend this issue on July 23, 2024
  1. maflcko commented at 7:37 pm on July 23, 2024: member
     0$ echo 'dXR4b/8BAPq/tdph' | base64 --decode > ./crash
     1
     2$ UBSAN_OPTIONS="suppressions=$PWD/test/sanitizer_suppressions/ubsan:print_stacktrace=1:halt_on_error=1:report_error_type=1" FUZZ=utxo_snapshot ./src/test/fuzz/fuzz ./crash
     3INFO: Running with entropic power schedule (0xFF, 100).
     4INFO: Seed: 2946336184
     5INFO: Loaded 1 modules   (625551 inline 8-bit counters): 625551 [0x572ffe28af78, 0x572ffe323b07), 
     6INFO: Loaded 1 PC tables (625551 PCs): 625551 [0x572ffe323b08,0x572ffecaf3f8), 
     7./src/test/fuzz/fuzz: Running 1 inputs 1 time(s) each.
     8Running: ./crash
     9validation.cpp:5657:28: runtime error: implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 3275262676 (32-bit, unsigned) to type 'int' changed the value to -1019704620 (32-bit, signed)
    10    [#0](/bitcoin-bitcoin/0/) 0x572ffc2f8a69 in ChainstateManager::ActivateSnapshot(AutoFile&, node::SnapshotMetadata const&, bool) src/validation.cpp:5657:28
    11    [#1](/bitcoin-bitcoin/1/) 0x572ffb5b93ac in (anonymous namespace)::utxo_snapshot_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>)::$_0::operator()() const src/test/fuzz/utxo_snapshot.cpp:83:27
    12    [#2](/bitcoin-bitcoin/2/) 0x572ffb5b6896 in (anonymous namespace)::utxo_snapshot_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>) src/test/fuzz/utxo_snapshot.cpp:96:9
    13    [#3](/bitcoin-bitcoin/3/) 0x572ffb73ac5d in std::function<void (std::span<unsigned char const, 18446744073709551615ul>)>::operator()(std::span<unsigned char const, 18446744073709551615ul>) const /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
    14    [#4](/bitcoin-bitcoin/4/) 0x572ffb73ac5d in LLVMFuzzerTestOneInput src/test/fuzz/fuzz.cpp:209:5
    15
    16SUMMARY: UndefinedBehaviorSanitizer: implicit-integer-sign-change validation.cpp:5657:28 
    

    The reason is that metadata is untrusted input, but it isn’t sanitized at that point. I presume it can also be reproduced with the integer sanitizer and:

     0diff --git a/test/functional/feature_assumeutxo.py b/test/functional/feature_assumeutxo.py
     1index 943862f8cf..fcb894c924 100755
     2--- a/test/functional/feature_assumeutxo.py
     3+++ b/test/functional/feature_assumeutxo.py
     4@@ -113,10 +113,10 @@ class AssumeutxoTest(BitcoinTestFramework):
     5         bogus_block_hash = "0" * 64  # Represents any unknown block hash
     6         # The height is not used for anything critical currently, so we just
     7         # confirm the manipulation in the error message
     8-        bogus_height = 1337
     9+        bogus_height = -1019704620
    10         for bad_block_hash in [bogus_block_hash, prev_block_hash]:
    11             with open(bad_snapshot_path, 'wb') as f:
    12-                f.write(valid_snapshot_contents[:11] + bogus_height.to_bytes(4, "little") + bytes.fromhex(bad_block_hash)[::-1] + valid_snapshot_contents[47:])
    13+                f.write(valid_snapshot_contents[:11] + bogus_height.to_bytes(4, "little", signed=True) + bytes.fromhex(bad_block_hash)[::-1] + valid_snapshot_contents[47:])
    14 
    15             msg = f"Unable to load UTXO snapshot: assumeutxo block hash in snapshot metadata not recognized (hash: {bad_block_hash}, height: {bogus_height}). The following snapshot heights are available: 110, 200, 299."
    16             assert_raises_rpc_error(-32603, msg, node.loadtxoutset, bad_snapshot_path)
    
  2. maflcko commented at 7:38 pm on July 23, 2024: member

    Generally, I am not sure if the metadata is the right place to add the block height. It is just another field that may be corrupt (intentionally, or accidentally) and may lead to issues down the line, when used in a context that hasn’t fully checked it.

    My recommendation would be to just remove it from the metadata. The block hash should be sufficient to get the height (for logging and other purposes). And if the block headers are insufficient to get the height, ActivateSnapshot will error anyway, due to missing headers.

  3. fjahr commented at 10:24 pm on July 23, 2024: contributor

    Nice find, thanks for reporting!

    I think it is worth keeping the block height around even if it seems completely redundant in the base case. The additional information can be helpful to us in certain failure cases, especially if we discuss issues like in #30288 seriously. So, I am suggesting to keep the block height and sanitize it in #30516.

    I presume it can also be reproduced with the integer sanitizer and:

    Not quite, I have added the test to reproduce the failure in the first commit: https://github.com/bitcoin/bitcoin/pull/30516/commits/98e119da35c4f7e74f0c423974d497e350c8e9fa

  4. maflcko commented at 6:59 am on July 24, 2024: member

    I presume it can also be reproduced with the integer sanitizer and:

    Not quite

    Can you clarify what you mean with “not quite”? Locally, if I compile with undefined,integer,float-divide-by-zero and run my diff with: UBSAN_OPTIONS="suppressions=$PWD/test/sanitizer_suppressions/ubsan:print_stacktrace=1:halt_on_error=1:report_error_type=1" ./test/functional/feature_assumeutxo.py, I get the same crash.

    Happy to take a look, if you share your steps from a clean build.

  5. fjahr commented at 4:50 am on July 25, 2024: contributor

    Can you clarify what you mean with “not quite”? Locally, if I compile with undefined,integer,float-divide-by-zero and run my diff with: UBSAN_OPTIONS=“suppressions=$PWD/test/sanitizer_suppressions/ubsan:print_stacktrace=1:halt_on_error=1:report_error_type=1” ./test/functional/feature_assumeutxo.py, I get the same crash.

    Ah, it wasn’t clear to me that the I would have to run it with the sanitizer. That’s why it didn’t fail for me.

    You can reproduce the issue and make the test fail with the following edit on master and without needing sanitizers:

    0-        bogus_height = 1337
    1+        bogus_height = 3275262676
    

    That’s a simplified version of the test I linked to in https://github.com/bitcoin/bitcoin/commit/98e119da35c4f7e74f0c423974d497e350c8e9fa, I just wanted to keep the old bogus number around too.

    This produces:

    0substring: 'Unable to load UTXO snapshot: assumeutxo block hash in snapshot metadata not recognized (hash: 0000000000000000000000000000000000000000000000000000000000000000, height: 3275262676). The following snapshot heights are available: 110, 200, 299.'
    1error message: 'Unable to load UTXO snapshot: assumeutxo block hash in snapshot metadata not recognized (hash: 0000000000000000000000000000000000000000000000000000000000000000, height: -1019704620). The following snapshot heights are available: 110, 200, 299. (/var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_e19uh_0n/node0/regtest/utxos.dat.mod)'.
    

    And I think that’s a tick closer to what the fuzzer did if I interpret the error correctly. I think it handed in 3275262676 as the block height as well.

  6. maflcko added this to the milestone 28.0 on Aug 2, 2024
  7. maflcko added the label Bug on Aug 2, 2024
  8. Sjors commented at 12:21 pm on August 7, 2024: member

    As mentioned in #30598 (comment) I also prefer to keep height around and just (sanity) check it.

    Maybe make it a uint64 if you have to touch it anyway. (no that means I have to make a new torrent)

  9. Sjors commented at 12:39 pm on August 7, 2024: member

    There’s two things I’d like to be able to do in the future. One doesn’t require having the height in the metadata, but the other does.

    1. I’d like to keep the option to improve the The base block header (%s) must appear in the headers chain. message by checking the snapshot height.

    In the normal case, where header sync is still in progress, we can change it to:

    0The base block header (%s) must appear in the headers chain at height %d (currently synced to %d).
    

    If we already have a header at that height, it suggest there’s a large fork, see #30288. In that case a different message could warn about that.

    But since the height is hardcoded in m_assumeutxo_data, we can get it from there. It doesn’t need to be in the metadata file.

    1. As we add new snapshots in future releases, a user might download a recent snapshot and try to use it in an older release. That won’t work, and the current message “The base block header (%s) must appear in the headers chain.” is too cryptic.

    Instead, if we keep the height in the metadata we can do something like:

    0if (metadata.height > latest_known_snapshot.height) {
    1   // "Unsupported snapshot for height %d, considering downloading a more recent Bitcoin Core release."
    2} else {
    3   // Probably a malicious snapshot
    4}
    
  10. maflcko commented at 1:18 pm on August 7, 2024: member

    But since the height is hardcoded in m_assumeutxo_data, we can get it from there. It doesn’t need to be in the metadata file.

    Correct.

    // “Unsupported snapshot for height %d, considering downloading a more recent Bitcoin Core release.”

    Not sure about error messages to urge users to upgrade, when triggering the error message is in the hands of untrusted input. Otherwise, it seems too easy for a user to download Bitcoin Core, then download a malicious snapshot from a random website, then go back to the random website to discover (oh, how convenient) that they offer “the most recent version of Bitcoin Core”, which is a solution to their problem :sparkles:

    As we add new snapshots in future releases, a user might download a recent snapshot and try to use it in an older release. That won’t work, and the current message “The base block header (%s) must appear in the headers chain.” is too cryptic.

    Are you sure? The error message that the chain-params are missing is the error message that list the available blocks in the params.

    The error message “The base block header (%s) must appear in the headers chain” happens when the headers are syncing and it correctly says currently: “Make sure all headers are syncing”.

  11. maflcko commented at 1:38 pm on August 7, 2024: member

    Also, to reply to #30598 (comment) (replying here, to keep the “List of features that supposedly need a block height” in one place)

    If the user tries to load a snapshot during header sync, we can give an indication of how long they have to wait. If we’re already past the expected height, we can say something useful about potential forks.

    Header (pre-)sync progress is already reported as part of the pull request that introduced header pre-sync. Reporting that value as an indication can be done regardless of any metadata at all. If you want to report the header pre-sync progress value for the block hash in the metadata, again the height isn’t needed, because, as you say, it is already hardcoded in m_assumeutxo_data.

  12. Sjors commented at 1:39 pm on August 7, 2024: member

    I haven’t tested the error message.

    list the available blocks in the params

    That would be better, but it’s too vague for the user to realize they need to install a newer version.

    Not sure about error messages to urge users to upgrade

    I agree that needs more thinking. But if we don’t show the message, the “handy tutorial” written by the attacker will.

    it correctly says currently: “Make sure all headers are syncing”

    That’s not correct in the event of a large fork, where this message would be shown forever.

  13. maflcko commented at 2:03 pm on August 7, 2024: member

    it correctly says currently: “Make sure all headers are syncing”

    That’s not correct in the event of a large fork, where this message would be shown forever.

    Correct. However, it seems unrelated to the metadata discussion. At least I fail to see how adding the block height to the metadata in this case helps, when the height is already hardcoded in m_assumeutxo_data. Even if it wasn’t, the error message can (and should be) improved for this case, regardless of the metadata contents. (But this is probably best done in a separate pull request)

    I am happy to review any pull request related to this discussion.

  14. Sjors commented at 2:15 pm on August 7, 2024: member

    I fail to see how adding the block height to the metadata in this case helps, when the height is already hardcoded in m_assumeutxo_data.

    When the user loads a recent snapshot into an older version of Bitcoin Core the height won’t be in m_assumeutxo_data. That’s why we should keep it the metadata (even if we need to be careful on how to render the error).

  15. maflcko commented at 3:17 pm on August 7, 2024: member

    I think there was a slight miscommunication. My reply was to your point of a “large fork”, whose check (currently) happens after the chain params are asked for the hash: https://github.com/bitcoin/bitcoin/blob/676abd1af754964858a60fddffb9a12c0a6c2749/src/validation.cpp#L5663 .

    You also raised a different point about a user downloading a too recent snapshot. This is a different scenario, however the same check in validation will hit it currently. I agree that the error message can be improved in both cases.

    If you really want to report the height back to the user, I think it should also be fine to just require all headers to be synced to at least include all AU-block-hashes in the chain params. Moving this to be the first check should allow to print an error message about either an ongoing header pre-sync, or a large fork. Once the header pre-sync is done, any block hash can either be resolved to a height, or it is corrupt, or it is from a “post-AU-params large fork”. Once the correct height is known, it can be used in an any other error message to the user, possibly to tell them which block heights are supported and which block height they asked for.

    However, I don’t think the metadata block height can reliably be used to differentiate between a corrupt block hash and a “post-AU-params large fork”. So using it here, is confusing at best, when the error messages are picked based on the height, which comes from an untrusted source, or may be corrupt as well.

    (I’d say that file corruption is more likely to happen than a large fork, because one requires work, and the other happens by doing nothing and waiting long enough (or doing something trivially low-work actively). So assuming the block height is corrupt as well is the most plausible explanation, or at least it can’t be ruled out. However, that knowledge can’t be used to rely on.)

    Maybe I am missing something obvious? Generally, it is easier for me to understand C++ code, so if you have the time to write down your idea in code, I am happy to take a look and try to write a test to break it.

  16. achow101 closed this on Aug 9, 2024

  17. achow101 referenced this in commit 9a696397e7 on Aug 9, 2024
  18. maflcko commented at 7:59 am on August 13, 2024: member

    Personally, what I find a lot more interesting than the discussion whether or not a metadata field should exist, is the evaluation of how a fuzz engine can find the initially reported bug at all.

    Right now, I couldn’t get a fuzz engine to find it, regardless of what I do.

    (The initial finding was due to a manually written fuzz input generator, but that seems suboptimal, because it is difficult to scale and it would be better if a fuzz engine could find the bug by itself).

    I think it is due to the fuzz engine (libFuzzer) preferring smaller inputs to reach the same coverage, so it’ll probably pick the path where a “synthetic” metadata is generated, because it requires less input bytes than reading the full metadata from the fuzz input. Since the “synthetic” metadata path doesn’t generate arbitrary bit flips in the sign bit, a fuzz engine can’t find the bug. (I haven’t verified this thought)

    Also, the fuzz target is slow, so that should probably be fixed first.

  19. maflcko commented at 8:23 am on August 13, 2024: member

    Also, the fuzz target is slow, so that should probably be fixed first.

    Initial attempt in https://github.com/bitcoin/bitcoin/pull/30644


maflcko fjahr Sjors

Labels
Bug

Milestone
28.0


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me