qa: make feature_assumeutxo.py test more robust #32117

pull darosior wants to merge 1 commit into bitcoin:master from darosior:2503_more_assumeutxo_test_unbrittling changing 1 file +14 −1
  1. darosior commented at 9:29 pm on March 21, 2025: member

    This is another robustness fix that i dropped from #31907, intending to just tweak the error message in my branch that triggers the issue. As i played more with it i eventually decided to submit a proper fix instead. Since the rationale initially confused reviewers i tried to explain it in detail in the commit message (reproduced here in the OP) with enough context.

    The feature_assumeutxo.py functional test contains a set of checks for expected errors when a snapshot can be parsed but contains incorrect values. Some of these checks are brittle as they assert an error that is raised not only because of the injected fault, but also because of the following data in the snapshot. A previous occurrence of this was fixed in e5ff4e416ecc8a51367022eb8a7a291f8cbc0c65.

    In this commit, we look into another check of the same category. This one inserts an invalid coins count for the first transaction in the snapshot. It asserts that by doing so the snapshot validation logic will try to deserialize an additional coin for the first transaction, while in fact it will be deserializing the txid of the next transaction. It then asserts that this attempt to deserialize the txid of the next transaction will result in a specific error: a coin height higher than the base height (returning the string “Bad snapshot data after deserializing 1 coins.”).

    This is an issue as it relies on data outside of the injected fault to assert the error string. For instance, the next txid could well deserialize into a coin at a height lower than the base height, or (more probably) not successfully deserialize into a coin at all!

    Fix this by injecting all the data necessary to reliably assert the expected error: an invalid coins count, followed by a valid coin, followed by a validly serialized coin with an invalid height (which in this scenario is supposed to be the next transaction’s txid).
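    For readers unfamiliar with the dump layout, here is a self-contained Python sketch of how such an injected payload is put together. The helpers mirror ser_compact_size/ser_varint from the test framework but are re-implemented so the sketch runs standalone; the exact field layout is an assumption based on the compressed Coin serialization (VARINT(height * 2 + coinbase), compressed amount, compressed script), not a verbatim copy of the test code.

        def ser_compact_size(n: int) -> bytes:
            assert n < 253  # single-byte form suffices for this sketch
            return bytes([n])

        def ser_varint(n: int) -> bytes:
            # Bitcoin-style VARINT (base-128 with the "minus one" carry trick)
            out = b""
            while True:
                out = bytes([(n & 0x7F) | (0x80 if out else 0x00)]) + out
                if n <= 0x7F:
                    return out
                n = (n >> 7) - 1

        def coin_entry(vout: int, height: int, coinbase: bool) -> bytes:
            code = height * 2 + (1 if coinbase else 0)  # VARINT(nHeight * 2 + fCoinBase)
            return (ser_compact_size(vout)       # output index within the transaction
                    + ser_varint(code)           # height and coinbase flag, packed
                    + ser_varint(0)              # compressed amount (zero for the sketch)
                    + ser_varint(0) + b"\x00" * 20)  # compressed script: type 0 (P2PKH hash)

        # Claim 2 coins for the first txid, provide one coin the loader should
        # accept (height 1), then a coin whose height (300) exceeds the snapshot
        # base height (299), which deterministically triggers the error.
        payload = ser_compact_size(2) + coin_entry(0, 1, False) + coin_entry(0, 300, False)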

  2. qa: make feature_assumeutxo.py test more robust
    The feature_assumeutxo.py functional test contains a set of checks for expected errors when a
    snapshot can be parsed but contains incorrect values. Some of these checks are brittle as they
    assert an error that is raised not only because of the injected fault, but also because of the
    following data in the snapshot. A previous occurrence of this was fixed in
    e5ff4e416ecc8a51367022eb8a7a291f8cbc0c65.
    
    In this commit, we look into another check of the same category. This one inserts an invalid coins
    count for the first transaction in the snapshot. It asserts that by doing so the snapshot validation
    logic will try to deserialize an additional coin for the first transaction, while in fact it will be
    deserializing the txid of the next transaction. It then asserts that this attempt to deserialize the
    txid of the next transaction will result in a specific error: a coin height higher than the base
    height (returning the string "Bad snapshot data after deserializing 1 coins.").
    
    This is an issue as it relies on data outside of the fault injected to assert the error string. For
    instance the next txid could well deserialize into a coin at a height lower than the base height, or
    (more probably) not successfully deserialize into a coin at all!
    
    Fix this by injecting all the data necessary to reliably assert the expected error: an invalid coins
    count, followed by a valid coin, followed by a validly serialized coin with an invalid height
    (which in this scenario is supposed to be the next transaction's txid).
    a7c57cf48a
  3. DrahtBot commented at 9:29 pm on March 21, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32117.

    Reviews

    See the guideline for information on the review process.

    Type         Reviewers
    Concept ACK  rkrux, fjahr

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32155 (miner: timelock the coinbase to the mined block’s height by darosior)
    • #32096 (Move some tests and documentation from testnet3 to testnet4 by Sjors)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. DrahtBot added the label Tests on Mar 21, 2025
  5. fanquake requested review from fjahr on Mar 23, 2025
  6. in test/functional/feature_assumeutxo.py:143 in a7c57cf48a
    @@ -139,7 +140,19 @@ def expected_error(msg):
             cases = [
                 # (content, offset, wrong_hash, custom_message)
                 [b"\xff" * 32, 0, "7d52155c9a9fdc4525b637ef6170568e5dad6fabd0b1fdbb9432010b8453095b", None],  # wrong outpoint hash
    -            [(2).to_bytes(1, "little"), 32, None, "Bad snapshot data after deserializing 1 coins."],  # wrong txid coins count
    +            # Enter an invalid number of coins for the first txid: 2 instead of 1. The logic will
    


    rkrux commented at 11:41 am on March 24, 2025:

    The comment says this is an invalid number of coins, but the validation code does actually read the stated number of coins. Maybe mention the reason why the number of coins is invalid at this stage? https://github.com/bitcoin/bitcoin/blob/770d39a37652d40885533fecce37e9f71cc0d051/src/validation.cpp#L5928-L5938

    Mentioned this because the actual test failure is caused by the invalid height passed in the serialized coin data; I’m trying to tie that back to why the number of coins is invalid. https://github.com/bitcoin/bitcoin/blob/770d39a37652d40885533fecce37e9f71cc0d051/src/validation.cpp#L5944-L5949
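    To tie the two together, here is a rough, runnable Python paraphrase of the loading loop rkrux links to. The real logic is C++ in validation.cpp; every name here (load_coins, SnapshotError, the helpers) is illustrative, not the actual API, and the compressed-script handling assumes the type-0 (P2PKH) form used by the test.

        import io

        class SnapshotError(Exception):
            pass

        def read_compact_size(s: io.BytesIO) -> int:
            n = s.read(1)[0]
            assert n < 253  # single-byte form suffices here
            return n

        def read_varint(s: io.BytesIO) -> int:
            # Inverse of the Bitcoin-style VARINT encoding
            n = 0
            while True:
                b = s.read(1)[0]
                n = (n << 7) | (b & 0x7F)
                if b & 0x80:
                    n += 1
                else:
                    return n

        def load_coins(s: io.BytesIO, base_height: int, total_coins: int) -> None:
            coins_done = 0
            while coins_done < total_coins:
                s.read(32)                           # outpoint txid
                per_tx_count = read_compact_size(s)  # coins claimed for this txid
                for _ in range(per_tx_count):
                    read_compact_size(s)             # output index
                    code = read_varint(s)            # VARINT(height * 2 + coinbase)
                    if code >> 1 > base_height:      # the height check rkrux points to
                        raise SnapshotError(
                            f"Bad snapshot data after deserializing {coins_done} coins.")
                    read_varint(s)                   # compressed amount
                    read_varint(s); s.read(20)       # compressed script (type 0 assumed)
                    coins_done += 1

    With per_tx_count inflated to 2, the inner loop runs once more than there are real coins, so the bytes of the next txid get parsed as an (index, coin) pair, and which error comes out depends entirely on those bytes. That data dependence is the brittleness this PR removes.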

  7. in test/functional/feature_assumeutxo.py:151 in a7c57cf48a
    +            # To get the expected error we serialize the first bytes of the would-be next txid as a
    +            # valid coin with an invalid height (here 300, when the snapshot base height is 299).
    +            [
    +                ser_compact_size(2)  # coins count
    +                + ser_compact_size(0) + ser_varint(1) + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00"*20  # first coin, height 1
    +                + ser_compact_size(0) + ser_varint(300 * 2) + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00"*20,  # second coin, height 300
    


    rkrux commented at 11:51 am on March 24, 2025:
    -ser_varint(300 * 2)
    +ser_varint((SNAPSHOT_BASE_HEIGHT + 1) * 2)
    

    darosior commented at 10:41 pm on March 25, 2025:
    Will do if this gets more Concept ACKs.
  8. rkrux commented at 11:52 am on March 24, 2025: contributor

    Concept ACK a7c57cf48a147c3ae3709f0630a3ba175aa4c841

    This is much clearer compared to last time. I have reviewed this PR from the POV of someone new to the assumeutxo feature and asked a question because there appear to be two points that I have not been able to tie together yet.

  9. fjahr commented at 12:45 pm on March 24, 2025: contributor

    The intent is clearer to me now as well, thanks. I think the new test is valuable but I am not sure this means that the old test has to be removed as well. These are two different tests now: the old test is testing a very simple off-by-one error and the new one is more elaborate. I don’t really see why we have to throw away the first one.

    The fear seems to be that the structure of the dump changes and that would require touching the test. I’m just not that concerned about that, because these tests overall have very strong assumptions about the structure of the dump (with the offset, for example) and I expect we will need to touch them anyway if we make changes to the dump file. But I would be fine with loosening the check so that it doesn’t assert the exact error message but just ensures that there is a failure.

  10. darosior commented at 10:36 pm on March 25, 2025: member

    These are two different tests now

    No, they are not. This just made the existing test deterministic, of sorts.

    the old test is testing a very simple off by one error

    To be honest i don’t think the test as it stands today in master is all that useful. It claims to be testing a case where the coins count is too large. A too-large coins count would “overflow” reading coins into the next transaction’s txid. In all likelihood this should fail to deserialize and return a “Bad snapshot format or truncated snapshot after deserializing” error. By some luck it happens to deserialize with the current snapshot data, so this actually tests the case where a coin has a too-high block height. It seems the person who introduced this test just asserted the first error that arose when injecting this fault, without thinking too much about what this is supposed to be testing.

    these tests overall have very strong assumptions about the structure of the dump (with the offset for example)

    If they do then i think this should be fixed, which is what i propose in this PR. But it is not true that all those tests are similarly brittle. Let’s go case by case: https://github.com/bitcoin/bitcoin/blob/c0b7159de47592c95c6db061de682b4af4058102/test/functional/feature_assumeutxo.py#L140-L155

    • The first one replaces the txid of the first coin with a different one. So it wouldn’t break if the snapshot data changes.
    • The second one is the one fixed in this PR.
    • The third one overwrites the coins count with an insanely high value and would still run into this error if the snapshot data changes.
    • The fourth one replaces the vout of the first coin in the first transaction with 1 instead of 0. It would not break if the snapshot data were to change.
    • The fifth one does the same for the coin height+coinbase bit. Again, it would not break.
    • The sixth one is equivalent to the fifth one.
    • The seventh one tests an invalid coin block height. It would still run into this error if the snapshot data around it were to change.
    • The eighth and final one is the one that was fixed in #31907.

    So besides the one that is fixed in this PR and the one that was fixed in my previous PR, none are similarly brittle.

  11. fjahr commented at 11:01 pm on March 25, 2025: contributor

    But it is not true that all those tests are similarly brittle.

    You are looking at these tests in isolation; that’s not what I was talking about. The offsets of these test cases are still just as brittle, and this carries through all the tests that would follow a mismatch. For example, the coins count is a VARINT, so it can have a different size, and if the size of that VARINT changes in the dump then all the following offsets are wrong. All I am saying is that the test still has very strong assumptions about the structure of the dump.
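    fjahr’s VARINT point in one self-contained example (re-implementing the Bitcoin-style VARINT; the real helper lives in the test framework):

        def ser_varint(n: int) -> bytes:
            out = b""
            while True:
                out = bytes([(n & 0x7F) | (0x80 if out else 0x00)]) + out
                if n <= 0x7F:
                    return out
                n = (n >> 7) - 1

        assert len(ser_varint(100)) == 1  # a small count fits in one byte...
        assert len(ser_varint(300)) == 2  # ...a larger one takes two, so every
        # hardcoded offset past this field in the dump would shift by one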

  12. darosior commented at 7:38 pm on March 27, 2025: member

    For what it’s worth, #32155 is the branch i was working on and the reason for this PR. I just wanted to avoid changing the error messages there and having reviewers figure out why it’s fine (which involves learning the structure of the snapshot, familiarizing themselves with the offsets, and understanding that the error message actually depends on the next txid in the snapshot and may therefore change if that txid changes).

    I usually find “change the error message until the test passes” to be a big red flag in review, and wanted to save time for reviewers of what i consider an important PR by doing the right thing with this preparatory PR. Given the feedback here i just went with “update the error message and let people figure it out” in #32155. If this PR gets renewed interest i am also happy to rebase #32155 on top of it after it’s merged. Feel free also to NACK and i’ll close, my feelings won’t be hurt (too much :p).

  13. fjahr commented at 8:39 pm on March 27, 2025: contributor

    Thanks for giving more context @darosior. I am certainly Concept ACK on adding this new test and -0 on removing the old one. If other reviewers agree that the old test should be removed I’m fine with it.

    We seem to have differing views on two (related) points: 1. How much more robust the new test is than the old one: you described your view in detail and added context for it, but I still see a lot of ways the new test could break. 2. What level the test should focus on: I would interpret what you write as saying it’s not necessary to test failures that are caught by deserialization. I guess that’s fair if we assume deserialization should be covered by unit tests instead. But my view is that we shouldn’t try to be too smart about these things, because that can also mean something gets missed; so more of a grey/black-box approach than white-box.

    Maybe these tests are not perfect, but one of them found a pretty surprising bug, so they did already serve their purpose as they are. I guess that’s why I am biased towards keeping them around.

    Another thought I had: we could, for each of these cases, mine a new chain in the test with slightly different transactions covering the different cases, create a new snapshot for each of them, and then only swap the metadata part of these snapshots. That would guarantee that we wouldn’t see any deserialization-related error even if the structure (or even the serialization) changes. I’m not sure that is what we should do, because it would slow the test down significantly, nor do I think it means the new test suggested here shouldn’t be added. I am just putting the thought out here to demonstrate that there is a spectrum of robustness and other trade-offs here.

  14. darosior commented at 8:57 pm on March 27, 2025: member

    Ok, i’ll keep this open a couple more days in case someone else has a different opinion. Otherwise i guess this can be closed.

    What level the test should focus on. I would interpret what you write as you think it’s not necessary to test failures that are caught by deserialization.

    This is of course not what i’m saying. Testing edge cases is good. It’s just better if it’s done deterministically.

    Again, the test as it stands today does not check a deserialization error but a “height too high” error.

    Maybe these test are not perfect but one of them found a pretty surprising bug so they did serve their purpose already as they are, I guess that’s why I am biased to keep them around.

    A non-deterministic test is better than no test; perfection or nothing is not my point. However, your example seems to make my point: it found an issue with the coin’s height because this is not a test for a deserialization error but for an insane coin height error.

    I don’t feel too strongly. So unless someone else does i don’t think this is worth hashing out further.


