qa: make feature_assumeutxo.py test more robust #32117

pull darosior wants to merge 1 commit into bitcoin:master from darosior:2503_more_assumeutxo_test_unbrittling changing 1 file +14 −1
  1. darosior commented at 9:29 pm on March 21, 2025: member

    This is another robustness fix that i dropped from #31907, intending to just tweak the error message in my branch that triggers the issue. As i played more with it i eventually decided to submit a proper fix instead. Since the rationale initially confused reviewers i tried to explain it in detail in the commit message (reproduced here in the OP) with enough context.

    The feature_assumeutxo.py functional test contains a set of checks for expected errors when a snapshot can be parsed but contains incorrect values. Some of these checks are brittle as they assert an error that is raised not only because of the injected fault, but also because of the following data in the snapshot. A previous occurrence of this was fixed in e5ff4e416ecc8a51367022eb8a7a291f8cbc0c65.

    In this commit, we look into another check of the same category. This one inserts an invalid coins count for the first transaction in the snapshot. It asserts that by doing so the snapshot validation logic will try to deserialize an additional coin for the first transaction, while in fact it will be deserializing the txid of the next transaction. It then asserts that this attempt to deserialize the txid of the next transaction will result in a specific error: a coin height higher than the base height (returning the string “Bad snapshot data after deserializing 1 coins.”).

    This is an issue as it relies on data outside of the injected fault to assert the error string. For instance, the next txid could well deserialize into a coin at a height lower than the base height, or (more probably) not successfully deserialize into a coin at all!

    Fix this by injecting all the data necessary to reliably assert the expected error: an invalid coins count, followed by a valid coin, followed by a validly serialized coin with an invalid height (which in this scenario is supposed to be the next transaction’s txid).
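    For readers unfamiliar with the dump layout, here is a self-contained Python sketch of how such an injected payload is put together. The helpers mirror ser_compact_size/ser_varint from the test framework but are re-implemented so the sketch runs standalone; the exact field layout is an assumption based on the compressed Coin serialization (VARINT(height * 2 + coinbase), compressed amount, compressed script), not a verbatim copy of the test code.

        def ser_compact_size(n: int) -> bytes:
            assert n < 253  # single-byte form suffices for this sketch
            return bytes([n])

        def ser_varint(n: int) -> bytes:
            # Bitcoin-style VARINT (base-128 with the "minus one" carry trick)
            out = b""
            while True:
                out = bytes([(n & 0x7F) | (0x80 if out else 0x00)]) + out
                if n <= 0x7F:
                    return out
                n = (n >> 7) - 1

        def coin_entry(vout: int, height: int, coinbase: bool) -> bytes:
            code = height * 2 + (1 if coinbase else 0)  # VARINT(nHeight * 2 + fCoinBase)
            return (ser_compact_size(vout)       # output index within the transaction
                    + ser_varint(code)           # height and coinbase flag, packed
                    + ser_varint(0)              # compressed amount (zero for the sketch)
                    + ser_varint(0) + b"\x00" * 20)  # compressed script: type 0 (P2PKH hash)

        # Claim 2 coins for the first txid, provide one coin the loader should
        # accept (height 1), then a coin whose height (300) exceeds the snapshot
        # base height (299), which deterministically triggers the error.
        payload = ser_compact_size(2) + coin_entry(0, 1, False) + coin_entry(0, 300, False)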

  2. qa: make feature_assumeutxo.py test more robust
    The feature_assumeutxo.py functional test contains a set of checks for expected errors when a
    snapshot can be parsed but contains incorrect values. Some of these checks are brittle as they
    assert an error that is raised not only because of the injected fault, but also because of the
    following data in the snapshot. A previous occurrence of this was fixed in
    e5ff4e416ecc8a51367022eb8a7a291f8cbc0c65.
    
    In this commit, we look into another check of the same category. This one inserts an invalid coins
    count for the first transaction in the snapshot. It asserts that by doing so the snapshot validation
    logic will try to deserialize an additional coin for the first transaction, while in fact it will be
    deserializing the txid of the next transaction. It then asserts that this attempt to deserialize the
    txid of the next transaction will result in a specific error: a coin height higher than the base
    height (returning the string "Bad snapshot data after deserializing 1 coins.").
    
    This is an issue as it relies on data outside of the fault injected to assert the error string. For
    instance the next txid could well deserialize into a coin at a height lower than the base height, or
    (more probably) not successfully deserialize into a coin at all!
    
    Fix this by injecting all the data necessary to reliably assert the expected error: an invalid coins
    count, followed by a valid coin, followed by a validly serialized coin with an invalid height
    (which in this scenario is supposed to be the next transaction's txid).
    a7c57cf48a
  3. DrahtBot commented at 9:29 pm on March 21, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32117.

    Reviews

    See the guideline for information on the review process.

    Type         Reviewers
    Concept ACK  rkrux, fjahr

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32155 (miner: timelock the coinbase to the mined block’s height by darosior)
    • #32096 (Move some tests and documentation from testnet3 to testnet4 by Sjors)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. DrahtBot added the label Tests on Mar 21, 2025
  5. fanquake requested review from fjahr on Mar 23, 2025
  6. in test/functional/feature_assumeutxo.py:143 in a7c57cf48a
    @@ -139,7 +140,19 @@ def expected_error(msg):
             cases = [
                 # (content, offset, wrong_hash, custom_message)
                 [b"\xff" * 32, 0, "7d52155c9a9fdc4525b637ef6170568e5dad6fabd0b1fdbb9432010b8453095b", None],  # wrong outpoint hash
    -            [(2).to_bytes(1, "little"), 32, None, "Bad snapshot data after deserializing 1 coins."],  # wrong txid coins count
    +            # Enter an invalid number of coins for the first txid: 2 instead of 1. The logic will
    


    rkrux commented at 11:41 am on March 24, 2025:

    The comment says this is an invalid number of coins, but the validation code does actually read the stated number of coins. Maybe mention the reason why the number of coins is invalid at this stage? https://github.com/bitcoin/bitcoin/blob/770d39a37652d40885533fecce37e9f71cc0d051/src/validation.cpp#L5928-L5938

    Mentioned this because the actual test failure is caused by the invalid height passed in the serialized coin data; I’m trying to tie that back to why the number of coins is invalid. https://github.com/bitcoin/bitcoin/blob/770d39a37652d40885533fecce37e9f71cc0d051/src/validation.cpp#L5944-L5949
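    To tie the two together, here is a rough, runnable Python paraphrase of the loading loop rkrux links to. The real logic is C++ in validation.cpp; every name here (load_coins, SnapshotError, the helpers) is illustrative, not the actual API, and the compressed-script handling assumes the type-0 (P2PKH) form used by the test.

        import io

        class SnapshotError(Exception):
            pass

        def read_compact_size(s: io.BytesIO) -> int:
            n = s.read(1)[0]
            assert n < 253  # single-byte form suffices here
            return n

        def read_varint(s: io.BytesIO) -> int:
            # Inverse of the Bitcoin-style VARINT encoding
            n = 0
            while True:
                b = s.read(1)[0]
                n = (n << 7) | (b & 0x7F)
                if b & 0x80:
                    n += 1
                else:
                    return n

        def load_coins(s: io.BytesIO, base_height: int, total_coins: int) -> None:
            coins_done = 0
            while coins_done < total_coins:
                s.read(32)                           # outpoint txid
                per_tx_count = read_compact_size(s)  # coins claimed for this txid
                for _ in range(per_tx_count):
                    read_compact_size(s)             # output index
                    code = read_varint(s)            # VARINT(height * 2 + coinbase)
                    if code >> 1 > base_height:      # the height check rkrux points to
                        raise SnapshotError(
                            f"Bad snapshot data after deserializing {coins_done} coins.")
                    read_varint(s)                   # compressed amount
                    read_varint(s); s.read(20)       # compressed script (type 0 assumed)
                    coins_done += 1

    With per_tx_count inflated to 2, the inner loop runs once more than there are real coins, so the bytes of the next txid get parsed as an (index, coin) pair, and which error comes out depends entirely on those bytes. That data dependence is the brittleness this PR removes.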

  7. in test/functional/feature_assumeutxo.py:151 in a7c57cf48a
    +            # To get the expected error we serialize the first bytes of the would-be next txid as a
    +            # valid coin with an invalid height (here 300, when the snapshot base height is 299).
    +            [
    +                ser_compact_size(2)  # coins count
    +                + ser_compact_size(0) + ser_varint(1) + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00"*20  # first coin, height 1
    +                + ser_compact_size(0) + ser_varint(300 * 2) + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00"*20,  # second coin, height 300
    


    rkrux commented at 11:51 am on March 24, 2025:
    -ser_varint(300 * 2)
    +ser_varint((SNAPSHOT_BASE_HEIGHT + 1) * 2)
    

    darosior commented at 10:41 pm on March 25, 2025:
    Will do if this gets more Concept ACKs.
  8. rkrux commented at 11:52 am on March 24, 2025: contributor

    Concept ACK a7c57cf48a147c3ae3709f0630a3ba175aa4c841

    This is much clearer compared to last time. I have reviewed this PR from the POV of someone new to the assumeutxo feature and asked a question because there appear to be two points that I have not been able to tie together yet.

  9. fjahr commented at 12:45 pm on March 24, 2025: contributor

    The intent is clearer to me now as well, thanks. I think the new test is valuable but I am not sure this means that the old test has to be removed as well. These are two different tests now: the old test is testing a very simple off-by-one error and the new one is more elaborate. I don’t really see why we have to throw away the first one.

    The fear seems to be that the structure of the dump changes and that would require touching the test. I’m just not that concerned about that, because these tests overall have very strong assumptions about the structure of the dump (with the offset, for example) and I expect we will need to touch them anyway if we make changes to the dump file. But I would be fine with loosening the check so that it doesn’t assert the exact error message but just ensures that there is a failure.

  10. darosior commented at 10:36 pm on March 25, 2025: member

    These are two different tests now

    No, they are not. This just made the existing test deterministic, of sorts.

    the old test is testing a very simple off by one error

    To be honest i don’t think the test as it stands today in master is all that useful. It claims to be testing a case where the coins count is too large. A too-large coins count would “overflow” reading coins into the next transaction’s txid. In all likelihood this should fail to deserialize and return a “Bad snapshot format or truncated snapshot after deserializing” error. By some luck it happens to deserialize with the current snapshot data, so this actually tests the case where a coin has a too-high block height. It seems the person who introduced this test just asserted the first error that arose when injecting this fault, without thinking too much about what this is supposed to be testing.

    these tests overall have very strong assumptions about the structure of the dump (with the offset for example)

    If they do then i think this should be fixed, which is what i propose in this PR. But it is not true that all those tests are similarly brittle. Let’s go case by case: https://github.com/bitcoin/bitcoin/blob/c0b7159de47592c95c6db061de682b4af4058102/test/functional/feature_assumeutxo.py#L140-L155

    • The first one replaces the txid of the first coin with a different one. So it wouldn’t break if the snapshot data changes.
    • The second one is the one fixed in this PR.
    • The third one overwrites the coins count with an insanely high value and would still run into this error if the snapshot data changes.
    • The fourth one replaces the vout of the first coin in the first transaction with 1 instead of 0. It would not break if the snapshot data were to change.
    • The fifth one does the same for the coin height+coinbase bit. Again, it would not break.
    • The sixth one is equivalent to the fifth one.
    • The seventh one tests an invalid coin block height. It would still run into this error if the snapshot data around it were to change.
    • The eighth and final one is the one that was fixed in #31907.

    So besides the one that is fixed in this PR and the one that was fixed in my previous PR, none are similarly brittle.

  11. fjahr commented at 11:01 pm on March 25, 2025: contributor

    But it is not true that all those tests are similarly brittle.

    You are looking at these tests in isolation; that’s not what I was talking about. The offsets of these test cases are still just as brittle, and this carries through all the tests that would follow a mismatch. For example, the coins count is a VARINT, so it can have a different size, and if the size of that VARINT changes in the dump then all the following offsets are wrong. All I am saying is that the test still has very strong assumptions about the structure of the dump.
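    fjahr’s VARINT point in one self-contained example (re-implementing the Bitcoin-style VARINT; the real helper lives in the test framework):

        def ser_varint(n: int) -> bytes:
            out = b""
            while True:
                out = bytes([(n & 0x7F) | (0x80 if out else 0x00)]) + out
                if n <= 0x7F:
                    return out
                n = (n >> 7) - 1

        assert len(ser_varint(100)) == 1  # a small count fits in one byte...
        assert len(ser_varint(300)) == 2  # ...a larger one takes two, so every
        # hardcoded offset past this field in the dump would shift by one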

  12. darosior commented at 7:38 pm on March 27, 2025: member

    For what it’s worth, #32155 is the branch i was working on and the reason for this PR. I just wanted to avoid changing the error messages there and having reviewers figure out why it’s fine (which involves learning the structure of the snapshot, familiarizing themselves with the offsets, and understanding that the error message actually depends on the next txid in the snapshot and may therefore change if that txid changes).

    I usually find “change the error message until the test passes” to be a big red flag in review, and wanted to save time for reviewers of what i consider an important PR by doing the right thing with this preparatory PR. Given the feedback here i just went with “update the error message and let people figure it out” in #32155. If this PR gets renewed interest i am also happy to rebase #32155 on top of it after it’s merged. Feel free also to NACK and i’ll close, my feelings won’t be hurt (too much :p).

  13. fjahr commented at 8:39 pm on March 27, 2025: contributor

    Thanks for giving more context @darosior. I am certainly Concept ACK on adding this new test and -0 on removing the old one. If other reviewers agree that the old test should be removed I’m fine with it.

    We seem to have differing views on two (related) points: 1. How much more robust the new test is than the old one: you described your view in detail and added context for it, but I still see a lot of ways the new test could break. 2. What level the test should focus on: I would interpret what you write as saying it’s not necessary to test failures that are caught by deserialization. I guess that’s fair if we assume deserialization should be covered by unit tests instead. But my view is that we shouldn’t try to be too smart about these things, because that can also mean something gets missed; so more of a grey/black-box approach than white-box.

    Maybe these tests are not perfect, but one of them found a pretty surprising bug, so they did already serve their purpose as they are. I guess that’s why I am biased towards keeping them around.

    Another thought I had: we could, for each of these cases, mine a new chain in the test with slightly different transactions covering the different cases, create a new snapshot for each of them, and then only swap the metadata part of these snapshots. That would guarantee that we wouldn’t see any deserialization-related error even if the structure (or even the serialization) changes. I’m not sure that is what we should do, because it would slow the test down significantly, nor do I think it means the new test suggested here shouldn’t be added. I am just putting the thought out here to demonstrate that there is a spectrum of robustness and other trade-offs here.

  14. darosior commented at 8:57 pm on March 27, 2025: member

    Ok, i’ll keep this open a couple more days in case someone else has a different opinion. Otherwise i guess this can be closed.

    What level the test should focus on. I would interpret what you write as you think it’s not necessary to test failures that are caught by deserialization.

    This is of course not what i’m saying. Testing edge cases is good. It’s just better if it’s done deterministically.

    Again, the test as it stands today does not check a deserialization error but a “height too high” error.

    Maybe these test are not perfect but one of them found a pretty surprising bug so they did serve their purpose already as they are, I guess that’s why I am biased to keep them around.

    A non-deterministic test is better than no test; perfection or nothing is not my point. However, your example seems to make my point: it found an issue with the coin’s height because this is not a test for a deserialization error but for an insane coin height error.

    I don’t feel too strongly. So unless someone else does i don’t think this is worth hashing out further.


