BIP54: Consensus Cleanup test vectors

darosior commented at 12:56 pm on October 21, 2025: member

This introduces test vectors for BIP54. There is one set of vectors per each of the 4 mitigations.

The vectors were generated using the BIP54 implementation against Bitcoin Inquisition available here, as well as a custom miner as a Bitcoin Core unit test available here. Documentation is provided with more details about each set of test vectors and describing how to use them.

bip-0054: add a reference implementation section 89dfe03a64

bip-0054: add test vectors for each mitigation

This introduces a set of test vectors for each of the 4 mitigations in the BIP. The sigops and
transaction size vectors were generated using the unit tests included with the Bitcoin Core
implementation of BIP54, available at https://github.com/darosior/bitcoin/tree/2509_inquisition_consensus_cleanup.
The timestamps and coinbases required mainnet blocks for maximum compatibility, and were generated
by two dedicated unit tests not included with the Bitcoin Core implementation above but available at
https://github.com/darosior/bitcoin/tree/bip54_miner.

30b0084808

jonatack added the label Proposed BIP modification on Oct 21, 2025

in bip-0054/test_vectors/README.md:4 in 2fead0c179

0@@ -0,0 +1,102 @@
1+## BIP54 test vectors
2+
3+This folder contains a set of test vectors for each mitigation introduced in the BIP. This document
4+presents them in more details.

jonatack commented at 3:54 pm on October 21, 2025:

0presents them in more detail.

darosior commented at 8:10 am on October 27, 2025:

Done, thanks.

jonatack commented at 4:13 pm on October 21, 2025: member

Concept ACK, successfully built the test branch at https://github.com/darosior/bitcoin/blob/2509_inquisition_consensus_cleanup and ran the unit and functional tests.

0$ ./build/bin/test_bitcoin --run_test=bip54_tests
1Running 4 test cases...
2
3*** No errors detected

0$ ./build/test/functional/feature_bip54.py     
12025-10-21T16:05:13.483000Z TestFramework (INFO): PRNG seed is: 80635940825195715
22025-10-21T16:05:13.483000Z TestFramework (INFO): Initializing test directory /var/folders/bz/mn3hr6td37bczwp7j89frs4w0000gn/T/bitcoin_func_test_dznr0x1z
32025-10-21T16:05:14.300000Z TestFramework (INFO): BIP54 tests before activation
42025-10-21T16:05:14.558000Z TestFramework (INFO): Activating BIP54
52025-10-21T16:05:14.745000Z TestFramework (INFO): BIP54 tests after activation
62025-10-21T16:05:15.296000Z TestFramework (INFO): Stopping nodes
72025-10-21T16:05:15.460000Z TestFramework (INFO): Cleaning up /var/folders/bz/mn3hr6td37bczwp7j89frs4w0000gn/T/bitcoin_func_test_dznr0x1z on exit
82025-10-21T16:05:15.460000Z TestFramework (INFO): Tests successful

Are the functional tests worth mentioning in the test_vectors README? Perhaps with some of the content in the commit message:

The previously introduced unit tests extensively test the specific implementation of each
mitigation. This functional test complements them by sanity checking all mitigations in a "black
box" manner. For the added timestamp constraints, it mimicks how they would get exploited (by
implementing pseudo timewarp and Murch-Zawy attacks) and demonstrates those exploits are not
possible anymore after BIP54 activates.

in bip-0054/test_vectors/README.md:88 in 2fead0c179 outdated

83+The [`coinbases.json`](./coinbases.json) file contains test cases exercising the new restrictions on
84+coinbase transactions introduced in BIP54 to prevent duplicate coinbase transactions without
85+resorting to BIP30 validation. Each test case contains a chain of mainnet blocks (including the
86+genesis block), and whether this block chain is valid according to BIP54. All test cases are valid
87+according to current Bitcoin's consensus rules, except one that features a block containing a
88+coinbase transaction timelocked to a future block height.

murchandamus commented at 7:53 am on October 22, 2025:

Should there then perhaps also be a test case that has a coinbase transaction locked to a block height that is lower than required?

darosior commented at 8:37 am on October 22, 2025:

There definitely was one but looks like i messed up somewhere in re-generating the vectors. Will re-add it, thank you.

murchandamus commented at 8:24 am on October 23, 2025:

Just to be clear, I was just looking at the description, I did not check the test vectors, so it might just be the description that is off.

darosior commented at 12:46 pm on October 23, 2025:

Oh, then i just checked and it still is in here! Link.

murchandamus commented at 7:57 am on October 22, 2025: member

Looks good, see nit

bip-0054: document the test vectors 0777a81367

darosior force-pushed on Oct 27, 2025

darosior commented at 8:12 am on October 27, 2025: member

Are the functional tests worth mentioning in the test_vectors README?

No, i don’t think so. The functional test sanity checks the implementation with a blackbox approach, but does not implement test vectors. Hence they don’t need to be mentioned in the test vectors README.

in bip-0054/test_vectors/README.md:30 in 0777a81367

25+The [`timestamps.json`](./timestamps.json) test vectors exercise the two constraints on block header
26+timestamps introduced by BIP54 to mitigate the Timewarp and Murch-Zawy attacks. Each test case
27+features a chain of mainnet headers starting from the genesis block, and whether this header chain
28+is valid by BIP54 rules. Each test case also contains a comment describing why this particular chain
29+is (in)valid according to BIP54.  All test cases are valid according to current Bitcoin consensus
30+rules. It is intended to be used to test a BIP54 implementation by feeding the header chain to a

ariard commented at 4:46 am on October 30, 2025:

Started to look on the timestamps.json file, I’m understanding that the headers for the 9 chain of headers have been drawn the historical mainnet chain, and then permutation of a subset of those chain of headers have been made accordingly to make some of them (in)valid according to BIP54, assuming that BIP54 would have been enforced as a rule since the genesis block.

Given the difficulty adjustment is proposing rules for the validation state where block is equal to height N % 2016 equal 0 and the state where block is equal to height N % 2016 equal 2016, and there is at least >, =, < to test for each boundary check, there should be at least 6 tests. From checking timestamps.json, there at least 9 test chains in the file.

Wondering if there is full coverage for the first equality check, given you have > and = to test each for the variations of minus 7200 (and then same, you have >, =, < I think that are valuable to test for).

darosior commented at 2:34 pm on November 25, 2025:

Not exactly. Only the genesis block’s header is in common with the historical main chain. But you are correct in how it’s used: creating various forks which test the Timewarp and Murch-Zawy fixes in various conditions.

Rather than >, =, <, i think what’s important to test is the boundary: the last value for which the test pass and the first for which it fails. So using your terms it would be <, =. Plus of course also testing that it does not trigger at other heights, especially those off-by-one from the height that needs to be checked.

At the moment the tests exercise for Timewarp that:

The check is not performed on heights in the difficulty adjustment period (heights 41 and 2001)
The lowest value for which the check passes for the correct height (2016)
The first value for which the check fails for the correct height
The check is not performed on the height just preceding the one on which it should be enforced (2015) to check for off-by-one in implementations
The check is not performed on the height just following the one on which it should be enforced (2017) to check for off-by-one in implementations

At the moment the tests exercise for Murch-Zawy that:

The check fails on the first value for which it is supposed to fail
The passes on the first value for which it is not supposed to fail
The check is not enforced on the height just following the one on which it should be enforced (4032 instead of 4031)

Which test cases do you think should be added exactly and for what reason? I think the above is sufficient for a first shot, but i’m happy to consider re-mining the chain of headers and add the test cases you would suggest in a followup.

ariard commented at 5:08 am on December 1, 2025:

Which test cases do you think should be added exactly and for what reason?

I think it’s on the situation where the height of the checked block is a difficulty-adjustement block and verifying the variation for the triplet where the previous block time is T - 7199, 7200 and 7201 so testing the boundary of MAX_TIMEWARP when current block time and previous block time are 2 temporal points with strictly MAX_TIMEWARP between them. Thanks already for more comment on what is already covered.

in bip-0054/test_vectors/README.md:42 in 0777a81367

37+- `comment`: a JSON string. Description of the test case.
38+
39+For the purpose of testing a Timewarp fix, a Timewarp attack was included early on in the history of
40+testnet3. An implementer of BIP54 may want to ensure that syncing testnet3 by enforcing BIP54 since
41+genesis will treat block `00000000118da1e2165a19307b86f87eba814845e8a0f99734dce279ca3fb029` as
42+invalid.

ariard commented at 4:47 am on October 30, 2025:

Very open question: how hard it would be to coordinate a Murch-Zawy attack variant on testnet3.

darosior commented at 2:51 pm on November 25, 2025:

We’d need some non-trivial hash power to mine at least 3 difficulty adjustment periods without publishing them. We’d also need to ramp up the difficulty first, which is currently 1 on testnet3. A Murch-Zawy can also be done on testnet4, for what it’s worth.

If we can find someone with some hashrate that could be interesting to do.

ariard commented at 4:48 am on December 1, 2025:

Yeah, I’m reading back the computational cost estimations for a Murch-Zawy attack. Always interesting to have coverage for edge case, even if it’s only to test more the novel code logic.

ariard commented at 4:51 am on October 30, 2025: none

Started to review the code and test coverage on the branch in bitcoin-inquisition.

For the current bip-0054.md, maybe it could be valuable to have an editorial refinement of the “Specification” section where the new rules are each categorized in its section (“difficulty adjustement fix”, “long block validation time fix”, “merkle tree malleability fix” and “duplicate coinbase tx”) as it’s done for the test-vectors/README.md”. Believing it’s fruitful to review that for each line of specification, and corresponding line of code, there is adequate test coverage.

in bip-0054/test_vectors/txsize.json:30 in 0777a81367

25+        "comment": "A 65-byte legacy transaction."
26+    },
27+    {
28+        "tx": "02000000000101827da3d85a6547d6b03662d2cb86982d655a6f390547285a3bf9ec9f28e0c8831500000000ffffffff01000000000000000004005152540108213245576281941200000000",
29+        "valid": false,
30+        "comment": "A 64-byte Segwit transaction."

ariard commented at 0:15 am on November 10, 2025:

I’m not sure if there is a coverage for a 64-byte Taproot transaction. Not sure if it’s either strictly necessary as Taproot transactions are Segwit transactions, even if the other way isn’t necessarily true.

There is no change with Segwit transactions, but having cases for a Taproot + annex (empty) and a Taproot + annex (full) to be sure that the 64-byte verification logic is checking the code correctly, can be valuable in my opinion.

darosior commented at 2:57 pm on November 25, 2025:

There is coverage for a 64-byte Taproot transaction, see the test case just below your comment:

A 64-byte Segwit transaction (1 p2tr input, 1 p2a output).

The annex is just another witness stack element, so i’m not sure it would add more value, but it also doesn’t hurt to have. I’m happy to add it in a follow-up.

ariard commented at 4:47 am on December 1, 2025:

There is coverage for a 64-byte Taproot transaction, see the test case just below your comment:

Ah yes, seen the test case for Taproot “1 pt2r output input, 1 p2a output.

The annex is just another witness stack element, so i’m not sure it would add more value,

It’s indeed another witness stack element, so a priori I don’t see why it would raise issues. But iirc, it’s pop up and processed differently than other stack elements, so it’s more to check there is no unintended interactions.

What if a 65-byte Taproot transaction is stripped of its 1-byte annex at the p2p-level, normally the annex is committed in the taproot signature so it should fail at signature verification, but the tx should be now rejected early due to its 64-byte size. I’m not sure if you can make 64-byte tx that is not a key spend-path, so the annex should be always covered by the annex, but it’s more interactions with p2p that might be interesting to check (— which is strictly outside the scope of this BIP, I know).

in bip-0054/test_vectors/README.md:78 in 0777a81367

73+
74+The test vector file features a JSON array of JSON objects, each corresponding to a test case. Each
75+JSON object features the following entries:
76+- `tx`: a JSON string. A hex-encoded Bitcoin-serialized transaction to be evaluated.
77+- `valid`: a JSON boolean. Whether this transaction is valid according to BIP54.
78+- `comment`: a JSON string. Description of the test case.

ariard commented at 0:21 am on November 10, 2025:

I had a read of the sigops.json but I cannot comment directly on it because it is considered as a “diff too large” by Github.

On the test vector: “Mixed input types reaching exactly 2500 BIP54-sigops + a future Segwit program input”, I think it could be valuable to multiply it for the whole range of future Segwit program (i.e from OP_02 to OP_16) to ensure the CheckSigopsBIP54 parser works well.

darosior commented at 3:10 pm on November 25, 2025:

Now i think that would be overkill, each of these cases takes quite a lot of space and i don’t think there is much value in testing every single Segwit version here. We already test for versions 0, 1 and 2. Maybe i could add a version 16 in a followup if you really think it’s worthwhile.

ariard commented at 4:31 am on December 1, 2025:

I know it’s a bit overkill…on the same time the CheckSigopsBIP54 parser it’s consensus. Maybe it’s worthwhile to add it in a follow-up when there are more test coverage vectors worthy to add.

ariard commented at 0:23 am on November 10, 2025: none

I had a first overview on all the test vector files (timestamps.json, sigops.json, txsize.json and coinbases.json). Left already few comments, and I’ll pursue review on the code implementation on bitcoin-inquisition.

darosior commented at 3:20 pm on November 25, 2025: member

Thanks for the comments. There are some suggestions that i believe to be overkill, but i think may be worth including: the marginal cost of an additional test case is overall quite low and if even the slightest reduction in risk of a bug in consensus code is probably worth it.

That said i think those can wait for a follow-up. I think what’s here is a pretty solid base for anyone to start testing an implementation, and that the more overkill test cases can wait for now. I am also waiting for more feedback, and want to address all of it an once since it will require re-mining a bunch of blocks/headers.

jonatack merged this on Nov 25, 2025

jonatack closed this on Nov 25, 2025

jonatack commented at 5:56 pm on November 25, 2025: member

Feel free to continue the review discussion here until an eventual follow-up is opened.

darosior commented at 7:05 pm on November 25, 2025: member

Ironic that this PR’s number is 2015, the number of block intervals considered by the difficulty adjustment instead of 2016, which opens up the Timewarp attack fixed in BIP 54. :smile:

in bip-0054/test_vectors/README.md:8 in 0777a81367

0@@ -0,0 +1,102 @@
1+## BIP54 test vectors
2+
3+This folder contains a set of test vectors for each mitigation introduced in the BIP. This document
4+presents them in more detail.
5+
6+The code used to generate half of the test vectors is included with the implementation and available
7+[here][other-vectors]. The other half requires mining mainnet blocks and is [published
8+separately][bip54-miner]. In both cases it is implemented as a regular Bitcoin Core unit test, and

Christewart commented at 10:41 pm on November 25, 2025:

The bip54-miner link seems to be broken? Admittedly I’m not sure how this linking format works (such as other-vectors), but here is what i see

darosior commented at 11:16 pm on November 25, 2025:

Indeed this should link to https://github.com/darosior/bitcoin/commits/bip54_miner/ instead of https://github.com/darosior/bitcoin/blob/bip54_miner/commits my bad.

darosior commented at 11:19 pm on November 25, 2025:

Fixed in #2045, nice catch thanks.

darosior deleted the branch on Nov 25, 2025

ariard commented at 5:14 am on December 1, 2025: none

I’m on reviewing the implementation in bitcoin-inquisition, for more worthwhile test coverage that it can be worthy to add, I’ve already noted:

64-byte transaction with Taproot annex full / empty
max-sigops: segwit from OP_2 to OP_16
more Murch-Zawy attack

Fully agree there is no need to add more, it’s better to wait when the implementation would have been more reviewed, by more people. Somehow it can be worthy not only to test the correctness of the new consensus code w.r.t its BIP54 specification, but also to verify there is no unintended interaction of the novel code with things like p2p.

Christewart commented at 9:09 pm on December 2, 2025: contributor

In sigops.json there is a test case with the comment

0"comment": "Invalid bare script with 125 CHECMULTISIGs each accounted for 20 BIP54-sigops"

The comment is inaccurate, the test case has 126 OP_CHECKMULTISIGS. That is why the transaction is invalid. If it had 125 the transaction would be valid (125 * 20 = 2500)

If you decide to fix the comment, probably should fix the misspelling of CHECMULTISIGs while you are at it

darosior commented at 9:36 pm on December 2, 2025: member

Nice catch, thanks. Yes, will do both.

darosior commented at 3:06 pm on January 20, 2026: member

Some more feedback for the BIP: https://github.com/bitcoin-inquisition/bitcoin/pull/99#discussion_r2703817811. Collecting it here to address all feedback in a single followup.

And in this comment too, regarding hardcoding 2016: https://github.com/bitcoin-inquisition/bitcoin/pull/99#discussion_r2703984275

Move comment up: https://github.com/bitcoin-inquisition/bitcoin/pull/99#discussion_r2693674854, https://github.com/bitcoin-inquisition/bitcoin/pull/99#discussion_r2732460234

Make sure test vectors contain an EOF: https://github.com/bitcoin-inquisition/bitcoin/pull/99#discussion_r2732510123

Clarify the rationale for constraining the nSequence of coinbase transactions, as this caused some confusions. See https://github.com/bitcoin-inquisition/bitcoin/pull/99#discussion_r2636788599 and https://gnusha.org/pi/bitcoindev/e758af9b-72fc-4fdd-8e07-e1126635780an@googlegroups.com/

BIP54: Consensus Cleanup test vectors #2015