script: return verification flag responsible for error upon validation failure

darosior commented at 7:19 pm on July 18, 2025: member

For unconfirmed transactions, we currently distinguish between Script validation failures due to standardness rules and those due to consensus rules. The distinction is used to decide whether to disconnect the peer which submitted this transaction.

This is currently achieved by repeating Script validation with only the consensus verification flags, after it failed with the standardness flags. It is wasteful, and potentially significantly more so since we have Taproot. This PR proposes to replace this detection by having Script validation return the verification flag responsible for the failure instead.

Note this is a slight behaviour change, as a consensus-related Script validation failure that happens after a standardness-related Script validation failure would not be treated as a consensus error anymore (and consequentially the peer not disconnected). I don’t think this is an issue because we already do not treat all consensus-invalid transactions we get submitted as consensus errors. For instance, a non-standard transaction (for any reason other than its Scripts) with a consensus-invalid Script will be treated as a standardness error. Same for a transaction with a consensus-invalid Script in a later input than another with a non-standard Script. In these conditions, a Script that is consensus-invalid after being standardness-invalid is an edge case that seems unnecessary to detect. It also seems to me that the fact this edge case is detected at all is more an artifact of the existing logic (introduced in 97e7901a3a74 13 years ago) than a design decision. For all these reasons, this slight behaviour change seems fine to me (and definitely worth it when considering the upside of not having to run Script validation twice).

This change is quite verbose. If reviewers want i can try to make it less so, but i first wanted to open this as draft to get feedback on the broader approach of returning errors from Script validation directly instead of having to run it multiple times with different flags to distinguish the cause of an error.

DrahtBot added the label Consensus on Jul 18, 2025

DrahtBot commented at 7:19 pm on July 18, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33012.

Reviews

See the guideline for information on the review process. A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#32998 (Bump SCRIPT_VERIFY flags to 64 bit by ajtowns)
#32575 (refactor: Split multithreaded case out of CheckInputScripts by fjahr)
#32473 (Introduce per-txin sighash midstate cache for legacy/p2sh/segwitv0 scripts by sipa)
#32247 (BIP-348 (OP_CHECKSIGFROMSTACK) (regtest only) by jamesob)
#32143 (Fix 11-year-old mis-categorized error code in OP_IF evaluation by cculianu)
#31713 (miniscript refactor: Remove unique_ptr-indirection (#30866 follow-up) by hodlinator)
#31650 (refactor: Avoid copies by using const references or by move-construction by maflcko)
#31576 (test: Move script_assets_tests into its own suite by hebasto)
#29247 (CAT in Tapscript (BIP-347) by arminsabouri)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

darosior force-pushed on Jul 18, 2025

DrahtBot added the label CI failed on Jul 18, 2025

DrahtBot commented at 8:11 pm on July 18, 2025: contributor

🚧 At least one of the CI tasks failed. Task TSan, depends, gui: https://github.com/bitcoin/bitcoin/runs/46284608490 LLM reason (✨ experimental): The CI failure is caused by a compilation error in verify_script.cpp due to an incorrect call to EvalScript with mismatched argument types.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot removed the label CI failed on Jul 18, 2025

ajtowns commented at 5:44 am on July 20, 2025: contributor

Note this is a slight behaviour change, as a consensus-related Script validation failure that happens after a standardness-related Script validation failure would not be treated as a consensus error anymore (and consequentially the peer not disconnected).

I’m skeptical whether this behaviour is really worth preserving in a limited fashion? With this change, an attacker can waste your resources without being discouraged or risking having to pay tx fees by making a consensus invalid tx that fails standardness before the consensus failure is seen; and any innocent third parties are more likely to waste your resources by relaying txs under different standardness policies, and won’t be penalised at all.

I think getting different errors for standardness vs consensus failures can be useful for debugging (and perhaps functional tests or network behaviour monitoring) so could see it being worthwhile to retain a (default off) config option to continue testing transactions twice.

darosior commented at 1:35 pm on July 21, 2025: member

The behaviour already exists in a limited fashion. Even without this change, an attacker can trivially create such a transaction (as discussed in OP). Here is for instance a p2p_invalid_tx.py case which passes both with or without this PR:

 0diff --git a/test/functional/data/invalid_txs.py b/test/functional/data/invalid_txs.py
 1index 36b274efc27..46bb2bba367 100644
 2--- a/test/functional/data/invalid_txs.py
 3+++ b/test/functional/data/invalid_txs.py
 4@@ -286,6 +286,22 @@ class NonStandardAndInvalid(BadTxTemplate):
 5             self.spend_tx, 0, script_sig=b'\x00' * 3 + b'\xab\x6a',
 6             amount=(self.spend_avail // 2))
 7 
 8+
 9+class FirstInNonStdSecondInInvalid(BadTxTemplate):
10+    """A transaction with first a standardness error and then a consensus error."""
11+    reject_reason = "non-mandatory-script-verify-flag (Using OP_CODESEPARATOR in non-witness script)"
12+    # NOTE: feature_block would need to be tweaked for the spent tx to have two outputs.
13+    block_reject_reason = "mandatory-script-verify-flag-failed (OP_RETURN was encountered)"
14+    expect_disconnect = False
15+    valid_in_block = False
16+
17+    def get_tx(self):
18+        tx = create_tx_with_script(self.spend_tx, 0, script_sig=b"", amount=(self.spend_avail // 2))
19+        tx.vin.append(CTxIn(COutPoint(self.spend_tx.txid_int, 1)))
20+        tx.vin[0].scriptSig = b"\xab"  # A CODESEP in the first input (non-standard)
21+        tx.vin[1].scriptSig = b"\x6a"  # An OP_RETURN in the second input (invalid)
22+        return tx
23+
24 # Disabled opcode tx templates (CVE-2010-5137)
25 DisabledOpcodeTemplates = [getDisabledOpcodeTemplate(opcode) for opcode in [
26     OP_CAT,
27diff --git a/test/functional/p2p_invalid_tx.py b/test/functional/p2p_invalid_tx.py
28index ae9771e7cb4..33082337cc8 100755
29--- a/test/functional/p2p_invalid_tx.py
30+++ b/test/functional/p2p_invalid_tx.py
31@@ -14,6 +14,7 @@ from test_framework.messages import (
32     CTxOut,
33 )
34 from test_framework.p2p import P2PDataStore
35+from test_framework.script import CScript, OP_TRUE
36 from test_framework.test_framework import BitcoinTestFramework
37 from test_framework.util import (
38     assert_equal,
39@@ -56,7 +57,7 @@ class InvalidTxRequestTest(BitcoinTestFramework):
40 
41         self.log.info("Create a new block with an anyone-can-spend coinbase.")
42         height = 1
43-        block = create_block(tip, create_coinbase(height), block_time)
44+        block = create_block(tip, create_coinbase(height, extra_output_script=CScript([OP_TRUE])), block_time)
45         block.solve()
46         # Save the coinbase for later
47         block1 = block

I don’t think this logic is actually meant to prevent an attacker from wasting resources anyways. I think it’s rather to make sure we don’t split the network when introducing new Script-runtime standardness rules. And this change doesn’t affect this.

script: turn ScriptError into a proper C++ enum

typedef enum is a C idiom which is unnecessary in C++

c9a906f777

script: rename ScriptError to ScriptErrorType f7cdad3e15

darosior force-pushed on Jul 21, 2025

darosior commented at 2:35 pm on July 21, 2025: member

Rebased to fix a silent merge conflict that was making the “test each commit” CI job fail.

in src/script/interpreter.cpp:209 in b8c2777c07 outdated

206     }
207     if ((flags & (SCRIPT_VERIFY_DERSIG | SCRIPT_VERIFY_LOW_S | SCRIPT_VERIFY_STRICTENC)) != 0 && !IsValidSignatureEncoding(vchSig)) {
208-        return set_error(serror, SCRIPT_ERR_SIG_DER);
209+        // If any of the three flags are set, stronger ones take precedence in the error.
210+        uint32_t err_flag{flags & SCRIPT_VERIFY_DERSIG};
211+        if (err_flag == 0) err_flag &= SCRIPT_VERIFY_LOW_S;

sipa commented at 3:02 pm on July 21, 2025:

This has no effect. If err_flag is already zero, then err_flah &= ...; will not change that.

darosior commented at 3:09 pm on July 21, 2025:

Oops, thanks. Of course i meant |= here.

darosior commented at 3:09 pm on July 21, 2025:

There is a couple other occurrences where i must have made the same mistake.

darosior commented at 4:40 pm on July 22, 2025:

Well, apparently that was the only place. Fixed.

script: return offending Script verification flag upon error cc615fa359

qa: demonstrate existing distinction between consensus/standardness errors

We distinguish between consensus and standardness errors in validating a transaction to decide
whether to disconnect the peer. When a transaction is both consensus-invalid and non-standard, we
sometimes treat it as a consensus failure and sometimes not. This commit adds additional cases to
invalid_txs.py to clarify the distinction and highlight the behaviour change in the next commit.

We treat a non-standard transaction with an invalid Script as non-standard (and not disconnect the
peer), and always treat a standard transaction with an invalid Script (whether or not the Script
itself is also non-standard) as consensus. (That is, as long as no non-standard Script is encountered
before.)

46ce07469e

darosior force-pushed on Jul 22, 2025

DrahtBot added the label CI failed on Jul 22, 2025

validation: use Script validation error to distinguish standardness and standardness errors

Change CScriptCheck() to return both the Script error type and the Script verification flag which
caused the failure. This prevents having to run Script validation a second time upon failure.

It does change behaviour slightly, in that a Script consensus failure after a Script standardness
failure won't be detected anymore, and therefore won't lead to a disconnect. This is arguably by
design though: this can only ever be detected by running Script validation without the standardness
verification flags, which we are trying to avoid here.

648e5e2d54

darosior force-pushed on Jul 22, 2025

ajtowns commented at 0:52 am on July 23, 2025: contributor

The behaviour already exists in a limited fashion. Even without this change, an attacker can trivially create such a transaction (as discussed in OP). Here is for instance a p2p_invalid_tx.py case which passes both with or without this PR:

Okay, but that just means the extra work isn’t providing much value today either?

I don’t think this logic is actually meant to prevent an attacker from wasting resources anyways. I think it’s rather to make sure we don’t split the network when introducing new Script-runtime standardness rules. And this change doesn’t affect this.

I’m suggesting we should not discourage for consensus invalid txs either, since we can’t do that both consistently and efficiently – ie, drop MaybePunishNodeForTx() entirely. That would also have the benefit of not having to worry about potential network splits due to txs violating new consensus rules that never actually make it into a block (cf #31269 and #26291). See https://github.com/ajtowns/bitcoin/commits/202507-nopunishtx/

DrahtBot removed the label CI failed on Jul 23, 2025

script: return verification flag responsible for error upon validation failure #33012

Code Coverage & Benchmarks

Reviews

Conflicts