Improve Transaction Tests Flags

JeremyRubin commented at 8:35 pm on September 2, 2021: contributor

Originally posted by @glozow in #21702#r684235896

While trying to write transaction tests for CTV i’ve come across the pathological case of trying to specify flags.

Suppose I have a script that is <X> CTV. CTV is an OP_NOP4 takeover, so when DISCOURAGE_UPGRADABLE_NOPS is set but VERIFY_CTV flag is not, then it should fail. Therefore we need to mark DISCOURAGE_UPGRADABLE_NOPS as excluded. However, when we exclude that flag, then the flag maximality check wants it to fail, but the txn is consensus valid.

If we were to just update the trim/fill flag functions, then we would be overly restrictive because we would set things up to e.g. always exlcude or include a flag when in reality it depends on the specific flags being set or unset.

I think this can be addressed by adding 2 new fields to the test:

flags that should be always on
transactions that should be exempt from the maximality check

We can then duplicate some tests to test the combo of flags to check that NONE are excluded + the new NOP4 passes as well as the case where DISCOURAGE_NOPS is excluded (i.e., never set) that turning it back on (during the maximality check) is not a failure.

Any thoughts on this approach?

JeremyRubin commented at 9:07 pm on September 2, 2021: contributor

Concrete example patch + txvalid vectors. Without the patch the first example would fail because of the DISCOURAGE rules kicking ON with CTV off. The second example would fail becuase when the DISCOURAGE rule is excluded, turning it off to check for failure fails when the CTV rule is on. These kinda sound like the same case, but they’re a bit different…

0["Check that CTV is Processed with a Taproot Spend"],
1[[["f90521604b56c392ffa17a01bcae5914b8cf7728cc6cec00d90838818cc5465f", 0, "1 0x20 0x24f5fe807bcee7774dc515f0b7ee8d6ae39eefd1b590264c52ff867e22c49419", 155000]],
2"020000000001015f46c58c813808d900ec6ccc2877cfb81459aebc017aa1ff92c3564b602105f90000000000000000000ae80300000000000017a914ce0036ae7d49f06967dd92cc1ffff4a878c457f987d00700000000000017a91406e00c3b362e65e03507a2858d7b6499b668669887b80b00000000000017a9142ee42c65592c59b69bfefbd03781140c67e5232487a00f00000000000017a9146b3df16a1e6651d582ca6598900cb4f2d6c9dfb887881300000000000017a914877d55932d4f38b476d4db27e4efbe159ff0a07187701700000000000017a91441e9dc892e861d252d513d594ba833cd6bc8917087581b00000000000017a914b93075800c693dcc78b0553bf9d1cf879d76a02487401f00000000000017a914e9f0ea3a2cae0ad01114e2ec3502ef08bbc50af487282300000000000017a9149a645b5293bdf8be72cb9d1460bce7d64445cfad87102700000000000017a91451e5d6b2ee24ae128234c92245df3624620ea7d3870222209eb65498bfcd4eb90e61c2c5e323a9c16c8bfd8d53ba649915bcdb572099c12fb321c0b7e0105780185688d998a8f8438aa07637a5799755688ec80175cb26c0406e0200000000",
3"NONE", "DEFAULT_CHECK_TEMPLATE_VERIFY_HASH"],
4["Check that CTV upgradability works (taproot)"],
5[[["f90521604b56c392ffa17a01bcae5914b8cf7728cc6cec00d90838818cc5465f", 0, "1 0x20 0x24f5fe807bcee7774dc515f0b7ee8d6ae39eefd1b590264c52ff867e22c49419", 155000]],
6"020000000001015f46c58c813808d900ec6ccc2877cfb81459aebc017aa1ff92c3564b602105f90000000000000000000ae80300000000000017a914ce0036ae7d49f06967dd92cc1ffff4a878c457f987d00700000000000017a91406e00c3b362e65e03507a2858d7b6499b668669887b80b00000000000017a9142ee42c65592c59b69bfefbd03781140c67e5232487a00f00000000000017a9146b3df16a1e6651d582ca6598900cb4f2d6c9dfb887881300000000000017a914877d55932d4f38b476d4db27e4efbe159ff0a07187701700000000000017a91441e9dc892e861d252d513d594ba833cd6bc8917087581b00000000000017a914b93075800c693dcc78b0553bf9d1cf879d76a02487401f00000000000017a914e9f0ea3a2cae0ad01114e2ec3502ef08bbc50af487282300000000000017a9149a645b5293bdf8be72cb9d1460bce7d64445cfad87102700000000000017a91451e5d6b2ee24ae128234c92245df3624620ea7d3870222209eb65498bfcd4eb90e61c2c5e323a9c16c8bfd8d53ba649915bcdb572099c12fb321c0b7e0105780185688d998a8f8438aa07637a5799755688ec80175cb26c0406e0200000000",
7"DISCOURAGE_UPGRADABLE_NOPS", "NONE", true],

 0diff --git a/src/test/transaction_tests.cpp b/src/test/transaction_tests.cpp
 1index df2071d2b..717e30276 100644
 2--- a/src/test/transaction_tests.cpp
 3+++ b/src/test/transaction_tests.cpp
 4@@ -198,11 +198,19 @@ BOOST_AUTO_TEST_CASE(tx_valid)
 5         std::string strTest = test.write();
 6         if (test[0].isArray())
 7         {
 8-            if (test.size() != 3 || !test[1].isStr() || !test[2].isStr())
 9+            const size_t size = test.size();
10+            const bool default_args = size == 3;
11+            const bool has_skip_exclude_one = size == 5;
12+            const bool has_extra_flags = size == 4;
13+            const bool size_correct = default_args || has_extra_flags || has_skip_exclude_one;
14+            const bool extra_flags_correct = !has_extra_flags || test[3].isStr();
15+            const bool skip_exclude_one_correct = !has_skip_exclude_one  || test[4].isBool();
16+            if (!size_correct || !test[1].isStr() || !test[2].isStr() || !extra_flags_correct || !skip_exclude_one_correct)
17             {
18                 BOOST_ERROR("Bad test: " << strTest);
19                 continue;
20             }
21+            const bool skip_exclude_one = has_skip_exclude_one? test[4].get_bool() : false;
22 
23             std::map<COutPoint, CScript> mapprevOutScriptPubKeys;
24             std::map<COutPoint, int64_t> mapprevOutValues;
25@@ -243,33 +251,36 @@ BOOST_AUTO_TEST_CASE(tx_valid)
26 
27             PrecomputedTransactionData txdata(tx);
28             unsigned int verify_flags = ParseScriptFlags(test[2].get_str());
29+            unsigned int extra_verify_flags = has_extra_flags? ParseScriptFlags(test[3].get_str()) : 0;
30 
31             // Check that the test gives a valid combination of flags (otherwise VerifyScript will throw). Don't edit the flags.
32             if (~verify_flags != FillFlags(~verify_flags)) {
33                 BOOST_ERROR("Bad test flags: " << strTest);
34             }
35 
36-            BOOST_CHECK_MESSAGE(CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~verify_flags, txdata, strTest, /* expect_valid */ true),
37+            BOOST_CHECK_MESSAGE(CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, extra_verify_flags|~verify_flags, txdata, strTest, /* expect_valid */ true),
38                                 "Tx unexpectedly failed: " << strTest);
39 
40             // Backwards compatibility of script verification flags: Removing any flag(s) should not invalidate a valid transaction
41             for (const auto& [name, flag] : mapFlagNames) {
42                 // Removing individual flags
43-                unsigned int flags = TrimFlags(~(verify_flags | flag));
44+                unsigned int flags = TrimFlags(extra_verify_flags | ~(verify_flags | flag));
45                 if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, flags, txdata, strTest, /* expect_valid */ true)) {
46                     BOOST_ERROR("Tx unexpectedly failed with flag " << name << " unset: " << strTest);
47                 }
48                 // Removing random combinations of flags
49-                flags = TrimFlags(~(verify_flags | (unsigned int)InsecureRandBits(mapFlagNames.size())));
50+                flags = TrimFlags(extra_verify_flags | ~(verify_flags | (unsigned int)InsecureRandBits(mapFlagNames.size())));
51                 if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, flags, txdata, strTest, /* expect_valid */ true)) {
52                     BOOST_ERROR("Tx unexpectedly failed with random flags " << ToString(flags) << ": " << strTest);
53                 }
54             }
55 
56             // Check that flags are maximal: transaction should fail if any unset flags are set.
57-            for (auto flags_excluding_one : ExcludeIndividualFlags(verify_flags)) {
58-                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~flags_excluding_one, txdata, strTest, /* expect_valid */ false)) {
59-                    BOOST_ERROR("Too many flags unset: " << strTest);
60+            if (!skip_exclude_one) {
61+                for (auto flags_excluding_one : ExcludeIndividualFlags(verify_flags)) {
62+                    if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~flags_excluding_one, txdata, strTest, /* expect_valid */ false)) {
63+                        BOOST_ERROR("Too many flags unset: " << strTest);
64+                    }
65                 }
66             }
67         }

glozow commented at 10:49 am on September 3, 2021: member

Just to make sure I understand you correctly, please correct me if this is inaccurate:

Your test is a valid OP_CTV transaction, and this is what we get when passing in these combinations of script verification flags

0DISCOURAGE_NOPS      CHECKTEMPLATEVERIFY   expected result   your result
1      on                 on                 success            fail
2      on                 off                success            fail
3      off                on                 success            success
4      off                off                success            success

(Edit: I realized that NOP4 should no longer be gated by DISCOURAGE_UPGRADEABLE_NOPS, #21702 (review))

The transaction_tests framework expects the set of excluded flags to be minimal, i.e. removing any more flags should be fine, but adding any flags should make it fail. Flags are specified as excluded and the default passed in is everything in STANDARD_SCRIPT_VERIFY_FLAGS. If your test is failing the flag maximality check when you specify exclude=DISCOURAGE_UPGRADEABLE_NOPS, that means either:

It passed in an extra flag and it didn’t fail. This means your test needs to be more specific.
It removed an extra flag and it failed. Note that if it removed CTV and it failed, this means CTV isn’t a soft fork. There are other possibilities as well.

JeremyRubin commented at 6:30 pm on September 3, 2021: contributor

You’re missing what’s going on. DISCOURAGE_UPGRADABLE_NOPS is a non consensus flag.

You should be able to e.g. test the following script spend:

<H> CTV

in the correct txn for H.

When VERIFY_CTV is OFF and DISCOURAGE is ON it will fail.
When VERIFY_CTV is OFF and DISCOURAGE is OFF it will pass.
When VERIFY_CTV is ON and DISCOURAGE is ON it will pass.
When VERIFY_CTV is ON and DISCOURAGE is OFF it will pass.

Therefore we must either set either VERIFY_CTV is always on, or DISCOURAGE is always off and skip the DISCOURAGE minimal test.

Now consider the spend:

1 CTV

When VERIFY_CTV is OFF and DISCOURAGE is ON it will fail.
When VERIFY_CTV is OFF and DISCOURAGE is OFF it will pass.
When VERIFY_CTV is ON and DISCOURAGE is ON it will fail.
When VERIFY_CTV is ON and DISCOURAGE is OFF it will pass.

Therefore, if we would like to test this, we need to specify that just DISCOURAGE is OFF (covered by existing behavior).

This is also why we cannot use FillFlags or TrimFlags afaict, since we don’t know if the transaction is intended to succeed or fail ( CTV or 1 CTV) so we can’t infer the correct fill.

We cannot simply change the line back to the form that does not reture discourage upgradable because we need to discourage using such features to protect old mining nodes during a soft fork upgrade in the future.

This was botched with CSV; if we wanted to add a new CSV type that uses the disabled flag, we would have to wait to age out old mining nodes for it to be safe for them to continue to mine on the network. Hence my patch to fix it, so we get a head start on that arduous process.

glozow commented at 1:42 pm on September 4, 2021: member

No, I think we have a miscommunication here. When I say a flag is a soft fork, I mean that applying the flag can only restrict the space of acceptable scripts - see #10699

in the correct txn for H. When VERIFY_CTV is OFF and DISCOURAGE is ON it will fail. When VERIFY_CTV is ON and DISCOURAGE is ON it will pass.

In the example you’re giving here, VERIFY_CTV is not a soft fork, since applying it to this transaction causes it to go from invalid to valid. Thus, I don’t think it is the right approach. If you want templates to be upgradeable, you can version them and discourage greater versions, allow an extra field and discourage using it, etc.

We cannot simply change the line back to the form that does not reture discourage upgradable because we need to discourage using such features to protect old mining nodes during a soft fork upgrade in the future.

I agree with protecting old nodes that haven’t upgraded in soft forks, but I still don’t understand why you’re doing it this way. IIUC since #5000 they should be discouraging NOP4 in policy already.

JeremyRubin commented at 3:16 pm on September 4, 2021: contributor

Hard forks / soft fork only applies in that way to consensus flags, not standardness. DISCOURAGE_UPGRADABLE_NOPS is never used in consensus.

Taproot does something sorta similar currently, albeit in Input standardness rather than interpreter although the distinction is immaterial.

0    // Check for non-standard pay-to-script-hash in inputs
1    const bool taproot_active = DeploymentActiveAfter(m_active_chainstate.m_chain.Tip(), args.m_chainparams.GetConsensus(), Consensus::DEPLOYMENT_TAPROOT);
2    if (fRequireStandard && !AreInputsStandard(tx, m_view, taproot_active)) {
3        return state.Invalid(TxValidationResult::TX_INPUTS_NOT_STANDARD, "bad-txns-nonstandard-inputs");
4    }

Therefore when taproot_active goes from false -> true, a transaction goes from invalid to valid. It’s still a soft fork.

Part of the reason why we need the discourage in the case that SCRIPT_VERIFY_DEFAULT_CHECK_TEMPLATE_VERIFY_HASH is not set is (if it helps to reason about it) what if CTV never gets activated? We should perfectly preserve the old NOP semantics of discouraging it.

JeremyRubin commented at 3:40 pm on September 4, 2021: contributor

Something else that might help to see this more clearly:

The (current BIP119) code for CTV & NOPs:

 0               case OP_CHECKTEMPLATEVERIFY:
 1                {
 2                    // if flags not enabled; treat as a NOP4
 3                    if (!(flags & SCRIPT_VERIFY_DEFAULT_CHECK_TEMPLATE_VERIFY_HASH)) {
 4                        if (flags & SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_NOPS)
 5                            return set_error(serror, SCRIPT_ERR_DISCOURAGE_UPGRADABLE_NOPS);
 6                        break;
 7                    }
 8
 9                    if (stack.size() < 1)
10                        return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);
11
12                    // If the argument was not 32 bytes, treat as OP_NOP4:
13                    switch (stack.back().size()) {
14                        case 32:
15                            if (!checker.CheckDefaultCheckTemplateVerifyHash(stack.back())) {
16                                return set_error(serror, SCRIPT_ERR_TEMPLATE_MISMATCH);
17                            }
18                            break;
19                        default:
20                            // future upgrade can add semantics for this opcode with different length args
21                            // so discourage use when applicable
22                            if (flags & SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_NOPS) {
23                                return set_error(serror, SCRIPT_ERR_DISCOURAGE_UPGRADABLE_NOPS);
24                            }
25                    }
26                }
27                break;
28
29                case OP_NOP1: case OP_NOP5:
30                case OP_NOP6: case OP_NOP7: case OP_NOP8: case OP_NOP9: case OP_NOP10:
31                {
32                    if (flags & SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_NOPS)
33                        return set_error(serror, SCRIPT_ERR_DISCOURAGE_UPGRADABLE_NOPS);
34                }
35                break;

The pruned code if SCRIPT_VERIFY_CHECK_TEMPLATE_VERIFY_HASH is always set:

 0               case OP_CHECKTEMPLATEVERIFY:
 1                {
 2                    if (stack.size() < 1)
 3                        return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);
 4
 5                    // If the argument was not 32 bytes, treat as OP_NOP4:
 6                    switch (stack.back().size()) {
 7                        case 32:
 8                            if (!checker.CheckDefaultCheckTemplateVerifyHash(stack.back())) {
 9                                return set_error(serror, SCRIPT_ERR_TEMPLATE_MISMATCH);
10                            }
11                            break;
12                        default:
13                            // future upgrade can add semantics for this opcode with different length args
14                            // so discourage use when applicable
15                            if (flags & SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_NOPS) {
16                                return set_error(serror, SCRIPT_ERR_DISCOURAGE_UPGRADABLE_NOPS);
17                            }
18                    }
19                }
20                break;

The (pruned) code if CTV is never set.

 0               case OP_CHECKTEMPLATEVERIFY:
 1                {
 2                    // if flags not enabled; treat as a NOP4
 3                    if (flags & SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_NOPS)
 4                        return set_error(serror, SCRIPT_ERR_DISCOURAGE_UPGRADABLE_NOPS);
 5                }
 6                break;
 7
 8                case OP_NOP1: case OP_NOP5:
 9                case OP_NOP6: case OP_NOP7: case OP_NOP8: case OP_NOP9: case OP_NOP10:
10                {
11                    if (flags & SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_NOPS)
12                        return set_error(serror, SCRIPT_ERR_DISCOURAGE_UPGRADABLE_NOPS);
13                }
14                break;

Which is the same as the current (master branch) code:

0                case OP_NOP1: case OP_NOP4: case OP_NOP5:
1                case OP_NOP6: case OP_NOP7: case OP_NOP8: case OP_NOP9: case OP_NOP10:
2                {
3                    if (flags & SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_NOPS)
4                        return set_error(serror, SCRIPT_ERR_DISCOURAGE_UPGRADABLE_NOPS);
5                }
6                break;

The pruned code if SCRIPT_VERIFY_CHECK_TEMPLATE_VERIFY_HASH is always set and the argument is known to be 32 bytes:

0               case OP_CHECKTEMPLATEVERIFY:
1                {
2                    if (stack.size() < 1)
3                        return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);
4                    if (!checker.CheckDefaultCheckTemplateVerifyHash(stack.back())) {
5                        return set_error(serror, SCRIPT_ERR_TEMPLATE_MISMATCH);
6                    }
7                }
8                break;

The pruned code if SCRIPT_VERIFY_CHECK_TEMPLATE_VERIFY_HASH is always set and the argument is known to not be 32 bytes:

0               case OP_CHECKTEMPLATEVERIFY:
1                {
2                    if (stack.size() < 1)
3                        return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);
4                    if (flags & SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_NOPS) {
5                        return set_error(serror, SCRIPT_ERR_DISCOURAGE_UPGRADABLE_NOPS);
6                    }
7                 }                }
8                break;

The list below (I hope?) makes it more plain to see:

When CTV is not active, the semantics are entirely identical to CTV being NOP4.
- During Consensus, Always Passes
- During Standardness, Always Fails (DISCOURAGE_UPGRADABLE_NOPS)
When CTV is active, there must always be at least 1 element on the stack (it’s a mild design goal that opcodes should not take a variable # of things off the stack, so we do not allow CTV with an empty stack to have a meaning, see https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-June/016055.html for some discussion on the matter).
- During Consensus, Passes if >=1 element on stack
- During Standardness, Passes if >=1 element on stack
When CTV is active, and the argument is 32 bytes, we apply CTV rules
- During Consensus, Passes if CTV Hash Matches TXN
- During Standardness, Passes if CTV Hash Matches TXN
When CTV is active, and the argument is not 32 bytes, we continue to treat as a NOP (I could probably clarify the situation a bit better here by making a new flag – DISCOURAGE_UPGRADABLE_TEMPLATE_PROGRAM, but the first DISCOURAGE_UPGRADABLE_NOP check still causes this problem by itself).
- During Consensus, always passes
- During Standardness, always fails (DISCOURAGE_UPGRADABLE_NOPS)

Therefore we’re only ever adding cases where something can fail in consensus, and standardness is more strict that consensus.

glozow commented at 9:54 am on September 7, 2021: member

Taproot does something sorta similar currently, albeit in Input standardness rather than interpreter although the distinction is immaterial. Therefore when taproot_active goes from false -> true, a transaction goes from invalid to valid. It’s still a soft fork.

To clarify, I’m talking about soft forks wrt script verification flags as inputs to the interpreter, not about our policy rules. I don’t think the distinction is immaterial. Yes, I am aware that when taproot activates we will no longer discourage taproot spends in policy, while consensus rules will tighten, since it is a soft fork.

What I am trying to say is: a property we desire in the script verification flags is that the addition of another flag strictly reduces the space of acceptable scripts. This property is what is being tested by the maximality checks in transaction_tests; they fail because SCRIPT_VERIFY_CHECK_TEMPLATE_VERIFY_HASH does not have this property. My suggestion is rather than bypassing this check in transaction_tests, implement CTV extensibility and maintain this property by adding another script verification flag. A method is outlined here.

When CTV is active, and the argument is not 32 bytes, we continue to treat as a NOP (I could probably clarify the situation a bit better here by making a new flag – DISCOURAGE_UPGRADABLE_TEMPLATE_PROGRAM, but the first DISCOURAGE_UPGRADABLE_NOP check still causes this problem by itself).

I believe the approach of having a SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_TEMPLATE_PROGRAM would be better. Again, I don’t think SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_NOPS should be in the logic that handles the now-repurposed NOP4.

JeremyRubin commented at 9:29 pm on September 7, 2021: contributor

Soft fork/hard fork refers strictly to consensus, not to policy. DISCOURAGE_UPGRADABLE_NOPS is policy.

I don’t think we can – while preserving the (A|B) is subset A – safely do any OP_NOP upgrade since we need some flag which is DISCOURAGE_UNACTIVE_UPGRADED_BLAH to discourage using a thing before the upgrade is active but where it is still defined. This will be a problem for Taproot, CTV, APO, etc. So it seems either we must break this property of (A|B) is subset A while doing an upgrade, or we don’t do any upgrades?

In other words, we fundamentally need some sort of IS_EXECUTING_IN_CONSENSUS/IS_EXECUTING_IN_POLICY flag to distinguish if using a to-be activated feature before it is active should be a failure or not.

(sorry for the deleted comment, accident)

sipa commented at 10:23 pm on September 7, 2021: member

I don’t see why this property needs to be broken. To the best of my knowledge, every script validation flag today, is purely restrictive (in the sense that the set of valid scripts under A|B is a subset of the scripts valid under A). It is true that such a property is not strictly required for anything but consensus rules (which the various DISCOURAGE ones aren’t), but it is extremely valuable for testing I think.

I haven’t followed the discussion, feel free to explain what I’m missing, or where I can find it.

JeremyRubin commented at 10:35 pm on September 7, 2021: contributor

Sure. Suppose you want to introduce a new-fangled thing (like… say a Segwit V1 spend).

We want Segwit V1 to be discouraged for spending until such a time that Segwit V1 is active on the network, and if Segwit V1 failed to activate, we don’t want it to become able to broadcast.

Taproot in particular cheats at this, though, because it uses the witnessversion transaction level test. So the fact that V1 is discouraged does not require a flag. However, if we didn’t scan witness versions, we would have to have a script flag for discouraging.

Take for example, any OP_SUCESSX. If we add a semantic to it, before it is active we need to continue to set SCRIPT_VERIFY_DISCOURAGE_OP_SUCCESS, and after the rule is active we need to set SCRIPT_VERIFY_NEWFANGLED_DOODAD which should cause us to not have SCRIPT_VERIFY_DISCOURAGE_OP_SUCCESS applied. If it fails to activate we must continue to have SCRIPT_VERIFY_NEWFANGLED_DOODAD unset.

edit: and it’s not sufficient to use the TrimFlags/FillFlags because having unset SCRIPT_VERIFY_NEWFANGLED_DOODAD does not mean that SCRIPT_VERIFY_DISCOURAGE_OP_SUCCESS must be any value nor does having SCRIPT_VERIFY_NEWFANGLED_DOODAD mean that SCRIPT_VERIFY_DISCOURAGE_OP_SUCCESS must be set or unset, and it may be desirable to test that the opcode gracefully continues it’s old behavior if the upgrade does not activate.

This is currently an issue for OP_CTV that I noticed (and patched) by making it have discouraged if fails to activate.

However, this is problematic for the test_valid.json file because we can’t write test vectors that are happy under all circumstances or unhappy under all circumstances. E.g., if VERIFY_CTV is unset, it means we failed to activate, so DISCOURAGE_UPGRADABLE_NOP should takeover and fail an otherwise valid tx. If VERIFY_CTV is set, we succeeded activation, so the transaction should be OK.

see #21702 (review) for more context. See also #22871 where I discovered we are not properly discouraging unknown sequences in the arg for CSV, nor in nSequence (two distinct, but realted issues – txin.nSequence had some crufty reasons not to be restrictive, but CSV did not).

sipa commented at 10:58 pm on September 7, 2021: member

@JeremyRubin Thanks for elaborating.

Historically I think we used a different mechanism than Taproot used, where as soon as a new opcode was defined, we’d unconditionally remove DISCOURAGE semantics from it, and just always fully enforce it (correctly), even pre-activation, for mempool transactions.

It hadn’t occurred to me that this approach we’re taking with Taproot (where even valid taproot spends remain non-standard until activation) is in fact only possible because it’s done through non-script IsStandard() rules, and doing the same through DISCOURAGE flags would violate the restricting-only property.

However, I do think it’s better to take that path, and treat future opcode spends as non-standard until activation. At the very least, it means a more predictable relay policy, new nodes switch policy in lockstep, rather than following adoption patterns.

That will mean indeed that adding FLAG_NEW_OPCODE to FLAG_DISCOURAGE is not a pure restriction, but that sounds like a reasonable price to pay.

JeremyRubin commented at 11:38 pm on September 7, 2021: contributor

The patch #22876 makes an attempt at at least preserving testability until we know it is fully active by allowing to test the same transaction with flags as below:

0// tests with DISCOURAGE_UPGRADABLE_NOPS must be unset, no flags required to be set, and the maximality check skipped
1...'DISCOURAGE_UPGRADABLE_NOPS', 'NONE', true]
2// tests with all flags ok, SOME_OPCODE_VERIFY flags required to be set, and the maximality check enforced
3...'NONE', 'SOME_OPCODE_VERIFY']

I think it’s less clean than ideal, but allows cover both cases of soft fork succeeds and soft fork fails.

ajtowns commented at 7:56 pm on May 12, 2022: contributor

I think there’s four interpretations we want; one pre CTV activation:

x OP_NOP4 – consensus valid, relay invalid, doesn’t matter what x might be

and three post CTV activation:

goodhash OP_CTV – fully specced and enforced per BIP119, consensus valid, relay valid
badhash OP_CTV – hash doesn’t match tx, consensus invalid
upgrade OP_CTV – part of upgradeability, consensus valid, relay invalid

The specific case causing a conflict is that you want goodhash OP_CTV to go from “relay invalid” prior to CTV activating to “relay valid” after it activates which is removing a restriction, but indicating CTV is active is adding a flag, and it would be nice if that made things more restrictive rather than less restrictive.

Perhaps ideally we’d say DISCOURAGE_CTV prior to it being active?

Maybe it would be better to explicitly add a flag like DISCOURAGE_CTV or DISCOURAGE_TAPROOT for preventing relay when the softfork is not active; then remove that flag once the fork is active and buried? That would preserve the “flags only restrict things” and still get the full variety of behaviours, and only mean adding a two flags instead of one temporarily?

If we wanted to ensure new nodes didn’t relay CTV transactions before they’d caught up to CTV activation even after DISCOURAGE_CTV was removed from the codebase, we could keep that behaviour by making standardness logic be “don’t relay anything until your tip sees all the expected soft forks as active”.

JeremyRubin commented at 10:05 pm on May 12, 2022: contributor

Would this approach work for anyprevout key types as well?

JeremyRubin commented at 10:06 pm on May 12, 2022: contributor

I generally philosophically feel a bit funky about taking on more code complexity in consensus logic, where extending the testing logic complexity would do. But if it’s a clear cut approach that works generally I might feel different.

ajtowns commented at 3:55 pm on May 13, 2022: contributor

Would this approach work for anyprevout key types as well?

I think the idea is that current behaviour would be just:

key	sig	flags=nothing	DISCOURAGE
apo + future	*	valid	invalid

But with APO support that would become:

key	sig	flags=nothing	DISCOURAGE	APO	DISCOURAGE_APO
apo	empty/valid	valid	valid	valid	invalid
apo	badsig	valid	valid	invalid	invalid
future	*	valid	invalid	valid	valid

(And when you combine flags, the result is valid iff all the individual results were valid)

And you’d change PolicyScriptChecks to something like:

0    const unsigned int discourageflags = DeploymentActiveAfter(Tip(), ANYPREVOUT) ? 0 : DISCOURAGE_APO;
1    const unsigned int scriptVerifyFlags = STANDARD_SCRIPT_VERIFY_FLAGS | discourageflags;

(with VERIFY_APO included in STANDARD_SCRIPT_VERIFY_FLAGS)

JeremyRubin commented at 6:10 pm on May 13, 2022: contributor

I’d need to stew on this more and maybe see a PoC working. My brain hurts!

The issue I see is that we need to not only ensure the “invariants” hold for both tx_valid and tx_invalid checks, and it seems hard to ensure that these flags are valid for all the test patterns and combinations we try.

For tx_valid, we need to have that any additional flag set makes it invalid,right?

And for tx_invalid, we need to have that any addtional flag unset makes it valid, right?

So if we had an invalid tx, it seems that both the APO and DISCOURAGE_APO flag being set, unsetting just one still yields and invalid transaction…

ajtowns commented at 7:33 pm on May 13, 2022: contributor

Hmm, fair: in my head I was thinking you’d set one of APO or DISCOURAGE_APO not both, but then when I looked at the code figured you’d want to have APO in STANDARD_SCRIPT_VERIFY_FLAGS, which would mean setting both. Having DISCOURAGE_VALID_APO instead avoids that inconsistency.

 0    } else if ((pubkey.size() == 1 || pubkey.size() == 33) && pubkey[0] == 0x01) {
 1        if (flags & SCRIPT_VERIFY_ANYPREVOUT) != 0) {
 2            // check valid APO
 3            if (not valid APO) {
 4                 return false; // serror is set
 5            }
 6            if (flags & SCRIPT_VERIFY_DISCOURAGE_VALIDANYPREVOUT) != 0) {
 7                 return set_error(serror, TAPROOT_DISCOURAGE_VALIDANYPREVOUT);
 8            }
 9        }
10    } else { ... }

For tx_valid, we need to have that any additional flag set makes it invalid,right?

Hmm. Yeah, the tests are:

tx_valid:
- we’ve specified the maximal set of flags, adding any will make it invalid
- removing any flags will not make it invalid
tx_invalid:
- removing any flags will make it valid
- adding any flags will not make it valid

With the caveat that the flags specified for tx_valid are inverted – you specify the flags that would make it invalid, which is usually NONE.

So if we had an invalid tx, it seems that both the APO and DISCOURAGE_APO flag being set, unsetting just one still yields and invalid transaction…

This just means you test it by having the same tx specified twice in tx_invalid, once with just the APO flag and once with just the DISCOURAGE_APO flag? I think doing DISCOURAGE_VALID_APO solves this though, since then you just have one test with both flags, and removing either makes it valid.

JeremyRubin commented at 9:41 pm on May 13, 2022: contributor

So we need the following:

0DISCOURAGE_APO
1DISCOURAGE_VALID_APO
2DISCOURAGE_NEW_KEYTYPE
3VALIDATE_APO

and we do 2 copies of each transaction in tx_invalid? (Another option here would be to allow for lists of lists of flag combos, and re-run each test across a couple different rules.)

And after activation, we bury it to be:

0DISCOURAGE_NEW_KEYTYPE
1VALIDATE_APO

Overall this feels like a lot of complexity for reviewers on the consensus code v.s. just making the testing logic handle it.

Do you think there are other ways we could clean up our testing strategies / flag passing strategy to make things more clean?

ajtowns commented at 10:46 pm on May 13, 2022: contributor

We currently have DISCOURAGE_UPGRADABLE_PUBKEYTYPE.

When adding APO, we’d add SCRIPT_VERIFY_APO and SCRIPT_VERIFY_DISCOURAGE_VALID_APO.

Tests of valid script path using APO would specify DISCOURAGE_VALID_APO in the tx_valid verify flags, rather than NONE.

Tests of invalid script path spends using APO would specify VERIFY_APO as the minimal flag to make them invalid, rather than DISCOURAGE_UPGRADABLE_PUBKEYTYPE (which no longer makes them invalid).

When burying APO, we’d drop SCRIPT_VERIFY_DISCOURAGE_VALID_APO – no longer specifying it for the tests, removing the if () { return set_error(discouraged); }, and no longer setting the flag for mempool entry when APO is not yet active.

That is, there’s no DISCOURAGE_APO flag, just DISCOURAGE_VALID_APO, and no need to duplicate txs for testing. At least, I think that all works.

bitcoin deleted a comment on May 19, 2022

glozow added the label Tests on Aug 4, 2022

ajtowns referenced this in commit 352be83690 on Sep 7, 2022

ajtowns commented at 8:36 am on September 16, 2022: contributor

Here’s what I’ve ended up doing in https://github.com/bitcoin-inquisition/bitcoin/pull/4/files:

add SCRIPT_VERIFY_ANYPREVOUT and SCRIPT_VERIFY_DISCOURAGE_ANYPREVOUT
remove APO pubkeys (ie 0x01 and 0x01[32 bytes]) from SCRIPT_VERIFY_DISCOURAGE_UPGRADEABLE_PUBKEYTYPE
if DISCOURAGE_ANYPREVOUT is set, invalid if an APO pubkey is attempted to be used
if ANYPREVOUT is set, invalidate if an APO pubkey is used and the signature doesn’t comply with BIP 118
set DISCOURAGE_ANYPREVOUT in MemPoolAccept::PolicyScriptChecks when APO is not activated

After activation, drop DISCOURAGE_ANYPREVOUT and remove code paths that set it, or that were only executed if it was set.

(I ended up having DISCOURAGE_APO instead of DISCOURAGE_VALID_APO rather than the other way around; but either way, there’s only one, not both)

ajtowns commented at 4:02 am on October 11, 2022: contributor

Possibly relevant #5253 (comment)

JeremyRubin commented at 4:31 am on October 11, 2022: contributor

good find!

JeremyRubin closed this on Dec 16, 2022

bitcoin locked this on Dec 16, 2023

Improve Transaction Tests Flags #22865