[Tests] Compute the Power Set of all flags instead of one by one exclusion #22948

JeremyRubin commented at 11:19 PM on September 10, 2021: contributor

Currently, at the end of a test vector (valid or invalid) we iterate the flags one by one and unset them and check that it makes the transaction succeed or fail.

This is called 'minimality' in that we want to check that each flag is sufficient on it's own. However, I think we should be instead doing the 2**(specified) combinations of flag, to ensure that for all combinations of flags only the one we have specified is either valid or invalid.

Otherwise, we might have e.g. subsets of flags that have interactions we're not detecting here.

Interestingly, the asymptotic runtime here should be better on average (since we don't usually set that many flags, v.s. 32 iterations on the one by one checker), but the worst case is 2**32 flag combos. It's up to the test writers to not abuse this check.

Contrived example demonstrating the problem:

const auto sum = flag_a_set + flag_b_set +flag_c_set ;
// check parity is even
if (sum & 1) throw "Oops";

having set "a,b,c", the one-by-one checker for "a,b,c," expects failure would check ["a,b", "a,c", "b,c"] and see no failures since the parity of each is 2. However, ["a", "b", "c"] would fail since the parity is 1.

I'm not aware of any code that is written in this specific manner, but similar circumstance might arise naturally in the code.

in src/test/transaction_tests.cpp:200 in 125fb1c8dc outdated

 197 |      std::set<unsigned int> flags_combos;
 198 | -    for (const auto& pair : mapFlagNames) {
 199 | -        const unsigned int flags_excluding_one = TrimFlags(flags & ~(pair.second));
 200 | +    for (const auto& excluded_flags : all_flag_combos) {
 201 | +        const unsigned int flags_excluding_one = TrimFlags(excluded_flags);
 202 |          if (flags != flags_excluding_one) {

JeremyRubin commented at 11:21 PM on September 10, 2021:

this check is actually probably superfluous with the iteration not including combos -1 (the all bits set).

JeremyRubin commented at 11:30 PM on September 10, 2021:

update: it's not, because of the TrimFlags

esneider commented at 2:28 AM on September 12, 2021:

Notice that TrimFlags only removes flags, so it is, indeed, superfluous: flags_excluding_one ⊆ excluded_flags ⊂ flags

Having said that, there's no harm in leaving it here.

JeremyRubin force-pushed on Sep 10, 2021

JeremyRubin renamed this:
~~[Tests] Computer the Power Set of all flags instead of one by one exclusion~~
[Tests] Compute the Power Set of all flags instead of one by one exclusion
on Sep 10, 2021

fanquake added the label Tests on Sep 10, 2021

JeremyRubin commented at 2:53 PM on September 11, 2021: contributor

If anyone doesn't like the submask code, I could swap it for this algorithm (which is more efficient but not-obvious) https://cp-algorithms.com/algebra/all-submasks.html#toc-tgt-0

JeremyRubin cross-referenced this on Sep 11, 2021 from issue [TESTS] Allow tx_invalid.json tests to include flag rules for if_unset: [A,B,C] then_unset: [D] by JeremyRubin

in src/test/transaction_tests.cpp:178 in 7812cff7c3 outdated

 173 | @@ -174,9 +174,29 @@ unsigned int FillFlags(unsigned int flags)
 174 |  // Assumes that mapFlagNames contains all script verify flags.
 175 |  std::set<unsigned int> ExcludeIndividualFlags(unsigned int flags)
 176 |  {
 177 | +    std::vector<unsigned int> set_flags;
 178 | +    const size_t n_flags = sizeof(unsigned int);

esneider commented at 1:59 AM on September 12, 2021:

I think you might be missing a * 8 here

JeremyRubin commented at 6:06 AM on September 12, 2021:

🤦 --> will fix

in src/test/transaction_tests.cpp:173 in 7812cff7c3 outdated

 173 | @@ -174,9 +174,29 @@ unsigned int FillFlags(unsigned int flags)
 174 |  // Assumes that mapFlagNames contains all script verify flags.

esneider commented at 2:08 AM on September 12, 2021:

The doc is out of date now

JeremyRubin commented at 6:13 AM on September 12, 2021:

i can't tell what line you are commenting on

esneider commented at 6:30 AM on September 12, 2021:

Right, sorry about that.

// Exclude each possible script verify flag from flags. Returns a set of these flag combinations
// that are valid and without duplicates. For example: if flags=1111 and the 4 possible flags are
// 0001, 0010, 0100, and 1000, this should return the set {0111, 1011, 1101, 1110}.

in src/test/transaction_tests.cpp:202 in 7812cff7c3 outdated

 199 | -        const unsigned int flags_excluding_one = TrimFlags(flags & ~(pair.second));
 200 | +    for (const unsigned int excluded_flags : all_flag_combos) {
 201 | +        const unsigned int flags_excluding_one = TrimFlags(excluded_flags);
 202 |          if (flags != flags_excluding_one) {
 203 |              flags_combos.insert(flags_excluding_one);
 204 |          }

esneider commented at 2:40 AM on September 12, 2021:

Not sure if this is a problem, but notice that TrimFlags might cause the resulting flag_combos to have duplicate elements.

Eg, if SCRIPT_VERIFY_P2SH and SCRIPT_VERIFY_WITNESS are set in flags, the mapping done by TrimFlags will make 25% of the resulting elements duplicates.

JeremyRubin commented at 6:08 AM on September 12, 2021:

yeah I could do a de-dup pass with TrimFlags. It makes more sense IMO to keep TrimFlags even if superfluos with the above as TrimFlags is a separate validity of flag combo check v.s. the dumb subset generator.

[Tests] Compute the Power Set of all flags instead of one by one exclusion in transaction_tests.cpp 885bf41580

JeremyRubin force-pushed on Sep 12, 2021

in src/test/transaction_tests.cpp:200 in 885bf41580 outdated

 204 | +
 205 | +    std::set<unsigned int> flags_combos;
 206 | +    for (const unsigned int submask : all_flag_combos) {
 207 | +        flags_combos.insert(TrimFlags(submask));
 208 | +    }
 209 | +    flags_combos.erase(flags);

jonatack commented at 2:53 PM on September 12, 2021:

nit, clang-format

@@ -174,7 +174,7 @@ unsigned int FillFlags(unsigned int flags)
 std::set<unsigned int> GenerateAllSubMasks(unsigned int flags)
 {
     std::vector<unsigned int> set_flags;
-    const size_t n_flags = sizeof(unsigned int)*8;
+    const size_t n_flags = sizeof(unsigned int) * 8;
     set_flags.reserve(n_flags);
@@ -184,10 +184,10 @@ std::set<unsigned int> GenerateAllSubMasks(unsigned int flags)
     }
     const size_t combos = 1 << set_flags.size();
     // skip last all set flags
-    std::vector<unsigned int> all_flag_combos(combos-1);
-    for (size_t i = 0; i < combos-1; ++i){
-        for(size_t j = 0; j < set_flags.size(); ++j) {
-            if (i & (1<<j)) {
+    std::vector<unsigned int> all_flag_combos(combos - 1);
+    for (size_t i = 0; i < combos - 1; ++i) {
+        for (size_t j = 0; j < set_flags.size(); ++j) {
+            if (i & (1 << j)) {
                 all_flag_combos[i] |= set_flags[j];

in src/test/transaction_tests.cpp:180 in 885bf41580 outdated

 184 | -            flags_combos.insert(flags_excluding_one);
 185 | +    std::vector<unsigned int> set_flags;
 186 | +    const size_t n_flags = sizeof(unsigned int)*8;
 187 | +    set_flags.reserve(n_flags);
 188 | +    for (size_t i = 0; i < n_flags; ++i) {
 189 | +        const unsigned int flag = flags & (1 << i);

jonatack commented at 2:58 PM on September 12, 2021:

nit, can use braced initialization on these for type safety

-    const size_t n_flags = sizeof(unsigned int)*8;
+    const size_t n_flags{sizeof(unsigned int) * 8};
     set_flags.reserve(n_flags);
     for (size_t i = 0; i < n_flags; ++i) {
-        const unsigned int flag = flags & (1 << i);
+        const unsigned int flag{flags & (1 << i)};

in src/test/transaction_tests.cpp:188 in 885bf41580 outdated

 192 | +        }
 193 | +    }
 194 | +    const size_t combos = 1 << set_flags.size();
 195 | +    // skip last all set flags
 196 | +    std::vector<unsigned int> all_flag_combos(combos-1);
 197 | +    for (size_t i = 0; i < combos-1; ++i){

jonatack commented at 3:02 PM on September 12, 2021:

maybe hoist subtracting one to the declaration of combos

-    const size_t combos = 1 << set_flags.size();
+    const size_t combos = (1 << set_flags.size()) - 1;
     // skip last all set flags
-    std::vector<unsigned int> all_flag_combos(combos-1);
-    for (size_t i = 0; i < combos-1; ++i){
+    std::vector<unsigned int> all_flag_combos(combos);
+    for (size_t i = 0; i < combos; ++i) {

jonatack commented at 3:32 PM on September 12, 2021: contributor

Concept ACK. Here are the results for me before/after. It might be interesting to see the result with https://cp-algorithms.com/algebra/all-submasks.html if you're motivated to try it.

EDIT: added the results for both tests (tx_valid and tx_invalid).

<details><summary><code>ExcludeIndividualFlags</code></summary>

test/transaction_tests.cpp(187): Entering test suite "transaction_tests"
test/transaction_tests.cpp(189): Entering test case "tx_valid"
6
10
12
6
10
12
14
22
26
28
14
22
26
28
14
22
26
28
2
8
0
65536
6
10
12
0
0
0
0
0
0
0
0
0
0
2
16384
0
0
0
8
65536
8
65536
8
65536
8
65536
8
65536
0
40
65544
65568
65576
40
65544
65568
65576
32
65536
65568
40
65544
65568
65576
8
65536
8
65536
0
0
0
0
64
0
32
6
10
12
0
0
0
64
0
32
0
0
0
64
0
0
64
0
64
8
65536
0
8
65536
0
test/transaction_tests.cpp(189): Leaving test case "tx_valid"; testing time: 918607us
test/transaction_tests.cpp(280): Entering test case "tx_invalid"
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
512
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1024
0
0
2049
4096
4097
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
2049
4096
4097
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
1
65536
0
test/transaction_tests.cpp(280): Leaving test case "tx_invalid"; testing time: 270583us

</details>

<details><summary><code>GenerateAllSubMasks</code></summary>

test/transaction_tests.cpp(204): Entering test suite "transaction_tests"
test/transaction_tests.cpp(206): Entering test case "tx_valid"
0
2
4
6
8
10
12
0
2
4
6
8
10
12
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
8
0
65536
0
2
4
6
8
10
12
0
0
0
0
0
0
0
0
0
0
0
2
16384
0
0
0
0
8
65536
0
8
65536
0
8
65536
0
8
65536
0
8
65536
0
0
8
32
40
65536
65544
65568
65576
0
8
32
40
65536
65544
65568
65576
0
32
65536
65568
0
8
32
40
65536
65544
65568
65576
0
8
65536
0
8
65536
0
0
0
0
64
0
32
0
2
4
6
8
10
12
0
0
0
64
0
32
0
0
0
64
0
0
64
0
64
0
8
65536
0
0
8
65536
0
test/transaction_tests.cpp(206): Leaving test case "tx_valid"; testing time: 1159591us
test/transaction_tests.cpp(297): Entering test case "tx_invalid"
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
512
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1024
0
0
0
1
2049
4096
4097
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
2049
4096
4097
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
1
65536
0
test/transaction_tests.cpp(297): Leaving test case "tx_invalid"; testing time: 539616us

</details>

DrahtBot commented at 3:40 PM on September 12, 2021: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#22954 ([TESTS] Allow tx_invalid.json tests to include flag rules for if_unset: [A,B,C] then_unset: [D] by JeremyRubin)
#22876 ([TESTS] Update Transaction Tests to permit setting a flag as always on and disabling the exhaustive failure test by JeremyRubin)
#21702 (Implement BIP-119 Validation (CheckTemplateVerify) by JeremyRubin)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot cross-referenced this on Sep 12, 2021 from issue [TESTS] Update Transaction Tests to permit setting a flag as always on and disabling the exhaustive failure test by JeremyRubin

Squash: Use FancyPants Bit Twiddling Algorithm to Generate Submasks d185f36ede

JeremyRubin commented at 7:06 PM on September 12, 2021: contributor

I added the textbook algorithm for this: feel free to test that the output is the same then I can squash.

DrahtBot cross-referenced this on Sep 12, 2021 from issue Implement BIP-119 Validation (CheckTemplateVerify) by JeremyRubin

jonatack commented at 12:08 PM on September 13, 2021: contributor

@JeremyRubin Nice. The textbook algorithm produces the same values in the tx_invalid test but fewer values in the tx_valid test. Among the values that it does not return compared to your algo, only 2 or 3 of them are unique. WDYT?

<details><summary>FancyPants version</summary>

test/transaction_tests.cpp(194): Entering test case "tx_valid"
0
2
4
6
8
10
12
0
2
4
6
8
10
12
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
8
0
0
2
4
6
8
10
12
0
0
0
0
0
0
0
2
16384
0
0
0
0
8
65536
0
8
65536
0
8
65536
0
8
65536
0
8
65536
0
0
8
32
40
65536
65544
65568
0
8
32
40
65536
65544
65568
0
32
65536
0
8
32
40
65536
65544
65568
0
8
65536
0
8
65536
0
0
0
2
4
6
8
10
12
0
0
0
0
0
0
0
0
8
65536
0
0
8
65536
0
test/transaction_tests.cpp(194): Leaving test case "tx_valid"; testing time: 597522us
test/transaction_tests.cpp(284): Entering test case "tx_invalid"
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
512
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1024
0
0
0
1
2049
4096
4097
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
2049
4096
4097
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
1
65536
0
test/transaction_tests.cpp(284): Leaving test case "tx_invalid"; testing time: 170187us

</details>

in src/test/transaction_tests.cpp:278 in d185f36ede

 275 | -                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~flags_excluding_one, txdata, strTest, /* expect_valid */ false)) {
 276 | +            for (auto submask_flags : GenerateAllSubMasks(verify_flags)) {
 277 | +                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~submask_flags, txdata, strTest, /* expect_valid */ false)) {
 278 |                      BOOST_ERROR("Too many flags unset: " << strTest);
 279 |                  }
 280 |              }

esneider commented at 4:48 AM on September 14, 2021:

If I understand the intent of this code correctly, the idea is that given a set of chosen flags, we want to check that every possible superset of flags doesn't succeed. If that's so, I believe there's a problem with the current FlipAll( GenerateAllSubMasks( flags ) ) logic for computing all possible supersets.

Eg, assuming 3-bit words, for flags = 100, we'll generate all flags sub masks {000}, and then flip them {111}. However, what we really want is to generate all the ~flags = 011 sub masks {010, 001, 000}, and then flip them {101, 110, 111}.

In short, there's a missing bit flip before computing all the sub masks. Ie. it should be

FlipAll( GenerateAllSubMasks( ~flags ) )

With that said, there's still the issue that flipping flag bits doesn't interact well with the TrimFlags operation done inside GenerateAllSubMasks.

Eg. if the only flag bits wereVERIFY_P2SH and VERIFY_WITNESS (in that order), for flags = 00 we'd generate all the ~flags = 11 sub masks {10, 01, 00}, trim them {10, 00, 00} (01 is mapped to 00), remove duplicates {10, 00}, and finally flip them {01, 11}. However, we ended up with an invalid combination (01) when the correct result would have been {10, 11}.

More generally, after fixing the initial missing flip, the code would look something like

FlipAll( Unique( TrimFlags( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) )

when what we really want is something like:

for flag_superset in FlipAll( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) ):
    if flag_superset == TrimFlags( flag_superset ):
        # use flag_superset

JeremyRubin commented at 6:13 PM on September 14, 2021:

yeah you're totally right; i've been playing with it a bit as well and noticed that it seems to not be quite right (thanks @jonatack for pointing it out).

The logic is definitely off, and I think it may be intractible in this case since GenerateAllSubMasksWithoutTrimming(~flags) ends up being pretty big... with 20 it might be ok, but eventually there would be too many flags.

esneider commented at 5:08 AM on September 15, 2021:

Makes sense. Maybe bound the amount of flipped bits? That way you can test some flag interactions, but bound the exponential growth. The bounding capability would also help to put a safeguard against exponential explosion in the tx_invalid test if there are too many set bits in the initial mask.

JeremyRubin commented at 6:45 PM on October 22, 2021: contributor

Closing this for now -- needs more research/discussion on how to accomplish actually using the powerset properly, see #22948 (review)

JeremyRubin closed this on Oct 22, 2021

JeremyRubin added the label Up for grabs on Oct 22, 2021

bitcoin locked this on Oct 30, 2022