[Tests] Compute the Power Set of all flags instead of one by one exclusion

JeremyRubin commented at 11:19 pm on September 10, 2021: contributor

Currently, at the end of a test vector (valid or invalid) we iterate the flags one by one and unset them and check that it makes the transaction succeed or fail.

This is called ‘minimality’ in that we want to check that each flag is sufficient on it’s own. However, I think we should be instead doing the 2**(specified) combinations of flag, to ensure that for all combinations of flags only the one we have specified is either valid or invalid.

Otherwise, we might have e.g. subsets of flags that have interactions we’re not detecting here.

Interestingly, the asymptotic runtime here should be better on average (since we don’t usually set that many flags, v.s. 32 iterations on the one by one checker), but the worst case is 2**32 flag combos. It’s up to the test writers to not abuse this check.

Contrived example demonstrating the problem:

0const auto sum = flag_a_set + flag_b_set +flag_c_set ;
1// check parity is even
2if (sum & 1) throw "Oops";

having set “a,b,c”, the one-by-one checker for “a,b,c,” expects failure would check [“a,b”, “a,c”, “b,c”] and see no failures since the parity of each is 2. However, [“a”, “b”, “c”] would fail since the parity is 1.

I’m not aware of any code that is written in this specific manner, but similar circumstance might arise naturally in the code.

in src/test/transaction_tests.cpp:200 in 125fb1c8dc outdated

197     std::set<unsigned int> flags_combos;
198-    for (const auto& pair : mapFlagNames) {
199-        const unsigned int flags_excluding_one = TrimFlags(flags & ~(pair.second));
200+    for (const auto& excluded_flags : all_flag_combos) {
201+        const unsigned int flags_excluding_one = TrimFlags(excluded_flags);
202         if (flags != flags_excluding_one) {

JeremyRubin commented at 11:21 pm on September 10, 2021:

this check is actually probably superfluous with the iteration not including combos -1 (the all bits set).

JeremyRubin commented at 11:30 pm on September 10, 2021:

update: it’s not, because of the TrimFlags

esneider commented at 2:28 am on September 12, 2021:

Notice that TrimFlags only removes flags, so it is, indeed, superfluous: flags_excluding_one ⊆ excluded_flags ⊂ flags

Having said that, there’s no harm in leaving it here.

JeremyRubin force-pushed on Sep 10, 2021

JeremyRubin renamed this:
~~[Tests] Computer the Power Set of all flags instead of one by one exclusion~~
[Tests] Compute the Power Set of all flags instead of one by one exclusion
on Sep 10, 2021

fanquake added the label Tests on Sep 10, 2021

JeremyRubin commented at 2:53 pm on September 11, 2021: contributor

If anyone doesn’t like the submask code, I could swap it for this algorithm (which is more efficient but not-obvious) https://cp-algorithms.com/algebra/all-submasks.html#toc-tgt-0

in src/test/transaction_tests.cpp:178 in 7812cff7c3 outdated

173@@ -174,9 +174,29 @@ unsigned int FillFlags(unsigned int flags)
174 // Assumes that mapFlagNames contains all script verify flags.
175 std::set<unsigned int> ExcludeIndividualFlags(unsigned int flags)
176 {
177+    std::vector<unsigned int> set_flags;
178+    const size_t n_flags = sizeof(unsigned int);

esneider commented at 1:59 am on September 12, 2021:

I think you might be missing a * 8 here

JeremyRubin commented at 6:06 am on September 12, 2021:

🤦 –> will fix

in src/test/transaction_tests.cpp:173 in 7812cff7c3 outdated

173@@ -174,9 +174,29 @@ unsigned int FillFlags(unsigned int flags)
174 // Assumes that mapFlagNames contains all script verify flags.

esneider commented at 2:08 am on September 12, 2021:

The doc is out of date now

JeremyRubin commented at 6:13 am on September 12, 2021:

i can’t tell what line you are commenting on

esneider commented at 6:30 am on September 12, 2021:

Right, sorry about that.

0// Exclude each possible script verify flag from flags. Returns a set of these flag combinations
1// that are valid and without duplicates. For example: if flags=1111 and the 4 possible flags are
2// 0001, 0010, 0100, and 1000, this should return the set {0111, 1011, 1101, 1110}.

in src/test/transaction_tests.cpp:202 in 7812cff7c3 outdated

199-        const unsigned int flags_excluding_one = TrimFlags(flags & ~(pair.second));
200+    for (const unsigned int excluded_flags : all_flag_combos) {
201+        const unsigned int flags_excluding_one = TrimFlags(excluded_flags);
202         if (flags != flags_excluding_one) {
203             flags_combos.insert(flags_excluding_one);
204         }

esneider commented at 2:40 am on September 12, 2021:

Not sure if this is a problem, but notice that TrimFlags might cause the resulting flag_combos to have duplicate elements.

Eg, if SCRIPT_VERIFY_P2SH and SCRIPT_VERIFY_WITNESS are set in flags, the mapping done by TrimFlags will make 25% of the resulting elements duplicates.

JeremyRubin commented at 6:08 am on September 12, 2021:

yeah I could do a de-dup pass with TrimFlags. It makes more sense IMO to keep TrimFlags even if superfluos with the above as TrimFlags is a separate validity of flag combo check v.s. the dumb subset generator.

[Tests] Compute the Power Set of all flags instead of one by one exclusion in transaction_tests.cpp 885bf41580

JeremyRubin force-pushed on Sep 12, 2021

in src/test/transaction_tests.cpp:200 in 885bf41580 outdated

204+
205+    std::set<unsigned int> flags_combos;
206+    for (const unsigned int submask : all_flag_combos) {
207+        flags_combos.insert(TrimFlags(submask));
208+    }
209+    flags_combos.erase(flags);

jonatack commented at 2:53 pm on September 12, 2021:

nit, clang-format

 0@@ -174,7 +174,7 @@ unsigned int FillFlags(unsigned int flags)
 1 std::set<unsigned int> GenerateAllSubMasks(unsigned int flags)
 2 {
 3     std::vector<unsigned int> set_flags;
 4-    const size_t n_flags = sizeof(unsigned int)*8;
 5+    const size_t n_flags = sizeof(unsigned int) * 8;
 6     set_flags.reserve(n_flags);
 7@@ -184,10 +184,10 @@ std::set<unsigned int> GenerateAllSubMasks(unsigned int flags)
 8     }
 9     const size_t combos = 1 << set_flags.size();
10     // skip last all set flags
11-    std::vector<unsigned int> all_flag_combos(combos-1);
12-    for (size_t i = 0; i < combos-1; ++i){
13-        for(size_t j = 0; j < set_flags.size(); ++j) {
14-            if (i & (1<<j)) {
15+    std::vector<unsigned int> all_flag_combos(combos - 1);
16+    for (size_t i = 0; i < combos - 1; ++i) {
17+        for (size_t j = 0; j < set_flags.size(); ++j) {
18+            if (i & (1 << j)) {
19                 all_flag_combos[i] |= set_flags[j];

in src/test/transaction_tests.cpp:180 in 885bf41580 outdated

184-            flags_combos.insert(flags_excluding_one);
185+    std::vector<unsigned int> set_flags;
186+    const size_t n_flags = sizeof(unsigned int)*8;
187+    set_flags.reserve(n_flags);
188+    for (size_t i = 0; i < n_flags; ++i) {
189+        const unsigned int flag = flags & (1 << i);

jonatack commented at 2:58 pm on September 12, 2021:

nit, can use braced initialization on these for type safety

0-    const size_t n_flags = sizeof(unsigned int)*8;
1+    const size_t n_flags{sizeof(unsigned int) * 8};
2     set_flags.reserve(n_flags);
3     for (size_t i = 0; i < n_flags; ++i) {
4-        const unsigned int flag = flags & (1 << i);
5+        const unsigned int flag{flags & (1 << i)};

in src/test/transaction_tests.cpp:188 in 885bf41580 outdated

192+        }
193+    }
194+    const size_t combos = 1 << set_flags.size();
195+    // skip last all set flags
196+    std::vector<unsigned int> all_flag_combos(combos-1);
197+    for (size_t i = 0; i < combos-1; ++i){

jonatack commented at 3:02 pm on September 12, 2021:

maybe hoist subtracting one to the declaration of combos

0-    const size_t combos = 1 << set_flags.size();
1+    const size_t combos = (1 << set_flags.size()) - 1;
2     // skip last all set flags
3-    std::vector<unsigned int> all_flag_combos(combos-1);
4-    for (size_t i = 0; i < combos-1; ++i){
5+    std::vector<unsigned int> all_flag_combos(combos);
6+    for (size_t i = 0; i < combos; ++i) {

jonatack commented at 3:32 pm on September 12, 2021: member

Concept ACK. Here are the results for me before/after. It might be interesting to see the result with https://cp-algorithms.com/algebra/all-submasks.html if you’re motivated to try it.

EDIT: added the results for both tests (tx_valid and tx_invalid).

  0test/transaction_tests.cpp(187): Entering test suite "transaction_tests"
  1test/transaction_tests.cpp(189): Entering test case "tx_valid"
  26
  310
  412
  56
  610
  712
  814
  922
 1026
 1128
 1214
 1322
 1426
 1528
 1614
 1722
 1826
 1928
 202
 218
 220
 2365536
 246
 2510
 2612
 270
 280
 290
 300
 310
 320
 330
 340
 350
 360
 372
 3816384
 390
 400
 410
 428
 4365536
 448
 4565536
 468
 4765536
 488
 4965536
 508
 5165536
 520
 5340
 5465544
 5565568
 5665576
 5740
 5865544
 5965568
 6065576
 6132
 6265536
 6365568
 6440
 6565544
 6665568
 6765576
 688
 6965536
 708
 7165536
 720
 730
 740
 750
 7664
 770
 7832
 796
 8010
 8112
 820
 830
 840
 8564
 860
 8732
 880
 890
 900
 9164
 920
 930
 9464
 950
 9664
 978
 9865536
 990
1008
10165536
1020
103test/transaction_tests.cpp(189): Leaving test case "tx_valid"; testing time: 918607us
104test/transaction_tests.cpp(280): Entering test case "tx_invalid"
1050
1060
1070
1080
1090
1100
1110
1120
1130
1140
1150
1160
1170
1180
1190
1200
1210
1220
1230
1240
1250
1260
1270
1280
1291
130512
1310
1320
1330
1340
1350
1360
1370
1380
1390
1400
1410
1420
1430
1440
1451
1461024
1470
1480
1492049
1504096
1514097
1520
1531
1540
1551
1560
1571
1580
1591
1600
1611
1620
1631
1640
1651
1660
1671
1680
1691
1700
1711
1722049
1734096
1744097
1750
1760
1771
1780
1790
1801
1810
1820
1830
1840
1850
1860
1870
1880
1890
1900
1911
19265536
1930
194test/transaction_tests.cpp(280): Leaving test case "tx_invalid"; testing time: 270583us

  0test/transaction_tests.cpp(204): Entering test suite "transaction_tests"
  1test/transaction_tests.cpp(206): Entering test case "tx_valid"
  20
  32
  44
  56
  68
  710
  812
  90
 102
 114
 126
 138
 1410
 1512
 160
 172
 184
 196
 208
 2110
 2212
 2314
 2416
 2518
 2620
 2722
 2824
 2926
 3028
 310
 322
 334
 346
 358
 3610
 3712
 3814
 3916
 4018
 4120
 4222
 4324
 4426
 4528
 460
 472
 484
 496
 508
 5110
 5212
 5314
 5416
 5518
 5620
 5722
 5824
 5926
 6028
 610
 622
 638
 640
 6565536
 660
 672
 684
 696
 708
 7110
 7212
 730
 740
 750
 760
 770
 780
 790
 800
 810
 820
 830
 842
 8516384
 860
 870
 880
 890
 908
 9165536
 920
 938
 9465536
 950
 968
 9765536
 980
 998
10065536
1010
1028
10365536
1040
1050
1068
10732
10840
10965536
11065544
11165568
11265576
1130
1148
11532
11640
11765536
11865544
11965568
12065576
1210
12232
12365536
12465568
1250
1268
12732
12840
12965536
13065544
13165568
13265576
1330
1348
13565536
1360
1378
13865536
1390
1400
1410
1420
14364
1440
14532
1460
1472
1484
1496
1508
15110
15212
1530
1540
1550
15664
1570
15832
1590
1600
1610
16264
1630
1640
16564
1660
16764
1680
1698
17065536
1710
1720
1738
17465536
1750
176test/transaction_tests.cpp(206): Leaving test case "tx_valid"; testing time: 1159591us
177test/transaction_tests.cpp(297): Entering test case "tx_invalid"
1780
1790
1800
1810
1820
1830
1840
1850
1860
1870
1880
1890
1900
1910
1920
1930
1940
1950
1960
1970
1980
1990
2000
2010
2020
2031
204512
2050
2060
2070
2080
2090
2100
2110
2120
2130
2140
2150
2160
2170
2180
2190
2201
2211024
2220
2230
2240
2251
2262049
2274096
2284097
2290
2301
2310
2321
2330
2341
2350
2361
2370
2381
2390
2401
2410
2421
2430
2441
2450
2461
2470
2481
2490
2501
2512049
2524096
2534097
2540
2550
2561
2570
2580
2591
2600
2610
2620
2630
2640
2650
2660
2670
2680
2690
2700
2711
27265536
2730
274test/transaction_tests.cpp(297): Leaving test case "tx_invalid"; testing time: 539616us

DrahtBot commented at 3:40 pm on September 12, 2021: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#22954 ([TESTS] Allow tx_invalid.json tests to include flag rules for if_unset: [A,B,C] then_unset: [D] by JeremyRubin)
#22876 ([TESTS] Update Transaction Tests to permit setting a flag as always on and disabling the exhaustive failure test by JeremyRubin)
#21702 (Implement BIP-119 Validation (CheckTemplateVerify) by JeremyRubin)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Squash: Use FancyPants Bit Twiddling Algorithm to Generate Submasks d185f36ede

JeremyRubin commented at 7:06 pm on September 12, 2021: contributor

I added the textbook algorithm for this: feel free to test that the output is the same then I can squash.

jonatack commented at 12:08 pm on September 13, 2021: member

@JeremyRubin Nice. The textbook algorithm produces the same values in the tx_invalid test but fewer values in the tx_valid test. Among the values that it does not return compared to your algo, only 2 or 3 of them are unique. WDYT?

  0test/transaction_tests.cpp(194): Entering test case "tx_valid"
  10
  22
  34
  46
  58
  610
  712
  80
  92
 104
 116
 128
 1310
 1412
 150
 162
 174
 186
 198
 2010
 2112
 2214
 2316
 2418
 2520
 2622
 2724
 2826
 2928
 300
 312
 324
 336
 348
 3510
 3612
 3714
 3816
 3918
 4020
 4122
 4224
 4326
 4428
 450
 462
 474
 486
 498
 5010
 5112
 5214
 5316
 5418
 5520
 5622
 5724
 5826
 5928
 600
 612
 628
 630
 640
 652
 664
 676
 688
 6910
 7012
 710
 720
 730
 740
 750
 760
 770
 782
 7916384
 800
 810
 820
 830
 848
 8565536
 860
 878
 8865536
 890
 908
 9165536
 920
 938
 9465536
 950
 968
 9765536
 980
 990
1008
10132
10240
10365536
10465544
10565568
1060
1078
10832
10940
11065536
11165544
11265568
1130
11432
11565536
1160
1178
11832
11940
12065536
12165544
12265568
1230
1248
12565536
1260
1278
12865536
1290
1300
1310
1322
1334
1346
1358
13610
13712
1380
1390
1400
1410
1420
1430
1440
1450
1468
14765536
1480
1490
1508
15165536
1520
153test/transaction_tests.cpp(194): Leaving test case "tx_valid"; testing time: 597522us
154test/transaction_tests.cpp(284): Entering test case "tx_invalid"
1550
1560
1570
1580
1590
1600
1610
1620
1630
1640
1650
1660
1670
1680
1690
1700
1710
1720
1730
1740
1750
1760
1770
1780
1790
1801
181512
1820
1830
1840
1850
1860
1870
1880
1890
1900
1910
1920
1930
1940
1950
1960
1971
1981024
1990
2000
2010
2021
2032049
2044096
2054097
2060
2071
2080
2091
2100
2111
2120
2131
2140
2151
2160
2171
2180
2191
2200
2211
2220
2231
2240
2251
2260
2271
2282049
2294096
2304097
2310
2320
2331
2340
2350
2361
2370
2380
2390
2400
2410
2420
2430
2440
2450
2460
2470
2481
24965536
2500
251test/transaction_tests.cpp(284): Leaving test case "tx_invalid"; testing time: 170187us

in src/test/transaction_tests.cpp:278 in d185f36ede

275-                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~flags_excluding_one, txdata, strTest, /* expect_valid */ false)) {
276+            for (auto submask_flags : GenerateAllSubMasks(verify_flags)) {
277+                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~submask_flags, txdata, strTest, /* expect_valid */ false)) {
278                     BOOST_ERROR("Too many flags unset: " << strTest);
279                 }
280             }

esneider commented at 4:48 am on September 14, 2021:

If I understand the intent of this code correctly, the idea is that given a set of chosen flags, we want to check that every possible superset of flags doesn’t succeed. If that’s so, I believe there’s a problem with the current FlipAll( GenerateAllSubMasks( flags ) ) logic for computing all possible supersets.

Eg, assuming 3-bit words, for flags = 100, we’ll generate all flags sub masks {000}, and then flip them {111}. However, what we really want is to generate all the ~flags = 011 sub masks {010, 001, 000}, and then flip them {101, 110, 111}.

In short, there’s a missing bit flip before computing all the sub masks. Ie. it should be

0FlipAll( GenerateAllSubMasks( ~flags ) )

With that said, there’s still the issue that flipping flag bits doesn’t interact well with the TrimFlags operation done inside GenerateAllSubMasks.

Eg. if the only flag bits wereVERIFY_P2SH and VERIFY_WITNESS (in that order), for flags = 00 we’d generate all the ~flags = 11 sub masks {10, 01, 00}, trim them {10, 00, 00} (01 is mapped to 00), remove duplicates {10, 00}, and finally flip them {01, 11}. However, we ended up with an invalid combination (01) when the correct result would have been {10, 11}.

More generally, after fixing the initial missing flip, the code would look something like

0FlipAll( Unique( TrimFlags( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) )

when what we really want is something like:

0for flag_superset in FlipAll( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) ):
1    if flag_superset == TrimFlags( flag_superset ):
2        # use flag_superset

JeremyRubin commented at 6:13 pm on September 14, 2021:

yeah you’re totally right; i’ve been playing with it a bit as well and noticed that it seems to not be quite right (thanks @jonatack for pointing it out).

The logic is definitely off, and I think it may be intractible in this case since GenerateAllSubMasksWithoutTrimming(~flags) ends up being pretty big… with 20 it might be ok, but eventually there would be too many flags.

esneider commented at 5:08 am on September 15, 2021:

Makes sense. Maybe bound the amount of flipped bits? That way you can test some flag interactions, but bound the exponential growth. The bounding capability would also help to put a safeguard against exponential explosion in the tx_invalid test if there are too many set bits in the initial mask.

JeremyRubin commented at 6:45 pm on October 22, 2021: contributor

Closing this for now – needs more research/discussion on how to accomplish actually using the powerset properly, see #22948 (review)

JeremyRubin closed this on Oct 22, 2021

JeremyRubin added the label Up for grabs on Oct 22, 2021

DrahtBot locked this on Oct 30, 2022

[Tests] Compute the Power Set of all flags instead of one by one exclusion #22948

Conflicts