[Tests] Compute the Power Set of all flags instead of one by one exclusion #22948

pull JeremyRubin wants to merge 2 commits into bitcoin:master from JeremyRubin:flag-powerset changing 1 files +19 −14
  1. JeremyRubin commented at 11:19 pm on September 10, 2021: contributor

    Currently, at the end of a test vector (valid or invalid) we iterate the flags one by one and unset them and check that it makes the transaction succeed or fail.

    This is called ‘minimality’ in that we want to check that each flag is sufficient on it’s own. However, I think we should be instead doing the 2**(specified) combinations of flag, to ensure that for all combinations of flags only the one we have specified is either valid or invalid.

    Otherwise, we might have e.g. subsets of flags that have interactions we’re not detecting here.

    Interestingly, the asymptotic runtime here should be better on average (since we don’t usually set that many flags, v.s. 32 iterations on the one by one checker), but the worst case is 2**32 flag combos. It’s up to the test writers to not abuse this check.

    Contrived example demonstrating the problem:

    0const auto sum = flag_a_set + flag_b_set +flag_c_set ;
    1// check parity is even
    2if (sum & 1) throw "Oops";
    

    having set “a,b,c”, the one-by-one checker for “a,b,c,” expects failure would check [“a,b”, “a,c”, “b,c”] and see no failures since the parity of each is 2. However, [“a”, “b”, “c”] would fail since the parity is 1.

    I’m not aware of any code that is written in this specific manner, but similar circumstance might arise naturally in the code.

  2. in src/test/transaction_tests.cpp:200 in 125fb1c8dc outdated
    197     std::set<unsigned int> flags_combos;
    198-    for (const auto& pair : mapFlagNames) {
    199-        const unsigned int flags_excluding_one = TrimFlags(flags & ~(pair.second));
    200+    for (const auto& excluded_flags : all_flag_combos) {
    201+        const unsigned int flags_excluding_one = TrimFlags(excluded_flags);
    202         if (flags != flags_excluding_one) {
    


    JeremyRubin commented at 11:21 pm on September 10, 2021:
    this check is actually probably superfluous with the iteration not including combos -1 (the all bits set).

    JeremyRubin commented at 11:30 pm on September 10, 2021:
    update: it’s not, because of the TrimFlags

    esneider commented at 2:28 am on September 12, 2021:

    Notice that TrimFlags only removes flags, so it is, indeed, superfluous: flags_excluding_one ⊆ excluded_flags ⊂ flags

    Having said that, there’s no harm in leaving it here.

  3. JeremyRubin force-pushed on Sep 10, 2021
  4. JeremyRubin renamed this:
    [Tests] Computer the Power Set of all flags instead of one by one exclusion
    [Tests] Compute the Power Set of all flags instead of one by one exclusion
    on Sep 10, 2021
  5. fanquake added the label Tests on Sep 10, 2021
  6. JeremyRubin commented at 2:53 pm on September 11, 2021: contributor
    If anyone doesn’t like the submask code, I could swap it for this algorithm (which is more efficient but not-obvious) https://cp-algorithms.com/algebra/all-submasks.html#toc-tgt-0
  7. in src/test/transaction_tests.cpp:178 in 7812cff7c3 outdated
    173@@ -174,9 +174,29 @@ unsigned int FillFlags(unsigned int flags)
    174 // Assumes that mapFlagNames contains all script verify flags.
    175 std::set<unsigned int> ExcludeIndividualFlags(unsigned int flags)
    176 {
    177+    std::vector<unsigned int> set_flags;
    178+    const size_t n_flags = sizeof(unsigned int);
    


    esneider commented at 1:59 am on September 12, 2021:
    I think you might be missing a * 8 here

    JeremyRubin commented at 6:06 am on September 12, 2021:
    🤦 –> will fix
  8. in src/test/transaction_tests.cpp:173 in 7812cff7c3 outdated
    173@@ -174,9 +174,29 @@ unsigned int FillFlags(unsigned int flags)
    174 // Assumes that mapFlagNames contains all script verify flags.
    


    esneider commented at 2:08 am on September 12, 2021:
    The doc is out of date now

    JeremyRubin commented at 6:13 am on September 12, 2021:
    i can’t tell what line you are commenting on

    esneider commented at 6:30 am on September 12, 2021:

    Right, sorry about that.

    0// Exclude each possible script verify flag from flags. Returns a set of these flag combinations
    1// that are valid and without duplicates. For example: if flags=1111 and the 4 possible flags are
    2// 0001, 0010, 0100, and 1000, this should return the set {0111, 1011, 1101, 1110}.
    
  9. in src/test/transaction_tests.cpp:202 in 7812cff7c3 outdated
    199-        const unsigned int flags_excluding_one = TrimFlags(flags & ~(pair.second));
    200+    for (const unsigned int excluded_flags : all_flag_combos) {
    201+        const unsigned int flags_excluding_one = TrimFlags(excluded_flags);
    202         if (flags != flags_excluding_one) {
    203             flags_combos.insert(flags_excluding_one);
    204         }
    


    esneider commented at 2:40 am on September 12, 2021:

    Not sure if this is a problem, but notice that TrimFlags might cause the resulting flag_combos to have duplicate elements.

    Eg, if SCRIPT_VERIFY_P2SH and SCRIPT_VERIFY_WITNESS are set in flags, the mapping done by TrimFlags will make 25% of the resulting elements duplicates.


    JeremyRubin commented at 6:08 am on September 12, 2021:
    yeah I could do a de-dup pass with TrimFlags. It makes more sense IMO to keep TrimFlags even if superfluos with the above as TrimFlags is a separate validity of flag combo check v.s. the dumb subset generator.
  10. [Tests] Compute the Power Set of all flags instead of one by one exclusion in transaction_tests.cpp 885bf41580
  11. JeremyRubin force-pushed on Sep 12, 2021
  12. in src/test/transaction_tests.cpp:200 in 885bf41580 outdated
    204+
    205+    std::set<unsigned int> flags_combos;
    206+    for (const unsigned int submask : all_flag_combos) {
    207+        flags_combos.insert(TrimFlags(submask));
    208+    }
    209+    flags_combos.erase(flags);
    


    jonatack commented at 2:53 pm on September 12, 2021:

    nit, clang-format

     0@@ -174,7 +174,7 @@ unsigned int FillFlags(unsigned int flags)
     1 std::set<unsigned int> GenerateAllSubMasks(unsigned int flags)
     2 {
     3     std::vector<unsigned int> set_flags;
     4-    const size_t n_flags = sizeof(unsigned int)*8;
     5+    const size_t n_flags = sizeof(unsigned int) * 8;
     6     set_flags.reserve(n_flags);
     7@@ -184,10 +184,10 @@ std::set<unsigned int> GenerateAllSubMasks(unsigned int flags)
     8     }
     9     const size_t combos = 1 << set_flags.size();
    10     // skip last all set flags
    11-    std::vector<unsigned int> all_flag_combos(combos-1);
    12-    for (size_t i = 0; i < combos-1; ++i){
    13-        for(size_t j = 0; j < set_flags.size(); ++j) {
    14-            if (i & (1<<j)) {
    15+    std::vector<unsigned int> all_flag_combos(combos - 1);
    16+    for (size_t i = 0; i < combos - 1; ++i) {
    17+        for (size_t j = 0; j < set_flags.size(); ++j) {
    18+            if (i & (1 << j)) {
    19                 all_flag_combos[i] |= set_flags[j];
    
  13. in src/test/transaction_tests.cpp:180 in 885bf41580 outdated
    184-            flags_combos.insert(flags_excluding_one);
    185+    std::vector<unsigned int> set_flags;
    186+    const size_t n_flags = sizeof(unsigned int)*8;
    187+    set_flags.reserve(n_flags);
    188+    for (size_t i = 0; i < n_flags; ++i) {
    189+        const unsigned int flag = flags & (1 << i);
    


    jonatack commented at 2:58 pm on September 12, 2021:

    nit, can use braced initialization on these for type safety

    0-    const size_t n_flags = sizeof(unsigned int)*8;
    1+    const size_t n_flags{sizeof(unsigned int) * 8};
    2     set_flags.reserve(n_flags);
    3     for (size_t i = 0; i < n_flags; ++i) {
    4-        const unsigned int flag = flags & (1 << i);
    5+        const unsigned int flag{flags & (1 << i)};
    
  14. in src/test/transaction_tests.cpp:188 in 885bf41580 outdated
    192+        }
    193+    }
    194+    const size_t combos = 1 << set_flags.size();
    195+    // skip last all set flags
    196+    std::vector<unsigned int> all_flag_combos(combos-1);
    197+    for (size_t i = 0; i < combos-1; ++i){
    


    jonatack commented at 3:02 pm on September 12, 2021:

    maybe hoist subtracting one to the declaration of combos

    0-    const size_t combos = 1 << set_flags.size();
    1+    const size_t combos = (1 << set_flags.size()) - 1;
    2     // skip last all set flags
    3-    std::vector<unsigned int> all_flag_combos(combos-1);
    4-    for (size_t i = 0; i < combos-1; ++i){
    5+    std::vector<unsigned int> all_flag_combos(combos);
    6+    for (size_t i = 0; i < combos; ++i) {
    
  15. jonatack commented at 3:32 pm on September 12, 2021: member

    Concept ACK. Here are the results for me before/after. It might be interesting to see the result with https://cp-algorithms.com/algebra/all-submasks.html if you’re motivated to try it.

    EDIT: added the results for both tests (tx_valid and tx_invalid).

      0test/transaction_tests.cpp(187): Entering test suite "transaction_tests"
      1test/transaction_tests.cpp(189): Entering test case "tx_valid"
      26
      310
      412
      56
      610
      712
      814
      922
     1026
     1128
     1214
     1322
     1426
     1528
     1614
     1722
     1826
     1928
     202
     218
     220
     2365536
     246
     2510
     2612
     270
     280
     290
     300
     310
     320
     330
     340
     350
     360
     372
     3816384
     390
     400
     410
     428
     4365536
     448
     4565536
     468
     4765536
     488
     4965536
     508
     5165536
     520
     5340
     5465544
     5565568
     5665576
     5740
     5865544
     5965568
     6065576
     6132
     6265536
     6365568
     6440
     6565544
     6665568
     6765576
     688
     6965536
     708
     7165536
     720
     730
     740
     750
     7664
     770
     7832
     796
     8010
     8112
     820
     830
     840
     8564
     860
     8732
     880
     890
     900
     9164
     920
     930
     9464
     950
     9664
     978
     9865536
     990
    1008
    10165536
    1020
    103test/transaction_tests.cpp(189): Leaving test case "tx_valid"; testing time: 918607us
    104test/transaction_tests.cpp(280): Entering test case "tx_invalid"
    1050
    1060
    1070
    1080
    1090
    1100
    1110
    1120
    1130
    1140
    1150
    1160
    1170
    1180
    1190
    1200
    1210
    1220
    1230
    1240
    1250
    1260
    1270
    1280
    1291
    130512
    1310
    1320
    1330
    1340
    1350
    1360
    1370
    1380
    1390
    1400
    1410
    1420
    1430
    1440
    1451
    1461024
    1470
    1480
    1492049
    1504096
    1514097
    1520
    1531
    1540
    1551
    1560
    1571
    1580
    1591
    1600
    1611
    1620
    1631
    1640
    1651
    1660
    1671
    1680
    1691
    1700
    1711
    1722049
    1734096
    1744097
    1750
    1760
    1771
    1780
    1790
    1801
    1810
    1820
    1830
    1840
    1850
    1860
    1870
    1880
    1890
    1900
    1911
    19265536
    1930
    194test/transaction_tests.cpp(280): Leaving test case "tx_invalid"; testing time: 270583us
    
      0test/transaction_tests.cpp(204): Entering test suite "transaction_tests"
      1test/transaction_tests.cpp(206): Entering test case "tx_valid"
      20
      32
      44
      56
      68
      710
      812
      90
     102
     114
     126
     138
     1410
     1512
     160
     172
     184
     196
     208
     2110
     2212
     2314
     2416
     2518
     2620
     2722
     2824
     2926
     3028
     310
     322
     334
     346
     358
     3610
     3712
     3814
     3916
     4018
     4120
     4222
     4324
     4426
     4528
     460
     472
     484
     496
     508
     5110
     5212
     5314
     5416
     5518
     5620
     5722
     5824
     5926
     6028
     610
     622
     638
     640
     6565536
     660
     672
     684
     696
     708
     7110
     7212
     730
     740
     750
     760
     770
     780
     790
     800
     810
     820
     830
     842
     8516384
     860
     870
     880
     890
     908
     9165536
     920
     938
     9465536
     950
     968
     9765536
     980
     998
    10065536
    1010
    1028
    10365536
    1040
    1050
    1068
    10732
    10840
    10965536
    11065544
    11165568
    11265576
    1130
    1148
    11532
    11640
    11765536
    11865544
    11965568
    12065576
    1210
    12232
    12365536
    12465568
    1250
    1268
    12732
    12840
    12965536
    13065544
    13165568
    13265576
    1330
    1348
    13565536
    1360
    1378
    13865536
    1390
    1400
    1410
    1420
    14364
    1440
    14532
    1460
    1472
    1484
    1496
    1508
    15110
    15212
    1530
    1540
    1550
    15664
    1570
    15832
    1590
    1600
    1610
    16264
    1630
    1640
    16564
    1660
    16764
    1680
    1698
    17065536
    1710
    1720
    1738
    17465536
    1750
    176test/transaction_tests.cpp(206): Leaving test case "tx_valid"; testing time: 1159591us
    177test/transaction_tests.cpp(297): Entering test case "tx_invalid"
    1780
    1790
    1800
    1810
    1820
    1830
    1840
    1850
    1860
    1870
    1880
    1890
    1900
    1910
    1920
    1930
    1940
    1950
    1960
    1970
    1980
    1990
    2000
    2010
    2020
    2031
    204512
    2050
    2060
    2070
    2080
    2090
    2100
    2110
    2120
    2130
    2140
    2150
    2160
    2170
    2180
    2190
    2201
    2211024
    2220
    2230
    2240
    2251
    2262049
    2274096
    2284097
    2290
    2301
    2310
    2321
    2330
    2341
    2350
    2361
    2370
    2381
    2390
    2401
    2410
    2421
    2430
    2441
    2450
    2461
    2470
    2481
    2490
    2501
    2512049
    2524096
    2534097
    2540
    2550
    2561
    2570
    2580
    2591
    2600
    2610
    2620
    2630
    2640
    2650
    2660
    2670
    2680
    2690
    2700
    2711
    27265536
    2730
    274test/transaction_tests.cpp(297): Leaving test case "tx_invalid"; testing time: 539616us
    
  16. DrahtBot commented at 3:40 pm on September 12, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #22954 ([TESTS] Allow tx_invalid.json tests to include flag rules for if_unset: [A,B,C] then_unset: [D] by JeremyRubin)
    • #22876 ([TESTS] Update Transaction Tests to permit setting a flag as always on and disabling the exhaustive failure test by JeremyRubin)
    • #21702 (Implement BIP-119 Validation (CheckTemplateVerify) by JeremyRubin)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  17. Squash: Use FancyPants Bit Twiddling Algorithm to Generate Submasks d185f36ede
  18. JeremyRubin commented at 7:06 pm on September 12, 2021: contributor
    I added the textbook algorithm for this: feel free to test that the output is the same then I can squash.
  19. jonatack commented at 12:08 pm on September 13, 2021: member

    @JeremyRubin Nice. The textbook algorithm produces the same values in the tx_invalid test but fewer values in the tx_valid test. Among the values that it does not return compared to your algo, only 2 or 3 of them are unique. WDYT?

      0test/transaction_tests.cpp(194): Entering test case "tx_valid"
      10
      22
      34
      46
      58
      610
      712
      80
      92
     104
     116
     128
     1310
     1412
     150
     162
     174
     186
     198
     2010
     2112
     2214
     2316
     2418
     2520
     2622
     2724
     2826
     2928
     300
     312
     324
     336
     348
     3510
     3612
     3714
     3816
     3918
     4020
     4122
     4224
     4326
     4428
     450
     462
     474
     486
     498
     5010
     5112
     5214
     5316
     5418
     5520
     5622
     5724
     5826
     5928
     600
     612
     628
     630
     640
     652
     664
     676
     688
     6910
     7012
     710
     720
     730
     740
     750
     760
     770
     782
     7916384
     800
     810
     820
     830
     848
     8565536
     860
     878
     8865536
     890
     908
     9165536
     920
     938
     9465536
     950
     968
     9765536
     980
     990
    1008
    10132
    10240
    10365536
    10465544
    10565568
    1060
    1078
    10832
    10940
    11065536
    11165544
    11265568
    1130
    11432
    11565536
    1160
    1178
    11832
    11940
    12065536
    12165544
    12265568
    1230
    1248
    12565536
    1260
    1278
    12865536
    1290
    1300
    1310
    1322
    1334
    1346
    1358
    13610
    13712
    1380
    1390
    1400
    1410
    1420
    1430
    1440
    1450
    1468
    14765536
    1480
    1490
    1508
    15165536
    1520
    153test/transaction_tests.cpp(194): Leaving test case "tx_valid"; testing time: 597522us
    154test/transaction_tests.cpp(284): Entering test case "tx_invalid"
    1550
    1560
    1570
    1580
    1590
    1600
    1610
    1620
    1630
    1640
    1650
    1660
    1670
    1680
    1690
    1700
    1710
    1720
    1730
    1740
    1750
    1760
    1770
    1780
    1790
    1801
    181512
    1820
    1830
    1840
    1850
    1860
    1870
    1880
    1890
    1900
    1910
    1920
    1930
    1940
    1950
    1960
    1971
    1981024
    1990
    2000
    2010
    2021
    2032049
    2044096
    2054097
    2060
    2071
    2080
    2091
    2100
    2111
    2120
    2131
    2140
    2151
    2160
    2171
    2180
    2191
    2200
    2211
    2220
    2231
    2240
    2251
    2260
    2271
    2282049
    2294096
    2304097
    2310
    2320
    2331
    2340
    2350
    2361
    2370
    2380
    2390
    2400
    2410
    2420
    2430
    2440
    2450
    2460
    2470
    2481
    24965536
    2500
    251test/transaction_tests.cpp(284): Leaving test case "tx_invalid"; testing time: 170187us
    
  20. in src/test/transaction_tests.cpp:278 in d185f36ede
    275-                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~flags_excluding_one, txdata, strTest, /* expect_valid */ false)) {
    276+            for (auto submask_flags : GenerateAllSubMasks(verify_flags)) {
    277+                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~submask_flags, txdata, strTest, /* expect_valid */ false)) {
    278                     BOOST_ERROR("Too many flags unset: " << strTest);
    279                 }
    280             }
    


    esneider commented at 4:48 am on September 14, 2021:

    If I understand the intent of this code correctly, the idea is that given a set of chosen flags, we want to check that every possible superset of flags doesn’t succeed. If that’s so, I believe there’s a problem with the current FlipAll( GenerateAllSubMasks( flags ) ) logic for computing all possible supersets.

    Eg, assuming 3-bit words, for flags = 100, we’ll generate all flags sub masks {000}, and then flip them {111}. However, what we really want is to generate all the ~flags = 011 sub masks {010, 001, 000}, and then flip them {101, 110, 111}.

    In short, there’s a missing bit flip before computing all the sub masks. Ie. it should be

    0FlipAll( GenerateAllSubMasks( ~flags ) )
    

    With that said, there’s still the issue that flipping flag bits doesn’t interact well with the TrimFlags operation done inside GenerateAllSubMasks.

    Eg. if the only flag bits wereVERIFY_P2SH and VERIFY_WITNESS (in that order), for flags = 00 we’d generate all the ~flags = 11 sub masks {10, 01, 00}, trim them {10, 00, 00} (01 is mapped to 00), remove duplicates {10, 00}, and finally flip them {01, 11}. However, we ended up with an invalid combination (01) when the correct result would have been {10, 11}.

    More generally, after fixing the initial missing flip, the code would look something like

    0FlipAll( Unique( TrimFlags( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) )
    

    when what we really want is something like:

    0for flag_superset in FlipAll( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) ):
    1    if flag_superset == TrimFlags( flag_superset ):
    2        # use flag_superset
    

    JeremyRubin commented at 6:13 pm on September 14, 2021:

    yeah you’re totally right; i’ve been playing with it a bit as well and noticed that it seems to not be quite right (thanks @jonatack for pointing it out).

    The logic is definitely off, and I think it may be intractible in this case since GenerateAllSubMasksWithoutTrimming(~flags) ends up being pretty big… with 20 it might be ok, but eventually there would be too many flags.


    esneider commented at 5:08 am on September 15, 2021:
    Makes sense. Maybe bound the amount of flipped bits? That way you can test some flag interactions, but bound the exponential growth. The bounding capability would also help to put a safeguard against exponential explosion in the tx_invalid test if there are too many set bits in the initial mask.
  21. JeremyRubin commented at 6:45 pm on October 22, 2021: contributor
    Closing this for now – needs more research/discussion on how to accomplish actually using the powerset properly, see #22948 (review)
  22. JeremyRubin closed this on Oct 22, 2021

  23. JeremyRubin added the label Up for grabs on Oct 22, 2021
  24. DrahtBot locked this on Oct 30, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-24 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me