[Tests] Compute the Power Set of all flags instead of one by one exclusion #22948

pull JeremyRubin wants to merge 2 commits into bitcoin:master from JeremyRubin:flag-powerset changing 1 files +19 −14
  1. JeremyRubin commented at 11:19 PM on September 10, 2021: contributor

    Currently, at the end of a test vector (valid or invalid) we iterate the flags one by one and unset them and check that it makes the transaction succeed or fail.

    This is called 'minimality' in that we want to check that each flag is sufficient on it's own. However, I think we should be instead doing the 2**(specified) combinations of flag, to ensure that for all combinations of flags only the one we have specified is either valid or invalid.

    Otherwise, we might have e.g. subsets of flags that have interactions we're not detecting here.

    Interestingly, the asymptotic runtime here should be better on average (since we don't usually set that many flags, v.s. 32 iterations on the one by one checker), but the worst case is 2**32 flag combos. It's up to the test writers to not abuse this check.

    Contrived example demonstrating the problem:

    const auto sum = flag_a_set + flag_b_set +flag_c_set ;
    // check parity is even
    if (sum & 1) throw "Oops";
    

    having set "a,b,c", the one-by-one checker for "a,b,c," expects failure would check ["a,b", "a,c", "b,c"] and see no failures since the parity of each is 2. However, ["a", "b", "c"] would fail since the parity is 1.

    I'm not aware of any code that is written in this specific manner, but similar circumstance might arise naturally in the code.

  2. in src/test/transaction_tests.cpp:200 in 125fb1c8dc outdated
     197 |      std::set<unsigned int> flags_combos;
     198 | -    for (const auto& pair : mapFlagNames) {
     199 | -        const unsigned int flags_excluding_one = TrimFlags(flags & ~(pair.second));
     200 | +    for (const auto& excluded_flags : all_flag_combos) {
     201 | +        const unsigned int flags_excluding_one = TrimFlags(excluded_flags);
     202 |          if (flags != flags_excluding_one) {
    


    JeremyRubin commented at 11:21 PM on September 10, 2021:

    this check is actually probably superfluous with the iteration not including combos -1 (the all bits set).


    JeremyRubin commented at 11:30 PM on September 10, 2021:

    update: it's not, because of the TrimFlags


    esneider commented at 2:28 AM on September 12, 2021:

    Notice that TrimFlags only removes flags, so it is, indeed, superfluous: flags_excluding_one ⊆ excluded_flags ⊂ flags

    Having said that, there's no harm in leaving it here.

  3. JeremyRubin force-pushed on Sep 10, 2021
  4. JeremyRubin renamed this:
    [Tests] Computer the Power Set of all flags instead of one by one exclusion
    [Tests] Compute the Power Set of all flags instead of one by one exclusion
    on Sep 10, 2021
  5. fanquake added the label Tests on Sep 10, 2021
  6. JeremyRubin commented at 2:53 PM on September 11, 2021: contributor

    If anyone doesn't like the submask code, I could swap it for this algorithm (which is more efficient but not-obvious) https://cp-algorithms.com/algebra/all-submasks.html#toc-tgt-0

  7. in src/test/transaction_tests.cpp:178 in 7812cff7c3 outdated
     173 | @@ -174,9 +174,29 @@ unsigned int FillFlags(unsigned int flags)
     174 |  // Assumes that mapFlagNames contains all script verify flags.
     175 |  std::set<unsigned int> ExcludeIndividualFlags(unsigned int flags)
     176 |  {
     177 | +    std::vector<unsigned int> set_flags;
     178 | +    const size_t n_flags = sizeof(unsigned int);
    


    esneider commented at 1:59 AM on September 12, 2021:

    I think you might be missing a * 8 here


    JeremyRubin commented at 6:06 AM on September 12, 2021:

    🤦 --> will fix

  8. in src/test/transaction_tests.cpp:173 in 7812cff7c3 outdated
     173 | @@ -174,9 +174,29 @@ unsigned int FillFlags(unsigned int flags)
     174 |  // Assumes that mapFlagNames contains all script verify flags.
    


    esneider commented at 2:08 AM on September 12, 2021:

    The doc is out of date now


    JeremyRubin commented at 6:13 AM on September 12, 2021:

    i can't tell what line you are commenting on


    esneider commented at 6:30 AM on September 12, 2021:

    Right, sorry about that.

    // Exclude each possible script verify flag from flags. Returns a set of these flag combinations
    // that are valid and without duplicates. For example: if flags=1111 and the 4 possible flags are
    // 0001, 0010, 0100, and 1000, this should return the set {0111, 1011, 1101, 1110}.
    
  9. in src/test/transaction_tests.cpp:202 in 7812cff7c3 outdated
     199 | -        const unsigned int flags_excluding_one = TrimFlags(flags & ~(pair.second));
     200 | +    for (const unsigned int excluded_flags : all_flag_combos) {
     201 | +        const unsigned int flags_excluding_one = TrimFlags(excluded_flags);
     202 |          if (flags != flags_excluding_one) {
     203 |              flags_combos.insert(flags_excluding_one);
     204 |          }
    


    esneider commented at 2:40 AM on September 12, 2021:

    Not sure if this is a problem, but notice that TrimFlags might cause the resulting flag_combos to have duplicate elements.

    Eg, if SCRIPT_VERIFY_P2SH and SCRIPT_VERIFY_WITNESS are set in flags, the mapping done by TrimFlags will make 25% of the resulting elements duplicates.


    JeremyRubin commented at 6:08 AM on September 12, 2021:

    yeah I could do a de-dup pass with TrimFlags. It makes more sense IMO to keep TrimFlags even if superfluos with the above as TrimFlags is a separate validity of flag combo check v.s. the dumb subset generator.

  10. [Tests] Compute the Power Set of all flags instead of one by one exclusion in transaction_tests.cpp 885bf41580
  11. JeremyRubin force-pushed on Sep 12, 2021
  12. in src/test/transaction_tests.cpp:200 in 885bf41580 outdated
     204 | +
     205 | +    std::set<unsigned int> flags_combos;
     206 | +    for (const unsigned int submask : all_flag_combos) {
     207 | +        flags_combos.insert(TrimFlags(submask));
     208 | +    }
     209 | +    flags_combos.erase(flags);
    


    jonatack commented at 2:53 PM on September 12, 2021:

    nit, clang-format

    @@ -174,7 +174,7 @@ unsigned int FillFlags(unsigned int flags)
     std::set<unsigned int> GenerateAllSubMasks(unsigned int flags)
     {
         std::vector<unsigned int> set_flags;
    -    const size_t n_flags = sizeof(unsigned int)*8;
    +    const size_t n_flags = sizeof(unsigned int) * 8;
         set_flags.reserve(n_flags);
    @@ -184,10 +184,10 @@ std::set<unsigned int> GenerateAllSubMasks(unsigned int flags)
         }
         const size_t combos = 1 << set_flags.size();
         // skip last all set flags
    -    std::vector<unsigned int> all_flag_combos(combos-1);
    -    for (size_t i = 0; i < combos-1; ++i){
    -        for(size_t j = 0; j < set_flags.size(); ++j) {
    -            if (i & (1<<j)) {
    +    std::vector<unsigned int> all_flag_combos(combos - 1);
    +    for (size_t i = 0; i < combos - 1; ++i) {
    +        for (size_t j = 0; j < set_flags.size(); ++j) {
    +            if (i & (1 << j)) {
                     all_flag_combos[i] |= set_flags[j];
    
  13. in src/test/transaction_tests.cpp:180 in 885bf41580 outdated
     184 | -            flags_combos.insert(flags_excluding_one);
     185 | +    std::vector<unsigned int> set_flags;
     186 | +    const size_t n_flags = sizeof(unsigned int)*8;
     187 | +    set_flags.reserve(n_flags);
     188 | +    for (size_t i = 0; i < n_flags; ++i) {
     189 | +        const unsigned int flag = flags & (1 << i);
    


    jonatack commented at 2:58 PM on September 12, 2021:

    nit, can use braced initialization on these for type safety

    -    const size_t n_flags = sizeof(unsigned int)*8;
    +    const size_t n_flags{sizeof(unsigned int) * 8};
         set_flags.reserve(n_flags);
         for (size_t i = 0; i < n_flags; ++i) {
    -        const unsigned int flag = flags & (1 << i);
    +        const unsigned int flag{flags & (1 << i)};
    
  14. in src/test/transaction_tests.cpp:188 in 885bf41580 outdated
     192 | +        }
     193 | +    }
     194 | +    const size_t combos = 1 << set_flags.size();
     195 | +    // skip last all set flags
     196 | +    std::vector<unsigned int> all_flag_combos(combos-1);
     197 | +    for (size_t i = 0; i < combos-1; ++i){
    


    jonatack commented at 3:02 PM on September 12, 2021:

    maybe hoist subtracting one to the declaration of combos

    -    const size_t combos = 1 << set_flags.size();
    +    const size_t combos = (1 << set_flags.size()) - 1;
         // skip last all set flags
    -    std::vector<unsigned int> all_flag_combos(combos-1);
    -    for (size_t i = 0; i < combos-1; ++i){
    +    std::vector<unsigned int> all_flag_combos(combos);
    +    for (size_t i = 0; i < combos; ++i) {
    
  15. jonatack commented at 3:32 PM on September 12, 2021: member

    Concept ACK. Here are the results for me before/after. It might be interesting to see the result with https://cp-algorithms.com/algebra/all-submasks.html if you're motivated to try it.

    EDIT: added the results for both tests (tx_valid and tx_invalid).

    <details><summary><code>ExcludeIndividualFlags</code></summary><p>

    test/transaction_tests.cpp(187): Entering test suite "transaction_tests"
    test/transaction_tests.cpp(189): Entering test case "tx_valid"
    6
    10
    12
    6
    10
    12
    14
    22
    26
    28
    14
    22
    26
    28
    14
    22
    26
    28
    2
    8
    0
    65536
    6
    10
    12
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    2
    16384
    0
    0
    0
    8
    65536
    8
    65536
    8
    65536
    8
    65536
    8
    65536
    0
    40
    65544
    65568
    65576
    40
    65544
    65568
    65576
    32
    65536
    65568
    40
    65544
    65568
    65576
    8
    65536
    8
    65536
    0
    0
    0
    0
    64
    0
    32
    6
    10
    12
    0
    0
    0
    64
    0
    32
    0
    0
    0
    64
    0
    0
    64
    0
    64
    8
    65536
    0
    8
    65536
    0
    test/transaction_tests.cpp(189): Leaving test case "tx_valid"; testing time: 918607us
    test/transaction_tests.cpp(280): Entering test case "tx_invalid"
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    1
    512
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    1
    1024
    0
    0
    2049
    4096
    4097
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    2049
    4096
    4097
    0
    0
    1
    0
    0
    1
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    1
    65536
    0
    test/transaction_tests.cpp(280): Leaving test case "tx_invalid"; testing time: 270583us
    

    </p></details>

    <details><summary><code>GenerateAllSubMasks</code></summary><p>

    test/transaction_tests.cpp(204): Entering test suite "transaction_tests"
    test/transaction_tests.cpp(206): Entering test case "tx_valid"
    0
    2
    4
    6
    8
    10
    12
    0
    2
    4
    6
    8
    10
    12
    0
    2
    4
    6
    8
    10
    12
    14
    16
    18
    20
    22
    24
    26
    28
    0
    2
    4
    6
    8
    10
    12
    14
    16
    18
    20
    22
    24
    26
    28
    0
    2
    4
    6
    8
    10
    12
    14
    16
    18
    20
    22
    24
    26
    28
    0
    2
    8
    0
    65536
    0
    2
    4
    6
    8
    10
    12
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    2
    16384
    0
    0
    0
    0
    8
    65536
    0
    8
    65536
    0
    8
    65536
    0
    8
    65536
    0
    8
    65536
    0
    0
    8
    32
    40
    65536
    65544
    65568
    65576
    0
    8
    32
    40
    65536
    65544
    65568
    65576
    0
    32
    65536
    65568
    0
    8
    32
    40
    65536
    65544
    65568
    65576
    0
    8
    65536
    0
    8
    65536
    0
    0
    0
    0
    64
    0
    32
    0
    2
    4
    6
    8
    10
    12
    0
    0
    0
    64
    0
    32
    0
    0
    0
    64
    0
    0
    64
    0
    64
    0
    8
    65536
    0
    0
    8
    65536
    0
    test/transaction_tests.cpp(206): Leaving test case "tx_valid"; testing time: 1159591us
    test/transaction_tests.cpp(297): Entering test case "tx_invalid"
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    1
    512
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    1
    1024
    0
    0
    0
    1
    2049
    4096
    4097
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    2049
    4096
    4097
    0
    0
    1
    0
    0
    1
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    1
    65536
    0
    test/transaction_tests.cpp(297): Leaving test case "tx_invalid"; testing time: 539616us
    

    </p></details>

  16. DrahtBot commented at 3:40 PM on September 12, 2021: member

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #22954 ([TESTS] Allow tx_invalid.json tests to include flag rules for if_unset: [A,B,C] then_unset: [D] by JeremyRubin)
    • #22876 ([TESTS] Update Transaction Tests to permit setting a flag as always on and disabling the exhaustive failure test by JeremyRubin)
    • #21702 (Implement BIP-119 Validation (CheckTemplateVerify) by JeremyRubin)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  17. Squash: Use FancyPants Bit Twiddling Algorithm to Generate Submasks d185f36ede
  18. JeremyRubin commented at 7:06 PM on September 12, 2021: contributor

    I added the textbook algorithm for this: feel free to test that the output is the same then I can squash.

  19. jonatack commented at 12:08 PM on September 13, 2021: member

    @JeremyRubin Nice. The textbook algorithm produces the same values in the tx_invalid test but fewer values in the tx_valid test. Among the values that it does not return compared to your algo, only 2 or 3 of them are unique. WDYT?

    <details><summary>FancyPants version</summary><p>

    test/transaction_tests.cpp(194): Entering test case "tx_valid"
    0
    2
    4
    6
    8
    10
    12
    0
    2
    4
    6
    8
    10
    12
    0
    2
    4
    6
    8
    10
    12
    14
    16
    18
    20
    22
    24
    26
    28
    0
    2
    4
    6
    8
    10
    12
    14
    16
    18
    20
    22
    24
    26
    28
    0
    2
    4
    6
    8
    10
    12
    14
    16
    18
    20
    22
    24
    26
    28
    0
    2
    8
    0
    0
    2
    4
    6
    8
    10
    12
    0
    0
    0
    0
    0
    0
    0
    2
    16384
    0
    0
    0
    0
    8
    65536
    0
    8
    65536
    0
    8
    65536
    0
    8
    65536
    0
    8
    65536
    0
    0
    8
    32
    40
    65536
    65544
    65568
    0
    8
    32
    40
    65536
    65544
    65568
    0
    32
    65536
    0
    8
    32
    40
    65536
    65544
    65568
    0
    8
    65536
    0
    8
    65536
    0
    0
    0
    2
    4
    6
    8
    10
    12
    0
    0
    0
    0
    0
    0
    0
    0
    8
    65536
    0
    0
    8
    65536
    0
    test/transaction_tests.cpp(194): Leaving test case "tx_valid"; testing time: 597522us
    test/transaction_tests.cpp(284): Entering test case "tx_invalid"
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    1
    512
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    1
    1024
    0
    0
    0
    1
    2049
    4096
    4097
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    0
    1
    2049
    4096
    4097
    0
    0
    1
    0
    0
    1
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    1
    65536
    0
    test/transaction_tests.cpp(284): Leaving test case "tx_invalid"; testing time: 170187us
    

    </p></details>

  20. in src/test/transaction_tests.cpp:278 in d185f36ede
     275 | -                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~flags_excluding_one, txdata, strTest, /* expect_valid */ false)) {
     276 | +            for (auto submask_flags : GenerateAllSubMasks(verify_flags)) {
     277 | +                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~submask_flags, txdata, strTest, /* expect_valid */ false)) {
     278 |                      BOOST_ERROR("Too many flags unset: " << strTest);
     279 |                  }
     280 |              }
    


    esneider commented at 4:48 AM on September 14, 2021:

    If I understand the intent of this code correctly, the idea is that given a set of chosen flags, we want to check that every possible superset of flags doesn't succeed. If that's so, I believe there's a problem with the current FlipAll( GenerateAllSubMasks( flags ) ) logic for computing all possible supersets.

    Eg, assuming 3-bit words, for flags = 100, we'll generate all flags sub masks {000}, and then flip them {111}. However, what we really want is to generate all the ~flags = 011 sub masks {010, 001, 000}, and then flip them {101, 110, 111}.

    In short, there's a missing bit flip before computing all the sub masks. Ie. it should be

    FlipAll( GenerateAllSubMasks( ~flags ) )
    

    With that said, there's still the issue that flipping flag bits doesn't interact well with the TrimFlags operation done inside GenerateAllSubMasks.

    Eg. if the only flag bits wereVERIFY_P2SH and VERIFY_WITNESS (in that order), for flags = 00 we'd generate all the ~flags = 11 sub masks {10, 01, 00}, trim them {10, 00, 00} (01 is mapped to 00), remove duplicates {10, 00}, and finally flip them {01, 11}. However, we ended up with an invalid combination (01) when the correct result would have been {10, 11}.

    More generally, after fixing the initial missing flip, the code would look something like

    FlipAll( Unique( TrimFlags( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) )
    

    when what we really want is something like:

    for flag_superset in FlipAll( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) ):
        if flag_superset == TrimFlags( flag_superset ):
            # use flag_superset
    

    JeremyRubin commented at 6:13 PM on September 14, 2021:

    yeah you're totally right; i've been playing with it a bit as well and noticed that it seems to not be quite right (thanks @jonatack for pointing it out).

    The logic is definitely off, and I think it may be intractible in this case since GenerateAllSubMasksWithoutTrimming(~flags) ends up being pretty big... with 20 it might be ok, but eventually there would be too many flags.


    esneider commented at 5:08 AM on September 15, 2021:

    Makes sense. Maybe bound the amount of flipped bits? That way you can test some flag interactions, but bound the exponential growth. The bounding capability would also help to put a safeguard against exponential explosion in the tx_invalid test if there are too many set bits in the initial mask.

  21. JeremyRubin commented at 6:45 PM on October 22, 2021: contributor

    Closing this for now -- needs more research/discussion on how to accomplish actually using the powerset properly, see #22948 (review)

  22. JeremyRubin closed this on Oct 22, 2021

  23. JeremyRubin added the label Up for grabs on Oct 22, 2021
  24. DrahtBot locked this on Oct 30, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-15 21:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me