O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation #16902

pull sipa wants to merge 3 commits into bitcoin:master from sipa:201909_o1nestedifs changing 2 files +90 −3
  1. sipa commented at 2:53 am on September 18, 2019: member

    While investigating what mechanisms are possible to maximize the per-opcode verification cost of scripts, I noticed that the logic for determining whether a particular opcode is to be executed is O(n) in the nesting depth. This issue was also pointed out by Sergio Demian Lerner in https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/, and this PR implements a variant of the O(1) algorithm suggested there.

    This is not a problem currently, because even with a nesting depth of 100 (the maximum possible right now due to the 201 ops limit), the slowdown caused by this on my machine is around 70 ns per opcode (or 0.25 s per block) at worst, far lower than what is possible with other opcodes.

    This PR mostly serves as a proof of concept that it’s possible to avoid it, which may be relevant in discussions around increasing the opcode limits in future script versions. Without it, the execution time of scripts can grow quadratically with the nesting depth, which very quickly becomes unreasonable.

    This improves upon #14245 by completely removing the vfExec vector.

  2. fanquake added the label Needs Conceptual Review on Sep 18, 2019
  3. sipa force-pushed on Sep 18, 2019
  4. fanquake added the label Resource usage on Sep 18, 2019
  5. DrahtBot commented at 5:06 am on September 18, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #18011 (Replace current benchmarking framework with nanobench by martinus)
    • #13062 (Make script interpreter independent from storage type CScript by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  6. ajtowns commented at 6:22 am on September 18, 2019: member

    It might be easier to review if the operations on vfExec are pulled out into a separate class first, so that the optimisation just means updating the class from:

    0class FalseCounter {
    1private:
    2    std::vector<bool> flags;
    3public:
    4    bool all_true() { return !count(flags.begin(), flags.end(), false); }
    5    void push_back(bool f) { flags.push_back(f); }
    6    void pop_back() { flags.pop_back(); }
    7    bool empty() { return flags.empty(); }
    8    void toggle_top() { flags.back() = !flags.back(); }
    9};
    

    to

     0class FalseCounter {
     1private:
     2    int depth = 0;
     3    int first_false = 0;
     4public:
     5    bool all_true() { return first_false == 0; }
     6    void push_back(bool f) {
     7        ++depth;
     8        if (first_false == 0 && !f) first_false = depth;
     9    }
    10    void pop_back() {
    11        if (first_false == depth) first_false = 0;
    12        --depth;
    13    }
    14    bool empty() { return depth == 0; }
    15    void toggle_top() {
    16        if (first_false == 0) {
    17            first_false = depth;
    18        } else if (first_false == depth) {
    19            first_false = 0;
    20        } else {
    21            // no action needed as change is unobservable
    22        }
    23    }
    24};
    

    without changing the usage of the class. I think it’d be plausible to do a coq proof that those implementations are logically equivalent / implement the same API (at least given !empty() preconditions for pop_back() and toggle_top()).

    cf https://github.com/ajtowns/bitcoin/commits/201909-vfexec-optimise

  7. sipa commented at 6:28 am on September 18, 2019: member
    @ajtowns If there’s interest in including this patch at some point we should pick up that approach.
  8. practicalswift commented at 9:22 am on September 18, 2019: contributor

    Somewhat related issue: https://github.com/sipa/miniscript/issues/7 (“Policies with too high nesting depth are not rejected: It is possible to create small inputs that cause extreme memory usage (in practice: OOM kill or std::bad_alloc)”)

    Context: Nested tresh tend to be very OP_IF/OP_ELSE/OP_ENDIF intensive (or_i) when compiled to script :)

  9. sipa force-pushed on Nov 4, 2019
  10. sipa renamed this:
    [POC] O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation
    O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation
    on Nov 4, 2019
  11. sipa force-pushed on Nov 4, 2019
  12. sipa commented at 6:54 pm on November 4, 2019: member
    I’ve modified the code to follow @ajtowns’s approach more closely, though with fewer commits, and extra comments.
  13. sipa force-pushed on Nov 4, 2019
  14. sipa force-pushed on Nov 4, 2019
  15. in src/script/interpreter.cpp:300 in d4721cd00b outdated
    296  */
    297 class ConditionStack {
    298 private:
    299-    std::vector<bool> m_flags;
    300+    //! The size of the implied stack.
    301+    uin32_t m_stack_size = 0;
    


    ajtowns commented at 8:50 am on November 5, 2019:
    uint32_t with a t :)

    sipa commented at 8:47 pm on November 6, 2019:
    Fixed.
  16. in src/script/interpreter.cpp:302 in d4721cd00b outdated
    298 private:
    299-    std::vector<bool> m_flags;
    300+    //! The size of the implied stack.
    301+    uin32_t m_stack_size = 0;
    302+    //! The position of the first false value on the implied stack, or -1 if all true.
    303+    int32_t m_first_false_pos = -1;
    


    ajtowns commented at 8:54 am on November 5, 2019:
    Using int32_t gives “comparison of integers of different signs” warnings. Having uint32_t m_first_false_pos_plus_one = 0 instead avoids that, and seems to make the code simpler (avoids a bunch of decrements/increments). I think you could call it m_size_of_stack_including_first_false with 0 as a special value, and have it make sense, maybe?

    sipa commented at 8:47 pm on November 6, 2019:
    I’m instead adding a NO_FALSE constant (equal to 0xFFFFFFFF), and reordering operations a bit to avoid some - 1s.
  17. sipa force-pushed on Nov 6, 2019
  18. sipa force-pushed on Nov 6, 2019
  19. sipa force-pushed on Nov 7, 2019
  20. Benchmark script verification with 100 nested IFs 89fb241c54
  21. [refactor] interpreter: define interface for vfExec
    Includes comments added by Pieter Wuille.
    d0e8f4d5d8
  22. Implement O(1) OP_IF/NOTIF/ELSE/ENDIF logic
    This optimization was first suggested by Sergio Demian Lerner in
    https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/.
    The implementation follows the suggested approach there, but with a slightly
    simpler representation.
    e6e622e5a0
  23. sipa force-pushed on Nov 7, 2019
  24. nkohen referenced this in commit e10228b74d on Nov 7, 2019
  25. Christewart referenced this in commit d86acfffed on Nov 8, 2019
  26. in src/script/interpreter.cpp:310 in e6e622e5a0
    305+    uint32_t m_first_false_pos = NO_FALSE;
    306+
    307+public:
    308+    bool empty() { return m_stack_size == 0; }
    309+    bool all_true() { return m_first_false_pos == NO_FALSE; }
    310+    void push_back(bool f)
    


    NicolasDorier commented at 11:36 am on January 26, 2020:
    I think an assert(m_stack_size != NO_FALSE - 1) is needed. Also, does not seems caller are checking this. Unsure if this can realistically taken advantage of, but I think it does not hurt the scriptevaluator to check for it.

    ajtowns commented at 2:18 am on January 28, 2020:
    You’d need a script with 4 billion OP_IF’s to hit that (so a tx with 4GB of data or over 1G weight), so taking advantage of it shouldn’t be possible… Adding the assert seems plausible though.

    roconnor-blockstream commented at 2:11 pm on March 18, 2020:
    In particular Script length is limited to MAX_SIZE (2^25) bytes.
  27. jnewbery commented at 3:31 pm on February 11, 2020: member

    Code review ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54

    I’ve run the bench and this change cuts time by ~75%

    before:

    0→ ./bench_bitcoin -filter=VerifyNestedIfScript -evals=50
    1WARNING: This is a debug build - may result in slower benchmarks.
    2# Benchmark, evals, iterations, total, min, max, median
    3VerifyNestedIfScript, 50, 100, 14.0987, 0.00267213, 0.00323801, 0.00277969
    

    after:

    0→ ./bench_bitcoin -filter=VerifyNestedIfScript -evals=50
    1WARNING: This is a debug build - may result in slower benchmarks.
    2# Benchmark, evals, iterations, total, min, max, median
    3VerifyNestedIfScript, 50, 100, 3.15142, 0.000624684, 0.0006379, 0.000630272
    

    I think the asserts in the final implementations of pop_back() and toggle_top() can actually be added in the intermediate commit to better illustrate that this doesn’t change behaviour. I’ve done that, as well as adding the assert suggested in #16902 (review) and some more commenting here: https://github.com/jnewbery/bitcoin/tree/pr16902.1. Feel free to take any of that if you think it’s an improvement.

  28. MarcoFalke commented at 3:11 pm on February 12, 2020: member

    ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54 🐴

    • Checked that d0e8f4d5d8ddaccb37f98b7989fb944081e41ab8 is moving the code around and that the symbol ‘count’ is equal to ‘std::count’ in that call stack
    • Checked that e6e622e5a0e22c2ac1b50b96af818e412d67ac54 is not a behavior change by looking at the code and running the fuzzer from #18127 for a couple of minutes

    Signature:

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA512
     2
     3ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54 🐴
     4
     5* Checked that d0e8f4d5d8ddaccb37f98b7989fb944081e41ab8 is moving the code around and that the symbol 'count' is equal to 'std::count' in that call stack
     6* Checked that e6e622e5a0e22c2ac1b50b96af818e412d67ac54 is not a behavior change by looking at the code and running the fuzzer from [#18127](/bitcoin-bitcoin/18127/) for a couple of minutes
     7
     8
     9
    10
    11-----BEGIN PGP SIGNATURE-----
    12
    13iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
    14pUidcQwAkkads4/9CEnAK1tMJcqU+hjCau8yTqstSzi9plrVJ3gUijrrQ5IaKljg
    159i88GlQS42vo83ySz+MRvXBV0oeQGOlwS6mcGjaiNyYKlDPRlbZ5XPlKHa6DbqVp
    16pJr0rDuzUTB5o7F2w/01CNEtptYFUes+CWm0TiSWAiKJJ5j5W54F8RXQDSs/SZij
    17ATXAg3dj7OKvV2D7mbh9zK0ZODZnTFjBn5ej5onLXKsf21aypZRfrA30bdFnyemX
    181lsYXwvaZYgieVIqGEm2pbBRdEiIKNwivYblMvBEVnf8cPehG0hM2PJNcREGN/ZT
    19jObzNT+N/529p/no4RImfiW7+QZ8mmD6pFgV18T0t+Aw7z5KnZYnrS5iXgYPmPeS
    20ca1/zKhFc73gCOlfitookclqT2f2F1m7EUVW0lEoTSkhSZpuQfiLa/C/8QFYdr/s
    21sUodGjEcj1SwjwTrTnxadDjsVHIPfqV5kGSWzY4m5EtvQ1JERviJkoG+ME0awkPR
    22NhdKgKb5
    23=ZGkg
    24-----END PGP SIGNATURE-----
    

    Timestamp of file with hash 5bbbbde183fde736dde66bf963f6b1f690bb0927454cf1500b52aeadf50396c3 -

  29. in src/script/interpreter.cpp:334 in e6e622e5a0
    329+    {
    330+        assert(m_stack_size > 0);
    331+        if (m_first_false_pos == NO_FALSE) {
    332+            // The current stack is all true values; the first false will be the top.
    333+            m_first_false_pos = m_stack_size - 1;
    334+        } else if (m_first_false_pos == m_stack_size - 1) {
    


    jonatack commented at 4:52 pm on February 12, 2020:
    pico-nit: perhaps execute m_stack_size - 1 between the assert line 330 and the start of the conditional line 331 and use the result in lines 333 and 334 in the conditional
  30. jonatack commented at 5:01 pm on February 12, 2020: member

    ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54 code review, build, benches, fuzzing

    bench before

    0(HEAD detached at 89fb241c54)$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
    1WARNING: This is a debug build - may result in slower benchmarks.
    2# Benchmark, evals, iterations, total, min, max, median
    3VerifyNestedIfScript, 50, 100, 24.0408, 0.00434014, 0.00559945, 0.00475711
    4VerifyNestedIfScript, 50, 100, 23.5444, 0.00428717, 0.00569794, 0.00467757
    5VerifyNestedIfScript, 50, 100, 24.6077, 0.00431298, 0.00633760, 0.00485635
    

    bench after

    0(pr/16902)$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
    1WARNING: This is a debug build - may result in slower benchmarks.
    2# Benchmark, evals, iterations, total, min, max, median
    3VerifyNestedIfScript, 50, 100, 5.49356, 0.000954118, 0.00140947, 0.00106701
    4VerifyNestedIfScript, 50, 100, 6.64376, 0.000977855, 0.00202046, 0.00132529
    5VerifyNestedIfScript, 50, 100, 5.88235, 0.000968067, 0.00177160, 0.00110281
    

    fuzzing results (#18127)

     0(pr/18127)$ src/test/fuzz/script_condition_stack 
     1INFO: Seed: 3449044402
     2INFO: Loaded 1 modules   (510932 inline 8-bit counters): 510932 [0x5642d9dc1678, 0x5642d9e3e24c), 
     3INFO: Loaded 1 PC tables (510932 PCs): 510932 [0x5642d9e3e250,0x5642da609f90), 
     4INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
     5INFO: A corpus is not provided, starting from an empty corpus
     6[#2](/bitcoin-bitcoin/2/)	INITED cov: 177 ft: 178 corp: 1/1b exec/s: 0 rss: 117Mb
     7[#3](/bitcoin-bitcoin/3/)	NEW    cov: 179 ft: 242 corp: 2/59b exec/s: 0 rss: 117Mb L: 58/58 MS: 1 InsertRepeatedBytes-
     8[#4](/bitcoin-bitcoin/4/)	NEW    cov: 179 ft: 290 corp: 3/90b exec/s: 0 rss: 117Mb L: 31/58 MS: 1 EraseBytes-
     9.../...
    10[#5136](/bitcoin-bitcoin/5136/)	REDUCE cov: 438 ft: 2238 corp: 116/218Kb exec/s: 7 rss: 152Mb L: 269/4096 MS: 1 EraseBytes-
    

    I’ll leave the fuzzer running for a while longer.

  31. jonatack commented at 5:31 pm on February 12, 2020: member
    I like the changes in @jnewbery’s branch mentioned in #16902 (comment)) esp. this code comment.
  32. fjahr commented at 0:19 am on February 13, 2020: member

    ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54

    Reviewed code, compiled, ran tests and benchmarks

    I received better results from the benchmarks after the change, although improvements are not nearly as significant as the others reported. Not sure why that is, I ran them multiple times with different configs, without enable_debug for example, and results were consistent.

    Before (HEAD detached at 89fb241c5):

    0$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
    1WARNING: This is a debug build - may result in slower benchmarks.
    2# Benchmark, evals, iterations, total, min, max, median
    3VerifyNestedIfScript, 50, 100, 8.34206, 0.0016061, 0.00195646, 0.00163048
    

    After:

    0$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
    1WARNING: This is a debug build - may result in slower benchmarks.
    2# Benchmark, evals, iterations, total, min, max, median
    3VerifyNestedIfScript, 50, 100, 7.66019, 0.00146802, 0.00266328, 0.00149105
    
  33. MarcoFalke commented at 1:19 pm on February 13, 2020: member
    Can anyone explain me why everyone is benchmarking with optimizations disabled? I hope all nodes on mainnet are running -O2, at least for applications where performance matters.
  34. fjahr commented at 1:22 pm on February 13, 2020: member

    Can anyone explain me why everyone is benchmarking with optimizations disabled? I hope all nodes on mainnet are running -O2, at least for applications where performance matters.

    I did both (with optimizations first). Then I tested it without because my results were different than the others here and ended up posting those because of the differences to make them more comparable. Here are some results with optimizations:

    Before:

    0$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
    1# Benchmark, evals, iterations, total, min, max, median
    2VerifyNestedIfScript, 50, 100, 0.789322, 0.000153463, 0.000178044, 0.000157402
    

    After:

    0$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
    1# Benchmark, evals, iterations, total, min, max, median
    2VerifyNestedIfScript, 50, 100, 0.77861, 0.000145174, 0.000211004, 0.000154109
    
  35. MarcoFalke added the label Needs gitian build on Feb 13, 2020
  36. MarcoFalke removed the label Needs gitian build on Feb 13, 2020
  37. MarcoFalke added the label Needs gitian build on Feb 13, 2020
  38. jonatack commented at 5:57 pm on February 13, 2020: member

    Fair enough. Compiled with --enable-bench CXXFLAGS="-O2"

    before

    0((HEAD detached at 89fb241c54))$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
    1# Benchmark, evals, iterations, total, min, max, median
    2VerifyNestedIfScript, 50, 100, 1.154,   0.000201797, 0.000381135, 0.000218384
    3VerifyNestedIfScript, 50, 100, 1.11177, 0.000202117, 0.000342534, 0.000218816
    4VerifyNestedIfScript, 50, 100, 1.1449,  0.000201718, 0.000425754, 0.00021702
    5VerifyNestedIfScript, 50, 100, 1.21042, 0.000206192, 0.000390067, 0.000229842
    

    after

    0((HEAD detached at origin/pr/16902))$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
    1# Benchmark, evals, iterations, total, min, max, median
    2VerifyNestedIfScript, 50, 100, 0.554961, 8.99878e-05, 0.000188464, 9.65497e-05
    3VerifyNestedIfScript, 50, 100, 0.475112, 8.24424e-05, 0.000149663, 9.15264e-05
    4VerifyNestedIfScript, 50, 100, 0.491641, 8.80607e-05, 0.000171669, 9.19584e-05
    5VerifyNestedIfScript, 50, 100, 0.496316, 8.20691e-05, 0.00017851,  9.32482e-05
    
  39. NicolasDorier commented at 7:52 am on February 14, 2020: contributor

    ACK,

    I was wondering about the assert I suggested on #16902 (review)

    I don’t really know if it is good idea or not. While theorically possible, even if it was practically possible, I guess it would be more desirable to get a IF bypassed rather than a crash of the node. Somebody make 4 billions of OP_IF probably can’t steal anybody, but might want to crash nodes.

  40. DrahtBot removed the label Needs gitian build on Feb 17, 2020
  41. ajtowns commented at 4:04 am on March 2, 2020: member

    ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54

    I don’t understand fjahr’s results and investigating further didn’t reveal anything; seems like any quadratic behaviour is getting completely lost in the noise there somehow. Might be worth adding additional benchmarks once tapscript removes the opcode limit to report the performance of very large scripts; say 10k nested IFs and 20k OP_NOPs vs 40k OP_NOPs etc?

  42. MarcoFalke deleted a comment on Mar 2, 2020
  43. laanwj commented at 8:00 pm on March 14, 2020: member
    concept and code review ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54
  44. laanwj merged this on Mar 14, 2020
  45. laanwj closed this on Mar 14, 2020

  46. fanquake removed the label Needs Conceptual Review on Mar 14, 2020
  47. sidhujag referenced this in commit 36b125ba68 on Mar 15, 2020
  48. sidhujag referenced this in commit afa5aedab6 on Nov 10, 2020
  49. Fabcien referenced this in commit 31616ea119 on Jan 8, 2021
  50. Fabcien referenced this in commit 9f3f4dd474 on Jan 8, 2021
  51. Fabcien referenced this in commit 4ace18ab64 on Jan 8, 2021
  52. ftrader referenced this in commit 37261c6d4f on Apr 14, 2021
  53. ftrader referenced this in commit bea1a7fab6 on Apr 14, 2021
  54. ftrader referenced this in commit 30313fdb51 on Apr 14, 2021
  55. Christewart referenced this in commit c5be8c1be5 on May 1, 2021
  56. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me