O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation

sipa commented at 2:53 am on September 18, 2019: member

While investigating what mechanisms are possible to maximize the per-opcode verification cost of scripts, I noticed that the logic for determining whether a particular opcode is to be executed is O(n) in the nesting depth. This issue was also pointed out by Sergio Demian Lerner in https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/, and this PR implements a variant of the O(1) algorithm suggested there.

This is not a problem currently, because even with a nesting depth of 100 (the maximum possible right now due to the 201 ops limit), the slowdown caused by this on my machine is around 70 ns per opcode (or 0.25 s per block) at worst, far lower than what is possible with other opcodes.

This PR mostly serves as a proof of concept that it’s possible to avoid it, which may be relevant in discussions around increasing the opcode limits in future script versions. Without it, the execution time of scripts can grow quadratically with the nesting depth, which very quickly becomes unreasonable.

This improves upon #14245 by completely removing the vfExec vector.

fanquake added the label Needs Conceptual Review on Sep 18, 2019

sipa force-pushed on Sep 18, 2019

fanquake added the label Resource usage on Sep 18, 2019

DrahtBot commented at 5:06 am on September 18, 2019: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#18011 (Replace current benchmarking framework with nanobench by martinus)
#13062 (Make script interpreter independent from storage type CScript by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

ajtowns commented at 6:22 am on September 18, 2019: member

It might be easier to review if the operations on vfExec are pulled out into a separate class first, so that the optimisation just means updating the class from:

0class FalseCounter {
1private:
2    std::vector<bool> flags;
3public:
4    bool all_true() { return !count(flags.begin(), flags.end(), false); }
5    void push_back(bool f) { flags.push_back(f); }
6    void pop_back() { flags.pop_back(); }
7    bool empty() { return flags.empty(); }
8    void toggle_top() { flags.back() = !flags.back(); }
9};

to

 0class FalseCounter {
 1private:
 2    int depth = 0;
 3    int first_false = 0;
 4public:
 5    bool all_true() { return first_false == 0; }
 6    void push_back(bool f) {
 7        ++depth;
 8        if (first_false == 0 && !f) first_false = depth;
 9    }
10    void pop_back() {
11        if (first_false == depth) first_false = 0;
12        --depth;
13    }
14    bool empty() { return depth == 0; }
15    void toggle_top() {
16        if (first_false == 0) {
17            first_false = depth;
18        } else if (first_false == depth) {
19            first_false = 0;
20        } else {
21            // no action needed as change is unobservable
22        }
23    }
24};

without changing the usage of the class. I think it’d be plausible to do a coq proof that those implementations are logically equivalent / implement the same API (at least given !empty() preconditions for pop_back() and toggle_top()).

cf https://github.com/ajtowns/bitcoin/commits/201909-vfexec-optimise

sipa commented at 6:28 am on September 18, 2019: member

@ajtowns If there’s interest in including this patch at some point we should pick up that approach.

practicalswift commented at 9:22 am on September 18, 2019: contributor

Somewhat related issue: https://github.com/sipa/miniscript/issues/7 (“Policies with too high nesting depth are not rejected: It is possible to create small inputs that cause extreme memory usage (in practice: OOM kill or std::bad_alloc)”)

Context: Nested tresh tend to be very OP_IF/OP_ELSE/OP_ENDIF intensive (or_i) when compiled to script :)

sipa force-pushed on Nov 4, 2019

sipa renamed this:
~~[POC] O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation~~
O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation
on Nov 4, 2019

sipa force-pushed on Nov 4, 2019

sipa commented at 6:54 pm on November 4, 2019: member

I’ve modified the code to follow @ajtowns’s approach more closely, though with fewer commits, and extra comments.

sipa force-pushed on Nov 4, 2019

in src/script/interpreter.cpp:300 in d4721cd00b outdated

296  */
297 class ConditionStack {
298 private:
299-    std::vector<bool> m_flags;
300+    //! The size of the implied stack.
301+    uin32_t m_stack_size = 0;

ajtowns commented at 8:50 am on November 5, 2019:

uint32_t with a t :)

sipa commented at 8:47 pm on November 6, 2019:

Fixed.

in src/script/interpreter.cpp:302 in d4721cd00b outdated

298 private:
299-    std::vector<bool> m_flags;
300+    //! The size of the implied stack.
301+    uin32_t m_stack_size = 0;
302+    //! The position of the first false value on the implied stack, or -1 if all true.
303+    int32_t m_first_false_pos = -1;

ajtowns commented at 8:54 am on November 5, 2019:

Using int32_t gives “comparison of integers of different signs” warnings. Having uint32_t m_first_false_pos_plus_one = 0 instead avoids that, and seems to make the code simpler (avoids a bunch of decrements/increments). I think you could call it m_size_of_stack_including_first_false with 0 as a special value, and have it make sense, maybe?

sipa commented at 8:47 pm on November 6, 2019:

I’m instead adding a NO_FALSE constant (equal to 0xFFFFFFFF), and reordering operations a bit to avoid some - 1s.

sipa force-pushed on Nov 6, 2019

sipa force-pushed on Nov 7, 2019

Benchmark script verification with 100 nested IFs 89fb241c54

[refactor] interpreter: define interface for vfExec

Includes comments added by Pieter Wuille.

d0e8f4d5d8

Implement O(1) OP_IF/NOTIF/ELSE/ENDIF logic

This optimization was first suggested by Sergio Demian Lerner in
https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/.
The implementation follows the suggested approach there, but with a slightly
simpler representation.

e6e622e5a0

sipa force-pushed on Nov 7, 2019

nkohen referenced this in commit e10228b74d on Nov 7, 2019

Christewart referenced this in commit d86acfffed on Nov 8, 2019

in src/script/interpreter.cpp:310 in e6e622e5a0

305+    uint32_t m_first_false_pos = NO_FALSE;
306+
307+public:
308+    bool empty() { return m_stack_size == 0; }
309+    bool all_true() { return m_first_false_pos == NO_FALSE; }
310+    void push_back(bool f)

NicolasDorier commented at 11:36 am on January 26, 2020:

I think an assert(m_stack_size != NO_FALSE - 1) is needed. Also, does not seems caller are checking this. Unsure if this can realistically taken advantage of, but I think it does not hurt the scriptevaluator to check for it.

ajtowns commented at 2:18 am on January 28, 2020:

You’d need a script with 4 billion OP_IF’s to hit that (so a tx with 4GB of data or over 1G weight), so taking advantage of it shouldn’t be possible… Adding the assert seems plausible though.

roconnor-blockstream commented at 2:11 pm on March 18, 2020:

In particular Script length is limited to MAX_SIZE (2^25) bytes.

jnewbery commented at 3:31 pm on February 11, 2020: member

Code review ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54

I’ve run the bench and this change cuts time by ~75%

before:

0→ ./bench_bitcoin -filter=VerifyNestedIfScript -evals=50
1WARNING: This is a debug build - may result in slower benchmarks.
2# Benchmark, evals, iterations, total, min, max, median
3VerifyNestedIfScript, 50, 100, 14.0987, 0.00267213, 0.00323801, 0.00277969

after:

0→ ./bench_bitcoin -filter=VerifyNestedIfScript -evals=50
1WARNING: This is a debug build - may result in slower benchmarks.
2# Benchmark, evals, iterations, total, min, max, median
3VerifyNestedIfScript, 50, 100, 3.15142, 0.000624684, 0.0006379, 0.000630272

I think the asserts in the final implementations of pop_back() and toggle_top() can actually be added in the intermediate commit to better illustrate that this doesn’t change behaviour. I’ve done that, as well as adding the assert suggested in #16902 (review) and some more commenting here: https://github.com/jnewbery/bitcoin/tree/pr16902.1. Feel free to take any of that if you think it’s an improvement.

MarcoFalke commented at 3:11 pm on February 12, 2020: member

ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54 🐴

Checked that d0e8f4d5d8ddaccb37f98b7989fb944081e41ab8 is moving the code around and that the symbol ‘count’ is equal to ‘std::count’ in that call stack
Checked that e6e622e5a0e22c2ac1b50b96af818e412d67ac54 is not a behavior change by looking at the code and running the fuzzer from #18127 for a couple of minutes

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54 🐴
 4
 5* Checked that d0e8f4d5d8ddaccb37f98b7989fb944081e41ab8 is moving the code around and that the symbol 'count' is equal to 'std::count' in that call stack
 6* Checked that e6e622e5a0e22c2ac1b50b96af818e412d67ac54 is not a behavior change by looking at the code and running the fuzzer from [#18127](/bitcoin-bitcoin/18127/) for a couple of minutes
 7
 8
 9
10
11-----BEGIN PGP SIGNATURE-----
12
13iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
14pUidcQwAkkads4/9CEnAK1tMJcqU+hjCau8yTqstSzi9plrVJ3gUijrrQ5IaKljg
159i88GlQS42vo83ySz+MRvXBV0oeQGOlwS6mcGjaiNyYKlDPRlbZ5XPlKHa6DbqVp
16pJr0rDuzUTB5o7F2w/01CNEtptYFUes+CWm0TiSWAiKJJ5j5W54F8RXQDSs/SZij
17ATXAg3dj7OKvV2D7mbh9zK0ZODZnTFjBn5ej5onLXKsf21aypZRfrA30bdFnyemX
181lsYXwvaZYgieVIqGEm2pbBRdEiIKNwivYblMvBEVnf8cPehG0hM2PJNcREGN/ZT
19jObzNT+N/529p/no4RImfiW7+QZ8mmD6pFgV18T0t+Aw7z5KnZYnrS5iXgYPmPeS
20ca1/zKhFc73gCOlfitookclqT2f2F1m7EUVW0lEoTSkhSZpuQfiLa/C/8QFYdr/s
21sUodGjEcj1SwjwTrTnxadDjsVHIPfqV5kGSWzY4m5EtvQ1JERviJkoG+ME0awkPR
22NhdKgKb5
23=ZGkg
24-----END PGP SIGNATURE-----

Timestamp of file with hash 5bbbbde183fde736dde66bf963f6b1f690bb0927454cf1500b52aeadf50396c3 -

in src/script/interpreter.cpp:334 in e6e622e5a0

329+    {
330+        assert(m_stack_size > 0);
331+        if (m_first_false_pos == NO_FALSE) {
332+            // The current stack is all true values; the first false will be the top.
333+            m_first_false_pos = m_stack_size - 1;
334+        } else if (m_first_false_pos == m_stack_size - 1) {

jonatack commented at 4:52 pm on February 12, 2020:

pico-nit: perhaps execute m_stack_size - 1 between the assert line 330 and the start of the conditional line 331 and use the result in lines 333 and 334 in the conditional

jonatack commented at 5:01 pm on February 12, 2020: member

ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54 code review, build, benches, fuzzing

bench before

0(HEAD detached at 89fb241c54)$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
1WARNING: This is a debug build - may result in slower benchmarks.
2# Benchmark, evals, iterations, total, min, max, median
3VerifyNestedIfScript, 50, 100, 24.0408, 0.00434014, 0.00559945, 0.00475711
4VerifyNestedIfScript, 50, 100, 23.5444, 0.00428717, 0.00569794, 0.00467757
5VerifyNestedIfScript, 50, 100, 24.6077, 0.00431298, 0.00633760, 0.00485635

bench after

0(pr/16902)$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
1WARNING: This is a debug build - may result in slower benchmarks.
2# Benchmark, evals, iterations, total, min, max, median
3VerifyNestedIfScript, 50, 100, 5.49356, 0.000954118, 0.00140947, 0.00106701
4VerifyNestedIfScript, 50, 100, 6.64376, 0.000977855, 0.00202046, 0.00132529
5VerifyNestedIfScript, 50, 100, 5.88235, 0.000968067, 0.00177160, 0.00110281

fuzzing results (#18127)

 0(pr/18127)$ src/test/fuzz/script_condition_stack 
 1INFO: Seed: 3449044402
 2INFO: Loaded 1 modules   (510932 inline 8-bit counters): 510932 [0x5642d9dc1678, 0x5642d9e3e24c), 
 3INFO: Loaded 1 PC tables (510932 PCs): 510932 [0x5642d9e3e250,0x5642da609f90), 
 4INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
 5INFO: A corpus is not provided, starting from an empty corpus
 6[#2](/bitcoin-bitcoin/2/)	INITED cov: 177 ft: 178 corp: 1/1b exec/s: 0 rss: 117Mb
 7[#3](/bitcoin-bitcoin/3/)	NEW    cov: 179 ft: 242 corp: 2/59b exec/s: 0 rss: 117Mb L: 58/58 MS: 1 InsertRepeatedBytes-
 8[#4](/bitcoin-bitcoin/4/)	NEW    cov: 179 ft: 290 corp: 3/90b exec/s: 0 rss: 117Mb L: 31/58 MS: 1 EraseBytes-
 9.../...
10[#5136](/bitcoin-bitcoin/5136/)	REDUCE cov: 438 ft: 2238 corp: 116/218Kb exec/s: 7 rss: 152Mb L: 269/4096 MS: 1 EraseBytes-

I’ll leave the fuzzer running for a while longer.

jonatack commented at 5:31 pm on February 12, 2020: member

I like the changes in @jnewbery’s branch mentioned in #16902 (comment)) esp. this code comment.

fjahr commented at 0:19 am on February 13, 2020: member

ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54

Reviewed code, compiled, ran tests and benchmarks

I received better results from the benchmarks after the change, although improvements are not nearly as significant as the others reported. Not sure why that is, I ran them multiple times with different configs, without enable_debug for example, and results were consistent.

Before (HEAD detached at 89fb241c5):

0$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
1WARNING: This is a debug build - may result in slower benchmarks.
2# Benchmark, evals, iterations, total, min, max, median
3VerifyNestedIfScript, 50, 100, 8.34206, 0.0016061, 0.00195646, 0.00163048

After:

0$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
1WARNING: This is a debug build - may result in slower benchmarks.
2# Benchmark, evals, iterations, total, min, max, median
3VerifyNestedIfScript, 50, 100, 7.66019, 0.00146802, 0.00266328, 0.00149105

MarcoFalke commented at 1:19 pm on February 13, 2020: member

Can anyone explain me why everyone is benchmarking with optimizations disabled? I hope all nodes on mainnet are running -O2, at least for applications where performance matters.

fjahr commented at 1:22 pm on February 13, 2020: member

Can anyone explain me why everyone is benchmarking with optimizations disabled? I hope all nodes on mainnet are running -O2, at least for applications where performance matters.

I did both (with optimizations first). Then I tested it without because my results were different than the others here and ended up posting those because of the differences to make them more comparable. Here are some results with optimizations:

Before:

0$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
1# Benchmark, evals, iterations, total, min, max, median
2VerifyNestedIfScript, 50, 100, 0.789322, 0.000153463, 0.000178044, 0.000157402

After:

0$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
1# Benchmark, evals, iterations, total, min, max, median
2VerifyNestedIfScript, 50, 100, 0.77861, 0.000145174, 0.000211004, 0.000154109

MarcoFalke added the label Needs gitian build on Feb 13, 2020

MarcoFalke removed the label Needs gitian build on Feb 13, 2020

MarcoFalke added the label Needs gitian build on Feb 13, 2020

jonatack commented at 5:57 pm on February 13, 2020: member

Fair enough. Compiled with --enable-bench CXXFLAGS="-O2"

before

0((HEAD detached at 89fb241c54))$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
1# Benchmark, evals, iterations, total, min, max, median
2VerifyNestedIfScript, 50, 100, 1.154,   0.000201797, 0.000381135, 0.000218384
3VerifyNestedIfScript, 50, 100, 1.11177, 0.000202117, 0.000342534, 0.000218816
4VerifyNestedIfScript, 50, 100, 1.1449,  0.000201718, 0.000425754, 0.00021702
5VerifyNestedIfScript, 50, 100, 1.21042, 0.000206192, 0.000390067, 0.000229842

after

0((HEAD detached at origin/pr/16902))$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
1# Benchmark, evals, iterations, total, min, max, median
2VerifyNestedIfScript, 50, 100, 0.554961, 8.99878e-05, 0.000188464, 9.65497e-05
3VerifyNestedIfScript, 50, 100, 0.475112, 8.24424e-05, 0.000149663, 9.15264e-05
4VerifyNestedIfScript, 50, 100, 0.491641, 8.80607e-05, 0.000171669, 9.19584e-05
5VerifyNestedIfScript, 50, 100, 0.496316, 8.20691e-05, 0.00017851,  9.32482e-05

NicolasDorier commented at 7:52 am on February 14, 2020: contributor

ACK,

I was wondering about the assert I suggested on #16902 (review)

I don’t really know if it is good idea or not. While theorically possible, even if it was practically possible, I guess it would be more desirable to get a IF bypassed rather than a crash of the node. Somebody make 4 billions of OP_IF probably can’t steal anybody, but might want to crash nodes.

DrahtBot removed the label Needs gitian build on Feb 17, 2020

ajtowns commented at 4:04 am on March 2, 2020: member

ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54

I don’t understand fjahr’s results and investigating further didn’t reveal anything; seems like any quadratic behaviour is getting completely lost in the noise there somehow. Might be worth adding additional benchmarks once tapscript removes the opcode limit to report the performance of very large scripts; say 10k nested IFs and 20k OP_NOPs vs 40k OP_NOPs etc?

MarcoFalke deleted a comment on Mar 2, 2020

laanwj commented at 8:00 pm on March 14, 2020: member

concept and code review ACK e6e622e5a0e22c2ac1b50b96af818e412d67ac54

laanwj merged this on Mar 14, 2020

laanwj closed this on Mar 14, 2020

fanquake removed the label Needs Conceptual Review on Mar 14, 2020

sidhujag referenced this in commit 36b125ba68 on Mar 15, 2020

sidhujag referenced this in commit afa5aedab6 on Nov 10, 2020

Fabcien referenced this in commit 31616ea119 on Jan 8, 2021

Fabcien referenced this in commit 9f3f4dd474 on Jan 8, 2021

Fabcien referenced this in commit 4ace18ab64 on Jan 8, 2021

ftrader referenced this in commit 37261c6d4f on Apr 14, 2021

ftrader referenced this in commit bea1a7fab6 on Apr 14, 2021

ftrader referenced this in commit 30313fdb51 on Apr 14, 2021

Christewart referenced this in commit c5be8c1be5 on May 1, 2021

DrahtBot locked this on Feb 15, 2022

O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation #16902

Conflicts