miniscript: avoid wasteful computation, prevent memory blowup when fuzzing

darosior commented at 1:19 pm on July 4, 2022: member

As reported in #24860 (review), the current code to construct a miniscript::Node could cause a blowup on large fuzzer inputs. This is because:

The duplicate key check is redundantly done at parsing time, since we will recursively create miniscript nodes and the constructor will unconditionally look for duplicate across this node’s keys and all its sub-nodes'.
We don’t put an upper bound on the size of the inputs to consider for parsing.

To avoid wasteful computation, and prevent the blowup on some fuzzer inputs, limit the size of reasonable inputs and only perform the check for duplicate keys once when parsing. Regarding the duplicate key check bypass in the constructor we iterated on different approaches, and eventually settled on passing a dummy argument. Albeit less elegant, all other approaches required getting rid of std::make_shared and adding an allocation per node created.

This PR contains code from Pieter Wuille (see commits).

Fixes #25824.

darosior commented at 1:22 pm on July 4, 2022: member

ACK 23db26fc6263e501312bc015d288f502d40cf0c3 – this change was entirely authored by Pieter Wuille (i only patched CheckDuplicateKey and added a trivial unit test). I reviewed and tested it by rebasing #24148 and #24149 on top of it, checking the runtime of the fuzz targets and adapting the targets creating “random” nodes.

As a mean of comparison here are the runtime of the miniscript_string and miniscript_script targets on my machine with the corpus from qa-assets at commit 1e3c7cd69fc06fda49a5286b98069ee5b0d64edc:

Before: Done 1046 runs in 10 second(s) and Done 2908 runs in 13 second(s)
After: Done 1046 runs in 0 second(s) and Done 2908 runs in 6 second(s)

This also made the two targets from #24149 twice as fast on average.

maflcko added this to the milestone 24.0 on Jul 4, 2022

DrahtBot added the label Tests on Jul 4, 2022

maflcko removed the label Tests on Jul 4, 2022

maflcko renamed this:
~~Permit delaying duplicate key check in `miniscript::Node` construction~~
miniscript: Permit delaying duplicate key check in `miniscript::Node` construction
on Jul 4, 2022

DrahtBot added the label Tests on Jul 4, 2022

maflcko removed the label Tests on Jul 4, 2022

DrahtBot added the label Descriptors on Jul 4, 2022

darosior commented at 5:26 pm on July 4, 2022: member

I think we should also have a limit on the maximum number of nodes we want to tolerate when parsing. For instance we would happily parse at this moment a string so long that i can’t upload it on 0bin, and run out of memory. This can be the reason for fuzz crashes too.

EDIT: added a commit for this, and changed the OP.

DrahtBot commented at 5:27 pm on July 4, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#24149 (Signing support for Miniscript Descriptors by darosior)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

darosior force-pushed on Jul 5, 2022

darosior renamed this:
~~miniscript: Permit delaying duplicate key check in `miniscript::Node` construction~~
miniscript: avoid wasteful computation, prevent memory blowup when fuzzing
on Jul 5, 2022

DrahtBot added the label Needs rebase on Jul 14, 2022

darosior force-pushed on Jul 15, 2022

darosior commented at 8:48 am on July 15, 2022: member

Rebased after #24148 was merged.

DrahtBot removed the label Needs rebase on Jul 15, 2022

in src/script/miniscript.h:1045 in 088e5b0b93 outdated

1040     std::vector<NodeRef<Key>> constructed;
1041 
1042     to_parse.emplace_back(ParseContext::WRAPPED_EXPR, -1, -1);
1043 
1044     while (!to_parse.empty()) {
1045+        if (opened_frags > MAX_STANDARD_P2WSH_SCRIPT_SIZE) return {};

Sjors commented at 1:15 pm on July 16, 2022:

IIUC this limit is specific to Miniscript inside P2WSH. That’s the case now for our descriptor implementation. Maybe use MAX_SCRIPT_SIZE (10,000) instead?

See also Resource Limitations under https://bitcoin.sipa.be/miniscript/

darosior commented at 12:29 pm on July 19, 2022:

The Miniscript code already is P2WSH specifc. We can change this once we implement Tapscript support in Miniscript.

in src/script/miniscript.h:310 in 6705b9cb02 outdated

307-    const bool duplicate_key;
308+    //! Whether a public key appears more than once in this node. This value is initialized
309+    //! by all constructors except the NoDupCheck ones. The NoDupCheck ones skip the
310+    //! computation, requiring it to be done manually by invoking DuplicateKeyCheck().
311+    //! DuplicateKeyCheck(), or a non-NoDupCheck constructor, will compute has_duplicate_keys
312+    //! for all subnodes as well.

Sjors commented at 1:18 pm on July 16, 2022:

I’m assuming this is useful because it allows performing other checks (like size limit) before checking for duplicates? Is so, maybe say that in the comment.

darosior commented at 12:35 pm on July 19, 2022:

It could technically be used for this but it’s not the intended purpose: we only ever use it to skip redundant computation.

in src/script/miniscript.h:1060 in 088e5b0b93 outdated

1050 
1051         switch (cur_context) {
1052         case ParseContext::WRAPPED_EXPR: {
1053-            int colon_index = -1;
1054-            for (int i = 1; i < (int)in.size(); ++i) {
1055+            std::optional<size_t> colon_index{};

achow101 commented at 7:03 pm on July 18, 2022:

In 088e5b0b9317bd1d7c9755fb839b7326d99bc910 “miniscript: upper bound on the size of inputs to consider for parsing”

It seems like this is an unrelated change. I don’t see how changing colon_index to an optional is related to the size check.

darosior commented at 12:27 pm on July 19, 2022:

In order to avoid explicit cast all around the place i made it a size_t, which makes it impossible to use the -1 magic value. Hence the optional.

achow101 commented at 7:03 pm on July 18, 2022: member

I noticed that inside of Parse, there are 3 MakeNodeRef calls which are doing the duplicate keys check. Is that intentional, and if so, why? These are for parsing AND_N, ANDOR, and THRESH.

darosior force-pushed on Aug 11, 2022

darosior commented at 9:40 am on August 11, 2022: member

i didn’t author this commit, but i don’t think it was intentional. I see no reason for duplicating the checks there. I pushed a version removing them.

(Sorry, i had forgotten to reply to your question.)

achow101 commented at 7:51 pm on August 15, 2022: member

ACK 9f92de39e126e3655fbcafd0e19d4b31c1ff099d

glozow commented at 3:29 pm on September 8, 2022: member

Pinging @sipa @sanket1729 if you wouldn’t mind taking a look? Or perhaps @stickies-v, @jb55, @Sjors since you reviewed #24147?

in src/script/miniscript.h:1033 in 9f92de39e1 outdated

1029@@ -1030,29 +1030,39 @@ inline NodeRef<Key> Parse(Span<const char> in, const Ctx& ctx)
1030 {
1031     using namespace spanparsing;
1032 
1033+    // Account for the number of fragments we started to parse. Any fragment is at least one byte

sipa commented at 3:20 pm on September 12, 2022:

That’s not true for v: and and_v. It probably doesn’t matter, but it makes the analysis harder.

sipa commented at 6:39 pm on September 12, 2022: member

@darosior See https://github.com/sipa/bitcoin/commits/202209_miniscript_parsesize for a commit that instead of opened fragments computes the actual script size redundantly during string parsing. That’s definitely overkill, but it has the advantage of being very sensitively testable. What do you think?

darosior commented at 1:51 pm on September 14, 2022: member

@sipa your commit doesn’t limit the number of v: and and_v we allow, so we could still crash/timeout on parsing a large and_v(and_v(and_v(....... string.

I had incorrectly thought about the and_v and v: in my version by double counting some ops.

What do you think of a simple arbitrary limit on the number of characters in the string instead? Something like 100_000?

sipa commented at 2:11 pm on September 14, 2022: member

@darosior Good call!

I think there may be another strategy too based on:

every “leaf” (pk_k, pk_h, hashes, older, after) is a nonzero number of script bytes
every combinator with more than 1 argument increases the total number of leaves by at least one, so will increase the total script size by at least 1, even and_v.
every combinator with 1 argument also increases the script size by at least one, except v:, but v: cannot be recursed (vv: is never valid), so if it happens in a sequence, it has to be interleaved with other things which do add script size.

So I think a strategy could be similar to the script size counting I’m doing in my current commit, but instead “borrow” 1 byte from each leaf, and add it to the combinator that added that leaf to the count.

darosior commented at 2:58 pm on September 14, 2022: member

I thought about something along these lines as well. By taking a lower bound of 0.5 bytes per fragment (because of v:), we could have accounted for half a byte per combinator and nothing for leaf fragments. But this becomes an issue when the number of leaf fragments we parse inside a combinator isn’t bounded, like in thresh.

So we are left with having to conditionally account for leaf fragments, as you mention with “borrowing”. I’m not sure the complexity is worth it, but we could do this on top of my current version. I’ve pushed a fixup commit implementing this on top of my branch here. Let me know if i’m missing something or if you think your version accurately accounting for fragments’ sizes is still preferable (since it’s applicable to it as well).

sipa commented at 4:09 pm on September 14, 2022: member

@darosior See added commit in https://github.com/sipa/bitcoin/commits/202209_miniscript_parsesize

darosior force-pushed on Sep 15, 2022

sipa commented at 6:25 pm on September 15, 2022: member

@darosior Not sure you saw my commit above, WDYT? It has the property that every fragment increases the script_size variable now, with the exception of v: (which needs to be interleaved with others), and 0 and 1 (but those are terminal, so other combinators which do increment the variable are needed to even make a space to add a 0 or 1).

~~Its comments are outdated, though. I’ll update those if desired.~~

EDIT: squashed and updated comments

Permit delaying duplicate key check in miniscript::Node construction 4cb8f9a92c

Make miniscript string parsing account for exact script size as bound

Co-Authored-by: Antoine Poinsot <darosior@protonmail.com>

e8cc2e4afc

darosior force-pushed on Sep 17, 2022

darosior commented at 1:20 pm on September 17, 2022: member

I picked @sipa’s latest commit and squashed mine with it. Although it replicates all the script size calculation, his version indeed gives better assurance that the check is in fact correct. Thanks!

I ran this branch against the descriptor_parse fuzz corpus augmented with the seeds from https://github.com/bitcoin-core/qa-assets/pull/92. I don’t hit a timeout anymore, neither did it hit the new assertion that the parsing-time calculation of the script size is correct.

ACK e8cc2e4afc1142aa2b3da19cd5c17deea9963244 – it’s my own PR but most of the code here was written by sipa. I’ve reviewed and tested it.

fanquake requested review from achow101 on Sep 18, 2022

sipa commented at 2:27 pm on September 19, 2022: member

ACK e8cc2e4afc1142aa2b3da19cd5c17deea9963244 (for the few parts of the code that aren’t mine)

glozow merged this on Sep 19, 2022

glozow closed this on Sep 19, 2022

sidhujag referenced this in commit a06877b1b4 on Sep 20, 2022

achow101 referenced this in commit 2b2c970627 on Sep 21, 2022

sidhujag referenced this in commit 8d81a841f5 on Sep 23, 2022

darosior deleted the branch on Feb 26, 2023

bitcoin locked this on Feb 26, 2024

miniscript: avoid wasteful computation, prevent memory blowup when fuzzing #25540

Conflicts