(As suggested by @jtimon)
We currently recompute the signature hash multiple times during an OP_CHECKMULTISIG validation, usually with the same data and nHashType each time. Implement a small per-evaluation cache to reuse those values.
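The caching idea can be sketched as follows. This is an illustrative model, not the PR's actual code: `ExpensiveSighash` is a hypothetical stand-in for Core's `SignatureHash()` (whose real inputs are the scriptCode, transaction, and input index), and only the nHashType dependency is modeled so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for Core's SignatureHash(): the real function
// serializes the scriptCode, transaction and input index and double-SHA256s
// them; here only nHashType is modeled so the sketch is self-contained.
static int g_computations = 0;
std::string ExpensiveSighash(uint8_t nHashType) {
    ++g_computations;
    return "sighash-" + std::to_string(nHashType);
}

// Per-evaluation cache: within one EvalScript run the signed data is fixed,
// so the result only varies with nHashType and can be memoized.
class SighashCache {
    std::map<uint8_t, std::string> cache;
public:
    const std::string& Get(uint8_t nHashType) {
        auto it = cache.find(nHashType);
        if (it == cache.end())
            it = cache.emplace(nHashType, ExpensiveSighash(nHashType)).first;
        return it->second;
    }
};
```

With this shape, a 2-of-3 CHECKMULTISIG whose signatures all use the same nHashType computes the hash once instead of once per signature check.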
No, we didn't, and the gains probably won't be very big. This came up in a conversation where I pointed out that if we knew the sighash before calling EvalScript, the serialization could be done beforehand and only once per script evaluation (instead of once per CheckSig). EvalScript receives an nHashType parameter, but it is only used to check that it matches the type contained at the end of each signature. As shown in #4555, it is almost always called with 0, which makes CheckSig skip that check and just use what's contained at the end of the signature. I'm still struggling to understand why there can be different sighashes for the same script/output, and why they are placed after the signatures instead of somewhere else. Sipa's "ask Satoshi" is not a satisfactory answer to me…
I'm also still thinking about what would be a good way to benchmark this…
#4555 removes a feature of EvalScript() that lets the caller determine what type of sighashes a scriptSig contains. For instance, I might want to verify that a scriptSig contains only SIGHASH_ANYONECANPAY sigs to see whether I can add/remove inputs.
As for why each CHECKSIG can use a different nHashType and thus a different sighash, that's just flexibility. For instance, GreenAddress.it's nLockTime scheme uses that feature: the refund scripts they sign with nLockTime are signed with ANYONECANPAY|NONE, letting the other key do whatever it wants too.
What do you mean by "why they are placed after the signatures instead of somewhere else"?
I don't understand that use case. Is the refund script signed with ANYONECANPAY and something else at the same time? Is it multisig? I don't get it. By "placed after the signatures" I mean: why is the sighash in vchSig.back() of each CheckSig call? As for the feature that #4555 removes, it is not currently used by the core or the wallet; in any case, if there were only one sighash per EvalScript/input instead of one per signature appearing in the scriptSig, that additional check would be trivial to implement at a higher level. It could be a field on each input, for example:
if (input.nSighash == SIGHASH_ANYONECANPAY)
It's probably too late to change that, but I would like to understand whether there is a motivation I'm missing or it was just a design mistake.
@jtimon Yeah, it’s multisig.
The vchSig.back() bit is just standard conservatism to make sure a signature can't be reused for something unintended. I remember finding a case where that prevented a security issue a while back, although for the life of me I don't remember the details exactly.
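To make the vchSig.back() point concrete, here is a sketch of how the sighash type rides along at the end of each pushed signature. The SIGHASH flag values below match the Bitcoin protocol, but `ExtractHashType` is an illustrative helper, not Core's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Protocol-defined sighash flag values.
const uint8_t SIGHASH_ALL = 0x01;
const uint8_t SIGHASH_NONE = 0x02;
const uint8_t SIGHASH_ANYONECANPAY = 0x80;

// The final byte of each pushed signature is its nHashType; CheckSig reads
// it, then strips it before verifying the DER-encoded signature proper.
// (Illustrative helper: real code must first reject empty signatures.)
uint8_t ExtractHashType(const std::vector<uint8_t>& vchSig) {
    return vchSig.back();
}
```

Because the type byte is covered by the signature itself, a signer commits to how its signature may be used, which is the reuse protection described above.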
Making the sighash part of the CTxIn structure doesn't really make sense in a system with scripts that's meant to be extensible. Anyway, CPU usage isn't the bottleneck for any of this stuff.
@jtimon Reread my answer above pointing out how GreenAddress already makes use of the fact that there can be more than one SIGHASH flag in a deployed, production application.
Premature optimisation is the root of all evil; the scalability problems Bitcoin has aren’t going to be solved with little single digit improvements everywhere at the cost of making the protocol less generic.
I’ve rewritten it to use statically allocated hashes rather than an std::map, so it doesn’t even have dynamic memory overhead.
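A statically allocated variant along the lines described could look like the sketch below. This is an assumption about the shape of the rewrite, not the PR's actual code: the 32-byte `Hash256` type and `ComputeSighash()` are stand-ins, and the cache is a fixed array of one slot per possible nHashType byte plus a validity flag, so lookups never touch the heap.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stand-in for a 32-byte double-SHA256 result (not Core's uint256).
using Hash256 = std::array<uint8_t, 32>;

// Fake "hash" computation so the sketch is checkable; the real function
// would serialize and double-SHA256 the transaction data.
Hash256 ComputeSighash(uint8_t nHashType) {
    Hash256 h{};
    h[0] = nHashType;
    return h;
}

// Static-allocation cache: fixed slots indexed directly by the nHashType
// byte, with a bool per slot marking whether it has been filled.
struct StaticSighashCache {
    std::array<Hash256, 256> hashes{};
    std::array<bool, 256> valid{};  // value-initialized: all slots empty
    const Hash256& Get(uint8_t nHashType) {
        if (!valid[nHashType]) {
            hashes[nHashType] = ComputeSighash(nHashType);
            valid[nHashType] = true;
        }
        return hashes[nHashType];
    }
};
```

Compared with a std::map, this trades a fixed 8 KB-ish footprint for constant-time lookups and no allocation on the validation hot path.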
I think the improvement is small, but maybe worthwhile. I’m not going to spend more time on it, however.
Initially I thought the optimization was going to fall out as a side effect of some refactorings I'm working on. Thank you @sipa for showing me the best I could do optimization-wise and making me realize that wasn't going to be the case. I was also interested in understanding why different sighashes are allowed within the same script evaluation; thank you @petertodd for pointing out the GreenAddress use case. Once there's one, there are probably more, but I really needed a first one.
This PR has definitely been useful for me, but since we know the optimization would be small and benefit only multisig, I'm not convinced the additional software complexity is worth it.