Inline signature serializer #2645

sipa commented at 2:24 PM on May 12, 2013: member

Instead of building a full copy of a CTransaction being signed, and then modifying bits and pieces until its fits the form necessary for computing the signature hash, use a wrapper serializer that only serializes the necessary bits on-the-fly.

This makes it easier to see which data is actually being hashed, reduces load on the heap, and also marginally improves performances (around 3-4us/sigcheck here). The performance improvements are much larger for large transactions, though.

The old implementation of SignatureHash is moved to a unit tests, to test whether the old and new algorithm result in the same value for randomly-constructed transactions.

sipa cross-referenced this on May 21, 2013 from issue Add a few data-driven tests for SIGHASH_ANYONECANPAY by TheBlueMatt

sipa commented at 10:11 AM on June 16, 2013: member

Anyone interested in this? @gavinandresen @jgarzik @laanwj @gmaxwell ?

jgarzik commented at 1:08 PM on June 17, 2013: contributor

So, it's one of those things :)

It is a good change, and something my python code should probably emulate to boost python speed. Technical review yields an ACK.

The main issues are external: the area of code being touched is a critical component, and this totally rewrites the component [in a positive way], in a slightly more complex way. That increases the risk profile. It correspondingly lacks a driving use case, making this is a very high risk, useful cleanup.

Your inclusion of the old code as a test was an excellent move, and does serve to help mitigate this change risk.

in src/script.cpp:None in 7eb5183e40 outdated

 981 | -    {
 982 | -        printf("ERROR: SignatureHash() : nIn=%d out of range\n", nIn);
 983 | -        return 1;
 984 | +public:
 985 | +    CTransactionSignatureSerializer(const CTransaction &txToIn, const CScript &scriptCodeIn, unsigned int nInIn, int nHashTypeIn) :
 986 | +        txTo(txToIn), scriptCode(scriptCodeIn), nIn(nInIn), fAnyoneCanPay(!!(nHashTypeIn & SIGHASH_ANYONECANPAY)),

laanwj commented at 7:17 PM on June 17, 2013:

As there is quite some complex logic here, I'd prefer if each variable was on a separate line (at least fAnyoneCanPay, fHashSingle, fHashNone).

sipa commented at 9:25 AM on June 18, 2013:

Done.

petertodd commented at 7:21 PM on June 17, 2013: contributor

Are the code paths independent enough to run them side-by-side on live data? If so we could do exactly that and have bitcoind quit if they disagree. You'd then want to enable this behavior probabilistically so that at any time a small subset of the network is using the new code, say 1/8th. If a bug or exploit starts getting used hopefully bug reports will start flowing in. The reason why you would do this probabilistically, as opposed to just enabling side-by-side checking for everyone, is to ensure that a failure resulting in a crash/exploit will not affect the majority of the network either.

Of course, prior to doing that, run such a side-by-side node yourself, which implies step zero should be just a command line option defaulting to off.

sipa commented at 7:24 PM on June 17, 2013: member

@petertodd If it takes that much effort to convince you this is safe, I'd rather just close this pull request. (note: this isn't a argument to convince you otherwise in any case; it's just not worth that much effort).

I've of course already run this on testnet and mainnet without problems, but that doesn't mean anything for potential false positives.

laanwj commented at 7:27 PM on June 17, 2013: member

I agree with this change as it makes what happens more explicit. On the other hand it seems risky as it's very hard to verify or prove that the hashed parts stay exactly the same.

petertodd commented at 7:56 PM on June 17, 2013: contributor

@sipa I haven't reviewed the code carefully enough to be convinced one way or the other technically - I just wanted to float the idea given it's an option we haven't used yet.

That said inline signature serialization helps most with the largest transactions; I don't think we know much yet about what the advantages and disadvantages of allowing really large and complex transactions actually are yet and there are a whole lot of tradeoffs between networking, UTXO proofs, distributed verification etc. On that basis I'm slightly inclined to say NACK for possible premature optimization, but do keep the code around for later.

sipa commented at 10:05 PM on August 15, 2013: member

@gavinandresen @gmaxwell @jgarzik I'd like some opinions about this. I'm not going to work on this further, but I believe it is sufficiently tested. If it's not considered useful or too risky, I'd rather close it.

gmaxwell commented at 10:23 PM on August 15, 2013: contributor

ACK. I'm happy with this, and the randomized unit tests builds confidence. I was running it on a live node for a while, fwiw.

jgarzik commented at 10:36 PM on August 15, 2013: contributor

Nothing to add beyond my above comments, which still apply.

gavinandresen commented at 11:34 PM on August 15, 2013: contributor

I spent some time this morning trying to break this; the good news is I couldn't, so ACK overall.

I did find some weaknesses in the unit test, though; the same 20,000 random transactions were being tested every time, not all possible SIGHASH values were being tested (only "reasonable" ones), and you weren't testing some ==0/empty edge cases (important to test invalid transactions as well as valid ones).

Suggested patch:

diff --git a/src/test/sighash_tests.cpp b/src/test/sighash_tests.cpp
index cf1f486..fd2ab30 100644
--- a/src/test/sighash_tests.cpp
+++ b/src/test/sighash_tests.cpp
@@ -70,13 +70,13 @@ uint256 static SignatureHashOld(CScript scriptCode, const CTransaction& txTo, un
 void static RandomScript(CScript &script) {
     static const opcodetype oplist[] = {OP_FALSE, OP_1, OP_2, OP_3, OP_CHECKSIG, OP_IF, OP_VERIF, OP_RETURN, OP_CODESEPAR
     script = CScript();
-    int ops = (insecure_rand() % 10) + 1;
+    int ops = (insecure_rand() % 10);
     for (int i=0; i<ops; i++)
         script << oplist[insecure_rand() % (sizeof(oplist)/sizeof(oplist[0]))];
 }

 void static RandomTransaction(CTransaction &tx, bool fSingle) {
-    tx.nVersion = (insecure_rand() % 2) + 1;
+    tx.nVersion = insecure_rand();
     tx.vin.clear();
     tx.vout.clear();
     tx.nLockTime = (insecure_rand() % 2) ? insecure_rand() : 0;
@@ -93,7 +93,7 @@ void static RandomTransaction(CTransaction &tx, bool fSingle) {
     for (int out = 0; out < outs; out++) {
         tx.vout.push_back(CTxOut());
         CTxOut &txout = tx.vout.back();
-        txout.nValue = (insecure_rand() % 100000000) + 1;
+        txout.nValue = insecure_rand() % 100000000;
         RandomScript(txout.scriptPubKey);
     }
 }
@@ -102,8 +102,10 @@ BOOST_AUTO_TEST_SUITE(sighash_tests)

 BOOST_AUTO_TEST_CASE(sighash_test)
 {
+    seed_insecure_rand(false);
+
     for (int i=0; i<20000; i++) {
-        int nHashType = 1 + (insecure_rand() % 3) + (insecure_rand() % 2)*0x80;
+        int nHashType = insecure_rand();
         CTransaction txTo;
         RandomTransaction(txTo, (nHashType & 0x1f) == SIGHASH_SINGLE);
         CScript scriptCode;

sipa commented at 4:29 PM on September 28, 2013: member

Rebased and incorporated Gavin's suggested changes.

Inline signature serializer

Instead of building a full copy of a CTransaction being signed, and
then modifying bits and pieces until its fits the form necessary
for computing the signature hash, use a wrapper serializer that
only serializes the necessary bits on-the-fly.

This makes it easier to see which data is actually being hash,
reduces load on the heap, and also marginally improves performances
(around 3-4us/sigcheck here). The performance improvements are much
larger for large transactions, though.

The old implementation of SignatureHash is moved to a unit tests,
to test whether the old and new algorithm result in the same value
for randomly-constructed transactions.

f5857e5cb5

BitcoinPullTester commented at 8:08 AM on October 4, 2013: none

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/f5857e5cb5fb03bee9c05d1dd6ba2621cac49179 for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.

sipa commented at 7:28 PM on October 15, 2013: member

@gavinandresen Care to have another look?

SergioDemianLerner commented at 12:29 AM on October 17, 2013: contributor

As far as I can see by inspection, both functions are equivalent. Is the OP_CODESEPARATOR removal part tested by test cases? (e.g. OP_CS xx OP_CS OP_CS yy OP_CS -> xx yy)

gavinandresen commented at 12:53 AM on October 17, 2013: contributor

@SergioDemianLerner : yes, the unit test creates 50,000 random, 10-opcode-long scriptPubKey transactions, many of which will contain multiple OP_CODESEPARATORs.

gavinandresen referenced this in commit 5a8a4be289 on Oct 17, 2013

gavinandresen merged this on Oct 17, 2013

gavinandresen closed this on Oct 17, 2013

bitcoin locked this on Sep 8, 2021