Refactor: Initialize PrecomputedTransactionData in CheckInputScripts #18401

pull jnewbery wants to merge 1 commits into bitcoin:master from jnewbery:2020-03-precomputedtransactiondata changing 3 files +32 −9
  1. jnewbery commented at 3:18 am on March 22, 2020: member

    This is a single commit taken from the Schnorr/Taproot PR #17977.

    Add a default constructor to PrecomputedTransactionData, which doesn’t initialize the struct’s members. Instead they’re initialized inside the CheckInputScripts() function. This allows a later commit to add the spent UTXOs to that structure. The spent UTXOs are required for the schnorr signature hash, since it commits to the scriptPubKeys. See https://github.com/bitcoin/bips/blob/master/bip-0341.mediawiki#common-signature-message for details.

    By itself, this isn’t really an improvement to the code, but I think it makes sense to separate out the refactor/moveonly commits from PR #17977 so that PR is only the logical changes needed for Schnorr/Taproot.

  2. fanquake added the label Refactoring on Mar 22, 2020
  3. DrahtBot commented at 10:39 am on March 22, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #18071 (Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256 by JeremyRubin)
    • #17479 (Return BlockValidationState from ProcessNewBlock if CheckBlock/AcceptBlock fails by jnewbery)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. in src/validation.cpp:2087 in 9f94e16daa outdated
    2083@@ -2083,7 +2084,7 @@ bool CChainState::ConnectBlock(const CBlock& block, BlockValidationState& state,
    2084     int64_t nSigOpsCost = 0;
    2085     blockundo.vtxundo.reserve(block.vtx.size() - 1);
    2086     std::vector<PrecomputedTransactionData> txdata;
    2087-    txdata.reserve(block.vtx.size()); // Required so that pointers to individual PrecomputedTransactionData don't get invalidated
    2088+    txdata.resize(block.vtx.size()); // Required so that pointers to individual PrecomputedTransactionData don't get invalidated
    


    promag commented at 11:21 pm on March 22, 2020:
    nit, above could do std::vector<PrecomputedTransactionData> txdata{block.vtx.size()}; and remove this,

    jnewbery commented at 5:37 pm on March 23, 2020:
    Done, and expanded the comment on why this vector is necessary.
  5. in src/script/interpreter.cpp:1286 in 9f94e16daa outdated
    1286     if (txTo.HasWitness()) {
    1287-        hashPrevouts = GetPrevoutHash(txTo);
    1288-        hashSequence = GetSequenceHash(txTo);
    1289-        hashOutputs = GetOutputsHash(txTo);
    1290-        ready = true;
    1291+        if (!ready) {
    


    promag commented at 11:49 pm on March 22, 2020:
    nit, early return in L1284 if (ready) return;?

    jnewbery commented at 5:57 pm on March 23, 2020:
    The later commits in 17977 can’t return early, because we may need to cache the hash of the amounts spent in this transaction (see https://github.com/bitcoin/bitcoin/pull/17977/commits/607843e0e1e8e09f11192e6489ecb22b6acdae79#diff-be2905e2f5218ecdbe4e55637dac75f3R1308). However, I think we can return early for non-witness transactions, so I’ve implemented that.

    sipa commented at 10:27 pm on March 25, 2020:
    For now that’s ok, but https://github.com/bitcoin/bitcoin/pull/17977/commits/dd0622b8e2f8a324fdf73eb476f136ee556f3e68 relies on storing the spent UTXOs in txdata, even for non-witness outputs (it makes validation use the scriptPubKeys from there, to avoid duplicating them).

    jnewbery commented at 10:39 pm on March 25, 2020:
    I think the function can be re-ordered to first store the spent UTXOs for all transactions, and then do the witness-specific logic later (and exit early for non-witness txs).

    sipa commented at 0:18 am on March 26, 2020:
    Sure; just clarifying why it wasn’t done in the original commit. This is a triviality that can always be changed.
  6. in src/script/interpreter.cpp:1287 in de43efc605 outdated
    1284 {
    1285     // Cache is calculated only for transactions with witness
    1286-    if (txTo.HasWitness()) {
    1287+    if (!txTo.HasWitness()) return;
    1288+
    1289+    if (!ready) {
    


    sipa commented at 0:20 am on March 26, 2020:
    Supernit: ready is faster to check than HasWitness, so maybe you want it first (or check HasWitness) inside the !ready branch.

    jnewbery commented at 2:44 pm on March 26, 2020:

    My preferred form after taproot would be:

     0template <class T>
     1void PrecomputedTransactionData::Init(const T& txTo, std::vector<CTxOut> spent_outputs)
     2{
     3    // Save spent outputs for all transactions (not just those with witnesses)
     4    if (m_spent_outputs.empty()) m_spent_outputs = std::move(spent_outputs);
     5
     6    // Transactions without witnesses don't need a cache
     7    if (!txTo.HasWitness()) return;
     8
     9    // Cache prevouts, sequences and outputs
    10    if (!ready) {
    11        m_prevouts_hash = GetPrevoutHash(txTo);
    12        hashPrevouts = SHA256Uint256(m_prevouts_hash);
    13        m_sequences_hash = GetSequenceHash(txTo);
    14        hashSequence = SHA256Uint256(m_sequences_hash);
    15        m_outputs_hash = GetOutputsHash(txTo);
    16        hashOutputs = SHA256Uint256(m_outputs_hash);
    17        ready = true;
    18    }
    19
    20    // Cache amounts spent
    21    if (!m_amounts_spent_ready && !m_spent_outputs.empty()) {
    22        m_amounts_spent_hash = GetSpentAmountsHash(m_spent_outputs);
    23        m_amounts_spent_ready = true;
    24    }
    25}
    

    which is why I modified this commit to early exit after checking for witnesses. Happy to change that back if you think that’s better.


    sipa commented at 6:05 pm on March 26, 2020:
    Heh, I missed this comment, and made this exact change just now in the taproot branch (except a comment).

    instagibbs commented at 2:18 pm on April 1, 2020:
    while we’re touching this code can we make ready be something more self-explanatory? Especially if we’re expanding condition checks in the future. e.g., m_amounts_spent_ready

    jnewbery commented at 8:03 pm on April 1, 2020:
    This, plus the discussion here: https://github.com/bitcoin/bitcoin/pull/18401/files#r401812158 has made me think that there shouldn’t be two different ready bools in the final taproot implementation, since we only ever want to initialize the object once. I’ve changed this to m_ready and made some minor changes to the Init function.
  7. sipa commented at 0:20 am on March 26, 2020: member
    Concept ACK, of course.
  8. laanwj added this to the milestone 0.21.0 on Mar 27, 2020
  9. sipa commented at 1:18 am on March 28, 2020: member
    Code review ACK, post squash, and for 0.21.
  10. jnewbery force-pushed on Mar 28, 2020
  11. jnewbery commented at 3:06 am on March 28, 2020: member
    Squashed commits. No code changes.
  12. theStack approved
  13. theStack commented at 6:16 pm on March 29, 2020: member
    Code-review ACK https://github.com/bitcoin/bitcoin/pull/18401/commits/b409b611eb0fc6c71f107b5313ab79ecaf57f479 :computer:
    Checked that the resulting code logic remains the same, the only internal difference is that the initialization of the precomputed transaction data happens later, in CheckInputScripts() rather than in AcceptSingleTransaction().
  14. fjahr commented at 4:13 pm on April 1, 2020: member

    Code-review ACK b409b611eb0fc6c71f107b5313ab79ecaf57f479

    Also ran tests locally.

  15. jonatack commented at 4:44 pm on April 1, 2020: member
    ACK b409b611eb0fc6c71f107b531
  16. in src/validation.cpp:2082 in b409b611eb outdated
    2075@@ -2075,15 +2076,18 @@ bool CChainState::ConnectBlock(const CBlock& block, BlockValidationState& state,
    2076 
    2077     CBlockUndo blockundo;
    2078 
    2079+    // Precomputed transaction data pointers must not be invalidated
    2080+    // until after control has run the script checks (potentially
    2081+    // in multiple threads). Keep txsdata in scope for as long as
    2082+    // control.
    


    jonatack commented at 5:21 pm on April 1, 2020:
    Suggestion here or for a follow-up: write and use the word control here in a way that indicates more clearly that it’s not just the word “control”.

    instagibbs commented at 5:40 pm on April 1, 2020:
    I think we should keep around the note not to have create any new allocations to avoid pointer invalidation.

    jonatack commented at 6:04 pm on April 1, 2020:

    I think we should keep around the note not to have create any new allocations to avoid pointer invalidation.

    Seconded.


    instagibbs commented at 6:19 pm on April 1, 2020:
    alternative is to simply use a C array or heap-based std::unique_ptr<PrecomputedTransactionData[]> txsdata(new PrecomputedTransactionData[block.vtx.size()]);? One less footgun maybe?

    jnewbery commented at 8:03 pm on April 1, 2020:
    Added backticks to it’s more obviously referring to the code.

    jnewbery commented at 8:04 pm on April 1, 2020:
    I’ll just update the comment for now.

    jkczyz commented at 8:42 pm on April 1, 2020:
    Could the vector be removed entirely if CScriptCheck owned the PrecomputedTransactionData rather than taking a pointer?

    instagibbs commented at 8:51 pm on April 1, 2020:
    @jkczyz let’s punt on changing ownership for a future PR, discuss offline maybe

    sipa commented at 8:52 pm on April 1, 2020:
    That could work, but it’d need something like a shared_ptr (there is one PrecomputedTransactionData per transaction, but a CScriptCheck per txin). So I think that’s perhaps an unreasonable overhead.

    jnewbery commented at 8:54 pm on April 1, 2020:
    @jkczyz - that was my first thought when I started looking into this, but my conclusion was also to punt :)

    jkczyz commented at 8:55 pm on April 1, 2020:
    @sipa Was this suppose to be in response to #18401 (review)?

    sipa commented at 9:07 pm on April 1, 2020:
    Oops, yes.

    jkczyz commented at 9:24 pm on April 1, 2020:

    SGTM

    For posterity, see #18401 (review) by @sipa for why this might be not that simple.

  17. in src/validation.cpp:2133 in b409b611eb outdated
    2133@@ -2130,13 +2134,12 @@ bool CChainState::ConnectBlock(const CBlock& block, BlockValidationState& state,
    2134             return state.Invalid(BlockValidationResult::BLOCK_CONSENSUS, "bad-blk-sigops");
    2135         }
    2136 
    2137-        txdata.emplace_back(tx);
    


    instagibbs commented at 5:35 pm on April 1, 2020:
    small bonus is that precompute data isn’t computed unless fScriptChecks is true

    sipa commented at 6:05 pm on April 1, 2020:
    Oh, that’s great. I never realized that.

    jnewbery commented at 7:57 pm on April 1, 2020:
    That is a nice bonus. That’s millions of hashes saved over the course of an IBD. Not sure how meaningful that is.

    instagibbs commented at 7:58 pm on April 1, 2020:
    freebie is free
  18. in src/script/interpreter.cpp:1282 in b409b611eb outdated
    1278@@ -1279,18 +1279,28 @@ uint256 GetOutputsHash(const T& txTo)
    1279 } // namespace
    1280 
    1281 template <class T>
    1282-PrecomputedTransactionData::PrecomputedTransactionData(const T& txTo)
    1283+void PrecomputedTransactionData::Init(const T& txTo)
    


    jkczyz commented at 6:11 pm on April 1, 2020:
    This interface allows for (mistakenly) calling Init on the same object with different inputs. i.e., once ready is set, any further calls to Init would be a no-op (at least in this commit). So it leaves open the potential for misuse. Could we design the interface in such away that makes this impossible?

    sipa commented at 6:14 pm on April 1, 2020:
    Maybe assert fail when you’re trying to re-Init?

    jnewbery commented at 8:47 pm on April 1, 2020:
    Done
  19. sipa commented at 9:20 pm on April 1, 2020: member
    @jnewbery I guess this calls for splitting up CheckInputScripts into two separate functions (one looking up inputs and initializing txdata, and one doing the rest). I suggest keeping that for a future improvement, though.
  20. jnewbery commented at 9:31 pm on April 1, 2020: member

    @sipa

    … this calls for splitting up CheckInputScripts…

    Sounds sensible now that it has a potentially unexpected side-effect of initializing the txdata. I agree that we should save that for later.

    Would you prefer to leave the assert in this commit, or revert to not having it?

  21. narula deleted a comment on Apr 1, 2020
  22. sipa commented at 9:50 pm on April 1, 2020: member

    Would you prefer to leave the assert in this commit, or revert to not having it?

    No opinion either way.

  23. in src/script/interpreter.cpp:1293 in 851908de0e outdated
    1290         hashSequence = GetSequenceHash(txTo);
    1291         hashOutputs = GetOutputsHash(txTo);
    1292-        ready = true;
    1293     }
    1294+
    1295+    m_ready = true;
    


    vasild commented at 9:01 am on April 2, 2020:
    Why move m_ready = true; outside of the if? This makes it possible to end up in a weird state where m_ready is true, but the member variables are not initialized (if txTo.HasWitness() is false).

    jnewbery commented at 1:51 pm on April 2, 2020:
    Because m_ready now means ‘is this object initialized’ (see conversation at #18401 (review)). A future commit in #17977 changes PrecomputedTransactionData to store data even for non-segwit transactions (https://github.com/bitcoin/bitcoin/pull/17977/commits/6dcc85e3347fe8a0c5e3e578176fd38fa093df39), so m_ready would need to be set outside the if (txTo.HasWitness) block.
  24. in src/validation.cpp:1515 in 851908de0e outdated
    1511@@ -1512,6 +1512,10 @@ bool CheckInputScripts(const CTransaction& tx, TxValidationState &state, const C
    1512         return true;
    1513     }
    1514 
    1515+    if (!txdata.m_ready) {
    


    jonatack commented at 5:18 pm on April 2, 2020:

    Verified that bitcoind and tests abort on the assert sans the new conditional.

    0bitcoind: script/interpreter.cpp:1284: void PrecomputedTransactionData::Init(const T&)
    1                                       [with T = CTransaction]: Assertion `!m_ready' failed.
    2Aborted
    
  25. jonatack commented at 5:44 pm on April 2, 2020: member
    re-ACK 851908de0e655b828f7cc71565f90b9a94290849 for 0.21
  26. MarcoFalke commented at 6:02 pm on April 2, 2020: member
    Does this speed up IBD?
  27. sipa commented at 6:07 pm on April 2, 2020: member
    @MarcoFalke Probably slightly, as it avoids constructing precomputed tx data in the assumevalid range.
  28. [REFACTOR] Initialize PrecomputedTransactionData in CheckInputScripts
    Add a default constructor to `PrecomputedTransactionData`, which doesn't
    initialize the struct's members. Instead they're initialized inside the
    `CheckInputScripts()` function. This allows a later commit to add the
    spent UTXOs to that structure.
    f63dec189c
  29. jnewbery force-pushed on Apr 12, 2020
  30. jnewbery commented at 1:33 am on April 12, 2020: member
    Squashed commits
  31. jonatack commented at 6:59 pm on April 14, 2020: member
    Re-ACK f63dec1 git diff 851908d f63dec1 shows no change since last ACK.
  32. jnewbery commented at 0:25 am on April 15, 2020: member
    I think this should be ready for merge if @sipa @fjahr @theStack reACK.
  33. sipa commented at 6:48 am on April 15, 2020: member
    utACK f63dec189c3c8eee1ab2187681d5d0b2513b1b2e
  34. ariard commented at 8:20 am on April 15, 2020: member

    Code Review ACK f63dec1

    Add a commit commenting PrecomputedTransactionData, there is a lot of context around it : https://github.com/ariard/bitcoin/commit/56247242c1dbe8542e6f82ee743a58ad6d59734a

    Few aside questions, :

    • As far as I can tell, we’re going to compute “midstate” even if validation fail before signature verification (like a script spending a coin already spent), instead of hashing-at-first-use ? (not sure if ternary in SignatureHash are currently exercised)
    • Why we don’t cache nVersion and nLocktime they won’t change given the sigops, is this because they’re not worth it considering their size ?

    I guess minimizing code complexity answers both.

  35. theStack commented at 10:31 am on April 15, 2020: member
    re-ACK f63dec189c3c8eee1ab2187681d5d0b2513b1b2e Verified that since my last ACK only minor changes happened that didn’t change the overall logic (s/ready/m_ready/, conditional call of .Init() method in CheckInputScripts(), minor reordering of conditions and added assert() in .Init())
  36. jnewbery commented at 2:11 pm on April 15, 2020: member

    As far as I can tell, we’re going to compute “midstate” even if validation fail before signature verification (like a script spending a coin already spent), instead of hashing-at-first-use ? (not sure if ternary in SignatureHash are currently exercised)

    No. Coins checks are done in MemPoolAccept::PreChecks(), which is called before PrecomputedTransactionData is instantiated (both before and after this PR).

    Why we don’t cache nVersion and nLocktime they won’t change given the sigops, is this because they’re not worth it considering their size ?

    Because those fields aren’t hashed before going into the sighash message. hashPrevouts, hashSequence and hashOutputs are all hashed before going into that message, so caching the hash digests means we don’t have to recalculate those hashes every time.

    Add a commit commenting PrecomputedTransactionData, there is a lot of context around it : ariard@5624724

    Feel free to open a follow-up PR. No need to overload this one now that we have several ACKs.

  37. fjahr commented at 2:42 pm on April 15, 2020: member

    Re-ACK f63dec189c3c8eee1ab2187681d5d0b2513b1b2e

    Only changes since my last ACK were minor improvements addressing review comments and squash of commits.

  38. ariard commented at 6:27 pm on April 15, 2020: member

    No. Coins checks are done in MemPoolAccept::PreChecks(), which is called before PrecomputedTransactionData is instantiated (both before and after this PR).

    Coin checks was just a (bad) example, you may have a scriptpubkey without any sigops or script may fail before first one.

    Because those fields aren’t hashed before going into the sighash message. hashPrevouts, hashSequence and hashOutputs are all hashed before going into that message, so caching the hash digests means we don’t have to recalculate those hashes every time.

    Hmmm right, reusing their hash would break final digest, I understand better what BIP143 aim to achieve compare to original transaction digest algo.

    Feel free to open a follow-up PR. No need to overload this one now that we have several ACKs.

    Will do, ofc no need to break ACKs for docs!

  39. jnewbery commented at 6:55 pm on April 15, 2020: member

    Coin checks was just a (bad) example, you may have a scriptpubkey without any sigops or script may fail before first one.

    We need the precomputed transaction data to be cached before any script checking because in the ConnectBlock case, each input script validation is parallelized separately. If the cache was filled during script execution, you’d need locking between those threads to ensure they’re not trying to read/write concurrently.

  40. ariard commented at 7:32 pm on April 15, 2020: member

    We need the precomputed transaction data to be cached before any script checking because in the ConnectBlock case

    Right but not in the mempool case. Thanks for answer, will update comment with this insight in a follow-up PR.

  41. MarcoFalke merged this on Apr 16, 2020
  42. MarcoFalke closed this on Apr 16, 2020

  43. jnewbery deleted the branch on Apr 16, 2020
  44. sidhujag referenced this in commit 8e77630a77 on Apr 16, 2020
  45. MarcoFalke commented at 2:40 pm on April 24, 2020: member

    I did some more benchmarking and this should speed up CreateNewBlock by a factor inversely related to the number of transactions in the block.

    So a block with one transaction (beside the coinbase tx) should be generated about twice as fast. Whereas a block with 1000 txs (beside the coinbase tx) should be generated at a slight but almost no speedup.

  46. ComputerCraftr referenced this in commit a600f3b1f9 on Jun 10, 2020
  47. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me