Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256 #18071

pull JeremyRubin wants to merge 3 commits into bitcoin:master from JeremyRubin:refactoring-hashers changing 3 files +52 −19
  1. JeremyRubin commented at 8:21 am on February 5, 2020: contributor

    These are a couple common refactoring changes in both Taproot and CTV that I think can be merged as is. Other than that the two PRs should largely be conflict free (except for caching, which is a relatively trivial fix).

    These essentially just allow us to get access to the single SHA256 version of Get{Prevouts,Sequence,Outputs} hash which are used both in the Taproot and CTV spec. They use the single hash versions because they are cheaper to compute, which reduces validation overhead. These refactors are non-functional, and just expose the ability to get the single hash when needed.

    The names of Get*Hash are renamed to Get*SHA256 to avoid confusion with the value returned previously.

  2. JeremyRubin requested review from sipa on Feb 5, 2020
  3. fanquake added the label Refactoring on Feb 5, 2020
  4. DrahtBot commented at 10:14 am on February 5, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19601 (Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256 (Alternative to #18071) by JeremyRubin)
    • #17977 (Implement BIP 340-342 validation (Schnorr/taproot/tapscript) by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. JeremyRubin force-pushed on Feb 5, 2020
  6. JeremyRubin renamed this:
    [WIP] Refactoring CHashWriter & moving some hashed fields around
    Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256
    on Feb 5, 2020
  7. JeremyRubin marked this as ready for review on Feb 5, 2020
  8. JeremyRubin force-pushed on Feb 7, 2020
  9. JeremyRubin force-pushed on Feb 7, 2020
  10. prestwich commented at 6:44 pm on February 17, 2020: none
    utACK
  11. DrahtBot added the label Needs rebase on Apr 16, 2020
  12. JeremyRubin force-pushed on Apr 21, 2020
  13. DrahtBot removed the label Needs rebase on Apr 21, 2020
  14. Sjors commented at 9:30 am on July 24, 2020: member
    Concept ACK on getting this refactor in separate from #17977 (Taproot) and #19055 (MuHash). Will review later.
  15. in src/hash.h:141 in 3f2b537fa8 outdated
    160      */
    161     inline uint64_t GetCheapHash() {
    162-        unsigned char result[CHash256::OUTPUT_SIZE];
    163-        ctx.Finalize(result);
    164-        return ReadLE64(result);
    165+        uint256 result = GetHash();
    


    Sjors commented at 10:22 am on July 24, 2020:
    Shouldn’t GetCheapHash be calling GetSHA256? I.e. “Cheap” refers to a single rather than a double hash.

    JeremyRubin commented at 4:53 pm on July 24, 2020:
  16. in src/script/interpreter.cpp:1262 in 09b832b57b outdated
    1258@@ -1259,33 +1259,33 @@ class CTransactionSignatureSerializer
    1259 };
    1260 
    1261 template <class T>
    1262-uint256 GetPrevoutHash(const T& txTo)
    1263+uint256 GetPrevoutSHA256(const T& txTo)
    


    Sjors commented at 10:45 am on July 24, 2020:
    Might as well rename these to plural prevouts and sequences: GetPrevoutsSHA256 and GetSequencesSHA256.

    JeremyRubin commented at 5:57 pm on July 24, 2020:
    done.
  17. in src/script/interpreter.cpp:1303 in 09b832b57b outdated
    1301-        hashSequence = GetSequenceHash(txTo);
    1302-        hashOutputs = GetOutputsHash(txTo);
    1303+        hashPrevouts = GetPrevoutSHA256(txTo);
    1304+        hashSequence = GetSequenceSHA256(txTo);
    1305+        hashOutputs = GetOutputsSHA256(txTo);
    1306+        SHA256Uint256(hashPrevouts);
    


    Sjors commented at 10:47 am on July 24, 2020:
    Would prefer to do this as a one-liner: hashPrevouts = SHA256Uint256(GetPrevoutSHA256(txTo)), as you do below.

    JeremyRubin commented at 4:54 pm on July 24, 2020:
    The point of not being a one liner is to make it easier for future work to just add a line of copying.
  18. Sjors commented at 10:50 am on July 24, 2020: member

    09b832b could mention in the description that several proposals require us to “get access to the single SHA256”.

    Should we rename GetHash() to GetDoubleSha256()?

  19. jnewbery commented at 12:11 pm on July 24, 2020: member
    There are some changes in #17977 to these commits, which should be added here if we merge this separately.
  20. JeremyRubin force-pushed on Jul 24, 2020
  21. JeremyRubin commented at 5:58 pm on July 24, 2020: contributor
    @Sjors requested changes made where applicable. @jnewbery could you be more specific? I scanned over the diffs and am not sure what you’re referring to.
  22. JeremyRubin commented at 8:29 pm on July 24, 2020: contributor

    Ah those changes are intentional for SHA256Uint256, since we use a r-value version at a callsite I figure it’s better to have an explicit version that does that.

    In this version, I make the r-value version return a hash (forcing lifetime consumption of it’s parameter) and make the reference version overwrite it’s parameter and return void. It should be harder to misuse this API as the types are more clear in the places we use it.

    In any case, if there is consensus preferring the approach currently in #17977, we’d want to minimally see changes to the Taproot version to both:

    1. change the parameter names for consistency.
    2. make the SHA256Uint256 take a const uint256& instead of a uint256&.

    I think those are all the differences, aside from the precomputed caches (which would be inappropriate to modify in this PR)? If there are other changes, please let me know.

  23. fjahr approved
  24. fjahr commented at 1:45 pm on July 25, 2020: member

    Code review ACK b0ac6eb86f98258d14800a8f913f977902aed856

    Micro-nit: the last commit has a typo in the commit message: Prvouts.

  25. Sjors commented at 9:02 am on July 27, 2020: member
    It’s still unresolved if the CheapHash function can use GetHash (instead of GetSHA256) without breaking Addrman’s bucketing. See #19055 (review)
  26. jnewbery commented at 10:35 am on July 27, 2020: member
    If the intention here is to have function signatures that are different from #17977, then I’m NACKish on this, since it would create additional rebase work for the author and reviewers of that PR.
  27. fjahr commented at 12:34 pm on July 27, 2020: member

    It’s still unresolved if the CheapHash function can use GetHash (instead of GetSHA256) without breaking Addrman’s bucketing. See #19055 (comment)

    I think that consideration can be left for a follow-up. It is a further behavior change that makes sense to review separately to make sure nothing breaks. And this change here is valuable enough without it for #17977 etc. So after it is merged the other PRs can be rebased and the CheapHash discussion can go on without blocking anything.

  28. JeremyRubin commented at 5:41 pm on July 27, 2020: contributor

    I agree with @fjahr, it’s best if this PR can make no functional changes. Am happy to make a follow-up PR that can be reviewed independently switching cheaphash to a single sha256 (or as sipa points out, even siphash). @Sjors is that OK? @jnewbery this PR has been open for half a year, and is a part of my review of the Taproot code. The below should be about the only diff required as a result of interface changes, which mostly have to be patched into 41d08f5d77f52bec0e31bb081d85fff2d67d0467. This code is more clear because m_*_hash is not touched once it has been set.

    Regardless of this PR, the function signatures should be changed to a const ref. It’s not clear to me that it’s defined behavior to pass an r-value as a non-const ref, which is done in the taproot code currently (hence the interface changes here – clearly no UB). If there is consensus that the interfaces should not change, preferring the approach in #17977, I can refactor this PR to be similar.

    edit: I guess it’s not using UB… I could have sworn somewhere that the input to SHA256Uint256 was a non const…

     0diff --git a/src/script/interpreter.cpp b/src/script/interpreter.cpp
     1index 9ec960d17..003c873d9 100644
     2--- a/src/script/interpreter.cpp
     3+++ b/src/script/interpreter.cpp
     4@@ -1408,12 +1408,12 @@ void PrecomputedTransactionData::Init(const T& txTo, std::vector<CTxOut> spent_o
     5 
     6     // Cache is calculated only for transactions with witness
     7     if (txTo.HasWitness()) {
     8-        m_prevouts_hash = GetPrevoutHash(txTo);
     9-        hashPrevouts = SHA256Uint256(m_prevouts_hash);
    10-        m_sequences_hash = GetSequenceHash(txTo);
    11-        hashSequence = SHA256Uint256(m_sequences_hash);
    12-        m_outputs_hash = GetOutputsHash(txTo);
    13-        hashOutputs = SHA256Uint256(m_outputs_hash);
    14+        hashPrevouts = m_prevouts_hash = GetPrevoutsSHA256(txTo);
    15+        SHA256Uint256(hashPrevouts);
    16+        hashSequence = m_sequences_hash = GetSequencesSHA256(txTo);
    17+        SHA256Uint256(hashSequence);
    18+        hashOutputs = m_outputs_hash = GetOutputsSHA256(txTo);
    19+        SHA256Uint256(hashOutputs);
    
  29. in src/hash.cpp:83 in b0ac6eb86f outdated
    76@@ -77,3 +77,14 @@ void BIP32Hash(const ChainCode &chainCode, unsigned int nChild, unsigned char he
    77     num[3] = (nChild >>  0) & 0xFF;
    78     CHMAC_SHA512(chainCode.begin(), chainCode.size()).Write(&header, 1).Write(data, 32).Write(num, 4).Finalize(output);
    79 }
    80+
    81+void SHA256Uint256(uint256& hash_io)
    82+{
    83+    CSHA256().Write(hash_io.begin(), 32).Finalize(hash_io.begin());
    


    instagibbs commented at 8:35 pm on July 30, 2020:
    nano nit: hash_io.size()

    JeremyRubin commented at 8:55 pm on July 30, 2020:
    We do this a ton of places in the codebase – might be nice for someone to add a Write/Finalize interface that is uint256 aware and fix these all at once?

    sipa commented at 9:14 pm on July 30, 2020:
    See #19326 (which only does it for the hash.h ones and not the crypto/* ones, but certainly makes sense as a follow-up there too).
  30. in src/hash.cpp:88 in b0ac6eb86f outdated
    83+    CSHA256().Write(hash_io.begin(), 32).Finalize(hash_io.begin());
    84+}
    85+
    86+uint256 SHA256Uint256(uint256&& hash_io)
    87+{
    88+    CSHA256().Write(hash_io.begin(), 32).Finalize(hash_io.begin());
    


    instagibbs commented at 8:35 pm on July 30, 2020:
    nano nit: hash_io.size()
  31. instagibbs approved
  32. instagibbs commented at 8:38 pm on July 30, 2020: member

    I have a mild preference for this PR. Both seem correct.

    code review ACK https://github.com/bitcoin/bitcoin/pull/18071/commits/b0ac6eb86f98258d14800a8f913f977902aed856

    Nano nits can be ignored.

  33. luke-jr approved
  34. luke-jr commented at 0:59 am on July 31, 2020: member
    utACK
  35. Sjors commented at 5:32 pm on July 31, 2020: member

    I agree with @fjahr, it’s best if this PR can make no functional changes. Am happy to make a follow-up PR that can be reviewed independently switching cheaphash to a single sha256 (or as sipa points out, even siphash). @Sjors is that OK?

    That makes no sense; cheaphash uses a single sha256. This PR changes it to a double hash, so it’s a functional change. Unless I’m reading it wrong.

  36. JeremyRubin commented at 5:42 pm on July 31, 2020: contributor
    @Sjors yes you are reading it wrong. The type of ctx changes from double hashed writer to a single hashed writer.
  37. DrahtBot added the label Needs rebase on Aug 3, 2020
  38. Add single sha256 call to CHashWriter 97fe971b11
  39. Add SHA256Uint256 helper functions 425ba916d8
  40. JeremyRubin force-pushed on Aug 3, 2020
  41. JeremyRubin force-pushed on Aug 3, 2020
  42. Refactor Get{Prevout,Sequence,Outputs}Hash to Get{Prevouts,Sequences,Outputs}SHA256.
    Several proposals (Taproot, MuHash, CTV) require access to the single
    hash.
    03b1cf8e93
  43. JeremyRubin force-pushed on Aug 3, 2020
  44. DrahtBot removed the label Needs rebase on Aug 3, 2020
  45. JeremyRubin commented at 7:54 pm on August 3, 2020: contributor

    I’ve rebased #18071 following the hash.h span changes (but not #19601 – I can rebase that one if there is a preference for that approach).

    pre-rebase summary: @jnewbery nack-ish (#17977 conflicts may prefer #19601 approach?) @instagibbs code review ACK (slightly prefers #18071 over #19601) @fjahr code review ACK (micro nit fixed) @luke-jr utack @prestwich utack @Sjors concept-ack

    Please check the rebase diff at your convenience! @sipa it would be good if you could weigh in on if this is acceptable or if #19601 is better.

    I think since merge of the span.h changes causes #17977 to need to rebase, it might be a decent time to rebase onto this branch without too much extra work.

  46. in src/hash.h:201 in 425ba916d8 outdated
    193@@ -194,6 +194,12 @@ uint256 SerializeHash(const T& obj, int nType=SER_GETHASH, int nVersion=PROTOCOL
    194     return ss.GetHash();
    195 }
    196 
    197+/** Single-SHA256 a 32-byte input (represented as uint256).
    198+ *  hash_io is read and written over with the result.
    199+ */
    200+void SHA256Uint256(uint256& hash_io);
    201+uint256 SHA256Uint256(uint256&& hash_io);
    


    sipa commented at 8:04 pm on August 3, 2020:
    What is the point of having these functions instead of a single uint256 SHA256Uint256(const uint256& input) ?

    JeremyRubin commented at 0:36 am on August 4, 2020:
    One benefit is that both versions operate completely in-place on their arguments (otherwise we’re reading from a pointer, writing to the stack, then writing to a pointer). Another benefit is that it is slightly harder to misuse this because when you are operating on a l-value there is no return value, which otherwise you might forget to assign.

    sipa commented at 7:33 pm on August 4, 2020:
    That could be solved using NODISCARD instead. I don’t think saving a super brief 32 bytes on the stack is even going to be be observable.
  47. Sjors commented at 9:11 am on August 4, 2020: member

    Ah, I missed the switch from CHash256 (double) to CSHA256 (single). I wrote a test, which this PR indeed doesn’t break:

    0BOOST_AUTO_TEST_CASE(getcheaphash)
    1{
    2    CHashWriter ss(SER_DISK, CLIENT_VERSION);
    3    ss << 1;
    4    BOOST_CHECK_EQUAL(ss.GetCheapHash(), 0x8D07CCE5F258F741ULL);    
    5}
    
  48. JeremyRubin commented at 9:04 pm on August 7, 2020: contributor
    Since #19601 seems to be preferred by @sipa, @practicalswift, and @jnewbery, going to close #18071.
  49. JeremyRubin closed this on Aug 7, 2020

  50. fanquake referenced this in commit f8462a6d27 on Aug 25, 2020
  51. sidhujag referenced this in commit 16d841beaa on Aug 25, 2020
  52. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me