Divergence between 32- and 64-bit when hashing >4GB affects gettxoutsetinfo #7848

pull laanwj wants to merge 3 commits into bitcoin:master from laanwj:2016_04_hash_4gb_utxoset changing 6 files +12 −4
  1. laanwj commented at 6:44 am on April 9, 2016: member

    I was seeing differerent UTXO hashes from gettxoutsetinfo on my ARM nodes than on x86 (64 bit). After the initial panic had weared off, I eventually managed to trace this to the following:

    The counters for SHA256 and friends are supposed to be 64 bit. size_t, the type currently used, is 32-bit on architectures with a 32-bit address space. So after processing 4GB of data, the counter will wrap around on 32-bit and the hashes will start to diverge compared to those computed on 64-bit architectures.

    I am sure this has no effect on any other use of hashes by the project, as there is no other place where >4GB of data is hashed in one go.

    N.B. as this changes the output of gettxoutsetinfo on many platforms anyway (needs mention in release notes) I’ve taken the liberty of addressing the direct concern in #7758 as well in a second commit.

    Before first commit

     0// [32-bit]
     1{
     2  "height": 406339,
     3  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
     4  "transactions": 9254293,
     5  "txouts": 35345291,
     6  "bytes_serialized": 1214932313,
     7  "hash_serialized": "992696b26016449d90736ff28d0776167014a5d916298963331eb2c9d9d36928",
     8  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
     9  "total_amount": 15408336.11047113
    10}
    11
    12// [64-bit]
    13{
    14  "height": 406339,
    15  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
    16  "transactions": 9254293,
    17  "txouts": 35345291,
    18  "bytes_serialized": 1214932313,
    19  "hash_serialized": "acaf2a5c17a8264ea7ee8aecba3dccb9efe6c8ee1861492fff5013ff4b7a14fc",
    20  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
    21  "total_amount": 15408336.11047113
    22}
    

    After first commit

     0// [32-bit]
     1{
     2  "height": 406339,
     3  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
     4  "transactions": 9254293,
     5  "txouts": 35345291,
     6  "bytes_serialized": 1214932313,
     7  "hash_serialized": "acaf2a5c17a8264ea7ee8aecba3dccb9efe6c8ee1861492fff5013ff4b7a14fc",
     8  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
     9  "total_amount": 15408336.11047113
    10}
    11
    12[64-bit]
    13// {
    14  "height": 406339,
    15  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
    16  "transactions": 9254293,
    17  "txouts": 35345291,
    18  "bytes_serialized": 1214932313,
    19  "hash_serialized": "acaf2a5c17a8264ea7ee8aecba3dccb9efe6c8ee1861492fff5013ff4b7a14fc",
    20  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
    21  "total_amount": 15408336.11047113
    22}
    

    After second commit

     0// [32-bit]
     1{
     2  "height": 406339,
     3  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
     4  "transactions": 9254293,
     5  "txouts": 35345291,
     6  "bytes_serialized": 1214932313,
     7  "hash_serialized": "1814733d4f783cccfc4e5046b69e08077af4b26dc2279825a87c7d67522112a1",
     8  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
     9  "total_amount": 15408336.11047113
    10}
    11
    12// [64-bit]
    13{
    14  "height": 406339,
    15  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
    16  "transactions": 9254293,
    17  "txouts": 35345291,
    18  "bytes_serialized": 1214932313,
    19  "hash_serialized": "1814733d4f783cccfc4e5046b69e08077af4b26dc2279825a87c7d67522112a1",
    20  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
    21  "total_amount": 15408336.11047113
    22}
    
  2. laanwj added the label RPC/REST/ZMQ on Apr 9, 2016
  3. laanwj added the label Validation on Apr 9, 2016
  4. jonasschnelli commented at 2:05 pm on April 11, 2016: contributor

    What would be the downside of a forced 64bit type on 32bit platform? I guess the performance reduction and the slightly higher memory consumption is totally negligible?

    Would fixing it “on the other side” (detect size_t overflow while hashing, force the 32bit reset on 64bit platforms) not involve less risks?

  5. sipa commented at 2:23 pm on April 11, 2016: member

    We should replace the utxo hash reported by gettxoutsetinfo by a Merkle root of a tree whose leaves are the UTXO entries in a simple serialized format. That would allow producing short proofs about the existance or non-existance of certain entries to anyone who gets such a hash from a trusted source.

    This isn’t hard to do using a variation of the algorithm in consensus/merkle.cpp, but it’s not my priority now.

  6. theuni commented at 7:06 am on April 12, 2016: member
    @laanwj Nice catches. utACK. @jonasschnelli I think it’s not worth the trouble. For platforms constrained enough for the change to be significant, I suspect that a separate hashing implementation would’ve already been patched in.
  7. laanwj commented at 7:42 am on April 14, 2016: member
    @sipa I know you have bigger plans for this. Do you disagree, though, that this is an improvement? I’m using gettxoutsetinfo a lot so it would really help me if this is fixed. @jonasschnelli I’d say doing the hashing as it is supposed to be done, with counters of the appropriate width - as done here - is less risky than trying to hack something with overflow detection? Adding a 64-bit with 32-bit value is cheap on 32-bit architectures as well, it even involves overflow detection (the carry flag) internally.
  8. gmaxwell commented at 8:54 am on April 14, 2016: contributor
    The hash function code should be correct with large inputs regardless of what the utxoset hash stuff returns. utACK.
  9. sipa commented at 9:09 am on April 14, 2016: member

    @laanwj Oops, I just responded to @jonasschnelli’s comment, not seeing there was a patch too.

    Yes, uint64_t should definitely be used for at least SHA1, SHA256 and RIPEMD160. In theory, we’d need to use a 128-bit integer for SHA512 (as it writes a 128-bit length descriptor in the padding), but I guess we’ll never hit that.

  10. gmaxwell commented at 9:13 am on April 14, 2016: contributor
    there can be an assert to catch the overflow in the sha512 case, if we want to be pedantic?
  11. laanwj commented at 10:04 am on April 14, 2016: member

    In theory, we’d need to use a 128-bit integer for SHA512 (as it writes a 128-bit length descriptor in the padding), but I guess we’ll never hit that.

    Good catch!

    Although implementing actual 128-bit counting is just as much work as adding an overflow assertion, I’d personally say adding an assert is preferable here, as we can never test this behavior.

  12. sipa commented at 10:06 am on April 14, 2016: member
    The assert should probably exist for SHA1/256/RIPEMD160 as well, as those are undefined for longer messages.
  13. laanwj commented at 1:32 pm on April 14, 2016: member
    Hmm looking at the actual code I’ll leave that for another time. At least this is trivially correct. I feel that the risk of someone hashing 18,446,744 TB of data is extremely low in the foreseeable future (even with the unfortunate rate of UTXO growth that will take a while), at least smaller than the risk of me introducing bugs there.
  14. sipa commented at 1:35 pm on April 14, 2016: member
    utACK 088c270c2928089f1903db0ff01c82eaaa5f6e45
  15. MarcoFalke commented at 8:51 am on April 15, 2016: member
    Concept ACK
  16. crypto: bytes counts are 64 bit
    Byte counts for SHA256, SHA512, SHA1 and RIPEMD160 must be 64 bits.
    `size_t` has a different size per platform, causing divergent results
    when hashing more than 4GB of data.
    9ad1a51857
  17. laanwj force-pushed on Apr 15, 2016
  18. rpc: make sure `gettxoutsetinfo` hash has txids
    The key (transaction id for the following outputs) should be serialized
    to the HashWriter.
    
    This is a problem as it means different transactions in the same
    position with the same outputs will potentially result in the same hash.
    
    Fixes primary concern of #7758.
    76212bbc6a
  19. doc: update release-notes for `gettxoutsetinfo` change 28b400f7d1
  20. laanwj force-pushed on Apr 15, 2016
  21. gavinandresen commented at 5:13 pm on April 15, 2016: contributor

    Tested ACK, on an OSX 64-bit machine.

    On a debug build on my machine, hashing the txids makes gettxoutsetinfo about ten seconds slower (3 minutes versus 2 minutes 50 seconds).

  22. laanwj merged this on Apr 18, 2016
  23. laanwj closed this on Apr 18, 2016

  24. laanwj referenced this in commit 88616d2008 on Apr 18, 2016
  25. codablock referenced this in commit 60928cf022 on Sep 16, 2017
  26. codablock referenced this in commit c81ac15bc5 on Sep 19, 2017
  27. codablock referenced this in commit cfb8b82dc3 on Dec 20, 2017
  28. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me