Use SipHash-2-4 for various non-cryptographic hashes #8020

pull sipa wants to merge 4 commits into bitcoin:master from sipa:siphash changing 9 files +166 −88
  1. sipa commented at 8:01 pm on May 6, 2016: member

    Use SipHash-2-4 for:

    • CCoinsViewCache hashmap (instead of a custom Lookup3/XOR scheme)
    • CTxMempool::mapTx txid index (converting from an ordered map)
    • Address relay peer selection (instead of SHA256)

    Computing a hash for a txid using this takes around 52 CPU cycles in benchmarks (the Lookup3/XOR based mechanism used for CCoinsViewCache took 31 cycles). I think that’s negligible still, while using a much more standard construct designed for such purposes.

  2. gmaxwell commented at 8:24 pm on May 6, 2016: contributor
    Concept ACK.
  3. pstratem commented at 9:11 pm on May 6, 2016: contributor
    Concept ACK ad00e3af31df0655c1cdc95b8d29dcc1ec413e5b
  4. in src/main.cpp: in ad00e3af31 outdated
    4721                     {
    4722                         if (pnode->nVersion < CADDR_TIME_VERSION)
    4723                             continue;
    4724-                        unsigned int nPointer;
    4725+                        uint64_t nPointer = 0;
    4726                         memcpy(&nPointer, &pnode, sizeof(nPointer));
    


    kazcw commented at 1:06 am on May 7, 2016:
    What if pnode is 32-bit?

    sipa commented at 2:25 am on May 7, 2016:
    Nice catch. I’ve replaced it by using pnode->id.
  5. sipa force-pushed on May 7, 2016
  6. laanwj commented at 8:27 am on May 7, 2016: member
    Concept ACK Python took the same step for 3.4 to convert their hash algorithms to SipHash, and they have similar motivations of mitigating collision attacks, so we’re in good company: https://www.python.org/dev/peps/pep-0456/
  7. laanwj added the label UTXO Db and Indexes on May 7, 2016
  8. sipa force-pushed on May 8, 2016
  9. sipa commented at 11:13 pm on May 8, 2016: member

    Here is a comment about using SipHash-1-3 instead of SipHash-2-4: https://github.com/rust-lang/rust/issues/29754#issuecomment-156073946

    It would be approximately twice as fast for hashing txids, and the distinguisher mentioned in that comment is not relevant for data that is a multiple of 8 bytes (which is the case for us), as 8 bytes of padding are added in that case. Opinions?

  10. laanwj commented at 10:39 am on May 9, 2016: member

    Siphash 2-4 seems to be the defacto ‘safe’ choice. Siphash 1-3 happens to be secure for our use case right now, but that sounds a tad brittle: as if a small change in our use case, or a small advancement in cryptoanalysis of the function, could mean that the known weakness does suddenly affect us?

    Not sure how much hashing the txids is a bottleneck in the whole scheme of things around CCoinsCache, it’s only 32 bytes after all. Conservatively I’d say only consider ‘downgrading’ if performance is seriously affected by going from 29 cycles to 49 cycles.

    On the other hand it’s pretty nice if using a hash with better security properties could be faster than the Lookup3/XOR scheme, so I see the attraction in that.

  11. sipa commented at 5:28 pm on May 9, 2016: member

    Did some better benchmarks:

    • Current Lookup3-based approach: 31 cycles
    • SipHash-2-4: 52 cycles
    • SipHash-1-3: 32 cycles
  12. dcousens commented at 3:14 am on May 11, 2016: contributor
    Ha, I thought SipHash was something home grown by @sipa… TIL. LGTM, for the record, do we have a comparison with cycle counts of the other hash functions? (SHA1 say?)
  13. sipa commented at 4:14 am on May 11, 2016: member

    @dcousens Based on numbers I get from #8039:

    • SHA1: 577 cycles (1 block, 64 bytes)
    • SHA256: 1504 cycles (1 block, 64 bytes)
    • SHA512: 1988 cycles (1 block, 128 bytes)
    • RIPEMD160: 751 cycles (1 block, 64 bytes)
  14. dcousens commented at 4:25 am on May 11, 2016: contributor
    Thanks @sipa :+1:
  15. in src/hash.cpp: in d92580d8e9 outdated
    137+}
    138+
    139+uint64_t SipHashUint256(uint64_t k0, uint64_t k1, const uint256& val)
    140+{
    141+    /* Specialized implementation for efficiency */
    142+    const uint64_t* data = (const uint64_t*)(val.begin());
    


    TheBlueMatt commented at 11:37 pm on May 15, 2016:
    It’d be nice to not break aliasing rules here? Shouldnt the compiler optimize out a manual-shift-or?

    sipa commented at 4:49 am on May 16, 2016:
    Fixed.
  16. sipa force-pushed on May 16, 2016
  17. sipa commented at 10:38 pm on May 16, 2016: member

    Going to stick to SipHash-2-4 for now. We can always switch to something else later.

    Ready for merge, I think.

  18. pstratem commented at 4:49 am on May 17, 2016: contributor
    utACK 658c6481616f12a3466d1ddeb23eb91788ab2e66
  19. gmaxwell commented at 7:46 am on May 17, 2016: contributor
    ACK
  20. Add SipHash-2-4 primitives to hash 0b1295b066
  21. Use SipHash-2-4 for CCoinsCache index
    This is ~1.7x slower than the Lookup3-of-Xor-with-salt construct we were
    using before, but it is a primitive designed for exactly this.
    382c871d28
  22. Switch CTxMempool::mapTx to use a hash index for txids 8cc9cfe160
  23. Use SipHash-2-4 for address relay selection a68ec21f7e
  24. in src/uint256.h: in 658c648161 outdated
    82@@ -83,6 +83,12 @@ class base_blob
    83         return sizeof(data);
    84     }
    85 
    86+    uint64_t GetUint64(int pos) const
    87+    {
    88+        const uint8_t* ptr = data + pos * 8;
    89+        return ((uint64_t)ptr[0]) | ((uint64_t)ptr[1]) << 8 | ((uint64_t)ptr[2]) << 16 | ((uint64_t)ptr[3]) << 24 | ((uint64_t)ptr[4]) << 32 | ((uint64_t)ptr[5]) << 40 | ((uint64_t)ptr[6]) << 48 | ((uint64_t)ptr[7]) << 56;
    


    TheBlueMatt commented at 5:31 pm on May 17, 2016:

    Can you make this more readable?

    Something like

    0                                 (uint64_t(ptr[0]) <<  0) |
    1                                 (uint64_t(ptr[1]) <<  8) |
    2                                 (uint64_t(ptr[2]) << 16) |
    3                                 (uint64_t(ptr[3]) << 24) |
    4                                 (uint64_t(ptr[4]) << 32) |
    5                                 (uint64_t(ptr[5]) << 40) |
    6                                 (uint64_t(ptr[6]) << 48) |
    7                                 (uint64_t(ptr[7]) << 56))
    

    sipa commented at 6:15 pm on May 17, 2016:
    Done.
  25. in src/hash.cpp: in 658c648161 outdated
    137+}
    138+
    139+uint64_t SipHashUint256(uint64_t k0, uint64_t k1, const uint256& val)
    140+{
    141+    /* Specialized implementation for efficiency */
    142+    uint64_t d = le64toh(val.GetUint64(0));
    


    TheBlueMatt commented at 5:32 pm on May 17, 2016:
    I dont think you need the le64toh now that you’re using GetUint64?

    sipa commented at 6:15 pm on May 17, 2016:
    Oops, fixed.
  26. in src/coins.h: in 658c648161 outdated
    296@@ -295,7 +297,7 @@ struct CCoinsCacheEntry
    297     CCoinsCacheEntry() : coins(), flags(0) {}
    298 };
    299 
    300-typedef boost::unordered_map<uint256, CCoinsCacheEntry, CCoinsKeyHasher> CCoinsMap;
    301+typedef boost::unordered_map<uint256, CCoinsCacheEntry, SaltedTxidHasher> CCoinsMap;
    


    TheBlueMatt commented at 5:57 pm on May 17, 2016:
    Does std/boost::unordered_map guarantee that this will not create SaltedTxidHashers willy-nilly, using different salts?

    sipa commented at 6:15 pm on May 17, 2016:
    Yes, a single hasher is constructed at map creation time.
  27. sipa force-pushed on May 17, 2016
  28. TheBlueMatt commented at 6:47 pm on May 17, 2016: member
    ACK a68ec21f7ed2978d8945a0f4cfd7e80bfa5fd917
  29. laanwj merged this on May 18, 2016
  30. laanwj closed this on May 18, 2016

  31. laanwj referenced this in commit 5e374f7306 on May 18, 2016
  32. rebroad commented at 8:58 am on August 9, 2016: contributor
    I’m trying to work out what problem these siphashes fix. What is the risk that this solves please?
  33. laanwj commented at 1:46 pm on August 13, 2016: member

    It’s fast and cryptographically solid - which reduces DoS risk by attackers remotely predicting the behavior of the hash function.

    As I quoted above already, pretty much same rationale as for Python: https://www.python.org/dev/peps/pep-0456/

  34. codablock referenced this in commit f4b4e39344 on Sep 16, 2017
  35. codablock referenced this in commit 5b9905b525 on Sep 19, 2017
  36. codablock referenced this in commit eed4e854a6 on Sep 27, 2017
  37. codablock referenced this in commit 8b28f5f995 on Dec 21, 2017
  38. elichai commented at 9:17 am on January 30, 2020: contributor

    Here is a comment about using SipHash-1-3 instead of SipHash-2-4: rust-lang/rust#29754 (comment)

    It would be approximately twice as fast for hashing txids, and the distinguisher mentioned in that comment is not relevant for data that is a multiple of 8 bytes (which is the case for us), as 8 bytes of padding are added in that case. Opinions?

    For future reference. Because of all the interesting results @jamesob got from the hashmap modifications, I decided to try benchmarking siphash2-4 vs 1-3 by replacing SaltedOutpointHasher and SaltedTxidHasher with SipHash1-3 and this is the result:

    Command: /home/profiler/bitcoin/src/bitcoind -datadir="/home/profiler/.bitcoin" -dbcache=12288 -debug=0 -nodebuglogfile -printtoconsole=0 -reindex-chainstate -stopatheight=600000

    Before:

    0real    119m16.133s
    1user    137m38.092s
    2sys     2m30.990s
    

    After:

    0real    115m6.045s
    1user    133m53.661s
    2sys     2m30.177s
    

    Spec: i7-8700 CPU @ 3.20GHz, 32GB RAM, NVMe SSD.

  39. furszy referenced this in commit 6881e1063f on Aug 3, 2020
  40. zkbot referenced this in commit c81adc3718 on Feb 15, 2021
  41. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me