Faster sigcache nonce #13204

pull JeremyRubin wants to merge 2 commits into bitcoin:master from JeremyRubin:faster-sigcache-nonce changing 4 files +71 −12
  1. JeremyRubin commented at 6:26 pm on May 9, 2018: contributor

    This PR replaces nonces in two places with pre-salted hashers.

    The nonce is chosen to be 64 bytes long so that it forces the SHA256 hasher to process the chunk. This leaves the next 64 (or 56 depending if final chunk) open for data. In the case of the script execution cache, this does not make a big performance improvement because the nonce was already properly padded to fit into one buffer, but does make the code a little simpler. In the case of the sig cache, this should reduce the hashing overhead slightly because we are less likely to need an additional processing step.

    I haven’t benchmarked this, but back of the envelope it should reduce the hashing by one buffer for all combinations except compressed public keys with compact signatures.

  2. laanwj added the label Validation on May 9, 2018
  3. in src/script/sigcache.cpp:35 in 9c8cba86fd outdated
    32 
    33 public:
    34     CSignatureCache()
    35     {
    36-        GetRandBytes(nonce.begin(), 32);
    37+        base_blob<64*8> nonce;
    


    jimpo commented at 10:29 pm on May 14, 2018:
    nit: unsigned char nonce[64] ought to work fine.
  4. jimpo approved
  5. jimpo commented at 10:33 pm on May 14, 2018: contributor
    utACK 9c8cba8
  6. laanwj referenced this in commit 3c2a41a9fc on May 23, 2018
  7. DrahtBot closed this on Jul 22, 2018

  8. DrahtBot reopened this on Jul 22, 2018

  9. DrahtBot commented at 6:55 pm on September 7, 2018: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #18815 (bench: Add logging benchmark by MarcoFalke)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  10. sipa commented at 3:47 am on November 10, 2018: member
    Benchmarks would be welcome.
  11. DrahtBot closed this on Apr 28, 2019

  12. DrahtBot commented at 7:11 pm on April 28, 2019: member
  13. DrahtBot reopened this on Apr 28, 2019

  14. MarcoFalke commented at 0:27 am on April 29, 2019: member
    This will be closed due to inactivity in two weeks
  15. MarcoFalke commented at 2:15 pm on May 7, 2019: member
    Closing for now. Let me know when you want this reopened to work on it again.
  16. MarcoFalke closed this on May 7, 2019

  17. gmaxwell commented at 7:23 pm on May 7, 2019: contributor
    This seems like an obvious (minor) win. It does need to be benchmarked, but it could just be an informal test, not some benchmark tool checked in.
  18. MarcoFalke commented at 8:21 pm on May 7, 2019: member

    The tests were failing, so at the very least this needs to be fixed:

    0test_bitcoin-qt: random.cpp:577: void ProcRand(unsigned char*, int, RNGLevel): Assertion `num <= 32' failed.
    
  19. MarcoFalke reopened this on May 7, 2019

  20. jamesob commented at 8:41 pm on May 7, 2019: member

    Oddly, pruned IBD (500_000 - 505_000) comparison indicates this PR slightly slower than master. Maybe I’ll try again after a rebase?

    nonce ibd local 500000 505000 dbcache=2048

    faster-sigcache-nonce vs. master (absolute)

    name iterations faster-sigcache-nonce master
    build.make.23.clang 1 125.9031 (± 0.0000) 135.5557 (± 0.0000)
    build.make.23.clang.mem-usage 1 700992.0000 (± 0.0000) 647144.0000 (± 0.0000)
    ibd.local.500000.505000.dbcache=2048 3 472.4958 (± 2.4356) 439.0951 (± 2.0998)
    ibd.local.500000.505000.dbcache=2048.mem-usage 3 2434952.0000 (± 2700.1412) 2417438.6667 (± 1709.0577)

    faster-sigcache-nonce vs. master (relative)

    name iterations faster-sigcache-nonce master
    build.make.23.clang 1 1.000 1.08
    build.make.23.clang.mem-usage 1 1.083 1.00
    ibd.local.500000.505000.dbcache=2048 3 1.076 1.00
    ibd.local.500000.505000.dbcache=2048.mem-usage 3 1.007 1.00
  21. JeremyRubin force-pushed on May 7, 2019
  22. JeremyRubin commented at 8:50 pm on May 7, 2019: contributor
    Rebased – go ahead and retry!
  23. in src/validation.cpp:1348 in f6b9da76b6 outdated
    1344@@ -1345,9 +1345,14 @@ int GetSpendHeight(const CCoinsViewCache& inputs)
    1345 
    1346 
    1347 static CuckooCache::cache<uint256, SignatureCacheHasher> scriptExecutionCache;
    1348-static uint256 scriptExecutionCacheNonce(GetRandHash());
    1349+static CSHA256 scriptExecutionCacheHasher;
    


    MarcoFalke commented at 10:41 pm on May 7, 2019:

    style-nit:

    0static CSHA256 g_scriptExecutionCacheHasher;
    

    Also, should probably squash the third commit, so that all commits compile and pass the tests?


    JeremyRubin commented at 10:49 pm on May 7, 2019:
    I’d prefer to leave it intact as we would want to undo this commit in the future if we get rid of the 32 byte limit on GetRand which did not exist at the time this PR was made, but I’m happy to squash if it’s your preference
  24. gmaxwell commented at 11:14 pm on May 7, 2019: contributor
    Irrelevant nit: I’d recommend just zerofilling the input rather than calling getrand again. 256-bits is more than then enough (and less wouldn’t make getrand any faster).
  25. jamesob commented at 3:00 am on May 8, 2019: member

    Current revision fails to build. Benched the rebase and didn’t see much difference between master and this branch, though I don’t like how much variance there was so I’m going to run again once this compiles.

    ibd local 500000 505000 dbcache=2048

  26. JeremyRubin commented at 10:09 pm on May 9, 2019: contributor
    fixed build issue; travis needs to be kicked @MarcoFalke
  27. MarcoFalke commented at 10:40 pm on May 9, 2019: member
    kicked travis, @jamesob needs to be kicked
  28. MarcoFalke commented at 10:41 pm on May 9, 2019: member
    I’d prefer if the commits were squashed, so that git bisect would’t break on half of them
  29. JeremyRubin commented at 10:48 pm on May 9, 2019: contributor
    I can squash them down to one commit, unless you prefer the original two?
  30. MarcoFalke commented at 12:26 pm on May 10, 2019: member
    One commit is fine. Makes it trivial to see that bisect won’t break.
  31. jamesob commented at 1:38 pm on May 10, 2019: member

    Yeah I need a good kick once in a while.

    Benching shows this as being negligibly slower than master.

    ibd local 500005 505000 dbcache=2048

    faster-sigcache-nonce vs. master (absolute)

    bench name x faster-sigcache-nonce master
    ibd.local.500000.505000.dbcache=2048 1 310.3099 (± 0.0000)
    ibd.local.500000.505000.dbcache=2048.mem-usage 1 2459212.0000 (± 0.0000)
    ibd.local.500005.505000.dbcache=2048 2 323.1098 (± 4.7144) 319.6353 (± 1.7855)
    ibd.local.500005.505000.dbcache=2048.mem-usage 2 2457362.0000 (± 1446.0000) 2464301.3333 (± 5965.4940)

    faster-sigcache-nonce vs. master (relative)

    bench name x faster-sigcache-nonce master
    ibd.local.500000.505000.dbcache=2048 1 1.00
    ibd.local.500000.505000.dbcache=2048.mem-usage 1 1.00
    ibd.local.500005.505000.dbcache=2048 2 1.01 1.000
    ibd.local.500005.505000.dbcache=2048.mem-usage 2 1.00 1.003
  32. MarcoFalke commented at 5:48 pm on May 10, 2019: member

    The code is only run when fScriptCheck (see https://github.com/bitcoin/bitcoin/pull/13868/files#diff-24efdb00bfbe56b140fb006b562cc70bR1847)

    You might want to check height=[565k-575k]

  33. JeremyRubin commented at 10:19 pm on May 10, 2019: contributor

    Are you able to just look at the time spent in connect block as opposed to the aggregate time.

    Thats the time we reasonably care about, the total time is not granular enough

  34. gmaxwell commented at 11:06 pm on May 11, 2019: contributor
    If it’s making total time worse though, that suggests either we’re suffering measurement error or something interesting is going on. Indeed, this only applies where scriptchecks are taking effect. Use assumevalid=0 ?
  35. jamesob commented at 3:25 pm on May 13, 2019: member

    Latest bench with -assumevalid=0 is consistent with previous results: basically no difference between this and master. I think the only way to sensibly proceed on this is if someone comes up with an actually-relevant microbench that I can measure. I ran the microbenches on this last run, but it wasn’t clear to me which bench was actually worth looking at, and the total runtime was swamped by (spurious?) prevector differences.

    NACK from me until we come up with a clear way of showing perf improvements.

    ibd local 500000 505000 dbcache=2048

  36. MarcoFalke commented at 4:00 pm on May 13, 2019: member
    Could we remove the caching if it has no impact with noassumevalid (other than code complexity)?
  37. MarcoFalke commented at 4:31 pm on May 13, 2019: member
    I guess the only way to see a difference is to populate a mempool and then create a block with and without the cache enabled?
  38. JeremyRubin commented at 5:30 pm on May 13, 2019: contributor
    @jamesob are the times you are showing total time or just time inside or block validation/check inputs?
  39. MarcoFalke commented at 5:34 pm on May 13, 2019: member
    Total time
  40. JeremyRubin commented at 6:21 pm on May 13, 2019: contributor
    It should be fairly easy to pull out the validation times from the debug log and compare those.
  41. MarcoFalke commented at 6:25 pm on May 13, 2019: member
  42. JeremyRubin force-pushed on May 13, 2019
  43. JeremyRubin commented at 6:53 pm on May 13, 2019: contributor
    rebased, squashed, applied @gmaxwell’s nit (was easiest to just write the nonce twice rather than write zeros, which would have added more lines).
  44. in src/script/sigcache.cpp:40 in 51b6e73377 outdated
    34     CSignatureCache()
    35     {
    36-        GetRandBytes(nonce.begin(), 32);
    37+        uint256 nonce = GetRandHash();
    38+        salted_hasher.Write(nonce.begin(), 32);
    39+        salted_hasher.Write(nonce.begin(), 32);
    


    ryanofsky commented at 6:12 pm on July 23, 2019:

    It’d be good to have a code comment here saying why this writes the same nonce twice, since it does look strange. The PR description has a good explanation, but there should be some hint in the code.

    The nonce is chosen to be 64 bytes long so that it forces the SHA256 hasher to process the chunk. This leaves the next 64 (or 56 depending if final chunk) open for data. In the case of the script execution cache, this does not make a big performance improvement because the nonce was already properly padded to fit into one buffer, but does make the code a little simpler. In the case of the sig cache, this should reduce the hashing overhead slightly because we are less likely to need an additional processing step.


    JeremyRubin commented at 8:14 pm on July 24, 2019:
    Comment added.
  45. in src/validation.cpp:1488 in 51b6e73377 outdated
    1348-static uint256 scriptExecutionCacheNonce(GetRandHash());
    1349+static CuckooCache::cache<uint256, SignatureCacheHasher> g_scriptExecutionCache;
    1350+static CSHA256 g_scriptExecutionCacheHasher;
    1351 
    1352 void InitScriptExecutionCache() {
    1353+    // Setup the salted hasher
    


    ryanofsky commented at 6:14 pm on July 23, 2019:
    Again would be good to say something about why double nonce setup is intentional.
  46. ryanofsky commented at 6:33 pm on July 23, 2019: member

    Code review ACK 51b6e73377cad1a1b0d180be5ff72e1f9c4e7a14. This is a simple code cleanup. Nice especially to get rid or the 19 + 32 + 4 = 55 bytes (+ 8 + 1 = 64) checking.

    Since there isn’t a microbenchmark for this or other benchmark showing a performance difference, it’s possible the PR title is overpromising by saying this is faster. But “back of the envelope” reasoning to me seems adequate for a minor change like this.

  47. ryanofsky approved
  48. JeremyRubin force-pushed on Jul 24, 2019
  49. JeremyRubin commented at 8:37 pm on July 24, 2019: contributor

    I added a contrived microbenchmark which shows:

    0# Benchmark, evals, iterations, total, min, max, median
    1PrePadded, 5, 10000, 0.00320418, 6.18847e-08, 7.06114e-08, 6.28207e-08
    2RegularPadded, 5, 10000, 0.00560628, 1.09935e-07, 1.18426e-07, 1.10481e-07
    

    Pre padding is indeed better than regular padding.

  50. ryanofsky approved
  51. ryanofsky commented at 8:50 pm on July 24, 2019: member
    utACK 5b877679d9f20d7ce18097b22b0843a652eb105b. Only changes since last review are new comments and benchmark. The new comments look good! And the new microbenchmark should be a sufficient sanity check for performance of this change, along with the previous neutral IBD results. Like I wrote in #13204#pullrequestreview-265587021, I think this change seems nice even as a simple code cleanup.
  52. in src/bench/hashpadding.cpp:41 in 5b877679d9 outdated
    36+    uint256 nonce = GetRandHash();
    37+    uint256 data = GetRandHash();
    38+    while (state.KeepRunning()) {
    39+        unsigned char out[32];
    40+        CSHA256 h = hasher;
    41+        hasher.Write(nonce.begin(), 32);
    


    MarcoFalke commented at 9:42 pm on July 24, 2019:
    h or hasher? If hasher is intended, it looks like that this benchmark will perform as good as the previous one on every other run. Might want to add a comment to explain why. If h is intended, could make hasher const?

    JeremyRubin commented at 9:54 pm on July 24, 2019:
    Ah you’re right. good catch. Let me fix that…

    JeremyRubin commented at 9:55 pm on July 24, 2019:
    It can’t be made const because it needs to be updated a little bit.

    JeremyRubin commented at 9:56 pm on July 24, 2019:
    well it can be for one of the tests, and not the other. But I want to keep the code as similar as possible between pre/regular.
  53. in src/script/sigcache.cpp:27 in 5b877679d9 outdated
    23@@ -24,21 +24,27 @@ class CSignatureCache
    24 {
    25 private:
    26      //! Entries are SHA256(nonce || signature hash || public key || signature):
    27-    uint256 nonce;
    28+    CSHA256 salted_hasher;
    


    MarcoFalke commented at 9:43 pm on July 24, 2019:
    0    CSHA256 m_salted_hasher;
    
  54. MarcoFalke commented at 9:43 pm on July 24, 2019: member

    Thanks for adding a benchmark. Just one question about it.

    Would be nice to add the benchmark in the first commit, so that performance can be compared.

  55. JeremyRubin commented at 9:53 pm on July 24, 2019: contributor
    @MarcoFalke because of modularization, there’s no real interfaces exposed for me to run in a bench, so the benchmark just had both techniques demonstrated (doesn’t really matter if the commit comes before or after, the code changes don’t affect the benchmark).
  56. JeremyRubin force-pushed on Jul 24, 2019
  57. JeremyRubin commented at 10:01 pm on July 24, 2019: contributor
    @MarcoFalke nits addressed
  58. ryanofsky approved
  59. ryanofsky commented at 5:06 pm on July 31, 2019: member
    utACK dec5f968788bd96f1eec4dfea3f4ab584a2d7672. Only changes since last review were adding suggested m_ prefixes and making the RegularPadded case closer to what it’s supposed to measure.
  60. ariard approved
  61. ariard commented at 5:47 pm on August 2, 2019: member

    ACK dec5f96, I’ve tested change locally and run benchmark on a i7-7700HQ CPU @ 2.80GHz (Debian), got the following results (yes PrePadded is faster):

    0# Benchmark, evals, iterations, total, min, max, median
    1RegularPadded, 5, 10000, 0.0195961, 3.75465e-07, 4.17618e-07, 3.83324e-07
    2# Benchmark, evals, iterations, total, min, max, median
    3PrePadded, 5, 10000, 0.0114673, 1.9201e-07, 2.7354e-07, 2.1187e-07
    
  62. DrahtBot added the label Needs rebase on Sep 2, 2019
  63. JeremyRubin force-pushed on Apr 29, 2020
  64. JeremyRubin force-pushed on Apr 29, 2020
  65. Add Hash Padding Microbenchmarks 5495fa5850
  66. DrahtBot removed the label Needs rebase on Apr 29, 2020
  67. JeremyRubin force-pushed on Apr 29, 2020
  68. Use salted hasher instead of nonce in sigcache
    Use salted hasher instead of nonce in Script Execution Cache
    
    Don't read more than 32 bytes from GetRand
    
    Apply g_* naming convention to scriptExecutionCache in validation.cpp
    
    Fully apply g_* naming convention to scriptCacheHasher
    
    Write same uint256 nonce twice for cache hash rather than calling getrand twice
    
    Use salted hasher instead of nonce in sigcache
    
    Use salted hasher instead of nonce in Script Execution Cache
    
    Don't read more than 32 bytes from GetRand
    
    Apply g_* naming convention to scriptExecutionCache in validation.cpp
    
    Fully apply g_* naming convention to scriptCacheHasher
    
    Write same uint256 nonce twice for cache hash rather than calling getrand twice
    152e8baf08
  69. ryanofsky approved
  70. ryanofsky commented at 8:54 pm on May 13, 2020: member
    Code review ACK 152e8baf08c7379e5cc09f90863e6309bdd4866c. No code changes, just rebase since last review and expanded commit message
  71. JeremyRubin commented at 9:59 pm on May 13, 2020: contributor

    I think this is ready to merge @MarcoFalke.

    All outstanding feedback has been addressed & has two “fresh” ACKs and a handful of stale utACKs/makes senses.

  72. ryanofsky commented at 7:06 pm on May 19, 2020: member

    I think this is ready to merge @MarcoFalke.

    All outstanding feedback has been addressed & has two “fresh” ACKs and a handful of stale utACKs/makes senses.

    Any agreement / disagreement?

  73. MarcoFalke merged this on Jun 2, 2020
  74. MarcoFalke closed this on Jun 2, 2020

  75. MarcoFalke commented at 1:29 pm on June 2, 2020: member

    On arm:

    0$ ./src/bench/bench_bitcoin --filter=.*Padded
    1# Benchmark, evals, iterations, total, min, max, median
    2PrePadded, 5, 10000, 0.0305467, 6.09309e-07, 6.13757e-07, 6.10061e-07
    3RegularPadded, 5, 10000, 0.057972, 1.15744e-06, 1.16534e-06, 1.15813e-06
    
  76. sidhujag referenced this in commit 5d2d0f4905 on Jun 2, 2020
  77. Fabcien referenced this in commit 95d0162dae on Feb 12, 2021
  78. UdjinM6 referenced this in commit c85dc3df80 on Jun 18, 2021
  79. UdjinM6 referenced this in commit 26087f4dd1 on Jun 24, 2021
  80. UdjinM6 referenced this in commit f33ed5594a on Jun 26, 2021
  81. UdjinM6 referenced this in commit 408d0a2afc on Jun 26, 2021
  82. UdjinM6 referenced this in commit c8ee5e1665 on Jun 28, 2021
  83. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me