Avoid integer division in the benchmark inner-most loop. #8115

pull gmaxwell wants to merge 1 commits into bitcoin:master from gmaxwell:better_bench changing 2 files +28 −14
  1. gmaxwell commented at 1:51 AM on May 29, 2016: contributor

    Previously the benchmark code used an integer division (%) with a non-constant in the inner-loop. This is quite slow on many processors, especially ones like ARM that lack a hardware divide.

    Even on fairly recent x86_64 like haswell an integer division can take something like 100 cycles-- making it comparable to the runtime of siphash.

    This change avoids the division by using bitmasking instead. This was especially easy since the count was only increased by doubling.

    This change also restarts the timing when the execution time was very low this avoids mintimes of zero in cases where one execution ends up below the timer resolution. It also reduces the impact of the overhead on the final result.

    The formatting of the prints is changed to not use scientific notation make it more machine readable (in particular, gnuplot croaks on the non-fixedpoint, and it doesn't sort correctly).

    This also hoists out all the floating point divisions out of the semi-hot path because it was easy to do so.

    It might be prudent to break out the critical test into a macro just to guarantee that it gets inlined. It might also make sense to just save out the intermediate counts and times and get the floating point completely out of the timing loop (because e.g. on hardware without a fast hardware FPU like some ARM it will still be slow enough to distort the results). I haven't done either of these in this commit.

  2. fanquake commented at 7:46 AM on May 29, 2016: member

    Master

    Benchmark,count,min,max,average
    RIPEMD160,415,0.00242996,0.00253913,0.00246098
    RollingBloom-refresh,1,0.000801,0.000801,0.000801
    RollingBloom-refresh,1,0.000111,0.000111,0.000111
    RollingBloom-refresh,1,0.000107,0.000107,0.000107
    RollingBloom-refresh,1,0.000107,0.000107,0.000107
    RollingBloom-refresh,1,0.000111,0.000111,0.000111
    RollingBloom-refresh,1,0.000107,0.000107,0.000107
    RollingBloom-refresh,1,0.000108,0.000108,0.000108
    RollingBloom-refresh,1,0.000108,0.000108,0.000108
    RollingBloom-refresh,1,0.000107,0.000107,0.000107
    RollingBloom-refresh,1,0.000131,0.000131,0.000131
    RollingBloom-refresh,1,0.000111,0.000111,0.000111
    RollingBloom-refresh,1,0.00011,0.00011,0.00011
    RollingBloom-refresh,1,0.000109,0.000109,0.000109
    RollingBloom-refresh,1,0.000108,0.000108,0.000108
    RollingBloom-refresh,1,0.000129,0.000129,0.000129
    RollingBloom-refresh,1,0.000109,0.000109,0.000109
    RollingBloom-refresh,1,0.000108,0.000108,0.000108
    RollingBloom-refresh,1,0.000114,0.000114,0.000114
    RollingBloom-refresh,1,0.000134,0.000134,0.000134
    RollingBloom-refresh,1,0.000109,0.000109,0.000109
    RollingBloom-refresh,1,0.000108,0.000108,0.000108
    RollingBloom-refresh,1,0.000108,0.000108,0.000108
    RollingBloom-refresh,1,0.00012,0.00012,0.00012
    RollingBloom-refresh,1,0.000111,0.000111,0.000111
    RollingBloom,1441791,6.77479e-07,1.19209e-06,7.49941e-07
    SHA1,575,0.00178885,0.00195658,0.0018124
    SHA256,239,0.00424147,0.00436625,0.00429098
    SHA512,383,0.00258148,0.00269461,0.00262054
    Sleep100ms,10,0.100642,0.105008,0.103793
    Trig,46137343,0,9.53674e-07,2.26862e-08
    

    This PR

    #Benchmark,count,min,max,average
    RIPEMD160,448,0.001245033173334,0.002638196945190,0.002461894814457
    RollingBloom-refresh,1,0.000635000000000,0.000635000000000,0.000635000000000
    RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
    RollingBloom-refresh,1,0.000107000000000,0.000107000000000,0.000107000000000
    RollingBloom-refresh,1,0.000204000000000,0.000204000000000,0.000204000000000
    RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
    RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
    RollingBloom-refresh,1,0.000120000000000,0.000120000000000,0.000120000000000
    RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
    RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
    RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
    RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
    RollingBloom-refresh,1,0.000127000000000,0.000127000000000,0.000127000000000
    RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
    RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
    RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
    RollingBloom-refresh,1,0.000134000000000,0.000134000000000,0.000134000000000
    RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
    RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
    RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
    RollingBloom-refresh,1,0.000110000000000,0.000110000000000,0.000110000000000
    RollingBloom-refresh,1,0.000107000000000,0.000107000000000,0.000107000000000
    RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
    RollingBloom,1310720,0.000000364444583,0.000000838096198,0.000000766858648
    SHA1,640,0.000909024336207,0.001938136418660,0.001843086257577
    SHA256,256,0.002209486499909,0.008500099182129,0.004300644621253
    SHA512,384,0.001319904176016,0.002813005447388,0.002615700786312
    Sleep100ms,10,0.205592155456543,0.210056066513062,0.104166316986084
    Trig,67108864,0.000000014997003,0.000000015448112,0.000000015188842
    
  3. laanwj commented at 7:13 AM on May 30, 2016: member

    utACK

  4. gmaxwell commented at 4:28 PM on May 30, 2016: contributor

    There is a bug in the min/max handling (see the sleep 100 timings). I'll fix.

  5. Avoid integer division in the benchmark inner-most loop.
    Previously the benchmark code used an integer division (%) with
     a non-constant in the inner-loop.  This is quite slow on many
     processors, especially ones like ARM that lack a hardware divide.
    
    Even on fairly recent x86_64 like haswell an integer division can
     take something like 100 cycles-- making it comparable to the
     runtime of siphash.
    
    This change avoids the division by using bitmasking instead. This
     was especially easy since the count was only increased by doubling.
    
    This change also restarts the timing when the execution time was
     very low this avoids mintimes of zero in cases where one execution
     ends up below the timer resolution. It also reduces the impact of
     the overhead on the final result.
    
    The formatting of the prints is changed to not use scientific
     notation make it more machine readable (in particular, gnuplot
     croaks on the non-fixedpoint, and it doesn't sort correctly).
    
    This also hoists out all the floating point divisions out of the
     semi-hot path because it was easy to do so.
    
    It might be prudent to break out the critical test into a macro
     just to guarantee that it gets inlined.  It might also make sense
     to just save out the intermediate counts and times and get the
     floating point completely out of the timing loop (because e.g.
     on hardware without a fast hardware FPU like some ARM it will
     still be slow enough to distort the results). I haven't done
     either of these in this commit.
    63ff57db4b
  6. laanwj merged this on May 31, 2016
  7. laanwj closed this on May 31, 2016

  8. laanwj referenced this in commit 0026e0ef34 on May 31, 2016
  9. codablock referenced this in commit 06a6770441 on Sep 16, 2017
  10. codablock referenced this in commit 13d7dd3d7c on Sep 19, 2017
  11. codablock referenced this in commit a3c63033d1 on Dec 21, 2017
  12. andvgal referenced this in commit 92d59b3958 on Jan 6, 2019
  13. zkbot referenced this in commit aa225ebb0b on Jan 24, 2020
  14. zkbot referenced this in commit 74ff73abab on Jan 24, 2020
  15. furszy referenced this in commit 4ed15cc69d on Jun 8, 2020
  16. MarcoFalke locked this on Sep 8, 2021
Labels

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-18 21:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me