Enable Link Time Optimizations #16791

pull elichai wants to merge 1 commit into bitcoin:master from elichai:2019-09-thinlto changing 2 files +28 −4
  1. elichai commented at 0:10 am on September 3, 2019: contributor

    This is mostly an idea. The exact benefits are hard to measure, and suggestions for good benchmarks are welcome, although in theory this should improve performance.

    Minimal build-time and bench_bitcoin benchmarks are in the comments below.

    EDIT: As fanquake pointed out, this was brought up before and there were some issues around IBD. I personally do not think that unstripped binary sizes really matter, because we care about sizes only when shipping to others (i.e. as part of a Linux distribution), and then we strip anyway. I will conduct IBD tests. I have a feeling that this will show good results with Clang (ThinLTO) and not so good ones with GCC. I’ll report results.

  2. Enable Link Time Optimizations 3643493b5d
  3. elichai commented at 0:10 am on September 3, 2019: contributor

    clang without LTO run 1

    # Benchmark, evals, iterations, total, min, max, median
    AssembleBlock, 5, 700, 1.75888, 0.000499424, 0.000509562, 0.000501284
    Base58CheckEncode, 5, 320000, 4.44531, 2.76679e-06, 2.79008e-06, 2.78027e-06
    Base58Decode, 5, 800000, 2.51903, 6.27043e-07, 6.33854e-07, 6.29572e-07
    Base58Encode, 5, 470000, 4.544, 1.92153e-06, 1.9427e-06, 1.93532e-06
    Bech32Decode, 5, 800000, 1.12439, 2.77588e-07, 2.83886e-07, 2.80391e-07
    Bech32Encode, 5, 800000, 1.52587, 3.7683e-07, 3.84009e-07, 3.81521e-07
    BenchLockedPool, 5, 1300, 3.97761, 0.000601612, 0.000639539, 0.000602839
    BnBExhaustion, 5, 650, 2.60574, 0.000790225, 0.000820404, 0.000799679
    CCheckQueueSpeedPrevectorJob, 5, 1400, 13.258, 0.00183946, 0.00196889, 0.00184942
    CCoinsCaching, 5, 170000, 0.328199, 3.58821e-07, 4.01443e-07, 3.91374e-07
    CoinSelection, 5, 650, 0.572936, 0.000171902, 0.000183726, 0.000176019
    ConstructGCSFilter, 5, 1000, 8.10747, 0.00153589, 0.00174128, 0.001621
    DeserializeAndCheckBlockTest, 5, 160, 4.65824, 0.00567065, 0.00629352, 0.00569696
    DeserializeBlockTest, 5, 130, 3.08922, 0.00465626, 0.00492937, 0.00473448
    DuplicateInputs, 5, 10, 0.357063, 0.00663135, 0.00819393, 0.00668833
    FastRandom_1bit, 5, 440000000, 3.66524, 1.66211e-09, 1.67058e-09, 1.66704e-09
    FastRandom_32bit, 5, 110000000, 4.597, 8.26189e-09, 8.46674e-09, 8.3029e-09
    MatchGCSFilter, 5, 50000, 8.19405, 3.22754e-05, 3.39363e-05, 3.25489e-05
    MempoolEviction, 5, 41000, 2.87917, 1.40002e-05, 1.40963e-05, 1.4055e-05
    MerkleRoot, 5, 800, 4.37644, 0.00109208, 0.00109532, 0.00109406
    PrevectorClearNontrivial, 5, 28300, 0.000286874, 1.66226e-09, 2.55887e-09, 1.75353e-09
    PrevectorClearTrivial, 5, 88600, 0.000678416, 1.51194e-09, 1.53922e-09, 1.53875e-09
    PrevectorDeserializeNontrivial, 5, 6800, 6.14197, 0.000179907, 0.000181122, 0.000180681
    PrevectorDeserializeTrivial, 5, 52000, 5.27403, 2.02025e-05, 2.03668e-05, 2.02626e-05
    PrevectorDestructorNontrivial, 5, 28800, 0.000242584, 1.57406e-09, 1.76795e-09, 1.69417e-09
    PrevectorDestructorTrivial, 5, 88900, 0.000692209, 1.41076e-09, 2.01678e-09, 1.45669e-09
    PrevectorResizeNontrivial, 5, 28900, 1.09466, 7.54659e-06, 7.63179e-06, 7.57301e-06
    PrevectorResizeTrivial, 5, 90300, 3.63849, 8.03433e-06, 8.08772e-06, 8.04891e-06
    RIPEMD160, 5, 440, 4.87181, 0.00220603, 0.00222362, 0.00221297
    RollingBloom, 5, 1500000, 3.624, 4.71539e-07, 4.92985e-07, 4.82655e-07
    SHA1, 5, 570, 4.6708, 0.00162474, 0.00166248, 0.00163211
    SHA256, 5, 340, 4.53006, 0.00262891, 0.00274689, 0.0026462
    SHA256D64_1024, 5, 7400, 4.87046, 0.000115724, 0.000142581, 0.000135341
    SHA256_32b, 5, 4700000, 4.58475, 1.91316e-07, 2.0184e-07, 1.94727e-07
    SHA512, 5, 330, 3.97634, 0.00235566, 0.00248732, 0.00239992
    SipHash_32b, 5, 40000000, 4.72604, 2.29608e-08, 2.44476e-08, 2.33909e-08
    Sleep100ms, 5, 10, 5.01552, 0.100257, 0.100413, 0.100312
    Trig, 5, 12000000, 0.761389, 1.1107e-08, 1.3417e-08, 1.3107e-08
    VerifyScriptBench, 5, 6300, 2.60303, 7.9582e-05, 8.53183e-05, 8.21487e-05
    

    clang without LTO run 2

    # Benchmark, evals, iterations, total, min, max, median
    AssembleBlock, 5, 700, 1.96617, 0.000499029, 0.00061569, 0.000555522
    Base58CheckEncode, 5, 320000, 4.58013, 2.85247e-06, 2.88703e-06, 2.85549e-06
    Base58Decode, 5, 800000, 2.59868, 6.43124e-07, 6.6978e-07, 6.45136e-07
    Base58Encode, 5, 470000, 4.73274, 1.99772e-06, 2.0328e-06, 2.01745e-06
    Bech32Decode, 5, 800000, 1.17128, 2.91664e-07, 2.94089e-07, 2.92609e-07
    Bech32Encode, 5, 800000, 1.58015, 3.93101e-07, 3.98773e-07, 3.94542e-07
    BenchLockedPool, 5, 1300, 4.17851, 0.000625214, 0.000674724, 0.000633105
    BnBExhaustion, 5, 650, 2.65366, 0.000803413, 0.000839476, 0.000811133
    CCheckQueueSpeedPrevectorJob, 5, 1400, 13.386, 0.00183121, 0.00195832, 0.00193187
    CCoinsCaching, 5, 170000, 0.33303, 3.75554e-07, 4.06653e-07, 3.9148e-07
    CoinSelection, 5, 650, 0.549849, 0.0001686, 0.000170028, 0.000169009
    ConstructGCSFilter, 5, 1000, 7.81065, 0.00153585, 0.00160931, 0.00155086
    DeserializeAndCheckBlockTest, 5, 160, 4.3975, 0.00542555, 0.00555456, 0.00549932
    DeserializeBlockTest, 5, 130, 3.01934, 0.00452266, 0.00473618, 0.0046844
    DuplicateInputs, 5, 10, 0.341157, 0.00633856, 0.00768086, 0.00638198
    FastRandom_1bit, 5, 440000000, 3.63868, 1.61861e-09, 1.70596e-09, 1.64989e-09
    FastRandom_32bit, 5, 110000000, 4.5247, 7.9982e-09, 8.44111e-09, 8.28698e-09
    MatchGCSFilter, 5, 50000, 7.74736, 3.03944e-05, 3.26011e-05, 3.06948e-05
    MempoolEviction, 5, 41000, 2.98457, 1.41979e-05, 1.50077e-05, 1.44873e-05
    MerkleRoot, 5, 800, 5.06723, 0.00113876, 0.00143356, 0.00125824
    PrevectorClearNontrivial, 5, 28300, 0.000297168, 1.73883e-09, 2.80184e-09, 2.0053e-09
    PrevectorClearTrivial, 5, 88600, 0.000623501, 1.33983e-09, 1.52511e-09, 1.40331e-09
    PrevectorDeserializeNontrivial, 5, 6800, 6.34991, 0.000179312, 0.000194279, 0.000186073
    PrevectorDeserializeTrivial, 5, 52000, 5.48437, 2.05534e-05, 2.16369e-05, 2.10327e-05
    PrevectorDestructorNontrivial, 5, 28800, 0.000243668, 1.5741e-09, 2.10503e-09, 1.57552e-09
    PrevectorDestructorTrivial, 5, 88900, 0.000728, 1.47732e-09, 2.03974e-09, 1.5748e-09
    PrevectorResizeNontrivial, 5, 28900, 1.12402, 7.61811e-06, 8.15651e-06, 7.66037e-06
    PrevectorResizeTrivial, 5, 90300, 3.74789, 7.93443e-06, 8.72624e-06, 8.28083e-06
    RIPEMD160, 5, 440, 5.06353, 0.00226185, 0.00238738, 0.00226887
    RollingBloom, 5, 1500000, 4.37212, 4.89467e-07, 7.16985e-07, 5.49323e-07
    SHA1, 5, 570, 4.97618, 0.00164691, 0.0018421, 0.00175501
    SHA256, 5, 340, 4.76886, 0.002788, 0.00284018, 0.00279755
    SHA256D64_1024, 5, 7400, 4.36494, 0.000117053, 0.000119305, 0.000117821
    SHA256_32b, 5, 4700000, 4.78085, 2.01433e-07, 2.09491e-07, 2.0206e-07
    SHA512, 5, 330, 4.02069, 0.00239671, 0.00248702, 0.00243321
    SipHash_32b, 5, 40000000, 4.7499, 2.34689e-08, 2.42086e-08, 2.35923e-08
    Sleep100ms, 5, 10, 5.01435, 0.100229, 0.100332, 0.100288
    Trig, 5, 12000000, 0.673552, 1.0744e-08, 1.22859e-08, 1.10018e-08
    VerifyScriptBench, 5, 6300, 2.60643, 8.2142e-05, 8.32055e-05, 8.27119e-05
    

    clang with LTO run 1

    # Benchmark, evals, iterations, total, min, max, median
    AssembleBlock, 5, 700, 1.72164, 0.000488554, 0.000495535, 0.00049131
    Base58CheckEncode, 5, 320000, 4.15869, 2.59372e-06, 2.60309e-06, 2.60095e-06
    Base58Decode, 5, 800000, 2.17661, 5.41487e-07, 5.4834e-07, 5.44013e-07
    Base58Encode, 5, 470000, 4.14466, 1.75899e-06, 1.76838e-06, 1.76369e-06
    Bech32Decode, 5, 800000, 1.19821, 2.97339e-07, 3.00996e-07, 2.99391e-07
    Bech32Encode, 5, 800000, 1.62597, 4.03557e-07, 4.08587e-07, 4.0681e-07
    BenchLockedPool, 5, 1300, 3.89975, 0.000597339, 0.000602308, 0.000599304
    BnBExhaustion, 5, 650, 2.36543, 0.00072606, 0.000729458, 0.00072771
    CCheckQueueSpeedPrevectorJob, 5, 1400, 12.4383, 0.00173862, 0.0018336, 0.00177781
    CCoinsCaching, 5, 170000, 0.312746, 3.501e-07, 3.79826e-07, 3.68905e-07
    CoinSelection, 5, 650, 0.546785, 0.000166795, 0.000169946, 0.000168376
    ConstructGCSFilter, 5, 1000, 7.03575, 0.00140299, 0.0014086, 0.00140827
    DeserializeAndCheckBlockTest, 5, 160, 4.18461, 0.00522168, 0.00523717, 0.00523443
    DeserializeBlockTest, 5, 130, 2.87828, 0.00441814, 0.00444233, 0.00442485
    DuplicateInputs, 5, 10, 0.328547, 0.00644949, 0.00681943, 0.00654519
    FastRandom_1bit, 5, 440000000, 3.2298, 1.46282e-09, 1.4778e-09, 1.46598e-09
    FastRandom_32bit, 5, 110000000, 4.31731, 7.82775e-09, 7.86024e-09, 7.85525e-09
    MatchGCSFilter, 5, 50000, 6.96338, 2.77827e-05, 2.78858e-05, 2.7871e-05
    MempoolEviction, 5, 41000, 2.47832, 1.20621e-05, 1.21223e-05, 1.20881e-05
    MerkleRoot, 5, 800, 4.29475, 0.00106896, 0.00107777, 0.00107411
    PrevectorClearNontrivial, 5, 28300, 0.000235125, 1.41194e-09, 1.75353e-09, 1.72703e-09
    PrevectorClearTrivial, 5, 88600, 0.000636042, 1.38168e-09, 1.45128e-09, 1.44847e-09
    PrevectorDeserializeNontrivial, 5, 6800, 6.45151, 0.000188512, 0.00019084, 0.000190182
    PrevectorDeserializeTrivial, 5, 52000, 5.1855, 1.99341e-05, 1.99596e-05, 1.99413e-05
    PrevectorDestructorNontrivial, 5, 28800, 0.000230332, 1.57406e-09, 1.6334e-09, 1.58128e-09
    PrevectorDestructorTrivial, 5, 88900, 0.000636916, 1.39389e-09, 1.51434e-09, 1.42107e-09
    PrevectorResizeNontrivial, 5, 28900, 1.06484, 7.31545e-06, 7.40119e-06, 7.37174e-06
    PrevectorResizeTrivial, 5, 90300, 3.55419, 7.8563e-06, 7.88425e-06, 7.87187e-06
    RIPEMD160, 5, 440, 4.72668, 0.00214663, 0.00215083, 0.00214833
    RollingBloom, 5, 1500000, 3.433, 4.5134e-07, 4.60353e-07, 4.58777e-07
    SHA1, 5, 570, 4.62547, 0.00162106, 0.00162443, 0.00162312
    SHA256, 5, 340, 4.44984, 0.00260789, 0.00263764, 0.00261407
    SHA256D64_1024, 5, 7400, 4.23212, 0.000114082, 0.000114491, 0.000114457
    SHA256_32b, 5, 4700000, 4.30349, 1.81287e-07, 1.85391e-07, 1.81894e-07
    SHA512, 5, 330, 3.91958, 0.002371, 0.00238079, 0.00237517
    SipHash_32b, 5, 40000000, 4.53516, 2.26558e-08, 2.27179e-08, 2.26719e-08
    Sleep100ms, 5, 10, 5.02578, 0.100485, 0.100544, 0.100521
    Trig, 5, 12000000, 0.621274, 9.54652e-09, 1.06749e-08, 1.05556e-08
    VerifyScriptBench, 5, 6300, 2.58936, 8.19543e-05, 8.24921e-05, 8.2189e-05
    

    clang with LTO run 2

    # Benchmark, evals, iterations, total, min, max, median
    AssembleBlock, 5, 700, 1.99427, 0.000493252, 0.000665168, 0.000559993
    Base58CheckEncode, 5, 320000, 5.05568, 3.06908e-06, 3.25902e-06, 3.13718e-06
    Base58Decode, 5, 800000, 2.95883, 6.62271e-07, 7.67973e-07, 7.58572e-07
    Base58Encode, 5, 470000, 5.0744, 1.94787e-06, 2.43021e-06, 2.21603e-06
    Bech32Decode, 5, 800000, 1.36161, 3.20821e-07, 3.8574e-07, 3.31155e-07
    Bech32Encode, 5, 800000, 1.87327, 4.1505e-07, 5.15584e-07, 4.81936e-07
    BenchLockedPool, 5, 1300, 4.97714, 0.000717744, 0.000807546, 0.000761353
    BnBExhaustion, 5, 650, 2.9194, 0.000846583, 0.000942232, 0.000902514
    CCheckQueueSpeedPrevectorJob, 5, 1400, 13.2917, 0.00177237, 0.00208845, 0.0018112
    CCoinsCaching, 5, 170000, 0.312361, 3.58964e-07, 3.74724e-07, 3.69679e-07
    CoinSelection, 5, 650, 0.544713, 0.000166541, 0.000169268, 0.000167463
    ConstructGCSFilter, 5, 1000, 7.76531, 0.00150814, 0.0016057, 0.0015303
    DeserializeAndCheckBlockTest, 5, 160, 4.32749, 0.00533709, 0.00547389, 0.00540589
    DeserializeBlockTest, 5, 130, 2.9427, 0.00449615, 0.00455946, 0.00452504
    DuplicateInputs, 5, 10, 0.31676, 0.0063014, 0.00635066, 0.0063463
    FastRandom_1bit, 5, 440000000, 3.50853, 1.58325e-09, 1.60725e-09, 1.59132e-09
    FastRandom_32bit, 5, 110000000, 4.42874, 7.95085e-09, 8.17166e-09, 8.05469e-09
    MatchGCSFilter, 5, 50000, 8.16707, 3.12369e-05, 3.47419e-05, 3.22018e-05
    MempoolEviction, 5, 41000, 3.01924, 1.31778e-05, 1.61007e-05, 1.46303e-05
    MerkleRoot, 5, 800, 4.27195, 0.00103943, 0.00113243, 0.00106114
    PrevectorClearNontrivial, 5, 28300, 0.000258918, 1.49e-09, 2.61336e-09, 1.69908e-09
    PrevectorClearTrivial, 5, 88600, 0.000699667, 1.44422e-09, 1.75696e-09, 1.53358e-09
    PrevectorDeserializeNontrivial, 5, 6800, 5.88229, 0.000172483, 0.000173993, 0.000172633
    PrevectorDeserializeTrivial, 5, 52000, 5.05375, 1.94089e-05, 1.94787e-05, 1.94224e-05
    PrevectorDestructorNontrivial, 5, 28800, 0.000259292, 1.57552e-09, 2.55642e-09, 1.63194e-09
    PrevectorDestructorTrivial, 5, 88900, 0.00065175, 1.39061e-09, 1.57434e-09, 1.44263e-09
    PrevectorResizeNontrivial, 5, 28900, 1.03944, 7.14281e-06, 7.25415e-06, 7.20041e-06
    PrevectorResizeTrivial, 5, 90300, 3.45045, 7.62638e-06, 7.65751e-06, 7.63868e-06
    RIPEMD160, 5, 440, 4.60618, 0.00208886, 0.00210049, 0.00209236
    RollingBloom, 5, 1500000, 3.35536, 4.46947e-07, 4.47789e-07, 4.47367e-07
    SHA1, 5, 570, 4.50482, 0.00157792, 0.00158378, 0.00157961
    SHA256, 5, 340, 4.30758, 0.00250335, 0.00257137, 0.00252218
    SHA256D64_1024, 5, 7400, 4.04927, 0.000108863, 0.00011012, 0.000109247
    SHA256_32b, 5, 4700000, 4.50374, 1.89222e-07, 1.94606e-07, 1.90612e-07
    SHA512, 5, 330, 3.90106, 0.00230413, 0.00246755, 0.00234779
    SipHash_32b, 5, 40000000, 4.54047, 2.24717e-08, 2.28638e-08, 2.27357e-08
    Sleep100ms, 5, 10, 5.0232, 0.100405, 0.100544, 0.10044
    Trig, 5, 12000000, 0.607282, 9.90421e-09, 1.03326e-08, 1.01497e-08
    VerifyScriptBench, 5, 6300, 2.45329, 7.70539e-05, 7.95727e-05, 7.75427e-05
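
One way to read the four runs above is to compare the median column benchmark by benchmark. The following is a minimal sketch rather than part of the PR: `parse_bench` and `relative_change` are hypothetical helper names, and the sample rows are copied from the "without LTO run 1" and "with LTO run 1" outputs with the viewer's line-number prefix removed.

```python
import csv
import io


def parse_bench(text):
    """Map benchmark name -> median (last column) from bench_bitcoin CSV output."""
    medians = {}
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        # Skip blank lines and the "# Benchmark, evals, ..." header row.
        if not row or row[0].startswith("#"):
            continue
        medians[row[0]] = float(row[6])
    return medians


def relative_change(baseline, candidate):
    """Percent change of candidate vs. baseline median; negative means faster."""
    return {
        name: (candidate[name] - base) / base * 100.0
        for name, base in baseline.items()
        if name in candidate
    }


# Three rows taken from the runs posted in this comment.
NO_LTO_RUN1 = """\
# Benchmark, evals, iterations, total, min, max, median
Base58Decode, 5, 800000, 2.51903, 6.27043e-07, 6.33854e-07, 6.29572e-07
MempoolEviction, 5, 41000, 2.87917, 1.40002e-05, 1.40963e-05, 1.4055e-05
Bech32Decode, 5, 800000, 1.12439, 2.77588e-07, 2.83886e-07, 2.80391e-07
"""
LTO_RUN1 = """\
# Benchmark, evals, iterations, total, min, max, median
Base58Decode, 5, 800000, 2.17661, 5.41487e-07, 5.4834e-07, 5.44013e-07
MempoolEviction, 5, 41000, 2.47832, 1.20621e-05, 1.21223e-05, 1.20881e-05
Bech32Decode, 5, 800000, 1.19821, 2.97339e-07, 3.00996e-07, 2.99391e-07
"""

changes = relative_change(parse_bench(NO_LTO_RUN1), parse_bench(LTO_RUN1))
for name, pct in sorted(changes.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {pct:+6.1f}%")
```

A single pair of runs is noisy. In "with LTO run 2" the Base58Decode median (7.58572e-07) is slower than in either run without LTO, so any conclusion would need more runs than the four shown here.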
    
  4. fanquake added the label Build system on Sep 3, 2019
  5. elichai commented at 0:12 am on September 3, 2019: contributor

    Build time benchmark (with ./configure --with-incompatible-bdb)

    1. Clang without LTO:
    real    3m59.577s
    user    54m18.507s
    sys     2m7.528s

    2. Clang with LTO:
    real    7m7.265s
    user    97m1.044s
    sys     2m49.948s

    3. GCC without LTO:
    real    3m13.461s
    user    37m23.436s
    sys     2m8.695s

    4. GCC with LTO:
    real    6m20.885s
    user    40m0.574s
    sys     2m53.457s
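
The build-time cost can be expressed as a ratio of wall-clock times. This is a small sketch using the "real" values reported above; `parse_time` is a hypothetical helper that assumes the `NmS.SSSs` format printed by the shell's `time`.

```python
import re


def parse_time(value):
    """Convert a `time`-style duration such as '3m59.577s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock ("real") build times from this comment: (without LTO, with LTO).
REAL = {
    "clang": ("3m59.577s", "7m7.265s"),
    "gcc": ("3m13.461s", "6m20.885s"),
}

slowdown = {
    compiler: parse_time(with_lto) / parse_time(without_lto)
    for compiler, (without_lto, with_lto) in REAL.items()
}
for compiler, factor in slowdown.items():
    print(f"{compiler}: LTO build takes {factor:.2f}x the wall-clock time")
```

On these numbers both compilers take a little under twice as long to build with LTO enabled.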
    
  6. fanquake commented at 0:14 am on September 3, 2019: member
    Have you read through the past (#10616, #10800) and current (#14277) discussions around enabling LTO? If not, that will likely give you a starting point for performance measurement, build system considerations etc.
  7. elichai commented at 0:18 am on September 3, 2019: contributor
    @fanquake I knew I forgot something hehe, I’ll go read them now. Thanks!
  8. DrahtBot commented at 3:10 am on September 3, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #16834 (Fetch Headers over DNS by TheBlueMatt)
    • #16762 (Rust-based Backup over-REST block downloader by TheBlueMatt)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  9. practicalswift commented at 12:07 pm on September 3, 2019: contributor

    Concept ACK assuming the default is switched: LTO should be opt-in using --enable-lto to allow for risk-free experimentation and for the reason @laanwj gives in #10616 (comment):

    It should definitely not be enabled by default! Programs usually shouldn’t add non-standard compilation flags by default unless necessary.

  10. laanwj commented at 12:54 pm on September 3, 2019: member

    I think this is an interesting experiment!

    However, build-system-wise, this simply adds some compiler and linker flags, which could be passed in through CFLAGS, CXXFLAGS, CPPFLAGS, LDFLAGS environment variables. I don’t think it really belongs as a separate configure option for individual applications.

    Will leave it to @theuni though.
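
As an illustration of the environment-variable route described in this comment, the sketch below assembles a `./configure` invocation with LTO flags supplied through CFLAGS, CXXFLAGS and LDFLAGS. It is not taken from the PR: `lto_configure_command` is a hypothetical helper, the choice of `-O2` is an assumption (setting CFLAGS/CXXFLAGS yourself replaces autoconf's default `-g -O2`), and `-flto=thin` is only understood by Clang. Depending on the toolchain, an LTO-capable linker or archiver may also be needed, which this sketch does not handle.

```python
import os
import shlex


def lto_configure_command(thin=False, extra_args=()):
    """Return (env, argv) for ./configure with LTO passed via standard variables.

    Hypothetical helper for illustration; it only builds the command and
    does not run anything.
    """
    lto_flag = "-flto=thin" if thin else "-flto"  # -flto=thin is Clang-only
    env = dict(os.environ)
    additions = {
        "CFLAGS": f"-O2 {lto_flag}",
        "CXXFLAGS": f"-O2 {lto_flag}",
        "LDFLAGS": lto_flag,
    }
    for var, flags in additions.items():
        # Append to whatever the caller already exported, if anything.
        env[var] = " ".join(filter(None, [env.get(var, ""), flags]))
    argv = ["./configure", *extra_args]
    return env, argv


env, argv = lto_configure_command(thin=True, extra_args=["--with-incompatible-bdb"])
shown = " ".join(f"{v}={shlex.quote(env[v])}" for v in ("CFLAGS", "CXXFLAGS", "LDFLAGS"))
print(shown, shlex.join(argv))
# To actually run it: subprocess.run(argv, env=env, check=True)
```

Passing the flags this way requires no change to the build system, which is the point made above: anyone who wants to experiment with LTO can already do so without a dedicated configure option.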

  11. laanwj assigned theuni on Sep 3, 2019
  12. laanwj commented at 8:45 am on October 2, 2019: member
    This is unlikely to be merged. Closing this PR. (feel free to continue discussion about LTO in the release builds, of course)
  13. laanwj closed this on Oct 2, 2019

  14. fanquake referenced this in commit 681b25e3cd on Nov 25, 2021
  15. sidhujag referenced this in commit 67658eec7d on Nov 25, 2021
  16. DrahtBot locked this on Dec 16, 2021
