Enable Link Time Optimizations #16791

pull elichai wants to merge 1 commits into bitcoin:master from elichai:2019-09-thinlto changing 2 files +28 −4
  1. elichai commented at 12:10 AM on September 3, 2019: contributor

    This is mostly an idea. The exact benefits are hard to measure, so suggestions for good benchmarks are welcome, although in theory this should increase performance.

    Minimal build-time and bench_bitcoin benchmarks are in the comments below.

    EDIT: As fanquake pointed out, this was brought up before, and there were some issues around IBD. I personally do not think that unstripped binary sizes really matter, because we care about size only when shipping to others (i.e. as part of a Linux distribution), and then we strip anyway. I will run IBD tests. I have a feeling that this will show good results with Clang (ThinLTO) and not so good ones with GCC. I'll report the results.

  2. Enable Link Time Optimizations 3643493b5d
  3. elichai commented at 12:10 AM on September 3, 2019: contributor

    clang without LTO run 1

    # Benchmark, evals, iterations, total, min, max, median
    AssembleBlock, 5, 700, 1.75888, 0.000499424, 0.000509562, 0.000501284
    Base58CheckEncode, 5, 320000, 4.44531, 2.76679e-06, 2.79008e-06, 2.78027e-06
    Base58Decode, 5, 800000, 2.51903, 6.27043e-07, 6.33854e-07, 6.29572e-07
    Base58Encode, 5, 470000, 4.544, 1.92153e-06, 1.9427e-06, 1.93532e-06
    Bech32Decode, 5, 800000, 1.12439, 2.77588e-07, 2.83886e-07, 2.80391e-07
    Bech32Encode, 5, 800000, 1.52587, 3.7683e-07, 3.84009e-07, 3.81521e-07
    BenchLockedPool, 5, 1300, 3.97761, 0.000601612, 0.000639539, 0.000602839
    BnBExhaustion, 5, 650, 2.60574, 0.000790225, 0.000820404, 0.000799679
    CCheckQueueSpeedPrevectorJob, 5, 1400, 13.258, 0.00183946, 0.00196889, 0.00184942
    CCoinsCaching, 5, 170000, 0.328199, 3.58821e-07, 4.01443e-07, 3.91374e-07
    CoinSelection, 5, 650, 0.572936, 0.000171902, 0.000183726, 0.000176019
    ConstructGCSFilter, 5, 1000, 8.10747, 0.00153589, 0.00174128, 0.001621
    DeserializeAndCheckBlockTest, 5, 160, 4.65824, 0.00567065, 0.00629352, 0.00569696
    DeserializeBlockTest, 5, 130, 3.08922, 0.00465626, 0.00492937, 0.00473448
    DuplicateInputs, 5, 10, 0.357063, 0.00663135, 0.00819393, 0.00668833
    FastRandom_1bit, 5, 440000000, 3.66524, 1.66211e-09, 1.67058e-09, 1.66704e-09
    FastRandom_32bit, 5, 110000000, 4.597, 8.26189e-09, 8.46674e-09, 8.3029e-09
    MatchGCSFilter, 5, 50000, 8.19405, 3.22754e-05, 3.39363e-05, 3.25489e-05
    MempoolEviction, 5, 41000, 2.87917, 1.40002e-05, 1.40963e-05, 1.4055e-05
    MerkleRoot, 5, 800, 4.37644, 0.00109208, 0.00109532, 0.00109406
    PrevectorClearNontrivial, 5, 28300, 0.000286874, 1.66226e-09, 2.55887e-09, 1.75353e-09
    PrevectorClearTrivial, 5, 88600, 0.000678416, 1.51194e-09, 1.53922e-09, 1.53875e-09
    PrevectorDeserializeNontrivial, 5, 6800, 6.14197, 0.000179907, 0.000181122, 0.000180681
    PrevectorDeserializeTrivial, 5, 52000, 5.27403, 2.02025e-05, 2.03668e-05, 2.02626e-05
    PrevectorDestructorNontrivial, 5, 28800, 0.000242584, 1.57406e-09, 1.76795e-09, 1.69417e-09
    PrevectorDestructorTrivial, 5, 88900, 0.000692209, 1.41076e-09, 2.01678e-09, 1.45669e-09
    PrevectorResizeNontrivial, 5, 28900, 1.09466, 7.54659e-06, 7.63179e-06, 7.57301e-06
    PrevectorResizeTrivial, 5, 90300, 3.63849, 8.03433e-06, 8.08772e-06, 8.04891e-06
    RIPEMD160, 5, 440, 4.87181, 0.00220603, 0.00222362, 0.00221297
    RollingBloom, 5, 1500000, 3.624, 4.71539e-07, 4.92985e-07, 4.82655e-07
    SHA1, 5, 570, 4.6708, 0.00162474, 0.00166248, 0.00163211
    SHA256, 5, 340, 4.53006, 0.00262891, 0.00274689, 0.0026462
    SHA256D64_1024, 5, 7400, 4.87046, 0.000115724, 0.000142581, 0.000135341
    SHA256_32b, 5, 4700000, 4.58475, 1.91316e-07, 2.0184e-07, 1.94727e-07
    SHA512, 5, 330, 3.97634, 0.00235566, 0.00248732, 0.00239992
    SipHash_32b, 5, 40000000, 4.72604, 2.29608e-08, 2.44476e-08, 2.33909e-08
    Sleep100ms, 5, 10, 5.01552, 0.100257, 0.100413, 0.100312
    Trig, 5, 12000000, 0.761389, 1.1107e-08, 1.3417e-08, 1.3107e-08
    VerifyScriptBench, 5, 6300, 2.60303, 7.9582e-05, 8.53183e-05, 8.21487e-05
    

    clang without LTO run 2

    # Benchmark, evals, iterations, total, min, max, median
    AssembleBlock, 5, 700, 1.96617, 0.000499029, 0.00061569, 0.000555522
    Base58CheckEncode, 5, 320000, 4.58013, 2.85247e-06, 2.88703e-06, 2.85549e-06
    Base58Decode, 5, 800000, 2.59868, 6.43124e-07, 6.6978e-07, 6.45136e-07
    Base58Encode, 5, 470000, 4.73274, 1.99772e-06, 2.0328e-06, 2.01745e-06
    Bech32Decode, 5, 800000, 1.17128, 2.91664e-07, 2.94089e-07, 2.92609e-07
    Bech32Encode, 5, 800000, 1.58015, 3.93101e-07, 3.98773e-07, 3.94542e-07
    BenchLockedPool, 5, 1300, 4.17851, 0.000625214, 0.000674724, 0.000633105
    BnBExhaustion, 5, 650, 2.65366, 0.000803413, 0.000839476, 0.000811133
    CCheckQueueSpeedPrevectorJob, 5, 1400, 13.386, 0.00183121, 0.00195832, 0.00193187
    CCoinsCaching, 5, 170000, 0.33303, 3.75554e-07, 4.06653e-07, 3.9148e-07
    CoinSelection, 5, 650, 0.549849, 0.0001686, 0.000170028, 0.000169009
    ConstructGCSFilter, 5, 1000, 7.81065, 0.00153585, 0.00160931, 0.00155086
    DeserializeAndCheckBlockTest, 5, 160, 4.3975, 0.00542555, 0.00555456, 0.00549932
    DeserializeBlockTest, 5, 130, 3.01934, 0.00452266, 0.00473618, 0.0046844
    DuplicateInputs, 5, 10, 0.341157, 0.00633856, 0.00768086, 0.00638198
    FastRandom_1bit, 5, 440000000, 3.63868, 1.61861e-09, 1.70596e-09, 1.64989e-09
    FastRandom_32bit, 5, 110000000, 4.5247, 7.9982e-09, 8.44111e-09, 8.28698e-09
    MatchGCSFilter, 5, 50000, 7.74736, 3.03944e-05, 3.26011e-05, 3.06948e-05
    MempoolEviction, 5, 41000, 2.98457, 1.41979e-05, 1.50077e-05, 1.44873e-05
    MerkleRoot, 5, 800, 5.06723, 0.00113876, 0.00143356, 0.00125824
    PrevectorClearNontrivial, 5, 28300, 0.000297168, 1.73883e-09, 2.80184e-09, 2.0053e-09
    PrevectorClearTrivial, 5, 88600, 0.000623501, 1.33983e-09, 1.52511e-09, 1.40331e-09
    PrevectorDeserializeNontrivial, 5, 6800, 6.34991, 0.000179312, 0.000194279, 0.000186073
    PrevectorDeserializeTrivial, 5, 52000, 5.48437, 2.05534e-05, 2.16369e-05, 2.10327e-05
    PrevectorDestructorNontrivial, 5, 28800, 0.000243668, 1.5741e-09, 2.10503e-09, 1.57552e-09
    PrevectorDestructorTrivial, 5, 88900, 0.000728, 1.47732e-09, 2.03974e-09, 1.5748e-09
    PrevectorResizeNontrivial, 5, 28900, 1.12402, 7.61811e-06, 8.15651e-06, 7.66037e-06
    PrevectorResizeTrivial, 5, 90300, 3.74789, 7.93443e-06, 8.72624e-06, 8.28083e-06
    RIPEMD160, 5, 440, 5.06353, 0.00226185, 0.00238738, 0.00226887
    RollingBloom, 5, 1500000, 4.37212, 4.89467e-07, 7.16985e-07, 5.49323e-07
    SHA1, 5, 570, 4.97618, 0.00164691, 0.0018421, 0.00175501
    SHA256, 5, 340, 4.76886, 0.002788, 0.00284018, 0.00279755
    SHA256D64_1024, 5, 7400, 4.36494, 0.000117053, 0.000119305, 0.000117821
    SHA256_32b, 5, 4700000, 4.78085, 2.01433e-07, 2.09491e-07, 2.0206e-07
    SHA512, 5, 330, 4.02069, 0.00239671, 0.00248702, 0.00243321
    SipHash_32b, 5, 40000000, 4.7499, 2.34689e-08, 2.42086e-08, 2.35923e-08
    Sleep100ms, 5, 10, 5.01435, 0.100229, 0.100332, 0.100288
    Trig, 5, 12000000, 0.673552, 1.0744e-08, 1.22859e-08, 1.10018e-08
    VerifyScriptBench, 5, 6300, 2.60643, 8.2142e-05, 8.32055e-05, 8.27119e-05
    

    clang with LTO run 1

    # Benchmark, evals, iterations, total, min, max, median
    AssembleBlock, 5, 700, 1.72164, 0.000488554, 0.000495535, 0.00049131
    Base58CheckEncode, 5, 320000, 4.15869, 2.59372e-06, 2.60309e-06, 2.60095e-06
    Base58Decode, 5, 800000, 2.17661, 5.41487e-07, 5.4834e-07, 5.44013e-07
    Base58Encode, 5, 470000, 4.14466, 1.75899e-06, 1.76838e-06, 1.76369e-06
    Bech32Decode, 5, 800000, 1.19821, 2.97339e-07, 3.00996e-07, 2.99391e-07
    Bech32Encode, 5, 800000, 1.62597, 4.03557e-07, 4.08587e-07, 4.0681e-07
    BenchLockedPool, 5, 1300, 3.89975, 0.000597339, 0.000602308, 0.000599304
    BnBExhaustion, 5, 650, 2.36543, 0.00072606, 0.000729458, 0.00072771
    CCheckQueueSpeedPrevectorJob, 5, 1400, 12.4383, 0.00173862, 0.0018336, 0.00177781
    CCoinsCaching, 5, 170000, 0.312746, 3.501e-07, 3.79826e-07, 3.68905e-07
    CoinSelection, 5, 650, 0.546785, 0.000166795, 0.000169946, 0.000168376
    ConstructGCSFilter, 5, 1000, 7.03575, 0.00140299, 0.0014086, 0.00140827
    DeserializeAndCheckBlockTest, 5, 160, 4.18461, 0.00522168, 0.00523717, 0.00523443
    DeserializeBlockTest, 5, 130, 2.87828, 0.00441814, 0.00444233, 0.00442485
    DuplicateInputs, 5, 10, 0.328547, 0.00644949, 0.00681943, 0.00654519
    FastRandom_1bit, 5, 440000000, 3.2298, 1.46282e-09, 1.4778e-09, 1.46598e-09
    FastRandom_32bit, 5, 110000000, 4.31731, 7.82775e-09, 7.86024e-09, 7.85525e-09
    MatchGCSFilter, 5, 50000, 6.96338, 2.77827e-05, 2.78858e-05, 2.7871e-05
    MempoolEviction, 5, 41000, 2.47832, 1.20621e-05, 1.21223e-05, 1.20881e-05
    MerkleRoot, 5, 800, 4.29475, 0.00106896, 0.00107777, 0.00107411
    PrevectorClearNontrivial, 5, 28300, 0.000235125, 1.41194e-09, 1.75353e-09, 1.72703e-09
    PrevectorClearTrivial, 5, 88600, 0.000636042, 1.38168e-09, 1.45128e-09, 1.44847e-09
    PrevectorDeserializeNontrivial, 5, 6800, 6.45151, 0.000188512, 0.00019084, 0.000190182
    PrevectorDeserializeTrivial, 5, 52000, 5.1855, 1.99341e-05, 1.99596e-05, 1.99413e-05
    PrevectorDestructorNontrivial, 5, 28800, 0.000230332, 1.57406e-09, 1.6334e-09, 1.58128e-09
    PrevectorDestructorTrivial, 5, 88900, 0.000636916, 1.39389e-09, 1.51434e-09, 1.42107e-09
    PrevectorResizeNontrivial, 5, 28900, 1.06484, 7.31545e-06, 7.40119e-06, 7.37174e-06
    PrevectorResizeTrivial, 5, 90300, 3.55419, 7.8563e-06, 7.88425e-06, 7.87187e-06
    RIPEMD160, 5, 440, 4.72668, 0.00214663, 0.00215083, 0.00214833
    RollingBloom, 5, 1500000, 3.433, 4.5134e-07, 4.60353e-07, 4.58777e-07
    SHA1, 5, 570, 4.62547, 0.00162106, 0.00162443, 0.00162312
    SHA256, 5, 340, 4.44984, 0.00260789, 0.00263764, 0.00261407
    SHA256D64_1024, 5, 7400, 4.23212, 0.000114082, 0.000114491, 0.000114457
    SHA256_32b, 5, 4700000, 4.30349, 1.81287e-07, 1.85391e-07, 1.81894e-07
    SHA512, 5, 330, 3.91958, 0.002371, 0.00238079, 0.00237517
    SipHash_32b, 5, 40000000, 4.53516, 2.26558e-08, 2.27179e-08, 2.26719e-08
    Sleep100ms, 5, 10, 5.02578, 0.100485, 0.100544, 0.100521
    Trig, 5, 12000000, 0.621274, 9.54652e-09, 1.06749e-08, 1.05556e-08
    VerifyScriptBench, 5, 6300, 2.58936, 8.19543e-05, 8.24921e-05, 8.2189e-05
    

    clang with LTO run 2

    # Benchmark, evals, iterations, total, min, max, median
    AssembleBlock, 5, 700, 1.99427, 0.000493252, 0.000665168, 0.000559993
    Base58CheckEncode, 5, 320000, 5.05568, 3.06908e-06, 3.25902e-06, 3.13718e-06
    Base58Decode, 5, 800000, 2.95883, 6.62271e-07, 7.67973e-07, 7.58572e-07
    Base58Encode, 5, 470000, 5.0744, 1.94787e-06, 2.43021e-06, 2.21603e-06
    Bech32Decode, 5, 800000, 1.36161, 3.20821e-07, 3.8574e-07, 3.31155e-07
    Bech32Encode, 5, 800000, 1.87327, 4.1505e-07, 5.15584e-07, 4.81936e-07
    BenchLockedPool, 5, 1300, 4.97714, 0.000717744, 0.000807546, 0.000761353
    BnBExhaustion, 5, 650, 2.9194, 0.000846583, 0.000942232, 0.000902514
    CCheckQueueSpeedPrevectorJob, 5, 1400, 13.2917, 0.00177237, 0.00208845, 0.0018112
    CCoinsCaching, 5, 170000, 0.312361, 3.58964e-07, 3.74724e-07, 3.69679e-07
    CoinSelection, 5, 650, 0.544713, 0.000166541, 0.000169268, 0.000167463
    ConstructGCSFilter, 5, 1000, 7.76531, 0.00150814, 0.0016057, 0.0015303
    DeserializeAndCheckBlockTest, 5, 160, 4.32749, 0.00533709, 0.00547389, 0.00540589
    DeserializeBlockTest, 5, 130, 2.9427, 0.00449615, 0.00455946, 0.00452504
    DuplicateInputs, 5, 10, 0.31676, 0.0063014, 0.00635066, 0.0063463
    FastRandom_1bit, 5, 440000000, 3.50853, 1.58325e-09, 1.60725e-09, 1.59132e-09
    FastRandom_32bit, 5, 110000000, 4.42874, 7.95085e-09, 8.17166e-09, 8.05469e-09
    MatchGCSFilter, 5, 50000, 8.16707, 3.12369e-05, 3.47419e-05, 3.22018e-05
    MempoolEviction, 5, 41000, 3.01924, 1.31778e-05, 1.61007e-05, 1.46303e-05
    MerkleRoot, 5, 800, 4.27195, 0.00103943, 0.00113243, 0.00106114
    PrevectorClearNontrivial, 5, 28300, 0.000258918, 1.49e-09, 2.61336e-09, 1.69908e-09
    PrevectorClearTrivial, 5, 88600, 0.000699667, 1.44422e-09, 1.75696e-09, 1.53358e-09
    PrevectorDeserializeNontrivial, 5, 6800, 5.88229, 0.000172483, 0.000173993, 0.000172633
    PrevectorDeserializeTrivial, 5, 52000, 5.05375, 1.94089e-05, 1.94787e-05, 1.94224e-05
    PrevectorDestructorNontrivial, 5, 28800, 0.000259292, 1.57552e-09, 2.55642e-09, 1.63194e-09
    PrevectorDestructorTrivial, 5, 88900, 0.00065175, 1.39061e-09, 1.57434e-09, 1.44263e-09
    PrevectorResizeNontrivial, 5, 28900, 1.03944, 7.14281e-06, 7.25415e-06, 7.20041e-06
    PrevectorResizeTrivial, 5, 90300, 3.45045, 7.62638e-06, 7.65751e-06, 7.63868e-06
    RIPEMD160, 5, 440, 4.60618, 0.00208886, 0.00210049, 0.00209236
    RollingBloom, 5, 1500000, 3.35536, 4.46947e-07, 4.47789e-07, 4.47367e-07
    SHA1, 5, 570, 4.50482, 0.00157792, 0.00158378, 0.00157961
    SHA256, 5, 340, 4.30758, 0.00250335, 0.00257137, 0.00252218
    SHA256D64_1024, 5, 7400, 4.04927, 0.000108863, 0.00011012, 0.000109247
    SHA256_32b, 5, 4700000, 4.50374, 1.89222e-07, 1.94606e-07, 1.90612e-07
    SHA512, 5, 330, 3.90106, 0.00230413, 0.00246755, 0.00234779
    SipHash_32b, 5, 40000000, 4.54047, 2.24717e-08, 2.28638e-08, 2.27357e-08
    Sleep100ms, 5, 10, 5.0232, 0.100405, 0.100544, 0.10044
    Trig, 5, 12000000, 0.607282, 9.90421e-09, 1.03326e-08, 1.01497e-08
    VerifyScriptBench, 5, 6300, 2.45329, 7.70539e-05, 7.95727e-05, 7.75427e-05
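
    The four result tables above are easier to compare as relative changes in the median column. The following is a minimal sketch, not part of the PR: it parses the bench_bitcoin output format shown above and reports the percent change of each median against a baseline. The function names `parse_bench` and `relative_change` are hypothetical, and the sample rows are copied from "clang without LTO run 1" and "clang with LTO run 1".

```python
import csv
import io


def parse_bench(text):
    """Map benchmark name -> median seconds from bench_bitcoin CSV-style output."""
    medians = {}
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        name = row[0].strip()
        if not name or name.startswith("#"):
            continue  # skip the "# Benchmark, evals, ..." header
        medians[name] = float(row[6])  # 7th column is the median
    return medians


def relative_change(baseline, candidate):
    """Percent change of candidate median vs. baseline (negative means faster)."""
    return {
        name: 100.0 * (candidate[name] - baseline[name]) / baseline[name]
        for name in baseline
        if name in candidate
    }


# Three rows copied from the tables above (run 1 of each configuration).
NO_LTO_RUN1 = """\
# Benchmark, evals, iterations, total, min, max, median
Base58Decode, 5, 800000, 2.51903, 6.27043e-07, 6.33854e-07, 6.29572e-07
MempoolEviction, 5, 41000, 2.87917, 1.40002e-05, 1.40963e-05, 1.4055e-05
Bech32Encode, 5, 800000, 1.52587, 3.7683e-07, 3.84009e-07, 3.81521e-07
"""
LTO_RUN1 = """\
# Benchmark, evals, iterations, total, min, max, median
Base58Decode, 5, 800000, 2.17661, 5.41487e-07, 5.4834e-07, 5.44013e-07
MempoolEviction, 5, 41000, 2.47832, 1.20621e-05, 1.21223e-05, 1.20881e-05
Bech32Encode, 5, 800000, 1.62597, 4.03557e-07, 4.08587e-07, 4.0681e-07
"""

changes = relative_change(parse_bench(NO_LTO_RUN1), parse_bench(LTO_RUN1))
for name, pct in sorted(changes.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {pct:+6.1f}%")
```

    A single pair of runs is noisy: in the tables above, several benchmarks (Base58Decode, for example) have a lower median in LTO run 1 than in either non-LTO run but a higher one in LTO run 2, so any such comparison should be repeated across runs before drawing conclusions.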
    
  4. fanquake added the label Build system on Sep 3, 2019
  5. elichai commented at 12:12 AM on September 3, 2019: contributor

    Build time benchmark (with ./configure --with-incompatible-bdb)

    1. Clang without LTO:
    real    3m59.577s
    user    54m18.507s
    sys     2m7.528s
    
    2. Clang with LTO:
    real    7m7.265s
    user    97m1.044s
    sys     2m49.948s
    
    3. GCC without LTO:
    real    3m13.461s
    user    37m23.436s
    sys     2m8.695s
    
    4. GCC with LTO:
    real    6m20.885s
    user    40m0.574s
    sys     2m53.457s
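
    To put the build times above on one scale, the `real` values can be converted to seconds and divided. This is a small sketch under the assumption that the values are in the `NmS.SSSs` form printed by the shell's `time`; the helper name `to_seconds` is hypothetical.

```python
import re


def to_seconds(stamp):
    """Convert a `time` value such as '3m59.577s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock ("real") times quoted above: (without LTO, with LTO).
REAL_TIMES = {
    "clang": ("3m59.577s", "7m7.265s"),
    "gcc": ("3m13.461s", "6m20.885s"),
}

slowdown = {
    compiler: to_seconds(with_lto) / to_seconds(without_lto)
    for compiler, (without_lto, with_lto) in REAL_TIMES.items()
}
for compiler, factor in slowdown.items():
    print(f"{compiler}: LTO build took {factor:.2f}x the wall-clock time")
```

    On these numbers the LTO build takes roughly 1.8 times as long with Clang and roughly 2.0 times as long with GCC in wall-clock terms.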
    
  6. fanquake commented at 12:14 AM on September 3, 2019: member

    Have you read through the past (#10616, #10800) and current (#14277) discussions around enabling LTO? If not, that will likely give you a starting point for performance measurement, build system considerations etc.

  7. elichai commented at 12:18 AM on September 3, 2019: contributor

    @fanquake I knew I forgot something hehe, I'll go read them now. Thanks!

  8. DrahtBot commented at 3:10 AM on September 3, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #16834 (Fetch Headers over DNS by TheBlueMatt)
    • #16762 (Rust-based Backup over-REST block downloader by TheBlueMatt)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  9. practicalswift commented at 12:07 PM on September 3, 2019: contributor

    Concept ACK assuming the default is switched: LTO should be opt-in using --enable-lto to allow for risk-free experimentation and for the reason @laanwj gives in #10616 (comment):

    "It should definitely not be enabled by default! Programs usually shouldn't add non-standard compilation flags by default unless necessary."

  10. laanwj commented at 12:54 PM on September 3, 2019: member

    I think this is an interesting experiment!

    However, build-system-wise, this simply adds some compiler and linker flags, which could be passed in through CFLAGS, CXXFLAGS, CPPFLAGS, LDFLAGS environment variables. I don't think it really belongs as a separate configure option for individual applications.

    Will leave it to @theuni though.
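
    The environment-variable route mentioned above can be illustrated with a short sketch that assembles a `./configure` invocation. Everything here is an assumption for illustration, not the PR's change: `-flto` is the general LTO flag accepted by GCC and Clang, `-flto=thin` selects Clang's ThinLTO, and `-O2` is repeated because setting these variables typically replaces the defaults that configure would otherwise pick. The helper name `configure_command` is hypothetical, and a real LTO build may need further settings (for example an LTO-capable linker or archiver).

```python
import shlex

# Assumed flags: -flto works with GCC and Clang; -flto=thin is Clang's ThinLTO.
LTO_FLAG = {"gcc": "-flto", "clang": "-flto=thin"}


def configure_command(compiler, configure_args=()):
    """Build a ./configure command line that passes LTO flags as variables."""
    lto = LTO_FLAG[compiler]
    variables = {
        "CFLAGS": f"-O2 {lto}",    # keep optimization on; user flags replace defaults
        "CXXFLAGS": f"-O2 {lto}",
        "LDFLAGS": lto,            # the link step also needs the LTO flag
    }
    parts = ["./configure", *configure_args]
    parts += [f"{name}={value}" for name, value in variables.items()]
    # shlex.quote adds quotes only where a part contains shell-special characters.
    return " ".join(shlex.quote(part) for part in parts)


print(configure_command("clang", ["--with-incompatible-bdb"]))
```

    Passing `VAR=value` arguments to an autoconf-generated configure script is standard, which is why no dedicated option is strictly required for experimentation.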

  11. laanwj assigned theuni on Sep 3, 2019
  12. laanwj commented at 8:45 AM on October 2, 2019: member

    This is unlikely to be merged. Closing this PR. (feel free to continue discussion about LTO in the release builds, of course)

  13. laanwj closed this on Oct 2, 2019

  14. fanquake referenced this in commit 681b25e3cd on Nov 25, 2021
  15. sidhujag referenced this in commit 67658eec7d on Nov 25, 2021
  16. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-17 09:14 UTC