build: add --enable-lto configuration option #23152

pull fanquake wants to merge 1 commits into bitcoin:master from fanquake:experiment_with_lto changing 2 files +17 −4
  1. fanquake commented at 7:35 am on October 1, 2021: member

    It’s been 5 years since using LTO was first suggested for use when building Bitcoin Core, and it’s time to revisit it again. Compilers, and their LTO implementations, have matured, and Bitcoin Core has come a long way in terms of pruning dependencies which may have proved troublesome (e.g. Boost previously had issues when using LTO). We’ll have even less Boost code after moving to std::filesystem (#20744).

    Experimenting with LTO came up on IRC last night:

    sipa: jamesob: i’m interested in knowing whether “-flto” and/or “-fdata-sections -ffunction-sections -Wl,--gc-sections” are possible/beneficial with our current compiler suite; what would be a good way to have your test infrastructure benchmark things?

    So this PR just adds the bare minimum to make it easier to configure, compile and perform some benchmarking using -flto. This PR doesn’t do anything depends-wise; however, if we decide this is what we want to do, I’ll expand the changes here.

    I had previously had a PR open (#18605) to perform link time garbage collection (-ffunction-sections -fdata-sections & -Wl,--gc-sections), however moving straight to using LTO would be preferable.

    Note that our minimum required compilers, GCC 8.1 and Clang 7, both support the -flto option.

    Related #18579. Previous discussion: #10616, #14277. Previous related PRs: #10800 (-flto), #16791 (ThinLTO).

    Guix build:

    bash-5.1# find guix-build-$(git rev-parse --short=12 HEAD)/output/ -type f -print0 | env LC_ALL=C sort -z | xargs -r0 sha256sum
    1f3a7c5be4169aaa444b481d3e65a7bb72da9007fee6e6c416ded2e70f97374b  guix-build-68e5aafde3e8/output/aarch64-linux-gnu/SHA256SUMS.part
    fa8f4cf223d9aaf0b2c1ef55ce61256a19cd1ad7f42b99d0b98c9a52fe6ad8ba  guix-build-68e5aafde3e8/output/aarch64-linux-gnu/bitcoin-68e5aafde3e8-aarch64-linux-gnu-debug.tar.gz
    9a9967078cd1849b4e85db619e1f55d305c6d44e9e013067c0e8d62c1ba54087  guix-build-68e5aafde3e8/output/aarch64-linux-gnu/bitcoin-68e5aafde3e8-aarch64-linux-gnu.tar.gz
    18c71f30722102baaf3dfda67f7c7aac38723510b142e8df8ee7063c5d499368  guix-build-68e5aafde3e8/output/arm-linux-gnueabihf/SHA256SUMS.part
    0854cc0d17c045a118df2a24e4cf36d727e7e7e2dea37c2492ee21b71cb79b4b  guix-build-68e5aafde3e8/output/arm-linux-gnueabihf/bitcoin-68e5aafde3e8-arm-linux-gnueabihf-debug.tar.gz
    215256897dde4e8412ed60473376c694a80c5479fb08039107fb62435f2816ef  guix-build-68e5aafde3e8/output/arm-linux-gnueabihf/bitcoin-68e5aafde3e8-arm-linux-gnueabihf.tar.gz
    5fad0d9d12bc514ec46ed5d66fd29b7da1376a4a69c3b692936f1ab2356e2f85  guix-build-68e5aafde3e8/output/dist-archive/bitcoin-68e5aafde3e8.tar.gz
    4f32989d4ab1946048ca7caee9a983fa875be262282562f5a3e040f4bf92158e  guix-build-68e5aafde3e8/output/powerpc64-linux-gnu/SHA256SUMS.part
    ae45df309ae8ada52891efac0a369a69fed4ab93847a7bc4150a62230df4c8d7  guix-build-68e5aafde3e8/output/powerpc64-linux-gnu/bitcoin-68e5aafde3e8-powerpc64-linux-gnu-debug.tar.gz
    0ced227de15cb578567131271e2effe80681b4d7a436c92bf1caec735a576fa4  guix-build-68e5aafde3e8/output/powerpc64-linux-gnu/bitcoin-68e5aafde3e8-powerpc64-linux-gnu.tar.gz
    26fc5d2ccc1bc17ee0a146cacada6f4909d90c136ae640c8337332adce414ee0  guix-build-68e5aafde3e8/output/powerpc64le-linux-gnu/SHA256SUMS.part
    9956b544d90a62a8ba9fc9dc6b6b7f0efe193357332ec19e88053a89d4aab37e  guix-build-68e5aafde3e8/output/powerpc64le-linux-gnu/bitcoin-68e5aafde3e8-powerpc64le-linux-gnu-debug.tar.gz
    be8e39ceea1d36086ce5fa93bfb138c68d3bdf0dd6950b192dfa27a65cce3836  guix-build-68e5aafde3e8/output/powerpc64le-linux-gnu/bitcoin-68e5aafde3e8-powerpc64le-linux-gnu.tar.gz
    a7755edc394972885c4c77a7798007e5ba4126b177c4ff6224275c4fb8f3b1c4  guix-build-68e5aafde3e8/output/riscv64-linux-gnu/SHA256SUMS.part
    b6d252993d8aae7582ad6385fe53c61c54c284c68ece6cb2b2d1ac9554e06139  guix-build-68e5aafde3e8/output/riscv64-linux-gnu/bitcoin-68e5aafde3e8-riscv64-linux-gnu-debug.tar.gz
    bb4860f3bbd815f800333124ff901d880741792ab47097f49bda3a6931144da0  guix-build-68e5aafde3e8/output/riscv64-linux-gnu/bitcoin-68e5aafde3e8-riscv64-linux-gnu.tar.gz
    3dd17deed5c5935fb28b62dfc7afca5caab0d67862cdcbf3337edae73e1d0c4c  guix-build-68e5aafde3e8/output/x86_64-apple-darwin19/SHA256SUMS.part
    fa2d68c54fda0816188c81ce2201a77340b82645da2ffe412526f92c297a82df  guix-build-68e5aafde3e8/output/x86_64-apple-darwin19/bitcoin-68e5aafde3e8-osx-unsigned.dmg
    f6e5accdcd201f522b6426e4d8cc9b3643d4d43a57d268fa0e79ea9a34cfac01  guix-build-68e5aafde3e8/output/x86_64-apple-darwin19/bitcoin-68e5aafde3e8-osx-unsigned.tar.gz
    4e5a127df957d1c73b65925d685f6620e7bc5667efcb6dcd98be76effc22fc12  guix-build-68e5aafde3e8/output/x86_64-apple-darwin19/bitcoin-68e5aafde3e8-osx64.tar.gz
    56ccd216a69acafacbdc6bae0bdcc1faa50b6a51be1aebfa7068206c88b3241a  guix-build-68e5aafde3e8/output/x86_64-linux-gnu/SHA256SUMS.part
    77b93dd5fad322636853e5b0244ffafd97cc97f3b4b4ee755d5f830b75d77d13  guix-build-68e5aafde3e8/output/x86_64-linux-gnu/bitcoin-68e5aafde3e8-x86_64-linux-gnu-debug.tar.gz
    1feda932fc127b900316a232432b91e46e57ee12a81e12a7d888fdc3296219c1  guix-build-68e5aafde3e8/output/x86_64-linux-gnu/bitcoin-68e5aafde3e8-x86_64-linux-gnu.tar.gz
    aa7c53ab4164b3736049065c3c24391fc5bd7f26b4bda4aa877c378f0636a125  guix-build-68e5aafde3e8/output/x86_64-w64-mingw32/SHA256SUMS.part
    5e76148e67aef7e91e70074bfadc08e94373449ac3b966f4343b04d230c778fd  guix-build-68e5aafde3e8/output/x86_64-w64-mingw32/bitcoin-68e5aafde3e8-win-unsigned.tar.gz
    34123e3d818beeb70113caeda66945bc7cb9d9e987515d5b149bd17b4b38da90  guix-build-68e5aafde3e8/output/x86_64-w64-mingw32/bitcoin-68e5aafde3e8-win64-debug.zip
    2bba7f40a2b23c6ea3d47c4f564ab54201bf27f7f57103a98cc9bceea4e70c4d  guix-build-68e5aafde3e8/output/x86_64-w64-mingw32/bitcoin-68e5aafde3e8-win64-setup-unsigned.exe
    0e7e124144af4a92a4344cf70a3b7c06fbd2b8782aee7ede7263893afa3a5ef0  guix-build-68e5aafde3e8/output/x86_64-w64-mingw32/bitcoin-68e5aafde3e8-win64.zip
    
  2. fanquake added the label Build system on Oct 1, 2021
  3. laanwj commented at 7:54 am on October 1, 2021: member

    Concept ACK, on the condition that we are going to try to make this the default for the release binaries (I don’t mean in this PR). I’m generally not a fan of configure options that simply add cflags/linker flags, but I think this is something worth making an exception for.

    I had previously had a PR open to perform link time garbage collection (-ffunction-sections -fdata-sections -Wl,--gc-sections), in #18605, however moving straight to using LTO would be preferable.

    Agree. Function/data garbage collection is mostly an executable size concern, it doesn’t, besides possibly better (and sometimes worse!) cache access patterns, affect performance.

  4. fanquake commented at 8:20 am on October 1, 2021: member

    Concept ACK, on the condition that we are going to try to make this the default for the release binaries

    I’ve added --enable-lto to the configure for the Guix build, so it’s easier for anyone to test/run those now as well.

    EDIT: Looks like this will need some changes to the Guix toolchains.

  5. MarcoFalke added the label DrahtBot Guix build requested on Oct 1, 2021
  6. sipa commented at 1:34 pm on October 1, 2021: member

    Is there a possibility to use -flto=jobserver? That makes GCC use make’s parallel scheduler (so the argument to -j is available for multiple parallel compilations). It requires the linker’s command in Makefile to be prepended with +.

    Alternatively, some way of setting N for -flto=N. Doing the entire LTO stage single-threadedly (the default, I think) would be very slow, especially on machines with lots of cores.

    Of course, none of this is a requirement to evaluate whether we want LTO or not.

  7. laanwj commented at 2:07 pm on October 1, 2021: member

    EDIT: Looks like this will need some changes to the Guix toolchains.

    Which reminds me—should we enable lto for the depends build as well? I guess it’s another separate decision, but it would allow for the most optimization opportunities.

  8. practicalswift commented at 2:30 pm on October 1, 2021: contributor

    Concept ACK

    Some LTO results from measurements made back in 2018 can be found in #14277 :)

  9. jamesob commented at 2:51 pm on October 1, 2021: member
    Concept ACK
  10. sipa commented at 3:03 pm on October 1, 2021: member
    There seems to be a patch for automake to support -flto=jobserver, but it’s not yet accepted: https://www.mail-archive.com/automake-patches@gnu.org/msg07973.html
  11. martinus commented at 10:23 pm on October 1, 2021: contributor

    I ran all of the benchmarks with clang++ 12.0.1 and g++ 11.1.0, with and without --enable-lto, on my Intel i7-8700. Some benchmark results change quite a lot, and I also didn’t expect that clang++ and g++ can behave so differently. But microbenchmark results should always be taken with a grain of salt.

    clang++ ns/op clang++ lto ns/op g++ ns/op g++ lto ns/op benchmark
    44,850,331.00 41,649,957.00 45,752,245.00 44,851,254.00 AddrManAdd
    115,908,612.00 108,865,250.00 118,227,091.00 115,271,712.00 AddrManAddThenGood
    343,083.67 317,334.33 357,469.33 362,177.00 AddrManGetAddr
    185.81 169.72 192.36 187.05 AddrManSelect
    389,690.00 369,410.50 392,837.50 432,549.00 AssembleBlock
    clang++ ns/byte clang++ lto ns/byte g++ ns/byte g++ lto ns/byte benchmark
    110.74 111.10 69.79 70.17 Base58CheckEncode
    18.69 23.26 24.59 24.62 Base58Decode
    76.33 77.01 41.60 42.04 Base58Encode
    9.05 8.63 9.56 9.16 Bech32Decode
    19.08 18.32 19.37 20.05 Bech32Encode
    clang++ ns/op clang++ lto ns/op g++ ns/op g++ lto ns/op benchmark
    157.37 152.55 172.30 174.80 BenchLockedPool
    4.09 2.50 4.38 5.02 BenchTimeDeprecated
    24.05 20.26 24.00 20.58 BenchTimeMillis
    21.95 20.16 22.97 20.63 BenchTimeMillisSys
    2.21 0.31 1.56 0.63 BenchTimeMock
    62,958,650.00 62,619,805.00 59,419,903.00 62,824,850.00 BlockToJsonVerbose
    25,195,748.00 25,594,730.00 30,520,213.00 30,058,971.00 BlockToJsonVerboseWrite
    1,028,430.00 1,084,184.00 1,233,630.00 1,286,939.00 BnBExhaustion
    clang++ ns/job clang++ lto ns/job g++ ns/job g++ lto ns/job benchmark
    902.03 812.48 822.47 814.88 CCheckQueueSpeedPrevectorJob
    clang++ ns/op clang++ lto ns/op g++ ns/op g++ lto ns/op benchmark
    329.16 298.86 354.83 347.61 CCoinsCaching
    clang++ ns/byte clang++ lto ns/byte g++ ns/byte g++ lto ns/byte benchmark
    1.98 1.93 1.93 1.92 CHACHA20_1MB
    1.97 1.94 1.97 1.95 CHACHA20_256BYTES
    2.06 2.04 2.07 2.06 CHACHA20_64BYTES
    5.58 5.54 5.54 5.51 CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT
    2.74 2.79 2.77 2.76 CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT
    7.02 6.95 7.61 7.61 CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT
    3.53 3.47 3.82 3.78 CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT
    11.51 11.25 13.91 14.06 CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT
    5.80 5.68 6.94 6.95 CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT
    clang++ ns/op clang++ lto ns/op g++ ns/op g++ lto ns/op benchmark
    547,191.50 542,479.50 559,722.50 552,408.50 CoinSelection
    349,819,707.00 339,765,062.00 307,623,521.00 286,347,434.00 ComplexMemPool
    clang++ ns/elem clang++ lto ns/elem g++ ns/elem g++ lto ns/elem benchmark
    183.01 171.14 159.75 159.41 ConstructGCSFilter
    clang++ ns/block clang++ lto ns/block g++ ns/block g++ lto ns/block benchmark
    6,413,984.00 6,106,104.00 6,165,040.00 6,470,182.00 DeserializeAndCheckBlockTest
    5,317,655.00 5,157,372.00 5,142,803.00 5,225,309.00 DeserializeBlockTest
    clang++ ns/op clang++ lto ns/op g++ ns/op g++ lto ns/op benchmark
    7,837,080.00 7,805,025.00 7,807,746.00 7,867,452.00 DuplicateInputs
    10,777.09 8,604.29 13,273.04 10,657.84 EvictionProtection0Networks250Candidates
    16,693.46 12,104.46 17,841.92 14,626.06 EvictionProtection1Networks250Candidates
    22,453.76 18,138.77 23,554.42 20,238.65 EvictionProtection2Networks250Candidates
    3,773.30 3,281.87 4,258.64 3,979.64 EvictionProtection3Networks050Candidates
    10,160.02 8,873.71 10,910.08 10,462.54 EvictionProtection3Networks100Candidates
    24,539.60 20,099.13 26,483.08 23,736.26 EvictionProtection3Networks250Candidates
    1.78 2.18 1.77 1.77 FastRandom_1bit
    10.25 10.22 9.49 9.20 FastRandom_32bit
    clang++ ns/byte clang++ lto ns/byte g++ ns/byte g++ lto ns/byte benchmark
    3.21 3.20 3.18 3.19 HASH_1MB
    5.02 4.88 5.01 4.99 HASH_256BYTES
    10.50 10.02 10.52 10.41 HASH_64BYTES
    clang++ ns/elem clang++ lto ns/elem g++ ns/elem g++ lto ns/elem benchmark
    43,724.23 23,583.33 28,537.38 26,545.66 MatchGCSFilter
    clang++ ns/op clang++ lto ns/op g++ ns/op g++ lto ns/op benchmark
    24,054.61 23,268.32 24,299.66 23,389.98 MempoolEviction
    clang++ ns/leaf clang++ lto ns/leaf g++ ns/leaf g++ lto ns/leaf benchmark
    142.35 139.38 138.71 238.02 MerkleRoot
    clang++ ns/op clang++ lto ns/op g++ ns/op g++ lto ns/op benchmark
    6,416.92 6,543.58 8,118.72 8,013.09 MuHash
    5,345.43 5,501.68 7,129.37 6,946.70 MuHashDiv
    5,370.88 5,496.21 7,062.38 6,948.19 MuHashMul
    1,050.07 1,031.97 1,028.19 978.20 MuHashPrecompute
    clang++ ns/byte clang++ lto ns/byte g++ ns/byte g++ lto ns/byte benchmark
    0.79 0.79 0.83 0.83 POLY1305_1MB
    0.87 0.88 0.89 0.89 POLY1305_256BYTES
    1.08 1.08 1.09 1.09 POLY1305_64BYTES
    clang++ ns/op clang++ lto ns/op g++ ns/op g++ lto ns/op benchmark
    233.72 218.40 229.82 229.67 PrePadded
    4.96 8.15 15.31 15.00 PrevectorClearNontrivial
    5.65 5.43 4.55 4.54 PrevectorClearTrivial
    327.33 316.28 114.71 123.24 PrevectorDeserializeNontrivial
    17.38 16.27 10.55 14.19 PrevectorDeserializeTrivial
    10.30 7.97 7.23 7.04 PrevectorDestructorNontrivial
    10.27 7.32 7.32 7.19 PrevectorDestructorTrivial
    2.28 3.98 7.58 7.53 PrevectorResizeNontrivial
    2.74 2.73 2.14 2.26 PrevectorResizeTrivial
    clang++ ns/byte clang++ lto ns/byte g++ ns/byte g++ lto ns/byte benchmark
    2.54 2.56 2.52 2.53 RIPEMD160
    clang++ ns/op clang++ lto ns/op g++ ns/op g++ lto ns/op benchmark
    443.48 427.45 441.40 441.89 RegularPadded
    537.25 486.07 493.83 431.17 RollingBloom
    32,004.94 33,067.62 35,217.81 33,502.58 RollingBloomReset
    10,805,603.00 10,352,368.00 10,300,251.00 10,862,213.00 RpcMempool
    clang++ ns/byte clang++ lto ns/byte g++ ns/byte g++ lto ns/byte benchmark
    1.92 1.85 1.87 1.88 SHA1
    3.21 3.19 3.18 3.19 SHA256
    2.05 2.04 2.04 3.58 SHA256D64_1024
    7.37 6.88 7.20 7.16 SHA256_32b
    4.11 4.03 4.01 4.02 SHA3_256_1M
    3.07 2.96 2.94 4.42 SHA512
    clang++ ns/op clang++ lto ns/op g++ ns/op g++ lto ns/op benchmark
    29.88 29.51 28.23 28.05 SipHash_32b
    10.09 9.84 8.68 8.97 Trig
    83,430.64 77,216.85 83,367.50 88,736.58 VerifyNestedIfScript
    96,481.36 94,740.30 97,176.62 96,895.70 VerifyScriptBench
    37,446.16 38,476.75 40,102.65 36,176.27 WalletBalanceClean
    254,368.50 241,829.00 278,516.50 262,199.75 WalletBalanceDirty
    18,055.69 18,901.00 19,411.54 17,680.53 WalletBalanceMine
    36,783.80 38,399.57 39,947.21 36,207.50 WalletBalanceWatch
  12. sipa commented at 11:16 pm on October 1, 2021: member
    @martinus Man, that’s a wildly inconsistent set of differences… I don’t see anything really dramatic, though.
  13. martinus commented at 10:43 am on October 2, 2021: contributor

    I’ve now calculated the geometric mean of all the benchmark results:

    geomean of runtime (lower is better) compiler
    500.60 clang++
    470.27 clang++ lto
    501.78 g++
    498.56 g++ lto

    So on average clang++ seems to benefit in the benchmarks, but for g++ the change is not significant. I’d say one needs to do more real world benchmarks to see if it’s a benefit.
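
    For reference, a geometric mean like the one above can be computed as sketched below. This is only an illustration: the exact procedure behind the table is not stated in this thread, and `GeometricMean` is a hypothetical helper name. It sums logarithms instead of multiplying the raw values, so large ns/op numbers cannot overflow the product, and it assumes every sample is strictly positive.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of Bitcoin Core or its bench framework).
// Computes the geometric mean in log space to avoid overflowing a running
// product. Assumes the input is non-empty and every sample is > 0.
double GeometricMean(const std::vector<double>& samples)
{
    assert(!samples.empty());
    double log_sum = 0.0;
    for (const double s : samples) {
        assert(s > 0.0);
        log_sum += std::log(s);
    }
    return std::exp(log_sum / static_cast<double>(samples.size()));
}
```

    Because the ratio of two geometric means equals the geometric mean of the per-benchmark ratios, mixing ns/op, ns/byte and ns/elem rows does not distort the relative comparison between compilers, as long as every compiler is measured on the same set of benchmarks.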

    One other interesting observation: When compiling with --enable-lto g++ seems to be able to detect more problems. E.g. it found this:

    In member function 'operator=',
        inlined from 'Seed' at test/util/setup_common.cpp:66:33,
        inlined from 'SeedInsecureRand' at ./test/util/setup_common.h:61:13,
        inlined from '__ct_base ' at test/util/setup_common.cpp:105:21:
    random.cpp:702:19: warning: 'D.34682.bitbuf' may be used uninitialized [-Wmaybe-uninitialized]
      702 |     bitbuf = from.bitbuf;
          |                   ^
    test/util/setup_common.cpp: In member function '__ct_base ':
    test/util/setup_common.cpp:66:33: note: '<anonymous>' declared here
       66 |     ctx = FastRandomContext(seed);
    

    The FastRandomContext(const uint256& seed) constructor does not initialize the uint64_t bitbuf member, and in FastRandomContext& operator=(FastRandomContext&& from) noexcept; the uninitialized variable is copied with bitbuf = from.bitbuf; which technically is undefined behavior.
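
    The pattern behind the warning can be sketched in isolation. `SketchContext` below is a hypothetical stand-in, not the real FastRandomContext, and the default member initializers are just one possible way to make the copied value well defined; the fix actually applied in Bitcoin Core may look different.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the pattern described above; NOT the real
// FastRandomContext. Member names are borrowed only for readability.
class SketchContext
{
public:
    // This constructor only sets m_seed. Without the default member
    // initializers at the bottom of the class, bitbuf would be left
    // indeterminate here.
    explicit SketchContext(std::uint64_t seed) noexcept : m_seed(seed) {}

    SketchContext& operator=(SketchContext&& from) noexcept
    {
        m_seed = from.m_seed;
        bitbuf = from.bitbuf; // reads an initialized value thanks to the defaults
        bitbuf_size = from.bitbuf_size;
        from.bitbuf_size = 0;
        return *this;
    }

    std::uint64_t BitBuf() const noexcept { return bitbuf; }
    int BitBufSize() const noexcept { return bitbuf_size; }

private:
    std::uint64_t m_seed;
    std::uint64_t bitbuf{0}; // default member initializer: always has a value
    int bitbuf_size{0};
};
```

    With this shape, an assignment such as `ctx = SketchContext(42);` copies a zero-initialized `bitbuf` instead of an indeterminate one, which is the kind of read the -Wmaybe-uninitialized diagnostic is pointing at.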

  14. DrahtBot removed the label DrahtBot Guix build requested on Oct 2, 2021
  15. practicalswift commented at 8:24 am on October 3, 2021: contributor

    One other interesting observation: When compiling with --enable-lto g++ seems to be able to detect more problems. E.g. it found this:

    In member function 'operator=',
        inlined from 'Seed' at test/util/setup_common.cpp:66:33,
        inlined from 'SeedInsecureRand' at ./test/util/setup_common.h:61:13,
        inlined from '__ct_base ' at test/util/setup_common.cpp:105:21:
    random.cpp:702:19: warning: 'D.34682.bitbuf' may be used uninitialized [-Wmaybe-uninitialized]
      702 |     bitbuf = from.bitbuf;
          |                   ^
    test/util/setup_common.cpp: In member function '__ct_base ':
    test/util/setup_common.cpp:66:33: note: '<anonymous>' declared here
       66 |     ctx = FastRandomContext(seed);
    

    The FastRandomContext(const uint256& seed) constructor does not initialize the uint64_t bitbuf member, and in FastRandomContext& operator=(FastRandomContext&& from) noexcept; the uninitialized variable is copied with bitbuf = from.bitbuf; which technically is undefined behavior.

    Great find @martinus!

    Enabling more intelligent compiler reasoning is a very good reason for LTO :)

    After some testing I can confirm that the uninitialized read above is reachable from testing code.

    Can be verified by applying the following patch:

    diff --git a/src/random.cpp b/src/random.cpp
    index 174f4cef3..73e946783 100644
    --- a/src/random.cpp
    +++ b/src/random.cpp
    @@ -693,13 +693,30 @@ FastRandomContext::FastRandomContext(bool fDeterministic) noexcept : requires_se
         rng.SetKey(seed.begin(), 32);
     }
     
    +// Force Valgrind use-of-uninitialized memory (UUM) violation if `o` is uninitialized.
    +//
    +// As suggested by @guidovranken in #22064
    +template<typename T>
    +void ForceValgrindWarningIfUninitialized(const T& o) {
    +    static_assert(std::is_trivially_copyable<T>::value);
    +    FILE* f = fopen("/dev/null", "wb");
    +    fwrite(&o, sizeof(o), 1, f);
    +    fclose(f);
    +}
    +
     FastRandomContext& FastRandomContext::operator=(FastRandomContext&& from) noexcept
     {
    +    ForceValgrindWarningIfUninitialized(from.requires_seed);
         requires_seed = from.requires_seed;
    +    ForceValgrindWarningIfUninitialized(from.rng);
         rng = from.rng;
    +    ForceValgrindWarningIfUninitialized(from.bytebuf);
         std::copy(std::begin(from.bytebuf), std::end(from.bytebuf), std::begin(bytebuf));
    +    ForceValgrindWarningIfUninitialized(from.bytebuf_size);
         bytebuf_size = from.bytebuf_size;
    +    ForceValgrindWarningIfUninitialized(from.bitbuf);
         bitbuf = from.bitbuf;
    +    ForceValgrindWarningIfUninitialized(from.bitbuf_size);
         bitbuf_size = from.bitbuf_size;
         from.requires_seed = true;
         from.bytebuf_size = 0;
    

    And running valgrind src/test/test_bitcoin -t addrman_tests/addrman_simple:

    ==22209== Memcheck, a memory error detector

    Running 1 test case...
    ==22209== Syscall param write(buf) points to uninitialised byte(s)
    ==22209==    at 0x736A264: write (write.c:27)
    ==22209==    by 0x72E522C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1203)
    ==22209==    by 0x72E6FC0: new_do_write (fileops.c:457)
    ==22209==    by 0x72E6FC0: _IO_do_write@@GLIBC_2.2.5 (fileops.c:433)
    ==22209==    by 0x72E637F: _IO_file_close_it@@GLIBC_2.2.5 (fileops.c:136)
    ==22209==    by 0x72D83F6: fclose@@GLIBC_2.2.5 (iofclose.c:53)
    ==22209==    by 0xDA86B7: ForceValgrindWarningIfUninitialized<unsigned char [64]> (random.cpp:704)
    ==22209==    by 0xDA86B7: FastRandomContext::operator=(FastRandomContext&&) (random.cpp:713)
    ==22209==    by 0x9CFD67: Seed(FastRandomContext&) (setup_common.cpp:66)
    ==22209==    by 0x9D066B: SeedInsecureRand (setup_common.h:61)
    ==22209==    by 0x9D066B: BasicTestingSetup::BasicTestingSetup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<char const*, std::allocator<char const*> > const&) (setup_common.cpp:105)
    ==22209==    by 0x3AB6BE: addrman_simple (addrman_tests.cpp:166)
    
  16. martinus commented at 4:48 am on October 4, 2021: contributor

    I ran a few -reindex-chainstate benchmarks overnight on my Intel i7 CPU:

    Running -assumevalid=00000000000000000002a23d6df20eecec15b21d32c75833cce28f113de888b7 -reindex-chainstate -stopatheight=400000 -dbcache=4000

    user [sec] system [sec] total [sec] total relative compiler
    1863.15 69.09 1932.24 100.00% g++
    1852.46 68.42 1920.88 99.41% g++ lto
    1827.73 69.73 1897.46 98.20% clang++
    1788.40 70.35 1858.75 96.20% clang++ lto

    Running -assumevalid=0 -reindex-chainstate -stopatheight=400000 -dbcache=4000

    user [sec] system [sec] total [sec] total relative compiler
    35243.08 158.50 35401.58 100.00% g++
    35491.77 154.54 35646.30 100.69% g++ lto
    32663.09 158.60 32821.68 92.71% clang++
    32184.18 154.70 32338.88 91.35% clang++ lto

    My takeaways:

    • --enable-lto seems to make g++ 11.1.0 builds slightly slower (but the change is not significant)
    • --enable-lto makes clang++ 12.0.1 builds slightly faster
    • clang 12.0.1 optimizes significantly better than g++ 11.1, especially when validating scripts (-assumevalid=0). This is in line with what this Phoronix article found.
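
    The "total relative" column appears to be each run's total time (user + system) divided by the plain g++ total, expressed as a percentage. A minimal sketch of that arithmetic follows; `TotalSeconds` and `RelativePercent` are hypothetical helper names used only for illustration.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only.

// Total runtime as the sum of user and system CPU seconds.
double TotalSeconds(double user_secs, double system_secs)
{
    return user_secs + system_secs;
}

// Runtime of `total_secs` relative to `baseline_secs`, in percent.
// Assumes the baseline is strictly positive.
double RelativePercent(double total_secs, double baseline_secs)
{
    assert(baseline_secs > 0.0);
    return 100.0 * total_secs / baseline_secs;
}
```

    With the numbers from the first table, `RelativePercent(TotalSeconds(1852.46, 68.42), TotalSeconds(1863.15, 69.09))` comes out at roughly 99.41, which matches the g++ lto row.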
  17. martinus referenced this in commit 98d1d7afff on Oct 4, 2021
  18. martinus referenced this in commit 749e88eb86 on Oct 4, 2021
  19. fanquake force-pushed on Oct 4, 2021
  20. fanquake commented at 6:23 am on October 4, 2021: member

    @martinus thanks for the testing so far.

    For more benchmarking / testing, I’ve added an additional commit which adds a --enable-thin-lto flag. Note that this is Clang only, and will run with much more parallelism than fat LTO.

    I’ve also added a commit that will partially fix using LTO in the Guix builds. After discussing with @dongcarl , we’ve realized that using the gcc-* wrappers for ar, ranlib etc is “better” when using LTO, as the required --plugin arguments will be set up automatically. I’ve successfully completed an x86_64-linux-gnu Guix build, with LTO, with this change.

  21. MarcoFalke added the label DrahtBot Guix build requested on Oct 4, 2021
  22. DrahtBot removed the label DrahtBot Guix build requested on Oct 5, 2021
  23. in contrib/guix/libexec/build.sh:240 in 7a4758883e outdated
    @@ -237,7 +237,7 @@ mkdir -p "$OUTDIR"
     ###########################
     
     # CONFIGFLAGS
    -CONFIGFLAGS="--enable-reduce-exports --disable-bench --disable-gui-tests --disable-fuzz-binary"
    +CONFIGFLAGS="--enable-lto --enable-reduce-exports --disable-bench --disable-gui-tests --disable-fuzz-binary"
    


    MarcoFalke commented at 2:40 pm on October 5, 2021:
    If no runtime benefit can be observed with recent gcc, it would be surprising to see a benefit of lto with gcc-8. Lto might not be worth it for the release binaries unless they are also switched to use clang?

    laanwj commented at 4:45 pm on October 5, 2021:
    If it’s not slower, something could still be said for smaller binaries.

    laanwj commented at 4:48 pm on October 5, 2021:
    And I think what would still be interesting is to build the depends with -flto too, so that calls into dependencies can be optimized, and this might be able to shave off big parts of Qt.
  24. sipa commented at 5:29 pm on October 5, 2021: member
    For GCC 8 it may be useful to try enabling -flto-odr-type-merging; that may detect certain calling mismatches (if -Wodr is enabled as well, but it is by default). In later GCCs the option disappeared; not sure why.
  25. MarcoFalke added the label DrahtBot Guix build requested on Oct 6, 2021
  26. DrahtBot commented at 12:44 pm on October 6, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  27. jamesob commented at 2:13 pm on October 6, 2021: member

    ACK https://github.com/bitcoin/bitcoin/pull/23152/commits/7a4758883ede8853c0e74ae66a2df68da0ce405c

    I tested/benched both --enable-lto and --enable-thin-lto with clang-12 and lld. Though gcc-9 supports LTO, apparently lld doesn’t know how to work with gcc, so I didn’t test that.

    The differences I saw for a mid-chain 40,000 block IBD weren’t as dramatic as I expected based on the results above. The dbcache here is the default (pretty low at 300), so I’ll try rerunning with a much higher dbcache to see if that makes a difference. In any case, this changeset obviously works.

    Worth noting that the assumevalid value (000000000000000000176c192f42ad13ab159fdb20198b87e7ba3c001e47b876) I use here straddles the run, since its height is 522,000.

    Edit: just to be clear, these results show that LTO (or, really, clang-compiled binaries) are somehow marginally slower in this test than gcc. Kinda weird?


    ibd local range 500000 540000

    bench name command
    ibd.local.range.500000.540000 bitcoind -dbcache=300 -debug=coindb -debug=bench -listen=0 -connect=0 -addnode=127.0.0.1:8888 -prune=9999999 -printtoconsole=0 -assumevalid=000000000000000000176c192f42ad13ab159fdb20198b87e7ba3c001e47b876

    lto-clang-12 vs. thin-lto-clang-12 vs. gcc-9 (master) vs. clang-12 (master) (absolute)

    bench name x lto-clang-12 thin-lto-clang-12 gcc-9 (master) clang-12 (master)
    build.make.8.total_secs 1 582.0000 (± 0.0000) 515.0000 (± 0.0000) 469.0000 (± 0.0000) 349.0000 (± 0.0000)
    build.make.8.peak_rss_KiB 1 6124560.0000 (± 0.0000) 1843432.0000 (± 0.0000) 1865844.0000 (± 0.0000) 956092.0000 (± 0.0000)
    build.make.8.cpu_kernel_secs 1 61.4100 (± 0.0000) 61.3100 (± 0.0000) 116.7600 (± 0.0000) 59.8800 (± 0.0000)
    build.make.8.cpu_user_secs 1 3063.3200 (± 0.0000) 3809.7000 (± 0.0000) 3530.8800 (± 0.0000) 2654.3400 (± 0.0000)
    ibd.local.range.500000.540000.total_secs 2 4593.6784 (± 98.6011) 4662.5669 (± 81.0513) 4526.9569 (± 122.6671) 4656.9325 (± 173.6344)
    ibd.local.range.500000.540000.peak_rss_KiB 2 2468266.0000 (± 26734.0000) 2269640.0000 (± 154632.0000) 2275648.0000 (± 157012.0000) 2288716.0000 (± 117284.0000)
    ibd.local.range.500000.540000.cpu_kernel_secs 2 650.9050 (± 3.0050) 642.1900 (± 7.9500) 649.1850 (± 1.7650) 662.6350 (± 7.7250)
    ibd.local.range.500000.540000.cpu_user_secs 2 13794.1000 (± 0.4800) 13864.0600 (± 35.5200) 14426.4950 (± 3.7650) 14008.0250 (± 13.4050)

    lto-clang-12 vs. thin-lto-clang-12 vs. gcc-9 (master) vs. clang-12 (master) (relative)

    bench name x lto-clang-12 thin-lto-clang-12 gcc-9 (master) clang-12 (master)
    build.make.8.total_secs 1 1.668 1.476 1.344 1.000
    build.make.8.peak_rss_KiB 1 6.406 1.928 1.952 1.000
    build.make.8.cpu_kernel_secs 1 1.026 1.024 1.950 1.000
    build.make.8.cpu_user_secs 1 1.154 1.435 1.330 1.000
    ibd.local.range.500000.540000.total_secs 2 1.015 1.030 1.000 1.029
    ibd.local.range.500000.540000.peak_rss_KiB 2 1.088 1.000 1.003 1.008
    ibd.local.range.500000.540000.cpu_kernel_secs 2 1.014 1.000 1.011 1.032
    ibd.local.range.500000.540000.cpu_user_secs 2 1.000 1.005 1.046 1.016
  28. martinus commented at 3:48 pm on October 6, 2021: contributor
    I think the numbers are actually not too different from mine. lto-clang-12 uses the least amount of CPU, and gcc-9 the most. I think that’s the important metric here; the actual runtime difference could be due to random chance (e.g. waiting for leveldb writes).
  29. MarcoFalke deleted a comment on Oct 6, 2021
  30. MarcoFalke deleted a comment on Oct 6, 2021
  31. MarcoFalke commented at 5:41 pm on October 6, 2021: member
    @jamesob @martinus I couldn’t tell from your results, but both of you compiled without depends, right?
  32. jamesob commented at 5:55 pm on October 6, 2021: member
  33. martinus commented at 4:57 am on October 7, 2021: contributor
    @MarcoFalke no, I just configured, ran make -j14 check and then bitcoind
  34. martinus referenced this in commit f46225ea5c on Oct 7, 2021
  35. jamesob commented at 6:47 pm on October 7, 2021: member

    Benches with large DB cache (9000) are in. Interesting results in the relative table are bolded below.


    bench name command
    ibd.local.range.500000.540000 bitcoind -dbcache=9000 -debug=coindb -debug=bench -listen=0 -connect=0 -addnode=127.0.0.1:8888 -prune=9999999 -printtoconsole=0 -assumevalid=000000000000000000176c192f42ad13ab159fdb20198b87e7ba3c001e47b876

    lto-clang-12 vs. thin-lto-clang-12 vs. gcc-9 (master) vs. clang-12 (master) (absolute)

    bench name x lto-clang-12 thin-lto-clang-12 gcc-9 (master) clang-12 (master)
    build.make.8.gcc.total_secs 1 586.0000 (± 0.0000) 515.0000 (± 0.0000) 459.0000 (± 0.0000) 346.0000 (± 0.0000)
    build.make.8.gcc.peak_rss_KiB 1 6120784.0000 (± 0.0000) 1805480.0000 (± 0.0000) 1866384.0000 (± 0.0000) 956064.0000 (± 0.0000)
    build.make.8.gcc.cpu_kernel_secs 1 61.6600 (± 0.0000) 59.8500 (± 0.0000) 112.4300 (± 0.0000) 59.6300 (± 0.0000)
    build.make.8.gcc.cpu_user_secs 1 3045.8600 (± 0.0000) 3812.3300 (± 0.0000) 3398.8500 (± 0.0000) 2639.3500 (± 0.0000)
    ibd.local.range.500000.540000.total_secs 2 3413.4441 (± 21.9705) 3424.3078 (± 75.4325) 3533.6713 (± 3.3140) 3469.5077 (± 7.1294)
    ibd.local.range.500000.540000.peak_rss_KiB 2 6262320.0000 (± 220.0000) 6261344.0000 (± 620.0000) 6259058.0000 (± 1330.0000) 6259524.0000 (± 3872.0000)
    ibd.local.range.500000.540000.cpu_kernel_secs 2 279.6500 (± 1.4900) 279.2600 (± 0.6200) 281.6750 (± 1.6750) 281.6800 (± 1.6400)
    ibd.local.range.500000.540000.cpu_user_secs 2 12522.1300 (± 7.6300) 12644.4700 (± 19.7700) 13135.6450 (± 2.0050) 12697.3800 (± 16.7200)

    lto-clang-12 vs. thin-lto-clang-12 vs. gcc-9 (master) vs. clang-12 (master) (relative)

    bench name x lto-clang-12 thin-lto-clang-12 gcc-9 (master) clang-12 (master)
    build.make.8.gcc.total_secs 1 1.694 1.488 1.327 1.000
    build.make.8.gcc.peak_rss_KiB 1 6.402 1.888 1.952 1.000
    build.make.8.gcc.cpu_kernel_secs 1 1.034 1.004 1.885 1.000
    build.make.8.gcc.cpu_user_secs 1 1.154 1.444 1.288 1.000
    ibd.local.range.500000.540000.total_secs 2 1.000 1.003 1.035 1.016
    ibd.local.range.500000.540000.peak_rss_KiB 2 1.001 1.000 1.000 1.000
    ibd.local.range.500000.540000.cpu_kernel_secs 2 1.001 1.000 1.009 1.009
    ibd.local.range.500000.540000.cpu_user_secs 2 1.000 1.010 1.049 1.014
  36. DrahtBot removed the label DrahtBot Guix build requested on Oct 7, 2021
  37. MarcoFalke commented at 7:56 am on October 8, 2021: member
    /bin/sh ../libtool  --tag=CXX --preserve-dup-deps  --mode=link x86_64-w64-mingw32-g++ -std=c++17  -fstack-reuse=none -Wstack-protector -fstack-protector-all -fcf-protection=full      -flto -fPIE -pipe -O2 -O2 -g -fno-ident -fno-extended-identifiers -fvisibility=hidden -Wl,--exclude-libs,ALL  -Wl,--enable-reloc-section -Wl,--dynamicbase -Wl,--nxcompat -Wl,--high-entropy-va -pie   -flto -all-static -pthread -lpthread -L/bitcoin/depends/x86_64-w64-mingw32/lib -Wl,--no-insert-timestamp -Wl,--major-subsystem-version -Wl,6 -Wl,--minor-subsystem-version -Wl,1 -o bitcoin-cli.exe bitcoin_cli-bitcoin-cli.o bitcoin-cli-res.o libbitcoin_cli.a univalue/libunivalue.la libbitcoin_util.a crypto/libbitcoin_crypto_base.a crypto/libbitcoin_crypto_sse41.a crypto/libbitcoin_crypto_avx2.a crypto/libbitcoin_crypto_shani.a -L/bitcoin/depends/x86_64-w64-mingw32/lib -lboost_system-mt-s-x64 -lboost_filesystem-mt-s-x64 -L/bitcoin/depends/x86_64-w64-mingw32/lib -levent -lws2_32 -lssp -liphlpapi -lshlwapi -lws2_32 -ladvapi32 -luuid -loleaut32 -lole32 -lcomctl32 -lshell32 -lwinmm -lcomdlg32 -lgdi32 -luser32 -lkernel32
    libtool: link: x86_64-w64-mingw32-g++ -std=c++17 -fstack-reuse=none -Wstack-protector -fstack-protector-all -fcf-protection=full -flto -fPIE -pipe -O2 -O2 -g -fno-ident -fno-extended-identifiers -fvisibility=hidden -Wl,--exclude-libs -Wl,ALL -Wl,--enable-reloc-section -Wl,--dynamicbase -Wl,--nxcompat -Wl,--high-entropy-va -pie -flto -static -pthread -Wl,--no-insert-timestamp -Wl,--major-subsystem-version -Wl,6 -Wl,--minor-subsystem-version -Wl,1 -o bitcoin-cli.exe bitcoin_cli-bitcoin-cli.o bitcoin-cli-res.o  -lpthread -L/bitcoin/depends/x86_64-w64-mingw32/lib libbitcoin_cli.a univalue/.libs/libunivalue.a libbitcoin_util.a crypto/libbitcoin_crypto_base.a crypto/libbitcoin_crypto_sse41.a crypto/libbitcoin_crypto_avx2.a crypto/libbitcoin_crypto_shani.a -lboost_system-mt-s-x64 -lboost_filesystem-mt-s-x64 -levent -lws2_32 /gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/x86_64-w64-mingw32/lib/libssp.a -liphlpapi -lshlwapi -lws2_32 -ladvapi32 -luuid -loleaut32 -lole32 -lcomctl32 -lshell32 -lwinmm -lcomdlg32 -lgdi32 -luser32 -lkernel32 -pthread
    x86_64-w64-mingw32-ld: libbitcoin_util_a-request.o (symbol from plugin):(.gnu.linkonce.t._ZNSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EEaSERKS7_+0x0): multiple definition of `std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator=(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/vector.tcc:186: first defined here
    x86_64-w64-mingw32-ld: libbitcoin_util_a-request.o (symbol from plugin):(.gnu.linkonce.t._ZNSt6vectorI8UniValueSaIS0_EEaSERKS2_+0x0): multiple definition of `std::vector<UniValue, std::allocator<UniValue> >::operator=(std::vector<UniValue, std::allocator<UniValue> > const&)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/vector.tcc:186: first defined here
     4x86_64-w64-mingw32-ld: libbitcoin_util_a-system.o (symbol from plugin):(.gnu.linkonce.t._ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_8UniValueESt10_Select1stIS9_ESt4lessIS5_ESaIS9_EE24_M_get_insert_unique_posERS7_+0x0): multiple definition of `std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue> > >::_M_get_insert_unique_pos(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/stl_tree.h:2051: first defined here
     5x86_64-w64-mingw32-ld: libbitcoin_util_a-system.o (symbol from plugin):(.gnu.linkonce.t._ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_8UniValueESt10_Select1stIS9_ESt4lessIS5_ESaIS9_EE8_M_eraseEPSt13_Rb_tree_nodeIS9_E+0x0): multiple definition of `std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue> > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue> >*)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/stl_tree.h:1873: first defined here
     6x86_64-w64-mingw32-ld: libbitcoin_util_a-system.o (symbol from plugin):(.gnu.linkonce.t._ZNSt6vectorI8UniValueSaIS0_EE17_M_realloc_insertIJRKS0_EEEvN9__gnu_cxx17__normal_iteratorIPS0_S2_EEDpOT_+0x0): multiple definition of `void std::vector<UniValue, std::allocator<UniValue> >::_M_realloc_insert<UniValue const&>(__gnu_cxx::__normal_iterator<UniValue*, std::vector<UniValue, std::allocator<UniValue> > >, UniValue const&)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/vector.tcc:413: first defined here
     7x86_64-w64-mingw32-ld: libbitcoin_util_a-settings.o (symbol from plugin):(.gnu.linkonce.t._ZNSt6vectorI8UniValueSaIS0_EE15_M_range_insertIN9__gnu_cxx17__normal_iteratorIPKS0_S2_EEEEvNS5_IPS0_S2_EET_SB_St20forward_iterator_tag+0x0): multiple definition of `void std::vector<UniValue, std::allocator<UniValue> >::_M_range_insert<__gnu_cxx::__normal_iterator<UniValue const*, std::vector<UniValue, std::allocator<UniValue> > > >(__gnu_cxx::__normal_iterator<UniValue*, std::vector<UniValue, std::allocator<UniValue> > >, __gnu_cxx::__normal_iterator<UniValue const*, std::vector<UniValue, std::allocator<UniValue> > >, __gnu_cxx::__normal_iterator<UniValue const*, std::vector<UniValue, std::allocator<UniValue> > >, std::forward_iterator_tag)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/vector.tcc:672: first defined here
     8collect2: error: ld returned 1 exit status
     9make[2]: *** [Makefile:5912: bitcoin-cli.exe] Error 1
    10make[2]: Leaving directory '/distsrc-base/distsrc-1571f1a82883-x86_64-w64-mingw32/src'
    11make[1]: *** [Makefile:16291: all-recursive] Error 1
    12make[1]: Leaving directory '/distsrc-base/distsrc-1571f1a82883-x86_64-w64-mingw32/src'
    13make: *** [Makefile:821: all-recursive] Error 1
    
  38. fanquake force-pushed on Oct 8, 2021
  39. fanquake commented at 8:16 am on October 8, 2021: member

    For GCC 8 it may be useful to try enabling -flto-odr-type-merging; that may detect certain calling mismatches (if -Wodr is enabled as well, but it is by default). In later GCCs the option disappeared; not sure why.

    I’ve added this. In newer GCC, the description of this option reads:

    -flto-odr-type-merging Does nothing. Preserved for backward compatibility.

    Looks like it was removed when -Wodr became enabled by default. See https://github.com/gcc-mirror/gcc/commit/686a56a85d39750cd5c0c42f2ea747c8632e519e.

    Also added -flto-report.

    And I think what would still be interesting is to build the depends with -flto too, so that calls into dependencies can be optimized, and this might be able to shave off big parts of Qt.

    I’ve added -flto to some FLAGS in depends.

    Regarding Guix: the Linux HOSTS are all building.

    The macOS build is failing (cross-compiling outside Guix works). See full output here:

    Undefined symbols for architecture x86_64:
      "RandomInit()", referenced from:
          _main in lto.o
      "SetupChainParamsBaseOptions(ArgsManager&)", referenced from:
          _main in lto.o
      "ArgsManager::AddArg(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned int, OptionsCategory const&)", referenced from:
          _main in lto.o
      "ArgsManager::AddCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)", referenced from:
          _main in lto.o
      "HelpRequested(ArgsManager const&)", referenced from:
          _main in lto.o
      "SetupHelpOptions(ArgsManager&)", referenced from:
          _main in lto.o
      "FormatFullVersion()", referenced from:
          _main in lto.o
      "SetupEnvironment()", referenced from:
          _main in lto.o
    <trim>
    

    The mingw-w64 build is failing with:

    0/bin/sh ../libtool  --tag=CXX --preserve-dup-deps  --mode=link x86_64-w64-mingw32-g++ -std=c++17  -fstack-reuse=none -Wstack-protector -fstack-protector-all -fcf-protection=full      -flto -fPIE -pipe -O2 -O2 -g -fno-ident -fno-extended-identifiers -fvisibility=hidden -Wl,--exclude-libs,ALL  -Wl,--enable-reloc-section -Wl,--dynamicbase -Wl,--nxcompat -Wl,--high-entropy-va -pie   -flto -all-static -pthread -lpthread -L/bitcoin/depends/x86_64-w64-mingw32/lib -Wl,--no-insert-timestamp -Wl,--major-subsystem-version -Wl,6 -Wl,--minor-subsystem-version -Wl,1 -o bitcoin-cli.exe bitcoin_cli-bitcoin-cli.o bitcoin-cli-res.o libbitcoin_cli.a univalue/libunivalue.la libbitcoin_util.a crypto/libbitcoin_crypto_base.a crypto/libbitcoin_crypto_sse41.a crypto/libbitcoin_crypto_avx2.a crypto/libbitcoin_crypto_shani.a -L/bitcoin/depends/x86_64-w64-mingw32/lib -lboost_system-mt-s-x64 -lboost_filesystem-mt-s-x64 -L/bitcoin/depends/x86_64-w64-mingw32/lib -levent -lws2_32 -lssp -liphlpapi -lshlwapi -lws2_32 -ladvapi32 -luuid -loleaut32 -lole32 -lcomctl32 -lshell32 -lwinmm -lcomdlg32 -lgdi32 -luser32 -lkernel32 
    1libtool: link: x86_64-w64-mingw32-g++ -std=c++17 -fstack-reuse=none -Wstack-protector -fstack-protector-all -fcf-protection=full -flto -fPIE -pipe -O2 -O2 -g -fno-ident -fno-extended-identifiers -fvisibility=hidden -Wl,--exclude-libs -Wl,ALL -Wl,--enable-reloc-section -Wl,--dynamicbase -Wl,--nxcompat -Wl,--high-entropy-va -pie -flto -static -pthread -Wl,--no-insert-timestamp -Wl,--major-subsystem-version -Wl,6 -Wl,--minor-subsystem-version -Wl,1 -o bitcoin-cli.exe bitcoin_cli-bitcoin-cli.o bitcoin-cli-res.o  -lpthread -L/bitcoin/depends/x86_64-w64-mingw32/lib libbitcoin_cli.a univalue/.libs/libunivalue.a libbitcoin_util.a crypto/libbitcoin_crypto_base.a crypto/libbitcoin_crypto_sse41.a crypto/libbitcoin_crypto_avx2.a crypto/libbitcoin_crypto_shani.a -lboost_system-mt-s-x64 -lboost_filesystem-mt-s-x64 -levent -lws2_32 /gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/x86_64-w64-mingw32/lib/libssp.a -liphlpapi -lshlwapi -lws2_32 -ladvapi32 -luuid -loleaut32 -lole32 -lcomctl32 -lshell32 -lwinmm -lcomdlg32 -lgdi32 -luser32 -lkernel32 -pthread
    2x86_64-w64-mingw32-ld: libbitcoin_util_a-request.o (symbol from plugin):(.gnu.linkonce.t._ZNSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EEaSERKS7_+0x0): multiple definition of `std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator=(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/vector.tcc:186: first defined here
    3x86_64-w64-mingw32-ld: libbitcoin_util_a-request.o (symbol from plugin):(.gnu.linkonce.t._ZNSt6vectorI8UniValueSaIS0_EEaSERKS2_+0x0): multiple definition of `std::vector<UniValue, std::allocator<UniValue> >::operator=(std::vector<UniValue, std::allocator<UniValue> > const&)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/vector.tcc:186: first defined here
    4x86_64-w64-mingw32-ld: libbitcoin_util_a-system.o (symbol from plugin):(.gnu.linkonce.t._ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_8UniValueESt10_Select1stIS9_ESt4lessIS5_ESaIS9_EE24_M_get_insert_unique_posERS7_+0x0): multiple definition of `std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue> > >::_M_get_insert_unique_pos(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/stl_tree.h:2051: first defined here
    5x86_64-w64-mingw32-ld: libbitcoin_util_a-system.o (symbol from plugin):(.gnu.linkonce.t._ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_8UniValueESt10_Select1stIS9_ESt4lessIS5_ESaIS9_EE8_M_eraseEPSt13_Rb_tree_nodeIS9_E+0x0): multiple definition of `std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue> > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, UniValue> >*)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/stl_tree.h:1873: first defined here
    6x86_64-w64-mingw32-ld: libbitcoin_util_a-system.o (symbol from plugin):(.gnu.linkonce.t._ZNSt6vectorI8UniValueSaIS0_EE17_M_realloc_insertIJRKS0_EEEvN9__gnu_cxx17__normal_iteratorIPS0_S2_EEDpOT_+0x0): multiple definition of `void std::vector<UniValue, std::allocator<UniValue> >::_M_realloc_insert<UniValue const&>(__gnu_cxx::__normal_iterator<UniValue*, std::vector<UniValue, std::allocator<UniValue> > >, UniValue const&)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/vector.tcc:413: first defined here
    7x86_64-w64-mingw32-ld: libbitcoin_util_a-settings.o (symbol from plugin):(.gnu.linkonce.t._ZNSt6vectorI8UniValueSaIS0_EE15_M_range_insertIN9__gnu_cxx17__normal_iteratorIPKS0_S2_EEEEvNS5_IPS0_S2_EET_SB_St20forward_iterator_tag+0x0): multiple definition of `void std::vector<UniValue, std::allocator<UniValue> >::_M_range_insert<__gnu_cxx::__normal_iterator<UniValue const*, std::vector<UniValue, std::allocator<UniValue> > > >(__gnu_cxx::__normal_iterator<UniValue*, std::vector<UniValue, std::allocator<UniValue> > >, __gnu_cxx::__normal_iterator<UniValue const*, std::vector<UniValue, std::allocator<UniValue> > >, __gnu_cxx::__normal_iterator<UniValue const*, std::vector<UniValue, std::allocator<UniValue> > >, std::forward_iterator_tag)'; univalue/.libs/libunivalue.a(libunivalue_la-univalue.o):/gnu/store/c1mhzb9f961bdc9z4jbsn44c8fk2an5y-gcc-cross-x86_64-w64-mingw32-8.4.0/include/c++/bits/vector.tcc:672: first defined here
    8collect2: error: ld returned 1 exit status
    
  40. MarcoFalke added the label DrahtBot Guix build requested on Oct 8, 2021
  41. MarcoFalke deleted a comment on Oct 8, 2021
  42. DrahtBot removed the label DrahtBot Guix build requested on Oct 9, 2021
  43. fanquake force-pushed on Oct 10, 2021
  44. MarcoFalke added the label DrahtBot Guix build requested on Oct 11, 2021
  45. MarcoFalke deleted a comment on Oct 11, 2021
  46. DrahtBot commented at 9:58 am on October 12, 2021: member

    Guix builds

    File commit 5b7210c8745d9572fe94620f848d4ee1304c91a7(master) commit dfe4547134ea52fd0a6585129f8b11ac49904b95(master and this pull)
    SHA256SUMS.part c79a21dc3da6e71b...
    *-aarch64-linux-gnu-debug.tar.gz 108b2ef989bd0d35...
    *-aarch64-linux-gnu.tar.gz a5bacdca50e2c3f0...
    *-arm-linux-gnueabihf-debug.tar.gz b21f70b81f75b1f6...
    *-arm-linux-gnueabihf.tar.gz 6c79c07aa1d2df22...
    *-osx-unsigned.dmg e15275439c514402...
    *-osx-unsigned.tar.gz 23080798bd4790d0...
    *-osx64.tar.gz f4afcac2e217616d...
    *-powerpc64-linux-gnu-debug.tar.gz 7bb92bb01f337a75...
    *-powerpc64-linux-gnu.tar.gz 085787f59f3c1a58...
    *-powerpc64le-linux-gnu-debug.tar.gz 2732fa036e1ba621...
    *-powerpc64le-linux-gnu.tar.gz 495e1c10d8950c1a...
    *-riscv64-linux-gnu-debug.tar.gz 82c896f2e942879b...
    *-riscv64-linux-gnu.tar.gz ebc9d158e7fe286e...
    *-win-unsigned.tar.gz 4a871e218e240ccb...
    *-win64-debug.zip 0ac6d7beea9e7a67...
    *-win64-setup-unsigned.exe 73d5ba8460a67851...
    *-win64.zip 104adfb4cd021131...
    *-x86_64-linux-gnu-debug.tar.gz 896c0ecaefff84eb...
    *-x86_64-linux-gnu.tar.gz 9cc9de13c910b2a4...
    *.tar.gz 0d51a99bd81f38ec... fb63f940b09474e5...
    guix_build.log 463c2d4bbaa53638... 902025a013b42081...
    guix_build.log.diff f2c8d8610e11556c...
  47. DrahtBot removed the label DrahtBot Guix build requested on Oct 12, 2021
  48. build: add `--enable-lto` configuration option
    Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
    Co-authored-by: Elichai Turkel <elichai.turkel@gmail.com>
    68e5aafde3
  49. fanquake force-pushed on Nov 16, 2021
  50. fanquake commented at 1:47 am on November 16, 2021: member

    I’ve rebased this, and reduced the changes back to just adding the --enable-lto configure option (-flto). I think merging this as is would still be useful for now, so that developers can experiment/benchmark. LTO is still opt-in, and there are no changes to release builds.

    I plan on adding proper LTO support to depends as a follow up.

  51. fanquake added the label DrahtBot Guix build requested on Nov 16, 2021
  52. DrahtBot commented at 10:04 am on November 18, 2021: member

    Guix builds

    File commit b869a784ef2b259f14545bf6bd314fb58c36514b(master) commit da6e6382113ef97b8ee85c7454bea8addf765731(master and this pull)
    SHA256SUMS.part 954f517d6d32db0e... 6cc1eec935c0042e...
    *-aarch64-linux-gnu-debug.tar.gz af2097467f36eabf... 3dcc0de6160f7c35...
    *-aarch64-linux-gnu.tar.gz e69d087ccde8efd6... 77067d14556e109e...
    *-arm-linux-gnueabihf-debug.tar.gz a0c73a2842d9fa8f... 2dcdcb1a2ea8ce99...
    *-arm-linux-gnueabihf.tar.gz c4cc98770c06329b... c57d5c94ca3f85fe...
    *-osx-unsigned.dmg 62a4e08899b352be... 54a32ff64a9f2d24...
    *-osx-unsigned.tar.gz b93bb0e47ecc8b95... 1fa5ad953c9c2af1...
    *-osx64.tar.gz c69f867438a8af21... 8e80cc616af98cb8...
    *-powerpc64-linux-gnu-debug.tar.gz 2498043e265ec068... 75165af3e6ecaaaf...
    *-powerpc64-linux-gnu.tar.gz 9432dd894f7360c8... 7dcd9732be03f2ae...
    *-powerpc64le-linux-gnu-debug.tar.gz 65ed6dc046debc24... 809b49fa06cee5da...
    *-powerpc64le-linux-gnu.tar.gz 9ab401045d52eea9... a37c138df3d55bee...
    *-riscv64-linux-gnu-debug.tar.gz 2fac51b0a508c378... 22f2b9f5eae333e0...
    *-riscv64-linux-gnu.tar.gz e1ec62b871a1f4e4... 0b423013c33d5a77...
    *-win-unsigned.tar.gz cbbac77f2a64bfcc... c6ebb35599df0f4e...
    *-win64-debug.zip ff35db2f0d6ab21b... 4241c2f5e90c6951...
    *-win64-setup-unsigned.exe 90d58bf173f12b26... 93cd680dfbf4317d...
    *-win64.zip d97a69c639cd5e58... 22322ace7cf4fdd7...
    *-x86_64-linux-gnu-debug.tar.gz 65fe6d9d2c6d0f99... 40dc193eea66d1e5...
    *-x86_64-linux-gnu.tar.gz fd13a09d2d862e25... 32fee564ee128f26...
    *.tar.gz 06dea90135016d53... 61d1906bb2485b61...
    guix_build.log a53270b21d20dcea... a9ff2d0d3c723d30...
    guix_build.log.diff 912a22ba51f64849...
  53. DrahtBot removed the label DrahtBot Guix build requested on Nov 18, 2021
  54. laanwj commented at 1:30 pm on November 18, 2021: member
    Code review ACK 68e5aafde3e87c16da95410a0474f38f589afb36 (but see below)
  55. laanwj commented at 2:22 pm on November 18, 2021: member

    I tried running a build with --enable-lto on Ubuntu 20.04 with clang 13, and got the following errors at link time:

    /usr/bin/ld: minisketch/libminisketch.a: error adding symbols: archive has no index; run ranlib to add one
    clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
    

    With gcc 9.3.0 I get the following error:

    /usr/bin/ld: /tmp/bitcoin-tx.PKYwxw.ltrans1.ltrans.o: in function `VerifyWitnessProgram(CScriptWitness const&, int, std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int, BaseSignatureChecker const&, ScriptError_t*, bool) [clone .constprop.0]':
    ./build/src/./src/pubkey.cpp:230: undefined reference to `secp256k1_xonly_pubkey_parse'
    /usr/bin/ld: /tmp/bitcoin-tx.PKYwxw.ltrans1.ltrans.o:./build/src/./src/pubkey.cpp:232: undefined reference to `secp256k1_xonly_pubkey_tweak_add_check'
    
  56. martinus referenced this in commit 10aeb6e648 on Nov 18, 2021
  57. fanquake commented at 4:19 am on November 19, 2021: member

    with clang 13 and get the following errors during link:

    Builds with Apple Clang are working ok, but I see the same issue with LLVM Clang 13. Will take a look.

    With gcc 9.3.0 I get the following error:

    Interesting. Anything non-standard about your build? I’ve completed builds using this branch with GCC 10.3.0 and 9.3.0.

  58. fanquake commented at 6:56 am on November 19, 2021: member

    but I see the same issue with LLVM Clang 13. Will take a look.

    The issue is using ranlib rather than llvm-ranlib. When I use llvm-ranlib-13 (with clang-13) building with LTO works fine.

    We should be able to override the ranlib used during configure, using RANLIB=llvm-ranlib-*, but that doesn’t currently work because of our use of AC_PATH_TOOL. Going to fix this up.
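Once the `RANLIB` override is honoured by configure, an LTO build with LLVM Clang might be configured along these lines. This is only a sketch: the `lto_configure_args` helper is hypothetical, the versioned tool names (`clang-13`, `clang++-13`, `llvm-ranlib-13`) assume Debian/Ubuntu-style packaging, and only `--enable-lto` and `RANLIB=llvm-ranlib-*` come from this thread.

```shell
# Hypothetical helper: emit configure arguments for an LTO build with a given
# LLVM major version, pairing clang with the matching llvm-ranlib so that
# archives of LTO objects get a usable index.
lto_configure_args() {
  ver="$1"
  printf '%s\n' "--enable-lto CC=clang-${ver} CXX=clang++-${ver} RANLIB=llvm-ranlib-${ver}"
}

# Print the arguments for clang 13.
lto_configure_args 13
```

The output can be passed straight to configure, for example `./configure $(lto_configure_args 13)`. Keeping the version in one place avoids mixing a clang of one major version with an llvm-ranlib of another.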

  59. laanwj commented at 6:32 pm on November 19, 2021: member

    Interesting. Anything non-standard about your build? I’ve completed builds using this branch with GCC 10.3.0 and 9.3.0.

    I’ve retried on Ubuntu 20.04 with just

    ../configure --with-incompatible-bdb --enable-lto
    make -j4
    

    No flag overrides. The only thing special is that it’s an out-of-tree build. It still fails in the linking step.

      CXXLD    bitcoin-wallet
    /usr/bin/ld: /tmp/bitcoin-tx.OJq2ei.ltrans1.ltrans.o: in function `VerifyWitnessProgram(CScriptWitness const&, int, std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int, BaseSignatureChecker const&, ScriptError_t*, bool) [clone .constprop.0]':
    ./build/src/./src/pubkey.cpp:230: undefined reference to `secp256k1_xonly_pubkey_parse'
    /usr/bin/ld: /tmp/bitcoin-tx.OJq2ei.ltrans1.ltrans.o:./build/src/./src/pubkey.cpp:232: undefined reference to `secp256k1_xonly_pubkey_tweak_add_check'
    /usr/bin/ld: /tmp/bitcoin-tx.OJq2ei.ltrans4.ltrans.o: in function `XOnlyPubKey::VerifySchnorr(uint256 const&, Span<unsigned char const>) const':
    ./build/src/./src/pubkey.cpp:210: undefined reference to `secp256k1_xonly_pubkey_parse'
    

    It could still be the environment; I can retry in a fresh VM.

    Edit: that works fine; will try to find out what is different. Edit 2: there was some local crap in the source directory interfering (leftover stale .deps directories, to be specific). So no issue with this PR at all.
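For out-of-tree builds, leftovers of this kind can be spotted before configuring. A minimal sketch, assuming a POSIX shell with `find`; the `find_stale_deps` helper is hypothetical and not part of the build system:

```shell
# Hypothetical helper: list .deps directories left behind in a source tree
# by an earlier in-tree build. A clean checkout should print nothing.
find_stale_deps() {
  find "$1" -type d -name .deps -print
}
```

Calling `find_stale_deps` with the path to the source checkout prints one line per leftover directory. If anything is listed, removing those directories before running the out-of-tree configure is one way to rule out this class of interference.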

  60. fanquake commented at 12:14 pm on November 25, 2021: member
    This currently works for GCC 9.x+, makes no change to release builds, and is still completely opt-in. I’m planning on following up with improved support for Clang, and depends support, shortly, but am going to merge this now.
  61. fanquake merged this on Nov 25, 2021
  62. fanquake closed this on Nov 25, 2021

  63. fanquake deleted the branch on Nov 25, 2021
  64. sidhujag referenced this in commit 67658eec7d on Nov 25, 2021
  65. DrahtBot locked this on Nov 25, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC
