Revisit link-time optimization (LTO)? Some results from clang LTO compilation #14277

practicalswift opened this issue on September 20, 2018
  1. practicalswift commented at 1:16 PM on September 20, 2018: contributor

    Is it worth revisiting LTO compilation?

    I did some experimentation with LTO compilation and the results look promising :-)

    Binary size results (non-stripped binaries):

    • bench_bitcoin shrank from 74 678 800 to 39 695 288 bytes (-47 %)
    • bitcoin-cli shrank from 4 837 744 to 2 918 544 bytes (-40 %)
    • bitcoin-tx shrank from 15 206 720 to 7 717 608 bytes (-49 %)
    • bitcoind shrank from 102 004 960 to 70 706 000 bytes (-31 %)
    • test_bitcoin shrank from 161 739 656 to 100 838 072 bytes (-38 %)
    • test_bitcoin_fuzzy shrank from 15 929 968 to 6 036 176 bytes (-62 %)

    Binary size results (stripped binaries):

    • bench_bitcoin shrank from 5 632 272 to 3 722 720 bytes (-34 %)
    • bitcoin-cli shrank from 383 216 to 260 288 bytes (-32 %)
    • bitcoin-tx shrank from 1 399 112 to 936 080 bytes (-33 %)
    • bitcoind shrank from 6 639 336 to 6 044 520 bytes (-9 %)
    • test_bitcoin shrank from 12 067 056 to 10 853 616 bytes (-10 %)
    • test_bitcoin_fuzzy shrank from 1 468 976 to 428 160 bytes (-71 %)
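
The percentages in both lists follow from the byte counts as after / before - 1. A minimal Python sketch that recomputes them for the stripped binaries is shown here; the helper name `relative_change` and the dictionary layout are illustrative assumptions and not part of the original log.

```python
# Byte counts copied from the stripped-binary list above: (without LTO, with LTO).
STRIPPED_SIZES = {
    "bench_bitcoin": (5632272, 3722720),
    "bitcoin-cli": (383216, 260288),
    "bitcoin-tx": (1399112, 936080),
    "bitcoind": (6639336, 6044520),
    "test_bitcoin": (12067056, 10853616),
    "test_bitcoin_fuzzy": (1468976, 428160),
}


def relative_change(before, after):
    """Return the relative size change in percent (negative means smaller)."""
    return 100.0 * (after / before - 1)


# Print one summary line per binary, rounded to whole percent like the lists above.
for name, (before, after) in sorted(STRIPPED_SIZES.items()):
    print("{} changed {:.0f} % ({} -> {} bytes)".format(
        name, relative_change(before, after), before, after))
```

Swapping in the non-stripped byte counts gives the first list in the same way.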

    Benchmark results (relative changes smaller than 5 % omitted to reduce noise):

    • Runtime of benchmark FastRandom_1bit changed -7.9 % when enabling LTO
    • Runtime of benchmark FastRandom_32bit changed -6.7 % when enabling LTO
    • Runtime of benchmark MatchGCSFilter changed -11.5 % when enabling LTO
    • Runtime of benchmark MempoolEviction changed -13.3 % when enabling LTO
    • Runtime of benchmark PrevectorDeserializeNontrivial changed -58.1 % when enabling LTO
    • Runtime of benchmark RollingBloom changed -15.0 % when enabling LTO

    Below is the log from my experimentation.

    Let me know if anything can be improved. Feedback appreciated.

    # Build Bitcoin without LTO (baseline)
    $ git clone https://github.com/bitcoin/bitcoin bitcoin-without-lto
    $ cd bitcoin-without-lto
    $ export CC="clang"
    $ export CXX="clang++"
    $ export RANLIB="/usr/lib/llvm-6.0/bin/llvm-ranlib"
    $ ./autogen.sh
    $ ./configure
    $ make
    $ cd ..
    
    # Build Bitcoin with LTO
    $ git clone https://github.com/bitcoin/bitcoin bitcoin-with-lto
    $ cd bitcoin-with-lto
    $ PREFIX=${PWD}/binutils-bin/
    $ mkdir binutils-bin
    $ apt install texinfo bison
    $ git clone --depth 1 git://sourceware.org/git/binutils-gdb.git binutils
    $ mkdir binutils-build
    $ cd binutils-build
    $ export CC="clang"
    $ export CXX="clang++"
    $ unset RANLIB
    $ ../binutils/configure --enable-gold --enable-plugins --disable-werror --prefix=${PREFIX}
    $ make all-gold
    $ make install
    $ cd ..
    $ ${PREFIX}/bin/ld.gold -plugin 2>&1 | grep -q "plugin: missing argument" && echo "ld.gold has plugin support" || echo "ERROR: ld.gold lacks plugin support"
    $ cp /usr/lib/llvm-6.0/lib/LLVMgold.so ${PREFIX}/lib/
    $ export PATH="${PREFIX}/bin:${PATH}"
    $ export CC="clang -flto"
    $ export CXX="clang++ -flto"
    $ export RANLIB="/usr/lib/llvm-6.0/bin/llvm-ranlib"
    $ ./autogen.sh
    $ ./configure
    $ make
    $ cd ..
    
    # Check binary sizes
    $ ls -Sl bitcoin-*-lto/src/bitcoind \
           bitcoin-*-lto/src/bitcoin-tx \
           bitcoin-*-lto/src/bench/bench_bitcoin \
           bitcoin-*-lto/src/bitcoin-cli \
           bitcoin-*-lto/src/test/test_bitcoin \
           bitcoin-*-lto/src/test/test_bitcoin_fuzzy
    -rwxr-xr-x 1 root root 161739656 Sep 20 11:57 bitcoin-without-lto/src/test/test_bitcoin
    -rwxr-xr-x 1 root root 102004960 Sep 20 11:57 bitcoin-without-lto/src/bitcoind
    -rwxr-xr-x 1 root root 100838072 Sep 20 12:12 bitcoin-with-lto/src/test/test_bitcoin
    -rwxr-xr-x 1 root root  74678800 Sep 20 11:57 bitcoin-without-lto/src/bench/bench_bitcoin
    -rwxr-xr-x 1 root root  70706000 Sep 20 12:11 bitcoin-with-lto/src/bitcoind
    -rwxr-xr-x 1 root root  39695288 Sep 20 12:10 bitcoin-with-lto/src/bench/bench_bitcoin
    -rwxr-xr-x 1 root root  15929968 Sep 20 11:57 bitcoin-without-lto/src/test/test_bitcoin_fuzzy
    -rwxr-xr-x 1 root root  15206720 Sep 20 11:57 bitcoin-without-lto/src/bitcoin-tx
    -rwxr-xr-x 1 root root   7717608 Sep 20 12:09 bitcoin-with-lto/src/bitcoin-tx
    -rwxr-xr-x 1 root root   6036176 Sep 20 12:09 bitcoin-with-lto/src/test/test_bitcoin_fuzzy
    -rwxr-xr-x 1 root root   4837744 Sep 20 11:57 bitcoin-without-lto/src/bitcoin-cli
    -rwxr-xr-x 1 root root   2918544 Sep 20 12:08 bitcoin-with-lto/src/bitcoin-cli
    $ strip bitcoin-*-lto/src/bitcoind \
           bitcoin-*-lto/src/bitcoin-tx \
           bitcoin-*-lto/src/bench/bench_bitcoin \
           bitcoin-*-lto/src/bitcoin-cli \
           bitcoin-*-lto/src/test/test_bitcoin \
           bitcoin-*-lto/src/test/test_bitcoin_fuzzy
    $ ls -Sl bitcoin-*-lto/src/bitcoind \
           bitcoin-*-lto/src/bitcoin-tx \
           bitcoin-*-lto/src/bench/bench_bitcoin \
           bitcoin-*-lto/src/bitcoin-cli \
           bitcoin-*-lto/src/test/test_bitcoin \
           bitcoin-*-lto/src/test/test_bitcoin_fuzzy
    -rwxr-xr-x 1 root root 12067056 Sep 20 15:54 bitcoin-without-lto/src/test/test_bitcoin
    -rwxr-xr-x 1 root root 10853616 Sep 20 15:54 bitcoin-with-lto/src/test/test_bitcoin
    -rwxr-xr-x 1 root root  6639336 Sep 20 15:54 bitcoin-without-lto/src/bitcoind
    -rwxr-xr-x 1 root root  6044520 Sep 20 15:54 bitcoin-with-lto/src/bitcoind
    -rwxr-xr-x 1 root root  5632272 Sep 20 15:54 bitcoin-without-lto/src/bench/bench_bitcoin
    -rwxr-xr-x 1 root root  3722720 Sep 20 15:54 bitcoin-with-lto/src/bench/bench_bitcoin
    -rwxr-xr-x 1 root root  1468976 Sep 20 15:54 bitcoin-without-lto/src/test/test_bitcoin_fuzzy
    -rwxr-xr-x 1 root root  1399112 Sep 20 15:54 bitcoin-without-lto/src/bitcoin-tx
    -rwxr-xr-x 1 root root   936080 Sep 20 15:54 bitcoin-with-lto/src/bitcoin-tx
    -rwxr-xr-x 1 root root   428160 Sep 20 15:54 bitcoin-with-lto/src/test/test_bitcoin_fuzzy
    -rwxr-xr-x 1 root root   383216 Sep 20 15:54 bitcoin-without-lto/src/bitcoin-cli
    -rwxr-xr-x 1 root root   260288 Sep 20 15:54 bitcoin-with-lto/src/bitcoin-cli
    
    # Gather performance measurements until ^C is pressed
    $ while true; do for SWITCH in with without; do echo "# $SWITCH"; \
        bitcoin-${SWITCH}-lto/src/bench/bench_bitcoin; done; done 2>&1 | \
        tee bench_bitcoin-lto-vs-non-lto
    
    # Summarize results
    $ ./parse_lto.py < bench_bitcoin-lto-vs-non-lto
    * Runtime of benchmark FastRandom_1bit changed -7.9 % when enabling LTO. Median total time was 4.4 seconds without LTO and 4.1 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
    * Runtime of benchmark FastRandom_32bit changed -6.7 % when enabling LTO. Median total time was 5.8 seconds without LTO and 5.4 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
    * Runtime of benchmark MatchGCSFilter changed -11.5 % when enabling LTO. Median total time was 8.3 seconds without LTO and 7.3 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
    * Runtime of benchmark MempoolEviction changed -13.3 % when enabling LTO. Median total time was 4.6 seconds without LTO and 4.0 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
    * Runtime of benchmark PrevectorDeserializeNontrivial changed -58.1 % when enabling LTO. Median total time was 8.2 seconds without LTO and 3.4 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
    * Runtime of benchmark RollingBloom changed -15.0 % when enabling LTO. Median total time was 4.4 seconds without LTO and 3.7 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
    
    # Environment
    $ clang++ --version | head -2
    clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
    Target: x86_64-pc-linux-gnu
    $ dpkg -S $(which clang++)
    clang: /usr/bin/clang++
    $ dpkg -S /usr/lib/llvm-6.0/bin/llvm-ranlib
    llvm-6.0: /usr/lib/llvm-6.0/bin/llvm-ranlib
    $ dpkg -S /usr/lib/llvm-6.0/lib/LLVMgold.so
    llvm-6.0-dev: /usr/lib/llvm-6.0/lib/LLVMgold.so
    $ cat /etc/lsb-release
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=18.04
    DISTRIB_CODENAME=bionic
    DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
    

    This is the content of parse_lto.py:

    #!/usr/bin/env python3
    
    import collections
    import statistics
    import sys
    
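    results_lto = collections.defaultdict(list)
    results_nonlto = collections.defaultdict(list)
    # Set by the "# with" / "# without" marker lines; stays None until the first marker is seen.
    lto_status = None
    for line in sys.stdin: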
        line = line.rstrip("\n")
        if line.startswith("# Benchmark"):
            continue
        if line.startswith("#"):
            lto_status = line[2:]
            continue
        assert(lto_status in ["with", "without"])
        benchmark, _, _, total_time, _ = line.split(", ", 4)
        total_time = float(total_time)
        if lto_status == "with":
            results_lto[benchmark].append(total_time)
            continue
        if lto_status == "without":
            results_nonlto[benchmark].append(total_time)
            continue
        assert(False)
    
    assert(len(results_lto) == len(results_nonlto))
    for benchmark in sorted(results_lto):
        least_observations = min(len(results_lto[benchmark]), len(results_nonlto[benchmark]))
        results_lto[benchmark] = results_lto[benchmark][:least_observations]
        results_nonlto[benchmark] = results_nonlto[benchmark][:least_observations]
    for benchmark in sorted(results_lto):
        assert(len(results_lto[benchmark]) == len(results_nonlto[benchmark]))
        median_lto = statistics.median(results_lto[benchmark])
        median_nonlto = statistics.median(results_nonlto[benchmark])
        assert(median_nonlto != 0)
        change = median_lto / median_nonlto - 1
        if abs(change) < 0.05:
            continue
        print("* Runtime of benchmark {} changed {:.1f} % when enabling LTO. Median total time was {:.1f} seconds without LTO and {:.1f} seconds with LTO. Based on {} independent runs of bench_bitcoin.".format(
            benchmark, 100 * change, median_nonlto, median_lto, len(results_lto[benchmark])
        ))
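
To illustrate the summary step of the script in isolation, here is a small sketch that applies the same median-based formula to two short lists of timings. The timings are invented for illustration and are not measurements, and the function name `median_change` is an assumption made for this example.

```python
import statistics


def median_change(times_nonlto, times_lto):
    """Relative change of the median runtime when enabling LTO (negative means faster)."""
    return statistics.median(times_lto) / statistics.median(times_nonlto) - 1


# Invented timings in seconds, for illustration only.
without_lto = [8.0, 8.4, 8.2]
with_lto = [4.1, 4.0, 4.3]

change = median_change(without_lto, with_lto)
print("changed {:.1f} %".format(100 * change))
```

Using the median rather than the mean keeps a single slow run, for example one disturbed by other load on the machine, from dominating the comparison.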
    
  2. fanquake added the label Build system on Sep 20, 2018
  3. MarcoFalke commented at 1:35 PM on September 20, 2018: member

    Apparently this slowed down IBD: #10616 (comment)

  4. MarcoFalke added the label Brainstorming on Sep 20, 2018
  5. practicalswift commented at 1:45 PM on September 20, 2018: contributor

    @MarcoFalke Interesting! Perhaps @sipa encountered a GCC-specific issue. FWIW I used the stock Clang 6.0.0 shipped with Ubuntu 18.04.1 LTS.

    What would be the proper way to test for an IBD slowdown? @sipa, how did you test? :-)

  6. sipa commented at 4:40 PM on September 20, 2018: member

    @practicalswift Just reindex-chainstate with stopatheight, I believe.
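
A rough sketch of how such a run could be timed from Python follows. It assumes an already synced data directory and uses the bitcoind options `-datadir`, `-reindex-chainstate` and `-stopatheight`; the helper names, the binary paths, the data directory and the stop height are placeholders chosen for this example, not values taken from this thread.

```python
import subprocess
import time


def reindex_command(bitcoind, datadir, stop_height):
    """Build the bitcoind command line for a chainstate reindex up to stop_height."""
    return [
        bitcoind,
        "-datadir={}".format(datadir),
        "-reindex-chainstate",
        "-stopatheight={}".format(stop_height),
    ]


def time_reindex(bitcoind, datadir, stop_height):
    """Run one reindex in the foreground and return the wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(reindex_command(bitcoind, datadir, stop_height), check=True)
    return time.monotonic() - start


# Example use (placeholder paths and height, adjust to the local setup):
#   for switch in ("without", "with"):
#       binary = "bitcoin-{}-lto/src/bitcoind".format(switch)
#       print(switch, time_reindex(binary, "/path/to/datadir", 200000))
```

Repeating each run several times and alternating between the two binaries, as the benchmark loop earlier in the thread does, should reduce the influence of disk caching on the comparison.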

    I was also testing with GCC, not clang. Perhaps all 4 combinations (GCC and clang, each with and without LTO) should be tested, since we don't yet have a good idea of the practical performance effect of the compiler choice?

    Also, when comparing binary sizes, first strip them of debug symbols etc.

  7. MarcoFalke commented at 4:54 PM on September 20, 2018: member

    If we wanted to do this, I suggest doing at least a reindex-chainstate performance benchmark on all the gitian binaries/architectures.

  8. practicalswift commented at 6:09 PM on January 10, 2019: contributor

    @MarcoFalke Is the process for performing such a benchmark documented somewhere? I'd like to do some LTO vs non-LTO benchmarking.

  9. MarcoFalke commented at 6:18 PM on January 10, 2019: member

    Hmm, maybe we could just run the tests on https://bitcoinperf.com/

  10. practicalswift commented at 11:26 AM on May 19, 2020: contributor

    Closing due to lack of progress/interest :)

  11. practicalswift closed this on May 19, 2020

  12. fanquake referenced this in commit 681b25e3cd on Nov 25, 2021
  13. sidhujag referenced this in commit 67658eec7d on Nov 25, 2021
  14. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-01 00:15 UTC
