RFC: build: support for pre-compiled headers. #31053

pull theuni wants to merge 5 commits into bitcoin:master from theuni:cmake-pch changing 11 files +104 −1
  1. theuni commented at 9:33 pm on October 7, 2024: member

    This cuts off roughly a third of my total build time, either with -j8 or -j24.

    There are many different ways to go about adding pre-compiled header support with CMake. This is probably the simplest and most naive, but it has a substantial impact.

    I used ClangBuildAnalyzer to measure the headers which took the most amount of time to parse while building the kernel library. I did this under the assumption that those are probably the most included files in general, and because they should never be affected by changing defines/flags in different compilation units.

    CMake re-generates a new pch file for each target that includes it, so it’s only worth using for libraries/binaries which compile a substantial list of cpp files. I eyeballed the libs/bins which I thought made sense.

    I chose to add base_precompiled as PRIVATE everywhere so that it’s clear exactly where they’re being used.

    Other approaches would involve keeping (or generating) lists of proper headers for each lib/bin, and potentially sharing them around when possible. The approach here (a monolithic interface lib) is unfortunately incompatible with sharing.

    There’s an additional (hackish) commit to support ccache + pch. Unfortunately this requires a rather new (4.8+) ccache in order to properly set the variables we need. With the required options turned on as documented by ccache, this combo works fine for me locally.

    Overall, I think this is a reasonable approach because it pretty much just works with no fiddling. The main drawback is that over time the headers list will grow stale, so it’ll require some attention now and then.


    Results: (-DBUILD_UTIL_CHAINSTATE=ON -DBUILD_FUZZ_BINARY=ON -DBUILD_BENCH=ON)

    Before: **** Time summary: Compilation (1412 times): Parsing (frontend): 2619.3 s Codegen & opts (backend): 1553.3 s

    Real make time: 6m20.894s

    After: **** Time summary: Compilation (1422 times): Parsing (frontend): 1375.4 s Codegen & opts (backend): 1726.7 s

    Real make time: 4m24.634s

    pch + ccache 2nd run: 0m10.073s

  2. DrahtBot commented at 9:33 pm on October 7, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31053.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK TheCharlatan, hebasto, laanwj, edilmedeiros, BrandonOdiwuor

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30882 (wip: Split fuzz binary (take 2) by dergoegge)
    • #30861 (build: Improve ccache performance for different build directories by hebasto)
    • #30811 (build: Unify -logsourcelocations format by hebasto)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. theuni commented at 9:35 pm on October 7, 2024: member

    Ping @hebasto, @fanquake

    Also @willcl-ark, you might be interested in this as you’ve been benchmarking build times.

  4. TheCharlatan commented at 9:41 pm on October 7, 2024: contributor
    Concept ACK
  5. theuni added the label Build system on Oct 7, 2024
  6. DrahtBot added the label CI failed on Oct 7, 2024
  7. DrahtBot commented at 10:50 pm on October 7, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/31200880897

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  8. fanquake commented at 2:28 pm on October 8, 2024: member

    MSAN failure was spurious, but not the tidy (https://cirrus-ci.com/task/4930868089716736):

    0[21:49:26.130] clang-tidy-18 -p=/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu -quiet -load=/tidy-build/libbitcoin-tidy.so /ci_container_base/src/validationinterface.cpp
    1[21:49:26.130] /ci_container_base/src/validationinterface.cpp:22:13: error: redundant 'RemovalReasonToString' declaration [readability-redundant-declaration,-warnings-as-errors]
    2[21:49:26.130]    22 | std::string RemovalReasonToString(const MemPoolRemovalReason& r) noexcept;
    3[21:49:26.130]       | ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    4[21:49:26.130] /ci_container_base/src/kernel/mempool_removal_reason.h:22:13: note: previously declared here
    5[21:49:26.130]    22 | std::string RemovalReasonToString(const MemPoolRemovalReason& r) noexcept;
    6[21:49:26.130]       |             ^
    7[21:49:26.130] ^^^ ⚠️ Failure generated from clang-tidy
    8[21:49:26.130] + echo '^^^ ⚠️ Failure generated from clang-tidy'
    9[21:49:26.130] + false
    
  9. theuni commented at 3:22 pm on October 8, 2024: member

    Huh, I guess that happens because pch shuffles around the include order.

    The redundant decl indeed seems broken to me. Will PR a quick fix.

  10. theuni force-pushed on Oct 8, 2024
  11. theuni commented at 8:41 pm on October 8, 2024: member

    Combining with #30911 produces even more of a speedup (with Make, ninja is about the same).

    Config: -DCMAKE_BUILD_TYPE=Debug -DWITH_CCACHE=OFF -DBUILD_UTIL_CHAINSTATE=ON -DBUILD_FUZZ_BINARY=ON -DBUILD_BENCH=ON Using: make -j24

    Master: 2m7.311s This PR: 1m34.495s This PR + #30911: 1m19.046s

    Result: Completes in 62% of the time compared to master.

  12. DrahtBot removed the label CI failed on Oct 8, 2024
  13. fanquake referenced this in commit 5fb9455063 on Oct 9, 2024
  14. hebasto commented at 9:16 am on October 9, 2024: member

    Combining with #30911 produces even more of a speedup (with Make, ninja is about the same).

    Why are ninja builds not affected?

  15. theuni commented at 4:20 pm on October 9, 2024: member

    Combining with #30911 produces even more of a speedup (with Make, ninja is about the same).

    Why are ninja builds not affected?

    Ninja builds are sped up significantly by pch (this PR), it’s the combo with #30911 doesn’t seem to have much effect.

  16. build: Add a rough attempt at pre-compiled headers
    These are all kernel headers which required the most amount of frontend
    parsing as measured by clang's -ftime-trace option.
    92e49c7a90
  17. build: support ccache + pre-compiled headers when possible
    ccache supports setting key=value options as of v4.8. Prior to that, setting
    sloppiness would've required substantial hacking.
    
    For 4.8+:
      - Set the required sloppiness options
      - Set "-Xclang -fno-pch-timestamp" for clang
    
    Both options are documented here: https://ccache.dev/manual/latest.html
    5b709e434d
  18. build: display pch status in CMake output 09716ef408
  19. guix: disable pre-compiled headers cbe297fc4a
  20. ci: disable pch for tidy/iwyu builds
    PCH is incompabible with iwyu:
      error: include-what-you-use does not support PCH
    
    It seems likely that it'd mess with clang-tidy as well.
    3cec3508a7
  21. theuni force-pushed on Oct 9, 2024
  22. laanwj commented at 8:47 am on October 17, 2024: member
    Concept ACK. It’s nice that CMAKE provides a built-in abstraction for this so there’s no need for really ugly stuff.
  23. in CMakeLists.txt:646 in 3cec3508a7
    637@@ -638,6 +638,12 @@ flags_summary()
    638 message("Attempt to harden executables ......... ${ENABLE_HARDENING}")
    639 message("Treat compiler warnings as errors ..... ${WERROR}")
    640 message("Use ccache for compiling .............. ${WITH_CCACHE}")
    641+if(CMAKE_DISABLE_PRECOMPILE_HEADERS)
    642+  set(pch_status OFF)
    643+else()
    644+  set(pch_status ON)
    645+endif()
    646+message("Precompiled-headers ................... ${pch_status}")
    


    hebasto commented at 1:15 pm on October 22, 2024:

    nit:

    0message("Precompiled headers ................... ${pch_status}")
    
  24. hebasto commented at 1:20 pm on October 22, 2024: member

    Concept ACK.

    Tested and benchmarked on Ubuntu 23.10 with CMake 3.27.4 using the following command:

    0$ rm -rf build && cmake -B build -G Ninja --preset dev-mode -DWITH_MULTIPROCESS=OFF -DWITH_CCACHE=OFF && time cmake --build build -j 16
    
    • The master branch @ 5fb94550638d0b01c184d2f3d5d97c8874c8c34b:
    0real	6m46.350s
    
    • this PR @ 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8:
    0real	6m10.958s
    

    Do you want to keep the base_precompiled target or use the PRECOMPILE_HEADERS property on per-target basis?

  25. bitcoin deleted a comment on Oct 22, 2024
  26. edilmedeiros commented at 7:41 pm on October 23, 2024: contributor

    Concept ACK Code LGTM.


    rm -rf build && cmake -B build -DBoost_INCLUDE_DIR=/opt/local/libexec/boost/1.78/include -DCMAKE_BUILD_TYPE=Debug -DWITH_CCACHE=OFF -DBUILD_BENCH=ON -DBUILD_FUZZ_BINARY=ON -DBUILD_UTIL_CHAINSTATE=ON && /usr/bin/time cmake --build build -j 11

    Master at ffe4261cb0669b1e1a926638e0498ae5b63f3599 1m56,95s real

    This PR at 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8 1m32,76s real


    With ccache 4.10:

    rm -rf build && cmake -B build -DBoost_INCLUDE_DIR=/opt/local/libexec/boost/1.78/include -DCMAKE_BUILD_TYPE=Debug -DWITH_CCACHE=ON -DBUILD_BENCH=ON -DBUILD_FUZZ_BINARY=ON -DBUILD_UTIL_CHAINSTATE=ON && /usr/bin/time -h cmake --build build -j 11

    Master at ffe4261cb0669b1e1a926638e0498ae5b63f3599 2m18,47s real

    This PR at 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8 1m52,09s real


    All running on a MacBook Pro M3 Pro 18GB with MacOS Sonoma 14.6.1. cmake version 3.29.5. Interesting that ccache made my build slower.

  27. laanwj commented at 6:03 pm on October 24, 2024: member

    Complete build on RPi5 with NVME hat, 8GB memory:

    0$ cmake -B build  -G Ninja --preset=dev-mode  -DWITH_CCACHE=OFF --toolchain /data/tmp/bitcoin/depends/aarch64-unknown-linux-gnu/toolchain.cmake 
    1$ time ninja -C build
    

    (PR with pch 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8) real 33m33.235s user 127m24.712s sys 4m22.753s

    (base commit 5fb94550638d0b01c184d2f3d5d97c8874c8c34b) real 37m39.927s user 144m17.888s sys 3m26.133s

    About 10% faster (in real time), not as impressive as some results on faster machines, but still nice to have.

    (did the same bechmark on SD card instead of NVME, the difference is not significant, CPU really dominates instead of I/O here)

    Interesting that ccache made my build slower.

    well-ccache speeds up repeated builds; it makes sense that if it cannot cache anything it’s slightly slower, as it has to store all the input and outputs on disk

  28. in src/CMakeLists.txt:78 in 3cec3508a7
    74@@ -75,6 +75,76 @@ add_subdirectory(secp256k1)
    75 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
    76 string(APPEND CMAKE_C_COMPILE_OBJECT " ${APPEND_CPPFLAGS} ${APPEND_CFLAGS}")
    77 
    78+
    


    fanquake commented at 4:31 pm on November 5, 2024:

    Overall, I think this is a reasonable approach because it pretty much just works with no fiddling. The main drawback is that over time the headers list will grow stale, so it’ll require some attention now and then.

    Not sure if here, but it’d seem useful to document some (minified) version of the PR description in regards to maintaining this going forward. If anything to prevent if from being too arbtrary.


    theuni commented at 4:00 pm on November 6, 2024:

    Yes. To be clear, I really don’t like the arbitrary nature of this at all. I was hoping someone would have a better idea for an approach to selection.

    I really meant this as an RFC because I agree that what’s here is a pretty crappy solution. But it’s clear that pch is a win, so I think we should come up with something.

  29. DrahtBot added the label Needs rebase on Nov 6, 2024
  30. DrahtBot commented at 11:37 am on November 6, 2024: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  31. BrandonOdiwuor commented at 12:39 pm on November 20, 2024: contributor
    Concept ACK using pre-compiled headers to speed up compilation
  32. in cmake/ccache.cmake:37 in 5b709e434d outdated
    32+        OUTPUT_QUIET
    33+        ERROR_QUIET
    34+      )
    35+      if(NOT CCACHE_TEST_RESULT)
    36+        list(APPEND CMAKE_CXX_COMPILER_LAUNCHER sloppiness=pch_defines,time_macros)
    37+        # According to the ccache docs clang requires this option.
    


    willcl-ark commented at 4:37 pm on November 26, 2024:

    They also give GCC flags. Perhaps here we can therefore use something like:

     0diff --git a/cmake/ccache.cmake b/cmake/ccache.cmake
     1index 08f078e627b..cb7059884fa 100644
     2--- a/cmake/ccache.cmake
     3+++ b/cmake/ccache.cmake
     4@@ -35,7 +35,12 @@ if(NOT MSVC)
     5       if(NOT CCACHE_TEST_RESULT)
     6         list(APPEND CMAKE_CXX_COMPILER_LAUNCHER sloppiness=pch_defines,time_macros)
     7         # According to the ccache docs clang requires this option.
     8-        try_append_cxx_flags("-Xclang -fno-pch-timestamp" TARGET core_interface SKIP_LINK)
     9+        if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
    10+          try_append_cxx_flags("-Xclang -fno-pch-timestamp" TARGET core_interface SKIP_LINK)
    11+        elseif(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
    12+          try_append_cxx_flags("-fpch-preprocess" TARGET core_interface SKIP_LINK)
    13+        endif()
    14       else()
    15         list(APPEND configure_warnings "For pre-compiled headers support with ccache, version 4.8+ is required. Disabling pch.")
    16         set(CMAKE_DISABLE_PRECOMPILE_HEADERS ON)
    
  33. willcl-ark commented at 12:40 pm on November 27, 2024: member

    Took a first pass over this. It seems to work for me locally, both with and without ccache.

    I see both clang and gcc running faster with this change than merge-base @ 5fb94550638 (with clang being much faster than gcc in both cases):

    0Summary
    1  env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8) ran
    2    1.32 ± 0.02 times faster than env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 5fb94550638)
    3    1.43 ± 0.00 times faster than env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8)
    4    1.53 ± 0.03 times faster than env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 5fb94550638)
    
    Command Mean [s] Min [s] Max [s] Relative
    env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 5fb94550638) 178.672 ± 2.623 176.287 181.481 1.32 ± 0.02
    env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8) 134.888 ± 0.242 134.672 135.150 1.00
    env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 5fb94550638) 206.191 ± 4.083 203.754 210.905 1.53 ± 0.03
    env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8) 193.261 ± 0.185 193.056 193.416 1.43 ± 0.00

    I too can’t think of a cleaner way of implementing this, or making it more automated than using a manual list.

    Left one comment about adding flags for gcc


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me