ci: build Valgrind (3.21) from source #27992

pull fanquake wants to merge 1 commits into bitcoin:master from fanquake:valgrind_build_from_source changing 3 files +11 −6
  1. fanquake commented at 2:41 pm on June 28, 2023: member

    I’ve been using this branch for some time, for working Valgrind CI jobs on aarch64. Benefits include:

    • Valgrind CI jobs work across x86_64 & aarch64.
    • Can use latest (hopefully less buggy) Valgrind, rather than whatever the distro happens to package.
    • No need to “bless” a specific compiler, (current discussion includes switching from Clang to GCC as a workaround).
    • Valgrind from source runs significantly faster compared to the system package. i.e, when fuzzing under valgrind:

    Master:

    0asmap_direct with args
    1Done 646 runs in 155 second(s)
    2....
    3addrman_deserialize with args
    4Done 2944 runs in 2875 second(s)
    

    vs running this branch:

    0asmap_direct with args
    1Done 646 runs in 23 second(s)
    2...
    3addrman_deserialize with args
    4Done 2944 runs in 413 second(s)
    

    This is also being seen in the qa-assets repo: https://github.com/bitcoin-core/qa-assets/pull/136#issuecomment-1611072317.

    For example, the tx_pool_standard target under Valgrind currently takes > 10 hours to complete:

     0Run tx_pool_standard with args ['valgrind', '--quiet', '--error-exitcode=1', '/tmp/cirrus-build/bitcoin-core/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz', '-runs=1', PosixPath('/tmp/cirrus-build/bitcoin-core/ci/scratch/qa-assets/fuzz_seed_corpus/tx_pool_standard')]INFO: Running with entropic power schedule (0xFF, 100).
     1INFO: Seed: 242510469
     2INFO: Loaded 1 modules   (248538 inline 8-bit counters): 248538 [0x27fd278, 0x2839d52), 
     3INFO: Loaded 1 PC tables (248538 PCs): 248538 [0x2839d58,0x2c04af8), 
     4INFO:     3775 files found in /tmp/cirrus-build/bitcoin-core/ci/scratch/qa-assets/fuzz_seed_corpus/tx_pool_standard
     5INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1048576 bytes
     6INFO: seed corpus: files: 3775 min: 1b max: 2090089b total: 88593805b rss: 321Mb
     7[#16](/bitcoin-bitcoin/16/)	pulse  cov: 4536 ft: 4537 corp: 1/1b exec/s: 5 rss: 323Mb
     8[#32](/bitcoin-bitcoin/32/)	pulse  cov: 4538 ft: 4544 corp: 4/10b exec/s: 6 rss: 323Mb
     9[#64](/bitcoin-bitcoin/64/)	pulse  cov: 4540 ft: 4553 corp: 8/34b exec/s: 6 rss: 323Mb
    10[#128](/bitcoin-bitcoin/128/)	pulse  cov: 6319 ft: 9483 corp: 21/196b exec/s: 4 rss: 327Mb
    11[#256](/bitcoin-bitcoin/256/)	pulse  cov: 6339 ft: 13188 corp: 104/1621b exec/s: 2 rss: 327Mb
    12[#512](/bitcoin-bitcoin/512/)	pulse  cov: 8952 ft: 24180 corp: 262/6023b exec/s: 2 rss: 335Mb
    13[#1024](/bitcoin-bitcoin/1024/)	pulse  cov: 9924 ft: 36577 corp: 575/23Kb exec/s: 1 rss: 343Mb
    14<snip>
    15[#2048](/bitcoin-bitcoin/2048/)	pulse  cov: 10161 ft: 56438 corp: 1218/244Kb exec/s: 0 rss: 371Mb
    16<snip>
    17[#3776](/bitcoin-bitcoin/3776/)	INITED cov: 10988 ft: 65398 corp: 1933/10331Kb exec/s: 0 rss: 430Mb
    18[#3776](/bitcoin-bitcoin/3776/)	DONE   cov: 10988 ft: 65398 corp: 1933/10331Kb lim: 1048576 exec/s: 0 rss: 430Mb
    19Done 3776 runs in 37778 second(s)
    

    however with this branch, it takes 1.5 hours:

     0Run tx_pool_standard with args ['valgrind', '--quiet', '--error-exitcode=1', '/tmp/cirrus-build-1174734651/bitcoin-core/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz', '-runs=1', PosixPath('/tmp/cirrus-build-1174734651/bitcoin-core/ci/scratch/qa-assets/fuzz_seed_corpus/tx_pool_standard')]INFO: Running with entropic power schedule (0xFF, 100).
     1INFO: Seed: 350811728
     2INFO: Loaded 1 modules   (366489 inline 8-bit counters): 366489 [0x1c106d0, 0x1c69e69), 
     3INFO: Loaded 1 PC tables (366489 PCs): 366489 [0x1c69e70,0x2201800), 
     4INFO:     3775 files found in /tmp/cirrus-build-1174734651/bitcoin-core/ci/scratch/qa-assets/fuzz_seed_corpus/tx_pool_standard
     5INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1048576 bytes
     6INFO: seed corpus: files: 3775 min: 1b max: 2090089b total: 88593805b rss: 302Mb
     7[#64](/bitcoin-bitcoin/64/)	pulse  cov: 1172 ft: 1186 corp: 11/47b exec/s: 32 rss: 304Mb
     8[#128](/bitcoin-bitcoin/128/)	pulse  cov: 1793 ft: 2253 corp: 32/285b exec/s: 32 rss: 304Mb
     9[#256](/bitcoin-bitcoin/256/)	pulse  cov: 1862 ft: 3792 corp: 99/1399b exec/s: 19 rss: 305Mb
    10[#512](/bitcoin-bitcoin/512/)	pulse  cov: 3074 ft: 7764 corp: 221/4862b exec/s: 15 rss: 308Mb
    11[#1024](/bitcoin-bitcoin/1024/)	pulse  cov: 3767 ft: 12721 corp: 498/20Kb exec/s: 10 rss: 314Mb
    12[#2048](/bitcoin-bitcoin/2048/)	pulse  cov: 4141 ft: 22302 corp: 1101/224Kb exec/s: 5 rss: 341Mb
    13<snip>
    14[#3776](/bitcoin-bitcoin/3776/)	INITED cov: 4573 ft: 26452 corp: 1737/6505Kb exec/s: 0 rss: 400Mb
    15[#3776](/bitcoin-bitcoin/3776/)	DONE   cov: 4573 ft: 26452 corp: 1737/6505Kb lim: 698384 exec/s: 0 rss: 400Mb
    16Done 3776 runs in 5163 second(s)
    

    Running the native_valgrind CI (master, aarch64):

     0test/sighash_tests.cpp(120): Entering test case "sighash_test"
     1==21957== Source and destination overlap in memcpy(0x871e4b0, 0x871e4b0, 36)
     2==21957==    at 0x488CFA0: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
     3==21957==    by 0x8F7F63: CTxIn::operator=(CTxIn const&) (transaction.h:74)
     4==21957==    by 0x93F96B: SignatureHashOld(CScript, CTransaction const&, unsigned int, int) (sighash_tests.cpp:76)
     5==21957==    by 0x93EF1F: sighash_tests::sighash_test::test_method() (sighash_tests.cpp:138)
     6==21957==    by 0x93EB73: sighash_tests::sighash_test_invoker() (sighash_tests.cpp:120)
     7==21957==    by 0x36CF47: boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:117)
     8==21957==    by 0x25B367: boost::function0<void>::operator()() const (function_template.hpp:763)
     9==21957==    by 0x2D6647: boost::detail::forward::operator()() (execution_monitor.ipp:1388)
    10==21957==    by 0x2D627F: boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:137)
    11==21957==    by 0x2D0393: boost::function0<int>::operator()() const (function_template.hpp:763)
    12==21957==    by 0x234A6B: int boost::detail::do_invoke<boost::shared_ptr<boost::detail::translator_holder_base>, boost::function<int ()> >(boost::shared_ptr<boost::detail::translator_holder_base> const&, boost::function<int ()> const&) (execution_monitor.ipp:301)
    13==21957==    by 0x1F7277: boost::execution_monitor::catch_signals(boost::function<int ()> const&) (execution_monitor.ipp:903)
    14==21957==
    15{
    16   <insert_a_suppression_name_here>
    17   Memcheck:Overlap
    18   fun:__GI_memcpy
    19   fun:_ZN5CTxInaSERKS_
    20   fun:_ZL16SignatureHashOld7CScriptRK12CTransactionji
    21   fun:_ZN13sighash_tests12sighash_test11test_methodEv
    22   fun:_ZN13sighash_testsL20sighash_test_invokerEv
    23   fun:_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE
    24   fun:_ZNK5boost9function0IvEclEv
    25   fun:_ZN5boost6detail7forwardclEv
    26   fun:_ZN5boost6detail8function21function_obj_invoker0INS0_7forwardEiE6invokeERNS1_15function_bufferE
    27   fun:_ZNK5boost9function0IiEclEv
    28   fun:_ZN5boost6detail9do_invokeINS_10shared_ptrINS0_22translator_holder_baseEEENS_8functionIFivEEEEEiRKT_RKT0_
    29   fun:_ZN5boost17execution_monitor13catch_signalsERKNS_8functionIFivEEE
    30}
    

    vs running this branch:

    0real	118m55.057s
    

    Disadvantages includes:

    • Becoming slightly more of a package manager in the CI.

    Related to the discussion in #27444. See also https://github.com/bitcoin-core/qa-assets/pull/136.

  2. fanquake added the label Brainstorming on Jun 28, 2023
  3. fanquake added the label Tests on Jun 28, 2023
  4. DrahtBot commented at 2:41 pm on June 28, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK dergoegge
    Concept NACK MarcoFalke
    Concept ACK TheCharlatan

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28071 (ci: Add missing -O2 to valgrind tasks by MarcoFalke)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. MarcoFalke commented at 2:57 pm on June 28, 2023: member

    No need to “bless” a specific compiler, (current discussion includes switching from Clang to GCC as a workaround).

    I think this is the same as point one, which is about the “Source and destination overlap in memcpy” false positive? If so, it could make sense to combine it into one point.

    Can use latest (hopefully less buggy) Valgrind, rather than whatever the distro happens to package.

    Does it mean we drop support for distro-shipped valgrind? Might be good to clarify, and also might be good to report both issues upstream.

  6. DrahtBot added the label CI failed on Jun 28, 2023
  7. DrahtBot removed the label CI failed on Jun 29, 2023
  8. dergoegge commented at 12:01 pm on June 29, 2023: member

    Concept ACK

    Speeding up the valgrind jobs seems like a nice win! (assuming our self compiled valgrind isn’t skipping more expensive checks).

  9. fanquake marked this as ready for review on Jun 30, 2023
  10. TheCharlatan commented at 8:58 pm on June 30, 2023: contributor

    Re #27992 (comment)

    (assuming our self compiled valgrind isn’t skipping more expensive checks)

    I checked the debian/rules file and did not see anything that we might be missing there in terms of build configuration.

    Concept ACK

  11. fanquake commented at 11:25 am on July 3, 2023: member

    Does it mean we drop support for distro-shipped valgrind?

    I think they should remain supported similar to what we do now, where we roughly support recent Valgrind versions on recent distro releases, when combined with recent compilers etc, and we still see occasional issues i.e #27741.

    I checked the debian/rules file and did not see anything that we might be missing there in terms of build configuration.

    I posted a little more about this here: https://github.com/bitcoin-core/qa-assets/pull/136#issuecomment-1611107599. The only option would be --enable-tls (HAVE_TLS), but that is only used in Valgrinds own tests (valgrind/*/tests/). Outside of that, I can’t see anything obvious.

  12. ci: build Valgrind (3.21) from source
    I've been using this branch for some time, for working Valgrind CI jobs
    on aarch64. Benefits include:
    * Valgrind CI jobs work across x86_64 & aarch64.
    * Can use latest (hopefully less buggy) Valgrind, rather than whatever
      the distro happens to package.
    * No need to "bless" a specific compiler for use with Valgrind, (current
      discussion includes switching from Clang to GCC).
    * Valgrind from source seems to run significantly faster compared to running
      the system package. i.e, when fuzzing under Valgrind:
    
    Master:
    ```bash
    asmap_direct with args
    Done 646 runs in 155 second(s)
    ....
    addrman_deserialize with args
    Done 2944 runs in 2875 second(s)
    ```
    
    vs running this branch:
    ```bash
    asmap_direct with args
    Done 646 runs in 23 second(s)
    ...
    addrman_deserialize with args
    Done 2944 runs in 413 second(s)
    ```
    
    This is also being seen in the qa-assets repo: https://github.com/bitcoin-core/qa-assets/pull/136#issuecomment-1611072317.
    For example, the `descriptor_parse` target under Valgrind currently takes:
    `Done 6304 runs in 12971 second(s)`
    however [with this branch, it takes](https://cirrus-ci.com/task/4623075795271680?):
    `Done 6304 runs in 4609 second(s)`.
    
    Running the native_valgrind CI (master, aarch64):
    ```bash
    test/sighash_tests.cpp(120): Entering test case "sighash_test"
    ==21957== Source and destination overlap in memcpy(0x871e4b0, 0x871e4b0, 36)
    ==21957==    at 0x488CFA0: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
    ==21957==    by 0x8F7F63: CTxIn::operator=(CTxIn const&) (transaction.h:74)
    ==21957==    by 0x93F96B: SignatureHashOld(CScript, CTransaction const&, unsigned int, int) (sighash_tests.cpp:76)
    ==21957==    by 0x93EF1F: sighash_tests::sighash_test::test_method() (sighash_tests.cpp:138)
    ==21957==    by 0x93EB73: sighash_tests::sighash_test_invoker() (sighash_tests.cpp:120)
    ==21957==    by 0x36CF47: boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:117)
    ==21957==    by 0x25B367: boost::function0<void>::operator()() const (function_template.hpp:763)
    ==21957==    by 0x2D6647: boost::detail::forward::operator()() (execution_monitor.ipp:1388)
    ==21957==    by 0x2D627F: boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:137)
    ==21957==    by 0x2D0393: boost::function0<int>::operator()() const (function_template.hpp:763)
    ==21957==    by 0x234A6B: int boost::detail::do_invoke<boost::shared_ptr<boost::detail::translator_holder_base>, boost::function<int ()> >(boost::shared_ptr<boost::detail::translator_holder_base> const&, boost::function<int ()> const&) (execution_monitor.ipp:301)
    ==21957==    by 0x1F7277: boost::execution_monitor::catch_signals(boost::function<int ()> const&) (execution_monitor.ipp:903)
    ==21957==
    {
       <insert_a_suppression_name_here>
       Memcheck:Overlap
       fun:__GI_memcpy
       fun:_ZN5CTxInaSERKS_
       fun:_ZL16SignatureHashOld7CScriptRK12CTransactionji
       fun:_ZN13sighash_tests12sighash_test11test_methodEv
       fun:_ZN13sighash_testsL20sighash_test_invokerEv
       fun:_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE
       fun:_ZNK5boost9function0IvEclEv
       fun:_ZN5boost6detail7forwardclEv
       fun:_ZN5boost6detail8function21function_obj_invoker0INS0_7forwardEiE6invokeERNS1_15function_bufferE
       fun:_ZNK5boost9function0IiEclEv
       fun:_ZN5boost6detail9do_invokeINS_10shared_ptrINS0_22translator_holder_baseEEENS_8functionIFivEEEEEiRKT_RKT0_
       fun:_ZN5boost17execution_monitor13catch_signalsERKNS_8functionIFivEEE
    }
    ```
    
    vs running this branch:
    ```bash
    real	118m55.057s
    ```
    
    Disadvantages includes:
    * Becoming slightly more of a package manager in the CI.
    
    Related to the discussion in
    https://github.com/bitcoin/bitcoin/pull/27444.
    83266cd4b5
  13. fanquake force-pushed on Jul 4, 2023
  14. DrahtBot added the label CI failed on Jul 4, 2023
  15. dergoegge approved
  16. dergoegge commented at 12:58 pm on July 7, 2023: member
    utACK 83266cd4b5cb419a4ba45d4a55b955d6a4c82cd7
  17. MarcoFalke commented at 11:53 am on July 10, 2023: member
    Also, I don’t think this is a fix for https://github.com/bitcoin/bitcoin/pull/27444/files#r1165491936. This will still fail outside the CI. Here, you are only modifying the CI to use a different version of valgrind.
  18. fanquake commented at 3:04 pm on July 10, 2023: member

    This will still fail outside the CI.

    Yes, but outside the CI, the only route to solving that would be with more (achitecture specific) suppressions? If we don’t control anything else, we can’t make any guarantees about anything passing or not.

    At least in this PR, it no-longer fails inside the CI, and that’s difference between the CI working, and not, on aarch64.

    Happy to followup with further improvements to add more architecture specific supressions or similar, for use outside the CI.

  19. MarcoFalke commented at 9:48 am on July 11, 2023: member

    Well, it would be be good to exactly explain what this change is trying to solve and then explain why it solves the issue.

    If you are trying to fix the slowness, a better fix may be to just apply -O3 instead of -O0 to Bitcoin Core and leave valgrind alone, as failing to apply optimizations is not a valgrind bug.

    If you are trying to fix something else, it may be good to explain as well.

  20. MarcoFalke commented at 9:50 am on July 11, 2023: member
    NACK. I don’t think compiling from source should be used to unintentionally and accidentally fix an unrelated bug.
  21. dergoegge commented at 10:33 am on July 13, 2023: member

    Looks like this is implicitly doing the same as #28071? from the qa-assets test run: https://cirrus-ci.com/task/4623075795271680?logs=ci#L1952-L1955

     0Options used to compile and link:
     1  external signer = no
     2  multiprocess    = no
     3  with experimental syscall sandbox support = no
     4  with libs       = no
     5  with wallet     = yes
     6    with sqlite   = yes
     7    with bdb      = no
     8  with gui / qt   = no
     9  with zmq        = no
    10  with test       = not building test_bitcoin because fuzzing is enabled
    11  with fuzz binary = yes
    12  with bench      = no
    13  with upnp       = no
    14  with natpmp     = no
    15  use asm         = yes
    16  USDT tracing    = no
    17  sanitizers      = fuzzer
    18  debug enabled   = no
    19  gprof enabled   = no
    20  werror          = yes
    21  LTO             = no
    22  target os       = linux-gnu
    23  build os        = linux-gnu
    24  CC              = /usr/bin/ccache clang
    25  CFLAGS          = -pthread -g -O2
    26  CPPFLAGS        =  -DABORT_ON_FAILED_ASSUME -fmacro-prefix-map=$(abs_top_srcdir)=.  -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3  -DHAVE_BUILD_INFO 
    27  CXX             = /usr/bin/ccache clang++ -std=c++17
    28  CXXFLAGS        =   -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection  -Wall -Wextra -Wgnu -Wformat -Wformat-security -Wvla -Wshadow-field -Wthread-safety -Wloop-analysis -Wredundant-decls -Wunused-member-function -Wdate-time -Wconditional-uninitialized -Woverloaded-virtual -Wsuggest-override -Wunreachable-code-loop-increment -Wimplicit-fallthrough -Wdocumentation  -Wno-unused-parameter -Wno-self-assign -Werror   -g -O2
    
  22. fanquake commented at 9:52 am on July 14, 2023: member
    I’ll follow up with just adding more suppressions until things work on aarch64.
  23. fanquake closed this on Jul 14, 2023

  24. MarcoFalke commented at 10:12 am on July 14, 2023: member
    I haven’t tried, but is this still an issue on aarch64 on current master?
  25. MarcoFalke commented at 2:32 pm on July 14, 2023: member
    Just ran both CI configs locally on aarch64 and they passed.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 07:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me