TSAN/MSAN fails with vm.mmap_rnd_bits=32 even with llvm 18.1.3 #30674

Sjors opened this issue on August 19, 2024
  1. Sjors commented at 8:34 am on August 19, 2024: member

    The Cirrus CI on my fork of the repo runs on Ubuntu 24.04 with kernel version 6.8.0-38. This has vm.mmap_rnd_bits=32 set, which causes the TSAN and MSAN jobs to fail.

    See:

    TSAN: https://cirrus-ci.com/task/6619444124844032

    FAIL: minisketch/test
    =====================
    ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:282 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=42931)
    FAIL minisketch/test (exit status: 139)
    FAIL: univalue/test/object
    ==========================
    ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:282 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=42964)
    FAIL univalue/test/object (exit status: 139)
    FAIL: qt/test/test_bitcoin-qt
    =============================
    ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:282 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=42994)
    FAIL qt/test/test_bitcoin-qt (exit status: 139)
    

    MSAN: https://cirrus-ci.com/task/4578750543691776

    Running tests: base58_tests from test/base58_tests.cpp
    Running tests: base64_tests from test/base64_tests.cpp
    MemorySanitizer: CHECK failed: msan_linux.cpp:192 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=22112)
        <empty stack>
    make[3]: *** [Makefile:22563: test/base32_tests.cpp.test] Error 1
    make[3]: *** Waiting for unfinished jobs....
    MemorySanitizer: CHECK failed: msan_linux.cpp:192 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=22137)
        <empty stack>
    

    This job was from mid-July. Just in case, I reproduced it against today's master: https://github.com/Sjors/bitcoin/pull/57 / https://cirrus-ci.com/task/4886869396160512

    My (limited) understanding is that the underlying issue should have been fixed and the fix has been backported to llvm 18.1.3: https://github.com/google/sanitizers/issues/1614#issuecomment-2010316781

    Ubuntu 24.04 has shipped that version since early July: https://launchpad.net/ubuntu/noble/amd64/clang-18

    I can see in the CI log that this version was indeed used:

    Get:123 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libllvm18 amd64 1:18.1.3-1ubuntu1 [27.5 MB]
    

    Although I can trivially work around the issue by setting vm.mmap_rnd_bits=28, perhaps there is a deeper issue worth investigating.
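For anyone who wants to check their own CI host before applying the workaround, here is a minimal sketch. It assumes the sysctl is exposed at the usual procfs path `/proc/sys/vm/mmap_rnd_bits`; the helper names `read_mmap_rnd_bits` and `workaround_command` are made up for illustration and are not part of the CI scripts.

```python
from pathlib import Path

# Usual procfs location of the sysctl named in this issue (assumed readable).
MMAP_RND_BITS = Path("/proc/sys/vm/mmap_rnd_bits")


def read_mmap_rnd_bits(path: Path = MMAP_RND_BITS):
    """Return the current value as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def workaround_command(bits: int = 28) -> str:
    """Build the sysctl invocation for the workaround (needs root on the host)."""
    return f"sysctl -w vm.mmap_rnd_bits={bits}"


if __name__ == "__main__":
    current = read_mmap_rnd_bits()
    print(f"vm.mmap_rnd_bits = {current}")
    # 28 is the workaround value reported in this issue.
    if current is not None and current > 28:
        print("workaround:", workaround_command())
```

The script only prints the command rather than running it, since changing the sysctl affects the whole host and requires root.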

    Possibly related: https://github.com/ClickHouse/ClickHouse/issues/64086 (they also tried 18.1.3 and 18.1.6).

  2. Sjors renamed this:
    TSAN/MSAN fails with vm.mmap_rnd_bits=32 even with 18.1.3
    TSAN/MSAN fails with vm.mmap_rnd_bits=32 even with llvm 18.1.3
    on Aug 19, 2024
  3. maflcko commented at 9:40 am on August 19, 2024: member

    You re-ran the same task on the same commit on the same machine 3 hours later and it passed: https://cirrus-ci.com/task/6619444124844032?logs=ci#L313 vs https://cirrus-ci.com/task/5557228785106944?logs=ci#L311

    Did you change anything in between?

  4. maflcko commented at 9:44 am on August 19, 2024: member
    Also, probably unrelated, but if you want, you can test #30639 and https://github.com/bitcoin/bitcoin/pull/30634
  5. maflcko added the label Questions and Help on Aug 19, 2024
  6. maflcko added the label CI failed on Aug 19, 2024
  7. Sjors commented at 10:07 am on August 19, 2024: member

    @maflcko yes, I first reproduced the issue and then tested the workaround vm.mmap_rnd_bits=28. See https://github.com/Sjors/bitcoin/pull/51.

    I’ll try those clang-19 PRs. If that fixes the issue then presumably the issue is in llvm and they should consider backporting additional commits. But if it doesn’t then maybe the problem is on our side (even though it’s trivial to work around).

  8. maflcko commented at 11:06 am on August 19, 2024: member
    I see. So in theory it should be reproducible by setting up a vanilla Ubuntu 24.04 (or later) host to run the CI tasks. I guess no one has done so yet, given that you are the first one to observe the issue. However, if it is reproducible, then it probably should be fixed.
  9. maflcko removed the label Questions and Help on Aug 19, 2024
  10. Sjors commented at 11:11 am on August 20, 2024: member
    @maflcko clang 19 fixes neither, see https://github.com/Sjors/bitcoin/pull/59.
  11. maflcko commented at 2:38 pm on October 16, 2024: member
    https://github.com/llvm/llvm-project/commit/7d039effc4930be9240446a4241d268a39960e0b only added two bits 28->30, so a failure with 32 is still expected, unless I am missing something.
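The arithmetic behind that comment can be sketched as follows. This is not LLVM code: it assumes, following the reading of the commit above, that the sanitizer tolerates at most a fixed number of randomization bits, and `exceeds_limit` is a hypothetical helper used only for illustration.

```python
def exceeds_limit(mmap_rnd_bits: int, supported_bits: int) -> bool:
    """True if the kernel setting is above what the sanitizer is assumed to tolerate."""
    return mmap_rnd_bits > supported_bits


if __name__ == "__main__":
    kernel_setting = 32  # value reported on the Ubuntu 24.04 host in this issue
    # 28 and 30 are the before/after limits as described in the comment above.
    for supported in (28, 30):
        print(supported, exceeds_limit(kernel_setting, supported))
```

Under that assumption, a host with 32 bits is above both limits, while the workaround value of 28 is above neither.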
  12. fanquake commented at 2:49 pm on November 13, 2024: member
    Has started happening again in the TSAN CI (Clang 19.1.4)? https://cirrus-ci.com/task/5338932714405888?logs=ci#L2213

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC
