test: 32-bit Clang ipc_test failure at -O0 #31772

fanquake opened this issue on January 31, 2025
  1. fanquake commented at 3:12 pm on January 31, 2025: member

    Noticed as part of this branch #29796, however it can also be reproduced with master (8fa10edcd1706a1f0dc9d8c3adbc8efa3c7755bf) by reproducing the equivalent CI & setting C(XX)FLAGS to -O0:

    make -C depends/ MULTIPROCESS=1 NO_QT=1 NO_WALLET=1 NO_ZMQ=1 NO_USDT=1 CFLAGS="-O0" CXXFLAGS="-O0" DEBUG=1 -j19 HOST=i686-pc-linux-gnu
    cmake -B build --toolchain /root/ci_scratch/depends/i686-pc-linux-gnu/toolchain.cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_COMPILER='clang;-m32' -DCMAKE_CXX_COMPILER='clang++;-m32'
    cmake --build build -j18
    
    ./build/src/test/test_bitcoin --run_test=ipc*
    Running 2 test cases...
    terminate called after throwing an instance of 'kj::ExceptionImpl'
      what():  /root/ci_scratch/depends/i686-pc-linux-gnu/include/kj/common.h:1797: failed: expected start <= end && end <= size_; Out-of-bounds ArrayPtr::slice().
    stack: 5ca0dd6d 5c78b5db 5c8cc599 5ca0adc9 5c94fb37 5c950070 5c9cd709 5c9cf58a 5c9d72fd 5c9d025e 5c776f11 5bf6351d 5bf6347d 5bf633cd 5bf6337b 5bf6331d 5bf63194 f7afa4b0 f773dff6 f77d55b7
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
        ??:0: returning here
    unknown location(0): fatal error: in "ipc_tests/ipc_tests": signal: SIGABRT (application abort requested)
    test/ipc_tests.cpp(12): last checkpoint: "ipc_tests" test entry
    test_bitcoin: common/args.cpp:578: void ArgsManager::AddArg(const std::string &, const std::string &, unsigned int, const OptionsCategory &): Assertion `ret.second' failed.
    unknown location(0): fatal error: in "ipc_tests/parse_address_test": signal: SIGABRT (application abort requested)
    test/ipc_tests.cpp(20): last checkpoint: "parse_address_test" fixture ctor

    *** 2 failures are detected in the test module "Bitcoin Core Test Suite"
    
  2. fanquake added the label Tests on Jan 31, 2025
  3. bitcoin deleted a comment on Feb 3, 2025
  4. fanquake commented at 5:58 pm on February 3, 2025: member
  5. ryanofsky commented at 2:21 am on February 4, 2025: contributor

    I tried those steps except I left off the -DCMAKE_C_COMPILER='clang;-m32' -DCMAKE_CXX_COMPILER='clang++;-m32' part and the test passed without a problem.

    Is the clang part important? It seems odd to use gcc for the depends build and clang for the bitcoin build, and when I tried adding the clang arguments specified, cmake complained about not being able to find libstdc++, which maybe makes sense, because I don’t know if it is supposed to work with the GNU library.

    If there are any easier steps to reproduce this maybe using docker, or just a CI run I can look at that would be helpful. I ran into countless problems just getting the default i686 build to work at all on my machine, so it is hard to know what things may be particular to the configuration this is happening in.

  6. fanquake commented at 9:49 am on February 4, 2025: member

    If there are any easier steps to reproduce this maybe using docker,

    Running time env -i HOME="$HOME" PATH="$PATH" USER="$USER" bash -c 'FILE_ENV="./ci/test/00_setup_env_i686_multiprocess.sh" ./ci/test_run_all.sh' with the branch from #29796.

    or just a CI run I can look at that would be helpful.

    https://github.com/bitcoin/bitcoin/pull/29796/checks?check_run_id=36485685971

  7. ryanofsky commented at 2:05 pm on February 4, 2025: contributor

    Thanks, I was able to get a stack trace by running cd /ci_container_base/ci/scratch/build-i686-pc-linux-gnu, then gdb -ex run --args ./src/test/test_bitcoin -t ipc_tests, and then bt in the container:

    #0  0xf7f4b5b9 in __kernel_vsyscall ()
    #1  0xf79e6037 in ?? () from /lib32/libc.so.6
    #2  0xf7994c51 in raise () from /lib32/libc.so.6
    #3  0xf797c2b7 in abort () from /lib32/libc.so.6
    #4  0xf7d4bf71 in ?? () from /lib32/libstdc++.so.6
    #5  0xf7d65a97 in ?? () from /lib32/libstdc++.so.6
    #6  0xf7d4b8f9 in std::terminate() () from /lib32/libstdc++.so.6
    #7  0xf7d65ddc in __cxa_throw () from /lib32/libstdc++.so.6
    #8  0x597d7e12 in kj::ExceptionCallback::RootExceptionCallback::onFatalException (this=0x5c8344b0, exception=...) at /usr/src/kj/exception.c++:1107
    #9  0x597d6085 in kj::throwFatalException (exception=..., ignoreCount=1) at /usr/src/kj/exception.c++:1194
    #10 0x597d05e5 in kj::_::Debug::Fault::fatal (this=0xf75cbc28) at /usr/src/kj/debug.c++:371
    #11 0x597cf19a in kj::_::inlineRequireFailure (file=0x59e9a314 "/ci_container_base/depends/i686-pc-linux-gnu/include/kj/common.h", line=1797, expectation=0x59e9a2f7 "start <= end && end <= size_",
        macroArgs=0x59e9a2d4 "\"Out-of-bounds ArrayPtr::slice().\"", message=0x59e9a2b0 "Out-of-bounds ArrayPtr::slice().") at /usr/src/kj/common.c++:36
    #12 0x5954afc8 in kj::ArrayPtr<char const>::slice (this=0xf75cbd54, start=1498659230, end=13) at /usr/src/kj/common.h:1797
    #13 0x5968bd88 in kj::StringPtr::slice (this=0xf75cbd54, start=1498659230) at /usr/src/kj/string.h:734
    #14 0x597cc1e4 in kj::CidrRange::CidrRange (this=0x5a9ddc80 <kj::_::reservedCidrs()::result>, pattern=...) at /usr/src/kj/cidr.c++:53
    #15 0x597105fe in kj::_::reservedCidrs () at /usr/src/kj/async-io.c++:3007
    #16 0x59710b37 in kj::_::NetworkFilter::NetworkFilter (this=0x5c838b70) at /usr/src/kj/async-io.c++:3038
    #17 0x5978e80c in kj::(anonymous namespace)::SocketNetwork::SocketNetwork (this=0x5c838b68, lowLevel=...) at /usr/src/kj/async-io-unix.c++:1742
    #18 0x5979068d in kj::(anonymous namespace)::AsyncIoProviderImpl::AsyncIoProviderImpl (this=0x5c838b60, lowLevel=...) at /usr/src/kj/async-io-unix.c++:1974
    #19 0x59798404 in kj::heap<kj::(anonymous namespace)::AsyncIoProviderImpl, kj::(anonymous namespace)::LowLevelAsyncIoProviderImpl&> () at /usr/src/kj/memory.h:609
    #20 0x59791361 in kj::setupAsyncIo () at /usr/src/kj/async-io-unix.c++:2058
    #21 0x595373c2 in mp::EventLoop::EventLoop(char const*, std::function<void (bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>, void*) (this=0xf75cc110,
        exe_name=0x59b257f6 "IpcPipeTest", log_fn=..., context=0x0) at /usr/src/mp/proxy.cpp:158
    #22 0x58a4064e in IpcPipeTest()::$_0::operator()() const (this=0x5c8542d4) at ./test/ipc_test.cpp:60
    #23 0x58a405ae in std::__invoke_impl<void, IpcPipeTest()::$_0>(std::__invoke_other, IpcPipeTest()::$_0&&) (__f=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/invoke.h:61
    #24 0x58a404fe in std::__invoke<IpcPipeTest()::$_0>(IpcPipeTest()::$_0&&) (__fn=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/invoke.h:96
    #25 0x58a404ac in std::thread::_Invoker<std::tuple<IpcPipeTest()::$_0> >::_M_invoke<0u>(std::_Index_tuple<0u>) (this=0x5c8542d4)
        at /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_thread.h:292
    #26 0x58a4044e in std::thread::_Invoker<std::tuple<IpcPipeTest()::$_0> >::operator()() (this=0x5c8542d4) at /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_thread.h:299
    #27 0x58a402c5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<IpcPipeTest()::$_0> > >::_M_run() (this=0x5c8542d0)
        at /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_thread.h:244
    #28 0xf7d98d21 in ?? () from /lib32/libstdc++.so.6
    #29 0xf79e4157 in ?? () from /lib32/libc.so.6
    #30 0xf7a78e08 in ?? () from /lib32/libc.so.6
    
  8. ryanofsky commented at 2:21 pm on February 4, 2025: contributor

    The stack trace seems to show something going wrong in capnproto code, but it is unclear what the cause is. The crash is happening in the CidrRange::CidrRange constructor here:

    https://github.com/capnproto/capnproto/blob/b34ec28cceaf15b1082b74b50f03f770873c3636/c%2B%2B/src/kj/cidr.c%2B%2B#L53

    which is being called on a static list of address patterns:

    https://github.com/capnproto/capnproto/blob/b34ec28cceaf15b1082b74b50f03f770873c3636/c%2B%2B/src/kj/async-io.c%2B%2B#L2999-L3007

    All of the patterns look valid, so there is no reason there should be a parsing exception like the one that seems to be happening. It seems like this might be a stranger compiler / build issue.

  9. ryanofsky commented at 2:29 pm on February 4, 2025: contributor

    gdb shows invalid results being returned from the pattern.findFirst('/') call, where pattern is 192.0.0.0/24:

    https://github.com/capnproto/capnproto/blob/b34ec28cceaf15b1082b74b50f03f770873c3636/c%2B%2B/src/kj/cidr.c%2B%2B#L51C48-L51C57

    #14 0x5971f1e4 in kj::CidrRange::CidrRange (this=0x5a930c80 <kj::_::reservedCidrs()::result>, pattern=...) at /usr/src/kj/cidr.c++:53
    (gdb) p pattern
    $6 = {content = {<kj::DisallowConstCopyIfNotConst<char const>> = {<No data fields>}, ptr = 0x59e13a45 "192.0.0.0/24", size_ = 13}}
    (gdb) p slashPos
    $7 = 1497950621
    (gdb) p pattern.findFirst('/')
    $8 = {ptr = {isSet = true, {value = 2155537998}}}
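
    To make the mismatch concrete, here is a minimal standalone sketch in plain standard C++ (not kj code; find_first, kNotFound and slice_in_bounds are made-up names for illustration). It shows what the lookup would be expected to return for this pattern, and why the value gdb prints for slashPos cannot satisfy the precondition quoted in the exception message. It assumes size_ counts the trailing NUL, as size_ = 13 in the gdb output suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Sentinel used by this sketch when the character is absent (illustrative only).
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Illustrative stand-in for ArrayPtr<const char>::findFirst:
// returns the index of the first occurrence of c, or kNotFound.
inline std::size_t find_first(const char* ptr, std::size_t size, char c) {
    const void* pos = std::memchr(ptr, c, size);
    if (pos == nullptr) return kNotFound;
    return static_cast<std::size_t>(static_cast<const char*>(pos) - ptr);
}

// The precondition from the exception message: start <= end && end <= size_.
inline bool slice_in_bounds(std::size_t start, std::size_t end, std::size_t size) {
    return start <= end && end <= size;
}
```

    For "192.0.0.0/24" with size 13, find_first gives 9, and slice_in_bounds(9, 13, 13) holds, while slice_in_bounds(1497950621, 13, 13) does not.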
    
  10. ryanofsky commented at 7:48 pm on February 4, 2025: contributor

    I thought I had tracked down the problem, and some change I made to the test caused it to pass, but rerunning the fix in a new container, it no longer works, so I have to go back.

    Here are the notes I was about to post about the potential problem & fix. In any case I suspect the problem would be avoided if we just consistently used gcc or clang for this build and didn’t try to mix them.


    Test failure seems to be caused by the container using incompatible ABI versions for the depends build and the main build. The depends build is using gcc with ABI version 18:

    gcc -E -x c++ - -dM <<< "" | grep ABI
    #define __GXX_ABI_VERSION 1018
    

    The bitcoin build is using clang with ABI version 2:

    clang++ -E -x c++ - -dM <<< "" | grep ABI
    #define __GXX_ABI_VERSION 1002
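
    The same macro can also be read from inside a translation unit, which makes it easy to log what each compiler is using. A minimal sketch, assuming the 1000 + N encoding that the two outputs above suggest (1018 for version 18, 1002 for version 2); abi_level and current_abi_level are illustrative names:

```cpp
#include <cassert>

// Decode a __GXX_ABI_VERSION value into an ABI level.
// Assumption: values are encoded as 1000 + N, matching the two outputs above.
// Anything outside that range is reported as -1.
inline long abi_level(long macro_value) {
    return (macro_value >= 1000 && macro_value < 2000) ? macro_value - 1000 : -1;
}

// ABI level of the compiler building this file, or -1 if the macro is absent
// or does not follow the assumed encoding.
inline long current_abi_level() {
#ifdef __GXX_ABI_VERSION
    return abi_level(__GXX_ABI_VERSION);
#else
    return -1;
#endif
}
```

    Building this once with each compiler and printing current_abi_level() should show the same 18 versus 2 split as the preprocessor dumps.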
    

    In clang, the ABI version is determined when the compiler is built, but in gcc you can control it with the -fabi-version option, so the following change seems to fix the build:

    --- a/depends/hosts/linux.mk
    +++ b/depends/hosts/linux.mk
    @@ -1,5 +1,5 @@
     linux_CFLAGS=-pipe -std=$(C_STANDARD)
    -linux_CXXFLAGS=-pipe -std=$(CXX_STANDARD)
    +linux_CXXFLAGS=-pipe -std=$(CXX_STANDARD) -fabi-version=2
     
     ifneq ($(LTO),)
     linux_AR = $(host_toolchain)gcc-ar
    --- a/depends/packages/libmultiprocess.mk
    +++ b/depends/packages/libmultiprocess.mk
    @@ -27,3 +27,5 @@ endef
     define $(package)_stage_cmds
       $(MAKE) DESTDIR=$($(package)_staging_dir) install-lib
     endef
    +
    +$(package)_cxxflags += -fabi-version=11
    

    The first part of the diff, setting -fabi-version=2 in the depends build, is the main fix. The second part, setting -fabi-version=11, is a workaround for an issue that happened specifically with the multiprocess package, because when -fabi-version is set to 10 or below there is a compile error:

    [ 20%] Building CXX object CMakeFiles/mputil.dir/src/mp/util.cpp.o
    In file included from /usr/include/c++/13/bits/shared_ptr_base.h:59,
                     from /usr/include/c++/13/bits/shared_ptr.h:53,
                     from /usr/include/c++/13/condition_variable:45,
                     from /usr/include/c++/13/future:41,
                     from /ci_container_base/depends/work/build/i686-pc-linux-gnu/libmultiprocess/07c917f7ca910d66abc6d3873162fc9061704074-722fc6d9234/include/mp/util.h:11,
                     from /ci_container_base/depends/work/build/i686-pc-linux-gnu/libmultiprocess/07c917f7ca910d66abc6d3873162fc9061704074-722fc6d9234/src/mp/util.cpp:6:
    /usr/include/c++/13/bits/unique_ptr.h: In instantiation of ‘constexpr std::unique_ptr<_Tp, _Dp>::unique_ptr() [with _Del = std::__future_base::_Result_base::_Deleter; <template-parameter-2-2> = void; _Tp = std::__future_base::_Result_base; _Dp = std::__future_base::_Result_base::_Deleter]’:
    /usr/include/c++/13/future:340:34:   required from here
    /usr/include/c++/13/bits/unique_ptr.h:305:11: error: no matching function for call to ‘std::__uniq_ptr_data<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter, true, true>::__uniq_ptr_data()’
      305 |         : _M_t()
          |           ^~~~~~
    /usr/include/c++/13/bits/unique_ptr.h:241:40: note: candidate: ‘template<class _Del> std::__uniq_ptr_data<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter, true, true>::__uniq_ptr_data(std::__uniq_ptr_impl<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>::pointer, _Del&&) [inherited from std::__uniq_ptr_impl<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>]’
      241 |       using __uniq_ptr_impl<_Tp, _Dp>::__uniq_ptr_impl;
          |                                        ^~~~~~~~~~~~~~~
    /usr/include/c++/13/bits/unique_ptr.h:241:40: note:   template argument deduction/substitution failed:
    /usr/include/c++/13/bits/unique_ptr.h:305:11: note:   candidate expects 2 arguments, 0 provided
      305 |         : _M_t()
          |           ^~~~~~
    /usr/include/c++/13/bits/unique_ptr.h:241:40: note: candidate: ‘std::__uniq_ptr_data<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter, true, true>::__uniq_ptr_data(std::__uniq_ptr_impl<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>::pointer) [inherited from std::__uniq_ptr_impl<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>]’
      241 |       using __uniq_ptr_impl<_Tp, _Dp>::__uniq_ptr_impl;
          |                                        ^~~~~~~~~~~~~~~
    /usr/include/c++/13/bits/unique_ptr.h:241:40: note:   candidate expects 1 argument, 0 provided
    /usr/include/c++/13/bits/unique_ptr.h:242:7: note: candidate: ‘std::__uniq_ptr_data<_Tp, _Dp, <anonymous>, <anonymous> >::__uniq_ptr_data(std::__uniq_ptr_data<_Tp, _Dp, <anonymous>, <anonymous> >&&) [with _Tp = std::__future_base::_Result_base; _Dp = std::__future_base::_Result_base::_Deleter; bool <anonymous> = true; bool <anonymous> = true]’
      242 |       __uniq_ptr_data(__uniq_ptr_data&&) = default;
          |       ^~~~~~~~~~~~~~~
    /usr/include/c++/13/bits/unique_ptr.h:242:7: note:   candidate expects 1 argument, 0 provided
    make[4]: *** [CMakeFiles/mputil.dir/build.make:76: CMakeFiles/mputil.dir/src/mp/util.cpp.o] Error 1
    
  11. maflcko commented at 8:28 pm on February 4, 2025: member

    Interesting that this did not result in a compile or link failure instead.

    I switched to clang in commit fad0f21c3caba129106799fe6c14aff323ef99f2 to avoid OOM with g++, (which was before switching to 32-bit in commit fae0295a799499268caca9c385ac4d7061543980), but now that the CI machines have more memory, it should be fine if this CI task takes more memory and time.

    Happy to review a pull, if someone submits one.

  12. ryanofsky commented at 9:28 pm on February 4, 2025: contributor

    Can confirm switching from clang to gcc does seem to fix this. -Wno-error=documentation also had to be dropped because gcc does not support it.

    --- a/ci/test/00_setup_env_i686_multiprocess.sh
    +++ b/ci/test/00_setup_env_i686_multiprocess.sh
    @@ -10,15 +10,12 @@ export HOST=i686-pc-linux-gnu
     export CONTAINER_NAME=ci_i686_multiprocess
     export CI_IMAGE_NAME_TAG="docker.io/ubuntu:24.04"
     export CI_IMAGE_PLATFORM="linux/amd64"
    -export PACKAGES="llvm clang g++-multilib"
    -export DEP_OPTS="DEBUG=1 MULTIPROCESS=1"
    +export PACKAGES="g++-multilib"
    +export DEP_OPTS="DEBUG=1 MULTIPROCESS=1 NO_QT=1"
     export GOAL="install"
     export TEST_RUNNER_EXTRA="--v2transport"
     export BITCOIN_CONFIG="\
      -DCMAKE_BUILD_TYPE=Debug \
    - -DCMAKE_C_COMPILER='clang;-m32' \
    - -DCMAKE_CXX_COMPILER='clang++;-m32' \
    - -DCMAKE_CXX_FLAGS='-Wno-error=documentation' \
      -DAPPEND_CPPFLAGS='-DBOOST_MULTI_INDEX_ENABLE_SAFE_MODE' \
     "
     export BITCOIND=bitcoin-node  # Used in functional tests
    

    Interesting that this did not result in a compile or link failure instead.

    I was just seeing a lot of strange things here. Theoretically, I think it should be fine to use gcc and clang together. Some other things I noticed: when I ran gdb I could step into the findFirst methods and see it looking for the ‘/’ character and pos pointing to the right place, but then the return value was wrong (a very large number instead of the position of the character). Also, I couldn’t even tell which findFirst overload I was in because the line numbers did not seem to correspond to the source file.

    When I added the -fabi-version flags and rebuilt, I saw the ipc tests run and pass, and the -fabi-version fix worked twice in two different containers when I applied it manually, but did not seem to work when it was already applied at the beginning of the build. So I think something else I was doing in my rebuild steps, like reconfiguring cmake, was causing the problem to go away. So I don’t know. I want to debug more but I think I already spent way too much time on this.

  13. ryanofsky commented at 0:04 am on February 5, 2025: contributor

    I tried to repeat the previous steps rebuilding depends and test_bitcoin in the container to figure out what I was doing that caused the test to stop crashing, but it seems to crash reliably, so I can’t work out what other changes I might have made while trying the -fabi-version flags in #31772 (comment) that would have caused it not to crash.

    Additionally, I went back to the original build and debugged it with GDB, and could easily step through and see where the bug is happening when I run with:

    gdb -ex 'b findFirst' -ex run --args ./src/test/test_bitcoin -t ipc_tests
    

    Everything works well up until reaching the ArrayPtr<const char>::findFirst method:

    https://github.com/capnproto/capnproto/blob/b34ec28cceaf15b1082b74b50f03f770873c3636/c%2B%2B/src/kj/common.h#L1882-L1890

    The method also seems to work fine until it reaches the return pos - ptr; line. Because the function returns a Maybe value, this calls Maybe constructor:

    https://github.com/capnproto/capnproto/blob/b34ec28cceaf15b1082b74b50f03f770873c3636/c%2B%2B/src/kj/common.h#L1389

    which has a NullableValue member and calls NullableValue constructor:

    https://github.com/capnproto/capnproto/blob/b34ec28cceaf15b1082b74b50f03f770873c3636/c%2B%2B/src/kj/common.h#L1158

    which calls a function called ctor:

    https://github.com/capnproto/capnproto/blob/b34ec28cceaf15b1082b74b50f03f770873c3636/c%2B%2B/src/kj/common.h#L1061

    and unfortunately that function seems to be completely broken and not do anything. The ctor function is supposed to construct a T value inside the NullableValue<T> object, but it fails to do that. In this case T is size_t, so it is supposed to assign the pos - ptr size that was computed into the Maybe<size_t> object that is being returned. But it doesn’t do this. So the Maybe<size_t> object is only half-initialized as {isSet = true, {value = 1497966205}}, where the value 1497966205 is just the preexisting value it held before it was constructed, and is much larger than the size of the string being searched, so the code later throws an exception when there is an attempt to slice the “192.0.0.0/24” string at that position.
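
    The half-initialized state can be modeled without kj. The sketch below is not the kj code: MaybeSlot, construct, make_result and the store_value switch are hypothetical names, used only to imitate a constructor that sets isSet but whose value store never lands, starting from a slot that already holds a stale value.

```cpp
#include <cassert>
#include <cstddef>

// Plain-data model of the Maybe<size_t> storage shown in gdb: a flag plus a value.
struct MaybeSlot {
    bool isSet;
    std::size_t value;
};

// Models the NullableValue constructor. With store_value == false it imitates
// the observed behavior: isSet becomes true but value is never written.
inline void construct(MaybeSlot& slot, std::size_t v, bool store_value) {
    slot.isSet = true;
    if (store_value) slot.value = v;
}

// Builds a result in a slot that previously held `stale`, the way a return
// slot on the stack might hold leftover data.
inline MaybeSlot make_result(std::size_t stale, std::size_t v, bool store_value) {
    MaybeSlot slot = {false, stale};
    construct(slot, v, store_value);
    return slot;
}
```

    With the store in place, make_result(1497966205, 9, true) carries the value 9. Without it, the same call with store_value set to false reports isSet = true while still holding 1497966205, which is the shape of the result seen in gdb.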

    This seems like it is probably a compiler bug, but not one that is necessarily going to happen reliably, because if the uninitialized memory location started off as 0 instead of 1497966205, the crash would not happen. So I’m not sure switching the bitcoin compiler from clang to gcc really fixes the problem, or just makes it appear not to happen. And I’m not sure what I was doing before that caused the problem to disappear as well. It seems like there might just be a problem with this version of gcc and -O0 and this piece of code.

    Next steps might be to look at the generated assembly and confirm the compiler is really producing buggy code, or to try to reproduce a minimal test case. Another thing we could try is updating to a newer version of gcc. The version here seems to be gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0.
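
    As a possible starting point for such a minimal test case, the call chain can be imitated in a single file. This is only a sketch modeled on the functions named above: everything lives in a made-up repro namespace, the types are heavily simplified, and it is not known whether it actually triggers the problem. The idea would be to build it with the same compilers and flags (32-bit, -O0) and check whether the returned value is still 9.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

namespace repro {
// Tag type selecting the custom placement operator new, imitating kj::_::PlacementNew.
struct PlacementNew {};
}  // namespace repro

// Tagged placement new: returns the supplied address unchanged.
inline void* operator new(std::size_t, repro::PlacementNew, void* p) noexcept { return p; }
inline void operator delete(void*, repro::PlacementNew, void*) noexcept {}

namespace repro {
// Imitates kj::ctor: constructs a T in place at `location`.
template <typename T, typename... Params>
inline void ctor(T& location, Params&&... params) {
    new (PlacementNew(), &location) T(std::forward<Params>(params)...);
}

// Simplified NullableValue: a flag plus in-place storage for a trivial T.
template <typename T>
struct NullableValue {
    NullableValue() : isSet(false) {}
    NullableValue(T&& t) : isSet(true) { ctor(value, std::move(t)); }
    bool isSet;
    union { T value; };
};

// Simplified Maybe that forwards into NullableValue.
template <typename T>
struct Maybe {
    Maybe() {}
    Maybe(T&& t) : ptr(std::move(t)) {}
    NullableValue<T> ptr;
};

// Imitates ArrayPtr<const char>::findFirst for a raw pointer and size.
inline Maybe<std::size_t> findFirst(const char* ptr, std::size_t size, const char& c) {
    const char* pos = static_cast<const char*>(std::memchr(ptr, c, size));
    if (pos == nullptr) {
        return Maybe<std::size_t>();
    } else {
        return static_cast<std::size_t>(pos - ptr);
    }
}
}  // namespace repro
```

    On a correctly working toolchain, repro::findFirst("192.0.0.0/24", 13, '/') returns a Maybe with isSet true and value 9. Splitting the operator new definition and the callers across separately compiled files, one built with gcc and one with clang, might be closer to the failing setup, but that is a guess.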

    Here is disassembly of relevant functions:

    (gdb) disassemble /s
    Dump of assembler code for function _ZNK2kj8ArrayPtrIKcE9findFirstERS1_:
    /usr/src/kj/common.h:
    1883    inline Maybe<size_t> ArrayPtr<const char>::findFirst(const char& c) const {
       0x5977156c <+0>:     push   %ebp
       0x5977156d <+1>:     mov    %esp,%ebp
       0x5977156f <+3>:     push   %ebx
       0x59771570 <+4>:     sub    $0x24,%esp
       0x59771573 <+7>:     call   0x566e8da0 <__x86.get_pc_thunk.bx>
       0x59771578 <+12>:    add    $0x121eff0,%ebx
       0x5977157e <+18>:    mov    0x8(%ebp),%eax
       0x59771581 <+21>:    mov    %eax,-0x1c(%ebp)
       0x59771584 <+24>:    mov    0xc(%ebp),%eax
       0x59771587 <+27>:    mov    %eax,-0x20(%ebp)
       0x5977158a <+30>:    mov    0x10(%ebp),%eax
       0x5977158d <+33>:    mov    %eax,-0x24(%ebp)
    => 0x59771590 <+36>:    mov    %gs:0x14,%eax
       0x59771596 <+42>:    mov    %eax,-0xc(%ebp)
       0x59771599 <+45>:    xor    %eax,%eax

    1884      const char* pos = reinterpret_cast<const char*>(memchr(ptr, c, size_));
       0x5977159b <+47>:    mov    -0x20(%ebp),%eax
       0x5977159e <+50>:    mov    0x4(%eax),%ecx
       0x597715a1 <+53>:    mov    -0x24(%ebp),%eax
       0x597715a4 <+56>:    movzbl (%eax),%eax
       0x597715a7 <+59>:    movsbl %al,%edx
       0x597715aa <+62>:    mov    -0x20(%ebp),%eax
       0x597715ad <+65>:    mov    (%eax),%eax
       0x597715af <+67>:    sub    $0x4,%esp
       0x597715b2 <+70>:    push   %ecx
       0x597715b3 <+71>:    push   %edx
       0x597715b4 <+72>:    push   %eax
       0x597715b5 <+73>:    call   0x5664a880 <memchr@plt>
       0x597715ba <+78>:    add    $0x10,%esp
       0x597715bd <+81>:    mov    %eax,-0x10(%ebp)

    1885      if (pos == nullptr) {
       0x597715c0 <+84>:    cmpl   $0x0,-0x10(%ebp)
       0x597715c4 <+88>:    jne    0x597715d8 <_ZNK2kj8ArrayPtrIKcE9findFirstERS1_+108>

    1886        return nullptr;
       0x597715c6 <+90>:    sub    $0x8,%esp
       0x597715c9 <+93>:    push   $0x0
       0x597715cb <+95>:    push   -0x1c(%ebp)
       0x597715ce <+98>:    call   0x595224a6 <_ZN2kj5MaybeIjEC2EDn>
       0x597715d3 <+103>:   add    $0x10,%esp
       0x597715d6 <+106>:   jmp    0x597715f9 <_ZNK2kj8ArrayPtrIKcE9findFirstERS1_+141>

    1887      } else {
    1888        return pos - ptr;
       0x597715d8 <+108>:   mov    -0x20(%ebp),%eax
       0x597715db <+111>:   mov    (%eax),%eax
       0x597715dd <+113>:   mov    -0x10(%ebp),%edx
       0x597715e0 <+116>:   sub    %eax,%edx
       0x597715e2 <+118>:   mov    %edx,%eax
       0x597715e4 <+120>:   mov    %eax,-0x14(%ebp)
       0x597715e7 <+123>:   sub    $0x8,%esp
       0x597715ea <+126>:   lea    -0x14(%ebp),%eax
       0x597715ed <+129>:   push   %eax
       0x597715ee <+130>:   push   -0x1c(%ebp)
       0x597715f1 <+133>:   call   0x595224d0 <_ZN2kj5MaybeIjEC2EOj>
       0x597715f6 <+138>:   add    $0x10,%esp

    1889      }
    1890    }
       0x597715f9 <+141>:   mov    -0xc(%ebp),%eax
       0x597715fc <+144>:   sub    %gs:0x14,%eax
       0x59771603 <+151>:   je     0x5977160a <_ZNK2kj8ArrayPtrIKcE9findFirstERS1_+158>
       0x59771605 <+153>:   call   0x597cf290 <__stack_chk_fail_local>
       0x5977160a <+158>:   mov    -0x1c(%ebp),%eax
       0x5977160d <+161>:   mov    -0x4(%ebp),%ebx
       0x59771610 <+164>:   leave
       0x59771611 <+165>:   ret    $0x4
    End of assembler dump.
    (gdb) disassemble /s _ZN2kj5MaybeIjEC2EOj
    Dump of assembler code for function _ZN2kj5MaybeIjEC2EOj:
    /usr/src/kj/common.h:
    1389      Maybe(T&& t): ptr(kj::mv(t)) {}
       0x595224d0 <+0>:     push   %ebp
       0x595224d1 <+1>:     mov    %esp,%ebp
       0x595224d3 <+3>:     push   %esi
       0x595224d4 <+4>:     push   %ebx
       0x595224d5 <+5>:     call   0x566e8da0 <__x86.get_pc_thunk.bx>
       0x595224da <+10>:    add    $0x146e08e,%ebx
       0x595224e0 <+16>:    mov    0x8(%ebp),%esi
       0x595224e3 <+19>:    sub    $0xc,%esp
       0x595224e6 <+22>:    push   0xc(%ebp)
       0x595224e9 <+25>:    call   0x59502e4c <_ZN2kj2mvIjEEOT_RS1_>
       0x595224ee <+30>:    add    $0x10,%esp
       0x595224f1 <+33>:    sub    $0x8,%esp
       0x595224f4 <+36>:    push   %eax
       0x595224f5 <+37>:    push   %esi
       0x595224f6 <+38>:    call   0x5952b7d8 <_ZN2kj1_13NullableValueIjEC2EOj>
       0x595224fb <+43>:    add    $0x10,%esp
       0x595224fe <+46>:    nop
       0x595224ff <+47>:    lea    -0x8(%ebp),%esp
       0x59522502 <+50>:    pop    %ebx
       0x59522503 <+51>:    pop    %esi
       0x59522504 <+52>:    pop    %ebp
       0x59522505 <+53>:    ret
    End of assembler dump.
    (gdb) disassemble /s _ZN2kj1_13NullableValueIjEC2EOj
    Dump of assembler code for function _ZN2kj1_13NullableValueIjEC2EOj:
    /usr/src/kj/common.h:
    1156      inline NullableValue(T&& t)
       0x5952b7d8 <+0>:     push   %ebp
       0x5952b7d9 <+1>:     mov    %esp,%ebp
       0x5952b7db <+3>:     push   %ebx
       0x5952b7dc <+4>:     sub    $0x4,%esp
       0x5952b7df <+7>:     call   0x566e8da0 <__x86.get_pc_thunk.bx>
       0x5952b7e4 <+12>:    add    $0x1464d84,%ebx

    1157          : isSet(true) {
       0x5952b7ea <+18>:    mov    0x8(%ebp),%eax
       0x5952b7ed <+21>:    movb   $0x1,(%eax)

    1158        ctor(value, kj::mv(t));
       0x5952b7f0 <+24>:    sub    $0xc,%esp
       0x5952b7f3 <+27>:    push   0xc(%ebp)
       0x5952b7f6 <+30>:    call   0x59502e4c <_ZN2kj2mvIjEEOT_RS1_>
       0x5952b7fb <+35>:    add    $0x10,%esp
       0x5952b7fe <+38>:    mov    0x8(%ebp),%edx
       0x5952b801 <+41>:    add    $0x4,%edx
       0x5952b804 <+44>:    sub    $0x8,%esp
       0x5952b807 <+47>:    push   %eax
       0x5952b808 <+48>:    push   %edx
       0x5952b809 <+49>:    call   0x595314f8 <_ZN2kj4ctorIjIjEEEvRT_DpOT0_>
       0x5952b80e <+54>:    add    $0x10,%esp

    1159      }
       0x5952b811 <+57>:    nop
       0x5952b812 <+58>:    mov    -0x4(%ebp),%ebx
       0x5952b815 <+61>:    leave
       0x5952b816 <+62>:    ret
    End of assembler dump.
    (gdb) disassemble /s _ZN2kj4ctorIjIjEEEvRT_DpOT0_

    Dump of assembler code for function _ZN2kj4ctorIjIjEEEvRT_DpOT0_:
    /usr/src/kj/common.h:
    1060    inline void ctor(T& location, Params&&... params) {
       0x595314f8 <+0>:     push   %ebp
       0x595314f9 <+1>:     mov    %esp,%ebp
       0x595314fb <+3>:     push   %esi
       0x595314fc <+4>:     push   %ebx
       0x595314fd <+5>:     call   0x566e8da0 <__x86.get_pc_thunk.bx>
       0x59531502 <+10>:    add    $0x145f066,%ebx

    1061      new (_::PlacementNew(), &location) T(kj::fwd<Params>(params)...);
       0x59531508 <+16>:    sub    $0x4,%esp
       0x5953150b <+19>:    push   0x8(%ebp)
       0x5953150e <+22>:    push   %eax
       0x5953150f <+23>:    push   $0x4
       0x59531511 <+25>:    call   0x58a0ec00 <_ZnwjN2kj1_12PlacementNewEPv>
       0x59531516 <+30>:    add    $0x10,%esp
       0x59531519 <+33>:    mov    %eax,%esi
       0x5953151b <+35>:    test   %esi,%esi
       0x5953151d <+37>:    je     0x59531531 <_ZN2kj4ctorIjIjEEEvRT_DpOT0_+57>
       0x5953151f <+39>:    sub    $0xc,%esp
       0x59531522 <+42>:    push   0xc(%ebp)
       0x59531525 <+45>:    call   0x59502e70 <_ZN2kj3fwdIjEEOT_RNS_8NoInfer_IS1_E4TypeE>
       0x5953152a <+50>:    add    $0x10,%esp
       0x5953152d <+53>:    mov    (%eax),%eax
       0x5953152f <+55>:    mov    %eax,(%esi)

    1062    }
       0x59531531 <+57>:    nop
       0x59531532 <+58>:    lea    -0x8(%ebp),%esp
       0x59531535 <+61>:    pop    %ebx
       0x59531536 <+62>:    pop    %esi
       0x59531537 <+63>:    pop    %ebp
       0x59531538 <+64>:    ret
    End of assembler dump.
    (gdb) disassemble /s _ZnwjN2kj1_12PlacementNewEPv
    Dump of assembler code for function _ZnwjN2kj1_12PlacementNewEPv:
    /ci_container_base/depends/i686-pc-linux-gnu/include/kj/common.h:
    1051    inline void* operator new(size_t, kj::_::PlacementNew, void* __p) noexcept {
       0x58a0ec00 <+0>:     endbr32
       0x58a0ec04 <+4>:     push   %ebp
       0x58a0ec05 <+5>:     mov    %esp,%ebp
       0x58a0ec07 <+7>:     push   %ebx
       0x58a0ec08 <+8>:     sub    $0x14,%esp
       0x58a0ec0b <+11>:    call   0x58a0ec10 <_ZnwjN2kj1_12PlacementNewEPv+16>
       0x58a0ec10 <+16>:    pop    %eax
       0x58a0ec11 <+17>:    add    $0x1f81958,%eax
       0x58a0ec17 <+23>:    mov    %eax,-0x14(%ebp)
       0x58a0ec1a <+26>:    mov    0xc(%ebp),%eax
       0x58a0ec1d <+29>:    mov    0x8(%ebp),%eax
       0x58a0ec20 <+32>:    mov    %gs:0x14,%eax
       0x58a0ec26 <+38>:    mov    %eax,-0x8(%ebp)

    1052      return __p;
       0x58a0ec29 <+41>:    mov    0xc(%ebp),%eax
       0x58a0ec2c <+44>:    mov    %eax,-0x10(%ebp)
       0x58a0ec2f <+47>:    mov    %gs:0x14,%eax
       0x58a0ec35 <+53>:    mov    -0x8(%ebp),%ecx
       0x58a0ec38 <+56>:    cmp    %ecx,%eax
       0x58a0ec3a <+58>:    jne    0x58a0ec49 <_ZnwjN2kj1_12PlacementNewEPv+73>
       0x58a0ec40 <+64>:    mov    -0x10(%ebp),%eax
       0x58a0ec43 <+67>:    add    $0x14,%esp
       0x58a0ec46 <+70>:    pop    %ebx
       0x58a0ec47 <+71>:    pop    %ebp
       0x58a0ec48 <+72>:    ret
       0x58a0ec49 <+73>:    mov    -0x14(%ebp),%ebx
       0x58a0ec4c <+76>:    call   0x56649250 <__stack_chk_fail@plt>
    End of assembler dump.
    241(gdb) disassemble /s _ZN2kj3fwdIjEEOT_RNS_8NoInfer_IS1_E4TypeE
    242Dump of assembler code for function _ZN2kj3fwdIjEEOT_RNS_8NoInfer_IS1_E4TypeE:
    243/ci_container_base/depends/i686-pc-linux-gnu/include/kj/common.h:
    244700     template<typename T> constexpr T&& fwd(NoInfer<T>& t) noexcept { return static_cast<T&&>(t); }
    245   0x59502e70 <+0>:     push   %ebp
    246   0x59502e71 <+1>:     mov    %esp,%ebp
    247   0x59502e73 <+3>:     call   0x584df28a <__x86.get_pc_thunk.ax>
    248   0x59502e78 <+8>:     add    $0x148d6f0,%eax
    249   0x59502e7d <+13>:    mov    0x8(%ebp),%eax
    250   0x59502e80 <+16>:    pop    %ebp
    251   0x59502e81 <+17>:    ret
    252End of assembler dump.
    

    When I fed this to chatgpt (https://chatgpt.com/share/67a2aa92-e100-800a-b5b3-999982d1a648) it claimed to find a bug in the disassembly where the operator new function is not interpreting its parameters correctly, and I could confirm this with gdb (at least to the best of my understanding; I am not that familiar with assembly and calling conventions). gdb definitely showed operator new returning the wrong address (the source address, not the destination address), which explains why the destination was not being updated and contained a garbage value.

    So I think there is pretty good evidence that this version of gcc contains a bug with -O0 and is miscompiling the code. Again I’m not sure if we care about this or not. It’s an older version of gcc so might be logical to just update it. Or just not compile this code with -O0.

  14. maflcko commented at 9:45 am on February 5, 2025: member

    This seems like it is probably a compiler bug, but not one that is necessarily going to happen reliably, because if the uninitialized memory location started off as 0 instead of 1497966205, the crash would not happen. So I’m not sure switching the bitcoin compiler from clang to gcc really fixes the problem, or just makes it appear not to happen. And I’m not sure what I was doing before that caused the problem to disappear as well. It seems like there might just be a problem with this version of gcc and -O0 and this piece of code.

    Next steps might be to look at generated assembly and confirm compiler is really producing buggy code or try to reproduce a minimal test case. Another thing we could try to do is update to a new version of gcc. Version here seems to be gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

    Interesting. It would be nice to reduce this, if someone wants to spend more time on this. However, I am not familiar with libmp and the build process around it, so if someone manages to produce one (or two) cpp files (even if large) with the corresponding compiler flags, I am happy to take it from there and minimize further.

  15. ryanofsky commented at 6:50 pm on February 5, 2025: contributor

    Interesting. It would be nice to reduce this

    So I reduced the failure down to a standalone case, but it could be reduced further. Now it is unclear to me if there is a compiler bug here or not, because this is caused by a very specific interaction between the two compiler versions we are using:

    • gcc version Ubuntu 13.3.0-6ubuntu2~24.04
    • clang version 18.1.3 (1ubuntu1)

    and maybe the specific flags we are using. The bug happens because the linker links together the

    • _ZN2kj4ctorIjIjEEEvRT_DpOT0_ function (void kj::ctor<unsigned int, unsigned int>(unsigned int&, unsigned int&&))

    generated by GCC in the capnproto static libraries

    • _ZnwjN2kj1_12PlacementNewEPv function (operator new(unsigned int, kj::_::PlacementNew, void*))

    generated by clang in the libbitcoin_ipc_test.a library.

    This happens due to linker command line order. The ipc_test library comes before the kj static libraries on the command line, but it only contains an operator new symbol, not a ctor symbol, while the kj static libraries contain both symbols. The linker prefers the first symbol definition it sees, so it uses the operator new from bitcoin together with the ctor from libkj. This does not work because the gcc ctor calls the clang operator new with a calling convention it is not expecting (as described by chatgpt above).

    Here’s the reduced test case I have which reproduces the bug with the same code and flags as the CI build:

    cat > test.h <<EOS
    #include <cstddef>
    template <typename T> struct NoInfer_ { typedef T Type; };
    template <typename T> using NoInfer = typename NoInfer_<T>::Type;

    template<typename T> constexpr T&& fwd(NoInfer<T>& t) noexcept { return static_cast<T&&>(t); }

    struct PlacementNew {};

    void* operator new(size_t, PlacementNew, void* __p) noexcept;

    template <typename T, typename... Params>
    inline void ctor(T& location, Params&&... params) {
      new (PlacementNew(), &location) T(fwd<Params>(params)...);
    }
    EOS

    cat > test_gcc.cpp <<EOS
    #include "test.h"
    template void ctor<size_t, size_t>(size_t&, size_t&&);
    EOS

    cat > test_clang.cpp <<EOS
    #include "test.h"
    #include <iostream>
    #include <utility>  // std::move

    void* operator new(size_t, PlacementNew, void* __p) noexcept {
      return __p;
    }

    size_t f() {
        size_t i = 10;
        size_t j = 20;
        ctor(i, std::move(j));
        return i;
    }

    int main() {
        std::cout << "i = " << f() << std::endl;
    }
    EOS

    g++ -m32 -D_GLIBCXX_DEBUG -D_GLIBCXX_DEBUG_PEDANTIC -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG -pipe -std=c++20 -O0 -g -fPIC -Wall -Wextra -Wno-strict-aliasing -Wno-sign-compare -Wno-unused-parameter -pthread -c test_gcc.cpp -o test_gcc.o
    clang++ -m32 -Wno-error=documentation -O0 -ftrapv -O0 -g3 -g3 -fstack-protector-all -fcf-protection=full -fstack-clash-protection -fPIE -c test_clang.cpp -o test_clang.o
    clang++ -m32 -Wno-error=documentation -O0 -ftrapv -O0 -g3 -g3 -fstack-protector-all -fcf-protection=full -fstack-clash-protection -Wl,-z,relro -Wl,-z,now -Wl,-z,separate-code -fPIE -pie test_gcc.o test_clang.o
    ./a.out
    

    If the test case were compiled properly it would print 20, but due to the calling convention problem here it prints 10, because the line ctor(i, std::move(j)); is supposed to assign j (20) to i, but doesn’t do that because operator new returns a garbage value.

    I confirmed that adding -fabi-version=2 to gcc command line above does not fix the problem (the program still prints 10 instead of 20). But I did not experiment further to see which of the compiler flags we are passing may be causing the incompatibility, or if the flags are irrelevant and gcc and clang just disagree on the calling convention for this operator new.

    I am not sure if there is a general solution for this problem. If linker order were changed so that kj libraries were always specified first before the bitcoin library, that would prevent crashing inside kj libraries, but then it might cause a similar crash in the bitcoin code.

    All of this suggests that if you are using depends, and you want to change the compiler or pass different flags in bitcoin, you really need to make the same changes in both builds. Mixing different flags and compiler versions might not be a good idea.

  16. ryanofsky commented at 7:06 pm on February 5, 2025: contributor

    The following seem to be the minimal flags needed to reproduce this issue. None of the other flags except -fPIC made any difference, and without -fPIC the program just segfaults instead of printing the wrong value:

    g++ -m32 -fPIC -c test_gcc.cpp -o test_gcc.o
    clang++ -m32 -fPIC -c test_clang.cpp -o test_clang.o
    clang++ -m32 -fPIC test_gcc.o test_clang.o
    ./a.out
    

    So it seems like these versions of gcc and clang just always (regardless of flags) assume incompatible calling conventions for operator new and can’t be used together here.

    The fix for the bug should just be to use either clang or gcc for this CI build, and not try to use both.

  17. ryanofsky commented at 2:31 am on February 6, 2025: contributor
    This also seems to be a known issue: “It has come to my attention that GCC and clang generate incompatible code for passing an argument of an empty class type.” https://itanium-cxx-abi.github.io/cxx-abi/cxx-abi-dev/archives/2015-December/002869.html
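
    To make the quoted problem concrete, here is a minimal sketch of a function with an empty class parameter sitting between two ordinary ones. The names Empty, pick and round_trips are made up for illustration and do not come from kj or bitcoin. As far as the disassembly earlier in this thread shows, the GCC-built ctor pushes a placeholder stack slot for the empty argument before the call, while the Clang-built operator new reads __p from the second argument slot, as if the empty argument occupied no space.

    ```cpp
    #include <cstddef>
    #include <cstdio>
    #include <type_traits>

    // Hypothetical stand-in for kj::_::PlacementNew: a class with no data members.
    struct Empty {};

    // Same parameter shape as the kj placement operator new discussed in this
    // thread: an empty class sits between two ordinary arguments.
    void* pick(std::size_t, Empty, void* p) noexcept { return p; }

    // Returns true when the callee hands back the pointer the caller passed.
    bool round_trips() {
        int target = 0;
        return pick(sizeof target, Empty{}, &target) == &target;
    }

    int main() {
        static_assert(std::is_empty<Empty>::value, "Empty carries no data");
        // Caller and callee are built by the same compiler here, so they agree
        // on where p lives. The failure in this issue needs the caller and the
        // callee to come from different compilers, built without inlining.
        std::printf("round trip %s\n", round_trips() ? "ok" : "BROKEN");
    }
    ```

    Built as a single translation unit this cannot show the mismatch; it only isolates the function signature that the two compilers appear to disagree on.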
  18. maflcko commented at 8:23 am on February 6, 2025: member
    I wonder if there is a way to detect those at compile or link time. Other than that, printing a warning when mixing two compilers when compiling with depends may be useful?
  19. fanquake commented at 11:41 am on February 6, 2025: member

    Other than that, printing a warning when mixing two compilers when compiling with depends may be useful?

    Why only depends though? If the issue is generally mixing compilers/flags, then using Clang + system libs on any (gcc-based) Linux system (potentially) has the same problem. It’s currently the case that we use Clang + (GCC built) system libs in at least the ASAN,fuzz,valgrind CIs, and Clang + GCC depends libs in the 32-bit job.

  20. ryanofsky commented at 1:37 pm on February 6, 2025: contributor

    Why only depends though? If the issue is generally mixing compilers/flags, then using Clang + system libs on any (gcc-based) Linux system (potentially) has the same problem.

    Yes, this issue is not specific to depends, and could theoretically happen if, for example, an Ubuntu library package was compiled with gcc and exposed a function with an empty struct parameter, and you tried to use the library with clang.

    The point of having an ABI is to prevent issues like this and allow different compilers to interoperate, but this is a corner case where something hasn’t been standardized. Also, in this case, the issue goes away if either package is compiled with any optimization (even -O is sufficient) because this inlines the call to operator new. The issue could also be masked by a different linker command line order that causes the linker to choose compatible ctor and operator new symbol definitions instead of incompatible ones.

    Given the corner-case nature of this problem, I’m thinking it would be good to send a patch to upstream capnproto to avoid the issue by changing the parameter order. The following patch seems to fix the issue in our CI job:

    diff --git a/depends/packages/capnp.mk b/depends/packages/capnp.mk
    index 0c211cbc455d..00ccf08acf4b 100644
    --- a/depends/packages/capnp.mk
    +++ b/depends/packages/capnp.mk
    @@ -5,6 +5,12 @@ $(package)_download_file=$(native_$(package)_download_file)
     $(package)_file_name=$(native_$(package)_file_name)
     $(package)_sha256_hash=$(native_$(package)_sha256_hash)
     
    +$(package)_patches = abifix.patch
    +
    +define $(package)_preprocess_cmds
    +  patch -p2 < $($(package)_patch_dir)/abifix.patch
    +endef
    +
     define $(package)_set_vars :=
       $(package)_config_opts := -DBUILD_TESTING=OFF
       $(package)_config_opts += -DWITH_OPENSSL=OFF
    diff --git a/depends/patches/capnp/abifix.patch b/depends/patches/capnp/abifix.patch
    new file mode 100644
    index 000000000000..1386aadc7452
    --- /dev/null
    +++ b/depends/patches/capnp/abifix.patch
    @@ -0,0 +1,35 @@
    +diff --git a/c++/src/kj/common.h b/c++/src/kj/common.h
    +index 237c41d3..dc2e6381 100644
    +--- a/c++/src/kj/common.h
    ++++ b/c++/src/kj/common.h
    +@@ -1041,24 +1041,26 @@ private:
    + 
    + // We want placement new, but we don't want to #include <new>.  operator new cannot be defined in
    + // a namespace, and defining it globally conflicts with the definition in <new>.  So we have to
    +-// define a dummy type and an operator new that uses it.
    ++// define a dummy type and an operator new that uses it. The dummy type is intentionally passed
    ++// as the last parameter to avoid an ABI issues caused by GCC and clang using incompatible calling
    ++// conventions for passing empty struct parameters.
    + 
    + namespace _ {  // private
    + struct PlacementNew {};
    + }  // namespace _ (private)
    + } // namespace kj
    + 
    +-inline void* operator new(size_t, kj::_::PlacementNew, void* __p) noexcept {
    ++inline void* operator new(size_t, void* __p, kj::_::PlacementNew) noexcept {
    +   return __p;
    + }
    + 
    +-inline void operator delete(void*, kj::_::PlacementNew, void* __p) noexcept {}
    ++inline void operator delete(void*, void* __p, kj::_::PlacementNew) noexcept {}
    + 
    + namespace kj {
    + 
    + template <typename T, typename... Params>
    + inline void ctor(T& location, Params&&... params) {
    +-  new (_::PlacementNew(), &location) T(kj::fwd<Params>(params)...);
    ++  new (&location, _::PlacementNew()) T(kj::fwd<Params>(params)...);
    + }
    + 
    + template <typename T>
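
    As a standalone illustration of the reordering in the patch, the sketch below moves the empty tag type to the last parameter position, so no real argument has to be located after it. The demo namespace and the assign_via_ctor helper are assumptions made for this example; std::forward stands in for kj::fwd, and this is not the upstream kj code.

    ```cpp
    #include <cstddef>
    #include <cstdio>
    #include <utility>

    namespace demo {  // hypothetical namespace, standing in for kj::_
    struct PlacementNew {};
    }  // namespace demo

    // Empty tag type last: the size and the pointer come first, so nothing the
    // callee needs is positioned after the empty argument.
    inline void* operator new(std::size_t, void* p, demo::PlacementNew) noexcept {
        return p;
    }
    inline void operator delete(void*, void*, demo::PlacementNew) noexcept {}

    namespace demo {
    template <typename T, typename... Params>
    inline void ctor(T& location, Params&&... params) {
        // Construct a T in place at &location using the reordered operator new.
        new (&location, PlacementNew()) T(std::forward<Params>(params)...);
    }
    }  // namespace demo

    // Mirrors f() from the reduced test case: overwrite i with the value of j.
    std::size_t assign_via_ctor() {
        std::size_t i = 10;
        std::size_t j = 20;
        demo::ctor(i, std::move(j));
        return i;
    }

    int main() {
        std::printf("i = %zu\n", assign_via_ctor());  // prints "i = 20"
    }
    ```

    Built with a single compiler, both parameter orders behave the same; the reordering is only meant to matter when the caller and the operator new definition come from different compilers.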
    
  21. ryanofsky commented at 2:08 pm on February 6, 2025: contributor
  22. fanquake referenced this in commit f6bf4d2834 on Feb 7, 2025
  23. fanquake referenced this in commit b2833160a2 on Feb 7, 2025
  24. fanquake commented at 10:23 am on February 7, 2025: member

    Submitted patch in https://github.com/capnproto/capnproto/pull/2235

    Thanks. Pulled that into #29796.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-02-22 06:12 UTC