arch: amd64
gitian virtualization: docker and podman
host OS: Fedora and Ubuntu Bionic
gitian builder at commit: https://github.com/devrandom/gitian-builder/commit/9b28e9c990eb0e98fb23573903791c75fdee3db1
Log: win-build.log
The Ubuntu Bionic machine uses 2 threads to compile and the Fedora machine uses 9 threads.
First seen here: #16667 (comment)
Apparently it's missing the univalue symbols during link. It did however build univalue, and create libunivalue.la.
I will try using one thread to see if the issue is related to that
Times passed with one thread: 3
Ok, I built with one thread and it failed with a different linker error this time: win-build.log
Will try to build rc3 tomorrow and see if the hashes match
On master, linux and windows are affected now
I'll try a bit too.
Ok, I built with one thread and it failed with a different linker error this time:
Same kind of error though (unreferenced symbols for a library that seems to have been built). So it affects secp256k1 too.
An apparent data race is definitely unexpected when building with a single thread.
Same error for 7358ae6d71cd0e5908a1203b61cd4e54fe4af5de (rc3), 9 threads, so it is not related to #16667
Haven't been able to reproduce it yet (debian, LXC, 6 threads, building master for linux/windows/macosx in a loop).
linux-build.log
Even standard std::__cxx11::basic_string symbols are missing there. Seems like the linker is botched.
There's also:
/usr/bin/ld: i386:x86-64 architecture of input file `univalue/.libs/libunivalue.a(libunivalue_la-univalue_get.o)' is incompatible with i386 output
Could be that there's sometimes files left behind that interfere with the build?
I might just switch to guix builds. @dongcarl wen guix, sir?
FWIW I've been running head-to-tail gitian builds of master for all OSes for the entire night, and haven't had a single failure.
Can you try to bisect this?
I might just switch to guix builds.
Yes, would be interesting to see if that solves it. Whether it works or not, it will help isolate the issue.
@MarcoFalke What's the best commit and os for me to reproduce this? Going to try running Guix and seeing if we get the same problem.
I will produce signed results for rc3 the old way (gbuild) to rule out that this is caused by one of my wrapper scripts, which wrap the ./contrib/gitian-build.py wrapper script, which in turn wraps gbuild.
DrahtBot ran with:
arch: amd64
OS: Ubuntu Bionic
docker: vanilla ubuntu package
I ran with:
arch: amd64
os: fedora 30
podman: 1.4.4-4 (https://koji.fedoraproject.org/koji/buildinfo?buildID=1314654)
gitian: 9b28e9c990eb0e98fb23573903791c75fdee3db1
VERSION=0.19.0rc3
$ ./bin/gbuild --num-make 9 --memory 9000 --commit bitcoin=v${VERSION} ../bitcoin/contrib/gitian-descriptors/gitian-win.yml
and it fails as follows:
--- Building for bionic amd64 ---
Stopping target if it is up
=
podman container stop gitian-target
=
=
podman container rm gitian-target
=
Making a new image copy
Starting target
Checking if target is up=
podman run -d --name gitian-target base-bionic-amd64:latest
=
.
=
podman exec -u ubuntu -i gitian-target true
=
Preparing build environment
=
podman exec -u ubuntu -i gitian-target setarch x86_64 bash
=
=
podman exec -u ubuntu gitian-target mkdir -p /home/ubuntu/cache/
=
=
podman cp cache/bitcoin-core-win-0.19/ gitian-target:/home/ubuntu/cache/
=
=
podman exec -u root gitian-target chown -R ubuntu:ubuntu /home/ubuntu/cache/
=
=
podman exec -u ubuntu gitian-target mkdir -p /home/ubuntu/cache/
=
=
podman cp cache/common/ gitian-target:/home/ubuntu/cache/
=
=
podman exec -u root gitian-target chown -R ubuntu:ubuntu /home/ubuntu/cache/
=
Updating apt-get repository (log in var/install.log)
Installing additional packages (log in var/install.log)
=
podman exec -u root -i gitian-target [ ! -e /var/cache/gitian/initial-upgrade ]
=
Upgrading system, may take a while (log in var/install.log)
Creating package manifest
=
podman exec -u root -i gitian-target bash
=
Creating build script (var/build-script)
=
podman exec -u ubuntu gitian-target mkdir -p /home/ubuntu/build/
=
=
podman cp inputs/bitcoin gitian-target:/home/ubuntu/build/
=
=
podman exec -u root gitian-target chown -R ubuntu:ubuntu /home/ubuntu/build/
=
Running build script (log in var/build.log)
Traceback (most recent call last):
6: from ./bin/gbuild:307:in `<main>'
5: from ./bin/gbuild:307:in `each'
4: from ./bin/gbuild:309:in `block in <main>'
3: from ./bin/gbuild:309:in `each'
2: from ./bin/gbuild:314:in `block (2 levels) in <main>'
1: from ./bin/gbuild:164:in `build_one_configuration'
./bin/gbuild:21:in `system!': failed to run on-target setarch x86_64 bash -x < var/build-script > var/build.log 2>&1 (RuntimeError)
Can you try to bisect this?
This will be hard because it is non-deterministic
(The above failure with vanilla gbuild took 4 tries)
Looks like you've tried different machines, different host OS, built for different architectures, tried different numbers of threads. The only commonality seems to be the use of gitian with docker.
Alternatively, a change in Bionic (the guest OS) triggered this. I'll regenerate the base image and see if it starts happening.
I wonder, how to debug a non-deterministic linker failure? Is it possible to inspect the state of a VM after it happens? Maybe comparing some of the .o's and .a's with those produced in a successful run could explain a thing.
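One way to do the comparison suggested above is to hash every .o and .a in the build tree of a good run and a failed run and list the files that differ. This is only a rough sketch: the function names are made up for illustration, and it assumes both trees have been copied out of the container to local directories.

```python
import hashlib
from pathlib import Path


def digest_tree(root, suffixes=(".o", ".a")):
    """Map each object/archive file under root (relative path) to its SHA-256."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix in suffixes
    }


def diff_trees(good_root, bad_root):
    """Return relative paths that are missing on one side or differ in content."""
    good = digest_tree(good_root)
    bad = digest_tree(bad_root)
    return sorted(
        name for name in good.keys() | bad.keys() if good.get(name) != bad.get(name)
    )
```

Since gitian builds are meant to be deterministic, any path this reports for the same commit and descriptor would be a reasonable place to start looking.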
Bisecting this (eta is next week):
BAD: 0.19.0rc3
BAD: 085cac6b90
BAD: 81f732bcaa
BAD: 1a8a5ede9f
GOOD (?): 0.18.1
Edit: I used the gitian descriptor from 0.19.0rc3, so here the bisect goes again with the gitian descriptors from the respective tag:
BAD: 0.19.0rc3
RUNNING: 1a8a5ede9f
I wonder, how to debug a non-deterministic linker failure? Is it possible to inspect the state of a VM after it happens? Maybe comparing some of the .o's and .a's with those produced in a successful run could explain a thing.
I believe the container should still be running. I will try to upload a full dump next time I see the issue.
Full dump (without /home/ubuntu/cache): https://send.firefox.com/download/b73400f4959ea277/#F8JJEensS8D01PQ7Va8OrA
I ran make V=1 in the container, which shouldn't have the gcc and linker wrappers in its PATH, and it gives me:
root@625be49b673e:/home/ubuntu/build/bitcoin/distsrc-x86_64-w64-mingw32# make V=1
Making all in src
make[1]: Entering directory '/home/ubuntu/build/bitcoin/distsrc-x86_64-w64-mingw32/src'
make[2]: Entering directory '/home/ubuntu/build/bitcoin/distsrc-x86_64-w64-mingw32/src'
/bin/bash ../libtool --tag=CXX --preserve-dup-deps --mode=link x86_64-w64-mingw32-g++ -std=c++11 -fstack-reuse=none -Wstack-protector -fstack-protector-all -fPIE -pipe -O2 -O2 -g -fvisibility=hidden -Wl,--exclude-libs,ALL -pthread -Wl,--dynamicbase -Wl,--nxcompat -Wl,--high-entropy-va -pie -all-static -L/home/ubuntu/build/bitcoin/depends/x86_64-w64-mingw32/share/../lib -o bitcoind.exe bitcoind-bitcoind.o bitcoind-res.o libbitcoin_server.a libbitcoin_wallet.a libbitcoin_server.a libbitcoin_common.a univalue/libunivalue.la libbitcoin_util.a libbitcoin_zmq.a libbitcoin_consensus.a crypto/libbitcoin_crypto_base.a crypto/libbitcoin_crypto_sse41.a crypto/libbitcoin_crypto_avx2.a crypto/libbitcoin_crypto_shani.a leveldb/libleveldb.a leveldb/libleveldb_sse42.a leveldb/libmemenv.a secp256k1/libsecp256k1.la -L/home/ubuntu/build/bitcoin/depends/x86_64-w64-mingw32/share/../lib -lboost_system-mt-s-x64 -lboost_filesystem-mt-s-x64 -lboost_thread-mt-s-x64 -lboost_chrono-mt-s-x64 -ldb_cxx-4.8 -lcrypto -lminiupnpc -levent -lzmq -lQt5AccessibilitySupport -lQt5DeviceDiscoverySupport -lQt5FbSupport -lQt5ThemeSupport -lQt5EventDispatcherSupport -lQt5FontDatabaseSupport -lssp -lcrypt32 -liphlpapi -lshlwapi -lmswsock -lws2_32 -ladvapi32 -lrpcrt4 -luuid -loleaut32 -lole32 -lcomctl32 -lshell32 -lwinmm -lwinspool -lcomdlg32 -lgdi32 -luser32 -lkernel32 -lmingwthrd
libtool: link: x86_64-w64-mingw32-g++ -std=c++11 -fstack-reuse=none -Wstack-protector -fstack-protector-all -fPIE -pipe -O2 -O2 -g -fvisibility=hidden -Wl,--exclude-libs -Wl,ALL -pthread -Wl,--dynamicbase -Wl,--nxcompat -Wl,--high-entropy-va -pie -static -o bitcoind.exe bitcoind-bitcoind.o bitcoind-res.o -L/home/ubuntu/build/bitcoin/depends/x86_64-w64-mingw32/share/../lib libbitcoin_server.a libbitcoin_wallet.a libbitcoin_server.a libbitcoin_common.a univalue/.libs/libunivalue.a libbitcoin_util.a libbitcoin_zmq.a libbitcoin_consensus.a crypto/libbitcoin_crypto_base.a crypto/libbitcoin_crypto_sse41.a crypto/libbitcoin_crypto_avx2.a crypto/libbitcoin_crypto_shani.a leveldb/libleveldb.a leveldb/libleveldb_sse42.a leveldb/libmemenv.a secp256k1/.libs/libsecp256k1.a -lboost_system-mt-s-x64 -lboost_filesystem-mt-s-x64 -lboost_thread-mt-s-x64 -lboost_chrono-mt-s-x64 -ldb_cxx-4.8 -lcrypto -lminiupnpc -levent -lzmq -lQt5AccessibilitySupport -lQt5DeviceDiscoverySupport -lQt5FbSupport -lQt5ThemeSupport -lQt5EventDispatcherSupport -lQt5FontDatabaseSupport -lssp -lcrypt32 -liphlpapi -lshlwapi -lmswsock -lws2_32 -ladvapi32 -lrpcrt4 -luuid -loleaut32 -lole32 -lcomctl32 -lshell32 -lwinmm -lwinspool -lcomdlg32 -lgdi32 -luser32 -lkernel32 -lmingwthrd -pthread
libbitcoin_server.a(libbitcoin_server_a-chain.o): In function `operator()':
/home/ubuntu/build/bitcoin/distsrc-x86_64-w64-mingw32/src/interfaces/chain.cpp:219: undefined reference to `UniValue::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
/home/ubuntu/build/bitcoin/distsrc-x86_64-w64-mingw32/src/interfaces/chain.cpp:220: undefined reference to `UniValue::get_int() const'
libbitcoin_server.a(libbitcoin_server_a-init.o): In function `BlockNotifyGenesisWait':
/home/ubuntu/build/bitcoin/distsrc-x86_64-w64-mingw32/src/init.cpp:606: undefined reference to `std::condition_variable::notify_all()'
libbitcoin_server.a(libbitcoin_server_a-init.o): In function `OnRPCStopped':
/home/ubuntu/build/bitcoin/distsrc-x86_64-w64-mingw32/src/init.cpp:356: undefined reference to `std::condition_variable::notify_all()'
libbitcoin_server.a(libbitcoin_server_a-init.o):/usr/lib/gcc/x86_64-w64-mingw32/7.3-posix/include/c++/thread:126: undefined reference to `std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)())'
libbitcoin_server.a(libbitcoin_server_a-init.o): In function `BlockNotifyCallback':
/home/ubuntu/build/bitcoin/distsrc-x86_64-w64-mingw32/src/init.cpp:591: undefined reference to `std::thread::detach()'
libbitcoin_server.a(libbitcoin_server_a-init.o): In function `__tcf_18':
...
Ok, I examined the archive and checked:
build/bitcoin/distsrc-x86_64-w64-mingw32/src/univalue/.libs/libunivalue.a has a symbols dictionary
UniValue::get_int() is in there (just picked an example one):
$ nm -s -g --demangle ./build/bitcoin/distsrc-x86_64-w64-mingw32/src/univalue/.libs/libunivalue.a | grep "UniValue::get_int() const"
UniValue::get_int() const in libunivalue_la-univalue_get.o
0000000000000630 T UniValue::get_int() const
Extracted the .o file using ar x and checked inside:
0000000000000630 T UniValue::get_int() const
readelf -h shows:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
wait, what: x86-64?!?!?
wait, what: ELF!
let's check one of bitcoin's own files
$ readelf -h ./build/bitcoin/distsrc-x86_64-w64-mingw32/src/libbitcoin_server_a-chain.o
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
It looks like the univalue lib was compiled for x86_64 Linux, not x86_64 Windows! This explains why the linking doesn't work, at least.
These are ELF:
./src/secp256k1/gen_context.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
./src/univalue/lib/libunivalue_la-univalue_write.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
./src/univalue/lib/libunivalue_la-univalue.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
./src/univalue/lib/libunivalue_la-univalue_get.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
./src/univalue/lib/libunivalue_la-univalue_read.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
(the secp256k1 file is correct: gen_context is for the build host, not for the target)
The rest is COFF at least:
./src/libbitcoin_common_a-compressor.o: Intel amd64 COFF object file, no line number info, not stripped, 20 sections, symbol offset=0x2ba8c, 64 symbols
./src/node/libbitcoin_server_a-psbt.o: Intel amd64 COFF object file, no line number info, not stripped, 60 sections, symbol offset=0xbb4a8, 170 symbols
…
Haven't been able to find out why yet. But it fails for univalue subtree, apparently.
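The ELF-versus-COFF check above can be automated by looking at the first bytes of each .o file. A minimal sketch, assuming only the two formats seen in this thread matter: ELF files start with the magic 7f 45 4c 46, and a COFF object starts with a little-endian 16-bit machine field (0x8664 for amd64, 0x014c for i386). The function names are hypothetical.

```python
import struct
from pathlib import Path


def classify_object(header: bytes) -> str:
    """Classify an object file from its first few bytes."""
    if header.startswith(b"\x7fELF"):
        return "ELF"
    if len(header) >= 2:
        # COFF objects begin with a little-endian 16-bit machine field.
        (machine,) = struct.unpack_from("<H", header)
        if machine == 0x8664:
            return "COFF amd64"
        if machine == 0x014C:
            return "COFF i386"
    return "unknown"


def unexpected_objects(root, expected="COFF amd64"):
    """Return {relative path: kind} for every .o under root that is not `expected`."""
    found = {}
    for path in sorted(Path(root).rglob("*.o")):
        with path.open("rb") as fh:
            kind = classify_object(fh.read(4))
        if kind != expected:
            found[path.relative_to(root).as_posix()] = kind
    return found
```

Run against the src directory of a Windows build, this would also list secp256k1/gen_context.o as ELF. That one is expected, since it is built for the build host.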
HRM, looks like univalue's configure does not use cross-compiler CC:
[build/bitcoin/distsrc-x86_64-w64-mingw32/src/univalue/Makefile]
CC = gcc
CCDEPMODE = depmode=none
ac_ct_CC = gcc
CXX = g++
CXXCPP = g++ -E
CXXDEPMODE = depmode=none
CXXFLAGS = -O2 -g
looking at config.log it's not configured for cross-compiling at all:
$ ./configure --disable-option-checking --prefix=/ --disable-ccache --disable-maintainer-mode --disable-dependency-tracking --enable-reduce-exports --disable-bench --disable-gui-tests CFLAGS=-O2 -g CXXFLAGS=-O2 -g --disable-shared --with-pic --with-bignum=no --enable-module-recovery --disable-jni --cache-file=/dev/null --srcdir=. --no-create --no-recursion
…
configure:2934: checking build system type
configure:2948: result: x86_64-pc-linux-gnu
configure:2968: checking host system type
configure:2981: result: x86_64-pc-linux-gnu
CONFIG_SITE=/home/ubuntu/build/bitcoin/depends/x86_64-w64-mingw32/share/config.site wasn't passed through to the child configure script, so it doesn't pick up the cross-build configuration.
Ok, thanks for finding the issue. I will keep bisecting then.
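To illustrate the CONFIG_SITE point: a child process only sees the variable if it is present in the environment it inherits. The sketch below shows that general mechanism with a stand-in child process. It does not reproduce what configure or config.status actually do, and the helper name is made up.

```python
import os
import subprocess
import sys

# Stand-in for a child configure script: it just reports what it sees.
CHILD = "import os; print(os.environ.get('CONFIG_SITE', '<unset>'))"


def child_sees_config_site(env):
    """Run a child process with the given environment and return what it sees."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


base = dict(os.environ)
base.pop("CONFIG_SITE", None)

with_site = dict(base)
with_site["CONFIG_SITE"] = (
    "/home/ubuntu/build/bitcoin/depends/x86_64-w64-mingw32/share/config.site"
)

print(child_sees_config_site(base))       # variable absent in the child
print(child_sees_config_site(with_site))  # variable visible in the child
```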
Ok, I built with one thread and it failed with a different linker error this time: win-build.log
I read the logs here, and the problem appears to come from when we run ./config.status --recheck in secp256k1.
You'll see that on L1505, we configure distsrc-x86_64-w64-mingw32/src/secp256k1 once, and it seems to be getting the right system types...
=== configuring in src/secp256k1 (/home/ubuntu/build/bitcoin/distsrc-x86_64-w64-mingw32/src/secp256k1)
configure: running /bin/bash ./configure --disable-option-checking '--prefix=/' '--disable-ccache' '--disable-maintainer-mode' '--disable-dependency-tracking' '--enable-reduce-exports' '--disable-bench' '--disable-gui-tests' 'CFLAGS=-O2 -g' 'CXXFLAGS=-O2 -g' '--disable-shared' '--with-pic' '--enable-benchmark=no' '--with-bignum=no' '--enable-module-recovery' '--disable-jni' --cache-file=/dev/null --srcdir=.
configure: loading site script /home/ubuntu/build/bitcoin/depends/x86_64-w64-mingw32/share/config.site
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-w64-mingw32
but later on, on L1701, we run ./config.status --recheck in secp256k1 (distsrc-x86_64-w64-mingw32/src/secp256k1) and we start getting problems:
/bin/bash ./config.status --recheck
running CONFIG_SHELL=/bin/bash /bin/bash ./configure --disable-option-checking --prefix=/ --disable-ccache --disable-maintainer-mode --disable-dependency-tracking --enable-reduce-exports --disable-bench --disable-gui-tests CFLAGS=-O2 -g CXXFLAGS=-O2 -g --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --cache-file=/dev/null --srcdir=. --no-create --no-recursion
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
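A build log can be scanned for this symptom by collecting every "checking host system type..." line and flagging the ones that do not match the intended cross-compile host. This is a sketch only: it assumes the log uses the configure stdout format quoted above (not the config.log format), and the function names are made up.

```python
import re

HOST_RE = re.compile(r"^checking host system type\.\.\. (\S+)[ \t]*$", re.MULTILINE)


def host_types(log_text):
    """Return the host triplet reported by each configure run, in log order."""
    return HOST_RE.findall(log_text)


def misconfigured_runs(log_text, expected_host):
    """Return (run index, host) for each configure run with an unexpected host."""
    return [
        (index, host)
        for index, host in enumerate(host_types(log_text), start=1)
        if host != expected_host
    ]


excerpt = """\
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-w64-mingw32
/bin/bash ./config.status --recheck
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
"""
print(misconfigured_runs(excerpt, "x86_64-w64-mingw32"))
```

On the excerpt, only the second configure run (the one triggered by --recheck) is flagged.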
That's a good find. --recheck is not part of any of the descriptors at least. It shouldn't be there. Could something intermittently be causing a recheck?
Ok, 0.18.1 (or rather 1a8a5ede9f) also fails. Note that the corresponding gitian descriptor is properly checked out:
$ export COMMIT=1a8a5ede9f && (cd ../bitcoin && git checkout $COMMIT) && while ./bin/gbuild --num-make 9 --memory 9000 --commit bitcoin=$COMMIT ../bitcoin/contrib/gitian-descriptors/gitian-win.yml; do bash -c "echo \"one more success for $COMMIT\" >> /tmp/g_t"; done
$ sha256sum /home_ubuntu.zip
c56059bc8914b430226c310872201afe4fee051a008fed76afe2dd624a5e7013 /home_ubuntu.zip
https://send.firefox.com/download/e152d58f4a175568/#T6fxKWupIGNR7bn26d4EqQ
Now I can't even build depends:
commit: 25c136d30e linux-build.log
What's interesting here is that jonasschnelli's nightly builds are still working properly: https://bitcoin.jonasschnelli.ch/?show=nightly#nighly
I've installed a 19.10 Ubuntu Box with docker and the gitian builds work fine in there.
For the builds that failed... Did you git clean -xdff after each try?
There are two git directories: One where the gitian descriptors are drawn from, and the other where the code is drawn from. I ran git clean -dffx on both and got on the one with the gitian descriptors:
$ git clean -dffx
Removing depends/work/
No idea how this folder got there, nor how it could affect builds. Will retry now.
No longer seeing this