Running the native_fuzz_with_valgrind_job on aarch64 (Fedora 37), I’ve seen the following:
Run addr_info_deserialize with args ['valgrind', '--quiet', '--error-exitcode=1', '/home/fedora/ci_scratch/ci/scratch/build/bitcoin-aarch64-unknown-linux-gnu/src/test/fuzz/fuzz', '-runs=1', '/home/fedora/ci_scratch/ci/scratch/qa-assets/fuzz_seed_corpus/addr_info_deserialize']
valgrind: m_libcfile.c:66 (vgPlain_safe_fd): Assertion 'newfd >= VG_(fd_hard_limit)' failed.


valgrind: m_libcfile.c:66 (vgPlain_safe_fd): Assertion 'newfd >= VG_(fd_hard_limit)' failed.

Target "valgrind --quiet --error-exitcode=1 /home/fedora/ci_scratch/ci/scratch/build/bitcoin-aarch64-unknown-linux-gnu/src/test/fuzz/fuzz -runs=1 /home/fedora/ci_scratch/ci/scratch/qa-assets/fuzz_seed_corpus/addr_info_deserialize" failed with exit code -11
./ci/test/04_install.sh: line 98: pop_var_context: head of shell_variables not a function context
This was first reported as a Valgrind bug (https://bugs.kde.org/show_bug.cgi?id=465435). However, a reply there points at Docker instead:
I really think that the problem is with Docker. It’s advertising some ridiculously high value for ulimit -n like 1048576. Valgrind wants to put its own files in the top 12 of those slots, and is trying to do an fcntl(oldfd, F_DUPFD, 1048576-12) - note that 1048576-12 matches the 1048564 that you get from the patch message. Then Docker fails to honour its promised file descriptor limit and the fcntl fails.
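To make the arithmetic in the quote concrete, here is a minimal sketch. The helper name valgrind_min_fd is made up for illustration, and the reserved count of 12 is taken from the quoted comment rather than checked against the Valgrind source.

```shell
# Number of descriptor slots Valgrind reserves at the top of the range.
# Taken from the quoted comment ("top 12"); treat it as an assumption.
RESERVED_FDS=12

# Hypothetical helper: lowest fd number Valgrind would request via
# F_DUPFD for a given descriptor limit.
valgrind_min_fd() {
    echo $(( $1 - RESERVED_FDS ))
}

# Limit quoted in the bug report; prints 1048564.
valgrind_min_fd 1048576

# Limits advertised in the current environment (may print "unlimited").
echo "soft: $(ulimit -Sn)  hard: $(ulimit -Hn)"
```

Running the last line inside the CI container shows which limit Docker advertises there.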
So the easiest thing to do here might just be to set some sane ulimit values (during docker run) that still work for all other jobs and avoid the Valgrind assertion (which should become a more useful error message at some point?).
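As a rough sketch of what setting the limit during docker run could look like: docker run accepts --ulimit nofile=<soft>:<hard>. The helper name nofile_ulimit_arg and the value 1024 are illustrative assumptions, not necessarily what this PR uses.

```shell
# Hypothetical helper: format Docker's --ulimit option for the
# open-file limit as nofile=<soft>:<hard>.
nofile_ulimit_arg() {
    printf -- '--ulimit nofile=%s:%s\n' "$1" "$2"
}

# Illustrative value only; 1024 is an assumption, not taken from this PR.
nofile_ulimit_arg 1024 1024

# Possible use (the image name is a placeholder):
#   docker run --rm $(nofile_ulimit_arg 1024 1024) <image> sh -c 'ulimit -n'
```

Whatever value is chosen would need to be high enough for the other CI jobs, which is the open question above.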
Opening a PR for discussion/brainstorming. The changes in this PR (from the bug report) “fix” this particular issue, but I haven’t yet tested all jobs etc. Maybe we’d rather only do this on the affected test.