Spurious (?) valgrind failure for p2p_compactblocks.py #27741

issue Sjors openend this issue on May 24, 2023
  1. Sjors commented at 3:23 pm on May 24, 2023: member

    Ubuntu 23.04 with g++ 12.2.0 and valgrind 3.19.0. Built without depends and without --enable-debug.

    The failure is introduced by: https://github.com/bitcoin/bitcoin/pull/27626/commits/cce96182ba2457335868c65dc16b081c3dee32ee

    It can also be reproduced on master a13f3746dccd9c4ec16d6bfe9b33ebd26e3238e1.

    Offending line in RemoveBlockRequest:

    0if (from_peer && *from_peer != node_id) {
    

    It seems to believe from_peer is uninitialized. Adding a log statement that prints from_peer makes the error go away. As does compiling with --enable-debug. Others were not able to reproduce this at all.

    This seems like a Valgrind bug, not an actual problem with the code.

    node0/stderr/…:

     0==2894643== Thread 25 b-msghand:
     1==2894643== Conditional jump or move depends on uninitialised value(s)
     2==2894643==    at 0x24A6EC: (anonymous namespace)::PeerManagerImpl::RemoveBlockRequest(uint256 const&, std::optional<long>) [clone .isra.0] (in /home/sjors/dev/bitcoin/src/bitcoind)
     3==2894643==    by 0x25AAA1: (anonymous namespace)::PeerManagerImpl::ProcessBlock(CNode&, std::shared_ptr<CBlock const> const&, bool, bool) (in /home/sjors/dev/bitcoin/src/bitcoind)
     4==2894643==    by 0x2718A4: (anonymous namespace)::PeerManagerImpl::ProcessMessage(CNode&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, CDataStream&, std::chrono::duration<long, std::ratio<1l, 1000000l> >, std::atomic<bool> const&) (in /home/sjors/dev/bitcoin/src/bitcoind)
     5==2894643==    by 0x274E68: (anonymous namespace)::PeerManagerImpl::ProcessMessages(CNode*, std::atomic<bool>&) (in /home/sjors/dev/bitcoin/src/bitcoind)
     6==2894643==    by 0x22A9E7: CConnman::ThreadMessageHandler() (in /home/sjors/dev/bitcoin/src/bitcoind)
     7==2894643==    by 0x620BF5: util::TraceThread(std::basic_string_view<char, std::char_traits<char> >, std::function<void ()>) (in /home/sjors/dev/bitcoin/src/bitcoind)
     8==2894643==    by 0x22095C: std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(std::basic_string_view<char, std::char_traits<char> >, std::function<void ()>), char const*, CConnman::Start(CScheduler&, CConnman::Options const&)::{lambda()#5}> > >::_M_run() (in /home/sjors/dev/bitcoin/src/bitcoind)
     9==2894643==    by 0x494F3E2: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.31)
    10==2894643==    by 0x4C6F189: start_thread (pthread_create.c:444)
    11==2894643==    by 0x4CFDAF3: clone (clone.S:100)
    12==2894643== 
    13{
    14   <insert_a_suppression_name_here>
    15   Memcheck:Cond
    16   fun:_ZN12_GLOBAL__N_115PeerManagerImpl18RemoveBlockRequestERK7uint256St8optionalIlE.isra.0
    17   fun:_ZN12_GLOBAL__N_115PeerManagerImpl12ProcessBlockER5CNodeRKSt10shared_ptrIK6CBlockEbb
    18   fun:_ZN12_GLOBAL__N_115PeerManagerImpl14ProcessMessageER5CNodeRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEER11CDataStreamNSt6chrono8durationIlSt5ratioILl1ELl1000000EEEERKSt6atomicIbE
    19   fun:_ZN12_GLOBAL__N_115PeerManagerImpl15ProcessMessagesEP5CNodeRSt6atomicIbE
    20   fun:_ZN8CConnman20ThreadMessageHandlerEv
    21   fun:_ZN4util11TraceThreadESt17basic_string_viewIcSt11char_traitsIcEESt8functionIFvvEE
    22   fun:_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvSt17basic_string_viewIcSt11char_traitsIcEESt8functionIFvvEEEPKcZN8CConnman5StartER10CSchedulerRKNSE_7OptionsEEUlvE3_EEEEE6_M_runEv
    23   obj:/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.31
    24   fun:start_thread
    25   fun:clone
    26}
    27==2894643== 
    28==2894643== Exit program on first error (--exit-on-first-error=yes)
    

    Combined log shows where it happens in the test, right after receiving msg_sendcmpct at delivery_peer.send_and_ping(msg_cmpctblock(cmpct_block.to_p2p())).

     02023-05-24T12:44:08.114000Z TestFramework (INFO): PRNG seed is: 6887258428905600896
     12023-05-24T12:44:08.115000Z TestFramework (DEBUG): Setting up network thread
     22023-05-24T12:44:08.115000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_zkozpx3h
     32023-05-24T12:44:08.116000Z TestFramework.node0 (DEBUG): bitcoind started, waiting for RPC to come up
     42023-05-24T12:44:11.558000Z TestFramework.node0 (DEBUG): RPC successfully started
     52023-05-24T12:44:11.662000Z TestFramework.p2p (DEBUG): Connecting to Bitcoin Node: 127.0.0.1:12377
     62023-05-24T12:44:11.663000Z TestFramework.p2p (DEBUG): Connected & Listening: 127.0.0.1:12377
     72023-05-24T12:44:11.663000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:12377: msg_version(nVersion=70016 nServices=9 nTime=Wed May 24 14:44:11 2023 addrTo=CAddress(nServices=1 net=IPv4 addr=127.0.0.1 port=12377) addrFrom=CAddress(nServices=1 net=IPv4 addr=0.0.0.0 port=0) nNonce=0x8B641D5BF26C25DE strSubVer=/python-p2p-tester:0.0.3/ nStartingHeight=-1 relay=1)
     82023-05-24T12:44:11.702000Z TestFramework.p2p (DEBUG): Received message from 127.0.0.1:12377: msg_version(nVersion=70016 nServices=1033 nTime=Wed May 24 14:44:11 2023 addrTo=CAddress(nServices=0 net=IPv4 addr=0.0.0.0 port=0) addrFrom=CAddress(nServices=1033 net=IPv4 addr=0.0.0.0 port=0) nNonce=0xEBA5B51C154D242B strSubVer=/Satoshi:25.99.0(testnode0)/ nStartingHeight=0 relay=1)
     9
    10[... cut because comments max 65536 chars ... ]
    11
    122023-05-24T12:44:18.001000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:12377: msg_tx(tx=CTransaction(nVersion=2 vin=[CTxIn(prevout=COutPoint(hash=8f4329eaafed2c1755c9dd86d495e79a167cdd3e4dd5819125a6724febc20c17 n=0) scriptSig= nSequence=0)] vout=[CTxOut(nValue=4.99997000 scriptPubKey=51755175517551755175517551755175517551755175517551755175517551)] wit=CTxWitness() nLockTime=0))
    132023-05-24T12:44:18.001000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:12377: msg_tx(tx=CTransaction(nVersion=2 vin=[CTxIn(prevout=COutPoint(hash=46593581928ea644c434b3f459f0cd9f056761ce4e9b2d69e2691359ae031526 n=0) scriptSig= nSequence=0)] vout=[CTxOut(nValue=4.99996000 scriptPubKey=51755175517551755175517551755175517551755175517551755175517551)] wit=CTxWitness() nLockTime=0))
    142023-05-24T12:44:18.001000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:12377: msg_tx(tx=CTransaction(nVersion=2 vin=[CTxIn(prevout=COutPoint(hash=a799b1c2eef43346c704913c06e7323660bbb96bf25fd85d3416f9fd8eaedc2a n=0) scriptSig= nSequence=0)] vout=[CTxOut(nValue=4.99995000 scriptPubKey=51755175517551755175517551755175517551755175517551755175517551)] wit=CTxWitness() nLockTime=0))
    152023-05-24T12:44:18.001000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:12377: msg_ping(nonce=00000009)
    162023-05-24T12:44:18.011000Z TestFramework.p2p (DEBUG): Received message from 127.0.0.1:12377: msg_pong(nonce=00000009)
    172023-05-24T12:44:18.054000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:12377: msg_cmpctblock(HeaderAndShortIDs=P2PHeaderAndShortIDs(header=CBlockHeader(nVersion=805306368 hashPrevBlock=47ed1663e361a06b9088c35faf83e39e4a57c6fe44d733ebe22ac5bf4c18e56d hashMerkleRoot=0cca0a75f4dbeb22cfb8d625a8fd5bf5644ae8b403598bdaaa1095b2a860765f nTime=Wed May 24 14:44:52 2023 nBits=207fffff nNonce=00000002), nonce=0, shortids_length=5, shortids=[215827416537220, 271830461757115, 254576430693737, 118474923560500, 145899820988280], prefilled_txn_length=1, prefilledtxn=[PrefilledTransaction(i... (msg truncated)
    182023-05-24T12:44:18.054000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:12377: msg_ping(nonce=0000000a)
    192023-05-24T12:44:18.065000Z TestFramework.p2p (DEBUG): Received message from 127.0.0.1:12377: msg_sendcmpct(announce=True, version=2)
    202023-05-24T12:44:18.088000Z TestFramework.p2p (DEBUG): Closed connection to: 127.0.0.1:12377
    212023-05-24T12:44:18.088000Z TestFramework.p2p (DEBUG): Closed connection to: 127.0.0.1:12377
    222023-05-24T12:44:18.105000Z TestFramework (ERROR): Assertion failed
    23Traceback (most recent call last):
    24  File "/home/sjors/dev/bitcoin/test/functional/test_framework/test_framework.py", line 132, in main
    25    self.run_test()
    26  File "test/functional/p2p_compactblocks.py", line 864, in run_test
    27    self.test_compactblock_reconstruction_multiple_peers(self.segwit_node, self.additional_segwit_node)
    28  File "test/functional/p2p_compactblocks.py", line 773, in test_compactblock_reconstruction_multiple_peers
    29    delivery_peer.send_and_ping(msg_cmpctblock(cmpct_block.to_p2p()))
    30  File "/home/sjors/dev/bitcoin/test/functional/test_framework/p2p.py", line 553, in send_and_ping
    31    self.sync_with_ping(timeout=timeout)
    32  File "/home/sjors/dev/bitcoin/test/functional/test_framework/p2p.py", line 570, in sync_with_ping
    33    self.wait_until(test_function, timeout=timeout)
    34  File "/home/sjors/dev/bitcoin/test/functional/test_framework/p2p.py", line 463, in wait_until
    35    wait_until_helper(test_function, timeout=timeout, lock=p2p_lock, timeout_factor=self.timeout_factor)
    36  File "/home/sjors/dev/bitcoin/test/functional/test_framework/util.py", line 266, in wait_until_helper
    37    if predicate():
    38  File "/home/sjors/dev/bitcoin/test/functional/test_framework/p2p.py", line 460, in test_function
    39    assert self.is_connected
    40AssertionError
    412023-05-24T12:44:18.106000Z TestFramework (DEBUG): Closing down network thread
    422023-05-24T12:44:18.156000Z TestFramework (INFO): Stopping nodes
    432023-05-24T12:44:18.157000Z TestFramework.node0 (DEBUG): Stopping node
    
  2. fanquake commented at 3:30 pm on May 24, 2023: member
    Can you make it clear what version of the code you are actually running valgrind on? Are you compiling cce96182ba2457335868c65dc16b081c3dee32ee or are you compiling master (if so, which commit).
  3. Sjors commented at 5:38 pm on May 24, 2023: member
    I compiled on cce96182ba2457335868c65dc16b081c3dee32ee, but can reproduce on master. Updated description.
  4. MarcoFalke commented at 7:30 am on May 25, 2023: member

    Interesting. This should be a known bug. For now you can just use clang instead, see also the compilers that are known to be working with valgrind: (Your Ubuntu 23.04 should be identical to Debian Bookworm for the purposes here)

    https://github.com/bitcoin/bitcoin/blob/a13f3746dccd9c4ec16d6bfe9b33ebd26e3238e1/contrib/valgrind.supp#L15-L17

  5. MarcoFalke added the label Tests on May 25, 2023
  6. MarcoFalke added the label Upstream on May 25, 2023
  7. MarcoFalke added the label Questions and Help on May 25, 2023
  8. MarcoFalke commented at 11:21 am on May 25, 2023: member

    Steps to reproduce on a fresh install of Ubuntu Mantic (currently the same packages as your 23.04):

    0export DEBIAN_FRONTEND=noninteractive && apt update && apt install curl wget htop git vim ccache -y && git clone https://github.com/bitcoin/bitcoin.git  --depth=1 ./bitcoin-core && cd bitcoin-core && apt install build-essential libtool autotools-dev automake pkg-config bsdmainutils python3-zmq     libevent-dev libboost-dev valgrind   -y   &&  ./autogen.sh && ./configure   && make -j $(nproc)
    1./test/functional/p2p_compactblocks.py --valgrind
    2./test/functional/combine_logs.py -c /tmp/bitcoin_func_test_wnb4s2d5
    

    Same with:

    0$ g++-13 --version 
    1g++-13 (Ubuntu 13.1.0-2ubuntu2) 13.1.0
    
  9. MarcoFalke commented at 11:21 am on May 25, 2023: member
    Maybe try clang over gcc or a newer version of valgrind for now?
  10. MarcoFalke commented at 1:03 pm on May 25, 2023: member

    Oh, and if you use clang and valgrind 3.19, you’ll also have to set gdwarf:

    ./configure CC='clang -gdwarf-4' CXX='clang++ -gdwarf-4'

  11. MarcoFalke commented at 5:01 pm on May 25, 2023: member

    Looks like valgrind 3.21 still has this issue. I tried on Tumbleweed OpenSuse:

    0zypper in -y valgrind  libevent-devel boost-devel       find bison gcc-c++ libtool make autoconf automake python3  lbzip2 patch xz      curl wget htop git vim ccache  && git clone https://github.com/bitcoin/bitcoin.git  --depth=1 ./bitcoin-core && cd bitcoin-core &&    ./autogen.sh &&  ./configure  --disable-tests --disable-bench && make -j $(nproc) && ./test/functional/p2p_compactblocks.py --valgrind
    

    Maybe someone can report the bug to valgrind upstream?

  12. Sjors commented at 1:17 pm on May 30, 2023: member
    Using clang indeed makes this failure go away for me.
  13. fanquake commented at 2:42 pm on May 31, 2023: member

    Looks like valgrind 3.21 still has this issue.

    Valgrind 3.22 (master @ e3602b3eec0696b183722ea21da14588cf205c74) also still fails.

  14. MarcoFalke commented at 10:13 am on July 14, 2023: member
    I haven’t tried, but an alternative to using clang would be to try to set -O1 for gcc, but I am not sure on how to do this cleanly?
  15. fanquake commented at 3:13 pm on July 14, 2023: member

    Maybe someone can report the bug to valgrind upstream?

    Looks like the issue no-longer reproduces using the steps from #27741 (comment) (for me) using current master. Does this still repro for anyone else?

  16. MarcoFalke commented at 3:51 pm on July 14, 2023: member
    I can still reproduce with #27741 (comment).
  17. fanquake commented at 3:57 pm on July 14, 2023: member

    I can still reproduce with #27741 (comment).

    Ah yep, it still happens (and I guess only happens) on x86_64, not on aarch64 (where I tested earlier).

  18. MarcoFalke commented at 12:02 pm on July 17, 2023: member
    Tracked it down to ./configure CFLAGS="-O1 -g" CXXFLAGS="-O1 -g -fipa-sra " --disable-bench --disable-tests --disable-fuzz-binary so far.
  19. MarcoFalke commented at 12:09 pm on July 17, 2023: member

    This can be fixed, like most other C++ bugs can be fixed: “Add or remove * or & in random places, until it works.”

    Diff:

    0     while (range.first != range.second) {
    1-        auto [node_id, list_it] = range.first->second;
    2+        auto& [node_id, list_it] = range.first->second;
    3 
    

    Or, with added const:

     0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
     1index 8da2c70..f26522b 100644
     2--- a/src/net_processing.cpp
     3+++ b/src/net_processing.cpp
     4@@ -1144,7 +1144,7 @@ void PeerManagerImpl::RemoveBlockRequest(const uint256& hash, std::optional<Node
     5     Assume(mapBlocksInFlight.count(hash) <= MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK);
     6 
     7     while (range.first != range.second) {
     8-        auto [node_id, list_it] = range.first->second;
     9+        const auto& [node_id, list_it] { range.first->second};
    10 
    11         if (from_peer && *from_peer != node_id) {
    12             range.first++;
    
  20. MarcoFalke commented at 8:27 am on July 18, 2023: member
  21. MarcoFalke removed the label Questions and Help on Jul 18, 2023
  22. Sjors commented at 11:48 am on July 28, 2023: member
    I guess both changes are a strict improvement, but won’t put an end to the whack-a-mole grind.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me