deadlock shutting down v25.0 #27965

issue dooglus openend this issue on June 25, 2023
  1. dooglus commented at 1:20 pm on June 25, 2023: contributor

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    I ran “bitcoin-cli stop” and it didn’t stop.

    Expected behaviour

    I expect it to stop.

    Steps to reproduce

    It usually works. Sometimes it doesn’t. I don’t know how to reproduce.

    Relevant log output

    Here’s the end of debug.log:

    2023-06-25T12:55:49Z Saw new header hash=0000000000000000000375a71ccc0569c279dbb751bdbb70d86d983cd38eea68 height=795857
    2023-06-25T12:55:55Z UpdateTip: new best=0000000000000000000375a71ccc0569c279dbb751bdbb70d86d983cd38eea68 height=795857 version=0x20600000 log2_work=94.262739 tx=856322917 date='2023-06-25T12:55:38Z' progress=1.000000 cache=90.3MiB(608799txo)
    2023-06-25T12:56:00Z New outbound peer connected: version: 70016, blocks=795857, peer=3892, peeraddr=[...].onion:8333 (block-relay-only)
    2023-06-25T12:57:09Z SetNetworkActive: false
    2023-06-25T12:57:56Z tor: Thread interrupt
    2023-06-25T12:57:56Z Shutdown: In progress...
    2023-06-25T12:57:56Z RemoveLocal([...].onion:8333)
    2023-06-25T12:57:56Z torcontrol thread exit
    2023-06-25T12:57:56Z addcon thread exit
    2023-06-25T12:57:56Z opencon thread exit
    2023-06-25T12:57:56Z net thread exit
    2023-06-25T12:57:56Z msghand thread exit
    

    Here’s the output of thread apply all where in gdb:

    gdb.txt

    Every thread appears to be waiting for something.

    There’s no CPU or disk activity, and it’s been 20 minutes since I asked it to shut down.

    How did you obtain Bitcoin Core

    Compiled from source

    What version of Bitcoin Core are you using?

    v25.0

    Operating system and version

    Debian GNU/Linux 11 (bullseye)

    Machine specifications

    No response

  2. dooglus commented at 1:31 pm on June 25, 2023: contributor

    Netstat tells me there’s nothing listening on port 8332:

    $ sudo netstat -plnt | grep 833
    tcp        0      0 127.0.0.1:8333          0.0.0.0:*               LISTEN      584775/bitcoin-qt-v 
    tcp       27      0 127.0.0.1:8334          0.0.0.0:*               LISTEN      584775/bitcoin-qt-v 
    

    Detaching gdb and running strace -f on the process shows lots of activity:

    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=12, events=POLLIN}, {fd=25, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}], 7, 895) = 1 ([{fd=5, revents=POLLIN}])
    [pid 584775] read(5, "\1\0\0\0\0\0\0\0", 16) = 8
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=12, events=POLLIN}, {fd=25, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}], 7, 895 <unfinished ...>
    [pid 584816] <... futex resumed>)       = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195866, tv_nsec=312988621}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195866, tv_nsec=814043987}, FUTEX_BITSET_MATCH_ANY <unfinished ...>
    [pid 584775] <... poll resumed>)        = 0 (Timeout)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=12, events=POLLIN}, {fd=25, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}], 7, 2249 <unfinished ...>
    [pid 584816] <... futex resumed>)       = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195867, tv_nsec=314452307}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195867, tv_nsec=815520511}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195868, tv_nsec=316601459}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195868, tv_nsec=817678830}, FUTEX_BITSET_MATCH_ANY <unfinished ...>
    [pid 584775] <... poll resumed>)        = 0 (Timeout)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=12, events=POLLIN}, {fd=25, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}], 7, 2246 <unfinished ...>
    [pid 584816] <... futex resumed>)       = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195869, tv_nsec=146454219}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195869, tv_nsec=318525074}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195869, tv_nsec=819615888}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195870, tv_nsec=320680834}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195870, tv_nsec=821742112}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195871, tv_nsec=322803638}, FUTEX_BITSET_MATCH_ANY <unfinished ...>
    [pid 584775] <... poll resumed>)        = 0 (Timeout)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=12, events=POLLIN}, {fd=25, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}], 7, 2246 <unfinished ...>
    [pid 584816] <... futex resumed>)       = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195871, tv_nsec=823810143}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195872, tv_nsec=324882052}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195872, tv_nsec=825956914}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195873, tv_nsec=327059800}, FUTEX_BITSET_MATCH_ANY <unfinished ...>
    [pid 584775] <... poll resumed>)        = 0 (Timeout)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=12, events=POLLIN}, {fd=25, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}], 7, 2247 <unfinished ...>
    [pid 584816] <... futex resumed>)       = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195873, tv_nsec=827420582}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195874, tv_nsec=328257848}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195874, tv_nsec=829327580}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195875, tv_nsec=330388270}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195875, tv_nsec=831469691}, FUTEX_BITSET_MATCH_ANY <unfinished ...>
    [pid 584775] <... poll resumed>)        = 0 (Timeout)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=12, events=POLLIN}, {fd=25, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}], 7, 2246 <unfinished ...>
    [pid 584816] <... futex resumed>)       = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195876, tv_nsec=332528457}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195876, tv_nsec=833594822}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    [pid 584816] futex(0x7fff840046d8, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 584816] futex(0x7fff8400472c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=4195877, tv_nsec=334410531}, FUTEX_BITSET_MATCH_ANY <unfinished ...>
    [pid 584779] <... poll resumed>)        = 1 ([{fd=3, revents=POLLIN}])
    [pid 584775] <... poll resumed>)        = 1 ([{fd=6, revents=POLLIN}])
    [pid 584779] recvmsg(3,  <unfinished ...>
    [pid 584775] recvmsg(6,  <unfinished ...>
    [pid 584779] <... recvmsg resumed>{msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="#\203e7\0\0\0\0\f\0\17\0>\1\30\372e\1\0\0\2\0\0\0\0\0\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
    [pid 584775] <... recvmsg resumed>{msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="#\203\310\0\0\0\0\0\f\0\17\0>\1\30\372e\1\0\0\2\0\0\0\0\0\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
    [pid 584779] write(5, "\1\0\0\0\0\0\0\0", 8) = 8
    [pid 584775] recvmsg(6,  <unfinished ...>
    [pid 584779] poll([{fd=3, events=POLLIN}], 1, -1 <unfinished ...>
    [pid 584775] <... recvmsg resumed>{msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=12, events=POLLIN}, {fd=25, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}], 7, 566) = 1 ([{fd=5, revents=POLLIN}])
    [pid 584775] read(5, "\1\0\0\0\0\0\0\0", 16) = 8
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] recvmsg(6, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 584775] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=12, events=POLLIN}, {fd=25, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}], 7, 566 <unfinished ...>
    [...]
  3. bitcoin deleted a comment on Jun 25, 2023
  4. fanquake commented at 9:39 am on June 26, 2023: member
    @Crypt-iQ possibly similar to issues you’ve been seeing?
  5. maflcko commented at 10:46 am on June 26, 2023: member

    Looks like the deadlock in the shutoff thread is here?

    https://github.com/bitcoin/bitcoin/blob/642b5dd1b4fc5788ec7ab162a7c6ce0b96e7a6e4/src/httpserver.cpp#L481-L486

    So maybe this is related to #27722 ?

  6. maflcko commented at 10:51 am on June 26, 2023: member

    However, I don’t see how the debug log and the gdb output are matching. The debug log claims that the net and msghand thread have been stopped, but that only happens in connman->Stop, in a line after StopHTTPServer(), which is the current point in gdb:

    0Thread 22 (Thread 0x7fffa16fe700 (LWP 584804) "b-shutoff"):
    1[#0](/bitcoin-bitcoin/0/)  futex_wait_cancelable (private=0, expected=0, futex_word=0x5555570e4be8 <g_requests_cv+40>) at ../sysdeps/nptl/futex-internal.h:186
    2[#1](/bitcoin-bitcoin/1/)  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5555570e4b80 <g_requests_mutex>, cond=0x5555570e4bc0 <g_requests_cv>) at pthread_cond_wait.c:508
    3[#2](/bitcoin-bitcoin/2/)  __pthread_cond_wait (cond=0x5555570e4bc0 <g_requests_cv>, mutex=0x5555570e4b80 <g_requests_mutex>) at pthread_cond_wait.c:638
    4[#3](/bitcoin-bitcoin/3/)  0x00007ffff665e90c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    5[#4](/bitcoin-bitcoin/4/)  0x0000555555db2293 in void std::condition_variable::wait<StopHTTPServer()::{lambda()#1}>(std::unique_lock<std::mutex>&, StopHTTPServer()::{lambda()#1}) ()
    6[#5](/bitcoin-bitcoin/5/)  0x0000555555daf47a in StopHTTPServer() ()
    7[#6](/bitcoin-bitcoin/6/)  0x000055555580c871 in Shutdown(node::NodeContext&) ()
    
  7. Crypt-iQ commented at 10:52 am on June 26, 2023: contributor
    Yeah I think the thread exits confused me since torcontrol, connman are stopped after the httpserver. I think those log lines mean the threads were interrupted and exited their loop (hence the exit log line) and are now joinable. So I think it’s a dupe. If you look at #27722 you’ll see that the tor control thread, some connman threads have exited as well
  8. maflcko commented at 10:57 am on June 26, 2023: member

    Oh I see. StopThreads doesn’t stop any threads. It just joins them, so maybe the method should be renamed?

    Closing as dupe for now.

  9. maflcko closed this on Jun 26, 2023

  10. bitcoin locked this on Jun 25, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me