Shutdown deadlock in SyncWithValidationInterfaceQueue #12229

theuni opened this issue on January 19, 2018
  1. theuni commented at 7:15 pm on January 19, 2018: member

    I’m hoping to take a look at this at some point in the next day or two, but logging here in case I forget. Ping @TheBlueMatt.

    Trigger conditions: I was catching up after ~1week offline, and interrupted halfway through.

    The backtrace is straightforward:

    (gdb) thread apply all bt
    Thread 3 (Thread 0x7fbb37fff700 (LWP 31792)):
    #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
    #1  0x00007fbb75a9aa16 in std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #2  0x000055d498e501e0 in SyncWithValidationInterfaceQueue() ()
    #3  0x000055d498e34965 in CChainState::ActivateBestChain(CValidationState&, CChainParams const&, std::shared_ptr<CBlock const>) ()
    #4  0x000055d498e36847 in ProcessNewBlock(CChainParams const&, std::shared_ptr<CBlock const>, bool, bool*) ()
    #5  0x000055d498d32464 in ProcessMessage(CNode*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, CDataStream&, long, CChainParams const&, CConnman*, std::atomic<bool> const&) [clone .constprop.1364] ()
    #6  0x000055d498d3dd80 in PeerLogicValidation::ProcessMessages(CNode*, std::atomic<bool>&) ()
    #7  0x000055d498cee7cd in CConnman::ThreadMessageHandler() ()
    #8  0x000055d498cce47f in void TraceThread<std::function<void ()> >(char const*, std::function<void ()>) ()
    #9  0x000055d498d065cb in std::thread::_Impl<std::_Bind_simple<void (*(char const*, std::function<void ()>))(char const*, std::function<void ()>)> >::_M_run() ()
    #10 0x00007fbb75a9cc80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #11 0x00007fbb75d6d6ba in start_thread (arg=0x7fbb37fff700) at pthread_create.c:333
    #12 0x00007fbb7520241d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

    Thread 2 (Thread 0x7fbb61e98700 (LWP 31778)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x000055d499034a1b in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) ()
    #2  0x00007fbb75d6d6ba in start_thread (arg=0x7fbb61e98700) at pthread_create.c:333
    #3  0x00007fbb7520241d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

    Thread 1 (Thread 0x7fbb76384740 (LWP 31765)):
    #0  0x00007fbb75d6e98d in pthread_join (threadid=140442075133696, thread_return=0x0) at pthread_join.c:90
    #1  0x00007fbb75a9cb97 in std::thread::join() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #2  0x000055d498d037ab in CConnman::Stop() ()
    #3  0x000055d498cc9630 in Shutdown() ()
    #4  0x000055d498c9e2a3 in AppInit(int, char**) ()
    #5  0x000055d498c9164f in main ()
    
  2. TheBlueMatt commented at 10:30 pm on January 19, 2018: member

    The issue you hit is that the threadGroup, which includes the scheduler thread, was interrupted and shut down before CConnman::Stop(), which in turn ran before GetMainSignals().FlushBackgroundCallbacks(). The only other things in the same threadGroup, IIRC, are the script check threads, meaning you may have a similar race with any running ConnectBlock calls, which will fall back to single-threaded validation for largely no reason. I do not believe this second issue is a race, as the single thread will do work while waiting for the queue to finish, though previously suggested changes to the queue did not do so and would have created a subtle race.

    Thus, I think the correct fix is to delay interrupting the threadGroup until immediately prior to the FlushBackgroundCallbacks() or FlushStateToDisk() call in Shutdown().
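
The ordering problem described above can be reproduced in miniature. The following is only an illustrative sketch, not Bitcoin Core code: `MiniScheduler`, `SyncWithQueue` and `SimulateShutdown` are hypothetical names, and the sketch assumes that the sync call works by queuing a marker callback and waiting for it to run (which is consistent with the futex wait in the backtrace). A timeout is added so that the sketch reports the would-be hang instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the scheduler thread and its callback queue.
class MiniScheduler {
public:
    MiniScheduler() : m_thread([this] { Run(); }) {}
    ~MiniScheduler() { Stop(); }

    void Schedule(std::function<void()> fn) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push_back(std::move(fn));
        }
        m_cv.notify_one();
    }

    // Interrupt and join the service thread. Queued callbacks are left unrun.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_one();
        if (m_thread.joinable()) m_thread.join();
    }

    // Analogue of FlushBackgroundCallbacks(): run leftovers on the caller's thread.
    void Flush() {
        assert(!m_thread.joinable()); // only valid once the service thread is gone
        std::deque<std::function<void()>> rest;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            rest.swap(m_queue);
        }
        for (auto& fn : rest) fn();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            m_cv.wait(lock, [this] { return m_stop || !m_queue.empty(); });
            if (m_stop) return;
            auto fn = std::move(m_queue.front());
            m_queue.pop_front();
            lock.unlock();
            fn();
            lock.lock();
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::function<void()>> m_queue;
    bool m_stop{false};
    std::thread m_thread; // declared last so the other members exist before it starts
};

// Analogue of SyncWithValidationInterfaceQueue(): queue a marker and wait for it.
// Returns false on timeout, i.e. where an untimed wait would block forever.
bool SyncWithQueue(MiniScheduler& scheduler) {
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> fut = done->get_future();
    scheduler.Schedule([done] { done->set_value(); });
    return fut.wait_for(std::chrono::milliseconds(1000)) == std::future_status::ready;
}

// Simulates Shutdown(). Returns true if the message-handler stand-in completed
// its sync, false if it would have deadlocked.
bool SimulateShutdown(bool interrupt_scheduler_first) {
    MiniScheduler scheduler;
    if (interrupt_scheduler_first) scheduler.Stop();   // order seen in the report
    bool synced = false;
    std::thread handler([&] { synced = SyncWithQueue(scheduler); });
    handler.join();                                    // CConnman::Stop() analogue
    if (!interrupt_scheduler_first) scheduler.Stop();  // proposed order
    scheduler.Flush();
    return synced;
}
```

Calling `SimulateShutdown(true)` models the reported order, where the handler's marker callback is queued but nothing is left to run it. `SimulateShutdown(false)` models the proposed order, where the scheduler stays alive until the handler thread has been joined.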

  3. MarcoFalke added this to the milestone 0.16.0 on Jan 23, 2018
  4. TheBlueMatt referenced this in commit bc0df97d0c on Jan 25, 2018
  5. TheBlueMatt referenced this in commit 935f287a60 on Jan 25, 2018
  6. TheBlueMatt referenced this in commit 082a61c69d on Jan 25, 2018
  7. laanwj closed this on Jan 30, 2018

  8. laanwj referenced this in commit 3448907a68 on Jan 30, 2018
  9. Christewart referenced this in commit 56cb277588 on Feb 8, 2018
  10. hkjn referenced this in commit 8ed765bb38 on Feb 12, 2018
  11. HashUnlimited referenced this in commit e863033209 on Mar 16, 2018
  12. lionello referenced this in commit 5b2ae935a4 on Nov 7, 2018
  13. PastaPastaPasta referenced this in commit 3eff18c572 on Jan 17, 2020
  14. PastaPastaPasta referenced this in commit 3835feecd6 on Feb 12, 2020
  15. codablock referenced this in commit bac693df27 on Mar 23, 2020
  16. codablock referenced this in commit c01e39d610 on Mar 23, 2020
  17. ckti referenced this in commit 63adf464f0 on Mar 28, 2021
  18. gades referenced this in commit 540432932e on Jun 30, 2021
  19. MarcoFalke locked this on Sep 8, 2021
  20. gades referenced this in commit e14ed0480e on Feb 15, 2022



github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC
