scheduler_tests (currently disabled) occasionally deadlocks #6540

issue posita opened this issue on August 10, 2015
  1. posita commented at 5:20 PM on August 10, 2015: contributor

    (Largely reproduced from #6171 (comment) et seq.)

    I'm seeing occasional deadlocks in test_bitcoin as recently as v0.11.0, but only with debugging symbols removed (e.g., via eu-strip .../bin/test_bitcoin). They may also happen with a binary whose debugging symbols are intact, but I haven't observed that (yet).

    Here's what I found:

    % eu-strip -f .../bin/test_bitcoin.debug .../bin/test_bitcoin
    % # Run test_bitcoin in another shell and wait for it to hang
    % gdb -s .../bin/test_bitcoin.debug .../bin/test_bitcoin <PID>
    ...
    Reading symbols from .../bin/test_bitcoin...Reading symbols from .../bin/test_bitcoin.debug...done.
    done.
    Attaching to program: .../bin/test_bitcoin, process <PID>
    ...
    (gdb) bt
    [#0](/bitcoin-bitcoin/0/)  0x00007fa0334e308f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#1](/bitcoin-bitcoin/1/)  0x00007fa0359f8a9b in boost::condition_variable::wait (this=0x7fa036fde358, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/bitcoin-bitcoin/2/)  0x00007fa034e455ac in boost::thread::join_noexcept (this=this@entry=0x7fa036fd9180) at libs/thread/src/pthread/thread.cpp:312
    [#3](/bitcoin-bitcoin/3/)  0x00007fa0359f93e6 in join (this=0x7fa036fd9180) at /usr/include/boost/thread/detail/thread.hpp:756
    [#4](/bitcoin-bitcoin/4/)  boost::thread_group::join_all (this=this@entry=0x7ffe608fbf90) at /usr/include/boost/thread/detail/thread_group.hpp:118
    [#5](/bitcoin-bitcoin/5/)  0x00007fa0359f6c7e in scheduler_tests::manythreads::test_method (this=this@entry=0x7ffe608fcc87) at test/scheduler_tests.cpp:109
    [#6](/bitcoin-bitcoin/6/)  0x00007fa0359f70ce in scheduler_tests::manythreads_invoker () at test/scheduler_tests.cpp:43
    [#7](/bitcoin-bitcoin/7/)  0x00007fa035970bf7 in invoke<void (*)()> (this=<optimized out>, f=<optimized out>) at /usr/include/boost/test/utils/callback.hpp:56
    [#8](/bitcoin-bitcoin/8/)  boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke (this=<optimized out>)
        at /usr/include/boost/test/utils/callback.hpp:89
    [#9](/bitcoin-bitcoin/9/)  0x00007fa0349ed8f1 in operator() (this=<optimized out>) at ./boost/test/utils/callback.hpp:118
    [#10](/bitcoin-bitcoin/10/) operator() (this=<optimized out>) at ./boost/test/impl/unit_test_monitor.ipp:41
    [#11](/bitcoin-bitcoin/11/) invoke<boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<> > > (this=<optimized out>, f=...)
        at ./boost/test/utils/callback.hpp:42
    [#12](/bitcoin-bitcoin/12/) boost::unit_test::ut_detail::callback0_impl_t<int, boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<boost::unit_test::ut_detail::unused> > >::invoke (this=<optimized out>) at ./boost/test/utils/callback.hpp:89
    [#13](/bitcoin-bitcoin/13/) 0x00007fa0349c46de in operator() (this=0x7ffe608fd7d0) at ./boost/test/utils/callback.hpp:118
    [#14](/bitcoin-bitcoin/14/) do_invoke<boost::scoped_ptr<boost::detail::translate_exception_base>, boost::unit_test::callback0<int> > (F=..., tr=...)
        at ./boost/test/impl/execution_monitor.ipp:281
    [#15](/bitcoin-bitcoin/15/) boost::execution_monitor::catch_signals (
        this=this@entry=0x7fa034c2d9a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, F=...)
        at ./boost/test/impl/execution_monitor.ipp:885
    [#16](/bitcoin-bitcoin/16/) 0x00007fa0349c4f33 in boost::execution_monitor::execute (
        this=this@entry=0x7fa034c2d9a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, F=...)
        at ./boost/test/impl/execution_monitor.ipp:1211
    [#17](/bitcoin-bitcoin/17/) 0x00007fa0349eda05 in boost::unit_test::unit_test_monitor_t::execute_and_translate (
        this=0x7fa034c2d9a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, tc=...)
        at ./boost/test/impl/unit_test_monitor.ipp:69
    [#18](/bitcoin-bitcoin/18/) 0x00007fa0349d44af in boost::unit_test::framework_impl::visit (this=0x7fa034c2d8c0 <boost::unit_test::(anonymous namespace)::s_frk_impl()::the_inst>,
        tc=...) at ./boost/test/impl/framework.ipp:156
    [#19](/bitcoin-bitcoin/19/) 0x00007fa034a08a33 in boost::unit_test::traverse_test_tree (suite=..., V=...) at ./boost/test/impl/unit_test_suite.ipp:207
    [#20](/bitcoin-bitcoin/20/) 0x00007fa034a08a33 in boost::unit_test::traverse_test_tree (suite=..., V=...) at ./boost/test/impl/unit_test_suite.ipp:207
    [#21](/bitcoin-bitcoin/21/) 0x00007fa0349cf9fa in boost::unit_test::framework::run (id=1, id@entry=4294967295, continue_test=continue_test@entry=true)
        at ./boost/test/impl/framework.ipp:442
    [#22](/bitcoin-bitcoin/22/) 0x00007fa0349eb697 in boost::unit_test::unit_test_main (init_func=<optimized out>, argc=<optimized out>, argv=<optimized out>)
        at ./boost/test/impl/unit_test_main.ipp:185
    [#23](/bitcoin-bitcoin/23/) 0x00007fa03314fb45 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
    [#24](/bitcoin-bitcoin/24/) 0x00007fa035957b7e in _start ()
    (gdb) generate-core-file
    warning: Memory read failed for corefile section, 8192 bytes at 0x7ffe6097a000.
    Saved corefile core.<PID>
    ...
    

    It looks like it's hanging on a join, but I don't know why. See test/scheduler_tests.cpp#L109. A core file (dumped from gdb) is here, if it's worth anything.

  2. gavinandresen commented at 5:25 PM on August 10, 2015: contributor

    What version of Boost and operating system?

  3. TheBlueMatt commented at 11:18 AM on August 13, 2015: member

    Can you "info threads" and "bt" from more than just the one thread?

  4. TheBlueMatt commented at 11:19 AM on August 13, 2015: member

    Oh, and can you reproduce with -DDEBUG_LOCKORDER?

  5. dexX7 commented at 2:17 AM on August 14, 2015: contributor

    @posita: I think I observed this/a related issue some weeks ago, too: #6278 (comment)

    Back then I didn't strip the files.

  6. posita commented at 4:02 AM on August 14, 2015: contributor

    What version of Boost and operating system?

    Debian stable ("jessie") w/ Boost 1.55.0.2:

    root@f7c305516d72:/# uname -a
    Linux f7c305516d72 4.0.7-boot2docker [#1](/bitcoin-bitcoin/1/) SMP Wed Jul 15 00:01:41 UTC 2015 x86_64 GNU/Linux
    root@f7c305516d72:/# cat /etc/debian_version
    8.1
    root@f7c305516d72:/# dpkg -l libboost-all-dev
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                             Version               Architecture          Description
    +++-================================-=====================-=====================-=====================================================================
    ii  libboost-all-dev                 1.55.0.2              amd64                 Boost C++ Libraries development files (ALL) (default version)
    
  7. posita commented at 7:20 PM on August 14, 2015: contributor

    Can you "info threads" and "bt" from more than just the one thread? Oh, and can you reproduce with -DDEBUG_LOCKORDER -g?

    Yes, but test_bitcoin did not produce any extra output (if it was supposed to). Below are the stack traces, and how I got them. The core file is here. It looks like scheduler.cpp#L40 is where the other threads are blocked, perhaps on each other?

    Prep

    root@e282d4a2b356:/# cd .../bitcoin
    root@e282d4a2b356:.../bitcoin# git clean -Xdf
    ...
    root@e282d4a2b356:.../bitcoin# git clean -df
    ...
    root@e282d4a2b356:.../bitcoin# git checkout v0.11.0
    ...
    root@e282d4a2b356:.../bitcoin# git branch
    * (detached from v0.11.0)
      master
    root@e282d4a2b356:.../bitcoin# git status --ignored
    HEAD detached at v0.11.0
    nothing to commit, working directory clean
    root@e282d4a2b356:.../bitcoin# ./autogen.sh
    ...
    root@e282d4a2b356:.../bitcoin# CFLAGS='-DDEBUG_LOCKORDER -g' CXXFLAGS='-DDEBUG_LOCKORDER -g' CPPFLAGS='-DDEBUG_LOCKORDER -g' ./configure --disable-wallet
    ...
    root@e282d4a2b356:.../bitcoin# CFLAGS='-DDEBUG_LOCKORDER -g' CXXFLAGS='-DDEBUG_LOCKORDER -g' CPPFLAGS='-DDEBUG_LOCKORDER -g' make
    ...
    root@e282d4a2b356:.../bitcoin# CFLAGS='-DDEBUG_LOCKORDER -g' CXXFLAGS='-DDEBUG_LOCKORDER -g' CPPFLAGS='-DDEBUG_LOCKORDER -g' make check
    ...
    root@e282d4a2b356:.../bitcoin# CFLAGS='-DDEBUG_LOCKORDER -g' CXXFLAGS='-DDEBUG_LOCKORDER -g' CPPFLAGS='-DDEBUG_LOCKORDER -g' make install
    ...
    root@e282d4a2b356:.../bitcoin# cd /usr/local/bin
    root@e282d4a2b356:/usr/local/bin# ls -al test_bitcoin*
    -rwxr-xr-x 1 root staff 61087584 Aug 14 18:57 test_bitcoin
    root@e282d4a2b356:/usr/local/bin# eu-strip -f test_bitcoin.debug test_bitcoin
    root@e282d4a2b356:/usr/local/bin# ls -al test_bitcoin*
    -rwxr-xr-x 1 root staff  9776504 Aug 14 18:57 test_bitcoin
    -rwxr-xr-x 1 root staff 51314088 Aug 14 18:57 test_bitcoin.debug
    root@e282d4a2b356:/usr/local/bin# which test_bitcoin
    /usr/local/bin/test_bitcoin
    root@e282d4a2b356:/usr/local/bin# pgrep test_bitcoin
    root@e282d4a2b356:/usr/local/bin# ( while true ; do test_bitcoin ; done ) &
    ... [around 25 iterations]
    Running 151 test cases...
    [hung, CPU back to idle]
    root@e282d4a2b356:/usr/local/bin# pgrep test_bitcoin
    30886
    

    Debug Info

    root@e282d4a2b356:/usr/local/bin# gdb -s ./test_bitcoin.debug ./test_bitcoin "$( pgrep test_bitcoin )"
    ...
    Reading symbols from ./test_bitcoin...Reading symbols from /usr/local/bin/test_bitcoin.debug...done.
    done.
    Attaching to program: /usr/local/bin/test_bitcoin, process 30886
    ...
    (gdb) info threads
    (gdb) info threads
      Id   Target Id         Frame
      3    Thread 0x7f5b767fc700 (LWP 30939) "test_bitcoin" 0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 ()
       from /lib/x86_64-linux-gnu/libpthread.so.0
      2    Thread 0x7f5b63fff700 (LWP 30943) "test_bitcoin" 0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 ()
       from /lib/x86_64-linux-gnu/libpthread.so.0
    * 1    Thread 0x7f5b80729740 (LWP 30886) "test_bitcoin" 0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 ()
       from /lib/x86_64-linux-gnu/libpthread.so.0
    (gdb) thread apply all bt
    
    Thread 3 (Thread 0x7f5b767fc700 (LWP 30939)):
    [#0](/bitcoin-bitcoin/0/)  0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#1](/bitcoin-bitcoin/1/)  0x00007f5b7fd9dc00 in boost::condition_variable::wait (this=0x7ffc92ba4ce0, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/bitcoin-bitcoin/2/)  0x00007f5b8010b163 in CScheduler::serviceQueue (this=0x7ffc92ba4cb0) at scheduler.cpp:40
    [#3](/bitcoin-bitcoin/3/)  0x00007f5b7fda6728 in boost::_mfi::mf0<void, CScheduler>::operator() (this=0x7f5b8132c008, p=0x7ffc92ba4cb0)
        at /usr/include/boost/bind/mem_fn_template.hpp:49
    [#4](/bitcoin-bitcoin/4/)  0x00007f5b7fda661e in boost::_bi::list1<boost::_bi::value<CScheduler*> >::operator()<boost::_mfi::mf0<void, CScheduler>, boost::_bi::list0> (
        this=0x7f5b8132c018, f=..., a=...) at /usr/include/boost/bind/bind.hpp:253
    [#5](/bitcoin-bitcoin/5/)  0x00007f5b7fda64ac in boost::_bi::bind_t<void, boost::_mfi::mf0<void, CScheduler>, boost::_bi::list1<boost::_bi::value<CScheduler*> > >::operator()
        (this=0x7f5b8132c008) at /usr/include/boost/bind/bind_template.hpp:20
    [#6](/bitcoin-bitcoin/6/)  0x00007f5b7fda6331 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, CScheduler>, boost::_bi::list1<boost::_bi::value<CScheduler*> > > >::run (this=0x7f5b8132be50) at /usr/include/boost/thread/detail/thread.hpp:117
    [#7](/bitcoin-bitcoin/7/)  0x00007f5b7f13aaea in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:164
    [#8](/bitcoin-bitcoin/8/)  0x00007f5b7d7d50a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#9](/bitcoin-bitcoin/9/)  0x00007f5b7d50a04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
    
    Thread 2 (Thread 0x7f5b63fff700 (LWP 30943)):
    [#0](/bitcoin-bitcoin/0/)  0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#1](/bitcoin-bitcoin/1/)  0x00007f5b7fd9dc00 in boost::condition_variable::wait (this=0x7ffc92ba4ce0, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/bitcoin-bitcoin/2/)  0x00007f5b8010b163 in CScheduler::serviceQueue (this=0x7ffc92ba4cb0) at scheduler.cpp:40
    [#3](/bitcoin-bitcoin/3/)  0x00007f5b7fda6728 in boost::_mfi::mf0<void, CScheduler>::operator() (this=0x7f5b81346948, p=0x7ffc92ba4cb0)
        at /usr/include/boost/bind/mem_fn_template.hpp:49
    [#4](/bitcoin-bitcoin/4/)  0x00007f5b7fda661e in boost::_bi::list1<boost::_bi::value<CScheduler*> >::operator()<boost::_mfi::mf0<void, CScheduler>, boost::_bi::list0> (
        this=0x7f5b81346958, f=..., a=...) at /usr/include/boost/bind/bind.hpp:253
    [#5](/bitcoin-bitcoin/5/)  0x00007f5b7fda64ac in boost::_bi::bind_t<void, boost::_mfi::mf0<void, CScheduler>, boost::_bi::list1<boost::_bi::value<CScheduler*> > >::operator()
        (this=0x7f5b81346948) at /usr/include/boost/bind/bind_template.hpp:20
    [#6](/bitcoin-bitcoin/6/)  0x00007f5b7fda6331 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, CScheduler>, boost::_bi::list1<boost::_bi::value<CScheduler*> > > >::run (this=0x7f5b81346790) at /usr/include/boost/thread/detail/thread.hpp:117
    [#7](/bitcoin-bitcoin/7/)  0x00007f5b7f13aaea in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:164
    [#8](/bitcoin-bitcoin/8/)  0x00007f5b7d7d50a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#9](/bitcoin-bitcoin/9/)  0x00007f5b7d50a04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
    
    Thread 1 (Thread 0x7f5b80729740 (LWP 30886)):
    [#0](/bitcoin-bitcoin/0/)  0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#1](/bitcoin-bitcoin/1/)  0x00007f5b7fd9dc00 in boost::condition_variable::wait (this=0x7f5b8132bea8, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/bitcoin-bitcoin/2/)  0x00007f5b7f13b5ac in boost::thread::join_noexcept (this=0x7f5b812e40e0) at libs/thread/src/pthread/thread.cpp:312
    [#3](/bitcoin-bitcoin/3/)  0x00007f5b7fd9db50 in boost::thread::join (this=0x7f5b812e40e0) at /usr/include/boost/thread/detail/thread.hpp:756
    [#4](/bitcoin-bitcoin/4/)  0x00007f5b7fd9e80d in boost::thread_group::join_all (this=0x7ffc92ba4d70) at /usr/include/boost/thread/detail/thread_group.hpp:118
    [#5](/bitcoin-bitcoin/5/)  0x00007f5b7fd9c642 in scheduler_tests::manythreads::test_method (this=0x7ffc92ba5a67) at test/scheduler_tests.cpp:109
    [#6](/bitcoin-bitcoin/6/)  0x00007f5b7fd9b980 in scheduler_tests::manythreads_invoker () at test/scheduler_tests.cpp:43
    [#7](/bitcoin-bitcoin/7/)  0x00007f5b7fc5e563 in boost::unit_test::ut_detail::invoker<boost::unit_test::ut_detail::unused>::invoke<void (*)()> (this=0x7ffc92ba5ad7,
        f=@0x7f5b81274348: 0x7f5b7fd9b95d <scheduler_tests::manythreads_invoker()>) at /usr/include/boost/test/utils/callback.hpp:56
    [#8](/bitcoin-bitcoin/8/)  0x00007f5b7fc5dfd7 in boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke (this=0x7f5b81274340)
        at /usr/include/boost/test/utils/callback.hpp:89
    [#9](/bitcoin-bitcoin/9/)  0x00007f5b7ece38f1 in operator() (this=<optimized out>) at ./boost/test/utils/callback.hpp:118
    [#10](/bitcoin-bitcoin/10/) operator() (this=<optimized out>) at ./boost/test/impl/unit_test_monitor.ipp:41
    [#11](/bitcoin-bitcoin/11/) invoke<boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<> > > (this=<optimized out>, f=...)
        at ./boost/test/utils/callback.hpp:42
    [#12](/bitcoin-bitcoin/12/) boost::unit_test::ut_detail::callback0_impl_t<int, boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<boost::unit_test::ut_detail::unused> > >::invoke (this=<optimized out>) at ./boost/test/utils/callback.hpp:89
    [#13](/bitcoin-bitcoin/13/) 0x00007f5b7ecba6de in operator() (this=0x7ffc92ba6610) at ./boost/test/utils/callback.hpp:118
    [#14](/bitcoin-bitcoin/14/) do_invoke<boost::scoped_ptr<boost::detail::translate_exception_base>, boost::unit_test::callback0<int> > (F=..., tr=...)
        at ./boost/test/impl/execution_monitor.ipp:281
    [#15](/bitcoin-bitcoin/15/) boost::execution_monitor::catch_signals (
        this=this@entry=0x7f5b7ef239a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, F=...)
        at ./boost/test/impl/execution_monitor.ipp:885
    [#16](/bitcoin-bitcoin/16/) 0x00007f5b7ecbaf33 in boost::execution_monitor::execute (
        this=this@entry=0x7f5b7ef239a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, F=...)
        at ./boost/test/impl/execution_monitor.ipp:1211
    [#17](/bitcoin-bitcoin/17/) 0x00007f5b7ece3a05 in boost::unit_test::unit_test_monitor_t::execute_and_translate (
        this=0x7f5b7ef239a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, tc=...)
        at ./boost/test/impl/unit_test_monitor.ipp:69
    [#18](/bitcoin-bitcoin/18/) 0x00007f5b7ecca4af in boost::unit_test::framework_impl::visit (
        this=0x7f5b7ef238c0 <boost::unit_test::(anonymous namespace)::s_frk_impl()::the_inst>, tc=...) at ./boost/test/impl/framework.ipp:156
    [#19](/bitcoin-bitcoin/19/) 0x00007f5b7ecfea33 in boost::unit_test::traverse_test_tree (suite=..., V=...) at ./boost/test/impl/unit_test_suite.ipp:207
    [#20](/bitcoin-bitcoin/20/) 0x00007f5b7ecfea33 in boost::unit_test::traverse_test_tree (suite=..., V=...) at ./boost/test/impl/unit_test_suite.ipp:207
    [#21](/bitcoin-bitcoin/21/) 0x00007f5b7ecc59fa in boost::unit_test::framework::run (id=1, id@entry=4294967295, continue_test=continue_test@entry=true)
        at ./boost/test/impl/framework.ipp:442
    [#22](/bitcoin-bitcoin/22/) 0x00007f5b7ece1697 in boost::unit_test::unit_test_main (init_func=<optimized out>, argc=<optimized out>, argv=<optimized out>)
        at ./boost/test/impl/unit_test_main.ipp:185
    [#23](/bitcoin-bitcoin/23/) 0x00007f5b7fdf3788 in main (argc=1, argv=0x7ffc92ba6948) at /usr/include/boost/test/unit_test.hpp:59
    (gdb) generate-core-file
    warning: Memory read failed for corefile section, 8192 bytes at 0x7ffc92bf9000.
    Saved corefile core.30886
    ...
    
  8. laanwj added the label Tests on Sep 4, 2015
  9. MarcoFalke commented at 9:55 AM on September 23, 2015: member

    If someone wants to track down this issue further, you don't have to run all test.

    src/test/test_bitcoin --log_level=ALL --run_test=scheduler_tests
    

    is enough.

  10. morcos commented at 2:56 AM on October 28, 2015: member

    This happens to me about 1 out of every 4 times I run test_bitcoin. I'm not doing anything special like stripping debugging symbols.

  11. laanwj commented at 12:31 PM on December 1, 2015: member

    If this is a common issue (I've never noticed it locally and I run test_bitcoin 10's of times per day) Maybe better to disable the scheduler tests until they're fixed?

  12. MarcoFalke commented at 1:30 PM on December 1, 2015: member

    @laanwj Happens 4% of the time:

    for i in {1..10000}; do timeout 1 src/test/test_bitcoin --log_level=ALL --run_test=scheduler_tests &> /dev/null ;echo $? >> /tmp/result100 ;done;cat /tmp/result100|sort|uniq -c
       9638 0
        462 124
    
  13. laanwj referenced this in commit 8f0d79e3c8 on Dec 1, 2015
  14. laanwj commented at 1:57 PM on December 1, 2015: member

    See #7144

  15. gavinandresen commented at 8:39 PM on December 1, 2015: contributor

    For what it's worth: I have never seen this bug, OSX and boost 1.59.

    Looking through Boost release notes, I see a fix in boost 1.56 that might be the culprit:

    1.56 : Simplified, refactored and unified (timed_)lock code based on try_lock(). There were several bugs when handling timeout expirations. @MarcoFalke @morcos : are you using boost version < 1.56 ?

  16. MarcoFalke commented at 9:43 PM on December 1, 2015: member

    Running 1.58 (the same? as travis is running, which also locks)

  17. morcos commented at 10:03 PM on December 1, 2015: member

    hmm if i was paying closer attention i could have helped debug this better, because it was happening fairly consistently for me, and now it never happens anymore.

  18. posita commented at 11:34 PM on December 1, 2015: contributor

    FWIW, I don't recall being able to reproduce this in OS X (using Boost 1.58 or 1.59). I was able to reliably reproduce it using Debian Jessie<sup>1</sup> (but the system packages for Boost are at 1.55). @MarcoFalke, what platform (uname -a) are you experiencing locks with 1.58? Can you do a linking sanity check (e.g., ldd -v .../path/to/test_bitcoin on Linux, otool -Lv ... on Darwin)?

    <sup>1</sup> Technically, it was inside a Docker container based on debian:stable which was running via boot2docker inside a Parallels VM running on OS X, but I'm pretty sure that doesn't count as OS X. :stuck_out_tongue_winking_eye:

  19. posita commented at 5:59 AM on December 2, 2015: contributor

    I am unable to reproduce on OS X with Boost 1.59:

    % uname -a
    Darwin ... 13.4.0 Darwin Kernel Version 13.4.0: Wed Mar 18 16:20:14 PDT 2015; root:xnu-2422.115.14~1/RELEASE_X86_64 x86_64
    % src/bitcoin-cli --version
    Bitcoin Core RPC client version v0.11.2
    % otool -Lv src/bitcoin-cli
    src/bitcoin-cli:
            .../lib/libboost_system-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libboost_filesystem-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libboost_program_options-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libboost_thread-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libboost_chrono-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libssl.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libcrypto.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 120.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
            time stamp 2 Wed Dec 31 16:00:02 1969
    % grep BOOST_LIB_VERSION .../include/boost/version.hpp
    //  BOOST_LIB_VERSION must be defined to be the same as BOOST_VERSION
    #define BOOST_LIB_VERSION "1_59"
    % for i in {1..10000} ; do gtimeout 2 src/test/test_bitcoin --log_level=ALL --run_test=scheduler_tests &>/dev/null ; echo $? >>/tmp/result100 ; done ; sort /tmp/result100 | uniq -c
    10000 0
    
  20. MarcoFalke commented at 7:08 AM on December 2, 2015: member

    With BOOST_LIB_VERSION=1_58

    $ uname -a
    Linux localhost.localdomain 4.2.6-301.fc23.x86_64 [#1](/bitcoin-bitcoin/1/) SMP Fri Nov 20 22:22:41 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    $ ldd -v ./src/test/test_bitcoin|grep boost
        libboost_system.so.1.58.0 => /lib64/libboost_system.so.1.58.0 (0x00007f5e84548000)
        libboost_filesystem.so.1.58.0 => /lib64/libboost_filesystem.so.1.58.0 (0x00007f5e84332000)
        libboost_program_options.so.1.58.0 => /lib64/libboost_program_options.so.1.58.0 (0x00007f5e840bc000)
        libboost_thread.so.1.58.0 => /lib64/libboost_thread.so.1.58.0 (0x00007f5e83e96000)
        libboost_chrono.so.1.58.0 => /lib64/libboost_chrono.so.1.58.0 (0x00007f5e83c8e000)
        libboost_unit_test_framework.so.1.58.0 => /lib64/libboost_unit_test_framework.so.1.58.0 (0x00007f5e839eb000)
        /lib64/libboost_system.so.1.58.0:
        /lib64/libboost_filesystem.so.1.58.0:
        /lib64/libboost_program_options.so.1.58.0:
        /lib64/libboost_thread.so.1.58.0:
        /lib64/libboost_chrono.so.1.58.0:
        /lib64/libboost_unit_test_framework.so.1.58.0:
    
    
  21. posita commented at 7:20 AM on December 2, 2015: contributor

    I am unable to reproduce on OS X with Boost 1.59 ...

    However, I can reproduce it on Debian Sid (running in Docker) with Boost 1.58:

    root@c74c16a6aee2:~/bitcoin# uname -a
    Linux c74c16a6aee2 4.1.13-boot2docker [#1](/bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 GNU/Linux
    root@c74c16a6aee2:~/bitcoin# src/bitcoin-cli --version
    Bitcoin Core RPC client version v0.11.99.0-4077ad2
    root@c74c16a6aee2:~/bitcoin# ldd -v src/bitcoin-cli | grep boost
            libboost_system.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.58.0 (0x00007fc1a24f2000)
            libboost_filesystem.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.58.0 (0x00007fc1a22d9000)
            libboost_program_options.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.58.0 (0x00007fc1a2057000)
            libboost_thread.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0 (0x00007fc1a1e30000)
            libboost_chrono.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_chrono.so.1.58.0 (0x00007fc1a1c28000)
            /usr/lib/x86_64-linux-gnu/libboost_system.so.1.58.0:
            /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.58.0:
            /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.58.0:
            /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0:
            /usr/lib/x86_64-linux-gnu/libboost_chrono.so.1.58.0:
    root@c74c16a6aee2:~/bitcoin# grep BOOST_LIB_VERSION /usr/include/boost/version.hpp
    //  BOOST_LIB_VERSION must be defined to be the same as BOOST_VERSION
    #define BOOST_LIB_VERSION "1_58"
    root@c74c16a6aee2:~/bitcoin# for i in {1..2000} ; do timeout 2 src/test/test_bitcoin --log_level=ALL --run_test=scheduler_tests &>/dev/null ; echo $? >>/tmp/result100 ; done ; sort /tmp/result100 | uniq -c
       1695 0
        305 124
    

    Caveats:

  22. posita commented at 4:31 PM on December 3, 2015: contributor

    FWIW, I used these with the stack mentioned in my previous comment to reproduce this on Debian (wheezy, jessie, sid) and Ubuntu (trusty, vivid, wily):

    % for i in Dockerfile-*-*-automake ; do read -k 1 "?Any key to continue with ${i} ... " ; echo ; docker build -f "${i}" . ; done
    Any key to continue with Dockerfile-debian-jessie-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 4.9.1
    STL     : GNU libstdc++ version 20140912
    Boost   : 1.55.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux 11aafb97cfeb 4.1.13-boot2docker [#1](/bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        871 0
        129 124
    ...
    Any key to continue with Dockerfile-debian-sid-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 5.2.1 20151129
    STL     : GNU libstdc++ version 20151129
    Boost   : 1.58.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux 5cf97854c2bc 4.1.13-boot2docker [#1](/bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        881 0
        119 124
    ...
    Any key to continue with Dockerfile-debian-wheezy-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 4.7.2
    STL     : GNU libstdc++ version 20120920
    Boost   : 1.49.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux c98f3c4629d3 4.1.13-boot2docker [#1](/bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        909 0
         91 124
     ...
    Any key to continue with Dockerfile-ubuntu-trusty-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 4.8.2
    STL     : GNU libstdc++ version 20140404
    Boost   : 1.54.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux a52c17016130 4.1.13-boot2docker [#1](/bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        887 0
        113 124
    ...
    Any key to continue with Dockerfile-ubuntu-vivid-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 4.9.2
    STL     : GNU libstdc++ version 20150305
    Boost   : 1.55.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux 19d74fa25b3a 4.1.13-boot2docker [#1](/bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        897 0
        103 124
    ...
    Any key to continue with Dockerfile-ubuntu-wily-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 5.2.1 20151010
    STL     : GNU libstdc++ version 20151010
    Boost   : 1.58.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux a6af3de67df6 4.1.13-boot2docker [#1](/bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        741 0
        259 124
    

    So it appears to be happening pretty consistently under each of those environments, often more than 10% of the time with my setup (124 exit code counts are the hung ones).

    While it's possible that virtualization amplifies opportunities to observe a race condition that more efficient environments hide, concurrent with that last test (for Ubuntu Wily), I ran the following on the OS X host and still got zero hangs:

    % bitcoin-cli --version
    Bitcoin Core RPC client version v0.11.2
    % test_bitcoin --build_info --run_test=scheduler_tests
    Running 1 test case...
    Platform: Mac OS
    Compiler: Clang version 6.0 (clang-600.0.57)
    STL     : libc++ version 1101
    Boost   : 1.59.0
    
    *** No errors detected
    % python -c 'import decimal ; print((decimal.Decimal("10" * 16384 * 8192) + decimal.Decimal("4321" * 16384 * 8192)) * decimal.Decimal("9" * 32768 * 4096))' & \
    > python -c 'import decimal ; print((decimal.Decimal("10" * 16384 * 8192) + decimal.Decimal("4321" * 16384 * 8192)) * decimal.Decimal("9" * 32768 * 4096))' & \
    > python -c 'import decimal ; print((decimal.Decimal("10" * 16384 * 8192) + decimal.Decimal("4321" * 16384 * 8192)) * decimal.Decimal("9" * 32768 * 4096))' & \
    > python -c 'import decimal ; print((decimal.Decimal("10" * 16384 * 8192) + decimal.Decimal("4321" * 16384 * 8192)) * decimal.Decimal("9" * 32768 * 4096))' & \
    > time ( for i in {1..10000} ; do gtimeout 2 test_bitcoin --log_level=ALL --run_test=scheduler_tests &>/dev/null ; echo "${?}" ; done ) | sort | uniq -c ; \
    > kill %1 %2 %3 %4
    [1] 20843
    [2] 20844
    [3] 20845
    [4] 20846
    10000 0
    ( for i in {1..10000}; do; gtimeout 2 test_bitcoin --log_level=ALL  &> ; echo)  565.09s user 252.40s system 23% cpu 56:46.88 total
    [4]  + terminated  python -c
    [3]  + terminated  python -c
    [2]  - terminated  python -c
    [1]  - terminated  python -c
    
  23. luke-jr referenced this in commit f03d898094 on Dec 8, 2015
  24. laanwj renamed this:
    test_bitcoin occasionally deadlocks
    scheduler_tests (currently disabled) occasionally deadlocks
    on Apr 28, 2016
  25. paveljanik commented at 7:07 AM on May 5, 2016: contributor

    FWIW: OS X physical test machine with 10.10, current master and boosts from 1.54.0 to 1.58.0: no hangs at all. I can't reproduce the test problem here locally :-(

  26. paveljanik commented at 9:52 AM on May 5, 2016: contributor

    openSUSE 13.2 64bit, no hang in 10000 runs with boost 1.54.0.

    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 4.8.3 20140627 [gcc-4_8-branch revision 212064]
    STL     : GNU libstdc++ version 20140627
    Boost   : 1.54.0
    
    *** No errors detected
    
  27. paveljanik commented at 10:22 AM on May 5, 2016: contributor

    How to reproduce this so I can look at it?

  28. paveljanik commented at 11:00 AM on May 5, 2016: contributor

    OK, I can sort of reproduce it here on openSUSE 13.2 64bit.

    Running

    for i in {1..100}; do timeout 2000 ./test_bitcoin -d y --log-level=ALL  --run_test=scheduler_tests >>log.$CASE.$i 2>&1 && echo OK || echo ERR $CASE $i; done | sort | uniq -c & CASE=$((CASE+1))
    

    5 times in parallel will keep approx. 2 instances of test_bitcoin running and waiting in

    pthread_cond_wait@@GLIBC_2.3.2
    

    Full bt:

    [#0](/bitcoin-bitcoin/0/)  0x00007f477358605f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    [#1](/bitcoin-bitcoin/1/)  0x00007f4775d27638 in boost::condition_variable::wait (this=0x7f4777638218, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/bitcoin-bitcoin/2/)  0x00007f47750a13f4 in boost::thread::join_noexcept() () from /usr/lib64/libboost_thread.so.1.54.0
    [#3](/bitcoin-bitcoin/3/)  0x00007f4775d27588 in boost::thread::join (this=0x7f47776381a0) at /usr/include/boost/thread/detail/thread.hpp:756
    [#4](/bitcoin-bitcoin/4/)  0x00007f4775d28295 in boost::thread_group::join_all (this=0x7ffdcadd52d0) at /usr/include/boost/thread/detail/thread_group.hpp:118
    [#5](/bitcoin-bitcoin/5/)  0x00007f4775d25b8a in scheduler_tests::manythreads::test_method (this=0x7ffdcadd5fc7) at test/scheduler_tests.cpp:110
    [#6](/bitcoin-bitcoin/6/)  0x00007f4775d24e60 in scheduler_tests::manythreads_invoker () at test/scheduler_tests.cpp:44
    [#7](/bitcoin-bitcoin/7/)  0x00007f4775bbfff7 in boost::unit_test::ut_detail::invoker<boost::unit_test::ut_detail::unused>::invoke<void (*)()> (this=0x7ffdcadd6037, 
        f=@0x7f47776013c8: 0x7f4775d24e3d <scheduler_tests::manythreads_invoker()>) at /usr/include/boost/test/utils/callback.hpp:56
    [#8](/bitcoin-bitcoin/8/)  0x00007f4775bbfa63 in boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke (this=0x7f47776013c0)
        at /usr/include/boost/test/utils/callback.hpp:89
    [#9](/bitcoin-bitcoin/9/)  0x00007f4774c50661 in ?? () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#10](/bitcoin-bitcoin/10/) 0x00007f4774c2bbf6 in boost::execution_monitor::catch_signals(boost::unit_test::callback0<int> const&) () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#11](/bitcoin-bitcoin/11/) 0x00007f4774c2c423 in boost::execution_monitor::execute(boost::unit_test::callback0<int> const&) () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#12](/bitcoin-bitcoin/12/) 0x00007f4774c50762 in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::unit_test::test_case const&) ()
       from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#13](/bitcoin-bitcoin/13/) 0x00007f4774c3a424 in boost::unit_test::framework_impl::visit(boost::unit_test::test_case const&) () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#14](/bitcoin-bitcoin/14/) 0x00007f4774c68f13 in boost::unit_test::traverse_test_tree(boost::unit_test::test_suite const&, boost::unit_test::test_tree_visitor&) ()
       from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#15](/bitcoin-bitcoin/15/) 0x00007f4774c68f13 in boost::unit_test::traverse_test_tree(boost::unit_test::test_suite const&, boost::unit_test::test_tree_visitor&) ()
       from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#16](/bitcoin-bitcoin/16/) 0x00007f4774c3594a in boost::unit_test::framework::run(unsigned long, bool) () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#17](/bitcoin-bitcoin/17/) 0x00007f4774c4e344 in boost::unit_test::unit_test_main(bool (*)(), int, char**) () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#18](/bitcoin-bitcoin/18/) 0x00007f4775d88160 in main (argc=5, argv=0x7ffdcadd6e98) at /usr/include/boost/test/unit_test.hpp:59
    

    This is with boost 1.54.0. I'll try newer versions of boost on this machine later today.

  29. laanwj commented at 1:58 PM on May 5, 2016: member

    I could reproduce it at the time, but haven't tried for a while. It was a pretty rare issue, which made it non-trivial to fix. Congrats on getting a backtrace at least. You should probably have dumped one for all threads to get a fuller picture. (waiting in pthread_cond_wait@@GLIBC_2.3.2 is perfectly normal, it's bound to be a combination of threads which puts it into a deadlock)

  30. paveljanik commented at 2:22 PM on May 5, 2016: contributor
    (gdb) thread apply all bt
    
    Thread 4 (Thread 0x7f3c0c330700 (LWP 26007)):
    [#0](/bitcoin-bitcoin/0/)  0x00007f3c0ce0705f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    [#1](/bitcoin-bitcoin/1/)  0x00007f3c0f5046e3 in boost::condition_variable::wait (this=0x7ffc1eccc250, m=...) at /tmp/BOOST/1.58.0/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/bitcoin-bitcoin/2/)  0x00007f3c0f6db830 in CScheduler::serviceQueue (this=0x7ffc1eccc220) at scheduler.cpp:42
    [#3](/bitcoin-bitcoin/3/)  0x00007f3c0e927fef in thread_proxy () from /tmp/BOOST/1.58.0/lib/libboost_thread.so.1.58.0
    [#4](/bitcoin-bitcoin/4/)  0x00007f3c0ce030a4 in start_thread () from /lib64/libpthread.so.0
    [#5](/bitcoin-bitcoin/5/)  0x00007f3c0cb3908d in clone () from /lib64/libc.so.6
    
    Thread 3 (Thread 0x7f3c0b32e700 (LWP 26009)):
    [#0](/bitcoin-bitcoin/0/)  0x00007f3c0ce0705f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    [#1](/bitcoin-bitcoin/1/)  0x00007f3c0f5046e3 in boost::condition_variable::wait (this=0x7ffc1eccc250, m=...) at /tmp/BOOST/1.58.0/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/bitcoin-bitcoin/2/)  0x00007f3c0f6db830 in CScheduler::serviceQueue (this=0x7ffc1eccc220) at scheduler.cpp:42
    [#3](/bitcoin-bitcoin/3/)  0x00007f3c0e927fef in thread_proxy () from /tmp/BOOST/1.58.0/lib/libboost_thread.so.1.58.0
    [#4](/bitcoin-bitcoin/4/)  0x00007f3c0ce030a4 in start_thread () from /lib64/libpthread.so.0
    [#5](/bitcoin-bitcoin/5/)  0x00007f3c0cb3908d in clone () from /lib64/libc.so.6
    
    Thread 2 (Thread 0x7f3c0a32c700 (LWP 26011)):
    [#0](/bitcoin-bitcoin/0/)  0x00007f3c0ce0705f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    [#1](/bitcoin-bitcoin/1/)  0x00007f3c0f5046e3 in boost::condition_variable::wait (this=0x7ffc1eccc250, m=...) at /tmp/BOOST/1.58.0/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/bitcoin-bitcoin/2/)  0x00007f3c0f6db830 in CScheduler::serviceQueue (this=0x7ffc1eccc220) at scheduler.cpp:42
    [#3](/bitcoin-bitcoin/3/)  0x00007f3c0e927fef in thread_proxy () from /tmp/BOOST/1.58.0/lib/libboost_thread.so.1.58.0
    [#4](/bitcoin-bitcoin/4/)  0x00007f3c0ce030a4 in start_thread () from /lib64/libpthread.so.0
    [#5](/bitcoin-bitcoin/5/)  0x00007f3c0cb3908d in clone () from /lib64/libc.so.6
    
    Thread 1 (Thread 0x7f3c0f3d4740 (LWP 25800)):
    [#0](/bitcoin-bitcoin/0/)  0x00007f3c0ce0705f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    [#1](/bitcoin-bitcoin/1/)  0x00007f3c0f5046e3 in boost::condition_variable::wait (this=0x7f3c11c15d18, m=...) at /tmp/BOOST/1.58.0/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/bitcoin-bitcoin/2/)  0x00007f3c0e92883c in boost::thread::join_noexcept() () from /tmp/BOOST/1.58.0/lib/libboost_thread.so.1.58.0
    [#3](/bitcoin-bitcoin/3/)  0x00007f3c0f504996 in join (this=0x7f3c11c11f20) at /tmp/BOOST/1.58.0/include/boost/thread/detail/thread.hpp:767
    [#4](/bitcoin-bitcoin/4/)  boost::thread_group::join_all (this=this@entry=0x7ffc1eccc2e0) at /tmp/BOOST/1.58.0/include/boost/thread/detail/thread_group.hpp:118
    [#5](/bitcoin-bitcoin/5/)  0x00007f3c0f502b2a in scheduler_tests::manythreads::test_method (this=this@entry=0x7ffc1ecccfd7) at test/scheduler_tests.cpp:110
    [#6](/bitcoin-bitcoin/6/)  0x00007f3c0f502fbe in scheduler_tests::manythreads_invoker () at test/scheduler_tests.cpp:44
    [#7](/bitcoin-bitcoin/7/)  0x00007f3c0f454f27 in invoke<void (*)()> (this=<optimized out>, f=<optimized out>) at /tmp/BOOST/1.58.0/include/boost/test/utils/callback.hpp:56
    [#8](/bitcoin-bitcoin/8/)  boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke (this=<optimized out>) at /tmp/BOOST/1.58.0/include/boost/test/utils/callback.hpp:89
    [#9](/bitcoin-bitcoin/9/)  0x00007f3c0e4d22b1 in boost::unit_test::ut_detail::callback0_impl_t<int, boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<boost::unit_test::ut_detail::unused> > >::invoke() ()
       from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#10](/bitcoin-bitcoin/10/) 0x00007f3c0e4b4946 in boost::execution_monitor::catch_signals(boost::unit_test::callback0<int> const&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#11](/bitcoin-bitcoin/11/) 0x00007f3c0e4b5173 in boost::execution_monitor::execute(boost::unit_test::callback0<int> const&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#12](/bitcoin-bitcoin/12/) 0x00007f3c0e4d23b2 in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::unit_test::test_case const&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#13](/bitcoin-bitcoin/13/) 0x00007f3c0e4bbf14 in boost::unit_test::framework_impl::visit(boost::unit_test::test_case const&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#14](/bitcoin-bitcoin/14/) 0x00007f3c0e4eb053 in boost::unit_test::traverse_test_tree(boost::unit_test::test_suite const&, boost::unit_test::test_tree_visitor&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#15](/bitcoin-bitcoin/15/) 0x00007f3c0e4eb053 in boost::unit_test::traverse_test_tree(boost::unit_test::test_suite const&, boost::unit_test::test_tree_visitor&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#16](/bitcoin-bitcoin/16/) 0x00007f3c0e4b743a in boost::unit_test::framework::run(unsigned long, bool) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#17](/bitcoin-bitcoin/17/) 0x00007f3c0e4cff94 in boost::unit_test::unit_test_main(bool (*)(), int, char**) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#18](/bitcoin-bitcoin/18/) 0x00007f3c0ca75b05 in __libc_start_main () from /lib64/libc.so.6
    [#19](/bitcoin-bitcoin/19/) 0x00007f3c0f43cdcc in _start () at ../sysdeps/x86_64/start.S:112
    (gdb) 
    
  31. paveljanik commented at 2:27 PM on May 5, 2016: contributor
    (gdb) thread 4
    [Switching to thread 4 (Thread 0x7f3c0c330700 (LWP 26007))]
    [#0](/bitcoin-bitcoin/0/)  0x00007f3c0ce0705f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    (gdb) up
    [#1](/bitcoin-bitcoin/1/)  0x00007f3c0f5046e3 in boost::condition_variable::wait (this=0x7ffc1eccc250, m=...) at /tmp/BOOST/1.58.0/include/boost/thread/pthread/condition_variable.hpp:73
    73                res = pthread_cond_wait(&cond,&internal_mutex);
    (gdb) up
    [#2](/bitcoin-bitcoin/2/)  0x00007f3c0f6db830 in CScheduler::serviceQueue (this=0x7ffc1eccc220) at scheduler.cpp:42
    42                  newTaskScheduled.wait(lock);
    (gdb) list
    37      // is called.
    38      while (!shouldStop()) {
    39          try {
    40              while (!shouldStop() && taskQueue.empty()) {
    41                  // Wait until there is something to do.
    42                  newTaskScheduled.wait(lock);
    43              }
    44  
    
  32. paveljanik commented at 9:30 PM on May 5, 2016: contributor

    Self notice: Do I really need to repeat how I love debugging multiple threads servicing one queue of tasks repeatedly adding other tasks to the same queue?

    The solution to the deadlock is to call notify_one() at the end of ::serviceQueue(), because this could wake up other threads running ::serviceQueues() in such a deadlock case. Without this, there could remain some servicing threads waiting for the task which doesn't come in (because we are finished)!

    Or directly call notify_all() but I prefer _one().

    What do you think?

    diff --git a/src/scheduler.cpp b/src/scheduler.cpp
    index 184ddc2..0a431b2 100644
    --- a/src/scheduler.cpp
    +++ b/src/scheduler.cpp
    @@ -79,6 +81,7 @@ void CScheduler::serviceQueue()
             }
         }
         --nThreadsServicingQueue;
    +    newTaskScheduled.notify_one();
     }
    
     void CScheduler::stop(bool drain)
    
    
  33. laanwj closed this on May 10, 2016

  34. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 18:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me