Thread / shutdown cleanup #2357

pull gavinandresen wants to merge 7 commits into bitcoin:master from gavinandresen:shutdowncleanup changing 31 files +528 −809
  1. gavinandresen commented at 10:07 PM on March 11, 2013: contributor

    The shutdown code has bothered me for a long time, with fShutdown and vnThreadsRunning and a bunch of ad-hoc Sleeps().

    Reports of shutdown-on-exit bugs in 0.8 prompted me to clean it all up.

    This pull reworks the way we keep track of threads, using boost::thread_groups or boost::thread pointers.

    Instead of polling for a global fShutdown flag, threads now just MilliSleep(), wait on boost mutexes, or periodically call boost::this_thread::interruption_point(), and are interrupted using boost's thread interruption mechanisms.

    All of the cleanup removes 400 lines of code, and, I hope, makes the code easier to follow.

    Tested on OSX extensively, and compiled/tested lightly on Windows XP.

  2. gavinandresen commented at 3:36 PM on March 13, 2013: contributor

    Rebased, and added -rpcthreads to --help output.

    But the SIGTERM handler is not working properly, which is why the pull tester is upset...

  3. Diapolo commented at 10:40 PM on March 19, 2013: none

    I really like the reduced complexity of the code, I didn't test it out yet though.

  4. gavinandresen commented at 8:10 PM on March 20, 2013: contributor

    I can't reproduce the pull-tester issue in an Ubuntu VM, but I did fix a compiler warning.

  5. Diapolo commented at 9:16 PM on March 20, 2013: none

    Let's see what pull-tester is doing now, seems the old entries are gone and I wanted to also take a look :).

  6. in src/init.cpp:None in 885904aa20 outdated
     303 | @@ -299,6 +304,7 @@ bool static Bind(const CService &addr, unsigned int flags) {
     304 |          "  -rpcport=<port>        " + _("Listen for JSON-RPC connections on <port> (default: 8332 or testnet: 18332)") + "\n" +
     305 |          "  -rpcallowip=<ip>       " + _("Allow JSON-RPC connections from specified IP address") + "\n" +
     306 |          "  -rpcconnect=<ip>       " + _("Send commands to node running on <ip> (default: 127.0.0.1)") + "\n" +
     307 | +        "  -rpcthreads=<n>        " + _("Use this mean threads to service RPC calls (default: 4)") + "\n" +
    


    sipa commented at 1:59 PM on March 21, 2013:

    mean?


    Diapolo commented at 2:08 PM on March 21, 2013:

    ^^ I vote for Set the number of threads to service RPC calls (1-16, 0=auto, default: 0) and we should perhaps consider the same routine as you added for the -par switch then.

    <pre> // -rpcthreads=0 means autodetect, but nRPCThreads==0 means no concurrency nRPCThreads = GetArg("-rpcthreads", 0); if (nRPCThreads == 0) nRPCThreads = boost::thread::hardware_concurrency(); if (nRPCThreads <= 1) nRPCThreads = 0; else if (nRPCThreads > MAX_RPC_THREADS) nRPCThreads = MAX_RPC_THREADS; </pre>


    gavinandresen commented at 2:11 PM on March 21, 2013:

    No.

    More RPC threads is not better, because RPC calls are normally single-threaded anyway due to lock contention.


    Diapolo commented at 2:13 PM on March 21, 2013:

    Okay, didn't know that, but the message should be in a similar format and some checks for min / max would be also nice IMHO.


    rebroad commented at 1:25 AM on April 1, 2013:

    "Use this mean threads" doesn't appear to me to be English.

  7. jgarzik commented at 5:58 PM on March 22, 2013: contributor

    ACK (with some inline minor nits mentioned)

  8. sipa commented at 12:35 AM on March 23, 2013: member

    I really like this change (except the polling to check whether someone interrupted... i wonder if one can call a condition variable's notify in a signal handler...). I haven't gone through the code in detail, but I see nothing terrible.

    I've tested both bitcoind and bitcoin-qt under valgrind, with several ways of causing exit. All seems good.

  9. denis2342 commented at 3:14 AM on March 23, 2013: none

    I tested it under OSX 10.8.3 with a command line build of bitcoind and it does not stop when I send the "stop" rpc command.

  10. gavinandresen commented at 5:55 PM on March 23, 2013: contributor

    There is definitely a bug with the signal handler / RPC stop on OSX at least, if I try to stop shortly after startup the "stop" get swallowed and further stops have no effect.

  11. gavinandresen commented at 12:22 AM on March 24, 2013: contributor

    Fixed two issues, and made one improvement:

    1. Running -daemon, the shutdown detection thread was being started before the fork(). Oops. I moved the -daemon code to run earlier, in AppInit() (which has the added benefit of simplifying the code a bit).
    2. A request to shutdown during AppInit2() startup could get lost. I fixed that by making the detect shutdown thread run and repeatedly interrupt the main thread group until it, itself, is interrupted.

    Improvement:

    Simplified the Qt code by using a QTimer to look for fRequestShutdown getting set, instead of the more complicated boost signal --> slot --> Qt signal --> slot.

  12. denis2342 commented at 10:39 AM on March 29, 2013: none

    on freebsd I needed to the add boost_timer lib to the linker

  13. denis2342 commented at 4:19 PM on March 29, 2013: none

    right now I got this while closing with "stop":

    Assertion failed: (!pthread_mutex_lock(&m)), function lock, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp

  14. gavinandresen commented at 5:28 PM on March 29, 2013: contributor

    @denis2342 : when you say "right now", do you mean with this patch or without it?

  15. denis2342 commented at 10:21 AM on March 30, 2013: none

    yes, with the patch.

    sorry for the delay, I am on the road.

    denis

    Sent from my iPhone

    On 29.03.2013, at 18:29, Gavin Andresen notifications@github.com wrote:

    @denis2342 : when you say "right now", do you mean with this patch or without it?

    — Reply to this email directly or view it on GitHub.

  16. in src/walletdb.cpp:None in aaff917faa outdated
     498 | @@ -498,9 +499,9 @@ void ThreadFlushWalletDB(void* parg)
     499 |      unsigned int nLastSeen = nWalletDBUpdated;
     500 |      unsigned int nLastFlushed = nWalletDBUpdated;
     501 |      int64 nLastWalletUpdate = GetTime();
     502 | -    while (!fShutdown)
     503 | +    while (true)
     504 |      {
     505 | -        Sleep(500);
     506 | +        MilliSleep(500);
    


    laanwj commented at 11:05 AM on March 31, 2013:

    It looks like the Sleep calls were directly replaced by MilliSleep calls. Sleep, however, takes seconds. Was it intended to multiply with a factor 1000 too?

  17. sipa commented at 11:11 AM on March 31, 2013: member

    @laanwj No, Sleep() (as opposed to sleep()) was an own function in util.cpp that takes milliseconds as argument. I'm in favor of renaming it to MilliSleep() to reduce confusion.

  18. laanwj commented at 11:36 AM on March 31, 2013: member

    Yes, agreed. This is certainly an improvement in naming, then.

  19. gavinandresen commented at 9:52 PM on April 1, 2013: contributor

    Squashed a few commits, rebased, and changed unsigned constants from 'u' to 'U'.

    I'm still puzzled as to why shutdown isn't happening immediately on the pull-tester machine, I can't reproduce that problem on either my Mac or a debian VM. I'll debug directly on the pull-tester machine next.

  20. denis2342 commented at 2:39 PM on April 2, 2013: none

    got it again with the latest changes while using "stop":

    ./bitcoind stop Bitcoin server stopping Assertion failed: (!pthread_mutex_lock(&m)), function lock, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp, line 105.

  21. gavinandresen commented at 4:02 PM on April 2, 2013: contributor

    @denis2342 can you run under a debugger and get a call stack for that assertion? I'm guessing something is being called after main() exits, but I'm just guessing...

  22. gavinandresen commented at 5:30 PM on April 3, 2013: contributor

    Sigh. That took way too long to figure out.

    Boost 1.40 (running on pull-tester) versus 1.53 (running on my machine) difference; boost 1.40's thread_group cannot interrupt_all() if another thread is waiting on join_all().

    I'll rework the code to be boost 1.40-compatible.

  23. Fix signed/unsigned comparison warnings 87b9931bed
  24. Shutdown cleanup prep-work
    Create a boost::thread_group object at the qt/bitcoind main-loop level
    that will hold pointers to all the main-loop threads.
    
    This will replace the vnThreadsRunning[] array.
    
    For testing, ported the BitcoinMiner threads to use its
    own boost::thread_group.
    c8c2fbe07f
  25. Rename util.h Sleep --> MilliSleep
    Two reasons for this change:
    1. Need to always use boost::thread's sleep, even on Windows, so the
    sleeps can be interrupted (prior code used Windows' built-in Sleep).
    
    2. I always forgot what units the old Sleep took.
    1b43bf0d3a
  26. Diapolo commented at 8:41 PM on April 3, 2013: none

    @gavinandresen Why are we supporting such an ancient Boost version anyway? I don't get that :).

  27. LoopForever and ThreadTrace helpers 72f14d26ec
  28. Port Thread* methods to boost::thread_group 21eb5adadb
  29. Clean up shutdown process b31499ec72
  30. Have Qt poll for shutdown requested, the QT way. 723035bb68
  31. BitcoinPullTester commented at 12:37 AM on April 4, 2013: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/723035bb6839c5d65bfee96d501a8c54814778e3 for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.

  32. gavinandresen commented at 1:24 AM on April 4, 2013: contributor

    I'm going to pull this, I suspect @denis2342 is seeing the same intermittent crash-at-shutdown problems that we saw before on OSX (or are you seeing them on freebsd?)

  33. gavinandresen referenced this in commit a0a437c86a on Apr 4, 2013
  34. gavinandresen merged this on Apr 4, 2013
  35. gavinandresen closed this on Apr 4, 2013

  36. gavinandresen deleted the branch on Apr 4, 2013
  37. laudney referenced this in commit 282a094a78 on Mar 19, 2014
  38. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-21 21:16 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me