Nuke boost::thread and boost::thread_group #8631

pull theuni wants to merge 19 commits into bitcoin:master from theuni:no-interrupt-threads changing 33 files +400 −188
  1. theuni commented at 2:33 am on August 31, 2016: member

    This replaces #8023 with a solution that moves us directly to std::thread, rather than using an intermediate wrapper.

    Background

    std::thread is pretty much a drop-in replacement for boost::thread, except that boost::thread is interruptible. We’ve used those interruptions, but we’ll have to drop them eventually as they didn’t become part of the spec.

    When an interruption point is hit, it throws an instance of boost::thread_interrupted. boost::thread catches these automatically by default.

    The interruption points are defined as:

     0
     1    boost::thread::join()
     2    boost::thread::timed_join()
     3    boost::thread::try_join_for(),
     4    boost::thread::try_join_until(),
     5    boost::condition_variable::wait()
     6    boost::condition_variable::timed_wait()
     7    boost::condition_variable::wait_for()
     8    boost::condition_variable::wait_until()
     9    boost::condition_variable_any::wait()
    10    boost::condition_variable_any::timed_wait()
    11    boost::condition_variable_any::wait_for()
    12    boost::condition_variable_any::wait_until()
    13    boost::thread::sleep()
    14    boost::this_thread::sleep_for()
    15    boost::this_thread::sleep_until()
    16    boost::this_thread::interruption_point()
    

    The ones relevant to us are primarily boost::condition_variable_any::wait(), boost::this_thread::sleep_for(), and of course boost::this_thread::interruption_point()

    Any boost::thread will immediately throw boost::thread_interrupted if those are hit at any point. So we can’t simply s/boost::thread/std::thread/g, as we would never be able to exit.

    Changes

    • All threads are now externally interruptible. Rather than having a single bool for shutting down, I’ve attempted to notify each relevant subsystem. But because they’re not well-defined, the notifications are somewhat arbitrary as well.
    • For long-running threads, our own thread_interrupted and interruption_point have been added.
    • Threads should call TraceThread if they may hit an interruption_point.
    • MilliSleep is no longer interruptible.
    • boost::thread_group is gone. All threads are now managed individually

    Future changes

    There is lots of cleanup to be done post-merge. I was very tempted to include cleanups here, but I’ve elected to keep the changes minimal. Todo:

    • s/boost::mutex/std::mutex/
    • s/boost::condition_variable/std::condition_variable/
    • s/boost::chrono/std::chrono/
    • Drop a ton of boost includes
    • Drop other boost cruft (like catching boost::thread_interrupted)

    Also, I started on this by creating a base class for launching threads. Each instance would be responsible for implementing Interrupt() and Stop(). But because this is already a complex set of changes, I think it’s best to do that as a separate step.

  2. fanquake added the label Refactoring on Aug 31, 2016
  3. JeremyRubin commented at 3:46 am on August 31, 2016: contributor

    Concept ACK.

    Will review & test soon.

    The boost interruption_point paradigm is messy and can make code hard to reason about. In my code for #8464 I have also removed boost::thread and interruption points, preferring more explicit quitting. Dealing with the interruption code made working on #8464 more difficult than it had to be.

    One Nit, which I think is worth addressing. I would refactor the names from Interrupt to Quit, Shutdown, or even ControlledExit. Connotation of interrupt is a non-consensual break, whereas Quit or Shutdown implies being done as a normal part of control flow.

  4. theuni commented at 3:52 am on August 31, 2016: member

    @JeremyRubin Nearly every thread uses separate interrupt/shutdown calls for exactly that reason.

    Interrupt is non-consensual, and means “stop what you’re doing and don’t start anything else”. Shutdown means “please gracefully exit now and don’t return until you do”.

  5. JeremyRubin commented at 3:59 am on August 31, 2016: contributor
    @theuni Ah, gotcha. Was looking at small sample; my bad. I guess my comment should instead be, we should eventually migrate to shutdown only no interrupt threads; but this is a big step in that direction and precursor to that happening.
  6. sipa commented at 3:30 pm on August 31, 2016: member
    Concept ACK, testing.
  7. paveljanik commented at 4:02 pm on August 31, 2016: contributor

    OS X here:

     0Making all in src
     1  CXX      test/test_test_bitcoin-scheduler_tests.o
     2In file included from test/scheduler_tests.cpp:8:
     3In file included from ./test/test_bitcoin.h:16:
     4/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/thread:337:5: error: attempt to use a deleted function
     5    __invoke(std::__1::move(std::__1::get<0>(__t)), std::__1::move(std::__1::get<_Indices>(__t))...);
     6    ^
     7/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/thread:347:5: note: in instantiation of function template specialization 'std::__1::__thread_execute<void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> , 1>' requested here
     8    __thread_execute(*__p, _Index());
     9    ^
    10/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/thread:359:42: note: in instantiation of function template specialization 'std::__1::__thread_proxy<std::__1::tuple<void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> > >' requested here
    11    int __ec = pthread_create(&__t_, 0, &__thread_proxy<_Gp>, __p.get());
    12                                         ^
    13/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:1673:31: note: in instantiation of function template specialization 'std::__1::thread::thread<void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> , void>' requested here
    14            ::new((void*)__p) _Up(std::__1::forward<_Args>(__args)...);
    15                              ^
    16/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:1600:18: note: in instantiation of function template specialization 'std::__1::allocator<std::__1::thread>::construct<std::__1::thread, void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> >' requested here
    17            {__a.construct(__p, std::__1::forward<_Args>(__args)...);}
    18                 ^
    19/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:1453:14: note: in instantiation of function template specialization 'std::__1::allocator_traits<std::__1::allocator<std::__1::thread> >::__construct<std::__1::thread, void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> >' requested here
    20            {__construct(__has_construct<allocator_type, _Tp*, _Args...>(),
    21             ^
    22/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/vector:1643:25: note: in instantiation of function template specialization 'std::__1::allocator_traits<std::__1::allocator<std::__1::thread> >::construct<std::__1::thread, void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> >' requested here
    23        __alloc_traits::construct(this->__alloc(),
    24                        ^
    25test/scheduler_tests.cpp:89:22: note: in instantiation of function template specialization 'std::__1::vector<std::__1::thread, std::__1::allocator<std::__1::thread> >::emplace_back<void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> >' requested here
    26        microThreads.emplace_back(&CScheduler::serviceQueue, std::ref(microTasks));
    27                     ^
    28/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/type_traits:1043:5: note: '~__nat' has been explicitly marked deleted here
    29    ~__nat() = delete;
    30    ^
    
  8. paveljanik commented at 6:04 pm on August 31, 2016: contributor
    Compile error fixed here.
  9. sipa commented at 11:06 pm on August 31, 2016: member
    Needs rebase.
  10. theuni force-pushed on Sep 1, 2016
  11. theuni commented at 0:10 am on September 1, 2016: member
    rebased and squashed in the compile fix.
  12. laanwj commented at 12:10 pm on September 1, 2016: member

    Nearly every thread uses separate interrupt/shutdown calls for exactly that reason.

    There’s another reason, and that is that interrupt is supposed to be an instant operation at the caller site, and shutdown is allowed to wait. This allows parallelism in the shutdown process:

    • Send module A interrupt (A starts terminating its threads, which may take some time)
    • Send module B interrupt (B starts terminating its threads, which may take some time)

    • Module A shutdown (waits for A’s threads to join)
    • Module B shutdown (waits for B’s threads to join)

    Sure, the naming may not be optimal, “interrupt” is more like a hint to stop work, but I think the approach makes sense.

    (I’m sure this can be implemented more thoroughly with explicit shutdown-dependencies and std::futures, but the idea is the same, to make shutdown as fast as possible)

    Edit: re-triggered travis, the failure was a timeout, probably has nothing to do with the changes here.

  13. laanwj commented at 1:47 pm on September 1, 2016: member
    Travis does keep failing. There may be an issue after all.
  14. theuni commented at 3:32 pm on September 1, 2016: member
    Hmm, travis was happy before the rebase. I’ll try to reproduce locally, maybe something changed.
  15. theuni force-pushed on Sep 1, 2016
  16. theuni force-pushed on Sep 2, 2016
  17. theuni commented at 5:05 am on September 2, 2016: member
    The travis failures were caused by the rebase after all. ab48c5e72156b34300db4a6521cb3c9969be3937 introduced a new test that relied on the interruptible semantics. I’ve added a commit that should fix that up.
  18. theuni force-pushed on Sep 2, 2016
  19. theuni force-pushed on Sep 2, 2016
  20. btcdrak commented at 11:21 am on September 2, 2016: contributor
    needs rebase again :(
  21. threads: turn upnp into a std::thread d083ab9701
  22. threads: make scheduler interruptible 8447a96ee2
  23. threads: make dnsseed interruptible ed68c688d3
  24. threads: eliminate the need for this_thread::interruption_point
    Also note that boost::thread catches boost::thread_interrupted itself,
    but std::thread obviously won't catch thread_interrupted. So don't rethrow
    after catching.
    61f71715ad
  25. threads: use std::thread for net threads
    Also, drop threadGroup for them and treat them individually
    54cb55b208
  26. threads: add interruptible sleep for net threads d20fe536aa
  27. threads: turn MilliSleep into a wrapper around std::this_thread::sleep_for 3f31fe5dff
  28. threads: use std::thread for torcontrol d798f58733
  29. threads: make script check threads interruptible eb126ca399
  30. threads: use std::thread for threadflushing 8e1347a894
  31. threads: use a std::thread for importing 1fa9450000
  32. threads: make the genesis block wait interruptible 7b757d561e
  33. threads: convert free-running threads to std::thread a15b49c78b
  34. threads: use std::condition_variable/mutex for messageHandlerCondition 69d38c0601
  35. threads: break loops rather than throwing when possible abc1ebf12c
  36. threads: Use threadwrapper for http server/worker threads
    This ensures that any interruptions are caught
    3499425150
  37. threads: get rid of MilliSleep and interruption_point in ThreadFlushWalletDB f6bc83b36c
  38. threads: nuke the last remnants of boost::thread_group e890e93456
  39. threads: remove recent reintroduction of threadgroups
    And since std::threads can't be interrupted, the checkqueue must be manually
    interrupted
    01f8700b0e
  40. theuni force-pushed on Sep 2, 2016
  41. in src/init.cpp: in 8447a96ee2 outdated
    1085@@ -1078,7 +1086,7 @@ bool AppInit2(boost::thread_group& threadGroup, CScheduler& scheduler)
    1086 
    1087     // Start the lightweight task scheduler thread
    1088     CScheduler::Function serviceLoop = boost::bind(&CScheduler::serviceQueue, &scheduler);
    1089-    threadGroup.create_thread(boost::bind(&TraceThread<CScheduler::Function>, "scheduler", serviceLoop));
    1090+    scheduler_thread = std::thread(std::bind(&TraceThread<CScheduler::Function>, "scheduler", serviceLoop));
    


    JeremyRubin commented at 7:46 am on September 5, 2016:

    Nit:

    I find the semantics of bind to be pretty confusing confusing. If we’re cleaning up threading semantics I think it might be clearer to do as follows:

    scheduler_thread = std::thread({ TraceThread(“scheduler”, CScheduler::serviceQueue(scheduler)); };

    This should also allow you to get rid of the explicit template param.


    theuni commented at 7:06 pm on September 6, 2016:
    I don’t think it’s possible to do this without a bind, as TraceThread requires a bound instance, though I’d be happy to be proven wrong :)

    JeremyRubin commented at 7:51 pm on September 6, 2016:

    Github weirdly formatted my comment so you couldn’t see the lambda.

    0scheduler_thread = std::thread([&]{
    1    TraceThread("scheduler", [&]{scheduler.serviceQueue();} );
    2 };
    

    Does something like that not work? Maybe it’s not much better than bind…


    theuni commented at 7:56 pm on September 6, 2016:

    Ah, I didn’t realize that you meant to replace the binds with lambdas. I’m not sure that I find this any less confusing, but I don’t really mind either way:

    0scheduler_thread = std::thread([&scheduler] {
    1    TraceThread("scheduler", [&scheduler] {
    2        scheduler.serviceQueue();
    3    });
    4});
    

    theuni commented at 7:58 pm on September 6, 2016:
    Heh, didn’t see your response before I posted, but looks like we’re on the same page. Either one is fine by me.
  42. in src/net.cpp: in ed68c688d3 outdated
    119@@ -120,6 +120,12 @@ boost::condition_variable messageHandlerCondition;
    120 static CNodeSignals g_signals;
    121 CNodeSignals& GetNodeSignals() { return g_signals; }
    122 
    123+static std::condition_variable net_interrupt_cond;
    124+static std::mutex cs_net_interrupt;
    125+static std::atomic<bool> net_interrupted(false);
    


    JeremyRubin commented at 7:58 am on September 5, 2016:

    This is OK – initially, I was confused as to why you use an atomic var for the condition variable, as usually access is guarded by the lock so atomic would be overkill. But in this case, it’s fine.

    May be worth writing a profile at some point for if it is better to use a cond+atomic or to lock on writing the condition externally, or just throwing a comment in here as to why you’re using an atomic.


    theuni commented at 7:08 pm on September 6, 2016:

    Right, that’s not at all clear from the individual commits. The atomic is used as the break condition for a few condvars, as well as the loop break condition in a few places. Rather than using separate bools for each, I just lumped them all into this one.

    Will add a comment.

  43. in src/net.cpp: in 54cb55b208 outdated
    2191-    threadGroup.create_thread(boost::bind(&TraceThread<void (*)()>, "opencon", &ThreadOpenConnections));
    2192+    open_connections_thread = std::thread(std::bind(&TraceThread<void (*)()>, "opencon", &ThreadOpenConnections));
    2193 
    2194     // Process messages
    2195-    threadGroup.create_thread(boost::bind(&TraceThread<void (*)()>, "msghand", &ThreadMessageHandler));
    2196+    message_handler_thread = std::thread(std::bind(&TraceThread<void (*)()>, "msghand", &ThreadMessageHandler));
    


    JeremyRubin commented at 8:08 am on September 5, 2016:
    repeating comment about using lambdas.

    theuni commented at 7:23 pm on September 6, 2016:
    Yep, these don’t need the binds. Will fix.
  44. in src/checkqueue.h: in eb126ca399 outdated
    153@@ -145,12 +154,14 @@ class CCheckQueue
    154     //! Add a batch of checks to the queue
    155     void Add(std::vector<T>& vChecks)
    156     {
    157-        boost::unique_lock<boost::mutex> lock(mutex);
    158-        BOOST_FOREACH (T& check, vChecks) {
    159-            queue.push_back(T());
    160-            check.swap(queue.back());
    161+        {
    


    JeremyRubin commented at 8:18 am on September 5, 2016:
    Just noting that there is a slight functional change here, the lock is released earlier but this seems a-OK.
  45. in src/checkqueue.h: in eb126ca399 outdated
     8@@ -9,9 +9,8 @@
     9 #include <vector>
    10 
    11 #include <boost/foreach.hpp>
    12-#include <boost/thread/condition_variable.hpp>
    13-#include <boost/thread/locks.hpp>
    14-#include <boost/thread/mutex.hpp>
    15+#include <condition_variable>
    


    JeremyRubin commented at 8:21 am on September 5, 2016:
    It might be worth holding off on this commit given that my Lockfree Checkqueue PR also touches all these places; assuming i can get that merged, and it also swaps in std::thread.
  46. in src/init.cpp: in 7b757d561e outdated
    171@@ -172,6 +172,10 @@ static std::thread import_thread;
    172 static std::thread flush_wallet_thread;
    173 #endif
    174 
    175+static bool fHaveGenesis = false;
    176+static boost::mutex cs_GenesisWait;
    177+static CConditionVariable condvar_GenesisWait;
    


    JeremyRubin commented at 8:27 am on September 5, 2016:
    CConditionVariable is a boost::condition_variable!!
  47. in src/net.cpp: in 69d38c0601 outdated
    1965@@ -1966,7 +1966,7 @@ void ThreadMessageHandler()
    1966         }
    1967 
    1968         if (fSleep)
    1969-            messageHandlerCondition.timed_wait(lock, boost::posix_time::microsec_clock::universal_time() + boost::posix_time::milliseconds(100));
    1970+            messageHandlerCondition.wait_for(lock, std::chrono::milliseconds(100), []()->bool {return net_interrupted; });
    


    JeremyRubin commented at 8:42 am on September 5, 2016:
    nit: there is no interruption point in the new code
  48. in src/net.cpp: in 69d38c0601 outdated
    1909@@ -1910,8 +1910,8 @@ bool OpenNetworkConnection(const CAddress& addrConnect, bool fCountFailure, CSem
    1910 
    1911 void ThreadMessageHandler()
    1912 {
    1913-    boost::mutex condition_mutex;
    1914-    boost::unique_lock<boost::mutex> lock(condition_mutex);
    1915+    std::mutex condition_mutex;
    


    JeremyRubin commented at 8:42 am on September 5, 2016:
    Do these do anything? Can you just kill them? Was that mutex supposed to be (originally) static?

    theuni commented at 7:28 pm on September 6, 2016:

    The condvar has to use some lock, I assume this was just done to avoid creating a mutex/condvar per loop iteration. The mutex doesn’t need to guard anything since the wake condition is atomic.

    Sure, they could be static.


    JeremyRubin commented at 2:47 am on September 13, 2016:

    So I’ve actually discovered a nice little trick for this –

    0struct dummy_mutex {
    1void lock(){};
    2void unlock(){};
    3 };
    

    if you want to make it more clear that it’s a worthless lock.

  49. in src/net.cpp: in abc1ebf12c outdated
    1114@@ -1115,7 +1115,7 @@ static void AcceptConnection(const ListenSocket& hListenSocket) {
    1115 void ThreadSocketHandler()
    1116 {
    1117     unsigned int nPrevNodeCount = 0;
    1118-    while (true)
    1119+    while (!net_interrupted)
    


    JeremyRubin commented at 8:45 am on September 5, 2016:
    This is a good commit; throwing generates bad code 👍
  50. in src/net.cpp: in d20fe536aa outdated
    135+    bool ret = false;
    136+    {
    137+        std::unique_lock<std::mutex> lock(cs_net_interrupt);
    138+        ret = net_interrupt_cond.wait_for(lock, std::chrono::milliseconds(n), []()->bool{return net_interrupted; });
    139+    }
    140+    interruption_point(ret);
    


    JeremyRubin commented at 8:51 am on September 5, 2016:
    might be better (although less clean) to use a noexcept version that returns a bool.

    theuni commented at 7:30 pm on September 6, 2016:
    I agree, these were only kept as-is to avoid scope creep.
  51. in src/utiltime.cpp: in 3f31fe5dff outdated
    71-    boost::this_thread::sleep(boost::posix_time::milliseconds(n));
    72-#else
    73-//should never get here
    74-#error missing boost sleep implementation
    75-#endif
    76+    std::this_thread::sleep_for(std::chrono::milliseconds(n));
    


    JeremyRubin commented at 8:52 am on September 5, 2016:
    👍
  52. in src/test/transaction_tests.cpp: in 01f8700b0e
    469@@ -470,12 +470,14 @@ BOOST_AUTO_TEST_CASE(test_big_witness_transaction) {
    470 
    471     // check all inputs concurrently, with the cache
    472     PrecomputedTransactionData txdata(tx);
    473-    boost::thread_group threadGroup;
    474+    std::vector<std::thread> threadGroup;
    


    JeremyRubin commented at 8:53 am on September 5, 2016:
    same comment; this is fixed in my WIP lockfree stuff so may want to leave it assuming I get mine merged.
  53. JeremyRubin commented at 8:57 am on September 5, 2016: contributor

    @theuni I’ve gone through and I’ll give you a “tentative” utack. Mostly looks fine.

    I’d suggest mostly:

    • checking typedefs which secretly use boost::*
    • Getting rid of the exception throwing interruption point. It’s a bad default.
    • Getting rid of std::bind in favor of lambdas.
    • depending on status of my wip scriptcheck queue stuff, leaving those two commits out.
  54. theuni commented at 7:36 pm on September 6, 2016: member

    @JeremyRubin Thanks for the review! To your points:

    • I agree somewhat on the typedefs, but I’d like to avoid this PR getting bigger and bigger. So let’s do these case-by-case.
    • I don’t think we can get rid of the exception throwing yet. I hate the interruption_point logic, but I don’t think it’s realistic to remove it all in one PR. I certainly agree with minimizing it though.
    • Agree on losing the binds where possible
    • Without the scriptcheck changes, this can’t be tested as we can’t shutdown. So I’m afraid we’ll have to step on eachother’s toes until one is merged. Obviously I’m happy to drop the scriptcheck changes and adapt to the new stuff if it goes in first.
  55. JeremyRubin commented at 7:57 pm on September 6, 2016: contributor

    @theuni The only one I think you need to fix for this pr is CConditionVariable. Were there others you’re worried about?

    No worries about the scriptcheck stuff, we’ll cross that bridge when it comes.

  56. theuni commented at 9:01 pm on September 6, 2016: member

    @JeremyRubin Fixing up CConditionVariable means doing lots of mutex/lock replacements all over the place. I’d really rather do that as a follow-up to keep this one minimal.

    Now that we’re not using any boost::thread’s, I don’t believe there’s any major functional difference.

  57. JeremyRubin commented at 9:22 pm on September 6, 2016: contributor
    @theuni sounds fine, this one is nuking boost::thread not boost::condition_variable after all :)
  58. btcdrak commented at 11:04 am on September 17, 2016: contributor
    needs rebase
  59. in src/bitcoin-cli.cpp: in 01f8700b0e
    293@@ -294,6 +294,8 @@ int CommandLineRPC(int argc, char *argv[])
    294     catch (const boost::thread_interrupted&) {
    


    laanwj commented at 9:48 am on October 11, 2016:
    Does this still need to catch boost::thread_interrupted? (another one below)

    theuni commented at 10:08 am on October 11, 2016:
    No, see “future changes” in the OP. There are a bunch of these to get rid of. I guess it doesn’t make sense to do that later, I’ll add it here.

    laanwj commented at 10:23 am on October 11, 2016:
    Sorry, yes you mention it there. It just looked strange to me to add something instead of replace.
  60. fanquake commented at 12:30 pm on November 6, 2016: member
    Concept ACK. Needs a rebase. Tagging for 0.14.0
  61. fanquake added this to the milestone 0.14.0 on Nov 6, 2016
  62. fanquake commented at 12:53 pm on November 12, 2016: member
    There also some comments/examples about using boost thread in scheduler that could be nuked.
  63. theuni commented at 3:01 am on December 6, 2016: member
    Closing for now, need to get a specific chunk of this in first.
  64. theuni closed this on Dec 6, 2016

  65. fanquake moved this from the "In progress" to the "Later" column in a project

  66. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me