Lightweight task scheduler #5964

pull gavinandresen wants to merge 7 commits into bitcoin:master from gavinandresen:scheduler changing 15 files +309 −54
  1. gavinandresen commented at 6:09 pm on April 2, 2015: contributor

    Threads have noticeable overhead on 32-bit systems, so using fewer threads for simple tasks is better than starting up threads willy-nilly.

    The first commit implements a simple task queue class (CScheduler) which can be serviced by one or more threads.

    The second and third port the networking code’s DumpAddresses thread to be a task instead.

    Porting other threads (ThreadFlushWalletDB is a good candidate) is left as an exercise for future pull requests.

    The CScheduler class was developed and tested with code here: https://github.com/gavinandresen/scheduler (see main.cpp)

    I tested the DumpAddresses change running with -debug=net and a hacked DEBUG_ADDRESS_INTERVAL, and made sure I got more than one “Flushed %d addresses to peers.dat” in my debug.log.

  2. in src/scheduler.cpp: in 313bcd909d outdated
    37+        else
    38+        {
    39+            // Wait until either there is a new task, or until
    40+            // the time of the first item on the queue:
    41+            while (!taskQueue.empty() &&
    42+                   newTaskScheduled.wait_until(lock, taskQueue.begin()->first) != boost::cv_status::timeout)
    


    sipa commented at 6:40 pm on April 2, 2015:
    Does this work when the timestamp of the first item in the queue is in the past?
  3. in src/scheduler.cpp: in 313bcd909d outdated
    28+    // when the thread is waiting or when the user's function
    29+    // is called.
    30+    while (1)
    31+    {
    32+        if (taskQueue.empty())
    33+        {
    


    sipa commented at 6:41 pm on April 2, 2015:
    Nit: coding style (braces on the same line).
  4. in src/scheduler.cpp: in 313bcd909d outdated
    62+{
    63+    {
    64+        boost::unique_lock<boost::mutex> lock(newTaskMutex);
    65+        taskQueue.insert(std::make_pair(t, f));
    66+    }
    67+    newTaskScheduled.notify_all();
    


    sipa commented at 6:42 pm on April 2, 2015:
    Nit: notify_one should be enough (and slightly more efficient, if we’d ever have multiple runner threads).
  5. jgarzik commented at 6:44 pm on April 2, 2015: contributor

    concept ACK CScheduler ACK

    • Yes, DumpAddresses thread is a prime candidate for this
    • Ditto ThreadFlushWalletDB
    • Disagree on ThreadImport – ThreadImport is a prime candidate for something to be executed in a long running thread of its own.
  6. in src/scheduler.cpp: in 313bcd909d outdated
    30+    while (1)
    31+    {
    32+        if (taskQueue.empty())
    33+        {
    34+            // Wait until there is something to do.
    35+            newTaskScheduled.wait(lock);
    


    sipa commented at 6:45 pm on April 2, 2015:
    .wait() may wake spuriously, so it’s better to turn the surrounding if into a while loop. I believe that means you also won’t have to deal with the queue-empty case in the loop below.
  7. in src/scheduler.h: in 313bcd909d outdated
    30+//
    31+class CScheduler
    32+{
    33+public:
    34+    CScheduler();
    35+
    


    sipa commented at 6:50 pm on April 2, 2015:
    Unsure about this, but leaving the thread running forever may cause destruction order issues if a task is running exactly at shutdown. A destructor that sets an ’exit’ bool flag (which the scheduler thread checks), and waits for all scheduler threads to exit may be needed.

    gavinandresen commented at 3:52 pm on April 3, 2015:
    Thread lifetime managment is up to the caller. For Bitcoin, the thread is added to the main thread group, and all the threads are interrupted and joined before shutdown (see WaitForShutdown() in bitcoind.cpp or BitcoinCore::shutdown() in qt/bitcoin.cpp).
  8. sipa commented at 6:54 pm on April 2, 2015: member
    Agree with @jgarzik; threadimport should not run in the scheduler, as it is expected to take up to hours - blocking several other operations. Wallet flushing is a good candidate.
  9. gavinandresen force-pushed on Apr 3, 2015
  10. gavinandresen force-pushed on Apr 3, 2015
  11. gavinandresen force-pushed on Apr 3, 2015
  12. gavinandresen commented at 3:55 pm on April 3, 2015: contributor

    Fixed @sipa’s nits, including running scheduler.cpp/.h through clang-format.

    And reverted the ThreadImport change, so it is it’s own thread again.

  13. gavinandresen force-pushed on Apr 3, 2015
  14. theuni commented at 0:52 am on April 4, 2015: member

    boost::condition_variable::wait_until and boost::cv_status didn’t show up until 1.50. Can they be worked around?

    Otherwise, we’ll need to enforce >= 1.50 as a build requirement.

  15. sipa commented at 1:06 am on April 4, 2015: member
    They’re easy to work with using timed_wait, I think.
  16. jgarzik commented at 5:15 am on April 5, 2015: contributor
    ut ACK latest rev
  17. laanwj added the label Improvement on Apr 6, 2015
  18. gavinandresen force-pushed on Apr 7, 2015
  19. gavinandresen force-pushed on Apr 7, 2015
  20. gavinandresen force-pushed on Apr 7, 2015
  21. gavinandresen force-pushed on Apr 7, 2015
  22. gavinandresen commented at 8:08 pm on April 7, 2015: contributor

    Three changes:

    1. I decided I didn’t like the the asymmetry of CScheduler creating the thread, but the caller managing it. So I replaced CScheduler::run() with CScheduler::ServiceQueue, which the caller thread-ifys. And added code and a destructor that asserts if thread management isn’t done correctly. That made it easy to wrap the scheduler trhead in util.h’s TraceThread, to get debug.log tracing and handling of exceptions that might be thrown from the scheduled tasks.
    2. Added compatibility code for boost version 1.49 and below.
    3. I fixed a race-condition bug when there are multiple threads servicing the queue.

    I think I’m all done tweaking this now.

  23. in src/scheduler.cpp: in e395785457 outdated
    57+                // Keep waiting until timeout
    58+            }
    59+#endif
    60+            // If there are multiple threads, the queue can empty while we're waiting (another
    61+            // thread may service the task we were waiting on).
    62+            if (taskQueue.empty()) continue;
    


    sipa commented at 8:26 pm on April 7, 2015:
    I was wrong earlier. If you want to support multiple runner threads, you need to do this check inside the while loops above, as the item can disappear while they wait.

    gavinandresen commented at 8:35 pm on April 7, 2015:

    Mmm. I convinced myself an empty check isn’t necessary there, but belt&suspenders says I should just put it back.

    (not needed because: if the queue is emptied during timeout, then the wait loop is timed out and exited and the empty queue is caught here. And since there is no unschedule-task, the only other way the wait can exit is if a new task is added to the queue, which wakes up only one thread and so the queue can’t be empty).

    I’ll add the empty condition to the wait loops.


    sipa commented at 8:43 pm on April 7, 2015:

    Ok, you convinced me… but indeed, belt & suspenders.

    You could even combine the two loops: while (taskQueue.empty() || wait_until(taskQueue.begin()->first)) ?

    EDIT: nevermind. Sorry, ignore my nits here, I’m not thinking clearly.

  24. in src/scheduler.h: in e395785457 outdated
    40+    ~CScheduler();
    41+
    42+    // Start a thread to service the task queue.
    43+    // Run may be called multiple times to have multiple
    44+    // threads servicing the queue.
    45+    boost::thread* run();
    


    sipa commented at 8:30 pm on April 7, 2015:
    This doesn’t seem to exist anymore.
  25. in src/scheduler.cpp: in e395785457 outdated
    65+            taskQueue.erase(taskQueue.begin());
    66+
    67+            // Unlock before calling f, so it can reschedule itself or another task
    68+            // without deadlocking:
    69+            lock.unlock();
    70+            f();
    


    sipa commented at 8:32 pm on April 7, 2015:
    Do you want the loop to exit if the user function throws an exception?

    gavinandresen commented at 8:46 pm on April 7, 2015:

    If it is a thread_interrupted exception, definitely want to exit.

    Any other exception, it seems best to leave it up to the calling code what to do.

    For Bitcoin, the TraceThread will print any exceptions (other than thread_interrupted) to debug.log and exit the scheduler thread. I think that is the behavior we want – tasks should catch their own exceptions, if they don’t that is a bug.

    I don’t feel strongly, though– instead of wrapping with TraceThread we could bring back LoopForever so uncaught exceptions from tasks are logged to debug.log but the scheduler loop keeps running.


    sipa commented at 9:09 pm on April 7, 2015:
    Fair enough. I mostly commented because it may make debugging a bit harder, but I agree: tasks should not throw exceptions (except thread_interrupted, perhaps).

    laanwj commented at 1:26 pm on May 14, 2015:
    As long as run-away exceptions are logged, any solution here is fine with me. It is indeed a bug if it happens.
  26. sipa commented at 9:54 pm on April 7, 2015: member
    extremely-lightly-tested ACK (as in: does not crash).
  27. gavinandresen force-pushed on Apr 8, 2015
  28. gavinandresen commented at 3:56 pm on April 8, 2015: contributor

    Final (?) nits picked (got rid of the obsolete run from the .h, added belt&suspenders checks for empty queue, ran through clang-format again), this should be ready for merge.

    I also did some more testing, writing a stress-test that created 4,000 tasks serviced by 100 threads over a couple of milliseconds ( https://github.com/gavinandresen/scheduler/blob/master/main.cpp ), and all is well, including running under valgrind.

  29. jgarzik commented at 4:01 pm on April 8, 2015: contributor
    lightly tested ACK
  30. theuni commented at 0:24 am on April 10, 2015: member

    @gavinandresen See the top 2 commits here: https://github.com/theuni/bitcoin/tree/5964

    Without the timer fix, using your test program I was flooded with “Repeat!” and everything else returned immediately. With the change from https://github.com/theuni/bitcoin/commit/4d0819394249be25c34db28e4c3648a71ab238fa I get:

     0Microtask counts: 190 198 199 197 196 197 205 201 217 200 
     1Sum: 2000
     2Wait... negative?
     3One
     4Two
     5AlsoTwo
     6Three
     7Gonna start repeating every 2 secs
     8Five
     9Repeat!
    10Repeat!
    11Repeat!
    12Eleven!
    13Repeat!
    14Long-running task, gonna sleep for 11seconds
    15One
    16Repeat!
    17Two
    18Repeat!
    19Repeat!
    20Repeat!
    21Repeat!
    22Done sleeping
    23Repeat!
    

    Each looks to be taking as long as they should.

    It’d be great to see that moved into a test. Given boost’s wait() problems with 1.50-1.52 iirc, I think there’s reason to be cautious.

  31. gavinandresen commented at 12:45 pm on April 10, 2015: contributor

    @theuni : thanks for testing, I’ll pull in those commits and I’ll create a unit test with the microtask test – does it fail without your fixes?

    I don’t want to create a unit test that takes 20 seconds to run.

  32. gavinandresen force-pushed on Apr 10, 2015
  33. gavinandresen force-pushed on Apr 10, 2015
  34. gavinandresen commented at 2:43 pm on April 10, 2015: contributor

    Pulled in fixes from @theuni, and added a new unit test. I measured run-time and memory usage before and after the new unit test, no significant difference on my 64-bit OSX machine:

    before:

     0/usr/bin/time -l test/test_bitcoin
     1Running 148 test cases...
     2
     3*** No errors detected
     4        6.57 real         4.29 user         0.47 sys
     5 197120000  maximum resident set size
     6         0  average shared memory size
     7         0  average unshared data size
     8         0  average unshared stack size
     9     49901  page reclaims
    10         0  page faults
    11         0  swaps
    12         0  block input operations
    13       486  block output operations
    14         0  messages sent
    15         0  messages received
    16         0  signals received
    17       257  voluntary context switches
    18      4137  involuntary context switches
    

    after:

     0Running 148 test cases...
     1
     2*** No errors detected
     3        6.59 real         4.39 user         0.53 sys
     4 197238784  maximum resident set size
     5         0  average shared memory size
     6         0  average unshared data size
     7         0  average unshared stack size
     8     50350  page reclaims
     9         0  page faults
    10         0  swaps
    11         0  block input operations
    12       434  block output operations
    13         0  messages sent
    14         0  messages received
    15         0  signals received
    16       248  voluntary context switches
    17     28623  involuntary context switches
    
  35. sipa commented at 9:26 am on April 11, 2015: member
    utACK
  36. pstratem commented at 7:57 pm on April 24, 2015: contributor

    There should be a mechanism for the scheduled Function to specify the next time it’s called.

    This is necessary for high precision random intervals.

  37. gavinandresen commented at 0:24 am on April 25, 2015: contributor
    Functions can reschedule themselves– pass a reference to the CScheduler object to the function (via boost::bind and boost::ref). See the CScheduler unit tests for examples.
  38. in src/test/scheduler_tests.cpp: in 74b50983cb outdated
    63+        CScheduler::Function f = boost::bind(&microTask, boost::ref(microTasks),
    64+                                             boost::ref(counterMutex[whichCounter]), boost::ref(counter[whichCounter]),
    65+                                             randomDelta(rng), tReschedule);
    66+        microTasks.schedule(f, t);
    67+    }
    68+    boost::this_thread::sleep_for(boost::chrono::microseconds(600));
    


    sipa commented at 10:16 pm on April 25, 2015:
    This line doesn’t compile for me.

    gavinandresen commented at 6:11 pm on April 26, 2015:
    I’ll implement a similar fix to utiltime.cpp’s MilliSleep to workaround broken versions of boost.
  39. gavinandresen force-pushed on Apr 26, 2015
  40. sipa commented at 1:56 pm on April 27, 2015: member
    Tested ACK.
  41. gavinandresen force-pushed on May 12, 2015
  42. gavinandresen commented at 2:10 pm on May 12, 2015: contributor
    Rebased to fix a trivial merge conflict (#include removed from src/bitcoind.cpp). I think this should be merged for 0.11.
  43. luke-jr commented at 6:29 pm on May 13, 2015: member

    Debian 7:

    0test/scheduler_tests.cpp: In member function ‘void scheduler_tests::manythreads::test_method()’:
    1test/scheduler_tests.cpp:68:5: error: ‘sleep_for’ is not a member of ‘boost::this_thread’
    2test/scheduler_tests.cpp:84:5: error: ‘sleep_for’ is not a member of ‘boost::this_thread’
    
  44. gavinandresen force-pushed on May 13, 2015
  45. gavinandresen commented at 7:22 pm on May 13, 2015: contributor

    @luke-jr : fixed. I’d fixed that before, I think it got lost in a rebase / shuffle between working on home/work trees.

    The fix is in the unit test commit, and is:

     0static void MicroSleep(uint64_t n)
     1{
     2#if defined(HAVE_WORKING_BOOST_SLEEP_FOR)
     3    boost::this_thread::sleep_for(boost::chrono::microseconds(n));
     4#elif defined(HAVE_WORKING_BOOST_SLEEP)
     5    boost::this_thread::sleep(boost::posix_time::microseconds(n));
     6#else
     7    //should never get here
     8    #error missing boost sleep implementation
     9#endif
    10}
    
  46. in src/util.h: in 3b77f7b461 outdated
    209- *   new boost::thread(boost::bind(&LoopForever<void (*)()>, "dumpaddr", &DumpAddresses, 900000));
    210- * or maybe:
    211- *    boost::function<void()> f = boost::bind(&FunctionWithArg, argument);
    212- *    threadGroup.create_thread(boost::bind(&LoopForever<boost::function<void()> >, "nothing", f, milliseconds));
    213- */
    214-template <typename Callable> void LoopForever(const char* name,  Callable func, int64_t msecs)
    


    laanwj commented at 1:20 pm on May 14, 2015:
    Happy to see that this gets rid of the awful LoopForever() in util.h.
  47. in src/scheduler.cpp: in 3b77f7b461 outdated
    87+}
    88+
    89+static void Repeat(CScheduler* s, CScheduler::Function f, int64_t deltaSeconds)
    90+{
    91+    f();
    92+    s->scheduleFromNow(boost::bind(&Repeat, s, f, deltaSeconds), deltaSeconds);
    


    laanwj commented at 1:24 pm on May 14, 2015:

    For repeating tasks, there are two options:

    1. Schedule the new run deltaSeconds after completion of the task, every time
    2. Schedule every deltaSeconds, no matter how long the task takes (unless the task takes longer than the interval, then something special has to be done)

    So 1 schedules from the end time of the task, 2 schedules from the beginning time of the task,.

    This appears to implement 1. Good enough in most cases (we don’t need clockwork precision, and this resembles the behavior of LoopForever) but this may be worth documenting.

  48. in src/scheduler.cpp: in 3b77f7b461 outdated
    42+// Wait until either there is a new task, or until
    43+// the time of the first item on the queue:
    44+
    45+// wait_until needs boost 1.50 or later; older versions have timed_wait:
    46+#if BOOST_VERSION < 105000
    47+            while (!taskQueue.empty() && newTaskScheduled.timed_wait(lock, toPosixTime(taskQueue.begin()->first)) == true) {
    


    laanwj commented at 1:32 pm on May 14, 2015:
    nit: please leave out the == true, booleans already make good truth values
  49. laanwj commented at 1:33 pm on May 14, 2015: member
    ACK apart from the nit
  50. [Qt] add defaultConfirmTarget constant to sendcoinsdialog
    - replaces some hard-coded values for the default confirmation target
    - also simplify code that is using the new constant
    e656560edb
  51. CScheduler class for lightweight task scheduling
    Simple class to manage a task queue that is serviced by one or
    more threads.
    928b950e3b
  52. build: make libboost_chrono mandatory
    previously it was only used with certain boost versions. Now all versions
    require it.
    ca66717d89
  53. scheduler: fix with boost <= 1.50 cfefe5b88c
  54. gavinandresen force-pushed on May 14, 2015
  55. gavinandresen commented at 2:58 pm on May 14, 2015: contributor
    Nits picked, will merge as soon as Travis gives the green light.
  56. CScheduler unit test 68d370bec4
  57. Create a scheduler thread for lightweight tasks ddd0acd3db
  58. Use CScheduler for net's DumpAddresses
    Instead of starting Yet Another Thread to dump addresses,
    use CScheduler to do it.
    9a1dcea2df
  59. gavinandresen force-pushed on May 14, 2015
  60. gavinandresen commented at 4:56 pm on May 14, 2015: contributor

    Travis was unhappy– one of the builds was taking more than 50 minutes to complete.

    I toned down the unit test by a factor of ten, in case that is what was causing the problem (it now services hundreds of tasks using 10 threads, instead of thousands using 100 threads)…. if the builds STILL take longer than normal I’ll have to recruit @theuni to help debug.

  61. laanwj commented at 5:00 pm on May 14, 2015: member
    @gavinandresen It initially failed; so I merged #6136 and tried again. It’s expected to take somewhat longer but should at least succeed.
  62. theuni commented at 5:01 pm on May 14, 2015: member
    @gavinandresen I think it’s partially bad timing. #6136 forced travis to rebuild all dependencies and build up a new cache. Should be back to normal in an hour or so.
  63. in src/scheduler.cpp: in 9a1dcea2df
    38+            while (taskQueue.empty()) {
    39+                // Wait until there is something to do.
    40+                newTaskScheduled.wait(lock);
    41+            }
    42+// Wait until either there is a new task, or until
    43+// the time of the first item on the queue:
    


    btcdrak commented at 5:51 pm on May 14, 2015:
    tiny nit, is the indenting here off?
  64. gavinandresen merged this on May 14, 2015
  65. gavinandresen closed this on May 14, 2015

  66. gavinandresen referenced this in commit b4c219b622 on May 14, 2015
  67. luke-jr commented at 7:30 pm on May 14, 2015: member
    Build and unit tests pass on Debian 7: ACK
  68. theuni commented at 2:50 am on May 15, 2015: member
    @gavinandresen I got this earlier on an unrelated PR, looks very related/suspicious: https://travis-ci.org/bitcoin/bitcoin/jobs/62615004#L1344 .
  69. gavinandresen commented at 3:22 am on May 15, 2015: contributor
    Hmmm… Looks like the threads were interrupted before finishing (slow VM maybe). I’ll try to rework the test tomorrow so it doesn’t interrupt the threads.
  70. MarcoFalke referenced this in commit e84a5f0004 on Apr 15, 2020
  71. sidhujag referenced this in commit 5d8334f126 on Apr 16, 2020
  72. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me