Process logs in a separate thread #13200

pull jamesob wants to merge 1 commit into bitcoin:master from jamesob:2018-05-asynclog changing 3 files +160 −2
  1. jamesob commented at 2:51 pm on May 9, 2018: member

    This changeset renders and flushes log messages in a dedicated thread, preventing the originating thread from blocking (or failing) on fwrite, fflush, or the various string manipulations done on log messages. Originating threads push log messages into a buffer that is continually flushed.

    Benchmarks live in the comments below.

    Open questions:

    • Do we want this behavior by default? I think so, given that it seems safer and more performant than doing log processing synchronously with Important Stuff.
    • Do we want to allow an opt-out configuration option that makes log processing synchronous again?
    • The ring buffer implementation allows either overwriting existing data or blocking when at full capacity. In the case of logging, I’ve set it to block and wait for capacity instead of dropping messages. Does this seem right?

    Future work:

    • Reconcile this change with #13168 in case that’s merged first (I have a patch)
    • Even when logging is disabled, buffer debug logs without flushing and then handle SIGSEGV et al by flushing the log buffer (idea being to give the user crash diagnostics).
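
    Not from the patch itself, but a minimal sketch of the producer/consumer flow described above, with hypothetical names, a plain unbounded queue in place of the PR’s ring buffer, and stderr in place of the debug log:

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    // Sketch only: callers enqueue formatted lines and return immediately;
    // a dedicated consumer thread drains the queue and does the actual I/O.
    class AsyncLogSketch
    {
    public:
        AsyncLogSketch() : m_consumer([this] { ConsumeLoop(); }) {}

        ~AsyncLogSketch()
        {
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_shutdown = true;
            }
            m_cv.notify_all();
            m_consumer.join();                       // drains remaining lines first
        }

        void Queue(std::string line)
        {
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_lines.push(std::move(line));
            }
            m_cv.notify_one();
        }

    private:
        void ConsumeLoop()
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            for (;;) {
                m_cv.wait(lock, [this] { return m_shutdown || !m_lines.empty(); });
                while (!m_lines.empty()) {
                    std::string line = std::move(m_lines.front());
                    m_lines.pop();
                    lock.unlock();                   // never hold the lock during I/O
                    std::fwrite(line.data(), 1, line.size(), stderr);
                    lock.lock();
                }
                if (m_shutdown) return;
            }
        }

        std::mutex m_mutex;
        std::condition_variable m_cv;
        std::queue<std::string> m_lines;
        bool m_shutdown{false};
        std::thread m_consumer;                      // declared last: starts after the rest
    };

    int main()
    {
        AsyncLogSketch log;
        log.Queue("hello from the originating thread\n");
        // The destructor flushes whatever is still queued and joins the thread.
    }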
  2. fanquake added the label Utils/log/libs on May 9, 2018
  3. jamesob force-pushed on May 9, 2018
  4. jamesob force-pushed on May 9, 2018
  5. jamesob force-pushed on May 9, 2018
  6. in src/logging.cpp:287 in f3f96027eb outdated
    282+    std::once_flag flush_logs_thread_started;
    283+
    284+    static void ConsumeLogs()
    285+    {
    286+        std::unique_ptr<LogArgs> next_log_line;
    287+        while (next_log_line = log_buffer.PollForOne()) {
    


    promag commented at 3:24 pm on May 9, 2018:
    Maybe add batch support?
  7. jnewbery commented at 3:32 pm on May 9, 2018: member

    Concept ACK.

    My thoughts on your open questions:

    • Do we want this behavior by default? [JN] Yes
    • Do we want to allow an opt-out configuration option that makes log processing synchronous again [JN] Yes
    • The ring buffer implementation allows either overwriting existing data or blocking when at full capacity. In the case of logging, I’ve set it to block and wait for capacity instead of dropping messages. Does this seem right? [JN] I think there are at least three options here:
      1. overwrite existing data
      2. drop new messages
      3. block

    I’d go with (2), but push a final message to the ring buffer saying Logging buffer full - dropping new log messages, and then when the ring buffer is accepting messages again, push a message saying Logging buffer accepting new messages. That makes it very clear to anyone reading the log that messages were dropped between timestamp A and timestamp B.
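
    For illustration, a rough sketch of option (2) with the two marker messages (hypothetical names, not code from this PR; the PR’s actual buffer is the RingBuffer in src/ringbuffer.h):

    #include <cstddef>
    #include <deque>
    #include <mutex>
    #include <string>

    // Sketch of option (2): when full, drop new messages but leave markers
    // so any gap is visible to someone reading the log afterwards.
    class DroppingLogBuffer
    {
    public:
        explicit DroppingLogBuffer(std::size_t capacity) : m_capacity(capacity) {}

        void Push(std::string line)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (m_lines.size() >= m_capacity) {
                if (!m_dropping) {
                    // Replace the newest buffered line with the "gap starts here" marker.
                    m_lines.back() = "Logging buffer full - dropping new log messages\n";
                    m_dropping = true;
                }
                return;  // drop the new message
            }
            if (m_dropping) {
                m_lines.push_back("Logging buffer accepting new messages\n");
                m_dropping = false;
                if (m_lines.size() >= m_capacity) return;  // marker took the last free slot
            }
            m_lines.push_back(std::move(line));
        }

        // Called by the logging thread: take everything buffered so far.
        std::deque<std::string> Drain()
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            std::deque<std::string> out;
            out.swap(m_lines);
            return out;
        }

    private:
        std::mutex m_mutex;
        std::deque<std::string> m_lines;
        const std::size_t m_capacity;
        bool m_dropping{false};
    };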
  8. practicalswift commented at 5:48 pm on May 9, 2018: contributor

    Concept ACK!

    Agree with @jnewbery regarding the open questions.

    Benchmarks would be interesting.

  9. MarcoFalke commented at 6:17 pm on May 9, 2018: member
    Wouldn’t blocking be the status-quo and thus preferable for now?
  10. jamesob force-pushed on May 9, 2018
  11. in src/ringbuffer.h:23 in 3c13cef155 outdated
    18+/**
    19+ * A threadsafe circular buffer.
    20+ *
    21+ * Allows optional blocking behavior instead of overwriting when at capacity.
    22+ */
    23+template<typename ET, int S, typename ST=int>
    


    donaloconnor commented at 10:17 pm on May 9, 2018:
    Do we really need the size type template parameter, especially if you are passing an int through as a value? size_t makes more sense to me, since std::array uses size_t and signed sizes (int) don’t make sense here.

    jamesob commented at 6:47 pm on May 10, 2018:
    Yep, agree. Fixed.
  12. in src/ringbuffer.h:38 in 3c13cef155 outdated
    33+    /** @return the number of elements in the buffer. */
    34+    inline size_type Size() const {
    35+        std::lock_guard<std::mutex> lock(m_lock);
    36+        return m_size;
    37+    }
    38+    inline size_type MaxSize() const { return S; }
    


    donaloconnor commented at 10:19 pm on May 9, 2018:
    constexpr ? I guess inline helps some here.

    jamesob commented at 6:47 pm on May 10, 2018:
    Done.
  13. in src/ringbuffer.h:111 in 3c13cef155 outdated
    106+
    107+    /** Number of elements in the buffer. */
    108+    size_type m_size;
    109+
    110+    /** Index at which the next insertion will happen. */
    111+    std::atomic<ST> m_insert_at_idx;
    


    donaloconnor commented at 10:22 pm on May 9, 2018:
    it doesn’t make sense to have these (m_insert_at_idx and m_read_at_idx) as std::atomic if they are protected by the mutex

    jamesob commented at 6:47 pm on May 10, 2018:
    Good point, fixed.
  14. in src/ringbuffer.h:134 in 3c13cef155 outdated
    129+    value_type AssumeLockedPopFront()
    130+    {
    131+        assert(m_size > 0);
    132+
    133+        value_type data = std::move(m_buffer[m_read_at_idx]);
    134+        m_buffer[m_read_at_idx] = value_type();
    


    donaloconnor commented at 10:25 pm on May 9, 2018:
    if the object is movable it will be moved and the value will be empty already, but if it’s not movable and it’s copied then I guess there’s no big deal in leaving the data present? Seems unnecessary to construct an empty object in a freed slot again.

    jamesob commented at 6:47 pm on May 10, 2018:
    Agree, fixed.
  15. in src/ringbuffer.h:51 in 3c13cef155 outdated
    46+    /**
    47+     * Push an element into the buffer.
    48+     *
    49+     * @arg no_overwrite  If true, block and wait to write instead of overwriting existing data.
    50+     */
    51+    size_type PushBack(value_type v, bool no_overwrite = true)
    


    donaloconnor commented at 10:28 pm on May 9, 2018:
    I think having a move version of PushBack alongside a const-reference version would make more sense, to avoid unnecessary copying: PushBack(value_type&& v) and PushBack(const value_type& v)

    jamesob commented at 5:07 pm on May 10, 2018:
    Good catch. Shouldn’t const value_type& bind to both rvalues and lvalues, though?

    jamesob commented at 6:47 pm on May 10, 2018:
    Fixed.
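
    For what it’s worth, a tiny standalone illustration (hypothetical class, not the PR’s RingBuffer): a const reference does bind to both lvalues and rvalues, but it always copies, while the rvalue overload lets temporaries be moved.

    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical container with both PushBack overloads discussed above.
    class Buffer
    {
    public:
        void PushBack(const std::string& v) { m_data.push_back(v); }        // copies
        void PushBack(std::string&& v) { m_data.push_back(std::move(v)); }  // moves

    private:
        std::vector<std::string> m_data;
    };

    int main()
    {
        Buffer buf;
        std::string line = "kept by the caller";
        buf.PushBack(line);                 // lvalue: copied, 'line' still usable
        buf.PushBack("temporary message");  // rvalue: the temporary std::string is moved
    }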
  16. in src/logging.cpp:294 in 4e8af464b3 outdated
    289+        }
    290+    }
    291+
    292+    void FlushAll()
    293+    {
    294+        for (LogArgs s : log_buffer.PopAll()) {
    


    donaloconnor commented at 10:45 pm on May 9, 2018:
    LogArgs& s to avoid copies

    jamesob commented at 6:47 pm on May 10, 2018:
    Good catch, fixed.
  17. in src/ringbuffer.h:104 in 4e8af464b3 outdated
    100+        }
    101+        return std::move(values);
    102+    }
    103+
    104+private:
    105+    std::array<value_type, S> m_buffer;
    


    donaloconnor commented at 10:57 pm on May 9, 2018:

    I feel we should use dynamic/heap memory for this, for the following reasons:

    1. This is only allocated once on startup and is of fixed size.
    2. Passing a large size to this array can cause two problems:
      1. On the stack it can overflow (if used later by someone else)
      2. In global/static data it will grow the executable size.

    Anyway, just what I feel. Maybe std::array is okay in this case though. I’m just imagining uses of this class further down the line.


    jamesob commented at 8:16 pm on May 10, 2018:
    I’m going to leave this as-is for now since I don’t think it’s a real problem, but if you have thoughts to the contrary I’m curious to hear.

    donaloconnor commented at 8:30 pm on May 10, 2018:
    Sounds good to me thanks! I went a bit overboard with the comments anyway.

    jamesob commented at 8:31 pm on May 10, 2018:
    No, I thought your feedback was very useful - thanks!

    skeees commented at 10:11 pm on May 12, 2018:
    I think this is fine - if you want it on the heap you can alloc the entire RingBuffer on the heap
  18. donaloconnor changes_requested
  19. donaloconnor commented at 10:58 pm on May 9, 2018: contributor

    Concept ACK

    Good idea :)

    Just a few things I’ve spotted.

    Regarding your questions:

    Do we want this behavior by default? I think so, given that it seems safer and more performant than doing log processing synchronously with Important Stuff.

    Yes

    Do we want to allow an opt-out configuration option that makes log processing synchronous again?

    I’m a bit against adding config options for this kind of thing. It should either work in all cases or not be used. I think for logging we should stick to one version and not make it user-configurable.

    The ring buffer implementation allows either overwriting existing data or blocking when at full capacity. In the case of logging, I’ve set it to block and wait for capacity instead of dropping messages. Does this seem right?

    I’m very against dropping logging data. It shouldn’t happen. The threads pushing logs should wait if it does block but the buffer should be big enough that this is rare anyway.

  20. jamesob force-pushed on May 10, 2018
  21. jamesob force-pushed on May 10, 2018
  22. jamesob commented at 8:30 pm on May 10, 2018: member

    I’ve pushed changes that address @donaloconnor’s good feedback (thanks, Donal).

    Still trying to decide what to do re: blocking vs. dropping messages; I’m inclined towards the approach that @jnewbery suggests since it seems to be the safest.

  23. jamesob force-pushed on May 11, 2018
  24. in src/ringbuffer.h:23 in 354aac5a69 outdated
    18+/**
    19+ * A threadsafe circular buffer.
    20+ *
    21+ * Allows optional blocking behavior instead of overwriting when at capacity.
    22+ */
    23+template<typename ET, int S>
    


    skeees commented at 10:07 pm on May 12, 2018:
    consider making these template parameter names less abbreviated
  25. in src/ringbuffer.h:79 in 354aac5a69 outdated
    74+     *
    75+     * @return false if the poll was interrupted without executing func.
    76+     */
    77+    std::unique_ptr<value_type> PollForOne(int poll_interval_ms = 200)
    78+    {
    79+        while (!m_stop_waiting.load()) {
    


    skeees commented at 10:14 pm on May 12, 2018:
    this approach seems strange - why not just use another condition variable the same way you do when you push things to the buffer?

    skeees commented at 10:27 pm on May 12, 2018:
    if you are doing this just to handle interruption on shutdown - you can use boost::thread::interrupt() with a boost::condition_variable and catch a boost::thread_interrupted exception instead of polling in a loop - it’s another option

    jamesob commented at 2:25 pm on May 14, 2018:
    I think we’re trying to move away from boost’s thread stuff, but your general suggestion holds - we can probably just flip a condition variable + boolean in Shutdown() to get the same sort of behavior. The polling was a pretty gross hack.
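
    A rough sketch of that shape (hypothetical names and std::string payloads, not code from the branch):

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>

    // Sketch: the consumer blocks on a condition variable; Shutdown() flips a
    // flag and notifies, so there is no fixed-interval polling loop.
    class InterruptibleLogQueue
    {
    public:
        void Push(std::string line)
        {
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_queue.push(std::move(line));
            }
            m_cv.notify_one();
        }

        // Blocks until a line is available or shutdown was requested.
        // Returns false only when woken for shutdown with nothing left to pop.
        bool WaitPop(std::string& out)
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cv.wait(lock, [this] { return m_stop || !m_queue.empty(); });
            if (m_queue.empty()) return false;
            out = std::move(m_queue.front());
            m_queue.pop();
            return true;
        }

        void Shutdown()
        {
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_stop = true;
            }
            m_cv.notify_all();
        }

    private:
        std::mutex m_mutex;
        std::condition_variable m_cv;
        std::queue<std::string> m_queue;
        bool m_stop{false};
    };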
  26. in src/ringbuffer.h:33 in 354aac5a69 outdated
    28+
    29+    RingBuffer() : m_size(0), m_insert_at_idx(0), m_read_at_idx(0), m_stop_waiting(false) { }
    30+    ~RingBuffer() { }
    31+
    32+    /** @return the number of elements in the buffer. */
    33+    inline size_t Size() const {
    


    skeees commented at 10:19 pm on May 12, 2018:
    inline seems overkill here?

    skeees commented at 10:22 pm on May 12, 2018:
    also - if you wanna do the size_t stuff, tradition/precedent would suggest doing using size_type = size_t
  27. in src/ringbuffer.h:97 in 354aac5a69 outdated
    92+    std::vector<value_type> PopAll()
    93+    {
    94+        std::unique_lock<std::mutex> lock(m_lock);
    95+        std::vector<value_type> values;
    96+
    97+        while (m_size > 0 && m_read_at_idx != m_insert_at_idx) {
    


    skeees commented at 10:20 pm on May 12, 2018:
    sufficient here to just do the size check?

    skeees commented at 10:45 pm on May 12, 2018:
    also - you could just do this as two range copies: one from read_at to min(read_at + m_size, S), and another from 0 to m_size minus the size of the range you copied previously
  28. in src/ringbuffer.h:60 in 354aac5a69 outdated
    55+        }
    56+        m_buffer[m_insert_at_idx] = v;
    57+
    58+        // We're overwriting existing data, so increment the read index too.
    59+        if (m_size > 0 && m_insert_at_idx == m_read_at_idx) m_read_at_idx = IncrementIdx(m_read_at_idx);
    60+        if (m_size < S) ++m_size;
    


    skeees commented at 10:26 pm on May 12, 2018:
    more simply written as `if (m_size == S) Increment else ++size`?
  29. in src/ringbuffer.h:77 in 354aac5a69 outdated
    72+     * Block until the buffer has content to process, polling periodically, and
    73+     * then return a pointer to an element. Allow interrupts by using SignalStopWaiting().
    74+     *
    75+     * @return false if the poll was interrupted without executing func.
    76+     */
    77+    std::unique_ptr<value_type> PollForOne(int poll_interval_ms = 200)
    


    skeees commented at 10:28 pm on May 12, 2018:
    why a pointer here? if you want to show that this function can fail consider this signature: bool PollForOne(value_type& popped)

    jamesob commented at 2:27 pm on May 14, 2018:
    Ah yeah, that totally makes sense.
  30. in src/ringbuffer.h:75 in 354aac5a69 outdated
    70+
    71+    /**
    72+     * Block until the buffer has content to process, polling periodically, and
    73+     * then return a pointer to an element. Allow interrupts by using SignalStopWaiting().
    74+     *
    75+     * @return false if the poll was interrupted without executing func.
    


    skeees commented at 10:28 pm on May 12, 2018:
    needs update
  31. in src/ringbuffer.h:39 in 354aac5a69 outdated
    34+        std::lock_guard<std::mutex> lock(m_lock);
    35+        return m_size;
    36+    }
    37+    constexpr size_t MaxSize() const { return S; }
    38+
    39+    value_type PopFront()
    


    skeees commented at 10:33 pm on May 12, 2018:
    afaict this method is unsafe right now: AssumeLockedPopFront (called internally) will assert if the buffer is empty, and there is currently no way to guarantee that I only call this method when there is something to pop
  32. in src/ringbuffer.h:29 in 354aac5a69 outdated
    24+class RingBuffer
    25+{
    26+public:
    27+    using value_type = ET;
    28+
    29+    RingBuffer() : m_size(0), m_insert_at_idx(0), m_read_at_idx(0), m_stop_waiting(false) { }
    


    skeees commented at 10:39 pm on May 12, 2018:
    consider dropping m_read_at_idx - it can be expressed as a method (m_insert_at_idx + m_size) % MaxSize() - one less variable to ensure that you properly update

    jamesob commented at 2:34 pm on May 14, 2018:
    I don’t think that works. Let’s say we instantiate a buffer with MaxSize() == 10 and add 5 elements, making m_insert_at_idx == 5. I’d want to start reading from index 0, but your formula gives me 5 + 4 % 10 = 9, which isn’t right. Maybe I’m missing something here.

    skeees commented at 2:40 pm on May 14, 2018:
    oh sorry - i flipped them - m_insert_at_idx == (m_read_at_idx + m_size) % MaxSize()
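
    A quick worked check of the corrected invariant, using the example values from this exchange (capacity 10, five elements pushed into an empty buffer):

    #include <cassert>

    int main()
    {
        // Hypothetical state: reads start at index 0, the next insert lands at index 5.
        const int max_size = 10;
        const int read_at_idx = 0;
        const int size = 5;
        const int insert_at_idx = 5;

        // The corrected form holds, so one of the two indices could be derived
        // instead of being stored and updated separately.
        assert(insert_at_idx == (read_at_idx + size) % max_size);
    }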
  33. in src/ringbuffer.h:95 in 354aac5a69 outdated
    90+
    91+    /** @return all values currently in the buffer after popping them. */
    92+    std::vector<value_type> PopAll()
    93+    {
    94+        std::unique_lock<std::mutex> lock(m_lock);
    95+        std::vector<value_type> values;
    


    skeees commented at 10:43 pm on May 12, 2018:
    can preallocate to m_size
  34. in src/logging.cpp:278 in 354aac5a69 outdated
    273@@ -271,3 +274,44 @@ void BCLog::Logger::ShrinkDebugFile()
    274     else if (file != nullptr)
    275         fclose(file);
    276 }
    277+
    278+namespace async_logging {
    


    skeees commented at 10:51 pm on May 12, 2018:
    you might as well make this a class instead of essentially a static object. Who knows when we might want to have separate instances logging different modules to different log files, and it’s very easy to just make this a class now

    jamesob commented at 2:35 pm on May 14, 2018:
    Yeah, fair point.
  35. in src/logging.cpp:301 in 354aac5a69 outdated
    296+        }
    297+    }
    298+
    299+    void Queue(const std::string& str)
    300+    {
    301+        std::call_once(flush_logs_thread_started, [](){
    


    skeees commented at 10:52 pm on May 12, 2018:
    much clearer to just have a start method, or RAII the class if you can?
  36. in src/logging.cpp:304 in 354aac5a69 outdated
    299+    void Queue(const std::string& str)
    300+    {
    301+        std::call_once(flush_logs_thread_started, [](){
    302+            flush_logs_thread.reset(new std::thread(ConsumeLogs));
    303+        });
    304+        log_buffer.PushBack(std::move(str));
    


    skeees commented at 10:52 pm on May 12, 2018:
    i think you want to std::forward here and declare the arg const std::string&&

    jamesob commented at 2:36 pm on May 14, 2018:
    Ah yeah, thanks.
  37. in src/ringbuffer.h:67 in 354aac5a69 outdated
    62+
    63+        return m_size;
    64+    }
    65+
    66+    /** Interrupt anyone waiting for reads. */
    67+    void SignalStopWaiting() {
    


    skeees commented at 3:52 pm on May 13, 2018:
    to handle shutdown safely you need to have a way to interrupt writers who might be waiting for empty space too
  38. in src/logging.cpp:307 in 354aac5a69 outdated
    302+            flush_logs_thread.reset(new std::thread(ConsumeLogs));
    303+        });
    304+        log_buffer.PushBack(std::move(str));
    305+    }
    306+
    307+    void Shutdown()
    


    skeees commented at 3:59 pm on May 13, 2018:
    if for whatever reason there are more logs generated after you’ve flushed in this method, they won’t ever get to the file. I think you mentioned somewhere having an option to enable synchronous logging. You should probably flip that flag here, so that anything logged after the async logger is shut down still makes its way into the file; an added benefit is that you can be a bit less careful about when in the shutdown sequence the logger is terminated

    jamesob commented at 2:37 pm on May 14, 2018:
    Yep, good point.
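
    For illustration, a rough sketch of that fallback with hypothetical globals (the real patch would push into its ring buffer and join the consumer thread rather than flushing a plain vector):

    #include <atomic>
    #include <cstdio>
    #include <string>
    #include <vector>

    std::atomic<bool> g_log_async{true};
    std::vector<std::string> g_pending;  // stand-in for the ring buffer (not thread-safe here)

    void WriteSynchronously(const std::string& line)
    {
        std::fwrite(line.data(), 1, line.size(), stderr);
        std::fflush(stderr);
    }

    void LogLine(const std::string& line)
    {
        // Before shutdown: hand off to the logging thread. After shutdown:
        // write directly so late messages still end up somewhere.
        if (g_log_async.load()) {
            g_pending.push_back(line);
        } else {
            WriteSynchronously(line);
        }
    }

    void ShutdownLogging()
    {
        g_log_async.store(false);  // later callers fall back to synchronous writes
        for (const std::string& line : g_pending) WriteSynchronously(line);
        g_pending.clear();         // the real code would also join the consumer thread
    }

    int main()
    {
        LogLine("logged before shutdown\n");
        ShutdownLogging();
        LogLine("logged after shutdown - still written\n");
    }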
  39. in src/bitcoind.cpp:196 in 354aac5a69 outdated
    190@@ -191,5 +191,10 @@ int main(int argc, char* argv[])
    191     // Connect bitcoind signal handlers
    192     noui_connect();
    193 
    194-    return (AppInit(argc, argv) ? EXIT_SUCCESS : EXIT_FAILURE);
    195+    int status = AppInit(argc, argv);
    196+
    197+    if (!status) {
    


    skeees commented at 4:00 pm on May 13, 2018:
    i think if you do shutdown here it will only work for bitcoind but not qt, but in init.cpp it will work for both? not entirely sure though
  40. in src/init.cpp:294 in 354aac5a69 outdated
    290@@ -291,6 +291,7 @@ void Shutdown()
    291     globalVerifyHandle.reset();
    292     ECC_Stop();
    293     LogPrintf("%s: done\n", __func__);
    294+    async_logging::Shutdown();
    


    skeees commented at 4:12 pm on May 13, 2018:
    oh - why do it in both? if you are just concerned about capturing all logs, see my comment on the Shutdown() method below
  41. in src/ringbuffer.h:56 in 354aac5a69 outdated
    51+    {
    52+        std::unique_lock<std::mutex> lock(m_lock);
    53+        if (no_overwrite && m_size == S) {
    54+            m_read_condvar.wait(lock, [this]() { return this->m_size < S; });
    55+        }
    56+        m_buffer[m_insert_at_idx] = v;
    


    skeees commented at 4:18 pm on May 13, 2018:
    same thing as in a prior comment - use a forward with an rvalue reference
  42. in src/ringbuffer.h:83 in 354aac5a69 outdated
    78+    {
    79+        while (!m_stop_waiting.load()) {
    80+            {
    81+                std::unique_lock<std::mutex> lock(m_lock);
    82+                if (m_size > 0) {
    83+                    return std::unique_ptr<value_type>(new value_type(AssumeLockedPopFront()));
    


    skeees commented at 4:19 pm on May 13, 2018:
    i think you are doing an unnecessary copy here
  43. skeees commented at 4:26 pm on May 13, 2018: contributor

    left some comments - two general ones though:

    • you really wrote a thread-safe ringbuffer - if you think this will be useful elsewhere, consider having just a pure implementation of the ringbuffer with a derived class (or wrapper) that adds the synchronization logic
    • Some of your API naming/consistency is a bit confusing. You have:
      PushBack() - might block if no_overwrite is set
      PopFront() - never blocks - the user must know data is available
      PollForOne() - blocks, might return null

    On the block vs drop debate - I think I would vote in the blocking camp since that is essentially current behavior, followed by dropping older messages

  44. jamesob commented at 2:56 pm on May 14, 2018: member

    Thanks for the thorough review and good feedback here, @skeees. I’ll address most of your comments in an upcoming push.

    Sadly, the benchmarks aren’t looking good on this one. Either my implementation of the ringbuffer is dog slow or this Concept is eminently NACKable.

    Doing an IBD to height 250,000 from one (chain-synced) virtual machine to another (unsynced) over my local network, this PR performed roughly 77% slower than master 196c5a (9m31s vs. 16m57s). That’s a baffling deficit but I’ve been able to reproduce it consistently.

    I’m going to try applying @TheBlueMatt’s previous work (https://github.com/bitcoinfibre/bitcoinfibre/commit/6b6a3aef0663775b63bac7d0aa07ec5fc4eb9fc9) to master and see if that performs any better. If it doesn’t, I’m going to just abandon async logging and continue work on a periodic benchmarking system.

  45. skeees commented at 3:16 pm on May 14, 2018: contributor
    I’d do some profiling / tweaking before you abandon this entirely. I’d think this should be at a minimum equivalent - and potentially substantially faster than the current way things are logged.
  46. jamesob commented at 6:59 pm on May 14, 2018: member

    @TheBlueMatt’s patch (https://github.com/jamesob/bitcoin/commit/7ab904ad933a7cec5c29cf4daf3f91efff93c676) is better than mine, but still a minute (i.e. 10%) slower than master on an SSD:

    master

    vagrant@bench:/tmp/2018-05-09-master-6186a/bitcoin$ time ./src/bitcoind -datadir=`pwd`/data -conf=`pwd`/bitcoin.conf -debug=all -printtoconsole=0 -connect=0 -addnode=10.0.0.10 -listen=1 -stopatheight=250000

    real    9m33.070s
    user    8m9.261s
    sys     3m47.927s
    

    bluematt-asynclog

    vagrant@bench:/tmp/2018-05-11-2018-05-asynclog-13145/bitcoin$ rm -rf data && mkdir data && time ./src/bitcoind -datadir=`pwd`/data -conf=`pwd`/bitcoin.conf -debug=all -printtoconsole=0 -connect=0 -addnode=10.0.0.10 -listen=1 -stopatheight=250000

    real    10m26.887s
    user    9m3.219s
    sys     4m49.201s
    
  47. skeees commented at 12:49 pm on May 15, 2018: contributor

    I was curious about the unexpectedly slow performance and tried something slightly different - https://github.com/skeees/bitcoin/commit/1bf2307b9bf406cc199f3671e856ecd7a582b2a8

    it’s a bit messy, but good enough for benchmarking, which seems to suggest it performs slightly better than the current logging, especially for short workloads (e.g. a reindex with a stopatheight of 150k) - though I imagine this effect gets washed away on e.g. a full IBD.

  48. jamesob commented at 8:01 pm on May 15, 2018: member

    Nice, @skeees! That approach performs the best so far in terms of async logging implementations, though it’s still roughly on par with master in my benchmark:

    vagrant@bench:/tmp/2018-05-11-2018-05-asynclog-13145/bitcoin$ rm -rf data && mkdir data && time ./src/bitcoind -datadir=`pwd`/data -conf=`pwd`/bitcoin.conf -debug=all -printtoconsole=0 -connect=0 -addnode=10.0.0.10 -listen=1 -stopatheight=250000
    terminate called without an active exception
    Aborted (core dumped)

    real    9m37.566s
    user    8m27.668s
    sys     3m38.395s
    

    (the core dump is just an unclean shutdown of the logging thread). @skeees suggested some ideas for further improvements on his draft, which he or I may fool around with.

  49. skeees commented at 6:50 pm on May 16, 2018: contributor

    I cleaned this up a bit more and updated my branch: https://github.com/skeees/bitcoin/tree/async-log-2 (still does not shut down cleanly)

    For benchmarks like the one below (a reindex of the first 100k blocks), I see a substantial improvement that slowly fades away; longer-duration runs (e.g. a large IBD) are effectively equivalent, which suggests this may have some performance benefit for short bursty stuff that generates lots of logs

    (~/src/bitcoin:git:master)% time ./src/bitcoind -stopatheight=100000 -noprinttoconsole -reindex -debug
    ./src/bitcoind -stopatheight=100000 -noprinttoconsole -reindex -debug  75.24s user 20.76s system 79% cpu 2:00.70 total
    (~/src/bitcoin:git:async-log-2)% time ./src/bitcoind -stopatheight=100000 -noprinttoconsole -reindex -debug
    ./src/bitcoind -stopatheight=100000 -noprinttoconsole -reindex -debug  73.26s user 9.93s system 107% cpu 1:17.38 total
    
  50. jamesob force-pushed on May 21, 2018
  51. jamesob force-pushed on May 21, 2018
  52. jamesob commented at 8:32 pm on May 21, 2018: member

    I think I’m giving up the ghost on this one. I’ve taken @skeees’ latest work, fixed shutdown, and simplified the code a bit, but I still can’t get benchmarks to show improvement.

    Below is a round of three reindexes to height 100,000:

    master (https://github.com/bitcoin/bitcoin/commit/196c5a947a071f7c9c5b211f0a47787a7d540ecb)

    vagrant@bench:/tmp/2018-05-09-master-6186a/bitcoin$ for _ in `seq 3`; do time ./src/bitcoind -datadir=`pwd`/data -conf=`pwd`/bitcoin.conf -debug=all -printtoconsole=0 -connect=0 -addnode=10.0.0.10 -listen=1 -stopatheight=100000 -reindex -logasync=0; done

    real    1m21.080s
    user    0m33.459s
    sys     0m37.487s

    real    1m4.716s
    user    0m34.033s
    sys     0m33.567s

    real    1m23.858s
    user    0m34.043s
    sys     0m36.156s
    

    this branch (https://github.com/bitcoin/bitcoin/pull/13200/commits/34000b1aaa626f1547ad80693e5b636857426f65)

    vagrant@bench:/tmp/2018-05-11-2018-05-asynclog-13145/bitcoin$ for _ in `seq 3`; do time ./src/bitcoind -datadir=`pwd`/data -conf=`pwd`/bitcoin.conf -debug=all -printtoconsole=0 -connect=0 -addnode=10.0.0.10 -listen=1 -stopatheight=100000 -reindex; done

    real    1m4.269s
    user    0m31.258s
    sys     0m33.798s

    real    1m30.629s
    user    0m31.691s
    sys     0m40.961s

    real    1m14.528s
    user    0m32.676s
    sys     0m38.040s
    

    skeees version (https://github.com/skeees/bitcoin/commit/57dc3fcdc0b90fa4f0bff004eed8ca5ae21ad6bd)

    vagrant@bench:/tmp/2018-05-11-2018-05-asynclog-13145/bitcoin$ for _ in `seq 3`; do time ./src/bitcoind -datadir=`pwd`/data -conf=`pwd`/bitcoin.conf -debug=all -printtoconsole=0 -connect=0 -addnode=10.0.0.10 -listen=1 -stopatheight=100000 -reindex; done
    terminate called without an active exception
    Aborted (core dumped)

    real    1m18.180s
    user    0m33.764s
    sys     0m39.323s
    terminate called without an active exception
    Aborted (core dumped)

    real    3m17.639s
    user    0m45.444s
    sys     2m21.804s
    terminate called without an active exception
    Aborted (core dumped)

    real    4m59.937s
    user    0m48.212s
    sys     3m35.134s
    

    Can’t tell what’s going on with those last two runs.

    In any case, I don’t think the numbers here justify the complexity of this changeset. It’s surprising to me that we aren’t seeing more improvement, but I’ve got other stuff to do. I’m going to abandon this PR for now, but if anyone wants to pick it up they’re welcome to.

  53. jamesob closed this on May 21, 2018

  54. practicalswift commented at 8:42 pm on May 21, 2018: contributor
    I would like to thank you for doing this experimentation @jamesob. Negative results give insights too. Thanks!
  55. jamesob commented at 2:13 pm on May 23, 2018: member
    Reopening because @skeees has some interesting bench results…
  56. jamesob reopened this on May 23, 2018

  57. jamesob force-pushed on May 23, 2018
  58. skeees commented at 2:40 pm on May 23, 2018: contributor

    wrote a quick benchmarking script here: https://gist.github.com/skeees/93d3ab3ce8ceabfa1b68df51d3b36b60 - it compares the different logging approaches mentioned above, specifically a reindex and an IBD up to a fixed height

    Some sample results below (we’ve now had a couple of independent reproductions across a couple of different machines) - the latest commit on this PR is consistently faster than the existing implementations on microbenchmarks (100k blocks), and equivalent if you set the height to a larger number (e.g. HEIGHT=250000 in the script)

    I think, given the numbers below, there’s decent evidence that this will improve performance for bursty things and is worth pursuing. A couple of things still need to be cleaned up (shutdown, mostly)

    (total is wall clock time)

    benchmarking reindex _pin-1805-asynclog-master
    59.65s user 7.97s system 197% cpu 34.296 total
    59.55s user 7.88s system 198% cpu 33.963 total
    benchmarking reindex _pin-1805-asynclog-bluematt
    62.03s user 10.91s system 224% cpu 32.530 total
    62.88s user 11.10s system 227% cpu 32.569 total
    benchmarking reindex _pin-1805-asynclog-skeees-jamesob
    57.02s user 5.61s system 232% cpu 26.952 total
    56.61s user 5.49s system 232% cpu 26.750 total

    benchmarking ibd _pin-1805-asynclog-master
    68.38s user 11.55s system 175% cpu 45.427 total
    68.25s user 11.60s system 182% cpu 43.749 total
    benchmarking ibd _pin-1805-asynclog-bluematt
    72.10s user 15.20s system 207% cpu 42.017 total
    72.34s user 15.10s system 207% cpu 42.170 total
    benchmarking ibd _pin-1805-asynclog-skeees-jamesob
    65.20s user 8.46s system 210% cpu 35.001 total
    65.10s user 8.88s system 209% cpu 35.307 total
    
  59. logging: perform logfile writes in a separate thread
    Instead of flushing logs to disk synchronously within the
    originating thread, push the log message into a ring buffer to
    be consumed and written to disk by a separate, logging-specific
    thread.
    
    Co-authored-by: James O'Beirne <james@chaincode.com>
    74ae99112e
  60. jamesob force-pushed on May 23, 2018
  61. practicalswift commented at 5:15 pm on May 23, 2018: contributor

    @skeees Is _pin-1805-asynclog-master the base case – i.e. before this PR?

    Do you know roughly how many log messages that were printed during the reindex and ibd rounds respectively?

  62. skeees commented at 5:32 pm on May 23, 2018: contributor

    yeah - exactly. master is a recent master (6916024768ec57a00f54224640ab4e4871d2a30a), and then I took Matt’s and my+jamesob’s changes and rebased them on top of that master - those are the 3 cases. You should be able to run that script if you wanna reproduce.

    good question - looks like the logs were ~150MB each run, about 1.25 million lines

  63. DrahtBot closed this on Jul 22, 2018

  64. DrahtBot commented at 11:49 pm on July 22, 2018: member
  65. DrahtBot reopened this on Jul 22, 2018

  66. jamesob commented at 1:36 pm on August 21, 2018: member
    Benchmarking with bitcoinperf shows a speedup of 4.6% on IBD up to height 200,000 with this change (details here).
  67. DrahtBot commented at 6:55 pm on September 7, 2018: member
    • #14209 (logging: Replace LogPrint macros with regular functions by MarcoFalke)
    • #13088 (Log early messages with -printtoconsole by ajtowns)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  68. gmaxwell commented at 10:17 am on September 9, 2018: contributor

    I’d go with (2), but push a final message to the ring buffer saying Logging buffer full - dropping new log messages, and then when the ring buffer is accepting messages again, push a message saying Logging buffer accepting new messages

    How about (4): store a sequence number and always overwrite. When reading out, notice when there were missed messages via gaps in the sequence numbers and log “X missed log entries”.
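
    A mutex-protected sketch of that variant, with monotonically increasing counters standing in for per-entry sequence numbers (hypothetical, not from any of the branches in this thread):

    #include <array>
    #include <cstdint>
    #include <mutex>
    #include <string>

    // Sketch: the writer always overwrites; the reader detects how far it fell
    // behind and reports "X missed log entries" instead of ever blocking a writer.
    class SeqRingBuffer
    {
    public:
        void Push(std::string line)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_slots[m_next_seq % m_slots.size()] = std::move(line);
            ++m_next_seq;
        }

        // Pops the oldest surviving message, or a gap notice if entries were
        // overwritten before the reader got to them. Returns false when empty.
        bool Pop(std::string& out)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (m_next_read == m_next_seq) return false;
            if (m_next_seq - m_next_read > m_slots.size()) {
                const uint64_t missed = m_next_seq - m_next_read - m_slots.size();
                m_next_read += missed;  // skip the overwritten range
                out = std::to_string(missed) + " missed log entries\n";
                return true;
            }
            out = std::move(m_slots[m_next_read % m_slots.size()]);
            ++m_next_read;
            return true;
        }

    private:
        std::array<std::string, 8> m_slots;  // tiny capacity, just for illustration
        uint64_t m_next_seq{0};              // sequence number of the next write
        uint64_t m_next_read{0};             // sequence number of the next read
        std::mutex m_mutex;
    };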

    Aside: benchmarking on the first 100k blocks is something I’d strongly advise against. It’s an almost completely different load from reality. The first couple hundred thousand blocks are almost completely empty, so tests against them aren’t much different from a purely synthetic bench_bitcoin run. If you’re doing a lot of this sort of testing, it’s useful to bring a node up to height 450k, stop it, back it up, and benchmark reindexing up to 500k.

  69. DrahtBot commented at 9:29 am on September 13, 2018: member
  70. DrahtBot added the label Needs rebase on Sep 13, 2018
  71. jamesob commented at 3:39 pm on September 14, 2018: member

    If you’re doing a lot of this sort of testing, its useful to bring a node up to height 450k, stop it, back it up and benchmark reindexing up to 500k.

    That sounds much more reasonable. I’ll rerun benches on that part of the chain in the next few weeks.

  72. DrahtBot commented at 8:22 pm on December 14, 2018: member
  73. DrahtBot added the label Up for grabs on Dec 14, 2018
  74. DrahtBot closed this on Dec 14, 2018

  75. laanwj removed the label Needs rebase on Oct 24, 2019
  76. elichai commented at 7:51 pm on July 9, 2020: contributor
    Just a thought I had - idk if there’s lock contention here, but you could remove locking from this entirely and replace it with atomic pointers: swap the buffer pointer with a new, empty buffer pointer, so that the effect on the rest of the threads should be at worst a single cache miss
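
    Roughly this shape, sketched here with a mutex for the writers rather than a fully lock-free atomic pointer swap (which would need extra care to keep in-flight writers safe); the point is that the consumer takes the whole buffer in O(1) and does the disk I/O without holding the lock:

    #include <mutex>
    #include <string>
    #include <vector>

    // Sketch: writers only pay for a push under a short critical section; the
    // consumer swaps the whole buffer for an empty one and writes it out
    // entirely outside the lock.
    class SwapLogBuffer
    {
    public:
        void Append(std::string line)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_lines.push_back(std::move(line));
        }

        std::vector<std::string> TakeAll()
        {
            std::vector<std::string> batch;
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                batch.swap(m_lines);  // O(1) swap, lock held only briefly
            }
            return batch;             // flush to disk after releasing the lock
        }

    private:
        std::mutex m_mutex;
        std::vector<std::string> m_lines;
    };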
  77. jamesob commented at 8:30 pm on July 9, 2020: member

    Just a thought I had

    I don’t plan to put any more work into this, but this might be a good project to pick up for a newish contributor.

  78. DrahtBot locked this on Feb 15, 2022
