Avoid excessive lock contention in CCheckQueue::Add #23397

pull hebasto wants to merge 2 commits into bitcoin:master from hebasto:211030-contention changing 1 files +15 −7
  1. hebasto commented at 5:26 pm on October 30, 2021: member

    This PR significantly reduces lock contention in the CCheckQueue class by releasing a mutex before calling std::condition_variable::notify_one and std::condition_variable::notify_all.

    From C++ docs:

    The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.

    Related to:

  2. hebasto renamed this:
    211030 contention
    Avoid excessive lock contention in CCheckQueue::Add
    on Oct 30, 2021
  3. hebasto commented at 5:30 pm on October 30, 2021: member
  4. lsilva01 approved
  5. lsilva01 commented at 9:16 pm on October 30, 2021: contributor

    tACK 5bcfd70 on Ubuntu 20.04

    This PR moves LOCK(m_mutex) to a smaller scope, so when code reaches notify_one()or notify_all(), the mutex is already released. It also changes CCheckQueue ::Add to exit earlier if vChecks is empty.

  6. jamesob commented at 2:11 pm on November 1, 2021: member
    @hebasto do you expect there are significant performance implications for this change? Should I benchmark?
  7. hebasto commented at 2:25 pm on November 1, 2021: member

    @jamesob

    @hebasto do you expect there are significant performance implications for this change?

    Not sure about “significant” :)

    Should I benchmark?

    I’ll appreciate it! Also I have a plan to improve CCheckQueue performance further (not here, of course).

  8. in src/checkqueue.h:170 in 0e41699722 outdated
    166@@ -167,15 +167,20 @@ class CCheckQueue
    167     //! Add a batch of checks to the queue
    168     void Add(std::vector<T>& vChecks)
    169     {
    170-        LOCK(m_mutex);
    171-        for (T& check : vChecks) {
    172-            queue.push_back(T());
    173-            check.swap(queue.back());
    174+        unsigned int num_of_new_checks;
    


    sipa commented at 3:31 pm on November 1, 2021:

    unsigned int is typically smaller than the size of a vector. Use size_t or std::vector<T>::size_type perhaps?

    I very much doubt we’ll ever have more than 4 billion checks queued up, but it also seems simpler to make the type can just account for everything.


    hebasto commented at 3:54 pm on November 1, 2021:

    I’ve chosen unsinged int due to https://github.com/bitcoin/bitcoin/blob/5adc5c02800f00d1e6e8812a2b0559b1800e82e9/src/checkqueue.h#L60

    Maybe use size_t for nTodo as well?


    sipa commented at 4:49 pm on November 1, 2021:
    Ah, I see, that makes sense. I think changing to size_t everywhere can be done in a separate PR, if desires.

    JeremyRubin commented at 6:56 pm on November 1, 2021:
    the max number of checks is IIRC something slightly above 100k
  9. in src/checkqueue.h:183 in 0e41699722 outdated
    185-        if (vChecks.size() == 1)
    186+
    187+        if (num_of_new_checks == 1)
    188             m_worker_cv.notify_one();
    189-        else if (vChecks.size() > 1)
    190+        else if (num_of_new_checks > 1)
    


    sipa commented at 3:32 pm on November 1, 2021:
    If you’re touching this code, make it follow style? (no indentation without brances).

    hebasto commented at 3:51 pm on November 1, 2021:
    Style is fixed in the second commit. Or do you want me to follow style here?

    sipa commented at 5:45 pm on November 1, 2021:
    Oh, that’s fine.
  10. sipa commented at 3:33 pm on November 1, 2021: member
    Concept ACK, but left some nits.
  11. in src/checkqueue.h:177 in 0e41699722 outdated
    176+            LOCK(m_mutex);
    177+            for (T& check : vChecks) {
    178+                queue.push_back(T());
    179+                check.swap(queue.back());
    180+            }
    181+            num_of_new_checks = vChecks.size();
    


    martinus commented at 6:24 pm on November 1, 2021:
    nit: I think you could also write const auto num_of_new_checks = vChecks.size(); above, saves one line of code in the lock and doesn’t keep an uninitialized variable around for a few lines.

    sipa commented at 7:09 pm on November 1, 2021:
    Actually none of that is necessary. vChecks doesn’t change size during the adding loop. The old vChecks.size() based checks can be kept.

    hebasto commented at 8:48 pm on November 1, 2021:

    vChecks doesn’t change size during the adding loop.

    Correct. But vChecks is passed in CCheckQueue::Add by a non-const reference, and that makes possible its modification by a concurrent thread between leaving a critical section and checking num_of_new_checks == 1.

    Although, the only caller is located in CChainState::ConnectBlock which is guarded by the cs_main mutex. Therefore, you are right.

    The old vChecks.size() based checks can be kept.

    Yes, it will work. But while looking into CCheckQueue::Add code, it is not obvious that vChecks.size() is constant in multi-threading environment. I think that adding a local variable makes code more readable and robust considering any possible changes at the caller site(s) in the future.

    OTOH, it seems all could look clearer if we pass vChecks as an rvalue reference using move semantics. What do you think?


    vasild commented at 3:50 pm on November 2, 2021:

    I would never think that a reference argument could be changed by another thread while the function is executing. I hope we don’t do that anywhere in the code. Because how would the function avoid races? Which mutex protects the argument? That would require something like (it does not compile, of course):

    0void func(T& arg GUARDED_BY(some_mutex), ...)
    1{
    2    ...
    3    LOCK(some_mutex);
    4    access arg;
    5    ...
    

    So, I think it is safe to assume that vChecks will not change while Add() is executing. Indeed the only caller passes in a local variable that is not visible by other threads (and then throws it away, so the “swap” is actually a “move”).

    … it seems all could look clearer if we pass vChecks as an rvalue reference using move semantics …

    Yes :) and std::move() each element from vChecks into queue. It is unnecessary to first create a default-constructed object only to swap it with another and throw it away. This is what std::move() is for.


    sipa commented at 4:06 pm on November 2, 2021:

    @hebasto That’s not how const works in C++. The lack thereof means you (the called function in this case) are allowed to modify the argument; it says nothing at all about whether another thread might be modifying the same variable. Even if the argument was const, nothing prevents another thread from holding a mutable reference to the same object.

    That said, passing an argument (const or not) to a function while it is being modified by other threads would be completely non-idiomatic, dangerous code. It is possible, and sometimes necessary of course, but at least you’d expect the passed object to have internal mutexes or other synchronization mechanisms to protect its state in that case; not something the callee should deal with. Virtually every STL function will exhibit undefined behavior if you pass it references to arguments that are being modified concurrently. Outside of very exceptional cases that should raise big red flags, you can very much assume that arguments passed to a function are under exclusive control of that function for the duration of its execution.


    hebasto commented at 4:36 pm on November 2, 2021:

    @sipa Thanks for your detailed explanation.

    Going to address all of the comments.

  12. in src/checkqueue.h:174 in 0e41699722 outdated
    173-            check.swap(queue.back());
    174+        unsigned int num_of_new_checks;
    175+        {
    176+            LOCK(m_mutex);
    177+            for (T& check : vChecks) {
    178+                queue.push_back(T());
    


    martinus commented at 6:25 pm on November 1, 2021:
    nit: maybe not part of this PR, but using queue.emplace_back() instead queue.push_back(T()) would save one move/copy constructor and one destructor because the object could be directly constructed in the queue

    vasild commented at 3:36 pm on November 2, 2021:
    What about queue.push_back(std::move(check))?

    martinus commented at 4:21 pm on November 2, 2021:

    I tried that once but then tests fail, I think for the types used here std::swap is implemented but a move constructor isn’t. I didn’t have a closer look why that fails.

    I think the nicest c++ solution would be to replace the whole loop with

    0queue.insert(queue.end(), std::make_move_iterator(vChecks.begin()), std::make_move_iterator(vChecks.end()));
    

    But this too requires properly implementing move constructor.

  13. martinus approved
  14. martinus commented at 6:34 pm on November 1, 2021: contributor
    utACK 5bcfd70a3b0e008d7a80e37a29fe90887a0f8c66. I doubt that this makes any measurable difference, but I’ve often been wrong with benchmark guess and shorter locks are always better :+1:
  15. JeremyRubin commented at 6:59 pm on November 1, 2021: contributor
    would love to see some thorough testing on this. I could be misremembering, but check cross platform that there’s no reason to continue to hold the lock. IIRC there was some sort of edge condition (maybe in window?) around a notify being able to be missed or something, but the details are like 5-6 years out of my cache :)
  16. in src/checkqueue.h:170 in 5bcfd70a3b outdated
    166@@ -167,16 +167,26 @@ class CCheckQueue
    167     //! Add a batch of checks to the queue
    168     void Add(std::vector<T>& vChecks)
    169     {
    170-        LOCK(m_mutex);
    171-        for (T& check : vChecks) {
    172-            queue.push_back(T());
    173-            check.swap(queue.back());
    174+        if (vChecks.empty()) {
    


    vasild commented at 3:56 pm on November 2, 2021:

    The code below assumes vChecks may be modified by another thread while this is executing (it cannot actually). I guess either remove that assumption or move this access to vChecks under m_mutex (if that mutex it supposed to protect vChecks!?).

    It is a race to call std::vector::empty() while modifying it concurrently.


    hebasto commented at 4:38 pm on November 2, 2021:

    You are right.

    I guess either remove that assumption or move this access to vChecks under m_mutex (if that mutex it supposed to protect vChecks!?).

    The former will go.

  17. Avoid excessive lock contention in CCheckQueue::Add c43aa62343
  18. Exit early for an empty vChecks in CCheckQueue::Add 459e208276
  19. hebasto force-pushed on Nov 3, 2021
  20. hebasto commented at 9:36 am on November 3, 2021: member

    @sipa @martinus @vasild

    Thank you for your reviews and your suggestions!

    Updated 5bcfd70a3b0e008d7a80e37a29fe90887a0f8c66 -> 459e208276a4d1457d37bf41c977e62caf05456d (pr23397.01 -> pr23397.02, diff):

    • addressed all of the comments
  21. vasild approved
  22. vasild commented at 11:47 am on November 3, 2021: member

    ACK 459e208276a4d1457d37bf41c977e62caf05456d

    The CI failure looks unrelated.

  23. jamesob commented at 12:44 pm on November 3, 2021: member

    ACKish - I haven’t reviewed the code, but it works for the benchmarks and so I guess in some sense I’ve tested it. As expected, no big diff in performance.

    #23397 vs. $mergebase (absolute)

    bench name x #23397 $mergebase
    ibd.local.range.500000.540000.total_secs 2 4826.3510 (± 0.4951) 4831.4970 (± 6.8399)
    ibd.local.range.500000.540000.peak_rss_KiB 2 6410470.0000 (± 1750.0000) 6406678.0000 (± 2714.0000)
    ibd.local.range.500000.540000.cpu_kernel_secs 2 345.3800 (± 3.1200) 348.2850 (± 0.8050)
    ibd.local.range.500000.540000.cpu_user_secs 2 30102.6800 (± 5.2600) 30086.9150 (± 24.1150)

    #23397 vs. $mergebase (relative)

    bench name x #23397 $mergebase
    ibd.local.range.500000.540000.total_secs 2 1.000 1.001
    ibd.local.range.500000.540000.peak_rss_KiB 2 1.001 1.000
    ibd.local.range.500000.540000.cpu_kernel_secs 2 1.000 1.008
    ibd.local.range.500000.540000.cpu_user_secs 2 1.001 1.000
  24. martinus commented at 5:23 pm on November 3, 2021: contributor
    ACK 459e208, codereview and tested. I first thought this introduced a segfault in psbt_wallet_tests/psbt_updater_test because that test failed for me, but thats a different issue fixed in #23403.
  25. theStack approved
  26. theStack commented at 2:38 pm on November 24, 2021: member

    Code-review ACK 459e208276a4d1457d37bf41c977e62caf05456d

    Maybe I’m missing something, but IMHO there is currently no possible way that CCheckQueue:Add gets called with an empty vector as argument, as the size always equals the number of inputs of a transaction (see last out parameter of CheckInputScripts). It’d be only empty for coinbase transactions, but for those control.Add(...) doesn’t get called. That said, it doesn’t hurt if the empty check in Add is still there.

  27. MarcoFalke merged this on Nov 29, 2021
  28. MarcoFalke closed this on Nov 29, 2021

  29. sidhujag referenced this in commit 537b8a836d on Nov 29, 2021
  30. hebasto deleted the branch on Nov 29, 2021
  31. RandyMcMillan referenced this in commit a6aa71207f on Dec 23, 2021
  32. PastaPastaPasta referenced this in commit c8374bd577 on Apr 7, 2022
  33. PastaPastaPasta referenced this in commit 25f8e062c0 on Apr 7, 2022
  34. PastaPastaPasta referenced this in commit 572737fff5 on Apr 7, 2022
  35. PastaPastaPasta referenced this in commit 22838d1b8c on Apr 11, 2022
  36. DrahtBot locked this on Nov 29, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-23 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me