Improve shutdown process #11006

pull promag wants to merge 1 commits into bitcoin:master from promag:201708-fast-shutdown changing 1 files +2 −0
  1. promag commented at 3:02 am on August 8, 2017: member

    Improve the shutdown time by not having to wait up to 2 seconds.

    Here is a comparison running wallet.py function tests before this PR:

     02017-08-08 03:25:20.881000 TestFramework (INFO): Initializing test directory /var/folders/1v/8_69hby54nj2k3n6fywt44x80000gn/T/testq_ramjjr
     12017-08-08 03:25:23.853000 TestFramework (INFO): Mining blocks...
     22017-08-08 03:25:24.132000 TestFramework (INFO): test getmemoryinfo
     32017-08-08 03:25:24.559000 TestFramework (INFO): test gettxout
     42017-08-08 03:25:59.858000 TestFramework (INFO): check -rescan
     52017-08-08 03:26:07.735000 TestFramework (INFO): check -reindex
     62017-08-08 03:26:15.751000 TestFramework (INFO): check -zapwallettxes=1
     72017-08-08 03:26:24.105000 TestFramework (INFO): check -zapwallettxes=2
     82017-08-08 03:26:36.694000 TestFramework (INFO): Stopping nodes
     92017-08-08 03:26:43.599000 TestFramework (INFO): Cleaning up
    102017-08-08 03:26:43.612000 TestFramework (INFO): Tests successful
    

    After:

     02017-08-08 03:24:04.319000 TestFramework (INFO): Initializing test directory /var/folders/1v/8_69hby54nj2k3n6fywt44x80000gn/T/testoqeyi50_
     12017-08-08 03:24:07.035000 TestFramework (INFO): Mining blocks...
     22017-08-08 03:24:07.317000 TestFramework (INFO): test getmemoryinfo
     32017-08-08 03:24:07.763000 TestFramework (INFO): test gettxout
     42017-08-08 03:24:25.715000 TestFramework (INFO): check -rescan
     52017-08-08 03:24:27.792000 TestFramework (INFO): check -reindex
     62017-08-08 03:24:29.797000 TestFramework (INFO): check -zapwallettxes=1
     72017-08-08 03:24:32.207000 TestFramework (INFO): check -zapwallettxes=2
     82017-08-08 03:24:36.812000 TestFramework (INFO): Stopping nodes
     92017-08-08 03:24:37.915000 TestFramework (INFO): Cleaning up
    102017-08-08 03:24:37.927000 TestFramework (INFO): Tests successful
    

    This largely improves the time spent in Travis (under evaluation).

  2. promag commented at 3:14 am on August 8, 2017: member
    With this change the time should be significantly lower than 5 seconds. This was motivated by @jnewbery comment.
  3. promag force-pushed on Aug 8, 2017
  4. promag force-pushed on Aug 8, 2017
  5. promag renamed this:
    WIP: Improve shutdown process
    Improve shutdown process
    on Aug 8, 2017
  6. promag commented at 3:39 am on August 8, 2017: member

    For the record, running with version libevent@2.1.8.

    If there is a minimum version required, can it be defined, like boost?

  7. jnewbery commented at 9:48 am on August 8, 2017: member

    This PR should just have 2195bb0e579336f2b9adfb9a2e23957e794f5621. You can remove the other commits.

    You could potentially protect this change with a preprocessor:

    0#if LIBEVENT_VERSION_NUMBER >= 0x02xxxxxx
    

    I don’t know what version of libevent the event_base_loopexit behaviour was fixed. Perhaps @laanwj can offer some clues (he’s the author of #6990).

    Definitely seems like a good win for test run times.

  8. in src/httpserver.cpp:500 in 2195bb0e57 outdated
    487-        // (see discussion in https://github.com/bitcoin/bitcoin/pull/6990)
    488-        if (threadResult.valid() && threadResult.wait_for(std::chrono::milliseconds(2000)) == std::future_status::timeout) {
    489-            LogPrintf("HTTP event loop did not exit within allotted time, sending loopbreak\n");
    490-            event_base_loopbreak(eventBase);
    491-        }
    492+        event_base_loopexit(eventBase, nullptr);
    


    laanwj commented at 9:51 am on August 8, 2017:
    I don’t get it - you don’t give any rationale for reverting this and removing the comment.

    laanwj commented at 9:55 am on August 8, 2017:
    As I understand it, removing the timeout here (resulting in instant shutdown) means that responses from RPC stop might not come back, as the http event handler is killed before that could finish. This resulted in intermittent travis errors.

    promag commented at 9:57 am on August 8, 2017:
    No, that happens when evhttp_free is called.

    laanwj commented at 9:59 am on August 8, 2017:
    I’m sure that at least used to be the case, we had problems here in the past, this is why this particular shutdown sequence was changed so many times.
  9. laanwj commented at 9:53 am on August 8, 2017: member

    I don’t know what version of libevent the event_base_loopexit behaviour was fixed. Perhaps @laanwj can offer some clues (he’s the author of #6990).

    No, I don’t know, and we certainly can’t assume that everyone is building against a version where this is fixed. Not least because libevent’s release schedule is very slow.

  10. laanwj added the label RPC/REST/ZMQ on Aug 8, 2017
  11. promag commented at 9:58 am on August 8, 2017: member
    @jnewbery yes extra commits were mistakenly pushed when testing with other branch. Will fix.
  12. promag commented at 9:59 am on August 8, 2017: member
    @laanwj what about keeping the older code for older libevents?
  13. laanwj commented at 10:00 am on August 8, 2017: member

    @laanwj what about keeping the older code for older libevents?

    Yes, I’d say that is a requirement.

  14. promag force-pushed on Aug 8, 2017
  15. promag commented at 12:36 pm on August 8, 2017: member

    @laanwj @jnewbery will try to dissect which version has bad behaviour.

    Will add a comment before event_base_loopexit.

  16. promag commented at 6:36 am on August 9, 2017: member
    With libevent@2.0.21 (the version that supposedly has issues with event_base_loopexit everything seems to work (at least on my system). Maybe the problem was in other place? I’ll dig a bit more.
  17. laanwj commented at 9:46 am on August 9, 2017: member

    everything seems to work (at least on my system)

    As I remember the intermittent errors on issuing stop, might well not happen locally in good-weather conditions. As we all know, travis VMs run on overbooked, extremely busy servers, so it’s more likely to happen there. Maybe it can be simulated by adding an artificial delay somewhere, my hunch would be to have the client side read the RPC response to stop after a delay.

  18. jnewbery commented at 8:35 pm on August 9, 2017: member

    @laanwj - forgive me, I’m trying to piece together the history of changes to the shutdown procedure:

    • In #6719, you changed from using event_base_loopbreak() to event_base_loopexit() with a timeout. That was to make sure that the server shut down gracefully and didn’t just drop responses to pending requests.
    • In #6990, you say that ‘event_base_loopexit was not doing what I expected it to. What I expected was that it sets a timeout, given that no other pending events it would exit in N seconds. However, what it does was delay the event loop exit with N seconds, even if nothing is pending.’ I can’t understand the difference between those two statements. That PR switches from calling event_base_loopexit() to call event_base_loopbreak() after a 2 second delay.
    • Later in the same PR, you say ‘it appears that with libevent master, this issue doesn’t exist. event_base_loopexit does actually do what I expect it to in the OP.’ Looking at the documentation, if event_base_loopexit is called with a time value, then it will always wait for that long before exiting. If it’s called without a time value, it’ll exit after the next base loop iteration (http://www.wangafu.net/~nickm/libevent-2.1/doxygen/html/event_8h.html#ab89f0c907906f784fc80a3107016c3dc)

    So I think what we want is to just call event_base_loopexit() without a time value. That would allow the current loop to exit, so shutdown should be graceful. I can’t tell from the history or your notes whether you tried that before. Is there any reason that wouldn’t work? (specifically, I’m just looking for your issue with event_base_loopexit was)

  19. jnewbery commented at 8:36 pm on August 9, 2017: member

    Oh, and Concept ACK:

    locally with this PR:

    0TEST                           | STATUS    | DURATION
    1
    2abandonconflict.py             | ✓ Passed  | 4 s 
    3...
    4zmq_test.py                    | ✓ Passed  | 2 s 
    5
    6ALL                            | ✓ Passed  | 630 s (accumulated)
    7Runtime: 162 s
    

    locally without this PR:

    0TEST                           | STATUS    | DURATION
    1
    2abandonconflict.py             | ✓ Passed  | 14 s
    3...
    4zmq_test.py                    | ✓ Passed  | 7 s 
    5
    6ALL                            | ✓ Passed  | 985 s (accumulated)
    7Runtime: 269 s
    

    Seems like a huge win (40% over the BASE_SCRIPTS test suite)

  20. promag commented at 8:42 pm on August 9, 2017: member

    Nice @jnewbery. Furthermore, the the docs say

    0tv	the amount of time after which the loop should terminate, or NULL to exit after running all currently active events.
    

    And while the RPC response is being handled, there is an active event.

  21. promag commented at 8:44 pm on August 9, 2017: member
    IMO we should just add the loop exit, and keep the break after the timeout.
  22. jnewbery commented at 8:55 pm on August 9, 2017: member

    IMO we should just add the loop exit, and keep the break after the timeout.

    Sounds reasonable if you’re able to test this against old versions of libevent (and pending any further background info from @laanwj)

  23. promag force-pushed on Aug 12, 2017
  24. promag commented at 3:33 am on August 12, 2017: member

    @laanwj you mean:

     0diff --git a/test/functional/test_framework/authproxy.py b/test/functional/test_framework/authproxy.py
     1index b3671cbdc..5a2f9c514 100644
     2--- a/test/functional/test_framework/authproxy.py
     3+++ b/test/functional/test_framework/authproxy.py
     4@@ -166,6 +166,8 @@ class AuthServiceProxy(object):
     5     def _get_response(self):
     6         req_start_time = time.time()
     7         try:
     8+            if self._service_name == 'stop':
     9+                time.sleep(3)
    10             http_response = self.__conn.getresponse()
    11         except socket.timeout as e:
    12             raise JSONRPCException({
    

    All looks good, even with libevent@2.0.21.

  25. Improve shutdown process 793667af1c
  26. promag force-pushed on Aug 12, 2017
  27. laanwj commented at 2:06 pm on October 18, 2017: member
    utACK 793667a, my concerns have been addressed.
  28. laanwj merged this on Oct 18, 2017
  29. laanwj closed this on Oct 18, 2017

  30. laanwj referenced this in commit a1d78b59fc on Oct 18, 2017
  31. zkbot referenced this in commit fb2f98e00b on Oct 23, 2017
  32. sickpig referenced this in commit 46b5b86be5 on Mar 9, 2018
  33. sickpig referenced this in commit dadc2880a1 on Mar 9, 2018
  34. sickpig referenced this in commit 6d4a55878a on Mar 12, 2018
  35. sickpig referenced this in commit 6db640a393 on Mar 12, 2018
  36. sickpig referenced this in commit 76cf243311 on Mar 12, 2018
  37. sickpig referenced this in commit 4428056252 on Mar 12, 2018
  38. marlengit referenced this in commit 297ac08c3a on Sep 25, 2018
  39. laanwj referenced this in commit a88bd3186d on Dec 6, 2018
  40. jasonbcox referenced this in commit 355b1f1d53 on Mar 9, 2019
  41. jonspock referenced this in commit c21009decf on Mar 10, 2019
  42. jonspock referenced this in commit f01c587ff0 on Mar 11, 2019
  43. codablock referenced this in commit 7f679e6d4c on Oct 14, 2019
  44. codablock referenced this in commit 835636d9b0 on Oct 14, 2019
  45. codablock referenced this in commit a8e49d33e0 on Oct 14, 2019
  46. codablock referenced this in commit fefe07003b on Oct 14, 2019
  47. barrystyle referenced this in commit 85d3844af2 on Jan 22, 2020
  48. barrystyle referenced this in commit 32ea3cc4ef on Jan 22, 2020
  49. TheArbitrator referenced this in commit 7df7b4f53c on Jun 23, 2021
  50. TheArbitrator referenced this in commit 47b8deeda8 on Jun 25, 2021
  51. Munkybooty referenced this in commit aa25265942 on Aug 2, 2021
  52. Munkybooty referenced this in commit 2e1fb3f3e8 on Aug 3, 2021
  53. Munkybooty referenced this in commit 21d93c1aea on Aug 5, 2021
  54. Munkybooty referenced this in commit 36b2f4e0f0 on Aug 5, 2021
  55. Munkybooty referenced this in commit 6db39063f8 on Aug 8, 2021
  56. Munkybooty referenced this in commit 9740c21531 on Aug 11, 2021
  57. Munkybooty referenced this in commit 09963a6af0 on Aug 11, 2021
  58. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-24 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me