http: speed up shutdown #6990

pull laanwj wants to merge 1 commits into bitcoin:master from bitcoin:2015_11_threadexit changing 3 files +22 −12
  1. laanwj commented at 5:12 pm on November 11, 2015: member

    Pardon a bit of iteration. This continues/fixes #6719.

    event_base_loopexit was not doing what I expected it to. What I expected was that it sets a timeout, given that no other pending events it would exit in N seconds. However, what it does was delay the event loop exit with N seconds, even if nothing is pending.

    Solve it in a different way: give the event loop thread time to exit out of itself, and if it doesn’t, send loopbreak (and then join it).

    This speeds up the RPC tests a lot, each exit incurred a 10 second overhead, with this change there should be no shutdown overhead in the common case and up to two seconds if the event loop is blocking.

    As a bonus this breaks dependency on the global boost::thread_group, as the HTTP server minds its own offspring.

    Time for rpctests with this patch:

    0real    7m50.215s
    1user    0m30.152s
    2sys     0m11.752s
    

    Without:

    0real    13m12.069s
    1user    0m30.412s
    2sys     0m11.168s
    
  2. laanwj added the label RPC on Nov 11, 2015
  3. dcousens commented at 9:01 pm on November 11, 2015: contributor

    event_base_loopbreak was not doing what I expected it to. What I expected was that it sets a timeout, given that no other pending events it would exit in N seconds. However, what it does was delay the event loop exit with N seconds, even if nothing is pending.

    Sounds like you wanted event_base_loopexit? Ref:

    The next event_base_loop() iteration after the given timer expires will complete normally (handling all queued events) then exit without blocking for events again.

    In any case, event_base_loopbreak should still exit immediately, stopping any loops after the next event:

    event_base_loop() will abort the loop after the next event is completed; event_base_loopbreak() is typically invoked from this event’s callback. This behavior is analogous to the “break;” statement.

    Are you sure this isn’t symptomatic of something else?

    Reference: http://www.wangafu.net/~nickm/libevent-2.0/doxygen/html/event_8h.html#a07a7599e478e4031fa8cf52e26d8aa1e

    edit: Right, never mind, you were using event_base_loopexit, hence the confusion. You’re OP (before it was edited) makes it sound like you switched from event_base_loopbreak.

  4. in src/httpserver.cpp: in 856be4fe46 outdated
    474@@ -480,6 +475,14 @@ void StopHTTPServer()
    475         workQueue->WaitExit();
    476         delete workQueue;
    477     }
    478+    if (eventBase) {
    479+        LogPrint("http", "Waiting for HTTP event thread to exit\n");
    480+        // Give event loop a few seconds to exit (to send back last RPC responses), then break it
    481+        if (!threadHTTP.try_join_for(boost::chrono::milliseconds(2000))) {
    


    dcousens commented at 9:06 pm on November 11, 2015:
    Out of interest, why do we have to wait [a hard coded timeout]? Why can’t we wait for all events to finish (and maybe just ignore all new incoming connections?)

    MarcoFalke commented at 9:11 pm on November 11, 2015:

    all events to finish

    may take a long time (minutes) depending what was going in in the past?


    dcousens commented at 9:27 pm on November 11, 2015:
    @MarcoFalke but this doesn’t stop new connections (or is that done elsewhere?) in that waiting time. So conceptually, the status quo could be the same? That is, new connections coming in, old ones still finishing.

    theuni commented at 4:14 am on November 12, 2015:

    New connections are stopped just above, see where the accept sockets are torn down.

    Though it does look like we should close connected sockets in http_reject_request_cb as well for good measure. @laanwj Is there a reason to keep them open once they’ve had a req rejected?


    laanwj commented at 6:34 am on November 12, 2015:
    • Waiting for all events to finish could take forever. There’s tons of ways that someone could keep a connection option by slowly sending data. Also, timing events (such as the walletpassphrase timeout) may hold up the event queue.
    • Just breaking off everything in progress immediately prevents the answer to stop from getting back. We could just declare this a feature of course and be done with it… But it was a race condition, sometimes you would get a reply, sometimes not. We could define stop to cease all RPC activity immediately. But I think I like this better.
    • There AFAIK is no way to ask libevent, nor libevent_http if the events in the queue are important enough to wait for or not. And specially tagging the connection that sends ‘stop’ is quite ugly, not sure if even possible without diving into internal data structures.

    For context read #6719.

    I’m not 100% happy with this solution either, but I wasn’t able to find another way to do it, and it’s better than the previous one. @cfields sure, you can close the connection afterwards there, but that doesn’t solve this issue.


    laanwj commented at 6:42 am on November 12, 2015:

    @MarcoFalke but this doesn’t stop new connections (or is that done elsewhere?) in that waiting time. So conceptually, the status quo could be the same?

    It does:

    0        // Unlisten sockets
    1        BOOST_FOREACH (evhttp_bound_socket *socket, boundSockets) {
    2            evhttp_del_accept_socket(eventHTTP, socket);
    3        }
    4        // Reject requests on current connections
    5        evhttp_set_gencb(eventHTTP, http_reject_request_cb, NULL);
    

    This closes the port and makes new requests error (so that no new work items are created). But that’s best-effort. Current connections could still do e.g. a slowloris attack. The two seconds are a belt and suspenders, nothing more.

  5. dcousens commented at 9:06 pm on November 11, 2015: contributor
    utACK
  6. jonasschnelli commented at 7:46 am on November 12, 2015: contributor
    utACK.
  7. laanwj commented at 8:48 am on November 12, 2015: member

    Some more information: it appears that with libevent master, this issue doesn’t exist. event_base_loopexit does actually do what I expect it to in the OP. As that is the version I’m normally using, that is why I didn’t notice before.

    Edit: so I think we should still do this for backward compat, but at least I’m not crazy :)

  8. dcousens commented at 1:00 am on November 13, 2015: contributor
    @laanwj maybe just inline a comment to mention this might be fixed in latest version of libevent? Could be useful for some future refactoring, save someone trying to establish whether there was a reason we moved away from it that they aren’t aware of.
  9. laanwj commented at 9:14 am on November 13, 2015: member

    Will do

    Edit: OK, added comment and updated commit message.

  10. laanwj commented at 9:14 am on November 13, 2015: member
    Huh! I pushed this branch to bitcoin/bitcoin instead of my own repository. Oh well, no harm done, I’ll just remove it again when it’s merged.
  11. laanwj force-pushed on Nov 13, 2015
  12. laanwj force-pushed on Nov 13, 2015
  13. http: speed up shutdown
    This continues/fixes #6719.
    
    `event_base_loopbreak` was not doing what I expected it to, at least in
    libevent 2.0.21.
    What I expected was that it sets a timeout, given that no other pending
    events it would exit in N seconds. However, what it does was delay the
    event loop exit with 10 seconds, even if nothing is pending.
    
    Solve it in a different way: give the event loop thread time to exit
    out of itself, and if it doesn't, send loopbreak.
    
    This speeds up the RPC tests a lot, each exit incurred a 10 second
    overhead, with this change there should be no shutdown overhead in the
    common case and up to two seconds if the event loop is blocking.
    
    As a bonus this breaks dependency on boost::thread_group, as the HTTP
    server minds its own offspring.
    a264c32e33
  14. laanwj force-pushed on Nov 13, 2015
  15. laanwj commented at 11:24 am on November 13, 2015: member
    @theuni can you take a look here as well? Edit: nm, you already did, but didn’t ack
  16. gmaxwell commented at 7:32 pm on November 13, 2015: contributor
    ACK.
  17. gmaxwell merged this on Nov 13, 2015
  18. gmaxwell closed this on Nov 13, 2015

  19. gmaxwell referenced this in commit dbd2c135dd on Nov 13, 2015
  20. dcousens commented at 8:51 am on November 14, 2015: contributor
    Awesome, thanks @laanwj :)
  21. MarcoFalke commented at 1:05 pm on November 15, 2015: member

    Post merge tested ACK! @laanwj this makes travis a lot faster. (runtime reduced for some rpc tests by 90%)

    Before:

    0$ time for i in {1..10};do qa/pull-tester/rpc-tests.py disablewallet;done
    1real    1m53.003s
    2user    0m1.135s
    3sys 0m0.437s
    4
    5$ time for i in {1..10};do qa/pull-tester/rpc-tests.py wallet;done
    6real    10m4.888s
    7user    0m21.829s
    8sys 0m8.239s
    

    After:

    0disablewallet:
    1real    0m13.809s
    2user    0m1.166s
    3sys 0m0.510s
    4
    5wallet:
    6real    5m7.703s
    7user    0m22.128s
    8sys 0m8.222s
    
  22. laanwj deleted the branch on Nov 30, 2015
  23. zkbot referenced this in commit fb2f98e00b on Oct 23, 2017
  24. sickpig referenced this in commit 46b5b86be5 on Mar 9, 2018
  25. sickpig referenced this in commit dadc2880a1 on Mar 9, 2018
  26. sickpig referenced this in commit 6d4a55878a on Mar 12, 2018
  27. sickpig referenced this in commit 6db640a393 on Mar 12, 2018
  28. sickpig referenced this in commit 76cf243311 on Mar 12, 2018
  29. sickpig referenced this in commit 4428056252 on Mar 12, 2018
  30. marlengit referenced this in commit 297ac08c3a on Sep 25, 2018
  31. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-05 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me