Correctly terminate HTTP server #13501

pull promag wants to merge 5 commits into bitcoin:master from promag:2018-06-loopexit changing 1 files +41 −18
  1. promag commented at 10:48 pm on June 18, 2018: member

    Fixes #11777. Replaces #13492.

    This PR tracks current HTTP connections in order to disable reading on shutdown while allowing all remaining events to process. This allows the event loop to exit gracefully.

    This happens in the test framework because the RPC client never closes the connection to the node — the same connection is used for all RPC to that node — even when stop is called.

  2. fanquake added the label RPC/REST/ZMQ on Jun 18, 2018
  3. promag commented at 10:51 pm on June 18, 2018: member

    This is an alternative to #13492 that doesn’t require bumping the minimum libevent2 version. This was submitted in a different branch to keep the other version available.

    To test apply

     0diff --git a/src/rpc/server.cpp b/src/rpc/server.cpp
     1index 10040b125..57dad383c 100644
     2--- a/src/rpc/server.cpp
     3+++ b/src/rpc/server.cpp
     4@@ -236,6 +236,7 @@ UniValue stop(const JSONRPCRequest& jsonRequest)
     5     // Event loop will exit after current HTTP requests have been handled, so
     6     // this reply will get back to the client.
     7     StartShutdown();
     8+    MilliSleep(2000);
     9     return "Bitcoin server stopping";
    10 }
    

    and then

    0bitcoind -regtest
    1bitcoin-cli -regtest stop
    
  4. ken2812221 commented at 11:47 pm on June 18, 2018: contributor
    That event_base_loopexit is still requied though.
  5. promag commented at 7:35 am on June 19, 2018: member
    No, it shouldn’t be. As soon as there are no more active or pending events, the event loop should exit.
  6. promag force-pushed on Jun 19, 2018
  7. promag commented at 3:16 pm on June 19, 2018: member
    Updated.
  8. promag commented at 1:21 pm on June 21, 2018: member
    @ken2812221 care to review?
  9. ken2812221 commented at 2:46 pm on June 21, 2018: contributor
    Looks like shutdown would take more time, it takes 30 mins to complete tests on travis, I would rather reopen #13485
  10. promag commented at 8:37 pm on June 23, 2018: member
    @ken2812221 I’ll try to reproduce.
  11. MarcoFalke commented at 9:52 pm on August 8, 2018: member

    Going to close and reopen, to get a fresh travis run.

    Previous run: https://travis-ci.org/bitcoin/bitcoin/builds/394134101

  12. MarcoFalke closed this on Aug 8, 2018

  13. MarcoFalke reopened this on Aug 8, 2018

  14. MarcoFalke commented at 9:53 pm on August 8, 2018: member

    New run: https://travis-ci.org/bitcoin/bitcoin/builds/413803664

    Hmm.. Still takes longer than on master, I think

  15. promag commented at 9:59 pm on August 8, 2018: member
    Thanks @MarcoFalke.
  16. DrahtBot commented at 5:05 am on August 9, 2018: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  17. promag force-pushed on Aug 12, 2018
  18. promag commented at 0:51 am on August 12, 2018: member
    Tried to change client and server timeouts and looks like it terminates earlier. This indicates that either the client or server are not terminating gracefully.
  19. promag force-pushed on Aug 13, 2018
  20. promag force-pushed on Aug 14, 2018
  21. promag force-pushed on Aug 14, 2018
  22. promag force-pushed on Aug 17, 2018
  23. promag commented at 0:36 am on August 17, 2018: member
    @MarcoFalke @ken2812221 please test again. Will update OP later.
  24. promag force-pushed on Aug 17, 2018
  25. promag renamed this:
    Remove race on shutdown between event loop exit and http reply
    Correctly terminate HTTP server
    on Aug 17, 2018
  26. promag force-pushed on Aug 17, 2018
  27. promag commented at 1:48 am on August 17, 2018: member
    Updated OP.
  28. ken2812221 commented at 2:20 am on August 17, 2018: contributor
    It still shows “EOF reached” on bitcoin-cli when I add MilliSleep(1000) at https://github.com/bitcoin/bitcoin/blob/3c8d1ae15352d5c92d5903536c8fe07f771e97a0/src/rpc/server.cpp#L238-L239
  29. promag force-pushed on Aug 17, 2018
  30. promag commented at 8:44 am on August 17, 2018: member
    @ken2812221 sorry, forgot to add a hunk to the commit 😕
  31. laanwj added the label Bug on Aug 17, 2018
  32. laanwj added this to the milestone 0.18.0 on Aug 17, 2018
  33. ken2812221 commented at 1:01 am on August 21, 2018: contributor
    ~Tested ACK d1c35ed~
  34. in src/httpserver.cpp:480 in d1c35eda99 outdated
    475@@ -464,21 +476,23 @@ void StopHTTPServer()
    476         delete workQueue;
    477         workQueue = nullptr;
    478     }
    479+    // Disable reading on current connections
    480+    for (evhttp_connection* conn : g_http_connections) {
    


    laanwj commented at 10:37 am on August 21, 2018:
    wouldn’t there be unprotected concurrent acceess to g_http_connections from multiple threads here? (the HTTP event thread as well as the thread in which StopHTTPServer is called) if so, it needs a mutex

    promag commented at 11:24 am on August 21, 2018:
    I thought the callback would be called from the event loop thread, but looks like it’s not the case. Pushed a commit that adds a mutex.
  35. promag force-pushed on Aug 21, 2018
  36. ken2812221 commented at 10:06 pm on October 2, 2018: contributor
    @promag I would prefer this fix than #13485, but this seems to break things on Windows. Would you mind take a look again?
  37. promag force-pushed on Oct 3, 2018
  38. promag force-pushed on Oct 4, 2018
  39. promag force-pushed on Oct 4, 2018
  40. promag force-pushed on Oct 4, 2018
  41. promag force-pushed on Oct 4, 2018
  42. promag force-pushed on Oct 4, 2018
  43. promag force-pushed on Oct 4, 2018
  44. promag force-pushed on Oct 4, 2018
  45. promag force-pushed on Oct 4, 2018
  46. promag force-pushed on Oct 4, 2018
  47. promag force-pushed on Oct 4, 2018
  48. promag commented at 11:11 pm on October 4, 2018: member
    @ken2812221 try again 🎁
  49. ken2812221 commented at 4:47 pm on October 5, 2018: contributor
    tACK ee05f25eb600d8dc45d21f43dab899aacc7d6784
  50. in src/httpserver.cpp:148 in ee05f25eb6 outdated
    143@@ -144,6 +144,9 @@ struct evhttp* eventHTTP = nullptr;
    144 static std::vector<CSubNet> rpc_allow_subnets;
    145 //! Work queue for handling longer requests off the event loop thread
    146 static WorkQueue<HTTPClosure>* workQueue = nullptr;
    147+//! Set of current HTTP connections
    148+static std::set<evhttp_connection*> g_http_connections;
    


    ken2812221 commented at 6:26 pm on October 5, 2018:
    nit: static std::unordered_set<evhttp_connection*> g_http_connections GUARDED_BY(g_http_connections_cs);
  51. in src/httpserver.cpp:149 in ee05f25eb6 outdated
    143@@ -144,6 +144,9 @@ struct evhttp* eventHTTP = nullptr;
    144 static std::vector<CSubNet> rpc_allow_subnets;
    145 //! Work queue for handling longer requests off the event loop thread
    146 static WorkQueue<HTTPClosure>* workQueue = nullptr;
    147+//! Set of current HTTP connections
    148+static std::set<evhttp_connection*> g_http_connections;
    149+static std::mutex g_http_connections_cs;
    


    ken2812221 commented at 6:31 pm on October 5, 2018:
    nit: use Mutex and LOCK for easier debugging.
  52. bugfix: Correctly terminate HTTP server f9d60febdc
  53. promag force-pushed on Oct 5, 2018
  54. ken2812221 approved
  55. ken2812221 commented at 4:39 pm on October 9, 2018: contributor
    tACK f9d60febdc265824d5f183ef6e0756875869ddb0
  56. sipa commented at 5:35 am on October 19, 2018: member

    Concept ACK f9d60febdc265824d5f183ef6e0756875869ddb0.

    I’m not familiar enough with libevent to judge the API calls, but the Bitcoin Core side of things looks good.

    A review by @laanwj would be good, I think.

  57. in src/httpserver.cpp:17 in f9d60febdc outdated
    13@@ -14,6 +14,7 @@
    14 #include <ui_interface.h>
    15 
    16 #include <memory>
    17+#include <set>
    


    laanwj commented at 4:45 pm on October 24, 2018:
    unordered_set? there’s no traversing here

    promag commented at 8:50 pm on October 24, 2018:
    IIRC for small collections std::set is preferable. I can change if that sounds better to you.
  58. in src/httpserver.cpp:492 in f9d60febdc outdated
    499-            event_base_loopbreak(eventBase);
    500-        }
    501+        event_base_loopbreak(eventBase);
    502         threadHTTP.join();
    503+        // Process remaining events.
    504+        event_base_dispatch(eventBase);
    


    ryanofsky commented at 6:16 pm on November 1, 2018:

    The changes in this if (eventBase) section don’t seem directly related to the other changes in this PR. If I understand the eventlib documentation correctly, the behavior of the new code is identical to previous code except there is no longer a 2 second timeout.

    So is the point of this change just cleanup? It seems like it might be safer to keep the 2 second timeout.


    promag commented at 11:02 pm on November 1, 2018:

    This is necessary in order to have a clean event loop quit, which means all active events processed and there’s no pending events. In some cases (long response/slow client) the timeout can cause errors on the client side because the client receives an unexpected connection close.

    Ideally it would be enough:

    0// Wait the event loop to quit
    1threadHTTP.join();
    

    but in windows only this works (sorry I don’t know why):

    0// Wait the event loop to quit in *this* thread
    1event_base_loopbreak(eventBase);
    2threadHTTP.join();
    3// Process remaining events.
    4event_base_dispatch(eventBase);
    

    ryanofsky commented at 6:21 pm on November 2, 2018:

    This is necessary in order to have a clean event loop quit

    It seems like only the part of this change needed to exit cleanly is removing the racy event_base_loopexit added in #11006. Aside from that, it looks like the previous code would also exit cleanly, unless it took longer than 2 seconds.

    If the 2 second timeout causes problems, maybe it could be increased, but it doesn’t seem like seem like a great idea to me to delay shutdown forever writing data to clients. It also seems nicer that the previous code didn’t need an inexplicable windows workaround.

    On the other hand, if we really do want to wait forever here instead of timing out, it would be good to delete the std::future/std::packaged_task code added in 755aa05174e06effd758eeb78c5af9fb465e9611, since now it is sitting there unused.

  59. in src/httpserver.cpp:483 in f9d60febdc outdated
    474@@ -457,21 +475,21 @@ void StopHTTPServer()
    475         delete workQueue;
    476         workQueue = nullptr;
    477     }
    478+    {
    479+        LogPrint(BCLog::HTTP, "Disabling reading on current connections\n");
    480+        LOCK(g_http_connections_cs);
    481+        for (evhttp_connection* conn : g_http_connections) {
    482+            bufferevent* bev = evhttp_connection_get_bufferevent(conn);
    483+            if (bev) bufferevent_disable(bev, EV_READ);
    


    ryanofsky commented at 6:21 pm on November 1, 2018:
    Is checking bev just defensive here? Would it ever be expected to be null? It’d be good to have a sentence or shorter comment here to suggest what this does.

    promag commented at 10:54 pm on November 1, 2018:
    No I don’t think it can be null, but I’ll check libevent code. This “protection” is already on master and I followed it.
  60. in src/httpserver.cpp:229 in f9d60febdc outdated
    225+static void http_request_cb(struct evhttp_request* req, void* /* arg */)
    226 {
    227+    {
    228+        LOCK(g_http_connections_cs);
    229+        evhttp_connection* conn = evhttp_request_get_connection(req);
    230+        if (g_http_connections.insert(conn).second) {
    


    ryanofsky commented at 6:22 pm on November 1, 2018:
    Confused by the check. Would this ever be false? Short comment here would be helpful.

    promag commented at 10:50 pm on November 1, 2018:
    One connection can carry multiple request/response pairs.

    ryanofsky commented at 5:29 pm on November 2, 2018:

    Won’t this be buggy if a client opens a connection but doesn’t send a complete request before the server starts to shut down? In that case this code won’t get called, so the connection pointer will never be added to g_http_connections, so event_base_dispatch would never return, and bitcoind would hang forever on shutdown.

    I don’t see a way to be notified directly about new connections, so maybe there is nothing we can do about this, but it seems worth commenting about if there isn’t a better approach or workaround.


    promag commented at 12:06 pm on November 5, 2018:

    @ryanofsky tried the following:

    0bitcoind -regtest
    1nc localhost 18443
    2bitcoin-cli -regtest stop
    

    at this point the server waits until it timeouts the request (by default 30 seconds).

    If a request is sent before the timeout the result is:

     0POST / HTTP/1.1
     1Authorization: Basic ...
     2Content-Type: application/json
     3Content-Length: 44
     4
     5{"jsonrpc": "2.0","method":"help","id":123}
     6HTTP/1.1 503 Service Unavailable
     7Content-Type: text/html
     8Connection: close
     9Date: Mon, 05 Nov 2018 11:58:34 GMT
    10Content-Length: 110
    11
    12<HTML><HEAD>
    13<TITLE>503 Service Unavailable</TITLE>
    14</HEAD><BODY>
    15<H1>Service Unavailable</H1>
    16</BODY></HTML>
    

    If we consider that stop RPC triggers a graceful shutdown then this LGTM.

  61. in src/httpserver.cpp:149 in f9d60febdc outdated
    144@@ -144,6 +145,9 @@ struct evhttp* eventHTTP = nullptr;
    145 static std::vector<CSubNet> rpc_allow_subnets;
    146 //! Work queue for handling longer requests off the event loop thread
    147 static WorkQueue<HTTPClosure>* workQueue = nullptr;
    148+//! Set of current HTTP connections
    149+static Mutex g_http_connections_cs;
    


    ryanofsky commented at 6:51 pm on November 1, 2018:
    Maybe use suffix _mutex instead of _cs. I think people have generally moved on to calling critical sections mutexes.

    promag commented at 10:50 pm on November 1, 2018:
    Done.
  62. in src/httpserver.cpp:479 in f9d60febdc outdated
    474@@ -457,21 +475,21 @@ void StopHTTPServer()
    475         delete workQueue;
    476         workQueue = nullptr;
    477     }
    478+    {
    479+        LogPrint(BCLog::HTTP, "Disabling reading on current connections\n");
    


    ryanofsky commented at 7:01 pm on November 1, 2018:

    Can you add a comment about what the effect of disabling reading on the current connections is?

    If this just disables reading and doesn’t close the connections or interrupt pending writes, it seems like shutdown code could potentially get stuck forever (if a client isn’t reading and there is a pending write and data is backed up).


    promag commented at 10:49 pm on November 1, 2018:
    So the event base quits when there are no active or pending events. Writing the response is an active event which will go away at the end. Reading can be a pending event (for instance, the client sends stop RPC but keeps the connection open — which happens in the functional tests) so that will prevent the base to quit automatically.

    ryanofsky commented at 4:41 pm on November 2, 2018:

    Thanks for explaining, but I keep rereading this and coming away confused, because it is leaving out a lot of specifics. I think the following would be clearer if it is accurate:

    0// Disable read events so the event loop is able to exit even if there are
    1// open connections. (The preceding InterruptHTTPServer call stops accepting new
    2// connections and new requests on existing connections, but does not close
    3// existing connections.)
    4//
    5// Leave write events enabled because there may still be response data waiting
    6// to be written, and we want to avoid sending incomplete responses.
    
  63. in src/httpserver.cpp:608 in f9d60febdc outdated
    603@@ -586,11 +604,10 @@ void HTTPRequest::WriteReply(int nStatus, const std::string& strReply)
    604         // workaround above.
    605         if (event_get_version_number() >= 0x02010600 && event_get_version_number() < 0x02020001) {
    606             evhttp_connection* conn = evhttp_request_get_connection(req_copy);
    607-            if (conn) {
    608+            LOCK(g_http_connections_cs);
    609+            if (conn && g_http_connections.count(conn)) {
    


    ryanofsky commented at 7:09 pm on November 1, 2018:

    Comment above “Re-enable reading…” is out of date, or at least incomplete now, and would be good to update.

    I think I need to look more into the workaround here, but it’s not clear to me why this no longer enables write events during shutdown (or what the effect of that is). Hopefully it doesn’t mean that a pending writes would never complete…


    promag commented at 11:05 pm on November 1, 2018:
    Writes are always enabled and there is no reason to enable reading if g_http_connections.count(conn) == 0.

    ryanofsky commented at 6:49 pm on November 2, 2018:
    I think it would be helpful to update “Re-enable reading” to “Re-enable reading (unless g_http_connections has been cleared and the server is shutting down)”, because I think it’s hard to see otherwise how g_http_connections.count is relevant in this context.
  64. meshcollider commented at 7:37 pm on November 1, 2018: contributor
    Concept ACK
  65. rename to g_http_connections_mutex b28ed684ac
  66. add comment a826493a9e
  67. add comment ca43f55b0b
  68. remove unnecessary check d14515001b
  69. promag commented at 11:16 pm on November 1, 2018: member
    @ryanofsky thanks for the review.
  70. in src/httpserver.cpp:470 in d14515001b
    498-        // Before this was solved with event_base_loopexit, but that didn't work as expected in
    499-        // at least libevent 2.0.21 and always introduced a delay. In libevent
    500-        // master that appears to be solved, so in the future that solution
    501-        // could be used again (if desirable).
    502-        // (see discussion in https://github.com/bitcoin/bitcoin/pull/6990)
    503-        if (threadResult.valid() && threadResult.wait_for(std::chrono::milliseconds(2000)) == std::future_status::timeout) {
    


    jnewbery commented at 6:47 pm on November 2, 2018:
    I believe this is the only use of threadResult. Can you remove it entirely (currently declared at L437 and assigned at L446)
  71. jnewbery commented at 7:58 pm on November 2, 2018: member

    Tested ACK d14515001bb2bf87602af7553def5ebe0f6bcdbb. I’m not a libevent expert, so I can’t review the API calls, but the changes look reasonable to me, and the test suite runs fine locally for me.

    The minimum version of libevent officially supported is 2.0.22 (https://github.com/bitcoin/bitcoin/blob/750415701cb1c9e4cc717be7c810a054dbc2ead0/doc/dependencies.md). I’ve tested this on libevent 2.0.21. We should make sure that this gets tested on other versions of libevent.

    Commits should be squashed prior to merge.

  72. ryanofsky commented at 8:01 pm on November 2, 2018: member

    I think it would help this PR to have a more complete description. I would describe it as follows (feel free to copy any of it if useful).

    This PR does three things:

    1. It removes an event_base_loopexit optimization added in #11006 that turns out to cause a race condition described in #13492#issue-195407424, where incomplete HTTP responses are sent to clients on shutdown.

    2. It replaces the #11006 optimization with a new one. Without the optimization from #11006, the HTTP event loop would never exit cleanly if there were any open idle connections to the HTTP server, because it would be stuck waiting for read events on those connections. The new optimization implemented here disables these unnecessary waits on shutdown by calling bufferevent_disable, while still waiting for any pending writes to complete.

    3. It removes a two-second shutdown timeout, and instead will wait an infinite time for HTTP writes to complete. With the optimizations above, the two second timeout isn’t strictly necessary, since the loop will no longer wait forever when there are idle connections.

  73. promag closed this on Nov 6, 2018

  74. promag renamed this:
    Correctly terminate HTTP server
    ~~Correctly~~ terminate HTTP server
    on Nov 6, 2018
  75. promag commented at 3:19 pm on November 6, 2018: member

    After more digging in libevent source code I’ve realized that sending the header Connection: close is the correct way to trigger the connection close. Verified that this behavior exists since 2.0.22, see https://github.com/libevent/libevent/blob/c51b159cff9f5e86696f5b9a4c6f517276056258/http.c#L472.

    It is also unnecessary to keep track of connections as evhttp already handles that.

  76. laanwj commented at 4:10 pm on November 6, 2018: member

    sending the header Connection: close is the correct way to trigger the connection close

    That only signals to the client that the connection should be closed (e.g. no connection reuse), it doesn’t actually trigger a close from our side. (or does it? I’m confused now; at the least it doesn’t provide a guarantee that the connection will be closed any time soon, only after the request fully finished)

  77. promag commented at 4:26 pm on November 6, 2018: member

    From testing it does, see https://github.com/libevent/libevent/blob/c51b159cff9f5e86696f5b9a4c6f517276056258/http.c#L784-L786.

    Even with a non-http client, like nc, when I send that header the server will then close the connection.

  78. laanwj referenced this in commit a88bd3186d on Dec 6, 2018
  79. codablock referenced this in commit 835636d9b0 on Oct 14, 2019
  80. codablock referenced this in commit fefe07003b on Oct 14, 2019
  81. barrystyle referenced this in commit 32ea3cc4ef on Jan 22, 2020
  82. TheArbitrator referenced this in commit 7df7b4f53c on Jun 23, 2021
  83. TheArbitrator referenced this in commit 47b8deeda8 on Jun 25, 2021
  84. Munkybooty referenced this in commit aa25265942 on Aug 2, 2021
  85. Munkybooty referenced this in commit 2e1fb3f3e8 on Aug 3, 2021
  86. Munkybooty referenced this in commit 21d93c1aea on Aug 5, 2021
  87. Munkybooty referenced this in commit 36b2f4e0f0 on Aug 5, 2021
  88. Munkybooty referenced this in commit 6db39063f8 on Aug 8, 2021
  89. Munkybooty referenced this in commit 9740c21531 on Aug 11, 2021
  90. Munkybooty referenced this in commit 09963a6af0 on Aug 11, 2021
  91. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-18 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me