This revives #19420. Since promag is not very active at the moment, I am picking this up to finally get it merged.
The PR is rebased and comments by jonatack have been addressed.
Once this is merged, I will also reopen #19434.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
See the guideline for information on the review process.
| Type | Reviewers |
|---|---|
| ACK | stickies-v, hebasto |
| Concept ACK | brunoerg, promag |
| Stale ACK | kevkevinpal |
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Running test/functional/feature_abortnode.py before and after the change, we see a significant performance gain.
Running before the changes (f3bc1a72825fe2b51f4bc20e004cef464f05b965):

```
test/functional/feature_abortnode.py
2022-12-23T04:47:19.826000Z TestFramework (INFO): Initializing test directory /var/folders/9g/cvx014yx4dq5lwl_x5zvn_j80000gn/T/bitcoin_func_test_k3_6xric
2022-12-23T04:47:21.229000Z TestFramework (INFO): Waiting for crash
2022-12-23T04:47:51.429000Z TestFramework (INFO): Node crashed - now verifying restart fails
2022-12-23T04:47:52.086000Z TestFramework (INFO): Stopping nodes
2022-12-23T04:47:52.300000Z TestFramework (INFO): Cleaning up /var/folders/9g/cvx014yx4dq5lwl_x5zvn_j80000gn/T/bitcoin_func_test_k3_6xric on exit
2022-12-23T04:47:52.300000Z TestFramework (INFO): Tests successful
```

32.50 seconds to run test.
Running with these changes applied (40528e7):

```
test/functional/feature_abortnode.py
2022-12-23T04:44:30.075000Z TestFramework (INFO): Initializing test directory /var/folders/9g/cvx014yx4dq5lwl_x5zvn_j80000gn/T/bitcoin_func_test_0h3w_rbx
2022-12-23T04:44:31.516000Z TestFramework (INFO): Waiting for crash
2022-12-23T04:44:31.837000Z TestFramework (INFO): Node crashed - now verifying restart fails
2022-12-23T04:44:32.452000Z TestFramework (INFO): Stopping nodes
2022-12-23T04:44:32.727000Z TestFramework (INFO): Cleaning up /var/folders/9g/cvx014yx4dq5lwl_x5zvn_j80000gn/T/bitcoin_func_test_0h3w_rbx on exit
2022-12-23T04:44:32.727000Z TestFramework (INFO): Tests successful
```

1.2 seconds to run test.
```diff
@@ -146,6 +148,10 @@ static GlobalMutex g_httppathhandlers_mutex;
 static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
 //! Bound listening sockets
 static std::vector<evhttp_bound_socket *> boundSockets;
+//! Track active requests
+static Mutex g_requests_mutex;
```
Maybe a `GlobalMutex` would be more appropriate here?

```cpp
static GlobalMutex g_requests_mutex;
```
`Mutex` vs `GlobalMutex`? I’m still newish to the repo; is there a convention on when we should use `Mutex` vs `GlobalMutex`?
It depends on the scope. For the global namespace, `GlobalMutex` is the appropriate option: https://github.com/bitcoin/bitcoin/blob/f3bc1a72825fe2b51f4bc20e004cef464f05b965/src/sync.h#L132-L141
Maybe a `GlobalMutex` would be more appropriate here?
This suggestion would also fix the following errors:
```
httpserver.cpp:221:13: error: calling function 'MaybeCheckNotHeld' requires negative capability '!g_requests_mutex' [-Werror,-Wthread-safety-analysis]
            LOCK(g_requests_mutex);
            ^
./sync.h:258:56: note: expanded from macro 'LOCK'
#define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
                                                       ^
httpserver.cpp:218:9: error: calling function 'MaybeCheckNotHeld' requires negative capability '!g_requests_mutex' [-Werror,-Wthread-safety-analysis]
        LOCK(g_requests_mutex);
        ^
./sync.h:258:56: note: expanded from macro 'LOCK'
#define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
                                                       ^
httpserver.cpp:481:9: error: calling function 'MaybeCheckNotHeld' requires negative capability '!g_requests_mutex' [-Werror,-Wthread-safety-analysis]
        WAIT_LOCK(g_requests_mutex, lock);
        ^
./sync.h:263:45: note: expanded from macro 'WAIT_LOCK'
#define WAIT_LOCK(cs, name) UniqueLock name(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
                                            ^
3 errors generated.
```
```diff
+        while (!g_requests.empty()) {
+            g_requests_cv.wait(lock);
+        }
+    }
+    if (eventHTTP) {
+        event_base_once(eventBase, -1, EV_TIMEOUT, [](evutil_socket_t, short, void*) {
```
Because in `g_thread_http` the `eventHTTP` struct is being used.
```diff
+        g_requests.insert(req);
+        evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
+            LOCK(g_requests_mutex);
+            auto n = g_requests.erase(req);
+            assert(n == 1);
+            if (g_requests.empty()) g_requests_cv.notify_all();
```
Suggest calling `g_requests_cv.notify_all()` unconditionally and updating the comment above. Why gate `.notify_all()` on emptiness if we are waiting for all the http requests to finish? That way you would get a log every time a connection closes. Otherwise `g_requests_cv` should be renamed to `g_requests_empty_cv`.
If `g_requests_cv` is used to signal changes to `g_requests`, then also add a call to `.notify_all()` after `g_requests.insert(req)`.
```diff
@@ -459,14 +477,22 @@ void StopHTTPServer()
         evhttp_del_accept_socket(eventHTTP, socket);
     }
     boundSockets.clear();
+    {
+        WAIT_LOCK(g_requests_mutex, lock);
+        while (!g_requests.empty()) {
+            g_requests_cv.wait(lock);
```

Suggested change:

```diff
+        WAIT_LOCK(g_requests_mutex, lock);
+        if (!g_requests.empty()) {
+            LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
+        }
+        while (!g_requests.empty()) {
+            g_requests_cv.wait(lock);
```
test/functional/feature_abortnode.py
From #15363 (comment):

> The libevent HTTP interface is kind of crazy, so I don’t think it’s surprising that fixing this requires some trial and error. But it would be good to have tests for each issue that gets fixed to avoid going in circles.

Suggesting to add a test which will prevent any regressions in the future.
```diff
+        LOCK(g_requests_mutex);
+        g_requests.insert(req);
+        g_requests_cv.notify_all();
+        evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
+            LOCK(g_requests_mutex);
+            auto n = g_requests.erase(req);
```
Calling `notify_all()` while holding a mutex is a pessimization. Suggesting to apply a diff:

```cpp
auto n = WITH_LOCK(g_requests_mutex, return g_requests.erase(req));
```
```diff
@@ -207,6 +213,19 @@ std::string RequestMethodString(HTTPRequest::RequestMethod m)
 /** HTTP request callback */
 static void http_request_cb(struct evhttp_request* req, void* arg)
 {
+    // Track requests and notify when a request is completed.
+    {
+        LOCK(g_requests_mutex);
+        g_requests.insert(req);
```
Calling `notify_all()` while holding a mutex is a pessimization. Suggesting to apply a diff:

```cpp
WITH_LOCK(g_requests_mutex, g_requests.insert(req));
```

or to introduce a new scope.
```diff
+        if (!g_requests.empty()) {
+            LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
+        }
+        while (!g_requests.empty()) {
+            g_requests_cv.wait(lock);
+        }
```
While this code snippet is used across the current codebase, for new code I’d prefer another `wait()` overload with an “inlined” stop_waiting condition:

```cpp
g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) { return g_requests.empty(); });
```
```diff
@@ -146,6 +148,10 @@ static GlobalMutex g_httppathhandlers_mutex;
 static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
 //! Bound listening sockets
 static std::vector<evhttp_bound_socket *> boundSockets;
+//! Track active requests
+static GlobalMutex g_requests_mutex;
+static std::condition_variable g_requests_cv;
+static std::set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);
```
What about `std::unordered_set`, which has more efficient `insert()` and `erase()` methods?
```diff
     }
-    if (eventHTTP) {
-        evhttp_free(eventHTTP);
-        eventHTTP = nullptr;
-    }
     if (eventBase) {
```
Maybe merge the `if (eventBase)` blocks into a single one?
From #15363 (comment):

> The libevent HTTP interface is kind of crazy, so I don’t think it’s surprising that fixing this requires some trial and error. But it would be good to have tests for each issue that gets fixed to avoid going in circles.

Suggesting to add a test which will prevent any regressions in the future. @hebasto like https://github.com/bitcoin/bitcoin/blob/7bb07bf8bd8e240562714f9cc4b7696868cb6c26/test/functional/feature_abortnode.py#L44 should take less than 5s or so?
I have updated the test and it seems to reliably fail on master, so I think this is a good enough regression test, @hebasto? Also addressed other comments by @hebasto.
```diff
+        WAIT_LOCK(g_requests_mutex, lock);
+        g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
+            if (!g_requests.empty()) {
+                LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
+            }
+            return g_requests.empty();
```
```diff
-    if (g_thread_http.joinable()) g_thread_http.join();
+    {
+        WAIT_LOCK(g_requests_mutex, lock);
+        g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
+            if (!g_requests.empty()) {
+                LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
```
```diff
@@ -207,6 +214,17 @@ std::string RequestMethodString(HTTPRequest::RequestMethod m)
 /** HTTP request callback */
 static void http_request_cb(struct evhttp_request* req, void* arg)
 {
+    // Track requests and notify when a request is completed.
+    {
+        WITH_LOCK(g_requests_mutex, g_requests.insert(req));
+        g_requests_cv.notify_all();
```
I hadn’t seen that, thanks for linking.

It’s a minor concern, but I’m a bit worried that we’re creating a very general contract (“`g_requests_cv` will always notify for any changes to `g_requests`”) on a global variable when it’s enforced outside of `g_requests`, since in the future we might easily introduce mutations on `g_requests` without notifying the cv. A more scoped cv, such as `g_requests_cv_handled`, largely mitigates that.

But, again, no big concern, and it’s not likely to cause any major problems should we miss something there. As you see fit.
FYI, I think one approach to improve on this contract (and, as a nice benefit, also encapsulate the code a bit better, which I think improves readability) is in the below diff. (`WaitUntilEmpty()` needs improvement because it shouldn’t directly be logging about the HTTP server like that. I guess we also don’t need a `GlobalMutex` here.)
```diff
diff --git a/src/httpserver.cpp b/src/httpserver.cpp
index b481ef093..90ef37bd5 100644
--- a/src/httpserver.cpp
+++ b/src/httpserver.cpp
@@ -134,6 +134,40 @@ struct HTTPPathHandler
     HTTPRequestHandler handler;
 };
 
+class RequestTracker
+{
+public:
+    typedef std::unordered_set<evhttp_request*> requests_t;
+private:
+    GlobalMutex m_requests_mutex;
+    std::condition_variable m_requests_cv;
+    requests_t m_requests GUARDED_BY(m_requests_mutex);
+public:
+    void Register(struct evhttp_request* req)
+    {
+        WITH_LOCK(m_requests_mutex, m_requests.insert(req));
+        m_requests_cv.notify_all();
+    }
+
+    size_t Remove(struct evhttp_request* req)
+    {
+        auto n{WITH_LOCK(m_requests_mutex, return m_requests.erase(req))};
+        m_requests_cv.notify_all();
+        return n;
+    }
+
+    void WaitUntilEmpty()
+    {
+        WAIT_LOCK(m_requests_mutex, lock);
+        if (!m_requests.empty()) {
+            LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", m_requests.size());
+        }
+        m_requests_cv.wait(lock, [this]() EXCLUSIVE_LOCKS_REQUIRED(m_requests_mutex) {
+            return m_requests.empty();
+        });
+    }
+};
+
 /** HTTP module state */
 
 //! libevent event loop
@@ -149,10 +183,8 @@ static GlobalMutex g_httppathhandlers_mutex;
 static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
 //! Bound listening sockets
 static std::vector<evhttp_bound_socket *> boundSockets;
-//! Track active requests
-static GlobalMutex g_requests_mutex;
-static std::condition_variable g_requests_cv;
-static std::unordered_set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);
+static RequestTracker g_request_tracker;
+
 
 /** Check if a network address is allowed to access the HTTP server */
 static bool ClientAllowed(const CNetAddr& netaddr)
@@ -216,12 +248,9 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
 {
     // Track requests and notify when a request is completed.
     {
-        WITH_LOCK(g_requests_mutex, g_requests.insert(req));
-        g_requests_cv.notify_all();
+        g_request_tracker.Register(req);
         evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
-            auto n = WITH_LOCK(g_requests_mutex, return g_requests.erase(req));
-            assert(n == 1);
-            g_requests_cv.notify_all();
+            assert(g_request_tracker.Remove(req) == 1);
         }, nullptr);
     }
 
@@ -477,15 +506,7 @@ void StopHTTPServer()
         evhttp_del_accept_socket(eventHTTP, socket);
     }
     boundSockets.clear();
-    {
-        WAIT_LOCK(g_requests_mutex, lock);
-        if (!g_requests.empty()) {
-            LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
-        }
-        g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
-            return g_requests.empty();
-        });
-    }
+    g_request_tracker.WaitUntilEmpty();
     if (eventHTTP) {
         // Schedule a callback to call evhttp_free in the event base thread, so
         // that evhttp_free does not need to be called again after the handling
```
```diff
@@ -146,6 +149,10 @@ static GlobalMutex g_httppathhandlers_mutex;
 static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
 //! Bound listening sockets
 static std::vector<evhttp_bound_socket *> boundSockets;
+//! Track active requests
+static GlobalMutex g_requests_mutex;
+static std::condition_variable g_requests_cv;
+static std::unordered_set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);
```
I think this can also be a simple atomic counter which doesn’t require a lock?
```diff
diff --git a/src/httpserver.cpp b/src/httpserver.cpp
index 346c6d5cb..a058a9c49 100644
--- a/src/httpserver.cpp
+++ b/src/httpserver.cpp
@@ -152,7 +152,7 @@ static std::vector<evhttp_bound_socket *> boundSockets;
 //! Track active requests
 static GlobalMutex g_requests_mutex;
 static std::condition_variable g_requests_cv;
-static std::unordered_set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);
+static std::atomic<size_t> g_requests_count;
 
 /** Check if a network address is allowed to access the HTTP server */
 static bool ClientAllowed(const CNetAddr& netaddr)
@@ -216,11 +216,9 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
 {
     // Track requests and notify when a request is completed.
     {
-        WITH_LOCK(g_requests_mutex, g_requests.insert(req));
-        g_requests_cv.notify_all();
+        ++g_requests_count;
         evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
-            auto n = WITH_LOCK(g_requests_mutex, return g_requests.erase(req));
-            assert(n == 1);
+            --g_requests_count;
             g_requests_cv.notify_all();
         }, nullptr);
     }
@@ -480,10 +478,10 @@ void StopHTTPServer()
     {
         WAIT_LOCK(g_requests_mutex, lock);
         g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
-            if (!g_requests.empty()) {
-                LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
-            }
-            return g_requests.empty();
+            if (g_requests_count > 0) {
+                LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests_count);
+            }
+            return g_requests_count == 0;
         });
     }
     if (eventHTTP) {
```
(cc @hebasto - I think this is a similar but less “controversial” approach than what I proposed offline)
I meant a `std::atomic<size_t>` counter, as opposed to set mutations which require acquiring a lock. If I’m misunderstanding you, could you please clarify? Note that C++20 also provides `std::atomic<T>::wait`.
Approach ACK 6d3a848fc
Can anyone please confirm or correct my understanding of why this is so much faster?

My current understanding is that even when all of the workers in `g_thread_http_workers` have terminated (i.e. the entire queue is processed and `running == false`), the http server thread `g_thread_http` does not return (when joining) immediately, because there are still some idle connections (with whom?) open, preventing the server from shutting down. By first (instead of afterwards) calling `evhttp_free(eventHTTP)`, we can safely destroy the server even if connections are still open (but all requests are handled), causing `g_thread_http.join()` to return immediately in `StopHTTPServer()` as opposed to waiting for a timeout.

I think the commit and PR message would benefit from a clearer explanation of why this works.
> Can anyone please confirm or correct my understanding of why this is so much faster?
>
> My current understanding is that even when all of the workers in `g_thread_http_workers` have terminated (i.e. the entire queue is processed and `running == false`), the http server thread `g_thread_http` does not return (when joining) immediately, because there are still some idle connections (with whom?) open, preventing the server from shutting down. By first (instead of afterwards) calling `evhttp_free(eventHTTP)`, we can safely destroy the server even if connections are still open (but all requests are handled), causing `g_thread_http.join()` to return immediately in `StopHTTPServer()` as opposed to waiting for a timeout.
Sounds good to me. I guess libevent doesn’t close the idle connection as an optimization (less overhead if the client wants to send more requests) and if the client doesn’t close them then we are seeing this issue. I would add that the overhead that we see in the functional test is the timeout we have configured on the http server ourselves here.
> I guess libevent doesn’t close the idle connection as an optimization (less overhead if the client wants to send more requests) and if the client doesn’t close them then we are seeing this issue.
Not really an optimization of libevent; `AuthServiceProxy`, used in the test framework, creates a persistent connection (to reuse the same connection for consecutive HTTP requests, as @fjahr says).
ACK. With this change `evhttp_free` is called after all requests are completed. Calling `evhttp_free` at this point will cause all active connections to close; nothing then remains on the event base, and all RPC threads are ready to be joined. As mentioned above, this is useful to avoid hitting the idle timeout. Note that shutdown still hangs while a request is in-flight.
```diff
@@ -21,12 +21,15 @@
 #include <util/threadnames.h>
 #include <util/translation.h>
 
+#include <condition_variable>
 #include <cstdio>
 #include <cstdlib>
 #include <deque>
 #include <memory>
+#include <mutex>
```
`#include <event2/http_struct.h>` (for `evhttp_request`)
```diff
@@ -207,6 +214,17 @@ std::string RequestMethodString(HTTPRequest::RequestMethod m)
 /** HTTP request callback */
 static void http_request_cb(struct evhttp_request* req, void* arg)
 {
+    // Track requests and notify when a request is completed.
+    {
+        WITH_LOCK(g_requests_mutex, g_requests.insert(req));
+        g_requests_cv.notify_all();
+        evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
+            auto n = WITH_LOCK(g_requests_mutex, return g_requests.erase(req));
```
nit: I think I’d prefer removing the `n` var altogether; the temp var doesn’t really make it more readable imo.

```cpp
auto n{WITH_LOCK(g_requests_mutex, return g_requests.erase(req))};
```
ACK https://github.com/bitcoin/bitcoin/commit/55bbf04adc4879dcacd135f40aa053e60032af4b
I have some open suggestions, but I’m happy with the PR as it is.
This was made obsolete by tracking the active requests and explicitly waiting for them to finish before shutdown.
```diff
 #include <sys/stat.h>
 
 #include <event2/buffer.h>
 #include <event2/bufferevent.h>
 #include <event2/http.h>
+#include <event2/http_struct.h>
```
Replace `#include <event2/http_struct.h>` with `#include <event2/event.h>`.