http: Track active requests and wait for last to finish

fjahr commented at 6:48 PM on December 22, 2022: contributor

This revives #19420. Since promag is not so active at the moment, I can support this to finally get it merged.

The PR is rebased and comments by jonatack have been addressed.

Once this is merged, I will also reopen #19434.

DrahtBot commented at 6:49 PM on December 22, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	stickies-v, hebasto
Concept ACK	brunoerg, promag
Stale ACK	kevkevinpal

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

DrahtBot added the label RPC/REST/ZMQ on Dec 22, 2022

fjahr cross-referenced this on Dec 22, 2022 from issue http: Track active requests and wait for last to finish by promag

kevkevinpal commented at 4:50 AM on December 23, 2022: contributor

Running test/functional/feature_abortnode.py before and after change we get a significant performance gain

running before changes f3bc1a72825fe2b51f4bc20e004cef464f05b965

test/functional/feature_abortnode.py
2022-12-23T04:47:19.826000Z TestFramework (INFO): Initializing test directory /var/folders/9g/cvx014yx4dq5lwl_x5zvn_j80000gn/T/bitcoin_func_test_k3_6xric
2022-12-23T04:47:21.229000Z TestFramework (INFO): Waiting for crash
2022-12-23T04:47:51.429000Z TestFramework (INFO): Node crashed - now verifying restart fails
2022-12-23T04:47:52.086000Z TestFramework (INFO): Stopping nodes
2022-12-23T04:47:52.300000Z TestFramework (INFO): Cleaning up /var/folders/9g/cvx014yx4dq5lwl_x5zvn_j80000gn/T/bitcoin_func_test_k3_6xric on exit
2022-12-23T04:47:52.300000Z TestFramework (INFO): Tests successful

32.50 seconds to run test

running test with these changes applied 40528e7

test/functional/feature_abortnode.py
2022-12-23T04:44:30.075000Z TestFramework (INFO): Initializing test directory /var/folders/9g/cvx014yx4dq5lwl_x5zvn_j80000gn/T/bitcoin_func_test_0h3w_rbx
2022-12-23T04:44:31.516000Z TestFramework (INFO): Waiting for crash
2022-12-23T04:44:31.837000Z TestFramework (INFO): Node crashed - now verifying restart fails
2022-12-23T04:44:32.452000Z TestFramework (INFO): Stopping nodes
2022-12-23T04:44:32.727000Z TestFramework (INFO): Cleaning up /var/folders/9g/cvx014yx4dq5lwl_x5zvn_j80000gn/T/bitcoin_func_test_0h3w_rbx on exit
2022-12-23T04:44:32.727000Z TestFramework (INFO): Tests successful

1.2 seconds to run test

stickies-v commented at 10:15 AM on December 23, 2022: contributor

Concept ACK - thanks for reviving this!

brunoerg commented at 11:30 AM on December 23, 2022: contributor

Concept ACK

in src/httpserver.cpp:152 in 40528e777f outdated

 147 | @@ -146,6 +148,10 @@ static GlobalMutex g_httppathhandlers_mutex;
 148 |  static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
 149 |  //! Bound listening sockets
 150 |  static std::vector<evhttp_bound_socket *> boundSockets;
 151 | +//! Track active requests
 152 | +static Mutex g_requests_mutex;

brunoerg commented at 11:59 AM on December 23, 2022:

Maybe a GlobalMutex would be more appropriate here?

static GlobalMutex g_requests_mutex;

kevkevinpal commented at 3:21 PM on December 23, 2022:

I'm still newish to the repo, is there a convention on when we should use Mutex vs GlobalMutex?

hebasto commented at 4:07 PM on December 23, 2022:

I'm still newish to the repo, is there a convention on when we should use Mutex vs GlobalMutex?

It depends on a scope. For the global namespace the GlobalMutex is an appropriate option: https://github.com/bitcoin/bitcoin/blob/f3bc1a72825fe2b51f4bc20e004cef464f05b965/src/sync.h#L132-L141

hebasto commented at 4:14 PM on December 23, 2022:

Maybe a GlobalMutex would be more appropriate here?

This suggestion must fix the following errors:

httpserver.cpp:221:13: error: calling function 'MaybeCheckNotHeld' requires negative capability '!g_requests_mutex' [-Werror,-Wthread-safety-analysis]
            LOCK(g_requests_mutex);
            ^
./sync.h:258:56: note: expanded from macro 'LOCK'
#define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
                                                       ^
httpserver.cpp:218:9: error: calling function 'MaybeCheckNotHeld' requires negative capability '!g_requests_mutex' [-Werror,-Wthread-safety-analysis]
        LOCK(g_requests_mutex);
        ^
./sync.h:258:56: note: expanded from macro 'LOCK'
#define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
                                                       ^
httpserver.cpp:481:9: error: calling function 'MaybeCheckNotHeld' requires negative capability '!g_requests_mutex' [-Werror,-Wthread-safety-analysis]
        WAIT_LOCK(g_requests_mutex, lock);
        ^
./sync.h:263:45: note: expanded from macro 'WAIT_LOCK'
#define WAIT_LOCK(cs, name) UniqueLock name(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
                                            ^
3 errors generated.

fjahr commented at 12:33 PM on December 25, 2022:

Done, errors are fixed. Thanks!

promag commented at 3:12 PM on December 23, 2022: member

Concept ACK :trollface:

fjahr force-pushed on Dec 23, 2022

in src/httpserver.cpp:493 in b9b8b5d2ca outdated

 482 | +        while (!g_requests.empty()) {
 483 | +            g_requests_cv.wait(lock);
 484 | +        }
 485 | +    }
 486 | +    if (eventHTTP) {
 487 | +        event_base_once(eventBase, -1, EV_TIMEOUT, [](evutil_socket_t, short, void*) {

promag commented at 11:15 PM on December 23, 2022:

maybe include a comment explaining this, see #19420 (review). At this point, it is possible that ong_thread_http the eventHTTP struct is being used.

fjahr commented at 6:57 PM on December 25, 2022:

done, I hope that covers it.

in src/httpserver.cpp:224 in b9b8b5d2ca outdated

 219 | +        g_requests.insert(req);
 220 | +        evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
 221 | +            LOCK(g_requests_mutex);
 222 | +            auto n = g_requests.erase(req);
 223 | +            assert(n == 1);
 224 | +            if (g_requests.empty()) g_requests_cv.notify_all();

promag commented at 11:17 PM on December 23, 2022:

maybe always call g_requests_cv.notify_all() and update comment above.

kevkevinpal commented at 3:54 PM on December 24, 2022:

why would we want to always call .notify_all() if we are waiting for all the http requests to finish?

promag commented at 4:14 PM on December 24, 2022:

because that way you would get a log every time a connection closes. Otherwise g_requests_cv should be renamed to g_requests_empty_cv.

If g_requests_cv is used to signal changes to g_requests then also add call .notify_all() after g_requests.insert(req).

fjahr commented at 6:56 PM on December 25, 2022:

done, I have opted for always calling it

in src/httpserver.cpp:487 in b9b8b5d2ca outdated

 476 | @@ -459,14 +477,22 @@ void StopHTTPServer()
 477 |          evhttp_del_accept_socket(eventHTTP, socket);
 478 |      }
 479 |      boundSockets.clear();
 480 | +    {
 481 | +        WAIT_LOCK(g_requests_mutex, lock);
 482 | +        while (!g_requests.empty()) {
 483 | +            g_requests_cv.wait(lock);

promag commented at 11:19 PM on December 23, 2022:

maybe log "Waiting for %d requests to stop HTTP server" or something like that.

fjahr commented at 6:56 PM on December 25, 2022:

done

fjahr force-pushed on Dec 25, 2022

in src/httpserver.cpp:487 in f95063efb9 outdated

 482 | +        WAIT_LOCK(g_requests_mutex, lock);
 483 | +        if (!g_requests.empty()) {
 484 | +            LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
 485 | +        }
 486 | +        while (!g_requests.empty()) {
 487 | +            g_requests_cv.wait(lock);

promag commented at 10:16 PM on December 26, 2022:

my suggestion was to add the log here in the loop.

fjahr commented at 12:45 PM on January 5, 2023:

Yeah, I think doing it in the loop essentially creates a countdown that I didn't think was necessary. But I am kind of indifferent on this, so I have changed it now.

kevkevinpal commented at 1:57 AM on December 27, 2022: contributor

retACK f95063e with test/functional/feature_abortnode.py

hebasto commented at 4:24 PM on December 27, 2022: member

From #15363 (comment):

The libevent HTTP interface is kind of crazy, so I don't think it's surprising that fixing this requires some trial and error. But it would be good to have tests for each issue that gets fixed to avoid going in circles.

Suggesting to add a test which will prevent any regressions in the future.

in src/httpserver.cpp:223 in f95063efb9 outdated

 218 | +        LOCK(g_requests_mutex);
 219 | +        g_requests.insert(req);
 220 | +        g_requests_cv.notify_all();
 221 | +        evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
 222 | +            LOCK(g_requests_mutex);
 223 | +            auto n = g_requests.erase(req);

hebasto commented at 4:40 PM on December 27, 2022:

Calling notify_all() while holding a mutex is a pessimization. Suggesting to apply a diff:

            auto n = WITH_LOCK(g_requests_mutex, return g_requests.erase(req));

fjahr commented at 12:44 PM on January 5, 2023:

done

in src/httpserver.cpp:219 in f95063efb9 outdated

 212 | @@ -207,6 +213,19 @@ std::string RequestMethodString(HTTPRequest::RequestMethod m)
 213 |  /** HTTP request callback */
 214 |  static void http_request_cb(struct evhttp_request* req, void* arg)
 215 |  {
 216 | +    // Track requests and notify when a request is completed.
 217 | +    {
 218 | +        LOCK(g_requests_mutex);
 219 | +        g_requests.insert(req);

hebasto commented at 4:40 PM on December 27, 2022:

Calling notify_all() while holding a mutex is a pessimization. Suggesting to apply a diff:

        WITH_LOCK(g_requests_mutex, g_requests.insert(req));

or to introduce a new scope.

fjahr commented at 12:44 PM on January 5, 2023:

done

in src/httpserver.cpp:488 in f95063efb9 outdated

 483 | +        if (!g_requests.empty()) {
 484 | +            LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
 485 | +        }
 486 | +        while (!g_requests.empty()) {
 487 | +            g_requests_cv.wait(lock);
 488 | +        }

hebasto commented at 4:59 PM on December 27, 2022:

While this code snippet is used across the current codebase, for new code I'd prefer another wait() overload with an "inlined" stop_waiting condition:

        g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) { return g_requests.empty(); });

fjahr commented at 12:44 PM on January 5, 2023:

done

in src/httpserver.cpp:154 in f95063efb9 outdated

 147 | @@ -146,6 +148,10 @@ static GlobalMutex g_httppathhandlers_mutex;
 148 |  static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
 149 |  //! Bound listening sockets
 150 |  static std::vector<evhttp_bound_socket *> boundSockets;
 151 | +//! Track active requests
 152 | +static GlobalMutex g_requests_mutex;
 153 | +static std::condition_variable g_requests_cv;
 154 | +static std::set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);

hebasto commented at 5:06 PM on December 27, 2022:

Is any particular reason for not using std::unordered_set which has more efficient insert() and erase() methods?

promag commented at 5:45 PM on January 3, 2023:

No reason, should be a small set anyway, both are fine IMHO.

fjahr commented at 12:44 PM on January 5, 2023:

done

in src/httpserver.cpp:503 in f95063efb9 outdated

 502 |      }
 503 | -    if (eventHTTP) {
 504 | -        evhttp_free(eventHTTP);
 505 | -        eventHTTP = nullptr;
 506 | -    }
 507 |      if (eventBase) {

hebasto commented at 6:00 PM on December 27, 2022:

Combine two if (eventBase) blocks into a single one?

fjahr commented at 12:43 PM on January 5, 2023:

done

promag commented at 5:48 PM on January 3, 2023: member

From #15363 (comment):

The libevent HTTP interface is kind of crazy, so I don't think it's surprising that fixing this requires some trial and error. But it would be good to have tests for each issue that gets fixed to avoid going in circles.

Suggesting to add a test which will prevent any regressions in the future. @hebasto like https://github.com/bitcoin/bitcoin/blob/7bb07bf8bd8e240562714f9cc4b7696868cb6c26/test/functional/feature_abortnode.py#L44 should take less than 5s or so?

fjahr force-pushed on Jan 5, 2023

fjahr commented at 12:47 PM on January 5, 2023: contributor

From #15363 (comment):

The libevent HTTP interface is kind of crazy, so I don't think it's surprising that fixing this requires some trial and error. But it would be good to have tests for each issue that gets fixed to avoid going in circles.

Suggesting to add a test which will prevent any regressions in the future.

@hebasto like

https://github.com/bitcoin/bitcoin/blob/7bb07bf8bd8e240562714f9cc4b7696868cb6c26/test/functional/feature_abortnode.py#L44

should take less than 5s or so?

I have updated the test and it seems to be reliably failing on master, so I think this is a good enough regression test @hebasto ?

Also addressed other comments by @hebasto .

in src/httpserver.cpp:486 in 6d3a848fc7 outdated

 484 | +        WAIT_LOCK(g_requests_mutex, lock);
 485 | +        g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
 486 | +                if (!g_requests.empty()) {
 487 | +                    LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
 488 | +                }
 489 | +                return g_requests.empty();

hebasto commented at 2:20 PM on January 5, 2023:

nit: Isn't indentation too big?

fjahr commented at 12:25 PM on January 9, 2023:

fixed

in src/httpserver.cpp:484 in 6d3a848fc7 outdated

 482 | -        if (g_thread_http.joinable()) g_thread_http.join();
 483 | +    {
 484 | +        WAIT_LOCK(g_requests_mutex, lock);
 485 | +        g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
 486 | +                if (!g_requests.empty()) {
 487 | +                    LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());

hebasto commented at 2:23 PM on January 5, 2023:

Having side effects in the predicate seems a bad idea due to spurious wakeups. It should be a pure function.

fjahr commented at 12:25 PM on January 9, 2023:

Sure, then I think this isn't possible anymore so I am reverting back to how I did it before. I think that's fine though.

hebasto commented at 2:24 PM on January 5, 2023: member

Approach ACK 6d3a848fc76f76af2b2b262011f37be952325b44.

in src/httpserver.cpp:220 in 6d3a848fc7 outdated

 213 | @@ -207,6 +214,17 @@ std::string RequestMethodString(HTTPRequest::RequestMethod m)
 214 |  /** HTTP request callback */
 215 |  static void http_request_cb(struct evhttp_request* req, void* arg)
 216 |  {
 217 | +    // Track requests and notify when a request is completed.
 218 | +    {
 219 | +        WITH_LOCK(g_requests_mutex, g_requests.insert(req));
 220 | +        g_requests_cv.notify_all();

stickies-v commented at 4:36 PM on January 5, 2023:

What's the purpose of this notification? I think we (currently) only need to monitor for completed requests, not new ones?

fjahr commented at 12:25 PM on January 9, 2023:

Did you see #26742 (review) ?

stickies-v commented at 2:30 PM on January 9, 2023:

I hadn't seen that, thanks for linking.

It's a minor concern, but I'm a bit worried that we're creating a very general contract ("g_requests_cv will always notify for any changes to the g_requests") on a global variable when it's enforced outside of g_requests, when in the future we might easily introduce mutations on g_requests without notifying the cv. A more scoped cv, such as g_requests_cv_handled largely mitigates that.

But, again, no big concern, and it's not likely to cause any major problems should we miss something there. As you see fit.

stickies-v commented at 11:43 AM on January 11, 2023:

FYI I think one approach to improve on this contract (and as a nice benefit, also encapsulate code a bit better - I think this improves readability) in the below diff.

(WaitUntilEmpty() needs improvement because it shouldn't directly be logging about the HTTP server like that. I guess we also don't need a GlobalMutex here.)

diff --git a/src/httpserver.cpp b/src/httpserver.cpp
index b481ef093..90ef37bd5 100644
--- a/src/httpserver.cpp
+++ b/src/httpserver.cpp
@@ -134,6 +134,40 @@ struct HTTPPathHandler
     HTTPRequestHandler handler;
 };
 
+class RequestTracker
+{
+public:
+    typedef std::unordered_set<evhttp_request*> requests_t;
+private:
+    GlobalMutex m_requests_mutex;
+    std::condition_variable m_requests_cv;
+    requests_t m_requests GUARDED_BY(m_requests_mutex);
+public:
+    void Register(struct evhttp_request* req)
+    {
+        WITH_LOCK(m_requests_mutex, m_requests.insert(req));
+        m_requests_cv.notify_all();
+    }
+
+    size_t Remove(struct evhttp_request* req)
+    {
+        auto n{WITH_LOCK(m_requests_mutex, return m_requests.erase(req))};
+        m_requests_cv.notify_all();
+        return n;
+    }
+
+    void WaitUntilEmpty()
+    {
+        WAIT_LOCK(m_requests_mutex, lock);
+        if (!m_requests.empty()) {
+            LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", m_requests.size());
+        }
+        m_requests_cv.wait(lock, [this]() EXCLUSIVE_LOCKS_REQUIRED(m_requests_mutex) {
+            return m_requests.empty();
+        });
+    }
+};
+
 /** HTTP module state */
 
 //! libevent event loop
@@ -149,10 +183,8 @@ static GlobalMutex g_httppathhandlers_mutex;
 static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
 //! Bound listening sockets
 static std::vector<evhttp_bound_socket *> boundSockets;
-//! Track active requests
-static GlobalMutex g_requests_mutex;
-static std::condition_variable g_requests_cv;
-static std::unordered_set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);
+static RequestTracker g_request_tracker;
+
 
 /** Check if a network address is allowed to access the HTTP server */
 static bool ClientAllowed(const CNetAddr& netaddr)
@@ -216,12 +248,9 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
 {
     // Track requests and notify when a request is completed.
     {
-        WITH_LOCK(g_requests_mutex, g_requests.insert(req));
-        g_requests_cv.notify_all();
+        g_request_tracker.Register(req);
         evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
-            auto n = WITH_LOCK(g_requests_mutex, return g_requests.erase(req));
-            assert(n == 1);
-            g_requests_cv.notify_all();
+            assert(g_request_tracker.Remove(req) == 1);
         }, nullptr);
     }
 
@@ -477,15 +506,7 @@ void StopHTTPServer()
         evhttp_del_accept_socket(eventHTTP, socket);
     }
     boundSockets.clear();
-    {
-        WAIT_LOCK(g_requests_mutex, lock);
-        if (!g_requests.empty()) {
-            LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
-        }
-        g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
-            return g_requests.empty();
-        });
-    }
+    g_request_tracker.WaitUntilEmpty();
     if (eventHTTP) {
         // Schedule a callback to call evhttp_free in the event base thread, so
         // that evhttp_free does not need to be called again after the handling

</details>

fjahr commented at 7:46 PM on February 10, 2023:

Sorry but it's a significant change and this PR already has a pretty long history and several ACKs for the current approach. I would suggest you open a follow-up PR with this as an additional improvement.

in src/httpserver.cpp:155 in 6d3a848fc7 outdated

 148 | @@ -146,6 +149,10 @@ static GlobalMutex g_httppathhandlers_mutex;
 149 |  static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
 150 |  //! Bound listening sockets
 151 |  static std::vector<evhttp_bound_socket *> boundSockets;
 152 | +//! Track active requests
 153 | +static GlobalMutex g_requests_mutex;
 154 | +static std::condition_variable g_requests_cv;
 155 | +static std::unordered_set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);

stickies-v commented at 6:06 PM on January 6, 2023:

I think this can also be a simple atomic counter which doesn't require a lock?

diff --git a/src/httpserver.cpp b/src/httpserver.cpp
index 346c6d5cb..a058a9c49 100644
--- a/src/httpserver.cpp
+++ b/src/httpserver.cpp
@@ -152,7 +152,7 @@ static std::vector<evhttp_bound_socket *> boundSockets;
 //! Track active requests
 static GlobalMutex g_requests_mutex;
 static std::condition_variable g_requests_cv;
-static std::unordered_set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);
+static std::atomic<size_t> g_requests_count;
 
 /** Check if a network address is allowed to access the HTTP server */
 static bool ClientAllowed(const CNetAddr& netaddr)
@@ -216,11 +216,9 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
 {
     // Track requests and notify when a request is completed.
     {
-        WITH_LOCK(g_requests_mutex, g_requests.insert(req));
-        g_requests_cv.notify_all();
+        ++g_requests_count;
         evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
-            auto n = WITH_LOCK(g_requests_mutex, return g_requests.erase(req));
-            assert(n == 1);
+            --g_requests_count;
             g_requests_cv.notify_all();
         }, nullptr);
     }
@@ -480,10 +478,10 @@ void StopHTTPServer()
     {
         WAIT_LOCK(g_requests_mutex, lock);
         g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
-                if (!g_requests.empty()) {
-                    LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
-                }
-                return g_requests.empty();
+            if (g_requests_count > 0) {
+                LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests_count);
+            }
+            return g_requests_count == 0;
         });
     }
     if (eventHTTP) {

</details>

(cc @hebasto - I think this is a similar but less "controversial" approach than what I proposed offline)

fjahr commented at 12:25 PM on January 9, 2023:

Yes, for this change alone it would be possible with a count I think but not for the follow-up change in #19434 where the explicit requests are removed in a callback.

stickies-v commented at 2:35 PM on January 9, 2023:

Okay, makes sense. I'm not sure how controversial #19434 is, need to take a closer look. Would it make sense to use the atomic counter in this PR and then use a set in the follow-up? In case we can't get that merged, that would leave master in a better state, and I think the code churn is manageable?

promag commented at 4:03 PM on January 9, 2023:

You still need a lock, otherwise, how would you wait for the count to be zero?

stickies-v commented at 4:13 PM on January 9, 2023:

I'm not sure I understand your comment - my diff (https://github.com/bitcoin/bitcoin/pull/26742#discussion_r1063664153) does not involve removing the lock, although it reduces reliance on it since a counter can be encapsulated with a more efficient std::atomic<size_t> as opposed to set mutations which require acquiring a lock. If I'm misunderstanding you, could you please clarify?

promag commented at 4:18 PM on January 9, 2023:

Ah neverming, we could use std::atomic<T>::wait

promag commented at 4:20 PM on January 9, 2023:

Anyway, I think having a set of connections is required for upcoming changes (like #19434) but you are right that for this branch the count is enough. I still tend to prefer as is but I would ACK a version with the counter.

fjahr commented at 5:23 PM on January 9, 2023:

I prefer to keep it as is as well. I don't think #19434 will be controversial because it's not conceptually different, just an improvement. And using the count is just a minor simplification, the code works the same either way. We have put stranger things into the code base, like unused functions for example, as long as there were follow-ups where they would be used and the these follow-ups were following the same concept that was already ACKed.

stickies-v commented at 11:44 AM on January 11, 2023:

Alright, it's not my ideal approach but let's be pragmatic and keep it as is. Can be marked as resolved for me.

stickies-v commented at 6:17 PM on January 6, 2023: contributor

Approach ACK 6d3a848fc

Can anyone please confirm or correct my understanding of why this so much faster?

My current understanding is that even when all of the workers in g_thread_http_workers have terminated (i.e. the entire queue is processed and running==false), the http server thread g_thread_http does not return (when joining) immediately because there are still some idle connections (with whom?) open, preventing the server from shutting down. By first (instead of afterwards) calling evhttp_free(eventHTTP), we can safely destroy the server even if connections are still open (but all requests are handled), causing g_thread_http.join() to return immediately in StopHTTPServer() as opposed to waiting for a timeout.

I think the commit and PR message would benefit from a more clear reason why this works.

fjahr force-pushed on Jan 9, 2023

fjahr commented at 12:52 PM on January 9, 2023: contributor

Can anyone please confirm or correct my understanding of why this so much faster?

My current understanding is that even when all of the workers in g_thread_http_workers have terminated (i.e. the entire queue is processed and running==false), the http server thread g_thread_http does not return (when joining) immediately because there are still some idle connections (with whom?) open, preventing the server from shutting down. By first (instead of afterwards) calling evhttp_free(eventHTTP), we can safely destroy the server even if connections are still open (but all requests are handled), causing g_thread_http.join() to return immediately in StopHTTPServer() as opposed to waiting for a timeout.

Sounds good to me. I guess libevent doesn't close the idle connection as an optimization (less overhead if the client wants to send more requests) and if the client doesn't close them then we are seeing this issue. I would add that the overhead that we see in the functional test is the timeout we have configured on the http server ourselves here.

promag commented at 2:27 PM on January 9, 2023: member

I guess libevent doesn't close the idle connection as an optimization (less overhead if the client wants to send more requests) and if the client doesn't close them then we are seeing this issue.

Not really an optimization of libevent, AuthServiceProxy used in the test framework creates a persistent connection (to reuse the same connection for consecutive HTTP requests, as @fjahr says).

promag commented at 4:12 PM on January 9, 2023: member

ACK. With this change evhttp_free is called after all requests are completed. Calling evhttp_free at this point will cause all active connections to close and nothing remains on the event base, and all RPC threads are ready to be joined. As mentioned above, this is useful to avoid hitting the idle timeout.

Note that shutdown still hangs while a request is in-flight.

in src/httpserver.cpp:29 in 55bbf04adc outdated

  20 | @@ -21,12 +21,15 @@
  21 |  #include <util/threadnames.h>
  22 |  #include <util/translation.h>
  23 |  
  24 | +#include <condition_variable>
  25 |  #include <cstdio>
  26 |  #include <cstdlib>
  27 |  #include <deque>
  28 |  #include <memory>
  29 | +#include <mutex>

promag commented at 12:29 AM on January 11, 2023:

can be removed.

stickies-v commented at 11:08 AM on January 11, 2023:

+1, and we're missing #include <event2/http_struct.h> (for evhttp_request)

fjahr commented at 7:45 PM on February 10, 2023:

Both done

in src/httpserver.cpp:222 in 55bbf04adc outdated

 213 | @@ -207,6 +214,17 @@ std::string RequestMethodString(HTTPRequest::RequestMethod m)
 214 |  /** HTTP request callback */
 215 |  static void http_request_cb(struct evhttp_request* req, void* arg)
 216 |  {
 217 | +    // Track requests and notify when a request is completed.
 218 | +    {
 219 | +        WITH_LOCK(g_requests_mutex, g_requests.insert(req));
 220 | +        g_requests_cv.notify_all();
 221 | +        evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
 222 | +            auto n = WITH_LOCK(g_requests_mutex, return g_requests.erase(req));

stickies-v commented at 11:08 AM on January 11, 2023:

nit - think I'd prefer removing the n var altogether, the temp var doesn't really make it more readable imo

            auto n{WITH_LOCK(g_requests_mutex, return g_requests.erase(req))};

fjahr commented at 7:46 PM on February 10, 2023:

Done

stickies-v approved

stickies-v commented at 11:46 AM on January 11, 2023: contributor

ACK https://github.com/bitcoin/bitcoin/commit/55bbf04adc4879dcacd135f40aa053e60032af4b

I have some open suggestions, but I'm happy with the PR as it is.

http: Track active requests and wait for last to finish 8c6d007c80

http: Release server before waiting for event base loop exit 660bdbf785

test: Reduce extended timeout on abortnode test

This was made obsolete by tracking the active requests and explicitly waiting for them to finish before shutdown.

60978c8080

fjahr force-pushed on Feb 10, 2023

fjahr commented at 7:47 PM on February 10, 2023: contributor

Addressed comments by @promag and @stickies-v

stickies-v approved

stickies-v commented at 12:36 PM on February 11, 2023: contributor

re-ACK 60978c8

DrahtBot requested review from kevkevinpal on Feb 11, 2023

in src/httpserver.cpp:39 in 60978c8080

  34 |  #include <sys/stat.h>
  35 |  
  36 |  #include <event2/buffer.h>
  37 |  #include <event2/bufferevent.h>
  38 |  #include <event2/http.h>
  39 | +#include <event2/http_struct.h>

hebasto commented at 1:39 PM on March 1, 2023:

nit: IWYU suggests to s/#include <event2/http_struct.h>/#include <event2/event.h>/.

fjahr commented at 6:28 PM on March 4, 2023:

Thanks! I will fix this in the follow-up that will pick up #19434.

hebasto approved

hebasto commented at 1:40 PM on March 1, 2023: member

ACK 60978c8080ec13ff4571c8a89e742517b2aca692

bitcoin deleted a comment on Mar 4, 2023

fanquake removed review request from kevkevinpal on Mar 5, 2023

fanquake requested review from john-moffett on Mar 5, 2023

DrahtBot requested review from kevkevinpal on Mar 6, 2023

achow101 commented at 12:29 AM on March 7, 2023: member

ACK 60978c8080ec13ff4571c8a89e742517b2aca692

achow101 merged this on Mar 7, 2023

achow101 closed this on Mar 7, 2023

sidhujag referenced this in commit 61edeffbd5 on Mar 7, 2023

Crypt-iQ cross-referenced this on May 22, 2023 from issue bitcoind hangs waiting for `g_requests.empty()` by Crypt-iQ

stickies-v cross-referenced this on Sep 29, 2023 from issue http: bugfix: allow server shutdown in case of remote client disconnection by stickies-v

fanquake referenced this in commit db7b5dfcc5 on Oct 4, 2023

Frank-GER referenced this in commit b0fea09705 on Oct 13, 2023

bitcoin locked this on Mar 6, 2024

http: Track active requests and wait for last to finish - 2nd attempt #26742

Reviews