http: add evhttp_connection_set_closecb to avoid g_requests hang #27909

pull Crypt-iQ wants to merge 1 commit into bitcoin:master from Crypt-iQ:httpserver_fix_wo_fu changing 1 file +14 −2
  1. Crypt-iQ commented at 3:14 pm on June 17, 2023: contributor

    Prior to this patch, if an RPC was interrupted before the reply was received, the completion callback set via evhttp_request_set_on_complete_cb would never be called and g_requests wouldn’t be cleaned up. When attempting to shut down bitcoind, it would then hang waiting for g_requests.empty().

    Fixes #27722

    I am currently testing this against libevent 2.2.1 to ensure that the patch also works for that version.
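
    A condensed view of the change (reconstructed from the hunks quoted in the review comments below; not the exact committed code): a close callback is registered on the request's connection so that g_requests is cleaned up even when the completion callback never fires.

      auto conn{evhttp_request_get_connection(req)};

      // Close callback to clear an active but still-running request. This is
      // also called if the connection is closed after a successful request,
      // but by then the complete callback below has already cleared the state.
      evhttp_connection_set_closecb(conn, [](evhttp_connection* conn, void* arg) {
          auto req{static_cast<evhttp_request*>(arg)};
          WITH_LOCK(g_requests_mutex, g_requests.erase(req));
          g_requests_cv.notify_all();
      }, req);

      // Clear state after a successful request.
      evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
          WITH_LOCK(g_requests_mutex, g_requests.erase(req));
          g_requests_cv.notify_all();
      }, nullptr);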

  2. DrahtBot commented at 3:14 pm on June 17, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Approach ACK stickies-v

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28551 (http: bugfix: allow server shutdown in case of remote client disconnection by stickies-v)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label RPC/REST/ZMQ on Jun 17, 2023
  4. fanquake commented at 9:17 am on June 20, 2023: member
    cc @fjahr
  5. stickies-v commented at 9:32 am on June 20, 2023: contributor

    Concept ACK.

    Was able to reproduce the bug, and this fixes the issue for me (libevent 2.1.12, macOS 13.4).

    Need to read up on the libevent workaround a bit more first.

  6. promag commented at 6:11 pm on June 20, 2023: member
    Why am I not a co-author?
  7. Crypt-iQ commented at 6:16 pm on June 20, 2023: contributor

    Why am I not a co-author?

    Happy to add you of course since you wrote it – I couldn’t find your email via git log to add it and I wasn’t sure how to give credit w/o an email

  8. jonatack commented at 6:17 pm on June 20, 2023: contributor
    I’d suggest using @promag’s commit authored by him (and then maybe/maybe not an additional commit if any substantive new changes).
  9. Crypt-iQ commented at 6:20 pm on June 20, 2023: contributor
    Can you advise on how to do that? I can close this in favor of another and let somebody take it over with proper commit attribution. I opened this PR since it wasn’t addressed for a month
  10. jonatack commented at 6:21 pm on June 20, 2023: contributor

    git log for me has @promag’s email as João Barbosa <joao.paulo.barbosa@gmail.com>

    Would be something like git commit --amend --author="João Barbosa <joao.paulo.barbosa@gmail.com>"

  11. jonatack commented at 6:21 pm on June 20, 2023: contributor
    or git cherry-pick d41736beb
  12. Crypt-iQ force-pushed on Jun 20, 2023
  13. Crypt-iQ force-pushed on Jun 20, 2023
  14. Crypt-iQ commented at 6:26 pm on June 20, 2023: contributor

    git log for me has @promag’s email as João Barbosa joao.paulo.barbosa@gmail.com

    I see, I grepped for promag instead of the name. Anyways, the commit attribution is fixed. Sorry about that @promag

    or git cherry-pick https://github.com/bitcoin/bitcoin/commit/d41736beb0701242e2f7f6ba2f3bb0192126fbad

    I didn’t do this because that commit only fixes it for certain libevent versions

  15. promag commented at 6:51 pm on June 20, 2023: member
    @Crypt-iQ No worries mate, thanks for picking this work!
  16. in src/httpserver.cpp:224 in 1296038463 outdated
    219+        auto conn = evhttp_request_get_connection(req);
    220+
    221+        // The EV_READ enablement is off-set by the workaround for the libevent
    222+        // bug further below. This is why it can only be used with libevent
    223+        // versions that are not affected by the workaround code.
    224+        if (event_get_version_number() < 0x02010600 && event_get_version_number() >= 0x02020001) {
    


    stickies-v commented at 5:53 pm on June 22, 2023:

    This will never get called, I think you mean ||?

              if (event_get_version_number() < 0x02010600 || event_get_version_number() >= 0x02020001) {
    

    stickies-v commented at 5:53 pm on June 22, 2023:
    The libevent bug was actually patched in 2.1.9 already, so I think we should update our version check to reflect that? Although it might warrant a separate PR to do so.

    Crypt-iQ commented at 8:12 pm on June 22, 2023:
    Yes that’s right

    stickies-v commented at 4:28 pm on June 23, 2023:
    Opened a pull to correct this: https://github.com/bitcoin/bitcoin/pull/27949
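
    (libevent packs MAJOR.MINOR.PATCH as 0xMMmmpp00, so 2.1.9 corresponds to 0x02010900. A sketch of what a check reflecting the 2.1.9 fix could look like; the actual change lives in #27949:)

      #include <event2/event.h> // event_get_version_number()

      // Sketch only, under the assumption stated above: if the read bug existed
      // from 2.1.6 and was fixed in 2.1.9, only versions in [2.1.6, 2.1.9) would
      // still need the read-disable workaround.
      bool NeedsReadWorkaround()
      {
          const auto v{event_get_version_number()};
          return v >= 0x02010600 /* 2.1.6 */ && v < 0x02010900 /* 2.1.9 */;
      }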
  17. in src/httpserver.cpp:228 in 1296038463 outdated
    223+        // versions that are not affected by the workaround code.
    224+        if (event_get_version_number() < 0x02010600 && event_get_version_number() >= 0x02020001) {
    225+            // Enable EV_READ to detect remote disconnection meaning that the close
    226+            // callback set below is called.
    227+            bufferevent* bev = evhttp_connection_get_bufferevent(conn);
    228+            bufferevent_enable(bev, EV_READ);
    


    stickies-v commented at 5:54 pm on June 22, 2023:
    Is this safe to do? Unless I misunderstand, this effectively disables the bugfix introduced in https://github.com/libevent/libevent/commit/5ff8eb26371c4dc56f384b2de35bea2d87814779, which means that if we get new data in our inbound buffer this will put the connection in an invalid state. I’m not sure yet how that would get triggered in practice, so I’m just going off what I read so far as I can’t reproduce the race condition reported in #11593.

    Crypt-iQ commented at 8:16 pm on June 22, 2023:
    I wasn’t able to reproduce it using this #11593 (comment), and since you pointed out that the change wasn’t merged into 2.2, I think it’s better to err on the side of caution and not enable reading

    Crypt-iQ commented at 5:57 pm on June 23, 2023:
    I’ve updated the PR and removed this
  18. stickies-v commented at 6:16 pm on June 22, 2023: contributor

    I agree with the approach of calling evhttp_connection_set_closecb regardless of whether EV_READ is enabled or disabled, which is currently the only difference between this PR and #27245.

    With this PR:

    • In the case that EV_READ is enabled, evhttp_connection_set_closecb will be called as soon as the connection is closed, including in case of a remote client disconnection (especially useful during long-running requests as described here), which is the most efficient as we’re not wasting server resources on completing the request that won’t get served anyway.

    • In the case that EV_READ is disabled, evhttp_connection_set_closecb will not be called immediately on remote client disconnection (because we’re not listening for it), but it will still be called when the server is shutting down. That still resolves the issue in #27909 where bitcoind can’t gracefully shut down, but potentially wastes resources on requests whose connection has already been closed.

    However, I’m not sure the entire code block to enable EV_READ is safe, as per my comment. I’ll continue digging into that, but I’m looking forward to hearing other views on this. If it’s not safe, I think we need to leave it out and look for alternative ways of monitoring remote client disconnections. A fix is suggested in https://github.com/azat/libevent/commit/7d2e2a84ce7bab938433851ff4e842e15c005dc4, but I can’t see it in the 2.2.1-alpha release yet. Even with EV_READ disabled, I still think calling evhttp_connection_set_closecb is a very worthwhile improvement.

  19. fjahr commented at 10:34 pm on June 22, 2023: contributor

    I opened this PR since it wasn’t addressed for a month

    Why didn’t you ask me first if I am still working on this? A month isn’t a long time in this project when you’re working on multiple things at the same time…

    Typically people leave a review comment in a situation like this instead of opening a new PR.

  20. Crypt-iQ closed this on Jun 23, 2023

  21. fjahr commented at 8:54 am on June 23, 2023: contributor
    @Crypt-iQ There is no need to close this PR now since it got attention and review. I will close mine if you want to continue. But I wanted to clarify this for the future.
  22. Crypt-iQ commented at 8:59 am on June 23, 2023: contributor
    OK, I reopened. It would be helpful if these things were made explicit (leave a comment) rather than implicit
  23. Crypt-iQ reopened this on Jun 23, 2023

  24. Crypt-iQ force-pushed on Jun 23, 2023
  25. Crypt-iQ commented at 5:56 pm on June 23, 2023: contributor

    @Crypt-iQ There is no need to close this PR now since it got attention and review. I will close mine if you want to continue. But I wanted to clarify this for the future.

    I think you can reopen your PR to include the bufferevent read enable. I’ve updated this PR to include a minimal fix for the issue (without the bufferevent read enable) because I’m still a little unclear on what exactly the bug fix was originally for and I can’t reproduce it. Also happy to close this PR in favor of yours since this is now missing the read enable.

  26. fanquake referenced this in commit a15388c606 on Jun 28, 2023
  27. in src/httpserver.cpp:219 in 710985d695 outdated
    214@@ -215,6 +215,20 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
    215     {
    216         WITH_LOCK(g_requests_mutex, g_requests.insert(req));
    217         g_requests_cv.notify_all();
    218+
    219+        auto conn = evhttp_request_get_connection(req);
    


    stickies-v commented at 12:33 pm on June 28, 2023:

    nit

              auto conn{evhttp_request_get_connection(req)};
    

    Crypt-iQ commented at 0:34 am on July 14, 2023:
    done
  28. in src/httpserver.cpp:225 in 710985d695 outdated
    220+
    221+        // Close callback to clear active but running request. This is also
    222+        // called if the connection is closed after a successful request, but
    223+        // complete callback set below already cleared the state.
    224+        evhttp_connection_set_closecb(conn, [](evhttp_connection* conn, void* arg) {
    225+            auto req = static_cast<evhttp_request*>(arg);
    


    stickies-v commented at 12:33 pm on June 28, 2023:

    nit

                  auto req{static_cast<evhttp_request*>(arg)};
    

    Crypt-iQ commented at 0:34 am on July 14, 2023:
    done
  29. in src/httpserver.cpp:228 in 710985d695 outdated
    223+        // complete callback set below already cleared the state.
    224+        evhttp_connection_set_closecb(conn, [](evhttp_connection* conn, void* arg) {
    225+            auto req = static_cast<evhttp_request*>(arg);
    226+            LOCK(g_requests_mutex);
    227+            auto n{g_requests.erase(req)};
    228+            if (n == 1 && g_requests.empty()) g_requests_cv.notify_all();
    


    stickies-v commented at 12:36 pm on June 28, 2023:

    I’m curious why we make notify_all() dependent on g_requests.empty() here, when in all other places we notify for all added and removed events?

                  if (g_requests.erase(req)) g_requests_cv.notify_all();
    

    Crypt-iQ commented at 0:31 am on July 14, 2023:
    I think because the only call to g_requests_cv.wait(...) has a predicate that waits for g_requests.empty(). I’ve removed the dependency in the latest patch
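
    (For reference, the only wait on g_requests_cv is the shutdown wait in StopHTTPServer(), quoted here from the pre-patch code shown in the larger diff further below:)

      g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
          return g_requests.empty();
      });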
  30. stickies-v commented at 12:39 pm on June 28, 2023: contributor

    Approach ACK 710985d6953eb73b5edddaa7d4d54b58c90d1d34

    I think improving remote client disconnect detection is best left for a separate PR; this is a worthwhile improvement on its own and I think uncontroversial.

  31. sidhujag referenced this in commit 2e3d241436 on Jun 30, 2023
  32. http: add evhttp_connection_set_closecb to avoid g_requests hang
    Prior to this patch, it was possible that if an RPC was interrupted
    before the reply was received, the success callback
    evhttp_request_set_on_complete_cb would never be called and
    g_requests wouldn't be cleaned up. When attempting to shutdown
    bitcoind, it would hang waiting for g_requests.empty().
    
    Co-authored-by: Fabian Jahr <fjahr@protonmail.com>
    d170e6c60c
  33. Crypt-iQ force-pushed on Jul 14, 2023
  34. Crypt-iQ commented at 0:33 am on July 14, 2023: contributor
    I’ve removed the assert(n == 1) in the evhttp_request_set_on_complete_cb callback because of this comment: #27722 (comment). It appears that both the evhttp_connection_set_closecb and evhttp_request_set_on_complete_cb callbacks can run, which causes a crash. I wasn’t able to reproduce it and still want to know what exactly happened, but I think removing the assert here is certainly better than having a crash.
  35. fanquake added this to the milestone 25.1 on Jul 14, 2023
  36. stickies-v commented at 3:56 pm on July 14, 2023: contributor

    I wasn’t able to reproduce it and still want to know what exactly happened, but I think removing the assert here is certainly better than having a crash.

    I’m able to reproduce @dooglus’s observed crash quite quickly (< 1 second usually) by blasting the server with requests from > 1 thread.

    On top of d170e6c60cd9f46da1794448b8be55ba6f0b2922, first re-add the assertion:

      diff --git a/src/httpserver.cpp b/src/httpserver.cpp
      index 849e9b482..5683e761f 100644
      --- a/src/httpserver.cpp
      +++ b/src/httpserver.cpp
      @@ -229,7 +229,8 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
       
               // Clear state after successful request.
               evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
      -            WITH_LOCK(g_requests_mutex, g_requests.erase(req));
      +            auto n{WITH_LOCK(g_requests_mutex, return g_requests.erase(req))};
      +            assert(n == 1);
                   g_requests_cv.notify_all();
               }, nullptr);
           }
    

    I’m running on signet with user:pass credentials:

      ./src/bitcoind -signet -debug=http -rpcuser=user -rpcpassword=pass
    

    And then finally the script to make it crash:

      import http.client
      import base64
      from concurrent.futures import ThreadPoolExecutor

      host = "localhost"
      port = 38332

      auth_string = base64.b64encode('user:pass'.encode()).decode()

      num_threads = 2
      iterations_per_thread = 10000

      def send_requests(i):
          for _ in range(iterations_per_thread):
              conn = http.client.HTTPConnection(host, port)
              conn.request("POST", "/")
              response = conn.getresponse()

              conn.close()

      with ThreadPoolExecutor(max_workers=num_threads) as executor:
          executor.map(send_requests, range(num_threads))
    

    I agree that keeping the assertion in as-is is bad, but I don’t like just removing it without properly understanding why this fails (or, in other words, why/when the evhttp_connection_set_closecb cb is called before the evhttp_request_set_on_complete_cb cb) either, so I think we need to investigate first before proceeding?

  37. Crypt-iQ commented at 0:43 am on July 17, 2023: contributor

    I am a complete libevent noob, but I noticed some odd behavior that makes me think this is a race condition. I used the following diff, which logs both sides’ port numbers, together with the above python script:

      diff --git a/src/httpserver.cpp b/src/httpserver.cpp
      index 849e9b482..8bb9d407f 100644
      --- a/src/httpserver.cpp
      +++ b/src/httpserver.cpp
      @@ -223,13 +223,21 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
               // complete callback set below already cleared the state.
               evhttp_connection_set_closecb(conn, [](evhttp_connection* conn, void* arg) {
                   auto req{static_cast<evhttp_request*>(arg)};
      +            auto sockd = (struct sockaddr_in*)evhttp_connection_get_addr(conn);
      +            auto addr = sockd->sin_port;
      +            LogPrint(BCLog::HTTP, "close callback remote port %d conn port %d\n", req->remote_port, addr);
                   WITH_LOCK(g_requests_mutex, g_requests.erase(req));
                   g_requests_cv.notify_all();
               }, req);
       
               // Clear state after successful request.
               evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
      -            WITH_LOCK(g_requests_mutex, g_requests.erase(req));
      +            auto conn = req->evcon;
      +            auto sockd = (struct sockaddr_in*)evhttp_connection_get_addr(conn);
      +            auto addr = sockd->sin_port;
      +            LogPrint(BCLog::HTTP, "complete callback remote port %d conn port %d\n", req->remote_port, addr);
      +            auto n{WITH_LOCK(g_requests_mutex, return g_requests.erase(req))};
      +            assert(n == 1);
                   g_requests_cv.notify_all();
               }, nullptr);
           }
    

    Most of the time (98-99%), I would see normal log lines where both ports were the same in the complete and close callbacks:

      2023-07-16T22:25:46Z [http] Received a POST request for / from 127.0.0.1:44596
      2023-07-16T22:25:46Z [http] complete callback remote port 44596 conn port 13486
      2023-07-16T22:25:46Z [http] close callback remote port 44596 conn port 13486
    

    However, every time the assert failure was hit, the port numbers were not the same in the complete/close callbacks. The port numbers were also re-used from another ongoing request. In the snippet below, the close callback for the request using remote port 44688 uses the local port 36526, which also matches the local port used in the complete callback for remote port 44686. Also, the complete callback for remote port 44688 has local port 37038 instead of 36526.

      2023-07-16T22:25:46Z [http] close callback remote port 44686 conn port 34990
      2023-07-16T22:25:46Z [http] Received a POST request for / from 127.0.0.1:44686
      2023-07-16T22:25:46Z [http] complete callback remote port 44686 conn port 36526
      2023-07-16T22:25:46Z [http] Received a POST request for / from 127.0.0.1:44688
      2023-07-16T22:25:46Z [http] close callback remote port 44688 conn port 36526
      2023-07-16T22:25:46Z [http] complete callback remote port 44688 conn port 37038
      <crash here>
    
  38. Crypt-iQ commented at 12:04 pm on July 17, 2023: contributor

    I modified the diff above and printed out the evhttp_request memory address as well as the req->output_buffer memory address. I noticed that the evhttp_request address is re-used when there’s a crash, and that the output_buffer address is not consistent between callbacks in the error case.

      diff --git a/src/httpserver.cpp b/src/httpserver.cpp
      index 849e9b482..9fefad6bd 100644
      --- a/src/httpserver.cpp
      +++ b/src/httpserver.cpp
      @@ -211,6 +211,7 @@ std::string RequestMethodString(HTTPRequest::RequestMethod m)
       /** HTTP request callback */
       static void http_request_cb(struct evhttp_request* req, void* arg)
       {
      +    LogPrint(BCLog::HTTP, "Received a request from %d req %p\n", req->remote_port, (void*)req);
           // Track requests and notify when a request is completed.
           {
               WITH_LOCK(g_requests_mutex, g_requests.insert(req));
      @@ -222,14 +223,22 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
               // called if the connection is closed after a successful request, but
               // complete callback set below already cleared the state.
               evhttp_connection_set_closecb(conn, [](evhttp_connection* conn, void* arg) {
                   auto req{static_cast<evhttp_request*>(arg)};
      +            auto sockd = (struct sockaddr_in*)evhttp_connection_get_addr(conn);
      +            auto addr = sockd->sin_port;
      +            LogPrint(BCLog::HTTP, "set_closecb called for remote port %d conn port %d req %p output buffer %p\n", req->remote_port, addr, (void*)req, (void*)req->output_buffer);
                   WITH_LOCK(g_requests_mutex, g_requests.erase(req));
                   g_requests_cv.notify_all();
               }, req);
       
               // Clear state after successful request.
               evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
      -            WITH_LOCK(g_requests_mutex, g_requests.erase(req));
      +            auto conn = req->evcon;
      +            auto sockd = (struct sockaddr_in*)evhttp_connection_get_addr(conn);
      +            auto addr = sockd->sin_port;
      +            LogPrint(BCLog::HTTP, "set_on_complete_cb called for remote port %d conn port %d req %p output buffer %p\n", req->remote_port, addr, (void*)req, (void*)req->output_buffer);
      +            auto n{WITH_LOCK(g_requests_mutex, return g_requests.erase(req))};
      +            assert(n == 1);
                   g_requests_cv.notify_all();
               }, nullptr);
           }
    

    In the following log snippet, the request from remote ports 56815 and 56816 use the same memory address 0x600000224000. When the close callback is called, I would have expected an output_buffer address of 0x6000009301b0 rather than 0x6000009302d0. Note that g_requests is a set of evhttp_request*. So the first complete callback removes the request with address 0x600000224000. Then a new request is received with the same address and is inserted into g_requests. Then the close callback runs (I’m not sure which request this close callback belongs to) and removes the entry with address 0x600000224000. Then the last complete callback runs and the entry has already been deleted.

      2023-07-17T11:50:09Z [http] Received a request from 56815 req 0x600000224000
      2023-07-17T11:50:09Z [http] Received a POST request for / from [::1]:56815
      2023-07-17T11:50:09Z [http] set_on_complete_cb called for remote port 56815 conn port 61405 req 0x600000224000 output buffer 0x6000009301b0
      2023-07-17T11:50:09Z [http] Received a request from 56816 req 0x600000224000
      2023-07-17T11:50:09Z [http] Received a POST request for / from [::1]:56816
      2023-07-17T11:50:09Z [http] set_closecb called for remote port 56816 conn port 61405 req 0x600000224000 output buffer 0x6000009302d0
      2023-07-17T11:50:09Z [http] set_on_complete_cb called for remote port 56816 conn port 61661 req 0x600000224000 output buffer 0x6000009302d0
      Assertion failed: (n == 1), function operator(), file httpserver.cpp, line 241.
      Abort trap: 6
    
  39. in src/httpserver.cpp:228 in d170e6c60c
    223+        // complete callback set below already cleared the state.
    224+        evhttp_connection_set_closecb(conn, [](evhttp_connection* conn, void* arg) {
    225+            auto req{static_cast<evhttp_request*>(arg)};
    226+            WITH_LOCK(g_requests_mutex, g_requests.erase(req));
    227+            g_requests_cv.notify_all();
    228+        }, req);
    


    stickies-v commented at 4:30 pm on July 17, 2023:

    This is the culprit of what you’re observing. req becomes a dangling pointer as soon as a request is completed. In most cases (which is why we typically need more than one iteration), this just means that g_requests.erase(req) removes 0 elements because it points to a non-existing request, but sometimes (and actually quite frequently, as addresses seem to be reused quite aggressively) it will point to an unrelated request which has already been added to g_requests, erasing it unintentionally.

    The underlying issue is that an evhttp_connection and an evhttp_request simply have different lifecycles, with connections living longer than requests. It seems that passing req as an argument to evhttp_connection_set_closecb is a fundamentally unsafe thing to do if we can’t guarantee that req hasn’t been completed by the time closecb is called (which in the happy path, we always expect it to be).
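
    (Editorial aside: a self-contained sketch of the hazard described above, not code from the PR. After delete, the allocator may hand the same address to a new object, so erasing through a stale pointer can remove an unrelated entry:)

      #include <unordered_set>

      struct Request { int id; };

      int main()
      {
          std::unordered_set<Request*> g_requests;
          Request* a{new Request{1}};
          g_requests.insert(a);
          delete a;                    // `a` now dangles but is still in the set
          Request* b{new Request{2}};  // may be allocated at `a`'s old address
          g_requests.insert(b);
          g_requests.erase(a);         // if the address was reused, this silently
                                       // erases the unrelated request `b`
          delete b;
      }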


    Crypt-iQ commented at 4:35 pm on July 19, 2023:
    nice catch!

    vasild commented at 10:40 am on September 19, 2023:

    What about this:

      evhttp_connection_set_closecb(
          evhttp_request_get_connection(req),
          [](evhttp_connection* conn, void* arg) {
              auto req{static_cast<evhttp_request*>(arg)};
              // By the time the connection close callback is executed (this lambda)
              // the request we scheduled to be passed as an argument (arg) may
              // have been destroyed and a new, unrelated, one could have been
              // created at the same address. So arg may be a stale pointer or
              // a valid pointer, pointing to a new request object.
              LOCK(g_requests_mutex);
              if (g_requests.count(req) > 0 /* is not a stale pointer */
                  && evhttp_request_get_connection(req) == conn /* is not pointing to a new request */) {

                  g_requests.erase(req);
                  g_requests_cv.notify_all();
              }
          },
          req);
    

    stickies-v commented at 2:02 pm on September 19, 2023:
                  && evhttp_request_get_connection(req) == conn /* is not pointing to a new request */) {
    

    I’m not sure this will be 100% robust when we have multiple requests in a connection, I think this can still lead to usage of stale pointers? But as we’re closing the connection anyway, it may not be an issue.


    stickies-v commented at 4:38 pm on September 19, 2023:

    I think the most sensible approach is to instead of track requests, keep a per-connection request counter. It’s a pretty straightforward approach with a std::unordered_map<const evhttp_connection*, size_t>, but I’ve added a RequestTracker helper class wrapping around that to make that a bit more robust. What do you think?

    https://github.com/stickies-v/bitcoin/commit/953c6914806bdaa6d087984049dccfe180dffc20
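
    A minimal sketch of the per-connection counter idea (names and details hypothetical here; the commit above and #28551 contain the actual implementation): count outstanding requests per connection, decrement from the completion callback, drop the whole entry from the close callback, and have shutdown wait until the map is empty. Because the map is keyed on the connection rather than the request pointer, a stale evhttp_request* can never erase an unrelated entry.

      #include <condition_variable>
      #include <cstddef>
      #include <mutex>
      #include <unordered_map>

      struct evhttp_connection; // opaque libevent type, for illustration

      class RequestTracker
      {
          std::mutex m_mutex;
          std::condition_variable m_cv;
          //! Outstanding requests per connection.
          std::unordered_map<const evhttp_connection*, std::size_t> m_tracker;

      public:
          //! Request received: bump the connection's counter.
          void AddRequest(const evhttp_connection* conn)
          {
              const std::lock_guard lock{m_mutex};
              ++m_tracker[conn];
          }
          //! Completion callback: one request done.
          void RemoveRequest(const evhttp_connection* conn)
          {
              {
                  const std::lock_guard lock{m_mutex};
                  const auto it{m_tracker.find(conn)};
                  if (it != m_tracker.end() && --(it->second) == 0) m_tracker.erase(it);
              }
              m_cv.notify_all();
          }
          //! Close callback: the connection is gone, drop its entry entirely.
          void RemoveConnection(const evhttp_connection* conn)
          {
              {
                  const std::lock_guard lock{m_mutex};
                  m_tracker.erase(conn);
              }
              m_cv.notify_all();
          }
          //! Shutdown: block until no requests are outstanding.
          void WaitUntilEmpty()
          {
              std::unique_lock lock{m_mutex};
              m_cv.wait(lock, [this] { return m_tracker.empty(); });
          }
      };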


    stickies-v commented at 6:22 pm on September 29, 2023:
    I’ve opened #28551 with this approach to try and get it in before v26.
  40. stickies-v commented at 5:00 pm on July 17, 2023: contributor

    Nice investigative work, think I found the issue as per my comment.

    I wonder if we could track connections instead of requests. A quick implementation (docstrings etc. generally not updated, just a PoC) seems to indicate that it works:

      diff --git a/src/httpserver.cpp b/src/httpserver.cpp
      index 849e9b482..8fe826f55 100644
      --- a/src/httpserver.cpp
      +++ b/src/httpserver.cpp
      @@ -151,9 +151,9 @@ static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_m
       //! Bound listening sockets
       static std::vector<evhttp_bound_socket *> boundSockets;
       //! Track active requests
      -static GlobalMutex g_requests_mutex;
      -static std::condition_variable g_requests_cv;
      -static std::unordered_set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);
      +static GlobalMutex g_http_conns_mutex;
      +static std::condition_variable g_http_conns_cv;
      +static std::unordered_set<evhttp_connection*> g_http_conns GUARDED_BY(g_http_conns_mutex);
       
       /** Check if a network address is allowed to access the HTTP server */
       static bool ClientAllowed(const CNetAddr& netaddr)
      @@ -213,25 +213,16 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
       {
           // Track requests and notify when a request is completed.
           {
      -        WITH_LOCK(g_requests_mutex, g_requests.insert(req));
      -        g_requests_cv.notify_all();
      -
               auto conn{evhttp_request_get_connection(req)};
      +        WITH_LOCK(g_http_conns_mutex, g_http_conns.insert(conn));
      +        g_http_conns_cv.notify_all();
       
      -        // Close callback to clear active but running request. This is also
      -        // called if the connection is closed after a successful request, but
      -        // complete callback set below already cleared the state.
      -        evhttp_connection_set_closecb(conn, [](evhttp_connection* conn, void* arg) {
      -            auto req{static_cast<evhttp_request*>(arg)};
      -            WITH_LOCK(g_requests_mutex, g_requests.erase(req));
      -            g_requests_cv.notify_all();
      -        }, req);
       
      -        // Clear state after successful request.
      -        evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
      -            WITH_LOCK(g_requests_mutex, g_requests.erase(req));
      -            g_requests_cv.notify_all();
      -        }, nullptr);
      +        // Remove connection when it is closed
      +        evhttp_connection_set_closecb(conn, [](evhttp_connection* conn, void* arg) {
      +            auto n{WITH_LOCK(g_http_conns_mutex, return g_http_conns.erase(conn))};
      +            g_http_conns_cv.notify_all();
      +        }, NULL);
           }
       
           // Disable reading to work around a libevent bug, fixed in 2.2.0.
      @@ -486,12 +477,12 @@ void StopHTTPServer()
           }
           boundSockets.clear();
           {
      -        WAIT_LOCK(g_requests_mutex, lock);
      -        if (!g_requests.empty()) {
      -            LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
      +        WAIT_LOCK(g_http_conns_mutex, lock);
      +        if (!g_http_conns.empty()) {
      +            LogPrint(BCLog::HTTP, "Waiting for %d connections to stop HTTP server\n", g_http_conns.size());
               }
      -        g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
      -            return g_requests.empty();
      +        g_http_conns_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_http_conns_mutex) {
      +            return g_http_conns.empty();
               });
           }
           if (eventHTTP) {
    

    This assumes that a connection can never have more than 1 request, and I’m not sure if that’s the case.

  41. Crypt-iQ commented at 4:39 pm on July 19, 2023: contributor

    This assumes that a connection can never have more than 1 request, and I’m not sure if that’s the case.

    I’m also not sure. This PR body leads me to believe that it is possible to have more than 1 request per connection: https://github.com/libevent/libevent/pull/592#issue-292981989; however, I’m not sure whether it’s possible for them to be interleaved and cause problems with your connection-based patch. Something I’ll try to investigate.

  42. Crypt-iQ commented at 2:14 pm on July 21, 2023: contributor

    It is possible for a connection to have more than one request (see https://github.com/libevent/libevent/issues/1383#issuecomment-1383249281, or evhttp_associate_new_request_with_connection in the libevent code). It appears, though, that the requests are sequential, so when one completes the next one can run. I modified the python script to test this, but it doesn’t lead to a deadlock or crash: when the next request comes in, it uses the same evhttp_connection*, so the insert into g_http_conns is a no-op. The next request also updates the callback on the evhttp_connection* via evhttp_connection_set_closecb, so that only one callback is run. When the connection is finally closed, it will find the connection and delete it. I think this means your patch works, but I’ll continue to investigate a little more.

      import http.client
      import base64
      from concurrent.futures import ThreadPoolExecutor
      import time

      host = "localhost"
      port = 18443

      auth_string = base64.b64encode('user:pass'.encode()).decode()

      num_threads = 1
      iterations_per_thread = 1

      def send_requests(i):
          for _ in range(iterations_per_thread):
              conn = http.client.HTTPConnection(host, port)
              conn.request("POST", "/", headers={"Keep-Alive": "timeout=500, max=1000"})

              #response = conn.getresponse()
              #conn.close()

              time.sleep(3)

              conn.request("POST", "/", headers={"Keep-Alive": "timeout=500, max=1000"})

              time.sleep(100)

      with ThreadPoolExecutor(max_workers=num_threads) as executor:
          executor.map(send_requests, range(num_threads))
    
  43. Crypt-iQ commented at 2:03 am on August 4, 2023: contributor
    I haven’t been able to cause a crash with the connection-based code. Given that the connection-based code is different from the crashing code here, do you want to open a new PR with the connection-based code @stickies-v?
  44. fanquake commented at 9:12 am on September 15, 2023: member
    What is the status of this? cc @stickies-v.
  45. DrahtBot added the label CI failed on Sep 15, 2023
  46. vasild commented at 8:12 am on September 19, 2023: contributor
    https://github.com/libevent/libevent/issues/78 is relevant. Need to check how libevent was fixed to detect client close and make use of that in bitcoind.
  47. stickies-v commented at 10:32 am on September 19, 2023: contributor

    Need to check how libevent was fixed to detect client close and make use of that in bitcoind.

    As far as I could find, it’s not yet fixed but it’s also largely orthogonal, see my comment here and this open libevent issue. Detecting a remote disconnect would help save resources by not completing a request we’ll no longer be serving, but isn’t necessary to avoid the hangs on shutdown.

  48. DrahtBot removed the label CI failed on Sep 26, 2023
  49. stickies-v referenced this in commit a414b9c5ad on Sep 29, 2023
  50. stickies-v referenced this in commit d75c8b8c13 on Sep 29, 2023
  51. stickies-v referenced this in commit dc44bbad5a on Sep 29, 2023
  52. stickies-v referenced this in commit da8496571a on Sep 29, 2023
  53. fanquake removed this from the milestone 25.1 on Oct 2, 2023
  54. fanquake commented at 9:37 am on October 2, 2023: member
    Going to mark this as draft for now, given that #28551 is open.
  55. fanquake marked this as a draft on Oct 2, 2023
  56. stickies-v referenced this in commit f01ec7f4d9 on Oct 2, 2023
  57. stickies-v referenced this in commit 8c84bf25f2 on Oct 2, 2023
  58. stickies-v referenced this in commit bd18dd1f53 on Oct 3, 2023
  59. stickies-v referenced this in commit 41f9027813 on Oct 3, 2023
  60. fanquake referenced this in commit db7b5dfcc5 on Oct 4, 2023
  61. fanquake referenced this in commit ae86adabe4 on Oct 4, 2023
  62. stickies-v commented at 9:50 am on October 4, 2023: contributor
    Superseded by #28551, can be closed now.
  63. fanquake commented at 9:51 am on October 4, 2023: member
    Closing for now.
  64. fanquake closed this on Oct 4, 2023

