- Log HTTP request pointers to be able to discover which one is stuck.
- Instead of hanging indefinitely, use timeouts and abort when HTTP connections take unreasonably long to close.
- Ensure we always free libevent HTTP.
Should help debug #31894.
Should help debug #31894.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31929.
See the guideline for information on the review process. A summary of reviews will appear here.
No conflicts as of last run.
0Assertion failed: (evb), function WriteReply, file httpserver.cpp, line 682.
1Error processing input "/Users/runner/work/bitcoin/bitcoin/ci/scratch/qa-assets/fuzz_corpora/http_request/83149a7e3abd937c7bc67fa55b9a4fc9338e3d8c"
2
3⚠️ Failure generated from target with exit code 1: ['/Users/runner/work/bitcoin/bitcoin/ci/scratch/build-aarch64-apple-darwin23.6.0/src/test/fuzz/fuzz', PosixPath('/Users/runner/work/bitcoin/bitcoin/ci/scratch/qa-assets/fuzz_corpora/http_request')]
🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37612857743
Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
564- // that evhttp_free does not need to be called again after the handling
565- // of unfinished request connections that follows.
566- event_base_once(eventBase, -1, EV_TIMEOUT, [](evutil_socket_t, short, void*) {
567+ // Schedule a callback to call evhttp_free in the event base thread, as
568+ // otherwise eventHTTP often keeps internal events alive, meaning
569+ // aforementioned thread would never run out of events and exit.
To verify the improved accuracy of the documentation here one can apply:
0diff --git a/src/httpserver.cpp b/src/httpserver.cpp
1index 52fa8463d2..a3efaa2579 100644
2--- a/src/httpserver.cpp
3+++ b/src/httpserver.cpp
4@@ -557,6 +557,10 @@ void StopHTTPServer()
5 // Schedule a callback to call evhttp_free in the event base thread, as
6 // otherwise eventHTTP often keeps internal events alive, meaning
7 // aforementioned thread would never run out of events and exit.
8+ while (true) {
9+ event_base_dump_events(eventBase, stderr);
10+ std::this_thread::sleep_for(5s);
11+ }
12 if (event_base_once(eventBase, -1, EV_TIMEOUT, [](evutil_socket_t, short, void*) {
13 evhttp_free(eventHTTP);
14 eventHTTP = nullptr;
Run bitcoind, and then send the stop-RPC from bitcoin-cli. It gave me the following log:
02025-02-22T15:18:19Z [http] Stopping HTTP server
12025-02-22T15:18:19Z [http] Waiting for HTTP worker threads to exit
2Inserted events:
32025-02-22T15:18:19Z [http] Exited http event loop
4 0x555556947a50 [fd 11] Read Persist Internal
5Active events:
62025-02-22T15:18:19Z net thread exit
72025-02-22T15:18:20Z msghand thread exit
8Inserted events:
9 0x555556947a50 [fd 11] Read Persist Internal
10Active events:
11Inserted events:
12 0x555556947a50 [fd 11] Read Persist Internal
13Active events:
14Inserted events:
15 0x555556947a50 [fd 11] Read Persist Internal
16Active events:
17<repeated until killing the process>
Edit: The documentation change has since been reverted to simplify the PR.
From what i remember this was extrememly hard to get right with libevent (or at least our use of it), without leaking anything or running into race conditions. There have been a few tries to fix the behavior over the years, but they all were different compromises, adding ever more compexity, and often introducing new problems.
As it seems to be where things are heading anyway (#32061) maybe it’s better to focus on replacing libevent and its webserver framework wholesale.
Thank you for providing your perspective @laanwj!
In working on this PR, I’ve also gained an appreciation for getting rid of libevent. However, I’m not sure when that will happen, maybe not until post-30.0. In the meantime we are seeing issues on CI such as #31894 (analysis).
What this PR is doing is adding our own request ids to track which HTTP request never finishes. When we have stuck HTTP requests, it also makes us abort the process with a clear error after 10min30s instead of waiting for the test framework to time out after 40min. On top of that it also fixes an intermittent leak, right after we’ve joined on the thread that is the only one to sometimes perform deletion.
Have come across #26742 and #19420 while working on this, so have seen some of the struggle.
Don’t have an overall grasp on how frequent issues like 31894 are on CI (or in the wild), it’s fair that that frequency should influence the priority of this compared to other work.
Could aid debugging when an HTTP request gets stuck at shutdown.
Could happen when libevent HTTP connection doesn't complete for some reason.
Output early warning to give clue about what's up.
Prior check+log message before even starting to wait was a bit too eager to hint at something being wrong.
There is a race between ThreadHTTP exiting and our queuing of the freeing-event (event_base_once).
Additional free was useful in a majority of runs - 709/962.
Memory leaks during shutdown are not a big issue (unless growing unbounded) as the OS will clean up after the process, but still nice to tidy up.
Can be reproduced by on this commit by running something like:
```shell
#!/usr/bin/env bash
mv ~/.bitcoin/debug.log ~/.bitcoin/debug.log.old
# Allows overriding total through `$ ./repeat.sh <TOTAL>`
TOTAL=${1:-100}
for ((i=1; i<=$TOTAL; i++)); do
echo "Repetition $i/$TOTAL"
./build/bin/bitcoind -daemonwait -noconnect -debug=libevent -debug=http -debug=rpc
./build/bin/bitcoin-cli stop
if [ $? -ne 0 ]; then
echo "Exit code $?"
exit
fi
while [ -f "$HOME/.bitcoin/bitcoind.pid" ]; do
echo "PID file still exists"
sleep 0.1
done
done
echo "Leak averted in $(grep "Freeing eventHTTP-event" ~/.bitcoin/debug.log -c)/$(grep "Bitcoin Core version" ~/.bitcoin/debug.log -c) cases."
```
Spent some time revisiting and polishing this since 24558c2cf18210f46d6e2fadf0c5c5912f4b8e10.
HTTPRequestTracker
s CountActiveConnections()
and WaitUntilEmpty()
.
hodlinator
DrahtBot
fanquake
laanwj
Labels
RPC/REST/ZMQ