http: fix submission during shutdown race #34577

pull furszy wants to merge 2 commits into bitcoin:master from furszy:2026_http_submission_error changing 4 files +73 −38
  1. furszy commented at 7:52 pm on February 12, 2026: member

    Fixes #34573.

    As mentioned in #34573 (comment), the ThreadPool PR (#33689) revealed an existing issue.

    Before that PR, we were returning an incorrect error “Request rejected because http work queue depth exceeded” during shutdown for unhandled requests (we were not differentiating between “queue depth exceeded” and “server interrupted” errors). Now, with the ThreadPool inclusion, we return the proper error but we don’t handle it properly.

    This PR improves exactly that. Handling the missing error and properly returning it to the user.

    The race can be reproduced as follows:

    1. The server receives an http request.
    2. Processing of the request is delayed, and shutdown is triggered in the meantime.
    3. During shutdown, the libevent callback is unregistered and the threadpool interrupted.
    4. The delayed request (step 2) resumes and tries to submit a task to the now-interrupted server.

    Reproduction test can be found #34577 (comment).

    Also, to prevent this kind of issue from happening again, this PR changes task submission to return the error as part of the function’s return value using util::Expected instead of throwing the exception. Unlike exceptions, which require extra try-catch blocks and can be ignored, returning Expected forces callers to explicitly handle failures, and attributes like [[nodiscard]] allow us catch unhandled ones at compile time.

  2. DrahtBot added the label RPC/REST/ZMQ on Feb 12, 2026
  3. DrahtBot commented at 7:52 pm on February 12, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sedited, pinheadmz, andrewtoth, achow101, hodlinator
    Concept ACK ismaelsadeeq
    Stale ACK Eunovo

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34562 (ThreadPool follow-ups, proactive shutdown and HasReason dependency cleanup by furszy)
    • #32061 (Replace libevent with our own HTTP and socket-handling implementation by pinheadmz)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. furszy force-pushed on Feb 12, 2026
  5. DrahtBot added the label CI failed on Feb 12, 2026
  6. DrahtBot commented at 8:03 pm on February 12, 2026: contributor

    🚧 At least one of the CI tasks failed. Task No wallet: https://github.com/bitcoin/bitcoin/actions/runs/21961944156/job/63441463470 LLM reason (✨ experimental): Compilation failed in fuzz/threadpool.cpp: Submit returns util::Expected<future, string> which cannot be assigned to std::future, causing build errors.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. in src/test/threadpool_tests.cpp:49 in 0b2d12f83c
    43@@ -44,6 +44,13 @@ BOOST_FIXTURE_TEST_SUITE(threadpool_tests, ThreadPoolFixture)
    44         }                                                                         \
    45     } while (0)
    46 
    47+// Helper to unwrap a valid pool submission
    48+template <typename F> [[nodiscard]]
    49+auto Submit(ThreadPool& pool, F&& fn)
    


    hodlinator commented at 8:09 pm on February 12, 2026:

    Nice helper! Was actually experimenting with changing ThreadPool::Submit() to return util::Expected yesterday (suffering from lack of knowledge of std::invoke_result_t<F> though). It was so cumbersome to adapt these test without realizing a helper would be useful. :)

    nit: The going style seems to be [[nodiscard]] on the same line as the return type:

    0template <typename F>
    1[[nodiscard]] auto Submit(ThreadPool& pool, F&& fn)
    

    Examples from master: https://github.com/bitcoin/bitcoin/blob/07b924775e4ff5dbfb9f2e7549fd13db8ec5c61b/src/util/overflow.h#L14-L15 https://github.com/bitcoin/bitcoin/blob/07b924775e4ff5dbfb9f2e7549fd13db8ec5c61b/src/test/fuzz/util.h#L84-L85

    Same style nit goes for ThreadPool::Submit().


    furszy commented at 11:01 pm on February 12, 2026:
    Done as suggested. [[nodiscard]] inlined. Also, credits to ryanofsky for the helper function idea <3. I was about to use a macro.
  8. in src/util/threadpool.h:169 in 5326821cbf
    169             }
    170             m_work_queue.emplace(std::move(task));
    171         }
    172         m_cv.notify_one();
    173-        return future;
    174+        return util::Expected<decltype(future), std::string>(std::move(future));
    


    hodlinator commented at 8:19 pm on February 12, 2026:

    nit: Can replace with:

    0        return {std::move(future)};
    

    furszy commented at 11:02 pm on February 12, 2026:
    Cool. Done as suggested. Thanks.
  9. in src/util/threadpool.h:164 in 5326821cbf
    163         auto future{task.get_future()};
    164         {
    165             LOCK(m_mutex);
    166             if (m_interrupt || m_workers.empty()) {
    167-                throw std::runtime_error("No active workers; cannot accept new tasks");
    168+                return util::Unexpected<std::string>("No active workers; cannot accept new tasks");
    


    hodlinator commented at 8:20 pm on February 12, 2026:

    nit: No need to specify Unexpected template arg:

    0                return util::Unexpected{"No active workers; cannot accept new tasks"};
    

    furszy commented at 11:02 pm on February 12, 2026:
    Cool. Done as suggested. Thanks.
  10. in src/httpserver.cpp:284 in 5326821cbf
    280+        if (auto res = g_threadpool_http.Submit(std::move(item)); !res.has_value()) {
    281+            LogWarning("Request rejected during server shutdown");
    282+            hreq->WriteReply(HTTP_SERVICE_UNAVAILABLE, "Request rejected during server shutdown");
    283+            return;
    284+        }
    285+
    


    hodlinator commented at 8:26 pm on February 12, 2026:

    meganit: Superfluous newline after if-block before end of outer block.


    furszy commented at 11:02 pm on February 12, 2026:
    Done.
  11. andrewtoth commented at 8:31 pm on February 12, 2026: contributor

    The race can be reproduced as follows:

    Can this be made into a functional regression test?

  12. in src/httpserver.cpp:214 in 5326821cbf outdated
    210@@ -211,7 +211,7 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
    211             }
    212         }
    213     }
    214-    auto hreq{std::make_unique<HTTPRequest>(req, *static_cast<const util::SignalInterrupt*>(arg))};
    215+    auto hreq{std::make_shared<HTTPRequest>(req, *static_cast<const util::SignalInterrupt*>(arg))};
    


    hodlinator commented at 8:32 pm on February 12, 2026:
    remark: Was experimenting with having ThreadPool::Submit() take a “task_creator” lambda that only gets called when no error condition in Submit occurs, it would then return the inner lambda which would take ownership of the unique_ptr as on master. Even though I have a distaste for shared_ptr, your version is more elegant.
  13. in src/test/fuzz/threadpool.cpp:1 in 241367d021 outdated


    hodlinator commented at 8:51 pm on February 12, 2026:

    Commit message suggestion for 241367d021d210f25e427e7bb35c676b293ff28a:

    0threadpool: make Submit return Expected instead of throwing
    1
    2Unlike exceptions, which can be ignored as they require extra try-catch
    3blocks, returning expected errors forces callers to always handle
    4submission failures.
    5
    6+ Not throwing an exception also happens to fix unclean shutdown bug
    7+ [#34573](/bitcoin-bitcoin/34573/) since we no longer throw when attempting to Submit() from http_request_cb().
    

    hodlinator commented at 8:54 pm on February 12, 2026:

    Commit message suggestion for 5326821cbffbef388421af082d43c7bc6bcc48f7:

    Given that I think the unclean shutdown is already fixed in the first commit by using util::Expected instead of throwing exceptions, I think this is somewhat more correct for the second:

     0http: properly respond to HTTP request during shutdown
     1
     2Makes sure we respond to the client as the HTTP request attempts to submit a task to
     3the threadpool during server shutdown.
     4
     5Roughly what happens:
     6
     71) The server receives an HTTP request and starts calling http_request_cb().
     82) Meanwhile on another thread, shutdown is triggered which calls InterruptHTTPServer()
     9   and unregisters http_request_cb() and interrupts the thread pool.
    103) The request (step 1) resumes and tries to submit a task
    11   to the now-interrupted server.
    12
    13This fix detects failed submissions immediately, and the server
    14responds with HTTP_SERVICE_UNAVAILABLE.
    

    I don’t think it’s necessary to classify this as a race condition. I guess it depends on how you define it.


    furszy commented at 11:02 pm on February 12, 2026:
    Sure. Done as suggested. Thanks.

    furszy commented at 11:06 pm on February 12, 2026:

    Taken, thanks!

    I don’t think it’s necessary to classify this as a race condition. I guess it depends on how you define it.

    There is a thread (shutdown - main thread) modifying a resource (thread pool) just before another thread (libevent http handler) accesses it. It is pretty much a race condition to me.


    hodlinator commented at 8:34 pm on February 13, 2026:

    There is a thread (shutdown - main thread) modifying a resource (thread pool) just before another thread (libevent http handler) accesses it. It is pretty much a race condition to me.

    Fair.


    hodlinator commented at 11:04 am on February 17, 2026:

    IMO stopping the ThreadPool before we stop the libevent dispatch loop is the wrong order. Would ideally do something like the following patch:

     0http: Correct shutdown order of libevent vs ThreadPool
     1
     2g_threadpool_http is initialized before the libevent dispatch loop in g_thread_http,
     3so it should be deinitialized in the reverse order. Exit the event loop first so
     4that we stop trying to Submit() new tasks to the pool before stopping it. (We will
     5still interrupt the pool earlier on, so submission-attempts can still fail).
     6
     7Also:
     8* Delete the accept sockets before exiting the loop.
     9* Free eventHTTP from our own thread as events are no longer processed (this also fixes a frequent memory leak where eventHTTP would not be freed).
    10
    11diff --git a/src/httpserver.cpp b/src/httpserver.cpp
    12index 578c4ec693..ec821dd0b0 100644
    13--- a/src/httpserver.cpp
    14+++ b/src/httpserver.cpp
    15@@ -457,15 +457,20 @@ void StopHTTPServer()
    16 {
    17     LogDebug(BCLog::HTTP, "Stopping HTTP server\n");
    18 
    19-    LogDebug(BCLog::HTTP, "Waiting for HTTP worker threads to exit\n");
    20-    g_threadpool_http.Stop();
    21-
    22-    // Unlisten sockets, these are what make the event loop running, which means
    23-    // that after this and all connections are closed the event loop will quit.
    24+    // Unlisten sockets
    25     for (evhttp_bound_socket *socket : boundSockets) {
    26         evhttp_del_accept_socket(eventHTTP, socket);
    27     }
    28     boundSockets.clear();
    29+
    30+    if (eventBase) {
    31+        LogDebug(BCLog::HTTP, "Finishing processing of event loop callbacks\n");
    32+        event_base_loopexit(eventBase, nullptr);
    33+    }
    34+
    35+    LogDebug(BCLog::HTTP, "Waiting for HTTP worker threads to exit\n");
    36+    g_threadpool_http.Stop();
    37+
    38     {
    39         if (const auto n_connections{g_requests.CountActiveConnections()}; n_connections != 0) {
    40             LogDebug(BCLog::HTTP, "Waiting for %d connections to stop HTTP server\n", n_connections);
    41@@ -473,13 +478,8 @@ void StopHTTPServer()
    42         g_requests.WaitUntilEmpty();
    43     }
    44     if (eventHTTP) {
    45-        // Schedule a callback to call evhttp_free in the event base thread, so
    46-        // that evhttp_free does not need to be called again after the handling
    47-        // of unfinished request connections that follows.
    48-        event_base_once(eventBase, -1, EV_TIMEOUT, [](evutil_socket_t, short, void*) {
    49-            evhttp_free(eventHTTP);
    50-            eventHTTP = nullptr;
    51-        }, nullptr, nullptr);
    52+        evhttp_free(eventHTTP);
    53+        eventHTTP = nullptr;
    54     }
    55     if (eventBase) {
    56         LogDebug(BCLog::HTTP, "Waiting for HTTP event thread to exit\n");
    

    Doing the above allows for merging the ThreadPool’s constructor with Start() and merging the destructor with Stop().

    • This follows from not allowing calling Start() twice, which is implicit in a constructor. The new assert in Start() becomes unnecessary.
    • Having Stop() logic only happen during destruction also implies that only a single thread should have access to the ThreadPool instance at that time. This would prevent the potential race condition I found here: #34562 (review)
      0util: Merge ThreadPool ctor with Start(), and ThreadPool dtor with Stop().
      1
      2HTTP now wraps the pool in std::optional.
      3
      4diff --git a/src/httpserver.cpp b/src/httpserver.cpp
      5index ec821dd0b0..8ca5b7771e 100644
      6--- a/src/httpserver.cpp
      7+++ b/src/httpserver.cpp
      8@@ -75,7 +75,7 @@ static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_m
      9 //! Bound listening sockets
     10 static std::vector<evhttp_bound_socket *> boundSockets;
     11 //! Http thread pool - future: encapsulate in HttpContext
     12-static ThreadPool g_threadpool_http("http");
     13+static std::optional<ThreadPool> g_threadpool_http;
     14 static int g_max_queue_depth{100};
     15 
     16 /**
     17@@ -252,7 +252,7 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
     18 
     19     // Dispatch to worker thread
     20     if (i != iend) {
     21-        if (static_cast<int>(g_threadpool_http.WorkQueueSize()) >= g_max_queue_depth) {
     22+        if (static_cast<int>(g_threadpool_http->WorkQueueSize()) >= g_max_queue_depth) {
     23             LogWarning("Request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting");
     24             hreq->WriteReply(HTTP_SERVICE_UNAVAILABLE, "Work queue depth exceeded");
     25             return;
     26@@ -276,7 +276,7 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
     27             req->WriteReply(HTTP_INTERNAL_SERVER_ERROR, err_msg);
     28         };
     29 
     30-        if (auto res = g_threadpool_http.Submit(std::move(item)); !res.has_value()) {
     31+        if (auto res = g_threadpool_http->Submit(std::move(item)); !res.has_value()) {
     32             // Both SubmitError::Inactive and SubmitError::Interrupted mean shutdown
     33             LogWarning("HTTP request rejected during server shutdown: '%s'", SubmitErrorString(res.error()));
     34             hreq->WriteReply(HTTP_SERVICE_UNAVAILABLE, "Request rejected during server shutdown");
     35@@ -438,7 +438,7 @@ void StartHTTPServer()
     36 {
     37     int rpcThreads = std::max((long)gArgs.GetIntArg("-rpcthreads", DEFAULT_HTTP_THREADS), 1L);
     38     LogInfo("Starting HTTP server with %d worker threads\n", rpcThreads);
     39-    g_threadpool_http.Start(rpcThreads);
     40+    g_threadpool_http.emplace("http", rpcThreads);
     41     g_thread_http = std::thread(ThreadHTTP, eventBase);
     42 }
     43 
     44@@ -450,7 +450,7 @@ void InterruptHTTPServer()
     45         evhttp_set_gencb(eventHTTP, http_reject_request_cb, nullptr);
     46     }
     47     // Interrupt pool after disabling requests
     48-    g_threadpool_http.Interrupt();
     49+    g_threadpool_http->Interrupt();
     50 }
     51 
     52 void StopHTTPServer()
     53@@ -469,7 +469,7 @@ void StopHTTPServer()
     54     }
     55 
     56     LogDebug(BCLog::HTTP, "Waiting for HTTP worker threads to exit\n");
     57-    g_threadpool_http.Stop();
     58+    g_threadpool_http.reset();
     59 
     60     {
     61         if (const auto n_connections{g_requests.CountActiveConnections()}; n_connections != 0) {
     62diff --git a/src/util/threadpool.h b/src/util/threadpool.h
     63index e1f2d86ed0..7a3d77e453 100644
     64--- a/src/util/threadpool.h
     65+++ b/src/util/threadpool.h
     66@@ -87,13 +87,6 @@ private:
     67     }
     68 
     69 public:
     70-    explicit ThreadPool(const std::string& name) : m_name(name) {}
     71-
     72-    ~ThreadPool()
     73-    {
     74-        Stop(); // In case it hasn't been stopped.
     75-    }
     76-
     77     /**
     78      * [@brief](/bitcoin-bitcoin/contributor/brief/) Start worker threads.
     79      *
     80@@ -104,12 +97,11 @@ public:
     81      * initialization and idle shutdown patterns, callers must provide their
     82      * own synchronization.
     83      */
     84-    void Start(const int num_workers) noexcept EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
     85+    explicit ThreadPool(const std::string& name, const int num_workers) noexcept EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
     86+        : m_name(name)
     87     {
     88         assert(num_workers > 0);
     89         LOCK(m_mutex);
     90-        assert(m_workers.empty());
     91-        m_interrupt = false; // Reset
     92 
     93         // Create workers
     94         m_workers.reserve(num_workers);
     95@@ -126,7 +118,7 @@ public:
     96      *
     97      * Must be called from a controller (non-worker) thread.
     98      */
     99-    void Stop() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    100+    ~ThreadPool() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    101     {
    102         // Notify workers and join them
    103         std::vector<std::thread> threads_to_join;
    

    Curious to hear what others think. This is not blocking for me within this PR. I haven’t thoroughly tested the libevent change beyond running functional tests and I can understand not wanting to touch libevent behavior. Maybe the change to ThreadPool should wait until #32061 lands.


    furszy commented at 3:54 pm on February 17, 2026:

    IMO stopping the ThreadPool before we stop the libevent dispatch loop is the wrong order

    Just to be precise with this, the ordering is wrong but it is not really an issue, because (1) InterruptHTTPServer replaces the handler for one that rejects all incoming requests. And (2) even if we would be handling requests, the pool would just reject the work and we would return a clear error message now.

    I agree with the first patch, but I think we should limit ourselves to refactoring libevent code while trying to avoid behavior changes at this point (unless we see a very clear benefit behind them). Modifications there require additional research time for an unmaintained library.

    Moving the pool shutdown to the end of the function should give us the same behavior without touching the libevent code.

    Doing the above allows for merging the ThreadPool’s constructor with Start() and merging the destructor with Stop().

    RAII the pool was brought up a few times before; I implemented it too. It’s not a bad idea, but I’ve been trying not to introduce it too early and first gather more use cases. Mainly because it would make lazy startup and idle shutdown patterns harder to implement. It would also require pool configurations to be stored in a separate structure and dup them when the pool gets constructed.


    hodlinator commented at 6:12 pm on February 18, 2026:

    Mainly because it would make lazy startup and idle shutdown patterns harder to implement.

    Yup, it took me a little while to figure out I needed to use optional::emplace() to in-place construct the pool in my suggested patch. Probably first time I used that in code I wrote.

    It would also require pool configurations to be stored in a separate structure and dup them when the pool gets constructed.

    Didn’t run into such issues in this instance FWIW.


    Interesting that more RAII style was brought up earlier, it does feel like a natural fit if we didn’t have libevent to navigate. Anyway, I’m happy we have this primitive now, sorry I didn’t have bandwidth to review it earlier.


    furszy commented at 6:51 pm on February 18, 2026:

    Interesting that more RAII style was brought up earlier, it does feel like a natural fit if we didn’t have libevent to navigate

    we have some enthusiastic RAII fans around here.

  14. furszy commented at 8:53 pm on February 12, 2026: member

    Can this be made into a functional regression test?

    The “processing request is delayed” step means manually placing a sleep() inside the libevent callback http_request_cb() to progress on the server shutdown concurrently. Then we need to sleep() during shutdown, just after InterruptHTTPServer(), to resume the request processing before shutdown occurs. So, the short answer is no. Sadly races aren’t easy to replicate deterministically in functional tests. We will surely be able to exercise more scenarios like this one, in unit tests, after the new http server inclusion (libevent removal path), which will bring better encapsulation.

    That being said, could provide a sleep()-based patch to replicate it locally.

  15. hodlinator commented at 9:01 pm on February 12, 2026: contributor
    Concept ACK 5326821cbffbef388421af082d43c7bc6bcc48f7
  16. DrahtBot removed the label CI failed on Feb 12, 2026
  17. furszy force-pushed on Feb 12, 2026
  18. furszy commented at 11:06 pm on February 12, 2026: member
    Updated per feedback, thanks @hodlinator!
  19. fanquake added this to the milestone 31.0 on Feb 13, 2026
  20. in src/util/threadpool.h:157 in 668eaa4ab4
    152@@ -151,20 +153,20 @@ class ThreadPool
    153      * Note: Ignoring the returned future requires guarding the task against
    154      * uncaught exceptions, as they would otherwise be silently discarded.
    155      */
    156-    template <class F> [[nodiscard]] EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    157-    auto Submit(F&& fn)
    158+    template <class F>
    159+    [[nodiscard]] util::Expected<std::future<std::invoke_result_t<F>>, std::string> Submit(F&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    


    andrewtoth commented at 2:25 pm on February 13, 2026:
    Can this be noexcept now?

    furszy commented at 3:38 pm on February 13, 2026:
    Sure, done as suggested.
  21. furszy force-pushed on Feb 13, 2026
  22. in src/httpserver.cpp:283 in b95e403595 outdated
    275@@ -276,7 +276,11 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
    276             req->WriteReply(HTTP_INTERNAL_SERVER_ERROR, err_msg);
    277         };
    278 
    279-        [[maybe_unused]] auto _{g_threadpool_http.Submit(std::move(item))};
    280+        if (auto res = g_threadpool_http.Submit(std::move(item)); !res.has_value()) {
    281+            LogWarning("Request rejected during server shutdown");
    282+            hreq->WriteReply(HTTP_SERVICE_UNAVAILABLE, "Request rejected during server shutdown");
    283+            return;
    284+        }
    


    hodlinator commented at 8:55 pm on February 13, 2026:
    1. Log the inner error string (currently it’s never read).
    2. Even though we know it in the context of this PR/issue - I think it’s a bit overly presumptuous to claim we are shutting down. All this outer logic knows is that Submit() failed. Would go with more neutral language.
    0        if (auto res = g_threadpool_http.Submit(std::move(item)); !res.has_value()) {
    1            LogWarning("Request rejected by thread pool: %s", res.error());
    2            hreq->WriteReply(HTTP_SERVICE_UNAVAILABLE, "Request rejected by thread pool");
    3            return;
    4        }
    

    (I was experimenting with changing ThreadPool::Submit() to enforce the queue depth by itself, which would be another potential reason for it to fail).


    furszy commented at 9:14 pm on February 13, 2026:

    The LogWarning change makes sense, but the WriteReply one isn’t really appropriate. Have in mind that WriteReply is an user-facing response, and referring to the “thread pool” there leaks internal details that do not help the user understand what happened nor what to do next.

    (I was experimenting with changing ThreadPool::Submit() to enforce the queue depth by itself, which would be another potential reason for it to fail).

    I briefly thought about this and ended up discarding the idea because there is no other known use-case that requires it for now. Capping the pool is required when the pool faces user’s requests, and all other applications we have for it are purely internal (parallel index sync, block inputs fetching, wallet rescan, kernel validation signals, etc). Ideally, we should only generalize features into the thread pool class only when we plan to use them in more than one place.


    furszy commented at 9:28 pm on February 13, 2026:
    We currently don’t have a simple way to detect shutdown from within the libevent request callback to be able to differentiate between states, and we shouldn’t pollute code just because of this. Submission can only fail because of a shutdown atm. We will be able to access more information from within this function after the libevent removal path, as the new server provides better encapsulation.

    hodlinator commented at 9:37 pm on February 13, 2026:

    Yeah, I can agree on not being to specific with the internals in user-facing messages and omitting thread pool.

    0        if (auto res = g_threadpool_http.Submit(std::move(item)); !res.has_value()) {
    1            LogWarning("Request rejected by thread pool: %s", res.error());
    2            hreq->WriteReply(HTTP_SERVICE_UNAVAILABLE, "Request rejected, probably due to server shutdown");
    3            return;
    4        }
    

    If you want to keep the current very assertive wording “Request rejected during server shutdown” I would suggest returning a one-value error enum instead of the std::string in util::Expected and having:

    0        if (auto res = g_threadpool_http.Submit(std::move(item)); !res.has_value()) {
    1            switch (res.error()) {
    2            case ThreadPool::SubmitError::Interrupted:
    3                LogWarning("Request rejected during server shutdown");
    4                hreq->WriteReply(HTTP_SERVICE_UNAVAILABLE, "Request rejected during server shutdown");
    5                return;
    6            } // no default case, so the compiler can warn about missing cases
    7        }
    

    hodlinator commented at 9:44 pm on February 13, 2026:

    I briefly thought about this and ended up discarding the idea because there is no other known use-case that requires it for now. Capping the pool is required when the pool faces user’s requests, and all other applications we have for it are purely internal (parallel index sync, block inputs fetching, wallet rescan, kernel validation signals, etc). Ideally, we should only generalize features into the thread pool class only when we plan to use them in more than one place.

    I was worried about multiple threads calling Submit() and doing the queue depth check on the outside in a racy way where they both pass and then both Submit() and exceed the depth by one. But libevent probably only calls http_request_cb() from one thread so we should be fine right now. Thanks for laying out your reasoning, makes sense.


    furszy commented at 9:51 pm on February 13, 2026:

    If you want to keep the current very assertive wording “Request rejected during server shutdown” I would suggest returning a one-value error enum instead of the std::string in util::Expected

    I’m not convinced we should add an error code enum when we’re not expecting to expand it anytime soon. It would force everyone using the pool to add a bunch of boilerplate code for no real gain. We could differentiate states now, without waiting for the post-libevent encapsulation, by checking whether the pool was interrupted too. Still, it feels like overthinking this for no real reason. I would focus on fixing the issue here and keep the code simple for now. We can always adapt/modify it when more pool use-cases get merged.


    hodlinator commented at 10:03 pm on February 13, 2026:
    True, I am probably somewhat biased by my earlier experiments. Maybe it would be cleaner for Submit() to return std::optional<std::future<std::invoke_result_t<F>>> instead of util::Expected<std::future<std::invoke_result_t<F>>, std::string> after all - as long as there is only one single cause of failure?

    furszy commented at 11:14 pm on February 13, 2026:

    Maybe it would be cleaner for Submit() to return std::optional<std::future<std::invoke_result_t<F>>> instead of util::Expected<std::future<std::invoke_result_t<F>>, std::string> after all - as long as there is only one single cause of failure?

    Hmm, good observation. Thinking further.. there are currently two possible errors during submission that are currently under the same error message: 1) a non-started pool and 2) an interrupted pool. The first one is actually a logical error rather than one during runtime, and we should probably assert/assume that cannot happen. Will think more about this.


    andrewtoth commented at 3:11 pm on February 15, 2026:
    I think Expected is the appropriate return value here. It is an unexpected failure to submit a task, rather than the result being “not found” or “not existing” as a nullopt would imply.

    hodlinator commented at 8:59 am on February 16, 2026:

    optional doesn’t strictly mean “found/not found”, IMO it’s closer to “may be set or not”. Submit() is not a query type function, we are submitting to a thread pool and should be getting a std::future back.

    The user-code and messages here immediately draw the conclusion that failure to submit means we are shutting down, so it is not that unexpected. Having a string as the error approaches stringly typed coding which should also be avoided (potentially returning different strings for different error conditions makes user code worse).

    As long as we only have one combined error condition (not started or interrupted), I think optional makes the most sense. If we separate the two I think util::Unexpected{enum SubmitError { NotStarted, Interrupted }} makes more sense.

    PS. I’m sorry if this feels nit-picky, I would probably let it slide if it was more of a leaf-type component.


    furszy commented at 3:36 pm on February 16, 2026:

    I went one step back and asked myself why added the submission exception in the first place (3.5 years ago). Should have remembered the reasoning sooner and not gotten obnubilated by our new shiny util::Expected toy.. sorry.

    Thread pools are a well-studied structure, and most languages and libraries handle submission errors in the same way:

    • Java’s ExecutorService throws RejectedExecutionException after shutdown().
    • Python’s ThreadPoolExecutor raises a RuntimeError after shutdown().
    • Boost.Asio treats submission after stop or shutdown as misuse.

    So, the principle is the same everywhere: a thread pool is a resource with a lifecycle. Once you cross the shutdown boundary, submitting work isn’t normal control flow anymore, and it should be treated differently.

    For us, that means:

    • “Pool not started”: that’s a bug in the calling code. We can either Assume it or throw a logic_error. Callers can’t recover from this anyway.
    • “Pool interrupted”: is a hard lifecycle boundary. Submitting after Interrupt() should throw a dedicated interrupted_exception. Not everyone has to worry about this.
    • Otherwise: strong guarantee. If you get a future back, your task will run.

    Will re-accommodate the code to follow this principles again. I’m inclined towards being consistent with existing libraries and not reinvent the wheel.

    Let me know if you guys have any reservation about this.


    hodlinator commented at 3:55 pm on February 16, 2026:
    I think the way Java and Python throw exceptions is dated. It’s what got us into the situation with this PR in the first place, returning an error by value is a safer pattern IMO. That’s why Rust opted to not support exceptions.

    furszy commented at 4:18 pm on February 16, 2026:

    I think the way Java and Python throw exceptions is dated. It’s what got us into the situation with this PR in the first place, returning an error by value is a safer pattern IMO. That’s why Rust opted to not support exceptions.

    It seems low-level Rust panics/aborts when submitting a task to an interrupted pool. What they have is a try_spawn wrapper (TrySubmit for us) that behaves in the way you describe. See rusty_pool or tokio-threadpool.

    It’s what got us into the situation with this PR in the first place

    Well, realistically speaking, the race was already there. We were returning the wrong error message before. The exception actually helped us catch it.


    andrewtoth commented at 4:46 pm on February 16, 2026:
    Agree with @hodlinator. Changing to exceptions is a step backwards. Modern error handling should use std::expected, which we can’t use yet. Also agree to use an enum for the different error cases, instead of using a std::optional. It’s ok if we don’t check the value yet in production code, but we can update the test cases to assert the proper error.

    andrewtoth commented at 5:10 pm on February 16, 2026:

    Well, realistically speaking, the race was already there. We were returning the wrong error message before. The exception actually helped us catch it.

    The previous race would return an error to the caller. The exception caused the node to crash. Using Expected prevents this kind of error from occurring because the caller is forced to handle the error instead of forgetting to wrap in try/catch.


    furszy commented at 0:50 am on February 17, 2026:

    The previous race would return an error to the caller. The exception caused the node to crash. Using Expected prevents this kind of error from occurring because the caller is forced to handle the error instead of forgetting to wrap in try/catch.

    Yes, that’s pretty much the PR description. What I meant above is that it’s great we’re here, because it means we’ve caught something new. That comment at least wasn’t a pro exceptions argument. It was a pro thread pool inclusion argument really.

  23. hodlinator commented at 8:59 pm on February 13, 2026: contributor
    Thanks for incorporating my earlier feedback btw!
  24. furszy force-pushed on Feb 13, 2026
  25. andrewtoth commented at 10:26 pm on February 14, 2026: contributor

    That being said, could provide a sleep()-based patch to replicate it locally.

    Here is a diff, along with a functional test rpc_http_shutdown_race.py. Applying this to the base commit 07b924775e4ff5dbfb9f2e7549fd13db8ec5c61b will fail with a remote disconnected error, and applying to this patch 542879da7b66e890d2c1bf668ab5680e939541f8 will pass.

     0diff --git a/src/httpserver.cpp b/src/httpserver.cpp
     1index 671e119642..d2de754183 100644
     2--- a/src/httpserver.cpp
     3+++ b/src/httpserver.cpp
     4@@ -18,6 +18,7 @@
     5 #include <util/strencodings.h>
     6 #include <util/threadnames.h>
     7 #include <util/threadpool.h>
     8+#include <util/time.h>
     9 #include <util/translation.h>
    10 
    11 #include <condition_variable>
    12@@ -257,6 +258,7 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
    13             hreq->WriteReply(HTTP_SERVICE_UNAVAILABLE, "Work queue depth exceeded");
    14             return;
    15         }
    16+        UninterruptibleSleep(2s);
    17 
    18         auto item = [req = std::move(hreq), in_path = std::move(path), fn = i->handler]() {
    19             std::string err_msg;
    20@@ -446,6 +448,7 @@ void InterruptHTTPServer()
    21     }
    22     // Interrupt pool after disabling requests
    23     g_threadpool_http.Interrupt();
    24+    UninterruptibleSleep(2s);
    25 }
    26 
    27 void StopHTTPServer()
    28diff --git a/test/functional/rpc_http_shutdown_race.py b/test/functional/rpc_http_shutdown_race.py
    29new file mode 100755
    30index 0000000000..f91cdee9f3
    31--- /dev/null
    32+++ b/test/functional/rpc_http_shutdown_race.py
    33@@ -0,0 +1,42 @@
    34+#!/usr/bin/env python3
    35+# Copyright (c) The Bitcoin Core developers
    36+# Distributed under the MIT software license, see the accompanying
    37+# file COPYING or http://www.opensource.org/licenses/mit-license.php.
    38+"""Test that an in-flight RPC during shutdown receives 503."""
    39+
    40+from test_framework.test_framework import BitcoinTestFramework
    41+from test_framework.util import assert_equal, str_to_b64str
    42+
    43+import http.client
    44+import urllib.parse
    45+
    46+class RpcHttpShutdownRaceTest(BitcoinTestFramework):
    47+    def set_test_params(self):
    48+        self.num_nodes = 1
    49+        self.setup_clean_chain = True
    50+
    51+    def run_test(self):
    52+        url = urllib.parse.urlparse(self.nodes[0].url)
    53+        authpair = f"{url.username}:{url.password}"
    54+        headers = {"Authorization": f"Basic {str_to_b64str(authpair)}"}
    55+
    56+        conn = http.client.HTTPConnection(url.hostname, url.port, timeout=10)
    57+        conn.connect()
    58+        conn.request("POST", "/", '{"method":"getblockchaininfo","id":1}', headers)
    59+
    60+        # Trigger shutdown outside RPC callback path so the in-flight request
    61+        # is interrupted before submitting to the pool.
    62+        self.nodes[0].process.terminate()
    63+
    64+        resp = conn.getresponse()
    65+        body = resp.read()
    66+        assert_equal(resp.status, http.client.SERVICE_UNAVAILABLE)
    67+        assert b"Request rejected during server shutdown" in body
    68+        node = self.nodes[0]
    69+        self.wait_until(lambda: node.process.poll() is not None, timeout=20)
    70+        stderr = node.stderr.read().decode("utf-8", errors="replace").strip()
    71+        node.is_node_stopped(expected_stderr=stderr, expected_ret_code=node.process.returncode)
    72+
    73+
    74+if __name__ == '__main__':
    75+    RpcHttpShutdownRaceTest(__file__).main()
    76diff --git a/test/functional/test_runner.py b/test/functional/test_runner.py
    77index cdfa8c1022..0d21907677 100755
    78--- a/test/functional/test_runner.py
    79+++ b/test/functional/test_runner.py
    80@@ -201,6 +201,7 @@ BASE_SCRIPTS = [
    81     'wallet_reindex.py',
    82     'wallet_reorgsrestore.py',
    83     'interface_http.py',
    84+    'rpc_http_shutdown_race.py',
    85     'interface_rpc.py',
    86     'interface_usdt_coinselection.py',
    87     'interface_usdt_mempool.py',
    
  26. andrewtoth approved
  27. andrewtoth commented at 3:11 pm on February 15, 2026: contributor

    ACK 542879da7b66e890d2c1bf668ab5680e939541f8

    This fixes the unclean shutdown race condition. It’s too bad we can’t have a regression test without adding sleeps.

  28. DrahtBot requested review from hodlinator on Feb 15, 2026
  29. furszy commented at 4:29 pm on February 15, 2026: member

    Thanks for the reproduction patch @andrewtoth! Added link to the PR description.

    This fixes the unclean shutdown race condition. It’s too bad we can’t have a regression test without adding sleeps.

    A small reflection about this.

    We can’t expect to be able to cover everything in functional tests. Simply because they live on a higher level and have no access to internals to exercise race conditions.

    These types of scenarios belong to the unit tests land. But for that to happen, the surrounding code has to be properly encapsulated. Which is not the case for the http server, as there is currently no class for it, we basically import an entire file with global variables and static functions (and up to a few days ago, we had no tests for the work queue at all and this race condition was already there, just hidden returning an incorrect error message instead).

    The only way to replicate scenarios like this one atm, on a unit test, would be to manually copy paste startup and shutdown. Which is not really scalable nor maintainable.

    Still, we will get better soon, the libevent removal path @pinheadmz is working on also brings better encapsulation that we can use to deeply cover the area.

  30. in src/util/threadpool.h:24 in 818e2edadc outdated
    17@@ -17,6 +18,7 @@
    18 #include <queue>
    19 #include <stdexcept>
    20 #include <thread>
    21+#include <type_traits>
    22 #include <utility>
    23 #include <vector>
    24 
    


    ismaelsadeeq commented at 3:42 pm on February 16, 2026:

    In “threadpool: make Submit return Expected instead of throwing” 818e2edadc11dca226539ddee945f0acf6e53814

    This introduces inconsistency. In Start, we throw a runtime error as a result of multiple calls to Start, or possibly an exception could occur when thread creation fails for some reason.

    I’m also a bit conflicted here. Why are we using util::Expected? Shouldn’t the client be patched to handle throws when interacting with the thread pool? It’s likely that we throw for some cases, and still back to an issue like the unclean shutdown we are fixing here?


    furszy commented at 3:47 pm on February 16, 2026:
    Yes. Please see #34577 (review). Same conclusion there. I’m changing the current PR approach now. I shouldn’t have gotten obnubilated by our shiny new util::Expected toy.

    hodlinator commented at 3:58 pm on February 16, 2026:
    Start() could probably Assume() or assert() instead of throw, it seems to be a bit of a logic error to Start() twice without stopping.

    andrewtoth commented at 4:48 pm on February 16, 2026:

    obnubilated

    lol

    Throwing exceptions should be deprecated IMO, and we should aim to use the more modern Expected approach for errors. If an error is unrecoverable, abort instead of throwing an exception.


    sedited commented at 5:45 pm on February 16, 2026:

    Throwing exceptions should be deprecated IMO

    I disagree with this as a general statement. For example, throwing an exception is the standard way to indicate an error in a constructor, as utilized all over the standard library. We also use the standard library very extensively, which makes extensive use of exceptions, and will require us to handle them for as long as we use it.


    andrewtoth commented at 10:15 pm on February 16, 2026:
    Right, we can’t return an Expected from a constructor like we can return a Result in Rust when creating an object. Other than that, though, I don’t think there’s a need to throw an exception. We still need to catch exceptions, of course. I didn’t suggest we stop doing that!

    furszy commented at 8:07 pm on February 17, 2026:
    Updated. Got convinced keeping the Expected approach here is the best path to follow. Thanks everyone for the discussion!
  31. ismaelsadeeq commented at 3:44 pm on February 16, 2026: member

    Concept ACK @furszy, it would be nice if you share a branch with a reproducer.

    edit: see #34577 (comment) for reproducer

  32. furszy force-pushed on Feb 17, 2026
  33. furszy force-pushed on Feb 17, 2026
  34. DrahtBot added the label CI failed on Feb 17, 2026
  35. in src/util/threadpool.h:226 in 881568b39b
    222@@ -208,4 +223,15 @@ class ThreadPool
    223     }
    224 };
    225 
    226+static constexpr std::string_view SubmitErrorString(ThreadPool::SubmitError err) noexcept {
    


    hodlinator commented at 8:38 am on February 17, 2026:

    static on a free function means that it should be local to the translation unit without any exported symbols. AFAIK we only use it for functions in .CPP files. inline would make more sense, but is already implied by constexpr.

    0constexpr std::string_view SubmitErrorString(ThreadPool::SubmitError err) noexcept {
    

    furszy commented at 3:55 pm on February 17, 2026:
    Sure. Done. The static was moved from when this was a class method.
  36. hodlinator commented at 11:36 am on February 17, 2026: contributor

    The CI MSan failure in p2p_private_broadcast.py1 possibly stems from the I2P network being unstable2.

  37. furszy force-pushed on Feb 17, 2026
  38. threadpool: make Submit return Expected instead of throwing
    Unlike exceptions, which can be ignored as they require extra try-catch
    blocks, returning expected errors forces callers to always handle
    submission failures.
    
    Not throwing an exception also fixes an unclean shutdown bug
    #34573 since we no longer throw when attempting to Submit()
    from the libevent callback http_request_cb().
    59d24bd5dd
  39. furszy force-pushed on Feb 17, 2026
  40. furszy commented at 8:05 pm on February 17, 2026: member
    rebased to get latest CI changes.
  41. DrahtBot removed the label CI failed on Feb 17, 2026
  42. hodlinator approved
  43. hodlinator commented at 6:16 pm on February 18, 2026: contributor
    ACK 9d45a1b1bbccf21aa7750f5741af4d6edfd3dc48
  44. DrahtBot requested review from andrewtoth on Feb 18, 2026
  45. DrahtBot requested review from ismaelsadeeq on Feb 18, 2026
  46. andrewtoth approved
  47. andrewtoth commented at 6:36 pm on February 18, 2026: contributor
    ACK 9d45a1b1bbccf21aa7750f5741af4d6edfd3dc48
  48. http: properly respond to HTTP request during shutdown
    Makes sure we respond to the client as the HTTP request attempts to submit a task to
    the thread pool during server shutdown.
    
    Roughly what happens:
    
    1) The server receives an HTTP request and starts calling http_request_cb().
    2) Meanwhile on another thread, shutdown is triggered which calls InterruptHTTPServer()
       and unregisters libevent http_request_cb() callback and interrupts the thread pool.
    3) The request (step 1) resumes and tries to submit a task to the now-interrupted server.
    
    This fix detects failed submissions immediately, and the server responds with
    HTTP_SERVICE_UNAVAILABLE.
    726b3663cc
  49. in src/httpserver.cpp:279 in 379a9201b1 outdated
    275@@ -276,7 +276,12 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
    276             req->WriteReply(HTTP_INTERNAL_SERVER_ERROR, err_msg);
    277         };
    278 
    279-        [[maybe_unused]] auto _{g_threadpool_http.Submit(std::move(item))};
    280+        if (auto res = g_threadpool_http.Submit(std::move(item)); !res.has_value()) {
    


    pinheadmz commented at 9:09 pm on February 18, 2026:

    379a9201b1aabffe5f19dbabf1fe4fd57e8fefba

    I’m also interested in trying to save unique_ptr for hreq and tried some hacks like @hodlinator but nothing great came out of that. If nothing else could we Assume(hreq.use_count() == 1) in this block and also below where we send 404? This would just clarify that we don’t reply to a request outside a worker thread unless we’re sure the worker thread is nonexistent.


    furszy commented at 10:12 pm on February 18, 2026:
    Done as suggested. Thanks.
  50. furszy force-pushed on Feb 18, 2026
  51. furszy commented at 10:15 pm on February 18, 2026: member
    Updated per @pinheadmz feedback, thanks. Only added a shared_pointer.use_count() == 1 check to be completely sure the request is destroyed upon submission failure.
  52. Eunovo commented at 10:00 am on February 19, 2026: contributor
  53. DrahtBot requested review from hodlinator on Feb 19, 2026
  54. DrahtBot requested review from andrewtoth on Feb 19, 2026
  55. hodlinator approved
  56. hodlinator commented at 10:03 am on February 19, 2026: contributor
    re-ACK a6fc8a43f3f5b7605ebcd24212e01524c86a9189
  57. fanquake commented at 10:13 am on February 19, 2026: member
    @dergoegge does this resolve #34573 for you?
  58. pinheadmz approved
  59. pinheadmz commented at 12:19 pm on February 19, 2026: member

    ACK a6fc8a43f3f5b7605ebcd24212e01524c86a9189

    Built and tested on arm64/macos. Reviewed each commit, gave feedback which was addressed. I compared the results of the suggested functional test against modified code between master and branch, ran the test without the modified code between master and branch, and observed all those trials with Wireshark to confirm that the PR improves HTTP server behavior during shutdown.

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA256
     2
     3ACK a6fc8a43f3f5b7605ebcd24212e01524c86a9189
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmmW/0sACgkQ5+KYS2KJ
     7yTomKxAAhc0LeceulNUcjufbNhrpCSGYCK3n+5BrLLKObfQbngO1OnOW3wOEVBjt
     8RwwpJpwa7nIX2U9YoLWDEJKb5HmdozJZAIWHRuuX2qALOkEr/oguBYZEtan72DDq
     9cvtm+d6jncur1FnieRTDIdj8/YFkJPIv4dtGyxuy7f3VF0cODuk2fE7oZ+NIQdzN
    10rMcgI4757HIltYxJRuTVAt7KtvfZuMye/6UufBOcjz159vM+uUi6ORGORzf6EUk9
    11N3ZF+6DWrQrRjWi0k195t9i0TVDlF/eND2DZmEkK5Jrt+ZjcTMoUaQ3Wj2WoRRfd
    12XQ5CL9jzw0g8v6LMdYaVmAOfJbIzoF1H9NMWx1a7dwTkaABClXvm0QjB8N14kbhZ
    13S3inDNJ0hTfyoEeXyHk+Ui+TMj1TutFq1kLvN5TUT+lDE/e/rybL9cKNQvl8NrEk
    14CtD2mIXLf0vJynpo/hOEO4IP6WzUJjl/6gyrhJ3Dli6xLUznNsPnFB0YgNd2+tPk
    150xr/SQ0DTn5GVzOkw9E5jkvDjoVKRbV1sZaw039cBL3ika7ckvt3pEumKj+zZ+IK
    16IVZ/mXSrfCVKxy3WfY9eQjgjp0A5gogpTQR/MHq1TQ3Pa0v0Qn18bdw0mvm32R8f
    17DXw8cVQj0JZEP2XiOxD3G2v6rtYK+vnnFQUpxhgcbNSND/8Fpg8=
    18=Q/ll
    19-----END PGP SIGNATURE-----
    

    pinheadmz’s public key is on openpgp.org

  60. in src/util/threadpool.h:106 in a6fc8a43f3
    104+     * Calling Start() on an already-running pool is a logic error. For lazy
    105+     * initialization and idle shutdown patterns, callers must provide their
    106+     * own synchronization.
    107      */
    108-    void Start(int num_workers) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    109+    void Start(const int num_workers) noexcept EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    


    sedited commented at 1:21 pm on February 19, 2026:
    Not sure about marking this as noexcept. Exceptions here would be caught by AppInit and we’d shutdown gracefully. There a bunch of functions that do still throw here, e.g. std::system_error when spawning the thread fails. Do we really gain anything by terminating immediately?

    furszy commented at 3:52 pm on February 19, 2026:

    Agree on removing noexcept.

    And this is probably a good moment to align on the broader design philosophy so we have some common foundations going forward.

    The way I’m seeing it after some back and forth: there are two fundamentally different kinds of failures, and they deserve different treatment. Plus we have a special third rule that is even more important to us than any design decision.

    The first is environment related failures: things like an allocation failure, IO error, thread spawning failing due to the OS is out of resources, etc. These can happen even in perfectly correct code. For these, throwing makes sense. Even if the caller can’t fully recover, it at least gets the chance to shut down gracefully, flush state, log something useful. Hard termination throws all of that away.

    The second kind is logic errors: things that cannot happen if the code is correct. A double Start() call is a good example of this. That’s not the OS misbehaving, that’s a bug in the caller. Throwing a std::runtime_error implies the caller might catch it and carry on, when really we want to say “this shouldn’t happen, and if it does, something is just wrong”. An assertion is more direct here. It’s not an error to be handled, it’s a violation of the API contract.

    The third rule that applies only to us (Bitcoin-Core) is that we want to minimize assertions in code that can be accessed through the P2P layer. We don’t want to be logically honest here, we want to guard against someone triggering a crash remotely.


    Maybe if we agree on this, we could add it to one of our .md guidelines. So we don’t have to revive this type of discussions every time someone suggests to add a noexcept or to change an exception for an assertion in risky places.

    Thoughts?


    hodlinator commented at 7:51 pm on February 19, 2026:

    The first is environment related failures: things like an allocation failure, IO error, thread spawning failing due to the OS is out of resources, etc. These can happen even in perfectly correct code. For these, throwing makes sense. Even if the caller can’t fully recover, it at least gets the chance to shut down gracefully, flush state, log something useful. Hard termination throws all of that away.

    I think it’s fairly hard to recover from OOM, IO errors or failing to spawn threads, so wouldn’t aim to use exceptions in this case. Exiting with a clear error in the log/stderr and/or crash dump seems realistic and sufficient.

    Agree it would be good to document, if we can agree. ;)

    Some compilers/stdlibs allow for disabling exceptions btw. Not advocating for it in this project, just FYI. I guess one gets a hard crash instead of stack unwinding when throwing from constructors etc.

  61. furszy force-pushed on Feb 19, 2026
  62. furszy commented at 4:49 pm on February 19, 2026: member

    Due to the upcoming release freeze, decided to drop the last commit https://github.com/bitcoin/bitcoin/commit/a6fc8a43f3f5b7605ebcd24212e01524c86a9189 to keep this PR focused strictly on fixing the race.

    That commit was a design improvement rather than a direct fix, so it makes sense to handle it separately. We can reintroduce it in a follow-up once we align on the overall design direction #34577 (review) (if possible).

    PR ready to go.

  63. sedited approved
  64. sedited commented at 5:05 pm on February 19, 2026: contributor
    ACK 726b3663cc8e2164d4e9452f12f5866f5e8f6f1a
  65. DrahtBot requested review from pinheadmz on Feb 19, 2026
  66. DrahtBot requested review from Eunovo on Feb 19, 2026
  67. DrahtBot requested review from hodlinator on Feb 19, 2026
  68. pinheadmz approved
  69. pinheadmz commented at 5:49 pm on February 19, 2026: member

    re-ACK 726b3663cc8e2164d4e9452f12f5866f5e8f6f1a

    Change since last review, dropped tip commit which doesn’t affect the bug fix targeting by the rest of the branch.

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA256
     2
     3ACK 726b3663cc8e2164d4e9452f12f5866f5e8f6f1a
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmmXTMUACgkQ5+KYS2KJ
     7yTrnag/+IVOKcTiwVPKY0LS+i96t8qwCDeBCYH22b7pzwdDbQ8Bmnhg42Bz5kSQq
     8XuGPXt4sAn0MjGiC7EpAIErA+yhT/nrf1QuOyJslSUz+6BnO491trzZAdPJ9cnyF
     913x4AY2mYs/fLFCfKVk7QhW5U+0XfgqcVD/BNsPtmBfcA6ClXUmFatBL3Mb+P1Y8
    10H26shtJUz3If+HAM6d5Iey/IoMgS6ENgaLr+4weMRej6JYpNfU2cC4+G9vs/WgRL
    11+W2K+Fs+Ose2lMLwg+Py6m7vmtrrb1PzuYYGw2B07qsHOP9aGGN+SLDEVs/tEMeY
    12YUVY0vWnplCG0NMo+VQoMa4xchaMNVpLN1Haa4nwBDWOxSn64t3a9E/EFh+9X3Z/
    13fLppj+2KrUqCeJ6eN/bJB2sAutq+oZSMFwgybqArPQ9KSMSoNAwoUYySphMmeEkW
    14JC6NcCTNkI1ZzVrZChizJPYLoGVF7YKWnlo2lLR74/l99+auKQAmTxBIl5hlctoB
    154Fv3tgqS+iKrPe9g1mrgGRInhCwsvXVU9tZtgXChzQGuecfQYaESmWVJ1RvqzbQx
    16fUSbShCOT7iYs7OK+7j5BwPyIBgRoDEf9A9fJ8+/QlYSkQRnTMqux90AtdWXKm/V
    17Pde7ojKRmACJsuC2hLnE/Ympa63+EwSBf1xZXiLKNtCauVlQCpo=
    18=TWiX
    19-----END PGP SIGNATURE-----
    

    pinheadmz’s public key is on openpgp.org

  70. andrewtoth approved
  71. andrewtoth commented at 6:18 pm on February 19, 2026: contributor
    ACK 726b3663cc8e2164d4e9452f12f5866f5e8f6f1a
  72. achow101 commented at 7:39 pm on February 19, 2026: member
    ACK 726b3663cc8e2164d4e9452f12f5866f5e8f6f1a
  73. hodlinator approved
  74. hodlinator commented at 7:52 pm on February 19, 2026: contributor
    re-ACK 726b3663cc8e2164d4e9452f12f5866f5e8f6f1a
  75. achow101 merged this on Feb 19, 2026
  76. achow101 closed this on Feb 19, 2026

  77. furszy deleted the branch on Feb 19, 2026
  78. dergoegge commented at 11:05 am on February 20, 2026: member

    does this resolve #34573 for you?

    I did re-run the test that found this new crash issue yesterday and it looks like it’s fixed.

    Fuzzing/property-based tests take some time to run. If I get pinged to re-test something it’d make sense to wait for me to follow up before merge. Otherwise I’ll likely stop responding to these pings and will open a new issue if something wasn’t fixed.

  79. ismaelsadeeq commented at 12:00 pm on February 20, 2026: member

    Fuzzing/property-based tests take some time to run. If I get pinged to re-test something it’d make sense to wait for me to follow up before merge. Otherwise I’ll likely stop responding to these pings and will open a new issue if something wasn’t fixed.

    Maybe it would be helpful to let the maintainers/contributors know you’re looking into it/ i.e re-running tests after getting pinged, so they know whether to wait or move forward? If an issue wasn’t fixed after a fix is merged and resolved, opening a new issue seems fair.

  80. dergoegge commented at 12:18 pm on February 20, 2026: member
    In this case at least one maintainer knew, but in general they can expect me to follow up and if it’s urgent they can also ping me elsewhere.
  81. furszy commented at 3:24 pm on February 20, 2026: member

    In this case, the crash reproducer script made it straightforward to proceed, and I think people didn’t wanted to bother you further since the cause was clear @dergoegge.

    What we could have done better was waiting for your feedback on the issue before closing it, I agree. And, at the same time, I also agree that a brief coordination message “running fuzzers, will update soon” would have been very helpful for all of us as well.

    I mean, I think we all understand that everyone here is busy and overloaded with parallel projects and tasks, and we all work across different time zones, so being a bit more public on our actions would help minimize misunderstandings.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-09 18:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me