HTTPServer: Prevent race condition between worker thread and I/O thread #35614

pull pinheadmz wants to merge 4 commits into bitcoin:master from pinheadmz:http-slow-sock changing 5 files +192 −30
  1. pinheadmz commented at 12:50 PM on June 27, 2026: member

    This prevents a losing race condition that could prevent the server from reading any more requests from an HTTP client.

    Found and reported by the fuzzing department: https://github.com/dergoegge/bitcoin/commit/7fe5f54497c86f216b619e340939447504e87dcb

    The Race:

    A connected socket can either be written to or read from based on the result of GenerateWaitSockets(). That method checks the HTTPRemoteClient flag m_send_ready. If it's true the implication is that there is data in the client's send buffer ready to go. Once that data is sent and the buffer is empty, MaybeSendBytesFromBuffer() sets it false again.

    The sad case was when a worker thread calling WriteReply() adds data to the send buffer, but before it sets m_send_ready to true, the I/O thread sends that data and empties the buffer. With the buffer unexpectedly empty, WriteReply() sets m_send_ready to true.

    The effect of this is that the socket will stay in "write" mode with nothing to write. With nothing to write, MaybeSendBytesFromBuffer() never sets it back to false and the socket is stuck forever.

    The Fix:

    Simply move m_send_ready = true inside the block of WriteReply() where m_send_mutex is still held. This prevents the I/O thread from emptying the send buffer while the worker thread is setting the flag.

    Testing:

    To observe the race condition, revert the first commit "http: prevent race condition between worker thread and I/O thread" and run the unit test from the remainder of the branch. I like to see the logs:

    `test_bitcoin --log_level=all --run_test=httpserver_tests -- --printtoconsole --debug=http --debug=lock'

    The test will fail with a small probability. The socket will get stuck and the test will abort after a 60 second timeout. To garuntee the race condition loses and fail the test every time, slow down WriteReply() in the worker thread:

    diff --git a/src/httpserver.cpp b/src/httpserver.cpp
    index 99e30ff663..b0c7b516d8 100644
    --- a/src/httpserver.cpp
    +++ b/src/httpserver.cpp
    @@ -614,6 +614,7 @@ void HTTPRequest::WriteReply(HTTPStatusCode status, std::span<const std::byte> r
         } else {
             // Inform HTTPServer I/O that data is ready to be sent to this client
             // in the next loop iteration.
    +        std::this_thread::sleep_for(500ms);
             m_client->m_send_ready = true;
         }
     
    

    With the first commit (the fix) back in place, slowing down the worker thread like this won't fail the test.

    Bonus:

    The unit test is spread over three commits. First, a method of the socket testing setup is templated so a mock socket that intentionally raises an error can be inserted. The unit test added in that commit covers a race condition that was fixed in #35182 in response to https://github.com/bitcoin/bitcoin/pull/35182/changes#r3358889539 so we get the added benefit of covering an error path, and guaranteeing coverage of both "optimistic send" (directly from worker thread) and regular send (from a tick in the I/O loop thread).

    The next commit adds a worker thread to the unit test, at which point a race condition is possible but very unlikely because all requests are sent at once. Finally, we spread out the requests in the top commit and make the race condition much easier to catch.

  2. DrahtBot commented at 12:50 PM on June 27, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35614.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK janb84, dergoegge, theStack

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    No conflicts as of last run.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  3. pinheadmz force-pushed on Jun 27, 2026
  4. DrahtBot added the label CI failed on Jun 27, 2026
  5. DrahtBot commented at 1:08 PM on June 27, 2026: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task 32 bit ARM: https://github.com/bitcoin/bitcoin/actions/runs/28289717765/job/83819565264</sub> <sub>LLM reason (✨ experimental): CI failed during compilation because -Werror=overloaded-virtual treated a hidden virtual DynSock::operator=(Sock&&) in test/util/net.h (used by httpserver_tests.cpp) as an error.</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  6. pinheadmz force-pushed on Jun 27, 2026
  7. DrahtBot removed the label CI failed on Jun 27, 2026
  8. fanquake added this to the milestone 32.0 on Jun 27, 2026
  9. dergoegge approved
  10. dergoegge commented at 10:40 AM on June 29, 2026: member

    utACK f190c779cbc268de82ea64b57e4df587404ae3c6

    Fix looks correct to me.

  11. in src/httpserver.h:491 in f190c779cb
     485 | @@ -486,6 +486,7 @@ class HTTPRemoteClient
     486 |       * Set true by worker threads after writing a response to m_send_buffer.
     487 |       * Set false by the HTTPServer I/O thread after flushing m_send_buffer.
     488 |       * Checked in the HTTPServer I/O loop to avoid locking m_send_mutex if there's nothing to send.
     489 | +     * Must be set only while holding m_send_mutex.
     490 |       */
     491 |      std::atomic_bool m_send_ready{false};
    


    janb84 commented at 1:02 PM on June 29, 2026:
        * Set true by worker threads after writing a response to m_send_buffer.
        * Set false by the HTTPServer I/O thread after flushing m_send_buffer.
        * Checked in the HTTPServer I/O loop to decide whether to poll the socket for
        * writeability or readability.
        * Guarded by m_send_mutex so it stays consistent with m_send_buffer's emptiness:
        * the two must always be updated together under the same lock.
        */
        bool m_send_ready GUARDED_BY(m_send_mutex){false};
    

    NIT: I think the Comment uncovers a bit of a smell, guarded data wearing atomic's clothes. The invariant is m_send_ready ⇔ m_send_buffer emptiness, and both must move under the same lock. As far as I can see the field is only an atomic_bool to allow one lock-free read in GenerateWaitSockets and that loop already takes m_sock_mutex per client, the lock-free read saves essentially nothing. So imho no reason not to make it an bool m_send_ready GUARDED_BY(m_send_mutex){false};.

    This turns the hand-maintained "must hold the lock" convention into a compile-time guarantee.

    it needs also a change in httpserver.cpp, as mentioned:

    diff --git a/src/httpserver.cpp b/src/httpserver.cpp
    index 92579ee106..35ed8c040f 100644
    --- a/src/httpserver.cpp
    +++ b/src/httpserver.cpp
    @@ -941,7 +941,12 @@ HTTPServer::IOReadiness HTTPServer::GenerateWaitSockets() const
     
             // Check if client is ready to send data. Don't try to receive again
             // until the send buffer is cleared (all data sent to client).
    -        Sock::Event event = (http_client->m_send_ready ? Sock::SEND : Sock::RECV);
    +        // Keep this as a separate critical section from the m_sock_mutex one above:
    +        // never hold m_sock_mutex and m_send_mutex at the same time here.
    +        // MaybeSendBytesFromBuffer() locks m_send_mutex then m_sock_mutex, so nesting
    +        // them in the opposite order here would risk a lock-order inversion deadlock.
    +        const bool send_ready{WITH_LOCK(http_client->m_send_mutex, return http_client->m_send_ready;)};
    +        Sock::Event event = (send_ready ? Sock::SEND : Sock::RECV);
             io_readiness.events_per_sock.emplace(sock, Sock::Events{event});
             io_readiness.httpclients_per_sock.emplace(sock, http_client);
         }
    

    pinheadmz commented at 2:01 PM on June 29, 2026:

    Thanks, I'll explicitly add the lock to the flag that makes sense.

  12. janb84 commented at 1:04 PM on June 29, 2026: contributor

    Concept ACK f190c779cbc268de82ea64b57e4df587404ae3c6

    Code review; think it will resolve the ToCTOU but it uncovers a bit of a smell (IMHO) , see NIT.

  13. pinheadmz force-pushed on Jun 29, 2026
  14. http: prevent race condition between worker thread and I/O thread
    This prevents a losing race condition that could prevent the server
    from reading requests from an HTTP client.
    
    A connected socket can either be written to or read from based on the
    result of GenerateWaitSockets(). That method checks the HTTPRemoteClient
    flag m_send_ready. If it's `true` the implication is that there is
    data in the client's send buffer ready to go. Once that data is sent
    and the buffer is empty, MaybeSendBytesFromBuffer() sets it `false` again.
    
    The sad case was when a worker thread calling WriteReply() adds
    data to the send buffer, but before it sets m_send_ready to `true`,
    the I/O thread sends that data and empties the buffer. With the
    buffer unexpectedly empty, WriteReply() sets m_send_ready to `true`.
    
    The effect of this is that the socket will stay in "write" mode
    with nothing to write. With nothing to write, MaybeSendBytesFromBuffer()
    never sets it back to `false` and the socket is stuck forever.
    73da2a8a52
  15. test: socket error handling in HTTPServer using ErrorSock mock socket
    Implements a child class of DynSock which is used as the mock
    socket for HTTPServer unit tests. The ErrorSock::Send() method
    raises a non-permanent error on the first HTTPRequest::WriteReply()
    and then succeeds after the second.
    
    In httpserver_tests.cpp use this mechanism to ensure that the
    server retries a send operation if such an error is encountered,
    and cover both optimistic (worker thread WriteReply()) and
    non-optimistic (I/O thread SocketHandlerConnected()) send paths.
    922b08d375
  16. test: introduce a worker thread in http socket error test b98b10c072
  17. test: ensure HTTPServer race condition is fixed
    The result of WriteReply() losing the race condition would prevent any
    new requests being read from the socket. The socket error test
    sent 3 requests all at once after connecting, so in this commit
    we separate the the third request to make the losing race
    condition more likely, and make its effect more obvious.
    f595daf1dd
  18. pinheadmz force-pushed on Jun 29, 2026
  19. pinheadmz commented at 2:32 PM on June 29, 2026: member

    push to f595daf1dd01e9730e0eafbc46b2e22eeb9f33fe:

    • rebase on master, fix conflict with #35588
    • Address review from @janb84, guarding the flag by the same log as the bugger.
  20. janb84 commented at 6:32 PM on June 29, 2026: contributor

    crACK f595daf1dd01e9730e0eafbc46b2e22eeb9f33fe

  21. DrahtBot requested review from dergoegge on Jun 29, 2026
  22. dergoegge approved
  23. dergoegge commented at 9:24 AM on June 30, 2026: member

    utACK f595daf1dd01e9730e0eafbc46b2e22eeb9f33fe

  24. theStack approved
  25. theStack commented at 2:50 PM on July 3, 2026: contributor

    Code-review ACK f595daf1dd01e9730e0eafbc46b2e22eeb9f33fe

  26. fanquake merged this on Jul 3, 2026
  27. fanquake closed this on Jul 3, 2026


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-07-04 02:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me