http: avoid fd exhaustion #12274

pull theuni wants to merge 6 commits into bitcoin:master from theuni:http-fd-limit changing 4 files +176 −22
  1. theuni commented at 4:25 am on January 26, 2018: member

    Addresses #11368. This adds a file descriptor limiter for the http server. It’s possible for the open fd count to potentially exhaust the user’s defined limit if not checked.

    I’m having a hard time reproducing the issue now, I’m not sure what’s changed in my environment. I was finally able to hit it using some pretty nasty tricks, but I’m less worried about this now for 0.16.

    I do think it’s worth considering taking de6cfceafba33feb77c169135c5ed70bd9d09ca4, 5f5099ac57375bdf345df0f6ebcb6efdd698f4f0, and (part of) 3b0ebb3578e89e3c4b5c0003d159cb0b04c2e86c for 0.16, though, as they help significantly and are easy to review. I’m happy to break that out into a new PR if necessary.

  2. http: respond to errors immediately rather than scheduling
    Responses are usually added to the response queue by worker threads. As an
    optimization, and in order to close connections as quickly as possible, allow
    for immediate replies (bypassing the queue) when there is no need for a worker
    to service the request.
    5f5099ac57
  3. http: return initialized listeners
    No functional change, just preparation for the next commits.
    dc53bec3c5
  4. http: add a connection limiter
    Rather than potentially accepting an unbounded number of connections, give them
    a ceiling of twice the size of the processing queue.
    
    Once the ceiling is reached, new connections will not be accepted until the
    count has fallen back under the limit.
    
    Note that file descriptors are removed from the limiter in the request's
    close callback.
    c56cf1a88e
  5. http: don't re-use connections when slots are full
    If we're unable to service a request, close the connection rather than
    potentially keeping it alive longer. This helps to be slightly more fair to new
    connections.
    3b0ebb3578
  6. http: defer accepting until there's data to read
    This avoids assigning file descriptors faster than we can close them.
    
    It is the same as using an evconnlistener with the LEV_OPT_DEFERRED_ACCEPT
    flag, which is only available with libevent 2.1+.
    de6cfceafb
  7. http: use new libevent callback to drop connections quickly
    Libevent 2.2 adds a callback allowing us to drop connections as they come in.
    Use it when possible.
    cdcc0ed011
  8. fanquake added the label RPC/REST/ZMQ on Jan 26, 2018
  9. fanquake added this to the milestone 0.16.0 on Jan 26, 2018
  10. in src/httpserver.cpp:380 in cdcc0ed011
    375@@ -288,14 +376,19 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
    376     if (i != iend) {
    377         std::unique_ptr<HTTPWorkItem> item(new HTTPWorkItem(std::move(hreq), path, i->handler));
    378         assert(workQueue);
    379-        if (workQueue->Enqueue(item.get()))
    380+        if (workQueue->Enqueue(item.get())) {
    381+            // Disable reading to work around a libevent bug, fixed in 2.2.0.
    


    tjps commented at 9:14 am on January 26, 2018:

    Should this block be wrapped in a LIBEVENT_VERSION_NUMBER macro guard so it only occurs in the affected version range?

    Seems like it would be easier for future cleanups to see where the version macro is checked, say in the instance that a minimum version of libevent is defined and code like this could be safely removed.


    theuni commented at 10:29 pm on January 29, 2018:
    This is just move-only from above. There’s no need to guard it with an ifdef because there’s no api difference, it’s purely a runtime check.
  11. in src/httpserver.cpp:89 in cdcc0ed011
    84+        return m_interrupted.load(std::memory_order_acquire);
    85+    }
    86+
    87+    const unsigned int m_limit;
    88+    std::vector<evconnlistener*> m_listeners;
    89+    std::set<evutil_socket_t> m_sockets;
    


    tjps commented at 9:21 am on January 26, 2018:
    Not that it probably matters in most configurations where the fd limit is ~1K, but making this an unordered_set would be slightly beneficial in the high fd limit cases.

    theuni commented at 10:27 pm on January 29, 2018:
    Sure
  12. laanwj commented at 9:03 am on January 28, 2018: member
    This needs to be tested by those experiencing the issue: @KanoczTomas @vii
  13. promag commented at 9:32 pm on January 29, 2018: member

    Concept ACK.

    Why don’t understand why you need dc53bec, seems like an unnecessary refactor.

    5f5099a and 3b0ebb3 could be in a separate PR, are simple and make sense even without the fd exhaustion problem.

  14. theuni commented at 10:30 pm on January 29, 2018: member
    @promag dc53bec is just a small chunk of c56cf1a that seemed reasonable on its own.
  15. in src/httpserver.cpp:303 in cdcc0ed011
    299@@ -300,6 +300,22 @@ static void connection_close_cb(evhttp_connection* conn, void *arg)
    300     }
    301 }
    302 
    303+#if LIBEVENT_VERSION_NUMBER >= 0x02020001
    


    vii commented at 4:36 am on January 30, 2018:
    awesome that you’re using the callback - one way to unambiguously detect the presence of the new feature would be to add a new autoconf test maybe using AC_LINK_IFELSE for evhttp_set_newreqcb
  16. vii commented at 4:42 am on January 30, 2018: none

    Awesome! The defer accept tweak will defend against typical inadvertent denial of services

    Note that the original issue can still be activated depending on the file descriptor limit that is set, as the budgeting in the daemon does not account for this usage or the usage in the db code - see #11785

    To test it, something like this slowhttptest program

    0slowhttptest -c 40000 -r 1000 -u 'http://127.0.0.1:8332/rest/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f.json'
    

    or another tool which can simulate a slowloris (HTTP connection header trickling) attack

  17. laanwj removed this from the milestone 0.16.0 on Jan 30, 2018
  18. laanwj commented at 11:27 am on January 30, 2018: member
    Sorry, I’m removing the 0.16.0 milestone here. As it introduces many new concepts and work-arounds it is hard to review last-minute, the problem is hard to trigger in the first place, and it is not a regression since 0.15. Also according to @vii it doesn’t solve the issue completely. I think getting the fd issue fixed properly should be a focus for 0.16.1.
  19. laanwj added this to the milestone 0.16.1 on Jan 30, 2018
  20. in src/httpserver.cpp:533 in cdcc0ed011
    528+        listeners.push_back(listener);
    529+    }
    530+    g_limiter = MakeUnique<ConnectionLimiter>(std::move(listeners), workQueueDepth * 2);
    531+    evhttp_set_gencb(http, http_request_cb, g_limiter.get());
    532+
    533+#if LIBEVENT_VERSION_NUMBER >= 0x02020001
    


    eklitzke commented at 7:46 am on March 11, 2018:

    Would like to see this defined separately as:

    0#if LIBEVENT_VERSION_NUMBER >= 0x02020001
    1#define HAVE_EVHTTP_SET_NEWREQCB 1
    2#endif
    

    That way the check here is more obvious, particularly since you do the check elsewhere. Of course, even better than this would be to use AC_CHECK_FUNC()

  21. in src/netbase.cpp:732 in cdcc0ed011
    723@@ -724,3 +724,13 @@ void InterruptSocks5(bool interrupt)
    724 {
    725     interruptSocks5Recv = interrupt;
    726 }
    727+
    728+bool SetListenSocketDeferred(const SOCKET& sock)
    729+{
    730+    bool ret = false;
    731+#ifdef TCP_DEFER_ACCEPT
    732+    static constexpr int set = 1;
    


    eklitzke commented at 7:49 am on March 11, 2018:
    This is the first time I’ve seen static constexpr used correctly in the Bitcoin code, congratulations.
  22. eklitzke commented at 7:50 am on March 11, 2018: contributor
    This was just a superficial check, I can do another pass later in more detail. I’m curious if you can reproduce the fd exhaustion issue.
  23. fanquake commented at 4:08 pm on March 21, 2018: member

    @eklitzke I can still produce a crash with these changes on top of https://github.com/bitcoin/bitcoin/commit/2405ce1df043f778b8efb9205009500cbc17313a:

    0date='2015-09-16T16:18:02Z' progress=0.269372 cache=313.8MiB(2336744txo)
    12018-03-21T16:04:44Z UpdateTip: new best=00000000000000000c7112c52408963acb80e1fedc71005859d6ef9a22f84d79 height=374820 version=0x00000003 log2_work=83.354829 tx=84036580 date='2015-09-16T16:20:12Z' progress=0.269373 cache=313.8MiB(2337189txo)
    22018-03-21T16:04:44Z LevelDB read failure: IO error: /Users/fanquake/Library/Application Support/Bitcoin/chainstate/019338.ldb: Too many open files
    32018-03-21T16:04:44Z Fatal LevelDB error: IO error: /Users/fanquake/Library/Application Support/Bitcoin/chainstate/019338.ldb: Too many open files
    42018-03-21T16:04:44Z You can use -debug=leveldb to get more complete diagnostic messages
    52018-03-21T16:04:44Z libevent: Error from accept() call: Too many open files
    62018-03-21T16:04:44Z libevent: Error from accept() call: Too many open files
    72018-03-21T16:04:44Z libevent: Error from accept() call: Too many open files
    82018-03-21T16:04:44Z libevent: Error from accept() call: Too many open files
    

    When last discussed with @theuni he was going to revamp these changes.

  24. eklitzke commented at 10:59 pm on March 21, 2018: contributor
    @fanquake Interesting – do you have a script to stress test it to induce that condition? Or does that happen during normal operation?
  25. laanwj removed this from the milestone 0.16.1 on May 17, 2018
  26. Nico205 referenced this in commit 52d38c9292 on May 17, 2018
  27. DrahtBot commented at 8:30 pm on July 20, 2018: member
  28. DrahtBot closed this on Jul 20, 2018

  29. DrahtBot reopened this on Jul 20, 2018

  30. in src/httpserver.cpp:55 in cdcc0ed011
    50+    }
    51+    void AddConnection(evutil_socket_t fd)
    52+    {
    53+        // Disable socket accepting if adding this connection puts us equal to the limit
    54+        if (!Interrupted() && m_sockets.insert(fd).second && m_sockets.size() == m_limit) {
    55+            LogPrint(BCLog::HTTP, "Suspending new connections");
    


    MarcoFalke commented at 8:36 pm on July 20, 2018:
    All calls to LogPrintf() and LogPrint() should be terminated with \n
  31. DrahtBot commented at 4:44 pm on August 21, 2018: member
  32. DrahtBot added the label Needs rebase on Aug 21, 2018
  33. DrahtBot added the label Up for grabs on Dec 3, 2018
  34. DrahtBot commented at 4:52 pm on December 3, 2018: member
  35. DrahtBot closed this on Dec 3, 2018

  36. laanwj removed the label Needs rebase on Oct 24, 2019
  37. BlockMechanic commented at 6:16 pm on November 7, 2019: contributor
    So is this the recommended fix for #11368 ? I just bumped into it and need to resolve it
  38. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 04:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me