libevent-based http server #5677

pull laanwj wants to merge 16 commits into bitcoin:master from laanwj:2015_01_evhttpd changing 38 files +1502 −1153
  1. laanwj commented at 4:15 pm on January 18, 2015: member
    • Replace usage of boost::asio with libevent2. boost::asio is not part of C++11, so unlike other boost there is no forwards-compatibility reason to stick with it. Together with #4738 (convert json_spirit to UniValue), this rids Bitcoin Core of the worst offenders with regard to compile-time slowness.
    • Replace spit-and-duct-tape http server with evhttp. Front-end http handling is handled by libevent, a work queue (with configurable depth and parallelism) is used to handle application requests.
    • Wrap HTTP request in C++ class; this makes the application code mostly HTTP-server-neutral
    • Refactor RPC to move all http-specific code to a separate file. Theoreticaly this can allow building without HTTP server but with another RPC backend, e.g. Qt’s debug console (currently not implemented) or future RPC mechanisms people may want to use.
    • HTTP dispatch mechanism; services (e.g., RPC, REST) register which URL paths they want to handle.

    By using a proven, high-performance asynchronous networking library (also used by Tor) and HTTP server, problems such as #5674, #5655, #344 should be avoided.

    What works? bitcoind, bitcoin-cli, bitcoin-qt. Unit tests and RPC/REST tests pass. The aim for now is everything but SSL support.

    Configuration options:

    • -rpcthreads: repurposed as “number of work handler threads”. Still defaults to 4.
    • -rpcworkqueue: maximum depth of work queue. When this is reached, new requests will return a 500 Internal Error.
    • -rpctimeout: inactivity time, in seconds, after which to disconnect a client.
    • -debug=http: low-level http activity logging

    (due to the separation of RPC and HTTP server, renaming these options may make sense, but I’ve kept this out of backwards compatiblity)

    TODO:

    • Build system (currently hardcodes libraries, so this will definitely not pass Travis) (thanks @theuni)
    • REST and RPC register their own request handlers respectively
    • Qt debug console must register a RPCTimerInterface (to make timeouts in the debug console work with -server=0)
    • Interrupt/Shutdown flow needs to be cleaned up
    • [warn] event_active: event has no event_base set. appears sometimes to the console. Seems to be harmless, but it is weird (see @ajweiss comments)
  2. laanwj added the label RPC on Jan 18, 2015
  3. laanwj added the label REST on Jan 18, 2015
  4. laanwj force-pushed on Jan 19, 2015
  5. theuni commented at 3:49 am on January 20, 2015: member

    Very nice! Before looking over the work itself, I wanted to be sure that libevent was viable for all of our build targets.

    See here for the build-system work. This should be enough to get Travis passing, I’d think: https://github.com/theuni/bitcoin/commits/5677

  6. laanwj commented at 5:00 am on January 20, 2015: member
    @cfields Will pull that in, thanks a lot!
  7. luke-jr commented at 5:04 am on January 20, 2015: member
    Hm, nothing special needed for longpolling?
  8. laanwj commented at 5:14 am on January 20, 2015: member
    @luke-jr I don’t think so. The current implementation should work. Of course it would be more optimal to release the worker thread while longpolling, and change “new block” it into an event-trigger, but I leave that as a challenge for later.
  9. laanwj force-pushed on Jan 20, 2015
  10. laanwj force-pushed on Jan 20, 2015
  11. laanwj commented at 6:37 am on January 20, 2015: member

    The fail in “32-bit + dash” is strange

    0FAIL: qt/test/test_bitcoin-qt
    

    I’m not sure how this can be affected at all (passes fine here), but I’ll check.

    On Win32/64 it still tries to link against libevent_pthread. IIRC there is no specific thread library for windows, evthread_use_windows_threads is part of the core library there.

    0/usr/bin/x86_64-w64-mingw32-ld: cannot find -levent_pthreads
    
  12. theuni commented at 6:45 am on January 20, 2015: member
    Blah, sorry, missed that one.
  13. laanwj commented at 7:21 am on January 20, 2015: member

    It’s easy to miss those sneaky qt unit tests. Windows passes now!

    That leaves the 32-bit + dash case, which is not an intermittent issue

    0FAIL! : PaymentServerTests::paymentServerTests() Compared values are not the same
    1Actual (merchant):
    2Expected (QString("testmerchant.org")): testmerchant.org
    3Loc: [qt/test/paymentservertests.cpp(84)]
    

    … no clue how this happens yet. The Qt tests don’t use the RPC mechanism. My gut feeling is some interaction with OpenSSL which, absent verification, is now an indirect dependency through Qt? Not sure, and why it only happens in this test is a mystery to me.

  14. theuni commented at 8:23 am on January 20, 2015: member

    @laanwj Yes, it has some interaction with qt:

    • ./configure –with-gui=qt4: fine.
    • ./configure –with-gui=qt5: fine.
    • make -C depends; ./configure –prefix=pwd/depends/x86_64-unknown-linux-gnu: busted
    • make -C depends USE_LINUX_STATIC_QT5=1; ./configure –prefix=pwd/depends/x86_64-unknown-linux-gnu: fine.
  15. jonasschnelli commented at 3:11 pm on January 20, 2015: contributor
    tested gitian build. Binaries to test: https://builds.jonasschnelli.ch/pulls/5677/
  16. laanwj force-pushed on Jan 20, 2015
  17. jtimon commented at 2:28 pm on January 21, 2015: contributor
    Concept ACK
  18. laanwj force-pushed on Jan 22, 2015
  19. laanwj force-pushed on Jan 23, 2015
  20. laanwj commented at 10:26 am on January 23, 2015: member

    Now that the code is stable it is time for some benchmarking. I found a nice scriptable framework for HTTP benchmarking, wrk. Some results. These benchmarks were taken at the default settings (4 worker threads, 16 depth work queue).

    GET request to invalid URL

    These are handled by evhttp itself, so this is the baseline.

    0$  ./wrk -t12 -c15 -d10s http://127.0.0.1:18332/inv
    1  Thread Stats   Avg      Stdev     Max   +/- Stdev
    2    Latency   375.53us   66.86us   6.31ms   93.03%
    3    Req/Sec     2.67k   121.28     3.11k    80.98%
    4  303293 requests in 10.00s, 36.73MB read
    5  Non-2xx or 3xx responses:
    6    404: 303293
    7Requests/sec:  30343.08
    8Transfer/sec:      3.68MB
    

    As expected, we get 404s.

    GET request to /

    These are dispatched to a worker thread. Some more latency is expected. In the worker thread these error out early, as GET is not a valid method for JSON RPC.

    0$ ./wrk -t12 -c15 -d10s http://127.0.0.1:18332/
    1  Thread Stats   Avg      Stdev     Max   +/- Stdev
    2    Latency   472.96us   44.12us   1.28ms   93.31%
    3    Req/Sec     2.14k   127.02     2.33k    82.90%
    4  245025 requests in 10.00s, 41.59MB read
    5  Non-2xx or 3xx responses:
    6    405: 245025
    7Requests/sec:  24514.30
    8Transfer/sec:      4.16MB
    

    As expected: lots of 405 (invalid method for URL) results.

    RPC getgenerate requests

    Post getgenerate requests. This is an extremely cheap RPC call. Script:

    0wrk.method = "POST"
    1wrk.body   = "{\"method\":\"getgenerate\",\"params\":[],\"id\":1}\n"
    2wrk.headers["Content-Type"] = "application/json"
    3wrk.headers["Authorization"] = "Basic XXX"
    
    0$ ./wrk -t12 -c15 -d10s -s getgenerate.lua http://127.0.0.1:18332/
    1  Thread Stats   Avg      Stdev     Max   +/- Stdev
    2    Latency   571.28us   59.77us   2.09ms   91.89%
    3    Req/Sec     1.82k   110.56     2.00k    80.32%
    4  204481 requests in 9.99s, 28.28MB read
    5  Non-2xx or 3xx responses:
    6Requests/sec:  20463.77
    7Transfer/sec:      2.83MB
    

    All responses succesful.

    RPC getinfo requests

    Post getinfo requests. This is a more expensive RPC call.

    0wrk.method = "POST"
    1wrk.body   = "{\"method\":\"getinfo\",\"params\":[],\"id\":1}\n"
    2wrk.headers["Content-Type"] = "application/json"
    3wrk.headers["Authorization"] = "Basic XXX
    
    0$ ./wrk -t12 -c15 -d10s -s getinfo.lua http://127.0.0.1:18332/
    1  Thread Stats   Avg      Stdev     Max   +/- Stdev
    2    Latency     1.69ms  572.87us  15.39ms   95.21%
    3    Req/Sec   626.46     81.57     1.00k    84.26%
    4  71245 requests in 9.99s, 37.17MB read
    5  Non-2xx or 3xx responses:
    6Requests/sec:   7128.48
    7Transfer/sec:      3.72MB
    

    Although the performance goes down, no errors happen. The maximum queue depth is never reached.

    RPC requests w/ invalid authentication

    Post getinfo requests with invalid authentication. This will trigger a 250ms delay, and thus we can trigger worker queue-full conditions with enough threads.

    0wrk.method = "POST"
    1wrk.body   = "{\"method\":\"getinfo\",\"params\":[],\"id\":1}\n"
    2wrk.headers["Content-Type"] = "application/json"
    3wrk.headers["Authorization"] = "Basic YYY
    
    0$ ./wrk -t12 -c15 -d10s -s getinfo_unauth.lua http://127.0.0.1:18332/
    1Thread Stats   Avg      Stdev     Max   +/- Stdev
    2    Latency   751.03ms  323.38us 752.15ms   81.82%
    3    Req/Sec     1.05      0.65     2.00     58.33%
    4  156 requests in 10.00s, 19.80KB read
    5  Non-2xx or 3xx responses:
    6    401: 156
    7Requests/sec:     15.60
    8Transfer/sec:      1.98KB
    

    Increasing the load further, we can exceed the work queue depth:

    0$ ./wrk -t12 -c45 -d10s -s getinfo_unauth.lua http://127.0.0.1:18332/
    1  Thread Stats   Avg      Stdev     Max   +/- Stdev
    2    Latency   204.35ms  462.08ms   1.25s    83.72%
    3    Req/Sec     2.42k     1.53k    6.00k    75.64%
    4  277072 requests in 10.00s, 43.59MB read
    5  Non-2xx or 3xx responses:
    6    401: 156
    7    500: **276916**
    8Requests/sec:  27706.71
    9Transfer/sec:      4.36MB
    

    Looks good. Nothing unexpected, it’s clear the evhttp is not the bottleneck, and the work queue works as expected. Will try this with the old asio-based HTTP server shortly.

  21. laanwj commented at 11:09 am on January 23, 2015: member

    Old http server

    Same steps as above, repeated with the old server as of commit 944c256.

    GET request to invalid URL

    0$  ./wrk -t12 -c15 -d10s http://127.0.0.1:18332/inv
    1  Thread Stats   Avg      Stdev     Max   +/- Stdev
    2    Latency   502.17us   94.60us   2.65ms   74.77%
    3    Req/Sec     1.89k   119.76     2.33k    83.38%
    4  214864 requests in 9.99s, 37.50MB read
    5  Responses:
    6    404: 214864
    7Requests/sec:  21502.28
    8Transfer/sec:      3.75MB
    

    GET request to /

    0$ ./wrk -t12 -c15 -d10s http://127.0.0.1:18332/
    1  Thread Stats   Avg      Stdev     Max   +/- Stdev
    2    Latency   502.58us   92.30us   2.07ms   73.39%
    3    Req/Sec     1.89k   118.97     2.22k    83.36%
    4  214269 requests in 9.99s, 103.40MB read
    5  Responses:
    6    401: 214269
    7Requests/sec:  21447.93
    8Transfer/sec:     10.35MB
    

    RPC getgenerate requests

    0$ ./wrk -t12 -c15 -d10s -s getgenerate.lua http://127.0.0.1:18332/
    1  Thread Stats   Avg      Stdev     Max   +/- Stdev
    2    Latency    93.68us   24.84us   1.27ms   89.23%
    3    Req/Sec    10.06k   616.30    11.78k    64.49%
    4  380211 requests in 10.00s, 78.32MB read
    5  Socket errors: connect 0, read 0, write 0, timeout 39
    6  Responses:
    7    200: 380211
    8Requests/sec:  38006.24
    9Transfer/sec:      7.83MB
    

    RPC getinfo requests

     0$ ./wrk -t12 -c15 -d10s -s getinfo.lua http://127.0.0.1:18332/
     1Running 10s test @ http://127.0.0.1:18332/
     2  12 threads and 15 connections
     3  Thread Stats   Avg      Stdev     Max   +/- Stdev
     4    Latency   544.15us  187.74us   4.03ms   79.81%
     5    Req/Sec     1.89k   188.50     2.90k    63.09%
     6  71480 requests in 10.00s, 42.13MB read
     7  Socket errors: connect 0, read 0, write 0, timeout 39
     8  Responses:
     9    200: 71480
    10Requests/sec:   7144.83
    11Transfer/sec:      4.21MB
    

    RPC requests w/ invalid authentication

    0$ ./wrk -t12 -c15 -d10s -s getinfo_unauth.lua http://127.0.0.1:18332/
    1  12 threads and 15 connections
    2  Thread Stats   Avg      Stdev     Max   +/- Stdev
    3    Latency   751.75ms  161.83us 752.03ms   73.91%
    4    Req/Sec     0.94      0.66     2.00     56.52%
    5  156 requests in 10.00s, 77.09KB read
    6  Responses:
    7    401: 156
    8Requests/sec:     15.60
    9Transfer/sec:      7.71KB
    
     0$ ./wrk -t12 -c45 -d10s -s getinfo_unauth.lua http://127.0.0.1:18332/
     1Running 10s test @ http://127.0.0.1:18332/
     2  12 threads and 45 connections
     3  Thread Stats   Avg      Stdev     Max   +/- Stdev
     4    Latency     2.19s   309.47ms   2.26s    95.24%
     5    Req/Sec     0.90      1.56     6.00     87.30%
     6  156 requests in 10.01s, 77.09KB read
     7  Socket errors: connect 0, read 0, write 0, timeout 24
     8  Responses:
     9    401: 156
    10Requests/sec:     15.59
    11Transfer/sec:      7.70KB
    

    Summary

    0Testcase                      Old (r/s) New (r/s)
    1================================================
    2GET request to invalid URL    21502     30343
    3GET request to /              21448     24514
    4RPC getgenerate requests      38006*    20463
    5RPC getinfo requests           7144*     7128
    6RPC requests w/ invalid auth     16*       16
    7
    8* with timeout errors.
    
    • The new server wins on the base http requests front. Even to /, which are dispatched to the worker threads.
    • The old server is currently much faster with getgenerate requests. I am curious why. Also: how can getgenerate requests be faster than simple GET /’s? I suspect a measurement error that has to do with the timeouts (edit: this is because the old server disconnects after errors, so those don’t utilize keepalive).
    • getinfo requests are, as expected, ~ the same speed. Processing overhead dominates.
    • Same for unauthenticated requests. The requests take 250ms to handle so the number seen is exactly as expected.

    Take into account that it’s not entirely a fair comparison: the new http server can service a large number of connections at the same time, whereas the old server can have a maximum of four (or, -rpcthreads) and starves additional.connections (thanks to keep-alive). There is some overhead in multiplexing that is absent in a one-to-one scenario. This is also a purely a local benchmark. I/O bandwidth doesn’t come into it, and the benchmark tool competes for the same CPU as the server.

  22. Diapolo commented at 11:20 am on January 23, 2015: none
    Why are we dropping SSL support for RPC?
  23. in src/bitcoin-cli.cpp: in 6f1259a8cf outdated
    162+    if (!base)
    163+        throw runtime_error("cannot create event_base");
    164+
    165+    // Synchronously look up hostname
    166+    struct evhttp_connection *evcon = evhttp_connection_base_new(base, NULL, host.c_str(), port); // XXX RAII
    167+    if (evcon == NULL)
    


    Diapolo commented at 11:25 am on January 23, 2015:
    Just to understand why are you explicitly using == NULL here and for base above just !var?

    laanwj commented at 11:32 am on January 23, 2015:
    @diapolo == NULL and !evcon are equivalent in this case so purely a matter to taste. The pull clearly states [PoC] as for proof-of-concept, please don’t report all these minor things but only critical or high-level issues,
  24. in src/bitcoin-cli.cpp: in 6f1259a8cf outdated
    174+        throw runtime_error("create http request failed");
    175+
    176+    struct evkeyvalq *output_headers = evhttp_request_get_output_headers(req);
    177+    assert(output_headers);
    178+    evhttp_add_header(output_headers, "Host", host.c_str());
    179+    evhttp_add_header(output_headers, "Connection", "close");
    


    Diapolo commented at 11:26 am on January 23, 2015:
    Perhaps use some constant or struct for such strings and use a speaking name in the code?
  25. laanwj commented at 11:29 am on January 23, 2015: member

    @diapolo Re: this pull: because I don’t feel like implementing it.

    In the longer term I (and many others) would also argue it is better to drop it:

    • Makes it possible to drop OpenSSL dependency from bitcoind completely, after secp256k1 verification is used. Simplifies the code overall.
    • I’ve never heard of anyone using SSL with RPC. It may also be better not to know; after all, this invites opening up the RPC port reachable to the internet or other untrusted networks. The limited amount of configurability of rpcssl almost guarantees this will be an insecure setup.
    • If you really need to use RPC remotely over an untrusted network, it is easy enough to set up stunnel or a SSH tunnel, or even e.g. OpenVPN, with the full power of those tools available you may have a chance of doing so securely.
  26. jonasschnelli commented at 12:05 pm on January 23, 2015: contributor
    I agree with @laanwj. SSL support could lead somebody to believe it’s “save”. IMO it currently a bad idea to expose bitcoind RPC to a public accessible area. Nevertheless, If one like to do this, he could still do a apache ssl enable reverse proxy to bitcoind’s RPC.
  27. gmaxwell commented at 2:27 pm on January 23, 2015: contributor
    Also SSL in the RPC massively increases the attack surface we have exposed (if you also expose it to the outside world) and we’ve had to push updates previously on account of it– even though we believe its a feature virtually no one uses. As mentioned it can be better accomplished via stunnel (or any of several other tools)
  28. gavinandresen commented at 4:13 pm on January 23, 2015: contributor
    I agree, it was a mistake to add SSL support to the RPC (mea culpa– I wrote the original version of that code).
  29. jonasschnelli commented at 3:07 pm on February 4, 2015: contributor
    concept ACK. needs rebase.
  30. laanwj force-pushed on Feb 10, 2015
  31. arnuschky commented at 7:47 am on February 21, 2015: contributor

    If I might chime in on the SSL discussion: I am not sure whether it should be removed. In terms of integrating bitcoin core into a bigger system, I would expect that it offers SSL in order to protect credentials. This has not much to do with publicly available access, I habitually do this for any service that sends credentials over any network (and I think that this is quite advisable for a bitcoin wallet). Of course, setting up an encrypted tunnel might be an option, but it’s cumbersome and error-prone.

    I would maybe propose a different route: leave the core’s RPC functions unauthenticated, and add authentication and SSL to the wallet.

  32. luke-jr commented at 7:58 am on February 21, 2015: member
    @arnuschky There is no secure way to expose RPC to an untrusted network, whether you use the wallet or not. Although perhaps with a libevent-based server maybe that changes…
  33. jgarzik commented at 3:46 pm on February 21, 2015: contributor
    @arnuschky The HTTP REST interface provides unauthenticated access to public blockchain data: https://github.com/bitcoin/bitcoin/blob/master/doc/REST-interface.md
  34. arnuschky commented at 9:42 am on February 22, 2015: contributor

    @luke-jr What are you referring to?

    Sorry, it seems that I haven’t made myself clear. I agree that it is advisable to get rid of the openssl dependency in bitcoin core. However, losing the SSL feature for wallet RPC is a bit annoying from the system integrator’s point of view - it just makes deployment more complex. I don’t agree with the black-and-white consideration of a “trusted” vs “untrusted” network. Even in a so-called “trusted” network, there might be bad actors (hoster, employees, compromised servers etc). Thus, running a wallet RPC access without encryption in a “trusted” network is a bad idea, IMO.

    Of course, a tunnel or a reverse proxy is an option, but increases complexity of deployment and is error-prone. Furthermore, if the default is insecure people just run with it. I would even argue the opposite: disable unencrypted RPC (except on 127.0.0.1) - this is also what btcd does, and I find it quite sensible.

    Maybe encrypted RPC could be broken out into the wallet with the upcoming modularization of the wallet code? On the long run, I find the design of btcd or for example the cryptonote coins quite sensible: a unauthenticated RPC for public blockchain data (what is now offered by the REST interface), and on a different port an authenticated, encrypted RPC for wallet access.

    PS: sorry if I am taking this a bit OT here, maybe this should be discussed somewhere else

  35. sipa commented at 12:26 pm on March 3, 2015: member
    Don’t let this go unmaintained :)
  36. jonasschnelli commented at 8:37 am on March 5, 2015: contributor
    Needs rebase. I’m willing to help (testing, fixing, etc.).
  37. jonasschnelli commented at 2:15 pm on March 10, 2015: contributor
    I think this also solves #5526
  38. laanwj added this to the milestone 0.11.0 on Mar 12, 2015
  39. laanwj force-pushed on Mar 12, 2015
  40. laanwj removed this from the milestone 0.11.0 on May 1, 2015
  41. laanwj added this to the milestone 0.12.0 on May 1, 2015
  42. jgarzik commented at 7:17 pm on May 2, 2015: contributor
    Continued +1 on this.
  43. ajweiss commented at 7:51 pm on May 5, 2015: contributor

    This was straightforward to rebase onto master. While playing with it, this message got printed to the console once:

    0[warn] event_active: event has no event_base set.
    

    I haven’t been able to reproduce it, but I’m banging on it now.

  44. in src/init.cpp: in e8123ee54c outdated
    535@@ -537,6 +536,21 @@ bool InitSanityCheck(void)
    536     return true;
    537 }
    538 
    539+bool AppInitServers(boost::thread_group& threadGroup)
    540+{
    541+    RPCServer::OnStopped(&OnRPCStopped);
    542+    RPCServer::OnPreCommand(&OnRPCPreCommand);
    543+    if (!StartHTTPServer(threadGroup))
    


    ajweiss commented at 9:35 pm on May 5, 2015:
    I think there’s a complicated race condition here. StartHTTPServer() will bind the HTTP server and start listening, but if requests come in before the RPC/REST servers have called RegisterHTTPHandler(), their requests will result in 404s. Moreover, if the request comes in before eventBase is set in StartHTTPServer(), HTTPRequest::WriteReply() will attempt to queue an HTTPEvent with an invalid event_base, which results in the server dropping the request on the floor and not replying at all. (with an ugly libevent warning appearing on the console) You can reproduce this by adding a MilliSleep(5000) to the end of StartHTTPServer before it assigns eventBase. With this in place, regression tests will fail. (It’s pretty sporadic without the sleep) You can catch the null eventBase by adding an assertion to HTTPRequest::WriteReply(), where you should see nStatus being set to 404. Hope this is helpful!

    laanwj commented at 8:35 am on May 7, 2015:
    Thanks a lot for testing this! I’ll try to track it down
  45. cheako commented at 0:05 am on May 15, 2015: none

    I attempted to rebase master from the 2015_01_evhttpd branch and ran into a lot of merge conflicts that seemed to have nothing to do with libevent or http. I’m not an expert in git, but does this indicate a serious problem?

    I’ve attempted with multiple -s options, but when completed I perform a diff master and I expected to see changes related to evhttpd but instead everything else seems to change.

  46. jgarzik commented at 0:36 am on May 15, 2015: contributor
    @cheako Sometimes the easiest thing to do is cherry pick each commit onto a fresh tree, rather than trying a git merge.
  47. laanwj commented at 10:45 am on May 15, 2015: member
    @cheako This has become non-trivial to rebase, foremost due to a few REST changes and code moves. I promise to get back to this after the 0.11.0 release bustle and #6121 is merged.
  48. ajweiss commented at 6:05 pm on May 15, 2015: contributor

    Here’s a branch that was rebased as of about 10 days ago if you want to play:

    https://github.com/ajweiss/bitcoin/tree/ajw_libevent_rebase

  49. ajweiss commented at 7:07 pm on May 15, 2015: contributor
    Also, on top of that rebase, here’s a quick way to reproduce the race I found. Just use the listtransactions.py RPC test… ajweiss/bitcoin@0c008eb017824a21bc711e9e0d1cf52bff3a7b85
  50. jgarzik commented at 6:54 pm on July 23, 2015: contributor
    Continue to want this merged…
  51. laanwj force-pushed on Aug 27, 2015
  52. laanwj commented at 6:38 pm on August 27, 2015: member
    Finally got around to rebasing this. It was somewhat tricky to rebase, especially over the univalue switch, so I hope I got everything right (the tests pass at least locally…).
  53. dcousens commented at 9:54 pm on August 27, 2015: contributor
    concept ACK
  54. gmaxwell commented at 11:37 pm on August 27, 2015: contributor
    Thanks for the work rebasing this.
  55. jgarzik commented at 0:34 am on August 28, 2015: contributor
    re-ACK + “it works” test
  56. laanwj force-pushed on Aug 28, 2015
  57. laanwj renamed this:
    [PoC] libevent-based http server
    libevent-based http server
    on Aug 28, 2015
  58. laanwj commented at 3:24 pm on August 28, 2015: member

    OK, fixed all the issues in the opening post. Also added message to the release notes about removing SSL support (as well as how to use stunnel instead). Removed the [Poc] tag.

    Not sure yet why Travis is failing the RPC tests on Windows (both 32 and 64).

  59. jonasschnelli commented at 3:39 pm on August 28, 2015: contributor
    Nice! Started testing now…
  60. laanwj force-pushed on Aug 28, 2015
  61. cheako commented at 8:53 pm on August 28, 2015: none

    Looks like this: http://stackoverflow.com/a/19017158

    Perhaps the URL is getting a “\n\r” on Windows but a “\n” elsewhere and the trim is not getting both bytes on Windows.

  62. theuni commented at 10:35 pm on August 28, 2015: member

    I tracked down a few of the problems:

    bitcoin-cli’s -rpcwait is broken due to the libevent connection model. CallRPC() doesn’t return when the connection fails, so the loop doesn’t continue, and a new connection isn’t tried. Instead it just sits there until the connection times out. This one’s easy to test, simply fire up a ./bitcoin-cli -rpcwait getblockcount before starting bitcoind, then run bitcoind and notice that it doesn’t return.

    The “fix” is to set the timeout to “1” here. https://github.com/bitcoin/bitcoin/pull/5677/files#diff-321303fddcf725df060981d626a05df9R152. I’m sure there’s a smarter way to handle it via callbacks. There may be some other interaction with retries, but this didn’t seem to make any different for me:

    0evhttp_connection_set_retries(evcon, 0);
    

    After that, the next problem is that the server seems to hold the connection in TIME_WAIT after bitcoin-cli has exited. That causes the server to fail to listen on the next invocation and it fails. I tried the usual trick (SO_REUSEADDR) on the client and server sides, but it didn’t help. I’m not sure where to go from there.

  63. theuni commented at 0:03 am on August 29, 2015: member

    Ok, I was able to get to the bottom of this and get the tests passing. I have cobbled together a few hacks with the understanding that they’re the wrong things to do, but hopefully @laanwj will quickly see how to fix things up properly.

    The issues outlined above are indeed the two issues. But to add to that, I think I know what’s happening with the SO_REUSEADDR issue.

    I was able to “fix” that by hacking libevent itself (see patches below), but that points out the real problem. See this comment from libevent:

    0#ifndef WIN32
    1        /* REUSEADDR on Unix means, "don't hang on to this address after the
    2         * listener is closed."  On Windows, though, it means "don't keep other
    3         * processes from binding to this address while we're using it. */
    

    So I think what we’re seeing is that SO_REUSEADDR works differently at runtime on native Windows as opposed to running it in Wine. Which makes sense. In Linux+Wine, it’ll still be interacting with native sockets, thus it needs the option set even if it wouldn’t be required in Windows. That’s just a guess, but I think it’s a reasonable one.

    I’m not sure what can be done about that, though. Whatever we decide, though, we need to re-address #6590, as that might’ve done more harm than good.

    Here’s what’s needed to make the tests happy (note that depends need to be rebuilt):

     0commit 0aec4863b65b1e4f96260dd1b58d0cc8319d9b4e
     1Author: Cory Fields <cory-nospam-@coryfields.com>
     2Date:   Fri Aug 28 19:45:12 2015 -0400
     3
     4    event-http: Fix (hack) Windows tests.
     5
     6diff --git a/depends/packages/libevent.mk b/depends/packages/libevent.mk
     7index 524a80f..94741ac 100644
     8--- a/depends/packages/libevent.mk
     9+++ b/depends/packages/libevent.mk
    10@@ -3,6 +3,11 @@ $(package)_version=2.0.22
    11 $(package)_download_path=https://sourceforge.net/projects/levent/files/libevent/libevent-2.0
    12 $(package)_file_name=$(package)-$($(package)_version)-stable.tar.gz
    13 $(package)_sha256_hash=71c2c49f0adadacfdbe6332a372c38cf9c8b7895bb73dabeaa53cdcc1d4e1fa3
    14+$(package)_patches=reuseaddr.patch
    15+
    16+define $(package)_preprocess_cmds
    17+  patch -p1 < $($(package)_patch_dir)/reuseaddr.patch
    18+endef
    19
    20 define $(package)_set_vars
    21   $(package)_config_opts=--disable-shared --disable-openssl --disable-libevent-regress
    22diff --git a/depends/patches/libevent/reuseaddr.patch b/depends/patches/libevent/reuseaddr.patch
    23new file mode 100644
    24index 0000000..24a6bb7
    25--- /dev/null
    26+++ b/depends/patches/libevent/reuseaddr.patch
    27@@ -0,0 +1,21 @@
    28+--- old/evutil.c   2015-08-28 19:26:23.488765923 -0400
    29++++ new/evutil.c   2015-08-28 19:27:41.392767019 -0400
    30+@@ -321,15 +321,16 @@
    31+ int
    32+ evutil_make_listen_socket_reuseable(evutil_socket_t sock)
    33+ {
    34+-#ifndef WIN32
    35+   int one = 1;
    36++#ifndef WIN32
    37+   /* REUSEADDR on Unix means, "don't hang on to this address after the
    38+    * listener is closed."  On Windows, though, it means "don't keep other
    39+    * processes from binding to this address while we're using it. */
    40+   return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, (void*) &one,
    41+       (ev_socklen_t)sizeof(one));
    42+ #else
    43+-  return 0;
    44++  return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, (const char*) &one,
    45++      (ev_socklen_t)sizeof(one));
    46+ #endif
    47+ }
    48+ 
    49diff --git a/src/bitcoin-cli.cpp b/src/bitcoin-cli.cpp
    50index 82171a6..3ec3d4f 100644
    51--- a/src/bitcoin-cli.cpp
    52+++ b/src/bitcoin-cli.cpp
    53@@ -149,7 +149,7 @@ UniValue CallRPC(const string& strMethod, const UniValue& params)
    54     struct evhttp_connection *evcon = evhttp_connection_base_new(base, NULL, host.c_str(), port); // TODO RAII
    55     if (evcon == NULL)
    56         throw runtime_error("create connection failed");
    57-    evhttp_connection_set_timeout(evcon, GetArg("-rpctimeout", 30));
    58+    evhttp_connection_set_timeout(evcon, GetArg("-rpctimeout", 1));
    59
    60     HTTPReply response;
    61     struct evhttp_request *req = evhttp_request_new(http_request_done, (void*)&response); // TODO RAII
    
  64. theuni commented at 0:11 am on August 29, 2015: member

    (sorry for spamming)

    Confirmed the theory above: https://bugs.winehq.org/show_bug.cgi?id=26031

    Looks like the real fix will be to make sure that sockets are properly torn down so that we’re not relying on SO_REUSEADDR at all. I tried everything I could think of, though, and couldn’t get the socket suitably unbound. Maybe @laanwj will have better luck.

  65. ruimarinho commented at 1:09 am on August 29, 2015: none
    This might be irrelevant at this point, but was libuv considered for multi platform support?
  66. dcousens commented at 1:21 am on August 29, 2015: contributor

    This might be irrelevant at this point, but was libuv considered for multi platform support?

    Having used both libraries extensively in commercial applications (using their native API), I +1 support for libuv over libevent also. However, this PR is already ready. libuv is also more frequently updated, and arguably is deployed many thousands times more often in production (node/io.js), and is therefore likely more battle tested.

    Fortunately, swapping between them is a straight forward enough task.

  67. laanwj commented at 8:11 am on August 29, 2015: member
    @cfields Thanks a lot for looking into this. I don’t like the solution of setting the timeout to 1, but it’s good to have a way to make the tests pass :) @ruimarinho This has been floating around for months. I’d like to merge this when the remaining problems are solved. But if you can do so quickly, feel free to port this to libuv then we can compare performance and such…
  68. gmaxwell commented at 8:27 am on August 29, 2015: contributor

    +1 on libevent; having also used it in large highly critical commercial applications it is very widely deployed and reliable.

    Libuv is also very similar, so if there was a need to change in the future it could just be done “no big deal.”. Comparison would be neat, I agree, but not a reason to disrupt progress here.

  69. jonasschnelli commented at 8:28 am on August 29, 2015: contributor
    +1 on libevent. No need to disturb this PR in that stage of progress.
  70. jonasschnelli commented at 8:53 am on August 29, 2015: contributor

    Performance:

    Performance looks good, not much of a difference. Built over gitian for testing (https://builds.jonasschnelli.ch/pulls/5677/). Had much slower results with a local build (osx, –enable-debug, libevent 2.0.22, dynamic linked).

    Stats, no very representative, did only once on a non-calm system. Regtest, 100 blocks chain, OSX 10.10, 2.6 GHz Intel Core i7: apache-bench comparison: 1000 getinfo requests: (slightly slower) https://gist.github.com/jonasschnelli/2b511c62c1d9ed901a9b

    apache-bench comparison: 10'000 getinfo requests: (faster ~1.4*) https://gist.github.com/jonasschnelli/d86c552d99dd2d1672d3

    Memory:

    The memory behavior looks very similar if i compare current master with the PR. No leeks detected. 10'000 getchaintips request looks very similar in terms of CPU load and mem consumption.

    (top chart is current master, bottom chart is libevent)

    Unit Tests

    Successfully ran on linux and osx multiple times.

  71. jonasschnelli commented at 8:57 am on August 29, 2015: contributor
    Do i understand this correct, that this PR would no longer support non-http-keep-alive request/responses? Do we break existing things if we no longer allow forcing the disabling of http-keep-alive? Maybe some systems request with http keep alive "Connection: keep-alive" but expect that the connection closes after the first request because they have disabled keep-alive over the -rpckeepalive arg. Anyways the connected system should take care of that. Maybe just for the release notes.
  72. in src/httprpc.cpp: in fba95dbe0d outdated
    102+        LogPrintf("ThreadRPCServer incorrect password attempt from %s\n", req->GetPeer().ToString());
    103+
    104+        /* Deter brute-forcing
    105+           If this results in a DoS the user really
    106+           shouldn't have their RPC port exposed. */
    107+        MilliSleep(250);
    


    jonasschnelli commented at 9:13 am on August 29, 2015:
    idea: sleep time=*2 per attempt unless a successful auth happened? (although could be done in another PR)

    gmaxwell commented at 9:15 am on August 29, 2015:
    Since one can evade that just by making new connections (in parallel) is it really a gain?

    jonasschnelli commented at 9:19 am on August 29, 2015:
    Hmm… the sleep time then could be global (thread safe static uint64_t)?

    gmaxwell commented at 9:33 am on August 29, 2015:
    perhaps, though it turns a auth token probe into a DOS attack then, enh?. That might be preferable though if your token is at all bruteforcable you are in bad shape.

    laanwj commented at 9:04 am on August 31, 2015:
    I aimed to keep the behavior here exactly the same. Let’s leave improvements for a latter pull. (there are various practical issues with doing it this way, e.g. sleeping a worker thread to just pause a bit is wasteful - it could be handled with a timed event callback. Then again, this is not supposed to ever trigger as people should not leave the RPC port open to the outside world, so don’t spend too much time or code on it.)
  73. theuni commented at 10:13 am on August 29, 2015: member
    @laanwj In case it wasn’t clear in all my rambling above, the rpcwait issue will need some sort of fix, not just for the sake of testing. I certainly didn’t intend to imply that timeout=1 was any real solution though :) @jonasschnelli great testing!
  74. in src/bitcoin-cli.cpp: in fba95dbe0d outdated
    143+    if (buf)
    144+    {
    145+        size_t size = evbuffer_get_length(buf);
    146+        const char *data = (const char*)evbuffer_pullup(buf, size);
    147+        if (data)
    148+            reply->body = std::string(data, size);
    


    theuni commented at 10:27 am on August 29, 2015:
    Should evbuffer_drain here for good measure.

    laanwj commented at 11:52 am on August 31, 2015:
    Done. Good catch.
  75. laanwj commented at 9:07 am on August 31, 2015: member

    @jonasschnelli Thanks for testing the memory behavior and performance!

    Do i understand this correct, that this PR would no longer support non-http-keep-alive request/responses?

    Libevent’s http server absolutely supports non-http-keepalive Connection:close, and even HTTP 1.0. What it doesn’t support is disabling keepalive globally (as it is a mandatory feature of HTTP 1.1.) I don’t think this is necessary anymore - the -rpckeepalive=0 workaround was there to work around broken RPC behavior, as well as lack of any kind of timeout. (hmm - it may be possible to force HTTP 1.0 for connections, which sort of amounts to the same thing) (no, no way that I see)

  76. laanwj force-pushed on Aug 31, 2015
  77. laanwj force-pushed on Aug 31, 2015
  78. laanwj commented at 3:25 pm on August 31, 2015: member

    CallRPC() doesn’t return when the connection fails, so the loop doesn’t continue, and a new connection isn’t tried.

    Missed this observation @theuni. Yes, that should definitely be solved. I wonder why this only trips up on Wine.E.g. on Linux this returns immediately without hang, if there is no server bound:

    0$ src/bitcoin-cli -rpcpassword='a' getinfo
    1error: couldn't connect to server
    
  79. theuni commented at 4:14 pm on August 31, 2015: member

    @laanwj I wondered the same thing. The behavior seemed odd to me, I assumed it was just a timing difference. I’m glad to know it’s just a Wine/Windows thing, but now I’m curious to know if it happens in native Win as well. I’ll give it a shot on osx and see what happens.

    Probably worth taking a look at the source of evhttp_request_new to see what flags it sets. I also found it odd that a different callback isn’t used for a connection failure, that makes it tough to handle.

  80. laanwj commented at 9:22 am on September 1, 2015: member
    Looks like we’re chasing emulator rabbits. I’ve cross-compiled (w/ depends) to win64 locally and ran the RPC tests (just as travis does) against this code, through WINE. No issues. The wine version is wine-1.7.50. Maybe that makes the difference, and Travis runs an old broken version? If so I’d propose setting rpctimeout=1 just for travis-win32/64 to make the test pass until they upgrade to a newer wine. @jonasschnelli is going to run the tests on a real windows VM. Those results will be more useful.
  81. jonasschnelli commented at 1:46 pm on September 1, 2015: contributor

    Successfully ran rcp-test.sh on “Windows 8” (see screen below): Disabled p2p-fullbocktest.py because didn’t want to install python/dbm. Also had to disable one test in rest.py during getutxo binary request. The python httplib did somehow not sent a proper binary request. Tracked down and seems to be unrelated to this PR (also happens on current master).

  82. laanwj force-pushed on Sep 2, 2015
  83. laanwj force-pushed on Sep 2, 2015
  84. laanwj force-pushed on Sep 2, 2015
  85. doc: remove documentation for rpcssl 51fcfc022c
  86. qa: Remove -rpckeepalive tests from httpbasics
    This option was a temporary workaround, and is no longer necessary
    with the new web server.
    8f9301cdaa
  87. Remove rpc_boostasiotocnetaddr test
    Dropping all use of boost::asio.
    6a21dd598c
  88. libevent: add depends a9af234c1f
  89. build: build-system changes for libevent 3140ef9249
  90. tests: fix qt payment test
    Now that boost no longer automatically initializes openssl, we have to
    do it ourselves.
    6e996d39da
  91. tests: GET requests cannot have request body, use POST in rest.py
    Sending a request body with GET request is not valid in HTTP spec, and
    not compatible with evhttpd.
    ee2a42b447
  92. laanwj force-pushed on Sep 2, 2015
  93. evhttpd implementation
    - *Replace usage of boost::asio with [libevent2](http://libevent.org/)*.
    boost::asio is not part of C++11, so unlike other boost there is no
    forwards-compatibility reason to stick with it. Together with #4738 (convert
    json_spirit to UniValue), this rids Bitcoin Core of the worst offenders with
    regard to compile-time slowness.
    
    - *Replace spit-and-duct-tape http server with evhttp*. Front-end http handling
    is handled by libevent, a work queue (with configurable depth and parallelism)
    is used to handle application requests.
    
    - *Wrap HTTP request in C++ class*; this makes the application code mostly
    HTTP-server-neutral
    
    - *Refactor RPC to move all http-specific code to a separate file*.
    Theoreticaly this can allow building without HTTP server but with another RPC
    backend, e.g. Qt's debug console (currently not implemented) or future RPC
    mechanisms people may want to use.
    
    - *HTTP dispatch mechanism*; services (e.g., RPC, REST) register which URL
    paths they want to handle.
    
    By using a proven, high-performance asynchronous networking library (also used
    by Tor) and HTTP server, problems such as #5674, #5655, #344 should be avoided.
    
    What works? bitcoind, bitcoin-cli, bitcoin-qt. Unit tests and RPC/REST tests
    pass. The aim for now is everything but SSL support.
    
    Configuration options:
    
    - `-rpcthreads`: repurposed as "number of  work handler threads". Still
    defaults to 4.
    
    - `-rpcworkqueue`: maximum depth of work queue. When this is reached, new
    requests will return a 500 Internal Error.
    
    - `-rpctimeout`: inactivity time, in seconds, after which to disconnect a
    client.
    
    - `-debug=http`: low-level http activity logging
    40b556d374
  94. doc: mention SSL support dropped for RPC in release notes 57d85d9bee
  95. Implement RPCTimerHandler for Qt RPC console
    Implement RPCTimerHandler for Qt RPC console, so that `walletpassphrase`
    works with GUI and `-server=0`.
    
    Also simplify HTTPEvent-related code by using boost::function directly.
    be33f3f50b
  96. Document options for new HTTP/RPC server in --help 6d2bc22146
  97. Fix race condition between starting HTTP server thread and setting EventBase()
    Split StartHTTPServer into InitHTTPServer and StartHTTPServer to give
    clients a window to register their handlers without race conditions.
    
    Thanks @ajweiss for figuring this out.
    3a174cd400
  98. libevent: Windows reuseaddr workaround in depends
    Make it possible to reuse sockets.
    This is necessary to make the RPC tests work in WINE.
    4be0b082b9
  99. Move windows socket init to utility function 26c9b83677
  100. laanwj force-pushed on Sep 3, 2015
  101. doc: update deps in build-unix.md after libevent
    Add libevent, change usage of libssl from "secure communication" to
    "crypto" that's more accurate after RPC SSL support removed.
    1e700c9b60
  102. Revert "rpc-tests: re-enable rpc-tests for Windows"
    This reverts commit bd30c3dced21fca869a14c75081f15195762afe1.
    
    Disable windows RPC tests for now. These should be re-enabled once a
    suitable Wine version is used on Travis.
    d528025517
  103. laanwj merged this on Sep 4, 2015
  104. laanwj closed this on Sep 4, 2015

  105. laanwj referenced this in commit 9aa90994ee on Sep 4, 2015
  106. jonasschnelli commented at 11:24 am on September 4, 2015: contributor
    Post merge tested ACK
  107. paveljanik commented at 1:20 pm on September 4, 2015: contributor

    Build notes should be extended for at least OS X/brew. I use macports here and:

    0port install libevent
    

    was needed to get it compiled.

  108. jonasschnelli commented at 1:27 pm on September 4, 2015: contributor
    @paveljanik: just added libevent mentioning for osx in #6635
  109. jgarzik commented at 1:32 pm on September 4, 2015: contributor
    post merge tested re-ACK
  110. luke-jr referenced this in commit 04d9d8a2c4 on Jan 10, 2016
  111. luke-jr referenced this in commit 2fde236d23 on Jan 10, 2016
  112. zkbot referenced this in commit b0afc4ba1d on Mar 22, 2017
  113. zkbot referenced this in commit f9f48667be on Mar 25, 2017
  114. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me