[WIP] RPC: Add Method To Retrieve HTTP Server Metrics #14202

pull etscrivner wants to merge 3 commits into bitcoin:master from etscrivner:add-rpcserver-stats-rpc-interface changing 4 files +72 −1
  1. etscrivner commented at 2:53 AM on September 12, 2018: contributor

    This is a work in progress, all comments and review are welcome and appreciated.

    TL;DR: We add a new network RPC call, getnetworkrpcinfo, that returns metrics about JSON RPC HTTP server.

    INTRODUCTION

    As part of constructing Service Level Agreements (SLAs) around the uptime of Bitcoin nodes, it is useful to measure such things as requests processed per second or average request processing time. While the getnetworkinfo RPC call provides a lot information and many useful metrics about the P2P layer of Bitcoin, it does not provide any metrics about the JSON RPC layer. Since many services interact with Bitcoin almost entirely through its JSON RPC interface, it would be useful to collect and provide metrics about this interface.

    OVERVIEW OF CHANGES

    This PR contains two major changes: Collecting HTTP server metrics and Adding a new RPC method.

    Collect HTTP Server Metrics (httpserver.h, httpserver.cpp)

    This change is the first part of adding a new RPC call to report stats from the HTTP server. It modifies the HTTP server functionality to collect and report the gathered statistics.

    httpserver.h:

    • Add private member variable receivedAtTimeMicros to HTTPRequest to record the time at which the request was started.
    • Add data-type, RPCServerStats, that holds all gathered HTTP server stats.
    • Add function, GetRPCServerStats, that returns the current RPC server stats via the provider pointer.

    httpserver.cpp:

    • Add static global g_rpcserverstats to hold HTTP server stats across all requests.
    • Add static global g_rpcserverstats_mutex to serialize writes to g_rpcserverstats.
    • Modify several functions to collect the RPC server stats.
    • Implement GetRPCServerStats.

    Add getnetworkrpcinfo RPC Method (rpc/net.cpp)

    Add a new RPC command, getnetworkrpcinfo, that returns an object containing various state information about the HTTP server providing a JSON RPC interface. This information can be used to derive information such as requests per second and average request processing time.

    The information is provided in a raw form leaving the clients to derive such useful metrics, this is intentional and allows this RPC call to avoid issues around floating point accuracy and floating point error accumulation. Instead, these issues are left to the consumers of the response object to handle in whatever fashion they like.

  2. fanquake added the label RPC/REST/ZMQ on Sep 12, 2018
  3. etscrivner force-pushed on Sep 12, 2018
  4. etscrivner force-pushed on Sep 12, 2018
  5. MarcoFalke commented at 3:15 AM on September 12, 2018: member

    Concept ACK. Though, wouldn't it be easier to record these metrics outside of Bitcoin Core? I.e. in the "middle" between the rpc endpoint and rpc consumer. The outside approach seems more flexible to me, since it could collect the metrics per-call or overall or per day ...

  6. etscrivner force-pushed on Sep 12, 2018
  7. etscrivner commented at 3:39 AM on September 12, 2018: contributor

    @MarcoFalke Thank you for the quick feedback!

    So, in the configuration I have in mind here we have a set of nodes sitting behind a load-balancer. The desire is to gather info about request volume and latency from the nodes themselves as part of a health check, and to potentially use this information to scale the cluster up or down. Nodes themselves are ephemeral and the entire cluster may be redeployed multiple times per day.

    AWS (as far as I know) does not provide a way to query comparable information in realtime from an ELB/NLB/ALB, but instead allows you to export logs for later analysis. So getting this information in real-time from the load balancer isn't possible.

    The remaining option is to introduce an additional HTTP proxy between the HTTP server provided by Bitcoin and the incoming requests for the sole purpose of collecting this information, but this seems wasteful since Bitcoin itself is a server and could provide the required information with lower latency and higher quality. Since comparable information is already provided about the P2P services provided by Bitcoin (getnettotals), it seems to make sense to provide similar information about the HTTP services.

  8. etscrivner commented at 3:41 AM on September 12, 2018: contributor

    As an aside, I think a better name for this is probably getrpctotals or something similar. Could possibly be merged with getnettotals as well.

  9. etscrivner force-pushed on Sep 12, 2018
  10. DrahtBot commented at 6:45 AM on September 12, 2018: member

    <!--e57a25ab6845829454e8d69fc972939a-->Note to reviewers: This pull request conflicts with the following ones:

    • #13152 ([rpc] Add getnodeaddresses RPC command by chris-belcher)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  11. in src/httpserver.cpp:704 in dfb1ba37cf outdated
     696 | @@ -677,3 +697,8 @@ std::string urlDecode(const std::string &urlEncoded) {
     697 |      }
     698 |      return res;
     699 |  }
     700 | +
     701 | +void GetRPCServerStats(RPCServerStats* rpc_server_stats) {
     702 | +    assert(rpc_server_stats != nullptr);
     703 | +    memcpy(rpc_server_stats, &g_rpcserverstats, sizeof(RPCServerStats));
    


    promag commented at 7:12 AM on September 12, 2018:

    Needs locking?


    etscrivner commented at 12:46 PM on September 12, 2018:

    Thanks!

  12. promag commented at 7:20 AM on September 12, 2018: member

    I think this is not very flexible. We could just send the events begin/end to somewhere so these can be used as a source to some analytics and monitoring tool.

  13. etscrivner commented at 12:46 PM on September 12, 2018: contributor

    @promag This works and could be a much lower maintenance burden. I assume you mean using boost::signals2 to create request start/end signals or possibly pushing zeromq notifications? Any recommendations on approach would be appreciated

  14. etscrivner force-pushed on Sep 12, 2018
  15. etscrivner force-pushed on Sep 12, 2018
  16. etscrivner force-pushed on Sep 12, 2018
  17. promag commented at 8:27 PM on September 12, 2018: member

    I assume you mean using boost::signals2 to create request start/end signals or possibly pushing zeromq notifications? @etscrivner actually I was thinking in something simpler. If we log req_id rpc_method start_time ... and req_id rpc_method stop_time ... or just rpc_method duration ... then those metrics could be extracted externally, even in realtime. Or am I missing something?

  18. in src/httpserver.cpp:702 in ee01d94672 outdated
     696 | @@ -677,3 +697,9 @@ std::string urlDecode(const std::string &urlEncoded) {
     697 |      }
     698 |      return res;
     699 |  }
     700 | +
     701 | +void GetRPCServerStats(RPCServerStats* rpc_server_stats) {
     702 | +    assert(rpc_server_stats != nullptr);
    


    promag commented at 9:19 PM on September 12, 2018:

    Remove assert and use reference instead of pointer?

  19. in src/httpserver.cpp:704 in ee01d94672 outdated
     696 | @@ -677,3 +697,9 @@ std::string urlDecode(const std::string &urlEncoded) {
     697 |      }
     698 |      return res;
     699 |  }
     700 | +
     701 | +void GetRPCServerStats(RPCServerStats* rpc_server_stats) {
     702 | +    assert(rpc_server_stats != nullptr);
     703 | +    std::lock_guard<std::recursive_mutex> rpcserverstats_guard(g_rpcserverstats_mutex);
     704 | +    memcpy(rpc_server_stats, &g_rpcserverstats, sizeof(RPCServerStats));
    


    promag commented at 9:20 PM on September 12, 2018:

    An alternative is to return a copy of g_rpcserverstats.


    etscrivner commented at 3:16 PM on September 14, 2018:

    I think this is probably the right approach here given that it is a fairly lightweight data structure.

  20. in src/httpserver.cpp:623 in ee01d94672 outdated
     617 | @@ -602,6 +618,10 @@ void HTTPRequest::WriteReply(int nStatus, const std::string& strReply)
     618 |          }
     619 |      });
     620 |      ev->trigger(nullptr);
     621 | +
     622 | +    std::lock_guard<std::recursive_mutex> rpcserverstats_guard(g_rpcserverstats_mutex);
     623 | +    g_rpcserverstats.total_request_time_micros += GetTimeMicros() - receivedAtTimeMicros;
    


    promag commented at 9:24 PM on September 12, 2018:

    This doesn't account sending the reply which I think it should.

  21. in src/rpc/net.cpp:512 in ee01d94672 outdated
     507 | +            "\nExamples:\n"
     508 | +            + HelpExampleCli("getnetworkrpcinfo", "")
     509 | +            + HelpExampleRpc("getnetworkrpcinfo", "")
     510 | +        );
     511 | +
     512 | +    RPCServerStats rpc_stats;
    


    promag commented at 9:25 PM on September 12, 2018:

    nit RPCServerStats rpc_server_stats.

  22. in test/functional/rpc_net.py:109 in ee01d94672 outdated
     103 | @@ -101,5 +104,13 @@ def _test_getpeerinfo(self):
     104 |          assert_equal(peer_info[0][0]['minfeefilter'], Decimal("0.00000500"))
     105 |          assert_equal(peer_info[1][0]['minfeefilter'], Decimal("0.00001000"))
     106 |  
     107 | +    def _test_getnetworkrpcinfo(self):
     108 | +        first_result = self.nodes[0].getnetworkrpcinfo()
     109 | +        time.sleep(1)
    


    promag commented at 9:25 PM on September 12, 2018:

    Remove?

  23. in src/rpc/net.cpp:505 in ee01d94672 outdated
     500 | +            "Returns an object containing various state info regarding RPC service.\n"
     501 | +            "\nResult:\n"
     502 | +            "{\n"
     503 | +            "  \"total_requests\": xxxxx,            (numeric) Lifetime total number of RPC requests processed.\n"
     504 | +            "  \"uptime_secs\": xxxxx,               (numeric) Uptime of RPC server in seconds.\n"
     505 | +            "  \"total_request_time_micros\": xxxxx  (numeric) Total time spent processing RPC requests in microseconds.\n"
    


    promag commented at 9:26 PM on September 12, 2018:

    Is this relevant?


    etscrivner commented at 3:17 PM on September 14, 2018:

    If the uptime is recorded in micros, then I think you are right that this becomes redundant.

  24. promag commented at 9:27 PM on September 12, 2018: member

    Should there be a way to reset the counters?

  25. PierreRochard commented at 1:42 AM on September 13, 2018: contributor

    Concept ACK

  26. laanwj commented at 8:30 AM on September 13, 2018: member

    Though, wouldn't it be easier to record these metrics outside of Bitcoin Core? I.e. in the "middle" between the rpc endpoint and rpc consumer. The outside approach seems more flexible to me, since it could collect the metrics per-call or overall or per day ...

    Agree that these current statistics are easy enough to collect from the outside. Though I suppose there are some things that are not easy to collect from the outside using a proxy; say, if you want to record CPU/IO usage per call.

    OTOH these kind of changes run a significant risk of creating an "inner platform effect", where the scope of a simple daemon extends to be a fully-fledged OS stack including instrumentation, monitoring. All things that need to be documented, maintained, supported.

    I think this is not very flexible. We could just send the events begin/end to somewhere so these can be used as a source to some analytics and monitoring tool.

    Right—the preferable way would be to accommodate OS tooling for instrumentation, so that sysadmins, developers, and such can use the existing frameworks.

    I remember @eklitzke was talking about something like that, adding hooks for "user-defined static trace probes", which could be monitored using bcc systemtap dtrace etc.

  27. etscrivner force-pushed on Sep 14, 2018
  28. Add HTTP Server Statistics Gathering
    This change is the first part of adding a new RPC call to report stats from the
    HTTP server. It modifies the HTTP server functionality to collect and report
    the gathered statistics.
    
    `httpserver.h`:
      * Add private member variable `receivedAtTimeMicros` to `HTTPRequest` to record the time at which the request was started.
      * Add data-type, `RPCServerStats`, that holds all gathered HTTP server stats.
      * Add function, `GetRPCServerStats`, that returns the current RPC server stats via the provider pointer.
    
    `httpserver.cpp`:
      * Add static global `g_rpcserverstats` to hold HTTP server stats across all requests.
      * Add static global `g_rpcserverstats_mutex` to serialize writes to `g_rpcserverstats`.
      * Modify several functions to collect the RPC server stats.
      * Implement `GetRPCServerStats`.
    7346f956fd
  29. Add getnetworkrpcinfo RPC Command
    Add a new RPC command, `getnetworkrpcinfo`, that returns an object containing
    various state information about the HTTP server providing a JSON RPC interface.
    This information can be used to derive information such as requests per second
    and average request processing time.
    
    The information is provided in a raw form leaving the clients to derive such
    useful metrics, this is intentional and allows this RPC call to avoid issues
    around floating point accuracy and floating point error accumulation. Instead,
    these issues are left to the consumers of the response object to handle in
    whatever fashion they like.
    a43c23e59a
  30. Add Functional Test For getnetworkrpcinfo 32e45ae1e7
  31. etscrivner force-pushed on Sep 14, 2018
  32. etscrivner commented at 3:49 PM on September 14, 2018: contributor

    @laanwj Thank you for the review.

    I've begun looking into the usage of static trace probes that would allow for these measurements to happen. You make a very good point about the potential for this to create an "inner platform effect". Based on what you and @promag have said, it seems like that is a more flexible approach here.

  33. etscrivner closed this on Sep 15, 2018

  34. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me