If bitcoind is processing 16 RPC requests, attempting to submit another request using bitcoin-cli produces this less-than-helpful error message: error: couldn't parse reply from server. This PR changes the error to: error: server response: Work queue depth exceeded.
bitcoin-cli: print useful error if bitcoind rpc work queue exceeded #18335
pull LarryRuane wants to merge 2 commits into bitcoin:master from LarryRuane:cli-work-queue-error-msg changing 3 files +29 −1-
LarryRuane commented at 5:49 PM on March 12, 2020: contributor
-
LarryRuane commented at 5:49 PM on March 12, 2020: contributor
To reproduce the problem, start the client:
bitcoind -rpcworkqueue=1. This reduces the maximum number of in-progress RPCs from 16 (DEFAULT_HTTP_WORKQUEUE) to 1. Then run:seq 10 | while read i;do src/bitcoin-cli getblockhash 0 >/dev/null&doneThis submits this trivial RPC 10 time in the background (the
&character) and usually produces one or more of theerror: couldn't parse reply from servererrors. If it doesn't, try again, or increase the load to more than 10. You will see a reasonable error message indebug.log:2020-03-12T17:32:28Z WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= settingBut most users wouldn't think to look there, or wouldn't have access.
With this PR, the error is more understandable. The failed request in this case is retryable, so maybe that should be indicated in the text, but this text (
Work queue depth exceeded) comes frombitcoind, which probably shouldn't be changed, especially since there could be other RPC clients that depend on that exact text. We could havebitcoin-cliinclude additional text likeplease try againbut it doesn't really know for certain in every case that the request is retryable. -
in src/bitcoin-cli.cpp:407 in 86af496741 outdated
403 | @@ -404,7 +404,7 @@ static UniValue CallRPC(BaseRequestHandler *rh, const std::string& strMethod, co 404 | // Parse reply 405 | UniValue valReply(UniValue::VSTR); 406 | if (!valReply.read(response.body)) 407 | - throw std::runtime_error("couldn't parse reply from server"); 408 | + throw std::runtime_error(strprintf("server response: %s", response.body));
laanwj commented at 6:47 PM on March 12, 2020:I think it's strange that it ends up here. It's trying to parse an error message as JSON! Isn't there some error code (either HTTP or JSON-RPC) that could be checked for instead? If not, that seems like a bug in the server.
LarryRuane commented at 9:59 PM on March 12, 2020:It does seem wrong that my proposed fix retains the practice of trying to decode a non-JSON string as JSON, and depending on that to fail. Yes, perhaps this is a bug in the server -- should it be returning JSON in
bodyhere rather than a plain string? But as I said at the end of the comment above, I was hesitant to change the server because I'm worried that there may be other RPC clients besidesbitcoin-clithat depend on the current (plain string) response. What do you think? If the fix is better in the server, I can do that.Just for further information, the response returned in this work queue depth exceeded is (status 500 is
HTTP_INTERNAL_SERVER_ERROR):(gdb) f 0 Breakpoint 1, CallRPC (rh=0x555555845e90, strMethod="getblockhash", args=std::vector of length 1, capacity 2 = {...}) at bitcoin-cli.cpp:406 406 if (!valReply.read(response.body)) (gdb) p response $2 = { status = 500, error = -1, body = "Work queue depth exceeded" } (gdb)Here's a successful reply:
(gdb) p response $6 = { status = 200, error = -1, body = "{\"result\":\"000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f\",\"error\":null,\"id\":1}\n" }Here's the response when requesting a nonsense method:
(gdb) p response $2 = { status = 404, error = -1, body = "{\"result\":null,\"error\":{\"code\":-32601,\"message\":\"Method not found\"},\"id\":1}\n" }As part of this PR, I could also systematically search for and fix any other places where
bodyis returned as plain string (not JSON); so far I see only one:req->WriteReply(HTTP_BAD_METHOD, "JSONRPC server handles only POST requests")LarryRuane commented at 10:59 PM on March 12, 2020: contributorThe second commit undoes the first, and fixes the problem in the server by having it return JSON in the body of the response:
error code: -32603 error message: Work queue depth exceeded(Here's where that error code,
RPC_INTERNAL_ERRORis defined.) It makes sense that this is similar to:$ src/bitcoin-cli not-a-valid-rpc error code: -32601 error message: Method not found $DrahtBot added the label RPC/REST/ZMQ on Mar 13, 2020laanwj commented at 9:11 PM on March 17, 2020: memberOk, looks like his is actually non-trivial.
HTTP_INTERNAL_SERVER_ERROR
$2 = { status = 500, error = -1, body = "Work queue depth exceeded" }So in this case, possibly you could detect this error code in bitcoin-cli, and print an error message instead of trying to parse as JSON. Or are there cases in which it returns 500 but the response needs to be parsed as JSON?
I must admit we're sometimes inconsistent about whether to report errors at the HTTP layer or the JSON-RPC, so it's very possible that your solution of changing the server to return a JSON error is the most consistent one!
(I'm not personally very worried about clients depending on a specific exact error message text, though we might want to mention it briefly in the release notes)
laanwj commented at 2:10 PM on March 27, 2020: memberI must admit we're sometimes inconsistent about whether to report errors at the HTTP layer or the JSON-RPC
I checked this and yes, we do return JSON errors with
HTTP_INTERNAL_SERVER_ERROR:static void JSONErrorReply(HTTPRequest* req, const UniValue& objError, const UniValue& id) { // Send error reply from json-rpc error object int nStatus = HTTP_INTERNAL_SERVER_ERROR; int code = find_value(objError, "code").get_int(); if (code == RPC_INVALID_REQUEST) nStatus = HTTP_BAD_REQUEST; else if (code == RPC_METHOD_NOT_FOUND) nStatus = HTTP_NOT_FOUND; std::string strReply = JSONRPCReply(NullUniValue, objError, id); req->WriteHeader("Content-Type", "application/json"); req->WriteReply(nStatus, strReply); }So we have no way to distinguish "parse this as JSON?" based on just the status code.
I think the current change of changing the server to always return JSON errors is OK. I think it's slightly more elegant than doing "try to parse as JSON and print the raw text otherwise" in the client. But I don't have a strong opinion.
hebasto commented at 7:39 PM on April 4, 2020: member@LarryRuane Just was noted about this PR by GH :)
If you find this approach simpler you may grab it :)
LarryRuane force-pushed on Apr 4, 2020DrahtBot commented at 8:02 PM on April 4, 2020: member<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--174a7506f384e20aa4161008e828411d-->
Conflicts
No conflicts as of last run.
LarryRuane commented at 8:04 PM on April 4, 2020: contributorluke-jr approvedluke-jr commented at 3:21 AM on May 5, 2020: memberutACK 4bb892cec2375d919ae66a743e2ad29cd56ba888
util: Handle HTTP_SERVICE_UNAVAILABLE in bitcoin-cli b5a80fa7e4LarryRuane force-pushed on Jul 12, 2020LarryRuane commented at 8:54 PM on July 12, 2020: contributorRebase (refresh)
hebasto approvedhebasto commented at 4:08 PM on July 14, 2020: memberself-ACK b5a80fa7e487c37b7ac0e3874a2fabade41b9ca8
jonatack commented at 4:48 PM on July 17, 2020: memberConcept ACK. Could you please add functional tests for this in one of the
test/functional/interface*files? Say,interface_bitcoin_cli.py(which has a lot of error response testing) orinterface_rpc.py(which has a batch request test).in test/functional/interface_rpc.py:35 in c067589b7a outdated
30 | + node.cli('getrpcinfo').send_cli() 31 | + except subprocess.CalledProcessError as e: 32 | + if e.output != 'error: Server response: Work queue depth exceeded\n': 33 | + raise AssertionError("Unexpected cli error: :{}:".format(e.output)) 34 | + got_exceeded_error = True 35 | + break
LarryRuane commented at 3:55 PM on July 21, 2020:This 400 is, of course, arbitrary, as is the number of threads this test starts (5, see below). These numbers are the result of simple trial-and-error. We want each thread to do enough work that it's still running by the time all 5 threads are started. Still, the test only takes about 1.3 seconds to run on my system. Increasing the number of threads allows the iteration count here to be reduced, but I don't want to start too many threads, because each one probably uses a nontrivial amount of resources and I don't want to overburden the CI workers.
jonatack commented at 12:31 PM on July 22, 2020: memberTested almost-ACK. Thanks for adding the test! Runs quickly, but one of the Travis jobs failed at
assert got_exceeded_errorso it may need to be more reliable. Some suggested changes below: add a line of logging, attempt to remove perhaps-unneeded breaks, and PEP8 suggestions frompycodestyleandblack-- feel free to ignore the style nits. Tested that the test fails on master and passes on this branch.<details><summary>some suggested edits to the test</summary><p>
index e0e6fc57aa..cdbed53dd6 100755 --- a/test/functional/interface_rpc.py +++ b/test/functional/interface_rpc.py @@ -11,6 +11,7 @@ from test_framework.util import assert_equal, assert_greater_than_or_equal from threading import Thread import subprocess + def expect_http_status(expected_http_status, expected_rpc_code, fcn, *args): try: @@ -20,7 +21,10 @@ def expect_http_status(expected_http_status, expected_rpc_code, assert_equal(exc.error["code"], expected_rpc_code) assert_equal(exc.http_status, expected_http_status) + got_exceeded_error = False + + def test_work_queue_getblock(node): global got_exceeded_error for _ in range(400): @@ -29,10 +33,9 @@ def test_work_queue_getblock(node): try: node.cli('getrpcinfo').send_cli() except subprocess.CalledProcessError as e: - if e.output != 'error: Server response: Work queue depth exceeded\n': - raise AssertionError("Unexpected cli error: :{}:".format(e.output)) + assert_equal(e.output, 'error: Server response: Work queue depth exceeded\n') got_exceeded_error = True - break + class RPCInterfaceTest(BitcoinTestFramework): def set_test_params(self): @@ -84,19 +87,18 @@ class RPCInterfaceTest(BitcoinTestFramework): expect_http_status(500, -8, self.nodes[0].getblockhash, 42) def test_work_queue_exceeded(self): + self.log.info("Test work queue exceeded") global got_exceeded_error - self.restart_node(0,['-rpcworkqueue=1']) + self.restart_node(0, ['-rpcworkqueue=1']) threads = [] for _ in range(5): - t = Thread(target=test_work_queue_getblock, args=(self.nodes[0], )) + t = Thread(target=test_work_queue_getblock, args=(self.nodes[0],)) t.start() threads.append(t) - if got_exceeded_error: - break for t in threads: t.join() # This likely just means more threads are needed. - assert(got_exceeded_error) + assert got_exceeded_error def run_test(self): self.test_getrpcinfo()</p></details>
fjahr commented at 1:04 PM on July 22, 2020: memberConcept ACK
Code looks good at a glance but the test fails on Travis, so I will come back for a more in-depth review when that is fixed :)
LarryRuane force-pushed on Jul 22, 2020in test/functional/interface_rpc.py:86 in db0d4605b9 outdated
79 | @@ -67,10 +80,23 @@ def test_http_status_codes(self): 80 | expect_http_status(404, -32601, self.nodes[0].invalidmethod) 81 | expect_http_status(500, -8, self.nodes[0].getblockhash, 42) 82 | 83 | + def test_work_queue_exceeded(self): 84 | + self.log.info("Test work queue exceeded") 85 | + self.restart_node(0, ['-rpcworkqueue=1']) 86 | + got_exceeded_error = []
LarryRuane commented at 3:54 PM on July 22, 2020:This is a list, rather than just a boolean, because a list is a mutable object, and this eliminates the need for the global variable, which seems cleaner. Is it worth a comment explaining that? I tend to think not.
darosior commented at 1:45 PM on October 31, 2020:nit:
Is it worth a comment explaining that? I tend to think not.
I don't think so, but it's a bit awkward for the list to be named like a boolean, maybe
exceeded_errorsif you have to retouch ?LarryRuane commented at 3:58 PM on July 22, 2020: contributorAddressed review suggestions, force-pushed (diff), thank you. The test will run until it succeeds (there's no longer an arbitrary loop iteration count), or CI will timeout the test (which is extremely unlikely). Starting three threads should be enough, even two works, but the test runs faster with three (greater chance of exceeded the work queue depth limit).
jonatack commented at 6:20 AM on July 24, 2020: memberThanks @LarryRuane for removing the global and the update. The tests appear to be timing out, though it's unclear.
in test/functional/interface_rpc.py:85 in db0d4605b9 outdated
79 | @@ -67,10 +80,23 @@ def test_http_status_codes(self): 80 | expect_http_status(404, -32601, self.nodes[0].invalidmethod) 81 | expect_http_status(500, -8, self.nodes[0].getblockhash, 42) 82 | 83 | + def test_work_queue_exceeded(self): 84 | + self.log.info("Test work queue exceeded") 85 | + self.restart_node(0, ['-rpcworkqueue=1'])
hebasto commented at 8:51 AM on August 4, 2020:This would make CI testing more reliable:
self.restart_node(0, ["-rpcworkqueue=1", "-rpcthreads=1"])in test/functional/interface_rpc.py:84 in db0d4605b9 outdated
79 | @@ -67,10 +80,23 @@ def test_http_status_codes(self): 80 | expect_http_status(404, -32601, self.nodes[0].invalidmethod) 81 | expect_http_status(500, -8, self.nodes[0].getblockhash, 42) 82 | 83 | + def test_work_queue_exceeded(self): 84 | + self.log.info("Test work queue exceeded")
hebasto commented at 8:53 AM on August 4, 2020:nit: Mind rewording to make it consistent with other log messages
self.log.info("Testing work queue exceeded...")
LarryRuane commented at 5:28 PM on August 4, 2020:Force-pushed to implement review suggestions.
hebasto approvedhebasto commented at 8:56 AM on August 4, 2020: memberACK db0d4605b9e57230a02f8b620f93e5e4d191972c, tested on Linux Mint 20 (x86_64).
https://cirrus-ci.com/task/5913926858702848 timeout seems irrelevant (could be restarted though).
add functional test 8dd5946c0bLarryRuane force-pushed on Aug 4, 2020hebasto approvedfjahr commented at 11:44 AM on August 5, 2020: membertACK 8dd5946c0b7aa8f3976b14a5de4ce84b80a9c32a
Confirmed that the test fails without the fix. Ran test multiple times and seemed stable.
mini-nit if you have to retouch: The commit message of the test commit is too general for our usual conventions.
luke-jr referenced this in commit 6d207af2f6 on Aug 15, 2020hebasto commented at 8:17 AM on August 22, 2020: member@luke-jr as you've incorporated this PR into Bitcoin Knots, mind refreshing your previous ACK to get things done?
luke-jr commented at 2:41 PM on October 13, 2020: memberutACK 8dd5946c0b7aa8f3976b14a5de4ce84b80a9c32a (no changes since previous utACK)
in test/functional/interface_rpc.py:25 in 8dd5946c0b
20 | @@ -18,6 +21,16 @@ def expect_http_status(expected_http_status, expected_rpc_code, 21 | assert_equal(exc.error["code"], expected_rpc_code) 22 | assert_equal(exc.http_status, expected_http_status) 23 | 24 | + 25 | +def test_work_queue_getblock(node, got_exceeded_error):
darosior commented at 1:42 PM on October 31, 2020:nit: this is called
queue_getblockbut queues agetrpcinfo? Maybequeue_rpc_calldarosior approveddarosior commented at 1:48 PM on October 31, 2020: memberACK 8dd5946c0b7aa8f3976b14a5de4ce84b80a9c32a
Thanks for doing this, going to be helpful downstream :-) Left a few nits only if you have to retouch
in src/httpserver.cpp:272 in 8dd5946c0b
268 | @@ -269,7 +269,7 @@ static void http_request_cb(struct evhttp_request* req, void* arg) 269 | item.release(); /* if true, queue took ownership */ 270 | else { 271 | LogPrintf("WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting\n"); 272 | - item->req->WriteReply(HTTP_INTERNAL_SERVER_ERROR, "Work queue depth exceeded"); 273 | + item->req->WriteReply(HTTP_SERVICE_UNAVAILABLE, "Work queue depth exceeded");
MarcoFalke commented at 6:57 PM on March 19, 2021:Does this need release notes to explain the changed http status code? Just in case a user relies on this?
Just asking, you don't have to invalidate review. Release note can be added after merge.
LarryRuane commented at 7:15 PM on March 19, 2021:Good point, I think you're right, I'll do it separately, thanks.
MarcoFalke commented at 7:54 PM on March 19, 2021:You can add them directly to
doc/release-notes.mdon themasterbranch
jonatack commented at 7:58 PM on March 19, 2021:Maybe add a "needs release note" label.
MarcoFalke merged this on Mar 19, 2021MarcoFalke closed this on Mar 19, 2021LarryRuane referenced this in commit d7fbab5461 on Mar 20, 2021sidhujag referenced this in commit 581817cbd5 on Mar 20, 2021LarryRuane referenced this in commit 71fc3cb65e on Mar 22, 2021LarryRuane referenced this in commit 169bb257be on Mar 22, 2021LarryRuane referenced this in commit 51eef4a03e on Mar 23, 2021MarcoFalke referenced this in commit 11840509fb on Mar 29, 2021sidhujag referenced this in commit c6261f8300 on Mar 29, 2021DrahtBot locked this on Aug 16, 2022
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:14 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me