Inconsistent RPC error handling

Kixunil commented at 7:18 am on May 28, 2020: none

Expected behavior

The behavior of returning error from RPC call should be consistent between these cases:

Single RPC call fails
Single forbidden RPC call
Batched failed RPC call
One (or more) method of the batched call is forbidden

Actual behavior

Single failed RPC call returns HTTP error code
Single forbidden RPC call returns HTTP error code - HTTP_FORBIDDEN (403)
Batched failed RPC call leaks an internal error code
Batched call containing a forbidden method returns HTTP_FORBIDDEN (403) for the whole batch.

To reproduce

Disclaimer: the bug was discovered by code analysis when trying to come up with a correct solution to https://github.com/MetacoSA/NBitcoin/issues/864 I’m fairly confident that using curl to call methods and then observing output would reproduce the issue.

Another way to see a problem might be running BTCPayServer/NBXplorer with the bitcoind version from master with some users restricted. (Example: restrict getpeerinfo and see that Payjoin stops working.) I didn’t try this with newest Core, but I hit it with btc-rpc-proxy, which returns HTTP errors for authentication just like what bitcoind does according to the code.

System information

Not relevant.

Discussion

I’m opening this issue in order to decide on proper handling of RPC errors. I see these possible approaches:

RPC_METHOD_NOT_FOUND becomes public, the comment is removed and the rest of the code stays the same, the behavior doesn’t change.
JSONErrorReply function is modified to return HTTP errors for batched calls too.
New public error codes are defined and used for every method independently, the mapping to HTTP codes is removed (at least for batched calls)

I personally like the last one because it allows efficient inspection of node capabilities, but I’m worried that it might break some clients. I’d love to hear arguments from people who are more experienced in this area.

Kixunil added the label Bug on May 28, 2020

NicolasDorier commented at 3:16 pm on May 28, 2020: contributor

I think the current behavior should be kept: If any or all of the batched request fails, the batch result should still be HTTP 200, whichever the reason. This was the behavior before.

On top of this, it seems that if you do batched requests and one of the request is denied, then the caller will not receive ANY of the other results.

Kixunil commented at 5:49 pm on May 28, 2020: none

Yeah, keeping a bit strange behavior is backwards compatible. Changing authentication to return errors individually would help with batched requests - this shouldn’t be a backwards-compatibility hazard, as no code probably uses unreleased API yet.

In case that this is the accepted solution, the comment about internal error code should be removed and the value should become public.

NicolasDorier commented at 11:57 pm on May 28, 2020: contributor

ping @JeremyRubin

JeremyRubin commented at 3:37 am on May 29, 2020: contributor

I don’t have an opinion here sorry for breaking compat. I think that we should change batching to scan if any are not whitelisted and reject the whole batch.

Kixunil commented at 1:09 pm on May 29, 2020: none

we should change batching to scan if any are not whitelisted and reject the whole batch

You mean change that behavior to something else? Because that’s the current behavior.

JeremyRubin commented at 8:16 am on May 30, 2020: contributor

It sounds like in the current behavior some RPCs are running in the batch and some are failing and that hides the output. I think that we should pre-scan before running any

Kixunil commented at 6:41 pm on May 30, 2020: none

So pre-scan missing methods too?

JeremyRubin commented at 6:47 pm on May 30, 2020: contributor

Yeah I think so. If there are missing methods or non-whitelisted methods it sounds like failing the whole batch is a bit of a better API.

The behavior of the whitelists is such that it should be roughly identical to what happens if the RPC doesn’t exist at all.

Kixunil commented at 9:12 am on May 31, 2020: none

Why exactly do you consider it a better API? FYI, it’d break NBitcoin.

The behavior of the whitelists is such that it should be roughly identical to what happens if the RPC doesn’t exist at all.

I agree with this.

JeremyRubin commented at 6:57 pm on May 31, 2020: contributor

How would it break NBitcoin? Does NBitcoin send missing methods in a batch?

Kixunil commented at 2:11 pm on June 1, 2020: none

Yes, it sends various methods in batch, some of which are missing and then detects the features based on the responses. Making it return 404 would make it think none of the methods is available. It could be changed to not batch, but that’d lead to a (small?) loss of performance.

JeremyRubin commented at 4:44 pm on June 1, 2020: contributor

Ok, the right thing to do is to to add a feature detection RPC rather than guess by firing RPCs off and seeing what fails (RPCs may not be ‘safe’ or wise to call).

Mind reviewing #19117?

Kixunil commented at 7:58 am on June 5, 2020: none

I agree that official feature detection would be better. I wonder how NBitcoin will cope with the complexity:

Versions before 0.20 can’t disable calls (except if using btc-rpc-proxy)
0.20 can disable calls, but can’t list them
0.20 + epsilon may be able to list them but block all of them

It seems to me that if @NicolasDorier wants to keep his sanity, he’ll probably have to remove batching during detection. But maybe it’s not too bad? Try to get the list first and if it returns RPC method not found, then unbatched (slower) discovery continues. If it returns success, then the information is available and no further calls are needed. The policy could become “if you need fast discovery, use bitcoind >0.20”. What do you think Nicolas?

NicolasDorier commented at 4:22 am on August 9, 2020: contributor

The only way to work around this problem is to try batching first, then fallback to sending requests one by one if 403 is sent.

This however is dangerous if the RPC requests are not idempotent, as some requests could be executed twice. (once in the failing batch, and one in the retry) for scanning, this is fine.

NicolasDorier referenced this in commit bece4f2529 on Aug 9, 2020

NicolasDorier commented at 4:57 am on August 9, 2020: contributor

btw, the only way to fix this problem cleanly is to have a dedicated RPC code for forbidden exception instead of relying on HTTP status code. And keep using HTTP 200.

That way people can see which RPC request failed in the lot.

Kixunil commented at 5:35 am on August 9, 2020: none

This however is dangerous if the RPC requests are not idempotent, as some requests could be executed twice.

Not really because the requests are scanned before they are executed so either all of them fail or all of them are executed. Both in Bitcoin Core and my proxy. Therefore error 403 is fine. It’d only be a problem if someone wrote their own RPC proxy with different behavior (which is their bug).

JeremyRubin commented at 5:59 am on August 9, 2020: contributor

https://github.com/bitcoin/bitcoin/pull/12763/files#diff-744201ccb403526a4b5bf52d6e5367deR216-R219

Oh, I thought it was a worse issue than it is! Yes, the code rejects the entire batch if any are not permitted. This is by design; if you have a random failure in the middle of your batch that would be potentially bad. It’s better to reject the whole batch if the user is trying any unpermitted action.

How many RPCs do you actually need to detect? Maybe what I’d suggest is structuring your test-packet as:

Test Packet 1: All Mandatory RPCs Test Packet 2: All Optional RPCs If 2 fails: Test RPCs one-by-one

Kixunil commented at 6:33 am on August 9, 2020: none

If 2 fails: Test RPCs one-by-one

That’s exactly what he did recently. :)

You have a good point about failing the whole batch. I guess that’s better than (likely negligible) performance issues of scanning. Scanning RPCs should probably be done with a special method anyway.

So what do you think about publishing RPC_METHOD_NOT_FOUND?

JeremyRubin commented at 5:12 pm on August 9, 2020: contributor

That should be fine to do. It might be nice to differentiate RPC_METHOD_NOT_FOUND and RPC_METHOD_NOT_ALLOWED. Right now the code does not make a distinction, but you could make RPC_METHOD_NOT_ALLOWED be the error as the precheck, and RPC_METHOD_NOT_FOUND as the one once you try to run it and it’s not there (but may also be nice to make that a pre-checked thing IMO)

maflcko added the label RPC/REST/ZMQ on Aug 14, 2022

pinheadmz commented at 2:34 pm on March 27, 2023: member

The inconsistencies between single / batch RPCs are addressed in #27101 which adds a stricter jsonrpc 2.0 behavior when requested by the client (by adding "jsonrpc": "2.0" to the request object). In particular, we try to always return HTTP 200 OK unless there actually is a server error.

That PR does not affect the whitelist / authorization processing which looks like, in this discussion, is decidedly acceptable. @Kixunil and @JeremyRubin are you able to review and feed back on #27101 ? We might be able to close this issue as duplicate of https://github.com/bitcoin/bitcoin/issues/2960

pinheadmz assigned pinheadmz on Jun 2, 2023

willcl-ark commented at 3:18 pm on October 14, 2024: member

Going to close since #27101 was merged.

Please open a new issue if there is something outstanding.

willcl-ark closed this on Oct 14, 2024

Inconsistent RPC error handling #19087