rpc: getblock: implement with block height as input parameter. #26469

russeree commented at 10:00 AM on November 8, 2022: contributor

Affects the getblock rpc call with the goal of adding the ability to call getblock with a hash or block height whist not impacting compatibility with existing services that rely on the getblock call.

Motivation

Decrease the number of RPC calls needed to get valid data from getblock where the user has only a block height available. As well as reduce complexity, bandwidth, and overhead. Note: getblock returns the hash of the block when parsed.

Overview

It is trivial to implement a check where the input univalue string (string) to getblock is greater than or less than a length of 64. This is because all valid block header hashes are a length of 64. This means if the input is not length 64 we should attempt to interpret the string as a block height input and pass it to ParseHashOrHeight and then retreive the phashBlock (hash) value from the returned block at the input height. This hash value can be dropped into the existing blockhash variable leaving the rest of the call unmodified.

If the length of the input is 64 the univalue is treated as a blockhash and passed though ParseHashOrHeight to get the block header hash value.

Contentious Points

Ambiguity

The fact that a user can now parse a hash or a block height as the required input argument to param[0]. There may be the issue of inputs being seen as too flexible or ambiguous.

My thoughts behind this are because a valid getblock hash input will always be a univalue string with length of 64 there likely wont be a base10 block height collision with the base16 hash inputs for 1.9*10^58 years assuming a block time of ~10 mins. Well before the collision of the block height with hash values all of datatypes we for block height storage and parsing would be refactored to support values that would overflow the int32_t type. As such a block height could never reasonably be interpreted as a hash. By software or people

The next note about ambiguity of inputs is that a hash no matter the actual numerical value when read must be a string of length 64. Example a hash value of 0 would still need to be input as "000...000"

Implementation Details

Overall the implementation is pretty simple. If input string is anything other than a univalue str of length 64 attempt to parse as a block height using ParseHashOrHeight. Since the maximum number height_or_hash input parameter value is a uint64t. We use the ParseInt32 function with the input being the .get_str of param[0]. This function works incredibly well because it by default doesn't accept negative values, alpha chars. , strings that overflow a int32_t type, and symbol chars. If the conversion to a unit64t is successful it is then safe to pass into ParseHashOrHeight.

If the user places a value higher than the tip they will get an error code -8: Target block height _m_ after current tip _n_. If the user inputs an invalid string for height then they will receive and error code -5

the blockhash parameter name was left unchanged for backwards compatibility.

Backward Compatibility

Should be fully backwards compatible with all existing applications that use getblock and are supplying even remotely valid inputs. This is just a drop in check to see if getblock should make an attempt to convert an input to a block height.

Performance

Summary +9.4% performance gain when returning details of all blocks from 0 to the tip.

To test the performance impact of reducing the number of RPC to get block details from 2 calls to 1 I wrote a NodeJS script that parses all blocks through the RPC from 0 to the tip with getblock to determine the size of the block on disk and at the end returns the details of that largest block by size on the timechain.

The control test was bitcoin core .23 where the details for each block were fetched using 2 calls, first getblockhash(height) was called to get the hash. Then getblock(hash) was called to get the details. This process was applied to every block from 0 to the tip at the time of 762535. This test resulted in a runtime of 117m 11s. The result was the largest block size is 748918 with a size of 2765062 bytes.

Using this PR the same script was modified to call the getblock(height) and thus skipping the need to get the hash first. When run on the same hardware the execution time was 106m 32s. A performance uplift of 9.4%

As a note the biggest gains are inversely proportional to the ratio between the size of the results of getblock and getblock hash. The smaller the ratio the bigger the relative performance uplift.

This is seen when iterating though the blocks many of the early empty and small blocks the uplift was close to 2x, but as the size and number of details returned by getblock increased the advantage of saving around 100 bytes at the RPC became irrelevant.

russeree force-pushed on Nov 8, 2022

aureleoules commented at 10:40 AM on November 8, 2022: member

It would make more sense to use the following code as it is uniform with the codebase but it breaks compatibility because passing a hash would require wrapping it between quotes.

#19949 fixes this and should probably be merged first?

diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index 082e1a76a..cb171f27b 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -642,7 +642,7 @@ static RPCHelpMan getblock()
                 "If verbosity is 2, returns an Object with information about block <hash> and information about each transaction.\n"
                 "If verbosity is 3, returns an Object with information about block <hash> and information about each transaction, including prevout information for inputs (only for unpruned blocks in the current best chain).\n",
                 {
-                    {"blockhash|blockheight", RPCArg::Type::STR_HEX, RPCArg::Optional::NO, "The block hash or height", RPCArgOptions{.type_str={"", "string or numeric"}}},
+                    {"hash_or_height", RPCArg::Type::NUM, RPCArg::Optional::NO, "The block hash or height", RPCArgOptions{.type_str={"", "string or numeric"}}},
                     {"verbosity|verbose", RPCArg::Type::NUM, RPCArg::Default{1}, "0 for hex-encoded data, 1 for a JSON object, 2 for JSON object with transaction data, and 3 for JSON object with transaction data including prevout information for inputs"},
                 },
                 {
@@ -706,18 +706,6 @@ static RPCHelpMan getblock()
 {
     uint256 hash;
     ChainstateManager& chainman = EnsureAnyChainman(request.context);
-    if(request.params[0].get_str().length() == 64){
-        hash = *ParseHashOrHeight(request.params[0], chainman)->phashBlock;
-    }
-    else{
-        uint64_t blockHeight;
-        if(ParseUInt64(request.params[0].get_str(),&blockHeight)){
-            hash = *ParseHashOrHeight(blockHeight, chainman)->phashBlock;
-        }
-        else{
-            throw JSONRPCError(RPC_INVALID_ADDRESS_OR_KEY, "Block height invalid");
-        }
-    }
 
     int verbosity = 1;
     if (!request.params[1].isNull()) {
@@ -729,17 +717,11 @@ static RPCHelpMan getblock()
     }
 
     CBlock block;
-    const CBlockIndex* pblockindex;
     const CBlockIndex* tip;
+    const auto* pblockindex{ParseHashOrHeight(request.params[0], chainman)};
     {
         LOCK(cs_main);
-        pblockindex = chainman.m_blockman.LookupBlockIndex(hash);
         tip = chainman.ActiveChain().Tip();
-
-        if (!pblockindex) {
-            throw JSONRPCError(RPC_INVALID_ADDRESS_OR_KEY, "Block not found");
-        }
-
         block = GetBlockChecked(chainman.m_blockman, pblockindex);
     }
 
diff --git a/src/rpc/client.cpp b/src/rpc/client.cpp
index 8688263ef..4273063da 100644
--- a/src/rpc/client.cpp
+++ b/src/rpc/client.cpp
@@ -96,6 +96,7 @@ static const CRPCConvertParam vRPCConvertParams[] =
     { "listunspent", 2, "addresses" },
     { "listunspent", 3, "include_unsafe" },
     { "listunspent", 4, "query_options" },
+    { "getblock", 0, "hash_or_height" },
     { "getblock", 1, "verbosity" },
     { "getblock", 1, "verbose" },
     { "getblockheader", 1, "verbose" },

russeree commented at 10:56 AM on November 8, 2022: contributor

It would make more sense to use the following code as it is uniform with the codebase but it breaks compatibility because passing a hash would require wrapping it between quotes.

The below code is more uniform agreed. The point of this was that this a fully backward compatible drop in implementation with a minimal impact on the codebase. There is an advantage to this method is that it becomes possible to throw not only on invalid hash but also independently on invalid height input.

Edit: Commit chain fixed thanks @sipa

Applied clang formatting to @aureleoules request to make the code more uniform.

russeree force-pushed on Nov 8, 2022

DrahtBot added the label RPC/REST/ZMQ on Nov 8, 2022

russeree closed this on Nov 8, 2022

russeree force-pushed on Nov 8, 2022

russeree deleted the branch on Nov 8, 2022

russeree restored the branch on Nov 8, 2022

sipa reopened this on Nov 8, 2022

DrahtBot commented at 4:02 PM on November 9, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept NACK	luke-jr, RandyMcMillan

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#26039 (refactor: Run type check against RPCArgs (1/2) by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

russeree force-pushed on Nov 10, 2022

maflcko commented at 8:19 AM on November 11, 2022: member

It is possible to run the test locally. This way you don't have to push and wait for CI.

fjahr commented at 9:14 PM on November 12, 2022: contributor

#19949 fixes this and should probably be merged first?

I agree, #19949 doesn't take getblock into account, I don't recall why, probably an oversight. This PR here could be a follow-up building on top of it implementing the needed further changes for getblock.

russeree force-pushed on Nov 12, 2022

russeree force-pushed on Nov 13, 2022

luke-jr commented at 10:58 PM on November 14, 2022: member

Concept NACK. See also more reasonable (yet still rejected) proposals #8457, #16439, #14858, #16345, #16317

russeree commented at 12:11 AM on November 15, 2022: contributor

Concept NACK. See also more reasonable (yet still rejected) proposals #8457, #16439, #14858, #16345, #16317

These are my thoughts about those particular listed PRs. I would disagree that they are all 'more reasonable' but none the less here are my thoughts.

Just curious as to why you would say an implementation such as #16317 is more reasonable since it clearely violates multiple suggestions outlined in the developer notes including the use of c_str and and atoi instead of parse ParseInt32. Then you have #14858. This solution introduces a whole new include for regex expressions and again uses stoi() and defeats the purpose of the guidelines calling for a ParseInt(n)

#16345 sure could be considered more reasonable since it adds a whole new RPC call as well as #16439 since it's very different solution and removes the ambiguity of using height as an argument. Also effects multiple RPCs. Both of these PRs actually create a new way defining the argument or create a new RPCs

Lastly #8457. This solution is quite close to the one in this PR. Why it's more reasonable other than it has a throw for invalid height and height out of range.

luke-jr commented at 1:44 AM on November 15, 2022: member

More reasonable mainly in the sense of using a Number parameter rather than a String.

But regardless, the main issue is the concept.

RandyMcMillan commented at 1:46 AM on November 15, 2022: contributor

I like @promag's argument here --> #8457 (comment)

    > I don't like overloading the argument, making it ambiguous.

Disagree, there is a place for overloading in APIs. In this case blocks are usually referenced by height or hash.

There is also overloading in the protocol: 4 lock_time uint32_t A time (Unix epoch time) or block number. See the locktime parsing rules.

It seems messy, and we don't do that anywhere else on the API.

From help importaddress: 1. "script" (string, required) The hex-encoded script (or address) either.

E.g. a long hash with no hexadecimal characters could be interpreted as a long number.

Unlikely? Still it's simple to distinguish that block hash from a number.

Though even then I'm not convinced it's a worthwhile addition.

From the API point of view, I think it's fair to fetch a block from it's height. From a technical point of view, making 2 calls causing locks on cs_main etc on a heavy loaded daemon is resource wasting.

So, +1 for getblock hash|height (verbose).

Originally posted by @promag in #8457 (comment)

russeree commented at 2:23 AM on November 15, 2022: contributor

Per the ambiguity argument in #8457

E.g. a long hash with no hexadecimal characters could be interpreted as a long number.

Unlikely? Still it's simple to distinguish that block hash from a number.

This is currently not even possible with the current implementation .23. This is because any attempt to input a block hash that is more or less than 64 characters will be met with an error -8, blockhash must be of length 64

Example getblock "001" doesn't attempt to get the hex padded version of 0x01 instead the user is presented with blockhash must be of length 64 (not 3, for '001')

This is because of the behavior implemented in ParseHashV. https://github.com/bitcoin/bitcoin/blob/48174c0f287b19931ca110670610bd03a03eb914/src/rpc/util.cpp#L97

The implementation of parsing the hash in getblock that was a part of #8457 was just a call of uint256 hash(uint256S(strHash)) as string input that was less than 64 chars would be parsed into a hash. example in #8457 if the user called getblock "001" it would parse that as `bitcoin-cli getblock "000000000000000000000000000000000000000000000000000000000000001"

Since the implementation of ParseHashV into getblock this is no longer possible because of the check for a length == 64 followed by a throw if that case is not met. As such the ambiguity arguments of a collision might need to be looked at again.

To the concern about the input for a block height not being a number.

More reasonable mainly in the sense of using a Number parameter rather than a String.

So if the block hash could be input as a HEX_STR and a height could be parsed as NUM then the idea could potentially become reasonable?

russeree force-pushed on Nov 17, 2022

russeree force-pushed on Nov 18, 2022

russeree commented at 7:10 AM on November 18, 2022: contributor

More reasonable mainly in the sense of using a Number parameter rather than a String.

But regardless, the main issue is the concept. @luke-jr a solution to the problem is just to create a switch that cases over the result of UNIVALUE::VType of request.params[0] my most recent commit supports not only string inputs of hash or height for getblock but an integer input as well. All while maintaining complete backwards compatibility.

So at minimum I hope these changes would class this solution into the same league as the 'more reasonable' ones.

Edit... taking it even further if it was desired any integer input could be parsed as a height where as any string could be processed as a hash.

russeree force-pushed on Nov 18, 2022

rpc::getblock<VNUM/VSTR>(hash or height) complete

added support for unitype VSTR and VNUM

rpc::getblock(int/string hash or height) complete

fix throw typo

a917ae2b30

russeree force-pushed on Nov 18, 2022

DrahtBot added the label Needs rebase on Jan 17, 2023

DrahtBot commented at 10:06 AM on January 17, 2023: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

RandyMcMillan commented at 7:01 PM on March 18, 2023: contributor

Concept NACK

After further consideration:

This change sets up the condition that getblock has to evaluate inputs:

getblock <hash> <int verbosity>

or

getblock <int> <int verbosity>

This seems error prone to me!

Using a nested command such as:

 bitcoin-cli getblock $(bitcoin-cli getblockhash 0)
 bitcoin-cli getblock $(bitcoin-cli getblockhash 0) 0
 bitcoin-cli getblock $(bitcoin-cli getblockhash 0) 1
 bitcoin-cli getblock $(bitcoin-cli getblockhash 0) 2

forces the user to handle the data query with intent.

The command is currently simple to use!

The added convenience of over loading this command doesn't out weigh potential problems it could create. [^note] [^future_nacks]

Additionally:

The getblock help output is simple to understand as is.

Arguments:
1. blockhash    (string, required) The block hash
2. verbosity    (numeric, optional, default=1) 0 for hex-encoded data, 1 for a JSON object, 2 for JSON object with transaction data, and 3 for JSON object with transaction data including prevout information for inputs

[^note]: Consider currently deployed tooling for jq json "<string>" vs <integer> quotation for example.

[^future_nacks]: I think this is a good argument on why any previous or future variation on this should never be implemented. IMO

russeree closed this on Mar 18, 2023

russeree commented at 7:44 PM on March 18, 2023: contributor

This was meant to be closed some time ago.

Though I don't agree completely in the case of a hash vs height overload since height and hashes do not and can not intersect ever. With that said, there are already other rpc calls that use this same method to get the block with a parseHashOrHeight

All and all, thanks for your input.

bitcoin locked this on Mar 17, 2024