Faster way to get block with prevouts in JSON-RPC #30495

issue vostrnad opened this issue on July 21, 2024
  1. vostrnad commented at 10:48 pm on July 21, 2024: none

    I often need to process the whole blockchain (or a large part of it) using an external script/program, for which I need blocks with prevout information included. However, the only current way to get that is getblock <hash> 3, which includes a lot of potentially unnecessary data and is quite slow, mainly (based on my experiments) because of UniValue overhead and descriptor inference.

    I benchmarked current master, retrieving 1000 blocks sequentially starting at block 840000, with different verbosity parameters:

    | benchmark              | result            |
    | ---------------------- | ----------------- |
    | getblock (verbosity=0) | 16.189s ± 1.165s  |
    | getblock (verbosity=1) | 31.975s ± 1.014s  |
    | getblock (verbosity=2) | 352.487s ± 1.636s |
    | getblock (verbosity=3) | 473.375s ± 2.280s |

    As you can see, verbosity=3 is around 30 times slower than verbosity=0. It seems obvious that a faster way of getting blocks with prevout information is feasible.

    Potential solutions that come to mind:

    • Creating a new RPC call for undo data, say getblockundo. This would be perfect for my needs, but it would require making the undo data serialization format non-internal (not sure if this would be a problem, as IIRC it hasn’t changed in many years).
    • Creating a new verbosity level for getblock that would only provide the minimum amount of data necessary (i.e. no addresses, descriptors, ASM scripts, TXIDs/WTXIDs etc.) while still providing prevouts. This would be better than nothing but would still leave a lot of performance on the table because of UniValue overhead.
  2. maflcko added the label RPC/REST/ZMQ on Jul 22, 2024
  3. maflcko added the label Block storage on Jul 22, 2024
  4. maflcko added the label Feature on Jul 22, 2024
  5. andrewtoth commented at 11:16 pm on July 30, 2024: contributor

    There are a few strategies to speed this up on the client side instead:

    • Fetch blocks concurrently
    • Fetch blocks in parallel
    • Fetch blocks in batch requests
    • A combination of all of the above

    Setting rpcthreads to a higher number than the default 4 will allow you to request more concurrently or in parallel as well.
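
    These client-side strategies can be combined in a short sketch. This is a hypothetical example, not code from the thread: the URL, credentials, batch size and worker count are assumptions to adapt. It relies only on bitcoind's standard JSON-RPC interface, which accepts HTTP basic auth and a JSON array as a batch request.

```python
import base64
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

# Assumed local node and credentials -- substitute your own rpcuser/rpcpassword.
RPC_URL = "http://127.0.0.1:8332/"

def auth_header(user, password):
    """HTTP headers for bitcoind's JSON-RPC server (basic auth)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}", "Content-Type": "application/json"}

def make_batch(method, params_list):
    """One JSON-RPC request object per params list, sent together as a batch."""
    return [
        {"jsonrpc": "1.0", "id": i, "method": method, "params": list(params)}
        for i, params in enumerate(params_list)
    ]

def post_batch(batch, headers):
    """POST one batch; the server replies with a list of result objects."""
    req = request.Request(RPC_URL, data=json.dumps(batch).encode(), headers=headers)
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

def fetch_blocks(block_hashes, headers, verbosity=3, batch_size=4, workers=4):
    """Combine batching and threading: split the hashes into batches and
    post the batches concurrently from a thread pool."""
    batches = [
        make_batch("getblock", [[h, verbosity] for h in block_hashes[i:i + batch_size]])
        for i in range(0, len(block_hashes), batch_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(lambda b: post_batch(b, headers), batches)
        return [result for page in pages for result in page]
```

    Usage would be e.g. fetch_blocks(hashes, auth_header("user", "pass")); raising rpcthreads on the node lets more of these batches be served in parallel.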

  6. maflcko commented at 11:02 am on August 8, 2024: member
    #30595 mentions “Traversing the block index as well and using block index entries for reading block and undo data.” However, it does not return JSON-RPC but a kernel_BlockUndo*/BlockUndo; the pull is also experimental, has no versioning, and has some other drawbacks. (Just mentioning it for context, because if you care about speed, this may be faster than JSON.)
  7. ismaelsadeeq commented at 5:11 pm on October 29, 2024: member

    I also noticed that using getblock sequentially on a large number of blocks was slow while checking for clusters of size > 2 in previously mined blocks; see #30079 (comment).

    To investigate further, I conducted a benchmark on a VPS with specs:

    • 8 vCPU Cores, 24 GB RAM, 1.2 TB SSD, 32 TB Traffic
    • Running Ubuntu 22 with Bitcoin Core on latest master da10e0bab4a3e98868dd663af02c43b1dc8b7f4a

    I used a script to retrieve 1000 blocks starting at block 840000, testing:

    • Verbosity levels 1, 2, and 3
    • Using Sequential and then Thread Pool strategies as @andrewtoth hinted
    • Running 3 iterations

    Benchmark Results

    Verbosity 1

    | Strategy    | Iteration 1 | Iteration 2 | Iteration 3 | Mean    | Standard Deviation |
    | ----------- | ----------- | ----------- | ----------- | ------- | ------------------ |
    | Sequential  | 202 sec     | 118 sec     | 119 sec     | 146 sec | 39 sec             |
    | Thread Pool | 51 sec      | 52 sec      | 54 sec      | 53 sec  | 1 sec              |

    Verbosity 2

    | Strategy    | Iteration 1 | Iteration 2 | Iteration 3 | Mean     | Standard Deviation |
    | ----------- | ----------- | ----------- | ----------- | -------- | ------------------ |
    | Sequential  | 5004 sec    | 3517 sec    | 4952 sec    | 4491 sec | 689 sec            |
    | Thread Pool | 1248 sec    | 1289 sec    | 1298 sec    | 1279 sec | 22 sec             |

    Verbosity 3

    | Strategy    | Iteration 1 | Iteration 2 | Iteration 3 | Mean     | Standard Deviation |
    | ----------- | ----------- | ----------- | ----------- | -------- | ------------------ |
    | Sequential  | 4145 sec    | 4175 sec    | 4187 sec    | 4169 sec | 18 sec             |
    | Thread Pool | 1591 sec    | 1564 sec    | 1587 sec    | 1581 sec | 12 sec             |

    The benchmark results showed a ~27.4% reduction in execution time when using parallel threading, which confirms the potential of client-side threading to improve speed. However, further performance gains would benefit users that need large block sets for data analysis, e.g. the whole blockchain.


    I reviewed the getblock RPC implementation and noticed that all resources were moved when calling UniValue’s pushKV, which was nice; pushKV also moves the values internally. In getblock, the pushes to UniValue that were not moved explicitly were still moved implicitly as rvalue temporaries.

    However, I noticed that space for the block transactions in UniValue was not reserved initially, and appending data individually was likely causing resource reallocation overhead.

    Adding a .reserve member function to UniValue can prevent this issue. I added the function and benchmarked to see if there was a performance improvement. The results showed reduced mean times, particularly for verbosity levels 1 and 2.

    Benchmark with UniValue Reservation

    | Verbosity | Strategy   | Iteration 1 | Iteration 2 | Iteration 3 | Mean     | Standard Deviation |
    | --------- | ---------- | ----------- | ----------- | ----------- | -------- | ------------------ |
    | 1         | Sequential | 122 sec     | 105 sec     | 107 sec     | 111 sec  | 7 sec              |
    | 2         | Sequential | 3241 sec    | 3272 sec    | 3267 sec    | 3260 sec | 14 sec             |
    | 3         | Sequential | 4089 sec    | 4213 sec    | 4202 sec    | 4168 sec | 56 sec             |
  8. andrewtoth commented at 5:22 pm on October 29, 2024: contributor

    @ismaelsadeeq nice find!

    I wonder, could you also benchmark batch requests? I.e. sending a single request that contains as many getblock requests as there are rpcthreads, both sequentially and multithreaded on the client side?

  9. josibake commented at 9:00 am on November 4, 2024: member

    I think there are two separate topics here:

    1. “I need to process the entire blockchain for [an external application like electrs, data analysis, etc]”
    2. We can probably make the JSON-RPC faster, via threading, batching, etc

    For 1., @vostrnad have you seen #30595? For the specific ask of prevouts, I’m almost certain this will always be faster, since the kernel API provides the prevouts by reading the rev.dat files (admittedly, I haven’t looked into how this is done with the getblock rpc; it might also be doing the same).

    Here is an example program I wrote using the kernel API via rust bindings: https://github.com/josibake/silent-payments-scanner/blob/74f883c370a26e2eaa5a1a7e8e18643e07ce2cff/src/scanner.rs#L135

    I found this very easy to write and incredibly performant. The nice thing about using the kernel API for this is you can use whatever language you want (so long as that language supports C-bindings), and it does not require a running bitcoind process to be able to process the block files, which seems well suited to the data analysis / index building use case.

    For experimenting / testing the API, there is https://github.com/theCharlatan/rust-bitcoinkernel, and I’ve also been meaning to create some python bindings, as well. If this is of interest to you, I’d be happy to explain more and of course would love your feedback on the C API PR.

  10. ismaelsadeeq commented at 8:58 pm on December 20, 2024: member

    Thank you, @josibake, for highlighting this! I ran some benchmarks to evaluate the kernel API approach you suggested, and as you claimed, it is indeed more performant.

    Benchmark Results:

    I used the libbitcoinkernel library to replicate extracting block data for the same interval, block heights 840000 to 841000. The average execution times are as follows:

    1. Rust bindings: ~87 seconds, tested using https://github.com/ismaelsadeeq/rs-blockparser

    2. Python bindings: ~612 seconds, tested using https://github.com/ismaelsadeeq/py-blockparser

    For the Python bindings, I suspect the inefficiency arises from the deserialization process handled by https://github.com/petertodd/python-bitcoinlib because without the deserialization, the execution time drops significantly to around 62 seconds, which is much closer to the Rust result.

    The block data returned by these methods is equivalent to the getblock RPC with verbosity level 2.
    For verbosity level 3 (undo data alone), you can parse the undo files directly. Given the benchmark results, this approach is most likely faster than using the getblock RPC.
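
    As a rough illustration of parsing the undo files directly, here is a hedged sketch (not from the thread). It assumes rev*.dat records use the same network-magic + 4-byte little-endian length framing as blk*.dat files, with each undo payload followed by a 32-byte double-SHA256 checksum (skipped via the trailer parameter). Decoding the CBlockUndo payload itself (per-transaction vectors of compressed spent coins) is not attempted here.

```python
import struct
from pathlib import Path

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # mainnet network message start bytes

def read_records(path, magic=MAINNET_MAGIC, trailer=0):
    """Yield the raw payload of each record in a magic + length framed file.

    blk*.dat files use trailer=0; for rev*.dat pass trailer=32 to skip the
    per-record double-SHA256 checksum assumed to follow each undo payload.
    """
    data = Path(path).read_bytes()
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != magic:
            break  # files are preallocated; zero padding marks the end
        (size,) = struct.unpack_from("<I", data, pos + 4)
        yield data[pos + 8:pos + 8 + size]
        pos += 8 + size + trailer
```

    Since this on-disk format is internal and unversioned, any such parser should be treated as best-effort and validated against getblock output.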

    But this approach has some downsides, I think:

    1. Data Directory Access: The libbitcoinkernel approach requires bitcoind to be shut down, as multiple clients cannot access the datadir simultaneously.
    2. Sequential Process: This approach is sequential, i.e. only one parser can run at a time, whereas RPC calls allow asynchronous execution, enabling multiple clients to access the RPC interface simultaneously while bitcoind is still running.

    The language bindings (Rust and Python) make it straightforward to build blockchain parsers and other applications, which is a significant advantage. However, this should not deter us from improving the performance of RPC calls, as they remain a widely used interface for clients. Any work on optimizing performance, such as #31490, #31539 and #31179, would benefit a broader range of users.

    I think this result is convincing enough to close this issue @vostrnad

  11. romanz commented at 2:26 pm on December 21, 2024: contributor
    • Creating a new RPC call for undo data, say getblockundo. This would be perfect for my needs, but it would require making the undo data serialization format non-internal (not sure if this would be a problem, as IIRC it hasn’t changed in many years).

    By adding a new REST endpoint for fetching block prevouts, it seems that we can get quite a good throughput rate when reading the data concurrently in binary format (tested with ab by fetching a single block 10k times over 4 concurrent connections):

     $ ab -k -c 4 -n 10000 http://localhost:8332/rest/block/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
     ...
     Document Path:          /rest/block/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
     Document Length:        2325617 bytes

     Concurrency Level:      4
     Time taken for tests:   18.742 seconds
     Complete requests:      10000
     Failed requests:        0
     Keep-Alive requests:    10000
     Total transferred:      23257250000 bytes
     HTML transferred:       23256170000 bytes
     Requests per second:    533.56 [#/sec] (mean)
     Time per request:       7.497 [ms] (mean)
     Time per request:       1.874 [ms] (mean, across all concurrent requests)
     Transfer rate:          1211837.00 [Kbytes/sec] received

     Connection Times (ms)
                   min  mean[+/-sd] median   max
     Connect:        0    0   0.0      0       0
     Processing:     4    7   1.6      8      20
     Waiting:        2    4   0.9      4      14
     Total:          4    7   1.6      8      20

     Percentage of the requests served within a certain time (ms)
       50%      8
       66%      8
       75%      9
       80%      9
       90%     10
       95%     10
       98%     11
       99%     11
      100%     20 (longest request)


     $ ab -k -c 4 -n 10000 http://localhost:8332/rest/spentoutputs/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
     ...
     Document Path:          /rest/spentoutputs/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
     Document Length:        151898 bytes

     Concurrency Level:      4
     Time taken for tests:   4.804 seconds
     Complete requests:      10000
     Failed requests:        0
     Keep-Alive requests:    10000
     Total transferred:      1520050000 bytes
     HTML transferred:       1518980000 bytes
     Requests per second:    2081.80 [#/sec] (mean)
     Time per request:       1.921 [ms] (mean)
     Time per request:       0.480 [ms] (mean, across all concurrent requests)
     Transfer rate:          309027.38 [Kbytes/sec] received

     Connection Times (ms)
                   min  mean[+/-sd] median   max
     Connect:        0    0   0.0      0       0
     Processing:     2    2   0.3      2      11
     Waiting:        2    2   0.2      2      10
     Total:          2    2   0.3      2      11

     Percentage of the requests served within a certain time (ms)
       50%      2
       66%      2
       75%      2
       80%      2
       90%      2
       95%      2
       98%      2
       99%      3
      100%     11 (longest request)
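
    For clients, fetching these binary resources needs no JSON parsing at all. A minimal sketch: the /rest/block/<hash>.bin endpoint exists today when bitcoind runs with -rest=1, while /rest/spentoutputs is the endpoint proposed here and is not in a released Bitcoin Core.

```python
from urllib import request

REST_BASE = "http://127.0.0.1:8332"  # assumes a local bitcoind started with -rest=1

def rest_url(endpoint, block_hash):
    """Build the binary REST URL for one block; REST needs no authentication."""
    return f"{REST_BASE}/rest/{endpoint}/{block_hash}.bin"

def fetch_binary(endpoint, block_hash):
    """GET the raw serialized bytes for one block-level resource."""
    with request.urlopen(rest_url(endpoint, block_hash)) as resp:
        return resp.read()

h = "0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5"
# raw_block = fetch_binary("block", h)        # existing binary block endpoint
# prevouts = fetch_binary("spentoutputs", h)  # proposed endpoint from this comment
```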
    

    @vostrnad WDYT?


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me