Faster way to get block with prevouts in JSON-RPC #30495

vostrnad opened this issue on July 21, 2024
  1. vostrnad commented at 10:48 pm on July 21, 2024: none

    I often need to process the whole blockchain (or a large part of it) using an external script/program, for which I need blocks with prevout information included. However, the only current way to get that is getblock <hash> 3, which includes a lot of potentially unnecessary data and is quite slow, mainly (based on my experiments) because of UniValue overhead and descriptor inferring.
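    For context, this is roughly what a client has to do with verbosity=3 output today. The following is a minimal Python sketch. It assumes the block was decoded with json.loads and that every non-coinbase input carries a prevout object with value and scriptPubKey.hex fields, as described in `bitcoin-cli help getblock` for recent versions (check your node's help text). iter_prevouts is a hypothetical helper name:

```python
from typing import Any, Iterator


def iter_prevouts(block: dict[str, Any]) -> Iterator[tuple[str, int, float, str]]:
    """Yield (spending txid, input index, prevout value in BTC, prevout scriptPubKey hex)
    for every non-coinbase input of a `getblock <hash> 3` result."""
    for tx in block["tx"]:
        for index, vin in enumerate(tx["vin"]):
            prevout = vin.get("prevout")
            if prevout is None:
                # Coinbase inputs spend no previous output, so they have no prevout.
                continue
            yield tx["txid"], index, prevout["value"], prevout["scriptPubKey"]["hex"]
```

    Everything else in the verbosity=3 response (addresses, descriptors, ASM, etc.) is generated by the node and then discarded by a loop like this one.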

    I benchmarked current master, retrieving 1000 blocks sequentially starting at block 840000, with different verbosity parameters:

    Benchmark                 Result
    getblock (verbosity=0)    16.189s ± 1.165s
    getblock (verbosity=1)    31.975s ± 1.014s
    getblock (verbosity=2)    352.487s ± 1.636s
    getblock (verbosity=3)    473.375s ± 2.280s

    As you can see, verbosity=3 is around 30 times slower than verbosity=0. It seems obvious that a faster way of getting blocks with prevout information is feasible.

    Potential solutions that come to mind:

    • Creating a new RPC call for undo data, say getblockundo. This would be perfect for my needs, but it would require making the undo data serialization format non-internal (not sure if this would be a problem, as IIRC it hasn’t changed in many years).
    • Creating a new verbosity level for getblock that would only provide the minimum amount of data necessary (i.e. no addresses, descriptors, ASM scripts, TXIDs/WTXIDs etc.) while still providing prevouts. This would be better than nothing but would still leave a lot of performance on the table because of UniValue overhead.
  2. maflcko added the label RPC/REST/ZMQ on Jul 22, 2024
  3. maflcko added the label Block storage on Jul 22, 2024
  4. maflcko added the label Feature on Jul 22, 2024
  5. andrewtoth commented at 11:16 pm on July 30, 2024: contributor

    There are a few strategies to speed this up on the client side instead:

    • Fetch blocks concurrently
    • Fetch blocks in parallel
    • Fetch blocks in batch requests
    • A combination of all of the above

    Setting rpcthreads higher than the default of 4 will also let you make more requests concurrently or in parallel.
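    A minimal Python sketch of the thread-pool strategy, using only the standard library. The RPC URL, credentials, and the helper names make_rpc_call and fetch_blocks are placeholders/hypothetical; "jsonrpc": "1.0" is used because that is what bitcoin-cli sends. The default of 4 workers simply mirrors the default rpcthreads value mentioned above:

```python
import base64
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Signature of a single JSON-RPC call: (method, params) -> result.
RpcCall = Callable[[str, list], Any]


def make_rpc_call(url: str, user: str, password: str) -> RpcCall:
    """Return a function that sends one JSON-RPC request to bitcoind over HTTP."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Content-Type": "application/json"}

    def call(method: str, params: list) -> Any:
        body = json.dumps({"jsonrpc": "1.0", "id": method, "method": method, "params": params})
        request = urllib.request.Request(url, data=body.encode(), headers=headers)
        # urlopen raises HTTPError if bitcoind answers with a non-200 status.
        with urllib.request.urlopen(request) as response:
            reply = json.load(response)
        if reply.get("error"):
            raise RuntimeError(reply["error"])
        return reply["result"]

    return call


def fetch_blocks(call: RpcCall, heights: range, verbosity: int = 3, workers: int = 4) -> list:
    """Fetch a range of blocks concurrently; the result list is in height order."""

    def fetch_one(height: int) -> Any:
        block_hash = call("getblockhash", [height])
        return call("getblock", [block_hash, verbosity])

    # pool.map() runs requests on `workers` threads but yields results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, heights))
```

    Usage would look like `fetch_blocks(make_rpc_call("http://127.0.0.1:8332/", "user", "pass"), range(840000, 841000), workers=8)` after raising rpcthreads on the node. Threads suit the I/O-bound "concurrent" strategy; if client-side JSON parsing becomes the bottleneck, a ProcessPoolExecutor may fit the "parallel" strategy better (a design choice, not something measured in this thread). Note that this collects every block in memory before returning.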

  6. maflcko commented at 11:02 am on August 8, 2024: member
    #30595 mentions “Traversing the block index as well and using block index entries for reading block and undo data.” However, it does not return JSON-RPC output but a kernel_BlockUndo*/BlockUndo; also, the pull is experimental, has no versioning, and has some other drawbacks. (Just mentioning it for context: if you care about speed, this may be faster than JSON.)
  7. ismaelsadeeq commented at 5:11 pm on October 29, 2024: member

    I also noticed that calling getblock sequentially on a large number of blocks was slow while checking previously mined blocks for clusters of size > 2 (see #30079 (comment)).

    To investigate further, I conducted a benchmark on a VPS with specs:

    • 8 vCPU Cores, 24 GB RAM, 1.2 TB SSD, 32 TB Traffic
    • Running Ubuntu 22 with Bitcoin Core on latest master da10e0bab4a3e98868dd663af02c43b1dc8b7f4a

    I used a script to retrieve 1000 blocks starting at block 840000, testing:

    • Verbosity levels 1, 2, and 3
    • Using a sequential strategy and then a thread pool strategy, as @andrewtoth suggested
    • Running 3 iterations

    Benchmark Results

    Verbosity 1

    Strategy      Iteration 1   Iteration 2   Iteration 3   Mean       Standard Deviation
    Sequential    202 sec       118 sec       119 sec       146 sec    39 sec
    Thread Pool   51 sec        52 sec        54 sec        53 sec     1 sec

    Verbosity 2

    Strategy      Iteration 1   Iteration 2   Iteration 3   Mean       Standard Deviation
    Sequential    5004 sec      3517 sec      4952 sec      4491 sec   689 sec
    Thread Pool   1248 sec      1289 sec      1298 sec      1279 sec   22 sec

    Verbosity 3

    Strategy      Iteration 1   Iteration 2   Iteration 3   Mean       Standard Deviation
    Sequential    4145 sec      4175 sec      4187 sec      4169 sec   18 sec
    Thread Pool   1591 sec      1564 sec      1587 sec      1581 sec   12 sec

    The benchmark results showed a large reduction in execution time with the thread pool: based on the mean times above, roughly 62% to 72% less time (about 2.6x to 3.5x faster), depending on verbosity. This confirms that client-side threading can improve speed. However, further performance gains would still benefit users who need large block ranges for data analysis, e.g. the whole blockchain.
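    To double-check the summary figures, here is a small Python sketch that recomputes the means, standard deviations, and thread pool speedups from the per-iteration times in the tables. Using the population (not sample) standard deviation is an assumption on my part; it does match the reported Standard Deviation column once rounded. The means are recomputed from rounded timings, so they can differ from the Mean column by about a second:

```python
from statistics import mean, pstdev

# Per-iteration times in seconds, copied from the tables above.
results = {
    1: {"Sequential": [202, 118, 119], "Thread Pool": [51, 52, 54]},
    2: {"Sequential": [5004, 3517, 4952], "Thread Pool": [1248, 1289, 1298]},
    3: {"Sequential": [4145, 4175, 4187], "Thread Pool": [1591, 1564, 1587]},
}

for verbosity, runs in results.items():
    seq_mean = mean(runs["Sequential"])
    pool_mean = mean(runs["Thread Pool"])
    # pstdev = population standard deviation over the three iterations.
    spread = pstdev(runs["Sequential"]), pstdev(runs["Thread Pool"])
    reduction = 1 - pool_mean / seq_mean
    print(
        f"verbosity={verbosity}: sd={spread[0]:.0f}/{spread[1]:.0f} sec, "
        f"{seq_mean / pool_mean:.2f}x speedup, {reduction:.1%} less time"
    )
```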


    I reviewed the getblock RPC implementation and noticed that values are moved when calling UniValue’s pushKV, which is nice, and pushKV itself also moves the values internally. In getblock, the pushes to UniValue that are not moved explicitly avoid copies anyway, thanks to copy elision.

    Edit: However, I noticed that space for the block’s transactions in the UniValue array is not reserved up front, so appending them one at a time likely causes reallocation overhead.

    Adding a .reserve member function to UniValue can prevent this. I added the function and benchmarked it to see if there was a performance improvement. The results showed slightly reduced mean times, particularly for verbosity level 1.

  8. andrewtoth commented at 5:22 pm on October 29, 2024: contributor

    @ismaelsadeeq nice find!

    I wonder, could you also benchmark batch requests? That is, sending a single request that contains rpcthreads getblock requests, with the client sending such batches both sequentially and from multiple threads?
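    A minimal Python sketch of how such batches could be built. It assumes the same HTTP endpoint and Basic auth as a single call; chunked, build_batch, encode_batch and results_from_batch are hypothetical helpers. A JSON-RPC batch is just a JSON array of ordinary request objects, and each encoded body would be POSTed to the RPC URL, either one batch at a time or from several client threads:

```python
import json
from typing import Any


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split block hashes into groups of `size` (e.g. the node's rpcthreads value)."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def build_batch(block_hashes: list[str], verbosity: int = 3) -> list[dict[str, Any]]:
    """One request object per hash; the ids let us match replies afterwards."""
    return [
        {"jsonrpc": "1.0", "id": i, "method": "getblock", "params": [h, verbosity]}
        for i, h in enumerate(block_hashes)
    ]


def encode_batch(block_hashes: list[str], verbosity: int = 3) -> bytes:
    """HTTP POST body for one batch request."""
    return json.dumps(build_batch(block_hashes, verbosity)).encode()


def results_from_batch(replies: list[dict[str, Any]]) -> list[Any]:
    """Match replies to requests by id, since the JSON-RPC spec does not fix reply order."""
    ordered = sorted(replies, key=lambda reply: reply["id"])
    for reply in ordered:
        if reply.get("error"):
            raise RuntimeError(reply["error"])
    return [reply["result"] for reply in ordered]
```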

  9. josibake commented at 9:00 am on November 4, 2024: member

    I think there are two separate topics here:

    1. “I need to process the entire blockchain for [an external application like electrs, data analysis, etc]”
    2. We can probably make the JSON-RPC faster, via threading, batching, etc

    For 1., @vostrnad have you seen #30595? For the specific ask of prevouts, I’m almost certain this will always be faster, since the kernel API provides the prevouts by reading the rev.dat files (admittedly, I haven’t looked into how the getblock RPC does this; it might be doing the same).

    Here is an example program I wrote using the kernel API via rust bindings: https://github.com/josibake/silent-payments-scanner/blob/74f883c370a26e2eaa5a1a7e8e18643e07ce2cff/src/scanner.rs#L135

    I found this very easy to write and incredibly performant. The nice thing about using the kernel API for this is you can use whatever language you want (so long as that language supports C-bindings), and it does not require a running bitcoind process to be able to process the block files, which seems well suited to the data analysis / index building use case.

    For experimenting / testing the API, there is https://github.com/theCharlatan/rust-bitcoinkernel, and I’ve been meaning to create some Python bindings as well. If this is of interest to you, I’d be happy to explain more and of course would love your feedback on the C API PR.

  10. ismaelsadeeq commented at 8:58 pm on December 20, 2024: member

    Thank you, @josibake, for highlighting this! I ran some benchmarks to evaluate the performance you described.

    As you claimed, this is indeed more performant.

    Benchmark Results:

    I used the libbitcoinkernel library to replicate extracting block data for the same block height range, 840000 to 841000. The average execution times are as follows:

    1. Rust bindings: ~87 seconds, tested using https://github.com/ismaelsadeeq/rs-blockparser

    2. Python bindings: ~612 seconds, tested using https://github.com/ismaelsadeeq/py-blockparser

    For the Python bindings, I suspect the inefficiency arises from the deserialization process handled by https://github.com/petertodd/python-bitcoinlib, because without the deserialization the execution time drops significantly to around 62 seconds, which is much closer to the Rust result.

    The block data returned by these methods is equivalent to the getblock RPC with verbosity level 2.
    For verbosity level 3 (which adds the undo data), you can parse the undo files directly. Given the benchmark results, this approach is most likely faster than using the getblock RPC.
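    To give an idea of what parsing undo data involves, here is a partial Python sketch of decoding a single spent output (Coin). The layout is assumed from my reading of Bitcoin Core's serialize.h (VARINT), compressor.cpp (amount and script compression) and undo.h (TxInUndoFormatter), and should be verified against the source. It assumes you already hold raw serialized CTxUndo bytes, i.e. the rev*.dat record framing and any block-file obfuscation used by newer versions are handled elsewhere; a full parser would also loop over the CompactSize-prefixed vectors of CBlockUndo and CTxUndo. Special compressed scripts are not expanded, and the helper names are hypothetical:

```python
# Layout assumed from Bitcoin Core's serialize.h (VARINT), compressor.cpp and undo.h.
SPECIAL_SCRIPT_SIZES = {0: 20, 1: 20, 2: 32, 3: 32, 4: 32, 5: 32}


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one VARINT (base-128, MSB first, +1 per continuation); return (value, next_pos)."""
    n = 0
    while True:
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return n, pos
        n += 1


def write_varint(n: int) -> bytes:
    """Inverse of read_varint; only used here to build test data."""
    out = [n & 0x7F]
    while n > 0x7F:
        n = (n >> 7) - 1
        out.append((n & 0x7F) | 0x80)
    return bytes(reversed(out))


def decompress_amount(x: int) -> int:
    """Undo the amount compression used for stored coins; returns satoshis."""
    if x == 0:
        return 0
    x -= 1
    e = x % 10  # number of trailing decimal zeros
    x //= 10
    if e < 9:
        d = x % 9 + 1  # last non-zero digit
        n = (x // 9) * 10 + d
    else:
        n = x + 1
    return n * 10**e


def parse_undo_coin(data: bytes, pos: int = 0) -> tuple[dict, int]:
    """Parse one spent output (Coin) from serialized CTxUndo bytes; return (coin, next_pos)."""
    code, pos = read_varint(data, pos)
    height, coinbase = code >> 1, bool(code & 1)
    if height > 0:
        _legacy_version, pos = read_varint(data, pos)  # always written as 0
    amount, pos = read_varint(data, pos)
    script_code, pos = read_varint(data, pos)
    if script_code in SPECIAL_SCRIPT_SIZES:
        # Compressed P2PKH/P2SH/P2PK template: payload kept as-is, not expanded.
        script_len = SPECIAL_SCRIPT_SIZES[script_code]
    else:
        script_len = script_code - len(SPECIAL_SCRIPT_SIZES)  # raw script follows
    script = data[pos : pos + script_len]
    coin = {"height": height, "coinbase": coinbase,
            "amount": decompress_amount(amount), "script_code": script_code, "script": script}
    return coin, pos + script_len
```

    The compact encoding is part of why this is fast to read: for example, an amount of exactly 1 BTC is stored as the single VARINT byte 0x09.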

    But I think this approach has some downsides:

    1. Data Directory Access: The libbitcoinkernel approach requires bitcoind to be shut down, as multiple clients cannot access the datadir simultaneously.
    2. Sequential Process: This is sequential, i.e. only one parser can run at a time, whereas RPC calls allow concurrent execution, enabling multiple clients to access the RPC interface simultaneously while bitcoind is still running.

    The language bindings (Rust and Python) make it straightforward to build blockchain parsers and other applications, which is a significant advantage. However, this should not deter us from improving the performance of RPC calls, as they remain a widely used interface for clients. Any performance optimizations, such as #31490, #31539 and #31179, would benefit a broader range of users.

    I think this result is convincing enough to close this issue @vostrnad

  11. romanz commented at 2:26 pm on December 21, 2024: contributor
    • Creating a new RPC call for undo data, say getblockundo. This would be perfect for my needs, but it would require making the undo data serialization format non-internal (not sure if this would be a problem, as IIRC it hasn’t changed in many years).

    By adding a new REST endpoint for fetching block prevouts, it seems that we can get quite a good throughput rate when reading the data concurrently in binary format (tested with ab by fetching a single block 10k times using 4 concurrent connections):

    $ ab -k -c 4 -n 10000 http://localhost:8332/rest/block/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
    ...
    Document Path:          /rest/block/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
    Document Length:        2325617 bytes

    Concurrency Level:      4
    Time taken for tests:   18.742 seconds
    Complete requests:      10000
    Failed requests:        0
    Keep-Alive requests:    10000
    Total transferred:      23257250000 bytes
    HTML transferred:       23256170000 bytes
    Requests per second:    533.56 [#/sec] (mean)
    Time per request:       7.497 [ms] (mean)
    Time per request:       1.874 [ms] (mean, across all concurrent requests)
    Transfer rate:          1211837.00 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.0      0       0
    Processing:     4    7   1.6      8      20
    Waiting:        2    4   0.9      4      14
    Total:          4    7   1.6      8      20

    Percentage of the requests served within a certain time (ms)
      50%      8
      66%      8
      75%      9
      80%      9
      90%     10
      95%     10
      98%     11
      99%     11
     100%     20 (longest request)


    $ ab -k -c 4 -n 10000 http://localhost:8332/rest/spentoutputs/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
    ...
    Document Path:          /rest/spentoutputs/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
    Document Length:        151898 bytes

    Concurrency Level:      4
    Time taken for tests:   4.804 seconds
    Complete requests:      10000
    Failed requests:        0
    Keep-Alive requests:    10000
    Total transferred:      1520050000 bytes
    HTML transferred:       1518980000 bytes
    Requests per second:    2081.80 [#/sec] (mean)
    Time per request:       1.921 [ms] (mean)
    Time per request:       0.480 [ms] (mean, across all concurrent requests)
    Transfer rate:          309027.38 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.0      0       0
    Processing:     2    2   0.3      2      11
    Waiting:        2    2   0.2      2      10
    Total:          2    2   0.3      2      11

    Percentage of the requests served within a certain time (ms)
      50%      2
      66%      2
      75%      2
      80%      2
      90%      2
      95%      2
      98%      2
      99%      3
     100%     11 (longest request)
    

    @vostrnad WDYT?
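    The existing /rest/block/<hash>.bin endpoint can already be consumed concurrently from a client, much like the ab run above. A minimal Python sketch, assuming the node runs with -rest enabled on localhost:8332 (REST_URL is a placeholder) and using hypothetical helper names. It checks that the first 80 bytes of each response (the block header) hash to the requested block hash. The proposed /rest/spentoutputs/ endpoint is not used here, since its binary format is only defined by the proposal:

```python
import hashlib
import urllib.request
from concurrent.futures import ThreadPoolExecutor

REST_URL = "http://localhost:8332/rest"  # placeholder; assumes the node runs with -rest=1


def block_hash_from_raw(raw_block: bytes) -> str:
    """Double SHA-256 of the 80-byte header, shown in the usual byte-reversed hex."""
    digest = hashlib.sha256(hashlib.sha256(raw_block[:80]).digest()).digest()
    return digest[::-1].hex()


def fetch_raw_block(block_hash: str) -> bytes:
    """GET /rest/block/<hash>.bin and check that the returned header matches."""
    with urllib.request.urlopen(f"{REST_URL}/block/{block_hash}.bin") as response:
        raw = response.read()
    if block_hash_from_raw(raw) != block_hash:
        raise ValueError(f"header mismatch for {block_hash}")
    return raw


def fetch_raw_blocks(block_hashes: list[str], workers: int = 4) -> list[bytes]:
    """Fetch several raw blocks over `workers` concurrent connections, in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_raw_block, block_hashes))
```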

  12. shivaenigma commented at 4:18 pm on January 13, 2025: none

    I am also interested in this. Even getrawblock responses take 2-5 seconds depending on block size, which is very high. A few low-hanging fruits that could be implemented:

    • Support binding to unix domain sockets
    • Support Gzip compression
  13. pinheadmz commented at 4:19 pm on January 13, 2025: member

    Support binding to unix domain sockets

    This is a common request, but requires replacing libevent, which I’m working on: #31194


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 06:12 UTC
