Faster way to get block with prevouts in JSON-RPC #30495

issue vostrnad opened this issue on July 21, 2024
  1. vostrnad commented at 10:48 pm on July 21, 2024: none

    I often need to process the whole blockchain (or a large part of it) using an external script/program, for which I need blocks with prevout information included. However, the only current way to get that is getblock <hash> 3, which includes a lot of potentially unnecessary data and is quite slow, mainly (based on my experiments) because of UniValue overhead and descriptor inferring.

    I benchmarked current master, retrieving 1000 blocks sequentially starting at block 840000, with different verbosity parameters:

    benchmark                 result
    getblock (verbosity=0)    16.189s ± 1.165s
    getblock (verbosity=1)    31.975s ± 1.014s
    getblock (verbosity=2)    352.487s ± 1.636s
    getblock (verbosity=3)    473.375s ± 2.280s

    As you can see, verbosity=3 is around 30 times slower than verbosity=0. It seems obvious that a faster way of getting blocks with prevout information is feasible.
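
    A minimal sketch of how such a sequential measurement could be scripted. This is not the script used for the numbers above; the URL, credentials, and helper names (rpc_payload, rpc_call, time_sequential, summarize) are assumptions for illustration, and it also calls getblockhash once per block, which the original benchmark may not have done.

```python
import base64
import json
import statistics
import time
import urllib.request

# Assumed connection details; adjust to your own node configuration.
RPC_URL = "http://127.0.0.1:8332"
RPC_USER = "user"
RPC_PASSWORD = "password"


def rpc_payload(method, params, request_id=0):
    """Build a JSON-RPC request body as bytes."""
    body = {"jsonrpc": "1.0", "id": request_id, "method": method, "params": params}
    return json.dumps(body).encode()


def rpc_call(method, params):
    """POST one JSON-RPC request to the node and return its "result" field."""
    token = base64.b64encode(f"{RPC_USER}:{RPC_PASSWORD}".encode()).decode()
    request = urllib.request.Request(
        RPC_URL,
        data=rpc_payload(method, params),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]


def time_sequential(start_height, count, verbosity, call=rpc_call):
    """Fetch `count` consecutive blocks one at a time; return elapsed seconds."""
    started = time.perf_counter()
    for height in range(start_height, start_height + count):
        block_hash = call("getblockhash", [height])
        call("getblock", [block_hash, verbosity])
    return time.perf_counter() - started


def summarize(samples):
    """Return (mean, sample standard deviation) of repeated timings."""
    return statistics.mean(samples), statistics.stdev(samples)
```

    Calling time_sequential(840000, 1000, 3) several times and passing the results to summarize would give a mean ± deviation figure comparable in shape to the table above.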

    Potential solutions that come to mind:

    • Creating a new RPC call for undo data, say getblockundo. This would be perfect for my needs, but it would require making the undo data serialization format non-internal (not sure if this would be a problem, as IIRC it hasn’t changed in many years).
    • Creating a new verbosity level for getblock that would only provide the minimum amount of data necessary (i.e. no addresses, descriptors, ASM scripts, TXIDs/WTXIDs etc.) while still providing prevouts. This would be better than nothing but would still leave a lot of performance on the table because of UniValue overhead.
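
    To make the "minimum amount of data" idea concrete, here is a hedged client-side sketch that reduces a getblock <hash> 3 result to just the prevout fields. The function name minimal_prevouts and the output key script_hex are hypothetical. It does not make the RPC any faster, since the node still builds the full response; it only shows the shape of data such a call or verbosity level might return.

```python
def minimal_prevouts(block):
    """Reduce a verbosity=3 getblock result to the prevout data of each input.

    Returns one list per transaction, in block order. Coinbase inputs have no
    prevout, so the coinbase transaction yields an empty list.
    """
    slim = []
    for tx in block["tx"]:
        inputs = []
        for vin in tx["vin"]:
            prevout = vin.get("prevout")
            if prevout is None:
                continue  # coinbase input: nothing was spent
            inputs.append({
                "generated": prevout["generated"],  # spent output was a coinbase output
                "height": prevout["height"],        # height at which the spent output was created
                "value": prevout["value"],
                "script_hex": prevout["scriptPubKey"]["hex"],
            })
        slim.append(inputs)
    return slim
```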
  2. maflcko added the label RPC/REST/ZMQ on Jul 22, 2024
  3. maflcko added the label Block storage on Jul 22, 2024
  4. maflcko added the label Feature on Jul 22, 2024
  5. andrewtoth commented at 11:16 pm on July 30, 2024: contributor

    There are a few strategies to speed this up on the client side instead:

    • Fetch blocks concurrently
    • Fetch blocks in parallel
    • Fetch blocks in batch requests
    • A combination of all of the above

    Setting rpcthreads to a higher number than the default 4 will allow you to request more concurrently or in parallel as well.
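
    A minimal sketch of the thread-based strategy, assuming the caller supplies a fetch_block callable (hypothetical) that performs the getblock request for one height. The default of 4 workers simply mirrors the default rpcthreads value mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_blocks_threaded(heights, fetch_block, max_workers=4):
    """Fetch blocks with a pool of worker threads.

    `fetch_block` is a caller-provided function taking a height and returning
    the block. Results come back in the same order as `heights`, because
    Executor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_block, heights))
```

    Keeping max_workers close to the node's rpcthreads setting is a reasonable starting point; whether more workers help is something to measure rather than assume.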

  6. maflcko commented at 11:02 am on August 8, 2024: member
    #30595 mentions “Traversing the block index as well and using block index entries for reading block and undo data.” However, it does not return JSON-RPC but a kernel_BlockUndo*/BlockUndo. The pull is also experimental, doesn’t have versioning, and has some other drawbacks. (Just mentioning it for context, because if you care about speed, this may be faster than JSON.)
  7. ismaelsadeeq commented at 5:11 pm on October 29, 2024: member

    I also noticed that using getblock sequentially on a large number of blocks was slow while checking for clusters of size > 2 in previously mined blocks; see #30079 (comment).

    To investigate further, I conducted a benchmark on a VPS with specs:

    • 8 vCPU Cores, 24 GB RAM, 1.2 TB SSD, 32 TB Traffic
    • Running Ubuntu 22 with Bitcoin Core on latest master da10e0bab4a3e98868dd663af02c43b1dc8b7f4a

    I used a script to retrieve 1000 blocks starting at block 840000, testing:

    • Verbosity levels 1, 2, and 3
    • Using Sequential and then Thread Pool strategies as @andrewtoth hinted
    • Running 3 iterations

    Benchmark Results

    Verbosity 1

    Strategy     Iteration 1  Iteration 2  Iteration 3  Mean     Standard Deviation
    Sequential   202 sec      118 sec      119 sec      146 sec  39 sec
    Thread Pool  51 sec       52 sec       54 sec       53 sec   1 sec

    Verbosity 2

    Strategy     Iteration 1  Iteration 2  Iteration 3  Mean      Standard Deviation
    Sequential   5004 sec     3517 sec     4952 sec     4491 sec  689 sec
    Thread Pool  1248 sec     1289 sec     1298 sec     1279 sec  22 sec

    Verbosity 3

    Strategy     Iteration 1  Iteration 2  Iteration 3  Mean      Standard Deviation
    Sequential   4145 sec     4175 sec     4187 sec     4169 sec  18 sec
    Thread Pool  1591 sec     1564 sec     1587 sec     1581 sec  12 sec

    Going by the mean times above, the thread pool reduced execution time by roughly 62% to 72% compared with sequential fetching, which confirms that client-side threading improves speed. However, users who need large block sets for data analysis, e.g. the whole blockchain, would still benefit from further performance gains.


    I reviewed the getblock RPC implementation and noticed that all resources are moved when calling UniValue’s pushKV, which is nice; pushKV also moves the values internally. In getblock, any pushes to UniValue that are not moved explicitly are moved implicitly due to copy elision.

    However, I noticed that space for the block transactions in UniValue was not reserved initially, and appending data individually was likely causing resource reallocation overhead.

    Adding a .reserve member function to UniValue can prevent this issue. I added the function and benchmarked to see if there was a performance improvement. The results showed reduced mean times, particularly for verbosity levels 1 and 2.

    Benchmark with UniValue Reservation

    Verbosity  Strategy    Iteration 1  Iteration 2  Iteration 3  Mean      Standard Deviation
    1          Sequential  122 sec      105 sec      107 sec      111 sec   7 sec
    2          Sequential  3241 sec     3272 sec     3267 sec     3260 sec  14 sec
    3          Sequential  4089 sec     4213 sec     4202 sec     4168 sec  56 sec
  8. andrewtoth commented at 5:22 pm on October 29, 2024: contributor

    @ismaelsadeeq nice find!

    I wonder, could you also benchmark batch requests? That is, sending a single request that contains an rpcthreads number of getblock requests, both sequentially and multithreaded on the client side.
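
    For reference, a hedged sketch of what building such batches could look like on the client. A JSON-RPC batch is a JSON array of request objects sent in one HTTP POST, and the node replies with an array of responses. The helper names (chunked, batch_payload, results_in_order) are hypothetical, and the batch size is left to the caller.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_payload(block_hashes, verbosity):
    """Build one JSON-RPC batch body containing a getblock call per hash."""
    batch = [
        {"jsonrpc": "1.0", "id": index, "method": "getblock", "params": [block_hash, verbosity]}
        for index, block_hash in enumerate(block_hashes)
    ]
    return json.dumps(batch).encode()


def results_in_order(responses):
    """Sort a decoded batch response by request id and return the results."""
    return [entry["result"] for entry in sorted(responses, key=lambda entry: entry["id"])]
```

    Each chunk from chunked(hashes, n) would be turned into one POST body with batch_payload, and the decoded reply passed to results_in_order so blocks line up with the hashes that were requested.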

  9. josibake commented at 9:00 am on November 4, 2024: member

    I think there are two separate topics here:

    1. “I need to process the entire blockchain for [an external application like electrs, data analysis, etc]”
    2. We can probably make the JSON-RPC faster, via threading, batching, etc

    For 1., @vostrnad have you seen #30595? For the specific ask of prevouts, I’m almost certain this will always be faster, since the kernel API provides the prevouts by reading the rev.dat files (admittedly, I haven’t looked into how this is done with the getblock RPC; it might also be doing the same).

    Here is an example program I wrote using the kernel API via rust bindings: https://github.com/josibake/silent-payments-scanner/blob/74f883c370a26e2eaa5a1a7e8e18643e07ce2cff/src/scanner.rs#L135

    I found this very easy to write and incredibly performant. The nice thing about using the kernel API for this is you can use whatever language you want (so long as that language supports C-bindings), and it does not require a running bitcoind process to be able to process the block files, which seems well suited to the data analysis / index building use case.

    For experimenting / testing the API, there is https://github.com/theCharlatan/rust-bitcoinkernel, and I’ve also been meaning to create some Python bindings. If this is of interest to you, I’d be happy to explain more and of course would love your feedback on the C API PR.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC
