index: store per-block transaction locations for efficient lookups #32541

pull romanz wants to merge 1 commits into bitcoin:master from romanz:locations-index changing 9 files +394 −5
  1. romanz commented at 7:27 am on May 17, 2025: contributor

    Currently, electrs and other indexers maintain a map from an address/scripthash to the list of relevant transactions.

    However, in order to fetch those transactions from bitcoind, electrs relies on reading the whole block and post-filtering for a specific transaction [1]. Other indexers use a txindex to fetch a transaction using its txid [2,3,4].

    The above approach has significant storage and CPU overhead, since the txid is a pseudo-random 32-byte value.

    This PR adds support for fetching a transaction directly via the REST API by its position within its block, using the following HTTP request (to fetch the N-th transaction from BLOCKHASH):

    GET /rest/txfromblock/BLOCKHASH-N.bin
    

    If binary response format is used, the transaction data will be read directly from the storage and sent back to the client, without any deserialization overhead.
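
For illustration, a client could assemble the request path above as follows. This is only a minimal sketch: `TxFromBlockPath` is a hypothetical client-side helper that is not part of this PR, and it covers only the `.bin` form shown in the description.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical client-side helper (not part of the PR): builds the path
// "/rest/txfromblock/<BLOCKHASH>-<N>.bin" from the request format shown above.
std::string TxFromBlockPath(const std::string& block_hash_hex, uint32_t n)
{
    // A block hash is 32 bytes, so its hex form must be 64 characters long.
    assert(block_hash_hex.size() == 64);
    return "/rest/txfromblock/" + block_hash_hex + "-" + std::to_string(n) + ".bin";
}
```

Calling `TxFromBlockPath(hash, 42)` with a 64-character hex block hash yields a path of the same shape as the one benchmarked later in this thread.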

    The resulting index is much smaller (allowing it to be cached):

    $ du -sh indexes/locations/ indexes/txindex/
    2.5G	indexes/locations/
    57G	indexes/txindex/
    

    The new index is using the following DB schema:

    struct DBKey {
        uint256 hash;   // blockhash
        uint32_t part;  // allow splitting one block's transactions into multiple DB rows
    };

    struct DBValue {
        FlatFilePos block_pos;          // file id + offset of the block
        std::vector<uint32_t> offsets;  // a list of transaction offsets within the block
    };
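
As a rough illustration of how such a schema could be used for a lookup, the sketch below resolves a transaction index to a file position. It rests on several assumptions that the description does not confirm: that each DB row holds a fixed number of offsets (`OFFSETS_PER_ROW` is a made-up value), that `part` is the transaction index divided by that number, and that offsets are relative to the start of the block. `FilePos` and `Row` are simplified stand-ins for `FlatFilePos` and `DBValue`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified stand-in for FlatFilePos: file id + byte offset within that file.
struct FilePos {
    int file;
    uint32_t pos;
};

// Simplified stand-in for DBValue from the schema above.
struct Row {
    FilePos block_pos;              // where the block starts
    std::vector<uint32_t> offsets;  // transaction offsets within the block
};

// ASSUMPTION: number of offsets stored per DB row (hypothetical value).
constexpr uint32_t OFFSETS_PER_ROW = 128;

// ASSUMPTION: the row (DBKey::part) holding a given transaction index.
uint32_t PartFor(uint32_t tx_index) { return tx_index / OFFSETS_PER_ROW; }

// Returns the file position of the transaction, or nullopt if the row
// does not contain it. `row` must be the row selected by PartFor(tx_index).
std::optional<FilePos> LocateTx(const Row& row, uint32_t tx_index)
{
    assert(row.offsets.size() <= OFFSETS_PER_ROW);
    const uint32_t i = tx_index % OFFSETS_PER_ROW;  // slot within the row
    if (i >= row.offsets.size()) return std::nullopt;
    // ASSUMPTION: offsets are relative to the start of the block.
    return FilePos{row.block_pos.file, row.block_pos.pos + row.offsets[i]};
}
```

Under these assumptions a lookup needs a single DB read keyed by `(blockhash, PartFor(N))`, followed by one addition to find where to start reading.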
    
  2. DrahtBot commented at 7:28 am on May 17, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32541.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK TheCharlatan

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label UTXO Db and Indexes on May 17, 2025
  4. DrahtBot added the label CI failed on May 17, 2025
  5. DrahtBot commented at 7:37 am on May 17, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/42402043332 LLM reason (✨ experimental): The CI failure is due to missing include guards in src/index/locationsindex.h.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. romanz force-pushed on May 17, 2025
  7. TheCharlatan commented at 9:37 am on May 17, 2025: contributor

    Concept ACK

    Can you add the schema of the index and the expected arguments for the REST API to the pull request description? I was a bit confused at first if this now exposes the file position, but if I read it correctly now, this just allows querying a transaction by its index in the block.

  8. romanz commented at 10:00 am on May 17, 2025: contributor

    > Concept ACK

    Thanks!

    > Can you add the schema of the index and the expected arguments for the REST API to the pull request description?

    Sure - updated in #32541#issue-3070502385.

  9. index: store per-block transaction locations for efficient lookups
    Currently, electrs and other indexers are used to maintain a map between
    an address/scripthash to the list of the relevant transactions.
    
    However, in order to fetch those transactions from bitcoind, electrs
    relies on reading the whole block and post-filtering for a specific
    transaction [1]. Other indexers use a `txindex` to fetch a transaction
    using its txid [2,3,4].
    
    The above approach has significant storage and CPU overhead, since
    the `txid` is a pseudo-random 32-byte value.
    
    This PR is adding support for using the transaction's offset within
    its block to be able to read it directly using REST API.
    
    If binary response format is used, the transaction data will be read
    directly from the storage and sent back to the client, without any
    deserialization overhead.
    
    The resulting index is much smaller (allowing it to be cached):
    
      $ du -sh indexes/locations/ indexes/txindex/
      2.5G	indexes/locations/
      57G	indexes/txindex/
    
    [1] https://github.com/romanz/electrs/blob/master/doc/schema.md
    [2] https://github.com/Blockstream/electrs/blob/new-index/doc/schema.md#txstore
    [3] https://github.com/spesmilo/electrumx/blob/master/docs/HOWTO.rst#prerequisites
    [4] https://github.com/cculianu/Fulcrum/blob/master/README.md#requirements
    c074ad2676
  10. romanz force-pushed on May 17, 2025
  11. romanz commented at 11:58 am on May 17, 2025: contributor
    Fixed a few issues (following SonarQube run).
  12. DrahtBot removed the label CI failed on May 17, 2025
  13. luke-jr commented at 6:15 pm on May 20, 2025: member
    How does this compare to getrawtransaction <txid> 0 <blockhash> without a txindex?
  14. romanz commented at 9:51 am on May 21, 2025: contributor

    I have used ApacheBench 2.3 for benchmarking the REST API, and the following Rust client for the getrawtransaction RPC:

    fetching using the new index

    $ ab -k -c 1 -n 100000 http://localhost:8332/rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin

    Document Path:          /rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
    Document Length:        301 bytes

    Concurrency Level:      1
    Time taken for tests:   13.760 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      40500000 bytes
    HTML transferred:       30100000 bytes
    Requests per second:    7267.65 [#/sec] (mean)
    Time per request:       0.138 [ms] (mean)
    Time per request:       0.138 [ms] (mean, across all concurrent requests)
    Transfer rate:          2874.41 [Kbytes/sec] received
    

    fetching using txindex

    $ ab -k -c 1 -n 100000 http://localhost:8332/rest/tx/4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a.bin

    Document Path:          /rest/tx/4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a.bin
    Document Length:        301 bytes

    Concurrency Level:      1
    Time taken for tests:   14.075 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      40500000 bytes
    HTML transferred:       30100000 bytes
    Requests per second:    7104.78 [#/sec] (mean)
    Time per request:       0.141 [ms] (mean)
    Time per request:       0.141 [ms] (mean, across all concurrent requests)
    Transfer rate:          2810.00 [Kbytes/sec] received
    

    fetching without txindex

    time cargo run --release -- 4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a 0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe
        Finished `release` profile [optimized] target(s) in 0.02s
         Running `target/release/bench-getrawtx 4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a 0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe`
    iterations = 1000
    average RPC duration = 8.563491ms

    real	0m8.628s
    user	0m0.070s
    sys	0m0.052s
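
To put the three measurements side by side, the small sketch below computes ratios from the mean latencies reported above. The constant and function names are made up for illustration. The first two figures come from keep-alive REST requests measured with `ab`, while the third comes from an RPC client, so the ratios are indicative rather than a like-for-like comparison.

```cpp
#include <cassert>

// Mean per-request latencies reported above, in milliseconds.
constexpr double kLocationsIndexMs = 0.138;  // /rest/txfromblock using the new index
constexpr double kTxIndexMs = 0.141;         // /rest/tx using txindex
constexpr double kNoTxIndexMs = 8.563491;    // getrawtransaction with blockhash, no txindex

// How many times slower `slow_ms` is compared to `fast_ms`.
double Slowdown(double slow_ms, double fast_ms)
{
    assert(fast_ms > 0.0);
    return slow_ms / fast_ms;
}

// Requests per second implied by a mean latency at concurrency 1.
double RequestsPerSecond(double latency_ms)
{
    assert(latency_ms > 0.0);
    return 1000.0 / latency_ms;
}
```

With these inputs, `Slowdown(kNoTxIndexMs, kLocationsIndexMs)` comes out at roughly 62, while `Slowdown(kTxIndexMs, kLocationsIndexMs)` is about 1.02, i.e. the new index and txindex are within a few percent of each other in this single-client test.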
    

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-25 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me