index: store per-block transaction locations for efficient lookups #32541

pull romanz wants to merge 1 commits into bitcoin:master from romanz:locations-index changing 9 files +393 −5
  1. romanz commented at 7:27 am on May 17, 2025: contributor

    Currently, electrs and other indexers map an address/scripthash to the list of relevant transactions.

    However, in order to fetch those transactions from bitcoind, electrs relies on reading the whole block and post-filtering for a specific transaction [1]. Other indexers use a txindex to fetch a transaction using its txid [2,3,4].

    The above approach has significant storage and CPU overhead, since the txid is a pseudo-random 32-byte value.

    This PR adds support for fetching a transaction directly via the REST API using its position within its block, with the following HTTP request (which fetches the N-th transaction from BLOCKHASH):

    GET /rest/txfromblock/BLOCKHASH-N.bin
    

    If binary response format is used, the transaction data will be read directly from the storage and sent back to the client, without any deserialization overhead.
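
A client-side sketch of how the request path above could be assembled. The helper name `BuildTxFromBlockPath` is hypothetical and not part of this PR; only the `/rest/txfromblock/BLOCKHASH-N.bin` shape is taken from the description. The format suffix is passed through unchecked, since `.bin` is the only one shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not part of the PR): builds the request path for the
// proposed endpoint, /rest/txfromblock/BLOCKHASH-N.<format>.
// Returns std::nullopt if the block hash is not exactly 64 hex characters.
std::optional<std::string> BuildTxFromBlockPath(const std::string& blockhash_hex,
                                                uint32_t tx_index,
                                                const std::string& format = "bin")
{
    assert(!format.empty()); // a format suffix is always required by this sketch
    if (blockhash_hex.size() != 64) return std::nullopt;
    for (const char c : blockhash_hex) {
        const bool is_hex = (c >= '0' && c <= '9') ||
                            (c >= 'a' && c <= 'f') ||
                            (c >= 'A' && c <= 'F');
        if (!is_hex) return std::nullopt;
    }
    return "/rest/txfromblock/" + blockhash_hex + "-" +
           std::to_string(tx_index) + "." + format;
}
```

Calling `BuildTxFromBlockPath(hash, 42)` with a valid 64-character hash yields `/rest/txfromblock/<hash>-42.bin`; a malformed hash yields `std::nullopt` so the caller can reject it before issuing a request.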

    The resulting index is much smaller (allowing it to be cached):

    $ du -sh indexes/locations/ indexes/txindex/
    2.5G	indexes/locations/
    57G	indexes/txindex/
    

    The new index is using the following DB schema:

    struct DBKey {
        uint256 hash;   // blockhash
        uint32_t part;  // allow splitting one block's transactions into multiple DB rows
    };

    struct DBValue {
        FlatFilePos block_pos;          // file id + offset of the block
        std::vector<uint32_t> offsets;  // a list of transaction offsets within the block
    };
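
To illustrate how a lookup might use a `DBValue` row, here is a minimal sketch that turns a transaction position into a byte range on disk. It rests on assumptions not confirmed in this thread: that `offsets[i]` is relative to the start of the block, and that the vector carries one trailing entry marking the end of the last transaction. `FilePos`, `TxRange` and `LocateTx` are hypothetical stand-ins, and splitting a block across several rows via `part` is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified stand-in for Bitcoin Core's FlatFilePos (file number + byte offset).
struct FilePos {
    int file{-1};
    uint32_t pos{0};
};

// Mirrors the DBValue shape from the PR description.
struct DBValue {
    FilePos block_pos;
    std::vector<uint32_t> offsets;
};

// Byte range of one serialized transaction inside a block file.
struct TxRange {
    int file;
    uint32_t begin;
    uint32_t size;
};

// ASSUMPTION: offsets[i] is the offset of transaction i relative to
// block_pos.pos, and offsets.back() marks the end of the last transaction.
// The encoding actually used by the PR may differ.
std::optional<TxRange> LocateTx(const DBValue& value, std::size_t n)
{
    if (value.offsets.size() < 2 || n >= value.offsets.size() - 1) return std::nullopt;
    const uint32_t begin = value.offsets[n];
    const uint32_t end = value.offsets[n + 1];
    assert(begin <= end); // offsets are expected to be non-decreasing
    return TxRange{value.block_pos.file, value.block_pos.pos + begin, end - begin};
}
```

Under these assumptions, serving a `.bin` request would only need to seek to `begin` in the given file and copy `size` bytes, which matches the "no deserialization" behaviour described above.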
    
  2. DrahtBot commented at 7:28 am on May 17, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32541.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK TheCharlatan

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32699 (docs: adds correct updated documentation links by Zeegaths)
    • #26966 (index: initial sync speedup, parallelize process by furszy)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label UTXO Db and Indexes on May 17, 2025
  4. DrahtBot added the label CI failed on May 17, 2025
  5. DrahtBot commented at 7:37 am on May 17, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/42402043332 LLM reason (✨ experimental): The CI failure is due to missing include guards in src/index/locationsindex.h.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. romanz force-pushed on May 17, 2025
  7. TheCharlatan commented at 9:37 am on May 17, 2025: contributor

    Concept ACK

    Can you add the schema of the index and the expected arguments for the REST API to the pull request description? I was a bit confused at first if this now exposes the file position, but if I read it correctly now, this just allows querying a transaction by its index in the block.

  8. romanz commented at 10:00 am on May 17, 2025: contributor

    Concept ACK

    Thanks!

    Can you add the schema of the index and the expected arguments for the REST API to the pull request description?

    Sure - updated in #32541#issue-3070502385.

  9. romanz force-pushed on May 17, 2025
  10. romanz commented at 11:58 am on May 17, 2025: contributor
    Fixed a few issues (following SonarQube run).
  11. DrahtBot removed the label CI failed on May 17, 2025
  12. luke-jr commented at 6:15 pm on May 20, 2025: member
    How does this compare to getrawtransaction <txid> 0 <blockhash> without a txindex?
  13. romanz commented at 9:51 am on May 21, 2025: contributor

    I used ApacheBench 2.3 to benchmark the REST API, and a Rust client to benchmark the getrawtransaction RPC:

    fetching using the new index

    $ ab -k -c 1 -n 100000 http://localhost:8332/rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin

    Document Path:          /rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
    Document Length:        301 bytes

    Concurrency Level:      1
    Time taken for tests:   13.760 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      40500000 bytes
    HTML transferred:       30100000 bytes
    Requests per second:    7267.65 [#/sec] (mean)
    Time per request:       0.138 [ms] (mean)
    Time per request:       0.138 [ms] (mean, across all concurrent requests)
    Transfer rate:          2874.41 [Kbytes/sec] received
    

    fetching using txindex

    $ ab -k -c 1 -n 100000 http://localhost:8332/rest/tx/4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a.bin

    Document Path:          /rest/tx/4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a.bin
    Document Length:        301 bytes

    Concurrency Level:      1
    Time taken for tests:   14.075 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      40500000 bytes
    HTML transferred:       30100000 bytes
    Requests per second:    7104.78 [#/sec] (mean)
    Time per request:       0.141 [ms] (mean)
    Time per request:       0.141 [ms] (mean, across all concurrent requests)
    Transfer rate:          2810.00 [Kbytes/sec] received
    

    fetching without txindex

    time cargo run --release -- 4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a 0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe
        Finished `release` profile [optimized] target(s) in 0.02s
         Running `target/release/bench-getrawtx 4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a 0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe`
    iterations = 1000
    average RPC duration = 8.563491ms

    real	0m8.628s
    user	0m0.070s
    sys	0m0.052s
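
For a rough sense of scale, the ratios implied by the figures in this thread can be worked out as below. This is only arithmetic on the reported numbers, not a like-for-like benchmark: the REST runs use keep-alive binary responses through ApacheBench, while the RPC run goes through a separate Rust client. The constant and function names are illustrative.

```cpp
#include <cassert>

// Ratio of two positive measurements (latencies, sizes, ...).
double Ratio(double larger, double smaller)
{
    assert(smaller > 0.0);
    return larger / smaller;
}

// Figures copied from this thread.
constexpr double kRpcNoTxindexMs = 8.563491; // getrawtransaction without txindex, average RPC duration
constexpr double kRestLocationsMs = 0.138;   // /rest/txfromblock, mean time per request
constexpr double kRestTxindexMs = 0.141;     // /rest/tx with txindex, mean time per request
constexpr double kTxindexSizeG = 57.0;       // du -sh indexes/txindex/
constexpr double kLocationsSizeG = 2.5;      // du -sh indexes/locations/

// Per-request latency: no index vs. the new index (about 62x).
inline double LatencyRatioNoIndex() { return Ratio(kRpcNoTxindexMs, kRestLocationsMs); }
// Per-request latency: txindex vs. the new index (about 1.02x, i.e. on par).
inline double LatencyRatioTxindex() { return Ratio(kRestTxindexMs, kRestLocationsMs); }
// On-disk size: txindex vs. the new index (22.8x).
inline double StorageRatio() { return Ratio(kTxindexSizeG, kLocationsSizeG); }
```

Read this way, the numbers suggest the new index serves single-transaction fetches about as fast as txindex while taking a small fraction of its disk space, and far faster than scanning the block without any index.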
    
  14. DrahtBot added the label CI failed on Jun 13, 2025
  15. DrahtBot commented at 5:34 pm on June 13, 2025: contributor

    🚧 At least one of the CI tasks failed. Task previous releases, depends DEBUG: https://github.com/bitcoin/bitcoin/runs/42406243587 LLM reason (✨ experimental): The CI failure is caused by a missing header file test/util/index.h during compilation.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  16. index: store per-block transaction locations for efficient lookups
    Currently, electrs and other indexers are used to maintain a map between
    an address/scripthash to the list of the relevant transactions.
    
    However, in order to fetch those transactions from bitcoind, electrs
    relies on reading the whole block and post-filtering for a specific
    transaction [1]. Other indexers use a `txindex` to fetch a transaction
    using its txid [2,3,4].
    
    The above approach has significant storage and CPU overhead, since
    the `txid` is a pseudo-random 32-byte value.
    
    This PR is adding support for using the transaction's offset within
    its block to be able to read it directly using REST API.
    
    If binary response format is used, the transaction data will be read
    directly from the storage and sent back to the client, without any
    deserialization overhead.
    
    The resulting index is much smaller (allowing it to be cached):
    
      $ du -sh indexes/locations/ indexes/txindex/
      2.5G	indexes/locations/
      57G	indexes/txindex/
    
    [1] https://github.com/romanz/electrs/blob/master/doc/schema.md
    [2] https://github.com/Blockstream/electrs/blob/new-index/doc/schema.md#txstore
    [3] https://github.com/spesmilo/electrumx/blob/master/docs/HOWTO.rst#prerequisites
    [4] https://github.com/cculianu/Fulcrum/blob/master/README.md#requirements
    d962c9a917
  17. romanz force-pushed on Jun 13, 2025
  18. romanz commented at 7:02 pm on June 13, 2025: contributor
    Rebased to fix #32541 (comment).
  19. DrahtBot removed the label CI failed on Jun 13, 2025
  20. romanz marked this as a draft on Jun 14, 2025
  21. romanz marked this as ready for review on Jun 14, 2025
  22. TheCharlatan commented at 2:34 pm on June 15, 2025: contributor

    How does this compare to getrawtransaction <txid> 0 <blockhash> without a txindex?

    As far as I understand the index makes this operation faster by not requiring to read the entire block and then iterating through the transactions to find the match, which I am guessing is what the last benchmark is showing. romanz, would this new endpoint be used while creating the entire index initially, or to serve certain requests? It is not quite clear to me when an indexing client wouldn’t want to read through the entire block and instead only get its transactions selectively.

  23. romanz commented at 5:20 pm on June 15, 2025: contributor

    As far as I understand the index makes this operation faster by not requiring to read the entire block and then iterating through the transactions to find the match

    Correct - the proposed index improves the performance of fetching a single transaction (similar to txindex), requiring significantly less storage.

    would this new endpoint be used while creating the entire index initially, or to serve certain requests?

    I would like the new index to be used to serve history-related queries. For example, https://electrum-protocol.readthedocs.io/en/latest/protocol-methods.html#blockchain-scripthash-get-history.

    You are right that during the history indexing process, the client doesn’t need the proposed index, since it has to read the entire block together with its undo data in order to create a map between a transaction’s location and its ScriptPubKeys.
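
As a sketch of the serving side described above, an external indexer could keep a map from scripthash to transaction locations and turn each history entry into a request against the proposed endpoint. Everything here (`TxLocation`, `HistoryIndex`, the in-memory `std::map`) is hypothetical and only meant to illustrate the idea; it is not how bindex-rs or electrs are implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: a transaction identified by block hash + position in the block.
struct TxLocation {
    std::string blockhash_hex;
    uint32_t tx_index;
};

// Toy in-memory history index: scripthash -> locations of relevant transactions.
class HistoryIndex
{
public:
    // Record that the transaction at `loc` touches `scripthash_hex`.
    void Add(const std::string& scripthash_hex, TxLocation loc)
    {
        assert(!loc.blockhash_hex.empty());
        m_history[scripthash_hex].push_back(std::move(loc));
    }

    // REST paths to fetch every transaction in the history of `scripthash_hex`,
    // in insertion order. Unknown scripthashes yield an empty list.
    std::vector<std::string> RestPaths(const std::string& scripthash_hex) const
    {
        std::vector<std::string> paths;
        const auto it = m_history.find(scripthash_hex);
        if (it == m_history.end()) return paths;
        for (const TxLocation& loc : it->second) {
            paths.push_back("/rest/txfromblock/" + loc.blockhash_hex + "-" +
                            std::to_string(loc.tx_index) + ".bin");
        }
        return paths;
    }

private:
    std::map<std::string, std::vector<TxLocation>> m_history;
};
```

The point of the design is that each history entry stores a block reference and a small integer position rather than a 32-byte txid, and each fetch then resolves through the compact locations index instead of a txindex.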

    BTW, I am working on a proof-of-concept indexer (https://github.com/romanz/bindex-rs) which is using #32540 & #32541 - please let me know if there are any questions/comments/concerns :)


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-19 12:13 UTC
