index: store per-block transaction locations for efficient lookups #32541

pull romanz wants to merge 2 commits into bitcoin:master from romanz:locations-index changing 15 files +484 −6
  1. romanz commented at 7:27 AM on May 17, 2025: contributor

    Currently, electrs and other indexers map between an address/scripthash to the list of the relevant transactions.

    However, in order to fetch those transactions from bitcoind, electrs relies on reading the whole block and post-filtering for a specific transaction^1. Other indexers use a txindex to fetch a transaction using its txid ^2^4.

    The above approach has significant storage and CPU overhead, since the txid is a pseudo-random 32-byte value.

    This PR is adding support for using the transaction's position within its block to be able to fetch it directly using REST API, using the following HTTP request (to fetch the N-th transaction from BLOCKHASH):

    GET /rest/txfromblock/BLOCKHASH-N.bin
    

    If binary response format is used, the transaction data will be read directly from the storage and sent back to the client, without any deserialization overhead.

    The resulting index is much smaller (allowing it to be cached in RAM):

    $ du -sh indexes/locations/ indexes/txindex/
    2.5G	indexes/locations/
    57G	indexes/txindex/
    

    The new index is using the following DB schema:

    struct DBKey {
        uint256 hash;   // blockhash
        uint32_t row;   // allow splitting one block's transactions into multiple DB rows
    };
    
    struct DBValue {
        FlatFilePos block_pos;          // file id + offset of the block
        std::vector<uint32_t> offsets;  // a list of transaction offsets within the block
    };
    

    For example, when fetching the 5000th transaction of block [#90005](/bitcoin-bitcoin/90005/) using ab -k -c 1 -n 100000, enabled locationsindex improves the performance ~19x (2.574ms → 0.136ms).

    I am working on a proof-of-concept indexer (https://github.com/romanz/bindex-rs) which is using #32540 & #32541 - please let me know if there are any questions/comments/concerns :)

  2. DrahtBot commented at 7:28 AM on May 17, 2025: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32541.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK TheCharlatan, hodlinator

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30469 (index: Fix coinstats overflow by fjahr)
    • #26966 (index: initial sync speedup, parallelize process by furszy)
    • #17783 (common: Disallow calling IsArgSet() on ALLOW_LIST options by ryanofsky)
    • #17581 (refactor: Remove settings merge reverse precedence code by ryanofsky)
    • #17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)
    • #17493 (util: Forbid ambiguous multiple assignments in config file by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  3. DrahtBot added the label UTXO Db and Indexes on May 17, 2025
  4. DrahtBot added the label CI failed on May 17, 2025
  5. DrahtBot commented at 7:37 AM on May 17, 2025: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task lint: https://github.com/bitcoin/bitcoin/runs/42402043332</sub> <sub>LLM reason (✨ experimental): The CI failure is due to missing include guards in src/index/locationsindex.h. </sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  6. romanz force-pushed on May 17, 2025
  7. TheCharlatan commented at 9:37 AM on May 17, 2025: contributor

    Concept ACK

    Can you add the schema of the index and the expected arguments for the REST API to the pull request description? I was a bit confused at first if this now exposes the file position, but if I read it correctly now, this just allows querying a transaction by its index in the block.

  8. romanz commented at 10:00 AM on May 17, 2025: contributor

    Concept ACK

    Thanks!

    Can you add the schema of the index and the expected arguments for the REST API to the pull request description?

    Sure - updated in #32541#issue-3070502385.

  9. romanz force-pushed on May 17, 2025
  10. romanz commented at 11:58 AM on May 17, 2025: contributor

    Fixed a few issues (following SonarQube run).

  11. DrahtBot removed the label CI failed on May 17, 2025
  12. luke-jr commented at 6:15 PM on May 20, 2025: member

    How does this compare to getrawtransaction <txid> 0 <blockhash> without a txindex?

  13. romanz commented at 9:51 AM on May 21, 2025: contributor

    I have used ApacheBench 2.3 for benchmarking REST API, and the following Rust client for getrawtransaction RPC:

    fetching using the new index

    $ ab -k -c 1 -n 100000 http://localhost:8332/rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
    
    Document Path:          /rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
    Document Length:        301 bytes
    
    Concurrency Level:      1
    Time taken for tests:   13.760 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      40500000 bytes
    HTML transferred:       30100000 bytes
    Requests per second:    7267.65 [#/sec] (mean)
    Time per request:       0.138 [ms] (mean)
    Time per request:       0.138 [ms] (mean, across all concurrent requests)
    Transfer rate:          2874.41 [Kbytes/sec] received
    

    fetching using txindex

    $ ab -k -c 1 -n 100000 http://localhost:8332/rest/tx/4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a.bin
    
    Document Path:          /rest/tx/4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a.bin
    Document Length:        301 bytes
    
    Concurrency Level:      1
    Time taken for tests:   14.075 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      40500000 bytes
    HTML transferred:       30100000 bytes
    Requests per second:    7104.78 [#/sec] (mean)
    Time per request:       0.141 [ms] (mean)
    Time per request:       0.141 [ms] (mean, across all concurrent requests)
    Transfer rate:          2810.00 [Kbytes/sec] received
    

    fetching without txindex

    time cargo run --release -- 4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a 0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe
        Finished `release` profile [optimized] target(s) in 0.02s
         Running `target/release/bench-getrawtx 4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a 0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe`
    iterations = 1000
    average RPC duration = 8.563491ms
    
    real	0m8.628s
    user	0m0.070s
    sys	0m0.052s
    

    Conclusions

    • The new LocationsIndex is only a few percent faster than the old TxIndex, but the on-disk footprint is ~22x smaller.

    • getrawtransaction which is used in the last benchmark has an average RPC duration of ~8.6ms vs ~0.14ms for the ones above.

  14. DrahtBot added the label CI failed on Jun 13, 2025
  15. DrahtBot commented at 5:34 PM on June 13, 2025: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task previous releases, depends DEBUG: https://github.com/bitcoin/bitcoin/runs/42406243587</sub> <sub>LLM reason (✨ experimental): The CI failure is caused by a missing header file test/util/index.h during compilation.</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  16. romanz force-pushed on Jun 13, 2025
  17. romanz commented at 7:02 PM on June 13, 2025: contributor

    Rebased to fix #32541 (comment).

  18. DrahtBot removed the label CI failed on Jun 13, 2025
  19. romanz marked this as a draft on Jun 14, 2025
  20. romanz marked this as ready for review on Jun 14, 2025
  21. TheCharlatan commented at 2:34 PM on June 15, 2025: contributor

    How does this compare to getrawtransaction <txid> 0 <blockhash> without a txindex?

    As far as I understand the index makes this operation faster by not requiring to read the entire block and then iterating through the transactions to find the match, which I am guessing is what the last benchmark is showing. romanz, would this new endpoint be used while creating the entire index initially, or to serve certain requests? It is not quite clear to me when an indexing client wouldn't want to read through the entire block and instead only get its transactions selectively.

  22. romanz commented at 5:20 PM on June 15, 2025: contributor

    As far as I understand the index makes this operation faster by not requiring to read the entire block and then iterating through the transactions to find the match

    Correct - the proposed index improves the performance of fetching a single transaction (similar to txindex), requiring significantly less storage.

    would this new endpoint be used while creating the entire index initially, or to serve certain requests?

    I would like the new index to be used to serve history-related queries. For example, https://electrum-protocol.readthedocs.io/en/latest/protocol-methods.html#blockchain-scripthash-get-history.

    You are right that during the history indexing process, the client doesn't need the proposed index, since it needs to read both the entire block (and undo) data in order to map from ScriptPubKey to list of { block hash + transaction index within the block }.

    BTW, I am working on a proof-of-concept indexer (https://github.com/romanz/bindex-rs) which is using #32540 & #32541 - please let me know if there are any questions/comments/concerns :)

  23. in src/rest.cpp:412 in d962c9a917 outdated
     407 | +        }
     408 | +    }
     409 | +
     410 | +    const LocationsIndex* locations_index = g_locations_index.get();
     411 | +    if (!locations_index) {
     412 | +        return RESTERR(req, HTTP_BAD_REQUEST, "Locations index is not enabled");
    


    maflcko commented at 12:04 PM on June 20, 2025:

    not sure about returning an error here. Should be easy to implement a fallback?

    This should also make it easier to benchmark the benefit of the index.

    (I guess it is clear that RPC with json overhead eats some CPU, but out of curiosity I wonder what the difference is between reading the db value (tx location) + reading the tx versus reading the full block and implementing a raw-tx-seeker to jump to the raw bytes of the tx in the block. This should be possible by just reading the length prefixes and then skipping them. I guess the worst case scenario is when its the last transaction, where the prior transaction(s) are mostly filled with minimally sized length prefixes)


    romanz commented at 3:35 PM on June 20, 2025:

    Thanks!

    This should also make it easier to benchmark the benefit of the index.

    Sounds good - will implement a fallback.


    romanz commented at 1:29 PM on June 21, 2025:

    Implemented in df45388b97.


    romanz commented at 6:39 AM on June 22, 2025:

    Tested using the following Testnet4 block:

    -locationsindex=1

    $ BLOCKHASH=00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b
    $ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-0.bin
    
    Document Path:          /rest/txfromblock/00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b-0.bin
    Document Length:        273 bytes
    
    Concurrency Level:      1
    Time taken for tests:   2.983 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      37700000 bytes
    HTML transferred:       27300000 bytes
    Requests per second:    33527.87 [#/sec] (mean)
    Time per request:       0.030 [ms] (mean)
    Time per request:       0.030 [ms] (mean, across all concurrent requests)
    Transfer rate:          12343.76 [Kbytes/sec] received
    
    $ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-1000.bin
    
    Document Path:          /rest/txfromblock/00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b-1000.bin
    Document Length:        308 bytes
    
    Concurrency Level:      1
    Time taken for tests:   3.122 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      41200000 bytes
    HTML transferred:       30800000 bytes
    Requests per second:    32032.96 [#/sec] (mean)
    Time per request:       0.031 [ms] (mean)
    Time per request:       0.031 [ms] (mean, across all concurrent requests)
    Transfer rate:          12888.26 [Kbytes/sec] received
    
    $ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-2000.bin
    
    Document Path:          /rest/txfromblock/00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b-2000.bin
    Document Length:        308 bytes
    
    Concurrency Level:      1
    Time taken for tests:   3.030 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      41200000 bytes
    HTML transferred:       30800000 bytes
    Requests per second:    33003.94 [#/sec] (mean)
    Time per request:       0.030 [ms] (mean)
    Time per request:       0.030 [ms] (mean, across all concurrent requests)
    Transfer rate:          13278.93 [Kbytes/sec] received
    
    $ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-3000.bin
    
    Document Path:          /rest/txfromblock/00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b-3000.bin
    Document Length:        309 bytes
    
    Concurrency Level:      1
    Time taken for tests:   3.362 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      41300000 bytes
    HTML transferred:       30900000 bytes
    Requests per second:    29745.72 [#/sec] (mean)
    Time per request:       0.034 [ms] (mean)
    Time per request:       0.034 [ms] (mean, across all concurrent requests)
    Transfer rate:          11997.05 [Kbytes/sec] received
    
    $ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-4000.bin
    
    Document Path:          /rest/txfromblock/00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b-4000.bin
    Document Length:        309 bytes
    
    Concurrency Level:      1
    Time taken for tests:   2.854 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      41300000 bytes
    HTML transferred:       30900000 bytes
    Requests per second:    35040.53 [#/sec] (mean)
    Time per request:       0.029 [ms] (mean)
    Time per request:       0.029 [ms] (mean, across all concurrent requests)
    Transfer rate:          14132.56 [Kbytes/sec] received
    

    -locationsindex=0

    $ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-0.bin
    
    Concurrency Level:      1
    Time taken for tests:   3.245 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      37700000 bytes
    HTML transferred:       27300000 bytes
    Requests per second:    30819.31 [#/sec] (mean)
    Time per request:       0.032 [ms] (mean)
    Time per request:       0.032 [ms] (mean, across all concurrent requests)
    Transfer rate:          11346.56 [Kbytes/sec] received
    
    $ ab -k -c 1 -n 10000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-1000.bin
    
    Concurrency Level:      1
    Time taken for tests:   4.588 seconds
    Complete requests:      10000
    Failed requests:        0
    Keep-Alive requests:    10000
    Total transferred:      4120000 bytes
    HTML transferred:       3080000 bytes
    Requests per second:    2179.68 [#/sec] (mean)
    Time per request:       0.459 [ms] (mean)
    Time per request:       0.459 [ms] (mean, across all concurrent requests)
    Transfer rate:          876.98 [Kbytes/sec] received
    
    $ ab -k -c 1 -n 10000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-2000.bin
    
    Concurrency Level:      1
    Time taken for tests:   9.388 seconds
    Complete requests:      10000
    Failed requests:        0
    Keep-Alive requests:    10000
    Total transferred:      4120000 bytes
    HTML transferred:       3080000 bytes
    Requests per second:    1065.20 [#/sec] (mean)
    Time per request:       0.939 [ms] (mean)
    Time per request:       0.939 [ms] (mean, across all concurrent requests)
    Transfer rate:          428.58 [Kbytes/sec] received
    
    $ ab -k -c 1 -n 10000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-3000.bin
    
    Concurrency Level:      1
    Time taken for tests:   12.819 seconds
    Complete requests:      10000
    Failed requests:        0
    Keep-Alive requests:    10000
    Total transferred:      4130000 bytes
    HTML transferred:       3090000 bytes
    Requests per second:    780.10 [#/sec] (mean)
    Time per request:       1.282 [ms] (mean)
    Time per request:       1.282 [ms] (mean, across all concurrent requests)
    Transfer rate:          314.63 [Kbytes/sec] received
    
    $ ab -k -c 1 -n 10000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-4000.bin
    
    Concurrency Level:      1
    Time taken for tests:   17.962 seconds
    Complete requests:      10000
    Failed requests:        0
    Keep-Alive requests:    10000
    Total transferred:      4130000 bytes
    HTML transferred:       3090000 bytes
    Requests per second:    556.73 [#/sec] (mean)
    Time per request:       1.796 [ms] (mean)
    Time per request:       1.796 [ms] (mean, across all concurrent requests)
    Transfer rate:          224.54 [Kbytes/sec] received
    
  24. maflcko commented at 12:05 PM on June 20, 2025: member

    lgtm. I wonder if there should be a fallback?

  25. romanz force-pushed on Jun 21, 2025
  26. romanz commented at 9:58 AM on June 21, 2025: contributor

    Rebased over master to use std::vector<std::byte> (following #32743).

  27. romanz marked this as a draft on Jun 21, 2025
  28. DrahtBot added the label CI failed on Jun 21, 2025
  29. DrahtBot commented at 2:43 PM on June 21, 2025: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task MSan, depends: https://github.com/bitcoin/bitcoin/runs/44528503073</sub> <sub>LLM reason (✨ experimental): The CI failure is caused by the abort in the locationsindex_tests subprocess.</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  30. romanz force-pushed on Jun 21, 2025
  31. romanz force-pushed on Jun 21, 2025
  32. romanz marked this as ready for review on Jun 21, 2025
  33. DrahtBot removed the label CI failed on Jun 21, 2025
  34. romanz requested review from maflcko on Jun 21, 2025
  35. romanz commented at 7:32 AM on June 22, 2025: contributor

    BTW, if I understand correctly, this index may also improve the performance of SendBlockTransactions during handling of GETBLOCKTXN requests.

  36. in src/node/blockstorage.cpp:1182 in df45388b97 outdated
    1177 | +    size_t tx_count = ReadCompactSize(filein);
    1178 | +    if (tx_index >= tx_count) {
    1179 | +        return false;
    1180 | +    }
    1181 | +
    1182 | +    SkipTransaction skip_tx;
    


    maflcko commented at 1:20 PM on June 23, 2025:

    thanks for implementing this idea. Though, at least locally on my system it looks like it didn't improve the end-to-end performance for me, at least not in a significant benchmark.

    So it seems fine to use the "normal" deserialization and even go through a redundant ser-deser loop?

    diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
    index a049122894..64cb914cb1 100644
    --- a/src/node/blockstorage.cpp
    +++ b/src/node/blockstorage.cpp
    @@ -1179,17 +1179,14 @@ bool BlockManager::ReadRawTxFromBlock(std::vector<std::byte>& tx_bytes, const Fl
             return false;
         }
     
    -    SkipTransaction skip_tx;
    +    CMutableTransaction skip_tx;
         for (size_t i = 0; i < tx_index; ++i) {
    -        filein >> skip_tx;
    +        filein >> TX_WITH_WITNESS(skip_tx);
         }
    -    int64_t tx_start = filein.tell();
    -    filein >> skip_tx;
    -    int64_t tx_end = filein.tell();
    +        filein >> TX_WITH_WITNESS(skip_tx);
     
    -    tx_bytes.resize(tx_end - tx_start);
    -    filein.seek(tx_start, SEEK_SET);
    -    filein.read(tx_bytes);
    +    tx_bytes.resize(0);
    +    VectorWriter {*(std::vector<unsigned char> *)&tx_bytes,tx_bytes.size()}<<TX_WITH_WITNESS(skip_tx);
         return true;
     }
     
    

    maflcko commented at 3:16 PM on June 23, 2025:

    Wrapping the autofile into BufferedReader helped more on my system with reading later transactions in the block via this fallback branch.


    romanz commented at 8:02 PM on June 23, 2025:

    romanz commented at 5:11 PM on June 26, 2025:

    Benchmark results

    Tested 99b48777ee using ab -k -c 1 -n 10000 over block 900005 (having >5000 txs).

    BLOCKHASH=000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec
    

    -locationsindex=1

    Document Path:          /rest/txfromblock/$BLOCKHASH-0.bin
    Document Length:        384 bytes
    Requests per second:    7387.93 [#/sec] (mean)
    Time per request:       0.135 [ms] (mean)
    
    Document Path:          /rest/txfromblock/$BLOCKHASH-1000.bin
    Document Length:        234 bytes
    Requests per second:    7875.35 [#/sec] (mean)
    Time per request:       0.127 [ms] (mean)
    
    Document Path:          /rest/txfromblock/$BLOCKHASH-2000.bin
    Document Length:        234 bytes
    Requests per second:    7470.15 [#/sec] (mean)
    Time per request:       0.134 [ms] (mean)
    
    Document Path:          /rest/txfromblock/$BLOCKHASH-3000.bin
    Document Length:        220 bytes
    Requests per second:    7102.34 [#/sec] (mean)
    Time per request:       0.141 [ms] (mean)
    
    Document Path:          /rest/txfromblock/$BLOCKHASH-4000.bin
    Document Length:        221 bytes
    Requests per second:    7569.49 [#/sec] (mean)
    Time per request:       0.132 [ms] (mean)
    
    Document Path:          /rest/txfromblock/$BLOCKHASH-5000.bin
    Document Length:        234 bytes
    Requests per second:    7333.99 [#/sec] (mean)
    Time per request:       0.136 [ms] (mean)
    

    -locationsindex=0

    Document Path:          /rest/txfromblock/$BLOCKHASH-0.bin
    Document Length:        384 bytes
    Requests per second:    4030.87 [#/sec] (mean)
    Time per request:       0.248 [ms] (mean)
    
    Document Path:          /rest/txfromblock/$BLOCKHASH-1000.bin
    Document Length:        234 bytes
    Requests per second:    764.46 [#/sec] (mean)
    Time per request:       1.308 [ms] (mean)
    
    Document Path:          /rest/txfromblock/$BLOCKHASH-2000.bin
    Document Length:        234 bytes
    Requests per second:    526.17 [#/sec] (mean)
    Time per request:       1.901 [ms] (mean)
    
    Document Path:          /rest/txfromblock/$BLOCKHASH-3000.bin
    Document Length:        220 bytes
    Requests per second:    391.21 [#/sec] (mean)
    Time per request:       2.556 [ms] (mean)
    
    Document Path:          /rest/txfromblock/$BLOCKHASH-4000.bin
    Document Length:        221 bytes
    Requests per second:    261.30 [#/sec] (mean)
    Time per request:       3.827 [ms] (mean)
    
    Document Path:          /rest/txfromblock/$BLOCKHASH-5000.bin
    Document Length:        234 bytes
    Requests per second:    208.15 [#/sec] (mean)
    Time per request:       4.804 [ms] (mean)
    
  37. maflcko approved
  38. maflcko commented at 1:20 PM on June 23, 2025: member

    lgtm

  39. maflcko commented at 1:22 PM on June 23, 2025: member

    BTW, if I understand correctly, this index may also improve the performance of SendBlockTransactions during handling of GETBLOCKTXN requests.

    This is only the fallback where the recent block fell out of memory. If this happens, the performance either shouldn't matter, or it may be better to keep more recent blocks in memory than just a single one.

  40. romanz force-pushed on Jun 23, 2025
  41. romanz force-pushed on Jun 28, 2025
  42. romanz commented at 6:43 AM on June 28, 2025: contributor

    Rebased over master and fixed rest_tx_from_block() argument name case (in addition to #32825).

  43. romanz force-pushed on Jun 30, 2025
  44. romanz commented at 1:13 PM on June 30, 2025: contributor

    Rebased over master after #32825 is merged (to resolve a small conflict).

  45. romanz commented at 6:07 PM on July 5, 2025: contributor

    Added a short description in doc/REST-interface.md and doc/files.md, and updated release notes in https://github.com/bitcoin/bitcoin/pull/32541/commits/c695d134683d52bce9e499a5848e4c4c7951155c. Please let me know if there are any additional open issues :)

  46. romanz commented at 6:54 PM on July 14, 2025: contributor

    Ping :)

  47. romanz force-pushed on Jul 14, 2025
  48. romanz force-pushed on Jul 19, 2025
  49. romanz commented at 5:18 PM on July 19, 2025: contributor

    I have split the content of this PR into 2 commits:

    • 784459ac79 is adding the new /rest/txfromblock/ endpoint
    • 1b928f58fc is adding an optional locationsindex for optimizing lookups
  50. in src/node/blockstorage.cpp:1083 in 784459ac79 outdated
    1079 | @@ -1080,6 +1080,32 @@ bool BlockManager::ReadRawBlock(std::vector<std::byte>& block, const FlatFilePos
    1080 |      return true;
    1081 |  }
    1082 |  
    1083 | +static const uint32_t HEADER_SIZE = ::GetSerializeSize(CBlockHeader{});
    


    hodlinator commented at 6:25 PM on August 2, 2025:

    Unused variable. Was this supposed to be used to skip forward in the stream instead of deserializing the header?


    romanz commented at 6:43 AM on August 3, 2025:

    Thanks - removed.

    Was this supposed to be used to skip forward in the stream instead of deserializing the header?

    Yes, but now it's not needed here.


    hodlinator commented at 7:17 AM on August 4, 2025:

    Better to not add it in 784459ac79a89f591eb52a4c6c266c421ca8baec?


    romanz commented at 11:32 AM on August 4, 2025:

    Definitely - should be fixed now:

    $ git diff 784459ac79 104b40c8c9
    diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
    index c8d824bb6c..05ec73697c 100644
    --- a/src/node/blockstorage.cpp
    +++ b/src/node/blockstorage.cpp
    @@ -1080,8 +1080,6 @@ bool BlockManager::ReadRawBlock(std::vector<std::byte>& block, const FlatFilePos
         return true;
     }
     
    -static const uint32_t HEADER_SIZE = ::GetSerializeSize(CBlockHeader{});
    -
     bool BlockManager::ReadTxFromBlock(CTransactionRef& tx, const FlatFilePos& block_pos, size_t tx_index) const
     {
         AutoFile file{OpenBlockFile(block_pos, /*fReadOnly=*/true)};
    
  51. in src/index/locationsindex.cpp:70 in 1b928f58fc outdated
      65 | +    fs::create_directories(path);
      66 | +
      67 | +    m_db = std::make_unique<BaseIndex::DB>(path / "db", n_cache_size, f_memory, f_wipe);
      68 | +}
      69 | +
      70 | +static const uint32_t HEADER_SIZE = ::GetSerializeSize(CBlockHeader{});
    


    hodlinator commented at 6:29 PM on August 2, 2025:

    Would prefer it be moved to function-local scope so we don't add any startup time for those who never enable -locationsindex or use the REST interface.


    romanz commented at 6:43 AM on August 3, 2025:

    Thanks - moved into LocationsIndex::CustomAppend().

  52. in doc/REST-interface.md:44 in 1b928f58fc outdated
      35 | @@ -36,6 +36,14 @@ Responds with 404 if the transaction doesn't exist.
      36 |  By default, this endpoint will only search the mempool.
      37 |  To query for a confirmed transaction, enable the transaction index via "txindex=1" command line / configuration option.
      38 |  
      39 | +`GET /rest/txfromblock/<BLOCK-HASH>-<TX-OFFSET>.<bin|hex|json>`
      40 | +
      41 | +Given a block hash and transaction offset within it: returns a transaction in binary, hex-encoded binary, or JSON formats.
      42 | +Responds with 404 if the transaction doesn't exist.
      43 | +
      44 | +By default, this endpoint will read also the leading transactions, before reading and returning the requested one.
    


    hodlinator commented at 6:33 PM on August 2, 2025:

    nit:

    • Improved order?
    • Clarify that wasted work is being done.
    By default, this endpoint will also deserialize the leading transactions, before reading and returning the requested one.
    

    romanz commented at 6:43 AM on August 3, 2025:

    Thanks - fixed.


    hodlinator commented at 7:25 AM on August 4, 2025:

    Typo in added parenthesis: beggining


    romanz commented at 11:34 AM on August 4, 2025:

    Oops, sorry -> should be fixed now:

    $ git diff b2a22ce33d 4441827ef4
    diff --git a/doc/REST-interface.md b/doc/REST-interface.md
    index ffb3c09af1..8f8431c29a 100644
    --- a/doc/REST-interface.md
    +++ b/doc/REST-interface.md
    @@ -42,7 +42,7 @@ Given a block hash and transaction offset within it: returns a transaction in bi
     Responds with 404 if the transaction doesn't exist.
     
     By default, this endpoint will also deserialize the leading transactions, before reading and returning the requested one
    -(which results in wasted deserialization work if the transaction is not in the beggining of the block).
    +(which results in wasted deserialization work if the transaction is not in the beginning of the block).
     To read the requested transaction directly, enable the transaction locations' index via "locationsindex=1" command line / configuration option.
     
     #### Blocks
    
  53. in src/index/locationsindex.cpp:5 in 1b928f58fc outdated
       0 | @@ -0,0 +1,143 @@
       1 | +// Copyright (c) 2018-2022 The Bitcoin Core developers
       2 | +// Distributed under the MIT software license, see the accompanying
       3 | +// file COPYING or http://www.opensource.org/licenses/mit-license.php.
       4 | +
       5 | +#include <map>
    


    hodlinator commented at 6:37 PM on August 2, 2025:

    nit: Please order std library headers last. Seems unused though.


    romanz commented at 6:43 AM on August 3, 2025:

    Removed.

  54. in src/rest.cpp:524 in 1b928f58fc outdated
     514 | +        }
     515 | +        if (index >= pindex->nTx) {
     516 | +            return RESTERR(req, HTTP_BAD_REQUEST, strprintf("Block %s has only %d transactions", uriParts[0], pindex->nTx));
     517 | +        }
     518 | +        block_pos = pindex->GetBlockPos();
     519 | +    }
    


    hodlinator commented at 6:50 PM on August 2, 2025:

    We can skip this work if LocationsIndex is enabled as it doesn't use block_pos.


    romanz commented at 6:42 AM on August 3, 2025:

    Good catch - fixed.

  55. in src/rest.cpp:536 in 1b928f58fc outdated
     531 | +        }
     532 | +        success = locations_index->ReadRawTransaction(*block_hash, index, tx_bytes);
     533 | +    }
     534 | +
     535 | +    if (!success) {
     536 | +        return RESTERR(req, HTTP_NOT_FOUND, strprintf("Failed to read transaction #%d from block %s", index, (*block_hash).ToString()));
    


    hodlinator commented at 6:52 PM on August 2, 2025:

    nit:

            return RESTERR(req, HTTP_NOT_FOUND, strprintf("Failed to read transaction #%d from block %s", index, block_hash->ToString()));
    

    romanz commented at 6:42 AM on August 3, 2025:

    Thanks - fixed.

  56. in src/index/locationsindex.cpp:28 in 1b928f58fc outdated
      23 | +
      24 | +namespace {
      25 | +
      26 | +struct DBKey {
      27 | +    uint256 hash;
      28 | +    uint32_t part; // allow splitting one block's transactions into multiple DB rows
    


    hodlinator commented at 7:37 PM on August 2, 2025:

    nit: Might be less confusing to use "row" everywhere here instead of "part"?


    romanz commented at 6:42 AM on August 3, 2025:

    Thanks - renamed to row.

  57. in src/index/locationsindex.cpp:107 in 1b928f58fc outdated
     102 | +}
     103 | +
     104 | +bool LocationsIndex::ReadRawTransaction(const uint256& block_hash, size_t i, std::vector<std::byte>& out) const
     105 | +{
     106 | +    uint32_t part = i / TXS_PER_ROW; // used to find the correct DB row
     107 | +    i = i % TXS_PER_ROW;             // index within a single DB row
    


    hodlinator commented at 7:39 PM on August 2, 2025:

    nit: Could use local variable with descriptive name rather than mutating in-param:

        const auto column = i % TXS_PER_ROW;             // index within a single DB row
    

    romanz commented at 6:42 AM on August 3, 2025:

    Thanks - fixed.

  58. in src/index/locationsindex.cpp:118 in 1b928f58fc outdated
     113 | +
     114 | +    if (value.offsets.empty()) {
     115 | +        LogError("%s: LocationsIndex entry for %s:%u must have >1 offsets\n", __func__, block_hash.ToString(), part);
     116 | +        return false;
     117 | +    }
     118 | +    size_t nTx = value.offsets.size() - 1;
    


    hodlinator commented at 7:42 PM on August 2, 2025:

    nit: Should use snake_case naming convention as per developer-notes.md, n_tx/num_tx/tx_count?


    romanz commented at 6:42 AM on August 3, 2025:

    Thanks - renamed to tx_count.

  59. in src/index/locationsindex.cpp:114 in 1b928f58fc outdated
     109 | +    DBValue value{};
     110 | +    if (!m_db->Read(DBKey{block_hash, part}, value)) {
     111 | +        return false;
     112 | +    }
     113 | +
     114 | +    if (value.offsets.empty()) {
    


    hodlinator commented at 7:48 PM on August 2, 2025:

    Off-by-one - should match error message and later code:

        if (value.offsets.size() < 2) {
    

    romanz commented at 6:42 AM on August 3, 2025:

    Good catch, thanks!

  60. in src/index/locationsindex.h:10 in 1b928f58fc outdated
       5 | +#ifndef BITCOIN_INDEX_LOCATIONSINDEX_H
       6 | +#define BITCOIN_INDEX_LOCATIONSINDEX_H
       7 | +
       8 | +#include <attributes.h>
       9 | +#include <chain.h>
      10 | +#include <flatfile.h>
    


    hodlinator commented at 7:51 PM on August 2, 2025:

    nit: Unused header.


    romanz commented at 6:42 AM on August 3, 2025:

    Removed.

  61. in src/test/locationsindex_tests.cpp:70 in 1b928f58fc outdated
      65 | +        {
      66 | +            LOCK(cs_main);
      67 | +            block_index = m_node.chainman->m_blockman.LookupBlockIndex(block_hash);
      68 | +            block_pos = block_index->GetBlockPos();
      69 | +        }
      70 | +        BOOST_REQUIRE(block_index);
    


    hodlinator commented at 8:42 PM on August 2, 2025:

    Do we need to fetch a second CBlockIndex? This passes the test:

            FlatFilePos block_pos{WITH_LOCK(cs_main, return tip->GetBlockPos())};
    

    romanz commented at 6:34 AM on August 3, 2025:

    You're right, thanks - removed the unneeded code and renamed tip to block_index.

  62. hodlinator commented at 9:19 PM on August 2, 2025: contributor

    Concept ACK 1b928f58fc78f4727cef988902075abc05c372b2

    Cool way of compressing transaction metadata to significantly reduce on-disk footprint compared to TxIndex and make it more likely this one fits in RAM. (Fine as long as external code can fetch by block hash + tx index instead of by txid).

    Optimization as I understand it: All transactions within the same block have the same FlatFilePos base, which is compressed by a factor of 128 thanks to TXS_PER_ROW. Instead of storing an individual tx offset and size for each tx, it uses 2 offsets for a tx request, which allows re-use of offsets between adjacent txs.

    Thanks for electrs!

  63. romanz force-pushed on Aug 3, 2025
  64. romanz commented at 7:50 AM on August 3, 2025: contributor

    Many thanks for the review! Applied the comments: 1b928f5 -> b2a22ce

  65. romanz requested review from hodlinator on Aug 3, 2025
  66. romanz requested review from maflcko on Aug 3, 2025
  67. romanz force-pushed on Aug 4, 2025
  68. romanz commented at 12:48 PM on August 4, 2025: contributor

    Applied a few more fixes: b2a22ce -> 4441827

  69. romanz commented at 7:20 PM on August 25, 2025: contributor

    Ping :)

  70. in src/index/locationsindex.cpp:6 in 4441827ef4 outdated
       0 | @@ -0,0 +1,141 @@
       1 | +// Copyright (c) 2018-2022 The Bitcoin Core developers
       2 | +// Distributed under the MIT software license, see the accompanying
       3 | +// file COPYING or http://www.opensource.org/licenses/mit-license.php.
       4 | +
       5 | +#include <blockencodings.h>
       6 | +#include <clientversion.h>
    


    hodlinator commented at 12:47 PM on August 26, 2025:

    nit: Unused: clientversion.h


  71. in src/index/locationsindex.cpp:76 in 4441827ef4 outdated
      71 | +
      72 | +    assert(block.data);
      73 | +    assert(block.file_number >= 0);
      74 | +
      75 | +    const uint32_t nTx = block.data->vtx.size();
      76 | +    uint32_t nTxOffset = HEADER_SIZE + GetSizeOfCompactSize(nTx);
    


    hodlinator commented at 12:54 PM on August 26, 2025:

    nit: snake_case and narrower types:

        Assume(block.data->vtx.size() <= std::numeric_limits<uint32_t>::max());
        const uint32_t tx_count{static_cast<uint32_t>(block.data->vtx.size())};
        uint32_t tx_offset{HEADER_SIZE + GetSizeOfCompactSize(tx_count)};
    

  72. in src/index/locationsindex.cpp:105 in 4441827ef4 outdated
     100 | +}
     101 | +
     102 | +bool LocationsIndex::ReadRawTransaction(const uint256& block_hash, size_t i, std::vector<std::byte>& out) const
     103 | +{
     104 | +    uint32_t row = i / TXS_PER_ROW;      // used to find the correct DB row
     105 | +    const auto column = i % TXS_PER_ROW; // index within a single DB row
    


    hodlinator commented at 1:05 PM on August 26, 2025:

    nit: Could make row const as well and nail down the types:

    bool LocationsIndex::ReadRawTransaction(const uint256& block_hash, uint32_t i, std::vector<std::byte>& out) const
    {
        const uint32_t row{i / TXS_PER_ROW};      // used to find the correct DB row
        const uint32_t column{i % TXS_PER_ROW}; // index within a single DB row
    

  73. in src/index/locationsindex.cpp:132 in 4441827ef4 outdated
     127 | +    if (file.IsNull()) {
     128 | +        LogError("%s: OpenBlockFile failed\n", __func__);
     129 | +        return false;
     130 | +    }
     131 | +
     132 | +    out.resize(tx_size); // Zeroing of memory is intentional here
    


    hodlinator commented at 1:11 PM on August 26, 2025:

    Don't see a good way around zeroing the memory, but comment makes it sound like a good thing? I guess it's slightly better than returning a vector containing garbage upon failure, but even better would be to clear it upon failure?

    nit: Could change:

    -    bool ReadRawTransaction(const uint256& block_hash, size_t i, std::vector<std::byte>& out) const;
    +    std::vector<std::byte> ReadRawTransaction(const uint256& block_hash, uint32_t i) const;
    

    And only return non-empty vector on success instead of having a separate bool? Somewhat unorthodox compared to other Index-implementations but more modern in that it exploits the fact that vectors are movable.


  74. in src/rest.cpp:560 in 4441827ef4 outdated
     555 | +            serialized << TX_WITH_WITNESS(tx);
     556 | +            strHex = HexStr(serialized);
     557 | +        } else {
     558 | +            strHex = HexStr(tx_bytes);
     559 | +        }
     560 | +        strHex.append("\n");
    


    hodlinator commented at 5:34 PM on August 26, 2025:

    nit: Might as well use the fact that it's only 1 char:

            strHex.push_back('\n');
    

  75. in src/rest.cpp:571 in 4441827ef4 outdated
     566 | +        if (!tx) {
     567 | +            DataStream{tx_bytes} >> TX_WITH_WITNESS(tx);
     568 | +        }
     569 | +        UniValue objTx(UniValue::VOBJ);
     570 | +        TxToUniv(*tx, /*block_hash=*/{}, /*entry=*/objTx);
     571 | +        std::string strJSON = objTx.write() + "\n";
    


    hodlinator commented at 5:35 PM on August 26, 2025:

    nit:

            std::string strJSON = objTx.write() + '\n';
    

  76. in src/rest.cpp:513 in 4441827ef4 outdated
     508 | +    }
     509 | +
     510 | +    FlatFilePos block_pos{};
     511 | +    if (!locations_index) {
     512 | +        LOCK(cs_main);
     513 | +        const CBlockIndex* pindex{chainman.m_blockman.LookupBlockIndex(*block_hash)};
    


    hodlinator commented at 5:42 PM on August 26, 2025:

    nit: Since we already have index and locations_index in scope here, could call this block, or block_index?

            const CBlockIndex* block{chainman.m_blockman.LookupBlockIndex(*block_hash)};
    

  77. in src/rest.cpp:485 in 4441827ef4 outdated
     480 | +
     481 | +    std::string param;
     482 | +    const RESTResponseFormat rf = ParseDataFormat(param, uri_part);
     483 | +
     484 | +    // request is sent over URI scheme /rest/txfromblock/blockhash-index
     485 | +    std::vector<std::string_view> uriParts{util::Split<std::string_view>(param, '-')};
    


    hodlinator commented at 5:45 PM on August 26, 2025:

    nit: Should use snake_case as per developer-notes.md:

    • uri_parts
    • str_hex
    • obj_tx
    • str_JSON

  78. in src/index/locationsindex.h:15 in 4441827ef4 outdated
      10 | +#include <index/base.h>
      11 | +
      12 | +static constexpr bool DEFAULT_LOCATIONSINDEX{false};
      13 | +
      14 | +/**
      15 | + * LocationsIndex is used to store and retrieve transactions' location within a block.
    


    hodlinator commented at 5:51 PM on August 26, 2025:

    2 nits:

    1. Could reword and elaborate
       * LocationsIndex is used to store and retrieve the location of transactions within a block.
       *
       * This is done in a compressed manner, allowing the whole index to fit in RAM on decent hardware.
       * For example, the on-disk size was <X>GiB at block height <Y>.
      
    2. Could clarify PR description unless I'm mistaken.
      -allowing it to be cached
      +allowing it to be cached in RAM
      

  79. in src/CMakeLists.txt:193 in 4441827ef4 outdated
     237 | @@ -238,6 +238,7 @@ add_library(bitcoin_node STATIC EXCLUDE_FROM_ALL
     238 |    index/base.cpp
    


    hodlinator commented at 6:19 PM on August 26, 2025:

    (Inline comment in random location to keep main thread cleaner).

    nit: At #32541 (comment) you write:

    You are right that during the history indexing process, the client doesn't need the proposed index, since it needs to read both the entire block (and undo) data in order to create a map between a transaction's location and its ScriptPubKeys.

    IIUC, saying something closer to "map from ScriptPubKey to list of { block hash + tx index within the block }" would be more precise? If so, might be good to edit for future reviewers.


    romanz commented at 7:06 AM on August 31, 2025:

    Thanks - updated the comment.

  80. in src/CMakeLists.txt:194 in 4441827ef4 outdated
     237 | @@ -238,6 +238,7 @@ add_library(bitcoin_node STATIC EXCLUDE_FROM_ALL
     238 |    index/base.cpp
     239 |    index/blockfilterindex.cpp
    


    hodlinator commented at 6:52 PM on August 26, 2025:

    (Inline comment in random location to keep main thread cleaner).

    nit: In the benchmark comment #32541 (comment)

    Maybe you could edit it to add something like:


    Conclusions: The new LocationsIndex is only a few percent faster than the old TxIndex, but the on-disk footprint is ~22x smaller.

    getrawtransaction which is used in the last benchmark has an average RPC duration of ~8.6ms vs ~0.14ms for the ones above.


    The reason I push for this is that I had to do a double-take even when coming back to this PR. I think it would assist reviewers. Hopefully this would not be grinding down the dinner into mushy baby food, making the reviews less thorough and leaving jaws underdeveloped.


    romanz commented at 6:04 PM on August 30, 2025:

    Sounds good - done.

  81. in src/init.cpp:1784 in 4441827ef4 outdated
    1779 | @@ -1777,6 +1780,11 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
    1780 |          node.indexes.emplace_back(g_coin_stats_index.get());
    1781 |      }
    1782 |  
    1783 | +    if (args.GetBoolArg("-locationsindex", DEFAULT_LOCATIONSINDEX)) {
    1784 | +        g_locations_index = std::make_unique<LocationsIndex>(interfaces::MakeChain(node), /*cache_size=*/0, false, do_reindex);
    


    hodlinator commented at 7:02 PM on August 26, 2025:

    Have you experimented with different cache sizes?


    hodlinator commented at 7:07 PM on August 26, 2025:

    nit: Better to name bool parameter:

            g_locations_index = std::make_unique<LocationsIndex>(interfaces::MakeChain(node), /*cache_size=*/0, /*f_memory=*/false, do_reindex);
    

    (These comments are validated by clang-tidy btw).

    Even though BaseIndex::DB::DB() names it f_memory, I think it would be clearer to use the term in DBParams:

            g_locations_index = std::make_unique<LocationsIndex>(interfaces::MakeChain(node), /*cache_size=*/0, /*memory_only=*/false, do_reindex);
    

    (If you do this, please rename it in LocationsIndex.h/cpp).


    romanz commented at 6:05 PM on August 30, 2025:

    Not yet - will do and report the results here.



    romanz commented at 5:22 AM on September 4, 2025:

    It seems that OS block cache is quite effective (even with /*cache_size=*/0), so I am getting similar performance when using 3GiB DB cache compared to #32541 (comment).

    Tested with the patch below:

    $ ab -k -c 1 -n 100000 http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
    ...
    Document Path:          /rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
    Document Length:        234 bytes
    
    Concurrency Level:      1
    Time taken for tests:   13.497 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      33800000 bytes
    HTML transferred:       23400000 bytes
    Requests per second:    7409.06 [#/sec] (mean)
    Time per request:       0.135 [ms] (mean)
    Time per request:       0.135 [ms] (mean, across all concurrent requests)
    Transfer rate:          2445.57 [Kbytes/sec] received
    
    diff --git a/src/init.cpp b/src/init.cpp
    index 97ca0d1b18..3bab318790 100644
    --- a/src/init.cpp
    +++ b/src/init.cpp
    @@ -1822,7 +1822,7 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
         }
     
         if (args.GetBoolArg("-locationsindex", DEFAULT_LOCATIONSINDEX)) {
    -        g_locations_index = std::make_unique<LocationsIndex>(interfaces::MakeChain(node), /*cache_size=*/0, /*memory_only=*/false, do_reindex);
    +        g_locations_index = std::make_unique<LocationsIndex>(interfaces::MakeChain(node), /*cache_size=*/3000_MiB, /*memory_only=*/false, do_reindex);
             node.indexes.emplace_back(g_locations_index.get());
         }
    

    hodlinator commented at 10:59 AM on September 5, 2025:

    Wasn't getting consistent results until I ran doas pyperf system tune and got the variance down from ±1.0ms to ±0.1ms.

    Used curl + hyperfine for clearer numbers on variance:

    ₿ hyperfine "curl -v http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin --output /dev/null" --warmup=1000 --min-runs=5000 --shell=none
    

    With 0 cache:

    Benchmark 1: curl -v http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin --output /dev/null
      Time (mean ± σ):       4.5 ms ±   0.1 ms    [User: 1.9 ms, System: 2.0 ms]
      Range (min … max):     4.1 ms …   5.5 ms    5000 runs
    

    With 3'000_MIB cache:

    Benchmark 1: curl -v http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin --output /dev/null
      Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.9 ms, System: 2.1 ms]
      Range (min … max):     4.1 ms …   5.9 ms    10000 runs
     
      Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
    

    Confirms that cache doesn't make any material difference (Edit: at least not on NVMe).

  82. in src/rest.cpp:503 in 4441827ef4 outdated
     498 | +    }
     499 | +    const size_t index = *parsed_index;
     500 | +
     501 | +    ChainstateManager* maybe_chainman = GetChainman(context, req);
     502 | +    if (!maybe_chainman) return false;
     503 | +    ChainstateManager& chainman = *maybe_chainman;
    


    hodlinator commented at 7:17 PM on August 26, 2025:

    nit: chainman is only used in the !locations_index scenario, so could be skipped unless you want to always require one.


  83. in src/index/locationsindex.cpp:65 in 4441827ef4 outdated
      60 | +    : BaseIndex(std::move(chain), "locationsindex")
      61 | +{
      62 | +    fs::path path = gArgs.GetDataDirNet() / "indexes" / "locations";
      63 | +    fs::create_directories(path);
      64 | +
      65 | +    m_db = std::make_unique<BaseIndex::DB>(path / "db", n_cache_size, f_memory, f_wipe);
    


    hodlinator commented at 8:13 PM on August 26, 2025:

    Not sure why explicit create_directories() is needed, this seems to work:

        , m_db{std::make_unique<BaseIndex::DB>(gArgs.GetDataDirNet() / "indexes" / "locations" / "db",
                                               n_cache_size, f_memory, f_wipe)}
    {
    

  84. in test/functional/interface_rest.py:53 in 4441827ef4 outdated
      52 | @@ -53,7 +53,7 @@ def filter_output_indices_by_value(vouts, value):
      53 |  class RESTTest (BitcoinTestFramework):
    


    hodlinator commented at 8:27 PM on August 26, 2025:

    (Inline comment in random location to keep main thread cleaner).

    Played around with adding coverage in feature_init.py. However, I encountered some weird interaction between txindex and locationsindex. Not sure how incidental it is (going above a certain number of files) or whether some kind of interference is happening. See comment "If we add this, then we fail later while perturbing txindex":

    diff --git a/test/functional/feature_init.py b/test/functional/feature_init.py
    index 15b3e8595f..47e29655ce 100755
    --- a/test/functional/feature_init.py
    +++ b/test/functional/feature_init.py
    @@ -49,7 +49,7 @@ class InitTest(BitcoinTestFramework):
     
             def start_expecting_error(err_fragment):
                 node.assert_start_raises_init_error(
    -                extra_args=['-txindex=1', '-blockfilterindex=1', '-coinstatsindex=1', '-checkblocks=200', '-checklevel=4'],
    +                extra_args=['-txindex=1', '-blockfilterindex=1', '-coinstatsindex=1', '-locationsindex=1', '-checkblocks=200', '-checklevel=4'],
                     expected_msg=err_fragment,
                     match=ErrorMatch.PARTIAL_REGEX,
                 )
    @@ -77,6 +77,7 @@ class InitTest(BitcoinTestFramework):
                 b'txindex thread start',
                 b'block filter index thread start',
                 b'coinstatsindex thread start',
    +            b'locationsindex thread start',
                 b'msghand thread start',
                 b'net thread start',
                 b'addcon thread start',
    @@ -84,7 +85,7 @@ class InitTest(BitcoinTestFramework):
             if self.is_wallet_compiled():
                 lines_to_terminate_after.append(b'Verifying wallet')
     
    -        args = ['-txindex=1', '-blockfilterindex=1', '-coinstatsindex=1']
    +        args = ['-txindex=1', '-blockfilterindex=1', '-coinstatsindex=1', '-locationsindex=1']
             for terminate_line in lines_to_terminate_after:
                 self.log.info(f"Starting node and will terminate after line {terminate_line}")
                 with node.busy_wait_for_debug_log([terminate_line]):
    @@ -101,12 +102,13 @@ class InitTest(BitcoinTestFramework):
             self.stop_node(0)
     
             self.log.info("Test startup errors after removing certain essential files")
    -
             files_to_delete = {
                 'blocks/index/*.ldb': 'Error opening block database.',
                 'chainstate/*.ldb': 'Error opening coins database.',
                 'blocks/blk*.dat': 'Error loading block database.',
                 'indexes/txindex/MANIFEST*': 'LevelDB error: Corruption: CURRENT points to a non-existent file',
    +            # If we add this, then we fail later while perturbing txindex!?:
    +            #'indexes/locations/db/MANIFEST*': 'LevelDB error: Corruption: CURRENT points to a non-existent file',
                 # Removing these files does not result in a startup error:
                 # 'indexes/blockfilter/basic/*.dat', 'indexes/blockfilter/basic/db/*.*', 'indexes/coinstats/db/*.*',
                 # 'indexes/txindex/*.log', 'indexes/txindex/CURRENT', 'indexes/txindex/LOCK'
    @@ -120,6 +122,8 @@ class InitTest(BitcoinTestFramework):
                 'indexes/coinstats/db/*.*': 'LevelDB error: Corruption',
                 'indexes/txindex/*.log': 'LevelDB error: Corruption',
                 'indexes/txindex/CURRENT': 'LevelDB error: Corruption',
    +            'indexes/locations/db/*.log': 'LevelDB error: Corruption',
    +            'indexes/locations/db/CURRENT': 'LevelDB error: Corruption',
                 # Perturbing these files does not result in a startup error:
                 # 'indexes/blockfilter/basic/*.dat', 'indexes/txindex/MANIFEST*', 'indexes/txindex/LOCK'
             }
    

    romanz commented at 6:03 PM on August 30, 2025:

    Good catch, will investigate.


    romanz commented at 6:50 AM on August 31, 2025:


    hodlinator commented at 7:33 AM on September 3, 2025:

    94389c28e1068ffcc116614d16ac3047eb3068e3 seems to be passing (after a rebase over master) 🤔

    Tried rebasing my suggestions-branch, and 528f79f010d1617163e4154753b85e04539af993 / #32835 (which I also reviewed :grimacing:) fixed the issue by waiting for indexes to sync up before restarting the node.

    would it be OK to squash 94389c28e1068ffcc116614d16ac3047eb3068e3 into bdd90ab25f5f3b05c03a8dfb7177399264bed377?

    Fine by me! Could add me as Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com> if you want.


    romanz commented at 6:54 PM on September 3, 2025:
  85. in src/node/blockstorage.h:420 in 4441827ef4 outdated
     416 | @@ -417,6 +417,8 @@ class BlockManager
     417 |  
     418 |      bool ReadBlockUndo(CBlockUndo& blockundo, const CBlockIndex& index) const;
     419 |  
     420 | +    bool ReadTxFromBlock(CTransactionRef& tx, const FlatFilePos& block_pos, size_t tx_index) const;
    


    hodlinator commented at 8:42 PM on August 26, 2025:

    nit: Could use more modern style instead of returning duplicate bool state (also nail down index type):

        CTransactionRef ReadTxFromBlock(const FlatFilePos& block_pos, uint32_t tx_index) const;
    

  86. in src/rest.cpp:484 in 4441827ef4 outdated
     479 | +    if (!CheckWarmup(req)) return false;
     480 | +
     481 | +    std::string param;
     482 | +    const RESTResponseFormat rf = ParseDataFormat(param, uri_part);
     483 | +
     484 | +    // request is sent over URI scheme /rest/txfromblock/blockhash-index
    


    hodlinator commented at 7:03 AM on August 27, 2025:

    Have you considered the possibility of supporting batches of block-hash+tx-index in one GET?

    rest_getutxos() does something similar: https://github.com/bitcoin/bitcoin/blob/4441827ef4e9b5fe306c5f0a81a52b5d2b5e0b69/src/rest.cpp#L1007

    No sure if it's best practices in the REST world though. Another option could be using a JSON blob in a HTTP header, but that may be even more frowned upon.

    cc @stickies-v who might have a feel for this kind of thing (came across his f959fc0397c3f3615e99bc28d2df549d9d52f277).


    romanz commented at 6:44 PM on September 3, 2025:

    Yes - but if it's OK, I would prefer to do it in a separate PR.

  87. hodlinator commented at 7:34 AM on August 27, 2025: contributor

    Reviewed 4441827ef4e9b5fe306c5f0a81a52b5d2b5e0b69

    Sorry for the bag of nits, take from it what you will. The questions have higher priority for sure. Tried adding coverage of LocationsIndex to test/functional/feature_init.py but ran into interference with TxIndex, could be some more general issue though. Haven't had time to uncover the reason yet.

    Suggestions+test branch: https://github.com/bitcoin/bitcoin/compare/4441827ef4e9b5fe306c5f0a81a52b5d2b5e0b69...hodlinator:bitcoin:pr/32541_suggestions

    Further research idea: It would be awesome to use sendfile(socket, filein, offset, count)/TransmitFile() for this to move that part to 100% kernel-space but not sure how we would cut through the abstractions for that: https://www.man7.org/linux/man-pages/man2/sendfile.2.html Edit regarding sendfile: Oof, this is somewhat impossible unless one turns off XOR obfuscation. libevent had native support for sendfile via evbuffer_add_file and I did a quick & dirty implementation for /rest/block/0000000000000000000515e202c8ae73c8155fc472422d7593af87aa74f2cf3d.bin (fetching the taproot wizards 3.96MB block). Measured a 27% speedup (avg. 89.3ms vs 122.6ms for 1000 runs each).

  88. romanz force-pushed on Aug 30, 2025
  89. romanz commented at 6:02 PM on August 30, 2025: contributor

    Many thanks for the review and the suggested fixes! Squashed most the fixes into the above commits and force-pushed: https://github.com/bitcoin/bitcoin/compare/4441827ef4e9b5fe306c5f0a81a52b5d2b5e0b69...5e1c80a22f3d3c6ecf6d4e2777d67656423dd3f3

  90. DrahtBot added the label CI failed on Aug 30, 2025
  91. DrahtBot commented at 7:08 PM on August 30, 2025: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task TSan, depends, no gui: https://github.com/bitcoin/bitcoin/runs/49248056402</sub> <sub>LLM reason (✨ experimental): ThreadSanitizer detected a data race (width in ios:501:18) causing the mptest to fail.</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  92. romanz commented at 5:36 AM on August 31, 2025: contributor

    Rebasing to resolve a conflict with master.

  93. romanz force-pushed on Aug 31, 2025
  94. DrahtBot removed the label CI failed on Aug 31, 2025
  95. romanz requested review from hodlinator on Sep 2, 2025
  96. in test/functional/interface_rest.py:494 in 94389c28e1 outdated
     485 | +                assert tx_bytes.hex() == tx["hex"]
     486 | +                tx_hex = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.HEX, ret_type=RetType.BYTES).decode().strip()
     487 | +                assert tx_hex == tx["hex"]
     488 | +                tx_json = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.JSON, ret_type=RetType.JSON)
     489 | +                assert tx_json["txid"] == tx["txid"]
     490 | +                assert tx_json["hex"] == tx["hex"]
    


    hodlinator commented at 8:14 AM on September 3, 2025:

    Could cover both paths:

    diff --git a/test/functional/interface_rest.py b/test/functional/interface_rest.py
    index 541dde3174..333b8c3f79 100755
    --- a/test/functional/interface_rest.py
    +++ b/test/functional/interface_rest.py
    @@ -28,7 +28,7 @@ from test_framework.wallet import (
         MiniWallet,
         getnewdestination,
     )
    -from typing import Optional
    +from typing import Optional, NamedTuple
     
     
     INVALID_PARAM = "abc"
    @@ -53,7 +53,7 @@ def filter_output_indices_by_value(vouts, value):
     class RESTTest (BitcoinTestFramework):
         def set_test_params(self):
             self.num_nodes = 2
    -        self.extra_args = [["-rest", "-blockfilterindex=1", "-locationsindex=1"], []]
    +        self.extra_args = [["-rest", "-blockfilterindex=1", "-locationsindex=1"], ["-rest"]]
             # whitelist peers to speed up tx relay / mempool sync
             self.noban_tx_relay = True
             self.supports_cli = False
    @@ -67,14 +67,17 @@ class RESTTest (BitcoinTestFramework):
                 status: int = 200,
                 ret_type: RetType = RetType.JSON,
                 query_params: Optional[dict[str, typing.Any]] = None,
    +            url: Optional[NamedTuple] = None,
                 ) -> typing.Union[http.client.HTTPResponse, bytes, str, None]:
             rest_uri = '/rest' + uri
             if req_type in ReqType:
                 rest_uri += f'.{req_type.name.lower()}'
             if query_params:
                 rest_uri += f'?{urllib.parse.urlencode(query_params)}'
    +        if not url:
    +            url = self.url
     
    -        conn = http.client.HTTPConnection(self.url.hostname, self.url.port)
    +        conn = http.client.HTTPConnection(url.hostname, url.port)
             self.log.debug(f'{http_method} {rest_uri} {body}')
             if http_method == 'GET':
                 conn.request('GET', rest_uri)
    @@ -474,6 +477,7 @@ class RESTTest (BitcoinTestFramework):
             block_count = self.nodes[0].getblockcount()
             expected_index_info = {'synced': True, 'best_block_height': block_count}
             self.wait_until(lambda: self.nodes[0].getindexinfo()['locationsindex'] == expected_index_info)
    +        other_url = urllib.parse.urlparse(self.nodes[1].url)
             for height in range(0, block_count + 1):
                 blockhash = self.nodes[0].getblockhash(height)
                 block = self.nodes[0].getblock(blockhash, 2)
    @@ -489,5 +493,14 @@ class RESTTest (BitcoinTestFramework):
                     assert tx_json["txid"] == tx["txid"]
                     assert tx_json["hex"] == tx["hex"]
     
    +                tx_bytes_noindex = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.BIN, ret_type=RetType.BYTES, url=other_url)
    +                assert tx_bytes.hex() == tx["hex"]
    +                tx_hex_noindex = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.HEX, ret_type=RetType.BYTES, url=other_url).decode().strip()
    +                assert tx_hex == tx["hex"]
    +                tx_json_noindex = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.JSON, ret_type=RetType.JSON, url=other_url)
    +                assert tx_bytes_noindex == tx_bytes
    +                assert tx_hex_noindex   == tx_hex
    +                assert tx_json_noindex  == tx_json
    +
     if __name__ == '__main__':
         RESTTest(__file__).main()
    

    romanz commented at 6:53 PM on September 3, 2025:

    Force-pushed 94389c28e1...91408755df, with a similar change to check both enabled and disabled index.

  97. hodlinator commented at 9:18 AM on September 3, 2025: contributor

    Reviewed 94389c28e1068ffcc116614d16ac3047eb3068e3

    Measured a ~11% speed increase by enabling LocationsIndex, running on NVMe. Are you seeing greater improvements on other setups?

    -locationsindex=0

    ₿ hey -n 100000 http://localhost:8332/rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
    
    Summary:
      Total:	2.3504 secs
      Slowest:	0.0193 secs
      Fastest:	0.0001 secs
      Average:	0.0012 secs
      Requests/sec:	42545.2261
    

    -locationsindex=1

    ₿ hey -n 100000 http://localhost:8332/rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
    
    Summary:
      Total:  2.0951 secs
      Slowest:  0.0417 secs
      Fastest:  0.0001 secs
      Average:  0.0010 secs
      Requests/sec: 47730.5857
    
  98. romanz commented at 6:32 PM on September 3, 2025: contributor

    Are you seeing greater improvements on other setups?

    I think that the performance gain will increase when fetching transactions near the end of the block.

    For example, when fetching the 5000th transaction of block [#90005](/bitcoin-bitcoin/90005/) using ab -k -c 1 -n 100000[^1][^2], enabled locationsindex improves the performance ~19x (2.574ms → 0.136ms).

    <details>

    <summary> Details </summary>

    -locationsindex=1

    Document Path:          /rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
    Document Length:        234 bytes
    
    Concurrency Level:      1
    Time taken for tests:   13.598 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      33800000 bytes
    HTML transferred:       23400000 bytes
    Requests per second:    7353.85 [#/sec] (mean)
    Time per request:       0.136 [ms] (mean)
    Time per request:       0.136 [ms] (mean, across all concurrent requests)
    Transfer rate:          2427.34 [Kbytes/sec] received
    

    -locationsindex=0

    Document Path:          /rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
    Document Length:        234 bytes
    
    Concurrency Level:      1
    Time taken for tests:   257.400 seconds
    Complete requests:      100000
    Failed requests:        0
    Keep-Alive requests:    100000
    Total transferred:      33800000 bytes
    HTML transferred:       23400000 bytes
    Requests per second:    388.50 [#/sec] (mean)
    Time per request:       2.574 [ms] (mean)
    Time per request:       2.574 [ms] (mean, across all concurrent requests)
    Transfer rate:          128.24 [Kbytes/sec] received
    

    [^1]: ApacheBench 2.3 [^2]: testing with warm OS block cache, i.e. measuring mostly HTTP & (de-)serialization

    </details>

  99. romanz commented at 6:44 PM on September 3, 2025: contributor

    Ran the same test using hey 0.1.4 - enabling locationsindex improves the performance ~15.8x (2.963ms → 0.187ms)

    <details>

    <summary> Details </summary>

    -locationsindex=1

     hey -c 1 -n 10000 http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
    
    Summary:
      Total:	1.8766 secs
      Slowest:	0.0686 secs
      Fastest:	0.0001 secs
      Average:	0.0002 secs
      Requests/sec:	5328.8354
      
      Total data:	2340000 bytes
      Size/request:	234 bytes
    
    Response time histogram:
      0.000 [1]	|
      0.007 [9998]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
      0.014 [0]	|
      0.021 [0]	|
      0.027 [0]	|
      0.034 [0]	|
      0.041 [0]	|
      0.048 [0]	|
      0.055 [0]	|
      0.062 [0]	|
      0.069 [1]	|
    
    
    Latency distribution:
      10% in 0.0001 secs
      25% in 0.0002 secs
      50% in 0.0002 secs
      75% in 0.0002 secs
      90% in 0.0002 secs
      95% in 0.0002 secs
      99% in 0.0003 secs
    
    Details (average, fastest, slowest):
      DNS+dialup:	0.0000 secs, 0.0001 secs, 0.0686 secs
      DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0003 secs
      req write:	0.0000 secs, 0.0000 secs, 0.0004 secs
      resp wait:	0.0002 secs, 0.0001 secs, 0.0685 secs
      resp read:	0.0000 secs, 0.0000 secs, 0.0005 secs
    
    Status code distribution:
      [200]	10000 responses
    

    -locationsindex=0

    hey -c 1 -n 10000 http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
    
    Summary:
      Total:	29.6382 secs
      Slowest:	0.0546 secs
      Fastest:	0.0016 secs
      Average:	0.0030 secs
      Requests/sec:	337.4024
      
      Total data:	2340000 bytes
      Size/request:	234 bytes
    
    Response time histogram:
      0.002 [1]	|
      0.007 [9998]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
      0.012 [0]	|
      0.018 [0]	|
      0.023 [0]	|
      0.028 [0]	|
      0.033 [0]	|
      0.039 [0]	|
      0.044 [0]	|
      0.049 [0]	|
      0.055 [1]	|
    
    
    Latency distribution:
      10% in 0.0029 secs
      25% in 0.0030 secs
      50% in 0.0030 secs
      75% in 0.0031 secs
      90% in 0.0031 secs
      95% in 0.0032 secs
      99% in 0.0033 secs
    
    Details (average, fastest, slowest):
      DNS+dialup:	0.0000 secs, 0.0016 secs, 0.0546 secs
      DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0006 secs
      req write:	0.0000 secs, 0.0000 secs, 0.0004 secs
      resp wait:	0.0029 secs, 0.0016 secs, 0.0545 secs
      resp read:	0.0000 secs, 0.0000 secs, 0.0005 secs
    
    Status code distribution:
      [200]	10000 responses
    

    </details>

  100. rest: allow fetching specific transaction from a block 772ee5c973
  101. romanz force-pushed on Sep 3, 2025
  102. romanz marked this as a draft on Sep 3, 2025
  103. romanz force-pushed on Sep 4, 2025
  104. romanz marked this as ready for review on Sep 4, 2025
  105. romanz requested review from hodlinator on Sep 4, 2025
  106. romanz commented at 6:29 AM on September 4, 2025: contributor

    Also tested with wrk 4.1.0, with similar performance improvement.

    <details>

    <summary> Details </summary>

    -locationsindex=1

    $ wrk --latency -t 1 -c 1 -d 10s http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
    Running 10s test @ http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
      1 threads and 1 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   137.29us   33.60us   2.78ms   93.01%
        Req/Sec     7.20k   518.79     8.84k    73.27%
      Latency Distribution
         50%  136.00us
         75%  147.00us
         90%  166.00us
         99%  185.00us
      72377 requests in 10.10s, 24.23MB read
    Requests/sec:   7165.98
    Transfer/sec:      2.40MB
    
    $ wrk --latency -t 4 -c 4 -d 10s http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
    Running 10s test @ http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
      4 threads and 4 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   139.39us   25.96us   1.02ms   74.11%
        Req/Sec     7.08k   255.89     7.90k    68.73%
      Latency Distribution
         50%  138.00us
         75%  154.00us
         90%  169.00us
         99%  205.00us
      283910 requests in 10.10s, 95.04MB read
    Requests/sec:  28111.74
    Transfer/sec:      9.41MB
    

    -locationsindex=0

    $ wrk --latency -t 1 -c 1 -d 10s http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
    Running 10s test @ http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
      1 threads and 1 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     2.29ms  646.55us   4.66ms   65.27%
        Req/Sec   439.01    104.35   590.00     72.00%
      Latency Distribution
         50%    1.84ms
         75%    2.92ms
         90%    2.96ms
         99%    3.30ms
      4373 requests in 10.01s, 1.46MB read
    Requests/sec:    436.96
    Transfer/sec:    149.78KB
    
    $ wrk --latency -t 4 -c 4 -d 10s http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
    Running 10s test @ http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
      4 threads and 4 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     2.62ms  833.44us   5.88ms   58.59%
        Req/Sec   382.58     44.82   515.00     69.25%
      Latency Distribution
         50%    2.53ms
         75%    3.20ms
         90%    3.69ms
         99%    4.87ms
      15244 requests in 10.01s, 5.10MB read
    Requests/sec:   1523.25
    Transfer/sec:    522.13KB
    

    </details>

  107. index: store per-block transaction locations for efficient lookups
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    961f94b293
  108. in src/rest.cpp:505 in 91408755df outdated
     500 | +
     501 | +    CTransactionRef tx{};
     502 | +    std::vector<std::byte> tx_bytes{};
     503 | +    if (const LocationsIndex* locations_index = g_locations_index.get()) {
     504 | +        if (!locations_index->BlockUntilSyncedToCurrentChain()) {
     505 | +            RESTERR(req, HTTP_SERVICE_UNAVAILABLE, "Locations index is still syncing");
    


    hodlinator commented at 9:19 AM on September 4, 2025:

    Missing return here.

    This will lead to ReadRawTransaction() being called anyway and in my testing it returns an empty vector, which we then discover on line 533 and on the next line we try to use req again but internal pointers have been reset to null by the line of this comment, leading to a crash.


    romanz commented at 10:47 AM on September 4, 2025:

    Thanks! Fixed in 91408755df...961f94b293.

  109. romanz force-pushed on Sep 4, 2025
  110. hodlinator commented at 1:13 PM on September 4, 2025: contributor

    Waiting for indexing to complete, did a 1min perf dump: <img width="1684" height="1177" alt="image" src="https://github.com/user-attachments/assets/b3da4ca7-57d1-4737-849a-f0ad1216f306" />

    • 35.2% of the time is spent in the CTransaction constructor, eagerly computing hashes which LocationsIndex doesn't even use.
    • 11.7% is spent in the GetSerializeSize calls inside LocationsIndex::CustomAppend().

    Would be nice for the LocationsIndex if hashes were lazily computed (avoided) and transactions had a serialized size field that was cached upon deserialization so we don't need to call GetSerializeSize.

    Next-day edit:

    1. Noticed I was accidentally running a debug build in the measurements above.
    2. ~Realized that 80+% of the 35.2% tx hash computation above was spent in serializing the transaction we had just deserialized.~ Edit: The re-serialization turned out to be cheap in the end, SHA256-hashing is performed during serialization and is where most of the time goes. (At the peaks of the flamegraphs, cropped away in the screenshot).
  111. romanz commented at 6:28 PM on September 4, 2025: contributor

    Would be nice for the LocationsIndex if hashes were lazily computed (avoided) and transactions had a serialized size field that was cached upon deserialization so we don't need to call GetSerializeSize.

    In bindex library, I am using @RCasatta's bitcoin_slices crate which allows parsing Bitcoin blocks and transactions without extra allocations & hashing. Maybe we can use a similar approach for building the indices (since each index may need a different part of the indexed transaction)?

  112. hodlinator commented at 9:38 PM on September 4, 2025: contributor

    I think that the performance gain will increase when fetching transactions near the end of the block.

    Oof, I can confirm. Skipping over the serialization of all the prior transactions really helps. :D

    (Edit: Accidentally ran these benchmarks in debug, updated numbers still confirm the improvement)

    <details><summary>Details</summary>

    1 worker - 16x improvement

    ₿ hey -c 1 -n 10000 http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
    

    -locationsindex=0

    Summary:
      Total:  19.9502 secs
      Slowest:  0.0191 secs
      Fastest:  0.0019 secs
      Average:  0.0020 secs
      Requests/sec: 501.2476
    

    -locationsindex=1

    Summary:
      Total:	1.2325 secs
      Slowest:	0.0178 secs
      Fastest:	0.0001 secs
      Average:	0.0001 secs
      Requests/sec:	8113.5517
    

    16 workers - 8.5x improvement

    ₿ hey -c 16 -n 10000 http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
    

    -locationsindex=0

    Summary:
      Total:  2.6111 secs
      Slowest:  0.0222 secs
      Fastest:  0.0019 secs
      Average:  0.0042 secs
      Requests/sec: 3829.8051
    

    -locationsindex=1

    Summary:
      Total:  0.3054 secs
      Slowest:  0.0182 secs
      Fastest:  0.0001 secs
      Average:  0.0005 secs
      Requests/sec: 32744.4170
    

    </details>

  113. in test/functional/interface_rest.py:488 in 961f94b293
     483 | +            self.log.debug("Checking block: %s (%d txs)", blockhash, len(txs))
     484 | +            for i, tx in enumerate(txs):
     485 | +                results = []
     486 | +                for node in self.nodes:
     487 | +                    self.log.debug("Checking tx #%d: %s on %s", i, tx["txid"], node.url)
     488 | +                    self.url = urllib.parse.urlparse(node.url)
    


    hodlinator commented at 11:46 AM on September 5, 2025:

    info: Re-parsing for every transaction in every block seems wasteful, but I couldn't measure any improvement from doing:

    diff --git a/test/functional/interface_rest.py b/test/functional/interface_rest.py
    index 34ae549cf4..89969e50f9 100755
    --- a/test/functional/interface_rest.py
    +++ b/test/functional/interface_rest.py
    @@ -476,6 +476,7 @@ class RESTTest (BitcoinTestFramework):
             self.wait_until(lambda: self.nodes[0].getindexinfo()['locationsindex'] == expected_index_info)
             assert self.nodes[1].getindexinfo() == {}  # locationsindex is disabled
     
    +        urls = (self.url, urllib.parse.urlparse(self.nodes[1].url))
             for height in range(0, block_count + 1):
                 blockhash = self.nodes[0].getblockhash(height)
                 block = self.nodes[0].getblock(blockhash, 2)
    @@ -483,9 +484,9 @@ class RESTTest (BitcoinTestFramework):
                 self.log.debug("Checking block: %s (%d txs)", blockhash, len(txs))
                 for i, tx in enumerate(txs):
                     results = []
    -                for node in self.nodes:
    +                for j, node in enumerate(self.nodes):
                         self.log.debug("Checking tx #%d: %s on %s", i, tx["txid"], node.url)
    -                    self.url = urllib.parse.urlparse(node.url)
    +                    self.url = urls[j]
                         tx_bytes = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.BIN, ret_type=RetType.BYTES)
                         assert tx_bytes.hex() == tx["hex"]
                         tx_hex = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.HEX, ret_type=RetType.BYTES).decode().strip()
    
  114. in src/index/locationsindex.cpp:37 in 961f94b293
      32 | +        READWRITE(prefix);
      33 | +        if (prefix != DB_BLOCK_HASH) {
      34 | +            throw std::ios_base::failure("Invalid format for location index DB hash key");
      35 | +        }
      36 | +
      37 | +        READWRITE(obj.hash);
    


    hodlinator commented at 9:39 AM on September 8, 2025:

    nit: If you re-touch, could be more explicit with block_hash + curly initialization by default to avoid unintentional narrowing:

        uint256 block_hash;
        uint32_t row; // allow splitting one block's transactions into multiple DB rows
    
        explicit DBKey(const uint256& block_hash_in, uint32_t row_in) : block_hash{block_hash_in}, row{row_in} {}
    
        SERIALIZE_METHODS(DBKey, obj)
        {
            uint8_t prefix{DB_BLOCK_HASH};
            READWRITE(prefix);
            if (prefix != DB_BLOCK_HASH) {
                throw std::ios_base::failure("Invalid format for location index DB hash key");
            }
    
            READWRITE(obj.block_hash);
    
  115. hodlinator commented at 12:34 PM on September 8, 2025: contributor

    Higher level flow:

    sequenceDiagram
        actor User
        participant C as Electrum<br/>Client
        participant S as Electrum<br/>Server
        participant R as bitcoind<br/>REST interface
        participant L as LocationsIndex
        participant B as blockstorage
    
        User->>+C: View wallet history
        C->>+S: blockchain.scripthash.get_history(scripthash) [1]
        note over C,S: https://electrum-protocol.readthedocs.io<br/>/en/latest/protocol-methods.html#blockchain-scripthash-get-history
        loop Fetch history
            S->>+R: GET /rest/txfromblock/BLOCKHASH-N.bin
            R->>+L: LocationsIndex::ReadRawTransaction(block_hash, n)
            note over L: BLOCKHASH+N is mapped to a LevelDB key with a compressed N-value
            L->>+B: BlockManager::OpenBlockFile()
            B->>-L: AutoFile
            note over L: Read span of bytes for transaction
            L->>-R: bytes
            R->>-S: bytes
            S->>-C: bytes
        end
        C->>-User: tx history
    

    Q1: Could LocationsIndex live on the Electrum Server (bindex-rs) side, with bitcoind instead having a REST endpoint for reading a span of bytes from a given block?


    Q2: Why is it better to implement a slow fallback in blockstorage than to fail over to requiring TxIndex? Is it a question of not wanting to store both blockhash+N and txid in case users switch around their configurations/bitcoind servers?


    Would be nice for the LocationsIndex if hashes were lazily computed (avoided) and transactions had a serialized size field that was cached upon deserialization so we don't need to call GetSerializeSize.

    In bindex library, I am using @RCasatta's bitcoin_slices crate which allows parsing Bitcoin blocks and transactions without extra allocations & hashing. Maybe we can use a similar approach for building the indices (since each index may need a different part of the indexed transaction)?

    Yeah, would be sweet if we could just allocate the block into a 4MB scratch buffer without any transformations and maybe even have the tx hashes appended to the end of blocks so they don't need to be recomputed (could still have some simple CRC over block + hashes to catch storage medium failure).

  116. romanz commented at 6:44 PM on September 8, 2025: contributor

    Q1: Could LocationsIndex live on the Electrum Server (bindex-rs) side, with bitcoind instead having a REST endpoint for reading a span of bytes from a given block?

    Great idea, thanks! I would try this approach in a separate bindex branch, to evaluate its performance.

    Q2: Why is it better to implement a slow fallback in blockstorage than to fail over to requiring TxIndex? Is it a question of not wanting to store both blockhash+N and txid in case users switch around their configurations/bitcoind servers?

    The drawback of TxIndex is its size (currently ~60GB). It also requires bindex to use txids (to query TxIndex), whose storage is estimated to take additional tens of GBs (~1B txids, each being 32-byte pseudo-random values).

    Currently bindex is using 4-byte txnum^1 (the "position" of the tx in the chain, when enumerating all the transactions from genesis), which can be used to efficiently find its block height and its offset within the block - without using txids :)

  117. DrahtBot added the label Needs rebase on Sep 9, 2025
  118. DrahtBot commented at 1:38 AM on September 9, 2025: contributor

    <!--cf906140f33d8803c4a75a2196329ecb-->

    🐙 This pull request conflicts with the target branch and needs rebase.

  119. TheCharlatan commented at 7:56 PM on October 11, 2025: contributor

    I would try this approach in a separate bindex branch, to evaluate its performance.

    What came of this @romanz?

  120. romanz commented at 8:05 PM on October 11, 2025: contributor

    Sorry, unfortunately I didn't have the time to implement and benchmark the new approach in the last month - but I expect it to have similar lookup performance, and since it wouldn'd require an additional Bitcoin Core index, I will prefer it over my original approach. WDYT?

  121. TheCharlatan commented at 8:18 PM on October 11, 2025: contributor

    I expect it to have similar lookup performance, and since it wouldn'd require an additional Bitcoin Core index, I will prefer it over my original approach.

    If the performance is similar, that sounds like the better course of action. I guess you'd still need the REST endpoint for retrieving the actual transaction?

  122. romanz commented at 8:50 PM on October 11, 2025: contributor

    Yes - it will need way to fetch partial block data, e.g. https://github.com/romanz/bitcoin/commit/5a319571e728de009fb81a27d2bb86f8f8811d09

  123. romanz referenced this in commit 11e3a3cd48 on Oct 19, 2025
  124. romanz referenced this in commit 787d6df48a on Oct 19, 2025
  125. romanz commented at 2:57 PM on October 19, 2025: contributor

    Closing this PR, the "index-less" approach will be implemented in #33657.

  126. romanz closed this on Oct 19, 2025


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-01 03:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me