index: store per-block transaction locations for efficient lookups #32541

pull romanz wants to merge 2 commits into bitcoin:master from romanz:locations-index changing 15 files +484 −6
  1. romanz commented at 7:27 am on May 17, 2025: contributor

    Currently, electrs and other indexers map between an address/scripthash to the list of the relevant transactions.

    However, in order to fetch those transactions from bitcoind, electrs relies on reading the whole block and post-filtering for a specific transaction1. Other indexers use a txindex to fetch a transaction using its txid 234.

    The above approach has significant storage and CPU overhead, since the txid is a pseudo-random 32-byte value.

    This PR is adding support for using the transaction’s position within its block to be able to fetch it directly using REST API, using the following HTTP request (to fetch the N-th transaction from BLOCKHASH):

    0GET /rest/txfromblock/BLOCKHASH-N.bin
    

    If binary response format is used, the transaction data will be read directly from the storage and sent back to the client, without any deserialization overhead.

    The resulting index is much smaller (allowing it to be cached in RAM):

    0$ du -sh indexes/locations/ indexes/txindex/
    12.5G	indexes/locations/
    257G	indexes/txindex/
    

    The new index is using the following DB schema:

    0struct DBKey {
    1    uint256 hash;   // blockhash
    2    uint32_t row;   // allow splitting one block's transactions into multiple DB rows
    3};
    4
    5struct DBValue {
    6    FlatFilePos block_pos;          // file id + offset of the block
    7    std::vector<uint32_t> offsets;  // a list of transaction offsets within the block
    8};
    

    For example, when fetching the 5000th transaction of [block #90005](https://mempool.space/block/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec) using ab -k -c 1 -n 100000, enabled locationsindex improves the performance ~19x (2.574ms → 0.136ms).

    I am working on a proof-of-concept indexer (https://github.com/romanz/bindex-rs) which is using #32540 & #32541 - please let me know if there are any questions/comments/concerns :)

  2. DrahtBot commented at 7:28 am on May 17, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32541.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK TheCharlatan, hodlinator

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30469 (index: Fix coinstats overflow by fjahr)
    • #26966 (index: initial sync speedup, parallelize process by furszy)
    • #17783 (common: Disallow calling IsArgSet() on ALLOW_LIST options by ryanofsky)
    • #17581 (refactor: Remove settings merge reverse precedence code by ryanofsky)
    • #17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)
    • #17493 (util: Forbid ambiguous multiple assignments in config file by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label UTXO Db and Indexes on May 17, 2025
  4. DrahtBot added the label CI failed on May 17, 2025
  5. DrahtBot commented at 7:37 am on May 17, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/42402043332 LLM reason (✨ experimental): The CI failure is due to missing include guards in src/index/locationsindex.h.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. romanz force-pushed on May 17, 2025
  7. TheCharlatan commented at 9:37 am on May 17, 2025: contributor

    Concept ACK

    Can you add the schema of the index and the expected arguments for the REST API to the pull request description? I was a bit confused at first if this now exposes the file position, but if I read it correctly now, this just allows querying a transaction by its index in the block.

  8. romanz commented at 10:00 am on May 17, 2025: contributor

    Concept ACK

    Thanks!

    Can you add the schema of the index and the expected arguments for the REST API to the pull request description?

    Sure - updated in #32541#issue-3070502385.

  9. romanz force-pushed on May 17, 2025
  10. romanz commented at 11:58 am on May 17, 2025: contributor
    Fixed a few issues (following SonarQube run).
  11. DrahtBot removed the label CI failed on May 17, 2025
  12. luke-jr commented at 6:15 pm on May 20, 2025: member
    How does this compare to getrawtransaction <txid> 0 <blockhash> without a txindex?
  13. romanz commented at 9:51 am on May 21, 2025: contributor

    I have used ApacheBench 2.3 for benchmarking REST API, and the following Rust client for getrawtransaction RPC:

    fetching using the new index

     0$ ab -k -c 1 -n 100000 http://localhost:8332/rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
     1
     2Document Path:          /rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
     3Document Length:        301 bytes
     4
     5Concurrency Level:      1
     6Time taken for tests:   13.760 seconds
     7Complete requests:      100000
     8Failed requests:        0
     9Keep-Alive requests:    100000
    10Total transferred:      40500000 bytes
    11HTML transferred:       30100000 bytes
    12Requests per second:    7267.65 [#/sec] (mean)
    13Time per request:       0.138 [ms] (mean)
    14Time per request:       0.138 [ms] (mean, across all concurrent requests)
    15Transfer rate:          2874.41 [Kbytes/sec] received
    

    fetching using txindex

     0$ ab -k -c 1 -n 100000 http://localhost:8332/rest/tx/4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a.bin
     1
     2Document Path:          /rest/tx/4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a.bin
     3Document Length:        301 bytes
     4
     5Concurrency Level:      1
     6Time taken for tests:   14.075 seconds
     7Complete requests:      100000
     8Failed requests:        0
     9Keep-Alive requests:    100000
    10Total transferred:      40500000 bytes
    11HTML transferred:       30100000 bytes
    12Requests per second:    7104.78 [#/sec] (mean)
    13Time per request:       0.141 [ms] (mean)
    14Time per request:       0.141 [ms] (mean, across all concurrent requests)
    15Transfer rate:          2810.00 [Kbytes/sec] received
    

    fetching without txindex

    0time cargo run --release -- 4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a 0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe
    1    Finished `release` profile [optimized] target(s) in 0.02s
    2     Running `target/release/bench-getrawtx 4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a 0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe`
    3iterations = 1000
    4average RPC duration = 8.563491ms
    5
    6real	0m8.628s
    7user	0m0.070s
    8sys	0m0.052s
    

    Conclusions

    • The new LocationsIndex is only a few percent faster than the old TxIndex, but the on-disk footprint is ~22x smaller.

    • getrawtransaction which is used in the last benchmark has an average RPC duration of ~8.6ms vs ~0.14ms for the ones above.

  14. DrahtBot added the label CI failed on Jun 13, 2025
  15. DrahtBot commented at 5:34 pm on June 13, 2025: contributor

    🚧 At least one of the CI tasks failed. Task previous releases, depends DEBUG: https://github.com/bitcoin/bitcoin/runs/42406243587 LLM reason (✨ experimental): The CI failure is caused by a missing header file test/util/index.h during compilation.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  16. romanz force-pushed on Jun 13, 2025
  17. romanz commented at 7:02 pm on June 13, 2025: contributor
    Rebased to fix #32541 (comment).
  18. DrahtBot removed the label CI failed on Jun 13, 2025
  19. romanz marked this as a draft on Jun 14, 2025
  20. romanz marked this as ready for review on Jun 14, 2025
  21. TheCharlatan commented at 2:34 pm on June 15, 2025: contributor

    How does this compare to getrawtransaction 0 without a txindex?

    As far as I understand the index makes this operation faster by not requiring to read the entire block and then iterating through the transactions to find the match, which I am guessing is what the last benchmark is showing. romanz, would this new endpoint be used while creating the entire index initially, or to serve certain requests? It is not quite clear to me when an indexing client wouldn’t want to read through the entire block and instead only get its transactions selectively.

  22. romanz commented at 5:20 pm on June 15, 2025: contributor

    As far as I understand the index makes this operation faster by not requiring to read the entire block and then iterating through the transactions to find the match

    Correct - the proposed index improves the performance of fetching a single transaction (similar to txindex), requiring significantly less storage.

    would this new endpoint be used while creating the entire index initially, or to serve certain requests?

    I would like the new index to be used to serve history-related queries. For example, https://electrum-protocol.readthedocs.io/en/latest/protocol-methods.html#blockchain-scripthash-get-history.

    You are right that during the history indexing process, the client doesn’t need the proposed index, since it needs to read both the entire block (and undo) data in order to map from ScriptPubKey to list of { block hash + transaction index within the block }.

    BTW, I am working on a proof-of-concept indexer (https://github.com/romanz/bindex-rs) which is using #32540 & #32541 - please let me know if there are any questions/comments/concerns :)

  23. in src/rest.cpp:412 in d962c9a917 outdated
    407+        }
    408+    }
    409+
    410+    const LocationsIndex* locations_index = g_locations_index.get();
    411+    if (!locations_index) {
    412+        return RESTERR(req, HTTP_BAD_REQUEST, "Locations index is not enabled");
    


    maflcko commented at 12:04 pm on June 20, 2025:

    not sure about returning an error here. Should be easy to implement a fallback?

    This should also make it easier to benchmark the benefit of the index.

    (I guess it is clear that RPC with json overhead eats some CPU, but out of curiosity I wonder what the difference is between reading the db value (tx location) + reading the tx versus reading the full block and implementing a raw-tx-seeker to jump to the raw bytes of the tx in the block. This should be possible by just reading the length prefixes and then skipping them. I guess the worst case scenario is when its the last transaction, where the prior transaction(s) are mostly filled with minimally sized length prefixes)


    romanz commented at 3:35 pm on June 20, 2025:

    Thanks!

    This should also make it easier to benchmark the benefit of the index.

    Sounds good - will implement a fallback.


    romanz commented at 1:29 pm on June 21, 2025:
    Implemented in df45388b97.

    romanz commented at 6:39 am on June 22, 2025:

    Tested using the following Testnet4 block:

    -locationsindex=1

     0$ BLOCKHASH=00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b
     1$ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-0.bin
     2
     3Document Path:          /rest/txfromblock/00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b-0.bin
     4Document Length:        273 bytes
     5
     6Concurrency Level:      1
     7Time taken for tests:   2.983 seconds
     8Complete requests:      100000
     9Failed requests:        0
    10Keep-Alive requests:    100000
    11Total transferred:      37700000 bytes
    12HTML transferred:       27300000 bytes
    13Requests per second:    33527.87 [#/sec] (mean)
    14Time per request:       0.030 [ms] (mean)
    15Time per request:       0.030 [ms] (mean, across all concurrent requests)
    16Transfer rate:          12343.76 [Kbytes/sec] received
    17
    18$ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-1000.bin
    19
    20Document Path:          /rest/txfromblock/00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b-1000.bin
    21Document Length:        308 bytes
    22
    23Concurrency Level:      1
    24Time taken for tests:   3.122 seconds
    25Complete requests:      100000
    26Failed requests:        0
    27Keep-Alive requests:    100000
    28Total transferred:      41200000 bytes
    29HTML transferred:       30800000 bytes
    30Requests per second:    32032.96 [#/sec] (mean)
    31Time per request:       0.031 [ms] (mean)
    32Time per request:       0.031 [ms] (mean, across all concurrent requests)
    33Transfer rate:          12888.26 [Kbytes/sec] received
    34
    35$ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-2000.bin
    36
    37Document Path:          /rest/txfromblock/00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b-2000.bin
    38Document Length:        308 bytes
    39
    40Concurrency Level:      1
    41Time taken for tests:   3.030 seconds
    42Complete requests:      100000
    43Failed requests:        0
    44Keep-Alive requests:    100000
    45Total transferred:      41200000 bytes
    46HTML transferred:       30800000 bytes
    47Requests per second:    33003.94 [#/sec] (mean)
    48Time per request:       0.030 [ms] (mean)
    49Time per request:       0.030 [ms] (mean, across all concurrent requests)
    50Transfer rate:          13278.93 [Kbytes/sec] received
    51
    52$ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-3000.bin
    53
    54Document Path:          /rest/txfromblock/00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b-3000.bin
    55Document Length:        309 bytes
    56
    57Concurrency Level:      1
    58Time taken for tests:   3.362 seconds
    59Complete requests:      100000
    60Failed requests:        0
    61Keep-Alive requests:    100000
    62Total transferred:      41300000 bytes
    63HTML transferred:       30900000 bytes
    64Requests per second:    29745.72 [#/sec] (mean)
    65Time per request:       0.034 [ms] (mean)
    66Time per request:       0.034 [ms] (mean, across all concurrent requests)
    67Transfer rate:          11997.05 [Kbytes/sec] received
    68
    69$ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-4000.bin
    70
    71Document Path:          /rest/txfromblock/00000000000000047b26e3ef164e30f0475c12408d0ece060f90f722aa90787b-4000.bin
    72Document Length:        309 bytes
    73
    74Concurrency Level:      1
    75Time taken for tests:   2.854 seconds
    76Complete requests:      100000
    77Failed requests:        0
    78Keep-Alive requests:    100000
    79Total transferred:      41300000 bytes
    80HTML transferred:       30900000 bytes
    81Requests per second:    35040.53 [#/sec] (mean)
    82Time per request:       0.029 [ms] (mean)
    83Time per request:       0.029 [ms] (mean, across all concurrent requests)
    84Transfer rate:          14132.56 [Kbytes/sec] received
    

    -locationsindex=0

     0$ ab -k -c 1 -n 100000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-0.bin
     1
     2Concurrency Level:      1
     3Time taken for tests:   3.245 seconds
     4Complete requests:      100000
     5Failed requests:        0
     6Keep-Alive requests:    100000
     7Total transferred:      37700000 bytes
     8HTML transferred:       27300000 bytes
     9Requests per second:    30819.31 [#/sec] (mean)
    10Time per request:       0.032 [ms] (mean)
    11Time per request:       0.032 [ms] (mean, across all concurrent requests)
    12Transfer rate:          11346.56 [Kbytes/sec] received
    13
    14$ ab -k -c 1 -n 10000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-1000.bin
    15
    16Concurrency Level:      1
    17Time taken for tests:   4.588 seconds
    18Complete requests:      10000
    19Failed requests:        0
    20Keep-Alive requests:    10000
    21Total transferred:      4120000 bytes
    22HTML transferred:       3080000 bytes
    23Requests per second:    2179.68 [#/sec] (mean)
    24Time per request:       0.459 [ms] (mean)
    25Time per request:       0.459 [ms] (mean, across all concurrent requests)
    26Transfer rate:          876.98 [Kbytes/sec] received
    27
    28$ ab -k -c 1 -n 10000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-2000.bin
    29
    30Concurrency Level:      1
    31Time taken for tests:   9.388 seconds
    32Complete requests:      10000
    33Failed requests:        0
    34Keep-Alive requests:    10000
    35Total transferred:      4120000 bytes
    36HTML transferred:       3080000 bytes
    37Requests per second:    1065.20 [#/sec] (mean)
    38Time per request:       0.939 [ms] (mean)
    39Time per request:       0.939 [ms] (mean, across all concurrent requests)
    40Transfer rate:          428.58 [Kbytes/sec] received
    41
    42$ ab -k -c 1 -n 10000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-3000.bin
    43
    44Concurrency Level:      1
    45Time taken for tests:   12.819 seconds
    46Complete requests:      10000
    47Failed requests:        0
    48Keep-Alive requests:    10000
    49Total transferred:      4130000 bytes
    50HTML transferred:       3090000 bytes
    51Requests per second:    780.10 [#/sec] (mean)
    52Time per request:       1.282 [ms] (mean)
    53Time per request:       1.282 [ms] (mean, across all concurrent requests)
    54Transfer rate:          314.63 [Kbytes/sec] received
    55
    56$ ab -k -c 1 -n 10000 http://localhost:48332/rest/txfromblock/$BLOCKHASH-4000.bin
    57
    58Concurrency Level:      1
    59Time taken for tests:   17.962 seconds
    60Complete requests:      10000
    61Failed requests:        0
    62Keep-Alive requests:    10000
    63Total transferred:      4130000 bytes
    64HTML transferred:       3090000 bytes
    65Requests per second:    556.73 [#/sec] (mean)
    66Time per request:       1.796 [ms] (mean)
    67Time per request:       1.796 [ms] (mean, across all concurrent requests)
    68Transfer rate:          224.54 [Kbytes/sec] received
    
  24. maflcko commented at 12:05 pm on June 20, 2025: member
    lgtm. I wonder if there should be a fallback?
  25. romanz force-pushed on Jun 21, 2025
  26. romanz commented at 9:58 am on June 21, 2025: contributor
    Rebased over master to use std::vector<std::byte> (following #32743).
  27. romanz marked this as a draft on Jun 21, 2025
  28. DrahtBot added the label CI failed on Jun 21, 2025
  29. DrahtBot commented at 2:43 pm on June 21, 2025: contributor

    🚧 At least one of the CI tasks failed. Task MSan, depends: https://github.com/bitcoin/bitcoin/runs/44528503073 LLM reason (✨ experimental): The CI failure is caused by the abort in the locationsindex_tests subprocess.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  30. romanz force-pushed on Jun 21, 2025
  31. romanz force-pushed on Jun 21, 2025
  32. romanz marked this as ready for review on Jun 21, 2025
  33. DrahtBot removed the label CI failed on Jun 21, 2025
  34. romanz requested review from maflcko on Jun 21, 2025
  35. romanz commented at 7:32 am on June 22, 2025: contributor
    BTW, if I understand correctly, this index may also improve the performance of SendBlockTransactions during handling of GETBLOCKTXN requests.
  36. in src/node/blockstorage.cpp:1182 in df45388b97 outdated
    1177+    size_t tx_count = ReadCompactSize(filein);
    1178+    if (tx_index >= tx_count) {
    1179+        return false;
    1180+    }
    1181+
    1182+    SkipTransaction skip_tx;
    


    maflcko commented at 1:20 pm on June 23, 2025:

    thanks for implementing this idea. Though, at least locally on my system it looks like it didn’t improve the end-to-end performance for me, at least not in a significant benchmark.

    So it seems fine to use the “normal” deserialization and even go through a redundant ser-deser loop?

     0diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
     1index a049122894..64cb914cb1 100644
     2--- a/src/node/blockstorage.cpp
     3+++ b/src/node/blockstorage.cpp
     4@@ -1179,17 +1179,14 @@ bool BlockManager::ReadRawTxFromBlock(std::vector<std::byte>& tx_bytes, const Fl
     5         return false;
     6     }
     7 
     8-    SkipTransaction skip_tx;
     9+    CMutableTransaction skip_tx;
    10     for (size_t i = 0; i < tx_index; ++i) {
    11-        filein >> skip_tx;
    12+        filein >> TX_WITH_WITNESS(skip_tx);
    13     }
    14-    int64_t tx_start = filein.tell();
    15-    filein >> skip_tx;
    16-    int64_t tx_end = filein.tell();
    17+        filein >> TX_WITH_WITNESS(skip_tx);
    18 
    19-    tx_bytes.resize(tx_end - tx_start);
    20-    filein.seek(tx_start, SEEK_SET);
    21-    filein.read(tx_bytes);
    22+    tx_bytes.resize(0);
    23+    VectorWriter {*(std::vector<unsigned char> *)&tx_bytes,tx_bytes.size()}<<TX_WITH_WITNESS(skip_tx);
    24     return true;
    25 }
    26 
    

    maflcko commented at 3:16 pm on June 23, 2025:
    Wrapping the autofile into BufferedReader helped more on my system with reading later transactions in the block via this fallback branch.

    romanz commented at 8:02 pm on June 23, 2025:

    romanz commented at 5:11 pm on June 26, 2025:

    Benchmark results

    Tested 99b48777ee using ab -k -c 1 -n 10000 over block 900005 (having >5000 txs).

    0BLOCKHASH=000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec
    

    -locationsindex=1

     0Document Path:          /rest/txfromblock/$BLOCKHASH-0.bin
     1Document Length:        384 bytes
     2Requests per second:    7387.93 [#/sec] (mean)
     3Time per request:       0.135 [ms] (mean)
     4
     5Document Path:          /rest/txfromblock/$BLOCKHASH-1000.bin
     6Document Length:        234 bytes
     7Requests per second:    7875.35 [#/sec] (mean)
     8Time per request:       0.127 [ms] (mean)
     9
    10Document Path:          /rest/txfromblock/$BLOCKHASH-2000.bin
    11Document Length:        234 bytes
    12Requests per second:    7470.15 [#/sec] (mean)
    13Time per request:       0.134 [ms] (mean)
    14
    15Document Path:          /rest/txfromblock/$BLOCKHASH-3000.bin
    16Document Length:        220 bytes
    17Requests per second:    7102.34 [#/sec] (mean)
    18Time per request:       0.141 [ms] (mean)
    19
    20Document Path:          /rest/txfromblock/$BLOCKHASH-4000.bin
    21Document Length:        221 bytes
    22Requests per second:    7569.49 [#/sec] (mean)
    23Time per request:       0.132 [ms] (mean)
    24
    25Document Path:          /rest/txfromblock/$BLOCKHASH-5000.bin
    26Document Length:        234 bytes
    27Requests per second:    7333.99 [#/sec] (mean)
    28Time per request:       0.136 [ms] (mean)
    

    -locationsindex=0

     0Document Path:          /rest/txfromblock/$BLOCKHASH-0.bin
     1Document Length:        384 bytes
     2Requests per second:    4030.87 [#/sec] (mean)
     3Time per request:       0.248 [ms] (mean)
     4
     5Document Path:          /rest/txfromblock/$BLOCKHASH-1000.bin
     6Document Length:        234 bytes
     7Requests per second:    764.46 [#/sec] (mean)
     8Time per request:       1.308 [ms] (mean)
     9
    10Document Path:          /rest/txfromblock/$BLOCKHASH-2000.bin
    11Document Length:        234 bytes
    12Requests per second:    526.17 [#/sec] (mean)
    13Time per request:       1.901 [ms] (mean)
    14
    15Document Path:          /rest/txfromblock/$BLOCKHASH-3000.bin
    16Document Length:        220 bytes
    17Requests per second:    391.21 [#/sec] (mean)
    18Time per request:       2.556 [ms] (mean)
    19
    20Document Path:          /rest/txfromblock/$BLOCKHASH-4000.bin
    21Document Length:        221 bytes
    22Requests per second:    261.30 [#/sec] (mean)
    23Time per request:       3.827 [ms] (mean)
    24
    25Document Path:          /rest/txfromblock/$BLOCKHASH-5000.bin
    26Document Length:        234 bytes
    27Requests per second:    208.15 [#/sec] (mean)
    28Time per request:       4.804 [ms] (mean)
    
  37. maflcko approved
  38. maflcko commented at 1:20 pm on June 23, 2025: member
    lgtm
  39. maflcko commented at 1:22 pm on June 23, 2025: member

    BTW, if I understand correctly, this index may also improve the performance of SendBlockTransactions during handling of GETBLOCKTXN requests.

    This is only the fallback where the recent block fell out of memory. If this happens, the performance either shouldn’t matter, or it may be better to keep more recent blocks in memory than just a single one.

  40. romanz force-pushed on Jun 23, 2025
  41. romanz force-pushed on Jun 28, 2025
  42. romanz commented at 6:43 am on June 28, 2025: contributor
    Rebased over master and fixed rest_tx_from_block() argument name case (in addition to #32825).
  43. romanz force-pushed on Jun 30, 2025
  44. romanz commented at 1:13 pm on June 30, 2025: contributor
    Rebased over master after #32825 is merged (to resolve a small conflict).
  45. romanz commented at 6:07 pm on July 5, 2025: contributor
    Added a short description in doc/REST-interface.md and doc/files.md, and updated release notes in https://github.com/bitcoin/bitcoin/pull/32541/commits/c695d134683d52bce9e499a5848e4c4c7951155c. Please let me know if there are any additional open issues :)
  46. romanz commented at 6:54 pm on July 14, 2025: contributor
    Ping :)
  47. romanz force-pushed on Jul 14, 2025
  48. romanz force-pushed on Jul 19, 2025
  49. romanz commented at 5:18 pm on July 19, 2025: contributor

    I have split the content of this PR into 2 commits:

    • 784459ac79 is adding the new /rest/txfromblock/ endpoint
    • 1b928f58fc is adding an optional locationsindex for optimizing lookups
  50. in src/node/blockstorage.cpp:1083 in 784459ac79 outdated
    1079@@ -1080,6 +1080,32 @@ bool BlockManager::ReadRawBlock(std::vector<std::byte>& block, const FlatFilePos
    1080     return true;
    1081 }
    1082 
    1083+static const uint32_t HEADER_SIZE = ::GetSerializeSize(CBlockHeader{});
    


    hodlinator commented at 6:25 pm on August 2, 2025:
    Unused variable. Was this supposed to be used to skip forward in the stream instead of deserializing the header?

    romanz commented at 6:43 am on August 3, 2025:

    Thanks - removed.

    Was this supposed to be used to skip forward in the stream instead of deserializing the header?

    Yes, but now it’s not needed here.


    hodlinator commented at 7:17 am on August 4, 2025:
    Better to not add it in 784459ac79a89f591eb52a4c6c266c421ca8baec?

    romanz commented at 11:32 am on August 4, 2025:

    Definitely - should be fixed now:

     0$ git diff 784459ac79 104b40c8c9
     1diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
     2index c8d824bb6c..05ec73697c 100644
     3--- a/src/node/blockstorage.cpp
     4+++ b/src/node/blockstorage.cpp
     5@@ -1080,8 +1080,6 @@ bool BlockManager::ReadRawBlock(std::vector<std::byte>& block, const FlatFilePos
     6     return true;
     7 }
     8 
     9-static const uint32_t HEADER_SIZE = ::GetSerializeSize(CBlockHeader{});
    10-
    11 bool BlockManager::ReadTxFromBlock(CTransactionRef& tx, const FlatFilePos& block_pos, size_t tx_index) const
    12 {
    13     AutoFile file{OpenBlockFile(block_pos, /*fReadOnly=*/true)};
    
  51. in src/index/locationsindex.cpp:70 in 1b928f58fc outdated
    65+    fs::create_directories(path);
    66+
    67+    m_db = std::make_unique<BaseIndex::DB>(path / "db", n_cache_size, f_memory, f_wipe);
    68+}
    69+
    70+static const uint32_t HEADER_SIZE = ::GetSerializeSize(CBlockHeader{});
    


    hodlinator commented at 6:29 pm on August 2, 2025:
    Would prefer it be moved to function-local scope so we don’t add any startup time for those who never enable -locationsindex or use the REST interface.

    romanz commented at 6:43 am on August 3, 2025:
    Thanks - moved into LocationsIndex::CustomAppend().
  52. in doc/REST-interface.md:44 in 1b928f58fc outdated
    35@@ -36,6 +36,14 @@ Responds with 404 if the transaction doesn't exist.
    36 By default, this endpoint will only search the mempool.
    37 To query for a confirmed transaction, enable the transaction index via "txindex=1" command line / configuration option.
    38 
    39+`GET /rest/txfromblock/<BLOCK-HASH>-<TX-OFFSET>.<bin|hex|json>`
    40+
    41+Given a block hash and transaction offset within it: returns a transaction in binary, hex-encoded binary, or JSON formats.
    42+Responds with 404 if the transaction doesn't exist.
    43+
    44+By default, this endpoint will read also the leading transactions, before reading and returning the requested one.
    


    hodlinator commented at 6:33 pm on August 2, 2025:

    nit:

    • Improved order?
    • Clarify that wasted work is being done.
    0By default, this endpoint will also deserialize the leading transactions, before reading and returning the requested one.
    

    romanz commented at 6:43 am on August 3, 2025:
    Thanks - fixed.

    hodlinator commented at 7:25 am on August 4, 2025:
    Typo in added parenthesis: beggining

    romanz commented at 11:34 am on August 4, 2025:

    Oops, sorry -> should be fixed now:

     0$ git diff b2a22ce33d 4441827ef4
     1diff --git a/doc/REST-interface.md b/doc/REST-interface.md
     2index ffb3c09af1..8f8431c29a 100644
     3--- a/doc/REST-interface.md
     4+++ b/doc/REST-interface.md
     5@@ -42,7 +42,7 @@ Given a block hash and transaction offset within it: returns a transaction in bi
     6 Responds with 404 if the transaction doesn't exist.
     7 
     8 By default, this endpoint will also deserialize the leading transactions, before reading and returning the requested one
     9-(which results in wasted deserialization work if the transaction is not in the beggining of the block).
    10+(which results in wasted deserialization work if the transaction is not in the beginning of the block).
    11 To read the requested transaction directly, enable the transaction locations' index via "locationsindex=1" command line / configuration option.
    12 
    13 #### Blocks
    
  53. in src/index/locationsindex.cpp:5 in 1b928f58fc outdated
    0@@ -0,0 +1,143 @@
    1+// Copyright (c) 2018-2022 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#include <map>
    


    hodlinator commented at 6:37 pm on August 2, 2025:
    nit: Please order std library headers last. Seems unused though.

    romanz commented at 6:43 am on August 3, 2025:
    Removed.
  54. in src/rest.cpp:524 in 1b928f58fc outdated
    514+        }
    515+        if (index >= pindex->nTx) {
    516+            return RESTERR(req, HTTP_BAD_REQUEST, strprintf("Block %s has only %d transactions", uriParts[0], pindex->nTx));
    517+        }
    518+        block_pos = pindex->GetBlockPos();
    519+    }
    


    hodlinator commented at 6:50 pm on August 2, 2025:
    We can skip this work if LocationsIndex is enabled as it doesn’t use block_pos.

    romanz commented at 6:42 am on August 3, 2025:
    Good catch - fixed.
  55. in src/rest.cpp:536 in 1b928f58fc outdated
    531+        }
    532+        success = locations_index->ReadRawTransaction(*block_hash, index, tx_bytes);
    533+    }
    534+
    535+    if (!success) {
    536+        return RESTERR(req, HTTP_NOT_FOUND, strprintf("Failed to read transaction #%d from block %s", index, (*block_hash).ToString()));
    


    hodlinator commented at 6:52 pm on August 2, 2025:

    nit:

    0        return RESTERR(req, HTTP_NOT_FOUND, strprintf("Failed to read transaction #%d from block %s", index, block_hash->ToString()));
    

    romanz commented at 6:42 am on August 3, 2025:
    Thanks - fixed.
  56. in src/index/locationsindex.cpp:28 in 1b928f58fc outdated
    23+
    24+namespace {
    25+
    26+struct DBKey {
    27+    uint256 hash;
    28+    uint32_t part; // allow splitting one block's transactions into multiple DB rows
    


    hodlinator commented at 7:37 pm on August 2, 2025:
    nit: Might be less confusing to use “row” everywhere here instead of “part”?

    romanz commented at 6:42 am on August 3, 2025:
    Thanks - renamed to row.
  57. in src/index/locationsindex.cpp:107 in 1b928f58fc outdated
    102+}
    103+
    104+bool LocationsIndex::ReadRawTransaction(const uint256& block_hash, size_t i, std::vector<std::byte>& out) const
    105+{
    106+    uint32_t part = i / TXS_PER_ROW; // used to find the correct DB row
    107+    i = i % TXS_PER_ROW;             // index within a single DB row
    


    hodlinator commented at 7:39 pm on August 2, 2025:

    nit: Could use local variable with descriptive name rather than mutating in-param:

    0    const auto column = i % TXS_PER_ROW;             // index within a single DB row
    

    romanz commented at 6:42 am on August 3, 2025:
    Thanks - fixed.
  58. in src/index/locationsindex.cpp:118 in 1b928f58fc outdated
    113+
    114+    if (value.offsets.empty()) {
    115+        LogError("%s: LocationsIndex entry for %s:%u must have >1 offsets\n", __func__, block_hash.ToString(), part);
    116+        return false;
    117+    }
    118+    size_t nTx = value.offsets.size() - 1;
    


    hodlinator commented at 7:42 pm on August 2, 2025:
    nit: Should use snake_case naming convention as per developer-notes.md, n_tx/num_tx/tx_count?

    romanz commented at 6:42 am on August 3, 2025:
    Thanks - renamed to tx_count.
  59. in src/index/locationsindex.cpp:114 in 1b928f58fc outdated
    109+    DBValue value{};
    110+    if (!m_db->Read(DBKey{block_hash, part}, value)) {
    111+        return false;
    112+    }
    113+
    114+    if (value.offsets.empty()) {
    


    hodlinator commented at 7:48 pm on August 2, 2025:

    Off-by-one - should match error message and later code:

    0    if (value.offsets.size() < 2) {
    

    romanz commented at 6:42 am on August 3, 2025:
    Good catch, thanks!
  60. in src/index/locationsindex.h:10 in 1b928f58fc outdated
     5+#ifndef BITCOIN_INDEX_LOCATIONSINDEX_H
     6+#define BITCOIN_INDEX_LOCATIONSINDEX_H
     7+
     8+#include <attributes.h>
     9+#include <chain.h>
    10+#include <flatfile.h>
    


    hodlinator commented at 7:51 pm on August 2, 2025:
    nit: Unused header.

    romanz commented at 6:42 am on August 3, 2025:
    Removed.
  61. in src/test/locationsindex_tests.cpp:70 in 1b928f58fc outdated
    65+        {
    66+            LOCK(cs_main);
    67+            block_index = m_node.chainman->m_blockman.LookupBlockIndex(block_hash);
    68+            block_pos = block_index->GetBlockPos();
    69+        }
    70+        BOOST_REQUIRE(block_index);
    


    hodlinator commented at 8:42 pm on August 2, 2025:

    Do we need to fetch a second CBlockIndex? This passes the test:

    0        FlatFilePos block_pos{WITH_LOCK(cs_main, return tip->GetBlockPos())};
    

    romanz commented at 6:34 am on August 3, 2025:
    You’re right, thanks - removed the unneeded code and renamed tip to block_index.
  62. hodlinator commented at 9:19 pm on August 2, 2025: contributor

    Concept ACK 1b928f58fc78f4727cef988902075abc05c372b2

    Cool way of compressing transaction metadata to significantly reduce on-disk footprint compared to TxIndex and make it more likely this one fits in RAM. (Fine as long as external code can fetch by block hash + tx index instead of by txid).

    Optimization as I understand it: All transactions within the same block have the same FlatFilePos base, which is compressed by a factor of 128 thanks to TXS_PER_ROW. Instead of storing an individual tx offset and size for each tx, it uses 2 offsets for a tx request, which allows re-use of offsets between adjacent txs.

    Thanks for electrs!

  63. romanz force-pushed on Aug 3, 2025
  64. romanz commented at 7:50 am on August 3, 2025: contributor
    Many thanks for the review! Applied the comments: 1b928f5 -> b2a22ce
  65. romanz requested review from hodlinator on Aug 3, 2025
  66. romanz requested review from maflcko on Aug 3, 2025
  67. romanz force-pushed on Aug 4, 2025
  68. romanz commented at 12:48 pm on August 4, 2025: contributor
    Applied a few more fixes: b2a22ce -> 4441827
  69. romanz commented at 7:20 pm on August 25, 2025: contributor
    Ping :)
  70. in src/index/locationsindex.cpp:6 in 4441827ef4 outdated
    0@@ -0,0 +1,141 @@
    1+// Copyright (c) 2018-2022 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#include <blockencodings.h>
    6+#include <clientversion.h>
    


    hodlinator commented at 12:47 pm on August 26, 2025:
    nit: Unused: clientversion.h

  71. in src/index/locationsindex.cpp:76 in 4441827ef4 outdated
    71+
    72+    assert(block.data);
    73+    assert(block.file_number >= 0);
    74+
    75+    const uint32_t nTx = block.data->vtx.size();
    76+    uint32_t nTxOffset = HEADER_SIZE + GetSizeOfCompactSize(nTx);
    


    hodlinator commented at 12:54 pm on August 26, 2025:

    nit: snake_case and narrower types:

    0    Assume(block.data->vtx.size() <= std::numeric_limits<uint32_t>::max());
    1    const uint32_t tx_count{static_cast<uint32_t>(block.data->vtx.size())};
    2    uint32_t tx_offset{HEADER_SIZE + GetSizeOfCompactSize(tx_count)};
    

  72. in src/index/locationsindex.cpp:105 in 4441827ef4 outdated
    100+}
    101+
    102+bool LocationsIndex::ReadRawTransaction(const uint256& block_hash, size_t i, std::vector<std::byte>& out) const
    103+{
    104+    uint32_t row = i / TXS_PER_ROW;      // used to find the correct DB row
    105+    const auto column = i % TXS_PER_ROW; // index within a single DB row
    


    hodlinator commented at 1:05 pm on August 26, 2025:

    nit: Could make row const as well and nail down the types:

    0bool LocationsIndex::ReadRawTransaction(const uint256& block_hash, uint32_t i, std::vector<std::byte>& out) const
    1{
    2    const uint32_t row{i / TXS_PER_ROW};      // used to find the correct DB row
    3    const uint32_t column{i % TXS_PER_ROW}; // index within a single DB row
    

  73. in src/index/locationsindex.cpp:132 in 4441827ef4 outdated
    127+    if (file.IsNull()) {
    128+        LogError("%s: OpenBlockFile failed\n", __func__);
    129+        return false;
    130+    }
    131+
    132+    out.resize(tx_size); // Zeroing of memory is intentional here
    


    hodlinator commented at 1:11 pm on August 26, 2025:

    Don’t see a good way around zeroing the memory, but comment makes it sound like a good thing? I guess it’s slightly better than returning a vector containing garbage upon failure, but even better would be to clear it upon failure?

    nit: Could change:

    0-    bool ReadRawTransaction(const uint256& block_hash, size_t i, std::vector<std::byte>& out) const;
    1+    std::vector<std::byte> ReadRawTransaction(const uint256& block_hash, uint32_t i) const;
    

    And only return non-empty vector on success instead of having a separate bool? Somewhat unorthodox compared to other Index-implementations but more modern in that it exploits the fact that vectors are movable.


  74. in src/rest.cpp:560 in 4441827ef4 outdated
    555+            serialized << TX_WITH_WITNESS(tx);
    556+            strHex = HexStr(serialized);
    557+        } else {
    558+            strHex = HexStr(tx_bytes);
    559+        }
    560+        strHex.append("\n");
    


    hodlinator commented at 5:34 pm on August 26, 2025:

    nit: Might as well use the fact that it’s only 1 char:

    0        strHex.push_back('\n');
    

  75. in src/rest.cpp:571 in 4441827ef4 outdated
    566+        if (!tx) {
    567+            DataStream{tx_bytes} >> TX_WITH_WITNESS(tx);
    568+        }
    569+        UniValue objTx(UniValue::VOBJ);
    570+        TxToUniv(*tx, /*block_hash=*/{}, /*entry=*/objTx);
    571+        std::string strJSON = objTx.write() + "\n";
    


    hodlinator commented at 5:35 pm on August 26, 2025:

    nit:

    0        std::string strJSON = objTx.write() + '\n';
    

  76. in src/rest.cpp:513 in 4441827ef4 outdated
    508+    }
    509+
    510+    FlatFilePos block_pos{};
    511+    if (!locations_index) {
    512+        LOCK(cs_main);
    513+        const CBlockIndex* pindex{chainman.m_blockman.LookupBlockIndex(*block_hash)};
    


    hodlinator commented at 5:42 pm on August 26, 2025:

    nit: Since we already have index and locations_index in scope here, could call this block, or block_index?

    0        const CBlockIndex* block{chainman.m_blockman.LookupBlockIndex(*block_hash)};
    

  77. in src/rest.cpp:485 in 4441827ef4 outdated
    480+
    481+    std::string param;
    482+    const RESTResponseFormat rf = ParseDataFormat(param, uri_part);
    483+
    484+    // request is sent over URI scheme /rest/txfromblock/blockhash-index
    485+    std::vector<std::string_view> uriParts{util::Split<std::string_view>(param, '-')};
    


    hodlinator commented at 5:45 pm on August 26, 2025:

    nit: Should use snake_case as per developer-notes.md:

    • uri_parts
    • str_hex
    • obj_tx
    • str_JSON

  78. in src/index/locationsindex.h:15 in 4441827ef4 outdated
    10+#include <index/base.h>
    11+
    12+static constexpr bool DEFAULT_LOCATIONSINDEX{false};
    13+
    14+/**
    15+ * LocationsIndex is used to store and retrieve transactions' location within a block.
    


    hodlinator commented at 5:51 pm on August 26, 2025:

    2 nits:

    1. Could reword and elaborate
      0 * LocationsIndex is used to store and retrieve the location of transactions within a block.
      1 *
      2 * This is done in a compressed manner, allowing the whole index to fit in RAM on decent hardware.
      3 * For example, the on-disk size was <X>GiB at block height <Y>.
      
    2. Could clarify PR description unless I’m mistaken.
      0-allowing it to be cached
      1+allowing it to be cached in RAM
      

  79. in src/CMakeLists.txt:193 in 4441827ef4 outdated
    237@@ -238,6 +238,7 @@ add_library(bitcoin_node STATIC EXCLUDE_FROM_ALL
    238   index/base.cpp
    


    hodlinator commented at 6:19 pm on August 26, 2025:

    (Inline comment in random location to keep main thread cleaner).

    nit: At #32541 (comment) you write:

    You are right that during the history indexing process, the client doesn’t need the proposed index, since it needs to read both the entire block (and undo) data in order to create a map between a transaction’s location and its ScriptPubKeys.

    IIUC, saying something closer to “map from ScriptPubKey to list of { block hash + tx index within the block }” would be more precise? If so, might be good to edit for future reviewers.


    romanz commented at 7:06 am on August 31, 2025:
    Thanks - updated the comment.
  80. in src/CMakeLists.txt:194 in 4441827ef4 outdated
    237@@ -238,6 +238,7 @@ add_library(bitcoin_node STATIC EXCLUDE_FROM_ALL
    238   index/base.cpp
    239   index/blockfilterindex.cpp
    


    hodlinator commented at 6:52 pm on August 26, 2025:

    (Inline comment in random location to keep main thread cleaner).

    nit: In the benchmark comment #32541 (comment)

    Maybe you could edit it to add something like:


    Conclusions: The new LocationsIndex is only a few percent faster than the old TxIndex, but the on-disk footprint is ~22x smaller.

    getrawtransaction which is used in the last benchmark has an average RPC duration of ~8.6ms vs ~0.14ms for the ones above.


    The reason I push for this is that I had to do a double-take even when coming back to this PR. I think it would assist reviewers. Hopefully this would not be grinding down the dinner into mushy baby food, making the reviews less thorough and leaving jaws underdeveloped.


    romanz commented at 6:04 pm on August 30, 2025:
    Sounds good - done.
  81. in src/init.cpp:1784 in 4441827ef4 outdated
    1779@@ -1777,6 +1780,11 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
    1780         node.indexes.emplace_back(g_coin_stats_index.get());
    1781     }
    1782 
    1783+    if (args.GetBoolArg("-locationsindex", DEFAULT_LOCATIONSINDEX)) {
    1784+        g_locations_index = std::make_unique<LocationsIndex>(interfaces::MakeChain(node), /*cache_size=*/0, false, do_reindex);
    


    hodlinator commented at 7:02 pm on August 26, 2025:
    Have you experimented with different cache sizes?

    hodlinator commented at 7:07 pm on August 26, 2025:

    nit: Better to name bool parameter:

    0        g_locations_index = std::make_unique<LocationsIndex>(interfaces::MakeChain(node), /*cache_size=*/0, /*f_memory=*/false, do_reindex);
    

    (These comments are validated by clang-tidy btw).

    Even though BaseIndex::DB::DB() names it f_memory, I think it would be clearer to use the term in DBParams:

    0        g_locations_index = std::make_unique<LocationsIndex>(interfaces::MakeChain(node), /*cache_size=*/0, /*memory_only=*/false, do_reindex);
    

    (If you do this, please rename it in LocationsIndex.h/cpp).


    romanz commented at 6:05 pm on August 30, 2025:
    Not yet - will do and report the results here.


    romanz commented at 5:22 am on September 4, 2025:

    It seems that OS block cache is quite effective (even with /*cache_size=*/0), so I am getting similar performance when using 3GiB DB cache compared to #32541 (comment).

    Tested with the patch below:

     0$ ab -k -c 1 -n 100000 http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
     1...
     2Document Path:          /rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
     3Document Length:        234 bytes
     4
     5Concurrency Level:      1
     6Time taken for tests:   13.497 seconds
     7Complete requests:      100000
     8Failed requests:        0
     9Keep-Alive requests:    100000
    10Total transferred:      33800000 bytes
    11HTML transferred:       23400000 bytes
    12Requests per second:    7409.06 [#/sec] (mean)
    13Time per request:       0.135 [ms] (mean)
    14Time per request:       0.135 [ms] (mean, across all concurrent requests)
    15Transfer rate:          2445.57 [Kbytes/sec] received
    
     0diff --git a/src/init.cpp b/src/init.cpp
     1index 97ca0d1b18..3bab318790 100644
     2--- a/src/init.cpp
     3+++ b/src/init.cpp
     4@@ -1822,7 +1822,7 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
     5     }
     6 
     7     if (args.GetBoolArg("-locationsindex", DEFAULT_LOCATIONSINDEX)) {
     8-        g_locations_index = std::make_unique<LocationsIndex>(interfaces::MakeChain(node), /*cache_size=*/0, /*memory_only=*/false, do_reindex);
     9+        g_locations_index = std::make_unique<LocationsIndex>(interfaces::MakeChain(node), /*cache_size=*/3000_MiB, /*memory_only=*/false, do_reindex);
    10         node.indexes.emplace_back(g_locations_index.get());
    11     }
    

    hodlinator commented at 10:59 am on September 5, 2025:

    Wasn’t getting consistent results until I ran doas pyperf system tune and got the variance down from ±1.0ms to ±0.1ms.

    Used curl + hyperfine for clearer numbers on variance:

    0₿ hyperfine "curl -v http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin --output /dev/null" --warmup=1000 --min-runs=5000 --shell=none
    

    With 0 cache:

    0Benchmark 1: curl -v http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin --output /dev/null
    1  Time (mean ± σ):       4.5 ms ±   0.1 ms    [User: 1.9 ms, System: 2.0 ms]
    2  Range (min … max):     4.1 ms …   5.5 ms    5000 runs
    

    With 3'000_MIB cache:

    0Benchmark 1: curl -v http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin --output /dev/null
    1  Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.9 ms, System: 2.1 ms]
    2  Range (min … max):     4.1 ms …   5.9 ms    10000 runs
    3 
    4  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
    

    Confirms that cache doesn’t make any material difference (Edit: at least not on NVMe).

  82. in src/rest.cpp:503 in 4441827ef4 outdated
    498+    }
    499+    const size_t index = *parsed_index;
    500+
    501+    ChainstateManager* maybe_chainman = GetChainman(context, req);
    502+    if (!maybe_chainman) return false;
    503+    ChainstateManager& chainman = *maybe_chainman;
    


    hodlinator commented at 7:17 pm on August 26, 2025:
    nit: chainman is only used in the !locations_index scenario, so could be skipped unless you want to always require one.

  83. in src/index/locationsindex.cpp:65 in 4441827ef4 outdated
    60+    : BaseIndex(std::move(chain), "locationsindex")
    61+{
    62+    fs::path path = gArgs.GetDataDirNet() / "indexes" / "locations";
    63+    fs::create_directories(path);
    64+
    65+    m_db = std::make_unique<BaseIndex::DB>(path / "db", n_cache_size, f_memory, f_wipe);
    


    hodlinator commented at 8:13 pm on August 26, 2025:

    Not sure why explicit create_directories() is needed, this seems to work:

    0    , m_db{std::make_unique<BaseIndex::DB>(gArgs.GetDataDirNet() / "indexes" / "locations" / "db",
    1                                           n_cache_size, f_memory, f_wipe)}
    2{
    

  84. in test/functional/interface_rest.py:53 in 4441827ef4 outdated
    52@@ -53,7 +53,7 @@ def filter_output_indices_by_value(vouts, value):
    53 class RESTTest (BitcoinTestFramework):
    


    hodlinator commented at 8:27 pm on August 26, 2025:

    (Inline comment in random location to keep main thread cleaner).

    Played around with adding coverage in feature_init.py. However, I encountered some weird interaction between txindex and locationsindex. Not sure how incidental it is (going above a certain number of files) or whether some kind of interference is happening. See comment “If we add this, then we fail later while perturbing txindex”:

     0diff --git a/test/functional/feature_init.py b/test/functional/feature_init.py
     1index 15b3e8595f..47e29655ce 100755
     2--- a/test/functional/feature_init.py
     3+++ b/test/functional/feature_init.py
     4@@ -49,7 +49,7 @@ class InitTest(BitcoinTestFramework):
     5 
     6         def start_expecting_error(err_fragment):
     7             node.assert_start_raises_init_error(
     8-                extra_args=['-txindex=1', '-blockfilterindex=1', '-coinstatsindex=1', '-checkblocks=200', '-checklevel=4'],
     9+                extra_args=['-txindex=1', '-blockfilterindex=1', '-coinstatsindex=1', '-locationsindex=1', '-checkblocks=200', '-checklevel=4'],
    10                 expected_msg=err_fragment,
    11                 match=ErrorMatch.PARTIAL_REGEX,
    12             )
    13@@ -77,6 +77,7 @@ class InitTest(BitcoinTestFramework):
    14             b'txindex thread start',
    15             b'block filter index thread start',
    16             b'coinstatsindex thread start',
    17+            b'locationsindex thread start',
    18             b'msghand thread start',
    19             b'net thread start',
    20             b'addcon thread start',
    21@@ -84,7 +85,7 @@ class InitTest(BitcoinTestFramework):
    22         if self.is_wallet_compiled():
    23             lines_to_terminate_after.append(b'Verifying wallet')
    24 
    25-        args = ['-txindex=1', '-blockfilterindex=1', '-coinstatsindex=1']
    26+        args = ['-txindex=1', '-blockfilterindex=1', '-coinstatsindex=1', '-locationsindex=1']
    27         for terminate_line in lines_to_terminate_after:
    28             self.log.info(f"Starting node and will terminate after line {terminate_line}")
    29             with node.busy_wait_for_debug_log([terminate_line]):
    30@@ -101,12 +102,13 @@ class InitTest(BitcoinTestFramework):
    31         self.stop_node(0)
    32 
    33         self.log.info("Test startup errors after removing certain essential files")
    34-
    35         files_to_delete = {
    36             'blocks/index/*.ldb': 'Error opening block database.',
    37             'chainstate/*.ldb': 'Error opening coins database.',
    38             'blocks/blk*.dat': 'Error loading block database.',
    39             'indexes/txindex/MANIFEST*': 'LevelDB error: Corruption: CURRENT points to a non-existent file',
    40+            # If we add this, then we fail later while perturbing txindex!?:
    41+            #'indexes/locations/db/MANIFEST*': 'LevelDB error: Corruption: CURRENT points to a non-existent file',
    42             # Removing these files does not result in a startup error:
    43             # 'indexes/blockfilter/basic/*.dat', 'indexes/blockfilter/basic/db/*.*', 'indexes/coinstats/db/*.*',
    44             # 'indexes/txindex/*.log', 'indexes/txindex/CURRENT', 'indexes/txindex/LOCK'
    45@@ -120,6 +122,8 @@ class InitTest(BitcoinTestFramework):
    46             'indexes/coinstats/db/*.*': 'LevelDB error: Corruption',
    47             'indexes/txindex/*.log': 'LevelDB error: Corruption',
    48             'indexes/txindex/CURRENT': 'LevelDB error: Corruption',
    49+            'indexes/locations/db/*.log': 'LevelDB error: Corruption',
    50+            'indexes/locations/db/CURRENT': 'LevelDB error: Corruption',
    51             # Perturbing these files does not result in a startup error:
    52             # 'indexes/blockfilter/basic/*.dat', 'indexes/txindex/MANIFEST*', 'indexes/txindex/LOCK'
    53         }
    

    romanz commented at 6:03 pm on August 30, 2025:
    Good catch, will investigate.

    romanz commented at 6:50 am on August 31, 2025:


    hodlinator commented at 7:33 am on September 3, 2025:

    94389c28e1068ffcc116614d16ac3047eb3068e3 seems to be passing (after a rebase over master) 🤔

    Tried rebasing my suggestions-branch, and 528f79f010d1617163e4154753b85e04539af993 / #32835 (which I also reviewed :grimacing:) fixed the issue by waiting for indexes to sync up before restarting the node.

    would it be OK to squash 94389c28e1068ffcc116614d16ac3047eb3068e3 into bdd90ab25f5f3b05c03a8dfb7177399264bed377?

    Fine by me! Could add me as Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com> if you want.


    romanz commented at 6:54 pm on September 3, 2025:
  85. in src/node/blockstorage.h:420 in 4441827ef4 outdated
    416@@ -417,6 +417,8 @@ class BlockManager
    417 
    418     bool ReadBlockUndo(CBlockUndo& blockundo, const CBlockIndex& index) const;
    419 
    420+    bool ReadTxFromBlock(CTransactionRef& tx, const FlatFilePos& block_pos, size_t tx_index) const;
    


    hodlinator commented at 8:42 pm on August 26, 2025:

    nit: Could use more modern style instead of returning duplicate bool state (also nail down index type):

    0    CTransactionRef ReadTxFromBlock(const FlatFilePos& block_pos, uint32_t tx_index) const;
    

  86. in src/rest.cpp:484 in 4441827ef4 outdated
    479+    if (!CheckWarmup(req)) return false;
    480+
    481+    std::string param;
    482+    const RESTResponseFormat rf = ParseDataFormat(param, uri_part);
    483+
    484+    // request is sent over URI scheme /rest/txfromblock/blockhash-index
    


    hodlinator commented at 7:03 am on August 27, 2025:

    Have you considered the possibility of supporting batches of block-hash+tx-index in one GET?

    rest_getutxos() does something similar: https://github.com/bitcoin/bitcoin/blob/4441827ef4e9b5fe306c5f0a81a52b5d2b5e0b69/src/rest.cpp#L1007

    No sure if it’s best practices in the REST world though. Another option could be using a JSON blob in a HTTP header, but that may be even more frowned upon.

    cc @stickies-v who might have a feel for this kind of thing (came across his f959fc0397c3f3615e99bc28d2df549d9d52f277).


    romanz commented at 6:44 pm on September 3, 2025:
    Yes - but if it’s OK, I would prefer to do it in a separate PR.
  87. hodlinator commented at 7:34 am on August 27, 2025: contributor

    Reviewed 4441827ef4e9b5fe306c5f0a81a52b5d2b5e0b69

    Sorry for the bag of nits, take from it what you will. The questions have higher priority for sure. Tried adding coverage of LocationsIndex to test/functional/feature_init.py but ran into interference with TxIndex, could be some more general issue though. Haven’t had time to uncover the reason yet.

    Suggestions+test branch: https://github.com/bitcoin/bitcoin/compare/4441827ef4e9b5fe306c5f0a81a52b5d2b5e0b69...hodlinator:bitcoin:pr/32541_suggestions

    Further research idea: It would be awesome to use sendfile(socket, filein, offset, count)/TransmitFile() for this to move that part to 100% kernel-space but not sure how we would cut through the abstractions for that: https://www.man7.org/linux/man-pages/man2/sendfile.2.html Edit regarding sendfile: Oof, this is somewhat impossible unless one turns off XOR obfuscation. libevent had native support for sendfile via evbuffer_add_file and I did a quick & dirty implementation for /rest/block/0000000000000000000515e202c8ae73c8155fc472422d7593af87aa74f2cf3d.bin (fetching the taproot wizards 3.96MB block). Measured a 27% speedup (avg. 89.3ms vs 122.6ms for 1000 runs each).

  88. romanz force-pushed on Aug 30, 2025
  89. romanz commented at 6:02 pm on August 30, 2025: contributor
    Many thanks for the review and the suggested fixes! Squashed most the fixes into the above commits and force-pushed: https://github.com/bitcoin/bitcoin/compare/4441827ef4e9b5fe306c5f0a81a52b5d2b5e0b69...5e1c80a22f3d3c6ecf6d4e2777d67656423dd3f3
  90. DrahtBot added the label CI failed on Aug 30, 2025
  91. DrahtBot commented at 7:08 pm on August 30, 2025: contributor

    🚧 At least one of the CI tasks failed. Task TSan, depends, no gui: https://github.com/bitcoin/bitcoin/runs/49248056402 LLM reason (✨ experimental): ThreadSanitizer detected a data race (width in ios:501:18) causing the mptest to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  92. romanz commented at 5:36 am on August 31, 2025: contributor
    Rebasing to resolve a conflict with master.
  93. romanz force-pushed on Aug 31, 2025
  94. DrahtBot removed the label CI failed on Aug 31, 2025
  95. romanz requested review from hodlinator on Sep 2, 2025
  96. in test/functional/interface_rest.py:494 in 94389c28e1 outdated
    485+                assert tx_bytes.hex() == tx["hex"]
    486+                tx_hex = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.HEX, ret_type=RetType.BYTES).decode().strip()
    487+                assert tx_hex == tx["hex"]
    488+                tx_json = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.JSON, ret_type=RetType.JSON)
    489+                assert tx_json["txid"] == tx["txid"]
    490+                assert tx_json["hex"] == tx["hex"]
    


    hodlinator commented at 8:14 am on September 3, 2025:

    Could cover both paths:

     0diff --git a/test/functional/interface_rest.py b/test/functional/interface_rest.py
     1index 541dde3174..333b8c3f79 100755
     2--- a/test/functional/interface_rest.py
     3+++ b/test/functional/interface_rest.py
     4@@ -28,7 +28,7 @@ from test_framework.wallet import (
     5     MiniWallet,
     6     getnewdestination,
     7 )
     8-from typing import Optional
     9+from typing import Optional, NamedTuple
    10 
    11 
    12 INVALID_PARAM = "abc"
    13@@ -53,7 +53,7 @@ def filter_output_indices_by_value(vouts, value):
    14 class RESTTest (BitcoinTestFramework):
    15     def set_test_params(self):
    16         self.num_nodes = 2
    17-        self.extra_args = [["-rest", "-blockfilterindex=1", "-locationsindex=1"], []]
    18+        self.extra_args = [["-rest", "-blockfilterindex=1", "-locationsindex=1"], ["-rest"]]
    19         # whitelist peers to speed up tx relay / mempool sync
    20         self.noban_tx_relay = True
    21         self.supports_cli = False
    22@@ -67,14 +67,17 @@ class RESTTest (BitcoinTestFramework):
    23             status: int = 200,
    24             ret_type: RetType = RetType.JSON,
    25             query_params: Optional[dict[str, typing.Any]] = None,
    26+            url: Optional[NamedTuple] = None,
    27             ) -> typing.Union[http.client.HTTPResponse, bytes, str, None]:
    28         rest_uri = '/rest' + uri
    29         if req_type in ReqType:
    30             rest_uri += f'.{req_type.name.lower()}'
    31         if query_params:
    32             rest_uri += f'?{urllib.parse.urlencode(query_params)}'
    33+        if not url:
    34+            url = self.url
    35 
    36-        conn = http.client.HTTPConnection(self.url.hostname, self.url.port)
    37+        conn = http.client.HTTPConnection(url.hostname, url.port)
    38         self.log.debug(f'{http_method} {rest_uri} {body}')
    39         if http_method == 'GET':
    40             conn.request('GET', rest_uri)
    41@@ -474,6 +477,7 @@ class RESTTest (BitcoinTestFramework):
    42         block_count = self.nodes[0].getblockcount()
    43         expected_index_info = {'synced': True, 'best_block_height': block_count}
    44         self.wait_until(lambda: self.nodes[0].getindexinfo()['locationsindex'] == expected_index_info)
    45+        other_url = urllib.parse.urlparse(self.nodes[1].url)
    46         for height in range(0, block_count + 1):
    47             blockhash = self.nodes[0].getblockhash(height)
    48             block = self.nodes[0].getblock(blockhash, 2)
    49@@ -489,5 +493,14 @@ class RESTTest (BitcoinTestFramework):
    50                 assert tx_json["txid"] == tx["txid"]
    51                 assert tx_json["hex"] == tx["hex"]
    52 
    53+                tx_bytes_noindex = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.BIN, ret_type=RetType.BYTES, url=other_url)
    54+                assert tx_bytes.hex() == tx["hex"]
    55+                tx_hex_noindex = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.HEX, ret_type=RetType.BYTES, url=other_url).decode().strip()
    56+                assert tx_hex == tx["hex"]
    57+                tx_json_noindex = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.JSON, ret_type=RetType.JSON, url=other_url)
    58+                assert tx_bytes_noindex == tx_bytes
    59+                assert tx_hex_noindex   == tx_hex
    60+                assert tx_json_noindex  == tx_json
    61+
    62 if __name__ == '__main__':
    63     RESTTest(__file__).main()
    

    romanz commented at 6:53 pm on September 3, 2025:
    Force-pushed 94389c28e1...91408755df, with a similar change to check both enabled and disabled index.
  97. hodlinator commented at 9:18 am on September 3, 2025: contributor

    Reviewed 94389c28e1068ffcc116614d16ac3047eb3068e3

    Measured a ~11% speed increase by enabling LocationsIndex, running on NVMe. Are you seeing greater improvements on other setups?

    -locationsindex=0

    0₿ hey -n 100000 http://localhost:8332/rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
    1
    2Summary:
    3  Total:	2.3504 secs
    4  Slowest:	0.0193 secs
    5  Fastest:	0.0001 secs
    6  Average:	0.0012 secs
    7  Requests/sec:	42545.2261
    

    -locationsindex=1

    0₿ hey -n 100000 http://localhost:8332/rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
    1
    2Summary:
    3  Total:  2.0951 secs
    4  Slowest:  0.0417 secs
    5  Fastest:  0.0001 secs
    6  Average:  0.0010 secs
    7  Requests/sec: 47730.5857
    
  98. romanz commented at 6:32 pm on September 3, 2025: contributor

    Are you seeing greater improvements on other setups?

    I think that the performance gain will increase when fetching transactions near the end of the block.

    For example, when fetching the 5000th transaction of [block #90005](https://mempool.space/block/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec) using ab -k -c 1 -n 10000012, enabled locationsindex improves the performance ~19x (2.574ms → 0.136ms).

    -locationsindex=1

     0Document Path:          /rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
     1Document Length:        234 bytes
     2
     3Concurrency Level:      1
     4Time taken for tests:   13.598 seconds
     5Complete requests:      100000
     6Failed requests:        0
     7Keep-Alive requests:    100000
     8Total transferred:      33800000 bytes
     9HTML transferred:       23400000 bytes
    10Requests per second:    7353.85 [#/sec] (mean)
    11Time per request:       0.136 [ms] (mean)
    12Time per request:       0.136 [ms] (mean, across all concurrent requests)
    13Transfer rate:          2427.34 [Kbytes/sec] received
    

    -locationsindex=0

     0Document Path:          /rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
     1Document Length:        234 bytes
     2
     3Concurrency Level:      1
     4Time taken for tests:   257.400 seconds
     5Complete requests:      100000
     6Failed requests:        0
     7Keep-Alive requests:    100000
     8Total transferred:      33800000 bytes
     9HTML transferred:       23400000 bytes
    10Requests per second:    388.50 [#/sec] (mean)
    11Time per request:       2.574 [ms] (mean)
    12Time per request:       2.574 [ms] (mean, across all concurrent requests)
    13Transfer rate:          128.24 [Kbytes/sec] received
    

    1. ApacheBench 2.3 ↩︎

    2. testing with warm OS block cache, i.e. measuring mostly HTTP & (de-)serialization ↩︎

  99. romanz commented at 6:44 pm on September 3, 2025: contributor

    Ran the same test using hey 0.1.4 - enabling locationsindex improves the performance ~15.8x (2.963ms → 0.187ms)

    -locationsindex=1

     0 hey -c 1 -n 10000 http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
     1
     2Summary:
     3  Total:	1.8766 secs
     4  Slowest:	0.0686 secs
     5  Fastest:	0.0001 secs
     6  Average:	0.0002 secs
     7  Requests/sec:	5328.8354
     8  
     9  Total data:	2340000 bytes
    10  Size/request:	234 bytes
    11
    12Response time histogram:
    13  0.000 [1]	|
    14  0.007 [9998]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    15  0.014 [0]	|
    16  0.021 [0]	|
    17  0.027 [0]	|
    18  0.034 [0]	|
    19  0.041 [0]	|
    20  0.048 [0]	|
    21  0.055 [0]	|
    22  0.062 [0]	|
    23  0.069 [1]	|
    24
    25
    26Latency distribution:
    27  10% in 0.0001 secs
    28  25% in 0.0002 secs
    29  50% in 0.0002 secs
    30  75% in 0.0002 secs
    31  90% in 0.0002 secs
    32  95% in 0.0002 secs
    33  99% in 0.0003 secs
    34
    35Details (average, fastest, slowest):
    36  DNS+dialup:	0.0000 secs, 0.0001 secs, 0.0686 secs
    37  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0003 secs
    38  req write:	0.0000 secs, 0.0000 secs, 0.0004 secs
    39  resp wait:	0.0002 secs, 0.0001 secs, 0.0685 secs
    40  resp read:	0.0000 secs, 0.0000 secs, 0.0005 secs
    41
    42Status code distribution:
    43  [200]	10000 responses
    

    -locationsindex=0

     0hey -c 1 -n 10000 http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
     1
     2Summary:
     3  Total:	29.6382 secs
     4  Slowest:	0.0546 secs
     5  Fastest:	0.0016 secs
     6  Average:	0.0030 secs
     7  Requests/sec:	337.4024
     8  
     9  Total data:	2340000 bytes
    10  Size/request:	234 bytes
    11
    12Response time histogram:
    13  0.002 [1]	|
    14  0.007 [9998]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    15  0.012 [0]	|
    16  0.018 [0]	|
    17  0.023 [0]	|
    18  0.028 [0]	|
    19  0.033 [0]	|
    20  0.039 [0]	|
    21  0.044 [0]	|
    22  0.049 [0]	|
    23  0.055 [1]	|
    24
    25
    26Latency distribution:
    27  10% in 0.0029 secs
    28  25% in 0.0030 secs
    29  50% in 0.0030 secs
    30  75% in 0.0031 secs
    31  90% in 0.0031 secs
    32  95% in 0.0032 secs
    33  99% in 0.0033 secs
    34
    35Details (average, fastest, slowest):
    36  DNS+dialup:	0.0000 secs, 0.0016 secs, 0.0546 secs
    37  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0006 secs
    38  req write:	0.0000 secs, 0.0000 secs, 0.0004 secs
    39  resp wait:	0.0029 secs, 0.0016 secs, 0.0545 secs
    40  resp read:	0.0000 secs, 0.0000 secs, 0.0005 secs
    41
    42Status code distribution:
    43  [200]	10000 responses
    
  100. rest: allow fetching specific transaction from a block 772ee5c973
  101. romanz force-pushed on Sep 3, 2025
  102. romanz marked this as a draft on Sep 3, 2025
  103. romanz force-pushed on Sep 4, 2025
  104. romanz marked this as ready for review on Sep 4, 2025
  105. romanz requested review from hodlinator on Sep 4, 2025
  106. romanz commented at 6:29 am on September 4, 2025: contributor

    Also tested with wrk 4.1.0, with similar performance improvement.

    -locationsindex=1

     0$ wrk --latency -t 1 -c 1 -d 10s http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
     1Running 10s test @ http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
     2  1 threads and 1 connections
     3  Thread Stats   Avg      Stdev     Max   +/- Stdev
     4    Latency   137.29us   33.60us   2.78ms   93.01%
     5    Req/Sec     7.20k   518.79     8.84k    73.27%
     6  Latency Distribution
     7     50%  136.00us
     8     75%  147.00us
     9     90%  166.00us
    10     99%  185.00us
    11  72377 requests in 10.10s, 24.23MB read
    12Requests/sec:   7165.98
    13Transfer/sec:      2.40MB
    14
    15$ wrk --latency -t 4 -c 4 -d 10s http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
    16Running 10s test @ http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
    17  4 threads and 4 connections
    18  Thread Stats   Avg      Stdev     Max   +/- Stdev
    19    Latency   139.39us   25.96us   1.02ms   74.11%
    20    Req/Sec     7.08k   255.89     7.90k    68.73%
    21  Latency Distribution
    22     50%  138.00us
    23     75%  154.00us
    24     90%  169.00us
    25     99%  205.00us
    26  283910 requests in 10.10s, 95.04MB read
    27Requests/sec:  28111.74
    28Transfer/sec:      9.41MB
    

    -locationsindex=0

     0$ wrk --latency -t 1 -c 1 -d 10s http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
     1Running 10s test @ http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
     2  1 threads and 1 connections
     3  Thread Stats   Avg      Stdev     Max   +/- Stdev
     4    Latency     2.29ms  646.55us   4.66ms   65.27%
     5    Req/Sec   439.01    104.35   590.00     72.00%
     6  Latency Distribution
     7     50%    1.84ms
     8     75%    2.92ms
     9     90%    2.96ms
    10     99%    3.30ms
    11  4373 requests in 10.01s, 1.46MB read
    12Requests/sec:    436.96
    13Transfer/sec:    149.78KB
    14
    15$ wrk --latency -t 4 -c 4 -d 10s http://localhost:8332/rest/txfromblock/$BLOCKHASH-5000.bin
    16Running 10s test @ http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
    17  4 threads and 4 connections
    18  Thread Stats   Avg      Stdev     Max   +/- Stdev
    19    Latency     2.62ms  833.44us   5.88ms   58.59%
    20    Req/Sec   382.58     44.82   515.00     69.25%
    21  Latency Distribution
    22     50%    2.53ms
    23     75%    3.20ms
    24     90%    3.69ms
    25     99%    4.87ms
    26  15244 requests in 10.01s, 5.10MB read
    27Requests/sec:   1523.25
    28Transfer/sec:    522.13KB
    
  107. index: store per-block transaction locations for efficient lookups
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    961f94b293
  108. in src/rest.cpp:505 in 91408755df outdated
    500+
    501+    CTransactionRef tx{};
    502+    std::vector<std::byte> tx_bytes{};
    503+    if (const LocationsIndex* locations_index = g_locations_index.get()) {
    504+        if (!locations_index->BlockUntilSyncedToCurrentChain()) {
    505+            RESTERR(req, HTTP_SERVICE_UNAVAILABLE, "Locations index is still syncing");
    


    hodlinator commented at 9:19 am on September 4, 2025:

    Missing return here.

    This will lead to ReadRawTransaction() being called anyway and in my testing it returns an empty vector, which we then discover on line 533 and on the next line we try to use req again but internal pointers have been reset to null by the line of this comment, leading to a crash.


    romanz commented at 10:47 am on September 4, 2025:
    Thanks! Fixed in 91408755df…961f94b293.
  109. romanz force-pushed on Sep 4, 2025
  110. hodlinator commented at 1:13 pm on September 4, 2025: contributor

    Waiting for indexing to complete, did a 1min perf dump:

    • 35.2% of the time is spent in the CTransaction constructor, eagerly computing hashes which LocationsIndex doesn’t even use.
    • 11.7% is spent in the GetSerializeSize calls inside LocationsIndex::CustomAppend().

    Would be nice for the LocationsIndex if hashes were lazily computed (avoided) and transactions had a serialized size field that was cached upon deserialization so we don’t need to call GetSerializeSize.

    Next-day edit:

    1. Noticed I was accidentally running a debug build in the measurements above.
    2. Realized that 80+% of the 35.2% tx hash computation above was spent in serializing the transaction we had just deserialized. Edit: The re-serialization turned out to be cheap in the end, SHA256-hashing is performed during serialization and is where most of the time goes. (At the peaks of the flamegraphs, cropped away in the screenshot).
  111. romanz commented at 6:28 pm on September 4, 2025: contributor

    Would be nice for the LocationsIndex if hashes were lazily computed (avoided) and transactions had a serialized size field that was cached upon deserialization so we don’t need to call GetSerializeSize.

    In bindex library, I am using @RCasatta’s bitcoin_slices crate which allows parsing Bitcoin blocks and transactions without extra allocations & hashing. Maybe we can use a similar approach for building the indices (since each index may need a different part of the indexed transaction)?

  112. hodlinator commented at 9:38 pm on September 4, 2025: contributor

    I think that the performance gain will increase when fetching transactions near the end of the block.

    Oof, I can confirm. Skipping over the serialization of all the prior transactions really helps. :D

    (Edit: Accidentally ran these benchmarks in debug, updated numbers still confirm the improvement)

    1 worker - 16x improvement

    0₿ hey -c 1 -n 10000 http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
    

    -locationsindex=0

    0Summary:
    1  Total:  19.9502 secs
    2  Slowest:  0.0191 secs
    3  Fastest:  0.0019 secs
    4  Average:  0.0020 secs
    5  Requests/sec: 501.2476
    

    -locationsindex=1

    0Summary:
    1  Total:	1.2325 secs
    2  Slowest:	0.0178 secs
    3  Fastest:	0.0001 secs
    4  Average:	0.0001 secs
    5  Requests/sec:	8113.5517
    

    16 workers - 8.5x improvement

    0₿ hey -c 16 -n 10000 http://localhost:8332/rest/txfromblock/000000000000000000017bfd05b5fa367a424c4a565a4baf7950d9e8605df8ec-5000.bin
    

    -locationsindex=0

    0Summary:
    1  Total:  2.6111 secs
    2  Slowest:  0.0222 secs
    3  Fastest:  0.0019 secs
    4  Average:  0.0042 secs
    5  Requests/sec: 3829.8051
    

    -locationsindex=1

    0Summary:
    1  Total:  0.3054 secs
    2  Slowest:  0.0182 secs
    3  Fastest:  0.0001 secs
    4  Average:  0.0005 secs
    5  Requests/sec: 32744.4170
    
  113. in test/functional/interface_rest.py:488 in 961f94b293
    483+            self.log.debug("Checking block: %s (%d txs)", blockhash, len(txs))
    484+            for i, tx in enumerate(txs):
    485+                results = []
    486+                for node in self.nodes:
    487+                    self.log.debug("Checking tx #%d: %s on %s", i, tx["txid"], node.url)
    488+                    self.url = urllib.parse.urlparse(node.url)
    


    hodlinator commented at 11:46 am on September 5, 2025:

    info: Re-parsing for every transaction in every block seems wasteful, but I couldn’t measure any improvement from doing:

     0diff --git a/test/functional/interface_rest.py b/test/functional/interface_rest.py
     1index 34ae549cf4..89969e50f9 100755
     2--- a/test/functional/interface_rest.py
     3+++ b/test/functional/interface_rest.py
     4@@ -476,6 +476,7 @@ class RESTTest (BitcoinTestFramework):
     5         self.wait_until(lambda: self.nodes[0].getindexinfo()['locationsindex'] == expected_index_info)
     6         assert self.nodes[1].getindexinfo() == {}  # locationsindex is disabled
     7 
     8+        urls = (self.url, urllib.parse.urlparse(self.nodes[1].url))
     9         for height in range(0, block_count + 1):
    10             blockhash = self.nodes[0].getblockhash(height)
    11             block = self.nodes[0].getblock(blockhash, 2)
    12@@ -483,9 +484,9 @@ class RESTTest (BitcoinTestFramework):
    13             self.log.debug("Checking block: %s (%d txs)", blockhash, len(txs))
    14             for i, tx in enumerate(txs):
    15                 results = []
    16-                for node in self.nodes:
    17+                for j, node in enumerate(self.nodes):
    18                     self.log.debug("Checking tx #%d: %s on %s", i, tx["txid"], node.url)
    19-                    self.url = urllib.parse.urlparse(node.url)
    20+                    self.url = urls[j]
    21                     tx_bytes = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.BIN, ret_type=RetType.BYTES)
    22                     assert tx_bytes.hex() == tx["hex"]
    23                     tx_hex = self.test_rest_request(f"/txfromblock/{blockhash}-{i}", req_type=ReqType.HEX, ret_type=RetType.BYTES).decode().strip()
    
  114. in src/index/locationsindex.cpp:37 in 961f94b293
    32+        READWRITE(prefix);
    33+        if (prefix != DB_BLOCK_HASH) {
    34+            throw std::ios_base::failure("Invalid format for location index DB hash key");
    35+        }
    36+
    37+        READWRITE(obj.hash);
    


    hodlinator commented at 9:39 am on September 8, 2025:

    nit: If you re-touch, could be more explicit with block_hash + curly initialization by default to avoid unintentional narrowing:

     0    uint256 block_hash;
     1    uint32_t row; // allow splitting one block's transactions into multiple DB rows
     2
     3    explicit DBKey(const uint256& block_hash_in, uint32_t row_in) : block_hash{block_hash_in}, row{row_in} {}
     4
     5    SERIALIZE_METHODS(DBKey, obj)
     6    {
     7        uint8_t prefix{DB_BLOCK_HASH};
     8        READWRITE(prefix);
     9        if (prefix != DB_BLOCK_HASH) {
    10            throw std::ios_base::failure("Invalid format for location index DB hash key");
    11        }
    12
    13        READWRITE(obj.block_hash);
    
  115. hodlinator commented at 12:34 pm on September 8, 2025: contributor

    Higher level flow:

     0sequenceDiagram
     1    actor User
     2    participant C as Electrum<br/>Client
     3    participant S as Electrum<br/>Server
     4    participant R as bitcoind<br/>REST interface
     5    participant L as LocationsIndex
     6    participant B as blockstorage
     7
     8    User->>+C: View wallet history
     9    C->>+S: blockchain.scripthash.get_history(scripthash) [1]
    10    note over C,S: https://electrum-protocol.readthedocs.io<br/>/en/latest/protocol-methods.html#blockchain-scripthash-get-history
    11    loop Fetch history
    12        S->>+R: GET /rest/txfromblock/BLOCKHASH-N.bin
    13        R->>+L: LocationsIndex::ReadRawTransaction(block_hash, n)
    14        note over L: BLOCKHASH+N is mapped to a LevelDB key with a compressed N-value
    15        L->>+B: BlockManager::OpenBlockFile()
    16        B->>-L: AutoFile
    17        note over L: Read span of bytes for transaction
    18        L->>-R: bytes
    19        R->>-S: bytes
    20        S->>-C: bytes
    21    end
    22    C->>-User: tx history
    

    Q1: Could LocationsIndex live on the Electrum Server (bindex-rs) side, with bitcoind instead having a REST endpoint for reading a span of bytes from a given block?


    Q2: Why is it better to implement a slow fallback in blockstorage than to fail over to requiring TxIndex? Is it a question of not wanting to store both blockhash+N and txid in case users switch around their configurations/bitcoind servers?


    Would be nice for the LocationsIndex if hashes were lazily computed (avoided) and transactions had a serialized size field that was cached upon deserialization so we don’t need to call GetSerializeSize.

    In bindex library, I am using @RCasatta’s bitcoin_slices crate which allows parsing Bitcoin blocks and transactions without extra allocations & hashing. Maybe we can use a similar approach for building the indices (since each index may need a different part of the indexed transaction)?

    Yeah, would be sweet if we could just allocate the block into a 4MB scratch buffer without any transformations and maybe even have the tx hashes appended to the end of blocks so they don’t need to be recomputed (could still have some simple CRC over block + hashes to catch storage medium failure).

  116. romanz commented at 6:44 pm on September 8, 2025: contributor

    Q1: Could LocationsIndex live on the Electrum Server (bindex-rs) side, with bitcoind instead having a REST endpoint for reading a span of bytes from a given block?

    Great idea, thanks! I would try this approach in a separate bindex branch, to evaluate its performance.

    Q2: Why is it better to implement a slow fallback in blockstorage than to fail over to requiring TxIndex? Is it a question of not wanting to store both blockhash+N and txid in case users switch around their configurations/bitcoind servers?

    The drawback of TxIndex is its size (currently ~60GB). It also requires bindex to use txids (to query TxIndex), whose storage is estimated to take additional tens of GBs (~1B txids, each being 32-byte pseudo-random values).

    Currently bindex is using 4-byte txnum1 (the “position” of the tx in the chain, when enumerating all the transactions from genesis), which can be used to efficiently find its block height and its offset within the block - without using txids :)

  117. DrahtBot added the label Needs rebase on Sep 9, 2025
  118. DrahtBot commented at 1:38 am on September 9, 2025: contributor
    🐙 This pull request conflicts with the target branch and needs rebase.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-12 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me