Currently, electrs and other indexers map an address/scripthash to the list of relevant transactions.
However, to fetch those transactions from bitcoind, electrs reads the whole block and post-filters it for the specific transaction^1. Other indexers use a txindex to fetch a transaction by its txid^2^4.
The txindex approach has significant storage and CPU overhead, since each txid is a pseudo-random 32-byte value.
This PR adds support for fetching a transaction directly via the REST API, using its position within its block. To fetch the N-th transaction from BLOCKHASH, use the following HTTP request:
```
GET /rest/txfromblock/BLOCKHASH-N.bin
```
If the binary response format is used, the transaction data is read directly from storage and sent to the client without any deserialization overhead.
The resulting index is much smaller than txindex (allowing it to be cached in RAM):
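To make the URI format concrete, here is a minimal sketch of building and parsing the `BLOCKHASH-N.FORMAT` path component. The names `TxFromBlockRequest`, `MakeTxFromBlockPath` and `ParseTxFromBlock` are hypothetical and are not taken from the PR; the sketch assumes the block hash is written as 64 hex characters and that N fits in a `uint32_t`, and it does not validate which format suffixes are actually supported.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical parsed form of a "/rest/txfromblock/BLOCKHASH-N.FORMAT" request.
struct TxFromBlockRequest {
    std::string block_hash;  // 64 hex characters, as written in the URI
    uint32_t index;          // position N of the transaction within the block
    std::string format;      // response format suffix, e.g. "bin"
};

// Builds the request path for the N-th transaction of a block (binary format).
std::string MakeTxFromBlockPath(const std::string& block_hash, uint32_t n) {
    assert(block_hash.size() == 64);  // assumption: hex-encoded 32-byte hash
    return "/rest/txfromblock/" + block_hash + "-" + std::to_string(n) + ".bin";
}

// Parses "BLOCKHASH-N.FORMAT" (the part after "/rest/txfromblock/").
// Returns std::nullopt on malformed input.
std::optional<TxFromBlockRequest> ParseTxFromBlock(const std::string& param) {
    const size_t dash = param.find('-');
    const size_t dot = param.rfind('.');
    // Expect exactly 64 characters before the dash and a non-empty N before the dot.
    if (dash != 64 || dot == std::string::npos || dot <= dash + 1) return std::nullopt;
    for (size_t i = 0; i < dash; ++i) {
        if (!std::isxdigit(static_cast<unsigned char>(param[i]))) return std::nullopt;
    }
    const std::string digits = param.substr(dash + 1, dot - dash - 1);
    if (digits.size() > 10) return std::nullopt;  // cannot fit in uint32_t
    for (char c : digits) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return std::nullopt;
    }
    const unsigned long long n = std::stoull(digits);
    if (n > UINT32_MAX) return std::nullopt;
    const std::string format = param.substr(dot + 1);
    if (format.empty()) return std::nullopt;
    return TxFromBlockRequest{param.substr(0, dash), static_cast<uint32_t>(n), format};
}
```

For example, `ParseTxFromBlock(hash + "-5000.bin")` yields the block hash, index 5000 and format `"bin"`, while a missing or out-of-range N is rejected.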
```
$ du -sh indexes/locations/ indexes/txindex/
2.5G	indexes/locations/
57G	indexes/txindex/
```
The new index uses the following DB schema:
```cpp
struct DBKey {
    uint256 hash;  // blockhash
    uint32_t row;  // allow splitting one block's transactions into multiple DB rows
};

struct DBValue {
    FlatFilePos block_pos;          // file id + offset of the block
    std::vector<uint32_t> offsets;  // a list of transaction offsets within the block
};
```
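A lookup against this schema could roughly work as sketched below: pick the row containing transaction N, then add that transaction's in-block offset to the block's file position. This is a simplified sketch under stated assumptions, not the PR's implementation: `Hash256` and `FilePos` stand in for Bitcoin Core's `uint256` and `FlatFilePos`, a `std::map` stands in for the LevelDB index, the fixed `kTxsPerRow` split is hypothetical, and offsets are assumed to be relative to `block_pos`. Determining the transaction's length is left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <tuple>
#include <vector>

// Simplified stand-ins for Bitcoin Core's uint256 and FlatFilePos.
using Hash256 = std::array<uint8_t, 32>;
struct FilePos {
    int file;      // block file number
    uint32_t pos;  // byte offset within that file
};

struct DBKey {
    Hash256 hash;  // blockhash
    uint32_t row;  // which chunk of the block's transactions
    bool operator<(const DBKey& other) const {
        return std::tie(hash, row) < std::tie(other.hash, other.row);
    }
};

struct DBValue {
    FilePos block_pos;              // where the block starts on disk
    std::vector<uint32_t> offsets;  // assumed: tx offsets relative to block_pos
};

// Hypothetical: every row except possibly the last holds exactly this many offsets.
constexpr uint32_t kTxsPerRow = 1000;

// Returns the on-disk position of the n-th transaction of `block`, if indexed.
std::optional<FilePos> LocateTx(const std::map<DBKey, DBValue>& db,
                                const Hash256& block, uint32_t n) {
    const auto it = db.find(DBKey{block, n / kTxsPerRow});
    if (it == db.end()) return std::nullopt;  // block (or row) not indexed
    const auto& offsets = it->second.offsets;
    const uint32_t i = n % kTxsPerRow;
    if (i >= offsets.size()) return std::nullopt;  // n is past the last transaction
    const FilePos& base = it->second.block_pos;
    return FilePos{base.file, base.pos + offsets[i]};
}
```

Since only a 4-byte offset is stored per transaction (plus one key per row), this layout suggests why the index is much smaller than a txid-keyed txindex.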
For example, when fetching the 5000th transaction of block #90005 using `ab -k -c 1 -n 100000`, enabling `locationsindex` improves performance ~19x (2.574ms → 0.136ms).
I am working on a proof-of-concept indexer (https://github.com/romanz/bindex-rs) which uses #32540 and #32541. Please let me know if you have any questions, comments or concerns :)