Add address-based index #6835

pull rnicoll wants to merge 1 commits into bitcoin:master from rnicoll:search-raw-transactions changing 9 files +272 −19

rnicoll commented at 6:17 PM on October 15, 2015: contributor

Update of #3652 (which is an update of #2802), to latest master. RPC tests are coming, but I wanted to open this now so it's clear there's work going on, make sure we don't duplicate effort.

in src/txdb.cpp:None in e6cfa10d80 outdated

 166 | @@ -163,6 +167,40 @@ bool CBlockTreeDB::WriteTxIndex(const std::vector<std::pair<uint256, CDiskTxPos>
 167 |      return WriteBatch(batch);
 168 |  }
 169 |  
 170 | +bool CBlockTreeDB::ReadAddrIndex(const uint160 &addrid, std::vector<CExtDiskTxPos> &list) {
 171 | +    CLevelDBIterator *iter = NewIterator();

dexX7 commented at 1:53 PM on October 16, 2015:

It looks like you rebased from the original.

This leaks, you'll need to delete the iterator, or use a smart pointer.

rnicoll commented at 5:39 PM on October 16, 2015:

Thanks, added the missing commit in now

laanwj added the label UTXO Db and Indexes on Oct 20, 2015
dcousens commented at 10:38 PM on October 21, 2015: contributor

concept ACK
btcdrak commented at 9:23 AM on October 22, 2015: contributor

I am very much in favour of a full address indexes as it removes the need for third party applications like insight-api separately parsing the blockchain. However, I'm not sure this PR's approach is the best way. I dont believe it is reorg safe, and there are other issues reported by @gmaxwell https://github.com/CounterpartyXCP/counterparty-lib/issues/783#issuecomment-120130217.

I think overall the approach of #5048 creating a separate index is better (but applied to all addresses rather than just the utxo).
in src/main.cpp:None in 0a42fd54d8 outdated
```
 983 | +    uint160 addrid;
 984 | +    const CKeyID *pkeyid = boost::get<CKeyID>(&dest);
 985 | +    if (pkeyid) {
 986 | +        addrid = static_cast<uint160>(*pkeyid);
 987 | +    } else {
 988 | +        const CScriptID *pscriptid = boost::get<CScriptID>(&dest);
```
dcousens commented at 9:59 AM on October 22, 2015:

Why not just always use the HASH160 of the script? That way you could query any UTXO.

afk11 commented at 4:02 AM on October 23, 2015:

+1 as there are more interesting script types than address types. Address lookup could just produce the necessary output script and look that up. Exposing both would be great.
afk11 commented at 8:12 PM on October 23, 2015: contributor

Concept ACK, but needs rebase.
gmaxwell commented at 10:21 PM on October 23, 2015: contributor

Concept (weak)NAK. (Please don't be upset, I feel that stating my view here is a courtesy over simply saying nothing at all).

I don't see this approach being viable.

There are several interrelated problems, but the foremost of this is that this approach forces its users to use an non-scalable and maintainable method of accessing data in the Bitcoin blockchain-- requiring 50GB+ storage and rapidly growing, where usually a watching wallet is sufficient. This means that user who do adopt this approach will rapidly be pushed off onto trusting third party API services as the resource costs exceed their tolerance.

And we're left then with an invasive change that is used by few users and will not be adequately supported (which we've also seen with this change, that no one has even bothered to diagnose the problems with index consistency that this particular implementation has had).

I would much rather see rapid utxo-only rescan (which would be much less resource intensive, and so more widely used) for watching import and similar functionality. And for indexing beyond that the indexes should be isolated and modularized to make them maintainable for small user bases, or even for out of tree indexing code.
rnicoll commented at 6:36 PM on October 24, 2015: contributor
Wanted to say, I am following the comments here, I'm just prioritising work elsewhere at the moment. Thanks for all the comments, feedback and analysis everyone, they are all appreciated.
1. Using hash of the full script sounds like an excellent idea.
2. The index consistency issues are likely arising due to lack of recognition of reorgs happening (and therefore the index becomes inconsistent with the blocks on disk).
3. Rapid UTXO-scan sounds like a good first step and we can come back to this.
I will look at this in more depth later, but unlikely before November. If anyone else wants to contribute, PRs against my branch would be much appreciated.

Add address-based index

1) Maintain a salt to perturbate the address index (protection against
   collisions).
2) Add support for address index entries in the block index, and
   maintain those if -addrindex is specified. It indexes the use of
   every >8-byte data push in output script or consumed script - or in
   case of no such push, the entire script.
3) Add a searchrawtransactions RPC call, which can look up raw
   transactions by address.

5c9a79441a

rnicoll force-pushed on Oct 24, 2015
dexX7 commented at 1:09 PM on October 25, 2015: contributor
If this patch is going to be overhauled, which I'd like very much, then there are a few points to consider:
1. filtering out of orphaned transactions: similar to the txdb, orphans are kept in the DB, which caused quite a bit confusion in the past. Currently the way to go is to manually check, whether transactions are indeed part of the main chain (e.g. by checking, whether "confirmations" > 0), but this may be done on a lower level. Some time ago I extended the RPC to have an option to filter on the RPC level, but this may also not be ideal: https://github.com/dexX7/bitcoin/commit/ffde89049a7ef96b7d09c31cc78a9a366329a24d
2. use more appropriate data types: verbose (and includeorphans) should be bool, not int.
3. refine what's actually being indexed: currently a) every data push >=8 bytes, b) if no such pushes, the entire script is indexed. given that only the lookup by destinations is exposed, it may not be useful to index only the parts that are exposed and needed
4. maybe add additional filters or aggregation calls: for http://redeem.bitwatch.co/ I'm using "listallunspent", which targets unspent outputs (in a somewhat ugly way!), and I've also played with "getallbalance" to retrieve balance information of arbitrary destinations. It's certainly not very fast, but for me an acceptable middle ground.
5. optionally maintain a mempool based view: DB entries are only written after confirmation, but it would be incredibly useful, if the unconfirmed chain state is available and exposed in one way or another.
Especially to the first and last point a few notes: I'm not sure who else uses the address-index patch, but most notably is probably counterparty-lib. They are manually filtering the results to get rid of orphans, and they are manually fetching the mempool over and over to get a better understanding of the unconfirmed chain state. This seems incredibly wasteful, performance wise, and doing the work directly in Core may provide a significant performance boost.
dcousens commented at 10:24 AM on October 26, 2015: contributor

After a discussion on IRC with @laanwj. I've changed my opinion to concept NACK with the conclusion this should be moved to an application external to bitcoind.

update: I have since written software to do this in an external index, see https://github.com/bitcoinjs/indexd (see this an example implementation)
btcdrak commented at 11:03 AM on October 26, 2015: contributor

I was also part of the conversations with @laanwj and @dcousens and I'm also in favour of doing this externally to bitcoind.
sipa commented at 4:18 PM on May 20, 2016: member

I vote to close this; this should be done in an external index. Perhaps at some point when we can a more modular design it can be supported as an optionally buildable module, but let's not complicate the current database tracking code.
sipa closed this on May 25, 2016
bitcoin locked this on Sep 8, 2021

Contributors

Labels

Linked (view graph)

#2802 Add an address-based transaction index #3652 Adds transaction index by address (updates #2802 to current master)#5048 txoutsbyaddress index