txoutsbyaddress index

cozz commented at 11:45 pm on October 5, 2014: contributor

Adds new rpc call “gettxoutsbyaddress” as requested in #4007. Disabled by default, enabled with -txoutsbyaddressindex.

The index is built from the normal utxo on first use, reindex is not required. For the GUI there is progress shown, on bitcoind you just need to be patient.

The qa-test includes a simple reorganization. I tested all 4 code parts in main.cpp by commenting them out and check if the qa-test fails at the expected line. I had to modify the rpc call for this test, as normally wrong outputs in the address index are not exposed to the user, because the rpc call relies on the normal GetCoins call. The rest of the code has been tested in random ad-hoc testing. I only tested linux on my machine.

Its all in 1 commit for laziness reasons, but if splitting in multiple commits would help reviewing, I could do that.

txoutsbyaddress

txoutsbyaddress3

sipa commented at 0:15 am on October 6, 2014: member

Would it not be possible to keep this information outside of (every) CCoinsView? We only care about it for the tip, and not for intermediate caches that are created during validation (i.e., just update it after a block is validated).

cozz commented at 4:47 pm on October 6, 2014: contributor

I can probably move the updates from (Dis)ConnectBlock to (Dis)ConnectTip, and make them with pcoinsTip, if this is what you mean?

I would have to move the CBlockUndo declaration too, if thats ok, because I need that information.

cozz force-pushed on Oct 7, 2014

sipa commented at 10:04 pm on October 8, 2014: member

I actually meant keeping this out of coins.{h,cpp} entirely. We don’t need every (partial copy) of a cache everywhere during validation of blocks and transactions to track modification of this index. Just updating a single independent index (in main, perhaps with an implementation in an independent file) after validation has succeeded should be sufficient.

The reason I ask this is because CCoinsView is both performance and consensus critical, and I’d really like to not complicate reasoning about either of those further.

cozz commented at 5:01 pm on October 9, 2014: contributor

So where exactly should we update the index, (Dis)ConnectTip? I think the address index needs to be written to disk in the same leveldb-batch with the utxo, dont you? (Because you are talking about an independent index) Otherwise, I dont think we can ensure a consistent address index, if we write async from the utxo to disk. Thats the reason for putting this in coins.{h,cpp}, as the connect/disconnect calls rely on the hashBestChain, which is also written to disk in the same leveldb-batch.

sipa commented at 7:11 pm on October 9, 2014: member

Right, it’s easier if the write is atomic (otherwise you’ll need to have a separate catch up function to rebuild the index if it’s out of date with the chainstate, and as it’s always going to be updated in lockstep with the chainstate on disk, better make it part of it).

One way would be to have CCoinsViewDB take a reference/pointer to a map with extra index entries to write, and have BatchWrite serialize those as well. That’s not particularly clean, and it’s pretty inconvenient to query as well.

It seems there is not really a better way than to make this part of the CCoinsView framework. I just wished we could keep optional features out of the consensus code…

cozz force-pushed on Oct 11, 2014

cozz commented at 3:59 am on October 11, 2014: contributor

Update: Moved the code from coins.{h,cpp} to new files coinsbyaddress.{h,cpp}, while the address index is still written to disk together with the utxo.

cozz force-pushed on Oct 16, 2014

cozz commented at 5:04 pm on October 16, 2014: contributor

Update: fixed a minor shutdown segfault in InitError case

theuni commented at 7:02 pm on October 16, 2014: member

There are lots of objects passed by ref that should be const ref it seems, and several new member functions that aren’t marked as const. Beyond the usual const rabbit-hole, it makes this substantially harder to review (for me, anyway). Could you please fix those up?

I made a few changes here as an example, please have a look: https://github.com/theuni/bitcoin/commit/d6c94fa726f19d722af718ca3b6bdfb20d1c999e

in src/txdb.cpp: in 5ca098ee03 outdated

72@@ -57,6 +73,15 @@ bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
73         CCoinsMap::iterator itOld = it++;
74         mapCoins.erase(itOld);
75     }
76+    if (fTxOutsByAddressIndex && pcoinsByAddress) // only if -txoutsbyaddressindex

sipa commented at 7:55 pm on October 16, 2014:

CCoinsViewDB shouldn’t assume there is just one pcoinsByAddress global; can you pass it in in the constructor at least?

luke-jr commented at 8:00 pm on October 16, 2014: member

How do I lookup a txout by some non-address-represented script?

sipa commented at 8:19 pm on October 16, 2014: member

I would suggest storing scriptPubKeys by their hash160 or hash256, rather than storing them in full in the map (requiring an extra heap allocation + 24 bytes of overhead for each). That’s also more compact on disk than using the CScriptCompressor.

sipa commented at 8:30 pm on October 16, 2014: member

Calling the datastructure ByScript is probably more correct than ByAddress.

sipa commented at 8:30 pm on October 16, 2014: member

You can make gettxoutsbyaddress (or equivalent) also take a hex-encoding script for arbitrary script lookups.

cozz force-pushed on Oct 18, 2014

cozz commented at 4:10 am on October 18, 2014: contributor

Updated as requested.

If someone had this compiled before, you need to delete the index with -txoutsbyaddressindex=0 and then rebuild again with -txoutsbyaddressindex for incompatibility reasons. (database key is now hash of script)

in src/coinsbyscript.cpp: in 5f81feeab6 outdated

44+    assert(it != cacheCoinsByScript.end());
45+    return it->second;
46+}
47+
48+uint160 CCoinsViewByScript::getKey(const CScript &script) {
49+    return Hash160(script);

luke-jr commented at 4:19 am on October 18, 2014:

Might be better to use Hash160(first-push-in-script-data)

cozz commented at 4:38 am on October 18, 2014:

I dont understand. Is there anything wrong with hashing the complete script? We dont want 2 different scripts to result in the same hash.

luke-jr commented at 5:37 am on October 18, 2014:

Using a hash of the first push allows one to search by the hash of the first push, even without knowing the full script. This is useful for anyone wanting to find transactions to old pay-to-pubkey scripts.

I sure hope you’re not assuming there will never be collisions?

cozz commented at 3:03 pm on October 18, 2014:

Sure I assume that, why would there be collisions? Why cant you just hash the full old pay-to-pubkey script, is there any unknown data in those scripts?

I am no bitcoin script expert, but I think it is safer to hash from the first to the last bit, otherwise there is always room for an obvious attack, where I send coins to a script I can spend, but the hash is the same of a script you can spend.

cozz commented at 5:16 pm on October 18, 2014:

After thinking, I guess what you are suggesting is possible, but then we would need to have 2 rpc-functions:

gettxoutsbyaddress, which is just a wrapper around an internal searchCoinsByFirstPush(..) and loops through all search results to ensure the script is actually OP_DUP OP_HASH etc.
searchtxoutsbyscriptsfirstpush, which just returns the search results

Not sure, if this is worth the effort though. Are there really usecases, where you only know parts of the script?

luke-jr commented at 5:28 pm on October 18, 2014:

Old pay-to-pubkey scripts ( OP_CHECKSIG) were often referred to by a version 0 address with the pubkey hash. If one is searching for those, they likely only have the hash of the pubkey, not the pubkey itself.

cozz commented at 6:05 pm on October 18, 2014:

Ok I see. But I am unsure about redesigning and introducing a second rpc-call, just to support some old scripts nobody is creating anymore nowadays. To me this feature is not important, but we can hear some more opinions here. If more people would like to see this implemented, I may consider looking into it.

luke-jr commented at 6:08 pm on October 18, 2014:

I agree this feature is not important, but it would be nice to have the index compatible in case anyone ever wanted it.

sipa commented at 6:39 pm on October 18, 2014:

With a hash(fullscript) index you can still easily search for pay-to-pubkey outputs - but you do need to know the full pubkey. I think that’s perfectly acceptable.

btcdrak commented at 8:53 pm on October 26, 2014: contributor

Does this include or intersect/depend on #3652 ?

cozz commented at 6:12 pm on October 27, 2014: contributor

@btcdrak no, they are two independent indexes. This one is more lightweight, but not as powerful. With this index you can only query the unspent outputs for an address. With the other index you can query all transactions involving the address.

Of course with getrawtransaction you can also query the transaction which created the unspent output, but you can not query the transaction history for an address with this index. Only the current “balance” basically.

Depending on your use-case, this index may be all you need. But if you care about transaction history, you still need the other index. So even if we merge this index into the master, you still may want to have the other index in addition.

btcdrak commented at 7:59 pm on October 27, 2014: contributor

@cozz I’ve been using the addrindex patch, it’s extremely powerful. If we’re merging this PR I really think we should also have the other also (with is optional by config).

laanwj commented at 8:16 pm on October 27, 2014: member

The advantage of this one to the address index is that it doesn’t interfere with pruning. It doesn’t require the whole block chain, and is a much smaller index to boot.

I do agree with @sipa that this really doesn’t belong in the core consensus code. We’re trying to reduce that code to the absolute essentials (also because it will end up in a consensus library at some point), so we should not expand it with functionality that is not required for validation.

btcdrak commented at 8:20 pm on October 27, 2014: contributor

@laanwj You could say the same thing about txindex but these are optional. Even if these end up being split out eventually as the wallet code is for example, there is a real need and real world usecases for these indexes.

laanwj commented at 8:27 pm on October 27, 2014: member

One of the goals of moving the consensus and utxo code to a library, is that it will be possible to build additional tools and indexes without having to include everything and the kitchen sink into this project. Indexers can just use our code instead of the other way around.

Note that I don’t doubt for a moment that there are real use-cases. But the goal of Bitcoin Core is to provide the core infrastructure, not to satisfy every possible use-case. To keep this project maintainable we need to move away from that monolithic approach.

in src/leveldbwrapper.h: in 5f81feeab6 outdated

147@@ -148,6 +148,7 @@ class CLevelDBWrapper
148     }
149 
150     bool WriteBatch(CLevelDBBatch& batch, bool fSync = false) throw(leveldb_error);
151+    bool WriteBatch(leveldb::WriteBatch& batch, bool fSync = false) throw(leveldb_error);

sipa commented at 2:04 pm on November 4, 2014:

Why is this necessary? I’d prefer to not expose the underlying LevelDB-specific datatypes, so the database layer can be swapped out.

sipa commented at 2:07 pm on November 4, 2014: member

Thanks for changing the approach, and apologies for the slow response - it again needs rebase now.

cozz force-pushed on Nov 4, 2014

cozz commented at 8:15 pm on November 4, 2014: contributor

Rebased and removed the changes in leveldbwrapper, they were not necessary.

btcdrak commented at 9:31 pm on November 4, 2014: contributor

:+1:

laanwj added the label UTXO Db and Indexes on Dec 5, 2014

gwillen commented at 3:18 am on January 7, 2015: contributor

In #bitcoin, the user proserpine- reported the following segfault while running this pullreq:

http://pastebin.com/raw.php?i=D8EUbu4W

He says it happens once every few weeks. I’ve been looking at it and chatting with @gmaxwell about it, and it seems like it might be related to using ’tx’ in CTxMemPool::remove after it’s been invalidated.

In particular, it looks to me like tx is always a reference into mapNextTx, and my understanding is that if we reference it after performing any removal from mapNextTx – which does happen – we run the risk that our reference will become invalid (e.g. if mapNextTx undergoes a resize due to the removal.)

It looks like the other code in the function carefully avoids touching tx after removing from mapNextTx. (Also, at HEAD the remove function has been rewritten to be non-recursive, and is even more careful in how it uses the value of tx, which is now origTx.)

It looks to me like it might be sufficient (in both the old and new versions of remove) just to move your code to the top of remove(), but I’m just eyeballing that and you should not take my word for it.

gwillen commented at 3:24 am on January 7, 2015: contributor

Hmm, I take back part of that – it looks like references into a map other than the one being removed are not invalidated by a removal. So it’s no longer totally clear to me that tx is being invalidated and then used. But it still seems like a conceivable explanation for the crash. I am going to keep looking.

gwillen commented at 3:29 am on January 7, 2015: contributor

Ok, I think I understand now – I was mistaken about the transaction being in mapNextTx, which only stores pointers to transactions in mapTx. And we’re erasing the transaction from mapTx, and then later accessing it. So the new code just needs to be moved above the place where we remove the transaction from mapTx thus invalidating our reference to it.

gwillen commented at 3:39 am on January 7, 2015: contributor

Oops, that paste expired, here’s a better one:

http://pastebin.com/raw.php?i=s0uqLhtZ

paveljanik commented at 6:49 am on January 7, 2015: contributor

Can you please rebase this?

cozz force-pushed on Jan 9, 2015

cozz commented at 4:00 pm on January 9, 2015: contributor

Rebased and moved the code in CTxMemPool::remove(..), which probably fixes the segfault. thanks @gwillen

paveljanik commented at 9:18 pm on January 21, 2015: contributor

Works ok here, no crashes. Progress is displayed correctly when building the index etc.

What I consider strange is =0 semantic to remove the index. I’d go with -txoutsbyaddressindex to enable working with the index and if it is not found, build it. And -deletetxoutsbyaddressindex. Quite long, but better IMO. Or maybe this one should be the RPC command, not command line argument. What do you think?

Or the other way. Make it all RPC commands: txoutbyaddress true and ... false. And use the index automatically when it is built.

Diapolo commented at 7:25 am on January 22, 2015: none

We have the same behaviour for -txindex. If the flag is present it’s enabled and if it’s just removed or set to -txindex=0, the index is also removed. IMHO this is expected behaviour.

paveljanik commented at 7:40 am on January 23, 2015: contributor

@Diapolo Yes, but how GUI users do use this? Via RPC they could.

paveljanik commented at 12:15 pm on February 2, 2015: contributor

This need rebase now.

in qa/rpc-tests/txoutsbyaddress.sh: in 150fe1e0d7 outdated

0@@ -0,0 +1,163 @@
1+#!/usr/bin/env bash
2+# Copyright (c) 2013-2014 The Bitcoin Core developers
3+# Distributed under the MIT/X11 software license, see the accompanying

fanquake commented at 1:01 pm on February 2, 2015:

Can you drop the /X11 mentions from this pull before rebasing

cozz force-pushed on Feb 2, 2015

cozz commented at 4:19 pm on February 2, 2015: contributor

Rebased, removed /X11.

paveljanik commented at 5:42 pm on February 3, 2015: contributor

What will happen when the initial building of index is running (takes a lot of time here…) and the user pressed Ctrl+C to interrupt? The shutdown window is shown and the user has to wait anyway? This has to be changed IMO…

paveljanik commented at 5:44 pm on February 3, 2015: contributor

Travis build failure on OS X can be ignored…

paveljanik commented at 5:48 pm on February 3, 2015: contributor

Hmm, wanted to try it, but this again needs rebase.

cozz force-pushed on Feb 3, 2015

paveljanik commented at 5:31 pm on February 4, 2015: contributor

rpcserver.cpp needs rebase - very simple one.

cozz force-pushed on Feb 4, 2015

cozz commented at 6:32 pm on February 4, 2015: contributor

ok, rebased once again.

paveljanik commented at 2:03 pm on February 5, 2015: contributor

tested ACK More ideas: RPC call getaddresseswithtxouts; be able to stop the index creation in the middle. But anyway, thanks for nice work.

paveljanik commented at 2:30 pm on March 3, 2015: contributor

This needs more review… And github says that even a rebase is needed again.

ecdsa commented at 10:55 am on April 28, 2015: none

it needs to be rebased again…

cozz force-pushed on May 1, 2015

cozz commented at 12:12 pm on May 1, 2015: contributor

Rebased.

ecdsa commented at 11:32 am on May 17, 2015: none

hi can you rebase this again? thanks

cozz force-pushed on May 18, 2015

txoutsbyaddress index 76e96e19f2

cozz force-pushed on May 18, 2015

cozz commented at 12:45 pm on May 18, 2015: contributor

Rebased.

jgarzik commented at 6:20 pm on July 23, 2015: contributor

Leaning towards closing this PR. This garnered some interest, but no ACKs after a long time. It is not fair to ask @cozz to continually rebase if it’s not going in, in the short/medium term.

cozz commented at 0:43 am on July 27, 2015: contributor

Yes, a decision for 0.12 would be good. Otherwise spending more time on this, just feels like a waste to me.

btcdrak commented at 7:24 am on July 27, 2015: contributor

I have this PR on my list to test this week.

btcdrak commented at 9:58 am on August 5, 2015: contributor

Sorry for the delay, I finally tested it yesterday and today on a random sample set of 0 confirms and old txs.

Tested ACK

dcousens commented at 8:16 am on August 8, 2015: contributor

@btcdrak, I haven’t tested this yet, but fundamentally what makes this so much larger than your patch?

in src/txdb.cpp: in 76e96e19f2

35@@ -35,17 +36,33 @@ void static BatchWriteCoins(CLevelDBBatch &batch, const uint256 &hash, const CCo
36         batch.Write(make_pair(DB_COINS, hash), coins);
37 }
38 
39+void static BatchWriteCoins(CLevelDBBatch &batch, const uint160 &hash, const CCoinsByScript &coins) {

sipa commented at 2:48 pm on August 8, 2015:

Can you name this function differently? I found it confusing to see it being used by the by-script logic.

in src/txdb.cpp: in 76e96e19f2

35@@ -35,17 +36,33 @@ void static BatchWriteCoins(CLevelDBBatch &batch, const uint256 &hash, const CCo
36         batch.Write(make_pair(DB_COINS, hash), coins);
37 }
38 
39+void static BatchWriteCoins(CLevelDBBatch &batch, const uint160 &hash, const CCoinsByScript &coins) {
40+    if (coins.IsEmpty())
41+        batch.Erase(make_pair('d', hash));

sipa commented at 2:48 pm on August 8, 2015:

Can you introduce a DB_COINS_BYSCRIPT constant above instead of ’d'?

sipa commented at 3:07 pm on August 8, 2015: member

Agree with @jgarzik, it’s not fair to keep this lingering for too long.

I like the use case it solves, but I still dislike the degree to which it’s entangled with the CCoinsView. I think this feature should be something that is independent, rather than complicating the code necessary for its core function.

Would it be reasonable to implement it as a separate database file, and have it function like the wallet? Something that registers with CValidationInterface, and listens for new transactions being added (and removed) from the blockchain. It could store its own CBlockLocator (like the wallet does, triggered by CValidationInterface’s SetBestChain) so it can “rescan” like the wallet does too.

A by-script index for the mempool could be done separately, using an extra index added after #6470’s multiindex for the mempool?

cozz commented at 8:14 pm on August 8, 2015: contributor

Well, I am sorry, I am closing this now. As you may have noticed, I am not contributing to this project anymore, because I disagree with basically anything here and I am doing something else now. If anybody wants to pick this up, feel free to just copy and paste from my code.

Off-topic: I disagree with most code changes you guys are making. For example main.cpp was much more readable in 0.8 than it is now. Or libconsensus. You guys are very good low-level coders, but you seriously have zero understanding of what encapsulation of problems is. (hint: its not moving code to another file) You are actually making the source code worse in my opinion.

You havent merged my wallet-pulls, which I consider as important bug-fixes because the wallet-performance is just a bug. Instead you consider moving code around as more important. Long-term goal is to remove GUI and wallet even.

And last but not least, the block size. It is your responsibility to give people the latest in technology. This includes scale. Your job here is to max out the block size. This is like if your job is to make a 20000 people stadium full and you only let 1000 people in, with the argument that you can never satisfy demand anyway.

You are acting like, if we go higher than 1MB, then suddenly everybody has to fully trust the miners. The threshold to rent a server for a day and verify all blocks to not be affordable to normal people is very much higher than 1MB.

I believe we need different people in the lead position here to be successful. At least gavin should be back on the throne with final say on really everything.

So sorry, but long-term I have given up on the project. I am still following a little, so if there are maintanance things for my code, feel free to contact.

cozz closed this on Aug 8, 2015

sipa commented at 8:59 pm on August 8, 2015: member

I’m sorry you feel that way, and I understand the sentiment also partially.

I agree moving code to another file is not proper encapsulation, but some of the time, moving is a first step, and an easily reviewable one. Especially the separation of consensus code is not something we want to do without very high assurances behaviour does not change. I think we are making progress, but it is admittedly slow.

My only suggestion above here was a more encapsulated approach, by the way, not a NAK…

Regarding total complexity: I fully agree the code is not “nice”. It wasn’t nice when Satoshi disappeared, and it still isn’t. Significant parts have been properly encapsulated since (script execution, addrman, wallet, UTXO management, network base, UI interaction, limited size data structures), but at the same time, the code definitely has grown too. I believe that was necessary: higher performance often means the problem becomes more complex (for example the introduction of caches, batch processing, precomputed values, …). I often very much wish to rewrite everything in a clean way, but that’s just not something that would be reviewable.

About your wallet patches not being merged: I’m aware. But the truth is that there is little interest in the wallet code, from both users and developers. We need people who are interested in reviewing and testing to make changes. That of course leads to the wallet code growing more and more outdated, reinforcing that circle. One solution to this may be @jonasschnelli’s new wallet project, which aims to add a new expiremental wallet from scratch, which would be more competitive in features.

Regarding block size: please discuss that on the mailing list. As a Bitcoin Core maintainer, I believe we should merge any consensus change that appears uncontroversial to the community.

cozz commented at 10:02 pm on August 8, 2015: contributor

I didnt expect such an honest response, thanks, that gives you a lot of respect. It doesnt make me change my mind about this project though.

ghost commented at 2:01 am on July 26, 2017: none

how quick does this returns data ? has anybody tried it ? Thanks.

shamoons commented at 1:38 am on May 5, 2018: none

Still open?

MarcoFalke locked this on Sep 8, 2021

txoutsbyaddress index #5048