Adds index to transactions by scriptPubKey. Based off of #2802. Stores hashes of scripts (hashed using Murmurhash3, with a hash seed that is stored in the index database), along with the COutPoint for the output which was spent/created, and the CDiskTxPos for the transaction in which this happened.
Add address-based index (attempt 4?) #14053
pull marcinja wants to merge 4 commits into bitcoin:master from marcinja:add-address-index changing 13 files +805 −1-
marcinja commented at 6:51 PM on August 24, 2018: contributor
-
DrahtBot commented at 7:23 PM on August 24, 2018: member
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--174a7506f384e20aa4161008e828411d-->
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #20404 (Remove names from translatable strings by hebasto)
- #20012 (rpc: Remove duplicate name and argNames from CRPCCommand by MarcoFalke)
- #19873 (Flush dbcache early if system is under memory pressure by luke-jr)
- #19806 (validation: UTXO snapshot activation by jamesob)
- #19521 (Coinstats Index (without UTXO set hash) by fjahr)
- #15946 (Allow maintaining the blockfilterindex when using prune by jonasschnelli)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
- laanwj added the label UTXO Db and Indexes on Aug 25, 2018
- DrahtBot added the label Needs rebase on Aug 25, 2018
- marcinja force-pushed on Aug 27, 2018
- marcinja force-pushed on Aug 27, 2018
- DrahtBot removed the label Needs rebase on Aug 27, 2018
- marcinja force-pushed on Aug 28, 2018
- marcinja force-pushed on Aug 28, 2018
-
in src/index/base.h:103 in 3c7cc3c705 outdated
95 | @@ -93,6 +96,33 @@ class BaseIndex : public CValidationInterface 96 | 97 | /// Stops the instance from staying in sync with blockchain updates. 98 | void Stop(); 99 | + 100 | + bool IsInSyncWithMainChain() const; 101 | +}; 102 | + 103 | +struct CDiskTxPos : public CDiskBlockPos
ryanofsky commented at 4:47 PM on August 29, 2018:In commit "Introduce address index" (3c7cc3c705b08828dc8dc919e53343df78568ebd)
Note: this class is moved from
src/index/txindex.cppwith no changes (except whitespace)
jimpo commented at 10:53 PM on August 30, 2018:I'd vote to move it to it's own file, like
src/index/disktxpos.{h,cpp}.in src/index/addrindex.cpp:23 in 3c7cc3c705 outdated
18 | + 19 | +constexpr char DB_ADDRINDEX = 'a'; 20 | +std::unique_ptr<AddrIndex> g_addrindex; 21 | + 22 | +/** 23 | + * Access to the addrindex database (indexes/addrindex/)
ryanofsky commented at 4:53 PM on August 29, 2018:In commit "Introduce address index" (3c7cc3c705b08828dc8dc919e53343df78568ebd)
Note: new
index/addrindex.cpp,index/addrindex.h, andtest/addrindex_tests.cppfiles in this commit mirror existingindex/txindex.cppandindex/txindex.h,test/txindex_tests.cppfiles and have some code and comments in common. It can help to diff theaddrfiles against thetxfiles when reviewing this PR.in src/index/addrindex.cpp:43 in 3c7cc3c705 outdated
38 | + bool ReadAddrIndex(const uint64_t addr_id, 39 | + std::vector<std::pair<std::pair<char, uint64_t>, CDiskTxPos>> &keys_found, 40 | + const bool filter_by_value = false, 41 | + const uint64_t value_wanted = 0); 42 | + 43 | + bool WriteToIndex(const std::vector<std::pair<uint64_t, CDiskTxPos>> &positions, const uint256 block_hash);
ryanofsky commented at 5:37 PM on August 29, 2018:In commit "Introduce address index" (3c7cc3c705b08828dc8dc919e53343df78568ebd)
I think this code could benefit from some typedefs and more consistent use of types internally to improve readability. Maybe:
using AddressId = uint64_t; using BlockId = uint64; using DbKey = std::pair<std::pair<char, AddressID>, CDiskTxPos>; using DbValue = BlockId;I also wonder if
std::tuple<char, AddressId, CDiskTxPos>might be a more natural key format than the nested pair.in src/index/addrindex.cpp:55 in 3c7cc3c705 outdated
50 | +{} 51 | + 52 | +BaseIndex::DB& AddrIndex::GetDB() const { return *m_db; } 53 | + 54 | +bool AddrIndex::DB::ReadAddrIndex(const uint64_t addr_id, 55 | + std::vector<std::pair<std::pair<char, uint64_t>, CDiskTxPos>> &keys_found,
ryanofsky commented at 5:47 PM on August 29, 2018:In commit "Introduce address index" (3c7cc3c705b08828dc8dc919e53343df78568ebd)
Could just make this a return value instead of an output parameter. Existing return value seems redundant.
in src/index/addrindex.cpp:57 in 3c7cc3c705 outdated
52 | +BaseIndex::DB& AddrIndex::GetDB() const { return *m_db; } 53 | + 54 | +bool AddrIndex::DB::ReadAddrIndex(const uint64_t addr_id, 55 | + std::vector<std::pair<std::pair<char, uint64_t>, CDiskTxPos>> &keys_found, 56 | + const bool filter_by_value, 57 | + const uint64_t value_wanted){
ryanofsky commented at 5:48 PM on August 29, 2018:In commit "Introduce address index" (3c7cc3c705b08828dc8dc919e53343df78568ebd)
Could consolidate last two parameters into single
boost::optional<AddressId> filter_valueparam.in src/index/addrindex.cpp:124 in 3c7cc3c705 outdated
119 | + if (!g_txindex->FindTx(tx_in.prevout.hash, block_hash, tx)) { 120 | + // Both addrindex and txindex may be syncing in parallel, and addrindex might 121 | + // be ahead of txindex. We let txindex sync first so that addrindex can continue 122 | + // after it. 123 | + while (!g_txindex->IsInSyncWithMainChain()) { 124 | + MilliSleep(1000); //TODO: find a less arbitrary sleep time.
ryanofsky commented at 6:14 PM on August 29, 2018:In commit "Introduce address index" (3c7cc3c705b08828dc8dc919e53343df78568ebd)
This sleep + BlockUntilSyncedToCurrentChain call is pretty hacky but probably ok in principle. I think @jimpo could easily tell you how to improve it. I would move this code out of the
AddrIndexclass and into ag_txindex->WaitForSync()or similar method, and probably replace the sleep with a condition variable wait.Or maybe this code will be unnecessary with #14035 undo data?
in src/index/addrindex.cpp:66 in 3c7cc3c705 outdated
61 | + 62 | + iter->Seek(key_prefix); 63 | + while (iter->Valid()) { 64 | + std::pair<std::pair<char, uint64_t>, CDiskTxPos> key; 65 | + uint64_t value; 66 | + if (!iter->GetKey(key) || !iter->GetValue(value) || key.first != key_prefix) break;
ryanofsky commented at 6:22 PM on August 29, 2018:In commit "Introduce address index" (3c7cc3c705b08828dc8dc919e53343df78568ebd)
Should probably reorder this condition. It seems pointless to retrieve the value when the key is wrong, and maybe dangerous if the value is a different format and deserializing could throw an exception.
in src/rpc/rawtransaction.cpp:223 in 226eeea973 outdated
203 | @@ -203,6 +204,123 @@ static UniValue getrawtransaction(const JSONRPCRequest& request) 204 | return result; 205 | } 206 | 207 | +static UniValue searchrawtransactions(const JSONRPCRequest& request) {
ryanofsky commented at 6:32 PM on August 29, 2018:In commit "Add searchrawtransactions RPC" (226eeea9736127269058fbbf7816c7100f90974d)
Should add a basic python functional test for the new RPC method.
ryanofsky commented at 6:58 PM on August 29, 2018: memberReviewed most of the code, but just skimmed tests. It looks to me like this PR could be merged basically in its current form, so I'm curious if you're intending to make the improvements cited in the PR description above here or in a separate PRs.
I left a few minor suggestions about the code, which you should feel free to ignore. The only changes I would definitely like to see here are:
- adding some python code in
test/functional/to call the new rpc method - adding a blurb in
doc/release notes.mdto describe the feature and maybe mention some use-cases
in src/init.cpp:1630 in 4865275573 outdated
1624 | @@ -1611,6 +1625,13 @@ bool AppInitMain() 1625 | g_txindex->Start(); 1626 | } 1627 | 1628 | + if (gArgs.GetBoolArg("-addrindex", DEFAULT_ADDRINDEX)) { 1629 | + if (!g_txindex) 1630 | + InitWarning(_("-txindex must be enabled for -addrindex to index spends from addresses."));
luke-jr commented at 12:37 PM on August 30, 2018:There is no such thing as a "spend from address"
Sjors commented at 5:35 PM on September 7, 2018:Suggest dropping "to index spends from addresses"; if someone configures
addrindex=I assume they know why. Also prevents a long debate :-)in src/rpc/rawtransaction.cpp:216 in 4865275573 outdated
211 | + "\nReturns raw transactions that contain the given address and the hash of the block(s) they were found in.\n" 212 | + "\nRequires -addrindex to be enabled.\n" 213 | + "\nArguments:\n" 214 | + "1. \"address\" (string, required) The address to search for\n" 215 | + "2. \"verbose\" (bool, optional, default = false) If set to false, only returns data for hex-encoded `txid`s. \n" 216 | + "3. \"skip\" (numeric, optional, default = 0) If set, the result skips this number of initial values. \n"
luke-jr commented at 12:40 PM on August 30, 2018:skipandcountprobably make sense on an options object instead.in src/rpc/rawtransaction.cpp:217 in 4865275573 outdated
212 | + "\nRequires -addrindex to be enabled.\n" 213 | + "\nArguments:\n" 214 | + "1. \"address\" (string, required) The address to search for\n" 215 | + "2. \"verbose\" (bool, optional, default = false) If set to false, only returns data for hex-encoded `txid`s. \n" 216 | + "3. \"skip\" (numeric, optional, default = 0) If set, the result skips this number of initial values. \n" 217 | + "3. \"count\" (numeric, optional, default = 100) If set, the result will only contain this amount of values. \n"
luke-jr commented at 12:40 PM on August 30, 2018:(this would be number 4, but n/a once on an options object)
in src/index/addrindex.cpp:114 in 4865275573 outdated
109 | + for (const auto& tx : block.vtx) { 110 | + for (const auto tx_out : tx->vout){ 111 | + positions.emplace_back(GetAddrID(tx_out.scriptPubKey), pos); 112 | + } 113 | + 114 | + if (g_txindex && !tx->IsCoinBase()) {
luke-jr commented at 12:45 PM on August 30, 2018:This basically shouldn't be here. The address used to receive a coin has zero relationship with the transaction spending that coin.
If someone really cares about getting it anyway, there should be a separate input txid/index -> spend txid index.
jimpo commented at 10:45 PM on August 30, 2018:Spends from an address should at least get indexed under a separate key prefix even if it's done by the same indexer.
Using the undo files is a much better way to get this information than the txindex.
in src/index/addrindex.cpp:132 in 4865275573 outdated
127 | + // It's also possible we can't find the tx in txindex because it fell behind in 128 | + // the ValidationInterface queue. In this case we also let it finish before continuing. 129 | + g_txindex->BlockUntilSyncedToCurrentChain(); 130 | + 131 | + // If we still can't find the tx then a re-org may have happened. 132 | + if (!g_txindex->FindTx(tx_in.prevout.hash, block_hash, tx)) return false;
luke-jr commented at 12:45 PM on August 30, 2018:This looks like a very fragile assumption.
in src/index/addrindex.cpp:129 in 4865275573 outdated
124 | + MilliSleep(1000); //TODO: find a less arbitrary sleep time. 125 | + } 126 | + 127 | + // It's also possible we can't find the tx in txindex because it fell behind in 128 | + // the ValidationInterface queue. In this case we also let it finish before continuing. 129 | + g_txindex->BlockUntilSyncedToCurrentChain();
luke-jr commented at 12:45 PM on August 30, 2018:With this, why is the loop waiting for IsInSyncWithMainChain needed?
marcinja commented at 4:16 PM on August 30, 2018:BlockUntilSyncedToCurrentChainreturns false immediately ifm_syncedis false.The loop waiting for
IsInSyncWithMainChainwaits for txindex to finish inThreadSync. Once txindex finishes inThreadSyncit will be updated by ValiditionInterface callbacks from that point on.This line (since txindex must have
m_synced= true now) will actually block now until txindex syncs with the ValidationInterface queue.
jimpo commented at 10:49 PM on August 30, 2018:A better way to do it is to loop until
BlockUntilSyncedToCurrentChainbecomes true.luke-jr changes_requestedin src/index/addrindex.cpp:25 in 3c7cc3c705 outdated
20 | +std::unique_ptr<AddrIndex> g_addrindex; 21 | + 22 | +/** 23 | + * Access to the addrindex database (indexes/addrindex/) 24 | + * 25 | + * The database stores a block locator of the chain the database is synced to
jimpo commented at 6:31 PM on August 30, 2018:commit: Introduce address index
Can you move this comment to
BaseIndex::DBand delete fromTxIndex::DBas well? No need to copy it to every new index file, especially since it's handled at the base layer.in src/index/addrindex.cpp:158 in 4865275573 outdated
153 | + batch.Write(std::make_pair(std::make_pair(DB_ADDRINDEX, pos.first), pos.second), block_hash.GetUint64(0)); 154 | + } 155 | + return WriteBatch(batch); 156 | +} 157 | + 158 | +void AddrIndex::BlockDisconnected(const std::shared_ptr<const CBlock> &block) {
marcinja commented at 7:14 PM on August 30, 2018:This needs an
if (!m_synced) return;in src/index/addrindex.h:47 in 3c7cc3c705 outdated
42 | + 43 | + // Destructor is declared because this class contains a unique_ptr to an incomplete type. 44 | + virtual ~AddrIndex() override; 45 | + 46 | + /// Lookup transaction(s) by scriptPubKey. Fills txs vector with (block_hash, tx) pairs. 47 | + bool FindTxsByScript(const CScript& dest, std::vector<std::pair<uint256, CTransactionRef>> &txs);
jimpo commented at 10:47 PM on August 30, 2018:There should be a way of differentiating between txs where the script is the output and ones that spend from the address. Perhaps two methods or two output vector parameters.
in src/index/addrindex.cpp:162 in 3c7cc3c705 outdated
157 | + 158 | +void AddrIndex::BlockDisconnected(const std::shared_ptr<const CBlock> &block) { 159 | + const uint64_t block_hash_bits = block->GetHash().GetUint64(0); 160 | + std::unordered_set<uint64_t> addr_ids_to_remove; 161 | + 162 | + {
jimpo commented at 10:51 PM on August 30, 2018:Use the undo files instead of CCoinsViewCache.
in src/index/base.cpp:283 in 3c7cc3c705 outdated
277 | @@ -276,3 +278,8 @@ void BaseIndex::Stop() 278 | m_thread_sync.join(); 279 | } 280 | } 281 | + 282 | + 283 | +bool BaseIndex::IsInSyncWithMainChain() const {
jimpo commented at 10:52 PM on August 30, 2018:This shouldn't be necessary.
in src/index/base.h:65 in 3c7cc3c705 outdated
61 | @@ -61,6 +62,8 @@ class BaseIndex : public CValidationInterface 62 | void BlockConnected(const std::shared_ptr<const CBlock>& block, const CBlockIndex* pindex, 63 | const std::vector<CTransactionRef>& txn_conflicted) override; 64 | 65 | + void BlockDisconnected(const std::shared_ptr<const CBlock> &block) override;
jimpo commented at 10:52 PM on August 30, 2018:Why? The implementation is empty.
in src/index/addrindex.h:50 in 3c7cc3c705 outdated
45 | + 46 | + /// Lookup transaction(s) by scriptPubKey. Fills txs vector with (block_hash, tx) pairs. 47 | + bool FindTxsByScript(const CScript& dest, std::vector<std::pair<uint256, CTransactionRef>> &txs); 48 | + 49 | + // Returns part of key used to store information in db. 50 | + static uint64_t GetAddrID(const CScript& script);
jimpo commented at 10:56 PM on August 30, 2018:Why is this exposed? It seems internal.
in src/index/addrindex.cpp:56 in 3c7cc3c705 outdated
51 | + 52 | +BaseIndex::DB& AddrIndex::GetDB() const { return *m_db; } 53 | + 54 | +bool AddrIndex::DB::ReadAddrIndex(const uint64_t addr_id, 55 | + std::vector<std::pair<std::pair<char, uint64_t>, CDiskTxPos>> &keys_found, 56 | + const bool filter_by_value,
jimpo commented at 11:00 PM on August 30, 2018:Seems that
filter_by_valuefunctionality isn't used? Also, I'd rename to something likefilter_by_blockbecause it's not clear what value means and is tightly coupled to the database layout chosen.jimpo changes_requestedjimpo commented at 11:15 PM on August 30, 2018: contributorNice work! I'm glad someone's working on this. Concept ACK.
- The AddrIndex should return information about the outpoints and differentiate between outputs and spends, not just return the raw transactions. In fact, the AddrIndex could just return outpoints, then the client code could use the TxIndex could to fetch the tx bodies. It'd involve a separate lookup though.
- I don't think it's necessary to delete keys from the database when a block is disconnected. There's no harm in leaving it. The higher level methods to search the index can then filter for results that are on the main chain if that's what the client wants. It'd have to do this anyway to avoid races with reorgs and such.
- I'm worried about collisions on address IDs because they are only 64 bits. I can think of three options, 1) use a 32 byte cryptographic hash, 2) use a 20 byte cryptographic hash of the script plus some randomly generated, database-global salt, 3) use a 32- or 64-bit non-cryptographic hash (might as well use Murmur3 or SipHash, not truncated SHA256), then store the full script as the database value to double check against. Option 3 feels best to me.
- What's the purpose of having the first byte of the block hash as the value? It doesn't seem robust nor particularly useful.
marcinja commented at 3:19 PM on August 31, 2018: contributorThanks for all the reviews.
To answer some of @jimpo's questions: 2 & 4. I included part of the block hash so that in BlockDisconnected we ae sure to remove the entries in the index from this block only (that's where
filter_by_valueis used). The reason I chose to remove entries from the database is to prevent reading into a block file using an oldCDiskTxPosthat may no longer be a valid position. Otherwise inFindTxsByScriptyou could run into errors. You're right that this problem would be better handled by higher level methods.I think that returning just the outpoint is a better idea than the current choice so I'll switch to that and try to incorporate all the other feedback here.
in src/index/addrindex.cpp:68 in 4865275573 outdated
63 | + while (iter->Valid()) { 64 | + std::pair<std::pair<char, uint64_t>, CDiskTxPos> key; 65 | + uint64_t value; 66 | + if (!iter->GetKey(key) || !iter->GetValue(value) || key.first != key_prefix) break; 67 | + 68 | + if (!filter_by_value || (filter_by_value && value == value_wanted)) {
practicalswift commented at 7:30 AM on September 5, 2018:Redundant condition
filter_by_value: what about!filter_by_value || value == value_wantedinstead?in src/index/addrindex.cpp:146 in 4865275573 outdated
141 | + } 142 | + 143 | + return m_db->WriteToIndex(positions, block.GetHash()); 144 | +} 145 | + 146 | +bool AddrIndex::DB::WriteToIndex(const std::vector<std::pair<uint64_t, CDiskTxPos>>& positions, const uint256 block_hash)
practicalswift commented at 7:30 AM on September 5, 2018:block_hashshould be passed by const reference?in src/test/test_bitcoin.cpp:181 in 4865275573 outdated
174 | @@ -173,6 +175,76 @@ TestChain100Setup::CreateAndProcessBlock(const std::vector<CMutableTransaction>& 175 | return result; 176 | } 177 | 178 | +// Based off of BuildChain in validation_block_tests.cpp 179 | + 180 | +// Build a chain of blocks that contains all of the transactions in txns. 181 | +void TestChain100Setup::BuildChain(const uint256 prev_hash, const uint32_t prev_time, int height, std::vector<CMutableTransaction> &txns, const CScript& scriptPubKey, std::vector<std::shared_ptr<const CBlock>>& blocks) {
practicalswift commented at 7:30 AM on September 5, 2018:prev_hashshould be passed by const reference?in src/test/test_bitcoin.h:92 in 4865275573 outdated
88 | @@ -89,6 +89,15 @@ struct TestChain100Setup : public TestingSetup { 89 | CBlock CreateAndProcessBlock(const std::vector<CMutableTransaction>& txns, 90 | const CScript& scriptPubKey); 91 | 92 | + void BuildChain(const uint256 prev_hash,
practicalswift commented at 7:31 AM on September 5, 2018:prev_hashshould be passed by const reference?in src/test/test_bitcoin.cpp:228 in 4865275573 outdated
223 | + blocks.push_back(shared_pblock); 224 | + 225 | + BuildChain(blocks.back()->GetHash(), blocks.back()->nTime, height - 1, txns, scriptPubKey, blocks); 226 | +} 227 | + 228 | +void TestChain100Setup::CreateSpendingTxs(int coinbase_spent_offset, std::vector<CScript>& script_pub_keys, std::vector<CMutableTransaction> &spends, CScript coinbase_script_pub_key) {
practicalswift commented at 7:32 AM on September 5, 2018:coinbase_script_pub_keyshould be passed by const reference?in src/rpc/rawtransaction.cpp:312 in 4865275573 outdated
307 | + } 308 | + return ret; 309 | + } 310 | + 311 | + std::vector<std::pair<uint256, CTransactionRef>>::const_iterator it = result.begin(); 312 | + while (it != result.end() && skip--) it++; // Skip first set of results as needed.
practicalswift commented at 7:32 AM on September 5, 2018:Nit:
++itin src/rpc/rawtransaction.cpp:324 in 4865275573 outdated
319 | + std::string hex_tx = EncodeHexTx(*(tuple.second), RPCSerializationFlags()); 320 | + tx_val.pushKV("hex", hex_tx); 321 | + tx_val.pushKV("blockhash", tuple.first.GetHex()); 322 | + } 323 | + ret.push_back(tx_val); 324 | + it++;
practicalswift commented at 7:32 AM on September 5, 2018:Nit:
++itin src/test/test_bitcoin.cpp:194 in 4865275573 outdated
189 | + block.vtx.resize(1); 190 | + 191 | + // If this is the last block, add all remaining transactions. 192 | + // Otherwise add with some randomness. 193 | + for (auto it = txns.begin(); it != txns.end();) { 194 | + bool add_tx = (height == 1) || (GetRandInt(height) < txns.size());
practicalswift commented at 8:16 PM on September 6, 2018:test/test_bitcoin.cpp:194:60: warning: comparison of integers of different signs: 'int' and 'std::vector::size_type' (aka 'unsigned long') [-Wsign-compare] bool add_tx = (height == 1) || (GetRandInt(height) < txns.size()); ~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~
Sjors commented at 5:37 PM on September 7, 2018:@MarcoFalke why is it again that Travis doesn't fail on warnings?
in src/index/addrindex.cpp:110 in 4865275573 outdated
105 | + std::vector<std::pair<uint64_t, CDiskTxPos>> positions; 106 | + positions.reserve(2 * block.vtx.size()); // Most transactions have at least 1 input and 1 output. 107 | + 108 | + // Index addresses of spent outputs if txindex is enabled, 109 | + for (const auto& tx : block.vtx) { 110 | + for (const auto tx_out : tx->vout){
practicalswift commented at 8:17 PM on September 6, 2018:index/addrindex.cpp:110:25: warning: loop variable 'tx_out' of type 'const CTxOut' creates a copy from type 'const CTxOut' [-Wrange-loop-analysis] for (const auto tx_out : tx->vout){ ^ index/addrindex.cpp:110:14: note: use reference type 'const CTxOut &' to prevent copying for (const auto tx_out : tx->vout){ ^~~~~~~~~~~~~~~~~~~in src/index/addrindex.cpp:115 in 4865275573 outdated
110 | + for (const auto tx_out : tx->vout){ 111 | + positions.emplace_back(GetAddrID(tx_out.scriptPubKey), pos); 112 | + } 113 | + 114 | + if (g_txindex && !tx->IsCoinBase()) { 115 | + for (const auto tx_in : tx->vin) {
practicalswift commented at 8:17 PM on September 6, 2018:Same here - see above.
in src/index/addrindex.cpp:168 in 4865275573 outdated
163 | + LOCK(cs_main); 164 | + CCoinsViewCache view(pcoinsTip.get()); 165 | + 166 | + // Collect all addr_ids from txs in this block. 167 | + for (const auto& tx : block->vtx) { 168 | + for (const auto tx_out : tx->vout){
practicalswift commented at 8:18 PM on September 6, 2018:Same here - see above.
in src/index/addrindex.cpp:173 in 4865275573 outdated
168 | + for (const auto tx_out : tx->vout){ 169 | + addr_ids_to_remove.emplace(GetAddrID(tx_out.scriptPubKey)); 170 | + } 171 | + 172 | + if (!tx->IsCoinBase()) { 173 | + for (const auto tx_in : tx->vin){
practicalswift commented at 8:18 PM on September 6, 2018:Same here - see above.
in src/test/addrindex_tests.cpp:33 in 4865275573 outdated
107 | + 108 | + // Mine blocks for coinbase maturity, so we can spend some coinbase outputs in the test. 109 | + CScript coinbase_script_pub_key = CScript() << ToByteVector(coinbaseKey.GetPubKey()) << OP_CHECKSIG; 110 | + for (int i = 0; i < 20; i++) { 111 | + std::vector<CMutableTransaction> no_txns; 112 | + const CBlock& block = CreateAndProcessBlock(no_txns, coinbase_script_pub_key);
practicalswift commented at 8:18 PM on September 6, 2018:test/addrindex_tests.cpp:112:23: warning: unused variable 'block' [-Wunused-variable] const CBlock& block = CreateAndProcessBlock(no_txns, coinbase_script_pub_key); ^in src/rpc/rawtransaction.cpp:313 in 4865275573 outdated
301 | + 302 | + UniValue ret(UniValue::VARR); 303 | + std::vector<std::pair<uint256, CTransactionRef>> result; 304 | + if (!g_addrindex->FindTxsByScript(scriptPubKey, result)) { 305 | + if (!addrindex_ready) { 306 | + throw JSONRPCError(RPC_INVALID_ADDRESS_OR_KEY,"Transactions with given address not found. Blockchain transactions are still in the process of being indexed");
Sjors commented at 5:47 PM on September 7, 2018:The
Blockchain transactions are still in the process of being indexedwarning should always be shown, because there might be missing transactions while indexing is in progress (txindexdoesn't have that problem).Sjors commented at 6:13 PM on September 7, 2018: memberConcept ACK. Suggest adding
-addrindexandsearchrawtransactions <address>to the PR description for easier review.I would do "integrating output descriptors" in a different PR.
Lightly tested on macOS. It doesn't find the coinbase transaction to
mtnNvZY7iQCfzJCVon13tfJSVYDn9iiUWDin block 120,mg4bQva8w7Cjs8KaKHeEbyXUaCzoQAcXyHin block 10001 and several others.in src/index/addrindex.h:52 in 4865275573 outdated
39 | +public: 40 | + /// Constructs the index, which becomes available to be queried. 41 | + explicit AddrIndex(size_t n_cache_size, bool f_memory = false, bool f_wipe = false); 42 | + 43 | + // Destructor is declared because this class contains a unique_ptr to an incomplete type. 44 | + virtual ~AddrIndex() override;
practicalswift commented at 7:59 AM on September 23, 2018:2018-09-22 21:20:09 cpplint(pr=14053): src/index/addrindex.h:44: "virtual" is redundant since function is already declared as "override" [readability/inheritance] [4]DrahtBot added the label Needs rebase on Sep 24, 2018karelbilek commented at 6:40 AM on October 5, 2018: contributorComments to the concept
Some time ago I have rebased bitpay's bitcore patches (from their insight block explorer) to bitcoin core here (without much work on my own)
The consensus back then seemed to be that more indexes in the core is not a good thing, and work should be instead done on external indexes. However I am no sure how this PR differs, I just quickly skimmed it.
MarcoFalke commented at 8:02 AM on October 5, 2018: memberSjors commented at 10:14 AM on December 15, 2018: memberWe talked about (address) indexes during the previous IRC meeting.
marcinja force-pushed on Jan 22, 2019marcinja force-pushed on Jan 22, 2019marcinja force-pushed on Jan 22, 2019DrahtBot removed the label Needs rebase on Jan 22, 2019marcinja force-pushed on Jan 23, 2019marcinja force-pushed on Jan 23, 2019in src/test/addrindex_tests.cpp:73 in fc075a0d94 outdated
68 | + BOOST_CHECK(addr_index.BlockUntilSyncedToCurrentChain()); // Let the address index catch up. 69 | + BOOST_CHECK(chainActive.Tip()->GetBlockHash() == block_hash); // Sanity check to make sure this block is actually being used. 70 | + 71 | + // Now check that all the addresses we sent to are present in the index. 72 | + for (int i = 0; i < 10; i++) { 73 | + std::vector<std::pair<COutPoint, std::pair<CTransactionRef, uint256>>> spends;
practicalswift commented at 8:48 AM on January 24, 2019:spendsshadows a variable existing in the outer scope.in src/test/addrindex_tests.cpp:74 in fc075a0d94 outdated
69 | + BOOST_CHECK(chainActive.Tip()->GetBlockHash() == block_hash); // Sanity check to make sure this block is actually being used. 70 | + 71 | + // Now check that all the addresses we sent to are present in the index. 72 | + for (int i = 0; i < 10; i++) { 73 | + std::vector<std::pair<COutPoint, std::pair<CTransactionRef, uint256>>> spends; 74 | + std::vector<std::pair<COutPoint, std::pair<CTransactionRef, uint256>>> creations;
practicalswift commented at 8:48 AM on January 24, 2019:Same here but for
creations.marcinja force-pushed on Jan 24, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019MarcoFalke deleted a comment on Jan 25, 2019DrahtBot added the label Needs rebase on Jan 30, 2019marcinja force-pushed on Jan 30, 2019marcinja force-pushed on Jan 30, 2019marcinja force-pushed on Jan 30, 2019DrahtBot removed the label Needs rebase on Jan 30, 2019marcinja force-pushed on Jan 30, 2019marcinja commented at 10:19 PM on January 30, 2019: contributorLightly tested on macOS. It doesn't find the coinbase transaction to
mtnNvZY7iQCfzJCVon13tfJSVYDn9iiUWDin block 120,mg4bQva8w7Cjs8KaKHeEbyXUaCzoQAcXyHin block 10001 and several others.I'm confused at what you're referring to by
mtn...andmg4.... The outputs for those two coinbase transactions are P2PK, so the only way to search for them is by scriptPubKey. I used the hex value of the scriptPubKey from here and was able to find it.DrahtBot added the label Needs rebase on Feb 19, 2019marcinja force-pushed on Mar 6, 2019DrahtBot removed the label Needs rebase on Mar 6, 2019DrahtBot added the label Needs rebase on Mar 18, 2019marcinja force-pushed on Mar 20, 2019DrahtBot removed the label Needs rebase on Mar 20, 2019DrahtBot added the label Needs rebase on Mar 20, 2019marcinja force-pushed on Mar 20, 2019marcinja force-pushed on Mar 20, 2019marcinja force-pushed on Mar 20, 2019DrahtBot removed the label Needs rebase on Mar 21, 2019DrahtBot added the label Needs rebase on Mar 21, 2019ryanofsky commented at 9:36 PM on July 17, 2019: memberMaybe tag this needing concept ack.
This would probably be pretty easy to rebase, and has a previous concept ack from jimpo, but it's not clear if other people want this change or might object to it.
MarcoFalke added the label Up for grabs on Jul 18, 2019fanquake added the label Needs Conceptual Review on Aug 5, 2019dan-da commented at 6:50 AM on August 7, 2019: noneI for one would be very happy to see an optional addr index + api finally available in bitcoin-core, and not as a separate patch or hacky thing. @marcinja is there an API in your PR that accepts multiple addresses as input and returns total received/spent for each? Some block explorers have this functionality and it is great because much more efficient than calling API for each addr individually.
Also I think it should be caller's option (via param) whether to return list of tx or not. Sometimes only addr summary info is needed eg total received/spent and tx info just adds bloat and slows things down in such cases.
marcinja force-pushed on Aug 16, 2019marcinja force-pushed on Aug 16, 2019marcinja force-pushed on Aug 16, 2019marcinja force-pushed on Aug 16, 2019DrahtBot removed the label Needs rebase on Aug 16, 2019fanquake added this to the "Chasing Concept ACK" column in a project
MarcoFalke removed the label Up for grabs on Oct 16, 2019DrahtBot added the label Needs rebase on Oct 16, 2019gmaxwell commented at 11:12 AM on November 20, 2019: contributorI haven't looked at how this is implemented, at all-- but the goal is one that is extraordinarily useful and addresses a severe and growing usability probable in the software that has burned me many times.
ryanofsky commented at 2:48 PM on November 21, 2019: memberI haven't looked at how this is implemented, at all-- but the goal is one that is extraordinarily useful and addresses a severe and growing usability probable in the software that has burned me many times.
Sorry for the dumb question, but what's the severe usability problem?
ryanofsky commented at 8:38 PM on November 25, 2019: memberMost recent IRC discussion I could find about address indexes, http://www.erisian.com.au/bitcoin-core-dev/log-2018-12-13.html#l-601 seemed to have conclusion (from @jamesob) with: "I'll report on any compelling usecases I find for addr index, but I suspect sipa et al are right that that's usually just the Wrong way"
dan-da commented at 8:52 PM on November 25, 2019: none"usually just the Wrong way"
This sounds like an ivory tower attitude to me. So tell us wise ones, if an addrindex is the wrong way, then what exactly is the right way? Because thus far bitcoin-core's answer seems to be: don't lookup transactions related to an address AT ALL unless they are in your own wallet, which is useless for anyone trying to provide address lookup as a service. They then must either run a patched bitcoind or turn to btcd or some other 3rd party solution.
ryanofsky commented at 8:58 PM on November 25, 2019: memberwhat exactly is the right way?
The right way to do what? Can you provide more detail? I think it'd be good to keep the discussion as concrete as possible and discuss specific problems and use cases.
dan-da commented at 9:19 PM on November 25, 2019: noneok, here are two tools I alone have written that need something like searchrawtransactions().
hd-wallet-addrs - a tool for discovering all hd-wallet addresses that have ever been used. So it derives [receive,change] and checks if each address has ever received funds, up to gap limit. Presently it must rely on 3rd party block explorers APIs.
bitprices A tool for querying all transactions for all used addresses in a wallet (or any group of addresses) and generating reports including historical prices on the date of each transaction. Useful for audits and finding gain/loss, etc after the fact. Presently btcd has the best/fastest implementation, but only after I submitted patches for searchrawtransactions.
Here is a website frontend.
Note that the above tools work with any arbitrary seeds or addresses and are expected to return results quickly, so creating a watch-only wallet for each request is not a good option, and anyway it would then just be discarded immediately, so why bother?
The first two links above actually link to discussions of the various API providers and their various strengths/weaknesses for these purposes.
Others have their own reasons. afaict, most block explorers or bitcoin API providers need some sort of address index and are rolling their own somehow.
ryanofsky commented at 9:37 PM on November 25, 2019: memberThanks. From what I can gather the tools described in #14053 (comment) are accounting / auditing tools useful for someone running the bitcoin wallet, and this PR would let people who want to use these tools run them locally instead of relying on centralized third party services.
It seems like a good argument in favor of this feature.
I'm not sure what the better arguments against this feature are, so it would be helpful if someone wanted to articulate them. One argument might be that we don't want people getting used to using tools like these, because eventually in the future historical indexes will be less able to scale, and people who are using these tools in their workflow will be forced to turn to centralized third party services.
jnewbery commented at 10:07 PM on November 25, 2019: memberConcept ACK. The main argument against adding an address index to Bitcoin Core is that people/services will come to rely on them, and they're not scalable long term. My response to that is that people are already using address indexes and block explorers. Providing/not providing that functionality in Bitcoin Core isn't going to change that.
The new indexing infrastructure means that this can be done in a very unobtrusive way. The index has no direct interaction at all with validation or networking. Instead it hooks into the validation interface and runs in a background thread, and so should have no performance impact when the node is sync'ed (I expect this might slow down IBD as
ActivateBestChain()gets blocked onLimitValidationInterfaceQueue(), but presumably users who are indexing the entire transaction history are prepared to wait a bit longer).jgarzik commented at 10:08 PM on November 25, 2019: contributoraddr-index is the most frequently patched-in change to Core, across the years, IME.
dan-da commented at 10:21 PM on November 25, 2019: nonequerying a 3rd party entails a loss of privacy and requires trusting the 3rd party to return truthful data.
jamesob commented at 4:45 PM on November 26, 2019: memberConcept ACK.
I know I've flip-flopped on this feature historically, but I agree very much with the points that @jnewbery makes. Trading the maintenance burden of ~800 lines of relatively non-consensus-risky indexing code for some very practical gains in user privacy is compelling to me. Even with the caveat that we don't think some of these uses are architecturally sound for years and years.
One might make the argument that there's a slippery slope here - if we consent to this index now, would demand for yet another index come along, and the next and so on? I don't think so. In the few years I've followed this, there has been reliable demand for an address index (and numerous community implementations of varying repute) and nothing else - aside of course from the compact block index - so I don't think a slippery-slope argument holds. @marcinja do you have time to rebase this PR?
marcinja commented at 11:18 PM on November 26, 2019: contributorYes, I will rebase this and clean it up a bit. Thanks for the conceptual review.
ryanofsky commented at 2:18 PM on November 27, 2019: memberMaybe we should log a warning when the address index is enabled, like:
<dl><dd>Warning: built-in address index is enabled! Address indexing is going to become less scalable as transaction history increases, and will eventually need to be removed from [PACKAGE_NAME] and replaced by a dedicated external index. Users relying on the address index for accounting purposes are advised to track metadata in real time so relying on a historical index is not necessary.</dd></dl>
Could also add a coda if the built in wallet is enabled:
<dl><dd>If using the address index to work around lack of tagging or notifications from the built-in [PACKAGE_NAME] wallet, please file wallet feature requests: [PACKAGE_BUGREPORT]</dd></dl>
The idea would be to give end users relying on third party accounting tools a way of knowing about the limitations of those tools, and encouraging authors to implement more scalable approaches.
promag commented at 11:51 PM on December 1, 2019: memberConcept ACK.
marcinja force-pushed on Dec 3, 2019DrahtBot removed the label Needs rebase on Dec 3, 2019marcinja force-pushed on Dec 3, 2019marcinja force-pushed on Dec 4, 2019marcinja force-pushed on Jan 2, 2020marcinja force-pushed on Jan 2, 2020marcinja force-pushed on Jan 2, 2020jnewbery added the label Review club on Jan 3, 2020marcinja force-pushed on Jan 3, 2020ryanofsky commented at 4:31 PM on January 6, 2020: memberNice set of notes on this PR at https://bitcoincore.reviews/14053.html
in src/Makefile.am:145 in c21767116b outdated
134 | @@ -135,6 +135,8 @@ BITCOIN_CORE_H = \ 135 | httpserver.h \ 136 | index/base.h \ 137 | index/blockfilterindex.h \ 138 | + index/addrindex.h \ 139 | + index/disktxpos.h \
jnewbery commented at 9:14 PM on January 7, 2020:nit: this is added in the wrong commit (should be in Move only: Move CDiskTxPos to own file, not Add address index)
in src/index/addrindex.h:32 in c21767116b outdated
27 | + 28 | +private: 29 | + const std::unique_ptr<DB> m_db; 30 | + 31 | + // m_hash_seed is used by GetAddrID in its calls to MurmurHash3. 32 | + // It is stored in the index, and restored from their on construction
jnewbery commented at 9:19 PM on January 7, 2020:spelling: there
in src/index/addrindex.cpp:23 in c21767116b outdated
18 | + 19 | +#include <boost/thread.hpp> 20 | + 21 | +std::unique_ptr<AddrIndex> g_addr_index; 22 | + 23 | +static constexpr char DB_ADDR_INDEX = 'a';
jnewbery commented at 9:38 PM on January 7, 2020:I'm not sure if this is actually needed. The other indexes have different prefixes for the different object types that they store, but all objects in this index are given the same prefix, so is it necessary?
in src/index/addrindex.cpp:126 in c21767116b outdated
121 | + DBKey key; 122 | + DBValue value; 123 | + if (!iter->GetKey(key) || key.m_addr_id != addr_id || !iter->GetValue(value) ) break; 124 | + 125 | + // Check that the stored script matches the one we're searching for, in case of hash collisions. 126 | + if (value.second != script) continue;
jnewbery commented at 9:44 PM on January 7, 2020:Does this branch need to advance the iterator? If not, I think
valuewill be the same the next time round and we'll never break out of this loop.I think the following is what we want:
... if (value.second == script) { result.emplace_back(std::make_pair(key, value)); } iter->Next(); }in src/index/addrindex.cpp:178 in c21767116b outdated
173 | +{ 174 | + CBlockUndo block_undo; 175 | + CDiskTxPos pos(pindex->GetBlockPos(), GetSizeOfCompactSize(block.vtx.size())); 176 | + std::vector<std::pair<DBKey, DBValue>> entries; 177 | + 178 | + const bool not_genesis_block = (pindex->nHeight > 0);
jnewbery commented at 9:48 PM on January 7, 2020:style nit: having a variable defined as
not_thingseems unintuitive to me. I think it'd be better to definegenesisand then test on!genesis.in src/index/addrindex.cpp:209 in c21767116b outdated
192 | + // Skip coinbase inputs. 193 | + if (not_genesis_block && i > 0) { 194 | + const CTxUndo& tx_undo = block_undo.vtxundo[i-1]; 195 | + for (size_t k = 0; k < tx.vin.size(); ++k) { 196 | + CScript spent_outputs_scriptpubkey = tx_undo.vprevout[k].out.scriptPubKey; 197 | + DBKey key(DBKeyType::SPENT, GetAddrId(spent_outputs_scriptpubkey), tx.vin[k].prevout);
jnewbery commented at 9:57 PM on January 7, 2020:I don't think this isn't the information that we want to save in the address index. For a given scriptPubKey, a user wants to know:
- which txouts (txid, output index) spent to that scriptPubKey (the CREATED DBKeyType above)
- which txins (txid, input index) consume UTXOs for that scriptPubKey (the SPENT DBKeyType here)
So here, I think you want to save the txid and input index spending the coin. You're actually saving the txid and output index that creates the coin, because you're using the prevout.
(I'm not entirely sure about this. Perhaps you are trying to return the outpoint that created the coin in the spent outputs array, but that's not clear to me from the RPC documentation).
narula commented at 5:50 PM on January 8, 2020:I think this is correct because he is using the prevout in the key. When looking up by this script with this outpoint, one would want to find this transaction because it spends this outpoint.
AFAICT the value just contains the position of the relevant transaction on disk and the scriptPubKey (to detect collisions). The value does not contain any indexes into inputs or outputs.
narula commented at 5:55 PM on January 8, 2020:Can you add a comment with your key/value format somewhere and the justification?
in src/rpc/rawtransaction.cpp:227 in c21767116b outdated
222 | + "\nRequires -addrindex to be enabled.\n" 223 | + "\nArguments:\n", 224 | + { 225 | + {"address", RPCArg::Type::STR_HEX, RPCArg::Optional::NO, /* default_val */ "", "address or scriptPubKey"}, 226 | + {"verbose", RPCArg::Type::BOOL, RPCArg::Optional::OMITTED, /* default_val */ "false", "If false, return hex-encoded tx, otherwise return a json object"}, 227 | + {"count", RPCArg::Type::NUM, RPCArg::Optional::OMITTED, /* default_val */ "", "The block in which to look for the transaction"},
jnewbery commented at 10:02 PM on January 7, 2020:"The block in which to look for the transaction" is wrong
andrewtoth commented at 2:50 AM on January 8, 2020:The default value is implicitly 100. It should be explicitly defined here.
in src/rpc/rawtransaction.cpp:232 in c21767116b outdated
227 | + {"count", RPCArg::Type::NUM, RPCArg::Optional::OMITTED, /* default_val */ "", "The block in which to look for the transaction"}, 228 | + }, 229 | + { 230 | + RPCResult{"if verbose is not set or set to false", 231 | + "\nResult:\n" 232 | + " [ (array of json objects)\n"
jnewbery commented at 10:10 PM on January 7, 2020:I suggest you make this an object with two keys:
createsandspendsjnewbery commented at 10:20 PM on January 7, 2020: memberI've done a preliminary review and left a few comments.
I think this PR could use:
- some design documentation - what's being stored as keys and values in the addr index database and why, and what values are being returned to the RPC user. This could be a code comment at the top of
addrindex.cpporaddrindex.h. - more test cases, particularly of the spent outputs values (there aren't currently any tests for that) and persistence across stop-start. The functional test can be moved to its own file, rather than being added to
rpc_rawtransaction.py.
in src/rpc/rawtransaction.cpp:218 in c21767116b outdated
213 | @@ -213,6 +214,142 @@ static UniValue getrawtransaction(const JSONRPCRequest& request) 214 | return result; 215 | } 216 | 217 | +static UniValue searchrawtransactions(const JSONRPCRequest& request) { 218 | + if (request.fHelp || request.params.size() < 1 || request.params.size() > 4)
andrewtoth commented at 2:45 AM on January 8, 2020:if (request.fHelp || request.params.size() < 1 || request.params.size() > 3)
MarcoFalke commented at 5:56 PM on January 10, 2020:Should use the new
.Check()syntax insteadin src/index/addrindex.cpp:220 in c21767116b outdated
215 | + 216 | +// FindTxsByScript fills the spends_result vector with outpoints corresponding 217 | +// to the output spent with the given script, and the transaction it was spent 218 | +// in. creations_result is filled with outpoints for outputs created with this 219 | +// script as their script pubkey, and the transactions they were created in. 220 | +bool AddrIndex::FindTxsByScript(const CScript& script,
andrewtoth commented at 3:09 AM on January 8, 2020:Shouldn't this method take the count as a parameter and return early when enough entries are found?
in src/rpc/rawtransaction.cpp:339 in c21767116b outdated
321 | + std::string hex_tx = EncodeHexTx(*(spend.second.first), RPCSerializationFlags()); 322 | + tx_val.pushKV("hex", hex_tx); 323 | + } 324 | + spends.push_back(tx_val); 325 | + ++spends_it; 326 | + }
andrewtoth commented at 3:13 AM on January 8, 2020:If there are more spends than
countthen no creates will be returned. Ideally the spends and creates will be returned in order, rather than all spends first.in src/rpc/rawtransaction.cpp:223 in c21767116b outdated
213 | @@ -213,6 +214,142 @@ static UniValue getrawtransaction(const JSONRPCRequest& request) 214 | return result; 215 | } 216 | 217 | +static UniValue searchrawtransactions(const JSONRPCRequest& request) {
andrewtoth commented at 3:14 AM on January 8, 2020:There doesn't seem to be a way to skip transactions anymore. Was this removed intentionally?
in src/index/addrindex.cpp:123 in e739288bd5 outdated
118 | + std::unique_ptr<CDBIterator> iter(NewIterator()); 119 | + iter->Seek(search_key); 120 | + while (iter->Valid()) { 121 | + DBKey key; 122 | + DBValue value; 123 | + if (!iter->GetKey(key) || key.m_addr_id != addr_id || !iter->GetValue(value) ) break;
narula commented at 4:31 PM on January 8, 2020:Do you want to log or throw something here? If I understand the LevelDB docs correctly, you have what LevelDB says is a valid iterator which should be positioned at the key. I think if any of these return false you'd want to know as the DB could be corrupted.
narula commented at 4:40 PM on January 8, 2020:I think in general it would be nice if this class did some of the error logging that I see in
blockfilterindex.cpp.in test/functional/rpc_rawtransaction.py:494 in c21767116b outdated
485 | @@ -487,6 +486,44 @@ def run_test(self): 486 | assert_equal(testres['allowed'], True) 487 | self.nodes[2].sendrawtransaction(hexstring=rawTxSigned['hex'], maxfeerate='0.20000000') 488 | 489 | + self._test_searchrawtransactions() 490 | + 491 | + def _test_searchrawtransactions(self):
narula commented at 5:31 PM on January 8, 2020:Please also test the spends_results path (returning transactions which spend from this address).
jonatack commented at 6:00 PM on January 8, 2020: memberStill gathering context here in light of the Concept ACKs others have given. Nevertheless, on first look I'm finding it difficult to ack the concept of investing resources on reviewing, adding, and maintaining a feature that will need to be removed down the road in favor of a dedicated external index by interested parties.
fjahr commented at 6:39 PM on January 8, 2020: memberFor me what this PR is missing some analysis on the current size of the index in mainnet and the growth rate of the data. Has anyone run the numbers recently and can report?
in src/rpc/rawtransaction.cpp:314 in c21767116b outdated
309 | + while (spends_it != spends_result.end() && count--) { 310 | + const auto& spend = *spends_it; 311 | + UniValue tx_val(UniValue::VOBJ); 312 | + 313 | + UniValue outpoint_val(UniValue::VOBJ); 314 | + outpoint_val.pushKV("txid", spend.first.hash.GetHex());
marcinja commented at 6:53 PM on January 8, 2020:outpoint_val.pushKV("txid", spend.second.first.hash.GetHex());I'll write a functional test that catches this too :)
in src/index/addrindex.cpp:267 in c21767116b outdated
246 | + std::pair<CTransactionRef, uint256> result = std::make_pair(tx, header.GetHash()); 247 | + 248 | + // Place entry into correct vector depending on its type. 249 | + switch (key.m_key_type) { 250 | + case DBKeyType::SPENT: 251 | + spends_result.emplace_back(std::make_pair(key.m_outpoint, result));
narula commented at 6:56 PM on January 8, 2020:As pointed out by jnewbery in review club, I think this is wrong.
The DB's key always represents the outpoint that created the scriptPubKey. For spends_result, you want the txid of the transaction that spends the scriptPubKey. So instead of the DB key's outpoint as the transaction id to be returned to the user, you want the transaction id of the transaction in the DB value. Or, in the RPC code lines 314 and 315, use the txid from
spend.second.Edit: Or maybe this field of the returned result is always the created_outpoint, even though it's called the spent_outpoint sometimes? Would be good to clarify this.
andrewtoth commented at 2:49 AM on January 10, 2020: contributorRan on i7-8750H, took about 48 hours on an already synced node. Took 200 GB disk space.
2020-01-08T02:22:01Z addr_index thread start ... 2020-01-10T02:37:07Z addr_index is enabled at height 611001 2020-01-10T02:37:07Z addr_index thread exit$ du -h addr_index 200G addr_indexDrahtBot added the label Needs rebase on Jan 22, 2020luke-jr commented at 3:07 AM on January 29, 2020: memberI think @jnewbery was hinting at this in the PR Review Club as well: it seems like this index does not handle reorgs. BaseIndex handles this through Rewind which is not overridden.
Handling reorgs might actually be undesirable for this... but changing CDiskTxPos like I just suggested to a block height could be incompatible with saving references to blocks not in the main chain too. (Maybe storing height OR blockhash would work?)
marcinja force-pushed on Feb 12, 2020DrahtBot removed the label Needs rebase on Feb 12, 2020marcinja force-pushed on Feb 12, 2020DrahtBot added the label Needs rebase on Apr 6, 2020baso10 commented at 8:18 AM on June 28, 2020: noneAny update when this could be merged?
fjahr commented at 9:49 AM on June 28, 2020: memberAny update when this could be merged?
I kind of thought that this was still in conceptual review but it has received 5 Concept ACKs from experienced Core developers and I did not see much of the push back this has received historically, so I tend to believe this can be moved out of Conceptual review phase and move forward. There is still a lot of work to be done in terms of testing, documentation etc. @marcinja If you are currently unable to work on this I would be happy to take this over since I have some experience with indexes already :)
luke-jr commented at 10:00 AM on June 28, 2020: memberPast NACKs don't disappear just because a new PR is opened...
fjahr commented at 10:12 AM on June 28, 2020: memberPast NACKs don't disappear just because a new PR is opened...
Definitely agree, especially if the PRs are very close. But I think if a new PR is opened with a different approach it can still be expected for the reviewers who gave NACKs in the past, to participate in the discussion again and reinstate their NACK or someone else who agrees with their view should do it for them. Otherwise, we might never be able to move forward if a concept is NACKed and then the reviewer leaves the project or just does not comment on the new approach. But it would probably be good if @marcinja would address the past NACKs directly and give those reviewers another chance to say if they have changed their minds or not.
MarcoFalke commented at 11:22 AM on June 28, 2020: memberConcept ACK
jonatack commented at 11:49 AM on June 28, 2020: memberAny update when this could be merged? @baso10
This PR needs to be rebased, as you can see in #14053 (comment)
The PR is still going through peer review -- see the discussion above
See https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#peer-review for more info about the peer review process in this project
jonatack commented at 11:49 AM on June 28, 2020: memberA good resource for more context and history and reasons for and against this addition: https://bitcoincore.reviews/14053.
marcinja force-pushed on Jul 8, 2020marcinja force-pushed on Jul 8, 2020DrahtBot removed the label Needs rebase on Jul 8, 2020marcinja force-pushed on Jul 8, 2020marcinja force-pushed on Jul 8, 2020DrahtBot added the label Needs rebase on Jul 14, 2020marcinja force-pushed on Jul 14, 2020DrahtBot removed the label Needs rebase on Jul 15, 2020marcinja commented at 1:54 AM on July 15, 2020: contributorHey everyone. Just wanted to give a small update here. I am still planning on keeping this PR up-to-date and rebased. I've realized it's more annoying to rebase after procrastinating, and also it's not fair to the reviewers and other people interested so I will be much more diligent about rebasing now.
The total size of the address index right now is 223GB. I don't have a good answer on index sync time because I was re-syncing on a laptop. This was done on the 8f3fc6b4b8fb1a2db3035398d3dd5a2ed91cc3f9 branch over the course of a few days. I'm going to time an IBD and addrindex sync on a desktop with a good SSD so as to get meaningful results.
The good news is that
searchrawtransactionsis not prohibitively slow.time ./src/bitcoin-cli searchrawtransactions 3EiAcrzq1cELXScc98KeCswGWZaPGceT1d true 30000 > big-search ./src/bitcoin-cli searchrawtransactions 3EiAcrzq1cELXScc98KeCswGWZaPGceT1d 4.79s user 1.95s system 49% cpu 13.575 total(the file
big-searchcreated here is about 366MB)I have also confirmed that the address index has the same number of created/spent outpoints as on this explorer: https://blockstream.info/address/3EiAcrzq1cELXScc98KeCswGWZaPGceT1d
DrahtBot added the label Needs rebase on Jul 30, 2020marcinja force-pushed on Jul 30, 2020DrahtBot removed the label Needs rebase on Jul 31, 2020in src/index/addrindex.cpp:41 in 75e044564b outdated
36 | + * key_type is SPENT when the outpoint stored in the key is spent, i.e. it is 37 | + * used as a prevout in a transaction input. It is CREATED when the outpoint is 38 | + * created as a new COutpoint in that transaction. 39 | + * 40 | + * outpoints are stored in the key as opposed to in the DB value to preserve 41 | + * uniqueness and to support multiple values for a a single addr_id and key_type
romanz commented at 5:16 PM on August 6, 2020:* uniqueness and to support multiple values for a single addr_id and key_typein src/index/addrindex.cpp:30 in 75e044564b outdated
25 | + * stores one unique global value under the DBKeyType::SEED key that seeds the 26 | + * MurmurHash3 hasher used to create AddrIds. 27 | + * 28 | + * The DB keys are structured as follows: <addr_id, key_type, outpoint> 29 | + * 30 | + * addr_id is the hash of the script_pub_key computed using MurmurHash3, a a
romanz commented at 5:16 PM on August 6, 2020:* addr_id is the hash of the script_pub_key computed using MurmurHash3, aDrahtBot added the label Needs rebase on Aug 13, 2020marcinja force-pushed on Aug 15, 2020marcinja force-pushed on Aug 15, 2020DrahtBot removed the label Needs rebase on Aug 15, 2020MarcoFalke referenced this in commit 1bc8e8eae2 on Aug 17, 2020DrahtBot added the label Needs rebase on Aug 17, 2020sidhujag referenced this in commit 4e2e81b871 on Aug 18, 2020marcinja force-pushed on Aug 19, 2020DrahtBot removed the label Needs rebase on Aug 19, 2020laanwj referenced this in commit 27eeb0337b on Aug 20, 2020sidhujag referenced this in commit defff93348 on Aug 20, 2020fanquake referenced this in commit 0d9e14a646 on Aug 21, 2020DrahtBot added the label Needs rebase on Aug 21, 2020sidhujag referenced this in commit e954a012af on Aug 21, 2020marcinja force-pushed on Aug 22, 2020marcinja force-pushed on Aug 22, 2020DrahtBot removed the label Needs rebase on Aug 22, 2020DrahtBot added the label Needs rebase on Sep 22, 2020marcinja force-pushed on Sep 30, 2020marcinja force-pushed on Sep 30, 2020DrahtBot removed the label Needs rebase on Sep 30, 20204c4c0cf239Add address index
Adds index that relates scriptPubKeys to location of transactions in the filesystem, along with the hash of the block they are found in, and the outpoint information of the txout with the related script.
545bc46a22Initialize address index
Setup address index in initialization process. Add initialization warning and wallet feature request warning as suggested by ryanofsky.
Add address index test 0f85eb9043cd274024b8Add searchrawtransactions RPC
Adds searchrawtransactions RPC that uses the address index to lookup transactions and outpoints by script and address. Adds basic functional tests for searchrawtransactions.
marcinja force-pushed on Oct 1, 2020c78867886 commented at 7:01 AM on October 13, 2020: noneCan someone plz approve this?
decryp2kanon commented at 1:48 AM on November 8, 2020: contributorConcept ACK
Talkless commented at 12:47 PM on December 13, 2020: none@marcinja will this allow other wallets like Electrum to utilize this feature (RPC credentials provided, of course) and avoid having to run ElecturmX (https://github.com/spesmilo/electrumx), EPS (https://github.com/chris-belcher/electrum-personal-server) or BWT (https://github.com/shesek/bwt) intermediary software? @ecdsa @SomberNight @chris-belcher @shesek could you provide your input on how this feature might be useful (or not)?
romanz commented at 2:30 PM on December 13, 2020: contributorThe main issue AFAIU is that Electrum is using
SHA256(scriptPubKey)while this PR is usingMurmurHash3(scriptPubKey). Also, ElectrumX & electrs are using RocksDB for the index storage - resulting in better performance and disk usage (compared to LevelDB).SomberNight commented at 3:04 PM on December 13, 2020: contributor@Talkless note that while some Electrum users run their own bitcoind, many do not. Electrum wants to support both use cases, and in fact the suspicion is that most users just use a public server. When using a public server, the client cannot use bitcoind RPC, hence in that case I don't see how the middleware (e.g. ElectrumX) could be avoided.
For the own bitcoind use case, maybe the client could have another optional mode of operation where it uses bitcoind RPC directly, which is I guess what you are asking about. For that, an address-index in bitcoind is the main thing missing indeed, however not the only one. For one, the electrum protocol (the client<->"middleware server" connection) has address subscriptions - the client gets a notification when a history of one its addresses changes. We are also planning on soon adding another method into the protocol that allows
txoutpoint->spender_txidlookups (and notifications); I guess that could be implemented using the index in this PR albeit in a very inefficient way for heavily reused addresses.IMHO there are multiple upsides for having this middleware setup for Electrum:
- for the project, it keeps the codebase simpler (again, we want to support users without own bitcoind)
- for the project, it also allows for more flexibility for implementing new functionality: just consider present PR here, we have needed such an index for 9 years but bitcoind did not have it or want it, so we could just implement it ourselves
- for the server operator, even if they don't want to open up the server for the public, they could share it with their friends and family. I don't think that's feasible with bitcoind RPC. I think this is a common use case.
Nevertheless if someone steps up and contributes patches, this kind of thing could be added.
The total size of the address index right now is 223GB
That sounds much larger than expected. Even with the
txoutpoint->spender_txidmap I mentioned above, when using LevelDB, ElectrumX uses around 90 GiB of disk space.The DB keys are structured as follows:
<addr_id, key_type, outpoint>The DB values are simply:<CDiskTxPos, CScript>Instead of having both
addr_idandCScript, why not just put a long hash, e.g.sha256(CScript)into the key?Also, I expect most users of this index would also want txindex enabled. You might want to consider making address index dependent on txindex. Have you investigated how much space that would save? There would be no need to store
CDiskTxPos.Another trick that ElectrumX uses is that only the best chain is indexed. We have a
tx_num->txidmap as raw files on disk. Atx_numis the index of the transaction in the linear history of the blockchain, a 5-byte integer (so e.g. the genesis block coinbase tx hastx_num=0). This map uses around 17 GiB atm. With this, you can encode the txid part of theoutpointas 5 bytes instead of 32 bytes.laanwj removed this from the "Chasing Concept ACK" column in a project
MarcoFalke commented at 7:09 PM on December 17, 2020: memberConcept ACK (might have already done that)
sipa commented at 7:25 PM on December 17, 2020: memberI'm concept -0 on this.
My primary objection is that I think it's a bad idea for any infrastructure to be built all that relies on having fully-indexed blockchain data available (this also applies to txindex, but we can't just remove support...).
However, it seems many people want something like this, and are going to use it anyway. The question is then whether it belongs in the bitcoin-core codebase. Alternative, and more performant presumably, like electrs exist already too, so it isn't exactly impossible to do this elsewhere.
Still, given that we now have the indexes infrastructure, it means that things like this are easy to add in a fairly modular way without invading consensus code. So if people really want this, fine.
Overall approach comment: I don't think MurmurHash should be used for anything new; there are strictly better hash functions available. I'd suggest SipHash if that's fast enough.
jamesob commented at 7:42 PM on December 17, 2020: memberI'm also a little more negative on having this in Core than I previously was. After working in a few industrial contexts on wallet stuff, it's clear to me that an address index is really only required if you want to implement a block explorer or do chain analysis. For both of these applications, using something like electrs seems sufficient.
For personal wallet management, a full address index is not required. I think the origin of some confusion is that things like the Electrum Personal Server have become synonymous with this kind of usage, but in reality a full index is overkill when descriptor-specific rescans can be done once for a historical backfill and then per-block scanning can be done from there on out.
I want to point out that this is a nice implementation and good work by @marcinja, but I'm leaning slightly against the inclusion of such an index in Core at this point.
jonasschnelli commented at 7:48 PM on December 17, 2020: contributorMy primary objection is that I think it's a bad idea for any infrastructure to be built that relies on having fully-indexed blockchain data available.
I agree on this.
IMO the only use cases to ever use a full address index are:
- Instant seed/xpriv backup recovery including spent history
- Backend service for thousands of wallets
- Debug/explore purposes
1 (instant backup recovery) could be solved with either
scantxoutset(take a minute or two) or by an address-index for the utxo set only. But both would not restore the spent history. A scalable non-enterprise solution to restore the spent history is using blockfilters. Scan through the filters and rescan only the relevant blocks (a matter of minutes), see #20664.2 (a backend for thousands of wallets): out of scope for this project.
3 (explore purposes): I think this is a valid use case. Though adding this PR to Bitcoin Core will lead to many many projects using it in production increasing the traffic in this project and eventually steal time from existing contributors (rebase, maintenance, drag-along)
My main fear is that people are going to use this index (a full address index) to use it as an electrum(ish) backend for a handful of wallets.
With multiwallet, watch-only-wallets, PSBT, we have all tools to server multiple wallets in a scalable way for external applications.
I also think merging this as it is, would be in contradiction to the process- and repository-separation effort.
Therefore I'm ~0 (slightly towards NACK) to add this. If this would be in another repository (still under
bitcoin/*) and process separated, I would ACK it.marcinja commented at 4:21 PM on January 4, 2021: contributorHi all, thanks for the feedback and review. This was an enjoyable PR to work on and I learned a lot from all your comments.
I'm closing this PR because its size probably requires stronger support from contributors to get in. It also seems more clear now that all of the practical use-cases are covered by existing features and some lightweight alternatives (https://github.com/bitcoin/bitcoin/pull/20664) .
I also agree that it would be bad to incentivize using an address index to support an external electrum wallet, when it's not the intended use-case and would cause unnecessary burden on contributors and maintainers in this project, e.g. from users of those wallets wanting new features or updates.
marcinja closed this on Jan 4, 2021vijaydasmp referenced this in commit 909c5fa563 on Sep 15, 2021vijaydasmp referenced this in commit f3165ddc45 on Sep 16, 2021PastaPastaPasta referenced this in commit 0e3b75d391 on Sep 17, 2021PastaPastaPasta referenced this in commit 4526efc6ac on Sep 17, 2021vijaydasmp referenced this in commit 736421f02e on Sep 20, 2021vijaydasmp referenced this in commit a97683ff3a on Sep 20, 2021vijaydasmp referenced this in commit feddd3c0c2 on Sep 20, 2021vijaydasmp referenced this in commit 71df8f25db on Sep 20, 2021vijaydasmp referenced this in commit a489420e16 on Sep 22, 2021vijaydasmp referenced this in commit 829aeff0f6 on Sep 22, 2021kittywhiskers referenced this in commit c7dff0d165 on Oct 12, 2021kittywhiskers referenced this in commit c1d2c79f4d on Oct 12, 2021gades referenced this in commit bef7b2508d on May 2, 2022gades referenced this in commit cf54acf2ce on May 2, 2022DrahtBot locked this on Aug 16, 2022Contributors
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-02 15:15 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me