txoutsbyaddress index (take 2) #8660

pull djpnewton wants to merge 11 commits into bitcoin:master from djpnewton:cozz8 changing 23 files +1274 −95
  1. djpnewton commented at 10:12 am on September 3, 2016: contributor

    This is an attempt to revive PR #5048

    I have moved the index to its own database and also added a gettxoutsbyaddress REST command

    I still need to think about the layout of binary input/output for the REST command, or is bin encoding necessary?

    Also still missing tests for the new REST stuff

  2. fanquake added the label UTXO Db and Indexes on Sep 3, 2016
  3. djpnewton force-pushed on Sep 3, 2016
  4. dcousens commented at 5:23 am on September 4, 2016: contributor

    It’s not overly complicated to maintain this index outside of bitcoind, IMHO, that is where it should be. Perhaps indexd or some other service. Concept NACK

    edit: See #8660 (comment)

  5. luke-jr commented at 5:24 am on September 4, 2016: member
    NACK. Txouts are not associated with addresses at all.
  6. paveljanik commented at 6:47 am on September 4, 2016: contributor

    @luke-jr They are, but this association is not useful for 99.9% bitcoind users. And it slows down the normal work used by 100% of bitcoind users. So NACK.

    If people need this, please contribute some description/work using the external indexing solution into contrib/.

  7. btcdrak commented at 12:26 pm on September 4, 2016: contributor
    Concept ACK from me. @paveljanik The feature is off by default so it shouldn’t slow down anything for normal users. OT: If indexes are really considered so bad then -txindex should be removed.
  8. in qa/rpc-tests/txoutsbyaddress.sh: in 673efd7483 outdated
    0@@ -0,0 +1,163 @@
    1+#!/usr/bin/env bash
    


    MarcoFalke commented at 9:29 pm on September 4, 2016:
    We don’t use bash anymore in /rpc-tests

    laanwj commented at 9:43 am on September 13, 2016:
    Yes, please look at the existing tests in qa/rpc-tests, use the Python framework. Also add the test to qa/pull-tester/rpc-tests.py so it gets executed by default.
  9. djpnewton commented at 10:25 pm on September 4, 2016: contributor

    The inspiration for this PR is to try and make it easier to build a fully validating wallet that does not use significantly more resources then a full node itself.

    It seems that most (all?) current solutions for building a wallet either involve added trust (SPV or centralized API services) or added resource requirements (bitcore.io needs 8GB RAM and 200GB disk space)

    Thats the rationale (which I probably should have stated at the outset) anyway. The last time I was trying to make a demo it was easier just to use the blockcypher API, or maybe there are projects that I missed that fill this gap?

  10. luke-jr commented at 1:30 am on September 5, 2016: member
    If the RPC was by script (as the internals seem to be) instead of by address, it might make sense.
  11. dcousens commented at 1:59 am on September 5, 2016: contributor

    If people need this, please contribute some description/work using the external indexing solution into contrib/.

    OT: If indexes are really considered so bad then -txindex should be removed.

    IMHO, that is where txindex should be, at least then it might easy to extend to things like this without adding 1000 SLOC.

  12. djpnewton commented at 3:28 am on September 5, 2016: contributor
    @luke-jr the RPC accepts both hex encoded script or addresses
  13. jonasschnelli commented at 7:54 am on September 5, 2016: contributor

    The inspiration for this PR is to try and make it easier to build a fully validating wallet that does not use significantly more resources then a full node itself.

    It seems that most (all?) current solutions for building a wallet either involve added trust (SPV or centralized API services) or added resource requirements (bitcore.io needs 8GB RAM and 200GB disk space)

    These are good point!

    Some thoughts:

    1. If we would add something like this, it should be more modular. Have a look at https://github.com/bitcoin/bitcoin/pull/8501/files, there you can see how just a handful of lines in init.cpp can add core features over a flexible modular approach. It also simplifies rebases.

    2. Was it considered to add the index outside of bitcoin-core? I guess using ZMQ notifications together with REST (or RPC) would work. What would be the downsides or advantages of the “external” approach?

  14. dcousens commented at 8:22 am on September 5, 2016: contributor

    Was it considered to add the index outside of bitcoin-core? I guess using ZMQ notifications together with REST (or RPC) would work. What would be the downsides or advantages of the “external” approach?

    As someone who does this approach, the only difficulties is ensuring that you handle orphans and forks properly. They do happen enough and cause issues, so you need to be prepared for the floor to fall out between RPC calls. Simple enough code, but battle testing it can take a while and would probably be important to have examples for in contrib/ for beginners. As far as the db requirements go, you just need to be able to rollback data for re-orgs.

  15. btcdrak commented at 10:42 am on September 5, 2016: contributor
    Indexes are desperately needed, there are quite a few users of my addrindex fork https://github.com/btcdrak/bitcoin/releases that demonstrate this need and there is a steady flow of people finding out about it. This PR is a more lightweight option and I think it should be part of Bitcoin Core in an easily accessible way (i.e. just turn on with a switch+reindex). While I could say add it to my fork, it is does reduce the visibility of it.
  16. sipa commented at 11:00 am on September 5, 2016: member

    -txindex was only added for backward compatibility with the pre-0.8 getrawtransaction RPC. I think people who need it should build an external solution. It’s inefficient, requires updates all over the place, and doesn’t even guarantee consistency when there are reorgs, and is not needed for any of the functions Bitcoin Core provides. The addrindex hack is even worse - it encourages people to build an external wallet in an easy, unscalable, and inadvisable way.

    Indexing the UTXO set like this PR does is less bad - at least it doesn’t prevent pruning - but what function would it provide apart from RPCs? The wallet can’t actually use it (as it would miss historical transactions), and it won’t be available in SPV mode.

  17. jonasschnelli commented at 11:24 am on September 5, 2016: contributor

    Agree with @sipa.

    A maybe more scalable way how “wallet” (or lets say something like the BWS [bitpay wallet service]) could be built on top of bitcoin core would be:

    • add multiwallet support (something similar like #2184)
    • add support for pubkey only wallets (should be relatively trivial, partially solved with the current support for “watchonly”)
    • external wallet could use RPC in order to create “wallets”, use fundrawtransaction in a pubkey-only multiwallet setup and
    • transactions can and should be signed in the “external wallet space”

    “Wallets” would then be reduces to…

    • A set of addresses to keep track of (track/store transactions)
    • Coinselection and utxo per set

    Restoring backups is the only feature I can think off that would require a addr-index (otherwise rescanning will be very slow).

  18. djpnewton commented at 8:34 pm on September 5, 2016: contributor

    The entire point of running a node is to provide validation for a wallet somewhere. That is what makes a node significant.

    Is it actually simple to provide this service outside of bitcoin-core? Are there any projects that manage to do it without duplicating the entire blockchain into their own database? It seems a real shame if we go to the trouble of assembling the UTXO set and cant provide ways for external wallets to leverage it. @sipa are you sure it would miss historical transactions? Any transactions adding funds to the wallet will show up in the UTXO set and any transactions spending funds from the wallet can only be created by the wallet itself (I guess they can be malleated). @jonasschnelli I think I like this idea. It might not be as flexible in the kinds of external wallets it allows though.

  19. dcousens commented at 0:56 am on September 6, 2016: contributor

    Are there any projects that manage to do it without duplicating the entire blockchain into their own database?

    Of course, but this index is not small by any means.

  20. laanwj commented at 9:40 am on September 13, 2016: member

    I think this functionality is very useful. E.g. to get the current balance for an arbitrary address/script. Missed this a few times. I think external wallets such as coinjoin may find this useful as well.

    • Sure, you cannot use this with SPV but the primary reason for Bitcoin Core’s existence is to be a full node. Storing the UTXO set comes with that. To be clear SPV mode is by no means the future of this project, at most it would be optional.
    • Unlike a full tx index a UTXO index can be used with pruning, and takes a lot less disk space.

    Concept ACK.

    The wallet can’t actually use it (as it would miss historical transactions),

    There may be use for a quick-rescan that ignores historical transactions, which can be used with pruning. (out of scope for this pull, but it has been brought up before - see #8497)

  21. laanwj commented at 9:45 am on September 13, 2016: member

    is bin encoding necessary?

    No it’s not necessary - only if there is a straightforward binary representation like for transactions and blocks. You don’t have to make up one.

  22. dcousens commented at 9:58 am on September 13, 2016: contributor

    I wasn’t aware this was just for UTXOs, not txouts at any point in history (aka, a full txout index, similar to the addrindex fork).

    light concept ACK

  23. dexX7 commented at 7:57 pm on September 13, 2016: contributor

    Concept ACK.

    Several projects I know of use the addrindex stuff from years ago to derive information about UTXO state, and this is much more elegant.

  24. in src/init.cpp: in 673efd7483 outdated
    344@@ -340,6 +345,7 @@ std::string HelpMessage(HelpMessageMode mode)
    345     strUsage += HelpMessageOpt("-sysperms", _("Create new files with system default permissions, instead of umask 077 (only effective with disabled wallet functionality)"));
    346 #endif
    347     strUsage += HelpMessageOpt("-txindex", strprintf(_("Maintain a full transaction index, used by the getrawtransaction rpc call (default: %u)"), DEFAULT_TXINDEX));
    348+    strUsage += HelpMessageOpt("-txoutsbyaddressindex", strprintf(_("Maintain an address to unspent outputs index (rpc: gettxoutsbyaddress). The index is built on first use. (default: %u)"), 0));
    


    dexX7 commented at 8:02 pm on September 13, 2016:
    My personal preference would be something shorter like "txoutsindex", especially because it really indexes scripts and not addresses.

    djpnewton commented at 10:50 am on September 15, 2016:
    Sure I dont see a problem with that

    MarcoFalke commented at 10:53 am on September 15, 2016:
    Or just txoutindex (or utxoutindex or utxoindex)

    djpnewton commented at 4:58 am on September 17, 2016:

    argh, paradox of choice!

    Ok, how about if nobody has any objection I will change the name to “txoutindex” (command line parameter, RPC, REST etc)?

  25. djpnewton commented at 10:49 am on September 15, 2016: contributor
    Ok I converted the RPC tests to python and added some REST tests
  26. in qa/rpc-tests/rest.py: in d50fd07c87 outdated
    52@@ -53,7 +53,8 @@ def __init__(self):
    53         self.num_nodes = 3
    54 
    55     def setup_network(self, split=False):
    56-        self.nodes = start_nodes(self.num_nodes, self.options.tmpdir)
    57+        args = ["-txoutsbyaddressindex"]
    


    MarcoFalke commented at 10:51 am on September 15, 2016:

    Nit: You can put something like the following up a few lines, so the scope is not limited to setup_network.

    0self.extra_args = [["-txoutsbyaddressindex"]] * 3
    
  27. in qa/rpc-tests/txoutsbyaddress.py: in d50fd07c87 outdated
    20+        self.nodes = []
    21+        self.nodes.append(start_node(0, self.options.tmpdir, ["-txoutsbyaddressindex"]))
    22+        self.nodes.append(start_node(1, self.options.tmpdir))
    23+        self.nodes.append(start_node(2, self.options.tmpdir))
    24+        self.is_network_split = False
    25+        self.sync_all()
    


    MarcoFalke commented at 10:54 am on September 15, 2016:
    Nit: I think this is not required
  28. in qa/rpc-tests/txoutsbyaddress.py: in d50fd07c87 outdated
    15+        self.setup_clean_chain = True
    16+        self.num_nodes = 3
    17+
    18+    def setup_network(self, split=False):
    19+        print("Setup network...")
    20+        self.nodes = []
    


    MarcoFalke commented at 10:56 am on September 15, 2016:
    0self.nodes = start_nodes(3, self.options.tmpdir, self.extra_args)
    

    should work as well. (You can put self.extra_args = [['-txoutindex'], []. []] in __init__())

  29. in qa/rpc-tests/txoutsbyaddress.py: in d50fd07c87 outdated
    30+        # Check that there's no UTXO on any of the nodes
    31+        for node in self.nodes:
    32+            assert_equal(len(node.listunspent()), 0)
    33+
    34+        # mining
    35+        connect_nodes(self.nodes[1], 0)
    


    MarcoFalke commented at 10:58 am on September 15, 2016:
    Nit: you can move the connect_*s up into setup_network(). I think all other tests do this
  30. in qa/rpc-tests/txoutsbyaddress.py: in d50fd07c87 outdated
    49+        txouts = self.nodes[0].gettxoutsbyaddress(1, (address,))
    50+        txid = txouts[0]["txid"]
    51+        assert_is_hash_string(txid)
    52+        assert_equal(txid, txid1)
    53+
    54+        # stop node 2
    


    MarcoFalke commented at 10:59 am on September 15, 2016:
    Why is this required?

    djpnewton commented at 5:02 am on September 17, 2016:

    node 2 is stopped so that later on it can be restarted and then orphan a block created by node 1 (to test txoutindex handles the reorg)

    basically its stopped so I will not build on a block created by node 1


    ryanofsky commented at 7:28 pm on January 4, 2017:
    Should probably mention this in a comment.
  31. djpnewton commented at 5:05 am on September 17, 2016: contributor
    addressed review nits by @MarcoFalke (thank you)
  32. jmcorgan commented at 12:09 pm on September 26, 2016: contributor
    Concept ACK.
  33. in qa/pull-tester/rpc-tests.py: in bca56f620a outdated
    142@@ -143,6 +143,7 @@
    143     'importprunedfunds.py',
    144     'signmessages.py',
    145     'p2p-compactblocks.py',
    146+    'txoutsbyaddress.py',
    


    MarcoFalke commented at 2:02 pm on September 26, 2016:
    Missing +x permission

    djpnewton commented at 6:24 am on October 1, 2016:
    done
  34. djpnewton force-pushed on Oct 1, 2016
  35. btcdrak commented at 11:05 am on December 16, 2016: contributor
    needs rebase
  36. gmaxwell commented at 3:03 pm on January 2, 2017: contributor
    Concept ACK. Comment, can others provide feeback: I think the argument should be changed to unspentoutputindex or utxoindex or something like that, otherwise it sounds like it indexes everything. Similar for the rpc names.
  37. droark commented at 10:02 pm on January 3, 2017: contributor
    Hmmm. If @djpnewton doesn’t pick this up again in the next few days, I’ll seriously consider keeping it going.
  38. txoutsbyaddress index f277cf567f
  39. move txoutsbyaddress index to its own database 80337066b2
  40. add txoutsbyaddress querying via rest interface 7458f1839d
  41. C++11: s/boost::scoped_ptr/std::unique_ptr/ 227cfe95ba
  42. convert txoutsbyaddress test to python ad28b1b170
  43. - add txoutsbyaddress REST interface tests
     - remove skeleton HEX/BIN format support from REST interface for gettxoutsbyaddress endpoint
    8e6154a0ec
  44. address test review nits 83277c8e8e
  45. rename -txoutsbyaddress to -txoutindex dd18e448ac
  46. set executable bit eed6139bbd
  47. add doc for gettxoutsbyaddress REST endpoint 5c2fb0d7b0
  48. in src/coinsbyscript.h: in aad38c87f5 outdated
    35+    inline void SerializationOp(Stream& s, Operation ser_action, int nType, int nVersion) {
    36+        READWRITE(setCoins);
    37+    }
    38+};
    39+
    40+typedef std::map<uint160, CCoinsByScript> CCoinsMapByScript; // uint160 = hash of script
    


    ryanofsky commented at 5:13 pm on January 4, 2017:
    Maybe replace uint160 with CScriptID to make the map type more obvious.
  49. in src/coinsbyscript.h: in aad38c87f5 outdated
    52+    bool GetCoinsByScript(const CScript &script, CCoinsByScript &coins);
    53+
    54+    // Return a modifiable reference to a CCoinsByScript.
    55+    CCoinsByScript &GetCoinsByScript(const CScript &script, bool fRequireExisting = true);
    56+
    57+    static uint160 getKey(const CScript &script); // we use the hash of the script as key in the database
    


    ryanofsky commented at 5:15 pm on January 4, 2017:
    Could remove if using CScriptID as the map key type.
  50. in src/coinstats.h: in aad38c87f5 outdated
    18+    uint64_t nAddressesOutputs; // equal nTransactionOutputs (if addressindex is enabled)
    19+    uint64_t nSerializedSize;
    20+    uint256 hashSerialized;
    21+    CAmount nTotalAmount;
    22+
    23+    CCoinsStats() : nHeight(0), nTransactions(0), nTransactionOutputs(0), nAddresses(0), nAddressesOutputs(0), nSerializedSize(0), nTotalAmount(0) {}
    


    ryanofsky commented at 5:22 pm on January 4, 2017:
    Could assign these values inline above with c++11.
  51. in src/coinsbyscript.cpp: in aad38c87f5 outdated
    32+    if (!base->GetCoinsByHashOfScript(key, tmp))
    33+    {
    34+        if (fRequireExisting)
    35+            return cacheCoinsByScript.end();
    36+    }
    37+    CCoinsMapByScript::iterator ret = cacheCoinsByScript.insert(it, std::make_pair(key, CCoinsByScript()));
    


    ryanofsky commented at 5:54 pm on January 4, 2017:

    Shouldn’t this be inserting tmp into the map, not an empty CCoinsByScript() set?

    Also,

    • It would be nice if this code had some basic c++ unit tests similar to coins_tests.cpp, even if they aren’t as comprehensive.
    • Don’t really need the other temporary ret variable here, could just return insert().second.
    • Could replace insert with emplace or emplace_hint here to avoid needing to call make_pair.
  52. in src/txmempool.cpp: in aad38c87f5 outdated
    391+{
    392+    LOCK(cs);
    393+    CCoinsMapByScript::const_iterator it = mapCoinsByScript.find(CCoinsViewByScript::getKey(script));
    394+    if (it != mapCoinsByScript.end())
    395+    {
    396+        BOOST_FOREACH(const COutPoint &outpoint, it->second.setCoins)
    


    ryanofsky commented at 6:27 pm on January 4, 2017:

    could avoid the loop with

    coinsByScript.setCoins.insert(it->second.setCoins.begin(), it->second.setCoins.end());

  53. in src/txdb.cpp: in 0caeaab4fe outdated
    147+        boost::this_thread::interruption_point();
    148+        try {
    149+            i++;
    150+            pcursor->Next();
    151+        } catch (std::exception &e) {
    152+            return 0;
    


    ryanofsky commented at 6:43 pm on January 4, 2017:
    Why is this exception being caught instead of passed on to the caller? I’d think it’d be better to remove the catch and have the error propogate up, or add a comment here which explains the cases where std::exception is expected and why they should return 0.
  54. in src/test/mempool_tests.cpp: in 0caeaab4fe outdated
    115@@ -116,7 +116,7 @@ void CheckSort(CTxMemPool &pool, std::vector<std::string> &sortedOrder)
    116 
    117 BOOST_AUTO_TEST_CASE(MempoolIndexingTest)
    118 {
    119-    CTxMemPool pool(CFeeRate(0));
    120+    CTxMemPool pool(CFeeRate(0), false);
    


    ryanofsky commented at 6:57 pm on January 4, 2017:
    The CTxMemPool constructor saves a const reference to the second argument, so instead of passing literal false, you should pass a variable with a lifetime at least as long as CTxMemPool. Might be better to have the CTxMemPool class just store a bool instead of a reference to a bool to avoid the need to do this.
  55. in qa/rpc-tests/txoutsbyaddress.py: in 0caeaab4fe outdated
    41+        txid1 = self.nodes[0].sendtoaddress(address, 10)
    42+        self.nodes[0].generate(101) # node will collect its own fee
    43+        self.sync_all()
    44+        assert_equal(self.nodes[0].getbalance(), 5090)
    45+        assert_equal(self.nodes[1].getbalance(), 10)
    46+        txouts = self.nodes[0].gettxoutsbyaddress(1, (address,))
    


    ryanofsky commented at 7:34 pm on January 4, 2017:
    Maybe check other txouts fields here as well.
  56. in src/coinsbyscript.cpp: in 0caeaab4fe outdated
    94+    pcoinsViewByScriptIn->cacheCoinsByScript.clear();
    95+
    96+    if (!hashBlock.IsNull())
    97+        batch.Write(DB_BEST_BLOCK, hashBlock);
    98+
    99+    LogPrint("coindb", "Committing %u coin address indexes to coin database...\n", (unsigned int)count);
    


    ryanofsky commented at 7:46 pm on January 4, 2017:
    Could use %z instead of (unsigned int) cast, I think.
  57. ryanofsky commented at 8:04 pm on January 4, 2017: member
    Reviewed most of this PR, but not the whole thing. Mostly minor comments, except for a possible bug in CCoinsViewByScript::FetchCoinsByScript.
  58. paveljanik commented at 7:50 pm on January 5, 2017: contributor
    Needs rebase.
  59. fix up a change missed by the rebase 8804522e20
  60. djpnewton force-pushed on Jan 5, 2017
  61. djpnewton commented at 10:55 pm on January 5, 2017: contributor

    rebased

    I will start addressing review by @ryanofsky (cheers mate)

    Agree with @gmaxwell that naming is confusing, I am thinking about unifying it around “-utxoindex”

  62. MarcoFalke commented at 11:36 pm on January 5, 2017: member

    I am thinking about unifying it around “-utxoindex”

    Sounds good.

  63. btcdrak commented at 12:19 pm on January 18, 2017: contributor
    needs rebase again.
  64. ryanofsky commented at 6:54 pm on February 13, 2017: member
    @djpnewton or @droark have you done more work on this? I’d be happy to help continue the effort on this feature.
  65. droark commented at 1:54 am on February 19, 2017: contributor

    @ryanofsky - I hadn’t been working on this but I’m happy to run with the ball since the OP seems to have disappeared again. I’ll post a new PR once it’s up and running.

    Thanks.

  66. fanquake commented at 2:14 am on February 19, 2017: member
    Closing this for now then. @droak or @ryanofsky can open a new PR once they’ve restarted the work.
  67. fanquake closed this on Feb 19, 2017

  68. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me