Add optional transaction index to databases #2168

sipa wants to merge 1 commit into bitcoin:master from sipa:txindex, changing 5 files (+94 −7)
  1. sipa commented at 1:56 am on January 11, 2013: member

    By specifying -txindex when initializing the database, a txid-to-diskpos index is maintained in the blktree database. When enabled, this index is used to help answer getrawtransaction() RPC queries.

    Changing the -txindex value requires a -reindex; the client will abort at startup if the database and the specified -txindex mismatch.
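    The behaviour described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: `TxPos`, `IndexDb` and `CanStart` are hypothetical names that do not appear in the patch, and a `std::map` stands in for the blktree database.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a transaction's on-disk position: which block
// file, where the block starts in it, and the transaction's offset inside
// the block.
struct TxPos {
    int nFile;
    unsigned int nBlockPos;
    unsigned int nTxOffset;
};

// Hypothetical in-memory model of the index database (not the real blktree).
struct IndexDb {
    bool fTxIndex = false;                    // flag recorded when the database was built
    std::map<std::string, TxPos> mapTxIndex;  // txid (hex) -> disk position

    // Record a position, but only when the index is enabled.
    void WriteTx(const std::string& txid, const TxPos& pos) {
        if (fTxIndex) mapTxIndex[txid] = pos;
    }

    // Look up a position; returns false when the index is disabled or the
    // txid is unknown.
    bool ReadTx(const std::string& txid, TxPos& posOut) const {
        if (!fTxIndex) return false;
        std::map<std::string, TxPos>::const_iterator it = mapTxIndex.find(txid);
        if (it == mapTxIndex.end()) return false;
        posOut = it->second;
        return true;
    }
};

// Startup check: the requested -txindex value must match the value the
// database was built with, unless a reindex rebuilds it from scratch.
bool CanStart(IndexDb& db, bool fRequestedTxIndex, bool fReindex) {
    if (fReindex) {
        db.mapTxIndex.clear();
        db.fTxIndex = fRequestedTxIndex;
        return true;
    }
    return db.fTxIndex == fRequestedTxIndex;
}
```

    In this model, `CanStart(db, true, false)` on a database built without the index returns false (the "must -reindex" case), while `CanStart(db, true, true)` succeeds and resets the index so it can be rebuilt.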

  2. in src/main.h: in 00a49bf831 outdated
    +        READWRITE(*(CDiskBlockPos*)this);
    +        READWRITE(VARINT(nTxOffset));
    +    )
    +
    +    CDiskTxPos(const CDiskBlockPos &blockIn, unsigned int nTxOffsetIn) : CDiskBlockPos(blockIn.nFile, blockIn.nPos), nTxOffset(nTxOffsetIn) {
    +    }
    


    Diapolo commented at 6:34 am on January 11, 2013:
    This looks a little weird, can you put that } at the end of the above line?
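    For readers unfamiliar with the `VARINT(nTxOffset)` wrapper in the snippet above: it stores the offset in a variable number of bytes, so small offsets cost a single byte. The sketch below shows one such base-128 encoding. The exact byte layout is an assumption based on my recollection of Bitcoin's serializer, and `EncodeVarInt` / `DecodeVarInt` are hypothetical names, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode n as a big-endian base-128 number. Every byte except the last has
// its high bit set. Subtracting 1 after each shift removes redundant
// encodings, so each value has exactly one byte sequence.
std::vector<unsigned char> EncodeVarInt(uint64_t n) {
    unsigned char tmp[10];  // 10 bytes are enough for any 64-bit value
    int len = 0;
    while (true) {
        tmp[len] = static_cast<unsigned char>((n & 0x7F) | (len ? 0x80 : 0x00));
        if (n <= 0x7F) break;
        n = (n >> 7) - 1;
        len++;
    }
    std::vector<unsigned char> out;
    for (int i = len; i >= 0; --i) out.push_back(tmp[i]);  // most significant byte first
    return out;
}

// Decode a byte sequence produced by EncodeVarInt. No validation of
// malformed or truncated input is attempted in this sketch.
uint64_t DecodeVarInt(const std::vector<unsigned char>& in) {
    uint64_t n = 0;
    for (unsigned char ch : in) {
        n = (n << 7) | (ch & 0x7F);
        if (ch & 0x80) {
            n++;  // continuation byte: undo the "- 1" applied while encoding
        } else {
            break;
        }
    }
    return n;
}
```

    Under this scheme an offset below 128 takes one byte and an offset below 16512 takes two, which matters when one entry is stored per transaction.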
  3. Diapolo commented at 6:35 am on January 11, 2013: none
    How much will this increase a node's load / how much larger will the resulting database be?
  4. sipa commented at 12:15 pm on January 11, 2013: member
    @Diapolo Around 500 MB extra in storage, and a lot of extra I/O. I didn’t spend much effort optimizing this, as I don’t consider this functionality a priority.
  5. Diapolo commented at 1:02 pm on January 11, 2013: none
    @sipa Thanks, I was just interested in the technical base aspects :).
  6. mikehearn commented at 5:27 pm on January 14, 2013: contributor
    Could you grab a service bit and make “getdata” use the new index too? Or is that too much additional work for this change?
  7. sipa commented at 5:29 pm on January 14, 2013: member
    @mikehearn I’m absolutely against making this available to the P2P network. If there is one thing I don’t want services to depend on, then it is the availability of a fully indexed transaction history. If you really need one, fine, but maintain it yourself.
  8. BitcoinPullTester commented at 6:09 pm on January 14, 2013: none
    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/1f4691130ea0c54c69ceb04407507cd80dd05133 for binaries and test log.
  9. mikehearn commented at 6:13 pm on January 14, 2013: contributor

    I’m not sure I see the issue. Yes you can make a node do lots of disk IO, but downloading the block chain does that too. It’d be nice to fix, but it’s not a new attack vector.

    Apps that need indexes like that will end up just using blockchain.info or various random other sites/protocols to get what they want. It’s not like those apps will go away if the P2P network doesn’t give them what they need.

  10. sipa commented at 7:56 pm on January 14, 2013: member

    The only thing that keeping history is necessary for is bootstrapping fully validating nodes. However, that doesn’t require an index (or even a Bitcoin node at all - it could be provided by an HTTP-based file service or other protocols - something the web has plenty of). I’m sure that making a feature available on the P2P network will result in infrastructure depending on it, something that would burden the nodes that provide archive data. Yes, making it an optional feature makes this much less of a problem, but I still prefer to be conservative with functions the P2P network provides unless there is a very clear use.

    As far as I know, all versions before 0.7.0 maintained a full tx-to-diskpos index, and there was no way at all to query it (not even RPC). I’m quite sure this was a deliberate choice by Satoshi. In a mail about maintaining per-txout spendability of wallet transactions, he clearly said not to rely on the txindex for this, as it wouldn’t always be available.

    EDIT: I dislike this feature as a whole (even as RPC), as I think that services that are built not to depend on infinite history have better scalability. It’s extremely useful for debugging though, so I added it. I’m sure people will use it for other purposes as well, and that’s clearly preferable to having them depend on a centralized service for it. Providing it via P2P doesn’t add much to that, IMHO.

  11. mikehearn commented at 8:50 pm on January 14, 2013: contributor
    OK, good points. Now I agree with you.
  12. Add optional transaction index to databases
    By specifying -txindex when initializing the database, a txid-to-diskpos
    index is maintained in the blktree database. When enabled, this index is
    used to help answer getrawtransaction() RPC queries.
    
    Changing the -txindex value requires a -reindex; the client will abort
    at startup if the database and the specified -txindex mismatch.
    2d1fa42e85
  13. BitcoinPullTester commented at 3:55 am on January 24, 2013: none
    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/2d1fa42e85c9164688aa69b3f54f015fbefc06aa for binaries and test log.
  14. gavinandresen commented at 4:48 pm on January 25, 2013: contributor

    Test plan that I’ll finish executing after lunch:

    Default arguments, getrawtransaction w. arbitrary old, not-in-your-wallet txid, fully-spent txn
    EXPECT: transaction not found

    Rerun with -txindex=1
    EXPECT: startup fails with “must -reindex” message

    Note disk space used by blktree/ subdirectory
    Rerun with -txindex=1 -reindex=1 -logtimestamps=1
    EXPECT: long background process to rebuild index (Q: how long?)
    Wait until “Reindexing finished” appears in debug.log, then: getrawtransaction
    EXPECT: success
    Q: How much extra disk space?

    Rerun with -txindex=0
    EXPECT: startup fails with “must -reindex” message

    Note disk space used by blktree/
    Rerun with -txindex=0 -reindex=1 -logtimestamps=1
    EXPECT: long background process to rebuild index (Q: how long?)
    Wait until “Reindexing finished” appears in debug.log, then: getrawtransaction w. old txid
    EXPECT: transaction not found
    EXPECT: txindex disk space freed from blktree/

  15. sipa commented at 4:52 pm on January 25, 2013: member

    @gavinandresen RE test plan:

    • getrawtransaction without txindex should work for not-fully-spent confirmed transactions, mempool transactions or transactions in the relay cache. For not-fully-spent confirmed transactions, it may be slower than with txindex present. In general, without txindex I consider getrawtransaction to just work on a best-effort basis.
    • extra disk space caused by txindex will be in the blktree/ directory

    Otherwise the plan looks correct and complete to me.

    EDIT: -reindex doesn’t cause a slow startup; the startup is always instant, but importing (and thus building the txindex) happens in the background.
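    The best-effort behaviour described in this comment can be sketched as a chain of fallbacks. This is a toy model, not the patch's code: `NodeState`, `TxSource` and `FindTransaction` are hypothetical names, sets of txid strings stand in for the real data structures, and the order in which the sources are tried is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Where a transaction was found, if anywhere (hypothetical labels).
enum class TxSource { Mempool, RelayCache, TxIndex, UnspentOutputs, NotFound };

// Toy node state: each set holds the txids available from that source.
struct NodeState {
    bool fTxIndex = false;                   // was the node started with -txindex?
    std::set<std::string> mempool;           // unconfirmed transactions
    std::set<std::string> relayCache;        // recently relayed transactions
    std::set<std::string> confirmed;         // every confirmed transaction on disk
    std::set<std::string> unspentConfirmed;  // confirmed and not fully spent
};

// Try the cheap in-memory sources first, then the index if enabled, then the
// slower route through unspent outputs. A fully spent confirmed transaction
// is only reachable through the index.
TxSource FindTransaction(const NodeState& node, const std::string& txid) {
    if (node.mempool.count(txid)) return TxSource::Mempool;
    if (node.relayCache.count(txid)) return TxSource::RelayCache;
    if (node.fTxIndex && node.confirmed.count(txid)) return TxSource::TxIndex;
    if (node.unspentConfirmed.count(txid)) return TxSource::UnspentOutputs;
    return TxSource::NotFound;
}
```

    In this model an old, fully spent transaction yields `TxSource::NotFound` when `fTxIndex` is false and `TxSource::TxIndex` when it is true, which matches the first and third steps of the test plan.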

  16. gavinandresen commented at 8:55 pm on January 25, 2013: contributor

    Test results: success!

    Disk space: blktree/ 30M –> 600M

    Time to -reindex: (note: running -g build, no -dbcache set) 1 hour 50 minutes (same time, with/without -fullindex)

    I’m going to pull.

  17. gavinandresen referenced this in commit 63cc7661a5 on Jan 25, 2013
  18. gavinandresen merged this on Jan 25, 2013
  19. gavinandresen closed this on Jan 25, 2013

  20. sipa deleted the branch on May 3, 2013
  21. aaronash commented at 0:11 am on November 25, 2013: none
    Just wanted to say thanks for working on this. Being able to getrawtransaction on pretty much anything removes reliance on 3rd party services, which I find very valuable.
  22. laudney referenced this in commit eb2b007979 on Mar 19, 2014
  23. owlhooter referenced this in commit 2c303cdb11 on Oct 11, 2018
  24. guruvan referenced this in commit 2ccfffc87d on Nov 8, 2018
  25. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 21:12 UTC
