Build txindex in parallel with validation #13033

pull jimpo wants to merge 12 commits into bitcoin:master from jimpo:txindex-refactor-take2 changing 18 files +757 −86
  1. jimpo commented at 0:11 am on April 20, 2018: contributor

    I’m re-opening #11857 as a new pull request because the last one stopped loading for people


    This refactors the tx index code to be in it’s own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the compact filters). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks.

    DB changes

    At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete.

    Open questions

    • Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that getrawtransaction would return an error message saying the txindex is syncing while the migration is running.

    Impact

    In a sample size n=1 test where I synced nodes from scratch, the average time Index writing was 3.36ms in master and 1.72ms in this branch. The average time between UpdateTip log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want.

  2. fanquake added the label UTXO Db and Indexes on Apr 20, 2018
  3. fanquake added this to the "Blockers" column in a project

  4. Sjors commented at 9:49 am on April 20, 2018: member

    If anyone wants to read the old PR and has difficulty loading the page, it helps to use another browser session that’s logged out.

    Only comment of mine I’d like to export to this PR is the suggestion to backport “the migration code that removes txindex without requiring a reindex would be useful for folks who regret having set txindex=1 on a node with slow hardware.”

    Tested 3272b17637c5, including kill -9ing the migration from legacy to the new db.

  5. in doc/release-notes-pr11857.md:1 in 3272b17637 outdated
    0@@ -0,0 +1,11 @@
    1+Transaction index changes
    


    promag commented at 2:24 pm on April 24, 2018:
    Rename file? PR changed.
  6. in src/index/txindex.cpp:1 in 3272b17637 outdated
    0@@ -0,0 +1,308 @@
    1+// Copyright (c) 2017 The Bitcoin Core developers
    


    promag commented at 2:25 pm on April 24, 2018:
    nit, 2018? (and 2 more files)
  7. in src/rpc/rawtransaction.cpp:51 in 3272b17637 outdated
    47@@ -47,6 +48,8 @@ void TxToJSON(const CTransaction& tx, const uint256 hashBlock, UniValue& entry)
    48     TxToUniv(tx, uint256(), entry, true, RPCSerializationFlags());
    49 
    50     if (!hashBlock.IsNull()) {
    51+        LOCK(cs_main);
    


    promag commented at 2:46 pm on April 24, 2018:
    In order to avoid locking and lookup here, how about adding the argument CBlockIndex* pindex to TxToJSON? (I know this is only called from getrawtransaction).

    jimpo commented at 5:56 pm on April 24, 2018:
    I’m pretty against adding more to this PR. Seems like a good follow-on cleanup item.

    promag commented at 6:17 pm on April 24, 2018:
    Fair.
  8. in src/index/txindex.h:79 in 3272b17637 outdated
    74+    ///
    75+    /// @param[in]   tx_hash  The hash of the transaction to be returned.
    76+    /// @param[out]  block_hash  The hash of the block the transaction is found in.
    77+    /// @param[out]  tx  The raw transaction itself.
    78+    /// @return  true if transaction is found, false otherwise
    79+    bool FindTx(const uint256& tx_hash, uint256& block_hash, CTransactionRef& tx) const;
    


    promag commented at 3:07 pm on April 24, 2018:
    nit, LookupTransaction.
  9. in src/index/txindex.h:73 in 3272b17637 outdated
    68+    /// the current state of the block chain. This only blocks if the index has gotten in sync once
    69+    /// and only needs to process blocks in the ValidationInterface queue. If the index is catching
    70+    /// up from far behind, this method does not block and immediately returns false.
    71+    bool BlockUntilSyncedToCurrentChain();
    72+
    73+    /// Look up a raw transaction by hash.
    


    promag commented at 3:07 pm on April 24, 2018:
    Drop raw?

    jimpo commented at 6:05 pm on April 24, 2018:
    Your wish is my command.
  10. in src/index/txindex.h:77 in 3272b17637 outdated
    72+
    73+    /// Look up a raw transaction by hash.
    74+    ///
    75+    /// @param[in]   tx_hash  The hash of the transaction to be returned.
    76+    /// @param[out]  block_hash  The hash of the block the transaction is found in.
    77+    /// @param[out]  tx  The raw transaction itself.
    


    promag commented at 3:07 pm on April 24, 2018:
    Drop raw?
  11. in src/index/txindex.cpp:293 in 3272b17637 outdated
    288+void TxIndex::Start()
    289+{
    290+    // Need to register this ValidationInterface before running Init(), so that
    291+    // callbacks are not missed if Init sets m_synced to true.
    292+    RegisterValidationInterface(this);
    293+    if (!Init()) {
    


    promag commented at 3:34 pm on April 24, 2018:
    I don’t understand this case, should exit instead?

    jimpo commented at 6:02 pm on April 24, 2018:
    My thinking is that logging and running without the TxIndex is better than FatalError, but I could see it the other way too. Probably a good idea to add a log line at the very least.

    promag commented at 8:30 pm on April 24, 2018:
    I’d say running without the desired functionality should exit.

    sipa commented at 10:20 pm on April 24, 2018:
    I agree with @promag. If the daemon can’t provide the service that is requested, it should fail early. Otherwise you may just complicate debugging.

    jimpo commented at 11:22 pm on April 24, 2018:
    Changed to FatalError.
  12. in src/rpc/rawtransaction.cpp:252 in 3272b17637 outdated
    256-        for (const auto& tx : setTxids) {
    257-            const Coin& coin = AccessByTxid(*pcoinsTip, tx);
    258-            if (!coin.IsSpent()) {
    259-                pblockindex = chainActive[coin.nHeight];
    260-                break;
    261+        LOCK(cs_main);
    


    promag commented at 3:43 pm on April 24, 2018:

    Instead of nesting, repeat the lock:

    0if (!request.params[1].isNull()) {
    1    hashBlock = uint256S(request.params[1].get_str());
    2    LOCK(cs_main);
    3    ...
    4} else {
    5    LOCK(cs_main);
    6    ...
    7}
    
  13. promag commented at 3:46 pm on April 24, 2018: member
    Tested ACK 3272b17 (round 1).
  14. jimpo force-pushed on Apr 24, 2018
  15. in src/index/txindex.cpp:111 in 40eabf6a71 outdated
    106+                LogPrintf("Syncing txindex with block chain from height %d\n", pindex->nHeight);
    107+                last_log_time = current_time;
    108+            }
    109+
    110+            if (last_locator_write_time + SYNC_LOCATOR_WRITE_INTERVAL < current_time) {
    111+                WriteBestBlock();
    


    sipa commented at 10:06 pm on April 24, 2018:
    As this is called in the middle of ThreadSync(), shouldn’t this be writing a locator for pindex rather than chainActive?

    jimpo commented at 11:13 pm on April 24, 2018:
    Yeah, wow, oops. Good catch.
  16. jimpo force-pushed on Apr 24, 2018
  17. in src/index/txindex.cpp:88 in 8f1bfe1488 outdated
    83+
    84+        int64_t last_log_time = 0;
    85+        int64_t last_locator_write_time = 0;
    86+        while (true) {
    87+            if (m_interrupt) {
    88+                WriteBestBlock();
    


    sipa commented at 11:40 pm on April 24, 2018:
    This function takes an argument now.
  18. jimpo force-pushed on Apr 25, 2018
  19. jimpo force-pushed on Apr 25, 2018
  20. jimpo force-pushed on Apr 25, 2018
  21. jimpo force-pushed on Apr 25, 2018
  22. [db] Create separate database for txindex.
    The new TxIndexDB class will be used by a future commit in this
    change set.
    0cb8303241
  23. [db] Migration for txindex data to new, separate database. c88bcec93f
  24. [index] Create new TxIndex class.
    The TxIndex will be responsible for building the transaction index
    concurrently with the main validation thread by implementing
    ValidationInterface. This does not process blocks concurrently yet.
    34d68bf3a3
  25. [index] TxIndex initial sync thread.
    TxIndex starts up a background thread to get in sync with the block
    index before blocks are processed through the ValidationInterface.
    94b4f8bbb9
  26. [index] Allow TxIndex sync thread to be interrupted. 70d510d93c
  27. [index] TxIndex method to wait until caught up.
    In order to preserve getrawtransaction RPC behavior, there needs to be
    a way for a thread to ensure the transaction index is in sync with the
    current state of the blockchain.
    f90c3a62f5
  28. [init] Initialize and start TxIndex in init code. 8181db88f6
  29. [validation] Replace tx index code in validation code with TxIndex. e0a3b80033
  30. [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. a03f804f2a
  31. [rpc] Public interfaces to GetTransaction block until synced.
    Now that the transaction index is updated asynchronously, in order to
    preserve the current behavior of public interfaces, the code blocks
    until the transaction index is caught up with the current state of the
    blockchain.
    6d772a3d44
  32. [test] Simple unit test for TxIndex. ed77dd6b30
  33. [doc] Include txindex changes in the release notes. 9b2704777c
  34. jimpo force-pushed on Apr 25, 2018
  35. sipa commented at 1:20 am on April 26, 2018: member

    utACK 9b2704777ceeca48d57ce058ae91674c7764b143

    Also compared with a re-merge of master with 806b2f1 (which had utACK from @ryanofsky and @Sjors), with a re-merge of master with 523dd76 (which had utACK from @TheBlueMatt), and with a re-merge of master with ea8be45 (which had Tested ACK from @jamesob) to find the only differences are:

    • Adding an abstracted-out function TxIndex::WriteBestBlock, which is invoked additionally after interrupting the background sync and periodically.
    • Checks in TxIndex::SetBestChain to see if callbacks are received after BlockConnected.
    • Push down cs_main locks in gettxoutproof.
    • Simplification of TxIndexDB::ReadBestBlock.
    • Changes in comments and logging
  36. sipa merged this on Apr 26, 2018
  37. sipa closed this on Apr 26, 2018

  38. sipa referenced this in commit a07e8caa5d on Apr 26, 2018
  39. jamesob commented at 4:50 am on April 26, 2018: member
    :tada: nice work, @jimpo.
  40. fanquake removed this from the "Blockers" column in a project

  41. jimpo deleted the branch on May 1, 2018
  42. sipa added the label Needs release notes on Aug 14, 2018
  43. fanquake removed the label Needs release note on Mar 20, 2019
  44. UdjinM6 referenced this in commit 2924424924 on May 21, 2021
  45. UdjinM6 referenced this in commit f3ce0156a1 on May 25, 2021
  46. UdjinM6 referenced this in commit 7ff6515c88 on May 25, 2021
  47. UdjinM6 referenced this in commit 0f9f133f78 on Jun 5, 2021
  48. UdjinM6 referenced this in commit bcc8b35194 on Jun 5, 2021
  49. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 13:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me