indexes: Stop using node internal types and locking cs

ryanofsky commented at 4:56 pm on February 1, 2022: contributor

This PR lets indexing code mostly run outside of the node process. It also improves indexing sync logic, which is moved out of indexing code to a new node::SyncChain() function.

Almost all the commits in this PR are small refactoring changes that move code from src/index/ to src/node/, or replace references to node types like CBlockIndex, CChain, CChainState in index code. There are only two commits affecting indexing sync logic which make complicated or substantive changes:

The commit messages have more details about these and other changes. Followups to this PR will reuse indexing sync code for wallets (#15719, #11756) and let indexes run in separate processes (#10102)

DrahtBot added the label Build system on Feb 1, 2022

DrahtBot added the label interfaces on Feb 1, 2022

DrahtBot added the label P2P on Feb 1, 2022

DrahtBot added the label RPC/REST/ZMQ on Feb 1, 2022

DrahtBot added the label UTXO Db and Indexes on Feb 1, 2022

DrahtBot added the label Wallet on Feb 1, 2022

DrahtBot added the label Needs rebase on Feb 1, 2022

ryanofsky renamed this:
~~indexes: Stop using node API and locking cs_main, improve sync logic~~
indexes: Stop using node internal types and locking cs_main, improve sync logic
on Feb 1, 2022

ryanofsky force-pushed on Feb 1, 2022

ryanofsky commented at 9:47 pm on February 1, 2022: contributor

Rebased 8d8cdcb37005030f646ba3c45f7f54f556efb8bf -> 1fcfc73ba4e8180a8667868529774ab7b302ab5d (pr/indexy.1 -> pr/indexy.2, compare) due to conflict with #22932. Also fixed some bugs in the last commit 🙄

DrahtBot removed the label Needs rebase on Feb 1, 2022

DrahtBot commented at 4:02 am on February 2, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/24230.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	fjahr, mzumsande, jamesob, aureleoules, josibake
Approach ACK	TheCharlatan

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30750 (scripted-diff: LogPrint -> LogDebug by maflcko)
#30664 (build: Remove Autotools-based build system by hebasto)
#30635 (rpc: add optional blockhash to waitfornewblock by Sjors)
#30546 (util: Use consteval checked format string in FatalErrorf by maflcko)
#30469 (index: Fix coinstats overflow and introduce index versioning by fjahr)
#30409 (Introduce waitTipChanged() mining interface, replace RPCNotifyBlockChange, drop CRPCSignals & g_best_block by Sjors)
#29770 (index: Check all necessary block data is available before starting to sync by fjahr)
#29652 (wallet: Avoid potentially writing incorrect best block locator by ryanofsky)
#29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)
#29432 (Stratum v2 Template Provider (take 3) by Sjors)
#26966 (index: initial sync speedup, parallelize process by furszy)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

ryanofsky force-pushed on Feb 2, 2022

ryanofsky commented at 8:55 pm on February 2, 2022: contributor

Updated 1fcfc73ba4e8180a8667868529774ab7b302ab5d -> c5be433bf3f1a2b5b96cbe00608eff24136e6f51 (pr/indexy.2 -> pr/indexy.3, compare) fixing asan error https://cirrus-ci.com/task/5137882733084672 and lint error https://cirrus-ci.com/task/5208251477262336. Did not look into tsan failure https://cirrus-ci.com/task/4574932779663360 and multiprocess blockfilter_index_tests segfault https://cirrus-ci.com/task/4856407756374016, but I think they may be addressed by the asan fix

ryanofsky force-pushed on Feb 3, 2022

ryanofsky commented at 10:39 pm on February 3, 2022: contributor

Updated c5be433bf3f1a2b5b96cbe00608eff24136e6f51 -> 3ab1ebeff4328dc1a456a3323d8a41f0200d7be2 (pr/indexy.3 -> pr/indexy.4, compare) fixing blockfilter_index_initial_sync crash that I could not produce locally but I believe was causing CI failures https://cirrus-ci.com/task/6670649857933312, https://cirrus-ci.com/task/5755856183623680, and https://cirrus-ci.com/task/4981799997669376

in src/interfaces/chain.h:87 in dc4675ddac outdated

67@@ -67,6 +68,20 @@ class FoundBlock
68     mutable bool found = false;
69 };
70 
71+//! Block data sent with blockConnected, blockDisconnected notifications.
72+struct BlockInfo {
73+    const uint256& hash;
74+    const uint256* prev_hash = nullptr;

Sjors commented at 1:26 pm on February 8, 2022:

dc4675ddaca8d0d1d40847191cb21ccd116a6a3c : why bother with a pointer here? To make it consistent with phashBlock in CBlockIndex? I assume it’s not just to shave 31 bytes of the BlockInfo size?

ryanofsky commented at 4:45 pm on February 8, 2022:

re: #24230 (review)

dc4675d : why bother with a pointer here? To make it consistent with phashBlock in CBlockIndex? I assume it’s not just to shave 31 bytes of the BlockInfo size?

Yes, hash is a reference and prev_hash is a pointer to minimize copying and make the struct small. C++ doesn’t provide named arguments, so I’m defining the struct as a substitute for named arguments, to make block connected and disconnected declarations more readable and the argument list easier to change in the future.

The field types are the same as the argument types that would be used otherwise. const uint256& hash could be replaced by uint256 hash, and const uint256* prev_hash could be std::optional<uint256>, and const CBlock* could be shared_ptr<CBlock>, but the various block connected and disconnected method implementations don’t benefit from getting copies instead of references, or optional instead of pointers, or smart pointers instead of plain pointers, so I’m opting for the cheapest and simplest passing style in each case.

in src/index/base.cpp:287 in ae26ae5d3a outdated

349@@ -345,9 +350,9 @@ void BaseIndex::Interrupt()
350     m_interrupt();
351 }
352 
353-bool BaseIndex::Start(CChainState& active_chainstate)
354+bool BaseIndex::Start()
355 {
356-    m_chainstate = &active_chainstate;
357+    m_chainstate = &m_chain->context()->chainman->ActiveChainstate();

Sjors commented at 1:52 pm on February 8, 2022:

ae26ae5d3a1d32ce212d66477b71a5ac03354338: you’re using context() here, but src/interfaces/chain.h suggests that should only be done in a test. Does this need a TODO? Is this resolved in one of the followup PRs?

ryanofsky commented at 5:56 pm on February 8, 2022:

re: #24230 (review)

ae26ae5: you’re using context() here, but src/interfaces/chain.h suggests that should only be done in a test. Does this need a TODO? Is this resolved in one of the followup PRs?

Good catch. Now removed this in the last commit and added comment about it being temporary

in src/index/blockfilterindex.cpp:219 in 515a1dbce2 outdated

216     CBlockUndo block_undo;
217     uint256 prev_header;
218 
219-    if (pindex->nHeight > 0) {
220+    if (block.height > 0) {
221+        const CBlockIndex* pindex = WITH_LOCK(cs_main, return m_chainstate->m_blockman.LookupBlockIndex(block.hash));

Sjors commented at 2:14 pm on February 8, 2022:

515a1dbce29078a1e90dc3f97bf25b309face9bb: this might warrant a TODO (or note in the commit message) that this CBlockIndex is going away in a later commit.

ryanofsky commented at 5:57 pm on February 8, 2022:

re: #24230 (review)

515a1db: this might warrant a TODO (or note in the commit message) that this CBlockIndex is going away in a later commit.

Good suggestion, added comments to clarify

in src/index/base.h:40 in 15398b1677 outdated

29@@ -28,7 +30,7 @@ struct IndexSummary {
30  * CValidationInterface and ensures blocks are indexed sequentially according
31  * to their position in the active chain.
32  */
33-class BaseIndex : public CValidationInterface

Sjors commented at 2:29 pm on February 8, 2022:

15398b1677362f4d98ffc786f1ae9976ad595ea8: comment above needs update

ryanofsky commented at 5:58 pm on February 8, 2022:

re: #24230 (review)

15398b1: comment above needs update

Good catch, updated

in src/interfaces/chain.h:84 in 15398b1677 outdated

80@@ -81,6 +81,8 @@ struct BlockInfo {
81     unsigned undo_pos = 0;
82     const CBlock* data = nullptr;
83     const CBlockUndo* undo_data = nullptr;
84+    //! Block is from the tip of the chain (always true except during initial sync)

Sjors commented at 2:46 pm on February 8, 2022:

15398b1677362f4d98ffc786f1ae9976ad595ea8 nit: “and reindex” (I haven’t checked if re-index nukes all the indexes, or if it leaves them alone until the previous best block is reached again)

ryanofsky commented at 5:58 pm on February 8, 2022:

re: #24230 (review)

15398b1 nit: “and reindex” (I haven’t checked if re-index nukes all the indexes, or if it leaves them alone until the previous best block is reached again)

Rephrased the comment to avoid the word sync. This isn’t referring to IBD or reindexing globally. It’s just referring to notifications from the attachChain / ThreadSync code that runs when the index is loaded. For example if we added an RPC method to start and stop indexes. chain_tip would be false for the blocks read after the index was first started, until the sync thread caught up. This would be independent of whether node itself was syncing to the network.

Sjors commented at 3:31 pm on February 8, 2022: member

Some questions and comments while reviewing the first 8 commits.

My brain got friend while trying to understand 15398b1677362f4d98ffc786f1ae9976ad595ea8, even with --color-moved and probably because the original logic is also a bit confusing. So I’ll try that again later.

ryanofsky force-pushed on Feb 9, 2022

ryanofsky commented at 1:10 am on February 9, 2022: contributor

Updated 3ab1ebeff4328dc1a456a3323d8a41f0200d7be2 -> d27072d58207b2648d8f1bf273389aee86db3001 (pr/indexy.4 -> pr/indexy.5, compare) with various suggested changes and a few other tweaks, all minor

My brain got friend while trying to understand 15398b1, even with --color-moved and probably because the original logic is also a bit confusing. So I’ll try that again later.

This is good feedback. I split the commit up into two parts, so the boilerplate code moving part is separate from the other changes which are a little less obvious.

ryanofsky commented at 1:10 am on February 9, 2022: contributor

Thanks for the review!

ryanofsky force-pushed on Feb 9, 2022

ryanofsky commented at 3:33 am on February 9, 2022: contributor

Updated d27072d58207b2648d8f1bf273389aee86db3001 -> 2acbc24a0ebcb2dc9f73f5a18964e796507e8ab5 (pr/indexy.5 -> pr/indexy.6, compare) to fix ubsan error https://cirrus-ci.com/task/5606579763412992?logs=ci#L4848

in src/node/chain.cpp:17 in 1131035134 outdated

12+    interfaces::BlockInfo info{index ? *index->phashBlock : uint256::ZERO};
13+    if (index) {
14+        info.prev_hash = index->pprev ? index->pprev->phashBlock : nullptr;
15+        info.height = index->nHeight;
16+        LOCK(::cs_main);
17+        info.file_number = index->nStatus & (BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO) ? index->nFile : -1;

Sjors commented at 1:37 pm on February 9, 2022:

11310351345e844ae22f9bc00bed00eff7197500 : this constraint of BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO makes sense in the context of indexes, because an index is updated after a block is connected, at which point it will always have undo data. But if we ever need it in other places, it seems a bit odd. Maybe add a comment? Alternatively maybe we should just add nStatus or bools for the individual flags we care about?

ryanofsky commented at 8:41 pm on February 9, 2022:

1131035 : this constraint of BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO makes sense in the context of indexes, because an index is updated after a block is connected, at which point it will always have undo data. But if we ever need it in other places, it seems a bit odd. Maybe add a comment? Alternatively maybe we should just add nStatus or bools for the individual flags we care about?

I guess I’ll drop this. I assumed one wouldn’t be present without the other, and was basing it on the thing CDiskBlockIndex is doing. Most intermediate code which uses this is deleted in the end anyway.

in src/interfaces/chain.h:46 in ade3f4306b outdated

37@@ -38,6 +38,12 @@ namespace interfaces {
38 class Handler;
39 class Wallet;
40 
41+//! Hash/height pair to help track and identify blocks.
42+struct BlockKey {

Sjors commented at 1:48 pm on February 9, 2022:

ade3f4306bba6e40197d645774097c78a5e0681a: this is a nice addition, e.g. block->hash is more readable than before when block would be a hash.

Sjors commented at 2:19 pm on February 9, 2022: member

In b0220aa8bb0ab3164415734e47c57a2f5204a97c you’re getting rid of BaseIndex::Init(), moving its prune checking code to BaseIndex::Start(), other parts to blockConnected and still more to checkBlocks() and attachChain() in interface.cpp. Is there way to do that across multiple commits?

I also find the commit description “Remove pruned data checking code” ambiguous, since it’s not being removed at all. It’s not even moved out of the indexer.

ryanofsky force-pushed on Feb 9, 2022

ryanofsky commented at 9:05 pm on February 9, 2022: contributor

Updated 2acbc24a0ebcb2dc9f73f5a18964e796507e8ab5 -> 9acc7fc39d89e31ea6b46c9fda4c539b82cced8c (pr/indexy.6 -> pr/indexy.7, compare) tweaking commit description and reverting status flags change.

In b0220aa you’re getting rid of BaseIndex::Init(), moving its prune checking code to BaseIndex::Start(), other parts to blockConnected and still more to checkBlocks() and attachChain() in interface.cpp. Is there way to do that across multiple commits?

I also find the commit description “Remove pruned data checking code” ambiguous, since it’s not being removed at all. It’s not even moved out of the indexer.

Renamed to just say it’s removing index Init method. I wonder if you are reviewing the code with –color-moved? The bulk of the init function is checking for pruned blocks and is moving unchanged. The other part is just initializing m_best_block_index and m_synced members.

It might be possible to move each of those those two parts of the Init function in different commits, but this would no longer be atomically setting m_best_block_index and m_synced and checking for pruned blocks with cs_main held. Also attachChain would have weird functionality in the interim.

I wonder if there is something specific which is confusing about this commit, or if you just don’t like the boilerplate it adds? Calling attachChain to call blockConnected to set m_synced and m_best_block_index is obviously a more roundabout way to set m_synced and m_best_block_index, but it does actually make sense later as the blockConnected method is filled out in the following commits.

Sjors commented at 10:38 am on February 10, 2022: member

I did review with --color-moved=dimmed_zebra, though without dimmed_zebra it’s slightly more clear and a bigger monitor helps too.

It helped to notice that Init() is only called from Start(). The first four lines that set locator are just moved.

The two lines after that, which obtain the lock and set active_chain are turned into a new function checkBlocks() which is called from another new function attachChain(). The end of the original Init() function comes back in that the only way attachChain can return null is with a prune violation. The middle part is more complicated.

One job of Init() was to set m_best_block_index and m_synced. That task has been moved to blockConnected(), which is called once by attachChain(). That part is a bit hard to follow, though it does look right. Part of the problem is that you need to understand the original logic.

Once those two are set, it continue on to find a prune violation. That part is a clear move to checkBlocks.

You might be able to introduce checkBlocks() in a commit before this. If wouldn’t make things simpler, but the diff would be more focussed on the boiler plate changes.

I wonder if there is something specific which is confusing about this commit, or if you just don’t like the boilerplate it adds?

I’m probably fine with the end result; we’ll see when I get to later commits, but I’m not worried about that. It’s indeed the change itself that’s a bit confusing.

2504811223597b211aa52b86a6386f4c02286d50: although I can follow along fine, it might make sense to split out the IgnoreBlockConnected and IgnoreChainStateFlushed helpers first (instead of “framing” it as a rename)

in src/index/base.cpp:436 in 2504811223 outdated

397         m_thread_sync.join();
398     }
399+
400+    // Call handler destructor after releasing m_mutex, because notification
401+    // threads and disconnect code may need to lock the mutex.
402+    auto handler = WITH_LOCK(m_mutex, return std::move(m_handler));

Sjors commented at 12:36 pm on February 10, 2022:

2504811223597b211aa52b86a6386f4c02286d50 : IIUC this makes it so the Handler that’s in m_handler (before its moved) is destroyed when this function is done. Why is it insufficient to just call m_handler->disconnect() and have it be destroyed along with BaseIndex?

I guess that’s what “because … disconnect code may need to lock the mutex” refers to, but it’s not super clear to me why this is.

Related to the next d148f9e2abc05c83b60f1d92bcd7884174fa9ae3 where you assert that m_handler really needs to be gone by the time destructor on BaseIndex is called.

ryanofsky commented at 3:45 pm on February 10, 2022:

re: #24230 (review)

2504811 : IIUC this makes it so the Handler that’s in m_handler (before its moved) is destroyed when this function is done. Why is it insufficient to just call m_handler->disconnect() and have it be destroyed along with BaseIndex?

I guess that’s what “because … disconnect code may need to lock the mutex” refers to, but it’s not super clear to me why this is.

Thanks, wrote a more descriptive comment. The disconnect code waits for the last notification to be processed, so if the last notification need to lock m_mutex for any reason there would be a deadlock if this code wasn’t releasing it.

ryanofsky commented at 3:10 pm on February 13, 2022:

re: #24230 (review)

Thanks, wrote a more descriptive comment.

Actually fixed now. I messed up the last push and did not include this change.

in src/index/base.cpp:256 in 0cefaaab90 outdated

253@@ -254,7 +254,7 @@ bool BaseIndex::Rewind(const CBlockIndex* current_tip, const CBlockIndex* new_ti
254     // In case we reorg beyond the pruned depth, ReadBlockFromDisk would
255     // throw and lead to a graceful shutdown
256     m_best_block_index = new_tip;

Sjors commented at 12:49 pm on February 10, 2022:

0cefaaab906e3de84a4e2c5a15e27e56255e17ac: I guess you can now defer this until after commit()?

ryanofsky commented at 3:45 pm on February 10, 2022:

re: #24230 (review)

0cefaaa: I guess you can now defer this until after commit()?

I guess you’re suggesting this code could commit first and update m_best_block_index later. I think this may be true currently, but I don’t think it’s necessarily safe to assume that other indexing code (particularly in CustomInit methods) doesn’t care about the m_best_block_index value or won’t depend on it in the future.

Anyway, the indexing commit code is pretty low quality–it’s bad about creating multiple batches, and sharing state inappropriately, and not being atomic–but I’m changing it as little as possible because it’s mostly orthogonal to this PR. I do think the PR should make it a little easier to improve committing in the future since at least now it’s passing the locator explicitly and it changes the base Commit method to call the custom Commit method instead of the other way around, so the control flow is simpler.

in src/index/coinstatsindex.cpp:271 in d6ebc941f6 outdated

296-            ReverseBlock(block, iter_tip);
297-
298-            iter_tip = iter_tip->GetAncestor(iter_tip->nHeight - 1);
299-        } while (new_tip_index != iter_tip);
300-    }
301+    ReverseBlock(block);

Sjors commented at 1:07 pm on February 10, 2022:

d6ebc941f641d68eabe4c3243eb6769dd2d13cf5: this refactor of CustomRewind from iterating over blocks to CustomRemove which just takes a single block might be worth doing in a separate commit.

ryanofsky commented at 3:46 pm on February 10, 2022:

re: #24230 (review)

d6ebc94: this refactor of CustomRewind from iterating over blocks to CustomRemove which just takes a single block might be worth doing in a separate commit.

~~Good suggestion, split this commit~~

ryanofsky commented at 5:46 am on February 15, 2022:

re: #24230 (comment)

d6ebc94: this refactor of CustomRewind from iterating over blocks to CustomRemove which just takes a single block might be worth doing in a separate commit.

I tried this but couldn’t actually figure out how how to do it in a good way. I updated the commit description and I want the commit to reflect fact that block read and block undo read is moving from custom indexes up to the BaseIndex::Rewind. If I add the loop in base class without removing it from the custom indexes, it’s less clear that code is just moving.

in src/index/base.cpp:79 in b8be01a7ef outdated

75@@ -75,17 +76,28 @@ void BaseIndexNotifications::blockConnected(const interfaces::BlockInfo& block)
76     if (!block.data || m_index.IgnoreBlockConnected(block)) return;
77 
78     const CBlockIndex* best_block_index = m_index.m_best_block_index.load();
79-    if (best_block_index != pindex->pprev && !m_index.Rewind(best_block_index, pindex->pprev)) {
80+    if (block.chain_tip && best_block_index != pindex->pprev && !m_index.Rewind(best_block_index, pindex->pprev)) {

Sjors commented at 1:27 pm on February 10, 2022:

b8be01a7ef20672fd5faf4f6b82cf5d3fa2a717b : Why this extra condition?

ryanofsky commented at 3:46 pm on February 10, 2022:

re: #24230 (review)

b8be01a : Why this extra condition?

Good catch. This change makes more sense in the earlier commit adding this code, so I moved it. (It wasn’t needed before this commit just because IgnoreBlockConnected was always returning true in this case before this commit.)

in src/index/base.cpp:91 in b8be01a7ef outdated

88+    CBlockUndo block_undo;
89+    if (m_options.connect_undo_data && pindex->nHeight > 0) {
90+        if (!node::UndoReadFromDisk(block_undo, pindex)) {
91+            FatalError("%s: Failed to read block %s undo data from disk",
92+                       __func__, pindex->GetBlockHash().ToString());
93+            return m_index.Interrupt();

Sjors commented at 1:31 pm on February 10, 2022:

b8be01a7ef20672fd5faf4f6b82cf5d3fa2a717b This seems new…

ryanofsky commented at 3:46 pm on February 10, 2022:

re: #24230 (review)

b8be01a This seems new…

This is not actually new, it’s moved from BlockFilterIndex::CustomAppend and CoinStatsIndex::CustomAppend below. The different UndoReadFromDisk calls are being consolidated here

in src/index/base.cpp:155 in b8be01a7ef outdated

 99                    __func__, pindex->GetBlockHash().ToString());
100-        return;
101-    } else {
102+        return m_index.Interrupt();
103+    }
104+    if (m_index.m_synced) {

Sjors commented at 1:33 pm on February 10, 2022:

b8be01a7ef20672fd5faf4f6b82cf5d3fa2a717b : this also seems new. Maybe belongs in the next commit?

ryanofsky commented at 4:06 pm on February 10, 2022:

re: #24230 (review)

b8be01a : this also seems new. Maybe belongs in the next commit?

This if (m_synced) condition just preserves existing behavior. Now that ThreadSync is calling blockConnected, m_synced is no longer always true at this point. For some reason I don’t know (but don’t want to change) ThreadSync code didn’t previously update m_best_block_index while it was reading and connecting blocks, and I didn’t want to change that. This behavior might have been intended for performance reasons, or error handling reasons, or maybe so RPC indexing status would only report best block of indexing data that had actually been flushed to disk. But in any case nothing should be changing here.

ryanofsky commented at 1:27 am on February 11, 2022:

re: #24230 (review)

b8be01a : this also seems new. Maybe belongs in the next commit?

Maybe I should add a comment here like “Don’t update m_best_block_index between flushes if not synced. Unclear why best block is not updated in this case, but this has been longstanding behavior since syncing was introduced in #13033 so care should be taken if changing this.”

ryanofsky commented at 3:20 pm on February 13, 2022:

re: #24230 (review)

Maybe I should add a comment

Added now

mzumsande commented at 9:30 pm on February 15, 2022:

My impression is that in the old code, the ThreadSync the strategy was to update m_best_block_index exactly before calling Commit() (and using pindex instead of m_best_block_index for everything else). So updating it after each WriteBlock call was just unnecessary.

ryanofsky commented at 4:59 am on February 16, 2022:

re: #24230 (review)

My impression is that in the old code, the ThreadSync the strategy was to update m_best_block_index exactly before calling Commit() (and using pindex instead of m_best_block_index for everything else). So updating it after each WriteBlock call was just unnecessary.

I don’t think this implies it’s unnecessary unless you are assuming ChainStateFlushed (which calls Commit) is always called immediately after BlockConnected, which I think may not always be the case (i.e. during IBD). I’d actually be more worried about removing the m_best_block_index assignment here than I would be about assigning it unconditionally. I’d be happy to review another PR which changes this (and happy to rebase this PR on top) but in general I would like to not change m_best_block_index semantics in this PR unless it really helps things and we can be very sure it’s safe.

mzumsande commented at 7:45 pm on February 18, 2022:

I meant “updating it after each WriteBlock in the ThreadSync phase” is unnecessary. During the Sync phase, validation callbacks are ignored, and ThreadSync does the work - considering that the ThreadSync code in master makes sure that m_best_block_index is updated right before each possible Commit() call, and never used otherwise (pindex is used for that), I think that an earlier update of m_best_block_index after the WriteBlock call within ThreadSync is just not necessary. So I was just trying to explain why the longstanding behavior in master makes sense to me. I’m not suggesting to change anything here.

ryanofsky commented at 8:45 pm on February 18, 2022:

Ok. Current behavior in master (which should not be changed here) still does not make sense to me. I don’t think it’s wrong (though it could be wrong), I just mean it seems inconsistent and strange.

in src/index/base.cpp:203 in e359c2666c outdated

202-                    // No need to handle errors in Commit. See rationale above.
203-                    Commit(GetLocator(*m_chain, pindex->GetBlockHash()));
204+                    assert(pindex);
205+                    notifications->blockConnected(node::MakeBlockInfo(pindex));
206+                    assert(m_synced);
207+                    notifications->chainStateFlushed(GetLocator(*m_chain, pindex->GetBlockHash()));

Sjors commented at 1:57 pm on February 10, 2022:

e359c2666cfd205c2ea84c241bccf565670365c0: I’m a bit puzzled about what chainStateFlushed is for in the context of indexing. Right now it doesn’t do much other than call Commit(). But we weren’t calling it here before.

The commit title says “… to chainStateFlushed” but in reality that function is untouched and only blockConnected is changed.

ryanofsky commented at 3:46 pm on February 10, 2022:

re: #24230 (review)

e359c26: I’m a bit puzzled about what chainStateFlushed is for in the context of indexing.

It just calls Commit() to update the best block locator on disk and flush whatever custom state the index may want to save in CustomCommit().

Right now it doesn’t do much other than call Commit(). But we weren’t calling it here before.

In commit e359c2666cfd205c2ea84c241bccf565670365c0, previous line 201 Commit(...); is changing to notifications->chainStateFlushed(...); so there should be preserving behavior

The commit title says “… to chainStateFlushed” but in reality that function is untouched and only blockConnected is changed.

I’m happy to change commit subject and I made some tweaks, but it would help to have more specific suggestions to know what to change. The format of these commits subject lines generally is “index: Remove [part of indexing code that doesn’t belong in indexing]” or “index: Move [part of indexing code out of the code that will be moved to the node later]”, so if I rename this subject line I might need to rename the other “index: Move” subject lines too.

It is often hard to describe changes commits are making in just a few words. The subject lines here are trying to describe the goal of each commit, and the commit bodies are trying to itemize the actual changes. A lot of the changes in different commits are pretty similar to each other, so the subject lines are also trying focus on what makes commits distinct from each other.

jamesob commented at 8:02 pm on March 22, 2022:

I guess this line could technically result in a double Commit() call at some point (based on blockConnected -> m_last_locator_write_time conditional), but even if that’s true it would only happen once per locator-sync-interval AFAICT. So not a big deal.

ryanofsky commented at 9:21 pm on March 25, 2022:

In commit “indexes, refactor: Move Commit logic out of ThreadSync to notification handlers” (7add4807d84e89daec20042559b076f9941b9c1a)

re: #24230 (review)

I guess this line could technically result in a double Commit() call at some point (based on blockConnected -> m_last_locator_write_time conditional), but even if that’s true it would only happen once per locator-sync-interval AFAICT. So not a big deal.

This shouldn’t happen because the new notifications->blockConnected call is not passing any block data (block.data == nullptr, so the blockConnected implementation just updates m_synced and m_best_block_index and doesn’t do anything else with the block).

I added more documentation about different types of blockConnected calls in interfaces/chain.h but the idea is that a blockConnected call with block.data != nullptr provides information about a new block that is being connected and needs to be processed, and a call with block.data == nullptr just tells you the last block that was already connected, which is useful at the beginning or end of a sync.

TheCharlatan commented at 9:26 am on December 28, 2023:

In commit 94fa723893a5acd2bdb69e859b76675f2d615b61: This still feels a bit convoluted. The way I read this, this now adds a call to IgnoreChainstateFlushed. Why not move the commit call to the new blockConnected code block as well?

ryanofsky commented at 4:25 pm on March 13, 2024:

re: #24230 (review)

In commit 94fa723: This still feels a bit convoluted. The way I read this, this now adds a call to IgnoreChainstateFlushed. Why not move the commit call to the new blockConnected code block as well?

The point of index blockConnected method is to add new data from blocks to the index, and the point of the chainStateFlushed method is to commit updated state to the database. In this one specific place, blockConnected and chainStateFlushed method are both called together, but normally they are called separately, so committing index data in blockConnected would be change in behavior and bad for performance.

It is true that adding a call to chainStateFlushed here goes through IgnoreChainstateFlushed logic instead of committing unconditionally, and this adds some temporary complexity. But the ignore method is dropped later, and ultimately indexing code becomes simpler because it no longer has to have special case code dealing with syncing. It can handle block connected and flush events the same whether it is in the sync phase or not.

Sjors commented at 1:59 pm on February 10, 2022: member

Reviewed a couple more commits.

ryanofsky force-pushed on Feb 11, 2022

ryanofsky commented at 1:16 am on February 11, 2022: contributor

Thanks again for the review and suggestions!

Updated 9acc7fc39d89e31ea6b46c9fda4c539b82cced8c -> 3c9fa85865c3d766b367e6f2842928112d362fc1 (pr/indexy.7 -> pr/indexy.8, compare) just making internal changes to commits and commit descriptions. Final code is the same except for a new const modifier.

re: #24230 (comment)

You might be able to introduce checkBlocks() in a commit before this.

Thanks, this is a good suggestion. I moved it to a new earlier commit.

2504811: although I can follow along fine, it might make sense to split out the IgnoreBlockConnected and IgnoreChainStateFlushed helpers first (instead of “framing” it as a rename)

I’m actually not sure what suggestion would be but definitely feel free to tell me what to change or send a commit (in case it is easier to show the change than explain it)

fjahr commented at 12:53 pm on February 13, 2022: contributor

Concept ACK, will start reviewing soon

uvhw referenced this in commit 47d44ccc3e on Feb 14, 2022

ryanofsky force-pushed on Feb 15, 2022

ryanofsky commented at 5:51 am on February 15, 2022: contributor

Updated 3c9fa85865c3d766b367e6f2842928112d362fc1 -> 7ae0a4639d25e31eb1d9bd1e01d93cadc3e90a6c (pr/indexy.8 -> pr/indexy.9, compare) with no code changes, just adding a few comments and more information to some commit messages

in src/interfaces/chain.h:78 in dc4675ddac outdated

73+    const uint256& hash;
74+    const uint256* prev_hash = nullptr;
75+    int height = -1;
76+    int file_number = -1;
77+    unsigned data_pos = 0;
78+    unsigned undo_pos = 0;

mzumsande commented at 3:12 pm on February 15, 2022:

commit dc4675ddaca8d0d1d40847191cb21ccd116a6a3c: Looks like undo_pos is never being used.

ryanofsky commented at 4:55 am on February 16, 2022:

re: #24230 (review)

commit dc4675d: Looks like undo_pos is never being used.

Great catch! Removed

in src/index/base.cpp:225 in f631a1b16a outdated

219@@ -211,19 +220,16 @@ void BaseIndex::ThreadSync()
220 bool BaseIndex::Commit()
221 {
222     CDBBatch batch(GetDB());
223-    if (!CommitInternal(batch) || !GetDB().WriteBatch(batch)) {
224+    bool success = CustomCommit(batch);
225+    if (success) {
226+        GetDB().WriteBestBlock(batch, GetLocator(*m_chain, m_best_block_index.load()->GetBlockHash()));

mzumsande commented at 3:26 pm on February 15, 2022:

f631a1b16ac2f58f47942c7cf51fbe8445694f65: I think that this might be a slight change in behavior: If m_best_block_index is nullptr here (which is possible if we get an interrupt early in init), this will segfault now. Though the old behavior (that a locator to the tip was returned) was not ideal either, see #24117.

ryanofsky commented at 4:56 am on February 16, 2022:

re: #24230 (review)

f631a1b: I think that this might be a slight change in behavior: If m_best_block_index is nullptr here (which is possible if we get an interrupt early in init), this will segfault now. Though the old behavior (that a locator to the tip was returned) was not ideal either, see #24117.

Absolutely. Rebasing on your fix from #24117, I believe this should be fixed here as well.

DrahtBot added the label Needs rebase on Feb 15, 2022

in src/index/base.cpp:252 in 3450df4d01 outdated

249-        return false;
250+    const Consensus::Params& consensus_params = Params().GetConsensus();
251+    CBlock block;
252+    CBlockUndo block_undo;
253+
254+    for (const CBlockIndex* iter_tip = current_tip; iter_tip != new_tip; iter_tip = iter_tip->pprev) {

mzumsande commented at 9:10 pm on February 15, 2022:

3450df4d01608bd29f443aab3df1eed45108b3b2: I think this incremental processing will mean that in case of a deeper reorg, intermediate blocks are processed twice in CopyHeightIndexToHashIndex now, as new_tip and current_tip. Also, there willl be some extra calls to ReadBlockFromDisk also for txindex now (which doesn’t implement CustomRemove logic). But this seems fine to me.

ryanofsky commented at 4:57 am on February 16, 2022:

re: #24230 (review)

3450df4: I think this incremental processing will mean that in case of a deeper reorg, intermediate blocks are processed twice in CopyHeightIndexToHashIndex now, as new_tip and current_tip.

Yes, exactly. I fixed commit message to describe this instead of saying “This commit does not change behavior in any way”

Also, there willl be some extra calls to ReadBlockFromDisk also for txindex now (which doesn’t implement CustomRemove logic). But this seems fine to me.

Oh, this part was not intentional. Changed to remove these block reads. They are wasteful for BlockFilterIndex as well as TxIndex

in src/index/base.h:143 in 3450df4d01 outdated

117@@ -118,7 +118,7 @@ class BaseIndex
118 
119     /// Rewind index to an earlier chain tip during a chain reorg. The tip must
120     /// be an ancestor of the current best block.
121-    [[nodiscard]] virtual bool CustomRewind(const interfaces::BlockKey& current_tip, const interfaces::BlockKey& new_tip) { return true; }
122+    [[nodiscard]] virtual bool CustomRemove(const interfaces::BlockInfo& block) { return true; }

mzumsande commented at 9:14 pm on February 15, 2022:

3450df4d01608bd29f443aab3df1eed45108b3b2: The comment above could be adjusted, CustomRemove now rewinds just one block.

ryanofsky commented at 4:58 am on February 16, 2022:

re: #24230 (review)

3450df4d01608bd29f443aab3df1eed45108b3b2: The comment above could be adjusted, CustomRemove now rewinds just one block.

Thanks, now noted in comment

in src/index/base.cpp:98 in 36a9a550de outdated

93@@ -92,6 +94,12 @@ void BaseIndexNotifications::blockConnected(const interfaces::BlockInfo& block)
94         }
95         block_info.undo_data = &block_undo;
96     }
97+    int64_t current_time = GetTime();
98+    if (m_last_log_time + SYNC_LOG_INTERVAL < current_time) {

mzumsande commented at 9:36 pm on February 15, 2022:

36a9a550debc7324954765e60aefc33ab1456e5c: This should probably be conditional on !m_index.m_synced, otherwise there would be additional log messages after the index is synced (after most new blocks, since SYNC_LOG_INTERVAL=30s).

ryanofsky commented at 5:00 am on February 16, 2022:

re: #24230 (review)

36a9a55: This should probably be conditional on !m_index.m_synced, otherwise there would be additional log messages after the index is synced (after most new blocks, since SYNC_LOG_INTERVAL=30s).

Yes, good catch. Added this

in src/index/base.cpp:363 in f56a851e11 outdated

359@@ -351,10 +360,12 @@ bool BaseIndex::IgnoreChainStateFlushed(const CBlockLocator& locator)
360     // event, log a warning and let the queue clear.
361     const CBlockIndex* best_block_index = m_best_block_index.load();
362     if (best_block_index->GetAncestor(locator_tip_index->nHeight) != locator_tip_index) {
363-        LogPrintf("%s: WARNING: Locator contains block (hash=%s) not on known best " /* Continued */
364-                  "chain (tip=%s); not writing index locator\n",
365-                  __func__, locator_tip_hash.ToString(),
366-                  best_block_index->GetBlockHash().ToString());
367+        if (m_synced) {

mzumsande commented at 10:06 pm on February 15, 2022:

f56a851e11321ede9c5bcb43cd067c17f9efab39: Adding the condition if (m_synced) seems unnecessary at this point, IgnoreChainStateFlushed returns right at the beginning if not synced.

ryanofsky commented at 5:01 am on February 16, 2022:

re: #24230 (review)

f56a851: Adding the condition if (m_synced) seems unnecessary at this point, IgnoreChainStateFlushed returns right at the beginning if not synced.

Oh, this is actually correct, but the check above should have been removed. The syncing here get simpler after the “Rewrite chain sync logic, remove racy init” commit, but right now chainStateFlushed notification events from the validation interface (not the sync thread) can happen at random times while the sync thread is still syncing, so the check avoids printing in this case while still not committing anything.

in src/index/base.cpp:144 in f56a851e11 outdated

138@@ -125,6 +139,25 @@ void BaseIndexNotifications::blockConnected(const interfaces::BlockInfo& block)
139 
140 void BaseIndexNotifications::blockDisconnected(const interfaces::BlockInfo& block)
141 {
142+    // During initial sync, ignore validation interface notifications, only
143+    // process notifications from sync thread.
144+    if (!m_index.m_synced && !block.chain_tip) return;

mzumsande commented at 10:51 pm on February 15, 2022:

f56a851e11321ede9c5bcb43cd067c17f9efab39: why the additional !block.chain_tip condition? I’d assume if we happen to experience a reorg during initial sync and disconnect the tip, block.chain_tip=true and we don’t return, but we wouldn’t want to process this. Possibly related: If, at this commit, I am in the middle of syncing an index and at the same time call invalidateblock for the tip, my node coredumps with “leveldb/db/db_iter.cc:65: virtual leveldb::Slice leveldb::{anonymous}::DBIter::key() const: Assertion valid_’ failed.`”

ryanofsky commented at 5:01 am on February 16, 2022:

re: #24230 (review)

f56a851: why the additional !block.chain_tip condition?

Oh no. This is just a bug. It should say if (!m_index.m_synced && block.chain_tip). During sync notifications can come from two different sources;

1 - The sync thread (with chain_tip = false) 2 - The validation interface (with chain_tip = true)

We just want to ignore sync notifications from the validation interface like the comment above says.

I’d assume if we happen to experience a reorg during initial sync and disconnect the tip, block.chain_tip=true and we don’t return, but we wouldn’t want to process this.

Most of the time we won’t want to process this but sometimes we might if the reorg happens right at the end of the sync. We just want to process this whenever being called from the sync thread, and not process it when not called from the sync thread during sync. After sync, we always want to process it.

Possibly related: If, at this commit, I am in the middle of syncing an index and at the same time call invalidateblock for the tip, my node coredumps with “leveldb/db/db_iter.cc:65: virtual leveldb::Slice leveldb::{anonymous}::DBIter::key() const: Assertion valid_’ failed.`”

Interesting, it does sound related. I would be curious to know if is fixed now. And maybe I could make this into a test. I’m curious if you made changes to somehow pause the sync to emulate a reorg happening at the same time?

mzumsande commented at 7:54 pm on February 18, 2022:

Interesting, it does sound related. I would be curious to know if is fixed now. And maybe I could make this into a test. I’m curious if you made changes to somehow pause the sync to emulate a reorg happening at the same time?

Yes, I don’t encounter this anymore now. What I did was manual testing on regtest (just adding a std::this_thread::sleep_for to ThreadSync, giving me the possibility to time my invalidateblock) - I don’t know if it is possible to somehow make an automated test for this that is not racy.

mzumsande commented at 10:55 pm on February 15, 2022: contributor

Reviewed up to commit f56a851e11321ede9c5bcb43cd067c17f9efab39 (“Move Rewind logic out of Rewind to blockDisconnected and ThreadSync”), below some comments.

ryanofsky force-pushed on Feb 16, 2022

ryanofsky commented at 11:33 pm on February 16, 2022: contributor

Great feedback! Thanks for the reviews!

Rebased 7ae0a4639d25e31eb1d9bd1e01d93cadc3e90a6c -> 72c29827f8adbc988be26c2002a8feedf9adc15a (pr/indexy.9 -> pr/indexy.10, compare) due to conflict with #24117, also making various changes suggested during review

DrahtBot removed the label Needs rebase on Feb 17, 2022

in src/interfaces/chain.h:146 in fb164ac88e outdated

133@@ -134,6 +134,9 @@ class Chain
134     //! pruned), and contains transactions.
135     virtual bool haveBlockOnDisk(int height) = 0;
136 
137+    //! Get tip information.
138+    virtual bool getTip(const FoundBlock& block={}) = 0;

mzumsande commented at 10:23 pm on February 21, 2022:

fb164ac88e8e5f7b7bb448637a06da6ae299e578 I think getTip() here and in interfaces.cpp could move to the next commit (“Remove SyncWithValidationInterfaceQueue call”) where they are used.

ryanofsky commented at 4:14 am on March 11, 2022:

re: #24230 (review)

fb164ac I think getTip() here and in interfaces.cpp could move to the next commit

Good catch. Moved this.

in src/index/base.cpp:232 in cec2014245 outdated

230+bool BaseIndex::Commit(const CBlockLocator& locator)
231 {
232     // Don't commit anything if we haven't indexed any block yet
233-    // (this could happen if init is interrupted).
234-    if (m_best_block_index == nullptr) {
235+    // (this could happen if init is interrupted).A

mzumsande commented at 10:28 pm on March 3, 2022:

cec20142450117a40e64a77b8447e0f6294ec9e9: remove “A”

ryanofsky commented at 4:15 am on March 11, 2022:

re: #24230 (review)

cec20142450117a40e64a77b8447e0f6294ec9e9: remove “A”

Thanks, fixed this.

in src/index/base.cpp:400 in c1bdcdf338 outdated

441@@ -426,11 +442,8 @@ bool BaseIndex::Start()
442         LOCK(m_mutex);
443         m_notifications = std::move(notifications);
444         m_handler = std::move(handler);
445-    }
446-
447-    const CBlockIndex* index = m_best_block_index.load();
448-    if (!CustomInit(index ? std::make_optional(interfaces::BlockKey{index->GetBlockHash(), index->nHeight}) : std::nullopt)) {

mzumsande commented at 7:22 pm on March 5, 2022:

c1bdcdf3388792b7c17e576170167f0be883fa4a: The commit message (“Move Commit and…”) doesn’t describe what’s being done here, maybe something like “Move CustomInit out of Start and error handling code out of ThreadSync to notification handlers”?

ryanofsky commented at 4:17 am on March 11, 2022:

re: #24230 (review)

c1bdcdf: The commit message (“Move Commit and…”) doesn’t describe what’s being done here, maybe something like “Move CustomInit out of Start and error handling code out of ThreadSync to notification handlers”?

Thanks, fixed this now.

in src/index/base.cpp:446 in c1bdcdf338 outdated

446-
447-    const CBlockIndex* index = m_best_block_index.load();
448-    if (!CustomInit(index ? std::make_optional(interfaces::BlockKey{index->GetBlockHash(), index->nHeight}) : std::nullopt)) {
449-        return false;
450+        assert(m_notifications->m_init_result);
451+        if (!m_notifications->m_init_result) return false;

mzumsande commented at 7:40 pm on March 5, 2022:

c1bdcdf3388792b7c17e576170167f0be883fa4a: why both an assert and a conditional? Or did you mean m_init_result.value() in the second line? Maybe also consider being explicit (has_value()) for the std::optional<bool> in the assert to avoid confusion.

ryanofsky commented at 4:17 am on March 11, 2022:

re: #24230 (review)

c1bdcdf3388792b7c17e576170167f0be883fa4a: why both an assert and a conditional? Or did you mean m_init_result.value() in the second line? Maybe also consider being explicit (has_value()) for the std::optional<bool> in the assert to avoid confusion.

Good catch. I was intending to use * operator, but I followed your suggestion instead since that seems more readable.

Note: bug here was wasn’t actually present in the full PR. Bug was temporarily introduced here in commit c1bdcdf3388792b7c17e576170167f0be883fa4a, but fixed by later commit fb164ac88e8e5f7b7bb448637a06da6ae299e578

mzumsande commented at 8:43 pm on March 5, 2022: contributor

Concept ACK, made it through all commits

Some minor comments below.

I did some light testing for the final version, and one thing I noticed that the Index Sync now seems to block everything else and makes it impossible to abort gracefully: The wallet is not loaded (so that RPC calls are disabled), and also interrupting bitcoind with SIGINT would not work for me any more.

DrahtBot added the label Needs rebase on Mar 9, 2022

jonatack commented at 2:04 pm on March 9, 2022: member

This pull seems related to areas I’ve been looking at, will review.

ryanofsky commented at 3:59 am on March 11, 2022: contributor

I did some light testing for the final version, and one thing I noticed that the Index Sync now seems to block everything else and makes it impossible to abort gracefully: The wallet is not loaded (so that RPC calls are disabled), and also interrupting bitcoind with SIGINT would not work for me any more. @mzumsande I’ve been trying to reproduce this by modifying the syncchain method, but as far as I can tell, interrupt logic seems to be working as expected. Can you provide more specific steps to reproduce this? It also would help to see log output particularly near “thread exit” and “Shutdown: In progress…” lines. If you can get gdb thread apply all bt output after sending SIGINT, that would provide a good picture, too.

ryanofsky force-pushed on Mar 11, 2022

ryanofsky commented at 4:57 am on March 11, 2022: contributor

Thanks for the review!

Rebased 72c29827f8adbc988be26c2002a8feedf9adc15a -> 4d387545bcc9ea7c158fd6278154c200c2cbec13 (pr/indexy.10 -> pr/indexy.11, compare) due to conflict with #24138. Also made a few updates suggested in review.

DrahtBot removed the label Needs rebase on Mar 11, 2022

ryanofsky force-pushed on Mar 11, 2022

ryanofsky commented at 7:00 am on March 11, 2022: contributor

Updated 4d387545bcc9ea7c158fd6278154c200c2cbec13 -> 016304173ef6864e3c2f1104bd5ffe7595721ca6 (pr/indexy.11 -> pr/indexy.12, compare) with lock fix for tsan error https://cirrus-ci.com/task/6456107408293888, and link fix for bitcoin-chainstate error https://cirrus-ci.com/task/4538559129452544

mzumsande commented at 5:04 pm on March 11, 2022: contributor

@mzumsande I’ve been trying to reproduce this by modifying the syncchain method, but as far as I can tell, interrupt logic seems to be working as expected. Can you provide more specific steps to reproduce this? It also would help to see log output particularly near “thread exit” and “Shutdown: In progress…” lines. If you can get gdb thread apply all bt output after sending SIGINT, that would provide a good picture, too.

I tried to reproduce this again, but couldn’t anymore. I must have made some error during my tests the last time. I again tried to interrupt the index sync with SIGINT, SIGKILL and invalidateblock during sync, and everything worked fine. Sorry for the noise.

ryanofsky commented at 5:31 pm on March 11, 2022: contributor

Sorry for the noise.

No, thank you for testing! This was a good exercise for me to try to reproduce. I could see there is a real underlying issue because index sync threads can be greedy in their use of cs_main, and because index sync threads start before wallets are loaded, and wallet loading does not seem to be interruptible, Ctrl-C could become less usable if wallets take longer to load.

This PR should be improving the issue, not making it worse, because it is locking cs_main strictly less than previously during index sync. But thread scheduling is not always straightforward, and it’s possible even releasing a lock could lead to worse latency, so this is something good to look out for and test.

maxim85200 approved

bitcoin deleted a comment on Mar 14, 2022

DrahtBot added the label Needs rebase on Mar 17, 2022

ryanofsky force-pushed on Mar 18, 2022

ryanofsky commented at 1:08 pm on March 18, 2022: contributor

Rebased 016304173ef6864e3c2f1104bd5ffe7595721ca6 -> 23d07effca30157779d7d9d324f1ee1a41324e07 (pr/indexy.12 -> pr/indexy.13, compare) due to conflict with #24515

DrahtBot removed the label Needs rebase on Mar 18, 2022

ryanofsky force-pushed on Mar 18, 2022

ryanofsky commented at 7:16 pm on March 18, 2022: contributor

Updated 23d07effca30157779d7d9d324f1ee1a41324e07 -> d14e89f701b224977bef3ccde233d86271f7ed9b (pr/indexy.13 ->

pr
 /indexy.14

, compare) to fix tsan race in coinstatsindex_unclean_shutdown test https://cirrus-ci.com/task/5516517512052736?logs=ci#L4877, and probably (couldn’t reproduce these) fix msan and centos segfaults in the same test https://cirrus-ci.com/task/6642417418895360?logs=ci and https://cirrus-ci.com/task/4953567558631424?logs=ci

in src/node/chain.cpp:17 in cf5d340448 outdated

 7+#include <uint256.h>
 8+
 9+namespace node {
10+interfaces::BlockInfo MakeBlockInfo(const CBlockIndex* index, const CBlock* data)
11+{
12+    interfaces::BlockInfo info{index ? *index->phashBlock : uint256::ZERO};

jamesob commented at 9:05 pm on March 18, 2022:

cf5d3404484b08939992566cca0f1151d050a921

Do we ever expect that index will be null? Based on the calls to MakeBlockInfo, it doesn’t look like it. Make sense to just accept a reference here?

Edit: okay, I see - in a later commit (when we add attachChain), this may be passed a nullptr if we’re initializing the handler with an empty datadir.

ryanofsky commented at 9:16 pm on March 25, 2022:

re: #24230 (review)

Edit: okay, I see - in a later commit (when we add attachChain), this may be passed a nullptr if we’re initializing the handler with an empty datadir.

Right, in cases where CChain::Tip() is null this just returns a struct with all default values. Alternative would be to return optional<BlockInfo> and pass around optional<BlockInfo> variables more places. But this would force code to handle nullopt as a special case when in practice the struct values are ignored anyway and the defaults are fine. It’s similar to using INT_MAX if you mean to represent infinity. It doesn’t capture the meaning perfectly, but the semantics are close enough that it can make sense if it gets rid of special cases and makes code simpler.

in src/index/base.h:133 in 15daa47ee0 outdated

89@@ -87,10 +90,8 @@ class BaseIndex : public CValidationInterface
90 
91     void ChainStateFlushed(const CBlockLocator& locator) override;
92 
93-    const CBlockIndex* CurrentIndex() { return m_best_block_index.load(); };
94-
95     /// Initialize internal state from the database and block index.
96-    [[nodiscard]] virtual bool Init();
97+    [[nodiscard]] virtual bool CustomInit(const std::optional<interfaces::BlockKey>& block) { return true; }

jamesob commented at 6:55 pm on March 21, 2022:

15daa47ee06b9f334dd2862856f41a2ad4b34a3d

Sad to see a Russ PR that’s introducing a new method with PascalCase, but I guess we’re matching existing style… :stuck_out_tongue:

ryanofsky commented at 9:12 pm on March 25, 2022:

re: #24230 (review)

15daa47

Sad to see a Russ PR that’s introducing a new method with PascalCase, but I guess we’re matching existing style… stuck_out_tongue

Yeah I like it for new classes mostly. I’m happy you are using this convention in validation code since validation code has lots of standalone functions, and knowing whether you are calling method or a function can really tell you something. In index code there aren’t too many standalone functions.

in src/index/coinstatsindex.cpp:420 in 0e80167cc8 outdated

409@@ -425,13 +410,13 @@ bool CoinStatsIndex::ReverseBlock(const CBlock& block, const CBlockIndex* pindex
410     }
411 
412     // Remove the new UTXOs that were created from the block
413-    for (size_t i = 0; i < block.vtx.size(); ++i) {
414-        const auto& tx{block.vtx.at(i)};
415+    for (size_t i = 0; i < block.data->vtx.size(); ++i) {

jamesob commented at 3:29 pm on March 22, 2022:

0e80167cc848364a456bbfd4cba12891a5f06b35

Maybe worth doing Assert(block.data)->vtx.size() here? Possibly also with .undo_data below.

ryanofsky commented at 9:12 pm on March 25, 2022:

re: #24230 (review)

0e80167

Maybe worth doing Assert(block.data)->vtx.size() here? Possibly also with .undo_data below.

Makes sense, added asserts

in src/index/coinstatsindex.cpp:179 in afce60462f outdated

172@@ -182,7 +173,7 @@ bool CoinStatsIndex::CustomAppend(const interfaces::BlockInfo& block)
173 
174             // The coinbase tx has no undo data since no former output is spent
175             if (!tx->IsCoinBase()) {
176-                const auto& tx_undo{block_undo.vtxundo.at(i - 1)};
177+                const auto& tx_undo{block.undo_data->vtxundo.at(i - 1)};

jamesob commented at 7:34 pm on March 22, 2022:

afce60462f6d4ebc7b470ac2860c5d905e406ec7

Might consider Assert(...) here.

ryanofsky commented at 9:12 pm on March 25, 2022:

re: #24230 (review)

afce604

Might consider Assert(...) here.

Added asserts above

in src/index/base.cpp:137 in bf6a0f8ae8 outdated

93@@ -92,6 +94,15 @@ void BaseIndexNotifications::blockConnected(const interfaces::BlockInfo& block)
94         }
95         block_info.undo_data = &block_undo;
96     }
97+    int64_t current_time = 0;

jamesob commented at 7:42 pm on March 22, 2022:

bf6a0f8ae87a3ea4aec2c5a080f86fc44e0bd19e

Hmm… note to other reviewers, I guess: it wasn’t obvious to me that this is equivalent to the original code because it seems like if m_synced is somehow set between the conditional below and the next usage of current_time (line 117 at time of writing), then this code will behave differently than the original code (i.e. current_time == 0 but is used in the second conditional). But I read through and verified that an m_synced set doesn’t happen in between.

Would be nice to be able to verify that current_time != 0 in the second usage (line 117), but not mandatory IMO.

ryanofsky commented at 9:12 pm on March 25, 2022:

re: #24230 (review)

Would be nice to be able to verify that current_time != 0 in the second usage (line 117), but not mandatory IMO.

Thanks. I added a direct assert that m_synced value didn’t change. This is an even stricter check, and ensures that the Commit() call still happens and doesn’t get skipped.

in src/index/base.h:85 in c4e0a9dd45 outdated

70 
71-    /// Sync the index with the block index starting from the current best block.
72-    /// Intended to be run in its own thread, m_thread_sync, and can be
73-    /// interrupted with m_interrupt. Once the index gets in sync, the m_synced
74-    /// flag is set and the BlockConnected ValidationInterface callback takes
75-    /// over and the sync thread exits.

jamesob commented at 3:22 pm on March 23, 2022:

c4e0a9dd457018d78dc6be29251aec9d649af56c

Edit: this criticism is invalidated by later commit 8c0c231840b6a88e470da69c6885e6b6dd088df5

Now that we’ve removed the doc here, it’d be nice to have some kind of indication in the attachChain() doc (or even inline in the body of the function) that that function is now responsible for starting and setting m_thread_sync; as it stands in this commit there isn’t any external indication of that.

ryanofsky commented at 9:11 pm on March 25, 2022:

re: #24230 (review)

c4e0a9d

Edit: this criticism is invalidated by later commit 8c0c231

Now that we’ve removed the doc here, it’d be nice to have some kind of indication in the attachChain() doc (or even inline in the body of the function) that that function is now responsible for starting and setting m_thread_sync; as it stands in this commit there isn’t any external indication of that.

To improve this, I added some comments and an assert to this commit.

in src/interfaces/chain.h:278 in c4e0a9dd45 outdated

274@@ -275,6 +275,8 @@ class Chain
275         bool disconnect_data = false;
276         //! Include undo data with block disconnected notifications.
277         bool disconnect_undo_data = false;
278+        //! Name to use for sync thread.

jamesob commented at 3:24 pm on March 23, 2022:

c4e0a9dd457018d78dc6be29251aec9d649af56c

“sync thread” in the context of Chain is a little ambiguous IMO; maybe say “index sync thread”?

ryanofsky commented at 9:11 pm on March 25, 2022:

re: #24230 (review)

c4e0a9d

“sync thread” in the context of Chain is a little ambiguous IMO; maybe say “index sync thread”?

Changed comment to “Name to use for attachChain sync thread”. I would like to use the new attachChain method for wallets as well as indexes (#15719) so this is avoiding the word “index”.

in src/interfaces/chain.h:293 in d39c6d0062 outdated

288@@ -286,6 +289,9 @@ class Chain
289     //! Register handler for notifications.
290     virtual std::unique_ptr<Handler> handleNotifications(std::shared_ptr<Notifications> notifications) = 0;
291 
292+    //! Wait for pending notifications.
293+    virtual void waitForNotifications() = 0;

jamesob commented at 3:49 pm on March 23, 2022:

d39c6d00627ddf746e1c37a4d6e07f0a24586034

The naming here given the use is somewhat misleading to me; naively, one possible meaning of “waitForNotifications” is that we’re just waiting for a single notification to come in. What this really does is wait for the validation interface queue to clear, i.e. for any pending indexation to complete. I think something like waitForEmptyIndexQueue() or similar might be less confusing.

ryanofsky commented at 9:10 pm on March 25, 2022:

re: #24230 (review)

d39c6d0

The naming here given the use is somewhat misleading to me; naively, one possible meaning of “waitForNotifications” is that we’re just waiting for a single notification to come in. What this really does is wait for the validation interface queue to clear, i.e. for any pending indexation to complete. I think something like waitForEmptyIndexQueue() or similar might be less confusing.

Changed from waitForNotifications to waitForPendingNotifications now. This new name is less consistent with the waitForNotificationsIfTipChanged method name immediately below, but it is less ambiguous.

I think waitForEmptyIndexQueue() would be misleading because the queue doesn’t need to be empty for this to return. It just waits for currently pending notifications to be processed, not new notifications. It just marks a point in the stream, so the queue may never need to be empty at any point.

in src/node/interfaces.cpp:768 in 8c0c231840 outdated

814         return handler;
815     }
816     std::unique_ptr<Handler> handleNotifications(std::shared_ptr<Notifications> notifications) override
817     {
818-        return std::make_unique<NotificationsHandlerImpl>(std::move(notifications));
819+        return std::make_unique<NotificationsHandlerImpl>(std::move(notifications), /*connect=*/true);

jamesob commented at 7:47 pm on March 23, 2022:

8c0c231840b6a88e470da69c6885e6b6dd088df5

I’m confused about what this is doing - why is connect true here but false in the call above?

ryanofsky commented at 9:10 pm on March 25, 2022:

re: #24230 (review)

8c0c231

I’m confused about what this is doing - why is connect true here but false in the call above?

Sorry, this logic got messy because the notifications proxy now gets registered and unregistered from different threads, and could disconnected from the handler before the register call could happen, and I was trying to deal with cirrus TSAN errors. I rewrote it so it should understandable now.

To answer the question, connect was true for the older handleNotifications method because it doesn’t care about stale notifications and just wants to connect right away, but false for attachChain because it delays registration to avoid sending stale notifications.

in src/node/interfaces.cpp:845 in c7729d1f0c outdated

780+            promise.set_value();
781+            SyncChain(active.m_chain, locator_block, notifications, interrupt,
782+                      /*on_sync=*/[proxy] { CallFunctionInValidationInterfaceQueue([proxy] { if (proxy->connect()) RegisterSharedValidationInterface(proxy); }); });
783+        });
784         }
785+        promise.get_future().wait();

jamesob commented at 8:16 pm on March 23, 2022:

c7729d1f0ccb17c39c42ee5c7881d469d150c101

Just so I’m sure I understand, this has the effect of blocking the init thread until the first blockConnected call is completed?

ryanofsky commented at 9:10 pm on March 25, 2022:

re: #24230 (review)

c7729d1

Just so I’m sure I understand, this has the effect of blocking the init thread until the first blockConnected call is completed?

Yes, added comment. The wallet or index calling attachChain needs be able to find out the sync starting point or last block synced. An alternate approach would be for attachChain to return the starting block. This is approach I took earlier in #15719 https://github.com/bitcoin/bitcoin/blob/2b898a0456d742911c1f74229c9fc64a0012ca42/src/interfaces/chain.h#L260-L261, but it seems like would made things more complicated for callers by adding an output argument, and requiring the caller to update m_synced and m_best_block from multiple places instead of only inside blockConnected.

jamesob commented at 1:56 pm on March 24, 2022: contributor

Concept ACK

I’ve read through the code and built each commit locally.

At a high level I think this is a great change - the more “peripheral” parts of the system we can put on the other side of the Chain interface, the easier it is to carve off those parts both in terms of process separation (limiting “contagion” risk of non-essential code) and deeper modularization (allowing for more straightforward progression of e.g. libbitcoinkernel). The indexing system is a great candidate for this kind of change, and I’m really glad to see this PR doing it.

That said, it took me a while to get through the changes. I think there’s some potentially unnecessary churn within the commits (noted below). While the code quality is very good, behavioral equivalence with existing code is not immediately obvious to me when reading through the changes (given there are just so many of them), and so to have confidence that this PR isn’t really changing any behavior, I’m really going to need to dig in and do some testing as well as review the coverage of the existing functional tests.

I reviewed all commits except for 4e0b09777b, which I found hard to follow - but I’m going to give it another look as I read it near the end of the day and maybe my brain was just fried.

The only other general note I had was on

8c0c231840 indexes: Rewrite chain sync logic, remove racy init

For easier reviewability it would be nice to absorb this into prior commits if possible; i.e. avoid introducing the Ignore* methods only to take them away in this commit. Creates a lot of churn to review. But would be curious to know if this is particularly difficult to do for some reason.

For what it’s worth, the code in this commit much clearer than previous equivalents of SyncChain. Again there seems to be some churn in attachChain() that would save on unnecessary review of prior commits.

The NotificationsHandlerImpl could use some doc; I’ve had a hard time determining generally how it fits in and e.g. what the connect parameter does.

ryanofsky force-pushed on Mar 28, 2022

ryanofsky commented at 6:36 pm on March 28, 2022: contributor

Thanks for the review!

Updated d14e89f701b224977bef3ccde233d86271f7ed9b -> 8d3fd88ee31569e8c5d7e33f584b222a4ef3cd1e (pr/indexy.14 -> pr/indexy.15, compare) with suggestions.

re: #24230#pullrequestreview-914866406

so to have confidence that this PR isn’t really changing any behavior, I’m really going to need to dig in and do some testing as well as review the coverage of the existing functional tests.

To be sure, I don’t think it’s hugely important for these commits not to change behavior. It’s just that the code is complicated, and it already works, so if the cleanups avoid changing behavior it is easier to have some confidence that they work, too, and don’t introduce larger problems. The goal is to avoid introducing bugs, and keeping behavior unchanged is just a way of achieving that.

I reviewed all commits except for 4e0b097, which I found hard to follow

Yeah, that commit has a lot of annoying detail. The main thing it does is change rewind code to stop calling CustomRemove in a loop, and instead call blockDisconnected in a loop, with blockDisconnected calling customRemove. This get the blockDisconnected method to be functional enough so the rewind loop can be moved out of indexing code entirely, and left to the node.

The only other general note I had was on
* [x]  [8c0c231](https://github.com/bitcoin/bitcoin/commit/8c0c231840b6a88e470da69c6885e6b6dd088df5) indexes: Rewrite chain sync logic, remove racy init
For easier reviewability it would be nice to absorb this into prior commits if possible; i.e. avoid introducing the Ignore* methods only to take them away in this commit. Creates a lot of churn to review. But would be curious to know if this is particularly difficult to do for some reason.

This is a really interesting idea and I’m going to give it a try and see what it looks like. I don’t think it would actually eliminate much churn, because if you look at the src/index/ code that is removed in 8c0c231840b6a88e470da69c6885e6b6dd088df5, it’s basically not touched by the earlier commits. But the surrounding code is definitely changed a lot in earlier commits, so if the code that’s going to be removed is removed earlier, the changes in surrounding code may be easier to understand.

The NotificationsHandlerImpl could use some doc; I’ve had a hard time determining generally how it fits in and e.g. what the connect parameter does.

This got messy due to some threading bugs but I added comments and simplified some things. No more connect parameter!

DrahtBot added the label Needs rebase on Apr 25, 2022

ryanofsky force-pushed on Apr 25, 2022

DrahtBot removed the label Needs rebase on Apr 25, 2022

DrahtBot added the label Needs rebase on Apr 26, 2022

in src/index/base.cpp:119 in 1819bdb7cb outdated

112@@ -101,8 +113,16 @@ void BaseIndexNotifications::blockConnected(const interfaces::BlockInfo& block)
113     // best block is not updated here before sync, but this has been
114     // longstanding behavior since syncing was introduced in #13033 so care
115     // should be taken if changing m_best_block_index semantics.
116-    if (m_index.m_synced) {
117+    assert(synced == m_index.m_synced);
118+    if (synced) {
119         m_index.m_best_block_index = pindex;
120+    } else if (m_last_locator_write_time + SYNC_LOCATOR_WRITE_INTERVAL < current_time || m_index.m_interrupt) {

mzumsande commented at 12:38 pm on May 6, 2022:

commit 1819bdb7cb0634dad2e07daccfe7e8a122a943d6: I think that this changed behavior slightly because it changes the order of indexing the block (CustomAppend) and periodically updating the index’ best block (Commit) - which is better because the indexed data on disk wouldn’t be in an inconsistent state temporarily. I opened #25074 to fix this on master.

ryanofsky commented at 2:13 am on May 11, 2022:

re: #24230 (review)

commit 1819bdb: I think that this changed behavior slightly because it changes the order of indexing the block (CustomAppend)

Thanks. Good catch and I reviewed your PR

ryanofsky commented at 1:36 am on May 11, 2022: contributor

re: #24230#pullrequestreview-922143453

For easier reviewability it would be nice to absorb this into prior commits if possible; i.e. avoid introducing the Ignore* methods only to take them away in this commit. Creates a lot of churn to review. But would be curious to know if this is particularly difficult to do for some reason.

This is a really interesting idea and I’m going to give it a try and see what it looks like.

I’ve been working on this on and off for weeks and I think I hit a breaking point. My branch is at https://github.com/ryanofsky/bitcoin/commits/wcr-export.2 and the main commit is https://github.com/ryanofsky/bitcoin/commit/2da7ac7d159056f3563b0cb845a2a8e9d64c548d. I backported the synchronous attachChain implementation from a later commit to that earlier commit to avoid the need for the the IgnoreBlockConnected and IgnoreChainFlushed methods

But this wrecked absolute havoc on the PR. I got through maybe 60% of the rebase and have had to fix countless conflicts and test failures so far. Then I realized the approach is inherently more messy and complicated because instead of:

Moving the sync thread from indexing code to node code
Replacing the racy sync implementation with non-racy sync implementation

it requires:

Adding non-racy sync thread in the node running after the racy sync thread in the index
Moving indexing functionality to racy sync thread to the non-racy sync thread
Deleting the non-racy sync thread

Step 2 seemed doable but was causing lots of bugs and headaches and was leading to bigger commits, making me prefer the original approach more.

I think the original approach is better by comparison. Yes, there are a few lines in the IgnoreBlockConnected and IgnoreChainFlushed methods which have to be touched up even though they are deleted later. But I don’t think there is a good way around this churn that doesn’t require changing indexing code and changing the sync implementation in the same commit, when the original approach is able to do these things separately.

ryanofsky force-pushed on May 11, 2022

ryanofsky commented at 3:59 am on May 11, 2022: contributor

Rebased 8d3fd88ee31569e8c5d7e33f584b222a4ef3cd1e -> 988c62598b5d388474e6181dfd793b7ad7d141ad (pr/indexy.15 -> pr/indexy.16, compare) with no conflicts. Rebased 988c62598b5d388474e6181dfd793b7ad7d141ad -> 3b3f66ec486d6a77175d26e9edd324666e65855a (pr/indexy.16 -> pr/indexy.17, compare) due to conflict with #24915 Rebased 3b3f66ec486d6a77175d26e9edd324666e65855a -> 2693249ffeab7bf6c41d7b6df16208dfbae5c33a (pr/indexy.17 -> pr/indexy.18, compare) due to conflicts with #21726 and #25079 Updated 2693249ffeab7bf6c41d7b6df16208dfbae5c33a -> fd6568f6888bc86985ce626731cd0f9db35889b8 (pr/indexy.18 -> pr/indexy.19, compare) to fix prune lock test errors https://cirrus-ci.com/task/4831256402722816?logs=functional_tests#L2561 and https://cirrus-ci.com/task/4690518914367488?logs=ci#L3346, and circular dependency error https://cirrus-ci.com/task/5112731379433472?logs=lint#L849

DrahtBot removed the label Needs rebase on May 11, 2022

ryanofsky force-pushed on May 11, 2022

DrahtBot added the label Needs rebase on May 16, 2022

in src/node/interfaces.cpp:828 in fd6568f688 outdated

775+        // CallFunctionInValidationInterfaceQueue, so the new notifications will
776+        // be added to the queue after the queue entry which connects the proxy.
777+        AssertLockNotHeld(::cs_main);
778+        LOCK(cs_main);
779+        const CChainState& active = Assert(m_node.chainman)->ActiveChainstate();
780+        const CBlockIndex* locator_block = locator.IsNull() ? nullptr : active.FindForkInGlobalIndex(locator);

mzumsande commented at 7:39 pm on May 16, 2022:

One general point (if it’s too unrelated to this PR, we could also discuss it elsewhere):

I think that the current way of initialising indexes by calling FindForkInGlobalIndex() for the saved locator and setting this as the best block is not compatible with indexes that have state beyond the best block, such as coinstatsindex: For this index, we need to call CustomRemove() (or Rewind on master) to revert the actions of each block and adjust the muhash. This is not done during Init, so the coinstatsindex currently cannot deal with the situation where the saved best block of the index is not part of the best chain.

One solution for this would be not to call FindForkInGlobalIndex but instead read in the first block stored in the locator and then revert blocks step by step up to the fork point - this is necessary for coinstatsindex, but it adds the additional requirement that all blocks between the forkpoint and the saved block must be available, which is something that e.g. txindex doesn’t really need.

I did a rough implementation for fixing the coinstatsindex init sequence in this branch, which I think works but (at least on first glance) seems to be really incompatible with the approach here, so I wouldn’t just PR it.

What do you think about this problem?

ryanofsky commented at 8:19 pm on May 16, 2022:

I think that the current way of initialising indexes by calling FindForkInGlobalIndex() for the saved locator and setting this as the best block

You’re right but this is easy to fix by just replacing FindForkInGlobalIndex() call with a LookupBlockIndex call. There should be no need to add additional remove/rewind calls, though. Existing code should be aware of forks and how to handle them.

mzumsande commented at 1:16 pm on May 17, 2022:

Great, thanks! I missed that NextSyncBlock() goes back to the fork point already in master during sync. If it’s that easy to fix, I might open a PR vs master. Here, this can be resolved.

ryanofsky commented at 1:29 am on May 18, 2022:

re: #24230 (review)

Great, thanks! I missed that NextSyncBlock() goes back to the fork point already in master during sync. If it’s that easy to fix, I might open a PR vs master. Here, this can be resolved.

Yes, this is a good catch and test on your branch looks good, but the fix could be simpler and more generic.

Just to close the thread, the issue described here is pre-existing, and basically orthogonal to this PR even if it conflicts a little. The issue just happens if there is a reorg away from the last block that was indexed while the index is shut down.

ryanofsky force-pushed on May 18, 2022

ryanofsky commented at 1:38 am on May 18, 2022: contributor

Rebased fd6568f6888bc86985ce626731cd0f9db35889b8 -> 699877a6c682eafde30b5c6eefa069a1ab7e58f4 (pr/indexy.19 -> pr/indexy.20, compare) due to conflict with #25109

DrahtBot removed the label Needs rebase on May 18, 2022

DrahtBot added the label Needs rebase on May 19, 2022

ryanofsky force-pushed on May 19, 2022

ryanofsky commented at 6:44 pm on May 19, 2022: contributor

Rebased 699877a6c682eafde30b5c6eefa069a1ab7e58f4 -> 3d33427d7ce832a763d1dc6a264244440c14fe1e (pr/indexy.20 -> pr/indexy.21, compare) due to conflict with #25074

DrahtBot removed the label Needs rebase on May 19, 2022

DrahtBot added the label Needs rebase on May 24, 2022

ryanofsky force-pushed on Jun 1, 2022

ryanofsky commented at 8:21 pm on June 1, 2022: contributor

Rebased 3d33427d7ce832a763d1dc6a264244440c14fe1e -> acfb6089041e7ba18f8778eb97e07d1bba2b5686 (pr/indexy.21 -> pr/indexy.22, compare) due to conflict with #24410

DrahtBot removed the label Needs rebase on Jun 1, 2022

DrahtBot added the label Needs rebase on Jun 15, 2022

ryanofsky force-pushed on Jun 15, 2022

ryanofsky commented at 1:19 pm on June 15, 2022: contributor

Rebased acfb6089041e7ba18f8778eb97e07d1bba2b5686 -> 4b1ab00331f2e62cddfcb8bb23a0c41be8af3dbd (pr/indexy.22 -> pr/indexy.23, compare) due to conflict with #25338 and silent conflict with #24931

DrahtBot removed the label Needs rebase on Jun 15, 2022

maflcko commented at 5:22 pm on June 15, 2022: member

0SUMMARY: ThreadSanitizer: data race src/./serialize.h:418:84 in void Wrapper<VarIntFormatter<(VarIntMode)0>, unsigned int const&>::Serialize<CDataStream>(CDataStream&) const

https://cirrus-ci.com/task/6321104825352192?logs=ci#L3331

maflcko removed the label Wallet on Jun 15, 2022

maflcko removed the label Build system on Jun 15, 2022

maflcko removed the label UTXO Db and Indexes on Jun 15, 2022

maflcko removed the label RPC/REST/ZMQ on Jun 15, 2022

maflcko removed the label P2P on Jun 15, 2022

maflcko removed the label interfaces on Jun 15, 2022

ryanofsky commented at 6:01 pm on June 15, 2022: contributor

In general, if there are two threads accessing a variable, and you know the second thread will not access the variable until the first thread is done accessing it, does it matter if no mutex is locked while accessing the variable?

Data race appears to happen reading / writing the BlockFilterIndex::m_next_filter_pos.nPos member variable. The variable is incremented in BlockFilterIndex::CustomAppend after blockfilter data is written, and it’s read in BlockFilterIndex::CustomCommit with apparently no synchronization between the two functions. I think there’s probably not a real race here. These methods are called from the index sync thread initially, and then the validation thread after that so they are called from two different threads, but the different calls should be synchronized through the validation queue, so I don’t think a lock should be necessary just to read this variable. But maybe just adding a lock is the path of least resistance

DrahtBot added the label Refactoring on Jun 15, 2022

in src/interfaces/chain.h:268 in bc39508eb8 outdated

264@@ -264,6 +265,9 @@ class Chain
265         virtual void chainStateFlushed(const CBlockLocator& locator) {}
266     };
267 
268+    //! Check if all blocks needed to sync from locator are present.

mzumsande commented at 7:33 pm on June 21, 2022:

bc39508eb83b33e620a81ca89ed3bade90344840 (“Remove index prune_violation code”): I’m not sure if it’s helpful to talk about a locator (which is more of a list of blocks) and call it locator_block here, since checkBlocks just accepts a single BlockIndex - it may currently be derived from a locator, but that info isn’t really required here - if we stopped using locators and just saved single block hashes, checkBlocks wouldn’t be affected at all.

ryanofsky commented at 7:39 pm on June 28, 2022:

re: #24230 (review)

Renamed from locator_block to start_block

in src/node/interfaces.cpp:783 in 1091e79d7b outdated

778@@ -754,6 +779,53 @@ class ChainImpl : public Chain
779         notifications->blockConnected(block_info);
780         if (!block_info.chain_tip && !checkBlocks(locator_block)) return nullptr;
781         auto handler = std::make_unique<NotificationsHandlerImpl>(notifications);
782+        assert(!handler->m_thread_sync.joinable());
783+        if (!block_info.chain_tip) handler->m_thread_sync = std::thread(&util::TraceThread, options.thread_name, [&active, locator_block, notifications, handler = handler.get()] {

mzumsande commented at 2:03 am on June 23, 2022:

1091e79d7bb57da3b96814fe26101e0bf75e1108 (Move sync thread from index to node)

I think it would be good to abort early and not start the sync process if CustomInit of the initial blockConnected call failed (because the index might be corrupted). This is the behavior on master, and otherwise the syncing logic could commit some work like syncing another block, and while we’d still terminate with an InitError because m_init_result=false after that, this error might no longer occur at the next startup - the index might now be silently corrupted.

ryanofsky commented at 3:59 pm on June 23, 2022:

re: #24230 (review)

I think it would be good to abort early and not start the sync process if CustomInit of the initial blockConnected call failed (because the index might be corrupted).

Good point. I think I’m missing a if (!*m_init_result) return m_index.Interrupt(); line after initializing m_init_result in blockConnected()

ryanofsky commented at 7:37 pm on June 28, 2022:

re: #24230 (review)

Added m_index.Interrupt() call on failure

mzumsande commented at 2:21 am on June 23, 2022: contributor

What do you think about splitting out incremental chunks of this to get more review and help things move forward? I agree with the general approach (as most other reviewers do) and would be confident to ACK the first couple of commits, but I find some of the changes are nontrivial in the details - by the time I’d get to the latest commits I already lost track about the details of the first ones.

ryanofsky commented at 4:02 pm on June 23, 2022: contributor

re: #24230#pullrequestreview-1014141911

What do you think about splitting out incremental chunks of this to get more review and help things move forward?

This sounds good I’d be happy to split this up. Any suggestion about which commits would be good to include in a smaller initial PR?

mzumsande commented at 2:22 pm on June 24, 2022: contributor

Any suggestion about which commits would be good to include in a smaller initial PR?

Maybe the first 7 commits, which mostly reduce the use of internal types within the existing index code. The commits after that move things to node, which could be another PR.

ryanofsky force-pushed on Jun 28, 2022

ryanofsky commented at 8:04 pm on June 28, 2022: contributor

Updated 4b1ab00331f2e62cddfcb8bb23a0c41be8af3dbd -> 8ca844e3ad924cf85918a5509f58b0eb9ab4c454 (pr/indexy.23 -> pr/indexy.24, compare) avoiding word locator as suggested and halting sync thread if CustomInit call fails, as suggested

ryanofsky commented at 8:16 pm on June 28, 2022: contributor

Any suggestion about which commits would be good to include in a smaller initial PR?

Maybe the first 7 commits, which mostly reduce the use of internal types within the existing index code. The commits after that move things to node, which could be another PR.

Thanks! Done in #25494

maflcko removed the label Refactoring on Jul 1, 2022

DrahtBot added the label UTXO Db and Indexes on Jul 1, 2022

DrahtBot added the label Needs rebase on Jul 18, 2022

fanquake referenced this in commit 92c8e1849d on Jul 19, 2022

ryanofsky force-pushed on Jul 19, 2022

ryanofsky commented at 10:05 pm on July 19, 2022: contributor

Rebased 8ca844e3ad924cf85918a5509f58b0eb9ab4c454 -> 8360eaa0b283ef525d99288bb9763e568cf430bf (pr/indexy.24 -> pr/indexy.25 after base PR #25494 merge

DrahtBot removed the label Needs rebase on Jul 19, 2022

maflcko commented at 5:41 am on July 20, 2022: member

0index/coinstatsindex.cpp:20:13: error: using decl 'ReadBlockFromDisk' is unused [misc-unused-using-decls,-warnings-as-errors]
1using node::ReadBlockFromDisk;
2            ^
3/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/index/coinstatsindex.cpp:20:13: note: remove the using
4using node::ReadBlockFromDisk;
5~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
6437 warnings generated.

DrahtBot added the label Needs rebase on Jul 21, 2022

in src/node/interfaces.cpp:720 in a035b6500f outdated

716@@ -717,6 +717,39 @@ class ChainImpl : public Chain
717     {
718         ::uiInterface.ShowProgress(title, progress, resume_possible);
719     }
720+    bool checkBlocks(const CBlockIndex* start_block) override {

jamesob commented at 6:21 pm on July 25, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/a035b6500f427cb535058c6718ff41edd961fdc3

checkBlocks is pretty vague; maybe hasDataFromTipDown()?

ryanofsky commented at 5:47 pm on August 3, 2022:

re: #24230 (review)

checkBlocks is pretty vague; maybe hasDataFromTipDown()?

Thanks, renamed

in src/index/base.cpp:170 in de311f9f9e outdated

166@@ -163,6 +167,7 @@ void BaseIndex::ThreadSync()
167 
168             CBlock block;
169             interfaces::BlockInfo block_info = kernel::MakeBlockInfo(pindex);
170+            block_info.chain_tip = false;

jamesob commented at 6:48 pm on July 25, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/de311f9f9ecb64dc83c070e3824c086b918cb09a

I don’t understand what this change does - it isn’t obvious to me from the surrounding code that pindex isn’t the tip.

ryanofsky commented at 5:48 pm on August 3, 2022:

re: #24230 (review)

I don’t understand what this change does - it isn’t obvious to me from the surrounding code that pindex isn’t the tip.

Thanks, removed this. Nothing is actually looking at this variable so it doesn’t do anything. This code is just reading old blocks which are mostly not from the tip of the chain, so this was setting the variable to avoid seeming like they were.

in src/node/interfaces.cpp:761 in 43e03ec21f outdated

755@@ -756,6 +756,9 @@ class ChainImpl : public Chain
756         const CBlockIndex* start_block = locator.IsNull() ? nullptr : active.FindForkInGlobalIndex(locator);
757         interfaces::BlockInfo block_info = kernel::MakeBlockInfo(start_block);
758         block_info.chain_tip = start_block == active.m_chain.Tip();
759+        // Need to register the ValidationInterface and call blockConnected
760+        // both while holding cs_main, so that callbacks are not missed if
761+        // blockConnected sets m_synced to true.

jamesob commented at 7:51 pm on July 25, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/43e03ec21fc2da2bd76912e3aa6981899030c8c2

Is this comment related to the changes in this commit? It seems like this corresponds to the previous commit.

ryanofsky commented at 7:15 pm on August 3, 2022:

re: #24230 (review)

Is this comment related to the changes in this commit? It seems like this corresponds to the previous commit.

Thanks, just dropped this. I was looking for someplace to move this comment so I moved it here, but it doesn’t really work here and is better to prune

in src/index/base.cpp:139 in 43e03ec21f outdated

79+        FatalError("%s: Failed to rewind index %s to a previous chain tip",
80+                   __func__, m_index.GetName());
81+        return;
82+    }
83+
84+    if (!m_index.CustomAppend(block)) {

jamesob commented at 5:52 pm on July 26, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/43e03ec21fc2da2bd76912e3aa6981899030c8c2

FWIW, inverting the conditional here makes it harder to verify the diff because now this block of logic isn’t move-only.

ryanofsky commented at 7:16 pm on August 3, 2022:

re: #24230 (review)

FWIW, inverting the conditional here makes it harder to verify the diff because now this block of logic isn’t move-only.

Good catch, now avoided this

in src/index/base.cpp:260 in 00b2300f91 outdated

257+    const Consensus::Params& consensus_params = Params().GetConsensus();
258+    CBlock block;
259+    CBlockUndo block_undo;
260+
261+    for (const CBlockIndex* iter_tip = current_tip; iter_tip != new_tip; iter_tip = iter_tip->pprev) {
262+        interfaces::BlockInfo block_info = node::MakeBlockInfo(iter_tip);

jamesob commented at 6:47 pm on July 26, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/00b2300f919060a89bca74f0a908602a6a716f6d

This commit fails to compile because MakeBlockInfo() is in the kernel namespace. I guess this code is moved later on, which resolves the compilation failure.

ryanofsky commented at 6:50 pm on August 3, 2022:

re: #24230 (review)

This commit fails to compile because MakeBlockInfo() is in the kernel namespace. I guess this code is moved later on, which resolves the compilation failure.

Thanks! Updated to fix

in src/index/base.cpp:177 in 5605e4e131 outdated

173@@ -158,6 +174,7 @@ void BaseIndex::ThreadSync()
174     const CBlockIndex* pindex = m_best_block_index.load();
175     if (!m_synced) {
176         auto& consensus_params = Params().GetConsensus();
177+        auto notifications = WITH_LOCK(m_mutex, return m_notifications);

jamesob commented at 6:54 pm on July 26, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/5605e4e1310d244892447755c1194195c2e866da#

Just making sure this should actually be a copy and not a reference.

ryanofsky commented at 7:06 pm on August 3, 2022:

re: #24230 (review)

Just making sure this should actually be a copy and not a reference.

Yes a reference to the shared_ptr wouldn’t work here because the shared_ptr could be reset at any time from another thread. A copy of the pointer is actually needed

ryanofsky force-pushed on Jul 26, 2022

ryanofsky commented at 7:05 pm on July 26, 2022: contributor

Rebased 8360eaa0b283ef525d99288bb9763e568cf430bf -> ad9f581fcc7b6e73697f59eb03d10e91d16a09b4 (pr/indexy.25 -> pr/indexy.26, compare) due to conflict with #22485

in src/index/base.cpp:153 in ac1369650d outdated

147@@ -134,6 +148,25 @@ void BaseIndexNotifications::blockConnected(const interfaces::BlockInfo& block)
148 
149 void BaseIndexNotifications::blockDisconnected(const interfaces::BlockInfo& block)
150 {
151+    // During initial sync, ignore validation interface notifications, only
152+    // process notifications from sync thread.
153+    if (!m_index.m_synced && block.chain_tip) return;

jamesob commented at 8:10 pm on July 26, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/ac1369650de26f325076340017745c67bdb34900

Is this new behavior or inherited?

ryanofsky commented at 7:17 pm on August 3, 2022:

re: #24230 (review)

Is this new behavior or inherited?

It’s all inherited behavior. The indexes ignored “blockDisconnected” notifications entirely before, so the “During initial sync, ignore validation interface notifications” part of this comment happened implicitly. The “process notifications from sync thread” part of this comment is referring to new blockDisconnected code below, but that code is just moving from the ThreadSync to here (so ThreadSync can move out of the src/index/ to src/node/ later)

jamesob commented at 8:33 pm on July 26, 2022: contributor

Partial review. This is a tough one so far because logic improvements/changes are intermingled with the (ideally more mechanical) BaseIndex -> BaseIndexNotifications move. Maybe the logic changes are necessary to enable the process separation, and it just can’t get any clearer than this.

I’m in the process of manually testing IBD with txindex, coinstatsindex enabled.

20f2973ae2 indexes, refactor: Remove index prune_violation code
151bad5de4 indexes, refactor: Remove index Init method
8d0bcf4033 indexes, refactor: Remove index validation interface and block locator code
fa6efbff7f indexes, refactor: Stop incorrectly calling Interupt() and Stop() in BaseIndex destructor
e62683db04 indexes, refactor: Add Commit CBlockLocator& argument
e1e040fdd4 indexes, refactor: Remove remaining CBlockIndex* uses in index Rewind methods
16f6a9900a indexes, refactor: Remove remaining CBlockIndex* uses in index CustomAppend methods

Commit message incorrect: “Move CustomAppend call from BaseIndex::ThreadSync to BaseIndex::ThreadSync”

bc71b09151 indexes, refactor: Move more new block logic out of ThreadSync to blockConnected
627887a9b9 indexes, refactor: Move Commit logic out of ThreadSync to notification handlers
[-] ac1369650d indexes, refactor: Move Rewind logic out of Rewind to blockDisconnected and ThreadSync

Maybe just too late in the day, but really couldn’t understand the contents of this commit, too much for me.

3038c883fd indexes, refactor: Move CustomInit and error handling code out of ThreadSync to notification handlers
01297836c7 indexes, refactor: Move sync thread from index to node

In progress; sadly, my diff tool doesn’t consider the contents of ThreadSync() to have been moved.

DrahtBot removed the label Needs rebase on Jul 26, 2022

DrahtBot added the label Needs rebase on Aug 3, 2022

ryanofsky force-pushed on Aug 4, 2022

ryanofsky commented at 8:11 pm on August 4, 2022: contributor

Rebased ad9f581fcc7b6e73697f59eb03d10e91d16a09b4 -> d83caa15e9b46227da9ad9a36da493bcf1de891b (pr/indexy.26 -> pr/indexy.27, compare) due to conflict with #25648, with suggested improvements.

re: #24230#pullrequestreview-1049878909

Partial review. This is a tough one so far because logic improvements/changes are intermingled with the (ideally more mechanical) BaseIndex -> BaseIndexNotifications move. Maybe the logic changes are necessary to enable the process separation, and it just can’t get any clearer than this.

I broke down the commits just to be able to demonstrate that new code and old code are equivalent, but I don’t think you have to care about perfect equivalence to review the PR. If you just want to review final code, I think it’s safe to do that and still provide a valid ACK for the PR without checking the math for equivalence.

I’m in the process of manually testing IBD with txindex, coinstatsindex enabled.

Thanks! This is very useful.

* [x]  [16f6a99](https://github.com/bitcoin/bitcoin/commit/16f6a9900abead79f6a437ca1badc9d0db6f4b5b) indexes, refactor: Remove remaining CBlockIndex* uses in index CustomAppend methods
Commit message incorrect: “Move CustomAppend call from BaseIndex::ThreadSync to BaseIndex::ThreadSync”

Thanks! Should have been “from ThreadSync to blockConnected”

* [-] [ac13696](https://github.com/bitcoin/bitcoin/commit/ac1369650de26f325076340017745c67bdb34900) indexes, refactor: Move Rewind logic out of Rewind to blockDisconnected and ThreadSync
Maybe just too late in the day, but really couldn’t understand the contents of this commit, too much for me.

I simplified the commit a little and added some comments. This is good feedback and it’s helps me to know if there are particularly confusing commits like this. In general, I think if you run across any code that doesn’t make sense after a few minutes, it probably needs to be clarified or explained better.

* [ ]  [0129783](https://github.com/bitcoin/bitcoin/commit/01297836c7ac40823e5d3a95fe044074dd8c9767) indexes, refactor: Move sync thread from index to node
In progress; sadly, my diff tool doesn’t consider the contents of ThreadSync() to have been moved.

My git diff with algorithm = histogram colorMoved = dimmed_zebra and colorMovedWs = ignore-all-space shows this as mostly moved. But when I’m reviewing code like this that is moving and changing, I just use git-diff to identify the moves source and destination. Then I paste the moved before and after versions into meld to look at the actual changes. It works pretty work for big chunks of code like this.

DrahtBot removed the label Needs rebase on Aug 4, 2022

maflcko commented at 2:07 pm on August 10, 2022: member

0musing decl 'ReadBlockFromDisk' is unused [misc-unused-using-decls,-warnings-as-errors]�[0m
1using node::ReadBlockFromDisk;

ryanofsky force-pushed on Aug 15, 2022

ryanofsky commented at 5:37 pm on August 15, 2022: contributor

Updated d83caa15e9b46227da9ad9a36da493bcf1de891b -> 781c4d775880de4190eecb93442148e2d28e61b2 (pr/indexy.27 -> pr/indexy.28, compare) adding mutex to fix blockfilterindex tsan errors https://cirrus-ci.com/task/4753821174857728?logs=ci#L3850 and dropped unused using declaration

re: #24230 (comment)

0using decl 'ReadBlockFromDisk' is unused [misc-unused-using-decls,-warnings-as-errors]

Thanks, fixed

DrahtBot added the label Needs rebase on Sep 13, 2022

ryanofsky force-pushed on Sep 21, 2022

ryanofsky commented at 3:24 am on September 21, 2022: contributor

Rebased 781c4d775880de4190eecb93442148e2d28e61b2 -> 8c66a7cc9a22b350dcdc03cec04c123b45ba7f8a (pr/indexy.28 -> pr/indexy.29, compare) due to conflicts with #25222, #24513, and #25971

DrahtBot removed the label Needs rebase on Sep 21, 2022

DrahtBot added the label Needs rebase on Oct 10, 2022

ryanofsky force-pushed on Oct 13, 2022

ryanofsky commented at 8:18 pm on October 13, 2022: contributor

Rebased 8c66a7cc9a22b350dcdc03cec04c123b45ba7f8a -> b25b01352ec29a5657813be3ba88c321b7c8fa2b (pr/indexy.29 -> pr/indexy.30, compare) due to conflicts with #26215 and #23549

DrahtBot removed the label Needs rebase on Oct 13, 2022

DrahtBot added the label Needs rebase on Nov 22, 2022

ryanofsky force-pushed on Nov 28, 2022

ryanofsky commented at 5:08 pm on November 28, 2022: contributor

Rebased b25b01352ec29a5657813be3ba88c321b7c8fa2b -> f8a26680e902820dc3ee84e1fa91f80c3668a10f (pr/indexy.30 -> pr/indexy.31, compare) due to conflicts with #25957 and #26292

DrahtBot removed the label Needs rebase on Nov 28, 2022

in src/interfaces/chain.h:281 in 2f71f52b9b outdated

281         virtual void updatedBlockTip() {}
282         virtual void chainStateFlushed(const CBlockLocator& locator) {}
283     };
284 
285-    //! Check if all blocks needed to sync start block to current tip are present.
286-    virtual bool hasDataFromTipDown(const CBlockIndex* start_block) = 0;

jamesob commented at 6:53 pm on November 28, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/2f71f52b9b37b94a03aa203d16eecdca467361ac

Not a big deal, but curious why this was removed since it was introduced in the previous commit.

ryanofsky commented at 2:52 pm on September 5, 2023:

re: #24230 (review)

2f71f52

Not a big deal, but curious why this was removed since it was introduced in the previous commit.

It’s not really removed, just made a internal method instead of a public one so removed from the public interface now that it no longer needs to be called by outside code.

in src/node/interfaces.cpp:769 in 2f71f52b9b outdated

764+        const Chainstate& active = Assert(m_node.chainman)->ActiveChainstate();
765+        const CBlockIndex* start_block = locator.IsNull() ? nullptr : active.FindForkInGlobalIndex(locator);
766+        interfaces::BlockInfo block_info = kernel::MakeBlockInfo(start_block);
767+        block_info.chain_tip = start_block == active.m_chain.Tip();
768+        notifications->blockConnected(block_info);
769+        if (!block_info.chain_tip && !hasDataFromTipDown(start_block)) return nullptr;

jamesob commented at 7:03 pm on November 28, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/2f71f52b9b37b94a03aa203d16eecdca467361ac

Wouldn’t we want to do this check/return before we issue a blockConnected() event, or does it not matter because we know that we at least have data for the current tip?

ryanofsky commented at 2:55 pm on September 5, 2023:

re: #24230 (review)

2f71f52

Wouldn’t we want to do this check/return before we issue a blockConnected() event, or does it not matter because we know that we at least have data for the current tip?

The code is a little different now, and the blockConnected call is replaced by a std::function callback but I think the idea in either case is that it might be useful for the index to know what the chain tip is even if data has been pruned and it might not be possible to sync.

in src/node/interfaces.cpp:820 in 2f71f52b9b outdated

757@@ -759,6 +758,18 @@ class ChainImpl : public Chain
758         }
759         return !prune_violation;
760     }
761+    std::unique_ptr<Handler> attachChain(std::shared_ptr<Notifications> notifications, const CBlockLocator& locator, const NotifyOptions& options) override

jamesob commented at 7:15 pm on November 28, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/2f71f52b9b37b94a03aa203d16eecdca467361ac

The contents of this function are pretty specific to indexes (e.g. we have logic in here that refuses to set up the notifications handler if there isn’t data beneath the given tip) and I think the name attachChain a little bit generic, so might consider renaming this attachIndexer() or something more descriptive.

ryanofsky commented at 3:02 pm on September 5, 2023:

re: #24230 (review)

2f71f52

The contents of this function are pretty specific to indexes (e.g. we have logic in here that refuses to set up the notifications handler if there isn’t data beneath the given tip) and I think the name attachChain a little bit generic, so might consider renaming this attachIndexer() or something more descriptive.

This is intentional because the attachChain method is meant to be usable by the wallet too, and it was actually written originally for the wallet in #15719. I think the behaviors you mentioned which are specific to indexes are things which could be controlled by options or just removed later. But idea is for indexes and wallets to just be passive clients that receive notifications, and do not have to implement their own sync logic.

in src/index/base.cpp:61 in a6d1c22dc1 outdated

57@@ -58,6 +58,7 @@ class BaseIndexNotifications : public interfaces::Chain::Notifications
58     void chainStateFlushed(const CBlockLocator& locator) override;
59 
60     BaseIndex& m_index;
61+    interfaces::Chain::NotifyOptions m_options = m_index.CustomOptions();

jamesob commented at 8:21 pm on November 28, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/a6d1c22dc1254cbfedda7add573f0d537aefd36c

Maybe I’m just dense today, but I don’t recognize this cpp pattern. Is this some kind of initialization shorthand that’s executed once on construction, when we get a valid reference for m_index?

ryanofsky commented at 2:58 pm on August 29, 2023:

re: #24230 (review)

a6d1c22

Maybe I’m just dense today, but I don’t recognize this cpp pattern. Is this some kind of initialization shorthand that’s executed once on construction, when we get a valid reference for m_index?

It’s more simple than that, it’s just declaring NotifyOptions m_options member, where NotifyOptions is an options struct with bool and string fields. The struct is initialized by calling the CustomOptions method whenever the class is constructed.

in src/index/base.h:99 in ecedecac38 outdated

95@@ -94,16 +96,21 @@ class BaseIndex : public CValidationInterface
96 
97     virtual bool AllowPrune() const = 0;
98 
99+    Mutex m_mutex;

jamesob commented at 1:48 pm on November 29, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/ecedecac38f231d855e3f663671301d6e1ea7f6b

It would be nice to document what this mutex is for and why it’s needed. I gather from below that it’s guarding m_notifications and m_handler, but it’s not clear to me why that’s necessary - my guess is that because the guarded state is accessed by both the sync thread and validationinterface callbacks via Chain.

ryanofsky commented at 3:13 pm on September 5, 2023:

re: #24230 (review)

ecedeca

It would be nice to document what this mutex is for and why it’s needed. I gather from below that it’s guarding m_notifications and m_handler, but it’s not clear to me why that’s necessary - my guess is that because the guarded state is accessed by both the sync thread and validationinterface callbacks via Chain.

This is basically right, it’s to allow those variables to be accessed from both the sync thread and the init thread (not the validationinterface thread). Added a comment.

in src/index/base.cpp:106 in a6d1c22dc1 outdated

102+        return m_index.Interrupt();
103+    }
104+    // Only update m_best_block_index between flushes if synced. Unclear why
105+    // best block is not updated here before sync, but this has been
106+    // longstanding behavior since syncing was introduced in #13033 so care
107+    // should be taken if changing m_best_block_index semantics.

jamesob commented at 2:29 pm on November 29, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/a6d1c22dc1254cbfedda7add573f0d537aefd36c

To add some color here, reviewers can verify that in the pre-PR code

BlockConnected events are ignored while not m_synced, and
ThreadSync() calls Commit() which confusingly updates the best block in the index db (GetDB().WriteBestBlock(…)) but doesn’t change m_best_block_index.

ryanofsky commented at 3:15 pm on September 5, 2023:

re: #24230 (review)

a6d1c22

To add some color here, reviewers can verify that in the pre-PR code

BlockConnected events are ignored while not m_synced, and

ThreadSync() calls Commit() which confusingly updates the best block in the index db (GetDB().WriteBestBlock(…)) but doesn’t change m_best_block_index.

This is right. If it helps I added a separate commit (f552c32de4359cd7dce38ce8c9a340039d5a29c6) to try to document m_best_block_index behavior

in src/index/base.cpp:165 in 15f8349271 outdated

160+
161+    CBlockUndo block_undo;
162+    interfaces::BlockInfo block_info = kernel::MakeBlockInfo(pindex, block.data);
163+    if (m_options.disconnect_undo_data && block.height > 0) {
164+        if (!node::UndoReadFromDisk(block_undo, pindex)) {
165+            m_rewind_error = true;

jamesob commented at 3:52 pm on November 29, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/15f83492713abefea3f6ffc7e4c100a6177cc462

Would it be worth FatalError()/return from here? I think that is needed to maintain strict behavioral compatibility with the way that ThreadSync() currently behaves, since !Rewind() can be tripped for either a failure to retrieve block data OR failure to retrieve undo data.

ryanofsky commented at 2:59 pm on August 29, 2023:

re: #24230 (review)

15f8349

Would it be worth FatalError()/return from here? I think that is needed to maintain strict behavioral compatibility with the way that ThreadSync() currently behaves, since !Rewind() can be tripped for either a failure to retrieve block data OR failure to retrieve undo data.

The error still happens on line 99 ("%s: Failed to rewind index %s to a previous chain tip") because rewind_ok will be false. The idea is to keep behavior unchanged, and trigger the error when there is actually an attempt to connect another block.

Maybe though the complexity of having m_rewind_error and m_rewind_start variables is not worth it, and there should be an immediate fatal error.

For now just added a comment about m_rewind_error here

in src/index/base.cpp:100 in ed79c47c13 outdated

84@@ -80,6 +85,15 @@ void BaseIndexNotifications::blockConnected(const interfaces::BlockInfo& block)
85         m_index.SetBestBlockIndex(pindex);
86         if (block.chain_tip) {
87             m_index.m_synced = true;
88+            if (pindex) {
89+                LogPrintf("%s is enabled at height %d\n", m_index.GetName(), pindex->nHeight);
90+            } else {
91+                LogPrintf("%s is enabled\n", m_index.GetName());
92+            }

jamesob commented at 4:04 pm on November 29, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/ed79c47c1385dc2e1b25d3958fa83a231770353d

Should the equivalent old code in ThreadSync() be removed here in this commit as well?

ryanofsky commented at 3:00 pm on August 29, 2023:

re: #24230 (review)

ed79c47

Should the equivalent old code in ThreadSync() be removed here in this commit as well?

Good catch, added that change to this commit!

in src/index/base.cpp:287 in ed79c47c13 outdated

283@@ -266,13 +284,13 @@ void BaseIndex::ThreadSync()
284                         interfaces::BlockInfo block_info = kernel::MakeBlockInfo(iter_tip);
285                         block_info.chain_tip = false;
286                         if (!ReadBlockFromDisk(block, iter_tip, consensus_params)) {
287-                            FatalError("%s: Failed to read block %s from disk",
288+                            block_info.error = strprintf("%s: Failed to read block %s from disk",

jamesob commented at 4:20 pm on November 29, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/ed79c47c1385dc2e1b25d3958fa83a231770353d

May be worth a comment that blockDisconnected will interrupt below in this case, triggering the related return?

ryanofsky commented at 3:52 pm on September 5, 2023:

re: #24230 (review)

ed79c47

May be worth a comment that blockDisconnected will interrupt below in this case, triggering the related return?

This change is implemented a little differently now. The ReadBlockFrom disk call is moved into blockDisconnected so hopefully more straightforward.

in src/interfaces/chain.h:99 in ed79c47c13 outdated

88@@ -89,6 +89,7 @@ struct BlockInfo {
89     const CBlockUndo* undo_data = nullptr;
90     //! Block is from the tip of the chain (always true except when first calling attachChain and reading old blocks).
91     bool chain_tip = true;
92+    std::string error;

jamesob commented at 4:21 pm on November 29, 2022:

https://github.com/bitcoin/bitcoin/pull/24230/commits/ed79c47c1385dc2e1b25d3958fa83a231770353d#diff-8e9fef4d0718d085f31da382899d97e3bbefc8d6a72ede95916a0e2fbd1a532fR92

It might help to add some documentation for how this is used and under what conditions it is set.

ryanofsky commented at 3:00 pm on August 29, 2023:

re: #24230 (review)

ed79c47#diff-8e9fef4d0718d085f31da382899d97e3bbefc8d6a72ede95916a0e2fbd1a532fR92

It might help to add some documentation for how this is used and under what conditions it is set.

Good idea, added.

jamesob commented at 6:43 pm on November 29, 2022: contributor

Partial review

I put about 8 hours into reviewing commits locally. Every time I review this PR I seem to drop off earlier and earlier - not a great testament to the health of my brain I guess.

Funny enough I had forgotten I’d written this in a previous review, but it still captures my sentiments pretty exactly:

At a high level I think this is a great change - the more “peripheral” parts of the system we can put on the other side of the Chain interface, the easier it is to carve off those parts both in terms of process separation (limiting “contagion” risk of non-essential code) and deeper modularization (allowing for more straightforward progression of e.g. libbitcoinkernel). The indexing system is a great candidate for this kind of change, and I’m really glad to see this PR doing it.

That said, it took me a while to get through the changes. ~~I think there’s some potentially unnecessary churn within the commits (noted below).~~ While the code quality is very good, behavioral equivalence with existing code is not immediately obvious to me when reading through the changes (given there are just so many of them), and so to have confidence that this PR isn’t really changing any behavior, I’m really going to need to dig in and do some testing as well as review the coverage of the existing functional tests.

I’d really like to sign off on this code, but I don’t feel like I have the horsepower to actually review the latter part of the changeset with much confidence. It’d almost be easier to review the entire thing squashed down IMO. If I ACK’d as-is, I’d be leaning on correctness verification via manual testing and verifying functional coverage.

I feel bad because you’ve put a ton of work into this PR, and I don’t want to be throwing out “stop energy;” I’m just having a really tough time reading through this and convincing myself that I understand it’s correct.

Could we somehow characterize in the PR description the general strategy explaining what’s happening in the middle commits - i.e. why exactly are we wanting to move code out of ThreadSync and into notification handler classes? I understand that this probably has to do with keeping node code on the node, but getting some kind of clear definition in the PR description might help. I.e. which processes/threads which classes are executed on.

I don’t have a good model that helps me understand, e.g., why we want to avoid FatalError() in ThreadSync() and instead set block_info.error.

Raw notes

257ef9f7b9 indexes, refactor: Remove index prune_violation code

Simple change; introduces hasDataFromTipDown() and moves. Some variable renames.

2f71f52b9b indexes, refactor: Remove index Init method

One of the keys to reviewing this commit is to compare Init() with attachChain().

Basically, this move index initialization from Index::Init() into a combination of attachChain() and the first blockConnected() call. BlockConnected events are moved from being called within the same process to the BaseIndexNotifications object which is attached to the Chain instance and holds a reference to the index object itself.

ecedecac38 indexes, refactor: Remove index validationinterface hooks
5e07d66a6d indexes, refactor: Stop incorrectly calling Interupt() and Stop() in BaseIndex destructor
c2ab79db68 indexes, refactor: Add Commit CBlockLocator& argument
f3d1bdae0e indexes, refactor: Remove remaining CBlockIndex* uses in index Rewind methods

Perhaps useful to keep in mind that the motivation for moving Rewind() from a range of blocks to a single block at a time (Remove()) is to avoid usage of CBlockIndex* entries.

a6d1c22dc1 indexes, refactor: Remove remaining CBlockIndex* uses in index CustomAppend methods
8b63269e7c indexes, refactor: Move more new block logic out of ThreadSync to blockConnected
1df5b0b6f9 indexes, refactor: Move Commit logic out of ThreadSync to notification handlers

Nice improvement to avoid duplicating commit/chainStateFlushed logic.

15f8349271 indexes, refactor: Move Rewind logic out of Rewind to blockDisconnected and ThreadSync

Minor typo in the commit message: “This main thing” -> “The main thing”.

This is the hardest commit for me to follow. It’s distributing the contents of Rewind() over ThreadSync() and blockDisconnected().

I’m not sure how to decompose this to make it easier to follow, but it’s a real challenge for me to feel confident about this not changing behavior.

ed79c47c13 indexes, refactor: Move CustomInit and error handling code out of ThreadSync to notification handlers

I don’t have a good model that helps me understand, e.g., why we want to avoid FatalError() in ThreadSync() and instead set block_info.error.

2897678732 indexes: Add blockfilterindex mutex
53a1aa56af indexes, refactor: Move sync thread from index to node

Move-only stuff is pretty clear. So now the initial index sync is going to happen all in the node process? Would there be a way to offload this to the index processes?

ac61e33545 indexes, refactor: Remove SyncWithValidationInterfaceQueue call
2eceaee59f indexes: Rewrite chain sync logic, remove racy init
e2f6c6ddf7 indexes: Initialize indexes without holding cs_main
3543161c82 indexes, refactor: Remove UndoReadFromDisk calls from indexing code
5cadd3636e indexes, refactor: Remove remaining CBlockIndex* pointers from indexing code
f8a26680e9 Remove direct index -> node dependency

ryanofsky commented at 4:24 am on November 30, 2022: contributor

Thanks for the feedback, and for trying to tackle this. This is just a quick response and I will work more on this later.

I don’t feel like I have the horsepower to actually review the latter part of the changeset with much confidence. It’d almost be easier to review the entire thing squashed down IMO. If I ACK’d as-is, I’d be leaning on correctness verification via manual testing and verifying functional coverage.

Yes, this I think this is a good approach, and I think I recommended it previously. The best way to review the whole PR is to look at the squashed code, and make sure it indexing blocks correctly. There’s no reason to pore over the individual commits, they are meant to be illustrative, and to communicate which code changes are intended to change application behavior, and which code changes which are intended to be refactoring.

The way I would review this would be to read commit descriptions in sequence, look over the code in each commit and make sure it doing what’s described with no obvious surprises, and that things that are being described move in a coherent direction and make sense.

Then I’d review final squashed code in detail and look closely at the final implementation for bugs.

I feel bad because you’ve put a ton of work into this PR, and I don’t want to be throwing out “stop energy;” I’m just having a really tough time reading through this and convincing myself that I understand it’s correct.

Could we somehow characterize in the PR description the general strategy explaining what’s happening in the middle commits - i.e. why exactly are we wanting to move code out of ThreadSync and into notification handler classes? I understand that this probably has to do with keeping node code on the node, but getting some kind of clear definition in the PR description might help. I.e. which processes/threads which classes are executed on.

I’ll work on this but the answer is that just that if every individual index and every individual wallet is individually rereading the same blocks on startup it is worse for performance and responsiveness than if they tell the node what blocks they need to read, and let it feed them the data. It can also deduplicate code between wallets and indexes, provide better feedback about overall sync progress, and make it straightforward to write new clients that need to be informed about connected and disconnected blocks and stay in sync with the chain.

I don’t have a good model that helps me understand, e.g., why we want to avoid FatalError() in ThreadSync() and instead set block_info.error.

Probably this is an example of something that I need to document better in the context of a specific commit, but in general failure to sync a specific index (or wallet) should just result in failure to load that index not necessarily a fatal error that should bring down the node.

15f8349 indexes, refactor: Move Rewind logic out of Rewind to blockDisconnected and ThreadSync

This is the hardest commit for me to follow. It’s distributing the contents of Rewind() over ThreadSync() and blockDisconnected().

I’m not sure how to decompose this to make it easier to follow, but it’s a real challenge for me to feel confident about this not changing behavior.

I don’t think this is something that’s especially important to check or feel confident about. This reminds me of high school physics tests that would sometimes punish you for showing extra work. If you skipped steps and just started from an equation that seemed like it might give a right answer, that could be a safer than actually thinking about the problem and showing extra work that grader could find fault with.

In this case, if I’m looking at a PR that has combination of refactoring changes and behavioral changes, I think it is helpful to know which changes are which, and what the intent is behind various changes, even if in the end I am only looking for serious bugs, and I’m not tediously verifying that there is no behavior change whatsoever.

Move-only stuff is pretty clear. So now the initial index sync is going to happen all in the node process? Would there be a way to offload this to the index processes?

If concern is about performance, this PR by itself shouldn’t affect that, since attachChain is keeping the one thread per index, all threads racing each other to reread the same blocks synchronization model. Moving the block reading code out of the indexes just allows for more coordination and better performance and monitoring in the future.

DrahtBot added the label Needs rebase on Jan 4, 2023

aureleoules commented at 5:25 pm on January 17, 2023: contributor

Concept ACK

257ef9f7b9537c5305a9d92afba541a73ec6c8c0: Now that the chunk of code is in a function, we can directly return the boolean and cleanup the logic.

 0diff --git a/src/node/interfaces.cpp b/src/node/interfaces.cpp
 1index 8c9a0b013..c05b5954a 100644
 2--- a/src/node/interfaces.cpp
 3+++ b/src/node/interfaces.cpp
 4@@ -729,35 +729,31 @@ public:
 5     bool hasDataFromTipDown(const CBlockIndex* start_block) override {
 6         LOCK(::cs_main);
 7         const CChain& active_chain = Assert(m_node.chainman)->ActiveChain();
 8-        bool prune_violation = false;
 9         if (!start_block) {
10             // make sure we have all block data back to the genesis
11-            prune_violation = m_node.chainman->m_blockman.GetFirstStoredBlock(*active_chain.Tip()) != active_chain.Genesis();
12+            return m_node.chainman->m_blockman.GetFirstStoredBlock(*active_chain.Tip()) == active_chain.Genesis();
13         }
14         // in case the index has a best block set and is not fully synced
15         // check if we have the required blocks to continue building the index
16-        else {
17-            const CBlockIndex* block_to_test = start_block;
18-            if (!active_chain.Contains(block_to_test)) {
19-                // if the bestblock is not part of the mainchain, find the fork
20-                // and make sure we have all data down to the fork
21-                block_to_test = active_chain.FindFork(block_to_test);
22-            }
23-            const CBlockIndex* block = active_chain.Tip();
24-            prune_violation = true;
25-            // check backwards from the tip if we have all block data until we reach the indexes bestblock
26-            while (block_to_test && block && (block->nStatus & BLOCK_HAVE_DATA)) {
27-                if (block_to_test == block) {
28-                    prune_violation = false;
29-                    break;
30-                }
31-                // block->pprev must exist at this point, since block_to_test is part of the chain
32-                // and thus must be encountered when going backwards from the tip
33-                assert(block->pprev);
34-                block = block->pprev;
35+        const CBlockIndex* block_to_test = start_block;
36+        if (!active_chain.Contains(block_to_test)) {
37+            // if the bestblock is not part of the mainchain, find the fork
38+            // and make sure we have all data down to the fork
39+            block_to_test = active_chain.FindFork(block_to_test);
40+        }
41+        const CBlockIndex* block = active_chain.Tip();
42+        // check backwards from the tip if we have all block data until we reach the indexes bestblock
43+        while (block_to_test && block && (block->nStatus & BLOCK_HAVE_DATA)) {
44+            if (block_to_test == block) {
45+                return true;
46             }
47+            // block->pprev must exist at this point, since block_to_test is part of the chain
48+            // and thus must be encountered when going backwards from the tip
49+            assert(block->pprev);
50+            block = block->pprev;
51         }
52-        return !prune_violation;
53+
54+        return false;
55     }
56     std::unique_ptr<Handler> handleNotifications(std::shared_ptr<Notifications> notifications) override
57     {

ryanofsky force-pushed on Jan 20, 2023

DrahtBot removed the label Needs rebase on Jan 20, 2023

in src/index/coinstatsindex.cpp:144 in 137bbe1478 outdated

134@@ -146,6 +135,7 @@ bool CoinStatsIndex::CustomAppend(const interfaces::BlockInfo& block)
135 
136         // Add the new utxos created from the block
137         assert(block.data);
138+        assert(block.undo_data);
139         for (size_t i = 0; i < block.data->vtx.size(); ++i) {
140             const auto& tx{block.data->vtx.at(i)};
141

maflcko commented at 9:28 am on January 21, 2023:

0index/coinstatsindex.cpp:143:37: error: use of undeclared identifier 'pindex'

ryanofsky commented at 2:58 pm on August 29, 2023:

re: #24230 (review)

0index/coinstatsindex.cpp:143:37: error: use of undeclared identifier 'pindex'

This was probably a rebase conflict. I think this error must have been fixed, or at least I can’t reproduce it the commit this line was added (5e2b1ea14823bcc43429a130d8820f7be0f45b4b)

DrahtBot added the label Needs rebase on Jan 27, 2023

ryanofsky force-pushed on Feb 10, 2023

DrahtBot removed the label Needs rebase on Feb 10, 2023

ryanofsky force-pushed on Feb 10, 2023

DrahtBot added the label Needs rebase on Feb 17, 2023

ryanofsky force-pushed on Feb 28, 2023

DrahtBot removed the label Needs rebase on Feb 28, 2023

DrahtBot added the label Needs rebase on Apr 21, 2023

ryanofsky force-pushed on May 2, 2023

DrahtBot removed the label Needs rebase on May 2, 2023

in src/node/interfaces.cpp:730 in d7eef9f49a outdated

724@@ -725,6 +725,39 @@ class ChainImpl : public Chain
725     {
726         ::uiInterface.ShowProgress(title, progress, resume_possible);
727     }
728+    bool hasDataFromTipDown(const CBlockIndex* start_block) override {
729+        LOCK(::cs_main);
730+        const CChain& active_chain = Assert(m_node.chainman)->ActiveChain();

maflcko commented at 3:24 pm on May 9, 2023:

nit in d7eef9f49a827e96337febcf20a22b392dc7b51e: Can use chaiman()?

ryanofsky commented at 3:58 pm on September 5, 2023:

re: #24230 (review)

nit in d7eef9f: Can use chaiman()?

Thanks, this was done in the meantime, switched to chainman() throughout

DrahtBot added the label Needs rebase on May 11, 2023

ryanofsky force-pushed on Jul 13, 2023

DrahtBot removed the label Needs rebase on Jul 13, 2023

DrahtBot added the label CI failed on Jul 13, 2023

ryanofsky force-pushed on Jul 14, 2023

ryanofsky commented at 7:49 pm on July 14, 2023: contributor

This is rebased after #27607, which allowed making a lot of simplifications and improvements. Also, I think the PR is better organized now, so I think I could split off the first commit or first few commits into a separate PR if reviewers would find that helpful.

jamesob commented at 2:21 am on July 15, 2023: contributor

I’ll start another review early next week.

ryanofsky force-pushed on Jul 17, 2023

ryanofsky force-pushed on Jul 18, 2023

DrahtBot removed the label CI failed on Jul 18, 2023

in src/index/base.cpp:310 in c0d1110d2a outdated

305+        // The other case where the new block will connect to an ancestor of the
306+        // current block rather than the current block is when the m_synced flag
307+        // is set to true too early. For example if the index is synced to
308+        // height 100, and -reindex-chainstate option is used, there may be
309+        // BlockConnected notifications for blocks 97, 98, 99, and 100 sitting
310+        // in the notifications queue when m_synced gets set to true. When this

furszy commented at 2:41 pm on August 2, 2023:

In c0d1110d:

Guess that you wrote this because you saw it in a unit test?

I’m not seeing how this could be possible during the regular node initialization process. We aren’t signaling any block prior to initializing the indexes.

ryanofsky commented at 3:36 pm on August 2, 2023:

re: #24230 (review)

Guess that you wrote this because you saw it in a unit test?

No, it I believe it happens pretty reliably when you use -reindex-chainstate. I saw it reliably when I ran the test/functional/feature_coinstatsindex.py “Test that the index works with -reindex-chainstate” test and had a bug in the code that disabled this check. It caused the index to be corrupted because the same blocks would be added to the index twice.

furszy commented at 8:49 pm on August 2, 2023:

Ok, I think that know how that could happen. It is not related to the Init() call.

The background sync process reads blocks directly from the chain index (calling NextSyncBlock()). So, what if a block gets connected while ThreadSync() is being executed but the event is not dispatched on time.. (before the background sync process finishes)

The block connection event is added to the validation queue.
The index background sync process reads the block from the chain and processes it.
The validation queue dispatches the block connection event added in (1) to the index. Then… boom: the index received an already processed block.

ryanofsky commented at 9:09 pm on August 2, 2023:

That’s accurate, but I don’t think ThreadSync needs to even do anything for this to happen, and no new blocks need to be connected after ThreadSync starts.

If -reindex-chainstate indexing completes, and ThreadSync immediately starts and exits because it detects that the index best block is equal to the chain tip, and no new blocks are connected, there could still be stale block connected notifications in the queue which will cause the index to rewind and add the blocks again. The commit “Avoid race, make -reindex-chainstate more efficient” e5f2641137413da5970007a9b932b5993067b550 changes index startup behavior to just ignore the stale block connected notifications while m_ready is false, instead of processing them and pointlessly rewinding and reconnecting blocks.

furszy commented at 2:44 pm on August 4, 2023:

Yeah ok, that is also a possibility. Great.

Could the fix be decoupled into a standalone PR? Sounds like something that we could rapidly merge independently of this PR. It fixes a bug and also makes this work smaller.

ryanofsky commented at 3:43 pm on August 4, 2023:

Yeah ok, that is also a possibility. Great.

Could the fix be decoupled into a standalone PR? Sounds like something that we could rapidly merge independently of this PR. It fixes a bug and also makes this work smaller.

Yes, I am definitely looking for suggestions on how many of the first commits it would make sense to split off into a separate PR.

But just to be clear, there isn’t a bug here, just suboptimal behavior. If -reindex-chainstate is used and there are still block-connected notifications waiting in the queue when the index is started, it doesn’t cause any index corruption. It just causes the last few blocks in the index to be rewound, and then appended again, as if there were a reorg

furszy commented at 11:16 pm on August 4, 2023:

Yes, I am definitely looking for suggestions on how many of the first commits it would make sense to split off into a separate PR.

Ok cool. I suggested to decouple this commit only because I’m going commit by commit and it seemed like an easy step forward, but will keep moving forward.

But just to be clear, there isn’t a bug here, just suboptimal behavior. If -reindex-chainstate is used and there are still block-connected notifications waiting in the queue when the index is started, it doesn’t cause any index corruption. It just causes the last few blocks in the index to be rewound, and then appended again, as if there were a reorg

Yeah, we are sync. I said “bug” because of the unexpected behavior. I don’t think that it was programmed to do that.

in src/index/base.cpp:157 in ba30b60bcf outdated

153+        // To prevent new notifications that may be happening in the background
154+        // from being lost, it is important to keep cs_main locked while calling
155+        // CallFunctionInValidationInterfaceQueue, so the new notifications will
156+        // be queued after the m_synced = true callback.
157+        CallFunctionInValidationInterfaceQueue([this] { m_synced = true; });
158+    }

furszy commented at 3:10 pm on August 2, 2023:

In ba30b60b:

Isn’t this conflicting with the StartBackgroundSync() call?

Let’s say that there are new blocks on the notifications queue at this point. By not setting the m_synced flag, we allow ThreadSync() to start. This former one might have some work to do if some of the queued blocks notifications were processed in-between Init() and StartBackgroundSync(). Then, while this is being done, the validation queue thread might end up setting m_synced=true. Which would enable the index new block connection signals reception at the same time that ThreadSync() is being executed.

ryanofsky commented at 3:57 pm on August 2, 2023:

re: #24230 (review)

Isn’t this conflicting with the StartBackgroundSync() call?

Let’s say that there are new blocks on the notifications queue at this point. By not setting the m_synced flag, we allow ThreadSync() to start.

Yes, that’s a good point. I forgot that the m_synced flag is not just used to process notifications, but also to determine whether to start the sync thread.

It’s hard to imagine this being a problem in pracice, but I think I should tweak the logic to distinguish between “synced” and “ready” states, where “synced” state means index is in sync with the chainstate and “ready” state means index is ready to start processing blockconnected notifications.

EDIT: Later commit “Move sync thread from index to node” 0048fd1cbc00205c191dae1a96c44d02f4a7aae0 actually already does this so there is not a race condition in the end, but I it would be good to do it earlier.

ryanofsky force-pushed on Aug 2, 2023

ryanofsky commented at 7:11 pm on August 2, 2023: contributor

Updated e025bb3f8d229969fdfe16e3c20191a5382c0443 -> c553d9f0b0f796c30124fddb920b5e8b5b32de11 (pr/indexy.41 -> pr/indexy.42, compare) to avoid race condition pointed out by furszy in an intermediate commit: #24230 (review) which could potentially lead to an assert failure if there was a reorg as indexes are starting up.

DrahtBot added the label Needs rebase on Aug 7, 2023

ryanofsky force-pushed on Aug 14, 2023

DrahtBot removed the label Needs rebase on Aug 14, 2023

DrahtBot added the label CI failed on Aug 16, 2023

maflcko commented at 9:25 am on August 17, 2023: member

Maybe a silent merge conflict in a functional test, see CI?

0�[0m�[0;31mfeature_index_prune.py                                 | ✖ Failed  | 2401 s

DrahtBot added the label Needs rebase on Aug 17, 2023

ryanofsky force-pushed on Aug 21, 2023

DrahtBot removed the label CI failed on Aug 21, 2023

DrahtBot removed the label Needs rebase on Aug 21, 2023

in src/index/base.cpp:338 in b96598d938 outdated

334@@ -356,7 +335,7 @@ void BaseIndex::BlockConnected(const std::shared_ptr<const CBlock>& block, const
335 
336 void BaseIndex::ChainStateFlushed(const CBlockLocator& locator)
337 {
338-    if (!m_synced) {
339+    if (!m_ready) {

furszy commented at 1:50 pm on August 24, 2023:

In b96598d9:

Same as you did for BlockConnected(), what about doing the same cleanup on this method?. Check few lines below, we have the same:

0// This checks that ChainStateFlushed callbacks are received after BlockConnected. The check may fail
1// immediately after the sync thread catches up and sets m_synced. Consider the case where
2// there is a reorg and the blocks on the stale branch are in the ValidationInterface queue
3// backlog even after the sync thread has caught up to the new chain tip. In this unlikely
4// event, log a warning and let the queue clear

which is no longer possible, thanks to m_ready set in the validation queue thread.

ryanofsky commented at 2:56 pm on August 29, 2023:

re: #24230 (review)

In b96598d:

Same as you did for BlockConnected(), what about doing the same cleanup on this method?

Great idea, done.

in src/index/base.cpp:157 in b96598d938 outdated

152+        //
153+        // To prevent new notifications that may be happening in the background
154+        // from being lost, it is important to keep cs_main locked while calling
155+        // CallFunctionInValidationInterfaceQueue, so the new notifications will
156+        // be queued after the m_ready = true callback.
157+        CallFunctionInValidationInterfaceQueue([this] { m_ready = true; });

furszy commented at 1:56 pm on August 24, 2023:

In b96598d9:

Small note, the comment that is above the RegisterValidationInterface() call is not so accurate anymore.

ryanofsky commented at 2:56 pm on August 29, 2023:

re: #24230 (review)

In b96598d:

Small note, the comment that is above the RegisterValidationInterface() call is not so accurate anymore.

Rewrote the comment. (Comment was also incomplete before, because even before this it was important to register for notifications before releasing cs_main.)

furszy commented at 2:40 pm on August 24, 2023: member

Left two comments for the initial commits, will continue.

Have to admit that the first two commits consume some time to grasp again. I find myself re-thinking the same topics over and over.

in src/node/interfaces.cpp:855 in 1a6a2a3fe5 outdated

730@@ -731,6 +731,16 @@ class ChainImpl : public Chain
731     {
732         ::uiInterface.ShowProgress(title, progress, resume_possible);
733     }
734+    std::unique_ptr<Handler> attachChain(std::shared_ptr<Notifications> notifications, const CBlockLocator& locator, const NotifyOptions& options, const StartSyncFn& start_sync) override
735+    {
736+        LOCK(cs_main);
737+        const Chainstate& active{chainman().ActiveChainstate()};
738+        const CBlockIndex* start_block_index{locator.IsNull() ? nullptr : active.m_blockman.LookupBlockIndex(locator.vHave.at(0))};

furszy commented at 1:38 pm on August 25, 2023:

In 1a6a2a3f:

As we always use the top locator item. What about providing the start block uint256 directly to attachChain?.

Also, what do you think about working on an index upgrade mechanism to only store the tip’s uint256 instead of the locator?.

It seems that with it, we will be able to simplify 432febc3.

Update: Made a quick version of it here: https://github.com/furszy/bitcoin-core/commit/7f2b2eb6c313d64e323edc11931d2786e181c0d8.

ryanofsky commented at 2:56 pm on August 29, 2023:

re: #24230 (review)

In 1a6a2a3:

As we always use the top locator item. What about providing the start block uint256 directly to attachChain?.

Also, what do you think about working on an index upgrade mechanism to only store the tip’s uint256 instead of the locator?.

It seems that with it, we will be able to simplify 432febc.

Update: Made a quick version of it here: furszy@7f2b2eb.

Thanks. I’m not sure I see the benefits here for code simplification. I think just because existing index types may require the best block to be present to sync, it doesn’t mean that other index types (or wallets) will have the same limitation. Since the locator is a generalization of the best block hash, it seems like we don’t get much advantage from switching to the best block hash, and switching also has disadvantage of changing the index format and affecting backwards compatibility or requiring backwards compatibility code.

I also would like wallets to eventually call attachChain so wallets reuse same sync code and potentially sync more efficiently, and wallets should support syncing from a locator.

in src/index/base.cpp:237 in cf491ea020 outdated

127-    Stop();
128+    //! Assert Stop() was called before this base destructor. Notification
129+    //! handlers call pure virtual methods like GetName(), so if they are still
130+    //! being called at this point, they would segfault.
131+    LOCK(m_mutex);
132+    assert(!m_handler);

furszy commented at 8:53 pm on August 28, 2023:

In cf491ea0:

Probably, we should fix the tests that could be depending on this here too. For instance, the blockfilter_index_init_destroy unit test isn’t interrupting nor stopping the index. Then, coinstatsindex_unclean_shutdown and txindex_initial_sync only call Stop() without calling Interrupt() first.

ryanofsky commented at 2:57 pm on August 29, 2023:

re: #24230 (review)

In cf491ea:

Probably, we should fix the tests that could be depending on this here too. For instance, the blockfilter_index_init_destroy unit test isn’t interrupting nor stopping the index. Then, coinstatsindex_unclean_shutdown and txindex_initial_sync only call Stop() without calling Interrupt() first.

Nice catch on the blockfilter_index_init_destroy test. I do think it safe, though because it is just creating and destroying index objects without actually starting them, so there no need to stop them first (otherwise would trigger the assert).

It’s also a good observation about Stop() being called without Interrupt() in some places. I think this is actually ok, because it just causes the test to wait for the index to sync.

furszy commented at 8:55 pm on August 28, 2023: member

Reviewed till 432febc3.

in src/index/base.cpp:341 in d260abb891 outdated

338+            if (!m_chainstate->m_blockman.ReadBlockFromDisk(block, *iter_tip)) {
339+                return error("%s: Failed to read block %s from disk",
340+                             __func__, iter_tip->GetBlockHash().ToString());
341+            }
342+            block_info.data = &block;
343+	}

furszy commented at 9:22 pm on August 28, 2023:

In d260abb8:

Missing indentation.

ryanofsky commented at 2:57 pm on August 29, 2023:

re: #24230 (review)

Thanks, fixed

in src/index/blockfilterindex.cpp:333 in d260abb891 outdated

293 
294     // During a reorg, we need to copy all filters for blocks that are getting disconnected from the
295     // height index to the hash index so we can still find them when the height index entries are
296     // overwritten.
297-    if (!CopyHeightIndexToHashIndex(*db_it, batch, m_name, new_tip.height, current_tip.height)) {
298+    if (!CopyHeightIndexToHashIndex(*db_it, batch, m_name, block.height - 1, block.height)) {

furszy commented at 9:27 pm on August 28, 2023:

In d260abb8:

Note: now that only one block per CustomRemove call is rewinded, we could simplify the loop inside CopyHeightIndexToHashIndex().

ryanofsky commented at 4:21 pm on March 13, 2024:

re: #24230 (review)

In d260abb:

Note: now that only one block per CustomRemove call is rewinded, we could simplify the loop inside CopyHeightIndexToHashIndex().

This is true, I think I would leave it for a possible followup though since it seems like a bigger diff for a smaller simplification.

in src/index/base.cpp:36 in 5e2b1ea148 outdated

32@@ -33,6 +33,7 @@ constexpr auto SYNC_LOCATOR_WRITE_INTERVAL{30s};
33 template <typename... Args>
34 void BaseIndex::FatalErrorf(const char* fmt, const Args&... args)
35 {
36+    Interrupt(); // Cancel the sync thread

furszy commented at 9:30 pm on August 28, 2023:

In 5e2b1ea1:

The description says that this commit does not change behavior but this Interrupt() call here seems like a behavior change.

ryanofsky commented at 2:58 pm on August 29, 2023:

re: #24230 (review)

In 5e2b1ea:

The description says that this commit does not change behavior but this Interrupt() call here seems like a behavior change.

It shouldn’t be a behavior change (unless I am missing something) because Interrupt() just stops the sync thread, and the sync thread never calls FatalErrorf unless it is already stopping.

in src/index/base.cpp:68 in a2e87d9d66 outdated

60@@ -61,12 +61,37 @@ class BaseIndexNotifications : public interfaces::Chain::Notifications
61 
62 void BaseIndexNotifications::blockConnected(const interfaces::BlockInfo& block)
63 {
64-    m_index.BlockConnected(block);
65+    if (m_index.IgnoreBlockConnected(block)) return;
66+
67+    const CBlockIndex* pindex = &m_index.BlockIndex(block.hash);
68+    const CBlockIndex* best_block_index = m_index.m_best_block_index.load();
69+    if (block.chain_tip && best_block_index && best_block_index != pindex->pprev && !m_index.Rewind(best_block_index, pindex->pprev)) {

furszy commented at 9:51 pm on August 28, 2023:

In a2e87d9d6:

I asked myself why this chain_tip check was placed here few times until saw 5e2b1ea1, which makes ThreadSync() re-use this code.

Probably, it would be helpful for other reviewers to have a comment saying: “chain_tip is used here to let ThreadSync() handle reorgs”.

ryanofsky commented at 2:57 pm on August 29, 2023:

re: #24230 (review)

In a2e87d9:

I asked myself why this chain_tip check was placed here few times until saw 5e2b1ea, which makes ThreadSync() re-use this code.

Probably, it would be helpful for other reviewers to have a comment saying: “chain_tip is used here to let ThreadSync() handle reorgs”.

Good point, that is unnecessarily confusing. Removed chain_tip reference here and added back later when it is actually used.

in src/index/coinstatsindex.cpp:271 in d260abb891 outdated

293-            ReverseBlock(block, iter_tip);
294-
295-            iter_tip = iter_tip->GetAncestor(iter_tip->nHeight - 1);
296-        } while (new_tip_index != iter_tip);
297-    }
298+    ReverseBlock(block);

furszy commented at 9:45 pm on August 30, 2023:

Seems that we are swallowing ReverseBlock() return value. When it could mean that there is something wrong with the index.

Do you want to fix it here or should I go with another PR?

ryanofsky commented at 1:50 pm on September 5, 2023:

re: #24230 (review)

Seems that we are swallowing ReverseBlock() return value. When it could mean that there is something wrong with the index.

Do you want to fix it here or should I go with another PR?

Good catch. It would be great if you want to follow up in different PR to reduce complexity here.

in src/index/base.cpp:97 in 5e2b1ea148 outdated

92         m_index.FatalErrorf("%s: Failed to write block %s to index",
93                    __func__, pindex->GetBlockHash().ToString());
94         return;
95     }
96 
97+    if (!m_index.m_synced) {

furszy commented at 9:50 pm on August 30, 2023:

In 5e2b1ea1:

I’m wondering what would change if we use if (!block.chain_tip) instead. Guess that nothing right?

ryanofsky commented at 1:51 pm on September 5, 2023:

re: #24230 (review)

In 5e2b1ea:

I’m wondering what would change if we use if (!block.chain_tip) instead. Guess that nothing right?

Right, that’s a good point that the block.chain_tip has same information as m_synced.

I like the idea of replacing all m_synced uses in this function with block.chain_tip, so I made that change. It should be better because it makes the code act directly on block information rather than going through an intermediate variable. After this, m_synced is only used to implement BaseIndex::GetSummary() and BaseIndex::BlockUntilSyncedToCurrentChain() and doesn’t affect indexing behavior anymore.

in src/interfaces/chain.h:301 in f74b34f2d4 outdated

296+    struct NotifyOptions
297+    {
298+        //! Include undo data with block connected notifications.
299+        bool connect_undo_data = false;
300+        //! Include block data with block disconnected notifications.
301+        bool disconnect_data = false;

furszy commented at 1:39 pm on August 31, 2023:

This field is only set by the coinstats index but never used.

Guessing that it is because the notifications now go through the NotificationsProxy class, who automatically fill-up the block data during disconnection.

While this is ok for the coinstats index, it is not that good for the other indexes. As they do not use the block data during disconnection, all these disk reads mean a resources waste for them.

ryanofsky commented at 1:51 pm on September 5, 2023:

re: #24230 (review)

This field is only set by the coinstats index but never used.

Guessing that it is because the notifications now go through the NotificationsProxy class, who automatically fill-up the block data during disconnection.

While this is ok for the coinstats index, it is not that good for the other indexes. As they do not use the block data during disconnection, all these disk reads mean a resources waste for them.

Good point. Removed this variable. Potentially the SyncChain method could use it to avoid reading data but this would be a minor optimization and in any case I never implemented it.

in src/index/base.cpp:35 in f74b34f2d4 outdated

39@@ -40,11 +40,6 @@ void BaseIndex::FatalErrorf(const char* fmt, const Args&... args)
40     node::AbortNode(m_chain->context()->exit_status, message);
41 }
42 
43-const CBlockIndex& BaseIndex::BlockIndex(const uint256& hash)
44-{
45-   return WITH_LOCK(cs_main, return *Assert(m_chainstate->m_blockman.LookupBlockIndex(hash)));
46-}
47-

furszy commented at 3:12 pm on August 31, 2023:

In f74b34f2:

BaseIndex::BlockIndex definition was removed but not its declaration in the header.

ryanofsky commented at 1:52 pm on September 5, 2023:

re: #24230 (review)

In f74b34f:

BaseIndex::BlockIndex definition was removed but not its declaration in the header.

Good catch. Removed

in src/index/base.cpp:246 in 0550d1749b outdated

420@@ -354,49 +421,6 @@ bool BaseIndex::Commit(const CBlockLocator& locator)
421     return true;
422 }
423 
424-bool BaseIndex::Rewind(const CBlockIndex* current_tip, const CBlockIndex* new_tip)

furszy commented at 3:15 pm on August 31, 2023:

In 0550d174:

BaseIndex::Rewind definition was removed but not its declaration in the header. Also, there is a comment in IgnoreBlockConnected() referring to this method that should be updated.

ryanofsky commented at 1:52 pm on September 5, 2023:

re: #24230 (review)

In 0550d17:

BaseIndex::Rewind definition was removed but not its declaration in the header. Also, there is a comment in IgnoreBlockConnected() referring to this method that should be updated.

Thanks, fixed both of these things

DrahtBot added the label Needs rebase on Sep 5, 2023

ryanofsky force-pushed on Sep 5, 2023

ryanofsky commented at 7:00 pm on September 5, 2023: contributor

Thanks for reviewing! I think I’ve caught up with all the unanswered comments. Also rebased due to conflict

Rebased f74b34f2d4d30f6895cfbeed2ec655a60290e990 -> 855dedc6fc4d2be0acac5ee0f52184761c495355 (pr/indexy.44 -> pr/indexy.45, compare) due to conflict with #28195, and implementing various review suggestions

DrahtBot removed the label Needs rebase on Sep 5, 2023

fanquake referenced this in commit fd69ffbbfb on Sep 12, 2023

DrahtBot added the label Needs rebase on Sep 12, 2023

ryanofsky force-pushed on Sep 12, 2023

ryanofsky commented at 6:18 pm on September 12, 2023: contributor

Rebased 855dedc6fc4d2be0acac5ee0f52184761c495355 -> 0d37538507e31468d24ac3249dc09ba6f44d4439 (pr/indexy.45 -> pr/indexy.46, compare) due to conflicts with #28427

DrahtBot removed the label Needs rebase on Sep 12, 2023

DrahtBot added the label CI failed on Sep 12, 2023

DrahtBot added the label Needs rebase on Sep 14, 2023

ryanofsky force-pushed on Sep 14, 2023

ryanofsky commented at 12:37 pm on September 14, 2023: contributor

Rebased 0d37538507e31468d24ac3249dc09ba6f44d4439 -> 84f38664060bba6326a51c4303e44b524aaf7c85 (pr/indexy.46 -> pr/indexy.47, compare) due to conflict with #28458 Updated 84f38664060bba6326a51c4303e44b524aaf7c85 -> e09e18601db94107992dfb086c77081ffc5e01ec (pr/indexy.47 -> pr/indexy.48, compare) fixing compile error from bad rebase conflict resolution (https://cirrus-ci.com/task/4702319267282944)

DrahtBot removed the label Needs rebase on Sep 14, 2023

ryanofsky force-pushed on Sep 14, 2023

DrahtBot removed the label CI failed on Sep 14, 2023

Frank-GER referenced this in commit 5f3d90b597 on Sep 19, 2023

DrahtBot added the label Needs rebase on Sep 26, 2023

in src/index/base.cpp:86 in b4328dc89c outdated

81+// notifications, and determine where best block is relative to chain tip.
82+//
83+// If the chain tip and index best block are the same, block connected and
84+// disconnected notifications will be enabled after this call and the index will
85+// update as the ImportBlocks() function connects blocks and sends
86+// notifications. Otherwise, when the chain tip and index best block not the

TheCharlatan commented at 12:14 pm on December 18, 2023:

Nit: s/best block not the same/best block are not the same/

ryanofsky commented at 4:20 pm on March 13, 2024:

re: #24230 (review)

Nit: s/best block not the same/best block are not the same/

Thanks, fixed

in src/index/base.cpp:365 in 8d0cc07587 outdated

362@@ -338,30 +363,14 @@ void BaseIndex::BlockConnected(const interfaces::BlockInfo& block_info)
363         // reorg, Rewind call below will remove existing blocks from the index
364         // before adding the new one.
365         assert(best_block_index->GetAncestor(pindex->nHeight - 1) == pindex->pprev);

TheCharlatan commented at 12:43 pm on December 22, 2023:

Nit: In commit 8d0cc07587d4acfab3a8b4f636b631467c8e5f19 why is this assertion not moved as well to blockConnected? Similarly the assertion in here in IgnoreChainstateFlushed.

ryanofsky commented at 4:25 pm on March 13, 2024:

re: #24230 (review)

Nit: In commit 8d0cc07 why is this assertion not moved as well to blockConnected? Similarly the assertion in here in IgnoreChainstateFlushed.

This commit is moving code from BaseIndex::BlockConnected to BaseIndexNotifications::blockConnected and from BaseIndex::ChainStateFlushed to BaseIndexNotifications::blockConnected.

The code that is being moved is the code that will be kept in the new methods and deals with updating the index state.

The code that is not being moved is code that will be deleted later (in commit “Rewrite chain sync logic, remove racy init”) and deals with ignoring spurious notification.

The asserts could be moved here but it would make the diff bigger for no reason.

In general, the PR is removing all code from src/index that deals with CBlockIndex pointers, and changing the code as little as possible before removing it.

TheCharlatan commented at 8:51 pm on December 22, 2023: contributor

Concept ACK

in src/index/blockfilterindex.cpp:237 in f2a0551eb8 outdated

232-        // will be removed in upcoming commit
233-        const CBlockIndex* pindex = WITH_LOCK(cs_main, return m_chainstate->m_blockman.LookupBlockIndex(block.hash));
234-        if (!m_chainstate->m_blockman.UndoReadFromDisk(block_undo, *pindex)) {
235-            return false;
236-        }
237-

TheCharlatan commented at 10:06 pm on December 23, 2023:

NIt: In commit f2a0551eb8ec8e115778dde5b833a616461da9a4: Would it make sense to add an assert(block.undo_data) here? I think that could make the code here a bit clearer.

ryanofsky commented at 4:21 pm on March 13, 2024:

re: #24230 (review)

NIt: In commit f2a0551: Would it make sense to add an assert(block.undo_data) here? I think that could make the code here a bit clearer.

Good suggestion, added

in src/index/base.cpp:168 in 8a164cf087 outdated

163+
164+    // During initial sync, ignore validation interface notifications, only
165+    // process notifications from sync thread.
166+    if (!m_index.m_ready && block.chain_tip) return;
167+
168+    const CBlockIndex* pindex = &m_index.BlockIndex(block.hash);

TheCharlatan commented at 10:56 am on December 31, 2023:

Nit: In commit 8a164cf087c480e68e9ec87a97f759b97015efea: I feel like this would be a bit more readable as a CBlockIndex&.

ryanofsky commented at 4:25 pm on March 13, 2024:

re: #24230 (review)

Nit: In commit 8a164cf: I feel like this would be a bit more readable as a CBlockIndex&.

A thing to keep in mind is that this PR removes all CBlockIndex references from indexing code, so this code is temporary and won’t last for long.

The old “pindex” variables are being replaced with new “block” variable, so it’s actually convenient that old code and new code are written in different styles and their names don’t clash. If I would try to write old code in a newer style, I think it would just be more name clashes.

in src/index/base.cpp:273 in e09e18601d outdated

287-        if (!locator_index) {
288-            return InitError(strprintf(Untranslated("%s: best block of the index not found. Please rebuild the index."), GetName()));
289+    auto options = CustomOptions();
290+    options.thread_name = GetName();
291+    auto notifications = std::make_shared<BaseIndexNotifications>(*this);
292+    auto start_sync = [&](const interfaces::BlockInfo& block) {

TheCharlatan commented at 12:29 pm on January 1, 2024:

Nit: Might prepare_sync (or something similar) be a better name for this lambda? The way it is named I would expect this to be the function that triggers the syncing to start, but that only happens once StartBackgroundSync is called.

ryanofsky commented at 4:20 pm on March 13, 2024:

re: #24230 (review)

Nit: Might prepare_sync (or something similar) be a better name for this lambda? The way it is named I would expect this to be the function that triggers the syncing to start, but that only happens once StartBackgroundSync is called.

Nice suggestion, renamed this everywhere. “Start” functions are often used to start threads or other asynchronous tasks, but this callback is completely synchronous so “prepare” is a better name.

in src/index/base.cpp:196 in c60c735320 outdated

191-                        __func__, pindex->GetBlockHash().ToString());
192-            return;
193-        } else {
194-            block.data = &block_data;
195-        }
196-    }

TheCharlatan commented at 1:34 pm on January 1, 2024:

Nit: In commit c60c735320694d124212a41e7e7a9d204ff35a6f: Could add a block.data assertion at the top of the method?

ryanofsky commented at 4:26 pm on March 13, 2024:

re: #24230 (review)

Nit: In commit c60c735: Could add a block.data assertion at the top of the method?

Good idea, added.

in src/index/base.cpp:146 in c60c735320 outdated

142@@ -143,7 +143,7 @@ void BaseIndexNotifications::blockConnected(const interfaces::BlockInfo& block_i
143         return;
144     }
145 
146-    if (!block.chain_tip && (m_last_locator_write_time + SYNC_LOCATOR_WRITE_INTERVAL < current_time || m_index.m_interrupt)) {
147+    if (!block.chain_tip && (m_last_locator_write_time + SYNC_LOCATOR_WRITE_INTERVAL < current_time || WITH_LOCK(m_index.m_mutex, return !m_index.m_notifications.get()))) {

TheCharlatan commented at 1:41 pm on January 1, 2024:

Just a question: Why call get() here? Is there a difference to return !m_index.m_notifications?

ryanofsky commented at 4:26 pm on March 13, 2024:

re: #24230 (review)

Just a question: Why call get() here? Is there a difference to return !m_index.m_notifications?

No that just seems like an oversight. Good catch, this should be simpler now.

in src/kernel/chain.cpp:54 in 1664a63578 outdated

27@@ -24,4 +28,56 @@ interfaces::BlockInfo MakeBlockInfo(const CBlockIndex* index, const CBlock* data
28     info.data = data;
29     return info;
30 }
31+
32+bool SyncChain(node::BlockManager& blockman, const CChain& chain, const CBlockIndex* block, std::shared_ptr<interfaces::Chain::Notifications> notifications, const CThreadInterrupt& interrupt, std::function<void()> on_sync)

TheCharlatan commented at 5:44 pm on January 1, 2024:

In commit 1664a63578941bafaa36d42f438e2e5cdac96583: Looking at this function, I am not sure why it is in the kernel namespace. More generally, I am confused by the responsibilities of this file in the first place. Seems like it could just be split up into chain.cpp and node/interfaces.cpp? Is it weird that a kernel file includes interfaces/chain.h? That seems like a higher order include to me that the kernel should not need to know of.

ryanofsky commented at 4:22 pm on March 13, 2024:

re: #24230 (review)

In commit 1664a63: Looking at this function, I am not sure why it is in the kernel namespace. More generally, I am confused by the responsibilities of this file in the first place. Seems like it could just be split up into chain.cpp and node/interfaces.cpp? Is it weird that a kernel file includes interfaces/chain.h? That seems like a higher order include to me that the kernel should not need to know of.

Yes, this should be cleaned up, but I might wait for #29015 to do it. #29015 adds src/common/types.h and src/node/types.h files, and I think it would make sense to move the BlockInfo to src/node/types.h and move MakeBlockInfo() and SyncChain() functions to src/node/chain.cpp.

When I was first implementing this PR, the SyncChain() method seemed like a general purpose method that would be useful to include in libbitcoinkernel for syncing external state starting at an old block to the current chain tip. This could be useful for wallets, indexes, or anything else storing data related to the chain. But in retrospect it drags in a few dependencies to the kernel which are probably not worth having. It might make sense to add to the kernel later if we saw applications that could benefit from this functionality. But there isn’t much reason to add it preemptively.

in src/kernel/chain.h:43 in 1664a63578 outdated

37+//! @param on_sync - optional callback invoked when reaching the chain tip
38+//!                  while cs_main is still held, before sending a final
39+//!                  blockConnected notification. This can be used to
40+//!                  synchronously register for new notifications.
41+//! @return true if synced, false if interrupted
42+bool SyncChain(node::BlockManager& blockman, const CChain& chain, const CBlockIndex* block, std::shared_ptr<interfaces::Chain::Notifications> notifications, const CThreadInterrupt& interrupt, std::function<void()> on_sync);

TheCharlatan commented at 11:24 am on January 2, 2024:

In commit 1664a63578941bafaa36d42f438e2e5cdac96583: The return type of SyncChain is unused. Are there future plans to use it, or could it be made void?

ryanofsky commented at 4:22 pm on March 13, 2024:

re: #24230 (review)

In commit 1664a63: The return type of SyncChain is unused. Are there future plans to use it, or could it be made void?

It could be made void, but I would be uncomfortable doing this because I think callers should be able to determine if a call was interrupted without having to check a separate interrupt variable. In general I think it’s good if functions return information about interrupt status when they can, so we don’t wind up with the kind of bad error handling #29642 tries to solve.

It is true that attachChain wrapper around SyncChain is not using the return value and not returning an interrupt status itself, but that is because attachChain is starting a thread, and can’t return the interrupt status in all cases, so it doesn’t try to return it in any case. But SyncChain is synchronous and can return the interrupt status in all cases, so I think it should.

in src/node/interfaces.cpp:851 in c8fde60faa outdated

780@@ -786,6 +781,12 @@ class ChainImpl : public Chain
781     }
782     std::unique_ptr<Handler> attachChain(std::shared_ptr<Notifications> notifications, const CBlockLocator& locator, const NotifyOptions& options, const StartSyncFn& start_sync) override
783     {
784+        std::unique_ptr<NotificationsHandlerImpl> handler;
785+        std::optional<interfaces::BlockInfo> start_block;

TheCharlatan commented at 12:08 pm on January 2, 2024:

Why is start_block moved here?

ryanofsky commented at 4:22 pm on March 13, 2024:

re: #24230 (review)

Why is start_block moved here?

start_block variable needs to live until the promise.set_value call below, but after that it can be destroyed, so it is declared right above the promise.

in src/node/interfaces.cpp:901 in c8fde60faa outdated

841+                // function runs. But in the reindex-chainstate case, since
842+                // chainstate data is wiped but indexes are not wiped, it is
843+                // most efficient for indexes to begin syncing after the
844+                // ImportBlocks function finishes, so the following sleep waits
845+                // for that to happen.
846+                while (handler->m_start.sleep_for(std::chrono::hours(1)));

TheCharlatan commented at 2:14 pm on January 2, 2024:

In commit c8fde60faad36d454cdd1300184a5169499f9e90: Why is this sleep limited to one hour? I get that ImportBlocks should finish within reasonable time, but what if it doesn’t?

ryanofsky commented at 4:23 pm on March 13, 2024:

re: #24230 (review)

In commit c8fde60: Why is this sleep limited to one hour? I get that ImportBlocks should finish within reasonable time, but what if it doesn’t?

This is just meant to wait forever, and the thread interrupt class, unlike the signal interrupt class doesn’t have a wait() method that doesn’t require a timeout value. I think this just needs to wait however long ImportBlocks takes, and there is not something better to be done here other than waiting for it.

in src/index/base.cpp:398 in 53a96c1ce7 outdated

398+void BaseIndex::SetBestBlock(const std::optional<interfaces::BlockKey>& block)
399 {
400     assert(!m_chainstate->m_blockman.IsPruneMode() || AllowPrune());
401 
402-    if (AllowPrune() && block) {
403+    if (block && AllowPrune()) {

TheCharlatan commented at 8:54 am on January 3, 2024:

Nit: In commit 53a96c1ce706e5375aab6765dbcf2754baa6adf1: Change not needed?

ryanofsky commented at 4:21 pm on March 13, 2024:

re: #24230 (review)

Nit: In commit 53a96c1: Change not needed?

Reverted, not sure where this came from.

in src/index/base.cpp:336 in 53a96c1ce7 outdated

331@@ -312,14 +332,14 @@ bool BaseIndex::BlockUntilSyncedToCurrentChain() const
332         return false;
333     }
334 
335-    if (const CBlockIndex* index = m_best_block_index.load()) {
336-        interfaces::BlockKey best_block{index->GetBlockHash(), index->nHeight};
337+    const auto best_block = WITH_LOCK(m_mutex, return m_best_block);
338+    if (best_block) {

TheCharlatan commented at 9:27 am on January 3, 2024:

Nit: In commit 53a96c1ce706e5375aab6765dbcf2754baa6adf1: Could just assign in the if block as done below.

ryanofsky commented at 4:20 pm on March 13, 2024:

re: #24230 (review)

Nit: In commit 53a96c1: Could just assign in the if block as done below.

Thanks, implemented this.

TheCharlatan commented at 10:45 am on January 3, 2024: contributor

Approach ACK

This change looks good, albeit its daunting length. The only thing I am still not sure about is the commit “indexes: Initialize indexes without holding cs_main” c8fde60faad36d454cdd1300184a5169499f9e90 and if there is some observable improvement. Maybe this commit could be left for another PR? Or does it achieve something that is strictly needed for this PR?

The amount of churn in the commits is a bit unfortunate, but I don’t have suggestions on how to improve this. Some of the nits I left are also resolved by later commits, so feel free to ignore them.

josibake commented at 12:00 pm on January 28, 2024: member

Concept ACK

Found this from https://github.com/bitcoin/bitcoin/blob/5fbcc8f0560cce36abafb8467339276b7c0d62b6/src/index/base.cpp#L87 while working with the indexing code.

ryanofsky force-pushed on Mar 13, 2024

ryanofsky commented at 9:23 pm on March 13, 2024: contributor

Rebased e09e18601db94107992dfb086c77081ffc5e01ec -> 6d3a10d9fbad43114317008445435b73896157d8 (pr/indexy.48 -> pr/indexy.49, compare) due to various conflicts

DrahtBot removed the label Needs rebase on Mar 13, 2024

DrahtBot added the label CI failed on Mar 14, 2024

DrahtBot commented at 2:19 am on March 14, 2024: contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/22632176621

ryanofsky referenced this in commit e295db870c on Mar 14, 2024

ryanofsky referenced this in commit c5583a35f3 on Mar 14, 2024

ryanofsky referenced this in commit 560063156a on Mar 14, 2024

ryanofsky force-pushed on Mar 15, 2024

ryanofsky commented at 6:26 pm on March 15, 2024: contributor

Updated 6d3a10d9fbad43114317008445435b73896157d8 -> 10e986b100e87768f8137765019afae57319f0bb (pr/indexy.49 -> pr/indexy.50, compare) with updates to fix silent conflicts with assumeutxo which were causing feature_assumeutxo.py test failures (https://github.com/bitcoin/bitcoin/runs/22632176621)

The PR conflicted in a few ways with assumeutxo:

Assumeutxo sends new BlockConnected/ChainstateFlushed notifications from the assumed-valid chain which need to be ignored by indexing code, and weren’t in being ignored in the previous version of this PR.
Assumeutxo restarts indexes after a snapshot is loaded, which broke commit “indexes, refactor: Remove index RegisterValidationInterface call” because it was not expecting BaseIndex::Init to be called after an index had been started, and was not resetting some new variables there.
Assumeutxo sends a ChainStateFlushed notification for the snapshot block after validating it but before it sends a BlockConnected notification for that block. This caused feature_assumeutxo.py to fail starting from commit “indexes: Avoid race, make -reindex-chainstate more efficient.”

This assumeutxo change seems pretty unfortunate and required a workaround to ignore the flush notification. The workaround is less than ideal however, and it would be better to send the notifications in right order: connected first, flushed second. There is more information about this issue in #29652.

ryanofsky force-pushed on Mar 18, 2024

ryanofsky commented at 5:06 pm on March 18, 2024: contributor

Updated 10e986b100e87768f8137765019afae57319f0bb -> 38c93e6a8e573dbdf3fc4d14be7483961e18a576 (pr/indexy.50 -> pr/indexy.51, compare) reorganizing sync code to be in src/node instead of src/kernel and making many changes to minimize diffs

DrahtBot removed the label CI failed on Mar 18, 2024

DrahtBot added the label Needs rebase on Mar 20, 2024

in src/index/blockfilterindex.cpp:333 in a608838b8c outdated

305 
306     // During a reorg, we need to copy all filters for blocks that are getting disconnected from the
307     // height index to the hash index so we can still find them when the height index entries are
308     // overwritten.
309-    if (!CopyHeightIndexToHashIndex(*db_it, batch, m_name, new_tip.height, current_tip.height)) {
310+    if (!CopyHeightIndexToHashIndex(*db_it, batch, m_name, block.height - 1, block.height)) {

furszy commented at 9:13 pm on April 1, 2024:

In a608838b8c79fc6:

q: why block.height - 1? It seems that the block filter index CopyHeightIndexToHashIndex will copy an extra entry to the hash index.

achow101 marked this as a draft on Apr 9, 2024

in src/node/chain.cpp:75 in ffa9e452ac outdated

113-                    __func__, pindex->GetBlockHash().ToString());
114-        } else {
115-            block_info.data = &block;
116+        bool synced = block == chain.Tip();
117+        if (synced || interrupt) {
118+            if (synced && on_sync) on_sync();

ryanofsky commented at 2:49 pm on April 16, 2024:

In commit “indexes: Rewrite chain sync logic, simplify index sync code” (ffa9e452ac7cfe08deecaae409eca3369dcccb76)

This commit has the same race condition reported in #29767 (which was introduced in #14121 and fixed in #29776), so when this PR is rebased it needs to incorporate a fix similar to #29776.

The problem is that index is considered synced before the last block is commited so new blocks could attached from the validationinterface thread while this thread is still committing. Probably a good fix would be to unlock cs_main and commit, then relock cs_main and call on_sync if there are no new blocks.

Sjors commented at 3:52 pm on July 23, 2024: member

There’s discussion in #29770 about whether to use some or all the things in this PR. Can you give it a rebase?

ryanofsky force-pushed on Jul 23, 2024

ryanofsky commented at 8:48 pm on July 23, 2024: contributor

Rebased 38c93e6a8e573dbdf3fc4d14be7483961e18a576 -> cf6d99bed842361fb06986780097fb80c9e879e4 (pr/indexy.51 -> pr/indexy.52, compare) due to conflicts with #29671, #29776, #29867, #30058

There’s discussion in #29770 about whether to use some or all the things in this PR. Can you give it a rebase?

Sure, thanks for the reminder

DrahtBot removed the label Needs rebase on Jul 23, 2024

DrahtBot added the label CI failed on Jul 24, 2024

DrahtBot commented at 2:10 am on July 24, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/27826877631

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label Needs rebase on Aug 5, 2024

indexes, doc: Document current m_best_block_index behavior

Try to document race conditions and corner cases and idiosyncracies in current
indexing code to make upcoming changes easier to understand.

d5e6975aa3

indexes: Avoid race, make -reindex-chainstate more efficient

Avoid a race condition in index startup where indexes set m_synced = true too
early, while there are still notifications from previously connected blocks in
the validationinterface queue, causing those blocks to trigger the Rewind()
call in the BaseIndex::BlockConnected notification handler, and pointlessly
remove and then re-append the same blocks.

This race condition was easy to reproduce just by running the
test/functional/feature_coinstatsindex.py "Test that the index works with
-reindex-chainstate" test.

This change also allows simplifying code and getting rid of the "WARNING: Block %s
does not connect to an ancestor of known best chain (tip=%s); not updating
index" warning.

ea170e8c16

indexes, refactor: Remove index RegisterValidationInterface call

Add new interfaces::Chain::attachChain method and start moving index sync logic
out of src/index/ to src/node/. In this commit, move code that registers for
index validationinterface notifications. In later commits, move the actual sync
thread and add real blockConnected, blockDisconnected, and chainStateFlushed
implementations.

This commit does not change behavior in any way.

8317453306

indexes, refactor: Remove index validationinterface hooks

Split up and rename validationinterface callback functions
BaseIndex::BlockConnected to BaseIndex::IgnoreBlockConnected and
BaseIndex::ChainStateFlushed to BaseIndex IgnoreChainStateFlushed.

Most of the code in these functions was just ignoring spurious notifications,
so leave that code in place, but move the rest of the code to
BaseIndexNotifications::blockConnected and
BaseIndexNotifications::chainStateFlushed. In an upcoming commit the spurious
notifications will be dropped and the new BaseIndex::Ignore... methods will be
deleted entirely.

This commit does not change behavior in any way.

338d73e6ca

indexes, refactor: Stop incorrectly calling Interupt() and Stop() in BaseIndex destructor

This code never worked and was never used. Now that an m_handler member is
added, the incorrect usage is easier to detect and an assert can be added.

This commit does not change behavior in any way.

8f7e3be5a0

indexes, refactor: Add Commit CBlockLocator& argument

This commit does not change behavior in any way.

3214f6213e

indexes, refactor: Remove remaining CBlockIndex* uses in index Rewind methods

Move backwards ReadBlockFromDisk code from CoinStatsIndex::CustomRewind
to BaseIndex::Rewind

Move undo UndoReadFromDisk code from CoinStatsIndex::ReverseBlock to
BaseIndex::Rewind

This commit does change behavior slightly. Since the new CustomRemove
methods only take a single block at a time instead of a range of
disconnected blocks, when they call CopyHeightIndexToHashIndex they will
now do an index seek for each removed block instead of only seeking once
to the height of the earliest removed block. Seeking instead of scanning
is a little worse for performance if there is a >1 block reorg, but
probably not noticable unless the reorg is very deep.

707ff84981

indexes, refactor: Stop requiring CBlockIndex type to call IsBIP30Unspendable

This commit does not change behavior in any way.

9f3213e08a

indexes, refactor: Remove remaining CBlockIndex* uses in index CustomAppend methods

Move UndoReadFromBlock calls from BlockFilterIndex::CustomAppend and
CoinStatsIndex::CustomAppend to BaseIndexNotifications::blockConnected

Move CustomAppend call from BaseIndex::ThreadSync to blockConnected

This commit does not change behavior in any way.

b3e2e1970e

indexes, refactor: Move more new block logic out of ThreadSync to blockConnected

This commit does not change behavior in any way.

8c3957de62

indexes, refactor: Move Commit logic out of ThreadSync to notification handlers

This commit does not change behavior in any way.

d629698bf6

indexes, refactor: Move Rewind logic out of Rewind to blockDisconnected and ThreadSync

This main thing this commit code does is change rewind code to stop calling
`CustomRemove` in a loop, and instead call `blockDisconnected` in a loop, with
`blockDisconnected` calling `customRemove`. This fills out the
`blockDisconnected` method, so the rewind loop can later be moved out of
indexing code entirely, and left to the node code.

This commit does not change behavior in any way.

7c9d5d636b

indexes, refactor: Move CustomInit and error handling code out of ThreadSync to notification handlers

This commit does not change behavior in any way.

b7f5428a8e

indexes: Add blockfilterindex mutex

Add BlockFilterIndex::m_filter_mutex to avoid TSAN errors in next commit moving
index sync thread from indexes to node. This mutex was previously neccessary,
but TSAN did not detect errors until the sync thread was moved.

e0162fcdc0

indexes, refactor: Move sync thread from index to node

No changes in behavior, this change is mostly move-only.

45f06134e1

indexes, refactor: Remove SyncWithValidationInterfaceQueue call

This commit does not change behavior in any way.

0b0d5560dd

indexes: Rewrite chain sync logic, simplify index sync code

Add SyncChain method and improve Chain::attachChain behavior so it no longer
sends notifications from the sync thread at the same time as notifications from
validationinterface queue. This allows BaseIndex::m_ready variable and
surrounding logic to be dropped and indexing code to be simplified.

It is also a potential performance improvement since it avoids holding cs_main
while processing all indexing notifications. Previously cs_main was held while
sending some notifications in the attachChain sync thread.

7edb9cf983

indexes: Initialize indexes without holding cs_main

In attachChain, avoid locking cs_main while calling prepare_sync method to run
index CustomInit code.

This is implemented by unconditionally spawning index sync threads, instead of
only spawning them conditionally when the index best block is not the chain
tip. An alternative implementation could avoid the cost of spawning the sync
thread by calling prepare_sync from the validationinterface notification
thread. But this would be slightly more complex, and also potentially slower if
any current or future index takes longer to initialize, since it would
initialize serially instead of in parallel with other index and node init code.

d682a61e3f

indexes, refactor: Remove UndoReadFromDisk calls from indexing code

This commit does not change behavior in any way.

b7595eea3f

indexes, refactor: Remove remaining CBlockIndex* pointers from indexing code

This commit does not change behavior in any way.

1a7ff2b175

Remove direct index -> node dependency

Replace last m_blockman.UpdatePruneLock call with
interfaces::Chain::updatePruneLock call.

fcfed64bc1

ryanofsky force-pushed on Aug 6, 2024

DrahtBot removed the label Needs rebase on Aug 6, 2024

josibake commented at 11:01 am on August 7, 2024: member

Coming back to this and added it to my list of things to review. Is there an upstream PR this one depends on that I should be reviewing first, or is there another reason the PR is in draft?

ryanofsky commented at 2:30 pm on August 7, 2024: contributor

Coming back to this and added it to my list of things to review. Is there an upstream PR this one depends on that I should be reviewing first, or is there another reason the PR is in draft?

Thanks for being interested in this! The main reason it is in draft i just that it is too big and and I need to work on a way to split it up. For example the first commits and documentation and removing racy -reindex-chainstate behavior should be a separate PR. And I think I want to make a separate PR squashing down all the refactoring commits here into a single commit that is can be easier to review for correctness, even if it harder to see equivalence.

Another more urgent reason this is in draft is that tests broke after a recent very conflicted rebase, and I need to debug and go through the major fixes in that rebase: #29671 #29767 #29867, and make sure they are intact and working.

Not sure when I will have a chance to do these things, but I can definitely prioritize if there is interest in seeing this go forward. It would also be useful to know if particular parts of this PR are interesting or more appealing so I can split those off first.

josibake commented at 4:31 pm on August 7, 2024: member

Not sure when I will have a chance to do these things, but I can definitely prioritize if there is interest in seeing this go forward

Definitely interest from my side, but not super urgent. For context, I’m going to be focusing on reviewing kernel and multiprocess PRs going forward, so I’m generally interested in anything related to those topics. I had noticed some discussion on other indexing related PRs regarding the approach in this PR vs other approaches, which is what led me here.

It would also be useful to know if particular parts of this PR are interesting or more appealing so I can split those off first

What you described regarding the race condition and refactor splits makes sense to me. If you find time to fix the CI failures, I’ll take a closer look then and likely have some better feedback about which parts could be potentially split out into smaller chunks.

hebasto added the label Needs CMake port on Aug 16, 2024

maflcko removed the label Needs CMake port on Aug 29, 2024

in src/index/txindex.cpp:82 in fcfed64bc1

78@@ -79,7 +79,7 @@ bool TxIndex::FindTx(const uint256& tx_hash, uint256& block_hash, CTransactionRe
79         return false;
80     }
81 
82-    AutoFile file{m_chainstate->m_blockman.OpenBlockFile(postx, true)};
83+    AutoFile file{m_blockman.OpenBlockFile(postx, true)};

TheCharlatan commented at 10:59 am on August 31, 2024:

I feel like the entire FindTx function could just be moved into the block manager. Then there is no need to pass it around here and it can instead be called through the interface just like elsewhere.

DrahtBot added the label Needs rebase on Sep 2, 2024

DrahtBot commented at 10:46 pm on September 2, 2024: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

PastaPastaPasta referenced this in commit 52c65059ff on Oct 24, 2024

PastaPastaPasta referenced this in commit 40851bee61 on Oct 24, 2024

PastaPastaPasta referenced this in commit 52f036b316 on Oct 24, 2024

PastaPastaPasta referenced this in commit 8fb05d512f on Oct 24, 2024

DrahtBot commented at 1:11 am on November 30, 2024: contributor

⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.

DrahtBot commented at 0:11 am on February 27, 2025: contributor

⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.

DrahtBot commented at 1:50 am on May 27, 2025: contributor

⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.

in src/index/blockfilterindex.cpp:339 in b3e2e1970e outdated

335@@ -335,7 +336,7 @@ bool BlockFilterIndex::CustomRemove(const interfaces::BlockInfo& block)
336     if (!m_db->WriteBatch(batch)) return false;
337 
338     // Update cached header
339-    m_last_header = *Assert(ReadFilterHeader(new_tip.height, new_tip.hash));
340+    m_last_header = *Assert(ReadFilterHeader(block.height, block.hash));

furszy commented at 7:44 pm on June 6, 2025:

In b3e2e1970e46bcfbe19d4d421bb6866ceab40ca0:

This doesn’t seem to be correct. The last cached header is being set to the block we just disconnected. Also, it is in the wrong commit.

No need to fix it here, I’m cherry-picking this commit into another branch to push it as a separate PR anyway. But should be something like:

0m_last_header = *Assert(ReadFilterHeader(block.height - 1, *Assert(block.prev_hash)));

indexes: Stop using node internal types and locking cs_main, improve sync logic #24230

Code Coverage & Benchmarks

Reviews

Conflicts

Partial review

Raw notes