index: ignore BlockConnected if pindex is in chain #25462

Crypt-iQ commented at 12:50 AM on June 24, 2022: contributor

This change avoids unnecessary calls to Rewind, WriteBlock, and SetBestBlockIndex if the pindex is an ancestor of m_best_block_index. As the commit message describes, this can happen if the sync thread is almost done catching up, the tip is updated, the sync thread catches up to the new tip and sets m_synced to true, then the BlockConnected event for the new tip comes through. This leads to WriteBlock being called twice for the tip (though if more than 1 block is connected, WriteBlock would be called twice for each of them). For the blockfilterindex, this means that the filter file will contain a duplicate entry. The other indices seem to handle this fine. Because the blockfilterindex is not corrupted and the occurrence might be unlikely, I wasn't sure if this was a valuable patch or not (though IMO current behavior is not "correct"), so looking for Concept ACK or NACK. I was able to test that the filter file stores a dupe entry by adding some small sleeps.

index: ignore BlockConnected if pindex is in chain

This can happen if:
- the sync thread is almost done catching up
- the tip is then updated
- the sync thread syncs to tip
- the BlockConnected event is received after the sync thread is done

This leads to an unnecessary Rewind and WriteBlock. It also creates
a duplicate entry in the block filter file.

270579f85f

DrahtBot added the label UTXO Db and Indexes on Jun 24, 2022

DrahtBot commented at 2:51 AM on June 24, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot cross-referenced this on Jun 24, 2022 from issue indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky

fanquake commented at 9:07 AM on June 24, 2022: member

cc @mzumsande @ryanofsky

Crypt-iQ commented at 12:04 PM on June 24, 2022: contributor

I didn't see the conflict PR earlier #24230, but I think it solves this issue by only registering on the validation interface once the initial sync is done

in src/index/base.cpp:268 in 270579f85f

 263 | @@ -262,13 +264,27 @@ void BaseIndex::BlockConnected(const std::shared_ptr<const CBlock>& block, const
 264 |                        best_block_index->GetBlockHash().ToString());
 265 |              return;
 266 |          }
 267 | -        if (best_block_index != pindex->pprev && !Rewind(best_block_index, pindex->pprev)) {
 268 | +
 269 | +        // If best_block_index is an ancestor of pindex, then this is a block that we've already

ryanofsky commented at 1:21 PM on June 24, 2022:

In commit "index: ignore BlockConnected if pindex is in chain" (270579f85fbe8cf97a9f506a258ad3ed865fcbeb)

Comment is a little off. Should say "If pindex in an ancestor of best_block_index" not "If best_block_index is an ancestor of pindex"

in src/index/base.cpp:274 in 270579f85f

 270 | +        // processed. This can occur if the sync thread is almost done catching up, the tip
 271 | +        // updates to the next block, and the sync thread finishes catching up to tip before
 272 | +        // the BlockConnected event is received. If we've already seen it, there's no reason to
 273 | +        // call Rewind, WriteBlock, and SetBestBlockIndex.
 274 | +        if (best_block_index->GetAncestor(pindex->nHeight) == pindex) {
 275 | +            skip_write = true;

ryanofsky commented at 1:24 PM on June 24, 2022:

In commit "index: ignore BlockConnected if pindex is in chain" (270579f85fbe8cf97a9f506a258ad3ed865fcbeb)

Seems like code would be more straightforward without skip_write variable. Maybe just:

if (best_block_index->GetAncestor(pindex->nHeight) == pindex) {
    LogPrintf("%s: WARNING: Block %s does is already-processed ancestor of " /* Continued */
              "known best chain (tip=%s); not updating index\n",
              __func__, pindex->GetBlockHash().ToString(),
              best_block_index->GetBlockHash().ToString());
    return;
}

ryanofsky commented at 1:58 PM on June 24, 2022: contributor

I think this is probably better behavior but I'm a little unsure about this. Technically even though there is a race condition at the end of sync in current code and it can connect the same blocks twice in a row, it will Rewind() before it connects them the second time, so there is not an actual bug here, just inefficient data storage, because the index does not check to see if old block data has been written before writing new block data.

This fix prevents duplicate block data from being written in that case, which should be a good thing. But it does not prevent duplicate data from being written in the general case (which maybe blockfilterindex could do internally if it wanted to) if there are multiple reorgs.

Possible reasons not to make this change:

Maybe behavior is worse with invalidateblock / reconsiderblock block usage? I didn't check details of this but it seems like if best block is height 100, blocks 91-100 are invalidated, then blocks 91-95 were reconsidered, previously the index would rewind and sync to latest valid block 95, but now with this PR it would stay synced at invalid block 100. Even if this happens it seems fine, but I would want to think a little more about whether there are other side effects from this change and be explicit about what they are in PR description or release notes.
As mentioned #24230 avoids this problem by just avoiding the race condition between sync thread and validationinterface notifications which causes it. So I do think #24230 is a better long term fix. But still this PR could be a short term improvement

Crypt-iQ commented at 12:28 PM on June 26, 2022: contributor

Even if this happens it seems fine, but I would want to think a little more about whether there are other side effects from this change and be explicit about what they are in PR description or release notes.

I checked each of the indexes behavior in this situation and it seems fine, but is not something I considered when opening the PR. I also didn't consider the multiple re-org case which this PR doesn't fix.

So I do think #24230 is a better long term fix. But still this PR could be a short term improvement

After taking a look at #24230 and seeing that it fixes the racy issues with indexes and the validation interface queue, I don't think it makes much sense to have a short term change when there is a PR that fixes the root of the problem. So I think I'll close this unless you have a strong opinion about keeping this one?

mzumsande commented at 8:51 PM on June 26, 2022: contributor

I checked each of the indexes behavior in this situation and it seems fine, but is not something I considered when opening the PR. I also didn't consider the multiple re-org case which this PR doesn't fix.

Yes, I think the index would just stay at the previous tip in case of invalidateblock, and I think that the RPC gettxoutsetinfo would keep on working correctly. However, this probably wouldn't go so well with the current save/load logic assumptions: With the coinstatsindex, the running muhash would stay at the old (now invalidated) tip. If you now stop the node, a locator corresponding to that old tip is saved. On next startup, we'd read that locator together with the muhash, call FindForkInGlobalIndex() and set the index' best block to the last valid block (before the invalidated one). So the muhash and the best block would be out of sync, and we'd probably throw an InitError here.

Crypt-iQ commented at 12:47 PM on June 27, 2022: contributor

With the coinstatsindex, the running muhash would stay at the old (now invalidated) tip. If you now stop the node, a locator corresponding to that old tip is saved.

Oh I didn't think about that, that would be bad.

Crypt-iQ closed this on Jun 27, 2022

ryanofsky commented at 2:12 PM on June 27, 2022: contributor

Oh I didn't think about that, that would be bad.

This problem would be fixed by #25193, I think, but it is another good complication to know about with this PR. Since I guess #24230 will be the alternative to this closed PR, I try to split that up and get more review for it

bitcoin locked this on Jun 27, 2023