index: make indices robust against init aborts

mzumsande commented at 10:16 pm on January 20, 2022: contributor

When an index thread receives an interrupt during init before it got to index anything (so m_best_block_index == nullptr still), it will still try to commit previous “work” before stopping the thread. That means that BaseIndex::CommitInternal() calls GetLocator(nullptr), which returns an locator to the tip (code), and saves it to the index DB. On the next startup, this locator will be read and it will be assumed that we have successfully synced the index to the tip, when in reality we have indexed nothing. In the case of coinstatsindex, this would lead to a shutdown of bitcoind without any indication what went wrong. For the other indexes, there would be no immediate shutdown, but the index would be corrupt.

This PR fixes this by not committing when m_best_block_index==nullptr, and it also adds an error log message to the silent coinstatsindex shutdown path.

This is another small bug found by feature_init.py - the second commit enables blockfilterindex and coinstatsindex for this test, enabling coinstatsindex without the first commit would have led to frequent failures.

mzumsande commented at 10:17 pm on January 20, 2022: contributor

fyi @fjahr @jamesob

jamesob commented at 10:22 pm on January 20, 2022: member

ACK https://github.com/bitcoin/bitcoin/pull/24117/commits/e87ee7569b08c6ad2e859ae568d85ae2b213b33c pending CI

Nice find! Code looks good.

DrahtBot added the label UTXO Db and Indexes on Jan 20, 2022

DrahtBot commented at 7:56 am on January 21, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)
#24133 (index: Improve robustness of coinstatsindex at restart by fjahr)
#15606 (assumeutxo by jamesob)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Successahead approved

fjahr commented at 10:50 pm on January 21, 2022: contributor

Concept ACK

enabling coinstatsindex without the first commit would have led to frequent failures.

Can you expand on this? I did not get any failures when I removed the changes from the first commit.

shaavan approved

shaavan commented at 2:38 pm on January 22, 2022: contributor

Code Review ACK e87ee7569b08c6ad2e859ae568d85ae2b213b33c

It makes sense not to commit when nothing is indexed, and I agree with it. The updates done to the test file are logical and make sense. The feature_init.py test ran successfully on the PR branch.

However, I was not able to verify this claim:

enabling coinstatsindex without the first commit would have led to frequent failures.

I ran the updated test multiple times on master, and it was successful each time.

mzumsande commented at 10:27 pm on January 22, 2022: contributor

Can you expand on this? I did not get any failures when I removed the changes from the first commit.

I ran the updated test multiple times on master, and it was successful each time.

That’s interesting! For me, it fails in the first part, between the runs for the expected lines init message: Verifying blocks (where the index gets corruptedafter the interrupt) and init message: Starting network threads (where the nodes fails to start because coinstatsindex terminates bitcoind). I just ran the updated test on current master 50 times and got 26 failures, so it fails ~50% of the time for me.

But I think it is possible that the timing can be different on different systems - the interrupt must come at a specific point in time during init for the failure to occur, so having a faster or slower system might change things a lot.

fjahr commented at 6:04 pm on January 23, 2022: contributor

Ok, I still think the changes are valuable even though I can not reproduce the failure.

Code review ACK e87ee75

DrahtBot added the label Needs rebase on Jan 31, 2022

index: Don't commit without valid m_best_block_index

Also report an error when coinstatsindex init fails.

0243907fae

test: activate all index types in feature_init.py bfcd60f5d5

mzumsande force-pushed on Jan 31, 2022

mzumsande commented at 8:26 pm on January 31, 2022: contributor

Rebased

DrahtBot removed the label Needs rebase on Jan 31, 2022

fjahr commented at 6:57 pm on February 12, 2022: contributor

reACK bfcd60f5d505334230013de4115483b22a7898ee

(CI failure looks unrelated)

shaavan approved

shaavan commented at 10:28 am on February 14, 2022: contributor

reACK bfcd60f5d505334230013de4115483b22a7898ee

Changes since my last review:

The PR is rebased which leading to removal of line: node.start(extra_args=['-txindex=1', '-blockfilterindex=1', '-coinstatsindex=1']) where it was no more needed.

I agree with @fjahr. Though I could not reproduce the intended failure, I think that these changes are valuable in their own right and can be merged.

maflcko merged this on Feb 15, 2022

maflcko closed this on Feb 15, 2022

sidhujag referenced this in commit 940a8e9c06 on Feb 16, 2022

Sjors commented at 6:10 pm on June 14, 2022: member

I still managed to corrupt this index during a spontaneous machine reboot, on the latest master. Unfortunately I forgot to keep a copy for more debugging.

Session 1 and 2:

 02022-06-14T18:06:18Z Opening LevelDB in /Users/sjors/Library/Application Support/Bitcoin/indexes/coinstats/db
 12022-06-14T18:06:18Z Opened LevelDB successfully
 22022-06-14T18:06:18Z Using obfuscation key for /Users/sjors/Library/Application Support/Bitcoin/indexes/coinstats/db: 0000000000000000
 32022-06-14T18:06:19Z ERROR: Init: Cannot read current coinstatsindex state; index may be corrupted
 4```
 5
 6Session 3:
 7
 8```
 92022-06-14T18:07:41Z coinstatsindex thread start
102022-06-14T18:07:41Z Syncing coinstatsindex with block chain from height 0
112022-06-14T18:07:41Z ERROR: Commit: Failed to commit latest coinstatsindex state
12```

mzumsande commented at 8:00 pm on June 14, 2022: contributor

That’s interesting - was the initial reboot that corrupted the index during initial sync of the index, or after that with validationinterface signals having taken over? And did you change anything between Session 1/2 and 3?

Sjors commented at 8:00 am on June 15, 2022: member

Unfortunately I didn’t track this very well, because I only found out some days ofter the reboot. My uptime says 20 days, so I guess it wasn’t even a full reboot, just enough for macOS to kill every application that was running. In other words, I don’t know when it happened relative to the log entries.

ryanofsky referenced this in commit 8bc1602821 on Jul 15, 2022

ryanofsky referenced this in commit 7878f97bf1 on Jul 18, 2022

ryanofsky referenced this in commit 65ba19fcdd on Jul 18, 2022

stickies-v referenced this in commit 0e6f5ccfd3 on Aug 2, 2022

Rspigler referenced this in commit bf7c5f7602 on Aug 21, 2022

aguycalled referenced this in commit 30f4f832c0 on Aug 25, 2022

mxaddict referenced this in commit d13efa1944 on Aug 29, 2022

Fabcien referenced this in commit 6175151d3a on Nov 25, 2022

janus referenced this in commit c6bb5b42bc on Jan 20, 2023

bitcoin locked this on Jun 15, 2023

mzumsande deleted the branch on Oct 13, 2023

index: make indices robust against init aborts #24117

Conflicts