Add reindex=auto flag to automatically reindex corrupt data

AaronDewes commented at 7:37 am on May 26, 2021: none

This PR allows the reindex flag to be set to auto, which automatically starts a reindex if the chain state or block index are corrupt. This can be especially useful for Raspberry Pi based full nodes, which often experience power outages or similar issues which can corrupt data. It allows full node operators to make Bitcoin Core reindex automatically, without having to worry about removing the reindex flag again. (Which isn’t much effort, but can be annoying to forget)

laanwj added the label UTXO Db and Indexes on May 26, 2021

laanwj added the label Data corruption on May 26, 2021

laanwj commented at 9:08 am on May 26, 2021: member

Thanks for your first contribution!

I’m a bit divided on whether to add yet another configuration option for this. Either something like this makes sense as default behavior, or you might just as well follow the manual steps.

AaronDewes commented at 9:57 am on May 26, 2021: none

Either something like this makes sense as default behavior, or you might just as well follow the manual steps.

In my opinion, it would make sense to have this as the default behavior, because the user can’t do anything except reindexing anyway (Maybe still ask the user on the GUI though).

in src/init.cpp:1523 in ed68820fb4 outdated

1520@@ -1520,11 +1521,18 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
1521 
1522         if (!fLoaded && !ShutdownRequested()) {
1523             // first suggest a reindex

amadeuszpawlik commented at 4:15 pm on May 26, 2021:

Couldn’t this comment be (re)moved?

AaronDewes commented at 4:36 pm on May 26, 2021:

Yes, I’ll move it.

sipa commented at 4:39 pm on May 26, 2021: member

I’m not sure about this. If the chainstate gets corrupted the user should be made aware, so they can fix their hardware or setup.

If you reindex automatically without changing anything in the environment, there is no reason why the situation would be any different, and it’ll just end up reindexing again?

AaronDewes commented at 4:56 pm on May 26, 2021: none

I’m not sure about this. If the chainstate gets corrupted the user should be made aware, so they can fix their hardware or setup.

Yes, so maybe leave it as an option, of course, users should fix their setup in such a case, but a sudden power outage can happen, and this would allow to try to automatically fix this.

If you reindex automatically without changing anything in the environment, there is no reason why the situation would be any different, and it’ll just end up reindexing again?

If I understand the init logic correctly, the code doesn’t restart from scratch if a reindex is triggered at that point, and because Bitcoin Core doesn’t automatically rebuild a corrupt block database, only with this argument, I don’t think such a loop is possible. Also, I’m pretty sure the part of the code suggesting a reindex is only reached when a reindex is not already running, and a reindex would most likely work.

sipa commented at 4:59 pm on May 26, 2021: member

If a power outage causes corruptions, you probably have bigger problems with your setup.

My point is: if you automatically reindex, the user isn’t made aware of the corruption, can’t do anything to resolve whatever is causing it, and the next time it gets corrupted you’ll just start all over again.

AaronDewes commented at 5:07 pm on May 26, 2021: none

If a power outage causes corruptions, you probably have bigger problems with your setup.

Solutions like RapiBlitz, Umbrel or MyNode make setting up a node really easy. These are cheap and easy to build nodes, but Raspberry Pis don’t deal with power outages well, and that can often lead to data corruption. A power outage causing problems is in my opinion okay for personal nodes, many users just want to get their node set up without having to worry about (and pay for) an UPS or something similar.

My point is: if you automatically reindex, the user isn’t made aware of the corruption, can’t do anything to resolve whatever is causing it, and the next time it gets corrupted you’ll just start all over again.

Should I add a warning to the logs about it then? A sudden power outage is the most likely cause, and if something can be fixed, why not fix it automatically? A delay will be noticed by the user (and a warning), like when Bitcoin is “Rolling forward”, which is also triggered by an unclean shutdown.

sipa commented at 5:15 pm on May 26, 2021: member

@AaronDewes Is it so easy to corrupt such hardware? That sounds terrible…

I’ll let others chime in about this, I’m -0 on it. I think it’s always better to make sure the user is aware that corruptions happens so they can investigate - and just putting it in the log will mostly be overlooked. That said I see the convenience for setups where the user isn’t directly using the node themselves.

jamesob commented at 5:21 pm on May 26, 2021: member

I think provided

(i) this is not the default behavior, (ii) the code change is fairly limited (as it appears to be), (iii) automatic reindexes are clearly logged, and (iv) it actually does resolve the issue of a non-UPS’d setup recovering from a power-loss based corruption,

then this is a pretty compelling feature to add.

I am not sure how frequent corruption that is solely based upon power loss (vs. bad hardware) is, though.

AaronDewes commented at 5:27 pm on May 26, 2021: none

I am not sure how frequent corruption that is solely based upon power loss (vs. bad hardware) is, though.

I’ve been helping a lot of Umbrel users with such issues, and if it was a bad drive, either

(a) The drive wouldn’t mount (b) There would be different errors (that I don’t remember right now)

Every time this message got printed, a reindex actually helped.

and just putting it in the log will mostly be overlooked.

For this, it maybe could help to provide information that the node is reindexing over the RPC api, becauce I can’t find such an endpoint. This way, custom UIs build on top of that API could also prompt the user with a warning, so it is more likely to be seen. I’m not sure if that should be part of this or another PR though.

(iii) automatic reindexes are clearly logged

I’ll do that.

(iv) it actually does resolve the issue of a non-UPS’d setup recovering from a power-loss based corruption,

A manual reindex did it, so it should be the same for this.

ryanofsky commented at 6:25 pm on May 26, 2021: contributor

This seems like a good feature. Just to bikeshed, maybe this could be a new reindex value, -reindex=auto complementing current -reindex=0 and -reindex=1 so it is less vague than -attemptrecovery and maybe more discoverable. attemptrecovery sounds more like a one time emergency flag to me than something I would configure permanently on my raspberry pi.

maflcko commented at 4:45 am on May 27, 2021: member

having to worry about removing the reindex flag again. (Which isn’t much effort, but can be annoying to forget)

An alternative solution to that would be to only store fReindex on -reindex and then stop immediately with instructions to restart. Also it will refuse to start with -reindex enabled again.

AaronDewes force-pushed on May 27, 2021

AaronDewes commented at 6:33 am on May 27, 2021: none

I implemented @ryanofsky’s idea now, with reindex defaulting to 0, but I can change the default to auto if that makes more sense for you.

An alternative solution to that would be to only store fReindex on -reindex and then stop immediately with instructions to restart.

This is true, but the user would still have to manually add this flag in case of a power outage.

This can be especially useful for Raspberry Pi based full nodes, which often experience power outages or similar issues which can corrupt data.

And what do you think about what I wrote above?

and just putting it in the log will mostly be overlooked.

For this, it maybe could help to provide information that the node is reindexing over the RPC api

AaronDewes renamed this:
~~Add flag to automatically reindex corrupt data~~
Add reindex=auto flag to automatically reindex corrupt data
on May 27, 2021

Add reindex=auto flag to automatically reindex corrupt data

This PR allows the reindex flag to be set to auto, which automatically starts a reindex if the chain state or block index are corrupt.
This can be especially useful for  Raspberry Pi based full nodes, which often experience power outages or similar issues which can corrupt data.
It allows full node operators to make Bitcoin Core reindex automatically, without having to worry about removing the reindex flag again. (Which isn't much effort, but can be annoying to forget)

602f4da917

AaronDewes force-pushed on May 31, 2021

luke-jr referenced this in commit 291ac87866 on Jun 27, 2021

DrahtBot commented at 8:40 pm on April 7, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#24789 (init, index: disallow indexes when running reindex-chainstate by mzumsande)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Needs rebase on Apr 26, 2022

DrahtBot commented at 12:14 pm on April 26, 2022: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

achow101 commented at 6:30 pm on October 12, 2022: member

Closing this as it has not had any activity in a while. If you are interested in continuing work on this, please leave a comment so that it can be reopened.

achow101 closed this on Oct 12, 2022

luke-jr referenced this in commit 069ccfcbc4 on Jun 27, 2023

bitcoin locked this on Oct 12, 2023

Add reindex=auto flag to automatically reindex corrupt data #22072

Conflicts