Adding alert for failure to prevent dead-end user crash #33127

pull Ataraxia009 wants to merge 1 commits into bitcoin:master from Ataraxia009:launch-crash-failure changing 1 files +5 −0
  1. Ataraxia009 commented at 12:00 pm on August 2, 2025: none

    In some cases after running the client for a bit, the chainstate seems to have blocks with a lower validity level than VALID_SCRIPTS. This causes a crash on launch in the PruneBlockIndexCandidates function, where the setBlockIndexCandidates is empty after filtering by m_chain.

    Could not figure out root cause as to why, so for now just added an alert so the user is not deadended.

  2. DrahtBot commented at 12:00 pm on August 2, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33127.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    No conflicts as of last run.

  3. furszy commented at 12:18 pm on August 2, 2025: member

    Could not figure out root cause as to why, so for now just added an alert so the user is not deadended.

    Could you share steps to reproduce it?

  4. Ataraxia009 commented at 12:27 pm on August 2, 2025: none

    Could not figure out root cause as to why, so for now just added an alert so the user is not deadended.

    Could you share steps to reproduce it?

    Couldn’t really reproduce it myself, just a had a data dir in a bad state

    Which, if this assert is true, should imply we are in a bad data dir state where the chainstate is not included in the setBlockIndexCandidates and all setBlockIndexCandidates are less work.

  5. DrahtBot added the label CI failed on Aug 2, 2025
  6. DrahtBot commented at 1:05 pm on August 2, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/47254107937 LLM reason (✨ experimental): The CI failure is caused by lint errors related to trailing whitespace.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. in src/validation.cpp:3308 in a2590d027e outdated
    3303+    if (setBlockIndexCandidates.empty()) {
    3304+        m_chainman.GetNotifications().fatalError(_("Seems like your data is corrupt. Try running -reindex-chainstate or -reindex. Shutting down..."));
    3305+    }
    3306+    
    3307     // Either the current tip or a successor of it we're working towards is left in setBlockIndexCandidates.
    3308     assert(!setBlockIndexCandidates.empty());
    


    luke-jr commented at 9:46 pm on August 2, 2025:
    Should probably just replace the assert

    Ataraxia009 commented at 3:24 am on August 3, 2025:
    We need to terminate the app after the alert. Keeping it going to lead to unsafe behaviour given that the data is corrupt.
  8. in src/validation.cpp:3304 in a2590d027e outdated
    3298@@ -3299,6 +3299,11 @@ void Chainstate::PruneBlockIndexCandidates() {
    3299     while (it != setBlockIndexCandidates.end() && setBlockIndexCandidates.value_comp()(*it, m_chain.Tip())) {
    3300         setBlockIndexCandidates.erase(it++);
    3301     }
    3302+
    3303+    if (setBlockIndexCandidates.empty()) {
    3304+        m_chainman.GetNotifications().fatalError(_("Seems like your data is corrupt. Try running -reindex-chainstate or -reindex. Shutting down..."));
    


    luke-jr commented at 9:46 pm on August 2, 2025:
    Adding command line options is non-trivial for GUI users. We should use the existing “data corrupt” message/prompt, so they can click a button to reindex.

    Ataraxia009 commented at 3:28 am on August 3, 2025:

    I agree this is a bad experience for non technical users.

    But it seems like this is the standard right now in the code, to give this kind of alert on the code base (ie tell the the user to reindex instead of a direct action button)

    If there is somewhere that actually has a button that reindexes, show me, I’ll use it. If not, adding this functionality is work for another pull request.


    luke-jr commented at 9:15 pm on August 4, 2025:

    There already is, in src/init.cpp (“Do you want to rebuild the databases now?”)

    (But I’m not sure we can detect this condition that early)

  9. luke-jr changes_requested
  10. Adding alert for failure to prevent dead-end user crash 099fb8502c
  11. Ataraxia009 force-pushed on Aug 3, 2025
  12. DrahtBot removed the label CI failed on Aug 3, 2025
  13. luke-jr commented at 9:43 pm on August 4, 2025: member
    Looking at the relevant code, this feels like a symptom of a problem we could/should detect earlier somehow (which may also be early enough to follow the reindex GUI path)
  14. Ataraxia009 commented at 1:42 am on August 5, 2025: none

    Looking at the relevant code, this feels like a symptom of a problem we could/should detect earlier somehow (which may also be early enough to follow the reindex GUI path)

    Yeah agreed overall. However i cant repro and don’t have the data dir. That being said this deadend the user to a crash with no explanation or logs. And the crash is repeated on every launch. This would just a lot of node runners to give up and move on. Atleast this way they have some way out. It’s quite a soft change.

  15. mzumsande commented at 7:59 pm on August 7, 2025: contributor

    Yeah agreed overall. However i cant repro and don’t have the data dir. That being said this deadend the user to a crash with no explanation or logs. And the crash is repeated on every launch. This would just a lot of node runners to give up and move on. Atleast this way they have some way out. It’s quite a soft change.

    The problem is that this is added to such a central spot that you could imagine other ways leading to this assert failing, in particular ways that have nothing to do with db corruption - in the worst case a logic bug in validation that we would want to be reported instead of telling the user to reindex, assume data corruption / blame LevelDB or the own hardware and forget about it.

    Therefore it’s probably best to recognize index corruption better in BlockManager::LoadBlockIndex() or similar - for which an explanation how this occurred, or a way to reproduce would be helpful. Just telling the user to reindex on a hunch seems not great.

  16. Ataraxia009 commented at 5:06 am on August 12, 2025: none

    I agree with @mzumsande here. At a second glance this alert is too deep into the flow when the bad state can easily be detected somewhere between LoadBlockIndex() and LoadChainTip() which more directly would imply a db corruption state.

    Unfortunately I lost access to the datadir to reproduce and test this. If it comes up again will fix by moving the alert up, closing for now.

  17. Ataraxia009 closed this on Aug 12, 2025


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-08-13 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me