init: return error when block index is non-contiguous, fix feature_init.py file perturbation #27823

pull mzumsande wants to merge 2 commits into bitcoin:master from mzumsande:202306_feature_init_fix changing 2 files +24 −5
  1. mzumsande commented at 6:55 pm on June 4, 2023: contributor

    When the block index database is non-contiguous due to file corruption (i.e. it contains indexes of height x-1 and x+1, but not x), bitcoind can currently crash with an assert in BuildSkip() / GetAncestor() during BlockManager::LoadBlockIndex():

    0bitcoind: chain.cpp:112: const CBlockIndex* CBlockIndex::GetAncestor(int) const: Assertion `pindexWalk->pprev' failed.
    

    This PR changes it such that we instead return an InitError to the user.

    I stumbled upon this because I noticed that the file perturbation in feature_init.py wasn’t working as intended, which is fixed in the second commit:

    • Opening the file twice in one with statement would lead to tf_read being empty, so the test wouldn’t perturb anything but replace the file with a new one. Fixed by first opening for read, then for write.
    • We need to restore the previous state after perturbations, so that only the current perturbation is active and not a mix of the current and previous ones.
    • I also added checkblocks=200 to the startup parameters so that corruption in earlier blocks of blk00000.dat is detected during init verification and not ignored.

    After fixing feature_init.py like that I’d run into the assert mentioned above (so running the testfix from the second commit without the first one is a way to reproduce it).

  2. DrahtBot commented at 6:55 pm on June 4, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK furszy, fjahr, achow101
    Concept ACK jonatack, TheCharlatan

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. in src/node/blockstorage.cpp:266 in fe00062217 outdated
    259@@ -260,8 +260,11 @@ bool BlockManager::LoadBlockIndex()
    260     std::sort(vSortedByHeight.begin(), vSortedByHeight.end(),
    261               CBlockIndexHeightOnlyComparator());
    262 
    263+    CBlockIndex* previous_index{nullptr};
    264     for (CBlockIndex* pindex : vSortedByHeight) {
    265         if (ShutdownRequested()) return false;
    266+        if (previous_index && pindex->nHeight > previous_index->nHeight + 1) return false; // corruption, non-contiguous block indexes
    


    Brotcrunsher commented at 7:27 pm on June 4, 2023:
    Wouldn’t it be better to check for != instead for >? If the file is corrupt, then the indices might as well suddenly go down.

    mzumsande commented at 7:32 pm on June 4, 2023:
    I think that would be incorrect, because there will be multiple indexes for a given height in case of stale blocks / forks.

    Brotcrunsher commented at 7:32 pm on June 4, 2023:
    True!
  4. Brotcrunsher commented at 7:34 pm on June 4, 2023: contributor

    Little disclaimer: It’s my first Review for Bitcoin Core. So treat my input with care, I might have no clue what I am talking about.

    What I don’t like about this change is that previously the user at least had a chance of finding out what’s wrong, without debugging the code. Now he won’t have any info of what went wrong other than “Error loading block database”, which can have multiple other reasons. Is there a way for us to communicate this better? Maybe a log? Or can we throw here? The sad thing is that the return seems to be intermangled with none error cases like ShutdownRequested.

  5. mzumsande force-pushed on Jun 4, 2023
  6. in src/node/blockstorage.cpp:268 in 0b8b0e3051 outdated
    259@@ -260,8 +260,11 @@ bool BlockManager::LoadBlockIndex()
    260     std::sort(vSortedByHeight.begin(), vSortedByHeight.end(),
    261               CBlockIndexHeightOnlyComparator());
    262 
    263+    CBlockIndex* previous_index{nullptr};
    264     for (CBlockIndex* pindex : vSortedByHeight) {
    265         if (ShutdownRequested()) return false;
    266+        if (previous_index && pindex->nHeight > previous_index->nHeight + 1) return false; // corruption, non-contiguous block indexes
    267+        previous_index = pindex;
    


    furszy commented at 1:11 pm on June 5, 2023:
    Couldn’t vSortedByHeight contain blocks from different chains at the same height?

    mzumsande commented at 2:30 pm on June 5, 2023:

    What do you mean with “different chains”? vSortedHeight could contain blocks with the same height x (forks), but I can’t see how these could have made it into the block index without having a predecessor of height x - 1 at the time they were added. Also, these shouldn’t make the added check fail.

    I do wonder though if it would be possible / more general to directly check that each block of height > 0 has a predecessor in the chain, instead of doing what the PR currently suggests. I’ll look into that.


    furszy commented at 3:09 pm on June 5, 2023:

    What do you mean with “different chains”? vSortedHeight could contain blocks with the same height x (forks), but I can’t see how these could have made it into the block index without having a predecessor of height x - 1 at the time they were added. Also, these shouldn’t make the added check fail.

    yeah ok. nvm. Got confused by the early return (I thought it in the other way around). The check will have the previous block at the same height, so pindex->nHeight > previous_index->nHeight + 1 will be false.

    I do wonder though if it would be possible / more general to directly check that each block of height > 0 has a predecessor in the chain, instead of doing what the PR currently suggests. I’ll look into that.

    Wouldn’t that be broken if we ever add a parallel headers download process? Which IIRC we don’t have because they are downloaded from a single peer at time. But if we implement it, we could get orphan block indexes pretty easily.


    mzumsande commented at 5:22 pm on June 5, 2023:

    Wouldn’t that be broken if we ever add a parallel headers download process? Which IIRC we don’t have because they are downloaded from a single peer at time. But if we implement it, we could get orphan block indexes pretty easily.

    Yes, but only we had a very naive parallel headers download process. I would imagine that if we ever parallelized headers download, we’d take care not to accept orphaned block indexes into our main block index (for DoS reasons) but would keep those headers in some kind of intermediate buffer until we can connect them.

    net_processing makes sure that we only save headers that we connect (code).

  7. mzumsande commented at 2:34 pm on June 5, 2023: contributor

    Now he won’t have any info of what went wrong other than “Error loading block database”, which can have multiple other reasons. Is there a way for us to communicate this better? Maybe a log? Or can we throw here?

    Thanks! Yes, adding a log seems like a good idea, I’ll do that with the next push!

  8. mzumsande force-pushed on Jun 15, 2023
  9. mzumsande commented at 6:53 pm on June 15, 2023: contributor

    Added a Log error, and made the commit message more precise.

    I do wonder though if it would be possible / more general to directly check that each block of height > 0 has a predecessor in the chain, instead of doing what the PR currently suggests.

    Since the current approach wouldn’t catch all possible cases of a non-contiguous block index (but the one encountered in feature_init.py !), still looking to into having a stricter check - opinions welcome!

  10. DrahtBot added the label CI failed on Jun 15, 2023
  11. in src/node/blockstorage.cpp:266 in 5352a981de outdated
    259@@ -260,8 +260,13 @@ bool BlockManager::LoadBlockIndex()
    260     std::sort(vSortedByHeight.begin(), vSortedByHeight.end(),
    261               CBlockIndexHeightOnlyComparator());
    262 
    263+    CBlockIndex* previous_index{nullptr};
    264     for (CBlockIndex* pindex : vSortedByHeight) {
    265         if (ShutdownRequested()) return false;
    266+        if (previous_index && pindex->nHeight > previous_index->nHeight + 1) {
    267+            return error("%s: block index is non-contiguous, index of height %d missing", __func__, previous_index->nHeight + 1);
    


    jonatack commented at 8:42 pm on June 15, 2023:

    I’ve probably confused myself by how I’m checking this, but this doesn’t seem to be the error site that is being hit? The diff below passes. (Your updated test does indeed fail on master and pass after building this pull, as described in the OP.)

     0diff --git a/test/functional/feature_init.py b/test/functional/feature_init.py
     1index 194ba9ed265..38bd3d5db0f 100755
     2--- a/test/functional/feature_init.py
     3+++ b/test/functional/feature_init.py
     4@@ -107,6 +107,12 @@ class InitStressTest(BitcoinTestFramework):
     5             'blocks/blk*.dat': 'Corrupted block database detected.',
     6         }
     7 
     8+        errors_logged = [
     9+            "ERROR: LoadBlockIndex: block index is non-contiguous",
    10+            "LevelDB read failure: Corruption: not an sstable (bad magic number)",
    11+            "ReadBlockFromDisk(CBlock&, CBlockIndex*): GetHash() doesn't match index for CBlockIndex",
    12+        ]
    13+
    14         for file_patt, err_fragment in files_to_delete.items():
    15             target_files = list(node.chain_path.glob(file_patt))
    16 
    17@@ -126,6 +132,7 @@ class InitStressTest(BitcoinTestFramework):
    18             self.stop_node(0)
    19 
    20         self.log.info("Test startup errors after perturbing certain essential files")
    21+        i = 0
    22         for file_patt, err_fragment in files_to_perturb.items():
    23             shutil.copytree(node.chain_path / "blocks", node.chain_path / "blocks_bak")
    24             shutil.copytree(node.chain_path / "chainstate", node.chain_path / "chainstate_bak")
    25@@ -139,8 +146,10 @@ class InitStressTest(BitcoinTestFramework):
    26                     tweaked_contents[50:350] = b'1' * 300
    27                 with open(target_file, "wb") as tf_write:
    28                     tf_write.write(bytes(tweaked_contents))
    29
    30-            start_expecting_error(err_fragment)
    31+            self.log.info(f"{i} - {err_fragment} - {errors_logged[i]}")
    32+            with node.assert_debug_log([errors_logged[i]]):
    33+                start_expecting_error(err_fragment)
    34+            i += 1
    35 
    36             shutil.rmtree(node.chain_path / "blocks")
    37             shutil.rmtree(node.chain_path / "chainstate")
    

    mzumsande commented at 2:41 pm on June 16, 2023:

    The diff below passes.

    Where does it pass for you? I tried to apply the diff too, and for me it passes on this branch and fails after removing 04d533781002be054e6f2493f3ab2389135ceec5 (not because the log message isn’t found, but because the bitcoin core crashes with the assert before that). That is what I would have expected, since the diff adds an additional check for the added log “ERROR: LoadBlockIndex: block index is non-contiguous”, which is supposed to be hit as a result of the perturbation of the block files.


    fjahr commented at 12:15 pm on September 28, 2023:
    nit (coming from a non-native speaker): isn’t non-continuous the more fitting description (meaning “(un)interrupted sequence”) rather than non-contiguous (meaning “(not) sharing a common boundary”)?

    mzumsande commented at 6:44 pm on October 5, 2023:
    Not a native speaker either, but for me “contiguous” has more of a spatial connotation (like blocks building a chain by being attached to each other), whereas “continuous” has a more temporal connotation.
  12. jonatack commented at 8:43 pm on June 15, 2023: contributor
    Concept ACK
  13. TheCharlatan commented at 1:23 pm on June 20, 2023: contributor
    Concept ACK
  14. mzumsande force-pushed on Jun 28, 2023
  15. mzumsande commented at 2:44 pm on June 28, 2023: contributor

    Could rebase for CI?

    done

  16. DrahtBot removed the label CI failed on Jun 29, 2023
  17. DrahtBot added the label Needs rebase on Jul 6, 2023
  18. init: abort loading of blockindex in case of missing height.
    If a height is missing we are facing a non-contiguous block index db, and could previously
    hit an assert in GetAncestor() called from BuildSkip() instead of returning an error.
    ad66ca1e47
  19. mzumsande force-pushed on Jul 18, 2023
  20. in src/node/blockstorage.cpp:268 in ad66ca1e47 outdated
    263     for (CBlockIndex* pindex : vSortedByHeight) {
    264         if (m_interrupt) return false;
    265+        if (previous_index && pindex->nHeight > previous_index->nHeight + 1) {
    266+            return error("%s: block index is non-contiguous, index of height %d missing", __func__, previous_index->nHeight + 1);
    267+        }
    268+        previous_index = pindex;
    


    furszy commented at 3:46 pm on July 18, 2023:

    In ad66ca1e:

    What if the genesis block index is missing? The first round of the loop would set previous_index to block_index_1. Second one to block_index_2, etc. Making LoadBlockIndex() pass?.


    mzumsande commented at 8:16 pm on July 18, 2023:

    That is true, but the specific case of a missing (or incorrect) genesis block index should get caught in an extra check right after, see here.

    There are other cases that wouldn’t be caught with the proposed solution: We could have the valid chain and also some extra block indices that don’t connect. A stricter check would therefore be to require each block index of height x (except genesis) to have a predecessor of height x - 1. I didn’t add it because these kind of extra blocks wouldn’t actually hurt anything, plus maybe there are some of these block indices out there because of some bug in some ancient version of core.

  21. DrahtBot removed the label Needs rebase on Jul 18, 2023
  22. test: fix feature_init.py file perturbation
    Simultaneously opening the file in read and write mode would
    lead to opening of an empty file instead of perturbing the existing
    file.
    Also, revert to the previous state after each perturbation so that
    each perturbation is applied in isolation.
    d27b9a2248
  23. mzumsande force-pushed on Jul 18, 2023
  24. mzumsande commented at 8:20 pm on July 18, 2023: contributor
    9d33681 to d27b9a2: Added an explanation why the tweaked_contents window of feature_init.py was changed: Since the genesis block is not checked by the -checkblocks verification, the perturbation window must be chosen such that a higher block in blk*.dat is affected.
  25. in test/functional/feature_init.py:142 in d27b9a2248
    139                     tweaked_contents = bytearray(contents)
    140-                    tweaked_contents[50:250] = b'1' * 200
    141+                    # Since the genesis block is not checked by -checkblocks, the
    142+                    # perturbation window must be chosen such that a higher block
    143+                    # in blk*.dat is affected.
    144+                    tweaked_contents[150:350] = b'1' * 200
    


    furszy commented at 8:45 pm on July 18, 2023:
    Maybe could randomize (or restructure) this a bit in a follow-up. Perturbing different parts of the block content might let us detect and add coverage for different errors.

    fjahr commented at 11:52 am on September 28, 2023:
    Potentially sounds like a good first issue
  26. in test/functional/feature_init.py:131 in d27b9a2248
    124@@ -124,18 +125,31 @@ def check_clean_start():
    125             check_clean_start()
    126             self.stop_node(0)
    127 
    128+        self.log.info("Test startup errors after perturbing certain essential files")
    129         for file_patt, err_fragment in files_to_perturb.items():
    130+            shutil.copytree(node.chain_path / "blocks", node.chain_path / "blocks_bak")
    131+            shutil.copytree(node.chain_path / "chainstate", node.chain_path / "chainstate_bak")
    


    furszy commented at 1:54 pm on July 19, 2023:
    nit: couldn’t this be copied outside of the for-loop only once?

    fjahr commented at 11:49 am on September 28, 2023:
    I think it could but then the moves below would need to be copies instead and then there would need to be another two lines needed for cleanup of the left-over files after the loop, so I think it’s alright to keep as is.
  27. furszy commented at 1:55 pm on July 19, 2023: member
    Code ACK d27b9a224
  28. achow101 requested review from fjahr on Sep 20, 2023
  29. achow101 requested review from josibake on Sep 20, 2023
  30. achow101 requested review from stickies-v on Sep 20, 2023
  31. in test/functional/feature_init.py:136 in d27b9a2248
    132             target_files = list(node.chain_path.glob(file_patt))
    133 
    134             for target_file in target_files:
    135                 self.log.info(f"Perturbing file to ensure failure {target_file}")
    136-                with open(target_file, "rb") as tf_read, open(target_file, "wb") as tf_write:
    137+                with open(target_file, "rb") as tf_read:
    


    fjahr commented at 12:00 pm on September 28, 2023:
    I guess I don’t see why we are reading the content if we don’t care about it anyway here. I think we can just override it and that simplifies things. I drafted that in this commit: https://github.com/fjahr/bitcoin/commit/84132abe9bd13845ef611615a30e53fbaffafb05

    mzumsande commented at 6:47 pm on October 5, 2023:
    I agree, that is simpler!
  32. in src/node/blockstorage.cpp:265 in ad66ca1e47 outdated
    258@@ -259,8 +259,13 @@ bool BlockManager::LoadBlockIndex()
    259     std::sort(vSortedByHeight.begin(), vSortedByHeight.end(),
    260               CBlockIndexHeightOnlyComparator());
    261 
    262+    CBlockIndex* previous_index{nullptr};
    263     for (CBlockIndex* pindex : vSortedByHeight) {
    264         if (m_interrupt) return false;
    265+        if (previous_index && pindex->nHeight > previous_index->nHeight + 1) {
    


    fjahr commented at 12:06 pm on September 28, 2023:
    nit: If you retouch I would like it if there was an additional comment how this could happen. The error message below explains what this checks but with these more obscure issues it’s also great to know where it might come from without having to git blame it.
  33. fjahr commented at 12:27 pm on September 28, 2023: contributor

    Code review ACK d27b9a2248476439ddab7700327f074005a810d5

    I have left a few suggestions for improvement but these could be done in a follow-up, along with suggestions from @furszy . If this is merged, I would probably give this to someone else as a good first issue.

  34. fanquake requested review from ryanofsky on Oct 2, 2023
  35. achow101 commented at 7:36 pm on October 4, 2023: member
    ACK d27b9a2248476439ddab7700327f074005a810d5
  36. achow101 merged this on Oct 4, 2023
  37. achow101 closed this on Oct 4, 2023

  38. mzumsande commented at 6:51 pm on October 5, 2023: contributor
    Thanks for the reviews! I didn’t get to address them before the merge (and I think some of the open suggestions could be incorporated in a follow-up). I agree that besides the code simplifications, there is some room for improvement, and opened #28603 for this!
  39. Frank-GER referenced this in commit a7ec6d1270 on Oct 13, 2023
  40. achow101 referenced this in commit 0387ca0774 on Nov 6, 2023
  41. mzumsande deleted the branch on Apr 29, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me