test: fix feature_init.py intermittencies #32835

pull furszy wants to merge 2 commits into bitcoin:master from furszy:2025_feature_init_fixes changing 1 files +82 −27
  1. furszy commented at 2:41 pm on June 30, 2025: member

    Aims to solve #32600. Found it while working on #26966 (this was really annoying there).

    This ensures the node is index-synced before perturbing files. If the index sync gets interrupted before it starts, the database could be empty, making any following perturbation ineffective (which explains why the node does not abort during startup in the #32600 logs).

    Also, the first commit avoids initializing components not under test. This reduces log flooding, which helped in understanding the issue.

    Patch to reproduce the issue on master using feature_init.py (this simulates a node shutting down before the index starts syncing):

     0diff --git a/src/index/base.cpp b/src/index/base.cpp
     1--- a/src/index/base.cpp	(revision 1e03052c3fefb188f047e72548f2c6b0cc019e50)
     2+++ b/src/index/base.cpp	(date 1751293306725)
     3@@ -185,6 +185,7 @@
     4 void BaseIndex::Sync()
     5 {
     6     const CBlockIndex* pindex = m_best_block_index.load();
     7+    m_interrupt();
     8     if (!m_synced) {
     9         std::chrono::steady_clock::time_point last_log_time{0s};
    10         std::chrono::steady_clock::time_point last_locator_write_time{0s};
    
  2. DrahtBot renamed this:
    test: fix feature_init.py intermittencies
    test: fix feature_init.py intermittencies
    on Jun 30, 2025
  3. DrahtBot added the label Tests on Jun 30, 2025
  4. DrahtBot commented at 2:41 pm on June 30, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32835.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK hodlinator, maflcko, achow101
    Concept ACK theStack
    Approach ACK TheCharlatan

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32427 ((RFC) kernel: Replace leveldb-based BlockTreeDB with flat-file based store by TheCharlatan)
    • #30469 (index: Fix coinstats overflow by fjahr)
    • #26966 (index: initial sync speedup, parallelize process by furszy)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. TheCharlatan commented at 2:58 pm on June 30, 2025: contributor
    Approach ACK
  6. in test/functional/feature_init.py:105 in 1e03052c3f outdated
     99@@ -100,39 +100,90 @@ def check_clean_start():
    100         check_clean_start()
    101         self.stop_node(0)
    102 
    103+        # Prior to deleting/perturbing index files, ensure indexes are synchronized (i.e., data exists to modify)
    104+        self.start_node(0, extra_args=args)
    105+        tip_height = node.getblockcount()
    106+        self.wait_until(lambda: all(i["synced"] and i["best_block_height"] == tip_height for i in node.getindexinfo().values()))
    107+        self.stop_node(0)
    


    maflcko commented at 3:07 pm on June 30, 2025:
    any reason to start the node again? Can’t this wait_until be done in check_clean_start?

    furszy commented at 3:30 pm on June 30, 2025:

    any reason to start the node again? Can’t this wait_until be done in check_clean_start?

    Just because check_clean_start() is also called inside the deletion rounds and didn’t want to start all indexes there on each cycle round. But that seems easily solvable.. one minute..


    furszy commented at 3:37 pm on June 30, 2025:
    Done.
  7. furszy force-pushed on Jun 30, 2025
  8. in test/functional/feature_init.py:104 in 2a9ae6f162 outdated
     99@@ -97,42 +100,88 @@ def check_clean_start():
    100             self.log.debug("Terminating node after terminate line was found")
    101             sigterm_node()
    102 
    103-        check_clean_start()
    104+        # Prior to deleting/perturbing index files, ensure indexes are synchronized (i.e., data exists to modify)
    105+        check_clean_start(args, lambda n: self.wait_until(lambda h=n.getblockcount(): all(i["synced"] and i["best_block_height"] == h for i in n.getindexinfo().values())))
    


    maflcko commented at 3:41 pm on June 30, 2025:
    this fails lint and getindexinfo is already a no-op when no args are provided, so no need to conditionally run it? could just inline it?

    furszy commented at 3:53 pm on June 30, 2025:

    this fails lint and getindexinfo is already a no-op when no args are provided, so no need to conditionally run it? could just inline it?

    Saw it late, pushed.

  9. furszy force-pushed on Jun 30, 2025
  10. DrahtBot added the label CI failed on Jun 30, 2025
  11. DrahtBot commented at 3:44 pm on June 30, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/45063074304 LLM reason (✨ experimental): Lint check failed due to a Python code style error identified by ruff.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  12. furszy force-pushed on Jun 30, 2025
  13. DrahtBot removed the label CI failed on Jun 30, 2025
  14. fanquake commented at 12:39 pm on July 2, 2025: member
  15. in test/functional/feature_init.py:59 in e36e0b1aeb outdated
    59-        def check_clean_start():
    60+        def check_clean_start(args=None):
    61             """Ensure that node restarts successfully after various interrupts."""
    62-            node.start()
    63+            extra_args = args or []
    64+            node.start(extra_args)
    


    hodlinator commented at 1:08 pm on July 2, 2025:

    nit: Could simplify:

    0        def check_clean_start(args=[]):
    1            """Ensure that node restarts successfully after various interrupts."""
    2            node.start(args)
    

    furszy commented at 11:45 pm on July 2, 2025:

    hodlinator commented at 1:08 pm on July 4, 2025:
    Oof, that’s nasty.
  16. in test/functional/feature_init.py:215 in e36e0b1aeb outdated
    211@@ -158,7 +212,10 @@ def check_clean_start():
    212                     tf.seek(150)
    213                     tf.write(b"1" * 200)
    214 
    215-            start_expecting_error(err_fragment)
    216+            self.log.info(f"Expecting error frag {err_fragment}")
    


    hodlinator commented at 1:12 pm on July 2, 2025:

    nit: Could add quotes and make output as verbose as variables:

    0            self.log.info(f"Expecting error fragment: {err_fragment!r}")
    

    furszy commented at 11:55 pm on July 2, 2025:
    This is actually just a leftover from debugging the issue in #26966.. sorry for that. Will just remove the line.
  17. in test/functional/feature_init.py:186 in e36e0b1aeb outdated
    202                 self.log.info(f"Deleting file to ensure failure {target_file}")
    203                 bak_path = str(target_file) + ".bak"
    204                 target_file.rename(bak_path)
    205 
    206-            start_expecting_error(err_fragment)
    207+            start_expecting_error(err_fragment, startup_args)
    


    hodlinator commented at 1:16 pm on July 2, 2025:

    nit: It might be good to send in the same startup_args to check_clean_start() further down in the loop? https://github.com/bitcoin/bitcoin/blob/e36e0b1aebe437c090ff897d27bbec0a7c5a100a/test/functional/feature_init.py#L192

    (Side remark: It is unbalanced that we don’t call check_clean_start() at the bottom of the perturbation-loop, but saves execution time and should be alright as we check for different error messages).


    furszy commented at 11:50 pm on July 2, 2025:

    It might be good to send in the same startup_args to check_clean_start() further down in the loop?

    I don’t think so. All files are restored from the backup after deletion, so we know for sure the indexes are synced at that point (nothing changed).


    hodlinator commented at 1:16 pm on July 4, 2025:

    If we know for sure, why are we running check_clean_start()? IMO we should do one of:

    • Trust that we fully restored the state and remove check_clean_start() here.
    • Run check_clean_start(startup_args) to verify that the index being tested was restored.

    furszy commented at 1:46 pm on July 4, 2025:

    If we know for sure, why are we running check_clean_start()?

    I assume we are doing it to verify nothing else broke after deleting + restoring some data, which is mostly relevant for chain data. But you have a fair point. Pushed.

  18. in test/functional/feature_init.py:124 in e36e0b1aeb outdated
    146-            'indexes/blockfilter/basic/db/*.*': 'LevelDB error: Corruption',
    147-            'indexes/coinstats/db/*.*': 'LevelDB error: Corruption',
    148-            'indexes/txindex/*.log': 'LevelDB error: Corruption',
    149-            'indexes/txindex/CURRENT': 'LevelDB error: Corruption',
    150-            # Perturbing these files does not result in a startup error:
    151-            # 'indexes/blockfilter/basic/*.dat', 'indexes/txindex/MANIFEST*', 'indexes/txindex/LOCK'
    


    hodlinator commented at 7:36 pm on July 2, 2025:
    This comment for the perturbed files is lost, unlike the one for the deleted files above.

    furszy commented at 11:47 pm on July 2, 2025:
    ups, reverting. Thanks.
  19. in test/functional/feature_init.py:116 in e36e0b1aeb outdated
    118+                'filepath_glob': 'blocks/index/*.ldb',
    119+                'error_message': 'Error opening block database.',
    120+                'startup_args': [],
    121+            },
    122+            {
    123+                'filepath_glob': 'chainstate/*.ldb',
    


    hodlinator commented at 7:53 pm on July 2, 2025:

    nit: Could use @dataclass for slightly less stringly typed code, and tuple for immutability:

      0diff --git a/test/functional/feature_init.py b/test/functional/feature_init.py
      1index 6ee075f906..1913002962 100755
      2--- a/test/functional/feature_init.py
      3+++ b/test/functional/feature_init.py
      4@@ -9,6 +9,7 @@ import platform
      5 import shutil
      6 import signal
      7 import subprocess
      8+from dataclasses import dataclass, field
      9 
     10 from test_framework.test_framework import BitcoinTestFramework
     11 from test_framework.test_node import (
     12@@ -107,74 +108,55 @@ class InitTest(BitcoinTestFramework):
     13 
     14         self.log.info("Test startup errors after removing certain essential files")
     15 
     16-        deletion_rounds = [
     17-            {
     18-                'filepath_glob': 'blocks/index/*.ldb',
     19-                'error_message': 'Error opening block database.',
     20-                'startup_args': [],
     21-            },
     22-            {
     23-                'filepath_glob': 'chainstate/*.ldb',
     24-                'error_message': 'Error opening coins database.',
     25-                'startup_args': ['-checklevel=4'],
     26-            },
     27-            {
     28-                'filepath_glob': 'blocks/blk*.dat',
     29-                'error_message': 'Error loading block database.',
     30-                'startup_args': ['-checkblocks=200', '-checklevel=4'],
     31-            },
     32-            {
     33-                'filepath_glob': 'indexes/txindex/MANIFEST*',
     34-                'error_message': 'LevelDB error: Corruption: CURRENT points to a non-existent file',
     35-                'startup_args': ['-txindex=1'],
     36-            },
     37+ [@dataclass](/bitcoin-bitcoin/contributor/dataclass/)
     38+        class RoundInfo:
     39+            filepath_glob: str
     40+            error_message: str
     41+            startup_args: list[str] = field(default_factory=list)
     42+
     43+        deletion_rounds: tuple[RoundInfo, ...] = (
     44+            RoundInfo(filepath_glob='blocks/index/*.ldb',
     45+                      error_message='Error opening block database.'),
     46+            RoundInfo(filepath_glob='chainstate/*.ldb',
     47+                      error_message='Error opening coins database.',
     48+                      startup_args=['-checklevel=4']),
     49+            RoundInfo(filepath_glob='blocks/blk*.dat',
     50+                      error_message='Error loading block database.',
     51+                      startup_args=['-checkblocks=200', '-checklevel=4']),
     52+            RoundInfo(filepath_glob='indexes/txindex/MANIFEST*',
     53+                      error_message='LevelDB error: Corruption: CURRENT points to a non-existent file.',
     54+                      startup_args=['-txindex=1']),
     55             # Removing these files does not result in a startup error:
     56             # 'indexes/blockfilter/basic/*.dat', 'indexes/blockfilter/basic/db/*.*', 'indexes/coinstats/db/*.*',
     57             # 'indexes/txindex/*.log', 'indexes/txindex/CURRENT', 'indexes/txindex/LOCK'
     58-        ]
     59-
     60-        perturbation_rounds = [
     61-            {
     62-                'filepath_glob': 'blocks/index/*.ldb',
     63-                'error_message': 'Error loading block database.',
     64-                'startup_args': [],
     65-            },
     66-            {
     67-                'filepath_glob': 'chainstate/*.ldb',
     68-                'error_message': 'Error opening coins database.',
     69-                'startup_args': [],
     70-            },
     71-            {
     72-                'filepath_glob': 'blocks/blk*.dat',
     73-                'error_message': 'Corrupted block database detected.',
     74-                'startup_args': ['-checkblocks=200', '-checklevel=4'],
     75-            },
     76-            {
     77-                'filepath_glob': 'indexes/blockfilter/basic/db/*.*',
     78-                'error_message': 'LevelDB error: Corruption',
     79-                'startup_args': ['-blockfilterindex=1'],
     80-            },
     81-            {
     82-                'filepath_glob': 'indexes/coinstats/db/*.*',
     83-                'error_message': 'LevelDB error: Corruption',
     84-                'startup_args': ['-coinstatsindex=1'],
     85-            },
     86-            {
     87-                'filepath_glob': 'indexes/txindex/*.log',
     88-                'error_message': 'LevelDB error: Corruption',
     89-                'startup_args': ['-txindex=1'],
     90-            },
     91-            {
     92-                'filepath_glob': 'indexes/txindex/CURRENT',
     93-                'error_message': 'LevelDB error: Corruption',
     94-                'startup_args': ['-txindex=1'],
     95-            },
     96-        ]
     97+        )
     98+
     99+        perturbation_rounds: tuple[RoundInfo, ...] = (
    100+            RoundInfo(filepath_glob='blocks/index/*.ldb',
    101+                      error_message='Error loading block database.'),
    102+            RoundInfo(filepath_glob='chainstate/*.ldb',
    103+                      error_message='Error opening coins database.'),
    104+            RoundInfo(filepath_glob='blocks/blk*.dat',
    105+                      error_message='Corrupted block database detected.',
    106+                      startup_args=['-checkblocks=200', '-checklevel=4']),
    107+            RoundInfo(filepath_glob='indexes/blockfilter/basic/db/*.*',
    108+                      error_message='LevelDB error: Corruption',
    109+                      startup_args=['-blockfilterindex=1']),
    110+            RoundInfo(filepath_glob='indexes/coinstats/db/*.*',
    111+                      error_message='LevelDB error: Corruption',
    112+                      startup_args=['-coinstatsindex=1']),
    113+            RoundInfo(filepath_glob='indexes/txindex/*.log',
    114+                      error_message='LevelDB error: Corruption',
    115+                      startup_args=['-txindex=1']),
    116+            RoundInfo(filepath_glob='indexes/txindex/CURRENT',
    117+                      error_message='LevelDB error: Corruption',
    118+                      startup_args=['-txindex=1']),
    119+        )
    120 
    121         for round_info in deletion_rounds:
    122-            file_patt = round_info['filepath_glob']
    123-            err_fragment = round_info['error_message']
    124-            startup_args = round_info['startup_args']
    125+            file_patt = round_info.filepath_glob
    126+            err_fragment = round_info.error_message
    127+            startup_args = round_info.startup_args
    128             target_files = list(node.chain_path.glob(file_patt))
    129 
    130             for target_file in target_files:
    131@@ -195,9 +177,9 @@ class InitTest(BitcoinTestFramework):
    132         self.log.info("Test startup errors after perturbing certain essential files")
    133         dirs = ["blocks", "chainstate", "indexes"]
    134         for round_info in perturbation_rounds:
    135-            file_patt = round_info['filepath_glob']
    136-            err_fragment = round_info['error_message']
    137-            startup_args = round_info['startup_args']
    138+            file_patt = round_info.filepath_glob
    139+            err_fragment = round_info.error_message
    140+            startup_args = round_info.startup_args
    141 
    142             for dir in dirs:
    143                 shutil.copytree(node.chain_path / dir, node.chain_path / f"{dir}_bak")
    

    furszy commented at 11:53 pm on July 2, 2025:
    Thanks, but I’m about a ~0 here. I don’t really find that change better. I’ll just leave the current code unless someone feels strongly about changing it
  20. hodlinator commented at 8:33 pm on July 2, 2025: contributor

    Concept ACK e36e0b1aebe437c090ff897d27bbec0a7c5a100a

    Concept: Theory for intermittent failure in #32600 makes sense, better to wait for all indexes before shutting down.

    Testing: Successfully tested with the suggested diff modifying BaseIndex::Sync() to prevent syncing.

    Approach: Left some nits, but most important thing I found was a comment that was probably accidentally removed.

  21. test: feature_init, only init what's needed per perturbation/deletion round
    Avoids initializing and syncing components not under test.
    This not only speeds up execution a bit but also helps isolate
    and debug issues more easily, as logs aren't flooded with
    unrelated details.
    abd07cf733
  22. furszy force-pushed on Jul 3, 2025
  23. furszy commented at 0:00 am on July 3, 2025: member
    Updated per feedback. Thanks @hodlinator
  24. DrahtBot added the label CI failed on Jul 3, 2025
  25. DrahtBot removed the label CI failed on Jul 3, 2025
  26. theStack commented at 3:19 pm on July 3, 2025: contributor
    Concept ACK
  27. test: feature_init, ensure indexes are synced prior to perturbing files 4207d9bf82
  28. furszy force-pushed on Jul 4, 2025
  29. hodlinator approved
  30. hodlinator commented at 12:26 pm on July 7, 2025: contributor

    ACK 4207d9bf823bea9f94b484ebf3c9274ca781b245

    Changes since my concept review (https://github.com/bitcoin/bitcoin/pull/32835#issuecomment-3029241989):

    (Would still prefer less stringly typed code, but have a feeling my suggestion in #32835 (review) could probably be improved).

  31. DrahtBot requested review from TheCharlatan on Jul 7, 2025
  32. DrahtBot requested review from theStack on Jul 7, 2025
  33. maflcko commented at 2:04 pm on July 7, 2025: member

    lgtm ACK 4207d9bf823bea9f94b484ebf3c9274ca781b245 🍄

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: lgtm ACK 4207d9bf823bea9f94b484ebf3c9274ca781b245 🍄
    32EXDV5heWoQfQZkpGwQh96nSo8LQOg6o+F9Pz8zWpupTdDL71GjRRQw5M7JI1Td094wyEHaiN9RTI4W/Cj9MBQ==
    
  34. achow101 commented at 0:09 am on July 8, 2025: member
    ACK 4207d9bf823bea9f94b484ebf3c9274ca781b245
  35. achow101 merged this on Jul 8, 2025
  36. achow101 closed this on Jul 8, 2025

  37. furszy deleted the branch on Jul 8, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-07-23 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me