Reindex: save progress to continue after interruption #35071

pull pinheadmz wants to merge 2 commits into bitcoin:master from pinheadmz:reindex-continue changing 3 files +117 −5
  1. pinheadmz commented at 2:22 PM on April 14, 2026: member

    Currently, if the reindex process is interrupted, it starts over at blk00000.dat on the next run. Even after reindexing has finished and the node is in ActivateBestChain(), an interruption may STILL require a full reindex because DB_REINDEX_FLAG is written as false but not flushed.

    Mentioned in #30424 but I couldn't find any specific follow-up:

    There is no reindex progress (it should pick up the previous work and try to make progress)

    The solution in this PR is simply to write a new field, DB_REINDEX_LASTFILE, after every block file is reindexed, and to flush the DB_REINDEX_FLAG setting when the process is complete. The complication is that blocks may be out of order on disk, so as we reindex we store orphan blocks temporarily in memory until they are reconnected with their parent in later files. To ensure that data is recovered, the orphan map is serialized and also saved to the database as DB_REINDEX_ORPHAN_BLOCKS.
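A minimal sketch of the checkpoint idea described above, not the PR's actual code. It assumes a toy in-memory key/value map in place of the block tree database, string identifiers in place of block hashes, and a made-up text encoding for the orphan map. `ToyDb`, `OrphanMap`, `SaveCheckpoint`, `LoadCheckpoint`, and the key strings are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the block tree database: a flat key/value map.
using ToyDb = std::map<std::string, std::string>;

// Hypothetical orphan record: parent id -> id of a block waiting for that parent.
using OrphanMap = std::multimap<std::string, std::string>;

// Encode the orphan map as "parent=child;" pairs (ids must not contain '=' or ';').
std::string SerializeOrphans(const OrphanMap& orphans)
{
    std::string out;
    for (const auto& [parent, child] : orphans) out += parent + "=" + child + ";";
    return out;
}

// Inverse of SerializeOrphans; stops at the first malformed pair.
OrphanMap DeserializeOrphans(const std::string& data)
{
    OrphanMap orphans;
    size_t pos = 0;
    while (pos < data.size()) {
        const size_t eq = data.find('=', pos);
        const size_t end = data.find(';', pos);
        if (eq == std::string::npos || end == std::string::npos || eq > end) break;
        orphans.emplace(data.substr(pos, eq - pos), data.substr(eq + 1, end - eq - 1));
        pos = end + 1;
    }
    return orphans;
}

// Record the last fully imported block file together with the pending orphans.
void SaveCheckpoint(ToyDb& db, int last_file, const OrphanMap& orphans)
{
    db["reindex_lastfile"] = std::to_string(last_file);   // assumed key name
    db["reindex_orphans"] = SerializeOrphans(orphans);    // assumed key name
}

struct Checkpoint {
    int last_file;
    OrphanMap orphans;
};

// Return the saved checkpoint, or nothing if no reindex progress was recorded.
std::optional<Checkpoint> LoadCheckpoint(const ToyDb& db)
{
    const auto file_it = db.find("reindex_lastfile");
    if (file_it == db.end()) return std::nullopt;
    Checkpoint cp{std::stoi(file_it->second), {}};
    const auto orphan_it = db.find("reindex_orphans");
    if (orphan_it != db.end()) cp.orphans = DeserializeOrphans(orphan_it->second);
    return cp;
}
```

In this sketch a restart would call `LoadCheckpoint` first and fall back to a full reindex when it returns nothing. Both values are written together so that the orphan map always matches the recorded file number.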

    I ran a full-chain reindex on a Debian desktop (CPU x8, 32 GB) up to block 944994. The results were not super hot:

    master: 3.5 hours
    branch: 3.75 hours

    This is a bit of a drag so I'm opening as a draft for now to get feedback and run more tests.

    One idea I had is to either throttle the write to every 100 block files, or trigger the write only inside if (chainman.m_interrupt), which would be okay for user interruptions but not for hardware or power failures.
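The two throttling options above can be sketched as a single decision function. This is only an illustration under stated assumptions: `CheckpointPolicy` and `ShouldWriteCheckpoint` are hypothetical names that do not appear in the PR, and the `interrupted` argument stands in for whatever `chainman.m_interrupt` reports.

```cpp
#include <cassert>

// Hypothetical policy knobs; neither field comes from the PR.
struct CheckpointPolicy {
    int file_interval;       // write every N block files; 0 disables periodic writes
    bool write_on_interrupt; // write when a clean shutdown has been requested
};

// Decide whether to persist reindex progress after `files_done` block files.
bool ShouldWriteCheckpoint(const CheckpointPolicy& policy, int files_done, bool interrupted)
{
    // A user interrupt gives one last chance to save before exiting.
    if (interrupted && policy.write_on_interrupt) return true;
    // Otherwise write only on every Nth file to bound the extra database traffic.
    if (policy.file_interval > 0 && files_done > 0 && files_done % policy.file_interval == 0) return true;
    return false;
}
```

With `CheckpointPolicy{100, true}`, a power failure loses at most the work since the last multiple of 100 files. With `CheckpointPolicy{0, true}`, nothing is written on the happy path, and only a clean shutdown saves progress.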

  2. test: assert current interrupted-reindex behavior: wipe and start over 46bf05e7ec
  3. blockstorage: save progress during reindex to resume after interrupt
    Adds two new keys to the BlockTreeDB that are written after every
    block file is imported during reindex:
    - The last file read
    - A serialized map of orphan blocks
    
    If a reindex is interrupted, these values are read on restart and
    the reindex progress continues from the checkpoint. This does not
    affect runs with the -reindex flag explicitly set, which always
    wipes the index and starts from blk00000.dat
    1a520f23dd
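The startup behavior described in the commit message above can be sketched as a small decision function. This is a hedged illustration, not the PR's code: `ReindexPlan` and `PlanReindex` are hypothetical names, resuming at the file after the last one recorded is an assumption, and so is wiping the index when no checkpoint is found.

```cpp
#include <cassert>
#include <optional>

// Hypothetical description of what the node should do at startup.
struct ReindexPlan {
    bool wipe_index; // discard the existing block index first
    int first_file;  // number of the first blk file to import
};

// explicit_reindex:    the user passed -reindex on this run
// reindex_in_progress: the persisted reindex flag was still set
// last_file:           checkpointed last file read, if any
std::optional<ReindexPlan> PlanReindex(bool explicit_reindex, bool reindex_in_progress,
                                       std::optional<int> last_file)
{
    // An explicit -reindex always wipes and starts from blk00000.dat.
    if (explicit_reindex) return ReindexPlan{true, 0};
    // No reindex pending: nothing to do.
    if (!reindex_in_progress) return std::nullopt;
    // Interrupted reindex with a checkpoint: continue after the last file (assumption).
    if (last_file) return ReindexPlan{false, *last_file + 1};
    // Interrupted reindex without a checkpoint: start over (assumption).
    return ReindexPlan{true, 0};
}
```

For example, `PlanReindex(false, true, 42)` resumes at file 43 without wiping, while `PlanReindex(true, true, 42)` ignores the checkpoint entirely.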
  4. DrahtBot commented at 2:23 PM on April 14, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32427 ((RFC) kernel: Replace leveldb-based BlockTreeDB with flat-file based store by sedited)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. DrahtBot added the label CI failed on Apr 14, 2026
  6. maflcko commented at 5:16 PM on April 14, 2026: member

    Not sure about slowing down the happy path for an edge case: Reindex is already rare (hopefully?), and power outage during reindex should be doubly-rare.

    Also, writing the out-of-order blocks seems duplicate effort. Shouldn't it be trivial and fast to read them from the existing block files instead of going the extra hop through the leveldb?

    I guess it could make sense to have a flame graph showing the actual overhead that is seen when continuing a reindex. Without actual data it is hard to optimize it.

    If I had to guess, is the overhead from FindByte? If yes, my preference would be to just remove it, see #34044 (comment)

    Alternatively, the overhead is so minimal, that it doesn't matter?

  7. pinheadmz commented at 7:03 PM on April 14, 2026: member

    I could've used this at least in the interrupt block last week. I was moving data to a bigger drive on my RPi node and messed something up, so I had to reindex. A few hours in I wanted to change something and hit ctrl-c. When I restarted I wondered why I had lost those hours of progress.

    Also, writing the out-of-order blocks seems duplicate effort. Shouldn't it be trivial and fast to read them from the existing block files instead of going the extra hop through the leveldb?

    Yeah, saving the map after every file is a bummer, but we only need to read the map if we restart after an interruption, so there shouldn't be any hopping.

    Alternatively, the overhead is so minimal, that it doesn't matter?

    I could use a bit of clarity on what you're referring to as overhead here?

  8. maflcko commented at 7:27 PM on April 14, 2026: member

    I could use a bit of clarity on what you're referring to as overhead here?

    Well, I couldn't find a large overhead myself (but I only tried signet so far), so maybe I am missing something. Let's recall that AcceptBlock is guarded on current master, so any progress in deserializing blocks and accepting them is properly saved on current master. So the only remaining overhead comes from BufferedFile, but locally and for signet, it was small enough to not matter.

    So I guess it could make sense to see a flame graph or anything else to see where the bottleneck is on your side. I see you have measured a reindex with this branch and saw that it is slower. But have you measured a resume and seen that it is faster? If yes, why is it faster? Knowing this will make it easier to find alternative solutions.

    I can also imagine that the performance depends on the storage device that hosts the blocks dir. If that is on a network drive, then BufferedFile may be slow enough to matter?

  9. mzumsande commented at 1:05 PM on April 15, 2026: contributor

    Not sure about slowing down the happy path for an edge case: Reindex is already rare (hopefully?), and power outage during reindex should be doubly-rare.

    I think I agree with that. I think there is a use case for handling user interrupts, or for flushing after the first phase when all block files are indexed, but accommodating unclean restarts during reindex seems too much of a special case.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-21 09:12 UTC
