validation: In case of a continued reindex, only activate chain in the end #31439

pull mzumsande wants to merge 2 commits into bitcoin:master from mzumsande:202412_reindex_interrupt changing 1 files +9 −10
  1. mzumsande commented at 6:51 pm on December 6, 2024: contributor

    If a user interrupts a reindex while it is iterating over the block files, it will continue to reindex with the next node start (if the -reindex arg is dropped, otherwise it will start reindexing from scratch). However, due to an early call to ActivateBestChainState() that only exists to connect the genesis block during the original -reindex, it wil start connecting blocks immediately before having iterated through all block files. Because later headers above the minchainwork threshold won’t be loaded in this case, -assumevalid will not be applied and the process is much slower due to script validation being done.

    Fix this by only calling ActivateBestChainState() here if Genesis is not connected yet (equivalent to ActiveHeight() == -1). Also simplify this spot by only doing this for the active chainstate instead of looping over all chainstates (first commit).

    This issue was discussed in the thread below #31346 (review), the impact on assumevalid was found by l0rinc.

    The fix can be tested by manually aborting a -reindex e.g. on signet and observing in the debug log the order in which blockfiles are indexed / blocks are connected with this branch vs master.

  2. DrahtBot commented at 6:51 pm on December 6, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31439.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ryanofsky, TheCharlatan

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. DrahtBot added the label Validation on Dec 6, 2024
  4. mzumsande force-pushed on Dec 6, 2024
  5. DrahtBot added the label CI failed on Dec 6, 2024
  6. DrahtBot commented at 7:01 pm on December 6, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/34049612281

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. mzumsande force-pushed on Dec 6, 2024
  8. l0rinc commented at 7:51 pm on December 6, 2024: contributor

    @mzumsande, do you happen to know why setting assumevalid to block 859999 works as expected (it is found in the 860001 blocks in m_blockman.m_block_index), so it’s correctly disabling script verification, but setting it to block 860002 (we’re currently at 873543) enables fScriptChecks from the very beginning for some reason (i.e. setting a later block results in having earlier validation).

    Edit:

     02024-12-06T22:37:39Z Synchronizing blockheaders, height: 860000 (~98.48%)
     1@@@m_blockman.m_block_index.size()@@@ = 860001
     2@@@fScriptChecks@@@ = 1
     32024-12-06T22:37:39Z UpdateTip: new best=00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048 height=1 version=0x00000001 log2_work=33.000022 tx=2 date='2009-01-09T02:54:25Z' progress=0.000000 cache=0.3MiB(1txo)
     4@@@m_blockman.m_block_index.size()@@@ = 860001
     5@@@fScriptChecks@@@ = 1
     62024-12-06T22:37:39Z UpdateTip: new best=000000006a625f06636b8bb6ac7b960a8d03705d1ace08b1a19da3fdcc99ddbd height=2 version=0x00000001 log2_work=33.584985 tx=3 date='2009-01-09T02:55:44Z' progress=0.000000 cache=0.3MiB(2txo)
     7@@@m_blockman.m_block_index.size()@@@ = 860001
     8@@@fScriptChecks@@@ = 1
     92024-12-06T22:37:39Z UpdateTip: new best=0000000082b5015589a3fdf2d4baff403e6f0be035a5d9742c1cae6295464449 height=3 version=0x00000001 log2_work=34.000022 tx=4 date='2009-01-09T03:02:53Z' progress=0.000000 cache=0.3MiB(3txo)
    10@@@m_blockman.m_block_index.size()@@@ = 860001
    11@@@fScriptChecks@@@ = 1
    122024-12-06T22:37:39Z UpdateTip: new best=000000004ebadb55ee9096c9a2f8880e09da59c0d68b1c228da88e48844a1485 height=4 version=0x00000001 log2_work=34.321950 tx=5 date='2009-01-09T03:16:28Z' progress=0.000000 cache=0.3MiB(4txo)
    13@@@m_blockman.m_block_index.size()@@@ = 860001
    14@@@fScriptChecks@@@ = 1
    152024-12-06T22:37:39Z UpdateTip: new best=000000009b7262315dbf071787ad3656097b892abffd1f95a1a022f896f533fc height=5 version=0x00000001 log2_work=34.584985 tx=6 date='2009-01-09T03:23:48Z' progress=0.000000 cache=0.3MiB(5txo)
    16@@@m_blockman.m_block_index.size()@@@ = 860001
    17@@@fScriptChecks@@@ = 1
    182024-12-06T22:37:39Z UpdateTip: new best=000000003031a0e73735690c5a1ff2a4be82553b2a12b776fbd3a215dc8f778d height=6 version=0x00000001 log2_work=34.807377 tx=7 date='2009-01-09T03:29:49Z' progress=0.000000 cache=0.3MiB(6txo)
    19@@@m_blockman.m_block_index.size()@@@ = 860001
    20@@@fScriptChecks@@@ = 1
    212024-12-06T22:37:39Z UpdateTip: new best=0000000071966c2b1d065fd446b1e485b2c9d9594acd2007ccbd5441cfc89444 height=7 version=0x00000001 log2_work=35.000022 tx=8 date='2009-01-09T03:39:29Z' progress=0.000000 cache=0.3MiB(7txo)
    22@@@m_blockman.m_block_index.size()@@@ = 860001
    23@@@fScriptChecks@@@ = 1
    242024-12-06T22:37:39Z UpdateTip: new best=00000000408c48f847aa786c2268fc3e6ec2af68e8468a34a28c61b7f1de0dc6 height=8 version=0x00000001 log2_work=35.169947 tx=9 date='2009-01-09T03:45:43Z' progress=0.000000 cache=0.3MiB(8txo)
    25@@@m_blockman.m_block_index.size()@@@ = 860001
    26@@@fScriptChecks@@@ = 1
    272024-12-06T22:37:39Z UpdateTip: new best=000000008d9dc510f23c2657fc4f67bea30078cc05a90eb89e84cc475c080805 height=9 version=0x00000001 log2_work=35.321950 tx=10 date='2009-01-09T03:54:39Z' progress=0.000000 cache=0.3MiB(9txo)
    28@@@m_blockman.m_block_index.size()@@@ = 860001
    29@@@fScriptChecks@@@ = 1
    302024-12-06T22:37:39Z UpdateTip: new best=000000002c05cc2e78923c34df87fd108b22221ac6076c18f3ade378a4d915e9 height=10 version=0x00000001 log2_work=35.459454 tx=11 date='2009-01-09T04:05:52Z' progress=0.000000 cache=0.3MiB(10txo)
    31@@@m_blockman.m_block_index.size()@@@ = 860001
    32@@@fScriptChecks@@@ = 1
    332024-12-06T22:37:39Z UpdateTip: new best=0000000097be56d606cdd9c54b04d4747e957d3608abe69198c661f2add73073 height=11 version=0x00000001 log2_work=35.584985 tx=12 date='2009-01-09T04:12:40Z' progress=0.000000 cache=0.3MiB(11txo)
    34@@@m_blockman.m_block_index.size()@@@ = 860001
    35@@@fScriptChecks@@@ = 1
    362024-12-06T22:37:39Z UpdateTip: new best=0000000027c2488e2510d1acf4369787784fa20ee084c258b58d9fbd43802b5e height=12 version=0x00000001 log2_work=35.700462 tx=13 date='2009-01-09T04:21:28Z' progress=0.000000 cache=0.3MiB(12txo)
    37@@@m_blockman.m_block_index.size()@@@ = 860001
    38@@@fScriptChecks@@@ = 1
    392024-12-06T22:37:40Z UpdateTip: new best=000000005c51de2031a895adc145ee2242e919a01c6d61fb222a54a54b4d3089 height=13 version=0x00000001 log2_work=35.807377 tx=14 date='2009-01-09T04:23:40Z' progress=0.000000 cache=0.3MiB(13txo)
    40@@@m_blockman.m_block_index.size()@@@ = 860001
    41@@@fScriptChecks@@@ = 1
    422024-12-06T22:37:41Z UpdateTip: new best=0000000080f17a0c5a67f663a9bc9969eb37e81666d9321125f0e293656f8a37 height=14 version=0x00000001 log2_work=35.906913 tx=15 date='2009-01-09T04:33:09Z' progress=0.000000 cache=0.3MiB(14txo)
    43@@@m_blockman.m_block_index.size()@@@ = 860001
    44@@@fScriptChecks@@@ = 1
    452024-12-06T22:37:42Z UpdateTip: new best=00000000b3322c8c3ef7d2cf6da009a776e6a99ee65ec5a32f3f345712238473 height=15 version=0x00000001 log2_work=36.000022 tx=16 date='2009-01-10T04:45:46Z' progress=0.000000 cache=0.3MiB(15txo)
    46@@@m_blockman.m_block_index.size()@@@ = 860001
    47@@@fScriptChecks@@@ = 1
    482024-12-06T22:37:42Z UpdateTip: new best=00000000174a25bb399b009cc8deff1c4b3ea84df7e93affaaf60dc3416cc4f5 height=16 version=0x00000001 log2_work=36.087485 tx=17 date='2009-01-10T04:45:58Z' progress=0.000000 cache=0.3MiB(16txo)
    492024-12-06T22:37:43Z Synchronizing blockheaders, height: 862000 (~98.71%)
    50@@@m_blockman.m_block_index.size()@@@ = 862001
    51@@@fScriptChecks@@@ = 0
    

    i.e. during synchronization the UpdateTip already starts and fScriptChecks gets enabled until the correct block is loaded, which will disable it - weird

  9. l0rinc commented at 8:11 pm on December 6, 2024: contributor
    I have tested 570af72, did an IBD to a clean datadir, stopped the process at ~20k blocks, restarted it with a -reindex: m_blockman.m_block_index contains the previous 25921 blocks only, so m_chainman.AssumedValidBlock() is not found among them, so script validation gets turned on at height=1 (same behavior as on master).
  10. mzumsande commented at 8:37 pm on December 6, 2024: contributor

    @mzumsande, do you happen to know why setting assumevalid to block 859999 works as expected (it is found in the 860001 blocks in m_blockman.m_block_index), so it’s correctly disabling script verification, but setting it to block 860002 (we’re currently at 873543) enables fScriptChecks from the very beginning for some reason (i.e. setting a later block results in having earlier validation).

    I can’t reproduce this. Assuming it’s not related to reindex, on an empty datadir after header sync, whether I start with -assumevalid="0000000000000000000196a7111c7aa3368af5a803a81c7bfb32860100dfd9c7" or -assumevalid="000000000000000000020c0ad748d78767d986e881259ba6de0a6e62cf946cf1" fScriptChecks is set to 0.

    I have tested https://github.com/bitcoin/bitcoin/commit/570af7281e21d51852b706e891835758d3d7161f, did an IBD to a clean datadir, stopped the process at ~20k blocks, restarted it with a -reindex: m_blockman.m_block_index contains the previous 25921 blocks only, so m_chainman.AssumedValidBlock() is not found among them, so script validation gets turned on at height=1 (same behavior as on master).

    Did you do this on mainnet? That would be because -reindex deletes the block tree db, but you don’t have the full blocks for most of the headers yet. So after reindexing all of the block files you have, you don’t have the headers for later blocks, so assumevalid is not applied.

    This is a general issue if you do reindex with a datadir that doesn’t have blocks above the minchainwork threshold in it - this PR doesn’t solve this.

  11. DrahtBot removed the label CI failed on Dec 6, 2024
  12. TheCharlatan commented at 3:39 pm on December 7, 2024: contributor

    Concept ACK - I was somehow under the impression that this was done on purpose to guarantee the validity of the already done work before continuing, but could never come up with a really good reason of why this should be the case - and I guess there really isn’t one!

    Reading through this code again recently makes me wish we’d have better handling for the genesis block in general. It feels weird that it is only activated in the first place during a fresh startup once ImportBlocks is called. It is also inconvenient for kernel users, since they need to manually call ActivateBestChain on startup (like done in bitcoin-chainstate). I wonder if it would be better to move loading it into load chainstate, for example like I did here (though that patch is pretty hacky with hardcoding the flat file position and making the registered validation interfaces miss the first notification and has some other implications).

  13. TheCharlatan approved
  14. TheCharlatan commented at 9:10 am on December 8, 2024: contributor
    ACK 570af7281e21d51852b706e891835758d3d7161f
  15. ryanofsky commented at 10:10 pm on December 9, 2024: contributor

    Concept ACK. These changes seem clearly better than current code, though the fix here seems like a workaround for a workaround for a workaround. IIUC, the original problem leading to this complexity was a crash in some code that assumed the genesis block existed, so a wait was added in #5243 to prevent that crash (first workaround instead of just making code handle the state and not crash). Then this wait apparently prevented gui from showing reindex progress so an ActivateBestChain call was added in #7917 to connect the genesis block and bypass the wait (second workaround, which can connect too many blocks after reindexing is interrupted or when -loadblock is used, leading to this bug). Now a third workaround is being implemented to prevent connecting too many blocks in the typical case when reindexing is interrupted and blocks are written in expected order with genesis first. But this is still a workaround, not a solid fix because other blocks besides the genesis block can still be connected in other cases.

    It seems to me like things would be simpler and more reliable if we could delete the wait and delete the ActivateBestChain call, but as long as they need to exist, this fix seems strictly better than the current code.

    Am wondering if there is some way to write a functional test for this PR. It’d be hard to write a test that interrupts reindexing at the right time, but maybe there could be a test that uses two -loadblock arguments to load the same blocks multiple times and makes sure all the blocks except the genesis blocks are only connected together at the very end.

  16. mzumsande commented at 5:46 pm on December 10, 2024: contributor

    Thanks for digging up the history!

    Am wondering if there is some way to write a functional test for this PR. It’d be hard to write a test that interrupts reindexing at the right time, but maybe there could be a test that uses two -loadblock arguments to load the same blocks multiple times and makes sure all the blocks except the genesis blocks are only connected together at the very end.

    I don’t completely understand this - do you mean -loadblock with specially crafted blockfiles in which block appear in a certain order?

    But this is still a workaround, not a solid fix because other blocks besides the genesis block can still be connected in other cases.

    Do you know any concrete examples for the latter? I was thinking that this was more of a solid fix because

    • ActivateBestChain() cannot connect blocks unless they are in the block tree db.
    • we cannot add blocks to the block tree db unless they connect to genesis (LoadExternalBlockFile detects out-of-order blocks and saves them for later).

    So, if we call ABC directly after calling AcceptBlock() for Genesis, it should be impossible for it to connect any other blocks than genesis because no other headers can be in the block tree db - even in some theoretical scenario where the order within the block file is messed up such that Genesis is not at its beginning.

    That being said, I agree it would be nice if we could remove this complexity, but I think it can only be done together with removing the wait-for-genesis in init - which means making sure that nothing bad happens when we don’t have a genesis block after startup.

  17. mzumsande commented at 5:46 pm on December 10, 2024: contributor

    So it looks me like there are several ideas for further improvements in this general area:

    1. Consider enabling assumevalid during reindex always - in case of a crash during IBD when we haven’t downloaded blocks above the assumevalid threshold, we currently don’t apply assumevalid since we don’t have headers above minchainwork but we also won’t request headers from peers until the (slow) reindex has finished - see issue #31494
    2. Call ActivateBestChain during first startup with an empty datadir earlier, not only in ImportBlocks (main challenge: -reindex must still work).
    3. Remove the genesis-specific ActivateBestChain() call - possibly together with the wait-for-genesis condition in init.
  18. in src/validation.cpp:5162 in dff5f81433 outdated
    5157@@ -5158,15 +5158,8 @@ void ChainstateManager::LoadExternalBlockFile(
    5158 
    5159                 // Activate the genesis block so normal node progress can continue
    5160                 if (hash == params.GetConsensus().hashGenesisBlock) {
    5161-                    bool genesis_activation_failure = false;
    5162-                    for (auto c : GetAll()) {
    


    ryanofsky commented at 1:00 pm on December 13, 2024:

    In commit “validation: Don’t loop over all chainstates in LoadExternalBlock” (dff5f81433fa8ce1d292a2f6051204a0d7824d6e)

    It would be helpful if commit description could clarify whether this is a change in behavior. The “no second chainstate can exist anyway because -reindex would have deleted any snapshot chainstate earlier” part suggests this doesn’t change current behavior. But I think in the case when -loadblock is being used, not -reindex, a snapshot chainstate might actually be loaded and not calling ActivateBestChain would be a (good?) change.

    Would just be good to clarify what this change does in terms of externally observable behavior.


    mzumsande commented at 4:40 pm on December 17, 2024:
    I added a clarification. The only reason to have this call here is to not block init, and in the -loadblock / assumeutxo case (which, by the way, I’m not sure anyone has ever tested) this is not an issue / ActivateBestChain() will be called for both chainstates at the end of ImportBlocks().
  19. in src/validation.cpp:5166 in 570af7281e outdated
    5156@@ -5157,7 +5157,10 @@ void ChainstateManager::LoadExternalBlockFile(
    5157                 }
    5158 
    5159                 // Activate the genesis block so normal node progress can continue
    5160-                if (hash == params.GetConsensus().hashGenesisBlock) {
    5161+                // Do this only if genesis isn't activated yet, to avoid connecting many blocks
    5162+                // without assumevalid in the case of a continuation of a reindex that
    5163+                // was interrupted by the user.
    5164+                if (hash == params.GetConsensus().hashGenesisBlock && WITH_LOCK(::cs_main, return ActiveHeight()) == -1) {
    


    ryanofsky commented at 1:09 pm on December 13, 2024:

    In commit “validation: fix issue with an interrupted -reindex” (570af7281e21d51852b706e891835758d3d7161f)

    It would be helpful if comment below mentioned that ActivateBestChain call will only connect the genesis block, not other blocks because ActivateBestChain on connects blocks which are in the block tree db, and block tree db only contains blocks whose parents are in it.

    Otherwise it’s not clear why ActivateBestChain is being called to connect just one block and why it wouldn’t connect other blocks, or at least it wasn’t clear to me.

    Alternately if there’s are some cases where this check is imperfect and ActivateBestChain could connect extra blocks, it would be good if comment clarified to say that intentions of this code is only to connect genesis block but there are case it could connect other blocks.


    mzumsande commented at 4:43 pm on December 17, 2024:
    I extended the comment a bit. I think one more theoretical case in which ActivateBestChain could have connected extra blocks (but wouldn’t anymore after this PR) is if, for some reason, block files loaded here (whether with -reindex or -loadblock) would contain the genesis block multiple times.
  20. ryanofsky approved
  21. ryanofsky commented at 1:34 pm on December 13, 2024: contributor

    Code review ACK 570af7281e21d51852b706e891835758d3d7161f. This seems like a clear improvement, but I think in longer term things would be a lot simpler we could delete the wait-for-genesis init loop and drop this special case ActivateBestChain call.

    Am wondering if there is some way to write a functional test for this PR. It’d be hard to write a test that interrupts reindexing at the right time, but maybe there could be a test that uses two -loadblock arguments to load the same blocks multiple times and makes sure all the blocks except the genesis blocks are only connected together at the very end.

    I don’t completely understand this - do you mean -loadblock with specially crafted blockfiles in which block appear in a certain order?

    I was thinking a test could use -loadblock with a normal block file with blocks in order, but just specify the same -loadblock file twice, so when genesis block is encountered the second time it triggers the ActivateBestChain call in the old code and causes non-genesis blocks to be connected to the chain in the middle of loading the block files. I’m not sure how important having a test for this is though, so maybe this would not be worth it, or wouldn’t work for some other reason.

    But this is still a workaround, not a solid fix because other blocks besides the genesis block can still be connected in other cases.

    Do you know any concrete examples for the latter?

    I was just thinking of cases with out of order blocks, but I didn’t realize that out of order blocks would not be connected because out of order blocks are not in the block tree db.

    Maybe a case where -reindex-chainstate and -loadblock were used together could still trigger this though, if -reindex-chainstate causes ActiveHeight() to return -1 but doesn’t clear the block tree db, and one of the -loadblocks files contains a genesis block, it could cause all the blocks in the block tree db to be connected when the genesis block is encountered.

  22. validation: Don't loop over all chainstates in LoadExternalBlock
    This simplifies the code. The only reason to call ActivateBestChain()
    here is to allow the main init thread to finish startup in a case of
    -reindex. In this situation no second chainstate can exist anyway
    because -reindex would have deleted any snapshot chainstate earlier.
    
    This could change behavior slightly if -loadblocks was used when there is a
    snapshot chainstate. In this case, there is no reason to call
    ActivateBestChain() for that chainstate here - it will be called in
    ImportBlocks() after all blocks have been indexed.
    a2675897e2
  23. validation: fix issue with an interrupted -reindex
    If a reindex was interrupted while it was iterating
    through the block files, genesis will already be connected
    when the reindex resumes at the next startup.
    In this case, a call to ActivateBestChainState() is not only unnecessary,
    but it would connect multiple blocks without applying
    -assumevalid, which is much slower.
    This is because assumevalid requires us to have a header above
    the minimum chainwork, but that header is unknown to us if it's in
    a later blockfile not indexed yet.
    c9136ca906
  24. mzumsande force-pushed on Dec 17, 2024
  25. mzumsande commented at 4:49 pm on December 17, 2024: contributor
    570af72 to c9136ca: Addressed feedback by @ryanofsky
  26. ryanofsky approved
  27. ryanofsky commented at 5:02 pm on December 17, 2024: contributor
    Code review ACK c9136ca90605bbe29f005f538b92ff96ca360a13. Only comments changed since last review. Appreciate the new comments, I think they make a little clearer what things code is trying to do and what things are just side-effects.
  28. DrahtBot requested review from TheCharlatan on Dec 17, 2024
  29. TheCharlatan approved
  30. TheCharlatan commented at 8:23 pm on December 17, 2024: contributor
    Re-ACK c9136ca90605bbe29f005f538b92ff96ca360a13

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me