assumeutxo: nTx and nChainTx violations in CheckBlockIndex #29261

issue maflcko openend this issue on January 17, 2024
  1. maflcko commented at 2:38 pm on January 17, 2024: member

    When disabling the “test-only” assumptions in CheckBlockIndex, the check fails. This is problematic, because test-only code should not affect the behavior of the program in production.

    Current diff:

     0diff --git a/src/validation.cpp b/src/validation.cpp
     1index 8c583c586c..00d7ee3736 100644
     2--- a/src/validation.cpp
     3+++ b/src/validation.cpp
     4@@ -4866,9 +4866,9 @@ void ChainstateManager::CheckBlockIndex()
     5         unsigned int prev_chain_tx = pindex->pprev ? pindex->pprev->nChainTx : 0;
     6         assert((pindex->nChainTx == pindex->nTx + prev_chain_tx)
     7                // For testing, allow transaction counts to be completely unset.
     8-               || (pindex->nChainTx == 0 && pindex->nTx == 0)
     9+               //|| (pindex->nChainTx == 0 && pindex->nTx == 0)
    10                // For testing, allow this nChainTx to be unset if previous is also unset.
    11-               || (pindex->nChainTx == 0 && prev_chain_tx == 0 && pindex->pprev)
    12+               //|| (pindex->nChainTx == 0 && prev_chain_tx == 0 && pindex->pprev)
    13                // Transaction counts prior to snapshot are unknown.
    14                || pindex->IsAssumedValid());
    15 
    

    Steps to reproduce the crash:

    0$ ./src/qt/bitcoin-qt -datadir=/tmp -regtest
    1> generatetodescriptor 1 raw(ff)
    

    Related to #28791 and #28562 (review)

  2. bitcoin deleted a comment on Jan 19, 2024
  3. willcl-ark commented at 7:58 pm on January 22, 2024: contributor

    $ ./src/qt/bitcoin-qt -datadir=/tmp -regtest

    Do we not consider regtest “testing”?

    ISTM that only “test mode” (regtest) will fail the assertion with the lines removed.

  4. ryanofsky commented at 8:45 pm on January 22, 2024: contributor

    Do we not consider regtest “testing”?

    The comment is vague, but it’s really referring to unit tests.

    When I suggested adding the two conditions allowing pindex->nChainTx == 0 “for testing” in #27596 (review), the reason was to avoid crashes in unit tests, because some unit tests skipped calling ReceivedBlockTransactions and therefore skipped setting a nChainTx value:

    https://github.com/bitcoin/bitcoin/blob/651fb034d85eb5db561bfd24b74f7271417defa5/src/validation.cpp#L3613

    But outside unit tests the condition pindex->nChainTx == pindex->nTx + prev_chain_tx should be true and the assert should succeed.

    So there is unexpected behavior here if bitcoin-qt is crashing without these conditions, and maybe there is a real bug. Also this code could probably be improved by updating unit tests to set nChainTx correctly, so the two special case conditions in the assert allowing pindex->nChainTx == 0 could be dropped.

  5. mzumsande commented at 0:17 am on January 23, 2024: contributor

    So there is unexpected behavior here if bitcoin-qt is crashing without these conditions

    This is not specific to bitcoin-qt, or assumeutxo. For example, It also happens with bitcoin-cli -regtest -generate 1 (after creating an empty wallet) on an empty datadir.

    I think the comment // For testing, allow transaction counts to be completely unset. is wrong. CheckBlockIndex traverses the entire block index tree, no matter if we have the block on disk or not. If we don’t have it, ReceivedBlockTransactions hasn’t been called, and both nTx and nChainTx are 0.

    Therefore, the check assert((pindex->nChainTx == pindex->nTx + prev_chain_tx) cannot be applied to headers for which we never processed the block data, and this has nothing to do with AssumeUtxo or Testing.

    I’ll open a PR with a fix tomorrow unless someone disagrees.

  6. ryanofsky commented at 0:41 am on January 23, 2024: contributor
    Oops, sorry I missed that context so my explanation doesn’t really describe what the checks are doing (although it does describe what they were intending to do). If the PR is just going to change the comments and delete the words “For testing,” that sounds good.
  7. ryanofsky commented at 1:16 am on January 23, 2024: contributor

    Other notes on improving this assert:

    • The final pindex->IsAssumedValid() check added in #28791 seems a little overbroad. I think the condition (pindex->nChainTx == pindex->nTx + prev_chain_tx) should actually be true for all assumed-valid blocks except the first one and the last one, so that check could be tightened up
    • It might possible to add a test covering the assert failure that pindex->IsAssumedValid() from #28791 fixes, since #28791 didn’t seem to include a test
  8. maflcko commented at 8:04 am on January 23, 2024: member

    $ ./src/qt/bitcoin-qt -datadir=/tmp -regtest

    Do we not consider regtest “testing”?

    Regtest should have the same properties as main for the purposes here. It should be possible to reproduce on main as well, if you want to do the POW.

  9. willcl-ark commented at 9:28 am on January 23, 2024: contributor
    Ah I see now, thanks. The repro steps threw me off there. I did now also verify that it is hit on (a custom) signet (with -checkblocks set).
  10. mzumsande commented at 10:59 pm on January 23, 2024: contributor
    • The final pindex->IsAssumedValid() check added in #28791 seems a little overbroad. I think the condition (pindex->nChainTx == pindex->nTx + prev_chain_tx) should actually be true for all assumed-valid blocks except the first one and the last one, so that check could be tightened up

    After trying this out, I think that this is not the case. The BLOCK_ASSUMED_VALID status is only removed when the block is connected to the chain and raised to BLOCK_VALID_SCRIPTS. However, nTx and nChainTx are updated from their fake to their actual values in AcceptBlock / ReceivedBlockTransactions, which happens before that. So during the period where we receive blocks for the background chain in some, possibly random, order, there will be a mix of blocks with updated nTx and fake nTx ones, so that the condition could fail everywhere in the range of assumed-valid blocks.

    • It might possible to add a test covering the assert failure that pindex->IsAssumedValid() from #28791 fixes, since #28791 didn’t seem to include a test

    For that, we’d need a chain where some blocks (at least the snapshot block, possibly also others) have additional transactions. It seems not very clean to update the regtest chainparams everytime we want to test a snapshot with a slightly different chain. Not sure if this has been discussed before, but maybe it could make sense to add a RPC that allows us to register an entry to m_assumeutxo_data dynamically (just for regtest of course).

  11. mzumsande commented at 11:31 pm on January 23, 2024: contributor
    See #29299 for a fix (only changing the doc and moving the check into a proper section).
  12. ryanofsky commented at 4:42 pm on January 24, 2024: contributor

    So during the period where we receive blocks for the background chain in some, possibly random, order, there will be a mix of blocks with updated nTx and fake nTx ones, so that the condition could fail everywhere in the range of assumed-valid blocks.

    Thanks for figuring this out and trying this. Trying to put this all together it seems like these are the cases:

    1. Case where IsValid() is true. Then nChainTx must be prev nChainTx + nTx. 2. Case where nTx is 0. Then nChainTx must be 0 because the block doesn’t have transactions yet. 3. Case where prev nChainTx is 0. Then nChainTx must be 0 because some ancestor block doesn’t have transactions yet. 4. Case where IsAssumedValid() is true. Probably the only meaningful thing to check in this case is that nChainTx is greater than prev nChainTx. This should always be true except if prev IsValid() is true and this is not the snapshot block. In that case, there will be a discontinuity and nChainTx will decrease because it is a bogus value following a real value.

    EDIT: There are a number of mistakes in the suggestion above. It would be good to check for increasing nChainTx values, but the breakdown above is not correct. The following checks do seem to work, though:

     0if (!pindex->pprev) {
     1    // If there's no previous block, nChainTx should be set to the number
     2    // of transactions in this block.
     3    assert(pindex->nChainTx == pindex->nTx);
     4} else if (pindex->IsAssumedValid()) {
     5    // If block is assumed valid, nChainTx could be faked, but should at
     6    // least be increasing between blocks. The only exception is
     7    // assumed-valid blocks that are not the snapshot block and don't
     8    // have transactions, directly following blocks that do have
     9    // transactions. The faked nChainTx values in these will probably be
    10    // less than the non-faked values in the previous blocks.
    11    assert(pindex->nChainTx > pindex->pprev->nChainTx || (!pindex->IsValid() && pindex->pprev->IsValid() && pindex != GetSnapshotBaseBlock()));
    12} else if (pindex->nTx == 0 || pindex->pprev->nChainTx == 0) {
    13    // If block is missing transactions, or any parent block is,
    14    // nChainTx should be unset.
    15    assert(pindex->nChainTx == 0);
    16} else {
    17    // For normal blocks, nChainTx should be set to the sum of
    18    // transactions in this and previous blocks.
    19    assert(pindex->nChainTx == pindex->pprev->nChainTx + pindex->nTx);
    20}
    
  13. willcl-ark added the label Validation on Jan 24, 2024
  14. maflcko added the label Docs on Jan 25, 2024
  15. maflcko added the label Tests on Jan 25, 2024
  16. glozow closed this on Jan 30, 2024

  17. maflcko commented at 12:26 pm on January 30, 2024: member

    The following checks do seem to work, though:

    Pull requests welcome

  18. fanquake referenced this in commit 7005766492 on Jan 30, 2024
  19. glozow commented at 2:29 pm on January 30, 2024: member
    Let me know if I should reopen this issue?
  20. ryanofsky commented at 5:22 pm on January 30, 2024: contributor

    Let me know if I should reopen this issue?

    It’s good to close. Extending the assert with #29261 (comment) would only be a marginal improvement. I think a better long term fix would be to get rid of fake nTx and nChainTx values as described in #29328 (comment) “In the long run…”.

  21. mzumsande commented at 6:04 pm on January 30, 2024: contributor

    // If block is assumed valid, nChainTx could be faked, but should at least be increasing between blocks.

    I’m not completely convinced this is always true - I might well be wrong (didn’t test it), but imagine the following scenario during the background sync: In the beginning, all relevant blocks are assumed valid and have fake values for nTx and nChainTx. Then we receive an out-of order block of height h (that doesn’t connect yet) -> we set nTx and nChainTx. -> Everything is ok, even though nChainTx may decrease, the checks succeeds because the block of height h+1 may have a small nChainTx, but IsValid() is false.

    But next, imagine we receive the block at height h-1 (that doesn’t connect to the tip either) with a large tx count. We’d update nChainTx for that block, but I don’t see how we would then also update the block at height h in ReceivedBlockTransactions (because it won’t be in m_blocks_unlinked if it has data). As a result the assert fail because we now have a decreasing nChainTx between two valid blocks.

  22. ryanofsky commented at 6:24 pm on January 30, 2024: contributor

    re: #29261 (comment)

    Then we receive an out-of order block of height h (that doesn’t connect yet) -> we set nTx and nChainTx.

    I was going to reply that in this case we should only set nTx not nChainTx, but this does not appear to be true. The code that is trying to check for this case:

    https://github.com/bitcoin/bitcoin/blob/cad2df24b396be403f13f372ec48ea14a90d7f06/src/validation.cpp#L3604-L3605

    just assumes that if the previous block’s nChainTx is nonzero then it is valid. So it seems you are right and in this case nChainTx will be assigned a nonsense value, and keep that value until the node is restarted and nChainTx is recomputed?

    So it seems like there is a larger bug here, and the comment there equating the condition pindexNew->pprev->HaveNumChainTxs() with “all parents are BLOCK_VALID_TRANSACTIONS” is not correct in the presence of fake values.

  23. mzumsande commented at 10:09 pm on January 30, 2024: contributor

    So it seems like there is a larger bug here

    Yes, that could be the case. Since I don’t see how this can correct without restarting after the blocks are connected to the chain and no longer assumed valid, I wonder if that could make the existing check https://github.com/bitcoin/bitcoin/blob/cad2df24b396be403f13f372ec48ea14a90d7f06/src/validation.cpp#L4949 fail. Since it is very slow to sync with -checkblockindex=1 even on signet (though commit https://github.com/bitcoin/bitcoin/pull/28339/commits/87d6155b91797d795433c9c8bca5da20c74f810c from #28339 could help with that) there is a good chance that no one has ever tried it. Alternatively, a functional test could be written.

  24. maflcko commented at 11:46 am on January 31, 2024: member

    test

    Good idea.

    On top of #29354 , the following diff should crash the node:

     0diff --git a/test/functional/feature_assumeutxo.py b/test/functional/feature_assumeutxo.py
     1index 528680f2ca..e9d74ea132 100755
     2--- a/test/functional/feature_assumeutxo.py
     3+++ b/test/functional/feature_assumeutxo.py
     4@@ -165,6 +165,14 @@ class AssumeutxoTest(BitcoinTestFramework):
     5                 self.mini_wallet.send_self_transfer(from_node=n0)
     6             self.generate(n0, nblocks=1, sync_fun=self.no_op)
     7             newblock = n0.getblock(n0.getbestblockhash(), 0)
     8+            if i == 4:
     9+                # Create a stale block
    10+                temp_invalid = n0.getbestblockhash()
    11+                n0.invalidateblock(temp_invalid)
    12+                stale_hash = self.generateblock(n0, output="raw(aaaa)", transactions=[], sync_fun=self.no_op)["hash"]
    13+                n0.invalidateblock(stale_hash)
    14+                n0.reconsiderblock(temp_invalid)
    15+                stale_block = n0.getblock(stale_hash, 0)
    16 
    17             # make n1 aware of the new header, but don't give it the block.
    18             n1.submitheader(newblock)
    19@@ -215,6 +223,12 @@ class AssumeutxoTest(BitcoinTestFramework):
    20 
    21         assert_equal(n1.getblockchaininfo()["blocks"], SNAPSHOT_BASE_HEIGHT)
    22 
    23+        self.log.info("Sumbit a stale block")
    24+        n1.submitblock(stale_block)
    25+        n1.getchaintips()
    26+        n1.getblock(stale_hash)
    27+        #assert False
    28+
    29         self.log.info("Submit a spending transaction for a snapshot chainstate coin to the mempool")
    30         # spend the coinbase output of the first block that is not available on node1
    31         spend_coin_blockhash = n1.getblockhash(START_HEIGHT + 1)
    
  25. maflcko reopened this on Jan 31, 2024

  26. ryanofsky referenced this in commit 717f3fa69e on Feb 7, 2024
  27. ryanofsky commented at 7:16 pm on February 7, 2024: contributor

    re: #29261 (comment)

    On top of #29354 , the following diff should crash the node:

    This is a great test, thank you! I added it to #29370 and it uncovered a new bug that PR introduced to the CheckBlockIndex code (forgetting to reset pindexFirstNeverProcessed after moving upwards from the snapshot node).

  28. ryanofsky commented at 8:29 pm on February 7, 2024: contributor

    re: #29261 (comment)

    So it seems like there is a larger bug here

    Looking into this more, I think the bug where pprev->HaveNumChainTxs can return true on this line even when “all parents are BLOCK_VALID_TRANSACTIONS” is false would be pretty hard to trigger and the consequences would not be too bad.

    https://github.com/bitcoin/bitcoin/blob/cad2df24b396be403f13f372ec48ea14a90d7f06/src/validation.cpp#L3604-L3605

    The main consequence would be blocks winding up with nonsense nChainTx values based on sums with previous blocks fake values, which could trigger the CheckBlockIndex failure in marco’s test #29261 (comment).

    But other than that, it shouldn’t matter because fork blocks before the snapshot block and ahead of the background chain tip will ignored by TryAddBlockIndexCandidate, so it is harmless if that is incorrectly called on them. And non-fork blocks before the snapshot block and ahead of the background chain tip will just get added to setBlockIndexCandidates instead of m_blocks_unlinked, which is fine because FindMostWorkChain should move them back to m_blocks_unlinked. As long as this happens the blocks should eventually be validated and the nChainTx values should get corrected.

    So the “or all parents are BLOCK_VALID_TRANSACTIONS” comment is incorrect, but it shouldn’t matter too much. #29370 should make this all more straightforward though.

  29. ryanofsky referenced this in commit 207a171769 on Feb 8, 2024
  30. ryanofsky referenced this in commit 50273702d7 on Feb 15, 2024
  31. ryanofsky referenced this in commit 594336ae8a on Feb 20, 2024
  32. ryanofsky referenced this in commit c69db62d5e on Feb 26, 2024
  33. ryanofsky referenced this in commit d351ea2f88 on Feb 26, 2024
  34. ryanofsky referenced this in commit 6c154da50e on Feb 29, 2024
  35. ryanofsky referenced this in commit fdabd745d1 on Mar 8, 2024
  36. ryanofsky referenced this in commit 95474d9f3b on Mar 8, 2024
  37. ryanofsky referenced this in commit 85233b2032 on Mar 8, 2024
  38. ryanofsky referenced this in commit 7553a8e062 on Mar 11, 2024
  39. ryanofsky referenced this in commit 0a14ef8d94 on Mar 12, 2024
  40. ryanofsky referenced this in commit 3596837882 on Mar 12, 2024
  41. ryanofsky referenced this in commit 66fb410c4b on Mar 13, 2024
  42. ryanofsky referenced this in commit 83f7cdc8c2 on Mar 13, 2024
  43. ryanofsky referenced this in commit 0391458d76 on Mar 18, 2024
  44. ryanofsky referenced this in commit e5d800d9d5 on Mar 18, 2024
  45. maflcko closed this on Mar 20, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me