index: Commit MuHash and best block together for coinstatsindex #24138

pull mzumsande wants to merge 2 commits into bitcoin:master from mzumsande:202201_coinstatsindex_commits changing 6 files +83 −12
  1. mzumsande commented at 2:10 pm on January 24, 2022: member

    Fixes #24076

    Coinstatsindex currently writes the MuHash (DB_MUHASH) to disk in CoinStatsIndex::WriteBlock() and CoinStatsIndex::ReverseBlock(), but the best synced block is written in BaseIndex::Commit(). These are called at different points in time, both during the ThreadSync phase, and also after the initial sync is finished and validation callbacks (BlockConnected() vs ChainStateFlushed()) perform the syncing.

    As a result, the index DB is temporarily in an inconsistent state, and if bitcoind is terminated uncleanly (so that there is no time to call Commit() by receiving an interrupt or by flushing the chainstate) this leads to problems: On the next startup, Init() will read the best block and a MuHash that corresponds to a different (higher) block. Indexing will be picked up at the the best block processing some blocks again, but since MuHash is a rolling hash, it will process some utxos twice and the muhashes for all future blocks will be wrong, as was observed in #24076.

    Fix this by always committing DB_MUHASH together with DB_BEST_BLOCK.

    Note that the block data for the index is still written at different times, but this does not corrupt the index - at worst, these entries will be processed another time and overwritten after an unclean shutdown and restart.

  2. mzumsande commented at 2:10 pm on January 24, 2022: member
    FYI @fjahr
  3. mzumsande force-pushed on Jan 24, 2022
  4. DrahtBot added the label UTXO Db and Indexes on Jan 24, 2022
  5. DrahtBot commented at 0:58 am on January 26, 2022: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)
    • #24150 (refactor: move index class members from protected to private by jonatack)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  6. in src/index/coinstatsindex.cpp:234 in 010156c07f outdated
    227@@ -228,10 +228,7 @@ bool CoinStatsIndex::WriteBlock(const CBlock& block, const CBlockIndex* pindex)
    228     m_muhash.Finalize(out);
    229     value.second.muhash = out;
    230 
    231-    CDBBatch batch(*m_db);
    232-    batch.Write(DBHeightKey(pindex->nHeight), value);
    233-    batch.Write(DB_MUHASH, m_muhash);
    234-    return m_db->WriteBatch(batch);
    


    luke-jr commented at 8:59 pm on January 29, 2022:
    I don’t see why this should change. The CommitInternal override feels like a hacky refactor.

    mzumsande commented at 9:41 pm on January 29, 2022:

    I think this is spot is problematic, and the reason for the behavior reported in #24076:

    If we persist DB_MUHASH here in each BlockConnected() callback, but not DB_BEST_BLOCK (which is persisted only in ChainStateFlushed() callbacks and will therefore at this point in time have a lower height), the db is temporarily in an inconsistent state.

    So in case of an unclean shutdown (with no time to flush the chainstate to disk to get in sync again), on next startup the index will initialize with the older DB_BEST_BLOCK but a DB_MUHASH which contains data from newer blocks. The index then reprocesses the blocks after DB_BEST_BLOCK’s height a second time. But MuHash is a rolling hash, so it will incorporate data from these blocks twice, and the hash will be incorrect.


    fjahr commented at 8:30 pm on February 12, 2022:

    I think it was not considered hacky by the author of the current index architecture because it’s done the same way in blockfilterindex.cpp and the documentation in index/base.h also states that this method can be overridden.

    Obviously, I did it differently here but I can’t remember if that was a deliberate choice or rather just lack of understanding (probably the latter).

    And that doesn’t mean there isn’t mayor room for refactoring here. But it looks like #24230 should be considered first when thinking about it first. I have not started reviewing it yet.


    ryanofsky commented at 8:17 pm on February 17, 2022:

    In commit “index: Commit DB_MUHASH and DB_BEST_BLOCK to disk together” (010156c07f28e2c8dc30d4408942d1e1d88ab54d)

    The CommitInternal override feels like a hacky refactor.

    I think I agree with luke if the point is that the override-and-call-base class code in CommitInternal is a little hacky, but fjahr’s right this just the way the base index class was designed, so the override seems like appropriate way to handle this now. #24230 does clean up all the overrides and will simplify this a bit.


    mzumsande commented at 4:21 pm on February 21, 2022:

    Just to make this clear: This is not a refactor, it changes behavior (the points in time when DB_MUHASH is written to disk).

    My understanding is that the indexes work in general such that that all persisted data describing the current progress of indexing (such as the best block) must be in sync with the persisted chainstate information (which is sometimes not flushed for several blocks received) - that is why ChainStateFlushed() callbacks are used. In case of an unclean shutdown, the chainstate will be constructed from this point on, so the indexes must as well.

    In contrast to this is the actual indexed data for each block - I think we could in principle cache it and also write it to disk with each ChainStateFlushed() callback and not use BlockConnected() at all.

    But it is also possible to write it to disk in advance with each BlockConnected() callback, as it is done: This is dependent on the assumption that we will not read (and possibly overwrite) the indexed data in case it is ahead of the flushed chainstate data (and the best block), and also that before flushing the chainstate, we have called BlockConnected() for each block.

    With this in mind, I don’t see a better way of doing this other than overriding CommitInternal.

  7. in src/index/coinstatsindex.cpp:492 in 010156c07f outdated
    485@@ -481,5 +486,5 @@ bool CoinStatsIndex::ReverseBlock(const CBlock& block, const CBlockIndex* pindex
    486     Assert(m_total_unspendables_scripts == read_out.second.total_unspendables_scripts);
    487     Assert(m_total_unspendables_unclaimed_rewards == read_out.second.total_unspendables_unclaimed_rewards);
    488 
    489-    return m_db->Write(DB_MUHASH, m_muhash);
    


    luke-jr commented at 9:00 pm on January 29, 2022:
    Seems like the only problem is this lone line?

    Sjors commented at 12:46 pm on February 8, 2022:
    If I replace only that line with true (or false for that matter) the reorg test in feature_coinstatsindex.py still passes. But not sure if that helps with the original issue (and it indicates an obvious test coverage omission, cc @fjahr).

    fjahr commented at 7:44 pm on February 12, 2022:
    Only replacing this line means the rolling muhash state is not persisted during reorgs. That would be a problem if the node gets shut down during the reorg, then we would risk an inconsistent state or probably the index wouldn’t even start up anymore because the guard check in Init() would fail.

    Sjors commented at 1:33 pm on February 18, 2022:

    if the node gets shut down during the reorg,

    I guess this is very non trivial to test, but maybe add a comment?


    mzumsande commented at 4:43 pm on February 21, 2022:
    I added a comment at the end of CoinStatsIndex::WriteBlock (as suggested by ryanofsky below). I think that duplicating this comment in CoinStatsIndex::ReverseBlock is not required, since this function doesn’t write anything else to disk there wouldn’t really be the expectation that for it to write DB_MUHASH here.
  8. luke-jr changes_requested
  9. Sjors commented at 12:43 pm on February 8, 2022: member

    Concept ACK

    It might be worth rebasing this on #24133 and if at all possible, add a test that triggers the error added there.

    I think overriding CommitInternal makes sense given that we require DB_MUHASH and DB_BEST_BLOCK to be in sync (which we explicitly check as of #24133).

    The alternative would be to drop DB_MUHASH altogether, but since MuHash can’t (?) be initialized from a finalized hash, we would have to index both the denominator and numerator.

  10. fjahr commented at 0:14 am on February 13, 2022: member

    Code review ACK 010156c07f28e2c8dc30d4408942d1e1d88ab54d

    While I can not reproduce the issue I agree that this is likely to fix #24076.

    I think there could still be inconsistencies between the running muhash and the coinstats data since those are not batched anymore now but at least that would then lead to a failure in Init() at restart. This could be avoided by squashing all of that into CommitInternal() as well (drafted here) but I don’t think this is necessary right now. As I have reviewed this I am now really not sure about the boundaries between Commit() and WriteBlock() anymore and will think about refactoring that while reviewing #24230 (maybe that’s already solved there).

  11. in src/index/coinstatsindex.cpp:231 in 010156c07f outdated
    227@@ -228,10 +228,7 @@ bool CoinStatsIndex::WriteBlock(const CBlock& block, const CBlockIndex* pindex)
    228     m_muhash.Finalize(out);
    229     value.second.muhash = out;
    230 
    231-    CDBBatch batch(*m_db);
    


    ryanofsky commented at 8:21 pm on February 17, 2022:

    In commit “index: Commit DB_MUHASH and DB_BEST_BLOCK to disk together” (010156c07f28e2c8dc30d4408942d1e1d88ab54d)

    I might add a code comment here like “Intentionally do not update DB_MUHASH here so it stays in sync with DB_BEST_BLOCK, and index is not corrupted if there is an unclean shutdown”


    mzumsande commented at 4:40 pm on February 21, 2022:
    Thanks, I added a comment here.
  12. ryanofsky approved
  13. ryanofsky commented at 8:53 pm on February 17, 2022: member

    Code review ACK 010156c07f28e2c8dc30d4408942d1e1d88ab54d

    It would be good to add a unit test to coinstatsindex_tests that triggers the issue this is supposed to fix. I think the test would just need to make the index object, send a BlockConnected notification, and then destroy the index object without sending ChainStateFlushed notification. This would save a corrupted index that should result in errors next time it is loaded.

    re: #24138#pullrequestreview-880885450

    I think there could still be inconsistencies between the running muhash and the coinstats data since those are not batched anymore now but at least that would then lead to a failure in Init() at restart.

    I could be wrong but this seems fine to me since coinstats data is accessed by height and written before the muhash & best block. So if the coinstats data is ahead of the muhash. it is just ignored. In your suggested followup a90394efb9b348b7d3615d61fbea612470b1c75c, I think the if (!m_db->Read(DBHeightKey(pindex->nHeight)) check would be good to add, but I think it should it should just trigger a fatal error if the key isn’t found. I don’t see how the recovery code added in that commit would work.

    #24230 (maybe that’s already solved there).

    My guess is #24230 doesn’t affect this, since it’s trying to leave index’s internal behavior the same, and just change the way it receives block connected and disconnected notifications, without changing the way it handles them.

  14. index: Commit DB_MUHASH and DB_BEST_BLOCK to disk together
    If these are written to disk at different times,
    unclean shutdowns can lead to index corruption.
    eb6cc05da3
  15. mzumsande force-pushed on Feb 21, 2022
  16. mzumsande commented at 4:51 pm on February 21, 2022: member

    I pushed an update, rebasing (with #24133 merged) and addressing feedback (adding a comment).

    It would be good to add a unit test to coinstatsindex_tests that triggers the issue this is supposed to fix. I think the test would just need to make the index object, send a BlockConnected notification, and then destroy the index object without sending ChainStateFlushed notification. This would save a corrupted index that should result in errors next time it is loaded.

    This would be great. I haven’t done this with this push, because a) I couldn’t make it work yet, I had some technical difficulties with the test framework when restarting the index. I will look into it more unless b) @fjahr who said here that he is working on this has more success.

  17. ryanofsky approved
  18. ryanofsky commented at 7:50 pm on February 22, 2022: member
    Code review ACK eb6cc05da32c5bde122725a0bc907d3767a791cd. Only changes since previous review are rebasing and adding comments. @mzumsande I wrote a little test which passes with this bugfix, and fails with the fix reverted. Feel free to add it to this PR or use the code in any way: ded9fda2b6e154ac64b2fc755ddaa738b309b7e9 d3c2ebbfdc3c61a04b645c6bb435424fa60f4a0b
  19. Add coinstatsindex_unclean_shutdown test 691d45fdc8
  20. mzumsande commented at 2:02 pm on March 1, 2022: member

    @mzumsande I wrote a little test which passes with this bugfix, and fails with the fix reverted. Feel free to add it to this PR or use the code in any way: ded9fda d3c2ebb

    Thank you! The test looks great, I added your commit to this PR (and confirmed that it fails on master).

  21. ryanofsky approved
  22. ryanofsky commented at 9:38 pm on March 2, 2022: member
    Code review ACK 691d45fdc83ec14f87a400f548553168ac70263f. Only change since last review is adding test
  23. in src/test/coinstatsindex_tests.cpp:123 in 691d45fdc8
    118+    }
    119+
    120+    {
    121+        CoinStatsIndex index{1 << 20};
    122+        // Make sure the index can be loaded.
    123+        BOOST_REQUIRE(index.Start(chainstate));
    


    fjahr commented at 1:47 pm on March 6, 2022:

    in 691d45fdc83ec14f87a400f548553168ac70263f:

    Not sure why we need this whole block with the new index instead of reusing the old index above. It doesn’t seem to make a difference in my testing?


    MarcoFalke commented at 10:49 am on March 9, 2022:
    Is there a huge cost to spinning up the second index? If not, I think both versions are fine.
  24. fjahr commented at 2:09 pm on March 6, 2022: member

    ACK 691d45fdc83ec14f87a400f548553168ac70263f

    Reviewed code and ensured that the newly added test fails when eb6cc05da32c5bde122725a0bc907d3767a791cd is reverted.

  25. MarcoFalke merged this on Mar 9, 2022
  26. MarcoFalke closed this on Mar 9, 2022

  27. sidhujag referenced this in commit fea9761a68 on Mar 9, 2022
  28. DrahtBot locked this on Mar 9, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-03 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me