refactor: Improve assumeutxo state representation #30214

pull ryanofsky wants to merge 16 commits into bitcoin:master from ryanofsky:pr/cstate changing 38 files +656 −721
  1. ryanofsky commented at 2:53 pm on May 31, 2024: contributor

    This PR contains the first part of #28608, which tries to make assumeutxo code more maintainable, and improve it by not locking cs_main for a long time when the snapshot block is connected, and by deleting the snapshot validation chainstate when it is no longer used, instead of waiting until the next restart.

    The changes in this PR are just refactoring. They make Chainstate objects self-contained, so for example, it is possible to determine what blocks to connect to a chainstate without querying ChainstateManager, and to determine whether a Chainstate is validated without basing it on inferences like &cs != &ActiveChainstate() or GetAll().size() == 1.

    The PR also tries to make assumeutxo terminology less confusing, using “current chainstate” to refer to the chainstate targeting the current network tip, and “historical chainstate” to refer to the chainstate downloading old blocks and validating the assumeutxo snapshot. It removes uses of the terms “active chainstate,” “usable chainstate,” “disabled chainstate,” “ibd chainstate,” and “snapshot chainstate” which are confusing for various reasons.

  2. DrahtBot commented at 2:53 pm on May 31, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK TheCharlatan
    Approach ACK fjahr

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30610 (validation: do not wipe utxo cache for stats/scans/snapshots by sipa)
    • #30598 (assumeutxo: Drop block height from metadata by fjahr)
    • #30342 (kernel, logging: Pass Logger instances to kernel objects by ryanofsky)
    • #30221 (wallet: Ensure best block matches wallet scan state by achow101)
    • #29680 (wallet: fix unrelated parent conflict doesn’t cause child tx to be marked as conflict by Eunovo)
    • #29652 (wallet: Avoid potentially writing incorrect best block locator by ryanofsky)
    • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)
    • #28616 (Show transactions as not fully confirmed during background validation by Sjors)
    • #26593 (tracing: Only prepare tracepoint arguments when actually tracing by 0xB10C)
    • #26022 (Add util::ResultPtr class by ryanofsky)
    • #25722 (refactor: Use util::Result class for wallet loading by ryanofsky)
    • #25665 (refactor: Add util::Result failure values, multiple error and warning messages by ryanofsky)
    • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Refactoring on May 31, 2024
  4. DrahtBot added the label CI failed on May 31, 2024
  5. DrahtBot commented at 7:56 pm on May 31, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/25656143496

  6. ryanofsky force-pushed on Jun 3, 2024
  7. DrahtBot removed the label CI failed on Jun 4, 2024
  8. fjahr commented at 9:21 am on June 8, 2024: contributor
    Concept ACK
  9. DrahtBot added the label Needs rebase on Jun 10, 2024
  10. ryanofsky force-pushed on Jun 17, 2024
  11. DrahtBot removed the label Needs rebase on Jun 17, 2024
  12. DrahtBot added the label Needs rebase on Jun 17, 2024
  13. ryanofsky force-pushed on Jun 17, 2024
  14. DrahtBot removed the label Needs rebase on Jun 17, 2024
  15. DrahtBot added the label Needs rebase on Jun 25, 2024
  16. in src/validation.h:556 in cdc44e878a outdated
    551+    {
    552+        return m_validity;
    553+    }
    554+
    555+    //! Return true if chainstate fully validated or is assumed valid.
    556+    bool IsValid() const EXCLUSIVE_LOCKS_REQUIRED(::cs_main)
    


    fjahr commented at 9:49 am on July 15, 2024:
    nit: In other places similar checks are inlined without a helper method. That might be actually a bit more clear as well.

    ryanofsky commented at 5:43 pm on July 18, 2024:

    re: #30214 (review)

    nit: In other places similar checks are inlined without a helper method. That might be actually a bit more clear as well.

    Good point, at least at first glance this looks reasonable to inline.


    ryanofsky commented at 9:48 pm on July 19, 2024:

    re: #30214 (review)

    nit: In other places similar checks are inlined without a helper method. That might be actually a bit more clear as well.

    You’re right, that is clearer. Updated different places where this was used.

  17. in src/test/validation_chainstatemanager_tests.cpp:32 in ce02d68008 outdated
    28@@ -29,6 +29,17 @@ using node::BlockManager;
    29 using node::KernelNotifications;
    30 using node::SnapshotMetadata;
    31 
    32+static std::vector<Chainstate*> GetAll(const ChainstateManager& chainman)
    


    fjahr commented at 10:13 am on July 15, 2024:
    Why do you add this function around here instead of using m_chainstates as well?

    ryanofsky commented at 5:43 pm on July 18, 2024:

    Why do you add this function around here instead of using m_chainstates as well?

    I need to double-check but IIRC, the issue is that m_chainstates requires a lock to access, and it would require larger changes and more verbosity in the tests if they had to be changed to access m_chainstates directly. Will follow up and add a comment describing purpose of this function.


    ryanofsky commented at 9:47 pm on July 19, 2024:

    re: #30214 (review)

    Why do you add this function around here instead of using m_chainstates as well?

    Dropped this function now, it was not hard to get rid of.

  18. in src/validation.h:579 in 493d3844cf outdated
    551     //! @sa ChainstateRole
    552-    ChainstateRole GetRole() const EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    553+    kernel::ChainstateRole GetRole() const EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    554+
    555+    //! Return whether chain is fully validated, assumed valid, or invalid.
    556+    ChainValidity Validity() const EXCLUSIVE_LOCKS_REQUIRED(::cs_main)
    


    fjahr commented at 10:51 am on July 15, 2024:
    Not sure why we need this when in another commit we remove GetAll() and instead use m_chainstates. Seems kind of inconsistent but maybe I am missing something.

    ryanofsky commented at 5:43 pm on July 18, 2024:

    re: #30214 (review)

    Not sure why we need this when in another commit we remove GetAll() and instead use m_chainstates. Seems kind of inconsistent but maybe I am missing something.

    Is this comment about the Validity() method? This is just supposed to be a public accessor for code that cannot access the private m_validity member/

  19. fjahr commented at 11:11 am on July 15, 2024: contributor

    Approach ACK

    I think pretty much everything here is a worth while improvement and should set up the following improvements well. I am not so deep into the current state of the kernel work though, so I can not comment on that part, for example the newly introduced typing file. I guess @TheCharlatan could give his nod to that here.

    It removes uses of the terms “active chainstate,” “usable chainstate,” “disabled chainstate,” “ibd chainstate,” and “snapshot chainstate” which are confusing for various reasons.

    I think this is a bit confusing because it sounds like you are getting rid of this completely but it’s actually just avoided where you touch the code, right? Cleaning the terms up everywhere can be a follow-up, I would like to try to rewrite the design doc a bit more anyway. But I think (selfishly) would like to break it up into design and usage docs because IMO that makes it way more manageable, i.e. the last commit of #29553.

  20. ryanofsky commented at 6:41 pm on July 18, 2024: contributor

    re: #30214#pullrequestreview-2177291599

    Thanks for reviewing! Will rebase and try to address comments. I’m especially interested if you have ideas on splitting up documentation and code changes into separate PRs.

    I am not so deep into the current state of the kernel work though, so I can not comment on that part, for example the newly introduced typing file. I guess @TheCharlatan could give his nod to that here.

    Sure, for background kernel/types.h is analogous to node/types.h, wallet/types.h, and common/types.h. The types files are used when libraries want to define shared struct and enum types that can be used by code NOT directly depending on the library. For example the gui code uses the wallet::isminetype enum type, but the gui should not depend on the wallet library, so the enum is defined in a shared header wallet/types.h. In this PR, wallet code uses the kernel::ChainstateRole struct type, but the wallet code should not depend on the kernel library, so the struct is defined in a shared header kernel/types.h. This pattern started with the wallet library, but has been useful for node and common libraries as well, so it is natural to extend to the kernel.

    A more stricter separation between libraries would be possible with a more complicated structure, but having shared types files has been a reasonable way to separate the libraries while having a 1 library = 1 directory = 1 namespace structure.

    It removes uses of the terms “active chainstate,” “usable chainstate,” “disabled chainstate,” “ibd chainstate,” and “snapshot chainstate” which are confusing for various reasons.

    I think this is a bit confusing because it sounds like you are getting rid of this completely but it’s actually just avoided where you touch the code, right? Cleaning the terms up everywhere can be a follow-up, I would like to try to rewrite the design doc a bit more anyway. But I think (selfishly) would like to break it up into design and usage docs because IMO that makes it way more manageable, i.e. the last commit of #29553.

    Yeah this description was written aspirationally before I implemented this PR and I did not get around to removing all the old language. I’d be interested if you have specific ideas on how to break this PR up better. I definitely think it could make sense to move documentation / comment changes to a separate PR, and I hope #29553 can be reviewed more and merged soon, before this PR. (This is a good reminder for me to re-ack #29553)

  21. ryanofsky force-pushed on Jul 19, 2024
  22. ryanofsky commented at 9:55 pm on July 19, 2024: contributor
    Rebased 493d3844cfc8628fd63184fa7a657126d9b5dc95 -> 2c1d8b426fa1832fa464c12f3cae7e2976584430 (pr/cstate.4 -> pr/cstate.5, compare) due to conflicts with various prs including #30267, #30388, #30395, #29996, #30406, #30320. Also made suggested simplifications. Rebased 2c1d8b426fa1832fa464c12f3cae7e2976584430 -> 4c995842e3689974364c8c82ee220bfaa30f7f79 (pr/cstate.5 -> pr/cstate.6, compare) due to conflict with #30111 Rebased 4c995842e3689974364c8c82ee220bfaa30f7f79 -> 7fd7960389254c7a2ba50614e1cbdf8911ef6a7d (pr/cstate.6 -> pr/cstate.7, compare) due to conflict with #29656
  23. DrahtBot removed the label Needs rebase on Jul 19, 2024
  24. DrahtBot added the label Needs rebase on Jul 24, 2024
  25. ryanofsky force-pushed on Jul 24, 2024
  26. DrahtBot removed the label Needs rebase on Jul 24, 2024
  27. DrahtBot added the label Needs rebase on Aug 5, 2024
  28. test: Fix broken chainstatemanager_snapshot_init check
    The following test code never checked anything because the if statement was
    always false:
    
        if (cs != &chainman_restarted.ActiveChainstate()) {
            BOOST_CHECK_EQUAL(cs->m_chain.Height(), 109);
        }
    
    Also, the height of the background chainstate it was intending to check is 110,
    not 109. Fix both problems by rewriting the check.
    90e55a8714
  29. refactor: Add Chainstate::m_target_blockhash member
    Make Chainstate objects aware of what block they are targeting. This makes
    Chainstate objects more self contained, so it's possible to determine what
    blocks should be downloaded and connected to them without needing to query
    ChainstateManager or look at other Chainstate objects.
    
    The motivation for this change is to make code more readable, so it only
    requires knowing about Chainstate targets, not reasoning about the assumeutxo
    snapshot validation sequence. This change also enables simplifications to the
    ChainstateManager interface in subsequent commits, and could make it easier to
    implement new features like creating new Chainstate objects to generate UTXO
    snapshots or index UTXO data.
    ccfeb047b3
  30. refactor: Add Chainstate m_validity and m_target_blockhash members
    Get rid of m_disabled/IsUsable members. Instead of marking chains disabled for
    different reasons, record chainstate validity and validation progress
    explicitly and use that information directly.
    a35a2a0588
  31. refactor: Deduplicate Chainstate activation code
    Move duplicate code from ChainstateManager::ActivateSnapshot and
    ChainstateManager::ActivateExistingSnapshot methods to a new
    ChainstateManager::AddChainstate method.
    
    The "AddChainstate" method name doesn't mention snapshots even though it is
    only used to add snapshot chainstates now, because it becomes more generalized
    in a later commit in this PR ("refactor: Add ChainstateManager::m_chainstates
    member")
    b10a9a41aa
  32. refactor: Pass chainstate parameters to MaybeCompleteSnapshotValidation
    Remove hardcoded references to m_ibd_chainstate and m_snapshot_chainstate so
    MaybeCompleteSnapshotValidation function can be simpler and focus on validating
    the snapshot without dealing with internal ChainstateManager states.
    
    This is a step towards being able to validate the snapshot outside of
    ActivateBestChain loop so cs_main is not locked for minutes when the snapshot
    block is connected.
    5f60910d8d
  33. refactor: Add Chainstate::StoragePath() method
    Use to simplify code determining the chainstate leveldb paths.
    f36304ed8a
  34. refactor: Add ChainstateManager::CurrentChainstate() method
    CurrentChainstate() is basically the same as ActiveChainstate() except it
    requires cs_main to be locked when it is called, instead of locking cs_main
    internally.
    
    The name "current" should also be less confusing than "active" because multiple
    chainstates can be active, and CurrentChainstate() returns the chainstate
    targeting the current network tip, regardless of what chainstates are being
    downloaded or how they are used.
    c95d4917da
  35. refactor: Add ChainstateManager::ValidatedChainstate() method
    ValidatedChainstate() accessor replaces GetChainstateForIndexing() with no
    change in behavior.
    e65271cd86
  36. refactor: Convert ChainstateRole enum to struct
    Change ChainstateRole parameter passed to wallets and indexes. Wallets and
    indexes need to know whether chainstate is historical and whether it is fully
    validated. They should not be aware of the assumeutxo snapshot validation
    process.
    543fbc3893
  37. refactor: Delete ChainstateManager::IsSnapshotActive() method
    IsSnapshotActive() method is only called one place outside of tests and
    asserts, and is confusing because it returns true even after the snapshot is
    fully validated.
    
    The documentation which said this "implies that a background validation
    chainstate is also in use" is also incorrect, because after the snapshot is
    validated, the background chainstate gets disabled and IsUsable() would return
    false.
    1647e6142b
  38. refactor: Delete ChainstateManager::IsSnapshotValidated() method
    IsSnapshotValidated() is only called one place outside of tests, and is use
    redundantly in some tests, asserting that a snapshot is not validated when a
    snapshot chainstate does not even exist. Simplify by dropping the method and
    checking Chainstate validity field directly.
    6db2c4355d
  39. refactor: Delete ChainstateManager::SnapshotBlockhash() method
    SnapshotBlockhash() is only called two places outside of tests, and is used
    redundantly in some tests, checking the same field as other checks. Simplify by
    dropping the method and using the m_from_snapshot_blockhash field directly.
    e5bee04e6c
  40. refactor: Add ChainstateManager::m_chainstates member
    Use to replace m_active_chainstate, m_ibd_chainstate, and m_snapshot_chainstate
    members.
    7eb46726bf
  41. refactor: Add ChainstateManager::ActivateBestChains() method
    Deduplicate code looping over chainstate objects and calling
    ActivateBestChain().
    c4ab2c6cbd
  42. refactor: Delete ChainstateManager::GetAll() method
    Just use m_chainstates array instead.
    136c5f784c
  43. doc: Improve ChainstateManager documentation, use consistent terms 7fd7960389
  44. ryanofsky force-pushed on Aug 7, 2024
  45. DrahtBot removed the label Needs rebase on Aug 7, 2024
  46. DrahtBot added the label Needs rebase on Aug 9, 2024
  47. DrahtBot commented at 10:19 pm on August 9, 2024: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  48. fjahr commented at 8:20 pm on August 25, 2024: contributor

    Thanks for addressing my feedback @ryanofsky ! I hope we can merge #29553 soon and then I will get back to this with another full review. But it does look very good to me already.

    I’m especially interested if you have ideas on splitting up documentation and code changes into separate PRs.

    I’m unsure if it’s really needed because I find clean refactoring commits like those here can be reviewed quicker than other changes, so I would be fine to keep it as is just based on volume of the change. But I did think about it and if that is feasible maybe splitting along the lines of everything that touches ChainstateRole and the rest (in whatever order you think works better) could be an idea. If you feel like splitting it that’s fine for me but I think I would prefer to keep it as one PR as I have already passed everything so I am biased towards fewer fundamental changes ;)

    Ah, and since the first is a fix you could just open that as a separate PR now. That should be an easy review and merge that doesn’t conflict with anything else. The docs changes it feels like it’s better to keep them along the code since you need to get your head into the topic either way. I often feel like reviewing docs and code side by side challenges your understanding and might lead to higher quality review.

  49. Theghost256 approved
  50. TheCharlatan commented at 9:09 pm on October 9, 2024: contributor
    Concept ACK

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me