Update
Draft overall changeset published in #20158, closing this in favor of discussing in #20158.
Prelude
From my reading of past conversations and from a few offline chats, it seems that modularizing our consensus engine is a worthwhile first step towards a more complete isolation of said engine from non-consensus code.
Modularizing our consensus engine means that:
- We get clearer visibility into what currently lies in consensus codepaths and what depends on our consensus engine
- We can coalesce duplicate consensus initialization codepaths, mitigating against bugs that arise out of test/non-test initialization inconsistencies
De-globalizing g_chainman
In order to modularize our consensus engine, we need to first de-globalize the global ChainstateManager
– namely g_chainman
– as it and its dependencies are what makes up the bulk of our consensus engine. A few direct references to g_chainman
have already been removed in #19927, however, its indirect uses (mainly via ::Chain(state|)Active()
) are numerous in our codebase and often used to cheat avoid obtaining a ChainstateManager
reference.
In a series of PRs, I plan to remove the g_chainman
global and ensure that codepaths which depend on our consensus engine have a non-global reference to the object they require (be it ChainstateManger
, CChainState
, BlockManager
, or something more specific).
Order of commits
After having reorganized my commits a few times, I’ve realized that the ordering of commits can drastically influence both the review complexity of each diff and the number of commits needed in total to achieve the same thing. My current approach is basically to “continuously trim the leaves of the g_chainman call graph”.
Take, as an example, the following call graph where all of these functions either directly or indirectly reference g_chainman
/::Chain(state|)Active()
:
I can start with Qux
, and do the following:
- Does it need all of
ChainstateManger
? Or does it just need aCChainState
? Or just aBlockManager
? Or just theChainstateManager::ActiveTip()
? - If it already has the reference that it needs (as a member of the class it belongs to, as a function parameter, etc.) -> Use it
- If we’re dealing with a function which looks like it should be a member of a class that would have that reference -> Move the function to that class
- Otherwise, we need to pass the reference to
Qux
.
When we have to invoke the last two options, we propagate the dependency on g_chainman
/::Chain(state|)Active()
up to Wibble
and Wobble
. In the case of (3), it’ll look something like ::ChainstateActive().Qux()
; whereas in the case of (4), it’ll look something like Qux(::ChainstateActive())
.
This mean that, for some functions (LookupBlockIndex
being the main culprit), a resolution of their dependency on the g_chainman
/::Chain(state|)Active()
globals results in an increase in apparent references to g_chainman
/::Chain(state|)Active()
. However, it is important to note that these are not new dependencies by any means, those dependency were always there – they was just hidden by Qux
’s use of the globals. Furthermore, all of these “new” apparent references to g_chainman
/::Chain(state|)Active()
will be dealt with as we prune the call graph upward.
Current status
Right now, I have a working branch https://github.com/bitcoin/bitcoin/compare/master...dongcarl:2020-09-libbitcoinruntime-v5, which resolves of all the g_chainman
/::Chain(state|)Active()
global calls in validation.{cpp,h}
. I noticed that for resolving these calls in validation, there are 3 somewhat distinct bundles that have a natural ordering:
- A bundle of functions that are related to
::LookupBlockIndex
in the call graph (see: #20050) - A bundle of functions that are mempool-related
- This has a dependency on bundle (1) as
MemPoolAccept::PreChecks
calls::GetSpendHeight
, which is in bundle (1) as it callsLookupBlockIndex
- This has a dependency on bundle (1) as
- A bundle of functions which are not related to either
- This has a dependency on bundle (2) as
CChainState::{ActivateBestChainStep,InvalidateBlock}
both call::UpdateMempoolForReorg
which is in bundle (2)
- This has a dependency on bundle (2) as
I believe this bundling naturally splits up the validation cleanup into 3 pull requests, each touching around 12 functions.
I’m still organizing the rest the non-validation commits and will be pushing that up when it’s ready.
Note to reviewers: At this stage I would very much like to get a sanity check on my approach to make sure that there isn’t a better way to organize my commits that I’ve missed.