Since IsInitialBlockDownload latches to false only once the tip is sufficiently advanced, there is no need to check the tip every time it is called.
By caching this in advance we can avoid extra work and, more importantly, a lock.
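The idea can be sketched as follows. This is a minimal illustration, not the PR's code: `IbdTracker`, `OnTipChanged`, and the integer work/time values are hypothetical stand-ins, and the only assumption taken from the description is that the flag latches once and is then read without a lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the node's chain state; not Bitcoin Core code.
class IbdTracker {
public:
    IbdTracker(int64_t min_work, int64_t max_tip_age)
        : m_min_work{min_work}, m_max_tip_age{max_tip_age} {}

    // Writer side: called whenever the tip changes, while holding the lock.
    void OnTipChanged(int64_t tip_work, int64_t tip_time, int64_t now)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        m_tip_work = tip_work;
        m_tip_time = tip_time;
        if (m_finished_ibd.load(std::memory_order_relaxed)) return; // already latched
        if (m_tip_work < m_min_work) return;                        // not enough work yet
        if (m_tip_time < now - m_max_tip_age) return;               // tip is too old
        m_finished_ibd.store(true, std::memory_order_relaxed);      // latch; never reset
    }

    // Reader side: a single atomic load, no lock taken.
    bool IsInitialBlockDownload() const
    {
        return !m_finished_ibd.load(std::memory_order_relaxed);
    }

private:
    const int64_t m_min_work;
    const int64_t m_max_tip_age;
    std::mutex m_mutex;
    int64_t m_tip_work{0};
    int64_t m_tip_time{0};
    std::atomic<bool> m_finished_ibd{false};
};
```

The cost moves to the writer, which already holds the lock when the tip changes, while readers pay only for the atomic load.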
🚧 At least one of the CI tasks failed.
<sub>Task tidy: https://github.com/bitcoin/bitcoin/runs/45440913265</sub>
<sub>LLM reason (✨ experimental): The CI failure is caused by compilation errors due to missing mutex lock assertions in validation.cpp.</sub>
Conceptually not a bad idea to cache and lock less, but imo this makes the code more brittle (and harder to understand), e.g. if any tip updates happen without the cache being updated separately.
Do you have any data as to the actual performance improvements from this PR?
I think something like this, if properly implemented (I haven't thought much about the code yet), would reduce the GUI freezes during IBD in a noticeable manner.
> Conceptually not a bad idea to cache and lock less, but imo this makes the code more brittle (and harder to understand), e.g. if any tip updates happen without the cache being updated separately.
> Do you have any data as to the actual performance improvements from this PR?
I'm (very) open to suggestions on how to make the caching call more robust. (Indeed I expected some.)
There's no performance improvement from this PR, it's the first in a series of proposed changes I'll be making to remove locking where it's not necessary, with the end goal being some form of concurrency being possible in message processing.
> I think something like this, if properly implemented (I haven't thought much about the code yet), would reduce the GUI freezes during IBD in a noticeable manner.
I hadn't even considered that, but certainly that's a possible direct improvement.
@pstratem, not sure if you saw this, but could be helpful: https://github.com/bitcoin/bitcoin/pull/25081
Tight polling of is_ibd seems like a mistake in the first place, so I am not sure this is something to optimize for.
Looking at the remaining call sites of the ibd check, most have cs_main already, so they won't be affected by this? The remaining ones (I only found MaybeSendFeefilter), if they are relevant, could either re-order their code to call it less often, or cache the bool themselves? For the gui, see also https://github.com/bitcoin/bitcoin/issues/17145
Concept ACK, but I'm not convinced this implementation is safe as-is. If we want to maintain the current behaviour, it's not sufficient to update only when the tip changes. We also need to re-check when importing/reindexing completes, and schedule an update timer if max_tip_age is the final cause of not exiting IBD.
> Concept ACK, but I'm not convinced this implementation is safe as-is. If we want to maintain the current behaviour, it's not sufficient to update only when the tip changes. We also need to re-check when importing/reindexing completes, and schedule an update timer if max_tip_age is the final cause of not exiting IBD.
This made me revisit the function and consider what we're trying to achieve.
The function is only interesting when it can latch to the IBD finished state.
That's only possible when all four conditions are met, which can only happen when the tip is updated.
The final time based condition can only change when the tip changes as it gets further away with time, not closer.
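The time-based part of this argument can be illustrated with a small sketch. `IbdInputs` and `CanLeaveIbd` are hypothetical names, the integer fields are simplified stand-ins for chain work and timestamps, and the set of conditions is one reading of the checks discussed in this thread rather than the actual function body.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified inputs; the real checks live in ChainstateManager.
struct IbdInputs {
    bool loading_blocks;  // importing or reindexing still in progress
    bool has_tip;         // the active chain has a tip at all
    int64_t tip_work;     // stand-in for the tip's accumulated chain work
    int64_t min_work;     // stand-in for the minimum chain work
    int64_t tip_time;     // tip block time, in seconds
    int64_t max_tip_age;  // maximum allowed tip age, in seconds
};

// True when every condition for leaving IBD holds at time `now`.
constexpr bool CanLeaveIbd(const IbdInputs& in, int64_t now)
{
    if (in.loading_blocks) return false;
    if (!in.has_tip) return false;
    if (in.tip_work < in.min_work) return false;
    if (in.tip_time < now - in.max_tip_age) return false;
    return true;
}

// With an unchanged tip and a clock that only moves forward, the age check
// can flip from passing to failing, but never the other way around.
static_assert(CanLeaveIbd({false, true, 150, 100, 995, 10}, 1000));
static_assert(!CanLeaveIbd({false, true, 150, 100, 995, 10}, 1006));
```

Note that `loading_blocks` is the one input here that can change without a tip update, which is the case raised about importing and reindexing.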
    @@ -3089,6 +3079,7 @@ bool Chainstate::DisconnectTip(BlockValidationState& state, DisconnectedBlockTra
         }

         m_chain.SetTip(*pindexDelete->pprev);
    +    m_chainman.CacheIsInitialBlockDownload();
At first I thought this was not necessary because disconnecting a block shouldn't usually get you out of IBD, but I guess there are edge cases (starting up, with the old tip having a lower timestamp than its parent block) where this could get us out of IBD?
It's technically possible for disconnecting a block to get us out of IBD, though I really don't think that particular edge case is super important.
I was just trying to be thorough.
    +    if (m_cached_finished_ibd.load(std::memory_order_relaxed)) return;
    +
    +    if (m_blockman.LoadingBlocks()) return;
    +
    +    {
    +        AssertLockHeld(cs_main);
Since the function is annotated with EXCLUSIVE_LOCKS_REQUIRED(cs_main) anyway, why not put the assertion at the beginning of the function, as is done in most other places?
> The function is only interesting when it can latch to the IBD finished state.
> That's only possible when all four conditions are met, which can only happen when the tip is updated.
> The final time based condition can only change when the tip changes as it gets further away with time, not closer.
I think that @luke-jr is right. If we reindex, we set m_importing to true in ImportBlocks, so any blocks we connect there can never result in getting out of IBD due to the m_blockman.LoadingBlocks() early return.
Therefore we need a call to CacheIsInitialBlockDownload() after ImportingNow goes out of scope in ImportBlocks().
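A reduced model of that sequence is sketched below. `MiniNode`, `ImportingGuard`, and the `recheck_after_import` parameter are hypothetical and only mimic the shape of `ImportingNow` and `ImportBlocks()`; the point is that an update attempted inside the guarded scope is skipped, so a second call is needed once the guard is gone.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical miniature of the import flow; names are illustrative only.
struct MiniNode {
    std::atomic<bool> importing{false};
    std::atomic<bool> finished_ibd{false};
    bool tip_is_recent{false};

    // Mirrors the early return described above: no latching while importing.
    void CacheIsInitialBlockDownload()
    {
        if (finished_ibd.load(std::memory_order_relaxed)) return;
        if (importing.load(std::memory_order_relaxed)) return;
        if (!tip_is_recent) return;
        finished_ibd.store(true, std::memory_order_relaxed);
    }
};

// RAII guard in the spirit of ImportingNow: the flag is set for the scope's lifetime.
class ImportingGuard {
public:
    explicit ImportingGuard(std::atomic<bool>& flag) : m_flag{flag} { m_flag = true; }
    ~ImportingGuard() { m_flag = false; }
    ImportingGuard(const ImportingGuard&) = delete;
    ImportingGuard& operator=(const ImportingGuard&) = delete;

private:
    std::atomic<bool>& m_flag;
};

void ImportBlocks(MiniNode& node, bool recheck_after_import)
{
    {
        ImportingGuard guard{node.importing};
        node.tip_is_recent = true;           // blocks get connected during import
        node.CacheIsInitialBlockDownload();  // skipped: still importing
    }
    // The guard is gone; without this call nothing re-evaluates the flag
    // until the next tip change.
    if (recheck_after_import) node.CacheIsInitialBlockDownload();
}
```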
Ok I thought about it and it just wasn't obviously correct enough.
So I've rewritten it as three commits to be simpler.
On systems with sane clocks the chain tip checks can only change when the tip
changes. The gap between the chain tip and the current time only grows.
🐙 This pull request conflicts with the target branch and needs rebase.
    @@ -21,6 +21,7 @@ void TestChainstateManager::DisableNextWrite()
     void TestChainstateManager::ResetIbd()
     {
         m_cached_finished_ibd = false;
    +    m_cached_chaintip_recent = false;
I find the existence of this whole method very hacky: we're testing something that cannot happen in reality, so whether the test passes or fails after this, it won't increase my confidence in the product.
But if you insist on updating it (which we likely have to), we should update JumpOutOfIbd as well for symmetry.
    @@ -2040,6 +2040,17 @@ bool ChainstateManager::IsInitialBlockDownload() const
         return false;
     }

    +void ChainstateManager::UpdateCachedChaintipRecent()
We're introducing dead code in the first commit without context about where these values are coming from.
What if we instead extracted the internal checks from IsInitialBlockDownload and gradually migrated that behavior out of it?
Note also that ActiveTip() already returns the tip we need.
I'm also not exactly sure why we're calling the current state "cached".
And we're already in ChainstateManager, simply referring to "tip" is already unambiguous.
One possible commit structure:
1. Lay the groundwork by extracting and reusing only the recency check.
2. Route active-chain SetTip through ChainstateManager, so that each state change updates this value as well.
3. Cache the locked recency calculation.
4. Finally, eliminate the lock from the reader side.
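The end state of such a sequence might look roughly like the sketch below. Everything in it is hypothetical (`MiniChainman`, `TipInfo`, `SetActiveTip`, `IsTipRecentCached`, integer work and time values); it only shows a single write path that recomputes the cached value under the lock, plus a reader that does not lock. Whether the cached value should latch or be recomputed on every tip change is left open here; this sketch recomputes it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>

// All names below are hypothetical; they only mirror the shape of the proposal.
struct TipInfo {
    int64_t work;  // stand-in for accumulated chain work
    int64_t time;  // block time, in seconds
};

class MiniChainman {
public:
    MiniChainman(int64_t min_work, int64_t max_tip_age)
        : m_min_work{min_work}, m_max_tip_age{max_tip_age} {}

    // Single write path: every tip change goes through here, so the cached
    // value cannot drift from the tip it was computed for.
    void SetActiveTip(TipInfo tip, int64_t now)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        m_tip = tip;
        m_tip_recent.store(IsTipRecent(now), std::memory_order_relaxed);
    }

    // Reader side: no lock, only an atomic load.
    bool IsTipRecentCached() const
    {
        return m_tip_recent.load(std::memory_order_relaxed);
    }

private:
    // Extracted recency check: a const getter that mutates nothing.
    // The caller must hold m_mutex.
    bool IsTipRecent(int64_t now) const
    {
        if (!m_tip) return false;
        if (m_tip->work < m_min_work) return false;
        return m_tip->time >= now - m_max_tip_age;
    }

    const int64_t m_min_work;
    const int64_t m_max_tip_age;
    std::mutex m_mutex;
    std::optional<TipInfo> m_tip;
    std::atomic<bool> m_tip_recent{false};
};
```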
    @@ -1142,6 +1143,7 @@ class ChainstateManager

         /** Check whether we are doing an initial block download (synchronizing from disk or network) */
         bool IsInitialBlockDownload() const;
    +    void UpdateCachedChaintipRecent() EXCLUSIVE_LOCKS_REQUIRED(cs_main);
This could be a const getter instead, and it could use a comment. I'd also specialize it to just return the value instead of mutating state; the mutation can happen in the SetTip method instead:

    /** Check whether the active chain tip exists, has enough work, and is recent. */
    bool IsTipRecent() const EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    @@ -3100,6 +3100,7 @@ bool Chainstate::DisconnectTip(BlockValidationState& state, DisconnectedBlockTra
         }

         m_chain.SetTip(*pindexDelete->pprev);
What's the reason for separating this work from SetTip if it's related to it? We could move it to the manager, which would call both methods: the tip update, followed by the IBD state update.
    +    CChain& chain{ActiveChain()};
    +    if (chain.Tip() == nullptr) return;
    +    if (chain.Tip()->nChainWork < MinimumChainWork()) return;
    +    if (chain.Tip()->Time() < Now<NodeSeconds>() - m_options.max_tip_age) return;
    +
    +    m_cached_chaintip_recent = true;
Shouldn't we guard this method on m_cached_chaintip_recent still being false?
    @@ -1029,6 +1029,7 @@ class ChainstateManager
          * const, which latches this for caching purposes.
          */
         mutable std::atomic<bool> m_cached_finished_ibd{false};
    +    mutable std::atomic<bool> m_cached_chaintip_recent{false};
No need to mention "chain" here. We can also use std::atomic_bool instead, and should add a description to it.
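For reference, the two spellings name the same type, so the change is purely cosmetic. The member name and comment below are hypothetical, written only to show what the suggestion might look like:

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// std::atomic_bool is the standard alias for std::atomic<bool>.
static_assert(std::is_same_v<std::atomic_bool, std::atomic<bool>>);

// Hypothetical member, renamed and documented along the lines suggested above.
struct ExampleFlags {
    /** Whether the active tip had enough work and was recent when last updated. */
    mutable std::atomic_bool m_cached_tip_recent{false};
};
```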
Concept ACK, makes sense to push the burden to the writer instead of the reader. But we need to restructure it slightly so that it tells a story of what we're extracting, delegating and caching exactly.
I have implemented an example in https://github.com/l0rinc/bitcoin/pull/60 (prototype, may not pass all tests yet).
There hasn't been progress here in many months. Maybe time to re-open it?
I will open an alternative PR for this today. Edit: pushed https://github.com/bitcoin/bitcoin/pull/34253