Ensure KernelNotifications m_tip_block is set even if no new block arrives.
Suggested in #31297 (comment)
@@ -4706,6 +4706,11 @@ bool Chainstate::LoadChainTip()
              m_chain.Height(),
              FormatISO8601DateTime(tip->GetBlockTime()),
              GuessVerificationProgress(m_chainman.GetParams().TxData(), tip));
+
+    // Ensure KernelNotifications m_tip_block is set even if no new block arrives.
+    // Ignoring return value for now.
+    (void)m_chainman.GetNotifications().blockTip(GetSynchronizationState(/*init=*/true, m_chainman.m_blockman.m_blockfiles_indexed), *pindex);
Alternatively, [[nodiscard]] could be dropped from blockTip instead.
LoadChainTip() is called in a loop for all chainstates in CompleteChainstateInitialization(), so blockTip would be called twice, for different blocks, in the case of an ongoing AssumeUtxo sync. This seems problematic to me - how would subscribers to blockTip be supposed to deal with that? I think that the background chainstate shouldn’t send a notification, as is done in ActivateBestChain().
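For context, a rough sketch of the guard ActivateBestChain() already uses around its own tip notification (simplified from memory of validation.cpp, not a verbatim quote; names like still_in_ibd and pindexNewTip come from that function's scope): only the chainstate that owns the active chain notifies, which is why the background chainstate stays silent there.

    // Simplified: inside Chainstate::ActivateBestChain(), tip listeners are only
    // notified for the active (most-work) chainstate, so an assumeutxo background
    // sync does not emit blockTip() notifications for background blocks.
    if (this == &m_chainman.ActiveChainstate()) {
        // The real code also reacts to an Interrupted result; elided here.
        (void)m_chainman.GetNotifications().blockTip(
            GetSynchronizationState(still_in_ibd, m_chainman.m_blockman.m_blockfiles_indexed),
            *pindexNewTip);
    }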
@maflcko which if-guard do you mean?
I think probably this one:
// Wait for genesis block to be processed
if (WITH_LOCK(chainman.GetMutex(), return chainman.ActiveTip() == nullptr)) {
Removing the if() might help verify that m_tip_block is set reliably.
I dropped the if guard.
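Concretely, dropping the guard leaves just the unconditional wait in init.cpp (the same wait that appears in the diff snippets later in this thread):

    // Wait for genesis block to be processed
    {
        WAIT_LOCK(kernel_notifications.m_tip_block_mutex, lock);
        kernel_notifications.m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(kernel_notifications.m_tip_block_mutex) {
            // Proceed once a tip has been announced or shutdown was requested.
            return !kernel_notifications.m_tip_block.IsNull() || ShutdownRequested(node);
        });
    }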
ensuring that any interfaces will only be spun up after that point in init
It could, but I’m not sure if it’s worth putting such a restriction on the interfaces.
Similarly we have startupnotify, which happens at the end of init. If we also ensure that the interfaces are ready to listen before this notification, then an application launched with systemd (or similar) can wait for this signal before trying to connect.
If on the other hand an external application is started at the same time, it would just have to keep trying to connect until we listen. Such a race would exist even if we opened the interface a bit earlier in the init process, though it would be shorter.
It could, but I’m not sure if it’s worth putting such a restriction on the interfaces.
Tend to agree for now. Having waitTipChanged wait for a tip to be connected seems like a natural thing to implement and might avoid the need for clients to poll the node on startup. But I think this check could be dropped in BlockTemplate::waitNext in #31283 (maybe replaced with an assert there) since it should not be possible to get a blocktemplate reference before the tip is connected. After #31283 it might make sense to drop the waitTipChanged method entirely too.
Several reindex-related tests are broken, investigating…
After #31283 it might make sense to drop the waitTipChanged method entirely too.
I think it’s generically useful though. There are various types of applications that need to know when a new block is added to the chain (and have no need for a template).
🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/33386413030
Specifically -reindex-chainstate is broken, easy to reproduce with e.g. testnet4. It stops after the second block:
2024-11-22T16:44:14.338986Z Setting NODE_NETWORK on non-prune mode
2024-11-22T16:44:14.339014Z Wait for genesis block to be processed
2024-11-22T16:44:14.339023Z initload thread start
2024-11-22T16:44:14.341109Z UpdateTip: new best=00000000da84f2bafbbc53dee25a72ae507ff4914b867c565be350b0da8bf043 height=0 version=0x00000001 log2_work=32.000022 tx=1 date='2024-05-03T23:11:00Z' progress=0.000001 cache=0.3MiB(0txo)
2024-11-22T16:44:14.341179Z UpdateTip: new best=0000000012982b6d5f621229286b880e909984df669c2afabb102ce311b13f28 height=1 version=0x20000000 log2_work=33.000022 tx=2 date='2024-05-06T13:08:48Z' progress=0.000002 cache=0.3MiB(1txo)
Putting the if-guard back “fixes” it. When I put a log statement before WAIT_LOCK and after the wait function returns, I can see that these lines are called regardless of the if-guard.
Turns out the only critical thing is that the WAIT_LOCK stuff lives in its own scope.
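To illustrate the scoping point (a sketch, not the actual init.cpp code; the reason given in the trailing comment is my inference): WAIT_LOCK creates a lock object that lives until the end of its enclosing scope, so the extra braces are what release the mutex again.

    {
        // WAIT_LOCK acquires kernel_notifications.m_tip_block_mutex here...
        WAIT_LOCK(kernel_notifications.m_tip_block_mutex, lock);
        kernel_notifications.m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(kernel_notifications.m_tip_block_mutex) {
            return !kernel_notifications.m_tip_block.IsNull() || ShutdownRequested(node);
        });
    } // ...and releases it here. Without the dedicated scope the mutex would stay
      // locked for the rest of AppInitMain, presumably blocking any blockTip()
      // call that needs the same mutex to update m_tip_block.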
@@ -1792,7 +1792,7 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
     });

     // Wait for genesis block to be processed
My understanding is that this will only ever actually wait in three cases: 1.) first startup with an empty datadir, 2.) reindex, 3.) reindex-chainstate, where the genesis block is being connected by the initload thread; otherwise the m_tip_block_cv wait condition should already be satisfied because m_tip_block should have been set earlier by the same thread. So removing the if guard is not necessary but just a matter of taste.
After this PR, it will be harder to understand how the following code block refers to the genesis block, so maybe the comment could be expanded to mention some of the above.
I drafted a comment, but I think it’s wrong so I haven’t pushed it yet:
/*
 * Wait for genesis block to be processed. Typically kernel_notifications.m_tip_block
 * has already been set, except in three cases where the genesis block is
 * being connected by the initload thread:
 *
 * 1. first startup with an empty datadir
 * 2. reindex
 * 3. reindex-chainstate
 */
I don’t think the initload thread ever sets kernel_notifications.m_tip_block. The thread only deals with indexes derived from BaseIndex, which doesn’t include CBlockIndex.
I’m still confused under what circumstance it would actually be unset.
It’s clearly doing something though, because the following breaks several functional tests:
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -1792,13 +1792,14 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
     });

     // Wait for genesis block to be processed
     {
         WAIT_LOCK(kernel_notifications.m_tip_block_mutex, lock);
-        kernel_notifications.m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(kernel_notifications.m_tip_block_mutex) {
-            return !kernel_notifications.m_tip_block.IsNull() || ShutdownRequested(node);
-        });
+        Assert(!kernel_notifications.m_tip_block.IsNull());
+        // kernel_notifications.m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(kernel_notifications.m_tip_block_mutex) {
+        //     return !kernel_notifications.m_tip_block.IsNull() || ShutdownRequested(node);
+        // });
     }
Outside the context of a snapshot, we only call LoadChainTip() if:
That explains why these three scenarios are different. With an empty datadir or when we -reindex(-chainstate) the UTXO set is empty[0].
Still looking into where it is set.
[0] more accurately, doesn’t point to a tip or we asked to wipe it. The name is misleading, it doesn’t matter if by some magic every UTXO ends up spent.
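As a rough sketch of the condition being referred to (identifiers such as is_coinsview_empty and options.wipe_chainstate_db are my recollection of node/chainstate.cpp and should be treated as assumptions, not quotes):

    // LoadChainTip() is skipped for a chainstate whose coins view is considered
    // "empty": its best block hash is unset, or we were asked to wipe the
    // chainstate database (fresh datadir, -reindex, -reindex-chainstate).
    auto is_coinsview_empty = [&](Chainstate* chainstate) EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
        return options.wipe_chainstate_db || chainstate->CoinsTip().GetBestBlock().IsNull();
    };
    if (!is_coinsview_empty(chainstate)) {
        // LoadChainTip() initializes the chain from CoinsTip()'s best block and,
        // with this PR, also fires the blockTip notification.
        chainstate->LoadChainTip(); // error handling elided in this sketch
    }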
In the case of a fresh datadir or reindex, it’s ActivateBestChain() that calls blockTip() once it sets the genesis block as the tip. This indeed happens in the initload thread, which seems to keep doing this throughout the reindex process. Once all previously stored blocks are done, it exits and the msghand thread takes over.
With a fresh datadir the initload thread only connects the genesis block.
Specifically it’s ImportBlocks() that does this, even if no blocks are passed to it.
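A simplified sketch of the tail of ImportBlocks() (from memory of node/blockstorage.cpp; details may differ), showing why ActivateBestChain() runs even when there is nothing to import:

    // After any block files have been (re)indexed -- possibly none -- ImportBlocks()
    // still asks every chainstate to activate its best chain. With a fresh datadir
    // this is what connects the genesis block and fires the blockTip() notification.
    for (Chainstate* chainstate : WITH_LOCK(::cs_main, return chainman.GetAll())) {
        BlockValidationState state;
        if (!chainstate->ActivateBestChain(state, nullptr)) {
            // Error handling elided in this sketch (the real code reports a fatal error).
            return;
        }
    }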
which seems to keep doing this throughout the reindex process.
In fact it does it after reindex is finished. ActivateBestChain connects blocks one by one until it has nothing left to do, so that makes sense.
In fact it does it after reindex is finished. ActivateBestChain connects blocks one by one until it has nothing left to do, so that makes sense.
Just as an aside: ABC is also called during reindexing when encountering the genesis block. That’s why we have this slightly weird behavior with a full -reindex that normally first reindexes all blocks and then builds the chainstate, but if the user interrupts it during reindexing and restarts the node without -reindex, it will pick up by first connecting all blocks of the previously indexed block files, then reindex the remaining block files and finally connect the blocks from the second batch.
But that’s not really relevant to this PR, the comment seems fine now!
Ensure KernelNotifications m_tip_block is set even if no new block arrives.
Additionally, have node init always wait for this to happen.
I added a comment based on #31346 (review) to clarify under which circumstances m_tip_block is unset at this init stage.
I find it quite confusing that ActivateBestChain is called by ImportBlocks even if no blocks are imported, but haven’t dared to refactor it.
I find it quite confusing that ActivateBestChain is called by ImportBlocks even if no blocks are imported, but haven’t dared to refactor it.
It is explained in the code comment above the line that does it. It was added as a fix for https://github.com/bitcoin/bitcoin/issues/2239