Ensure KernelNotifications m_tip_block is set even if no new block arrives.
Suggested in #31297 (comment)
@@ -4706,6 +4706,11 @@ bool Chainstate::LoadChainTip()
              m_chain.Height(),
              FormatISO8601DateTime(tip->GetBlockTime()),
              GuessVerificationProgress(m_chainman.GetParams().TxData(), tip));
+
+    // Ensure KernelNotifications m_tip_block is set even if no new block arrives.
+    // Ignoring return value for now.
+    (void)m_chainman.GetNotifications().blockTip(GetSynchronizationState(/*init=*/true, m_chainman.m_blockman.m_blockfiles_indexed), *pindex);
Alternatively, [[nodiscard]] could be dropped from blockTip instead.
LoadChainTip() is called in a loop for all chainstates in CompleteChainstateInitialization(), so blockTip would be called twice, for different blocks, in the case of an ongoing AssumeUtxo sync. This seems problematic to me - how would subscribers to blockTip be supposed to deal with that? I think that the background chainstate shouldn’t send a notification, as is done in ActivateBestChain().
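For context, a rough sketch of the guard ActivateBestChain() already uses around its own tip notification (simplified from memory of validation.cpp, not a verbatim quote; names like still_in_ibd and pindexNewTip come from that function's scope): only the chainstate that owns the active chain notifies, which is why the background chainstate stays silent there.

    // Simplified: inside Chainstate::ActivateBestChain(), tip listeners are only
    // notified for the active (most-work) chainstate, so an assumeutxo background
    // sync does not emit blockTip() notifications for background blocks.
    if (this == &m_chainman.ActiveChainstate()) {
        // The real code also reacts to an Interrupted result; elided here.
        (void)m_chainman.GetNotifications().blockTip(
            GetSynchronizationState(still_in_ibd, m_chainman.m_blockman.m_blockfiles_indexed),
            *pindexNewTip);
    }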
@maflcko which if-guard do you mean?
I think probably this one:
// Wait for genesis block to be processed
if (WITH_LOCK(chainman.GetMutex(), return chainman.ActiveTip() == nullptr)) {
Removing the if() might help verify that m_tip_block is set reliably.
I dropped the if guard.
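Concretely, dropping the guard leaves just the unconditional wait in init.cpp (the same wait that appears in the diff snippets later in this thread):

    // Wait for genesis block to be processed
    {
        WAIT_LOCK(kernel_notifications.m_tip_block_mutex, lock);
        kernel_notifications.m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(kernel_notifications.m_tip_block_mutex) {
            // Proceed once a tip has been announced or shutdown was requested.
            return !kernel_notifications.m_tip_block.IsNull() || ShutdownRequested(node);
        });
    }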
ensuring that any interfaces will only be spun up after that point in init
It could, but I’m not sure if it’s worth putting such a restriction on the interfaces.
Similarly we have startupnotify, which happens at the end of init. If we also ensure that the interfaces are ready to listen before this notification, then an application launched with systemd (or similar) can wait for this signal before trying to connect.
If on the other hand an external application is started at the same time, it would just have to keep trying to connect until we listen. Such a race would exist even if we opened the interface a bit earlier in the init process, though it would be shorter.
It could, but I’m not sure if it’s worth putting such a restriction on the interfaces.
Tend to agree for now. Having waitTipChanged wait for a tip to be connected seems like a natural thing to implement and might avoid the need for clients to poll the node on startup. But I think this check could be dropped in BlockTemplate::waitNext in #31283 (maybe replaced with an assert there) since it should not be possible to get a blocktemplate reference before the tip is connected. After #31283 it might make sense to drop the waitTipChanged method entirely too.
Several reindex-related tests are broken, investigating…
After #31283 it might make sense to drop the waitTipChanged method entirely too.
I think it’s generically useful though. There are various types of applications that need to know when a new block is added to the chain (and have no need for a template).
🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/33386413030
Specifically -reindex-chainstate is broken, easy to reproduce with e.g. testnet4. It stops after the second block:
2024-11-22T16:44:14.338986Z Setting NODE_NETWORK on non-prune mode
2024-11-22T16:44:14.339014Z Wait for genesis block to be processed
2024-11-22T16:44:14.339023Z initload thread start
2024-11-22T16:44:14.341109Z UpdateTip: new best=00000000da84f2bafbbc53dee25a72ae507ff4914b867c565be350b0da8bf043 height=0 version=0x00000001 log2_work=32.000022 tx=1 date='2024-05-03T23:11:00Z' progress=0.000001 cache=0.3MiB(0txo)
2024-11-22T16:44:14.341179Z UpdateTip: new best=0000000012982b6d5f621229286b880e909984df669c2afabb102ce311b13f28 height=1 version=0x20000000 log2_work=33.000022 tx=2 date='2024-05-06T13:08:48Z' progress=0.000002 cache=0.3MiB(1txo)
Putting the if-guard back “fixes” it. When I put a log statement before WAIT_LOCK and after the wait function returns, I can see that these lines are called regardless of the if-guard.
Turns out the only critical thing is that the WAIT_LOCK stuff lives in its own scope.
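To illustrate the scoping point (a sketch, not the actual init.cpp code; the reason given in the trailing comment is my inference): WAIT_LOCK creates a lock object that lives until the end of its enclosing scope, so the extra braces are what release the mutex again.

    {
        // WAIT_LOCK acquires kernel_notifications.m_tip_block_mutex here...
        WAIT_LOCK(kernel_notifications.m_tip_block_mutex, lock);
        kernel_notifications.m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(kernel_notifications.m_tip_block_mutex) {
            return !kernel_notifications.m_tip_block.IsNull() || ShutdownRequested(node);
        });
    } // ...and releases it here. Without the dedicated scope the mutex would stay
      // locked for the rest of AppInitMain, presumably blocking any blockTip()
      // call that needs the same mutex to update m_tip_block.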
@@ -1792,7 +1792,7 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
     });

     // Wait for genesis block to be processed
My understanding is that this will only ever actually wait in three cases: 1.) first startup with an empty datadir, 2.) reindex, 3.) reindex-chainstate, where the genesis block is being connected by the initload thread; otherwise the m_tip_block_cv wait condition should already be satisfied because m_tip_block should have been set earlier by the same thread. So removing the if guard is not necessary but just a matter of taste.
After this PR, it will be harder to understand how the following code block refers to the genesis block, so maybe the comment could be expanded to mention some of the above.
I drafted a comment, but I think it’s wrong so I haven’t pushed it yet:
/*
 * Wait for genesis block to be processed. Typically kernel_notifications.m_tip_block
 * has already been set, except in three cases where the genesis block is
 * being connected by the initload thread:
 *
 * 1. first startup with an empty datadir
 * 2. reindex
 * 3. reindex-chainstate
 */
I don’t think the initload thread ever sets kernel_notifications.m_tip_block. The thread only deals with indexes derived from BaseIndex, which doesn’t include CBlockIndex.
I’m still confused under what circumstance it would actually be unset.
It’s clearly doing something though, because the following breaks several functional tests:
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -1792,13 +1792,14 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
     });

     // Wait for genesis block to be processed
     {
         WAIT_LOCK(kernel_notifications.m_tip_block_mutex, lock);
-        kernel_notifications.m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(kernel_notifications.m_tip_block_mutex) {
-            return !kernel_notifications.m_tip_block.IsNull() || ShutdownRequested(node);
-        });
+        Assert(!kernel_notifications.m_tip_block.IsNull());
+        // kernel_notifications.m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(kernel_notifications.m_tip_block_mutex) {
+        //     return !kernel_notifications.m_tip_block.IsNull() || ShutdownRequested(node);
+        // });
     }
Outside the context of a snapshot, we only call LoadChainTip() if:
That explains why these three scenarios are different. With an empty datadir or when we -reindex(-chainstate) the UTXO set is empty[0].
Still looking into where it is set.
[0] more accurately, doesn’t point to a tip or we asked to wipe it. The name is misleading, it doesn’t matter if by some magic every UTXO ends up spent.
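As a rough sketch of the condition being referred to (identifiers such as is_coinsview_empty and options.wipe_chainstate_db are my recollection of node/chainstate.cpp and should be treated as assumptions, not quotes):

    // LoadChainTip() is skipped for a chainstate whose coins view is considered
    // "empty": its best block hash is unset, or we were asked to wipe the
    // chainstate database (fresh datadir, -reindex, -reindex-chainstate).
    auto is_coinsview_empty = [&](Chainstate* chainstate) EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
        return options.wipe_chainstate_db || chainstate->CoinsTip().GetBestBlock().IsNull();
    };
    if (!is_coinsview_empty(chainstate)) {
        // LoadChainTip() initializes the chain from CoinsTip()'s best block and,
        // with this PR, also fires the blockTip notification.
        chainstate->LoadChainTip(); // error handling elided in this sketch
    }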
In the case of a fresh datadir or reindex, it’s ActivateBestChain() that calls blockTip() once it sets the genesis block as the tip. This indeed happens in the initload thread, which seems to keep doing this throughout the reindex process. Once all previously stored blocks are done, it exits and the msghand thread takes over.
With a fresh datadir the initload thread only connects the genesis block.
Specifically it’s ImportBlocks() that does this, even if no blocks are passed to it.
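A simplified sketch of the tail of ImportBlocks() (from memory of node/blockstorage.cpp; details may differ), showing why ActivateBestChain() runs even when there is nothing to import:

    // After any block files have been (re)indexed -- possibly none -- ImportBlocks()
    // still asks every chainstate to activate its best chain. With a fresh datadir
    // this is what connects the genesis block and fires the blockTip() notification.
    for (Chainstate* chainstate : WITH_LOCK(::cs_main, return chainman.GetAll())) {
        BlockValidationState state;
        if (!chainstate->ActivateBestChain(state, nullptr)) {
            // Error handling elided in this sketch (the real code reports a fatal error).
            return;
        }
    }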
which seems to keep doing this throughout the reindex process.
In fact it does it after reindex is finished. ActivateBestChain connects blocks one by one until it has nothing left to do, so that makes sense.
In fact it does it after reindex is finished. ActivateBestChain connects blocks one by one until it has nothing left to do, so that makes sense.
Just as an aside: ABC is also called during reindexing when encountering the genesis block. That’s why we have this slightly weird behavior with a full -reindex that normally first reindexes all blocks and then builds the chainstate, but if the user interrupts it during reindexing and restarts the node without -reindex, it will pick up by first connecting all blocks of the previously indexed block files, then reindex the remaining block files and finally connect the blocks from the second batch.
But that’s not really relevant to this PR, the comment seems fine now!
Ensure KernelNotifications m_tip_block is set even if no new block arrives.
Additionally, have node init always wait for this to happen.
I added a comment based on #31346 (review) to clarify under which circumstances m_tip_block is unset at this init stage.
I find it quite confusing that ActivateBestChain is called by ImportBlocks even if no blocks are imported, but haven’t dared to refactor it.
I find it quite confusing that ActivateBestChain is called by ImportBlocks even if no blocks are imported, but haven’t dared to refactor it.
It is explained in the code comment above the line that does it. It was added as a fix for https://github.com/bitcoin/bitcoin/issues/2239