Have createNewBlock() wait for tip, make rpc handle shutdown during long poll and wait methods

Sjors commented at 2:56 pm on February 3, 2025: member

This PR prevents Mining interface methods from sometimes crashing when called during startup before a tip is connected. It also makes other improvements like making more RPC methods usable from the GUI. Specifically this PR:

Adds an Assume check to disallow passing negative timeout values to Mining::waitTipChanged
Makes waitfornewblock, waitforblock and waitforblockheight RPC methods usable from the GUI when -server=1 is not set.
Changes Mining::waitTipChanged to return optional<BlockRef> instead of BlockRef and return nullopt instead of crashing if there is a timeout or if the node is shut down before a tip is connected.
Changes Mining::waitTipChanged to not time out before a tip is connected, so it is convenient and safe to call during startup, and only returns nullopt on early shutdowns.
Changes Mining::createNewBlock to block and wait for a tip to be connected if it is called on startup instead of crashing. Also documents that it will return null on early shutdowns.

This allows waitNext() (added in #31283) to safely assume TipBlock() isn’t null, not even during a scenario of early shutdown.

Finally this PR clarifies long poll behaviour, mostly by adding code comments, but also through an early break.

DrahtBot commented at 2:57 pm on February 3, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31785.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	ryanofsky, TheCharlatan, vasild, achow101
Concept ACK	ismaelsadeeq

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#31117 (miner: Reorg Testnet4 minimum difficulty blocks by fjahr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in src/node/interfaces.cpp:995 in af85987a16 outdated

988@@ -989,6 +989,12 @@ class MinerImpl : public Mining
989 
990     std::unique_ptr<BlockTemplate> createNewBlock(const BlockCreateOptions& options) override
991     {
992+        {
993+            // Ensure m_tip_block is set so consumers of BlockTemplate can rely on that.
994+            WAIT_LOCK(notifications().m_tip_block_mutex, lock);
995+            if (!notifications().TipBlock()) return nullptr;

l0rinc commented at 3:03 pm on February 3, 2025:

If it’s not too difficult, can we reproduce this with a test?

Sjors commented at 3:06 pm on February 3, 2025:

This depends on the node initialization (and shutdown) sequence. I’m not sure how to reproduce that in a test. And we might change that sequence in a way that this never happens in the first place.

Sjors commented at 3:05 pm on February 3, 2025: member

The Stratum v2 Template Provider that I implemented first calls mining.waitTipChanged(uint256::ZERO) before calling mining.createNewBlock(). This ensures that no matter how we order the node & IPC startup sequence, it won’t prematurely try to make a block.

See https://github.com/Sjors/bitcoin/pull/49 (src/sv2/template_provider.cpp in Sv2TemplateProvider::ThreadSv2Handler()).

But other implementers may not realise this, so having createNewBlock() return nothing when called too early seems like a good precaution.

See this comment for more details on the init sequence:

https://github.com/bitcoin/bitcoin/blob/1172bc4157eefe80d1aaf0b56459857ec651e535/src/init.cpp#L1817-L1833

Also note that currently the IPC server starts listening in AppInitMain before the above wait so the race condition is possible.

https://github.com/bitcoin/bitcoin/blob/1172bc4157eefe80d1aaf0b56459857ec651e535/src/init.cpp#L1358-L1367

A good client application might wait for -startupnotify before trying to connect via IPC. StartupNotify() is called at the end AppInitMain. But we shouldn’t assume all clients do.

in src/node/interfaces.cpp:1103 in af85987a16 outdated

988@@ -989,6 +989,12 @@ class MinerImpl : public Mining
989 
990     std::unique_ptr<BlockTemplate> createNewBlock(const BlockCreateOptions& options) override

maflcko commented at 3:21 pm on February 3, 2025:

Would be good to update the docs, because CreateNewBlock never returns null and always returns a block.

Sjors commented at 5:17 pm on February 3, 2025:

The latest push has waitTipChanged and createNewBlock return null if the node shuts down, and documentation is updated to reflect that.

Sjors force-pushed on Feb 3, 2025

Sjors commented at 5:13 pm on February 3, 2025: member

There were slightly more worms in this can.

Instead I’ve now changed createNewBlock() to first call waitTipChanged() and the latter to return null during a shutdown. This shifts some work to the RPC, but I think makes the overall behaviour easier to understand.

Sjors renamed this:
~~Have createNewBlock() ensure m_tip_block is set~~
Have createNewBlock() wait for tip, make rpc handle shutdown during long poll and wait methods
on Feb 3, 2025

Sjors force-pushed on Feb 3, 2025

in src/interfaces/mining.h:86 in be15f2e769 outdated

80@@ -81,15 +81,21 @@ class Mining
81      * @param[in] current_tip block hash of the current chain tip. Function waits
82      *                        for the chain tip to differ from this.
83      * @param[in] timeout     how long to wait for a new tip
84-     * @returns               Hash and height of the current chain tip after this call.
85+     *
86+     * @retval BlockRef hash and height of the current chain tip after this call.
87+     * @retval empty if node is shutting down

vasild commented at 9:41 am on February 12, 2025:

0     * [@retval](/bitcoin-bitcoin/contributor/retval/) std::nullopt if the given `timeout` passes or the node is shut down before the tip is connected (there is no tip during startup).

ryanofsky commented at 2:44 pm on February 12, 2025:

re: #31785 (review)

I don’t think suggested text is accurate because if timeout is exceeded while waiting for a new tip, this returns information about the current tip. I don’t think that behavior should be changed.

Returning null only when node is shutting down should be the simplest and best behavior.

vasild commented at 5:35 am on February 13, 2025:

if timeout is exceeded while waiting for a new tip, this returns information about the current tip

True. What I meant was that if there is no tip within the timeout. The function does wait_for(... tip_hash && tip_hash != current_tip ...) - tip_hash to be set and to be different than the current. I meant that it could remain unset while the timeout passes. I see how the comment I suggested is misleading. What about:

0[@retval](/bitcoin-bitcoin/contributor/retval/) std::nullopt if the given `timeout` passes before the tip is
1connected (there is no tip during startup) or the node is shut down.

?

Sjors commented at 10:20 am on February 13, 2025:

Ah, I think I see the problem. There’s a potential scenario where node startup is unusually slow, longer than timeout.

I think it’s better in that case to ignore the timeout and instead keep waiting for the node to set a tip.

vasild commented at 11:05 am on February 13, 2025:

Yes, this scenario I had in mind.

in that case to ignore the timeout and instead keep waiting for the node to set a tip

Hmm. What if the tip is not being connected for whatever reason? Wait forever? The current behavior of respecting the timeout looks safer. Would be an odd user experience to provide a timeout and the program to decide to wait longer for whatever reason.

Sjors commented at 2:34 pm on February 13, 2025:

Let’s continue here: #31785 (review)

in src/node/interfaces.cpp:992 in be15f2e769 outdated

990+        if (!tip_hash || chainman().m_interrupt) return {};
991+
992         // Must release m_tip_block_mutex before locking cs_main, to avoid deadlocks.
993         LOCK(::cs_main);
994-        return BlockRef{chainman().ActiveChain().Tip()->GetBlockHash(), chainman().ActiveChain().Tip()->nHeight};
995+        return BlockRef{*Assume(tip_hash), Assume(chainman().ActiveChain().Tip())->nHeight};

vasild commented at 9:50 am on February 12, 2025:

Just before the lock and return there was a period when both m_tip_block_mutex and cs_main were unlocked, so tip_hash may be stale here. So this could return an inconsistent hash+height - the hash of one block and the height of another. The previous code was ok because it retrieved both the hash and the height from chainman().ActiveChain().Tip() under cs_main. I think this will fix it:

0-        LOCK(::cs_main);
1-        return BlockRef{*Assume(tip_hash), Assume(chainman().ActiveChain().Tip())->nHeight};
2+        return getTip();

Sjors commented at 2:16 pm on February 12, 2025:

That seems simpler anyway.

ryanofsky commented at 2:41 pm on February 12, 2025:

In commit “rpc: handle shutdown during long poll and wait methods” (ad3af401c19b8d05ce69011a359db3090b7018e1)

Nice suggestion. If this suggestion is applied should also simplify the code above

0-        if (!tip_hash || chainman().m_interrupt) return {};
1+        if (chainman().m_interrupt) return {};

since tip_hash is no longer relevant. Can also reduce the scope of the tip_hash variable.

in src/rpc/blockchain.cpp:301 in be15f2e769 outdated

294+        std::optional<BlockRef> maybe_block{timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash)};
295+        // Return new block unless the node is shutting down
296+        if (maybe_block) {
297+            block = *CHECK_NONFATAL(maybe_block);
298+        }
299     }

vasild commented at 10:23 am on February 12, 2025:

Given that “no tip” is expected here (unlikely but is not a “programming logic error”), maybe avoid CHECK_NONFATAL() and use JSONRPCError:

0std::optional<BlockRef> old_block{miner.getTip()};
1std::optional<BlockRef> new_block;
2if (IsRPCRunning()) {
3    const uint256 h{old_block.has_value() ? old_block->hash : uint256::ZERO};
4    new_block = timeout > 0 ? miner.waitTipChanged(h, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(h);
5}
6
7if (!new_block.has_value()) {
8    throw JSONRPCError(RPC_CLIENT_NOT_CONNECTED, "No tip within timeout or shutting down");
9}

Sjors commented at 2:21 pm on February 12, 2025:

Oh, you mean old_block could be nullopt (assuming the RPC is loaded at this point).

But the suggested error here would change the current behavior - if there’s a timeout we return the old block. I could do an early error return though if the first getTip() doesn’t give us a tip.

vasild commented at 5:48 am on February 13, 2025:

I mean to avoid CHECK_NONFATAL() if the tip is null and use JSONRPCError instead.

If the tip is null at the start, then wait for it to be set, if not set within the timeout, then JSONRPCError.

Sjors commented at 11:15 am on February 13, 2025:

I ended up with something similar…

in src/rpc/blockchain.cpp:356 in be15f2e769 outdated

353+        }
354+        // Return new block unless the node is shutting down
355+        if (maybe_block) {
356+            block = *CHECK_NONFATAL(maybe_block);
357         }
358     }

vasild commented at 10:39 am on February 12, 2025:

Similar here:

 0std::optional<BlockRef> block{miner.getTip()};
 1const auto deadline{std::chrono::steady_clock::now() + 1ms * timeout};
 2while (IsRPCRunning() && (!block.has_value() || block->hash != hash)) {
 3    const uint256 h{block.has_value() ? block->hash : uint256::ZERO};
 4    if (timeout) {
 5        const auto now{std::chrono::steady_clock::now()};
 6        if (now >= deadline) break;
 7        const MillisecondsDouble remaining{deadline - now};
 8        block = miner.waitTipChanged(h, remaining);
 9    } else {
10        block = miner.waitTipChanged(h);
11    }
12}
13
14if (!block.has_value()) {
15    JSONRPCError(RPC_CLIENT_NOT_CONNECTED, "No tip within timeout or shutting down");
16}

in src/rpc/mining.cpp:810 in be15f2e769 outdated

806+                std::optional<BlockRef> maybe_tip{miner.waitTipChanged(hashWatchedChain, checktxtime)};
807+                if (!maybe_tip) {
808+                    throw JSONRPCError(RPC_CLIENT_NOT_CONNECTED, "Shutting down");
809+                };
810+                tip = CHECK_NONFATAL(maybe_tip)->hash;
811                 // Timeout: Check transactions for update

vasild commented at 10:43 am on February 12, 2025:

Why is comment saying “Timeout: …” when we could be here if the tip changed (no timeout)? Is the comment wrong?

Sjors commented at 12:46 pm on February 13, 2025:

I pushed a commit to explain the long poll behaviour. Also added an early return above this point, just for clarity.

in src/rpc/mining.cpp:807 in be15f2e769 outdated

801@@ -801,7 +802,11 @@ static RPCHelpMan getblocktemplate()
802         {
803             MillisecondsDouble checktxtime{std::chrono::minutes(1)};
804             while (tip == hashWatchedChain && IsRPCRunning()) {
805-                tip = miner.waitTipChanged(hashWatchedChain, checktxtime).hash;
806+                std::optional<BlockRef> maybe_tip{miner.waitTipChanged(hashWatchedChain, checktxtime)};
807+                if (!maybe_tip) {
808+                    throw JSONRPCError(RPC_CLIENT_NOT_CONNECTED, "Shutting down");

vasild commented at 10:45 am on February 12, 2025:

Maybe elaborate the message that we could be here if no tip was connected within the timeout: “No tip within timeout or shutting down”.

Sjors commented at 11:01 am on February 13, 2025:

I’m introducing a new RPC_SHUTDOWN_ERROR which makes sense here. I don’t think the message itself matters, since the user knows they shut down the node - and the exact details of where and why the RPC call fails aren’t important.

in src/rpc/mining.cpp:826 in be15f2e769 outdated

801@@ -801,7 +802,11 @@ static RPCHelpMan getblocktemplate()
802         {
803             MillisecondsDouble checktxtime{std::chrono::minutes(1)};
804             while (tip == hashWatchedChain && IsRPCRunning()) {
805-                tip = miner.waitTipChanged(hashWatchedChain, checktxtime).hash;
806+                std::optional<BlockRef> maybe_tip{miner.waitTipChanged(hashWatchedChain, checktxtime)};
807+                if (!maybe_tip) {
808+                    throw JSONRPCError(RPC_CLIENT_NOT_CONNECTED, "Shutting down");
809+                };
810+                tip = CHECK_NONFATAL(maybe_tip)->hash;

vasild commented at 10:48 am on February 12, 2025:

nit: I think it is redundant to check that maybe_tip has value with CHECK_NONFATAL() just after if (!maybe_tip). That’s like:

0if (A) {
1     throw ...
2}
3assert(!A);

tip = maybe_tip->hash; seems fine as well.

Sjors commented at 11:04 am on February 13, 2025:

Indeed, but on the other hand code changes over time, stuff appears between the if statement and the usage of tip, then someone changes the if statement, and now we have a crash…

vasild commented at 11:08 am on February 13, 2025:

Ok

in src/node/interfaces.cpp:1106 in be15f2e769 outdated

 996     }
 997 
 998     std::unique_ptr<BlockTemplate> createNewBlock(const BlockCreateOptions& options) override
 999     {
1000+        // Ensure m_tip_block is set so consumers of BlockTemplate can rely on that.
1001+        if (!waitTipChanged(uint256::ZERO, MillisecondsDouble::max())) return {};

vasild commented at 10:50 am on February 12, 2025:

The default for the timeout argument is MillisecondsDouble::max(), so:

0        if (!waitTipChanged(uint256::ZERO)) return {};

Sjors commented at 11:06 am on February 13, 2025:

It seems the compiler doesn’t know about this default inside the implementation file.

vasild commented at 11:36 am on February 13, 2025:

TIL https://flylib.com/books/en/2.887.1.53/1/

Possible solutions I see:

drop the default altogether and have all callers use 2 arguments
use the “non-virtual interface idiom” as described in the chapter above (it is from the book “Effective C++” by Scott Meyers)
use overload instead of default arguments. Have two methods waitTipChanged(hash) and waitTipChanged(hash, timeout) (the first could call the second internally).

Sjors commented at 1:45 pm on February 13, 2025:

I don’t think we should drop the default, because MillisecondsDouble::max() is tedious. It seems fine that the interface does it internally, which is simpler than the other two solutions.

Keep in mind that the interface has a much broader target audience (IPC users), so it should be as simple as possible. I’m less worried about other Bitcoin Core devs that need to change something inside node/interfaces.cpp

Sjors commented at 1:51 pm on February 13, 2025:

I could also do this:

 0diff --git a/src/node/interfaces.cpp b/src/node/interfaces.cpp
 1index 66ce64b1c3..b0735d497c 100644
 2--- a/src/node/interfaces.cpp
 3+++ b/src/node/interfaces.cpp
 4@@ -971,7 +971,7 @@ public:
 5         return BlockRef{tip->GetBlockHash(), tip->nHeight};
 6     }
 7 
 8-    std::optional<BlockRef> waitTipChanged(uint256 current_tip, MillisecondsDouble timeout) override
 9+    std::optional<BlockRef> waitTipChanged(uint256 current_tip, MillisecondsDouble timeout = MillisecondsDouble::max()) override
10     {
11         if (timeout > std::chrono::years{100}) timeout = std::chrono::years{100}; // Upper bound to avoid UB in std::chrono
12         std::optional<uint256> tip_hash;
13@@ -1005,7 +1005,7 @@ public:
14     std::unique_ptr<BlockTemplate> createNewBlock(const BlockCreateOptions& options) override
15     {
16         // Ensure m_tip_block is set so consumers of BlockTemplate can rely on that.
17-        if (!waitTipChanged(uint256::ZERO, MillisecondsDouble::max())) return {};
18+        if (!waitTipChanged(uint256::ZERO)) return {};
19 
20         BlockAssembler::Options assemble_options{options};
21         ApplyArgsManOptions(*Assert(m_node.args), assemble_options);

But that doesn’t seem nicer. Also the compiler doesn’t warn if the default in the override is different, so this seems unsafe.

Sjors commented at 2:29 pm on February 13, 2025:

My takeaway from the article is that having a default for the base class is fine, and that it’s better to not use a default for the inherited classes (but it could get ignored).

The patch in my previous comment would be bad. But the current code that calls waitTipChanged(uint256::ZERO, MillisecondsDouble::max()) should be fine.

ryanofsky commented at 4:28 pm on February 13, 2025:

In commit “Have createNewBlock() wait for a tip” (854766b886b6c2ee564b867db64bdbce80064b3b)

Another alternative but probably not better is:

0-        if (!waitTipChanged(uint256::ZERO, MillisecondsDouble::max())) return {};
1+        if (!static_cast<Mining*>(this)->waitTipChanged(uint256::ZERO)) return {};

I think in longer term after we add an initial waitNext() method and make improvements to it, it will be natural to move more functionality out of Mining class methods into node/miner.cpp or somewhere like that and these methods will get simpler. The node/interfaces.cpp file is mostly meant to contain glue code and wrap other functions, not implement real functionality itself. Since a lot of code needs to wait, I imagine there will be some new waiting utility function this could call instead of waitTipChanged, but using waitTipChanged for now here should be fine.

in src/node/interfaces.cpp:1071 in be15f2e769 outdated

970@@ -971,24 +971,32 @@ class MinerImpl : public Mining
971         return BlockRef{tip->GetBlockHash(), tip->nHeight};
972     }

vasild commented at 10:54 am on February 12, 2025:

The message of the first commit has a line that is 211 chars. Would be nice to use the 50/72 rule in commit messages.

in src/node/interfaces.cpp:1070 in be15f2e769 outdated

970@@ -971,24 +971,32 @@ class MinerImpl : public Mining
971         return BlockRef{tip->GetBlockHash(), tip->nHeight};

vasild commented at 10:56 am on February 12, 2025:

In commit message of ad3af401c19b8d05ce69011a359db3090b7018e1 rpc: handle shutdown during long poll and wait methods:

0- getblocktemplate,  waitfornewblock
1+ getblocktemplate, waitfornewblock

vasild commented at 10:56 am on February 12, 2025: contributor

Approach ACK be15f2e769987b1df2cf1c8f327c5aafe064dbb3

ryanofsky commented at 2:27 pm on February 12, 2025: contributor

Previously it would return the last known tip during shutdown, but this creates an ambiguous circumstance in the scenario where the node is started and quickly shutdown, before notifications().TipBlock() is set.

New behavior seems good, but I am confused by this description. It looks to me like previous behavior was to segfault in this case, because it would returnTip()->GetBlockHash() and Tip() would be null? It would be helpful if description said more specifically what previous behavior was.

ryanofsky commented at 2:48 pm on February 12, 2025: contributor

Code review be15f2e769987b1df2cf1c8f327c5aafe064dbb3. Mostly looks good but I think vasild’s suggestion #31785 (review) should be used so waitTipChanged can’t return an inconsistent hash and height

Sjors force-pushed on Feb 13, 2025

Sjors commented at 11:15 am on February 13, 2025: member

I changed waitTipChanged() to only return nullopt during shutdown. This is achieved by extending the timeout in the unlikely event that node initialization takes longer.

I introduced a new RPC_SHUTDOWN_ERROR code and use it in case waitfornewblock and friends are called really early and the node shuts down (probably very hard to reach in practice). See #31785 (review)

It would be helpful if description said more specifically what previous behavior was.

Updated the description.

On question about long polling I still need to investigate: #31785 (review)

Sjors force-pushed on Feb 13, 2025

DrahtBot added the label CI failed on Feb 13, 2025

DrahtBot commented at 12:04 pm on February 13, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37158437645

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Sjors force-pushed on Feb 13, 2025

Sjors commented at 12:50 pm on February 13, 2025: member

I pushed a commit to clarify long poll behavior.

in src/rpc/protocol.h:52 in e6760e7f0f outdated

48@@ -49,6 +49,7 @@ enum RPCErrorCode
49     RPC_VERIFY_ALREADY_IN_UTXO_SET  = -27, //!< Transaction already in utxo set
50     RPC_IN_WARMUP                   = -28, //!< Client still warming up
51     RPC_METHOD_DEPRECATED           = -32, //!< RPC method is deprecated
52+    RPC_SHUTDOWN_ERROR              = -37, //!< Node is shutting down

ryanofsky commented at 1:13 pm on February 13, 2025:

In commit “rpc: introduce RPC_SHUTDOWN_ERROR” (e6760e7f0f04c076c3eb8fa61cb54adf12fed36b)

Out of curiosity, where did 37 come from?

Sjors commented at 2:29 pm on February 13, 2025:

First available number.

ryanofsky commented at 3:48 pm on February 13, 2025:

First available number.

Oh, thanks. Didn’t notice the numbers were out of order and later ones were used below.

in src/node/interfaces.cpp:1101 in fa99eddd92 outdated

1007+        Assume(tip_hash);
1008+
1009+        // Must release m_tip_block_mutex before getTip() locks cs_main, to
1010+        // avoid deadlocks.
1011+        return getTip();
1012     }

ryanofsky commented at 2:12 pm on February 13, 2025:

In commit “rpc: handle shutdown during long poll and wait methods” (fa99eddd926e0d49604d9ca8f21fac85d63652f4)

I think this new behavior of only using timeout as time limit for tip changing, not as a time limit for the tip to be set on startup makes a lot of sense and should be much more convenient for callers than previous behavior.

(Previous version in ad3af401c19b8d05ce69011a359db3090b7018e1 would return ambiguous null values on startup that could mean either the node was shutting down or the timeout was reached, and would force callers to immediately make another waitTipChanged call without even giving them any new information in the meantime.)

Only downside of this behavior is that mining API no longer includes a function that can wait with a limited timeout for the tip to be non-null on startup. But if anybody wants this we could add a timeout option to getTip() or a new separate waitForStartup() method accepting a timeout. Or we could not provide anything, since it is pretty easy for callers to implemented their own timeouts if they are using the capnproto client API.

However, the current implementation of this behavior in fa99eddd926e0d49604d9ca8f21fac85d63652f4 looks a bit janky to me, with double nested wait loop, unnecessary 100ms polling, unnecessary tip_hash variable, and an Assume() check out of place and delayed for no reason. Following is untested, but I think would a simpler implementation that cleans all these issues up:

 0    std::optional<BlockRef> waitTipChanged(uint256 current_tip, MillisecondsDouble timeout) override
 1    {
 2        if (timeout > std::chrono::years{100}) timeout = std::chrono::years{100}; // Upper bound to avoid UB in std::chrono
 3        auto deadline{std::chrono::steady_clock::now() + timeout};
 4        {
 5            WAIT_LOCK(notifications().m_tip_block_mutex, lock);
 6            // For callers convenience, wait longer than the provided timeout
 7            // during startup for the tip to be non-null. That way this function
 8            // always returns valid tip information when possible and only
 9            // returns null when shutting down, not when timing out.
10            notifications().m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(notifications().m_tip_block_mutex) {
11                return notifications().TipBlock() || chainman().m_interrupt;
12            });
13            // At this point TipBlock is set, so continue to wait until it is
14            // different then `current_tip` provided by caller.
15            notifications().m_tip_block_cv.wait_until(lock, deadline, [&]() EXCLUSIVE_LOCKS_REQUIRED(notifications().m_tip_block_mutex) {
16                return Assume(notifications().TipBlock()) != current_tip || chainman().m_interrupt;
17            });
18         }
19
20        if (chainman().m_interrupt) return {};
21
22        // Must release m_tip_block_mutex before getTip() locks cs_main, to
23        // avoid deadlocks.
24        return getTip();
25    }

Sjors commented at 2:39 pm on February 13, 2025:

Thanks, that indeed looks simpler. Done in 0c06c8108fb43be317d86199750ecd3b7db9a5d2.

I also think clients should handle timeouts themselves. For bitcoin-cli we have -rpcclienttimeout=0, other RPC clients presumably have their own timeout mechanisms, and IPC clients can do the same.

Sjors force-pushed on Feb 13, 2025

Sjors commented at 2:47 pm on February 13, 2025: member

I mixed up the commits in my last push, will fix…

Sjors force-pushed on Feb 13, 2025

in src/rpc/blockchain.cpp:300 in 0c06c8108f outdated

297-        block = timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
298+        new_block = timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
299+        // Return new block unless the node is shutting down
300+        if (new_block) {
301+            block = *CHECK_NONFATAL(new_block);
302+        }

ryanofsky commented at 4:46 pm on February 13, 2025:

In commit “rpc: handle shutdown during long poll and wait methods” (0c06c8108fb43be317d86199750ecd3b7db9a5d2)

I think this code is not quite right because it is assuming that if getTip() returns null it means the node is shutting down, when in reality it is starting up. Also the IsRPCRunning() call unnecessarily complicates things and is incorrect as we found previously in #30635. Would also be nice to reduce number of variables used in this code and remove CHECK_NONFATAL calls that can’t trigger because they immediately follow null checks. Would suggest:

 0--- a/src/rpc/blockchain.cpp
 1+++ b/src/rpc/blockchain.cpp
 2@@ -287,18 +287,12 @@ static RPCHelpMan waitfornewblock()
 3     NodeContext& node = EnsureAnyNodeContext(request.context);
 4     Mining& miner = EnsureMining(node);
 5 
 6-    std::optional<BlockRef> old_block{miner.getTip()};
 7-    if (!old_block) throw JSONRPCError(RPC_SHUTDOWN_ERROR, "Node is shutting down");
 8-    BlockRef block{*CHECK_NONFATAL(old_block)};
 9-
10-    std::optional<BlockRef> new_block;
11-    if (IsRPCRunning()) {
12-        new_block = timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
13-        // Return new block unless the node is shutting down
14-        if (new_block) {
15-            block = *CHECK_NONFATAL(new_block);
16-        }
17-    }
18+    std::optional<BlockRef> tip{miner.getTip()};
19+    if (!tip) throw JSONRPCError(RPC_IN_WARMUP, "Node is starting up");
20+    BlockRef block{*tip};
21+    tip = timeout > 0 ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
22+    if (!tip) throw JSONRPCError(RPC_SHUTDOWN_ERROR, "Node is shutting down");
23+    block = *tip;
24 
25     UniValue ret(UniValue::VOBJ);
26     ret.pushKV("hash", block.hash.GetHex());

Probably similar simplifications can apply in other methods below.

Sjors commented at 7:01 pm on February 13, 2025:

Good point, getTip() doesn’t wait.

Sjors commented at 7:41 pm on February 13, 2025:

I made a variant of this that’s a bit more in line with #30635; during startup we’ll wait by using the 0 hash.

ryanofsky commented at 5:25 pm on February 18, 2025:

re: #31785 (review)

I made a variant of this that’s a bit more in line with #30635; during startup we’ll wait by using the 0 hash.

This seems ok to me, and I initially considered suggesting that, but it seemed like it expanded the scope of the PR too much and was not related to its original goals.

I think if the PR will do this, it should also include release notes that say these RPCs will now wait on startup instead of returning errors on startup. The RPC method documentation should also be updated to say that timeout option only applies to the specific condition the RPC method is waiting for, and if the RPC method is called during startup it will wait arbitrarily long for the chain state to be ready, ignoring the timeout.

I’d still slightly prefer keeping semantics of the RPC methods unchanged in this PR and just making the more minimal change I suggested above. I think a dedicated PR like #30635 would be a better place to update the RPC interface, if that is what we want to do.

No strong opinion though, and this seems ok too.

Sjors commented at 11:55 am on February 19, 2025:

I went back to the old behavior.

DrahtBot removed the label CI failed on Feb 13, 2025

Sjors force-pushed on Feb 13, 2025

in src/node/interfaces.cpp:1093 in 2042ed1055 outdated

991+            });
992+            // At this point TipBlock is set, so continue to wait until it is
993+            // different then `current_tip` provided by caller.
994+            notifications().m_tip_block_cv.wait_until(lock, deadline, [&]() EXCLUSIVE_LOCKS_REQUIRED(notifications().m_tip_block_mutex) {
995+                return Assume(notifications().TipBlock()) != current_tip || chainman().m_interrupt;
996             });

vasild commented at 11:50 am on February 18, 2025:

The Assume() can be triggered during normal circumstances: if the first wait quits with TipBlock() null and chainman().m_interrupt being true. The easiest way to work around this would be to swap the order in the second condition:

0- Assume(notifications().TipBlock()) != current_tip || chainman().m_interrupt
1+ chainman().m_interrupt || Assume(notifications().TipBlock()) != current_tip

that might be a bit fragile though, wrt future changes.

Sjors commented at 2:44 pm on February 18, 2025:

I flipped it.

ryanofsky commented at 5:00 pm on February 18, 2025:

re: #31785 (review)

Nice catch. I agree flipping condition is a bit fragile. It also makes code less consistent and leaves “At this point TipBlock is set…” comment being not totally accurate. Would suggest a tweak to unflip and be clearer:

 0--- a/src/node/interfaces.cpp
 1+++ b/src/node/interfaces.cpp
 2@@ -984,10 +984,11 @@ public:
 3             notifications().m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(notifications().m_tip_block_mutex) {
 4                 return notifications().TipBlock() || chainman().m_interrupt;
 5             });
 6+            if (chainman().m_interrupt) return {};
 7             // At this point TipBlock is set, so continue to wait until it is
 8             // different then `current_tip` provided by caller.
 9             notifications().m_tip_block_cv.wait_until(lock, deadline, [&]() EXCLUSIVE_LOCKS_REQUIRED(notifications().m_tip_block_mutex) {
10-                return chainman().m_interrupt || Assume(notifications().TipBlock()) != current_tip;
11+                return Assume(notifications().TipBlock()) != current_tip || chainman().m_interrupt;
12             });
13         }
14

Sjors commented at 8:37 am on February 19, 2025:

Good point about the comment, I took your suggestion.

in src/rpc/blockchain.cpp:361 in 2042ed1055 outdated

362 
363     UniValue ret(UniValue::VOBJ);
364-    ret.pushKV("hash", block.hash.GetHex());
365-    ret.pushKV("height", block.height);
366+    ret.pushKV("hash", block->hash.GetHex());
367+    ret.pushKV("height", block->height);

vasild commented at 12:01 pm on February 18, 2025:

This could crash in the following (stupid) scenario:

the user provides 0 as desired block hash to the waitforblock
getTip() returns nullopt, which is assigned to block
the while loop is never entered, not even once so block remains nullopt
at the end we try to dereference the nullopt block :rescue_worker_helmet:

We shouldn’t crash even on such “idiotic” input. Maybe change the while condition:

0- while (tip_hash != hash) {
1+ while (tip_hash != hash || !block.has_value()) {

Sjors commented at 2:25 pm on February 18, 2025:

I guess it could also crash on a non-existing block hash? That’s not unlikely to happen.

Sjors commented at 2:44 pm on February 18, 2025:

Added

in src/rpc/blockchain.cpp:409 in 2042ed1055 outdated

406+    // If getTip() returns null, which can happen during startup, pass 0 to
407+    // waitTipChanged() so it will wait for any tip hash to be set.
408+    uint256 tip_hash{block ? block->hash : uint256{}};
409+    int block_height{block ? block->height : 0};
410+
411+    while (block_height < height) {

vasild commented at 12:04 pm on February 18, 2025:

Same here, if the user provides height=0, and getTip() returns nullopt, then this will be while (0 < 0), will never enter and block will remain nullopt and be dereferenced at the end.

in src/rpc/blockchain.cpp:293 in 2042ed1055 outdated

292-        block = timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
293-    }
294+    std::optional<BlockRef> block{miner.getTip()};
295+    // If getTip() returns null, which can happen during startup, pass 0 to
296+    // waitTipChanged() so it will wait for any tip hash to be set.
297+    uint256 tip_hash{block ? block->hash : uint256{}};

vasild commented at 12:09 pm on February 18, 2025:

nit: maybe use uint256::ZERO, also used in commit b954af72e4d6a83b3ba701b7ee52d97f2af3254b Have createNewBlock() wait for a tip:

0    uint256 tip_hash{block ? block->hash : uint256::ZERO};

vasild commented at 12:15 pm on February 18, 2025: contributor

Approach ACK 2042ed105578a71234295b01be39ad8607722abe

Sjors force-pushed on Feb 18, 2025

Sjors commented at 2:45 pm on February 18, 2025: member

Addressed @vasild’s feedback (and rebased).

in src/rpc/blockchain.cpp:346 in 98282f5611 outdated

343+    // waitTipChanged() so it will wait for any tip hash to be set.
344+    uint256 tip_hash{block ? block->hash : uint256{}};
345+
346     const auto deadline{std::chrono::steady_clock::now() + 1ms * timeout};
347-    while (IsRPCRunning() && block.hash != hash) {
348+    while (tip_hash != hash || !block.has_value()) {

ryanofsky commented at 5:35 pm on February 18, 2025:

In commit “rpc: handle shutdown during long poll and wait methods” (98282f56112af33691a3fe2e1ad7c14febbb0660)

Not important but it would be nice if code style in this PR were a little more consistent. I’d prefer replacing uint256{} with uint256::ZERO and replacing block.has_value() with just block but any consistent style would be better IMO.

Sjors commented at 11:53 am on February 19, 2025:

Agreed, though both disappeared in my last push.

in src/rpc/mining.cpp:821 in 6806651d2f outdated

815@@ -801,16 +816,21 @@ static RPCHelpMan getblocktemplate()
816         LEAVE_CRITICAL_SECTION(cs_main);
817         {
818             MillisecondsDouble checktxtime{std::chrono::minutes(1)};
819-            while (tip == hashWatchedChain && IsRPCRunning()) {
820+            while (IsRPCRunning()) {
821+                // If hashWatchedChain is not be a real block hash, this will
822+                // return immediately.

ryanofsky commented at 6:09 pm on February 18, 2025:

In commit “rpc: clarify longpoll behavior” (6806651d2f64d51094178cbfb23ef2f4a956aa4d)

s/is not be/is not/

ryanofsky approved

ryanofsky commented at 6:19 pm on February 18, 2025: contributor

Code review ACK 6806651d2f64d51094178cbfb23ef2f4a956aa4d

As mentioned in a comment below, I don’t really like how this PR is changing startup behavior of the RPC methods without documenting it, and when it seem like it expands the scope of this PR and makes it harder to understand. Would prefer doing that in #30635 so this PR could be more focused. But current approach does seem ok too.

DrahtBot requested review from vasild on Feb 18, 2025

Sjors commented at 8:20 am on February 19, 2025: member

I think a dedicated PR like #30635 would be a better place to update the RPC interface, if that is what we want to do.

I’ll rebase the latter PR on top of this, and then reduce the scope here.

vasild approved

vasild commented at 8:48 am on February 19, 2025: contributor

ACK 6806651d2f64d51094178cbfb23ef2f4a956aa4d

Would be happy to re-review if suggestions from @ryanofsky are applied, especially #31785 (review)

Note wrt RPC changing of the behavior when waiting at startup for the tip to be set – the current code first sets the tip and then starts the RPC, so this is mostly theoretical or about future changes that might swap the startup order. Thus, IMO, no need for release notes. Edit: I forgot that there is also a change that the RPC will now throw during shutdown. Maybe deserves a release note.

Sjors force-pushed on Feb 19, 2025

Sjors commented at 11:56 am on February 19, 2025: member

the current code first sets the tip and then starts the RPC, so this is mostly theoretical or about future changes that might swap the startup order.

Indeed, that’s what it’s supposed to do.

I ended up simplifying things a bit. The original CHECK_NONFATAL(miner.getTip() has been replaced with JSONRPCError(RPC_IN_WARMUP. So in the unlikely event there’s no tip, the user gets a clear error. I also added an Assume, so we can fix things if it turns out SetRPCWarmupFinished was called too early by our init code.

I restored the original behavior of returning the last tip for both timeout and shutdown. This is possible now because we throw during startup if there’s no tip.

I dropped the use of RPC_SHUTDOWN_ERROR in rpc/mining.cpp (and the new code along with it).

I added a separate commit to drop the IsRPCRunning checks. Since these methods are undocumented and hidden, I don’t think it needs a release note. That can wait for #30635.

Sjors force-pushed on Feb 19, 2025

Sjors commented at 12:15 pm on February 19, 2025: member

I forgot that you can’t use Assume in RPC code. Using NONFATAL_UNREACHABLE instead, though that’s not as clear to the user.

DrahtBot commented at 12:15 pm on February 19, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37463939357

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Feb 19, 2025

Sjors commented at 12:51 pm on February 19, 2025: member

Four CI machines consistently failing rpc_blockchain.py (either --v1transport or --v2transport) at:

0def assert_waitforheight(height, timeout=2):
1   assert_equal(
2      node.waitforblockheight(height=height, timeout=timeout)['height'],
3      current_height)

With “Remote end closed connection without response”, but not much useful info otherwise. Can’t reproduce locally yet.

E.g. https://cirrus-ci.com/task/5674616741429248?logs=ci#L2902

Oh there is a hint:

 0[07:26:06.749] 
 1[07:26:06.749] 
 2 node0 stderr Error: Invalid height value (bip34@-2) for -testactivationheight=name@height. 
 3[07:26:06.749] 
 4 node0 stderr Error: Invalid format () for -testactivationheight=name@height. 
 5[07:26:06.749] 
 6 node0 stderr /usr/include/c++/14/optional:475: constexpr _Tp& std::_Optional_base_impl<_Tp, _Dp>::_M_get() [with _Tp = interfaces::BlockRef; _Dp = std::_Optional_base<interfaces::BlockRef, true, true>]: Assertion 'this->_M_is_engaged()' failed. 
 7[07:26:06.749] 
 8 node0 stderr Error: Invalid name (name@2) for -testactivationheight=name@height. 
 9[07:26:06.749] 
10 node0 stderr : The block database contains a block which appears to be from the future. This may be due to your computer's date and time being set incorrectly. Only rebuild the block database if you are sure that your computer's date and time are correct.
11[07:26:06.749] Please restart with -reindex or -reindex-chainstate to recover.

Or maybe not, since these are just things that are being tested.

in src/rpc/blockchain.cpp:405 in 0a8a9e8a90 outdated

399@@ -387,23 +400,30 @@ static RPCHelpMan waitforblockheight()
400     NodeContext& node = EnsureAnyNodeContext(request.context);
401     Mining& miner = EnsureMining(node);
402 
403-    auto block{CHECK_NONFATAL(miner.getTip()).value()};
404+    std::optional<BlockRef> current_block{miner.getTip()};
405+    // Abort if RPC came out of warmup too early
406+    if (!current_block) NONFATAL_UNREACHABLE();

maflcko commented at 12:52 pm on February 19, 2025:

0     BlockRef current_block{*CHECK_NONFATAL(miner.getTip())};
1    // Abort if RPC came out of warmup too early
2

There is no need for optional, if the value can never be nullopt

Sjors force-pushed on Feb 19, 2025

Sjors commented at 1:07 pm on February 19, 2025: member

Replaced NONFATAL_UNREACHABLE with CHECK_NONFATAL again by avoiding the intermediate std::optional, see #31785 (review)

Rebased and pushed again, though I haven’t figured out the CI issue.

vasild commented at 1:25 pm on February 19, 2025: contributor

Remote end closed connection without response

That probably means bitcoind crashed.

node0 stderr … optional … Assertion ’this->_M_is_engaged()’ failed.

Sjors commented at 1:47 pm on February 19, 2025: member

M_engaged seems to be gcc specific for std::optional: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/optional#L291

It presumably happens when you dereference it in a bad way, but then why doesn’t it crash when compiled with clang (as on my macOS machine)?

Llvm uses the term “engaged” too, but maybe it has slightly different checks? https://github.com/llvm/llvm-project/blob/main/libcxx/include/optional

Sjors force-pushed on Feb 19, 2025

vasild commented at 2:01 pm on February 19, 2025: contributor

I would guess the optional is dereferenced when it is nullopt and this is not compiler related. Maybe it only happens on CI due to some timing specifics? CI failures that don’t repro locally are the most tedious to nail down. Try adding a lot of LogError("reached line 123, relevant values %d %s %whatever", ...) and then inspect the CI log for those.

Sjors commented at 2:02 pm on February 19, 2025: member

Trying .value() instead of *…

https://github.com/bitcoin/bitcoin/compare/51aa29ede7bd9a1897f3cf240e70ef180849591d..3b877e6718a916402adf8dcb141da01064289a6b

I think that’s the only place where something relevant changed since the last successful CI run, see:

0git diff  3a9ab8f  51aa29e src/rpc/blockchain.cpp

https://gist.github.com/Sjors/60c05fb3e9c47ed6d41b78fa66e76ef5

If that works, there’s probably an issue with how CHECK_NONFATAL works.

ryanofsky commented at 2:05 pm on February 19, 2025: contributor

This failure is strange. I tried to reproduce the failure from https://cirrus-ci.com/task/6207546920271872?logs=ci#L2923:

0 node0 stderr /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/optional:479: _Tp &std::_Optional_base_impl<interfaces::BlockRef, std::_Optional_base<interfaces::BlockRef>>::_M_get() [_Tp = interfaces::BlockRef, _Dp = std::_Optional_base<interfaces::BlockRef>]: Assertion 'this->_M_is_engaged()' failed.

using gcc 13.3.0, checking out commit that failed 3872aed9d3cc0c8046272eb5d488ff5ddbe78647 and running the test that failed test/functional/rpc_blockchain.py --v1transport but it seemed to pass. I also didn’t see any places an optional BlockRef is being referenced in the code reviewing 51aa29ede7bd9a1897f3cf240e70ef180849591d.

maflcko commented at 2:21 pm on February 19, 2025: member

CI failures that don’t repro locally are the most tedious to nail down.

It is not possible to reliably reproduce undefined behavior, because it is undefined. My recommendation would be to just run in valgrind to spot any memory errors (and get a stacktrace). Alternatively, you can re-compile with hardened library flags, but that requires a recompilation with different flags and doesn’t give a backtrace either.

Sjors commented at 2:29 pm on February 19, 2025: member

It’s unlikely to be related to starting up the node, because this happens only on the fourth waitforblockheight call in the test, where it’s waiting for current_height + 1. That’s the only time when it actually has to wait. It should hit the timeout after 2 milliseconds.

in src/rpc/blockchain.cpp:347 in 1e078f4e02 outdated

344-    while (block.hash != hash) {
345+    while (current_block.hash != hash) {
346+        std::optional<BlockRef> block;
347         if (timeout) {
348             auto now{std::chrono::steady_clock::now()};
349             if (now >= deadline) break;

Sjors commented at 2:32 pm on February 19, 2025:

This might be a problem.

Sjors force-pushed on Feb 19, 2025

Sjors commented at 3:07 pm on February 19, 2025: member

Perhaps on a slow machine the timeout could go negative here:

0auto now{std::chrono::steady_clock::now()};
1if (now >= deadline) break;
2const MillisecondsDouble remaining{deadline - now};
3block = miner.waitTipChanged(current_block.hash, remaining);

Though that seems unlikely, and I wouldn’t expect it to happen consistently. In any case I added better handling for negative timeout.

Update: no, deadline couldn’t have gone negative

in src/rpc/blockchain.cpp:411 in 505cb484eb outdated

410             auto now{std::chrono::steady_clock::now()};
411-            if (now >= deadline) break;
412             const MillisecondsDouble remaining{deadline - now};
413-            block = miner.waitTipChanged(block.hash, remaining);
414+            if (remaining < 0ms) break;
415+            block = miner.waitTipChanged(block->hash, remaining);

ryanofsky commented at 3:12 pm on February 19, 2025:

I reproduced the CI bug and problem happens on this line. Looks like block->hash should be replaced by current_block.hash here and below

Sjors commented at 3:21 pm on February 19, 2025:

Good catch, that’s also consistent with waitforblock.

Sjors commented at 4:28 pm on February 19, 2025:

Looks like that indeed did the trick.

Sjors force-pushed on Feb 19, 2025

in src/interfaces/mining.h:84 in f036e5091a outdated

79@@ -80,7 +80,9 @@ class Mining
80      *
81      * @param[in] current_tip block hash of the current chain tip. Function waits
82      *                        for the chain tip to differ from this.
83-     * @param[in] timeout     how long to wait for a new tip
84+     * @param[in] timeout     how long to wait for a new tip (default is forever,
85+     *                        negative values are rounded to 0, but should be avoided)

ryanofsky commented at 4:40 pm on February 19, 2025:

In commit “Handle negative timeout for waitTipChanged()” (f036e5091a90b75982684c5b64db1301d3ec2515)

IMO, new handling of negative timeouts seems reasonable to keep, but mentioning them in documentation is distracting. Would be less confusing to just shorten this to “(default is forever)”.

in src/rpc/blockchain.cpp:291 in edfb4ad204 outdated

286@@ -286,12 +287,17 @@ static RPCHelpMan waitfornewblock()
287     NodeContext& node = EnsureAnyNodeContext(request.context);
288     Mining& miner = EnsureMining(node);
289 
290-    auto block{CHECK_NONFATAL(miner.getTip()).value()};
291-    block = timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
292+    // Abort if RPC came out of warmup too early
293+    BlockRef current_block{CHECK_NONFATAL(miner.getTip()).value()};

ryanofsky commented at 4:46 pm on February 19, 2025:

In commit “rpc: handle shutdown during long poll and wait methods” (edfb4ad204e71fc266be55748290340b8791e12f)

This seem inconsistent with the commit message with says “a missing tip at startup now triggers NONFATAL_UNREACHABLE instead of CHECK_NONFATAL.” Is commit message just out of date?

Sjors commented at 6:03 pm on February 19, 2025:

Yes, I’ll update the commit message.

in src/rpc/blockchain.cpp:296 in edfb4ad204 outdated

293+    BlockRef current_block{CHECK_NONFATAL(miner.getTip()).value()};
294+    std::optional<BlockRef> block = timeout ? miner.waitTipChanged(current_block.hash, std::chrono::milliseconds(timeout)) :
295+                                              miner.waitTipChanged(current_block.hash);
296+
297+    // Return current block upon shutdown
298+    if (!block) block = current_block;

ryanofsky commented at 4:55 pm on February 19, 2025:

In commit “rpc: handle shutdown during long poll and wait methods” (edfb4ad204e71fc266be55748290340b8791e12f)

Would be more consistent with other methods (and also clearer IMO) to just return current_block instead of block like:

 0--- a/src/rpc/blockchain.cpp
 1+++ b/src/rpc/blockchain.cpp
 2@@ -293,11 +293,11 @@ static RPCHelpMan waitfornewblock()
 3                                               miner.waitTipChanged(current_block.hash);
 4 
 5     // Return current block upon shutdown
 6-    if (!block) block = current_block;
 7+    if (block) current_block = *block;
 8 
 9     UniValue ret(UniValue::VOBJ);
10-    ret.pushKV("hash", block->hash.GetHex());
11-    ret.pushKV("height", block->height);
12+    ret.pushKV("hash", current_block.hash.GetHex());
13+    ret.pushKV("height", current_block.height);
14     return ret;
15 },
16     };

in src/rpc/blockchain.cpp:410 in edfb4ad204 outdated

409         if (timeout) {
410             auto now{std::chrono::steady_clock::now()};
411-            if (now >= deadline) break;
412             const MillisecondsDouble remaining{deadline - now};
413-            block = miner.waitTipChanged(block.hash, remaining);
414+            if (remaining < 0ms) break;

ryanofsky commented at 5:03 pm on February 19, 2025:

In commit “rpc: handle shutdown during long poll and wait methods” (edfb4ad204e71fc266be55748290340b8791e12f)

This change seems equivalent to previous code, and previous code seems clearer. This change also makes waitforblockheight implementation less consistent with waitforblock. Would suggest:

 0--- a/src/rpc/blockchain.cpp
 1+++ b/src/rpc/blockchain.cpp
 2@@ -406,8 +406,8 @@ static RPCHelpMan waitforblockheight()
 3         std::optional<BlockRef> block;
 4         if (timeout) {
 5             auto now{std::chrono::steady_clock::now()};
 6+            if (now >= deadline) break;
 7             const MillisecondsDouble remaining{deadline - now};
 8-            if (remaining < 0ms) break;
 9             block = miner.waitTipChanged(current_block.hash, remaining);
10         } else {
11             block = miner.waitTipChanged(current_block.hash);

Sjors commented at 6:07 pm on February 19, 2025:

Indeed, I was briefly confused about this, but there’s no need to change it.

in src/interfaces/mining.h:98 in a5b3406489 outdated

 95+     * During node initialization, this will wait until the tip is connected.
 96      *
 97      * @param[in] options options for creating the block
 98-     * @returns a block template
 99+     * @retval BlockTemplate a block template.
100+     * @retval std::nullopt if the node is shut down.

ryanofsky commented at 5:07 pm on February 19, 2025:

In commit “Have createNewBlock() wait for a tip” (a5b34064891814a998cac73c32f60d55a93e6811)

This is actually returning nullptr not nullopt

ryanofsky approved

ryanofsky commented at 5:10 pm on February 19, 2025: contributor

Code review ACK b74b6c95eeba57ee6f627ceae0deeb6b14eadc0c

DrahtBot requested review from vasild on Feb 19, 2025

ryanofsky commented at 5:44 pm on February 19, 2025: contributor

From the PR description:

The first commit allows these three methods to be used in the GUI, by removing the IsRPCRunning(). This simplifies the other changes.

This currently happens in the second commit, not the first commit.

I think in general PR description is a little hard to follow though. Here is how I would describe this PR:

This PR prevents Mining interface methods from sometimes crashing when called during startup before a tip is connected. It also makes other improvements like making more RPC methods usable from the GUI. Specifically this PR:
- Adds an Assume check to disallow passing negative timeout values to Mining::waitTipChanged
- Makes waitfornewblock waitforblock and waitforblockheight RPC methods usable from the GUI when -server=1 is not set.
- Changes Mining::waitTipChanged to return optional<BlockRef> instead of BlockRef and return nullopt instead of crashing if there is a timeout or if the node is shut down before a tip is connected.
- Changes Mining::waitTipChanged to not time out before a tip is connected, so it is convenient and safe to call during startup, and only returns nullopt on early shutdowns.
- Changes Mining::createNewBlock to block and wait for a tip to be connected if it is called on startup instead of crashing. Also documents that it will return null on early shutdowns.

DrahtBot removed the label CI failed on Feb 19, 2025

Sjors commented at 6:13 pm on February 19, 2025: member

Addressed feedback and updated the PR description.

Sjors force-pushed on Feb 19, 2025

vasild approved

vasild commented at 9:13 am on February 20, 2025: contributor

ACK 9725cd6a5c20302da3dcae8fc1156458537ce1df

DrahtBot requested review from ryanofsky on Feb 20, 2025

ryanofsky approved

ryanofsky commented at 12:20 pm on February 20, 2025: contributor

Code review ACK 9725cd6a5c20302da3dcae8fc1156458537ce1df. Just minor suggestions implemented since last review: changing comments and swapping current_block / block in waitfornewblock.

Handle negative timeout for waitTipChanged() f9cf8bd0ab

rpc: drop unneeded IsRPCRunning() guards

This was preventing the (hidden) waitfornewblock, waitforblock and
waitforblockheight methods from being used in the GUI.

The check was added in d6a5dc4a2eaa0d7348804254ca09e75fc3a858ab
when these RPC methods were first introduced.

They could have been dropped when dca923150e3ac10a57c23a7e29e76516d32ec10d
refactored these methods to use waitTipChanged(), which already
checks for shutdown.

Making this change now simplifies the next commit.

a3bf43343f

rpc: handle shutdown during long poll and wait methods

The waitTipChanged() now returns nullopt if the node is shutting down.

Previously it would return the last known tip during shutdown, but
this creates an ambiguous circumstance in the scenario where the
node is started and quickly shutdown, before notifications().TipBlock()
is set.

The getblocktemplate, waitfornewblock and waitforblockheight RPC
are updated to handle this. Existing behavior is preserved.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>

64a2795fd4

Sjors force-pushed on Mar 13, 2025

Sjors commented at 11:16 am on March 13, 2025: member

Rebased after #31283. There was no conflict, but the PR description makes reference to waitNext() so this should make it easier to (re-)review with that context in mind.

ismaelsadeeq commented at 11:58 am on March 20, 2025: member

Concept ACK

In 64a2795fd4fe223a55564c31e9fa36264e79ac22 “rpc: handle shutdown during long poll and wait methods”

is there a use for returning the last known tip?

If not, should we simplify the behavior by always returning nullopt when the tip has not changed or no better block is available?

Or It will be more consistent to return the last known tip if it is set, when this edge case happen handle it by returning nullopt.

Sjors commented at 2:13 pm on March 20, 2025: member

@ismaelsadeeq I think this just preserves existing RPC behavior. What to return during shutdown is a bit arbitrary anyway, because a connected client will have to handle shutdown more generally, potentially by also shutting down.

I also haven’t checked if any of the other tests rely on the existing behavior.

So maybe better for a followup?

vasild approved

vasild commented at 4:56 pm on March 20, 2025: contributor

ACK cc1001f3bf17b31512c05fb359e09483a07fb2a3

DrahtBot requested review from ryanofsky on Mar 20, 2025

DrahtBot requested review from ismaelsadeeq on Mar 20, 2025

in src/interfaces/mining.h:92 in 64a2795fd4 outdated

88@@ -89,15 +89,16 @@ class Mining
89 
90     /**
91      * Waits for the connected tip to change. During node initialization, this will
92-     * wait until the tip is connected.
93+     * wait until the tip is connected (regardless of `timeout`).

ryanofsky commented at 3:29 pm on March 23, 2025:

In commit “Have createNewBlock() wait for a tip” (db14ca3556ca792546bf4343feb733271333690f)

Commit message is a little confusing because it doesn’t mention the timeout change. Would be clearer if it said the commit was changing two things (1) returning null on shutdown instead of last tip (2) ignoring timeout value during startup instead of returning 0 if timeout elapsed before tip was connected

Sjors commented at 8:50 am on March 24, 2025:

I improved the commit message.

ryanofsky commented at 3:45 pm on April 1, 2025:

re: #31785 (review)

I improved the commit message.

Thanks for the update. In case you wind up updating again, maybe consider extending to make a little more readable:

0-- return null on shutdown instead of the last tip
1-- ignore timeout value node initialization
2+Update createNewBlock to:
3+
4+- ignore timeout value during startup and wait for a tip to be connected instead of returning 0
5+- return null on shutdown instead of the last tip to make shutdown easier to detect
6 
7 This allows consumers of BlockTemplate to safely
8 assume that a tip is connected, instead of having
9 to account for startup and early shutdown scenarios.

Sjors commented at 8:37 am on April 2, 2025:

Will do if I need to retouch.

ryanofsky approved

ryanofsky commented at 3:51 pm on March 23, 2025: contributor

Code review ACK cc1001f3bf17b31512c05fb359e09483a07fb2a3. Only change since last review was rebasing after #31283

re: #31785#pullrequestreview-2702397335

is a use for returning the last known tip?

I think the main use for returning the last known tip when the tip hasn’t changed is just to allow the caller distinguish between the case where the tip has not changed from the case where the node is shutting down. If it returned null in both cases we would need to have to add a separate isShuttingDown method to distinguish them and code calling waitTipChanged in a loop would be more complicated.

I don’t think there is a use for returning the last known tip when the known is shutting down, just because caller can assume the tip at that point is the same value they passed as the current_tip argument.

Have createNewBlock() wait for a tip

- return null on shutdown instead of the last tip
- ignore timeout value node initialization

This allows consumers of BlockTemplate to safely
assume that a tip is connected, instead of having
to account for startup and early shutdown scenarios.

5315278e7c

rpc: clarify longpoll behavior

Move the comparison to hashWatchedChain inside the while loop.

Although this early return prevents the GetTransactionsUpdated()
call in cases where the tip updates, it's only done to improve
readability. The check itself is very cheap (although a more
useful check might not be).

Also add code comments.

05117e6e17

Sjors force-pushed on Mar 24, 2025

ryanofsky approved

ryanofsky commented at 3:46 pm on April 1, 2025: contributor

Code review ACK 05117e6e17f9a2d9a18a5b32570808c8907febb3, just updated a commit message since last review

DrahtBot requested review from vasild on Apr 1, 2025

in src/node/interfaces.cpp:1098 in 64a2795fd4 outdated

1101-        LOCK(::cs_main);
1102-        return BlockRef{chainman().ActiveChain().Tip()->GetBlockHash(), chainman().ActiveChain().Tip()->nHeight};
1103+
1104+        if (chainman().m_interrupt) return {};
1105+
1106+        // Must release m_tip_block_mutex before getTip() locks cs_main, to

TheCharlatan commented at 2:19 pm on April 9, 2025:

Nit: This comment seems a bit misplaced here now. Maybe place it further up before calling WaitLock?

Sjors commented at 3:32 pm on April 9, 2025:

Will look into moving it if I need to retouch more.

TheCharlatan approved

TheCharlatan commented at 3:28 pm on April 9, 2025: contributor

ACK 05117e6e17f9a2d9a18a5b32570808c8907febb3

vasild approved

vasild commented at 10:33 am on April 10, 2025: contributor

ACK 05117e6e17f9a2d9a18a5b32570808c8907febb3

achow101 commented at 9:22 pm on April 14, 2025: member

ACK 05117e6e17f9a2d9a18a5b32570808c8907febb3

achow101 assigned achow101 on Apr 14, 2025

achow101 merged this on Apr 14, 2025

achow101 closed this on Apr 14, 2025

Have createNewBlock() wait for tip, make rpc handle shutdown during long poll and wait methods #31785

Code Coverage & Benchmarks

Reviews

Conflicts