Have createNewBlock() wait for tip, make rpc handle shutdown during long poll and wait methods #31785

pull Sjors wants to merge 5 commits into bitcoin:master from Sjors:2025/02/create_new_block changing 4 files +102 −40
  1. Sjors commented at 2:56 pm on February 3, 2025: member

    This PR prevents Mining interface methods from sometimes crashing when called during startup before a tip is connected. It also makes other improvements like making more RPC methods usable from the GUI. Specifically this PR:

    • Adds an Assume check to disallow passing negative timeout values to Mining::waitTipChanged
    • Makes waitfornewblock, waitforblock and waitforblockheight RPC methods usable from the GUI when -server=1 is not set.
    • Changes Mining::waitTipChanged to return optional<BlockRef> instead of BlockRef and return nullopt instead of crashing if there is a timeout or if the node is shut down before a tip is connected.
    • Changes Mining::waitTipChanged to not time out before a tip is connected, so it is convenient and safe to call during startup, and only returns nullopt on early shutdowns.
    • Changes Mining::createNewBlock to block and wait for a tip to be connected if it is called on startup instead of crashing. Also documents that it will return null on early shutdowns.

    This allows the proposed waitNext() method in #31283 to safely assume TipBlock() isn’t null, not even during a scenario of early shutdown.

    Finally this PR clarifies long poll behaviour, mostly by adding code comments, but also through an early break.

  2. DrahtBot commented at 2:57 pm on February 3, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31785.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK vasild, ryanofsky

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31117 (miner: Reorg Testnet4 minimum difficulty blocks by fjahr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. in src/node/interfaces.cpp:995 in af85987a16 outdated
    988@@ -989,6 +989,12 @@ class MinerImpl : public Mining
    989 
    990     std::unique_ptr<BlockTemplate> createNewBlock(const BlockCreateOptions& options) override
    991     {
    992+        {
    993+            // Ensure m_tip_block is set so consumers of BlockTemplate can rely on that.
    994+            WAIT_LOCK(notifications().m_tip_block_mutex, lock);
    995+            if (!notifications().TipBlock()) return nullptr;
    


    l0rinc commented at 3:03 pm on February 3, 2025:
    If it’s not too difficult, can we reproduce this with a test?

    Sjors commented at 3:06 pm on February 3, 2025:
    This depends on the node initialization (and shutdown) sequence. I’m not sure how to reproduce that in a test. And we might change that sequence in a way that this never happens in the first place.
  4. Sjors commented at 3:05 pm on February 3, 2025: member

    The Stratum v2 Template Provider that I implemented first calls mining.waitTipChanged(uint256::ZERO) before calling mining.createNewBlock(). This ensures that no matter how we order the node & IPC startup sequence, it won’t prematurely try to make a block.

    See https://github.com/Sjors/bitcoin/pull/49 (src/sv2/template_provider.cpp in Sv2TemplateProvider::ThreadSv2Handler()).

    But other implementers may not realise this, so having createNewBlock() return nothing when called too early seems like a good precaution.

    See this comment for more details on the init sequence:

    https://github.com/bitcoin/bitcoin/blob/1172bc4157eefe80d1aaf0b56459857ec651e535/src/init.cpp#L1817-L1833

    Also note that currently the IPC server starts listening in AppInitMain before the above wait so the race condition is possible.

    https://github.com/bitcoin/bitcoin/blob/1172bc4157eefe80d1aaf0b56459857ec651e535/src/init.cpp#L1358-L1367

    A good client application might wait for -startupnotify before trying to connect via IPC. StartupNotify() is called at the end AppInitMain. But we shouldn’t assume all clients do.

  5. in src/node/interfaces.cpp:1004 in af85987a16 outdated
    988@@ -989,6 +989,12 @@ class MinerImpl : public Mining
    989 
    990     std::unique_ptr<BlockTemplate> createNewBlock(const BlockCreateOptions& options) override
    


    maflcko commented at 3:21 pm on February 3, 2025:
    Would be good to update the docs, because CreateNewBlock never returns null and always returns a block.

    Sjors commented at 5:17 pm on February 3, 2025:
    The latest push has waitTipChanged and createNewBlock return null if the node shuts down, and documentation is updated to reflect that.
  6. Sjors force-pushed on Feb 3, 2025
  7. Sjors commented at 5:13 pm on February 3, 2025: member

    There were slightly more worms in this can.

    Instead I’ve now changed createNewBlock() to first call waitTipChanged() and the latter to return null during a shutdown. This shifts some work to the RPC, but I think makes the overall behaviour easier to understand.

  8. Sjors renamed this:
    Have createNewBlock() ensure m_tip_block is set
    Have createNewBlock() wait for tip, make rpc handle shutdown during long poll and wait methods
    on Feb 3, 2025
  9. Sjors force-pushed on Feb 3, 2025
  10. in src/interfaces/mining.h:86 in be15f2e769 outdated
    80@@ -81,15 +81,21 @@ class Mining
    81      * @param[in] current_tip block hash of the current chain tip. Function waits
    82      *                        for the chain tip to differ from this.
    83      * @param[in] timeout     how long to wait for a new tip
    84-     * @returns               Hash and height of the current chain tip after this call.
    85+     *
    86+     * @retval BlockRef hash and height of the current chain tip after this call.
    87+     * @retval empty if node is shutting down
    


    vasild commented at 9:41 am on February 12, 2025:
    0     * [@retval](/bitcoin-bitcoin/contributor/retval/) std::nullopt if the given `timeout` passes or the node is shut down before the tip is connected (there is no tip during startup).
    

    ryanofsky commented at 2:44 pm on February 12, 2025:

    re: #31785 (review)

    I don’t think suggested text is accurate because if timeout is exceeded while waiting for a new tip, this returns information about the current tip. I don’t think that behavior should be changed.

    Returning null only when node is shutting down should be the simplest and best behavior.


    vasild commented at 5:35 am on February 13, 2025:

    if timeout is exceeded while waiting for a new tip, this returns information about the current tip

    True. What I meant was that if there is no tip within the timeout. The function does wait_for(... tip_hash && tip_hash != current_tip ...) - tip_hash to be set and to be different than the current. I meant that it could remain unset while the timeout passes. I see how the comment I suggested is misleading. What about:

    0[@retval](/bitcoin-bitcoin/contributor/retval/) std::nullopt if the given `timeout` passes before the tip is
    1connected (there is no tip during startup) or the node is shut down.
    

    ?


    Sjors commented at 10:20 am on February 13, 2025:

    Ah, I think I see the problem. There’s a potential scenario where node startup is unusually slow, longer than timeout.

    I think it’s better in that case to ignore the timeout and instead keep waiting for the node to set a tip.


    vasild commented at 11:05 am on February 13, 2025:

    Yes, this scenario I had in mind.

    in that case to ignore the timeout and instead keep waiting for the node to set a tip

    Hmm. What if the tip is not being connected for whatever reason? Wait forever? The current behavior of respecting the timeout looks safer. Would be an odd user experience to provide a timeout and the program to decide to wait longer for whatever reason.


    Sjors commented at 2:34 pm on February 13, 2025:
    Let’s continue here: #31785 (review)
  11. in src/node/interfaces.cpp:992 in be15f2e769 outdated
    990+        if (!tip_hash || chainman().m_interrupt) return {};
    991+
    992         // Must release m_tip_block_mutex before locking cs_main, to avoid deadlocks.
    993         LOCK(::cs_main);
    994-        return BlockRef{chainman().ActiveChain().Tip()->GetBlockHash(), chainman().ActiveChain().Tip()->nHeight};
    995+        return BlockRef{*Assume(tip_hash), Assume(chainman().ActiveChain().Tip())->nHeight};
    


    vasild commented at 9:50 am on February 12, 2025:

    Just before the lock and return there was a period when both m_tip_block_mutex and cs_main were unlocked, so tip_hash may be stale here. So this could return an inconsistent hash+height - the hash of one block and the height of another. The previous code was ok because it retrieved both the hash and the height from chainman().ActiveChain().Tip() under cs_main. I think this will fix it:

    0-        LOCK(::cs_main);
    1-        return BlockRef{*Assume(tip_hash), Assume(chainman().ActiveChain().Tip())->nHeight};
    2+        return getTip();
    

    Sjors commented at 2:16 pm on February 12, 2025:
    That seems simpler anyway.

    ryanofsky commented at 2:41 pm on February 12, 2025:

    In commit “rpc: handle shutdown during long poll and wait methods” (ad3af401c19b8d05ce69011a359db3090b7018e1)

    Nice suggestion. If this suggestion is applied should also simplify the code above

    0-        if (!tip_hash || chainman().m_interrupt) return {};
    1+        if (chainman().m_interrupt) return {};
    

    since tip_hash is no longer relevant. Can also reduce the scope of the tip_hash variable.

  12. in src/rpc/blockchain.cpp:301 in be15f2e769 outdated
    294+        std::optional<BlockRef> maybe_block{timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash)};
    295+        // Return new block unless the node is shutting down
    296+        if (maybe_block) {
    297+            block = *CHECK_NONFATAL(maybe_block);
    298+        }
    299     }
    


    vasild commented at 10:23 am on February 12, 2025:

    Given that “no tip” is expected here (unlikely but is not a “programming logic error”), maybe avoid CHECK_NONFATAL() and use JSONRPCError:

    0std::optional<BlockRef> old_block{miner.getTip()};
    1std::optional<BlockRef> new_block;
    2if (IsRPCRunning()) {
    3    const uint256 h{old_block.has_value() ? old_block->hash : uint256::ZERO};
    4    new_block = timeout > 0 ? miner.waitTipChanged(h, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(h);
    5}
    6
    7if (!new_block.has_value()) {
    8    throw JSONRPCError(RPC_CLIENT_NOT_CONNECTED, "No tip within timeout or shutting down");
    9}
    

    Sjors commented at 2:21 pm on February 12, 2025:

    Oh, you mean old_block could be nullopt (assuming the RPC is loaded at this point).

    But the suggested error here would change the current behavior - if there’s a timeout we return the old block. I could do an early error return though if the first getTip() doesn’t give us a tip.


    vasild commented at 5:48 am on February 13, 2025:

    I mean to avoid CHECK_NONFATAL() if the tip is null and use JSONRPCError instead.

    If the tip is null at the start, then wait for it to be set, if not set within the timeout, then JSONRPCError.


    Sjors commented at 11:15 am on February 13, 2025:
    I ended up with something similar…
  13. in src/rpc/blockchain.cpp:356 in be15f2e769 outdated
    353+        }
    354+        // Return new block unless the node is shutting down
    355+        if (maybe_block) {
    356+            block = *CHECK_NONFATAL(maybe_block);
    357         }
    358     }
    


    vasild commented at 10:39 am on February 12, 2025:

    Similar here:

     0std::optional<BlockRef> block{miner.getTip()};
     1const auto deadline{std::chrono::steady_clock::now() + 1ms * timeout};
     2while (IsRPCRunning() && (!block.has_value() || block->hash != hash)) {
     3    const uint256 h{block.has_value() ? block->hash : uint256::ZERO};
     4    if (timeout) {
     5        const auto now{std::chrono::steady_clock::now()};
     6        if (now >= deadline) break;
     7        const MillisecondsDouble remaining{deadline - now};
     8        block = miner.waitTipChanged(h, remaining);
     9    } else {
    10        block = miner.waitTipChanged(h);
    11    }
    12}
    13
    14if (!block.has_value()) {
    15    JSONRPCError(RPC_CLIENT_NOT_CONNECTED, "No tip within timeout or shutting down");
    16}
    
  14. in src/rpc/mining.cpp:810 in be15f2e769 outdated
    806+                std::optional<BlockRef> maybe_tip{miner.waitTipChanged(hashWatchedChain, checktxtime)};
    807+                if (!maybe_tip) {
    808+                    throw JSONRPCError(RPC_CLIENT_NOT_CONNECTED, "Shutting down");
    809+                };
    810+                tip = CHECK_NONFATAL(maybe_tip)->hash;
    811                 // Timeout: Check transactions for update
    


    vasild commented at 10:43 am on February 12, 2025:
    Why is comment saying “Timeout: …” when we could be here if the tip changed (no timeout)? Is the comment wrong?

    Sjors commented at 12:46 pm on February 13, 2025:
    I pushed a commit to explain the long poll behaviour. Also added an early return above this point, just for clarity.
  15. in src/rpc/mining.cpp:807 in be15f2e769 outdated
    801@@ -801,7 +802,11 @@ static RPCHelpMan getblocktemplate()
    802         {
    803             MillisecondsDouble checktxtime{std::chrono::minutes(1)};
    804             while (tip == hashWatchedChain && IsRPCRunning()) {
    805-                tip = miner.waitTipChanged(hashWatchedChain, checktxtime).hash;
    806+                std::optional<BlockRef> maybe_tip{miner.waitTipChanged(hashWatchedChain, checktxtime)};
    807+                if (!maybe_tip) {
    808+                    throw JSONRPCError(RPC_CLIENT_NOT_CONNECTED, "Shutting down");
    


    vasild commented at 10:45 am on February 12, 2025:
    Maybe elaborate the message that we could be here if no tip was connected within the timeout: “No tip within timeout or shutting down”.

    Sjors commented at 11:01 am on February 13, 2025:
    I’m introducing a new RPC_SHUTDOWN_ERROR which makes sense here. I don’t think the message itself matters, since the user knows they shut down the node - and the exact details of where and why the RPC call fails aren’t important.
  16. in src/rpc/mining.cpp:826 in be15f2e769 outdated
    801@@ -801,7 +802,11 @@ static RPCHelpMan getblocktemplate()
    802         {
    803             MillisecondsDouble checktxtime{std::chrono::minutes(1)};
    804             while (tip == hashWatchedChain && IsRPCRunning()) {
    805-                tip = miner.waitTipChanged(hashWatchedChain, checktxtime).hash;
    806+                std::optional<BlockRef> maybe_tip{miner.waitTipChanged(hashWatchedChain, checktxtime)};
    807+                if (!maybe_tip) {
    808+                    throw JSONRPCError(RPC_CLIENT_NOT_CONNECTED, "Shutting down");
    809+                };
    810+                tip = CHECK_NONFATAL(maybe_tip)->hash;
    


    vasild commented at 10:48 am on February 12, 2025:

    nit: I think it is redundant to check that maybe_tip has value with CHECK_NONFATAL() just after if (!maybe_tip). That’s like:

    0if (A) {
    1     throw ...
    2}
    3assert(!A);
    

    tip = maybe_tip->hash; seems fine as well.


    Sjors commented at 11:04 am on February 13, 2025:
    Indeed, but on the other hand code changes over time, stuff appears between the if statement and the usage of tip, then someone changes the if statement, and now we have a crash…

    vasild commented at 11:08 am on February 13, 2025:
    Ok
  17. in src/node/interfaces.cpp:1007 in be15f2e769 outdated
     996     }
     997 
     998     std::unique_ptr<BlockTemplate> createNewBlock(const BlockCreateOptions& options) override
     999     {
    1000+        // Ensure m_tip_block is set so consumers of BlockTemplate can rely on that.
    1001+        if (!waitTipChanged(uint256::ZERO, MillisecondsDouble::max())) return {};
    


    vasild commented at 10:50 am on February 12, 2025:

    The default for the timeout argument is MillisecondsDouble::max(), so:

    0        if (!waitTipChanged(uint256::ZERO)) return {};
    

    Sjors commented at 11:06 am on February 13, 2025:
    It seems the compiler doesn’t know about this default inside the implementation file.

    vasild commented at 11:36 am on February 13, 2025:

    TIL https://flylib.com/books/en/2.887.1.53/1/

    Possible solutions I see:

    • drop the default altogether and have all callers use 2 arguments
    • use the “non-virtual interface idiom” as described in the chapter above (it is from the book “Effective C++” by Scott Meyers)
    • use overload instead of default arguments. Have two methods waitTipChanged(hash) and waitTipChanged(hash, timeout) (the first could call the second internally).

    Sjors commented at 1:45 pm on February 13, 2025:

    I don’t think we should drop the default, because MillisecondsDouble::max() is tedious. It seems fine that the interface does it internally, which is simpler than the other two solutions.

    Keep in mind that the interface has a much broader target audience (IPC users), so it should be as simple as possible. I’m less worried about other Bitcoin Core devs that need to change something inside node/interfaces.cpp


    Sjors commented at 1:51 pm on February 13, 2025:

    I could also do this:

     0diff --git a/src/node/interfaces.cpp b/src/node/interfaces.cpp
     1index 66ce64b1c3..b0735d497c 100644
     2--- a/src/node/interfaces.cpp
     3+++ b/src/node/interfaces.cpp
     4@@ -971,7 +971,7 @@ public:
     5         return BlockRef{tip->GetBlockHash(), tip->nHeight};
     6     }
     7 
     8-    std::optional<BlockRef> waitTipChanged(uint256 current_tip, MillisecondsDouble timeout) override
     9+    std::optional<BlockRef> waitTipChanged(uint256 current_tip, MillisecondsDouble timeout = MillisecondsDouble::max()) override
    10     {
    11         if (timeout > std::chrono::years{100}) timeout = std::chrono::years{100}; // Upper bound to avoid UB in std::chrono
    12         std::optional<uint256> tip_hash;
    13@@ -1005,7 +1005,7 @@ public:
    14     std::unique_ptr<BlockTemplate> createNewBlock(const BlockCreateOptions& options) override
    15     {
    16         // Ensure m_tip_block is set so consumers of BlockTemplate can rely on that.
    17-        if (!waitTipChanged(uint256::ZERO, MillisecondsDouble::max())) return {};
    18+        if (!waitTipChanged(uint256::ZERO)) return {};
    19 
    20         BlockAssembler::Options assemble_options{options};
    21         ApplyArgsManOptions(*Assert(m_node.args), assemble_options);
    

    But that doesn’t seem nicer. Also the compiler doesn’t warn if the default in the override is different, so this seems unsafe.


    Sjors commented at 2:29 pm on February 13, 2025:

    My takeaway from the article is that having a default for the base class is fine, and that it’s better to not use a default for the inherited classes (but it could get ignored).

    The patch in my previous comment would be bad. But the current code that calls waitTipChanged(uint256::ZERO, MillisecondsDouble::max()) should be fine.


    ryanofsky commented at 4:28 pm on February 13, 2025:

    In commit “Have createNewBlock() wait for a tip” (854766b886b6c2ee564b867db64bdbce80064b3b)

    Another alternative but probably not better is:

    0-        if (!waitTipChanged(uint256::ZERO, MillisecondsDouble::max())) return {};
    1+        if (!static_cast<Mining*>(this)->waitTipChanged(uint256::ZERO)) return {};
    

    I think in longer term after we add an initial waitNext() method and make improvements to it, it will be natural to move more functionality out of Mining class methods into node/miner.cpp or somewhere like that and these methods will get simpler. The node/interfaces.cpp file is mostly meant to contain glue code and wrap other functions, not implement real functionality itself. Since a lot of code needs to wait, I imagine there will be some new waiting utility function this could call instead of waitTipChanged, but using waitTipChanged for now here should be fine.

  18. in src/node/interfaces.cpp:972 in be15f2e769 outdated
    970@@ -971,24 +971,32 @@ class MinerImpl : public Mining
    971         return BlockRef{tip->GetBlockHash(), tip->nHeight};
    972     }
    


    vasild commented at 10:54 am on February 12, 2025:
    The message of the first commit has a line that is 211 chars. Would be nice to use the 50/72 rule in commit messages.
  19. in src/node/interfaces.cpp:971 in be15f2e769 outdated
    970@@ -971,24 +971,32 @@ class MinerImpl : public Mining
    971         return BlockRef{tip->GetBlockHash(), tip->nHeight};
    


    vasild commented at 10:56 am on February 12, 2025:

    In commit message of ad3af401c19b8d05ce69011a359db3090b7018e1 rpc: handle shutdown during long poll and wait methods:

    0- getblocktemplate,  waitfornewblock
    1+ getblocktemplate, waitfornewblock
    
  20. vasild commented at 10:56 am on February 12, 2025: contributor
    Approach ACK be15f2e769987b1df2cf1c8f327c5aafe064dbb3
  21. ryanofsky commented at 2:27 pm on February 12, 2025: contributor

    Previously it would return the last known tip during shutdown, but this creates an ambiguous circumstance in the scenario where the node is started and quickly shutdown, before notifications().TipBlock() is set.

    New behavior seems good, but I am confused by this description. It looks to me like previous behavior was to segfault in this case, because it would returnTip()->GetBlockHash() and Tip() would be null? It would be helpful if description said more specifically what previous behavior was.

  22. ryanofsky commented at 2:48 pm on February 12, 2025: contributor
    Code review be15f2e769987b1df2cf1c8f327c5aafe064dbb3. Mostly looks good but I think vasild’s suggestion #31785 (review) should be used so waitTipChanged can’t return an inconsistent hash and height
  23. Sjors force-pushed on Feb 13, 2025
  24. Sjors commented at 11:15 am on February 13, 2025: member

    I changed waitTipChanged() to only return nullopt during shutdown. This is achieved by extending the timeout in the unlikely event that node initialization takes longer.

    I introduced a new RPC_SHUTDOWN_ERROR code and use it in case waitfornewblock and friends are called really early and the node shuts down (probably very hard to reach in practice). See #31785 (review)

    It would be helpful if description said more specifically what previous behavior was.

    Updated the description.

    On question about long polling I still need to investigate: #31785 (review)

  25. Sjors force-pushed on Feb 13, 2025
  26. DrahtBot added the label CI failed on Feb 13, 2025
  27. DrahtBot commented at 12:04 pm on February 13, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37158437645

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  28. Sjors force-pushed on Feb 13, 2025
  29. Sjors commented at 12:50 pm on February 13, 2025: member
    I pushed a commit to clarify long poll behavior.
  30. in src/rpc/protocol.h:52 in e6760e7f0f outdated
    48@@ -49,6 +49,7 @@ enum RPCErrorCode
    49     RPC_VERIFY_ALREADY_IN_UTXO_SET  = -27, //!< Transaction already in utxo set
    50     RPC_IN_WARMUP                   = -28, //!< Client still warming up
    51     RPC_METHOD_DEPRECATED           = -32, //!< RPC method is deprecated
    52+    RPC_SHUTDOWN_ERROR              = -37, //!< Node is shutting down
    


    ryanofsky commented at 1:13 pm on February 13, 2025:

    In commit “rpc: introduce RPC_SHUTDOWN_ERROR” (e6760e7f0f04c076c3eb8fa61cb54adf12fed36b)

    Out of curiosity, where did 37 come from?


    Sjors commented at 2:29 pm on February 13, 2025:
    First available number.

    ryanofsky commented at 3:48 pm on February 13, 2025:

    First available number.

    Oh, thanks. Didn’t notice the numbers were out of order and later ones were used below.

  31. in src/node/interfaces.cpp:1002 in fa99eddd92 outdated
    1007+        Assume(tip_hash);
    1008+
    1009+        // Must release m_tip_block_mutex before getTip() locks cs_main, to
    1010+        // avoid deadlocks.
    1011+        return getTip();
    1012     }
    


    ryanofsky commented at 2:12 pm on February 13, 2025:

    In commit “rpc: handle shutdown during long poll and wait methods” (fa99eddd926e0d49604d9ca8f21fac85d63652f4)

    I think this new behavior of only using timeout as time limit for tip changing, not as a time limit for the tip to be set on startup makes a lot of sense and should be much more convenient for callers than previous behavior.

    (Previous version in ad3af401c19b8d05ce69011a359db3090b7018e1 would return ambiguous null values on startup that could mean either the node was shutting down or the timeout was reached, and would force callers to immediately make another waitTipChanged call without even giving them any new information in the meantime.)

    Only downside of this behavior is that mining API no longer includes a function that can wait with a limited timeout for the tip to be non-null on startup. But if anybody wants this we could add a timeout option to getTip() or a new separate waitForStartup() method accepting a timeout. Or we could not provide anything, since it is pretty easy for callers to implemented their own timeouts if they are using the capnproto client API.

    However, the current implementation of this behavior in fa99eddd926e0d49604d9ca8f21fac85d63652f4 looks a bit janky to me, with double nested wait loop, unnecessary 100ms polling, unnecessary tip_hash variable, and an Assume() check out of place and delayed for no reason. Following is untested, but I think would a simpler implementation that cleans all these issues up:

     0    std::optional<BlockRef> waitTipChanged(uint256 current_tip, MillisecondsDouble timeout) override
     1    {
     2        if (timeout > std::chrono::years{100}) timeout = std::chrono::years{100}; // Upper bound to avoid UB in std::chrono
     3        auto deadline{std::chrono::steady_clock::now() + timeout};
     4        {
     5            WAIT_LOCK(notifications().m_tip_block_mutex, lock);
     6            // For callers convenience, wait longer than the provided timeout
     7            // during startup for the tip to be non-null. That way this function
     8            // always returns valid tip information when possible and only
     9            // returns null when shutting down, not when timing out.
    10            notifications().m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(notifications().m_tip_block_mutex) {
    11                return notifications().TipBlock() || chainman().m_interrupt;
    12            });
    13            // At this point TipBlock is set, so continue to wait until it is
    14            // different then `current_tip` provided by caller.
    15            notifications().m_tip_block_cv.wait_until(lock, deadline, [&]() EXCLUSIVE_LOCKS_REQUIRED(notifications().m_tip_block_mutex) {
    16                return Assume(notifications().TipBlock()) != current_tip || chainman().m_interrupt;
    17            });
    18         }
    19
    20        if (chainman().m_interrupt) return {};
    21
    22        // Must release m_tip_block_mutex before getTip() locks cs_main, to
    23        // avoid deadlocks.
    24        return getTip();
    25    }
    

    Sjors commented at 2:39 pm on February 13, 2025:

    Thanks, that indeed looks simpler. Done in 0c06c8108fb43be317d86199750ecd3b7db9a5d2.

    I also think clients should handle timeouts themselves. For bitcoin-cli we have -rpcclienttimeout=0, other RPC clients presumably have their own timeout mechanisms, and IPC clients can do the same.

  32. Sjors force-pushed on Feb 13, 2025
  33. Sjors commented at 2:47 pm on February 13, 2025: member
    I mixed up the commits in my last push, will fix…
  34. Sjors force-pushed on Feb 13, 2025
  35. Sjors force-pushed on Feb 13, 2025
  36. Sjors force-pushed on Feb 13, 2025
  37. in src/rpc/blockchain.cpp:300 in 0c06c8108f outdated
    297-        block = timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
    298+        new_block = timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
    299+        // Return new block unless the node is shutting down
    300+        if (new_block) {
    301+            block = *CHECK_NONFATAL(new_block);
    302+        }
    


    ryanofsky commented at 4:46 pm on February 13, 2025:

    In commit “rpc: handle shutdown during long poll and wait methods” (0c06c8108fb43be317d86199750ecd3b7db9a5d2)

    I think this code is not quite right because it is assuming that if getTip() returns null it means the node is shutting down, when in reality it is starting up. Also the IsRPCRunning() call unnecessarily complicates things and is incorrect as we found previously in #30635. Would also be nice to reduce number of variables used in this code and remove CHECK_NONFATAL calls that can’t trigger because they immediately follow null checks. Would suggest:

     0--- a/src/rpc/blockchain.cpp
     1+++ b/src/rpc/blockchain.cpp
     2@@ -287,18 +287,12 @@ static RPCHelpMan waitfornewblock()
     3     NodeContext& node = EnsureAnyNodeContext(request.context);
     4     Mining& miner = EnsureMining(node);
     5 
     6-    std::optional<BlockRef> old_block{miner.getTip()};
     7-    if (!old_block) throw JSONRPCError(RPC_SHUTDOWN_ERROR, "Node is shutting down");
     8-    BlockRef block{*CHECK_NONFATAL(old_block)};
     9-
    10-    std::optional<BlockRef> new_block;
    11-    if (IsRPCRunning()) {
    12-        new_block = timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
    13-        // Return new block unless the node is shutting down
    14-        if (new_block) {
    15-            block = *CHECK_NONFATAL(new_block);
    16-        }
    17-    }
    18+    std::optional<BlockRef> tip{miner.getTip()};
    19+    if (!tip) throw JSONRPCError(RPC_IN_WARMUP, "Node is starting up");
    20+    BlockRef block{*tip};
    21+    tip = timeout > 0 ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
    22+    if (!tip) throw JSONRPCError(RPC_SHUTDOWN_ERROR, "Node is shutting down");
    23+    block = *tip;
    24 
    25     UniValue ret(UniValue::VOBJ);
    26     ret.pushKV("hash", block.hash.GetHex());
    

    Probably similar simplifications can apply in other methods below.


    Sjors commented at 7:01 pm on February 13, 2025:
    Good point, getTip() doesn’t wait.

    Sjors commented at 7:41 pm on February 13, 2025:
    I made a variant of this that’s a bit more in line with #30635; during startup we’ll wait by using the 0 hash.

    ryanofsky commented at 5:25 pm on February 18, 2025:

    re: #31785 (review)

    I made a variant of this that’s a bit more in line with #30635; during startup we’ll wait by using the 0 hash.

    This seems ok to me, and I initially considered suggesting that, but it seemed like it expanded the scope of the PR too much and was not related to its original goals.

    I think if the PR will do this, it should also include release notes that say these RPCs will now wait on startup instead of returning errors on startup. The RPC method documentation should also be updated to say that timeout option only applies to the specific condition the RPC method is waiting for, and if the RPC method is called during startup it will wait arbitrarily long for the chain state to be ready, ignoring the timeout.

    I’d still slightly prefer keeping semantics of the RPC methods unchanged in this PR and just making the more minimal change I suggested above. I think a dedicated PR like #30635 would be a better place to update the RPC interface, if that is what we want to do.

    No strong opinion though, and this seems ok too.


    Sjors commented at 11:55 am on February 19, 2025:
    I went back to the old behavior.
  38. DrahtBot removed the label CI failed on Feb 13, 2025
  39. Sjors force-pushed on Feb 13, 2025
  40. in src/node/interfaces.cpp:994 in 2042ed1055 outdated
    991+            });
    992+            // At this point TipBlock is set, so continue to wait until it is
    993+            // different then `current_tip` provided by caller.
    994+            notifications().m_tip_block_cv.wait_until(lock, deadline, [&]() EXCLUSIVE_LOCKS_REQUIRED(notifications().m_tip_block_mutex) {
    995+                return Assume(notifications().TipBlock()) != current_tip || chainman().m_interrupt;
    996             });
    


    vasild commented at 11:50 am on February 18, 2025:

    The Assume() can be triggered during normal circumstances: if the first wait quits with TipBlock() null and chainman().m_interrupt being true. The easiest way to work around this would be to swap the order in the second condition:

    0- Assume(notifications().TipBlock()) != current_tip || chainman().m_interrupt
    1+ chainman().m_interrupt || Assume(notifications().TipBlock()) != current_tip
    

    that might be a bit fragile though, wrt future changes.


    Sjors commented at 2:44 pm on February 18, 2025:
    I flipped it.

    ryanofsky commented at 5:00 pm on February 18, 2025:

    re: #31785 (review)

    Nice catch. I agree flipping condition is a bit fragile. It also makes code less consistent and leaves “At this point TipBlock is set…” comment being not totally accurate. Would suggest a tweak to unflip and be clearer:

     0--- a/src/node/interfaces.cpp
     1+++ b/src/node/interfaces.cpp
     2@@ -984,10 +984,11 @@ public:
     3             notifications().m_tip_block_cv.wait(lock, [&]() EXCLUSIVE_LOCKS_REQUIRED(notifications().m_tip_block_mutex) {
     4                 return notifications().TipBlock() || chainman().m_interrupt;
     5             });
     6+            if (chainman().m_interrupt) return {};
     7             // At this point TipBlock is set, so continue to wait until it is
     8             // different then `current_tip` provided by caller.
     9             notifications().m_tip_block_cv.wait_until(lock, deadline, [&]() EXCLUSIVE_LOCKS_REQUIRED(notifications().m_tip_block_mutex) {
    10-                return chainman().m_interrupt || Assume(notifications().TipBlock()) != current_tip;
    11+                return Assume(notifications().TipBlock()) != current_tip || chainman().m_interrupt;
    12             });
    13         }
    14 
    

    Sjors commented at 8:37 am on February 19, 2025:
    Good point about the comment, I took your suggestion.
  41. in src/rpc/blockchain.cpp:361 in 2042ed1055 outdated
    362 
    363     UniValue ret(UniValue::VOBJ);
    364-    ret.pushKV("hash", block.hash.GetHex());
    365-    ret.pushKV("height", block.height);
    366+    ret.pushKV("hash", block->hash.GetHex());
    367+    ret.pushKV("height", block->height);
    


    vasild commented at 12:01 pm on February 18, 2025:

    This could crash in the following (stupid) scenario:

    • the user provides 0 as desired block hash to the waitforblock
    • getTip() returns nullopt, which is assigned to block
    • the while loop is never entered, not even once so block remains nullopt
    • at the end we try to dereference the nullopt block :rescue_worker_helmet:

    We shouldn’t crash even on such “idiotic” input. Maybe change the while condition:

    0- while (tip_hash != hash) {
    1+ while (tip_hash != hash || !block.has_value()) {
    

    Sjors commented at 2:25 pm on February 18, 2025:
    I guess it could also crash on a non-existing block hash? That’s not unlikely to happen.

    Sjors commented at 2:44 pm on February 18, 2025:
    Added
  42. in src/rpc/blockchain.cpp:409 in 2042ed1055 outdated
    406+    // If getTip() returns null, which can happen during startup, pass 0 to
    407+    // waitTipChanged() so it will wait for any tip hash to be set.
    408+    uint256 tip_hash{block ? block->hash : uint256{}};
    409+    int block_height{block ? block->height : 0};
    410+
    411+    while (block_height < height) {
    


    vasild commented at 12:04 pm on February 18, 2025:
    Same here, if the user provides height=0, and getTip() returns nullopt, then this will be while (0 < 0), will never enter and block will remain nullopt and be dereferenced at the end.
  43. in src/rpc/blockchain.cpp:293 in 2042ed1055 outdated
    292-        block = timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
    293-    }
    294+    std::optional<BlockRef> block{miner.getTip()};
    295+    // If getTip() returns null, which can happen during startup, pass 0 to
    296+    // waitTipChanged() so it will wait for any tip hash to be set.
    297+    uint256 tip_hash{block ? block->hash : uint256{}};
    


    vasild commented at 12:09 pm on February 18, 2025:

    nit: maybe use uint256::ZERO, also used in commit b954af72e4d6a83b3ba701b7ee52d97f2af3254b Have createNewBlock() wait for a tip:

    0    uint256 tip_hash{block ? block->hash : uint256::ZERO};
    
  44. vasild commented at 12:15 pm on February 18, 2025: contributor
    Approach ACK 2042ed105578a71234295b01be39ad8607722abe
  45. Sjors force-pushed on Feb 18, 2025
  46. Sjors commented at 2:45 pm on February 18, 2025: member
    Addressed @vasild’s feedback (and rebased).
  47. in src/rpc/blockchain.cpp:346 in 98282f5611 outdated
    343+    // waitTipChanged() so it will wait for any tip hash to be set.
    344+    uint256 tip_hash{block ? block->hash : uint256{}};
    345+
    346     const auto deadline{std::chrono::steady_clock::now() + 1ms * timeout};
    347-    while (IsRPCRunning() && block.hash != hash) {
    348+    while (tip_hash != hash || !block.has_value()) {
    


    ryanofsky commented at 5:35 pm on February 18, 2025:

    In commit “rpc: handle shutdown during long poll and wait methods” (98282f56112af33691a3fe2e1ad7c14febbb0660)

    Not important but it would be nice if code style in this PR were a little more consistent. I’d prefer replacing uint256{} with uint256::ZERO and replacing block.has_value() with just block but any consistent style would be better IMO.


    Sjors commented at 11:53 am on February 19, 2025:
    Agreed, though both disappeared in my last push.
  48. in src/rpc/mining.cpp:821 in 6806651d2f outdated
    815@@ -801,16 +816,21 @@ static RPCHelpMan getblocktemplate()
    816         LEAVE_CRITICAL_SECTION(cs_main);
    817         {
    818             MillisecondsDouble checktxtime{std::chrono::minutes(1)};
    819-            while (tip == hashWatchedChain && IsRPCRunning()) {
    820+            while (IsRPCRunning()) {
    821+                // If hashWatchedChain is not be a real block hash, this will
    822+                // return immediately.
    


    ryanofsky commented at 6:09 pm on February 18, 2025:

    In commit “rpc: clarify longpoll behavior” (6806651d2f64d51094178cbfb23ef2f4a956aa4d)

    s/is not be/is not/

  49. ryanofsky approved
  50. ryanofsky commented at 6:19 pm on February 18, 2025: contributor

    Code review ACK 6806651d2f64d51094178cbfb23ef2f4a956aa4d

    As mentioned in a comment below, I don’t really like how this PR is changing startup behavior of the RPC methods without documenting it, and when it seem like it expands the scope of this PR and makes it harder to understand. Would prefer doing that in #30635 so this PR could be more focused. But current approach does seem ok too.

  51. DrahtBot requested review from vasild on Feb 18, 2025
  52. Sjors commented at 8:20 am on February 19, 2025: member

    I think a dedicated PR like #30635 would be a better place to update the RPC interface, if that is what we want to do.

    I’ll rebase the latter PR on top of this, and then reduce the scope here.

  53. vasild approved
  54. vasild commented at 8:48 am on February 19, 2025: contributor

    ACK 6806651d2f64d51094178cbfb23ef2f4a956aa4d

    Would be happy to re-review if suggestions from @ryanofsky are applied, especially #31785 (review)

    Note wrt RPC changing of the behavior when waiting at startup for the tip to be set – the current code first sets the tip and then starts the RPC, so this is mostly theoretical or about future changes that might swap the startup order. Thus, IMO, no need for release notes. Edit: I forgot that there is also a change that the RPC will now throw during shutdown. Maybe deserves a release note.

  55. Sjors force-pushed on Feb 19, 2025
  56. Sjors commented at 11:56 am on February 19, 2025: member

    the current code first sets the tip and then starts the RPC, so this is mostly theoretical or about future changes that might swap the startup order.

    Indeed, that’s what it’s supposed to do.

    I ended up simplifying things a bit. The original CHECK_NONFATAL(miner.getTip() has been replaced with JSONRPCError(RPC_IN_WARMUP. So in the unlikely event there’s no tip, the user gets a clear error. I also added an Assume, so we can fix things if it turns out SetRPCWarmupFinished was called too early by our init code.

    I restored the original behavior of returning the last tip for both timeout and shutdown. This is possible now because we throw during startup if there’s no tip.

    I dropped the use of RPC_SHUTDOWN_ERROR in rpc/mining.cpp (and the new code along with it).

    I added a separate commit to drop the IsRPCRunning checks. Since these methods are undocumented and hidden, I don’t think it needs a release note. That can wait for #30635.

  57. Sjors force-pushed on Feb 19, 2025
  58. Sjors commented at 12:15 pm on February 19, 2025: member
    I forgot that you can’t use Assume in RPC code. Using NONFATAL_UNREACHABLE instead, though that’s not as clear to the user.
  59. DrahtBot commented at 12:15 pm on February 19, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37463939357

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  60. DrahtBot added the label CI failed on Feb 19, 2025
  61. Sjors commented at 12:51 pm on February 19, 2025: member

    Four CI machines consistently failing rpc_blockchain.py (either --v1transport or --v2transport) at:

    0def assert_waitforheight(height, timeout=2):
    1   assert_equal(
    2      node.waitforblockheight(height=height, timeout=timeout)['height'],
    3      current_height)
    

    With “Remote end closed connection without response”, but not much useful info otherwise. Can’t reproduce locally yet.

    E.g. https://cirrus-ci.com/task/5674616741429248?logs=ci#L2902


    Oh there is a hint:

     0[07:26:06.749] 
     1[07:26:06.749] 
     2 node0 stderr Error: Invalid height value (bip34@-2) for -testactivationheight=name@height. 
     3[07:26:06.749] 
     4 node0 stderr Error: Invalid format () for -testactivationheight=name@height. 
     5[07:26:06.749] 
     6 node0 stderr /usr/include/c++/14/optional:475: constexpr _Tp& std::_Optional_base_impl<_Tp, _Dp>::_M_get() [with _Tp = interfaces::BlockRef; _Dp = std::_Optional_base<interfaces::BlockRef, true, true>]: Assertion 'this->_M_is_engaged()' failed. 
     7[07:26:06.749] 
     8 node0 stderr Error: Invalid name (name@2) for -testactivationheight=name@height. 
     9[07:26:06.749] 
    10 node0 stderr : The block database contains a block which appears to be from the future. This may be due to your computer's date and time being set incorrectly. Only rebuild the block database if you are sure that your computer's date and time are correct.
    11[07:26:06.749] Please restart with -reindex or -reindex-chainstate to recover. 
    

    Or maybe not, since these are just things that are being tested.

  62. in src/rpc/blockchain.cpp:405 in 0a8a9e8a90 outdated
    399@@ -387,23 +400,30 @@ static RPCHelpMan waitforblockheight()
    400     NodeContext& node = EnsureAnyNodeContext(request.context);
    401     Mining& miner = EnsureMining(node);
    402 
    403-    auto block{CHECK_NONFATAL(miner.getTip()).value()};
    404+    std::optional<BlockRef> current_block{miner.getTip()};
    405+    // Abort if RPC came out of warmup too early
    406+    if (!current_block) NONFATAL_UNREACHABLE();
    


    maflcko commented at 12:52 pm on February 19, 2025:
    0     BlockRef current_block{*CHECK_NONFATAL(miner.getTip())};
    1    // Abort if RPC came out of warmup too early
    2 
    

    There is no need for optional, if the value can never be nullopt

  63. Sjors force-pushed on Feb 19, 2025
  64. Sjors commented at 1:07 pm on February 19, 2025: member

    Replaced NONFATAL_UNREACHABLE with CHECK_NONFATAL again by avoiding the intermediate std::optional, see #31785 (review)

    Rebased and pushed again, though I haven’t figured out the CI issue.

  65. vasild commented at 1:25 pm on February 19, 2025: contributor

    Remote end closed connection without response

    That probably means bitcoind crashed.

    node0 stderr … optional … Assertion ’this->_M_is_engaged()’ failed.

  66. Sjors commented at 1:47 pm on February 19, 2025: member

    M_engaged seems to be gcc specific for std::optional: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/optional#L291

    It presumably happens when you dereference it in a bad way, but then why doesn’t it crash when compiled with clang (as on my macOS machine)?

    Llvm uses the term “engaged” too, but maybe it has slightly different checks? https://github.com/llvm/llvm-project/blob/main/libcxx/include/optional

  67. Sjors force-pushed on Feb 19, 2025
  68. vasild commented at 2:01 pm on February 19, 2025: contributor
    I would guess the optional is dereferenced when it is nullopt and this is not compiler related. Maybe it only happens on CI due to some timing specifics? CI failures that don’t repro locally are the most tedious to nail down. Try adding a lot of LogError("reached line 123, relevant values %d %s %whatever", ...) and then inspect the CI log for those.
  69. Sjors commented at 2:02 pm on February 19, 2025: member

    Trying .value() instead of *

    https://github.com/bitcoin/bitcoin/compare/51aa29ede7bd9a1897f3cf240e70ef180849591d..3b877e6718a916402adf8dcb141da01064289a6b

    I think that’s the only place where something relevant changed since the last successful CI run, see:

    0git diff  3a9ab8f  51aa29e src/rpc/blockchain.cpp
    

    https://gist.github.com/Sjors/60c05fb3e9c47ed6d41b78fa66e76ef5

    If that works, there’s probably an issue with how CHECK_NONFATAL works.

  70. ryanofsky commented at 2:05 pm on February 19, 2025: contributor

    This failure is strange. I tried to reproduce the failure from https://cirrus-ci.com/task/6207546920271872?logs=ci#L2923:

    0 node0 stderr /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/optional:479: _Tp &std::_Optional_base_impl<interfaces::BlockRef, std::_Optional_base<interfaces::BlockRef>>::_M_get() [_Tp = interfaces::BlockRef, _Dp = std::_Optional_base<interfaces::BlockRef>]: Assertion 'this->_M_is_engaged()' failed. 
    

    using gcc 13.3.0, checking out commit that failed 3872aed9d3cc0c8046272eb5d488ff5ddbe78647 and running the test that failed test/functional/rpc_blockchain.py --v1transport but it seemed to pass. I also didn’t see any places an optional BlockRef is being referenced in the code reviewing 51aa29ede7bd9a1897f3cf240e70ef180849591d.

  71. maflcko commented at 2:21 pm on February 19, 2025: member

    CI failures that don’t repro locally are the most tedious to nail down.

    It is not possible to reliably reproduce undefined behavior, because it is undefined. My recommendation would be to just run in valgrind to spot any memory errors (and get a stacktrace). Alternatively, you can re-compile with hardened library flags, but that requires a recompilation with different flags and doesn’t give a backtrace either.

  72. Sjors commented at 2:29 pm on February 19, 2025: member
    It’s unlikely to be related to starting up the node, because this happens only on the fourth waitforblockheight call in the test, where it’s waiting for current_height + 1. That’s the only time when it actually has to wait. It should hit the timeout after 2 milliseconds.
  73. in src/rpc/blockchain.cpp:347 in 1e078f4e02 outdated
    344-    while (block.hash != hash) {
    345+    while (current_block.hash != hash) {
    346+        std::optional<BlockRef> block;
    347         if (timeout) {
    348             auto now{std::chrono::steady_clock::now()};
    349             if (now >= deadline) break;
    


    Sjors commented at 2:32 pm on February 19, 2025:
    This might be a problem.
  74. Sjors force-pushed on Feb 19, 2025
  75. Sjors commented at 3:07 pm on February 19, 2025: member

    Perhaps on a slow machine the timeout could go negative here:

    0auto now{std::chrono::steady_clock::now()};
    1if (now >= deadline) break;
    2const MillisecondsDouble remaining{deadline - now};
    3block = miner.waitTipChanged(current_block.hash, remaining);
    

    Though that seems unlikely, and I wouldn’t expect it to happen consistently. In any case I added better handling for negative timeout.


    Update: no, deadline couldn’t have gone negative

  76. in src/rpc/blockchain.cpp:411 in 505cb484eb outdated
    410             auto now{std::chrono::steady_clock::now()};
    411-            if (now >= deadline) break;
    412             const MillisecondsDouble remaining{deadline - now};
    413-            block = miner.waitTipChanged(block.hash, remaining);
    414+            if (remaining < 0ms) break;
    415+            block = miner.waitTipChanged(block->hash, remaining);
    


    ryanofsky commented at 3:12 pm on February 19, 2025:
    I reproduced the CI bug and problem happens on this line. Looks like block->hash should be replaced by current_block.hash here and below

    Sjors commented at 3:21 pm on February 19, 2025:
    Good catch, that’s also consistent with waitforblock.

    Sjors commented at 4:28 pm on February 19, 2025:
    Looks like that indeed did the trick.
  77. Sjors force-pushed on Feb 19, 2025
  78. Sjors force-pushed on Feb 19, 2025
  79. in src/interfaces/mining.h:84 in f036e5091a outdated
    79@@ -80,7 +80,9 @@ class Mining
    80      *
    81      * @param[in] current_tip block hash of the current chain tip. Function waits
    82      *                        for the chain tip to differ from this.
    83-     * @param[in] timeout     how long to wait for a new tip
    84+     * @param[in] timeout     how long to wait for a new tip (default is forever,
    85+     *                        negative values are rounded to 0, but should be avoided)
    


    ryanofsky commented at 4:40 pm on February 19, 2025:

    In commit “Handle negative timeout for waitTipChanged()” (f036e5091a90b75982684c5b64db1301d3ec2515)

    IMO, new handling of negative timeouts seems reasonable to keep, but mentioning them in documentation is distracting. Would be less confusing to just shorten this to “(default is forever)”.

  80. in src/rpc/blockchain.cpp:291 in edfb4ad204 outdated
    286@@ -286,12 +287,17 @@ static RPCHelpMan waitfornewblock()
    287     NodeContext& node = EnsureAnyNodeContext(request.context);
    288     Mining& miner = EnsureMining(node);
    289 
    290-    auto block{CHECK_NONFATAL(miner.getTip()).value()};
    291-    block = timeout ? miner.waitTipChanged(block.hash, std::chrono::milliseconds(timeout)) : miner.waitTipChanged(block.hash);
    292+    // Abort if RPC came out of warmup too early
    293+    BlockRef current_block{CHECK_NONFATAL(miner.getTip()).value()};
    


    ryanofsky commented at 4:46 pm on February 19, 2025:

    In commit “rpc: handle shutdown during long poll and wait methods” (edfb4ad204e71fc266be55748290340b8791e12f)

    This seem inconsistent with the commit message with says “a missing tip at startup now triggers NONFATAL_UNREACHABLE instead of CHECK_NONFATAL.” Is commit message just out of date?


    Sjors commented at 6:03 pm on February 19, 2025:
    Yes, I’ll update the commit message.
  81. in src/rpc/blockchain.cpp:296 in edfb4ad204 outdated
    293+    BlockRef current_block{CHECK_NONFATAL(miner.getTip()).value()};
    294+    std::optional<BlockRef> block = timeout ? miner.waitTipChanged(current_block.hash, std::chrono::milliseconds(timeout)) :
    295+                                              miner.waitTipChanged(current_block.hash);
    296+
    297+    // Return current block upon shutdown
    298+    if (!block) block = current_block;
    


    ryanofsky commented at 4:55 pm on February 19, 2025:

    In commit “rpc: handle shutdown during long poll and wait methods” (edfb4ad204e71fc266be55748290340b8791e12f)

    Would be more consistent with other methods (and also clearer IMO) to just return current_block instead of block like:

     0--- a/src/rpc/blockchain.cpp
     1+++ b/src/rpc/blockchain.cpp
     2@@ -293,11 +293,11 @@ static RPCHelpMan waitfornewblock()
     3                                               miner.waitTipChanged(current_block.hash);
     4 
     5     // Return current block upon shutdown
     6-    if (!block) block = current_block;
     7+    if (block) current_block = *block;
     8 
     9     UniValue ret(UniValue::VOBJ);
    10-    ret.pushKV("hash", block->hash.GetHex());
    11-    ret.pushKV("height", block->height);
    12+    ret.pushKV("hash", current_block.hash.GetHex());
    13+    ret.pushKV("height", current_block.height);
    14     return ret;
    15 },
    16     };
    
  82. in src/rpc/blockchain.cpp:410 in edfb4ad204 outdated
    409         if (timeout) {
    410             auto now{std::chrono::steady_clock::now()};
    411-            if (now >= deadline) break;
    412             const MillisecondsDouble remaining{deadline - now};
    413-            block = miner.waitTipChanged(block.hash, remaining);
    414+            if (remaining < 0ms) break;
    


    ryanofsky commented at 5:03 pm on February 19, 2025:

    In commit “rpc: handle shutdown during long poll and wait methods” (edfb4ad204e71fc266be55748290340b8791e12f)

    This change seems equivalent to previous code, and previous code seems clearer. This change also makes waitforblockheight implementation less consistent with waitforblock. Would suggest:

     0--- a/src/rpc/blockchain.cpp
     1+++ b/src/rpc/blockchain.cpp
     2@@ -406,8 +406,8 @@ static RPCHelpMan waitforblockheight()
     3         std::optional<BlockRef> block;
     4         if (timeout) {
     5             auto now{std::chrono::steady_clock::now()};
     6+            if (now >= deadline) break;
     7             const MillisecondsDouble remaining{deadline - now};
     8-            if (remaining < 0ms) break;
     9             block = miner.waitTipChanged(current_block.hash, remaining);
    10         } else {
    11             block = miner.waitTipChanged(current_block.hash);
    

    Sjors commented at 6:07 pm on February 19, 2025:
    Indeed, I was briefly confused about this, but there’s no need to change it.
  83. in src/interfaces/mining.h:98 in a5b3406489 outdated
     95+     * During node initialization, this will wait until the tip is connected.
     96      *
     97      * @param[in] options options for creating the block
     98-     * @returns a block template
     99+     * @retval BlockTemplate a block template.
    100+     * @retval std::nullopt if the node is shut down.
    


    ryanofsky commented at 5:07 pm on February 19, 2025:

    In commit “Have createNewBlock() wait for a tip” (a5b34064891814a998cac73c32f60d55a93e6811)

    This is actually returning nullptr not nullopt

  84. ryanofsky approved
  85. ryanofsky commented at 5:10 pm on February 19, 2025: contributor
    Code review ACK b74b6c95eeba57ee6f627ceae0deeb6b14eadc0c
  86. DrahtBot requested review from vasild on Feb 19, 2025
  87. ryanofsky commented at 5:44 pm on February 19, 2025: contributor

    From the PR description:

    The first commit allows these three methods to be used in the GUI, by removing the IsRPCRunning(). This simplifies the other changes.

    This currently happens in the second commit, not the first commit.

    I think in general PR description is a little hard to follow though. Here is how I would describe this PR:

    • This PR prevents Mining interface methods from sometimes crashing when called during startup before a tip is connected. It also makes other improvements like making more RPC methods usable from the GUI. Specifically this PR:
      • Adds an Assume check to disallow passing negative timeout values to Mining::waitTipChanged
      • Makes waitfornewblock waitforblock and waitforblockheight RPC methods usable from the GUI when -server=1 is not set.
      • Changes Mining::waitTipChanged to return optional<BlockRef> instead of BlockRef and return nullopt instead of crashing if there is a timeout or if the node is shut down before a tip is connected.
      • Changes Mining::waitTipChanged to not time out before a tip is connected, so it is convenient and safe to call during startup, and only returns nullopt on early shutdowns.
      • Changes Mining::createNewBlock to block and wait for a tip to be connected if it is called on startup instead of crashing. Also documents that it will return null on early shutdowns.
  88. DrahtBot removed the label CI failed on Feb 19, 2025
  89. Handle negative timeout for waitTipChanged() aff5a9f5be
  90. rpc: drop unneeded IsRPCRunning() guards
    This was preventing the (hidden) waitfornewblock, waitforblock and
    waitforblockheight methods from being used in the GUI.
    
    The check was added in d6a5dc4a2eaa0d7348804254ca09e75fc3a858ab
    when these RPC methods were first introduced.
    
    They could have been dropped when dca923150e3ac10a57c23a7e29e76516d32ec10d
    refactored these methods to use waitTipChanged(), which already
    checks for shutdown.
    
    Making this change now simplifies the next commit.
    8844bd5b3a
  91. rpc: handle shutdown during long poll and wait methods
    The waitTipChanged() now returns nullopt if the node is shutting down.
    
    Previously it would return the last known tip during shutdown, but
    this creates an ambiguous circumstance in the scenario where the
    node is started and quickly shutdown, before notifications().TipBlock()
    is set.
    
    The getblocktemplate, waitfornewblock and waitforblockheight RPC
    are updated to handle this. Existing behavior is preserved.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    353a41cbad
  92. Have createNewBlock() wait for a tip
    Additionally it returns null if the node started to shutdown before TipBlock() was set.
    56fac7c8d8
  93. rpc: clarify longpoll behavior
    Move the comparison to hashWatchedChain inside the while loop.
    
    Although this early return prevents the GetTransactionsUpdated()
    call in cases where the tip updates, it's only done to improve
    readability. The check itself is very cheap (although a more
    useful check might not be).
    
    Also add code comments.
    9725cd6a5c
  94. Sjors commented at 6:13 pm on February 19, 2025: member
    Addressed feedback and updated the PR description.
  95. Sjors force-pushed on Feb 19, 2025
  96. vasild approved
  97. vasild commented at 9:13 am on February 20, 2025: contributor
    ACK 9725cd6a5c20302da3dcae8fc1156458537ce1df
  98. DrahtBot requested review from ryanofsky on Feb 20, 2025
  99. ryanofsky approved
  100. ryanofsky commented at 12:20 pm on February 20, 2025: contributor
    Code review ACK 9725cd6a5c20302da3dcae8fc1156458537ce1df. Just minor suggestions implemented since last review: changing comments and swapping current_block / block in waitfornewblock.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-10 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me