Mining IPC createNewBlock should not return before IBD is over #33994

plebhash opened this issue on December 2, 2025
  1. plebhash commented at 12:10 pm on December 2, 2025: none

    Please describe the feature you’d like to see added.

    Mining IPC createNewBlock should not return before IBD is over

    In the SRI bitcoin_core_sv2 crate, we want to wait for IBD to be over as part of the bootstrapping process

    discussing with @ismaelsadeeq and @Shourya742 during BTrust dev day, we came to the conclusion that createNewBlock shouldn’t even return before IBD is over

    Describe the solution you’d like

    Mining IPC createNewBlock should not return before IBD is over

    Describe any alternatives you’ve considered

    isInitialBlockDownload is already available, but it would require us to poll it, which feels a bit backwards with capnproto

    Please leave any additional context

    we’re getting reports of crashes when sv2-tp is connected to v30 during IBD

    https://discord.com/channels/950687892169195530/1179824984592490496/1445351230258937867

  2. plebhash added the label Feature on Dec 2, 2025
  3. ismaelsadeeq commented at 1:47 pm on December 2, 2025: member
    @plebhash It would be nice to get to the root of the issue and see whether it's Bitcoin Core or the template provider. Can you please provide the logs and setup in a text file rather than a Discord link? I think some people might not have Discord. cc @Sjors

    I attempted to reproduce the failure but was not successful yet. Blocking createNewBlock from creating a template during IBD is a potential solution, and it would also delegate the responsibility of polling isInitialBlockDownload to the node, which is more ideal IMO. We could just make the thread sleep, wake up when we are done with IBD, and return an empty block. Curious to hear what others think.
  4. Sjors commented at 2:37 pm on December 2, 2025: member

    The problem is that ChainstateManager::IsInitialBlockDownload() uses a fairly relaxed heuristic:

    https://github.com/bitcoin/bitcoin/blob/d0f6d9953a15d7c7111d46dcb76ab2bb18e5dee3/src/validation.cpp#L1998-L2023

    So it latches to true when there's still a day's worth of blocks left to process.
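
    The linked heuristic boils down to roughly the following (a simplified sketch, not the real code: `TipInfo` and `IsInitialBlockDownloadSketch` are illustrative names, and the 24-hour constant corresponds to Bitcoin Core's default max tip age):

```cpp
#include <cassert>
#include <chrono>

// Simplified view of the checks in ChainstateManager::IsInitialBlockDownload().
// Field names are illustrative; the real function reads chain state directly.
struct TipInfo {
    bool loading_blocks;          // still reindexing / importing block files
    bool has_tip;                 // an active chain tip exists at all
    bool enough_chain_work;       // tip work >= nMinimumChainWork
    std::chrono::seconds tip_age; // now - tip block time
};

// Default max tip age is 24h, which is why IBD only ends once the tip
// is less than about a day old.
constexpr std::chrono::seconds MAX_TIP_AGE{24 * 60 * 60};

bool IsInitialBlockDownloadSketch(const TipInfo& tip)
{
    if (tip.loading_blocks) return true;
    if (!tip.has_tip) return true;
    if (!tip.enough_chain_work) return true;
    if (tip.tip_age > MAX_TIP_AGE) return true;
    return false; // in the real code this result latches permanently to false
}
```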

    We could make this function much more precise by having it consider the most proof-of-work (not invalid) header chain. It would then insist that we have all blocks fully downloaded and verified.

    But that might have impacts in other places in the codebase, potentially causing things to get stuck longer than needed just because we’re missing a few recent blocks. So it’s probably better to have a new function like ReallyAtTheTipNow().

    Both the isInitialBlockDownload() and createNewBlock() interface methods could switch to that. It’s better for those methods to wait until we’re really at the tip, because otherwise we start emitting templates for blocks that are guaranteed to be stale.

  5. sipa commented at 2:42 pm on December 2, 2025: member

    You can’t use “active_tip == best_known_block_header” as condition for mining, because an attacker could mine a block, relay the header, and never relay the block. Sure, very expensive, but if the outcome is shutting down other miners, some joker might try it.

    Bitcoin nodes are always synchronizing, and unless you have very strong evidence to the contrary, you should act as if your active tip is the best block on the chain. If we had evidence of a better block, it would be our best tip instead.

    IsInitialBlockDownload() is essentially this very conservative check. Only if you’re really behind it’ll say true.

  6. Sjors commented at 3:11 pm on December 2, 2025: member

    Makes sense. We’ll have to find another method of preventing a burst of new templates while the node churns through the last day's worth of blocks.

    The Template Provider client could simply wait 1 minute at startup if IBD was initially true and changed to false. In practice that only happens after the node restarts (we wouldn’t want to waste one minute each time the TP restarts).
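
    The client-side workaround described above could look roughly like this (a hedged sketch; `node_in_ibd` is a hypothetical wrapper around the isInitialBlockDownload() IPC call, not an existing API):

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// If IBD was true when we connected and later flips to false, sleep through
// an extra grace period before requesting the first template, so the node
// can finish churning through the last day's worth of blocks.
void WaitForTemplateReady(const std::function<bool()>& node_in_ibd,
                          std::chrono::milliseconds cooldown,
                          std::chrono::milliseconds poll_interval)
{
    const bool was_in_ibd = node_in_ibd();
    while (node_in_ibd()) {
        std::this_thread::sleep_for(poll_interval); // still syncing
    }
    if (was_in_ibd) {
        // Node just left IBD (e.g. after a restart): apply the grace period.
        std::this_thread::sleep_for(cooldown);
    }
}
```

    In practice the cooldown only kicks in after a node restart, so a TP restart alone would not pay the extra wait.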

  7. plebhash commented at 3:14 pm on December 2, 2025: none

    Can you please provide the logs and setups accessible in a text file and not a discord link I think some people might not have discord.

    yeah sorry I was in a hurry for the workshop, indeed pasting Discord links is not ideal

    this is what @jbesraa said:

    I think handling IBD and other initialization aspects is critical when working in a production env. As these two services need to sit on the same machine, you usually deploy them through the same “task”, and I think they should wait for each other to become fully available before they connect. I did observe issues with core30/tp when it is restarted alongside sv2-tp (this is actually a very consistent error that I am still investigating):

    ./node/interfaces.cpp:980 chainman: Assertion `m_node.chainman' failed.
    

    also sv2-tp seems to be handling initial IBD fine for some period, but then something happens:

    2025-12-02T09:12:18Z [sv2:trace] Waiting to come out of IBD
    

    I can see this is printed while core is downloading, but then core reaches 100%

    2025-12-02T09:34:17Z UpdateTip: new best=000000000000000000010925227dbff92c7e5d41d120cec541487f4fcde6d71c height=926118 version=0x2e9ea000 log2_work=95.967332 tx=1278190156 date='2025-12-02T09:33:49Z' progress=1.000000 cache=587.5MiB(4072609txo)
    

    but I don't see logs from sv2-tp


    I think (2) is really just logging and sv2-tp keeps working after that, but (1) is kind of a problem (it might actually be more of a core problem than an sv2-tp one)

    I think @jbesraa knows how to reproduce it deterministically?

  8. plebhash commented at 3:35 pm on December 2, 2025: none

    IsInitialBlockDownload() is essentially this very conservative check. Only if you’re really behind it’ll say true.

    how far behind? are we talking ~2, ~20 or ~200 blocks behind?

  9. sipa commented at 3:37 pm on December 2, 2025: member

    how far behind? are we talking ~2, ~20 or ~200 blocks behind?

    A day.

  10. jbesraa commented at 5:14 pm on December 2, 2025: none

    Oh hey.

    As mentioned in the original comment, I think there is mainly a problem with (1). A bit of info about our setup:

    We run both sv2-tp (by Sjors, version 1.0.4) and Core 30 via Docker on the same (AWS) Linux EC2 machine. When starting a fresh setup it's usually smooth, but whenever we need to update a config or something similar and restart the setup, we encounter the following:

    sv2-tp logs (some data redacted for security reasons)

    2025-11-28T13:40:39Z Default data directory
    2025-11-28T13:40:39Z Using data directory 
    2025-11-28T13:40:39Z Config file: 
    2025-11-28T13:40:39Z Command-line arg: datadir=
    2025-11-28T13:40:39Z Command-line arg: debug=sv2
    2025-11-28T13:40:39Z Command-line arg: ipcconnect=unix:/run/bitcoin/node.sock
    2025-11-28T13:40:39Z Command-line arg: loglevel=
    2025-11-28T13:40:39Z Command-line arg: sv2bind=
    2025-11-28T13:40:39Z Command-line arg: sv2port=
    2025-11-28T13:40:39Z Using the 'sse4(1way),sse41(4way),avx2(8way)' SHA256 implementation
    2025-11-28T13:40:39Z Using RdSeed as an additional entropy source
    2025-11-28T13:40:39Z Using RdRand as an additional entropy source
    2025-11-28T13:40:39Z [ipc:info] {sv2-tp-1/sv2-tp-1} IPC client first request from current thread, constructing waiter
    Connected to bitcoin-node
    2025-11-28T13:40:39Z [sv2] Reading cached static key from
    2025-11-28T13:40:39Z [sv2:info] Static key: 
    2025-11-28T13:40:39Z [sv2:info] Template Provider authority key: 
    2025-11-28T13:40:39Z [sv2:trace] Authority key: 
    2025-11-28T13:40:39Z [sv2:trace] Certificate hashed data: 
    2025-11-28T13:40:39Z [sv2:info] Template Provider listening on 
    2025-11-28T13:40:39Z  thread start
    2025-11-28T13:40:39Z sv2 thread start
    2025-11-28T13:40:39Z [ipc:info] {sv2-tp-1/b-sv2-9} IPC client first request from current thread, constructing waiter
    2025-11-28T13:40:39Z [ipc:warning] {sv2-tp-1/b-capnp-loop-7} IPC client: unexpected network disconnect.
    2025-11-28T13:40:39Z [ipc:error] IPC client method call interrupted by disconnect.
    2025-11-28T13:40:39Z 
    ************************
    ************************
    EXCEPTION: N3ipc9ExceptionE
    EXCEPTION: N3ipc9ExceptionE
    IPC client method call interrupted by disconnect.
    IPC client method call interrupted by disconnect.
    bitcoin in sv2
    bitcoin in sv2
    terminate called after throwing an instance of 'ipc::Exception'
      what():  IPC client method call interrupted by disconnect.
    

    core-v30 logs (some data redacted for security reasons)

    2025-11-28T13:40:38Z Bitcoin Core version v30.0 (release build)
    2025-11-28T13:40:38Z parameter interaction: -bind set -> setting -listen=1
    2025-11-28T13:40:38Z Using the 'sse4(1way);sse41(4way);avx2(8way)' SHA256 implementation
    2025-11-28T13:40:39Z Using RdSeed as an additional entropy source
    2025-11-28T13:40:39Z Using RdRand as an additional entropy source
    2025-11-28T13:40:39Z Default data directory
    2025-11-28T13:40:39Z Using data directory 
    2025-11-28T13:40:39Z Config file: 
    2025-11-28T13:40:39Z Command-line arg: bind="0.0.0.0"
    2025-11-28T13:40:39Z Command-line arg: chain="main"
    2025-11-28T13:40:39Z Command-line arg: datadir=
    2025-11-28T13:40:39Z Command-line arg: ipcbind=
    2025-11-28T13:40:39Z Command-line arg: port=
    2025-11-28T13:40:39Z Command-line arg: printtoconsole=""
    2025-11-28T13:40:39Z Command-line arg: prune="1092"
    2025-11-28T13:40:39Z Command-line arg: rpcbind=
    2025-11-28T13:40:39Z Command-line arg: rpcport=
    2025-11-28T13:40:39Z scheduler thread start
    2025-11-28T13:40:39Z Listening for IPC requests on address 
    2025-11-28T13:40:39Z Using wallet directory 
    2025-11-28T13:40:39Z init message: Verifying wallet(s)…
    2025-11-28T13:40:39Z Using /16 prefix for IP bucketing
    2025-11-28T13:40:39Z init message: Loading P2P addresses…
    ./node/interfaces.cpp:980 chainman: Assertion `m_node.chainman' failed.
    
  11. ryanofsky commented at 6:25 pm on December 2, 2025: contributor

    re: #33994 (comment)

    Makes sense. We’ll have to find another method of preventing a burst of new templates while the node churns through the last day worth of blocks.

    I think the fact that we have separate createNewBlock and waitNext functions to fetch the initial block template and subsequent ones should make this easier to solve: createNewBlock can be conservative and wait to return until after some cooldown period, while waitNext can be more responsive and always try to return as quickly as possible. So I think we could just add something like the following to createNewBlock:

    constexpr auto COOLDOWN = 1s;
    while (!interrupted && (GetTime() - last_block_connected_time < COOLDOWN) && active_tip != best_known_block_header) {
        cv.wait_for(lock, COOLDOWN);
    }
    

    Checking best_known_block_header would just be an optimization. This wouldn’t wait longer than the cooldown period to return after the last block was connected.
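
    Fleshed out, the gating loop above could look something like this (a self-contained sketch, assuming `last_block_connected_time` is updated by the node's block-connected notification and `interrupted` is set on shutdown; `TemplateGate` and `WaitForQuietTip` are illustrative names, not existing code):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

struct TemplateGate {
    std::mutex m;
    std::condition_variable cv;
    // Updated on each block-connected notification in a real node.
    std::chrono::steady_clock::time_point last_block_connected_time{std::chrono::steady_clock::now()};
    bool interrupted{false}; // set on shutdown, with cv.notify_all()

    // createNewBlock() would call this once before building the first
    // template; waitNext() would skip it and stay as responsive as possible.
    void WaitForQuietTip(std::chrono::steady_clock::duration cooldown)
    {
        std::unique_lock<std::mutex> lock{m};
        while (!interrupted &&
               std::chrono::steady_clock::now() - last_block_connected_time < cooldown) {
            cv.wait_for(lock, cooldown); // re-check after a timeout or a notification
        }
    }
};
```

    This way a node catching up through the last day's worth of blocks keeps resetting the cooldown, and the first template only goes out once the tip has been quiet for the full period.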

  12. Sjors commented at 6:56 pm on December 2, 2025: member

    @ryanofsky wrote:

    GetTime() - last_block_connected_time < COOLDOWN

    That looks nice, and would keep client code simple.

    @jbesraa:

    We run both sv2-tp(by sjors, version 1.0.4)

    You should upgrade to v1.0.5 :-)

    and core 30

    At this point I think it’s better to run the 30.x branch built from source if you can. It has quite a few IPC related bug fixes. Otherwise you’re going to hunt for bugs that have already been fixed. Once v30.1 is out it’s fine to switch to a release again.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-10 03:13 UTC
