Race condition between ZMQ UpdateTip and getblocktemplate #30862

issue mcelrath openend this issue on September 10, 2024
  1. mcelrath commented at 4:28 pm on September 10, 2024: none

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    When ZMQ announces a new chain tip via “sequence” “C” message, if getblocktemplate is called immediately, it can return a block template that does not reflect the new chain tip.

    This is important for Braidpool, which needs to switch to a new chain tip as quickly as possible when a new block comes in.

    Expected behaviour

    getblocktemplate should return a result consistent with the information announced via ZMQ and correctly reflect the current chain tip.

    Steps to reproduce

    This was seen on the cpunet testnet fork (which does not touch any of the consensus or ZMQ code). The bug report there is: https://github.com/braidpool/bitcoin/issues/1#issuecomment-2341397203

    This occurs infrequently and it may be necessary to run the miner script for a day or more to see it.

    The script that receives the ZMQ messages and calls getblocktemplate is here: https://github.com/braidpool/bitcoin/blob/cpunet/contrib/cpunet/miner

    Relevant log output

    Please see https://github.com/braidpool/bitcoin/issues/1#issuecomment-2341397203 for log output.

    How did you obtain Bitcoin Core

    Compiled from source

    What version of Bitcoin Core are you using?

    Bitcoin Core version v27.99.0-e8b157f687eb-dirty

    Operating system and version

    Arch Linux 6.10.5-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC

    Machine specifications

    8 core Ryzen, SSD.

  2. fanquake added the label RPC/REST/ZMQ on Sep 11, 2024
  3. mzumsande commented at 8:17 pm on September 12, 2024: contributor
    can you rule out that someone else on the network just mined a block, and the inconclusive getblocktemplate result belongs to a previous call that was sent off before that new block was connected?
  4. mcelrath commented at 9:02 pm on September 12, 2024: none

    I believe that’s answered positively by the timestamps and structure of the calling code.

    A better test would be to reproduce this with regtest which I’ll create a script to demonstrate soon.

    But at the same time it seems there should be a mutex around the chain tip. If there isn’t this is basically guaranteed to happen. I haven’t looked yet.

    On Thu, Sep 12, 2024, 4:18 PM Martin Zumsande @.***> wrote:

    can you rule out that someone else on the network just mined a block, and the inconclusive getblocktemplate result belongs to a previous call that was sent off before that new block was connected?

    — Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/issues/30862#issuecomment-2347161468, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANKOXC46L4DZOZRMYSCJSLZWHZIRAVCNFSM6AAAAABN7EPF2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNBXGE3DCNBWHA . You are receiving this because you authored the thread.Message ID: @.***>

  5. mzumsande commented at 9:41 pm on September 12, 2024: contributor

    I believe that’s answered positively by the timestamps and structure of the calling code.

    not really. I don’t see a "Calling bitcoin-cli: getblocktemplate" entry in the log excerpt, but I do see a DEBUG result = log entry.

    Both logs are part of async def bitcoin_cli(self, *args, input=b""): in your script. That’s why I thought the rpc call may have happened before the block was connected by bitcoind.

  6. maflcko commented at 4:14 pm on September 18, 2024: member

    A better test would be to reproduce this with regtest which I’ll create a script to demonstrate soon.

    Did you get a chance to write the reproducer?

    But at the same time it seems there should be a mutex around the chain tip. If there isn’t this is basically guaranteed to happen. I haven’t looked yet.

    Yes, the mutex is called cs_main or ChainstateManager::GetMutex() . The mutex is held while enqueueing the event. And getblocktemplate takes the mutex at the start.

    So there may be (for example) two BlockConnected events, but the tip that getblocktemplate later sees obviously can only be one of the two (not both at the same time), or even none of the two.

  7. mcelrath commented at 4:25 pm on September 18, 2024: none

    Not yet. I’ve been running my morning script all week and haven’t seen it happen again.

    On Wed, Sep 18, 2024, 12:14 PM maflcko @.***> wrote:

    A better test would be to reproduce this with regtest which I’ll create a script to demonstrate soon.

    Did you get a chance to write the reproducer?

    But at the same time it seems there should be a mutex around the chain tip. If there isn’t this is basically guaranteed to happen. I haven’t looked yet.

    Yes, the mutex is called cs_main or ChainstateManager::GetMutex() . The mutex is held while enqueueing the event. And getblocktemplate takes the mutex at the start.

    So there may be (for example) two BlockConnected events, but the tip that getblocktemplate later sees obviously can only be one of the two (not both at the same time), or even none of the two.

    — Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/issues/30862#issuecomment-2358889569, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANKOXHI3RSJCGFNMVOSFYLZXGRHHAVCNFSM6AAAAABN7EPF2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNJYHA4DSNJWHE . You are receiving this because you authored the thread.Message ID: @.***>

  8. maflcko commented at 6:43 am on September 19, 2024: member

    Expected behaviour

    getblocktemplate should return a result consistent with the information announced via ZMQ and correctly reflect the current chain tip.

    So to clarify my previous reply. What you describe as “expected behavior” is impossible to achieve. A C message just contains the block hash of a previous BlockConnected event. While often this will be equal to the result of an RPC getbestblockhash or getblocktemplate done after receiving the C message, there isn’t (and obviously can’t) be a guarantee for it. The tip can (and should) advance at any point in time, when there is a better tip.

    Maybe I am missing something, but given the information so far, I don’t think there is a bug here, or otherwise that there is something that can be fixed.

  9. mcelrath commented at 1:09 pm on September 19, 2024: none

    Why cannot these results be consistent? Isn’t that the purpose of a mutex on the Chainstate? It’s not clear to me why there “obviously can’t” be multi-threaded consistency on the chain state. The (apparent) order of operations here is:

    1. Chain tip received
    2. ZMQ broadcast
    3. RPC call getblocktemplate

    If 3 were reversed with 1 or 2 I would agree with the “can’t” statement, but that doesn’t seem to be what’s happening.

    It seems to me that the ZMQ message is being sent while that mutex is not being held, and the fix should be simple – make sure that mutex is held until the ZMQ alert is finished. A new RPC call that comes in should block until the Chainstate update (including announcement over ZMQ) is complete.

  10. maflcko commented at 2:35 pm on September 19, 2024: member

    fix should be simple – make sure that mutex is held until the ZMQ alert is finished. A new RPC call that comes in should block until the Chainstate update (including announcement over ZMQ) is complete.

    That’d block Bitcoin Core from making any progress in the meantime. When it comes to connecting new blocks to the tip, announcing/relaying, and starting to mine on them, no additional delay is acceptable, because for miners any delay means that they’d work on a stale block.

    Why cannot these results be consistent?

    Because a block can appear and be connected at any time.

    For example:

    • Block A is connected, ZMQ C:A is enqueued
    • Block B is connected, ZMQ C:B is enqueued
    • getbestblockhash is called in response to C:A

    getbestblockhash must return the best block hash (aka tip). It must not and can not return A, because it is not the tip. It will return the tip, which may or may not be B.

  11. sipa commented at 2:45 pm on September 19, 2024: member

    My advice would be to never ever look at the contents of ZMQ notifications, but just see them as notifications that new information may be available, so you should go check and see for yourself. Given the fact that ZMQ notifications can be delayed (see this example) or even dropped entirely (if the queue grows too large, ZMQ has no guarantees AFAIK), you:

    • need to check for updated information anyway whenever a notification comes in
    • need a mechanism to check occasionally even without notifications (and especially after restarts)

    In fact, I consider it a historic mistake that ZMQ notifications include block/transaction hashes/data at all.

  12. mcelrath commented at 2:45 pm on September 19, 2024: none
    Working on a stale block is exactly the problem here, because of inconsistent responses. I think we have a miscommunication here… It sound like you’re saying there’s no way to avoid working on a stale block…
  13. mcelrath commented at 2:49 pm on September 19, 2024: none

    @sipa we’re planning on switching to the new IPC mechanism anyway, that StratumV2 is also using. Do you have any comments about synchronicity in that context?

    FWIW I think I’m the only person ever to have used ZMQ. My contrib/zmq/zmq_sub.py file no longer works due to python async changes. I’m also planning on submitting a PR to update it. But maybe I agree ZMQ might be a mistake. OTOH I think polling in the 21st century is also a mistake.

  14. mzumsande commented at 3:01 pm on September 19, 2024: contributor

    Would be nice to respond to my prior message: Why doesn’t your Miner log show a timestamped “Calling bitcoin-cli (…) getblocktemplate” entry, if, supposedly, the relevant call to getblocktemplate call was sent off after the zmq notification for the new tip was received?

    Without an explanation for that, it just looks to me that you get the delayed result of an older getblocktemplate invocation, and there is no issue at all.

  15. mcelrath commented at 3:20 pm on September 19, 2024: none

    @mzumsande you are right. Apparently I failed to scroll up in the log. There was several tips announced simultaneously and this is absolutely due to a previous invocation of getblocktemplate. I’m going to close issue this now, as I’ve also failed to reproduce it since.

    Log with more context attached.

    miner.log

  16. mcelrath closed this on Sep 19, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-23 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me