ActivateBestChain concurrency issues #13092

issue sdaftuar openend this issue on April 26, 2018
  1. sdaftuar commented at 3:59 pm on April 26, 2018: member

    #13023 highlights a concurrency issue in ABC that I think is worth more explanation and discussion, particularly as there are two bugs that should be fixed, and at least one of them is now causing occasional test failures in rpc_deprecated.py on travis.

    1. A known issue in ABC is that if we try to connect a block and discover that is invalid, then we can release cs_main while chainActive.Tip() has less work than our prior tip. However, in such a situation, another thread can be invoking CheckBlockIndex() just as cs_main is released (eg on regtest – CheckBlockIndex isn’t run by default on mainnet), and see an assertion fail because the current tip is not in setBlockIndexCandidates (it would have been previously pruned because we know about a valid more-work tip).

    A band-aid fix would be to add the current tip to setBlockIndexCandidates after disconnecting a block, but that would just be masking the bigger issue, which is that we should fix ABC/ABCStep to not release cs_main while we’re at a lower-work tip.

    1. The second bug is the one @skeees found in #13023, which I tried to more fully explain in #13023 (comment). For posterity:

    If ActivateBestChain is invoked simultaneously in separate threads, then we can end up at a lower-work tip, and remain stuck until the next block comes in.

    • Suppose in thread 1, we have just been delivered a block that causes us to try to activate two blocks in ABC. We connect 1 of them in ABCStep, and then release cs_main before connecting the second.
    • In thread 2, suppose we have been delivered a 3rd block (say via rpc) that builds on the first two. It invokes ABC and gets to run after the first block has been connected in thread 1. It connects one block, releases cs_main, and then connects one more, and finishes.
    • When thread 1 gets to run again, the most work chain has advanced, but (before this PR) we don’t refresh pindexMostWork (except when we find an invalid block). Consequently we would invoke ABCStep with a less work tip than our current tip(!). This would cause us to disconnect our tip and return. Some travis failures have been observed due to this bug, as seen for instance here: https://travis-ci.org/bitcoin/bitcoin/jobs/370848272. The test that sometimes fails, rpc_deprecated.py, generates blocks on two nodes roughly simultaneously, so one of the nodes is generating blocks in an rpc thread while also processing blocks on the network thread, which I believe is enough to trigger this bug.

    I think both bugs should be fixed, though the second one is more severe (as it could result in an unlucky miner losing a block) even in the absence of any kind of attack or misbehavior.

  2. skeees commented at 4:13 pm on April 26, 2018: contributor
  3. laanwj added the label Validation on May 3, 2018
  4. sdaftuar commented at 6:23 pm on May 18, 2018: member
    Resolved in #13023.
  5. sdaftuar closed this on May 18, 2018

  6. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-06 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me