Acquire CCheckQueue's lock to avoid race condition #5721

pull sdaftuar wants to merge 1 commits into bitcoin:master from sdaftuar:fix-checkqueue-race changing 1 files +8 −4
  1. sdaftuar commented at 7:48 PM on January 28, 2015: member

    This fixes a potential race condition in the CCheckQueueControl constructor, which was looking directly at data in CCheckQueue without acquiring its lock.

    Even though only one CCheckQueueControl exists at a time, one of the CCheckQueue threads may have completed work but not yet updated nIdle or released its lock, so looking at that variable without acquiring the lock first is not safe.

    Fixes #5703.

  2. in src/checkqueue.h:None in 54e9638e51 outdated
     185 | @@ -180,9 +186,7 @@ class CCheckQueueControl
     186 |      {
     187 |          // passed queue is supposed to be unused, or NULL
     188 |          if (pqueue != NULL) {
     189 | -            assert(pqueue->nTotal == pqueue->nIdle);
     190 | -            assert(pqueue->nTodo == 0);
     191 | -            assert(pqueue->fAllOk == true);
     192 | +            assert(pqueue->IsIdle());
    


    sipa commented at 8:02 PM on January 28, 2015:

    Can you avoid having an effectful statement inside an assert? We're requiring NDEBUG now, but that may not remain the case.


    sdaftuar commented at 8:19 PM on January 28, 2015:

    Good point -- fixed so that the function gets called outside the assert (commit squashed).

  3. sdaftuar force-pushed on Jan 28, 2015
  4. wtogami commented at 8:24 PM on January 29, 2015: contributor

    0.10?

  5. sipa commented at 4:28 AM on January 30, 2015: member

    Untested ACK (after squash), including 0.10.

  6. laanwj added this to the milestone 0.10.0 on Jan 30, 2015
  7. laanwj added the label Bug on Jan 30, 2015
  8. TheBlueMatt commented at 7:02 PM on February 2, 2015: member

    utACK, after squash

  9. sdaftuar force-pushed on Feb 2, 2015
  10. sdaftuar commented at 7:36 PM on February 2, 2015: member

    Now that I've eliminated the empty commit I used to bump travis (after it initially failed for reasons I think are unrelated to my pull), travis is again showing the initial failed run. Is there a better way for me to handle this in the future?

  11. theuni commented at 9:19 PM on February 2, 2015: member

    @sdaftuar creating a new commit on top forces travis to test that commit. But when you pop it back off, it's already built that exact revision, so it doesn't bother trying again.

    Rather than that, just re-commit the change in some way that generates a new commit hash. Edit the commit message somewhat, git format-patch -1 + git am, something like that. Then force-push to overwrite the old one.

    But of course, the above only applies if the test failure really was a fluke!

  12. theuni commented at 9:38 PM on February 2, 2015: member

    CCheckQueue should be able to drop its friendship with CCheckQueueControl after this change (sorry CCheckQueueControl...)

    That should keep something like this from happening again, since the member vars would be guarded. utACK after that.

  13. ghost commented at 12:39 AM on February 3, 2015: none

    utACK

    The following was observed on testnet in one of the initial syncs at height 224873:

    bitcoind: checkqueue.h:183: CCheckQueueControl<T>::CCheckQueueControl(CCheckQueue<T>*) [with T = CScriptCheck]: Assertion `pqueue->nTotal == pqueue->nIdle' failed.

  14. laanwj commented at 7:57 AM on February 3, 2015: member

    Weird, it failed travis again.

  15. laanwj removed this from the milestone 0.10.0 on Feb 3, 2015
  16. jonasschnelli commented at 9:51 AM on February 3, 2015: contributor

    Travis reports during make check (a.k.a src/test/test_bitcoin):

    2015-01-28 20:27:11 Unlocked: cs_Shutdown  init.cpp:141
    make[2]: *** [check-local] Error 1
    

    It looks like test_bitcoin crashes at init.cpp:141 (Shutdown()) This like is TRY_LOCK(cs_Shutdown, lockShutdown); (https://github.com/bitcoin/bitcoin/blob/master/src/init.cpp#L141)

    And this pull introduces a mutex. I tend to NACK unless this is sorted out.

  17. Acquire CCheckQueue's lock to avoid race condition
    This fixes a potential race condition in the CCheckQueueControl constructor,
    which was looking directly at data in CCheckQueue without acquiring its lock.
    
    Remove the now-unnecessary friendship for CCheckQueueControl
    cf008ac8c3
  18. sdaftuar force-pushed on Feb 3, 2015
  19. sdaftuar commented at 3:24 PM on February 3, 2015: member

    To clarify -- this pull doesn't introduce an actually new mutex, but does it acquire a lock based on the existing mutex that is already used to synchronize access to the member variables in CCheckQueue.

    The travis failure was unrelated to the pull; the travis log (https://travis-ci.org/bitcoin/bitcoin/builds/48672519) shows the java comparison tool test failed with:

    08:27:09 14 BitcoindComparisonTool$1.onPeerDisconnected: bitcoind node disconnected!

    What appears below that is the end of the debug log from bitcoind after the comparison tool failure, which shows a clean shutdown for a bitcoind that is compiled with DEBUG_LOCKORDER and run with -debug (ie the message @jonasschnelli pasted is a normal one, not indicative of the failure).

    Note also that this is not test_bitcoin which failed; the test_bitcoin unit tests passed:

    $ if [ "$RUN_TESTS" = "true" ]; then make check; fi
    Making check in src
    make[1]: Entering directory `/home/travis/build/bitcoin/bitcoin/bitcoin-x86_64-unknown-linux-gnu/src'
    make[2]: Entering directory `/home/travis/build/bitcoin/bitcoin/bitcoin-x86_64-unknown-linux-gnu/src'
    make  check-TESTS check-local
    make[3]: Entering directory `/home/travis/build/bitcoin/bitcoin/bitcoin-x86_64-unknown-linux-gnu/src'
    Running 137 test cases...
    *** No errors detected
    PASS: test/test_bitcoin
    =============
    1 test passed
    =============
    

    I tested my original pull by compiling with DEBUG_LOCKORDER and doing a -reindex on testnet, and by verifying that it solved the race condition in the original issue (which I exacerbated in testing by adding a usleep(500000) before nIdle++ at checkqueue.h:101).

    At any rate I just made the change suggested by @theuni (removing the friend class designation for CCheckQueueControl), which is going to bump travis again.

  20. theuni commented at 9:55 PM on February 3, 2015: member

    I've tested this a good bit locally, since although it doesn't look like it should be able to break anything, the random travis failure on the DEBUG_LOCKORDER test is rather scary.

    I've re-run the comparison tool test dozens of times now, with the exact same conditions (built from depends, NO_QT=1 NO_UPNP=1 DEBUG=1, configured with --enable-glibc-back-compat CPPFLAGS=-DDEBUG_LOCKORDER).

    No problems here.

    Also, notice that in the Travis failure, we never even get a "bitcoind connected". Looks to me like failure really was a fluke.

  21. TheBlueMatt commented at 11:07 PM on February 3, 2015: member

    I'd bet this is related to #5433 (comment) (probably the same issue), try with https://github.com/TheBlueMatt/bitcoin/commit/e21269e04cbc5e1e0e3a3f7e3bd906bb5c74ca82 merged in.

  22. sdaftuar commented at 3:46 PM on February 6, 2015: member

    @TheBlueMatt Just to clarify is there anything more to be done on this pull? Since travis ran cleanly with the current code, I thought I'd just keep your workaround in mind if that travis issue comes up again, but not necessarily change anything now.

  23. laanwj merged this on Feb 6, 2015
  24. laanwj closed this on Feb 6, 2015

  25. laanwj referenced this in commit fb6140b54b on Feb 6, 2015
  26. laanwj referenced this in commit d148f62e00 on Feb 24, 2015
  27. reddink referenced this in commit 9327e51d77 on May 27, 2020
  28. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-14 12:16 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me