IBD Crash with v0.10rc3: checkqueue.h:183: Assertion `pqueue->nTotal == pqueue->nIdle' failed. #5703

wtogami opened this issue on January 24, 2015
  1. wtogami commented at 3:47 AM on January 24, 2015: contributor

    Fedora 21 x86_64 with Bitcoin Core v0.10rc3 linux64 gitian build. 17 minutes into testnet IBD it crashed with this assertion failure.

    [warren@odin bin]$ bitcoin-qt -testnet
    bitcoin-qt: checkqueue.h:183: CCheckQueueControl<T>::CCheckQueueControl(CCheckQueue<T>*) [with T = CScriptCheck]: Assertion `pqueue->nTotal == pqueue->nIdle' failed.
    Aborted (core dumped)
    

    debug.log ends with:

    2015-01-23 22:52:57 UpdateTip: new best=000000000019eeffa3a51b555d4cefb6bcc373665bd3498b98b4d6587dde57f1  height=178433  log2_work=58.029322  tx=1090333  date=2014-02-01 19:42:07 progress=0.910668  cache=109722
    

    Two subsequent IBDs succeeded without a crash.

  2. sdaftuar commented at 6:01 PM on January 28, 2015: member

    In the CCheckQueueControl constructor, we're not acquiring the lock on the pqueue before checking its state:

    CCheckQueueControl(CCheckQueue<T>* pqueueIn) : pqueue(pqueueIn), fDone(false)
    {
        // passed queue is supposed to be unused, or NULL
        if (pqueue != NULL) {
            assert(pqueue->nTotal == pqueue->nIdle);
            assert(pqueue->nTodo == 0);
            assert(pqueue->fAllOk == true);
        }
    }
    

    Consequently I think there could be a race condition where these values could look inconsistent (nIdle is updated each time in the thread's Loop()).

    I was able to reliably reproduce this behavior by inserting a usleep(500000); in CCheckQueue::Loop, just before the nIdle++; statement, and then starting up with a reindex.

    To fix this I think we just need to acquire the pqueue's lock before checking these variables; perhaps add an IsIdle() member function to CCheckQueue that acquires the lock and then does these checks, and then assert that function returns true in the CCheckQueueControl constructor?
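
    The IsIdle() idea above could be sketched roughly as follows. This is a simplified illustration, not the actual Bitcoin Core patch: it uses std::mutex in place of the real queue's locking, the class names CheckQueueSketch and CheckQueueControlSketch and the helpers WorkerStarted(), WorkerIdle() and WorkerBusy() are hypothetical, and it assumes the worker counters are only ever modified while the mutex is held.

```cpp
#include <cassert>
#include <mutex>

// Simplified stand-in for CCheckQueue. Only nTotal, nIdle, nTodo and fAllOk
// come from the issue; every other name here is hypothetical.
class CheckQueueSketch
{
private:
    mutable std::mutex m;   // protects all fields below
    int nIdle = 0;          // workers currently waiting for work
    int nTotal = 0;         // workers that have started
    unsigned int nTodo = 0; // checks not yet finished
    bool fAllOk = true;     // result of the checks so far

public:
    // Hypothetical bookkeeping helpers; each one updates state under the lock.
    void WorkerStarted() { std::lock_guard<std::mutex> lock(m); nTotal++; }
    void WorkerIdle()    { std::lock_guard<std::mutex> lock(m); nIdle++; }
    void WorkerBusy()    { std::lock_guard<std::mutex> lock(m); nIdle--; }

    // Take the lock first, then read all three conditions as one snapshot,
    // so a concurrent update cannot make them look inconsistent.
    bool IsIdle() const
    {
        std::lock_guard<std::mutex> lock(m);
        return nTotal == nIdle && nTodo == 0 && fAllOk;
    }
};

// Simplified stand-in for CCheckQueueControl: one locked check replaces the
// three unlocked asserts in the constructor.
class CheckQueueControlSketch
{
private:
    CheckQueueSketch* pqueue;

public:
    explicit CheckQueueControlSketch(CheckQueueSketch* pqueueIn) : pqueue(pqueueIn)
    {
        // passed queue is supposed to be unused, or null
        if (pqueue != nullptr) {
            assert(pqueue->IsIdle());
        }
    }
};
```

    With this shape the constructor no longer reads nTotal and nIdle separately without the lock, so a worker that is partway through updating its counters cannot trip the assertion.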

  3. sipa commented at 7:20 PM on January 28, 2015: member

    @sdaftuar Nice find, and doing that shouldn't hurt. But I'm not entirely sure how this is possible in the first place: there should not be two threads holding a CCheckQueueControl object simultaneously (it's only created inside ConnectBlock, while holding cs_main the whole time).

  4. laanwj closed this on Feb 6, 2015

  5. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-14 15:15 UTC
