Intermittent (and rare) unit test failure in blockfilter_index_tests/blockfilter_index_initial_sync #16337

issue practicalswift openend this issue on July 4, 2019
  1. practicalswift commented at 12:41 pm on July 4, 2019: contributor

    Intermittent (and likely super-rare) unit test failure in blockfilter_index_tests/blockfilter_index_initial_sync:

    0Running 352 test cases...
    1test/blockfilter_index_tests.cpp(149): fatal error: 
    2  in "blockfilter_index_tests/blockfilter_index_initial_sync":
    3  critical check time_start + timeout_ms > GetTimeMillis() has failed
    4pure virtual method called
    5terminate called without an active exception
    6unknown location(0): fatal error: in 
    7  "blockfilter_index_tests/blockfilter_index_initial_sync": 
    8  signal: SIGABRT (application abort requested)
    

    Scheduling non-determinism that could be handled more gracefully?

    Possibly related:

  2. fanquake added the label Tests on Jul 4, 2019
  3. MarcoFalke commented at 9:22 pm on July 8, 2019: member
    Looks like a race with a destructor of a derived class
  4. MarcoFalke commented at 9:24 pm on July 8, 2019: member
    Hmm, maybe not. Does this happen with a sanitizer enabled?
  5. practicalswift commented at 10:03 am on July 30, 2019: contributor

    @MarcoFalke This happens without any sanitiser enabled.

    FWIW I hit the case today again:

    0Running 15 test cases...
    1test/blockfilter_index_tests.cpp(149): fatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": critical check time_start + timeout_ms > GetTimeMillis() has failed
    2pure virtual method called
    3terminate called without an active exception
    4unknown location(0): fatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": signal: SIGABRT (application abort requested)
    5test/blockfilter_index_tests.cpp(149): last checkpoint
    6
    7*** 2 failures are detected in the test module "Bitcoin Core Test Suite"
    8test_bitcoin: /usr/include/boost/thread/pthread/condition_variable_fwd.hpp:116: boost::condition_variable::~condition_variable(): Assertion `!ret' failed.
    
  6. MarcoFalke commented at 12:01 pm on July 30, 2019: member
  7. MarcoFalke commented at 1:06 am on April 27, 2020: member
    Is this still an issue with a recent version of Bitcoin Core? If yes, what are the steps to reproduce?
  8. MarcoFalke closed this on Apr 27, 2020

  9. MarcoFalke commented at 11:39 pm on May 14, 2020: member
  10. MarcoFalke reopened this on May 14, 2020

  11. practicalswift commented at 5:47 am on May 15, 2020: contributor
    @MarcoFalke If you want to be able to hit this condition deterministically you can use rr’s chaos mode. It is gold for finding intermittent issues in test cases. I’ve never heard it discussed by other Core contributors but I use it extensively :)
  12. MarcoFalke commented at 10:13 am on May 15, 2020: member
    Interesting. What steps do you use to reproduce this issue with rr?
  13. practicalswift commented at 4:38 pm on May 15, 2020: contributor

    @MarcoFalke Try this:

     0$ git clone https://github.com/bitcoin/bitcoin
     1$ cd bitcoin/
     2$ ./autogen.sh
     3$ ./configure --with-incompatible-bdb
     4$ make
     5$ git clone https://github.com/mozilla/rr
     6$ cd rr/
     7$ ./configure
     8$ make
     9$ cd ../
    10$ rm -rf test_bitcoin_chaos
    11$ while rr/bin/rr record --chaos -o test_bitcoin_chaos src/test/test_bitcoin -l test_suite -t blockfilter_index_tests -- DEBUG_LOG_OUT; do
    12      rm -rf test_bitcoin_chaos
    13  done
    
  14. practicalswift commented at 6:41 am on May 16, 2020: contributor

    After running the loop above for 14 hours I hit the bug using rr:

    02020-05-16T05:59:59.184771Z Syncing basic block filter index with block chain from height 0
    1test/blockfilter_index_tests.cpp(140): fatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": critical check time_start + timeout_ms > GetTimeMillis() has failed
    2pure virtual method called
    3terminate called without an active exception
    4unknown location(0): fatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": signal: SIGABRT (application abort requested)
    5test/blockfilter_index_tests.cpp(140): last checkpoint
    6test/blockfilter_index_tests.cpp(105): Leaving test case "blockfilter_index_initial_sync"; testing time: 334500us
    

    Now we can replay the failing test execution deterministically using rr replay test_bitcoin_chaos/.

  15. MarcoFalke commented at 10:09 am on May 16, 2020: member
    What is the diff in the debug log/stdout between a passing build and this failing one?
  16. practicalswift commented at 1:10 pm on May 20, 2020: contributor

    I haven’t investigated that but here is the output from a failing one:

     02020-05-16T05:59:57.550904Z     - Index writing: 1.94ms [0.40s (2.03ms/blk)]                                                                                                                                                                                                                                                    [0/1710]
     12020-05-16T05:59:57.551566Z     - Callbacks: 0.67ms [0.12s (0.61ms/blk)]
     22020-05-16T05:59:57.552209Z BlockChecked: block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d state=Valid
     32020-05-16T05:59:57.552852Z   - Connect total: 7.13ms [1.38s (6.93ms/blk)]
     42020-05-16T05:59:57.553500Z   - Flush: 0.65ms [0.14s (0.70ms/blk)]
     52020-05-16T05:59:57.554153Z   - Writing chainstate: 0.65ms [0.17s (0.84ms/blk)]
     62020-05-16T05:59:57.554895Z Blockpolicy estimates updated by 0 of 0 block txs, since last block 0 of 0 tracked, mempool map size 0, max target 0 from current
     72020-05-16T05:59:57.555586Z UpdateTip: new best=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d height=99 version=0x20000000 log2_work=7.6438562 tx=100 date='2020-05-16T06:00:11Z' progress=1.000000 cache=0.0MiB(99txo)
     82020-05-16T05:59:57.556231Z   - Connect postprocess: 2.08ms [0.43s (2.14ms/blk)]
     92020-05-16T05:59:57.556877Z - Connect block: 10.51ms [2.11s (10.62ms/blk)]
    102020-05-16T05:59:57.557687Z Checking mempool with 0 transactions and 0 inputs
    112020-05-16T05:59:57.571535Z Enqueuing BlockConnected: block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d block height=99
    122020-05-16T05:59:57.572202Z Enqueuing UpdatedBlockTip: new block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d fork block hash=5baaa452c43885d714369586561867ad382e55ed27699fa82676c33495e14125 (in IBD=false)
    132020-05-16T05:59:57.572970Z CreateNewBlock(): block weight: 904 txs: 0 fees: 0 sigops 400
    142020-05-16T05:59:57.573985Z     - Sanity checks: 0.01ms [0.00s (0.01ms/blk)]
    152020-05-16T05:59:57.574644Z     - Fork checks: 0.67ms [0.31s (1.55ms/blk)]
    162020-05-16T05:59:57.575301Z       - Connect 1 transactions: 0.65ms (0.648ms/tx, 0.000ms/txin) [0.29s (1.46ms/blk)]
    172020-05-16T05:59:57.575962Z     - Verify 0 txins: 1.32ms (0.000ms/txin) [0.53s (2.63ms/blk)]
    182020-05-16T05:59:57.576603Z CreateNewBlock() packages: 0.01ms (0 packages, 0 updated descendants), validity: 3.65ms (total 3.66ms)
    192020-05-16T05:59:57.577353Z NewPoWValidBlock: block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5
    202020-05-16T05:59:57.579751Z   - Load block from disk: 0.00ms [0.00s]
    212020-05-16T05:59:57.580721Z     - Sanity checks: 0.01ms [0.00s (0.01ms/blk)]
    222020-05-16T05:59:57.581374Z     - Fork checks: 0.66ms [0.31s (1.55ms/blk)]
    232020-05-16T05:59:57.582022Z       - Connect 1 transactions: 0.64ms (0.644ms/tx, 0.000ms/txin) [0.29s (1.45ms/blk)]
    242020-05-16T05:59:57.582672Z     - Verify 0 txins: 1.30ms (0.000ms/txin) [0.53s (2.62ms/blk)]
    252020-05-16T05:59:57.584895Z     - Index writing: 2.21ms [0.41s (2.02ms/blk)]
    262020-05-16T05:59:57.585539Z     - Callbacks: 0.66ms [0.12s (0.61ms/blk)]
    272020-05-16T05:59:57.586168Z BlockChecked: block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 state=Valid
    282020-05-16T05:59:57.587934Z   - Connect total: 8.19ms [1.39s (6.90ms/blk)]
    292020-05-16T05:59:57.588569Z   - Flush: 0.64ms [0.14s (0.70ms/blk)]
    302020-05-16T05:59:57.604022Z   - Writing chainstate: 15.44ms [0.18s (0.91ms/blk)]
    312020-05-16T05:59:57.604763Z Blockpolicy estimates updated by 0 of 0 block txs, since last block 0 of 0 tracked, mempool map size 0, max target 0 from current
    322020-05-16T05:59:57.605454Z UpdateTip: new best=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 height=100 version=0x20000000 log2_work=7.6582115 tx=101 date='2020-05-16T06:00:11Z' progress=1.000000 cache=0.0MiB(100txo)
    332020-05-16T05:59:57.606099Z   - Connect postprocess: 2.09ms [0.43s (2.13ms/blk)]
    342020-05-16T05:59:57.606740Z - Connect block: 26.36ms [2.14s (10.65ms/blk)]
    352020-05-16T05:59:57.607554Z Checking mempool with 0 transactions and 0 inputs
    362020-05-16T05:59:57.608496Z Enqueuing BlockConnected: block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 block height=100
    372020-05-16T05:59:57.609144Z Enqueuing UpdatedBlockTip: new block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 fork block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d (in IBD=false)
    382020-05-16T05:59:57.613390Z Opened LevelDB successfully
    392020-05-16T05:59:57.614711Z Using obfuscation key for /tmp/test_common_Bitcoin Core/8452f2eaa126750ba2992d9a174f267010c49d1332b33956fb3e0963cd70821e/regtest/indexes/blockfilter/basic/db: 0000000000000000
    402020-05-16T05:59:57.616968Z BlockConnected: block hash=5baaa452c43885d714369586561867ad382e55ed27699fa82676c33495e14125 block height=98
    412020-05-16T05:59:57.618456Z UpdatedBlockTip: new block hash=5baaa452c43885d714369586561867ad382e55ed27699fa82676c33495e14125 fork block hash=79fd5be16c059ff46e8eb70b8b6a37e1b71f4788059ad054941a1f5445ea90f6 (in IBD=false)
    422020-05-16T05:59:57.619685Z BlockConnected: block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d block height=99
    432020-05-16T05:59:57.620941Z UpdatedBlockTip: new block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d fork block hash=5baaa452c43885d714369586561867ad382e55ed27699fa82676c33495e14125 (in IBD=false)
    442020-05-16T05:59:57.622205Z BlockConnected: block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 block height=100
    452020-05-16T05:59:57.623487Z UpdatedBlockTip: new block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 fork block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d (in IBD=false)
    462020-05-16T05:59:57.649646Z basic block filter index thread start
    472020-05-16T05:59:57.650334Z Syncing basic block filter index with block chain from height 0
    48test/blockfilter_index_tests.cpp(140): fatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": critical check time_start + timeout_ms > GetTimeMillis() has failed
    49pure virtual method called
    50terminate called without an active exception
    51unknown location(0): fatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": signal: SIGABRT (application abort requested)
    
  17. MarcoFalke commented at 2:44 pm on May 20, 2020: member
    Would be worth a shot to see if #18791 by @promag fixes this
  18. adamjonas commented at 2:26 pm on March 18, 2021: member

    This remains an issue and pops up with regularity on our Jenkins CI, which continuously runs master.

    Here are the relevant error snippets:

    02021-03-04T10:12:28.684510Z [basic block filter index] [index/base.cpp:163] [ThreadSync] Syncing basic block filter index with block chain from height 0
    1test/blockfilter_index_tests.cpp(140): [4;31;49mfatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": critical check time_start + timeout_ms > GetTimeMillis() has failed[0;39;49m
    2==2949==WARNING: MemorySanitizer: use-of-uninitialized-value
    3...(cut for brevity)...
    4SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/lib/jenkins/workspace/bitcoin-depends_sanitizers_memory_MSan_focal/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/./streams.h:439:13 in CDataStream::Xor(std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > const&)
    5Exiting
    6make[3]: *** [Makefile:17137: test/blockfilter_index_tests.cpp.test] Error 1
    7make[3]: *** Waiting for unfinished jobs....
    

    Over the last 30 days, we’ve seen this 66 of 1831 runs (3.6%) and 11 of 419 (2.6%) over the last week.

    This OOME event may be responsible for a portion of the cirrus crashes, which show up as a generic “agent crashed” in the logs.

    (Thanks to @evanbaer and @MarcoFalke for tracking this down)

  19. laanwj commented at 8:16 am on August 3, 2021: member
    FWIW I had the same issue locally (#22614)
  20. MarcoFalke commented at 1:22 pm on August 5, 2021: member
    I am trying for the last two days to reproduce this with #16337 (comment) , but couldn’t yet.
  21. practicalswift closed this on Oct 28, 2021

  22. DrahtBot locked this on Oct 30, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me