test: disable comparison tool #6278

pull laanwj wants to merge 1 commits into bitcoin:master from laanwj:2015_06_disable_comparison_tool changing 1 files +1 −1
  1. laanwj commented at 4:48 AM on June 13, 2015: member

    Recently, many Travis builds fail due to timeouts, which happen in the comparison tool. Just a test to see if it can also be triggered without. Not meant to be merged as-is.

  2. test: disable comparison tool cf6b62bbda
  3. laanwj added the label Tests on Jun 13, 2015
  4. dexX7 commented at 2:54 PM on June 13, 2015: contributor

    Hmmm.. that's interesting. Are you sure this is related to the comparison tool?

    I'm asking, because I've never seen timeouts in another project/fork, which is currently based on Bitcoin Core 0.10. There are probably less than 10 builds per day though. But what I actually noticed: after testing a migration to 0.11 (based on 053110d), the Boost tests timed out from time to time, but not always.

    I assumed this was somehow a mistake on my part, or simply "bad luck". The number of 0.11 builds was very low, and right now 0.10 is still used, so this might not be related at all. On top of that, Travis builds are routed through the container-based infrastructure.

    However, when running ./src/test/test_bitcoin --log_level=test_suite in a loop locally (done a few minutes ago), it stops at some point here:

    Entering test suite "scheduler_tests"
    Entering test case "manythreads"
    

    I think this rules out that it's related to Travis. I'm going to test a clean version of Core now.

  5. dexX7 commented at 3:00 PM on June 13, 2015: contributor

    The current master seems to have the same problem.

    When running test_bitcoin in a loop, it stops at some point during the scheduler_tests.

    Tested on Ubuntu 14.04 LTS x64 with:

    Bitcoin version v0.11.99.0-ab0ec67 (2015-06-12 16:49:53 +0200)
    Using OpenSSL version OpenSSL 1.0.1f 6 Jan 2014
    Using BerkeleyDB version Berkeley DB 4.8.30: (April  9, 2010)
    
  6. laanwj commented at 5:52 AM on June 15, 2015: member

    The scheduler test is another possible source of hangs. It has been fixed a few times, but there could still be a race condition that makes it either never finish or finish very slowly.

    Adding timestamps to the test output, as well as more verbose logging to test_bitcoin, may be a good idea.
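
    One way to get timestamps without touching test_bitcoin itself is to pipe its output through a small filter. The following is only a sketch: `timestamp_lines` is a hypothetical helper name, not something that exists in the repository, and the HH:MM:SS format is an assumption chosen to resemble the comparison tool log.

```shell
# Hypothetical helper: prefix every line read from stdin with the current UTC time.
timestamp_lines() {
    # The "|| [ -n "$line" ]" part keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date -u +%H:%M:%S)" "$line"
    done
}

# Possible usage (assumes the binary has been built):
#   ./src/test/test_bitcoin --log_level=test_suite 2>&1 | timestamp_lines
```

    With this in place, the last timestamped line before a stall shows which test suite was entered and when, which narrows down where the ten minutes of silence began.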

  7. laanwj commented at 5:54 AM on June 15, 2015: member

    On the other hand, I see a lot of false positives in Travis that end in the comparison tool, e.g. the end of the log usually looks like:

    11:06:51 15 BitcoindComparisonTool$1.onPreMessageReceived: Got empty header message from bitcoind
    11:06:51 1 BitcoindComparisonTool.main: Block "b3" completed processing
    11:06:51 1 BitcoindComparisonTool.main: Testing block b3 499b1ec0ece4c4ef3b123d7498e3a5cfc85685fc9998a3f07b0fc7c977433627
    11:06:51 1 BitcoindComparisonTool.main: Sent inv with block 499b1ec0ece4c4ef3b123d7498e3a5cfc85685fc9998a3f07b0fc7c977433627
    11:06:51 15 BitcoindComparisonTool$1.onPreMessageReceived: Got empty header message from bitcoind
    Exception in thread "main" java.lang.NullPointerException
    at com.google.bitcoin.core.BitcoindComparisonTool.main(BitcoindComparisonTool.java:311)
    No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
    
    The build has been terminated
    

    I think this is the NULL pointer that @theuni is looking for too.

  8. theuni commented at 5:57 PM on June 15, 2015: member

    @laanwj Yes, if you see the NPE, the tests will time out. In that case, it's 100% related to the comparison tool.

    I'm still not sure how to proceed with troubleshooting that issue.

  9. laanwj commented at 12:41 PM on June 16, 2015: member

    @theuni I remember your plan was to replace the comparison tool with one built from known source code. That would at least make the line numbers reliable. Or did problems come up with that?

  10. theuni commented at 3:10 AM on June 17, 2015: member

    Ah right.

    Nope, it just slipped my mind. Will do.

  11. laanwj commented at 8:21 AM on June 17, 2015: member

    Yippie. I found one build that random-errored on master that didn't involve the comparison tool.

    make[3]: Entering directory `/home/travis/build/bitcoin/bitcoin/bitcoin-x86_64-unknown-linux-gnu/src'
    
    Running 158 test cases...
    
    No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
    
    The build has been terminated
    

    Probably the scheduler-tests again, although without more diagnostics it's hard to say.

  12. dexX7 commented at 1:37 PM on June 17, 2015: contributor

    "although without more diagnostics it's hard to say."

    Locally I'm able to pin it down:

    STATUS=0; while [ "$STATUS" -eq 0 ]; do ./src/test/test_bitcoin --run_test=scheduler_tests/manythreads --log_level=all; STATUS=$?; done
    

    After a few rounds it always stops with:

    Entering test case "manythreads"
    test/scheduler_tests.cpp(70): info: check nTasks == 0 passed
    test/scheduler_tests.cpp(82): info: check nTasks == 100 passed
    test/scheduler_tests.cpp(83): info: check first < last passed
    test/scheduler_tests.cpp(84): info: check last > now passed
    

    I confirmed it via Travis in a similar manner, see here and here.
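
    A loop like the one above simply blocks once the test hangs, so someone has to watch it. A possible variant puts a time limit on each run so the hang is reported automatically. This is a sketch, not what was actually run: `run_until_hang` is a hypothetical helper, the limit and run count are arbitrary, and it assumes GNU coreutils `timeout`, which exits with status 124 when it has to kill the command.

```shell
# Hypothetical helper: run a command repeatedly and stop at the first failure
# or at the first run that exceeds the time limit.
run_until_hang() {
    limit=$1      # seconds allowed per run
    max_runs=$2   # upper bound on the number of iterations
    shift 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        timeout "$limit" "$@"
        status=$?
        if [ "$status" -eq 124 ]; then
            # timeout killed the command: treat this as a hang
            echo "run $i: hung (no exit within ${limit}s)"
            return 124
        elif [ "$status" -ne 0 ]; then
            echo "run $i: failed with status $status"
            return "$status"
        fi
        i=$((i + 1))
    done
    echo "no hang in $max_runs runs"
}

# Possible usage (assumes the binary has been built):
#   run_until_hang 60 1000 ./src/test/test_bitcoin --run_test=scheduler_tests/manythreads --log_level=all
```

    The iteration number printed on a hang gives a rough idea of how often the problem occurs, which makes it easier to compare different machines or Boost versions.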

    With additional output, it looks like it doesn't make it past joining the threads in the test:

    // Drain the task queue then exit threads
    microTasks.stop(true);
    microThreads.join_all(); // ... wait until all the threads are done | <---- not passed
    

    There seem to be a few reports mentioning issues in this context:

    I'm not sure how this fits together with the comparison tool, though.

  13. gavinandresen commented at 1:49 PM on June 17, 2015: contributor

    @dexX7 what OS and version of boost? (I cannot reproduce a hang using your while loop on OSX 10.10.3 and boost 1.58.0). @laanwj : if it turns out to be a bug in boost or the OS handling gazillions of threads, perhaps just removing the scheduler stress test would be the right thing to do. We aren't actually using gazillions of threads with the scheduler in Core code, just one.

  14. dexX7 commented at 2:04 PM on June 17, 2015: contributor

    @gavinandresen: locally I'm using Ubuntu 14.04.2 with Boost 1.54. Travis uses Ubuntu 12.04 with Boost 1.55.

    Disabling the scheduler test may reduce the number of occurrences, but as this PR indicates, timeouts were also seen during the comparison tool tests. It's not yet shown that this is caused by Boost threads, and it's only my guess.

  15. laanwj commented at 4:39 PM on June 17, 2015: member

    The comparison tool failures and the scheduler test failures are completely unrelated; the only thing they have in common is that they cause transient Travis failures. @dexX7 Thanks for the extensive report. I'll have a look at the scheduler tests again and see if I can find the issue. I doubt that it is an issue with Boost itself, rather than with our usage of it (under load). Disabling the test is also a possibility, but I'd prefer to find out what is wrong.

  16. laanwj commented at 3:40 PM on June 19, 2015: member

    Closing this pull for now, let's see how it goes after #6305

  17. laanwj closed this on Jun 19, 2015

  18. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:15 UTC
