[qa] rpc-tests.py fails randomly while trying to get stdout/stderr #8649

issue sdaftuar opened this issue on September 2, 2016
  1. sdaftuar commented at 1:26 AM on September 2, 2016: member

    I've been seeing this issue frequently since the end of the July, when running the rpc tests locally using our testrunner:

    $ ./rpc-tests.py 
    ..............................................................................
    bip68-112-113-p2p.py:
    [...omitting rest of output...]
    Tests successful
    [..].
    Pass: True, Duration: 39 s
    ...............................................................................................................
    wallet.py:
    Initializing test directory /tmp/test_qt_32mz/38
    [...omitting again...]
    Tests successful
    [...]
    Pass: True, Duration: 95 s
    
    .......................................................................................................................................................................................Traceback (most recent call last):
      File "./rpc-tests.py", line 343, in <module>
        runtests()
      File "./rpc-tests.py", line 212, in runtests
        (name, stdout, stderr, passed, duration) = job_queue.get_next()
      File "./rpc-tests.py", line 267, in get_next
        (stdout, stderr) = proc.communicate(timeout=3)
      File "/usr/lib/python3.4/subprocess.py", line 960, in communicate
        stdout, stderr = self._communicate(input, endtime, timeout)
      File "/usr/lib/python3.4/subprocess.py", line 1618, in _communicate
        self._check_timeout(endtime, orig_timeout)
      File "/usr/lib/python3.4/subprocess.py", line 986, in _check_timeout
        raise TimeoutExpired(self.args, orig_timeout)
    subprocess.TimeoutExpired: Command '['/home/sdaftuar/bitcoin/qa/rpc-tests/wallet-dump.py', '--srcdir=/home/sdaftuar/bitcoin/src', '--portseed=36']' timed out after 3 seconds
    

    So the interesting thing here is that the error is in rpc-tests.py, not in the test itself (in this case, wallet-dump.py). I did a little digging and can't figure out why the python code here would fail; it seems we check that the process has terminated before calling Popen.communicate(timeout=3). I don't really understand why that timeout would be hit if the process has already died; I don't think the test can generate giant amounts of stdout/stderr or anything like that?

    Not really understanding the problem, it's hard for me to postulate the right fix; would it make sense to just eliminate the timeout, or make it much larger or something? If someone has a recommended approach, I can test locally to see if I can come up with something that works on my hardware, just not sure what kind of fix would make the most sense.

    I only ever seem to see this on my slightly slower machine, but it's the same box that I've been using for continuous testing of the repo for a year or more.

    FYI, initially it always seemed to be wallet-dump.py that would trigger this issue in rpc-tests.py, but I just noticed at least one instance where I got this same error in communicating with walletbackup.py, so I don't think this is unique to that particular test or anything.

  2. fanquake added the label Tests on Sep 2, 2016
  3. rodentrabies commented at 11:49 PM on September 4, 2016: contributor

    Tried running with timeout changed from 3 to 6, worked fine for 100 runs. https://github.com/bitcoin/bitcoin/blob/master/qa/pull-tester/rpc-tests.py#L268

  4. sdaftuar commented at 12:02 AM on September 5, 2016: member

    It occurred to me that rather than change the timeout, maybe we could just try to catch that exception and continue, and warn if it happens too much? I was planning to try that locally and see if that solves my problem.

  5. sdaftuar commented at 2:33 PM on September 9, 2016: member

    I spent a bunch of time the last couple of days trying and failing to track this down, though I think I have a workaround for now; I'm posting some updates here just in case anyone else stumbles across this issue.

    I first tried to catch the subprocess.TimeoutExpired exception being thrown, and then just retry. On a subsequent test run I had an infinite loop of the exception being caught and retrying. I had 3 "defunct" python subprocesses that were hanging around, and no idea why that was the case. Maybe either a python subprocess bug, or an OS bug?

    I did stumble across something that made me wonder if there's an issue with our use of subprocess.PIPE but I haven't done enough investigation at this point to identify that as the problem (and in fact I don't yet have a good hypothesis as to how that could be the problem here, either).

    I did identify two workarounds that appear to work for me:

    • Commented out wallet-dump.py from rpc-tests.py, and changed my test invocation so that I first run rpc-tests.py and then separately ran wallet-dump.py.
    • Change test invocation to just run rpc-tests.py -parallel=2

    Both of these seem to never trigger the exception for me; I'm using the second workaround for now and will monitor.

    Leaving this issue open as I still don't understand the underlying issue here.

  6. sdaftuar commented at 5:19 PM on September 16, 2016: member

    Update: now even running rpc-tests.py -extended -parallel=1 is regularly failing for me locally on the 0.13 branch.

  7. MarcoFalke commented at 10:08 AM on September 17, 2016: member

    @sdaftuar #8750 should help you get rid of TimeoutExpired... Would be interesting to know what Exceptions remain after #8750.

  8. sdaftuar commented at 1:43 PM on September 17, 2016: member

    @marcofalke I'll give it a try, thanks!

  9. MarcoFalke commented at 9:30 AM on September 20, 2016: member

    Closing for now. Please open a new issue in case there are still issues with the rpc tests in master.

  10. MarcoFalke closed this on Sep 20, 2016

  11. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-14 12:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me