I've been seeing this issue frequently since the end of July when running the rpc tests locally using our test runner:
$ ./rpc-tests.py
..............................................................................
bip68-112-113-p2p.py:
[...omitting rest of output...]
Tests successful
[..].
Pass: True, Duration: 39 s
...............................................................................................................
wallet.py:
Initializing test directory /tmp/test_qt_32mz/38
[...omitting again...]
Tests successful
[...]
Pass: True, Duration: 95 s
.......................................................................................................................................................................................Traceback (most recent call last):
  File "./rpc-tests.py", line 343, in <module>
    runtests()
  File "./rpc-tests.py", line 212, in runtests
    (name, stdout, stderr, passed, duration) = job_queue.get_next()
  File "./rpc-tests.py", line 267, in get_next
    (stdout, stderr) = proc.communicate(timeout=3)
  File "/usr/lib/python3.4/subprocess.py", line 960, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.4/subprocess.py", line 1618, in _communicate
    self._check_timeout(endtime, orig_timeout)
  File "/usr/lib/python3.4/subprocess.py", line 986, in _check_timeout
    raise TimeoutExpired(self.args, orig_timeout)
subprocess.TimeoutExpired: Command '['/home/sdaftuar/bitcoin/qa/rpc-tests/wallet-dump.py', '--srcdir=/home/sdaftuar/bitcoin/src', '--portseed=36']' timed out after 3 seconds
So the interesting thing here is that the error is in rpc-tests.py, not in the test itself (in this case, wallet-dump.py). I did a little digging and can't figure out why the Python code here would fail; it seems we check that the process has terminated before calling Popen.communicate(timeout=3). I don't really understand why that timeout would be hit if the process has already died, and I don't think the test can generate giant amounts of stdout/stderr or anything like that.
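One possibility I can't rule out (an assumption, not something I've confirmed for rpc-tests.py): communicate() waits for end-of-file on the stdout/stderr pipes rather than for the process to exit. So it can time out even after poll() reports that the child is gone, if some other process, such as a daemon the test started, still holds the inherited write end of a pipe. Here is a minimal, POSIX-only sketch of that behavior. The child script and the demo() helper are hypothetical stand-ins, not code from the test framework:

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a test script: it starts a grandchild that
# inherits stdout/stderr, prints one line, and exits right away.
CHILD_CODE = (
    "import subprocess, sys\n"
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(2)'])\n"
    "print('child done')\n"
)


def demo(timeout=0.5):
    """Return (timed_out, stdout) for a child that leaves a grandchild behind."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Wait until the direct child has exited, as the runner appears to do.
    while proc.poll() is None:
        time.sleep(0.05)
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
        # The pipes only reach EOF once the grandchild exits; retrying
        # without a timeout then returns the complete output.
        stdout, stderr = proc.communicate()
    return timed_out, stdout
```

On my understanding, calling demo() should report a timeout even though the direct child had already exited, and the child's output is still recovered by the retry. Whether anything like this is actually happening with wallet-dump.py is unverified.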
Since I don't really understand the problem, it's hard for me to propose the right fix. Would it make sense to just eliminate the timeout, or make it much larger? If someone has a recommended approach, I can test it locally to see whether it works on my hardware; I'm just not sure what kind of fix would make the most sense.
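To make the "much larger timeout" option concrete, here is a rough sketch of what I could try locally. The collect_output() helper and its parameter names are hypothetical and not part of rpc-tests.py; it keeps the short timeout for the normal case, logs when it is hit, and retries once with a longer one. It does not address the root cause:

```python
import subprocess
import sys


def collect_output(proc, first_timeout=3, retry_timeout=60):
    """Return (stdout, stderr, retried) for a Popen object with piped output.

    Hypothetical helper: try a short timeout first, then retry once with a
    longer one. A second TimeoutExpired is allowed to propagate.
    """
    try:
        stdout, stderr = proc.communicate(timeout=first_timeout)
        return stdout, stderr, False
    except subprocess.TimeoutExpired:
        # Record which command was slow so the failure is easier to track down.
        print(
            "communicate() timed out after %s s, retrying: %s"
            % (first_timeout, proc.args),
            file=sys.stderr,
        )
    # Output read before the timeout is not lost when communicate() is retried.
    stdout, stderr = proc.communicate(timeout=retry_timeout)
    return stdout, stderr, True
```

The logged command line would at least show which tests trigger the slow path, which might help narrow down the cause.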
I only ever seem to see this on my slightly slower machine, but it's the same box that I've been using for continuous testing of the repo for a year or more.
FYI, initially it always seemed to be wallet-dump.py that would trigger this issue in rpc-tests.py, but I just noticed at least one instance where I got this same error in communicating with walletbackup.py, so I don't think this is unique to that particular test or anything.