Pruning.py regression test failed (timeout) #10393

ftrader commented at 3:38 PM on May 12, 2017: none

Running the 'pruning.py' regression test (0.14 branch, f2a96e7d0289682eb87877a3528cb6a170f8b786) failed with a block synchronization timeout on my system:

pruning.py:
Initializing test directory /tmp/testk070t0a2/590
Warning! This test requires 4GB of disk space and takes over 30 mins (up to 2 hours)
Mining a big blockchain of 995 blocks
Check that we haven't started pruning yet because we're below PruneAfterHeight
Success
Though we're already using more than 550MiB, current usage: 590.8564891815186
Mining 25 more blocks should cause the first block file to be pruned
Success
Usage should be below target: 479.3257579803467
Check that we'll exceed disk space target if we have a very high stale block rate
Mine 24 (stale) blocks on Node 1, followed by 25 (main chain) block reorg from Node 0, for 12 rounds
Usage can be over target because of high stale rate: 622.9866495132446
Check that we can survive a 288 block reorg still
Current block height: 1320
Invalidating block at height: 1033 6b25fcf68ea49735d0859943708b19c1929b27348b0ce3e6ddb0480e898b8603
New best height 1032
Generating new longer chain of 300 more blocks
Reconnect nodes
Assertion failed: Block sync to height 1332 timed out:
  {'hash': '47bdd7faa2829143d463c664671fbaa03b0b161b6e89467c683fbb94206e7d02', 'height': 1320}
  {'hash': '5179c9a3a390105ee91106ea32befc89ebcee5a66ea85d80fc6de6099ea2fdcd', 'height': 1332}
  {'hash': '47bdd7faa2829143d463c664671fbaa03b0b161b6e89467c683fbb94206e7d02', 'height': 1320}
Stopping nodes

stderr:
  File "/home/ftrader/core/qa/rpc-tests/test_framework/test_framework.py", line 145, in main
    self.run_test()
  File "/home/ftrader/core/qa/rpc-tests/pruning.py", line 390, in run_test
    (self.forkheight,self.forkhash) = self.reorg_test() #(1033, )
  File "/home/ftrader/core/qa/rpc-tests/pruning.py", line 168, in reorg_test
    sync_blocks(self.nodes[0:3], timeout=120)
  File "/home/ftrader/core/qa/rpc-tests/test_framework/util.py", line 147, in sync_blocks
    maxheight, "".join("\n  {!r}".format(tip) for tip in tips)))
Traceback (most recent call last):
  File "/home/ftrader/core/qa/rpc-tests/test_framework/authproxy.py", line 125, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/home/ftrader/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/home/ftrader/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/home/ftrader/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/home/ftrader/lib/python3.5/http/client.py", line 936, in _send_output
    self.send(message_body)
  File "/home/ftrader/lib/python3.5/http/client.py", line 908, in send
    self.sock.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ftrader/core/qa/rpc-tests/pruning.py", line 454, in <module>
    PruneTest().main()
  File "/home/ftrader/core/qa/rpc-tests/test_framework/test_framework.py", line 164, in main
    stop_nodes(self.nodes)
  File "/home/ftrader/core/qa/rpc-tests/test_framework/util.py", line 384, in stop_nodes
    stop_node(node, i)
  File "/home/ftrader/core/qa/rpc-tests/test_framework/util.py", line 375, in stop_node
    node.stop()
  File "/home/ftrader/core/qa/rpc-tests/test_framework/coverage.py", line 49, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/home/ftrader/core/qa/rpc-tests/test_framework/authproxy.py", line 152, in __call__
    response = self._request('POST', self.__url.path, postdata.encode('utf-8'))
  File "/home/ftrader/core/qa/rpc-tests/test_framework/authproxy.py", line 138, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/home/ftrader/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/home/ftrader/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/home/ftrader/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/home/ftrader/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/home/ftrader/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/home/ftrader/lib/python3.5/http/client.py", line 849, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/home/ftrader/lib/python3.5/socket.py", line 711, in create_connection
    raise err
  File "/home/ftrader/lib/python3.5/socket.py", line 702, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

Pass: False, Duration: 7913 s

TEST       | PASSED | DURATION

pruning.py | False  | 7913 s

ALL        | False  | 7913 s (accumulated)

I will do another run to see if I can reproduce it, but it is quite likely given that the test failed before on my system when I attempted it with other clients (BU, Classic).

The command I used to run the test:

$ qa/pull-tester/rpc-tests.py pruning

System specs:

OS: Debian 7 x86_64
CPU: 6 core AMD
RAM: 16gb
Disk size: plenty free
Disk Type (HD/SDD): HDD

fanquake added the label Tests on May 13, 2017

jnewbery commented at 2:47 PM on May 15, 2017: member

Logging in the integration tests is much improved in the master branch. Are you able to check out the master branch and run this test against your current binary with the new test framework (it's now located in test/functional/pruning.py and the rpc-tests.py script is now called test/functional/test_runner.py). You can either checkout the branch in your current directory and not rebuild bitcoind, or if you don't want to checkout in your directory, you can checkout the branch elsewhere and use the --srcdir argument to specify where your bitcoind executable is. You should run the pruning.py script directly rather than through test_runner.py.

You can run with --tracerpc to trace out all the RPC requests/responses, -l 10 to trace debug log to the console and --nocleanup to make sure the temp directory is not deleted after the test has completed. Once the test is completed you can see the logs from the test framework and bitcoind processes in the temp directory (the first line of the logs to console will be something like Initializing test directory /tmp/testk070t0a2/590 which tells you where to find the temp directory)

ftrader commented at 3:11 PM on May 15, 2017: none

Thanks @jnewbery , I will give that new logging a try and see if it uncovers something.

paveljanik commented at 4:12 PM on May 15, 2017: contributor

I can't reproduce the issue on current master.

jnewbery commented at 1:36 PM on May 18, 2017: member

@ftrader I can't reproduce this. Have you tried running again with the more recent test framework? Are you able to reproduce?

If not, are you happy to close this issue?

ftrader commented at 2:36 PM on May 18, 2017: none

Yes, I have reproduced it using the modern test framework. Enabling the tracing recommended above produced a 1.9 GB log file. I attach the final logged exceptions:

ftrader-pruning-log-exceptions.txt

The test ran from 20:53 until 21:24 where it crashed, so roughly 1,5 hrs.

jnewbery commented at 3:08 PM on May 18, 2017: member

pruning.py takes around 15-20 minutes on my PC, so something is going very wrong if it's taking 90 minutes. Looking at the stack, I see that you crashed in create_big_chain() at pruning.py line 333, which is before the test has really begun. The framework is trying to create a chain of large blocks by repeatedly calling createrawtransaction() and generate(). The fact that this is taking such a long time and eventually timing out on a getnetworkinfo() call suggests to me that there's something wrong with the bitcoind process or that it doesn't have enough resources. Either way - your test framework and bticoind logs should be able to show you where this is taking a long time.

ftrader commented at 4:40 PM on May 18, 2017: none

Hmm, it says in the test comments "This test takes 30 mins or more (up to 2 hours)"

Are there specific timing benchmarks that one can run with?

jnewbery commented at 6:36 PM on May 18, 2017: member

The test framework logs and bitcoind logs all have microsecond timestamps, so you should be able to see where this test hung.

MarcoFalke commented at 8:03 PM on April 11, 2018: member

Feel free to open a new issue if this is still observable on current master.

MarcoFalke closed this on Apr 11, 2018

MarcoFalke locked this on Sep 8, 2021