The TSAN job is now running on Cirrus. Increase the allocated memory to the maximum allowed.
ci: no-longer exclude feature_block in TSAN job #20543
fanquake wants to merge 1 commit into bitcoin:master from fanquake:dont_exclude_feature_block_cirrus, changing 2 files (+2 −3).
fanquake commented at 4:15 AM on December 2, 2020: member
- fanquake added the label Tests on Dec 2, 2020
-
fanquake commented at 4:58 AM on December 2, 2020: member
feature_block has failed:

2020-12-02T04:37:36.022000Z TestFramework (INFO): Accept a block with invalid opcodes in dead execution paths
2020-12-02T04:37:36.134000Z TestFramework (INFO): Test re-orging blocks with OP_RETURN in them
2020-12-02T04:37:36.912000Z TestFramework (INFO): Test a re-org of one week's worth of blocks (1088 blocks)
2020-12-02T04:46:04.530000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 126, in main
    self.run_test()
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/feature_block.py", line 1278, in run_test
    self.send_blocks([block], True, timeout=2440)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/feature_block.py", line 1410, in send_blocks
    self.helper_peer.send_blocks_and_test(blocks, self.nodes[0], success=success, reject_reason=reject_reason, force_send=force_send, timeout=timeout, expect_disconnect=reconnect)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 631, in send_blocks_and_test
    self.sync_with_ping(timeout=timeout)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 507, in sync_with_ping
    self.wait_until(test_function, timeout=timeout)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 412, in wait_until
    wait_until_helper(test_function, timeout=timeout, lock=p2p_lock, timeout_factor=self.timeout_factor)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 247, in wait_until_helper
    if predicate():
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 409, in test_function
    assert self.is_connected
AssertionError
2020-12-02T04:46:06.683000Z TestFramework (INFO): Stopping nodes

stderr:
Traceback (most recent call last):
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 107, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/usr/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1049, in _send_output
    self.send(chunk)
  File "/usr/lib/python3.8/http/client.py", line 971, in send
    self.sock.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/feature_block.py", line 1417, in <module>
    FullBlockTest().main()
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 149, in main
    exit_code = self.shutdown()
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 278, in shutdown
    self.stop_nodes()
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 526, in stop_nodes
    node.stop_node(wait=wait)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_node.py", line 319, in stop_node
    self.stop(wait=wait)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/coverage.py", line 47, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 144, in __call__
    response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 113, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/usr/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 921, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/usr/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

node0 2020-12-02T04:44:06.247207Z [msghand] - Disconnect block: 2249.91ms
node0 2020-12-02T04:44:06.863029Z [msghand] UpdateTip: new best=107d0d7d666cc30a086b3346a220e0f71fe736988247e8c7227b7025c92e7993 height=378 version=0x00000004 log2_work=9.566054 tx=11656 date='2020-12-02T04:42:46Z' progress=1.000000 cache=1.6MiB(12001txo)
node0 2020-12-02T04:44:06.863218Z [msghand] Enqueuing BlockDisconnected: block hash=621dba2ca71816fc9d2e28994582593ea03aa4184638ee3962a7ed632e8b2ba8 block height=379
node0 2020-12-02T04:44:18.793214Z [msghand] - Disconnect block: 7207.89ms
node0 2020-12-02T04:44:53.834226Z [msghand] UpdateTip: new best=73b411695b1980e6ea88ae9fd02cddd51a71be59e988eee78243ffd0eaf60ccf height=377 version=0x00000004 log2_work=9.562242 tx=11653 date='2020-12-02T04:42:45Z' progress=1.000000 cache=1.6MiB(12001txo)
node0 2020-12-02T04:45:01.905174Z [msghand] Enqueuing BlockDisconnected: block hash=107d0d7d666cc30a086b3346a220e0f71fe736988247e8c7227b7025c92e7993 block height=378
test 2020-12-02T04:46:03.936000Z TestFramework.p2p (DEBUG): Closed connection to: 127.0.0.1:16244
test 2020-12-02T04:46:04.530000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 126, in main
    self.run_test()
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/feature_block.py", line 1278, in run_test
    self.send_blocks([block], True, timeout=2440)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/feature_block.py", line 1410, in send_blocks
    self.helper_peer.send_blocks_and_test(blocks, self.nodes[0], success=success, reject_reason=reject_reason, force_send=force_send, timeout=timeout, expect_disconnect=reconnect)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 631, in send_blocks_and_test
    self.sync_with_ping(timeout=timeout)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 507, in sync_with_ping
    self.wait_until(test_function, timeout=timeout)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 412, in wait_until
    wait_until_helper(test_function, timeout=timeout, lock=p2p_lock, timeout_factor=self.timeout_factor)
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 247, in wait_until_helper
    if predicate():
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 409, in test_function
    assert self.is_connected
AssertionError
test 2020-12-02T04:46:06.528000Z TestFramework (DEBUG): Closing down network thread
test 2020-12-02T04:46:06.683000Z TestFramework (INFO): Stopping nodes
test 2020-12-02T04:46:06.683000Z TestFramework.node0 (DEBUG): Stopping node

I've added a commit to increase the memory of the container to the allowed maximum. Either that fixes it, or we update the comment to make it generic, as Travis is going away, and this is currently being run on Cirrus.
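The traceback shows the failure path: send_blocks_and_test calls sync_with_ping, which polls through wait_until and wait_until_helper, and the polled test_function first asserts self.is_connected. The simplified Python sketch below mirrors that structure to show why a dropped P2P connection surfaces as a bare AssertionError on the next poll instead of after the 2440-second timeout. It is modelled only on the call chain visible in the traceback and is not the test framework's actual code; FakePeer, got_pong, and poll_interval are hypothetical names. The later ConnectionRefusedError while stopping nodes suggests bitcoind itself was no longer running at that point.

```python
import threading
import time


def wait_until_helper(predicate, *, timeout=60.0, lock=None, timeout_factor=1.0, poll_interval=0.05):
    """Poll `predicate` until it returns truthy or the scaled timeout expires.

    Simplified sketch, not the framework's real implementation. Any exception
    raised inside `predicate` (such as AssertionError) propagates immediately.
    """
    deadline = time.monotonic() + timeout * timeout_factor
    while time.monotonic() < deadline:
        if lock is not None:
            with lock:
                if predicate():
                    return
        elif predicate():
            return
        time.sleep(poll_interval)
    raise AssertionError(f"Predicate not true after {timeout * timeout_factor} seconds")


class FakePeer:
    """Hypothetical stand-in for a P2P test peer connected to bitcoind."""

    def __init__(self):
        self.is_connected = True
        self.got_pong = False  # hypothetical flag: "a matching pong arrived"
        self.lock = threading.Lock()

    def wait_until(self, test_function_in, *, timeout=60.0, check_connected=True):
        def test_function():
            if check_connected:
                # Fail fast if the node dropped the connection, rather than
                # waiting out the full timeout.
                assert self.is_connected
            return test_function_in()

        wait_until_helper(test_function, timeout=timeout, lock=self.lock)
```

Usage: simulating the node going away mid-reorg makes the wait fail at once with an empty AssertionError, which matches the bare "AssertionError" line in the log.

```python
peer = FakePeer()
peer.is_connected = False  # e.g. bitcoind exited during the re-org
try:
    peer.wait_until(lambda: peer.got_pong, timeout=2440)
except AssertionError as e:
    print("failed immediately:", repr(e))
```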
-
fanquake commented at 5:51 AM on December 2, 2020: member
The TSAN job is now passing. @MarcoFalke can you advise if it's ok for us to just bump the memory to 24GB? I think the only potential downside is that the TSAN job may not get scheduled immediately depending on the load on Cirrus's community containers.
-
MarcoFalke commented at 7:22 AM on December 2, 2020: member
Concept ACK. The changes are fine, but, unrelated to this PR, we should look into why bitcoind suddenly consumes more than 16GB of memory with TSan enabled.
-
MarcoFalke commented at 7:22 AM on December 2, 2020: member
Please squash your commits according to https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits
-
hebasto commented at 7:47 AM on December 2, 2020: member
Concept ACK.
I think the only potential downside is that the TSAN job may not get scheduled immediately depending on the load on Cirrus's community containers.
I saw similar behavior when the CPU count was bumped to its maximum.
-
MarcoFalke commented at 8:05 AM on December 2, 2020: member
If there are scheduling issues with one of the tasks, we could use compute credits for it.
- fanquake force-pushed on Dec 2, 2020
-
2b356117e9
ci: no-longer exclude feature_block in TSAN job
The TSAN job is now running on Cirrus. Increase the allocated memory to the maximum allowed.
- fanquake force-pushed on Dec 2, 2020
-
practicalswift commented at 9:46 AM on December 2, 2020: contributor
Strong concept ACK
More TSAN is better than less TSAN.
And more generally: if adding more testing hardware means more safety checking, that is typically a very good deal :)
Aside:
The same economic argument applies when doing capacity planning for fuzzing hardware farms: as a general rule, be very aggressive when allocating hardware resources to your long-term fuzzing jobs. It is typically a relatively cheap way to find bugs (assuming good fuzzing coverage, etc.).
Some empirical results from the excellent paper "Fuzzing: On the Exponential Cost of Vulnerability Discovery" by Marcel Böhme (@mboehme) and Brandon Falk (@gamozolabs):
We present counterintuitive results for the scalability of fuzzing. Given the same non-deterministic fuzzer, finding the same bugs linearly faster requires linearly more machines. For instance, with twice the machines, we can find all known bugs in half the time. Yet, finding linearly more bugs in the same time requires exponentially more machines. For instance, for every new bug we want to find in 24 hours, we might need twice more machines. Similarly for coverage. With exponentially more machines, we can cover the same code exponentially faster, but uncovered code only linearly faster. In other words, re-discovering the same vulnerabilities is cheap but finding new vulnerabilities is expensive. This holds even under the simplifying assumption of no parallelization overhead. We derive these observations from over four CPU years worth of fuzzing campaigns involving almost three hundred open source programs, two state-of-the-art greybox fuzzers, four measures of code coverage, and two measures of vulnerability discovery. We provide a probabilistic analysis and conduct simulation experiments to explain this phenomenon.
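The two scaling regimes in the quoted abstract can be made concrete with a little arithmetic. The Python sketch below is only an illustration of the abstract's own examples: re-finding the same bugs scales linearly (twice the machines, half the time), while each additional bug found in a fixed time budget multiplies the machine count by a growth factor. The factor of 2 is the abstract's illustrative "twice more machines", not a measured constant, and the function names are hypothetical.

```python
def time_to_rediscover(base_hours: float, machine_multiplier: float) -> float:
    """Linear regime: finding the same bugs takes proportionally less time."""
    return base_hours / machine_multiplier


def machines_for_extra_bugs(base_machines: float, extra_bugs: int, growth: float = 2.0) -> float:
    """Exponential regime (illustrative): each extra bug in the same time
    budget multiplies the required machines by `growth`."""
    return base_machines * growth ** extra_bugs


# Twice the machines: the same bugs in half of a 24-hour campaign.
print(time_to_rediscover(24.0, 2))  # 12.0

# Starting from 10 machines, machines needed for 0..3 extra bugs in 24 hours.
for k in range(4):
    print(k, machines_for_extra_bugs(10, k))
```

Under this toy model, the third extra bug alone costs as many new machines (40) as the first three campaigns combined, which is the asymmetry the abstract describes.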
-
jonasschnelli commented at 9:49 AM on December 2, 2020: contributor
utACK 2b356117e94f9ef27b67a8e98663f5d676f58c11 - checked the CI run and confirmed that feature_block runs: https://cirrus-ci.com/task/6008403543719936?command=ci#L3249
-
MarcoFalke commented at 9:54 AM on December 2, 2020: member
review ACK 2b356117e94f9ef27b67a8e98663f5d676f58c11
- MarcoFalke merged this on Dec 2, 2020
- MarcoFalke closed this on Dec 2, 2020
- sidhujag referenced this in commit 4c2e9849c1 on Dec 2, 2020
-
dongcarl commented at 6:01 PM on December 19, 2020: member
I'm getting a failure that is identical to the one fanquake got: https://cirrus-ci.com/task/5238486280175616?command=ci#L3462
-
MarcoFalke commented at 6:17 PM on December 19, 2020: member
@dongcarl The issue should fix itself after the next push.
- DrahtBot locked this on Feb 15, 2022
- fanquake deleted the branch on Nov 9, 2022