Unit tests can be run in parallel on systems with more than one CPU, and thus run faster.
Since each test case is run separately from all others, test cases no longer share the same global variables.
TODO:
make check
Concept ACK
Very nice!
test_list.txt? What would be the disadvantages of generating the test list at run-time?

--list_content is only documented in 1.60.0, so I assume it wouldn't work in previous versions. Currently we support 1.47.0+:

http://www.boost.org/doc/libs/1_60_0/libs/test/doc/html/boost_test/utf_reference/rt_param_reference/list_content.html
https://github.com/bitcoin/bitcoin/blob/master/doc/dependencies.md#dependencies
git grep -E '(BOOST_FIXTURE_TEST_SUITE|BOOST_AUTO_TEST_CASE)' -- "src/**.cpp"
could perhaps be used to quickly (~14 ms on my machine) generate the list of available tests?
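For illustration, a rough sketch of what such run-time generation could look like (the regexes and the src/test/*.cpp glob are my assumptions, not code from this PR; it pairs each BOOST_AUTO_TEST_CASE with the suite opened most recently in the same file):

#!/usr/bin/env python3
# Hypothetical sketch: derive "suite/case" names by scanning the test sources.
import glob
import re

SUITE_RE = re.compile(r'BOOST_(?:FIXTURE_|AUTO_)TEST_SUITE\((\w+)')
CASE_RE = re.compile(r'BOOST_AUTO_TEST_CASE\((\w+)\)')

def generate_test_list(pattern='src/test/*.cpp'):
    tests = []
    for path in sorted(glob.glob(pattern)):
        suite = None
        with open(path) as source:
            for line in source:
                suite_match = SUITE_RE.search(line)
                if suite_match:
                    suite = suite_match.group(1)
                    continue
                case_match = CASE_RE.search(line)
                if case_match and suite:
                    tests.append('%s/%s' % (suite, case_match.group(1)))
    return tests

if __name__ == '__main__':
    print('\n'.join(generate_test_list()))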
If we go with the static text file test_list.txt, I suggest adding a script contrib/devtools/lint-test_list.sh which checks that the list of tests in test_list.txt is in sync with the list of tests given by src/test/test_bitcoin --list_content. That way Travis would automatically catch the case where someone adds a test and forgets to manually run src/test/parallel.py --write_list (which requires Boost.Test 1.60.0 or above) and commit the changes made to test_list.txt after adding a new test.
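A rough sketch of what such a lint check could do (illustrative only: the parsing of the indented tree that --list_content writes to stderr is an assumption about Boost 1.60 behavior, not code from this PR):

#!/usr/bin/env python3
# Hypothetical sync check between test_list.txt and the test binary.
import subprocess
import sys

def list_content_names(binary):
    # --list_content prints an indented suite/case tree to stderr, with
    # enabled entries marked by a trailing '*' (assumed format).
    output = subprocess.run([binary, '--list_content'],
                            stderr=subprocess.PIPE,
                            universal_newlines=True).stderr
    names, stack = set(), []
    for line in output.splitlines():
        depth = (len(line) - len(line.lstrip())) // 4
        name = line.strip().rstrip('*')
        stack = stack[:depth] + [name]
        names.add('/'.join(stack))
    return names

listed = {l.strip() for l in open('src/test/test_list.txt') if l.strip()}
actual = list_content_names('src/test/test_bitcoin')
missing = sorted(n for n in actual if '/' in n and n not in listed)
stale = sorted(listed - actual)
if missing or stale:
    sys.exit('test_list.txt out of sync\n  missing: %s\n  stale: %s'
             % (missing, stale))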
@@ -0,0 +1,864 @@
+#!/usr/bin/env python

python3?
+
+class FilterFormat(object):
+  def __init__(self, output_dir):
+    if sys.stdout.isatty():
+      # stdout needs to be unbuffered since the output is interactive.
+      sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
This seems to be fine in Python 2.7 but is a problem in 3.x:

ValueError: can't have unbuffered text I/O

Passing 0 as the buffer size is only valid for byte streams in Python 3. Per the docs, the default buffering policy for interactive text files is line buffering, which is probably fine here.
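A minimal sketch of a Python 3-compatible alternative, switching from unbuffered (0) to line-buffered (1) output:

import os
import sys

if sys.stdout.isatty():
    # Unbuffered text I/O is not allowed in Python 3; line buffering
    # flushes on every newline, which is enough for interactive output.
    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 1)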
+allocator_tests/lockedpool_tests_live
+base32_tests/base32_testvectors
+base58_tests/base58_EncodeBase58
+base58_tests/base58_DecodeBase58
+base64_tests/base64_testvectors
+abc/bech32_tests/bip173_testvectors_valid
Not sure why abc/ is prepended to the path. I'm getting this error:

Test setup error: no test cases matching filter

Removing abc/ lets the tests run and pass. This also poses an interesting problem: if a test case isn't found, it is listed under FAILED TESTS, which is somewhat confusing, because bech32_tests/bip173_testvectors_valid would pass; it's just that the listed path is wrong. Maybe the initial reader of FILE_TEST_LIST could verify that each path exists before handing it off to the workers.
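For illustration, a sketch of such a guard (the function and its inputs are hypothetical; known would come from the binary, e.g. via --list_content):

import sys

def filter_known_tests(requested, known):
    # Hypothetical guard for the FILE_TEST_LIST reader: keep only names the
    # binary actually reports, warning about the rest instead of letting
    # them surface under FAILED TESTS.
    valid = []
    for name in requested:
        if name in known:
            valid.append(name)
        else:
            sys.stderr.write('skipping unknown test: %s\n' % name)
    return valid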
+
+  if binaries == []:
+    parser.print_usage()
+    sys.exit(1)
+
+  if options.shard_count < 1:
Nit: if you are going to verify that the input is valid, you could check some other fields as well. Right now you could pass a negative value for workers, repeat, timeout, etc. without complaint.
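Something along these lines, perhaps (the workers/repeat/timeout names follow the flags mentioned above; which of them allow None is my assumption, so treat this as a sketch):

# Illustrative extra checks next to the existing shard_count one;
# optparse's parser.error() prints usage and exits with status 2.
if options.shard_count < 1:
    parser.error('--shard_count must be at least 1')
if options.workers is not None and options.workers < 1:
    parser.error('--workers must be at least 1')
if options.repeat is not None and options.repeat < 1:
    parser.error('--repeat must be at least 1')
if options.timeout is not None and options.timeout <= 0:
    parser.error('--timeout must be positive')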
+  parser.add_option('--print_test_times', action='store_true', default=False,
+                    help='list the run time of each test at the end of execution')
+  parser.add_option('--shard_count', type='int', default=1,
+                    help='total number of shards (for sharding test execution '
+                         'between multiple machines)')
+  parser.add_option('--shard_index', type='int', default=0,
I'm not sure shard_index should have a default. I think you want to ensure that shard_count and shard_index are used in combination, so if shard_count is given and options.shard_index is None, you print proper usage. Right now options.shard_index just defaults to 0, so you can't tell whether it's being used properly with shard_count.
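A sketch of that pairing check, assuming the rest of the option handling stays as quoted above:

# Default shard_index to None so its absence is detectable, then require
# the two flags to be used together.
parser.add_option('--shard_index', type='int', default=None,
                  help='index of this shard (use together with --shard_count)')

# After parsing the options:
if options.shard_count > 1 and options.shard_index is None:
    parser.error('--shard_index is required when --shard_count is given')
if options.shard_index is not None and not (
        0 <= options.shard_index < options.shard_count):
    parser.error('--shard_index must be in [0, shard_count)')
if options.shard_index is None:
    options.shard_index = 0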
+    with self.task_lock:
+      for task_id in range(len(self.tasks)):
+        task = self.tasks[task_id]
+
+        if self.running_groups is not None:
+          test_group = task.test_name.split('.')[0]
Should this be split('/')[0]? Right now task.test_name is a line from src/test/test_list.txt like

bloom_tests/rolling_bloom

so the split will always return the full path, since there is no '.' in it, which makes option.serialize_test_cases just run everything in parallel anyway.
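If so, the fix would presumably be this one-character change (keeping the surrounding code as quoted):

# Boost.Test names use '/' as the suite/case separator (gtest uses '.'),
# so the group is the first path component:
test_group = task.test_name.split('/')[0]  # 'bloom_tests/rolling_bloom' -> 'bloom_tests'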
+                    'readable file. The format of the file is specified at '
+                    'https://www.chromium.org/developers/the-json-test-results-format')
+  parser.add_option('--timeout', type='int', default=None,
+                    help='Interrupt all remaining processes after the given '
+                         'time (in seconds).')
+  parser.add_option('--serialize_test_cases', action='store_true',
Thanks for looking at this! I kept the patches to the gtest-parallel script minimal. Feedback not about my patches should be submitted upstream: https://github.com/google/gtest-parallel
Also, if someone knows more about autotools, help is very much appreciated to make this run as part of make check.
I have an idea for a simpler approach, where you could tell the test binary there are N processes, and which one out of those it is. It would then partition the tests randomly into N groups, and only run one of them.
If there’s interest, I’ll try to implement that soon.
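For illustration, the partitioning could look something like this sketch (names are made up; a fixed shared seed keeps all N processes agreeing on the grouping):

import random

def partition_tests(tests, num_procs, proc_index, seed=0):
    # Shuffle deterministically so every process computes the same grouping,
    # then take every num_procs-th test starting at this process's index.
    tests = sorted(tests)
    random.Random(seed).shuffle(tests)
    return tests[proc_index::num_procs]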
The currently longest-running test on my machine seems to be "test_big_witness_transaction":

$ python3 ./src/test/parallel.py | tail -3
[285/287] streams_tests/streams_serializedata_xor (47 ms)
[286/287] coinselector_tests/knapsack_solver_test (12060 ms)
[287/287] transaction_tests/test_big_witness_transaction (17931 ms)
Repeating concept ACK
Automatic linting comment: transform_boost_output_to_test_list is unused now? :-)
As of commit 9ae552468cf096cb281d1ab7c87d9baea56e86c9: https://github.com/google/gtest-parallel/commit/9ae552468cf096cb281d1ab7c87d9baea56e86c9