Unit tests can be run in parallel on systems with more than one CPU and thus finish faster.
Since each test case is run in a separate process, test cases no longer share the same global variables.
TODO:
make check

Concept ACK
Very nice!
Perhaps a stupid question, but why the need for a static test_list.txt? What would be the disadvantages of generating the test list at run-time?
--list_content is only documented in 1.60.0, so I assume it wouldn't work in previous versions. Currently we support 1.47.0+
http://www.boost.org/doc/libs/1_60_0/libs/test/doc/html/boost_test/utf_reference/rt_param_reference/list_content.html https://github.com/bitcoin/bitcoin/blob/master/doc/dependencies.md#dependencies
@MarcoFalke The output from git grep -E '(BOOST_FIXTURE_TEST_SUITE|BOOST_AUTO_TEST_CASE)' -- "src/**.cpp" could perhaps be used to quickly (~14 ms on my machine) generate the list of available tests?
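The grep-based approach above could be sketched in Python as well. This is a hypothetical sketch (not part of the PR): it assumes each test source file declares its suite with BOOST_FIXTURE_TEST_SUITE or BOOST_AUTO_TEST_SUITE before its BOOST_AUTO_TEST_CASE entries, which matches the layout of src/test/*.cpp.

```python
import re

# Hypothetical sketch: derive "suite/case" names by scanning test sources,
# mirroring what the `git grep` invocation above would find.
SUITE_RE = re.compile(r'BOOST_(?:FIXTURE_|AUTO_)?TEST_SUITE\((\w+)')
CASE_RE = re.compile(r'BOOST_AUTO_TEST_CASE\((\w+)\)')

def list_tests(source_text):
    suite = None
    tests = []
    for line in source_text.splitlines():
        m = SUITE_RE.search(line)
        if m:
            suite = m.group(1)  # remember the enclosing suite
            continue
        m = CASE_RE.search(line)
        if m and suite:
            tests.append('%s/%s' % (suite, m.group(1)))
    return tests
```

Running this over the concatenated test sources would produce the same suite/case paths that test_list.txt holds, without needing Boost 1.60.0's --list_content.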
If we go with the static text file test_list.txt I suggest adding a script contrib/devtools/lint-test_list.sh which checks that the list of tests in test_list.txt is in sync with the list of tests given by src/test/test_bitcoin --list_content.
That way Travis would automatically catch the case where someone adds a test and forgets to manually run src/test/parallel.py --write_list (which requires running Boost.Test 1.60.0 or above) and commit the changes made to test_list.txt after adding a new test.
@@ -0,0 +1,864 @@
+#!/usr/bin/env python
Should be python3?
+
+class FilterFormat(object):
+  def __init__(self, output_dir):
+    if sys.stdout.isatty():
+      # stdout needs to be unbuffered since the output is interactive.
+      sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
This seems to be fine in Python 2.7 but is a problem in 3.x:
ValueError: can't have unbuffered text I/O
A buffer size of 0 is only valid for byte streams in Python 3. According to the docs, the default buffering policy for interactive text files is line buffering, which is probably fine.
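A minimal sketch of the Python 3 fix, assuming line buffering is acceptable for the interactive progress output: rather than disabling buffering entirely (which Python 3 rejects for text streams), rewrap the stream with line buffering so output is flushed on every newline.

```python
import io

# Sketch, not the PR's code: rewrap a text stream's underlying binary
# buffer with line buffering. Line-buffered output is flushed on every
# newline, which is enough for interactive progress display.
def line_buffered(stream):
    return io.TextIOWrapper(stream.buffer, line_buffering=True)

# In the script this would be guarded by a TTY check, e.g.:
#     if sys.stdout.isatty():
#         sys.stdout = line_buffered(sys.stdout)
```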
+allocator_tests/lockedpool_tests_live
+base32_tests/base32_testvectors
+base58_tests/base58_EncodeBase58
+base58_tests/base58_DecodeBase58
+base64_tests/base64_testvectors
+abc/bech32_tests/bip173_testvectors_valid
Not sure why abc/ is prepended to the path. Getting the error:
Test setup error: no test cases matching filter
Removing abc/ lets the tests run and pass.
This also poses an interesting problem: if a test case isn't found, it's listed under FAILED TESTS, which is somewhat confusing, because bech32_tests/bip173_testvectors_valid would pass; it's just that the listed path is wrong.
Maybe the initial reader of FILE_TEST_LIST could verify that each path exists before handing it off to the workers.
+
+  if binaries == []:
+    parser.print_usage()
+    sys.exit(1)
+
+  if options.shard_count < 1:
Nit: if you are going to verify the input is valid, you can check some other fields as well
Right now you could pass a negative value for workers, repeat, timeout, etc. without complaint.
+  parser.add_option('--print_test_times', action='store_true', default=False,
+                    help='list the run time of each test at the end of execution')
+  parser.add_option('--shard_count', type='int', default=1,
+                    help='total number of shards (for sharding test execution '
+                         'between multiple machines)')
+  parser.add_option('--shard_index', type='int', default=0,
Not sure it makes sense to give shard_index a default. I think you want to ensure that shard_count and shard_index are used in combination, so if shard_count is used and options.shard_index is None, you print proper usage. Right now options.shard_index will just default to 0, so you can't tell if it's being used properly with shard_count.
+    with self.task_lock:
+      for task_id in range(len(self.tasks)):
+        task = self.tasks[task_id]
+
+        if self.running_groups is not None:
+          test_group = task.test_name.split('.')[0]
Should this be split('/')[0] ?
Right now task.test_name is a line from src/test/test_list.txt, like:
bloom_tests/rolling_bloom
So the split will always return the full path, since there is no '.' in it, making --serialize_test_cases just run everything in parallel anyway.
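The observation above can be checked directly: Boost.Test paths in test_list.txt use '/' as the suite/case separator, while the original gtest-parallel code expects googletest's '.' separator.

```python
# Illustrating the bug with a line taken from test_list.txt:
test_name = 'bloom_tests/rolling_bloom'

# With '.', the "group" is the whole name, so every test is its own group:
assert test_name.split('.')[0] == 'bloom_tests/rolling_bloom'

# With '/', the intended test group is recovered:
assert test_name.split('/')[0] == 'bloom_tests'
```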
+                         'readable file. The format of the file is specified at '
+                         'https://www.chromium.org/developers/the-json-test-results-format')
+  parser.add_option('--timeout', type='int', default=None,
+                    help='Interrupt all remaining processes after the given '
+                         'time (in seconds).')
+  parser.add_option('--serialize_test_cases', action='store_true',
I think this option is currently incompatible with sharding, since sharding assigns tests in a round-robin fashion, thus splitting up test groups between shards. That is probably desired in general, but it needs to be documented, or the two options should be prevented from being used in combination.
Concept ACK - Tested it out and left some initial feedback. Realize it's WIP so just left broad comments.
Thanks for looking at this! I kept the patches to the gtest-parallel script minimal. Feedback not about my patches should be submitted upstream: https://github.com/google/gtest-parallel
Also, if someone knows more about autotools, help is very much appreciated to make it run on make check.
@theuni Mind to give a Concept ACK/NACK or some general comments?
I have an idea for a simpler approach, where you could tell the test binary there are N processes, and which one out of those it is. It would then partition the tests randomly into N groups, and only run one of them.
If there's interest, I'll try to implement that soon.
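The idea above could be sketched as follows. Everything here is hypothetical (the test binary has no such option in this PR): each of the N processes hashes every test name and runs only the tests that fall into its own bucket, so no shared test list or coordinator is needed.

```python
import hashlib

# Hypothetical sketch of sipa's proposal: deterministically partition
# tests into n groups by hashing their names, and let process `index`
# run only its own group. A seed could randomize the partition per run.
def my_share(tests, n, index, seed=b''):
    def bucket(name):
        digest = hashlib.sha256(seed + name.encode()).digest()
        return digest[0] % n
    return [t for t in tests if bucket(t) == index]
```

Since every process applies the same hash, the groups are disjoint and together cover the full test list, with no inter-process communication.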
@sipa Wouldn't that neither help with running the most time-expensive tests first, nor with avoiding two expensive tests ending up in the same group?
@MarcoFalke No, but I think that's an independent problem. If some tests take exorbitantly more time than others, perhaps those tests need to be split up.
See #10026 for an (outdated) list of slow unit tests. I haven't checked how practical it is to split them up, but there will always be tests that run slower compared to others.
The currently longest running test on my machine seems to be "test_big_witness_transaction":
$ python3 ./src/test/parallel.py | tail -3
[285/287] streams_tests/streams_serializedata_xor (47 ms)
[286/287] coinselector_tests/knapsack_solver_test (12060 ms)
[287/287] transaction_tests/test_big_witness_transaction (17931 ms)
Repeating concept ACK
Automatic linting comment: transform_boost_output_to_test_list is unused now? :-)
NACK. Preferring #12926 :-)
As of commit 9ae552468cf096cb281d1ab7c87d9baea56e86c9
https://github.com/google/gtest-parallel/commit/9ae552468cf096cb281d1ab7c87d9baea56e86c9
Rebased