tests: Add option --valgrind to run the functional tests under Valgrind

practicalswift commented at 4:13 pm on November 29, 2019: contributor

What is better than fixing bugs? Fixing entire bug classes of course! :)

Add option --valgrind to run the functional tests under Valgrind.

Regular functional testing under Valgrind would have caught many of the uninitialized reads we’ve seen historically.

Let’s kill this bug class once and for all: let’s never use an uninitialized value ever again. Or at least not one that would be triggered by running the functional tests! :)

My hope is that this addition will make it super-easy to run the functional tests under Valgrind and thus increase the probability of people making use of it :)

Hopefully test/functional/test_runner.py --valgrind will become a natural part of the pre-release QA process.

Usage:

0$ test/functional/test_runner.py --help
1…
2  --valgrind            run nodes under the valgrind memory error detector:
3                        expect at least a ~10x slowdown, valgrind 3.14 or
4                        later required

Live demo:

First, let’s re-introduce a memory bug by reverting the recent P2P uninitialized read bug fix from PR #17624 (“net: Fix an uninitialized read in ProcessMessage(…, “tx”, …) when receiving a transaction we already have”).

 0$ git diff
 1diff --git a/src/consensus/validation.h b/src/consensus/validation.h
 2index 3401eb64c..940adea33 100644
 3--- a/src/consensus/validation.h
 4+++ b/src/consensus/validation.h
 5@@ -114,7 +114,7 @@ inline ValidationState::~ValidationState() {};
 6
 7 class TxValidationState : public ValidationState {
 8 private:
 9-    TxValidationResult m_result = TxValidationResult::TX_RESULT_UNSET;
10+    TxValidationResult m_result;
11 public:
12     bool Invalid(TxValidationResult result,
13                  const std::string &reject_reason="",

Second, let’s test as normal without Valgrind:

0$ test/functional/p2p_segwit.py -l INFO
12019-11-28T09:30:42.810000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test__fc8q3qo
2…
32019-11-28T09:31:57.187000Z TestFramework (INFO): Subtest: test_non_standard_witness_blinding (Segwit active = True)
4…
52019-11-28T09:32:08.265000Z TestFramework (INFO): Tests successful

Third, let’s test with --valgrind and see if the test fail (as we expect) when the unitialized value is used:

0$ test/functional/p2p_segwit.py -l INFO --valgrind
12019-11-28T09:32:33.018000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_gtjecx2l
2…
32019-11-28T09:40:36.702000Z TestFramework (INFO): Subtest: test_non_standard_witness_blinding (Segwit active = True)
42019-11-28T09:40:37.813000Z TestFramework (ERROR): Assertion failed
5ConnectionRefusedError: [Errno 111] Connection refused

fanquake added the label Tests on Nov 29, 2019

DrahtBot commented at 7:36 pm on November 29, 2019: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#12134 (Build previous releases and run functional tests by Sjors)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in test/functional/test_framework/test_framework.py:161 in 61a48c1b95 outdated

156@@ -157,6 +157,8 @@ def parse_args(self):
157                             help="use bitcoin-cli instead of RPC for all commands")
158         parser.add_argument("--perf", dest="perf", default=False, action="store_true",
159                             help="profile running nodes with perf for the duration of the test")
160+        parser.add_argument("--valgrind", dest="valgrind", default=False, action="store_true",
161+                            help="run nodes under the valgrind memory error detector: expect at least a ~10x slowdown, valgrind 3.14 or later required")

paymog commented at 9:13 am on November 30, 2019:

what do you think about updating the developer docs or some other documentation (I’m new to bitcoin so I don’t know where the right place is) with instructions on valgrind?

Is there a CI step that runs the functional tests? Should we use this option there?

paymog commented at 9:15 am on November 30, 2019:

Searching for valgrind doesn’t show much documentation: https://github.com/bitcoin/bitcoin/search?q=valgrind&unscoped_q=valgrind

in test/functional/test_framework/test_node.py:106 in 61a48c1b95 outdated

101+                os.path.dirname(os.path.realpath(__file__)),
102+                "..", "..", "..", "contrib", "valgrind.supp")
103+            suppressions_file = os.getenv("VALGRIND_SUPPRESSIONS_FILE",
104+                                          default_suppressions_file)
105+            self.args = ["valgrind", "--suppressions={}".format(suppressions_file),
106+                         "--gen-suppressions=all", "--exit-on-first-error=yes",

paymog commented at 9:15 am on November 30, 2019:

--exist-on-first-error might slow down the development cycle if there are multiple valgrind problems in a single change.

practicalswift commented at 10:30 am on December 2, 2019:

There is no easy way to achieve what we are trying to do here without using --exit-on-first-error, and hopefully it won’t be very common with more than one memory problem per bitcoind execution. Additionally, continuing execution after a memory error is not meaningful: the diagnostics emitted after the first error are likely spurious anyways :)

So in summary – we’ll have to live with --exit-on-first-error=yes :)

paymog commented at 9:16 am on November 30, 2019: none

Do all tests currently pass with this new valgrind option?

tests: Add option --valgrind to run nodes under valgrind in the functional tests 5db506ba59

practicalswift force-pushed on Dec 1, 2019

practicalswift commented at 2:13 pm on December 2, 2019: contributor

@paymog

There are a few failing tests: most of them due to timing assumptions that fail in the presence of the bitcoind slowdown caused by Valgrind. I plan to fix these failing tests by increasing timeouts where required.

Note that doing that has the added nice benefit of turning a few flaky tests into stable ones since said timing assumptions are intermittently violated under normal test execution due to high load, etc.

MarcoFalke commented at 3:53 pm on December 3, 2019: member

ACK 5db506ba5943868cc2c845f717508739b7f05714

in test/functional/test_framework/test_node.py:105 in 5db506ba59

100+            default_suppressions_file = os.path.join(
101+                os.path.dirname(os.path.realpath(__file__)),
102+                "..", "..", "..", "contrib", "valgrind.supp")
103+            suppressions_file = os.getenv("VALGRIND_SUPPRESSIONS_FILE",
104+                                          default_suppressions_file)
105+            self.args = ["valgrind", "--suppressions={}".format(suppressions_file),

jonatack commented at 7:44 pm on December 3, 2019:

suggestion if you have to retouch this:

 0-            self.args = ["valgrind", "--suppressions={}".format(suppressions_file),
 1-                         "--gen-suppressions=all", "--exit-on-first-error=yes",
 2-                         "--error-exitcode=1", "--quiet"] + self.args
 3+            self.args = [
 4+                "valgrind",
 5+                "--suppressions={}".format(suppressions_file),
 6+                "--gen-suppressions=all",
 7+                "--exit-on-first-error=yes",
 8+                "--error-exitcode=1",
 9+                "--quiet",
10+            ] + self.args

(like lines 89-97 before it)

jonatack commented at 8:13 pm on December 3, 2019: member

Concept ACK, thank you for working on adding valgrind for the tests. Unfortunately, every test run with --valgrind fails for me with AssertionError: [node 0] Error: no RPC connection. Pointers welcome :)

0$ valgrind --version
1valgrind-3.14.0

MarcoFalke commented at 8:16 pm on December 3, 2019: member

@jonatack What does combine_logs.py tell you, and are the directories stdout and stderr in the test dir empty?

jonatack commented at 8:29 pm on December 3, 2019: member

@MarcoFalke thanks! I don’t see stdout or stderr directories in /test and /test/functional. Here is the combine_logs.py output for p2p_segwit.py (which passes without the --valgrind option).

MarcoFalke commented at 8:52 pm on December 3, 2019: member

Try tail /tmp/bitcoin_func_test_u513zycz/node*/std* or similar (Reminds me that this should ideally be done by combine_logs.py)

MarcoFalke commented at 8:54 pm on December 3, 2019: member

Also, could you try with the suppressions file from https://github.com/bitcoin/bitcoin/pull/17490/files#diff-8982317ece972c9b26bd8545350812f2R31 ? Maybe remove the line I linked to if it still doesn’t work.

jonatack commented at 11:31 pm on December 3, 2019: member

ACK 5db506ba5943868cc2c845f717508739b7f05714

Reviewed code, reproduced the PR description steps after modifying contrib/valgrind.supp (diff)… perhaps I have a depends issue in my build.

Thank you @MarcoFalke for your suggestions.

practicalswift commented at 4:24 pm on December 10, 2019: contributor

Try tail /tmp/bitcoin_func_test_u513zycz/node*/std* or similar (Reminds me that this should ideally be done by combine_logs.py)

I’m probably missing something, but isn’t that what print_node_warnings in combine_logs.py does? :)

https://github.com/bitcoin/bitcoin/blob/1189b6acab115a7fe7bd67f8b4c6e3f55e53274e/test/functional/combine_logs.py#L98-L111

MarcoFalke commented at 4:32 pm on December 10, 2019: member

Ah right. Forgot that I already wrote that. And it indeed was also included in https://gist.github.com/jonatack/915df3494f9f7274a07e29ae092a2a47#file-pr17633-combine_log_output-txt-L263

MarcoFalke referenced this in commit 3d6752779f on Dec 10, 2019

MarcoFalke merged this on Dec 10, 2019

MarcoFalke closed this on Dec 10, 2019

sidhujag referenced this in commit 6e0bed9cbe on Dec 10, 2019

MarcoFalke referenced this in commit 90b3e59caf on Mar 11, 2020

sidhujag referenced this in commit bdfcd43d0c on Mar 11, 2020

MarcoFalke referenced this in commit 7027c67cac on Jul 2, 2020

jasonbcox referenced this in commit 86af2d5005 on Jul 30, 2020

sidhujag referenced this in commit bcf410ff00 on Nov 10, 2020

sidhujag referenced this in commit 8f6fbc027e on Nov 10, 2020

practicalswift deleted the branch on Apr 10, 2021

PastaPastaPasta referenced this in commit c5ad6c618d on Jun 4, 2021

vijaydasmp referenced this in commit dfa262c93f on Oct 4, 2021

vijaydasmp referenced this in commit ea9fa6ff79 on Mar 17, 2022

vijaydasmp referenced this in commit d673c784d1 on Mar 27, 2022

vijaydasmp referenced this in commit 80e96ab03d on Mar 28, 2022

vijaydasmp referenced this in commit f85c17c910 on Mar 31, 2022

vijaydasmp referenced this in commit 946c17cd75 on Mar 31, 2022

vijaydasmp referenced this in commit 1a0df96a11 on Apr 1, 2022

vijaydasmp referenced this in commit 82e8a6b2fd on Apr 1, 2022

vijaydasmp referenced this in commit 06246f9d55 on Apr 1, 2022

vijaydasmp referenced this in commit 53fee2a01c on Apr 2, 2022

vijaydasmp referenced this in commit 729f8d6644 on Apr 2, 2022

vijaydasmp referenced this in commit 77f106ebf5 on Apr 3, 2022

vijaydasmp referenced this in commit b080eaabde on Apr 3, 2022

vijaydasmp referenced this in commit f80a99c3b0 on Apr 3, 2022

kittywhiskers referenced this in commit 6d1ac9b459 on Apr 5, 2022

kittywhiskers referenced this in commit b0cacbb76a on Apr 5, 2022

kittywhiskers referenced this in commit 09b0205903 on Apr 5, 2022

kittywhiskers referenced this in commit 3ace58c648 on Apr 5, 2022

DrahtBot locked this on Aug 16, 2022

tests: Add option –valgrind to run the functional tests under Valgrind #17633

Conflicts