`./libtool` works for me and I see no reason why we should use the system libtool. Let’s see what travis says.

@gmaxwell I don’t understand `libtool --mode=execute` in `make check` (#723) really, to be honest. Can you explain why we need this here?
@real-or-random `libtool --mode=execute` is needed because when libtool is used to build a shared library, the “binary” you run (e.g. the benches, or the valgrind executable) isn’t a binary: look at it, it’s a shell script, and the real binary is hidden in .libs. The reason for this is so that it can run the real binary linked against your non-installed copy of the library rather than getting linked to the system library.
If you valgrind the wrapper script, that won’t be terribly informative! :)
There is probably some official way to use `libtool --mode=execute` in a `make check`. I hadn’t even noticed that there was a local copy; that’s probably part of it.
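To illustrate (paths assume a typical autotools shared build of this repository; the `file` output shown in comments is indicative, not taken from the thread):

```shell
# The "binary" produced by a shared libtool build is a wrapper script;
# the real executable is hidden under .libs/:
file ./valgrind_ctime_test        # reports a POSIX shell script
file ./.libs/valgrind_ctime_test  # reports the actual ELF binary

# --mode=execute makes valgrind see the real binary, linked against the
# uninstalled library rather than any installed system copy:
./libtool --mode=execute valgrind --error-exitcode=42 ./valgrind_ctime_test
```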
Ah thanks! We had just figured that out in IRC at the same time you replied. But we were wondering if there is a reason why we don’t simply pass `-static` as we do for the other tests?
```diff
@@ -95,7 +90,7 @@ script:
     travis_wait 30 valgrind --error-exitcode=42 ./exhaustive_tests;
   fi
 - if [ -n "$CTIMETEST" ]; then
-    libtool --mode=execute valgrind ./valgrind_ctime_test &> valgrind_ctime_test.log;
+    ./libtool --mode=execute valgrind ./valgrind_ctime_test &> valgrind_ctime_test.log;
```
Oh right, and that’s actually the correct thing to do because it adds the `--error-exitcode=1` parameter.
Without that, valgrind simply passes the exit code through, even in the case of valgrind errors. That is, travis would succeed even if valgrind found problems. I fixed this and added a note to the commit message.
Good call. I tested locally that it outputs the right exit code but apparently it does not fail on travis for reasons I don’t understand yet: -.- https://travis-ci.org/real-or-random/secp256k1/jobs/659057756#L434
On top of that, we don’t need to call the test in travis because it’s included in make check now anyway.
I’ll fix that stuff.
That also may mean that none of the other valgrind tests in travis are doing anything.
I don’t think so because the other valgrind tests have --error-exitcode=42
Good call. I tested locally that it outputs the right exit code but apparently it does not fail on travis for reasons I don’t understand yet: -.- https://travis-ci.org/real-or-random/secp256k1/jobs/659057756#L434
For future reference: This was fixed by storing the exit code in an explicit variable instead of relying on `$?` twice. Apparently shells behave differently when we use `$?` in the `case` match and in the case body.
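A minimal sketch of the pattern (the `run_test` function is a stand-in for the actual valgrind invocation, not code from the PR):

```shell
#!/bin/sh
# Stand-in for running the ctime test under valgrind with
# --error-exitcode=42; here it simulates valgrind finding errors.
run_test() { return 42; }

run_test
EXITCODE=$?   # capture immediately: any later command overwrites $?

case "$EXITCODE" in
  0)  echo "clean run" ;;
  42) echo "valgrind reported errors" ;;
  *)  echo "unexpected exit code $EXITCODE" ;;
esac
```

Capturing `$?` once into `EXITCODE` removes any dependence on which command the shell last ran before the check.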
There is probably some official way to use `libtool --mode=execute` in a `make check`. I hadn’t even noticed that there was a local copy; that’s probably part of it.
I couldn’t find anything but the current way seems to work, so I think we should use it.
Then you’re no longer testing the production libraries, and small compiler differences change the constant-time behaviour. The tests are compiled with special instrumentation and don’t use the libraries in any case (and completely static builds are not supported by some OSes today, e.g. fedora doesn’t ship static versions of most libraries anymore).
Okay, yes, I was suspecting that this is the reason. Makes sense.
That’s fine if it matters for instrumentation.
But be aware (because it’s really non-obvious) that in this case `-static` is a libtool flag, not a linker one. Libtool builds two versions of the lib regardless; `-static` just tells the particular binaries which one to link against. It doesn’t actually attempt to create a static binary.
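For example, a hypothetical Makefile.am fragment (the program and library names are illustrative, not taken from the repository):

```makefile
# libtool builds both the shared and the static flavour of the library
# regardless; -static here only tells libtool to link this particular
# program against the static one. It does not produce a fully static
# binary.
noinst_PROGRAMS = exhaustive_tests
exhaustive_tests_LDADD = libsecp256k1.la
exhaustive_tests_LDFLAGS = -static
```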
What about using `$(LIBTOOL)` in Makefile.am? I think this is how autotools expects it to be used. Although that means not calling the script, so not sure.
Moreover, this changes the way Travis runs the constant-time tests
by adding a `--error-exitcode=1` parameter to valgrind. Without that
parameter, valgrind simply passes the exit code through, even in the
case of valgrind errors. That is, Travis would succeed even if
valgrind found problems.
FWIW, according to my tests when adding macOS to travis, the local libtool works on macOS but the system’s libtool doesn’t. Apparently Apple has its own tool named `libtool` which has nothing to do with the GNU libtool.
from https://formulae.brew.sh/formula/libtool:
In order to prevent conflicts with Apple’s own libtool we have prepended a “g” so, you have instead: glibtool and glibtoolize.
also https://github.com/racket/racket/issues/337#issuecomment-17695117
We could use `VALGRIND_ENABLED`, but we could also look for the actual executable via https://www.gnu.org/software/autoconf-archive/ax_valgrind_check.html
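A rough sketch of what looking for the executable could look like in configure.ac (hypothetical; the actual detection logic in the PR may differ):

```
# Only enable the ctime test when a valgrind executable is found on PATH.
AC_CHECK_PROG([VALGRIND], [valgrind], [yes], [no])
AM_CONDITIONAL([VALGRIND_ENABLED], [test "x$VALGRIND" = "xyes"])
```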
I’m a little dubious about this running in `make check` in bitcoin core. You’re going to get reports of errors which will be hard to address, e.g. there are ones with current gcc on ppc64. On one hand it’s good to learn about them; OTOH there are already known ones which AFAIK no one is working on resolving. It’s really bad to give users errors that you won’t respond to, because it will habituate them to not reporting ones that you will respond to (including by them googling the error and then finding messages saying it’s known/ignore it).
Won’t documenting those properly in the README (or an issue) solve the problem? (And maybe someone will actually fix them.)
Unless the failure is extremely rare and specialized, documenting a failure thrown out by make check will just perpetually stop users from reporting other vaguely similar looking issues because they will find it and assume the authors already know or don’t care.
There is also plenty of collateral damage possible. A test failure here will cause some users to feel this library is unsafe and go pick some other code that not only doesn’t have constant time testing but which doesn’t make any effort to be constant time and isn’t. In the past we had issues where debian started refusing to upgrade their Bitcoin package after we added tests that failed on big-endian (at the time BE was totally unsupported by the software): debian had been shipping a completely non-functional BE package; once make check failed, the packaging process failed, and they refused to upgrade to a newer version, even though all the failure did was document the existing broken behaviour.
As far as users submitting fixes go: if the world’s foremost experts in the software won’t fix an issue, what chance does anyone else have?
These sorts of issues also are at high risk of being no one’s bug: The language makes no promises about constant time behaviour, so a compiler is not faulty just for making obviously constant time code variable time. Simultaneously, if we can’t get constant time behaviour from the compiler, there is sometimes little we can do (other than write more of the impacted routines in ASM).
The test serves a purpose because it helps find the introduction of behaviours that are grossly non-constant, or subtle but easily avoided: it’s a tool for reviewers.
If it were the case that extensive effort had gone into making sure the tests passed on a huge variety of targets, and all issues were fixed such that the discovery was limiting additional improvements, then there would be a case for using users to expand the search. Without first having our own comprehensive-ish coverage, we couldn’t really tell that any proposed fix wasn’t actually making the matter worse someplace else. But that isn’t where this test is right now.
I’d much rather see this PR make it a separate check-like target. But while there are known issues, I think it would be counterproductive to have uninformed end users running it. Besides, valgrind isn’t normally a dependency: if the test optionally runs when valgrind is present, you’ll have `make check` fail or pass randomly based on whether valgrind is installed.
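A separate check-like target could be sketched in Makefile.am roughly like this (hypothetical, not part of the PR):

```makefile
# Opt-in target: `make check-ctime` runs the constant-time test under
# valgrind, without wiring it into the default `make check` suite.
check-ctime: valgrind_ctime_test
	$(LIBTOOL) --mode=execute valgrind --error-exitcode=1 ./valgrind_ctime_test
```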
Hm, I’d really like to see this test enabled by default because then there’s a chance that it will be included in packaging process of distros. If the Debian/Fedora/Arch Linux secp256k1 packaging process uses a compiler that produces variable-time code, then I’d really like to know about this. At the moment we’re just blind. The fact that C does not guarantee constant-time compilation is exactly the reason why the user should run these tests.
If it were the case that extensive effort had gone into making sure the tests passed on a huge variety of targets, and all issues were fixed such that the discovery was limiting additional improvements, then there would be a case for using users to expand the search.
Yes, given unlimited resources, we should do this extensive effort, and then I see merit in your argument. But I don’t believe that’s realistic. Unless someone funds another three cryptographic people working full time on this library, this won’t happen. With that said, I prefer to learn about issues now than never.
For users who switch to other libraries. Ok yes, there’s a risk. But if we want to see adoption, there are other low-hanging fruits and we just need to get our hands dirty. There are so many open issues here that make people use other code: no example code, experimental ECDH, no public hash module (okay, debatable), no docs for manual building, and most importantly no release. And let’s be honest, who’s really working on this? A handful of people who have hundreds of other things that are more interesting than maintaining a library. We had better phases this year and last year but at the moment, we don’t even have the resources to review incoming PRs.
edit: If we want, we could still disable the test in Core, e.g., by setting `TESTS`. But I think it should be enabled here by default.
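With automake’s standard test harness, overriding `TESTS` at make time could look like this (the test names are illustrative):

```shell
# Run the check target but skip the ctime test:
make check TESTS='tests exhaustive_tests'
```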
ps: Urghs, I accidentally edited your comment instead of mine. Sorry, reverted…
As it stands now the test fails on platforms these distributions target! So the result would be what I pointed out above with debian: suddenly the distribution refuses to upgrade to new versions because the tests fail.
Having bitcoin core not run libsecp256k1 tests at all because a constant time test fails is not a good tradeoff: to avoid a test that we know fails, but only indicates behaviour that is usually inconsequential in most places bitcoin core is deployed, we’d disable detection of miscompilation that can cause a total consensus fault, direct leak of secret keys, etc.: reports of issues that won’t get ignored.
Again: We already know the test fails on some systems. You say “If the Debian/Fedora/Arch Linux secp256k1 packaging process uses a compiler that produces variable-time code, then I’d really like to know about this”, but this isn’t true: we already know the fedora compiler on PPC64LE fails this test, and no one is interested in doing anything about it.
So what happens then when users show up reporting the same issue because make check fails? We already know it will fail in at least fedora’s build process (and likely debian’s, since it already supports PPC64LE). Even if it weren’t the case that it already failed, if no one maintaining this library can afford the time to spend a couple of afternoons first getting this likely-to-fail test run one time on those distro targets (e.g. by finding people with relevant systems and begging them to run the tests, or asking a distro packager to do an experimental package build that runs the test), how can you expect that anyone is going to find the time to fix any additional reported issues?
At best all this does is teach them to not run make check because it reports issues that the developers don’t care about and send off a signal that this library is unmaintained. Or it fixes distributors on old worse versions that don’t fail, or worst it drives users to software which isn’t even remotely constant time or doesn’t even have tests (which would be pretty much every alternative to this library).
Again: We already know the test fails on some systems. You say “If the Debian/Fedora/Arch Linux secp256k1 packaging process uses a compiler that produces variable-time code, then I’d really like to know about this”, but this isn’t true: we already know the fedora compiler on PPC64LE fails this test, and no one is interested in doing anything about it.
I think this should be fixed, and I’m very interested in doing something about it! But I didn’t know about it. It’s not documented anywhere (except that you hinted at it above). We need to write this down at least.
Even if it weren’t the case that it already failed, if no one maintaining this library can afford the time to spend a couple of afternoons first getting this likely-to-fail test run one time on those distro targets (e.g. by finding people with relevant systems and begging them to run the tests, or asking a distro packager to do an experimental package build that runs the test), how can you expect that anyone is going to find the time to fix any additional reported issues?
Why is it set in stone that it’s a couple of afternoons? The last one took me a few minutes (https://github.com/bitcoin-core/secp256k1/pull/728). Sure, this was on a system I could test on but in the end I looked more at compiler output than running tests. But if we don’t get these reports, we don’t even know about the issues.
Could it be enabled by default just on platforms where we expect it to work?
That sounds reasonable as a first thing to do.
When it fails will someone fix the failures? If not, there is only downsides from running it.
I believe when it fails on a platform where we expect it to work, that’s a bug, and sure, we’ll fix bugs.
I think this should be fixed, and I’m very interested in doing something about it! But I didn’t know about it. It’s not documented anywhere (except that you hinted at it above).
I reported it; see http://gnusha.org/secp256k1/2020-01-11.log at 11:29 and 11:37 (and probably in several other places if I were to go search).
Working on this and repeatedly getting zero response from anyone is extremely demoralizing.
I expect that it also fails elsewhere, but why bother testing more exotic platforms when there is a known failure in a less exotic one that no one seems interested in? I stopped testing more systems once I saw no interest in this known failure.
Similarly, why waste my time arguing with me in a loop over this without pointing out that you didn’t know that it already failed? Your response is much less surprising to me now that I realize you didn’t know that it was already known to fail. If I thought that the test passed everywhere, I would have been somewhat more supportive of running it by default.
Why is it set in stone that it’s a couple of afternoons? The last one took me a few minutes
It usually doesn’t take much effort to fix a constant time behaviour… except for complicated ones like “oh gee, we can’t use < in constant time code”; for that I think we’re facing a big complicated rewrite, potential performance loss, and having to retest all over to make sure the fix doesn’t move the non-constant time bug to another platform.
But whatever effort it takes to fix an issue it takes much less effort to get a single passing test result, which was really my point.
If the project can’t find the effort to get some tests run then it’s not going to be able to muster the effort to fix issues that crop up. Without the ability to retest fixes broadly you can’t be confident that a “fix” isn’t making the problem worse except on platforms you can test. Having a year long iteration cycle where you try a fix for a users platform then 6 months later learn you broke a different user’s platform (potentially an established one with many users) isn’t a convergent process.
The best you can really do is have a set of platforms you’re able to test (either directly or via community help) and be able to make some promises that the tests pass on those platforms. Anything you can’t test a proposed fix on is just a crapshoot because any change to the software, particularly any constant time fix for another uncommon platform, might break them.
Well, yeah, if we want this at all.
Enabling it by default just on platforms where we expect it to work still sounds reasonable to me.
Could it be enabled by default just on platforms where we expect it to work?
I think we should do this. If not, we should at least enable it in dev-mode, so that we contributors run tests on a regular basis.