`./libtool` works for me and I see no reason why we should use the system libtool. Let’s see what travis says.

@gmaxwell I don’t understand `libtool --mode=execute` in `make check` (#723) really, to be honest. Can you explain why we need this here?
@real-or-random `libtool --mode=execute` is needed because when libtool is used to build a shared library, the “binary” you run (e.g. the benches, or the valgrind executable) isn’t a binary: look at it, it’s a shell script, and the real binary is hidden in .libs. The reason for this is so that it can run the real binary linked against your non-installed copy of the library rather than getting linked to the system library.
If you valgrind the wrapper script, that won’t be terribly informative! :)
There is probably some official way to use `libtool --mode=execute` in a `make check`. I hadn’t even noticed that there was a local copy; that’s probably part of it.
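To illustrate (paths assume a typical autotools shared build of this repository; the `file` output shown in comments is indicative, not taken from the thread):

```shell
# The "binary" produced by a shared libtool build is a wrapper script;
# the real executable is hidden under .libs/:
file ./valgrind_ctime_test        # reports a POSIX shell script
file ./.libs/valgrind_ctime_test  # reports the actual ELF binary

# --mode=execute makes valgrind see the real binary, linked against the
# uninstalled library rather than any installed system copy:
./libtool --mode=execute valgrind --error-exitcode=42 ./valgrind_ctime_test
```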
Ah thanks! We had just figured that out in IRC at the same time you replied. But we were wondering if there is a reason why we don’t simply pass `-static` as we do for the other tests?
```diff
@@ -95,7 +90,7 @@ script:
     travis_wait 30 valgrind --error-exitcode=42 ./exhaustive_tests;
   fi
 - if [ -n "$CTIMETEST" ]; then
-    libtool --mode=execute valgrind ./valgrind_ctime_test &> valgrind_ctime_test.log;
+    ./libtool --mode=execute valgrind ./valgrind_ctime_test &> valgrind_ctime_test.log;
```
Oh right, and that’s actually the correct thing to do because it adds the `--error-exitcode=1` parameter.
Without that, valgrind simply passes the exit code through, even in the case of valgrind errors. That is, travis would succeed even if valgrind found problems. I fixed this and added a note to the commit message.
Good call. I tested locally that it outputs the right exit code but apparently it does not fail on travis for reasons I don’t understand yet: -.- https://travis-ci.org/real-or-random/secp256k1/jobs/659057756#L434
On top of that, we don’t need to call the test in travis because it’s included in make check now anyway.
I’ll fix that stuff.
That also may mean that none of the other valgrind tests in travis are doing anything.
I don’t think so because the other valgrind tests have --error-exitcode=42
Good call. I tested locally that it outputs the right exit code but apparently it does not fail on travis for reasons I don’t understand yet: -.- https://travis-ci.org/real-or-random/secp256k1/jobs/659057756#L434
For future reference: This was fixed by storing the exit code in an explicit variable instead of relying on `$?` twice. Apparently shells behave differently when we use `$?` in the `case` match and in the case body.
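A minimal sketch of the pattern (the `run_test` function is a stand-in for the actual valgrind invocation, not code from the PR):

```shell
#!/bin/sh
# Stand-in for running the ctime test under valgrind with
# --error-exitcode=42; here it simulates valgrind finding errors.
run_test() { return 42; }

run_test
EXITCODE=$?   # capture immediately: any later command overwrites $?

case "$EXITCODE" in
  0)  echo "clean run" ;;
  42) echo "valgrind reported errors" ;;
  *)  echo "unexpected exit code $EXITCODE" ;;
esac
```

Capturing `$?` once into `EXITCODE` removes any dependence on which command the shell last ran before the check.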
There is probably some official way to use `libtool --mode=execute` in a `make check`. I hadn’t even noticed that there was a local copy; that’s probably part of it.
I couldn’t find anything but the current way seems to work, so I think we should use it.
Then you’re no longer testing the production libraries, and small compiler differences change the constant-time behaviour. The tests are compiled with special instrumentation and don’t use the libraries in any case (and completely static builds are not supported by some OSes today, e.g. fedora doesn’t ship static versions of most libraries anymore).
Okay, yes, I was suspecting that this is the reason. Makes sense.
That’s fine if it matters for instrumentation.
But be aware (because it’s really non-obvious) that in this case `-static` is a libtool flag, not a linker one. Libtool builds two versions of the lib regardless; `-static` just tells the particular binaries which one to link against. It doesn’t actually attempt to create a static binary.
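For example, a hypothetical Makefile.am fragment (the program and library names are illustrative, not taken from the repository):

```makefile
# libtool builds both the shared and the static flavour of the library
# regardless; -static here only tells libtool to link this particular
# program against the static one. It does not produce a fully static
# binary.
noinst_PROGRAMS = exhaustive_tests
exhaustive_tests_LDADD = libsecp256k1.la
exhaustive_tests_LDFLAGS = -static
```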
What about using `$(LIBTOOL)` in Makefile.am? I think this is how autotools expects it to be used. Although that means not calling the script, so not sure.
Moreover, this changes the way Travis runs the constant-time tests
by adding a `--error-exitcode=1` parameter to valgrind. Without that
parameter, valgrind simply passes the exit code through, even in the
case of valgrind errors. That is, Travis would succeed even if
valgrind found problems.
FWIW, according to my tests when adding macOS to travis, the local libtool works on macOS but the system’s libtool doesn’t. Apparently Apple has its own tool named `libtool` which has nothing to do with the GNU libtool.
from https://formulae.brew.sh/formula/libtool:
In order to prevent conflicts with Apple’s own libtool we have prepended a “g” so, you have instead: glibtool and glibtoolize.
also https://github.com/racket/racket/issues/337#issuecomment-17695117
We could use `VALGRIND_ENABLED`, but we could also look for the actual executable via https://www.gnu.org/software/autoconf-archive/ax_valgrind_check.html
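A rough sketch of what looking for the executable could look like in configure.ac (hypothetical; the actual detection logic in the PR may differ):

```
# Only enable the ctime test when a valgrind executable is found on PATH.
AC_CHECK_PROG([VALGRIND], [valgrind], [yes], [no])
AM_CONDITIONAL([VALGRIND_ENABLED], [test "x$VALGRIND" = "xyes"])
```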
I’m a little dubious about this running in `make check` in bitcoin core. You’re going to get reports of errors which will be hard to address, e.g. there are ones with current gcc on ppc64. On one hand it’s good to learn about them; OTOH there are already known ones which AFAIK no one is working on resolving. It’s really bad to give users errors that you won’t respond to, because it will habituate them to not reporting ones that you will respond to (including by them googling the error and then finding messages saying it’s known/ignore it).
Won’t documenting those properly in the README (or an issue) solve the problem? (And maybe someone will actually fix them.)
Unless the failure is extremely rare and specialized, documenting a failure thrown out by make check will just perpetually stop users from reporting other vaguely similar looking issues because they will find it and assume the authors already know or don’t care.
There is also plenty of collateral damage possible. A test failure here will cause some users to feel this library is unsafe and go pick some other code that not only doesn’t have constant time testing but which doesn’t make any effort to be constant time and isn’t. In the past we had issues where debian started refusing to upgrade their Bitcoin package after we added tests that failed on big-endian (at the time BE was totally unsupported by the software): debian had been shipping a completely non-functional BE package; once make check failed, the packaging process failed, and they refused to upgrade to a newer version, even though all the failure did was document the existing broken behaviour.
As far as users submitting fixes go: if the world’s foremost experts in the software won’t fix an issue, what chance does anyone else have?
These sorts of issues also are at high risk of being no one’s bug: The language makes no promises about constant time behaviour, so a compiler is not faulty just for making obviously constant time code variable time. Simultaneously, if we can’t get constant time behaviour from the compiler, there is sometimes little we can do (other than write more of the impacted routines in ASM).
The test serves a purpose because it helps find the introduction of behaviours that are grossly non-constant, or subtle but easily avoided: it’s a tool for reviewers.
If it were the case that extensive effort had gone into making sure the tests passed on a huge variety of targets, and all issues were fixed such that the discovery was limiting additional improvements, then there would be a case for using users to expand the search. Without first having our own comprehensive-ish coverage, we couldn’t really tell that any proposed fix wasn’t actually making the matter worse someplace else. But that isn’t where this test is right now.
I’d much rather see this PR make it a separate check-like target. But while there are known issues, I think it would be counterproductive to have uninformed end users running it. Besides, valgrind isn’t normally a dependency: if the test optionally runs when valgrind is present, you’ll have `make check` fail or pass randomly based on whether valgrind is installed.
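A separate check-like target could be sketched in Makefile.am roughly like this (hypothetical, not part of the PR):

```makefile
# Opt-in target: `make check-ctime` runs the constant-time test under
# valgrind, without wiring it into the default `make check` suite.
check-ctime: valgrind_ctime_test
	$(LIBTOOL) --mode=execute valgrind --error-exitcode=1 ./valgrind_ctime_test
```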
Hm, I’d really like to see this test enabled by default because then there’s a chance that it will be included in packaging process of distros. If the Debian/Fedora/Arch Linux secp256k1 packaging process uses a compiler that produces variable-time code, then I’d really like to know about this. At the moment we’re just blind. The fact that C does not guarantee constant-time compilation is exactly the reason why the user should run these tests.
If it were the case that extensive effort had gone into making sure the tests passed on a huge variety of targets, and all issues were fixed such that the discovery was limiting additional improvements, then there would be a case for using users to expand the search.
Yes, given unlimited resources, we should do this extensive effort, and then I see merit in your argument. But I don’t believe that’s realistic. Unless someone funds another three cryptographic people working full time on this library, this won’t happen. With that said, I prefer to learn about issues now than never.
For users who switch to other libraries. Ok yes, there’s a risk. But if we want to see adoption, there are other low-hanging fruits and we just need to get our hands dirty. There are so many open issues here that make people use other code: no example code, experimental ECDH, no public hash module (okay, debatable), no docs for manual building, and most importantly no release. And let’s be honest, who’s really working on this? A handful of people who have hundreds of other things that are more interesting than maintaining a library. We had better phases this year and last year but at the moment, we don’t even have the resources to review incoming PRs.
edit: If we want, we could still disable the test in Core, e.g., by setting `TESTS`. But I think it should be enabled here by default.
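With automake’s standard test harness, overriding `TESTS` at make time could look like this (the test names are illustrative):

```shell
# Run the check target but skip the ctime test:
make check TESTS='tests exhaustive_tests'
```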
ps: Urghs, I accidentally edited your comment instead of mine. Sorry, reverted…
As it stands now the test fails on platforms these distributions target! So the result would be what I pointed out above with debian: suddenly the distribution refuses to upgrade to new versions because the tests fail.
Having bitcoin core not run libsecp256k1 tests at all because a constant time test fails is not a good tradeoff: to avoid a test that we know fails, but only indicates behaviour that is usually inconsequential in most places bitcoin core is deployed, we’d disable detection of miscompilation that can cause a total consensus fault, direct leak of secret keys, etc.: reports of issues that won’t get ignored.
Again: We already know the test fails on some systems. You say “If the Debian/Fedora/Arch Linux secp256k1 packaging process uses a compiler that produces variable-time code, then I’d really like to know about this”, but this isn’t true: we already know the fedora compiler on PPC64LE fails this test, and no one is interested in doing anything about it.
So what happens then when users show up reporting the same issue because make check fails? We already know it will fail in at least fedora’s build process (and likely debian’s, since it already supports PPC64LE). Even if it weren’t the case that it already failed, if no one maintaining this library can afford the time to spend a couple of afternoons first getting this likely-to-fail test run one time on those distro targets (e.g. by finding people with relevant systems and begging them to run the tests, or asking a distro packager to do an experimental package build that runs the test), how can you expect that anyone is going to find the time to fix any additional reported issues?
At best all this does is teach them to not run make check because it reports issues that the developers don’t care about and send off a signal that this library is unmaintained. Or it fixes distributors on old worse versions that don’t fail, or worst it drives users to software which isn’t even remotely constant time or doesn’t even have tests (which would be pretty much every alternative to this library).
Again: We already know the test fails on some systems. You say “If the Debian/Fedora/Arch Linux secp256k1 packaging process uses a compiler that produces variable-time code, then I’d really like to know about this”, but this isn’t true: we already know the fedora compiler on PPC64LE fails this test, and no one is interested in doing anything about it.
I think this should be fixed, and I’m very interested in doing something about it! But I didn’t know about it. It’s not documented anywhere (except that you hinted at it above). We need to write this down at least.
Even if it weren’t the case that it already failed, if no one maintaining this library can afford the time to spend a couple of afternoons first getting this likely-to-fail test run one time on those distro targets (e.g. by finding people with relevant systems and begging them to run the tests, or asking a distro packager to do an experimental package build that runs the test), how can you expect that anyone is going to find the time to fix any additional reported issues?
Why is it set in stone that it’s a couple of afternoons? The last one took me a few minutes (https://github.com/bitcoin-core/secp256k1/pull/728). Sure, this was on a system I could test on but in the end I looked more at compiler output than running tests. But if we don’t get these reports, we don’t even know about the issues.
Could it be enabled by default just on platforms where we expect it to work?
That sounds reasonable as a first thing to do.
When it fails will someone fix the failures? If not, there is only downsides from running it.
I believe when it fails on a platform where we expect it to work, that’s a bug, and sure, we’ll fix bugs.
I think this should be fixed, and I’m very interested in doing something about it! But I didn’t know about it. It’s not documented anywhere (except that you hinted at it above).
I reported it; see http://gnusha.org/secp256k1/2020-01-11.log at 11:29 and 11:37 (and probably in several other places if I were to go search).
Working on this and repeatedly getting zero response from anyone is extremely demoralizing.
I expect that it also fails elsewhere, but why bother testing more exotic platforms when there is a known failure in a less exotic one that no one seems interested in? I stopped testing more systems once I saw no interest in this known failure.
Similarly, why waste my time arguing with me in a loop over this without pointing out that you didn’t know that it already failed? Your response is much less surprising to me now that I realize you didn’t know that it was already known to fail. If I thought that the test passed everywhere, I would have been somewhat more supportive of running it by default.
Why is it set in stone that it’s a couple of afternoons? The last one took me a few minutes
It usually doesn’t take much effort to fix a constant time behaviour… except for complicated ones like “oh gee, we can’t use < in constant time code”; for that I think we’re facing a big complicated rewrite, potential performance loss, and having to retest all over to make sure the fix doesn’t move the non-constant time bug to another platform.
But whatever effort it takes to fix an issue it takes much less effort to get a single passing test result, which was really my point.
If the project can’t find the effort to get some tests run then it’s not going to be able to muster the effort to fix issues that crop up. Without the ability to retest fixes broadly you can’t be confident that a “fix” isn’t making the problem worse except on platforms you can test. Having a year long iteration cycle where you try a fix for a users platform then 6 months later learn you broke a different user’s platform (potentially an established one with many users) isn’t a convergent process.
The best you can really do is have a set of platforms you’re able to test (either directly or via community help) and be able to make some promises that the tests pass on those platforms. Anything you can’t test a proposed fix on is just a crapshoot because any change to the software, particularly any constant time fix for another uncommon platform, might break them.
Well, yeah, if we want this at all.
Enabling it by default just on platforms where we expect it to work still sounds reasonable to me.
Could it be enabled by default just on platforms where we expect it to work?
I think we should do this. If not, we should at least enable it in dev-mode, so that we contributors run tests on a regular basis.