cross-testing

real-or-random commented at 9:18 am on November 9, 2019: contributor

In #635 I was convinced that cross-testing is a good thing. It would be nice to have cross tests for

https://github.com/elichai/ecc-secp256k1 as inspired by the discussion in #635, or another “simple” library
gnupg. If I understand https://gist.github.com/markblundeberg/af59d7cd234cbdb14dcf9e00f0ea2c17 correctly, it should produce compatible signatures (ignoring the encoding).

elichai commented at 11:42 am on November 9, 2019: contributor

Just linking to #641 (which I stopped because there was no more discussion around if it’s wanted and how to vendor the code)

real-or-random cross-referenced this on Feb 14, 2020 from issue Embed stripped down copy of micro-ecc for comparison tests? by gmaxwell

real-or-random commented at 2:56 pm on February 14, 2020: contributor

@gmaxwell suggests micro-ecc in #716. I think that’s by far the simplest alternative (simple, mostly portable C), see #716 for a detailed discussion. I’d prefer this over the other mentioned projects. What do people think?

elichai cross-referenced this on Apr 12, 2020 from issue Replace OpenSSL in cross-testing with micro-ecc(uECC) by elichai

real-or-random cross-referenced this on Sep 28, 2021 from issue [RFC] Remove OpenSSL testing support by sipa

real-or-random commented at 10:52 am on October 17, 2021: contributor

Now after the OpenSSL tests are gone, we should make this a priority.

Here are some possibilities:

kmackay/micro-ecc in pure C: See PR #738.
elichai/ecc-secp256k1 in Rust. That one is nice because @elichai says he wrote it without really looking at our code here. See PR #641.
LLFourn/secp256kfun in Rust
randombit/botan in C++

I think micro-ecc is a nice option and a good first choice. @elichai would you still interested in getting this in?

#641 got stuck because there was no clear commitment to using Rust. I wouldn’t opposed to other languages than C in the test but here’s a suggestion that I find very attractive: We could compile the Rust down to (ugly) C using mrustc and vendor this C code. This would mean that we get the best of both worlds: One can compare to an independent simple implementation with the same API and one does not need a Rust compiler to run the tests. The same is true for secp256kfun but I don’t know how independent that implementation is. (cc @LLFourn)

But before we decide on the way forward, here’s another very interesting direction: A much more sophisticated approach to this kind of testing is cryptofuzz by @guidovranken, which does differential testings of all kinds of algorithms and libraries. For example, it’s used to test libsecp256k1 against Botan on oss-fuzz (https://github.com/google/oss-fuzz/pull/5717). @guidovranken Were you aware of our OpenSSL cross tests (removed in #983)? I really wonder how crytofuzz compares against these. I can imagine it would be superior to simply use cryptofuzz and maybe integrate more libraries such as some of the ones mentioned above.

guidovranken commented at 11:43 pm on October 17, 2021: none

micro-ecc: There were issues with last time I tried and I abandoned testing it: https://github.com/kmackay/micro-ecc/issues/179
libecc Very good library albeit slow due to timing channel resistance. It is also being fuzzed on OSS-Fuzz against Botan by me.
Botan is supreme, I use it for testing in almost every crypto project
I don’t know if you’re running Wycheproof tests on libsec256k1, but if not, you definitely should. I recently found bugs in another secp256k1 library (https://github.com/paulmillr/noble-secp256k1) after it was audited by simply running Wycheproof tests on it.
Trezor firmware is very good and is in fact used in the Cryptofuzz fuzzer for libsecp256k1 https://github.com/google/oss-fuzz/blob/0c1d5231de61b4166281b4d5685aa66e9f4948a8/projects/bitcoin-core/build_cryptofuzz.sh#L73

Apart from testing sign/verify/recover, it is also a good idea to test point validation, addition and multiplication.

real-or-random commented at 9:35 am on October 18, 2021: contributor

Thanks, that’s a helpful overview! Unfortunately the Wycheproof tests do not match our semantics entirely, see https://github.com/google/wycheproof/issues/70 . @guidovranken Can you also comment on the high-level ideas here? For example, when it comes to functionatily, do you think cryptofuzz is superior in every aspect, or do you think it would still make sense for us to maintain our own cross tests?

Of course there are also other considerations besides functionality, e.g., build integration etc. For example, if we think we stick to maintain own tests because it’s easier to ship them to the user, I imagine we could still have a short run of cryptofuzz on CI.

LLFourn commented at 11:43 pm on October 18, 2021: none

The same is true for secp256kfun but I don’t know how independent that implementation is. (cc @LLFourn)

The best rust impl to try this approach for would be k256 since it is a completely independent code base and even uses different algorithms for ECC (it uses projective addition formula from https://eprint.iacr.org/2015/1060.pdf). I was going to move secp256kfun over to using it as a backend eventually anyway.

guidovranken commented at 5:37 am on October 19, 2021: none

@guidovranken Can you also comment on the high-level ideas here? For example, when it comes to functionatily, do you think cryptofuzz is superior in every aspect, or do you think it would still make sense for us to maintain our own cross tests?

A notable shortcoming of CF is that it only compares the results produced by various crypto libraries, but not the absence of results. For example it doesn’t confirm the absence of a result in these cases:

Point multiplication by 0 is an illegal operation and should not produce a result.
Point addition/multiplication involving invalid points should not produce a result.
ECDSA recovery that yields a point at infinity pubkey should not produce a result.
ECDSA verification involving a pubkey or signature which are not valid points may result false or it may not produce a result at all.

Technically, each operation run in a crypto library returns a std::optional<Result>. It returns std::nullopt if it cannot compute a result for any reason, and it returns a Result if it can. Once the operation has been run in every crypto library, CF filters out all results which are std::nullopt. The remaining set is then compared for differences.

So in Python-pseudocode:

 0results = []
 1
 2# Run operation in each crypto library
 3for module in modules:
 4    results += [ module.Run(operation) ]
 5
 6# Filter out nullopt results
 7results = filter(lambda r: r != None, results)
 8
 9# Detect discrepancies
10if len(set(results)) != 1:
11    crash()

This logic exists for a good reason in CF but the corollary is that it can miss certain types of bugs. I do want to address this at some point in the future but for now having a separate test suite is still valuable.

elichai commented at 12:48 pm on October 19, 2021: contributor

The best rust impl to try this approach for would be k256 since it is a completely independent code base and even uses different algorithms for ECC (it uses projective addition formula from https://eprint.iacr.org/2015/1060.pdf).

I’m not sure how true this is, See for example: https://github.com/RustCrypto/elliptic-curves/pull/59, https://github.com/RustCrypto/elliptic-curves/pull/82

LLFourn commented at 10:24 pm on October 19, 2021: none

The best rust impl to try this approach for would be k256 since it is a completely independent code base and even uses different algorithms for ECC (it uses projective addition formula from https://eprint.iacr.org/2015/1060.pdf).

I’m not sure how true this is, See for example: RustCrypto/elliptic-curves#59, RustCrypto/elliptic-curves#82

Right. To be clear I think they use different ECC addition/doubling formula – I guess field/scalar arithmetic could be pretty similar. I wouldn’t be the best judge of that. Still I think it’s the best candidate if you want to try cross testing against a rust codebase.

real-or-random commented at 4:46 pm on February 11, 2022: contributor

This logic exists for a good reason in CF but the corollary is that it can miss certain types of bugs. I do want to address this at some point in the future but for now having a separate test suite is still valuable.

Can you elaborate what these good reasons are? Maybe we could help.

edit: Let me clarify that even if cryptofuzz would be perfect, I still believe it’s useful to have simple cross tests in our test suite. cryptofuzz is nothing we will integrate in make check (naturally a lot of dependencies, etc). I think it’s good to have something which is easy enough to be run by not only by us but also by our users, packagers, etc. It should ideally just work out of the box without any dependencies.

guidovranken commented at 8:30 pm on February 11, 2022: none

The “good reason” is that I allow every library to fail executing an operation for any or no reason. Reasons include:

In some libraries I induce random allocation failures, to test its resiliency. If a malloc() fails, then the operation should be allowed to fail.
In some libraries I hook the PRNG that a library is using (for example for ECDSA signing). For particular outputs of the PRNG, it should be allowed to fail.
Some libraries have fixed constraints, such as a predefined bignum size. Therefore it is normal that some operations, like multiplication, will fail.
A library might be implementing only a subset of a primitive in a way that is adequate for its intended usage domain.

In some cases it is dubious whether a certain function failing is signalling some specific information. For example, if a function which computes a modular inverse fails, is it due to an internal error (e.g. malloc failure) or is it signalling that the inverse does not exist? I need to review these on a case-by-case basis, so I cannot simply interpret failure as a bug by default.

Currently, the ECC point operations all return a std::optional<component::ECC_Point>. The component::ECC_Point struct is something like this:

0struct {
1	std::string X;
2	std::string Y;
3};

In hindsight it would have been better if this struct also included a field bool is_infinity. I could change this, but I would have to do it for all modules, otherwise the whole thing won’t compile anymore, so that will take some time.

As a workaround I’ve started to add asserts to various libraries to ensure certain operations conform to certain expectations.

For example, in secp256k1, I now have:

Multiplication of valid point must succeed except if scalar is 0 (or multiple of order): https://github.com/guidovranken/cryptofuzz/blob/1cacacc30c43ef5ff9d726eeede90831d55a40e0/modules/secp256k1/module.cpp#L1121-L1127
Negation of valid point must yield valid point: https://github.com/guidovranken/cryptofuzz/blob/1cacacc30c43ef5ff9d726eeede90831d55a40e0/modules/secp256k1/module.cpp#L1191-L1194

I want to add similar asserts for point addition and point doubling, where the logic will be:

Addition of two valid points must produce a valid point UNLESS both points are negations of eachother, in which case it must produce point at infinity
Point doubling of a valid point must always succeed

Feel free to specify additional invariants for any secp256k1 function/operation and I will implement it in the fuzzer.

real-or-random commented at 10:04 am on February 18, 2022: contributor

@guidovranken Makes sense. Here’s another random thought (that may be stupid): Did you ever consider adding an old version of libsecp256k1? We have done so many changes that this may be interesting. On the other hand, I don’t know if this adds much to the diversity of implementations. (Maybe it’s useful to cross-test new libsecp256k1 against old libsecp256k1 but not if you’re anyway testing in parallel against all the other libraries?)

guidovranken commented at 1:09 am on February 21, 2022: none

@real-or-random

Testing two versions of the same library in the same fuzzer binary is difficult due to symbol collisions, though it is possible by launching an external process.

After your suggestion I made some changes to the harness to allow compiling it with older versions of libsecp256k1 (specifically at the last commit of 2019 and the last commit of 2018 – these were arbitrary choices): https://github.com/guidovranken/cryptofuzz/blob/master/modules/secp256k1/README.md

I ran the fuzzer on those and did not find anything amiss.

Are there any specific older versions that you think are worthwhile to test? Maybe specific releases of the library that are still in widespread use (for example by Ethereum)? I suppose it would be valuable to discover bugs in older versions even if they are fixed in the current version. Sometimes software projects fix bugs while addressing something else, without ever knowing there was a bug, so this can be meaningful. I can even add an additional fuzzer for an older libsecp256k1 version to the OSS-Fuzz project.

apoelstra commented at 12:58 pm on February 21, 2022: contributor

@guidovranken one potential source of old versions might be the bundled copies of libsecp256k1 in rust-secp256k1. These are in the directory secp256k1-sys/depend/secp256k1; the revision used is recorded in the file secp256k1-sys/depend/secp256k1-HEAD-revision.txt; and all of the externally linked symbols have been conveniently renamed to include version numbers.

Not commenting on whether this is a useful direction to go in or not, but if you did want to do it, here is a corpus.

real-or-random commented at 3:41 pm on February 21, 2022: contributor

Are there any specific older versions that you think are worthwhile to test?

I didn’t have anything specific in mind. My suggestion was simply based on the observation that a lot of code has been changed during the keys, including replacement of entire internal algorithms (safegcd for modinv, or https://github.com/bitcoin-core/secp256k1/pull/1033/commits/557b31fac36529948709d4bfcc00ad3acb7e83b9 for a very recent example, …)

I suppose it would be valuable to discover bugs in older versions even if they are fixed in the current version. Sometimes software projects fix bugs while addressing something else, without ever knowing there was a bug, so this can be meaningful.

Indeed, this was my thinking.

Maybe specific releases of the library that are still in widespread use (for example by Ethereum)?

Yeah, those are an obvious choice. A lot of projects vendor our code including (old releases of) Bitcoin Core, and lightning implementations, many many altcoins. I think these are good choices for secp256k1, as are the versions bundled by rust-secp256k1 or other bindings as mentioned by Andrew.

In the end I think there are so many valid choices that I think I would just pick one or two more or less arbitrary. And covering “time” as much as possible is useful and your approach of taking the last commit in a year does the job. I think a mix of old and more recent versions is interesting: A very old version still in use is good as a target to find bugs and more recent versions are good as reference because they tend to have the same API (and support Schnorrsig etc).

I think if you run cryptofuzz with libsecp256k1 plus 10 other libraries, then I’d suppose adding an old version does not add much if you want to find bugs in the current version. But if you want to find bugs in older versions of libsecp256k1, then of course it makes a lot of sense to run the old version.

I can even add an additional fuzzer for an older libsecp256k1 version to the OSS-Fuzz project.

(With my limited knowledge of the available resources there etc.,) this sounds interesting at least.

real-or-random cross-referenced this on Apr 6, 2022 from issue OpenSSL 3 will deprecate some of the functions we use in tests by rex4539

BerryYoghurt cross-referenced this on May 16, 2022 from issue Cross testing against a naïve implementation by BerryYoghurt

real-or-random cross-referenced this on May 8, 2023 from issue Cross tests with libecc by BerryYoghurt

cross-testing #691