tl;dr: EC_KEY_new_by_curve_name() affects global state in some versions/configs of openssl, leading to crashes when called by multiple threads. Avoid the issue by only calling it once at startup and caching the resulting group.
This is likely unnecessary for master with libsecp256k1-verification landing soon, but I think it makes sense for backports.
This is a real-world issue for libbitcoinconsensus as reported by Tamas Blummer here: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-August/010219.html
When calling EC_KEY_new_by_curve_name(), OpenSSL internally checks how to set up the curve's EC_METHOD (simple, montgomery, or nist).
Unfortunately, in all released OpenSSL versions (as far as I can tell, master is the only branch that has fixed this issue), it's tested like so:
- Try a method. If it fails, set a global error and return.
- If the global error is set, try a different method.
Prior to OpenSSL 1.0.1, these were tested in the order: EC_GFp_nist_method -> EC_GFp_mont_method. The secp256k1 curve fails the ec_GFp_nist_group_set_curve test and sets the global error. That error is then checked for failure, and EC_GFp_mont_method is tried (and succeeds).
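To make the try-then-fall-back flow concrete, here is a rough model of it. It is only a sketch of the behaviour described above, not OpenSSL's code: `Method`, `try_method`, `pick_nist_first`, `pick_mont_first`, and the `g_error_*` variables are all made-up names, and the shared error state is reduced to a single flag.

```cpp
#include <cassert>
#include <string>

// Illustrative model only: none of these names are OpenSSL's.
enum class Method { None, Nist, Mont };

// Stand-ins for the shared error state described above.
static bool g_error_set = false;
static int g_error_writes = 0;   // how often a failure touched the shared state

// In this model the NIST method rejects secp256k1, mirroring the
// ec_GFp_nist_group_set_curve failure described above.
static bool try_method(Method m, const std::string& curve)
{
    assert(!curve.empty());
    const bool ok = !(m == Method::Nist && curve == "secp256k1");
    if (!ok) {
        g_error_set = true;      // failure is reported through shared state
        ++g_error_writes;
    }
    return ok;
}

// Try `first`; if it fails and the shared error is set, clear it and try `second`.
static Method pick(Method first, Method second, const std::string& curve)
{
    if (try_method(first, curve)) return first;
    if (!g_error_set) return Method::None;   // another thread could have cleared it
    g_error_set = false;
    return try_method(second, curve) ? second : Method::None;
}

// Older order: NIST first, Montgomery as the fallback.
Method pick_nist_first(const std::string& curve)
{
    return pick(Method::Nist, Method::Mont, curve);
}

// Newer order: Montgomery first, so secp256k1 never touches the error state.
Method pick_mont_first(const std::string& curve)
{
    return pick(Method::Mont, Method::Nist, curve);
}
```

Both orders end up with the Montgomery method for secp256k1. The difference is that the NIST-first order writes and then reads the shared error state on every call, and that window is where concurrent callers can interfere with each other.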
Obviously that global error usage is dangerous, especially since it happens for each transaction verification in libbitcoinconsensus. In a multi-threaded environment, a crash is guaranteed within a few seconds.
However, OpenSSL 1.0.1 reversed the order, trying EC_GFp_mont_method first, so that the global error doesn't end up being used: https://github.com/openssl/openssl/commit/17674bfdf75bffa4e225f8328b9d42cb74504005
This was backported from master back to 1.0.1, but not to 1.0.0 or 0.9.8.
So that change (accidentally) "solved" the problem. As the commit shows, though, the old nist-first order is still used in the !defined(OPENSSL_BN_ASM_MONT) case. That's easily tested by building OpenSSL with the -no-asm config option. It's probably also the case for obscure architectures and OSs, but I haven't looked deeply into that. If so, it's reasonable to assume that this crash would occur on those platforms as well.
Also, OS X, even in its latest version (10.10 as of now), still ships with OpenSSL 0.9.8, which is how Tamas ran into it.
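For reviewers who want the shape of the workaround from the tl;dr, a minimal sketch follows. It is not the code in this PR. `Group`, `Key`, `lookup_group_by_name`, `cached_group`, `make_key`, and `kSecp256k1` are hypothetical stand-ins so that it compiles without OpenSSL. With the real library, the lookup would presumably be EC_GROUP_new_by_curve_name(), and per-key setup would be EC_KEY_new() followed by EC_KEY_set_group().

```cpp
#include <cassert>

// Hypothetical stand-ins so the sketch compiles without OpenSSL.
struct Group { int curve_id; };   // plays the role of EC_GROUP
struct Key { Group group; };      // plays the role of EC_KEY

static const int kSecp256k1 = 1;  // arbitrary id for this sketch, not a real NID
static int g_curve_lookups = 0;   // counts trips through the by-name lookup

// Plays the role of the by-curve-name lookup that runs the method selection.
static Group lookup_group_by_name(int curve_id)
{
    ++g_curve_lookups;
    return Group{curve_id};
}

// Built on first use. Call this once at startup, before any verification
// threads exist, so the lookup never runs concurrently.
const Group& cached_group()
{
    static const Group group = lookup_group_by_name(kSecp256k1);
    return group;
}

// Per-key construction copies the cached group instead of repeating the lookup.
Key make_key()
{
    const Group& group = cached_group();
    assert(group.curve_id == kSecp256k1);
    return Key{group};
}
```

However many keys are created, the by-name lookup runs once. The rest of the code only copies an already-initialized group, which is the property the workaround relies on.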