This is a rebase of #448, including a few changes (mostly implementing what I described in #185). I haven’t tested this on Windows and I actually haven’t reviewed this in detail so far, but people can start to have a look.
Clear sensitive memory without getting optimized out #636
pull real-or-random wants to merge 9 commits into bitcoin-core:master from real-or-random:cleanse changing 20 files +109 −104-
real-or-random commented at 8:49 pm on June 6, 2019: contributor
-
in src/util.h:140 in aa65cd6f73 outdated
135+#define SECP256K1_CLEANSE(v) secp256k1_cleanse(&(v), sizeof(v)) 136+static SECP256K1_INLINE void secp256k1_cleanse(void *ptr, size_t len) { 137+#if defined(_MSC_VER) 138+ /* SecureZeroMemory is guaranteed not to be optimized out by MSVC. */ 139+ SecureZeroMemory(ptr, n); 140+#elif defined(__GNUC__)
real-or-random commented at 8:51 pm on June 6, 2019:I couldn’t figure out when GCC started to support assembly but it’s some version <= 2.95… So if you have such an old compiler, I guess you’re on your own.
gmaxwell commented at 9:10 am on June 7, 2019:GCC 2.95 is likely the earliest we’d would ever encounter someone wanting to support. (Haiku OS uses it still, I believe :) )
sipa commented at 7:58 pm on July 23, 2019:Yeah, I think this is fine.in src/util.h:22 in aa65cd6f73 outdated
16 #include <stdint.h> 17 #include <stdio.h> 18 19+#if defined(_MSC_VER) 20+// For SecureZeroMemory 21+#include <Windows.h>
real-or-random commented at 8:52 pm on June 6, 2019:I think that’s okay but I’m not entirely sure if we want that. It’s used in Bitcoin Core too.
gmaxwell commented at 9:08 am on June 7, 2019:I would be that this should be version gated, I’ll see if I can figure out where thats supported to.gmaxwell commented at 11:49 pm on June 6, 2019: contributorI think using a sizeof on an argument is likely to introduce almost impossible to detect bugs when e.g. someone hands it the first element of an array (esp one passed from another function). It doesn’t feel particularly typesafe to me.
Is there a reason why you preferred this approach to having every subsystem (like field, scalar) define a clear function for its relevant type?
real-or-random commented at 6:51 am on June 7, 2019: contributorNo, I don’t prefer that to be honest. It’s just the approach taken in #448, and I was somewhat worried about it, too.
I’ll try to change it.
real-or-random force-pushed on Jun 12, 2019real-or-random marked this as ready for review on Jun 12, 2019real-or-random commented at 2:40 pm on June 12, 2019: contributorI removed the macro from the PR. Also I removed
_fe_set_zero
from the PR (one can use_fe_set_int
).I’m happy to take suggestions for additional code locations where memory should be cleared.
in src/util.h:152 in 198cf25399 outdated
146+ * This method used in memzero_explicit() the Linux kernel, too. Its advantage is that it is 147+ * pretty efficient, because the compiler can still implement the memset() efficently, 148+ * just not remove it entirely. See "Dead Store Elimination (Still) Considered Harmful" by 149+ * Yang et al. (USENIX Security 2017) for more background. 150+ */ 151+ memset(ptr, 0, len);
real-or-random commented at 2:43 pm on June 12, 2019:Now that “clearing” is cleanly separated from “setting to zero”, it’s a good idea to replace 0 with a different value here for testing that the code does not rely on the fact that the “_clear” functions set to 0. We could actually always use a different value here, or use a different value in (some runs of) tests. What do people think about this?
elichai commented at 2:54 pm on June 12, 2019:Did you try checking the binary that it’s actually not optimized out?
real-or-random commented at 3:22 pm on June 12, 2019:Not yet. But yes, we should do this with a few compilers.
jonasnick commented at 2:57 pm on June 24, 2019:it’s a good idea to replace 0 with a different value here for testing
I don’t think it’s a good idea to use a different value in testing. But I don’t see a downside to just replace
0
with42
. This also has the (slight) advantage that it’ll make it easier to detect now improper usage of*_clear
when rebasing -zkp on top of this. -zkp uses*_clear
for initializing in many places.
sipa commented at 7:59 pm on July 23, 2019:I like the idea of using a non-zero constant byte value. Using 0 is more likely to result in silent failures when uninitialized memory is accidentally used.in src/field.h:56 in 198cf25399 outdated
48@@ -49,10 +49,11 @@ static int secp256k1_fe_normalizes_to_zero(secp256k1_fe *r); 49 * implementation may optionally normalize the input, but this should not be relied upon. */ 50 static int secp256k1_fe_normalizes_to_zero_var(secp256k1_fe *r); 51 52-/** Set a field element equal to a small integer. Resulting field element is normalized. */ 53+/** Set a field element equal to a small integer. Resulting field element is normalized; it has 54+ * magnitude 0 if a == 0, and magnitude 1 otherwise. */ 55 static void secp256k1_fe_set_int(secp256k1_fe *r, int a);
real-or-random commented at 2:47 pm on June 12, 2019:(I wondered why this acceptsint
. An unsigned type seems more appropriate. For type safety, we would even want to exclude too large values and use something as small asunsigned char
. But I guess this is slower and just not worth the hassle)?
sipa commented at 8:02 pm on July 23, 2019:Agree, though this seems like something for another patch.elichai commented at 3:17 pm on June 12, 2019: contributorWhat about clearing the memory of the temporary Field Elements in the GE addition? https://github.com/bitcoin-core/secp256k1/blob/master/src/group_impl.h#L419 (and I think most of this file)real-or-random commented at 10:10 pm on June 12, 2019: contributorThanks, collecting more: https://github.com/bitcoin-core/secp256k1/blob/master/src/scalar_impl.h#L71 A lot in https://github.com/bitcoin-core/secp256k1/blob/master/src/scalar_8x32_impl.h (also also 4x64)
And I think there more in the field module… I guess I’ll just go through the entire codebase. :man_shrugging:
jonasnick commented at 11:00 pm on June 24, 2019: contributorI confirmed by looking at the disassembly that both the memory barrier and volatile function pointer method do work with gcc 9.1.0. Looking at the disassembly also confirmed that the memory barrier method lets gcc optimize the memset and the volatile function pointer method lets gcc actually call memset. I didn’t look at Windows.
I changed the memset overwrite from
0x00
to0x42
and added a function0void secp256k1_do_nothing(void) { 1 unsigned char foo[32]; 2 secp256k1_mem_clear(foo, sizeof(foo)); 3}
After compiling the tests with
-fno-stack-protector
(for brevity) the output ofobjdump -d tests
is:0000000000001ac30 <secp256k1_do_nothing>: 1 1ac30: 66 0f 6f 05 e8 2c 04 movdqa 0x42ce8(%rip),%xmm0 # 5d920 <secp256k1_ge_const_g+0x80> 2 1ac37: 00 3 1ac38: 48 8d 44 24 d8 lea -0x28(%rsp),%rax 4 1ac3d: 0f 29 44 24 d8 movaps %xmm0,-0x28(%rsp) 5 1ac42: 0f 29 44 24 e8 movaps %xmm0,-0x18(%rsp) 6 1ac47: c3 retq 7 1ac48: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) 8 1ac4f: 00
This moves 16 bytes of
0x42
located at5d920
(readelf -x .rodata tests
) intoxmm0
and then moves two times 16 bytes fromxmm0
into thefoo
buffer.With the volatile function pointer method the disassembly is as follows:
0000000000001acd0 <secp256k1_do_nothing>: 1 1acd0: 48 83 ec 38 sub $0x38,%rsp 2 1acd4: 48 8b 05 ed a2 04 00 mov 0x4a2ed(%rip),%rax # 64fc8 <memset@GLIBC_2.2.5> 3 1acdb: ba 20 00 00 00 mov $0x20,%edx 4 1ace0: be 42 00 00 00 mov $0x42,%esi 5 1ace5: 48 8d 7c 24 10 lea 0x10(%rsp),%rdi 6 1acea: 48 89 44 24 08 mov %rax,0x8(%rsp) 7 1acef: 48 8b 44 24 08 mov 0x8(%rsp),%rax 8 1acf4: ff d0 callq *%rax 9 1acf6: 48 83 c4 38 add $0x38,%rsp 10 1acfa: c3 retq 11 1acfb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
This calls memset with arguments rsp + 10, 0x42 and 32 as expected (see the calling convention).
jonasnick commented at 11:08 pm on June 24, 2019: contributorResults are positive and very similar to the above with clang version 8.0.0.elichai commented at 11:44 pm on June 24, 2019: contributor@jonasnick did you try to benchmark if this affects performance in a non negligible way?jonasnick commented at 7:50 am on June 25, 2019: contributorOn my system the memory barrier and volatile function pointer method have no measurable performance difference. However, using one of the methods vs. using nothing inbench_sign
appears to be1.2%
slower (41.3us
vs40.8us
, averaged over 6 runs).jonasnick commented at 2:36 pm on June 25, 2019: contributorapproach ACK. Very cool, someone had to do it eventually.
Clearing no secrets in pippenger is fine, not sure why I added it. Looks like
fe_clear
is replaced everywhere withset_int(0)
and the right memsets are replaced with the corresponding_clear
functionI noticed that a few
field.h
comments about magnitudes are wrong (independent of this PR). For example a field element set to 0 withfe_set_int
will have magnitude 0 but can be correctly compared withsecp256k1_fe_equal
.Tests run fine under valgrind, also with endo and using the volatile function pointer method. I didn’t test on Windows.
Going through the codebase to look for locations where a
_clear
is missing can happen in another PR.sipa cross-referenced this on Jul 23, 2019 from issue Clear sensitive memory without getting optimized out. by isle2983jonasnick commented at 3:06 pm on July 29, 2019: contributorLooks like memsetting to
0x42
instead of0
costs about 1% :/0ecdsa_sign: min 41.3us / avg 41.5us / max 42.2us 1vs. 2ecdsa_sign: min 40.9us / avg 41.1us / max 41.8us
(I changed run_benchmark count argument from 10 to 500)
sipa commented at 4:54 pm on July 29, 2019: contributor@jonasnick That sounds like a deal breaker, if true.real-or-random commented at 6:53 pm on July 29, 2019: contributorNote that’s just signing time, not verification time.
But I agree, let’s stick to 0, that’s just simpler.
sipa commented at 10:58 pm on August 6, 2019: contributorConcept ACK.
There are a number of data types which end up being cleared using the generic
secp256k1_mem_clear()
function after this PR (including at leastsecp256k1_sha256
andsecp256k1_rfc6979_hmac_sha256
). I think it would be cleaner to provide*_clear()
functions for those as well, and use those where relevant.elichai cross-referenced this on Aug 28, 2019 from issue Questions about some clear functions usage by spartucusreal-or-random cross-referenced this on Nov 25, 2019 from issue Is the compiler optimizing out some of the benchmarks? by elichaielichai cross-referenced this on Feb 4, 2020 from issue secp256k1_ecdh_hash_function must return 0 or 1 by sipaelichai cross-referenced this on Mar 1, 2020 from issue Simplify callback logic to returning raw coordinates by elichaireal-or-random force-pushed on Apr 27, 2020real-or-random force-pushed on Apr 27, 2020Don't clear secrets in pippenger implementation
This code is not supposed to handle secret data.
real-or-random force-pushed on Apr 27, 2020Add secp256k1_memclear() for clearing secret data
We rely on memset() and an __asm__ memory barrier where it's available or on SecureZeroMemory() on Windows. The fallback implementation uses a volatile function pointer to memset which the compiler is not clever enough to optimize.
Make _set_fe_int( . , 0 ) set magnitude to 0 b17a7df814Separate secp256k1_fe_set_int( . , 0 ) from secp256k1_fe_clear()
There are two uses of the secp256k1_fe_clear() function that are now separated into these two functions in order to reflect the intent: 1) initializing the memory prior to being used -> converted to fe_set_int( . , 0 ) 2) zeroing the memory after being used such that no sensitive data remains. -> remains as fe_clear() In the latter case, 'magnitude' and 'normalized' need to be overwritten when VERIFY is enabled. Co-Authored-By: isle2983 <isle2983@yahoo.com>
Separate between clearing memory and setting to zero in tests
Co-Authored-By: isle2983 <isle2983@yahoo.com> Co-Authored-By: Pieter Wuille <pieter.wuille@gmail.com>
Use secp256k1_memclear() to clear stack memory instead of memset()
All of the invocations of secp256k1mem_clear() operate on stack memory and happen after the function is done with the memory object. This commit replaces existing memset() invocations and also adds secp256k1_memclear() to code locations where clearing was missing; there is no guarantee that this commit covers all code locations where clearing is necessary. Co-Authored-By: isle2983 <isle2983@yahoo.com>
Implement various _clear() functions with secp256k1_memclear() dda8737db2Don't rely on memset to set signed integers to 0 244c749a0aIntroduce separate _clean functions for hash module
This gives the caller more control about whether the state should be cleaned (= should be considered secret), which will be useful for example for Schnorr signature verification in the future. Moreover, it gives the caller the possibility to clean a hash struct without finalizing it.
real-or-random force-pushed on Apr 27, 2020real-or-random commented at 5:00 pm on April 27, 2020: contributorI rebased this and implemented @sipa’s suggestion for the hash API. Now the caller is responsible to call
_clean()
(instead of letting_finalize
) do it. I think this is actually the right thing to do.nit: Also renamed
secp256k1_mem_clear
tomemclear
now that we have alsomemczero
inutil.h
real-or-random commented at 5:15 pm on April 27, 2020: contributorGoing through the codebase to look for locations where a
_clear
is missing can happen in another PR.Indeed, so this is ready for review.
For looking into more places in the future, in particular within some arithmetic function, I was wondering if “overdoing” this will introduce more stores to memory, which is not only slightly worse for security but will probably also affect performace. For example, the
explicit_bzero
function in Glibc has this problem:Also, explicit_bzero only operates on RAM. If a sensitive data object never needs to have its address taken other than to call explicit_bzero, it might be stored entirely in CPU registers until the call to explicit_bzero. Then it will be copied into RAM, the copy will be erased, and the original will remain intact. Data in RAM is more likely to be exposed by a bug than data in registers, so this creates a brief window where the data is at greater risk of exposure than it would have been if the program didn’t try to erase it at all.
https://www.gnu.org/software/libc/manual/html_node/Erasing-Sensitive-Data.html
AFAIU and I’ve seen from looking at generated ASM this should not happen with our memory barrier approach but I’m not 100% sure and any insight will be helpful.
Note to myself: Cleaning registers is out of the scope currently but today the cited USENIX paper reminded me of a simple but expensive way to do this: Just run the operation again. (For example, run another SHA256 transform on dummy midstate and dummy input.) This will reuse and hence overwrite the registers (modulo inlining etc).
elichai cross-referenced this on Apr 30, 2020 from issue Add usage examples by elichaireal-or-random cross-referenced this on May 1, 2020 from issue Secret key generation and cleaning by real-or-randomreal-or-random commented at 4:28 pm on May 15, 2020: contributorApparently stores of 64-byte zero-over-zero can be faster than stores of other values, see https://twitter.com/BRIAN_____/status/1260913021116993536. This would be a reason not so overwrite with zeros.
But I think we shouldn’t have secrets that are all zeroes.This would be relevant for indistinguishability-based primitives with arbitrary inputs, e.g., encryption, but not for signing and key exchange.
mratsim cross-referenced this on May 15, 2020 from issue Protection of users private keys by mratsimreal-or-random cross-referenced this on Sep 9, 2020 from issue Cleaner infinity handling in group law and ecmult_const. by gmaxwellreal-or-random cross-referenced this on Oct 5, 2020 from issue Switch to our own memcmp function by real-or-randomelichai cross-referenced this on Oct 18, 2020 from issue Secrets Management with Secrecy/Zeroize by gakonstroconnor-blockstream cross-referenced this on May 13, 2021 from issue VERIFY_CHECK precondition for secp256k1_fe_set_int. by roconnor-blockstreamreal-or-random cross-referenced this on Oct 15, 2021 from issue Verify that secp256k1_ge_set_gej_zinv does not operate on infinity. by roconnor-blockstreamreal-or-random cross-referenced this on Nov 16, 2021 from issue Add Elligator Square module by sipareal-or-random cross-referenced this on Dec 28, 2021 from issue Signed-digit multi-comb for ecmult_gen (by peterdettman) by sipareal-or-random cross-referenced this on Aug 5, 2022 from issue Add dedicated methods for verifying asset/value proofs by apoelstrasipa commented at 3:16 pm on May 8, 2023: contributorConceptually I think it makes sense to have separate secure-erase wiping and zeroing of values. It needs a big redo by now though; are you still interesting in working on this?real-or-random commented at 3:18 pm on May 8, 2023: contributorConceptually I think it makes sense to have separate secure-erase wiping and zeroing of values. It needs a big redo by now though; are you still interesting in working on this?
Yeah, I certainly want to come back to this, but I don’t want to commit to a specific timeline. Marking as draft for now.
real-or-random marked this as a draft on May 8, 2023real-or-random added the label assurance on May 8, 2023real-or-random added the label side-channel on May 8, 2023real-or-random cross-referenced this on May 22, 2023 from issue dead store in secp256k1_ecmult_gen by jgriffiths
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 11:15 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me