Clear sensitive memory without getting optimized out. #448

pull isle2983 wants to merge 14 commits into bitcoin-core:master from isle2983:PR-cleanse changing 25 files +79 −102
  1. isle2983 commented at 0:53 am on March 15, 2017: none

    (Previous discussion of this topic in #438 )

    explicit_bzero() is used when available on the platform (glibc 2.25+), otherwise it falls back to an inline implementation. There are many possible versions of how the inline could be implemented but I have no reason to prefer one over another.

    The commit stack does some refactoring in order to preserve the intended non-‘cleanse’ functionality done by several of the _clear() functions. I constructed the stack as unsquashed, incremental refactoring changes. Please let me know if a squash is preferred.

    The measured delta in performance appears negligible. Trials were done inside a minimal headless VMs with Debian 8 in one for glibc 2.19 and the current base of Arch Linux in the other for glibc 2.25. The full session output of the trials was logged for reference and is available as a gist.

    For reference to bare metal performance with the same CPU, timing on the underlying native Debian 8 install (with glibc 2.19 and gcc 4.9.2 running with master:

    0(wo/ endomorphism)
    1$ ./bench_sign
    2ecdsa_sign: min 55.3us / avg 55.4us / max 55.5us
    3$ ./bench_verify
    4ecdsa_verify: min 85.6us / avg 85.9us / max 86.3us
    5ecdsa_verify_openssl: min 476us / avg 484us / max 490us
    

    Trials:

    1. master w/ glibc 2.19, gcc 4.9.2 w/ and wo/ endomorphisim
     0(wo/ endomorphism)
     1$ ./bench_sign
     2ecdsa_sign: min 55.9us / avg 56.4us / max 56.7us
     3$ ./bench_verify
     4ecdsa_verify: min 86.2us / avg 86.8us / max 87.7us
     5ecdsa_verify_openssl: min 487us / avg 488us / max 491us
     6
     7(w/ endomorphism)
     8$ ./bench_sign
     9ecdsa_sign: min 56.1us / avg 56.6us / max 57.2us
    10$ ./bench_verify
    11ecdsa_verify: min 66.5us / avg 67.4us / max 68.8us
    12ecdsa_verify_openssl: min 487us / avg 493us / max 504us
    
    1. a16034039f07572a64bc9704c23abb1fff9d70ad w/ glibc 2.19 gcc 4.9.2 w/ and wo/ endomorphism
     0(wo/ endomorphism)
     1./bench_sign
     2ecdsa_sign: min 55.8us / avg 56.5us / max 57.0us
     3$ ./bench_verify
     4ecdsa_verify: min 86.3us / avg 86.6us / max 86.9us
     5ecdsa_verify_openssl: min 482us / avg 483us / max 484us
     6
     7(w/ endomorphism)
     8$ ./bench_sign
     9ecdsa_sign: min 56.0us / avg 56.5us / max 56.7us
    10$ ./bench_verify
    11ecdsa_verify: min 65.7us / avg 66.0us / max 66.4us
    12ecdsa_verify_openssl: min 482us / avg 484us / max 489us
    
    1. master w/ glibc 2.25, gcc 6.3.1 w/ and wo/ endomorphisim
     0(wo/ endomorphism)
     1./bench_sign
     2ecdsa_sign: min 57.2us / avg 57.8us / max 58.1us
     3$ ./bench_verify
     4ecdsa_verify: min 76.1us / avg 76.7us / max 77.7us
     5ecdsa_verify_openssl: min 461us / avg 464us / max 482us
     6
     7(w/ endomorphism)
     8$ ./bench_sign
     9ecdsa_sign: min 57.0us / avg 57.2us / max 57.5us
    10$ ./bench_verify
    11ecdsa_verify: min 54.8us / avg 55.1us / max 55.5us
    12ecdsa_verify_openssl: min 461us / avg 462us / max 467us
    
    1. a16034039f07572a64bc9704c23abb1fff9d70ad w/ glibc 2.25 gcc 6.3.1 w/ and wo/ endomorphism
     0(wo/ endomorphism)
     1./bench_sign
     2ecdsa_sign: min 56.7us / avg 57.0us / max 57.5us
     3$ ./bench_verify
     4ecdsa_verify: min 75.6us / avg 76.1us / max 76.6us
     5ecdsa_verify_openssl: min 460us / avg 460us / max 461us
     6
     7(w/ endomorphism)
     8$ ./bench_sign
     9ecdsa_sign: min 56.5us / avg 56.8us / max 57.1us
    10$ ./bench_verify
    11ecdsa_verify: min 54.7us / avg 54.8us / max 55.0us
    12ecdsa_verify_openssl: min 462us / avg 463us / max 464us
    

    The desired behavior of not getting optimized out was also verified by looking at the resulting assembly. With the inline implementation on glibc 2.19, the end section of secp256k1_ecmult_gen() from gcc 4.9.2 looks like:

     0	movl	%r15d, 120(%rbx)	# infinity, r_7(D)->infinity
     1	movq	128(%rsp), %rax	# %sfp, ivtmp.249
     2	cmpq	$64, %rax	#, ivtmp.249
     3	je	.L83	#,
     4	salq	$34, %rax	#, D.9134
     5	shrq	$37, %rax	#, D.9134
     6	movl	304(%rsp,%rax,4), %r9d	# gnb.d, D.9141
     7	jmp	.L84	#
     8.L83:
     9	movq	$memset, 768(%rsp)	#, volatile_memset
    10	leaq	300(%rsp), %rdi	#, tmp5061
    11	movq	768(%rsp), %rax	# volatile_memset, D.9135
    12	movl	$4, %edx	#,
    13	xorl	%esi, %esi	#
    14	call	*%rax	# D.9135
    15	movq	$memset, 816(%rsp)	#, volatile_memset
    16	leaq	976(%rsp), %rdi	#, tmp5062
    17	movq	816(%rsp), %rax	# volatile_memset, D.9135
    18	movl	$84, %edx	#,
    19	xorl	%esi, %esi	#
    20	call	*%rax	# D.9135
    21	movq	$memset, 864(%rsp)	#, volatile_memset
    22	leaq	304(%rsp), %rdi	#, tmp5063
    23	movq	864(%rsp), %rax	# volatile_memset, D.9135
    24	movl	$32, %edx	#,
    25	xorl	%esi, %esi	#
    26	call	*%rax	# D.9135
    27	addq	$1080, %rsp	#,
    

    With the explicit_bzero() from glibc 2.25 linked and with gcc 6.3.1, the same end section of secp256k1_ecmult_gen looks like:

     0	movl	$0, 240(%rsp)	#, add.infinity
     1	call	secp256k1_gej_add_ge	#
     2	addq	$1, (%rsp)	#, %sfp
     3	movq	8(%rsp), %r10	# %sfp, tmp242
     4	movq	(%rsp), %rax	# %sfp, ivtmp.629
     5	movq	16(%rsp), %r9	# %sfp, _90
     6	movq	24(%rsp), %r8	# %sfp, _85
     7	cmpq	$64, %rax	#, ivtmp.629
     8	jne	.L360	#,
     9	leaq	60(%rsp), %rdi	#, tmp380
    10	movl	$4, %esi	#,
    11	call	explicit_bzero	#
    12	leaq	160(%rsp), %rdi	#, tmp381
    13	movl	$88, %esi	#,
    14	call	explicit_bzero	#
    15	leaq	64(%rsp), %rdi	#, tmp382
    16	movl	$32, %esi	#,
    17	call	explicit_bzero	#
    18	addq	$264, %rsp	#,
    
  2. isle2983 cross-referenced this on Mar 15, 2017 from issue [trivial] clear stack variable by function by isle2983
  3. in src/util.h:82 in ec7f08d375 outdated
    76@@ -76,6 +77,17 @@ static SECP256K1_INLINE void *checked_malloc(const secp256k1_callback* cb, size_
    77     return ret;
    78 }
    79 
    80+/* Cleanses memory to prevent leaking sensitive info. Won't be optimized out. */
    81+#ifdef HAVE_EXPLICIT_BZERO
    82+#define SECP256K1_CLEANSE(v) explicit_bzero(&v, sizeof(v))
    


    sipa commented at 0:15 am on March 22, 2017:
    Perhaps use &(v) here so that compound expressions in the macro call work?

    isle2983 commented at 8:22 pm on March 27, 2017:
    Parens added to the macro as suggested
  4. in src/field_10x26_impl.h:296 in c62bf18cc4 outdated
    299@@ -300,7 +300,8 @@ SECP256K1_INLINE static int secp256k1_fe_is_odd(const secp256k1_fe *a) {
    300 SECP256K1_INLINE static void secp256k1_fe_clear(secp256k1_fe *a) {
    301     int i;
    302 #ifdef VERIFY
    303-    secp256k1_fe_verify_init(a);
    304+    a->magnitude = 0;
    


    sipa commented at 0:17 am on March 22, 2017:
    Why inline the call again here (compared to the previous commit)?
  5. in src/field_5x52_impl.h:259 in c62bf18cc4 outdated
    262@@ -263,7 +263,8 @@ SECP256K1_INLINE static int secp256k1_fe_is_odd(const secp256k1_fe *a) {
    263 SECP256K1_INLINE static void secp256k1_fe_clear(secp256k1_fe *a) {
    264     int i;
    265 #ifdef VERIFY
    266-    secp256k1_fe_verify_init(a);
    267+    a->magnitude = 0;
    


    sipa commented at 0:18 am on March 22, 2017:
    And here as well?

    isle2983 commented at 8:22 pm on March 27, 2017:

    It isn’t actually a straight inline. Both values are set to zero as part of the cleanse. The thought was to be a bit pedantic about showing the progression in the commits, but it might have been be overboard.

    In the new commit progression with secp256_fe_set_zero(), seeing it side-by-side with secp256k1_fe_clear(), with a->normalized = 1; and a->normalized = 0; respectively is clearer with the intent of the two separated functions.

  6. in src/tests.c:1893 in 3fdd634fd7 outdated
    1888@@ -1889,7 +1889,11 @@ void test_ge(void) {
    1889     secp256k1_fe zfi2, zfi3;
    1890 
    1891     secp256k1_gej_set_infinity(&gej[0]);
    1892-    secp256k1_ge_clear(&ge[0]);
    1893+    memset(&ge[0], 0, sizeof(secp256k1_ge));
    1894+#ifdef VERIFY
    


    sipa commented at 0:21 am on March 22, 2017:
    And alternative maybe is using secp256k1_ge_set_gej(&ge[0], &gej[0]) ?

    isle2983 commented at 8:22 pm on March 27, 2017:
    Changed as suggested in d056957095d8153d7b29bb2e30125e7341b763f1
  7. sipa commented at 0:27 am on March 22, 2017: contributor

    Overall concept ACK.

    I’m not sure I like the separation of secp256k1_fe_verify_init, as it places a responsibility on the caller that they shouldn’t have (as in: the group code ideally shouldn’t need to know there is such a thing as debug variables in secp256k1_fe that need initialization). Given that secp256k1_fe_clear remains, can’t the secp256k1_fe_verify_init logic remain in there? Then you can still use a strict separation between SECP256K1_CLEANSE(fe) (meaning fe won’t be used anymore) and secp256k1_fe_clear(&fe) (meaning fe must get the value 0). You could even rename the latter to secp256k1_fe_set_zero perhaps.

  8. Use AC_USE_SYSTEM_EXTENSIONS to define _GNU_SOURCE in libsecp256k1-config.h
    In glibc 2.25, the new explicit_bzero() call is guarded by #ifdef __USE_MISC in
    string.h. Via glibc 2.25's features.h, the selectors for __USE_MISC are set
    under #ifdef _GNU_SOURCE.
    
    Without this set, the linker still figures out how to link explicit_bzero(),
    but it throws a bunch of warnings about a missing prototype for
    explicit_bzero().
    
    By setting AC_USE_SYSTEM_EXTENSIONS, the system is detected and
    libsecp256k1-config.h is generated with the correct setting for _GNU_SOURCE
    
    It was found that even after this addition, the compile for the bench_internal
    executable was still throwing the warning. This was because the <stdio.h> was
    included before libsec256k1-config.h (via including src/util.h) and confusing
    the preprocessing. It turns out that this <stdio.h> include is redundant since
    it is included later from src/util.h in the correct order.
    3b32684708
  9. Introduce SECP256K1_CLEANSE()
    explicit_bzero() is newly available in glibc 2.25 (released 2017-02-05; current
    stable).
    
    The fall-back implementation uses the 'volatile' keyword to signal the compiler
    to not optimize away the zeroing in the same way as explicit_bzero().
    
    The wrapper macro SECP256K1_CLEANSE() borrows its name from OPENSSL_cleanse()
    in the openssl codebase which serves the same purpose and looks reasonably
    clear in code context.
    fa474312e8
  10. Replace secp256k1_scalar_clear() calls with SECP256K1_CLEANSE() 407cd76dcd
  11. Separate secp256k1_fe_set_zero() from secp256k1_fe_clear()
    There are two uses of the secp256k1_fe_clear() function that are now separated
    into these two functions in order to reflect the intent:
    
    1) initializing the memory prior to being used -> converted to fe_set)zero()
    2) zeroing the memory after being used such that no sensitive data remains. ->
        remains as fe_clear()
    
    In the latter case, 'magnitude' and 'normalized' need to be overwritten when
    VERIFY is enabled.
    d743f2fa97
  12. Replace secp256k1_fe_clear() with SECP256K1_CLEANSE() e270ef017b
  13. Setup test ge from the infinity gej
    ...rather than relying on zero assignment to set the initial state.
    
    Suggested by Pieter Wuille to avoid using secp256k1_ge_clear() for this
    initialization need.
    d056957095
  14. Replace secp256k1_ge_clear with SECP256K1_CLEANSE() 6e8cb68a3a
  15. Replace secp256k1_gej_clear() with SECP256K1_CLEANSE() ad225e8487
  16. Split secp256k1_ecmult_context_teardown() out of secp256k1_ecmult_context_clear
    The 'teardown' naming is to complement the 'build' function name.  To 'build'
    allocates and sets up the memory for pre_g and pre_g_128. To 'teardown' frees
    the memory.
    059012ff03
  17. Replace secp256k1_ecmult_context_clear() with SECP256K1_CLEANSE() 2f05e69300
  18. Split secp256k1_ecmult_gen_context_teardown() out of secp256k1_ecmult_gen_context_clear
    The 'teardown' naming is to complement the 'build' function name.  To 'build'
    allocates and sets up the memory for ctx->prec. To 'teardown' frees the memory.
    0edecfd377
  19. Replace secp256k1_ecmult_gen_context_clear() with SECP256K1_CLEANSE() 8feb9fb352
  20. Cleanse 'bits' variable such that it doesn't get optimized away.
    Fixes clang static analysis issue:
    
    --------------------------------------------------------------------------------
    An issue has been found in ./src/ecmult_gen_impl.h:150:5
    Type:         Dead assignment
    Description:  Value stored to 'bits' is never read
    
    0: ./src/ecmult_gen_impl.h:153:5 - Value stored to 'bits' is never read
    --------------------------------------------------------------------------------
    95824cd2e6
  21. Use SECP256K1_CLEANSE() to zero stack memory instead of memset()
    All of these conversions:
    
    1) operate on stack memory.
    2) happen after the function is done with the variable
    3) had an existing memset() action to be replaced
    
    These were found by visual inspection and may not be the total set of
    places where SECP256K1_CLEANSE should ideally be applied.
    9acb45db56
  22. isle2983 force-pushed on Mar 27, 2017
  23. isle2983 commented at 8:21 pm on March 27, 2017: none
    Great suggestions. I introduced secp256k1_fe_set_zero() and re-worked the commit history to avoid the wrangling of the #define VERIFY stuff by converting the remaining clients of secp256k1_fe_set_zero() over first. secp256k1_fe_clear() is then replaced with SECP256K1_CLEANSE() along with the others. secp256k1_fe_set_zero() remains and encapsulates the (former) verify_init() initialization logic.
  24. gmaxwell commented at 6:35 pm on April 29, 2017: contributor
    Concept ACK. I worry about the CLEANSE macro using sizeof, this is sort of begging to do the wrong thing with arrays, no?
  25. real-or-random commented at 9:05 am on April 10, 2019: contributor
    @isle2983 I considered to work on a similar PR and then found this one, which seems to be pretty mature already. :) Are you still willing to work on this PR?
  26. gmaxwell commented at 4:50 am on May 22, 2019: contributor
    @real-or-random you could also just pick up this PR, rebase, make your fixups, and just note it in the commit messages.
  27. real-or-random commented at 8:28 am on May 23, 2019: contributor
    Yes, that’s my plan; consider this PR adopted by me. I just couldn’t find time to work on this so far.
  28. real-or-random cross-referenced this on May 30, 2019 from issue Memory zeroization improvements by gmaxwell
  29. real-or-random cross-referenced this on Jun 6, 2019 from issue Clear sensitive memory without getting optimized out by real-or-random
  30. sipa commented at 7:16 pm on July 23, 2019: contributor
    Closing as superseded by #636.
  31. sipa closed this on Jul 23, 2019


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 16:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me