musig: Invalidate secnonce in secp256k1_musig_partial

john-moffett commented at 7:12 PM on September 4, 2025: contributor

Replace memset which can still be optimized out as secnonce isn't read later in this function. The API makes it clear the callee is responsible for it, so we need to assure it's cleared properly.

real-or-random commented at 6:49 AM on September 5, 2025: contributor

Replace memset which can still be optimized out as secnonce isn't read later in this function.

Why do you think it can be optimized out? I think it can be optimized out if the compiler can prove that no other code will read it. That seems unlikely, as the call is an API call. But if the compiler can prove this, then it's indeed fine to optimize it out? Or are you saying we should make absolutely sure that it's cleared out, even if the caller won't read it? Perhaps.

in src/modules/musig/session_impl.h:682 in 10c46ec318 outdated

 676 | @@ -677,9 +677,9 @@ int secp256k1_musig_partial_sign(const secp256k1_context* ctx, secp256k1_musig_p
 677 |      ARG_CHECK(secnonce != NULL);
 678 |      /* Fails if the magic doesn't match */
 679 |      ret = secp256k1_musig_secnonce_load(ctx, k, &pk, secnonce);
 680 | -    /* Set nonce to zero to avoid nonce reuse. This will cause subsequent calls
 681 | -     * of this function to fail */
 682 | -    memset(secnonce, 0, sizeof(*secnonce));
 683 | +    /* Clear and invalidate nonce to avoid nonce reuse. This will cause subsequent 
 684 | +     * calls of this function to fail */
 685 | +    secp256k1_musig_secnonce_invalidate(ctx, secnonce, 1);

real-or-random commented at 6:50 AM on September 5, 2025:

This replaces one dead store with another, so if the compiler can optimize out the one, it could also optimize out the other. If we want to avoid that the store is optimized out, we'll need secp256k1_memclear.

real-or-random added the label tweak/refactor on Sep 5, 2025

real-or-random commented at 8:44 AM on September 5, 2025: contributor

cc @jonasnick who wrote this module

jonasnick commented at 12:09 PM on September 5, 2025: contributor

Thanks @john-moffett. I think it would be an improvement to use secp256k1_memclear in this case.

But if the compiler can prove this, then it's indeed fine to optimize it out?

I think in this case it still makes sense to clear it to try to avoid having secret data in memory that we don't use anymore. This is also the argument for clearing local secret variables at the end of functions.

john-moffett force-pushed on Sep 5, 2025

john-moffett commented at 2:20 PM on September 5, 2025: contributor

Why do you think it can be optimized out? I think it can be optimized out if the compiler can prove that no other code will read it. That seems unlikely, as the call is an API call.

I agree that it's unlikely to be optimized out, and I confirmed on my setup (arm64 macOS Clang 17.0.0) that even with -O3 and full LTO that it wasn't. However, that's not guaranteed since it's possible a different compiler (or later version) can prove no later reads, so I figured it's just best practice to use a wiper much less likely to be DSE'd.

But if the compiler can prove this, then it's indeed fine to optimize it out? Or are you saying we should make absolutely sure that it's cleared out, even if the caller won't read it?

The latter. If they assumed the callee wiped the memory (as is promised in the API comment), they might not bother themselves and it could be left in caller memory.

john-moffett force-pushed on Sep 5, 2025

in src/modules/musig/session_impl.h:685 in 312ff27a9c outdated

 682 | -    memset(secnonce, 0, sizeof(*secnonce));
 683 | +     * of this function to fail. Declassify as memclear marks the bytes
 684 | +     * as undefined. */
 685 | +    secp256k1_memclear(secnonce, sizeof(*secnonce));
 686 | +    secp256k1_declassify(ctx, secnonce->data, sizeof(secp256k1_musig_secnonce_magic));
 687 | +    secp256k1_declassify(ctx, &secnonce->data[68], 64);

jonasnick commented at 5:30 PM on September 5, 2025:

We do not need to declassify because we do not use secnonce anymore after this.

john-moffett commented at 5:59 PM on September 5, 2025:

The tests reuse it inentioanlly and valgrind flags it and gives an error on VERIFY builds.

I can declassify in the tests if that's more natural.

jonasnick commented at 9:14 AM on September 7, 2025:

Ok I understand now. I missed that secp256k1_memclear uses the UNDEFINED macro (in VERIFY builds).

I think it's indeed preferrable to explicitly "redefine" the macro in the test instead of declassifying in partial_sign. I think the purpose of UNDEFINEing is to catch accidental use of the buffer and there's no legitimate reason to read the secnonce buffer except in the test that checks that it was cleared correctly.

john-moffett force-pushed on Sep 5, 2025

jonasnick commented at 5:59 PM on September 5, 2025: contributor

@john-moffett Can you remove the remaining declassify call as well?

john-moffett marked this as a draft on Sep 5, 2025

john-moffett force-pushed on Sep 6, 2025

in src/modules/musig/tests_impl.h:407 in 16f62f545d outdated

 399 | @@ -400,7 +400,9 @@ static void musig_api_tests(void) {
 400 |  
 401 |      memcpy(&secnonce_tmp, &secnonce[0], sizeof(secnonce_tmp));
 402 |      CHECK(secp256k1_musig_partial_sign(CTX, &partial_sig[0], &secnonce_tmp, &keypair[0], &keyagg_cache, &session) == 1);
 403 | -    /* The secnonce is set to 0 and subsequent signing attempts fail */
 404 | +    /* The secnonce is set to 0 and subsequent signing attempts fail.
 405 | +     * Redefine memory first as secp256k1_musig_partial_sign securely clears and undefines the bytes.*/
 406 | +    SECP256K1_CHECKMEM_DEFINE(&secnonce_tmp, sizeof(secnonce_tmp));

jonasnick commented at 9:14 AM on September 7, 2025:

    /* The secnonce is set to 0 and subsequent signing attempts fail.
     * musig_partial_sign calls secp256k1_memclear to clear the secnonce, which
     * marks it as undefined for memory checkers. So we need to explicitly mark
     * secnonce_tmp as defined before we can compare its value. */
    SECP256K1_CHECKMEM_DEFINE(&secnonce_tmp, sizeof(secnonce_tmp));

john-moffett commented at 2:11 PM on September 7, 2025:

Thanks! A clearer explanation is indeed warranted.

john-moffett force-pushed on Sep 7, 2025

john-moffett marked this as ready for review on Sep 7, 2025

jonasnick approved

jonasnick commented at 2:46 PM on September 7, 2025: contributor

ACK d872a08d6dab8157259328365f73b6d7d8dfa5d3

real-or-random commented at 7:40 AM on September 8, 2025: contributor

I agree it's better to prevent the compiler from optimizing out that store.

I'm not sure if this PR adds another code smell. I know it's me who suggested secp256k1_memclear above. In the rest of the code base, we use secp256k1_memclear to nuke memory contents. This is true here as well, but here we want the additional guarantee that the memory is overwritten by zero bytes. That fits the current implementation but not the documented contract of secp256k1_memclear. An implementation that overwrites everything with 0x42 is equally correct. We simply picked 0x00 because it appears to be faster. I doubt that will change in the future, but I still think we should avoid the contract violation.

What could we do instead?

We could use secp256k1_memclear but rename secp256k1_memclear to secp256k1_memzero and change its docs to guarantee it writes zeros. There's nothing wrong with adding that guarantee, but I don't think the result will be much cleaner if we only do this. We use SECP256K1_CHECKMEM_UNDEFINE inside this function for a reason. If the goal is to nuke a buffer, then there should be no legitimate reason to read the buffer again (as @jonasnick pointed out above). So nuking and zeroing are different concerns, and neither is a strict subset of the other. What we want in this specific case of zeroing the secnonce is simply both.

We could instead call secp256k1_memzero (after renaming) and then call SECP256K1_CHECKMEM_DEFINE. That's totally fine, and that's my suggestion. But we really should call SECP256K1_CHECKMEM_DEFINE inside the function and not only in the tests. If we provide the API guarantee that the secnonce is zeroed, then it's totally legitimate that the API user reads (if only in a test). And we don't want the user to get a Valgrind false positive when doing that!

We could also call secp256k1_memclear and then memset(..., 0, ...). That's also good. It may be a performance hit, but in any case a tiny one. (On Linux, the compiler should not be able to remove the second memset due to the memory barrier. But I wouldn't care too much.)

edit: @john-moffett Sorry that this PR escalated quickly given that the issue is seemingly simple...

real-or-random added the label side-channel on Sep 8, 2025

jonasnick commented at 12:12 PM on September 8, 2025: contributor

If we provide the API guarantee that the secnonce is zeroed, then it's totally legitimate that the API user reads (if only in a test). And we don't want the user to get a Valgrind false positive when doing that!

That's a good point that I hadn't considered.

An alternative to your suggestion is to have both a memclear and memzero function, where the latter differs from the former by guaranteeing that the memory is overwritten with 0s and not calling CHECKMEM_UNDEFINE. We'd clear memory if we never read it again (and therefore do not care what it's being overwritten with) and we zero memory if we care what it's being overwritten with because it might be read again. I think this is cleaner because we wouldn't have to call CHECKMEM_DEFINE in musig_partial_sign (which seems like a code smell in and of itself).

john-moffett commented at 12:35 PM on September 8, 2025: contributor

An alternative to your suggestion is to have both a memclear and memzero function, where the latter differs from the former by guaranteeing that the memory is overwritten with 0s and not calling CHECKMEM_UNDEFINE

I was going to make the same suggestion. But is there any value in just reusing memczero and making it non-elidable in the same way memclear is (though SecureZeroMemory would have to be replaced with something suitable for conditional erase)? As a separate matter, is it worth making memczero non-elidable anyway? Perhaps not given its current usage.

@john-moffett Sorry that this PR escalated quickly given that the issue is seemingly simple..

No problem! I should've been aware of the issue from the start, but I've learned a lot, so I'm happy.

At this point, I think the memclear and (new) memzero separation is the cleanest.

john-moffett commented at 1:20 PM on September 8, 2025: contributor

I suppose this usage of memczero suffers from the same potential issue I initially raised in this PR, so maybe memczero ought to be hardened? I agree that it's unlikely that any current compiler would actually DSE any of the calls we've been discussing, but is the possibility enough to warrant a change?

real-or-random commented at 1:28 PM on September 8, 2025: contributor

An alternative to your suggestion is to have both a memclear and memzero function, where the latter differs from the former by guaranteeing that the memory is overwritten with 0s and not calling CHECKMEM_UNDEFINE

Ok, sure, that's another option. Concept ACK .

I was considering that option too, but I didn't mention it because I had the feeling that the existence of two functions may confuse future devs. But now that I think about it, I guess it's fine. Perhaps then we can make it clear from the naming that both variants protect against optimization, e.g., by adding an _explicit (inspired by memzero_explicit in the Linux kernel) or _secure suffix.

But is there any value in just reusing memczero and making it non-elidable in the same way memclear is (though SecureZeroMemory would have to be replaced with something suitable for conditional erase)?

As a separate matter, is it worth making memczero non-elidable anyway? Perhaps not given its current usage.

That's an interesting point you raise.

Regarding its usage: I think it was initially introduced to clear the pubkey when the secret key is invalid, but without leaking the fact that it is. But protecting the fact that it is already "above and beyond" and just defense-in-depth. And perhaps a bit questionable, though it won't hurt.

But all its three usages in musig/session_impl.h (two of them wrapped by _musig_secnonce_invalidate) are for clearing secrets, and this is where it gets interesting. In this case, I believe there's a tension here between protecting against two kinds of side-channels, namely timing/memory access (protecting the secrecy of the flag) vs. someone stealing secrets from a memory dump such as a crash dump (protecting the secrecy of the memory region).

How would a function look like that is both non-elidable and protects the flag? I have no idea, exactly due to platform-specific APIs such as SecureZeroMemory. That's the right way(TM) to clear memory on Windows, and I'm not at all convinced that it's a good idea to replace it, e.g., by a handrolled version. Or by falling back to the "volatile" function pointer trick, which is supposed to be portable, but who knows what compilers will do in the future and what they will do when they encounter "volatile", which pretty much has implementation-defined semantics. The nice thing about SecureMemoryZero is really that we get a guarantee from MSVC not to elide it. So I think memczero should stay entirely separate from memclear and memzero, at least for now.

So if we can't have the best of both worlds, we need to decide what protection we want. Currently, whenever we use memczero, we implicitly favor the protection of the flag, and this seems to be a questionable choice for its usages in musig/session_impl.h (or in general when clearing out secrets). I'd much rather give the attacker that one bit whether the secret key was invalid vs. taking the risk to keep the entire secnonce (or the entire nonce seed) somewhere on the stack. If we believe that this PR is a good idea, i.e., the clearing of the secnonce should better be non-elidable, then I think we should arrive at the same conclusion for the other three uses. And in particular this use here, where the flag is a constant anyway: https://github.com/bitcoin-core/secp256k1/blob/03fb60ad2e38ae9415e64bf5ba523a10100d4c3c/src/modules/musig/session_impl.h#L485

But this may be a larger discussion, and I'd like to hear @jonasnick's opinion on it. So my suggestion is to leave memczero for a follow-up PR. This PR here could then just add memzero as a variant of memclear, perhaps add a suffix to the names, and use it in _musig_partial_sign.

john-moffett force-pushed on Sep 8, 2025

Split memclear into two versions

secp256k1_memclear has the side effect of undefining bytes for
valgrind checks. In some cases, we may want to zero bytes
but allow subsequent reads. So we split memclear into
memclear_explicit, which makes no guarantees about the content
of the buffer on return, and memzero_explicit, which guarantees
zero value on return.

Change the memset in partial_sign to use memzero_explicit.

399b582a5f

john-moffett force-pushed on Sep 8, 2025

real-or-random approved

real-or-random commented at 6:28 AM on September 9, 2025: contributor

utACK 399b582a5f7422bac3b1707d34ffad80e072f796 thanks!

jonasnick commented at 6:49 AM on September 11, 2025: contributor

we use memczero, we implicitly favor the protection of the flag, and this seems to be a questionable choice for its usages in musig/session_impl.h. [...] And in particular this use here, where the flag is a constant anyway

I agree. We have issue #1621 pointing out the inconsistencies in handling constant-timeness with respect to invalid arguments in the library.

jonasnick approved

jonasnick commented at 7:39 AM on September 11, 2025: contributor

ACK 399b582a5f7422bac3b1707d34ffad80e072f796

real-or-random merged this on Sep 12, 2025

real-or-random closed this on Sep 12, 2025

theStack commented at 1:34 PM on September 12, 2025: contributor

post-merge ACK 399b582a5f7422bac3b1707d34ffad80e072f796

musig: Invalidate secnonce in secp256k1_musig_partial_sign #1735

What could we do instead?