crypto: cleanse HMAC stack buffers after use and ChainCode #35254

pull thomasbuilds wants to merge 2 commits into bitcoin:master from thomasbuilds:crypto-cleanse-hmac-buffers changing 6 files +20 −9
  1. thomasbuilds commented at 11:04 AM on May 10, 2026: contributor

    CHMAC_SHA256 and CHMAC_SHA512 leave two stack buffers populated on return: rkey[] holds K' ⊕ ipad after the constructor, and temp[] holds the inner-hash output after Finalize().

    When the HMAC is keyed with sensitive material (chain code in BIP32Hash() in hash.cpp for BIP32 child key derivation; PRK in HKDF-Expand in hkdf_sha256_32.cpp, used for BIP324 transport keying), rkey is one constant XOR from that key, and temp is a one-way digest covering it.

    This PR cleanses both buffers with memory_cleanse(), matching the convention already used in chacha20.cpp and chacha20poly1305.cpp. No observable change for callers.

    Update: Cleansing the HMAC primitive's internal buffers still leaves a caller's ChainCode value populated in memory after use. The second commit promotes ChainCode from typedef uint256 to a base_blob<256> subclass with a memory_cleanse() destructor, so chain codes in CExtKey, CExtPubKey, and local variables are cleansed on scope exit. MUSIG_CHAINCODE is retyped from constexpr uint256 to const ChainCode to match its BIP328 semantic role; this also removes the GCC-14 consteval lambda workaround.

  2. DrahtBot added the label Utils/log/libs on May 10, 2026
  3. DrahtBot commented at 11:04 AM on May 10, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35254.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK davidgumberg, optout21
    Concept ACK sedited
    Approach ACK winterrdog

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  4. crypto: cleanse HMAC stack buffers after use
    CHMAC_SHA256 and CHMAC_SHA512 leave two stack buffers populated on
    return: rkey[] holds K' XOR ipad after the constructor, and temp[]
    holds the inner-hash output after Finalize().
    
    When the HMAC is keyed with sensitive material (chain code in
    BIP32Hash() in hash.cpp for BIP32 child key derivation; PRK in
    HKDF-Expand in hkdf_sha256_32.cpp, used for BIP324 transport keying),
    rkey is one constant XOR from that key, and temp is a one-way digest
    covering it.
    
    Cleanse both buffers with memory_cleanse(), matching the convention
    in chacha20.cpp and chacha20poly1305.cpp. No observable change for
    callers.
    b3a3f88346
  5. thomasbuilds force-pushed on May 10, 2026
  6. sedited requested review from davidgumberg on May 10, 2026
  7. optout21 commented at 11:38 AM on May 11, 2026: contributor

    ConceptACK b3a3f88346dfd218a049acec6a77166f319c70e8

    Seems like the right thing to do, done correctly.

    While not trivial, it's possible to write a test for it (by calling a method with similar signature and layout of local variables, it's possible to read out the content of the local buffer, after the call). Do you think a test would be worthwhile here?

  8. thomasbuilds commented at 1:06 PM on May 11, 2026: contributor

    Thanks for the review. For the test I disagree, chacha20.cpp and chacha20poly1305.cpp use the same memory_cleanse pattern without equivalent coverage, not to mention the test would be fragile since stack layout isn't specified across our CI matrix.

  9. optout21 commented at 1:30 PM on May 11, 2026: contributor

    For the test I disagree, chacha20.cpp and chacha20poly1305.cpp use the same memory_cleanse pattern without equivalent coverage, not to mention the test would be fragile since stack layout isn't specified across our CI matrix.

    I can accept that a test would be an overkill, and maybe not the best way to ensure no data is left on the stack. I've asked because it's an interesting challenge to write such a test. I think it would be fragile because the test would have to replicate the stack layout of the method-under-test, which may change. The variations in CI environments is less relevant for this, but there can be other issues, like implicit memory cleanup in some environments.

  10. key: cleanse ChainCode on destruction
    HMAC primitives cleanse their internal stack buffers, but a caller's
    ChainCode remains populated in memory after use. Promote ChainCode
    from `typedef uint256` to a `base_blob<256>` subclass with a
    memory_cleanse() destructor, so chain codes in CExtKey, CExtPubKey,
    and local variables are cleansed on scope exit.
    
    Retype MUSIG_CHAINCODE from `constexpr uint256` to `const ChainCode`
    to match its BIP328 semantic role. Dropping `constexpr` (ChainCode is
    no longer a literal type) also removes the GCC-14 consteval lambda
    workaround.
    
    Remove the duplicate typedef in pubkey.h (which includes hash.h
    transitively). Two fuzz-test call sites in test/fuzz/key.cpp now
    construct the chain-code argument explicitly rather than relying on
    the typedef.
    21a1380c13
  11. in src/crypto/hmac_sha512.cpp:31 in b3a3f88346 outdated
      26 | @@ -26,11 +27,14 @@ CHMAC_SHA512::CHMAC_SHA512(const unsigned char* key, size_t keylen)
      27 |      for (int n = 0; n < 128; n++)
      28 |          rkey[n] ^= 0x5c ^ 0x36;
      29 |      inner.Write(rkey, 128);
      30 | +
      31 | +    memory_cleanse(rkey, sizeof(rkey));
    


    davidgumberg commented at 11:55 PM on May 11, 2026:

    ChainCode is not cleansed in the calling code, maybe a destructor, i.e.:

    +class ChainCode : public base_blob<256>  {
    +public:
    +    constexpr ChainCode() = default;
    +    constexpr explicit ChainCode(std::span<const unsigned char> vch) : base_blob<256>(vch) {}
    +
    +    ~ChainCode() { memory_cleanse(data(), size()); }
    +};
    +
    

    thomasbuilds commented at 1:10 PM on May 12, 2026:

    Worth considering as a follow-up, but I'd keep it out of this PR since the chain code lives on the caller side while this PR only touches HMAC.

    Btw the snippet doesn't quite work as-is since ChainCode is typedef uint256 in hash.h and pubkey.h, so replacing it with a class derived from base_blob<256> drops the uint8_t and string_view constructors uint256 provides, and breaks call sites that rely on the typedef. That's a wider refactor which would be another reason it belongs in its own PR. What do you think?


    davidgumberg commented at 3:44 PM on May 12, 2026:

    replacing it with a class derived from base_blob<256> drops the uint8_t and string_view constructors uint256 provides,

    These are not used anywhere

    breaks call sites that rely on the typedef.

    Can you be more specific?

    I tested my diff and there is only a single caller that (erroneously imo!) constructs a ChainCode from a uint256, and that would have to be fixed as well. It is like a 10 line diff, and without it, it helps little to be cleansing the chaincode in crypto code without cleansing it in the caller that invokes the crypto code.


    thomasbuilds commented at 6:47 PM on May 12, 2026:

    Indeed those constructors aren't used on ChainCode. Should have waited to post until I could actually verify sorry. There are three call sites: musig.cpp:77 + test/fuzz/key.cpp:88 and :278 where random_uint256 is passed as const ChainCode& to Derive.

    I'll type MUSIG_CHAINCODE as ChainCode directly instead of uint256, so no cross-type construction is needed at the call site. This requires dropping constexpr since the destructor isn't constexpr but it isn't used in any constant-evaluated context. Also resolves the need for the GCC-14 consteval lambda workaround. Tested on GCC 13.3 and 14.2.

    Will push the commit shortly.

  12. sedited commented at 7:40 PM on May 12, 2026: contributor

    Concept ACK

  13. thomasbuilds renamed this:
    crypto: cleanse HMAC stack buffers after use
    crypto: cleanse HMAC stack buffers after use and ChainCode
    on May 13, 2026
  14. davidgumberg commented at 7:32 PM on May 14, 2026: contributor

    crACK https://github.com/bitcoin/bitcoin/pull/35254/commits/21a1380c13470d28f53cc0b85d902693365d39d3

    Re: the discussion about testing, I'm not sure how deterministic this is, but it seems that taintgrind / secret grind might be useful for writing tests like this: https://github.com/lmrs2/secretgrind/tree/stable-0.1.

    Long-term in the project it might make sense to move toward all sensitive material to be allocated using secure_allocator, which does the cleansing but also mlock()'s secrets so that they are never swapped to disk, having more types maybe in the style of SecureString, but I think this approach is good enough as-is.

  15. DrahtBot requested review from optout21 on May 14, 2026
  16. DrahtBot requested review from sedited on May 14, 2026
  17. winterrdog commented at 1:59 PM on May 17, 2026: none

    Approach ACK

    particularly like the automatic cleansing in the ChainCode destructor & dropping the GCC 14 consteval lambda workaround (it was quite confusing), which makes the code intent clearer now

  18. optout21 commented at 10:37 AM on May 21, 2026: contributor

    ACK 21a1380c13470d28f53cc0b85d902693365d39d3

    The PR has been extended with a second commit to address that values in ChainCode instances are also cleared when no longer needed. This is achieved by 'promoting' ChainCode from a typedef to a class with cleanse logic in its destructor. Code review, local build & tests.

  19. in src/musig.cpp:12 in 21a1380c13
       8 | @@ -9,10 +9,7 @@
       9 |  
      10 |  //! MuSig2 chaincode as defined by BIP 328
      11 |  using namespace util::hex_literals;
      12 | -constexpr uint256 MUSIG_CHAINCODE{
      13 | -    // Use immediate lambda to work around GCC-14 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117966
      14 | -    []() consteval { return uint256{"868087ca02a6f974c4598924c36b57762d32cb45717167e300622c7167e38965"_hex_u8}; }(),
      15 | -};
      16 | +const ChainCode MUSIG_CHAINCODE{"868087ca02a6f974c4598924c36b57762d32cb45717167e300622c7167e38965"_hex_u8};
    


    sedited commented at 10:49 AM on May 21, 2026:

    Why change this, and not just convert it to a ChainCode on its call site?


    thomasbuilds commented at 11:41 AM on May 21, 2026:

    Yes that works, I retyped it mainly to drop the immediate-lambda workaround: with constexpr, the _hex_u8 -> std::span construction stays in constant evaluation, which needs the workaround for GCC bug 117966 under _GLIBCXX_DEBUG. const ChainCode avoids it, downside is it's no longer a guaranteed compile-time constant, but it's only read inside CreateMuSig2SyntheticXpub() so that doesn't matter in practice.

    I can switch back to constexpr uint256 + a call-site cast if you want to keep the compile-time constant. That needs to re-ACK though, what do you think is best at this point?


    winterrdog commented at 3:09 PM on May 21, 2026:

    reading sedited's question made me dig a bit deeper into this change.

    while dropping the GCC 14 lambda hack does make the code look cleaner on the surface, changing MUSIG_CHAINCODE from constexpr uint256 to a global const ChainCode seems to introduce a couple of subtle regressions:

    1. additional (a bit) runtime overhead: keeping MUSIG_CHAINCODE as a constexpr allows the compiler to bake those 32 bytes directly into the instruction stream as an immediate value. turning it into a runtime global instead requires a data-segment lookup across translation units just to copy a public constant.

    2. shutdown cleanup (potential UB): MUSIG_CHAINCODE is a BIP-defined public constant, not a private key. making it a global ChainCode means its destructor will call memory_cleanse at program termination. wiping a public constant at shutdown seems extra and could theoretically run into static destruction sequencing issues depending on teardown order across translation units (probably unlikely in practice, but still possible in a large multi-translation-unit project like Bitcoin Core). - (edit: in a later comment in this thread, i realised that MUSIG_CHAINCODE is NOT used anywhere outside of the musig.cpp file therefore it uses internal (or local) linkage so there's no way cross-TU issues like SIOF can pop up)

    i think it would be safer and more efficient to revert MUSIG_CHAINCODE back to a constexpr uint256 (keeping the compiler workaround for now) and instead handle the explicit cast cleanly at the call site in CreateMuSig2SyntheticXpub, just like what sedited suggests:

    extpub.chaincode = ChainCode{MUSIG_CHAINCODE};
    

    in the meantime, let's wait for sedited's input too.


    thomasbuilds commented at 4:42 PM on May 21, 2026:

    @winterrdog I checked the codegen on GCC-14 and clang-18 (-O2), and the runtime side is actually the other way around.

    The current const ChainCode global compiles the assignment to a direct 32-byte copy, no call. The ChainCode{MUSIG_CHAINCODE} cast builds a temporary whose destructor runs memory_cleanse on every call (non-elidable by design), so it's the heavier option at the call site, in paths like signing and descriptor expansion. The bytes sit in .rodata/.data either way, and the constant has internal linkage, so there's no cross-TU lookup.

    You're right that the shutdown destructor runs memory_cleanse and wipes the public constant once at exit. That's a tiny waste, but not UB. The only reader is CreateMuSig2SyntheticXpub (runtime only), so nothing reads it during teardown.

    So I'd lean towards keeping it as is. The current form is the cheapest at the call site; its only real costs are .data placement and one memset at exit.


    winterrdog commented at 6:46 PM on May 22, 2026:

    The current const ChainCode global compiles the assignment to a direct 32-byte copy, no call.

    ah, true!

    after compiling a release bitcoind build with debug symbols using GCC on a x86_64/Linux machine, i checked the generated x86 disassembly for CreateMuSig2SyntheticXpub (since it's the only MUSIG_CHAINCODE reader in the codebase right now), then demangled the symbols with objdump -d -S --demangle build/bin/bitcoind:

    <details> <summary> click to see the disassembly from objdump </summary>

    00000000006f3520 <CreateMuSig2SyntheticXpub(CPubKey const&)>:
    {
      ...
        extpub.chaincode = MUSIG_CHAINCODE;
      6f3552:       66 0f 6f 0d 26 e8 61    movdqa 0x61e826(%rip),%xmm1        # d11d80 <MUSIG_CHAINCODE>
      6f3559:       00
      6f355a:       0f 11 47 20             movups %xmm0,0x20(%rdi)
      6f355e:       0f 11 4f 10             movups %xmm1,0x10(%rdi)
      6f3562:       66 0f 6f 15 26 e8 61    movdqa 0x61e826(%rip),%xmm2        # d11d90 <MUSIG_CHAINCODE+0x10>
      6f3569:       00
      ...
      ...
      6f35b3:       c3                      ret
    

    </details>

    which confirms that, with optimisation enabled, the compiler lowers the assignment to direct SIMD loads/stores with RIP-relative addressing, with no helper call involved.

    The ChainCode{MUSIG_CHAINCODE} cast builds a temporary whose destructor runs memory_cleanse on every call (non-elidable by design), so it's the heavier option at the call site, in paths like signing and descriptor expansion.

    oh, i see it now. leaning the same way on keeping it as-is

    The bytes sit in .rodata/.data either way, and the constant has internal linkage, so there's no cross-TU lookup.

    true, i found no cross-TU lookup either.

    the disassembly above confirms that MUSIG_CHAINCODE is accessed directly via RIP-relative offsets from the executing code, which shows the compiler treats it as a local data-segment object. there are also no guard-variable checks or initialization thunks before the loads, which rules out cross-TU initialization dependencies or any dynamic lookup overhead

    That's a tiny waste, but not UB.

    yes, agreed. my concern there was more theoretical than something i had actually verified in practice (should have checked the generated code first, my bad)

    The current form is the cheapest at the call site; its only real costs are .data placement and one memset at exit.

    yep, that's fair.

    after looking at the generated code, the current approach avoids repeatedly constructing temporary stack ChainCode objects every time CreateMuSig2SyntheticXpub() runs. instead, the global ChainCode instance is constructed once, reused directly, and only cleaned up during program teardown

    the generated code also makes it clear that the assignment itself is compiled down to efficient 128-bit SIMD, for x86-64 machines, moves (movdqa / movups) rather than a helper routine or repeated byte-wise copies

    so yeah, i think the tradeoff made here makes sense

    just a non-blocker:

    out of curiosity for future cleanup: since MUSIG_CHAINCODE is a public constant that does not hold actual secret wallet data, do you think it is worth eventually introducing a non-cleansing wrapper type (like a raw uint256 array or a distinct PublicChainCode view) to completely bypass the default ~ChainCode() exit-scrub overhead, or is the single memset at program exit trivial enough that we should leave the ChainCode type abstraction exactly as it is ?

  20. DrahtBot requested review from sedited on May 21, 2026
  21. DrahtBot requested review from winterrdog on May 21, 2026
  22. achow101 commented at 5:45 PM on May 21, 2026: member

    Why does the chaincode need to be cleansed? It is not secret data.

  23. thomasbuilds requested review from davidgumberg on May 21, 2026
  24. thomasbuilds commented at 12:01 PM on May 22, 2026: contributor

    For a chain code that's part of an exported xpub, you're right it's public. But a wallet also keeps master and intermediate xprv chain codes in memory during derivation, and those sit above whatever xpub you export, so they're never published. They're part of the secret extended key. And the chain code is what turns a single leaked private key into a whole subtree, since children can't be derived without it. Cleansing it limits the damage if process memory is ever exposed. @davidgumberg suggested doing this on the caller side, which is what the second commit does. That said, it's an extra safeguard, not essential. If you'd prefer to leave the chain code alone, I'm fine to drop the second commit and keep just the original first commit (the HMAC buffer cleansing).


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-06-01 06:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me