Use RdSeed when available, and reduce RdRand load

sipa commented at 2:43 am on January 25, 2019: member

This introduces support for autodetecting and using the RdSeed instruction on x86/x86_64 systems.

In addition:

In SeedFast, only 64 bits of entropy are generated through RdRand (256 was relatively slow).
In SeedStartup, 256 bits of entropy are generated, using RdSeed (preferably) or RdRand (otherwise).

fanquake added the label Utils/log/libs on Jan 25, 2019

sipa force-pushed on Jan 25, 2019

practicalswift commented at 9:00 am on January 25, 2019: contributor

While you’re changing random.cpp: isn’t the std::move on L562 without any effect? Could be removed?

sipa commented at 10:34 pm on January 25, 2019: member

@practicalswift MixExtract intentionally takes an rvalue reference, so that callers must either pass in a temporary, or use std::move explicitly (as the hasher object becomes useless after the call).

in src/random.cpp:131 in 83ebab177b outdated

132+#ifdef __i386__
133+    uint32_t r1, r2;
134+    __asm__ volatile (".byte 0x0f, 0xc7, 0xf0, 0x0f, 0xc7, 0xf2; setc %2" : "=a"(r1), "=d"(r2), "=q"(ok) :: "cc"); // rdrand %eax; rdrand %edx
135+    assert(ok);
136+    return (((uint64_t)r2) << 32) | r1;
137+#else

gmaxwell commented at 4:55 am on January 26, 2019:

I feel like this and the one below should be guarded by #if defined(x86_64) || defined(amd64) … I think we’d like to prefer x86_64 asm in the output, even if we’re not going to call it. :P

sipa commented at 7:25 am on January 26, 2019:

Done.

in src/random.cpp:227 in 83ebab177b outdated

223+        uint64_t out1 = GetRdRand(), out2 = GetRdRand(), out3 = GetRdRand(), out4 = GetRdRand();
224+        WriteLE64(ent32, out1);
225+        WriteLE64(ent32 + 8, out2);
226+        WriteLE64(ent32 + 16, out3);
227+        WriteLE64(ent32 + 24, out4);
228         return true;

gmaxwell commented at 5:06 am on January 26, 2019:

Maybe we should ignore it for the less critical runtime use, but pedantically:

“After invoking the RDRAND instruction, the caller must examine the carry flag (CF) to determine whether a random value was available at the time the RDRAND instruction was executed. As Table 3 shows, a value of 1 indicates that a random value was available and placed in the destination register provided in the invocation. A value of 0 indicates that a random value was not available. In current architectures the destination register will also be zeroed as a side effect of this condition.” … “It is recommended that applications attempt 10 retries in a tight loop in the unlikely event that the RDRAND instruction does not return a random number. This number is based on a binomial probability argument: given the design margins of the DRNG, the odds of ten failures in a row are astronomically small and would in fact be an indication of a larger CPU issue.”

This one might also want to consider:

“There are two approaches to structuring RDRAND invocations such that DRBG reseeding can be guaranteed:

Iteratively execute RDRAND beyond the DRBG upper bound by executing more than 1022 64-bit RDRANDs"

sipa commented at 7:25 am on January 26, 2019:

Done.

in src/random.cpp:217 in 83ebab177b outdated

215+        uint64_t out1 = GetRdSeed(), out2 = GetRdSeed(), out3 = GetRdSeed(), out4 = GetRdSeed();
216+        WriteLE64(ent32, out1);
217+        WriteLE64(ent32 + 8, out2);
218+        WriteLE64(ent32 + 16, out3);
219+        WriteLE64(ent32 + 24, out4);
220+        return true;

gmaxwell commented at 5:08 am on January 26, 2019:

“After invoking the RDSEED instruction, the caller must examine the carry flag (CF) to determine whether a random seed was available at the time the RDSEED instruction was executed. As shown in Table 5, a value of 1 indicates that a random seed was available and placed in the destination register provided in the invocation.” “Unlike the RDRAND instruction, the seed values come directly from the entropy conditioner, and it is possible for callers to invoke RDSEED faster than those values are generated. This means that applications must be designed robustly and be prepared for calls to RDSEED to fail because seeds are not available (CF=0).

If only one thread is calling RDSEED infrequently, it is very unlikely that a random seed will not be available. Only during periods of heavy demand, such as when one thread is calling RDSEED in rapid succession or multiple threads are calling RDSEED simultaneously, are underflows likely to occur. Because the RDSEED instruction does not have a fairness mechanism built into it, however, there are no guarantees as to how often a thread should retry the instruction, or how many retries might be needed, in order to obtain a random seed. In practice, this depends on the number of hardware threads on the CPU and how aggressively they are calling RDSEED.” “5.3.1.1 Synchronous applications

If the application is not latency-sensitive, then it can simply retry the RDSEED instruction indefinitely, though it is recommended that a PAUSE instruction be placed in the retry loop. In the worst-case scenario, where multiple threads are invoking RDSEED continually, the delays can be long, but the longer the delay, the more likely (with an exponentially increasing probability) that the instruction will return a result.”

sipa commented at 7:26 am on January 26, 2019:

Done.

sipa force-pushed on Jan 26, 2019

sipa commented at 7:28 am on January 26, 2019: member

@gmaxwell Wow, I didn’t know about those failure modes of RdRand and RdSeed. Rewrote things a bit to take that into account. RdRand is now retried up to 10 times (and ignored after that). RdSeed is retried indefinitely (with a pause in between). When using RdRand for 256-bit seeding, it’s invoked 1024 times and XORed to produce each 64 bit group.

gmaxwell commented at 3:47 pm on January 28, 2019: contributor

utACK

in src/random.cpp:97 in 673964328f outdated

87+static void inline GetCPUID(uint32_t leaf, uint32_t subleaf, uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d)
88+{
89+#ifdef __GNUC__
90+    __cpuid_count(leaf, subleaf, a, b, c, d);
91+#else
92+  __asm__ ("cpuid" : "=a"(a), "=b"(b), "=c"(c), "=d"(d) : "0"(leaf), "2"(subleaf));

Empact commented at 3:01 pm on January 31, 2019:

nit: whitespace

sipa commented at 10:16 pm on February 3, 2019:

Where?

Empact commented at 11:08 pm on February 3, 2019:

Line 91 is indented 2 spaces instead of 4. :P

sipa commented at 1:34 am on February 4, 2019:

Oh! Fixed.

in src/random.cpp:91 in 673964328f outdated

82+static bool g_rdrand_supported = false;
83+static bool g_rdseed_supported = false;
84 static constexpr uint32_t CPUID_F1_ECX_RDRAND = 0x40000000;
85+static constexpr uint32_t CPUID_F7_EBX_RDSEED = 0x00040000;
86+
87+static void inline GetCPUID(uint32_t leaf, uint32_t subleaf, uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d)

Empact commented at 4:07 pm on January 31, 2019:

Maybe I’m mistaken, but I think we can use __get_cpuid here, e.g.: https://github.com/linuxkit/linuxkit/blob/c5b3f24de6ba2238c5465025960f16920a731124/pkg/rngd/cmd/rngd/rng_linux_amd64.go#L11-L23

If not, how about a comment ala, or extracting a common method: https://github.com/bitcoin/bitcoin/blob/4d661baf1aca4226d1c4ec89fe7e44c1f93f04da/src/crypto/sha256.cpp#L552

sipa commented at 10:19 pm on February 3, 2019:

No, we need subleafs to detect rdseed, so __get_cpuid won’t work.

sipa commented at 10:29 pm on February 3, 2019:

Added a comment.

in src/random.cpp:84 in 673964328f outdated

77@@ -78,25 +78,112 @@ static inline int64_t GetPerformanceCounter() noexcept
78 }
79 
80 #if defined(__x86_64__) || defined(__amd64__) || defined(__i386__)
81-static bool rdrand_supported = false;
82+static bool g_rdrand_supported = false;
83+static bool g_rdseed_supported = false;
84 static constexpr uint32_t CPUID_F1_ECX_RDRAND = 0x40000000;
85+static constexpr uint32_t CPUID_F7_EBX_RDSEED = 0x00040000;

Empact commented at 4:09 pm on January 31, 2019:

Maybe static_assert against cpuid.h’s definition of bit_RDRND, bit_RDSEED if available?

sipa commented at 10:29 pm on February 3, 2019:

Done.

in src/random.cpp:104 in 673964328f outdated

102+    if (ecx & CPUID_F1_ECX_RDRAND) {
103+        g_rdrand_supported = true;
104+        GetCPUID(7, 0, eax, ebx, ecx, edx);
105+        if (ebx & CPUID_F7_EBX_RDSEED) {
106+            g_rdseed_supported = true;
107+        }

Empact commented at 4:24 pm on January 31, 2019:

Wasn’t able to find a source for this: is RDSEED always and everywhere absent if RDRAND is absent? Seems relatively harmless to not make the tests dependent on one another. https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide#inpage-nav-6-1

sipa commented at 10:28 pm on February 3, 2019:

Indeed, made them independent.

in src/random.cpp:229 in 673964328f outdated

244+        }
245+        return true;
246+    }
247+    // When falling back to RdRand, XOR the result of 1024 results.
248+    // This guarantees a reseeding occurs between each.
249+    if (g_rdrand_supported) {

Empact commented at 4:26 pm on January 31, 2019:

This is always applied when the g_rdseed_supported case is, given the dependent detection logic. else if?

gmaxwell commented at 1:04 am on February 2, 2019:

The prior block unconditionally returns if its entered, so it is not applied if rdseed is supported (and shouldn’t be.)

in src/random.cpp:144 in 673964328f outdated

138+    }
139+    for (int i = 0; i < 10; ++i) {
140+        __asm__ volatile (".byte 0x0f, 0xc7, 0xf0; setc %1" : "=a"(r2), "=q"(ok) :: "cc"); // rdrand %eax
141+        if (ok) break;
142+    }
143+    return (((uint64_t)r2) << 32) | r1;

Empact commented at 4:55 pm on January 31, 2019:

If you put two 64-bit values with additive prediction resistance togehter, the prediction resistance of the resulting value is only 65 bits (264 + 264 = 265). https://software.intel.com/en-us/blogs/2012/11/17/the-difference-between-rdrand-and-rdseed

Assume this is addressed via the XOR rounds?

gmaxwell commented at 1:01 am on February 2, 2019:

Keep reading: rdrand is guaranteed to be reseeded at least every 1022 calls, see the documentation that I provided in earlier comments.

sipa commented at 10:25 pm on February 3, 2019:

@Empact Nothing is being added together here (it’s concatenated, which doesn’t suffer from the same problem). Also, this function is only trying to produce a 64-bit value.

in src/random.cpp:208 in 673964328f outdated

231+    }
232 #endif
233+    return false;
234+}
235+
236+static bool SeedHardwareSlow(CSHA512& hasher) noexcept {

Empact commented at 4:59 pm on January 31, 2019:

Returns bool that no one reacts to. Drop or log?

practicalswift commented at 5:56 pm on February 3, 2019:

If returning bool then please consider NODISCARD.

sipa commented at 10:28 pm on February 3, 2019:

Fixed, removed the bool.

in src/random.cpp:197 in 673964328f outdated

193@@ -107,36 +194,36 @@ static void InitHardwareRand() {}
194 static void ReportHardwareRand() {}
195 #endif
196 
197-static bool GetHardwareRand(unsigned char* ent32) noexcept {
198+static bool SeedHardwareFast(CSHA512& hasher) noexcept {

Empact commented at 5:00 pm on January 31, 2019:

Returns bool that no one reacts to. Drop or log?

practicalswift commented at 5:56 pm on February 3, 2019:

Same here: if returning bool then please consider NODISCARD.

sipa commented at 10:28 pm on February 3, 2019:

Fixed. bool is gone.

Empact commented at 5:04 pm on January 31, 2019: member

Concept ACK. Apologies if any of the above is particularly naive. :P

sipa force-pushed on Feb 3, 2019

Use RdSeed when available, and reduce RdRand load

This introduces support for autodetecting and using the RdSeed instruction.

In addition:
* In SeedFast, only 64 bits of entropy are generated through RdRand (256 was relatively slow).
* In SeedStartup, 256 bits of entropy are generated, using RdSeed (preferably) or RdRand (otherwise).

1435fabc19

sipa force-pushed on Feb 4, 2019

gmaxwell commented at 2:29 am on February 7, 2019: contributor

ACK

gmaxwell commented at 2:39 am on February 7, 2019: contributor

Aside, we might want to consider this one a bug fix, since technically we weren’t using rdrand quite correctly before. OTOH, the use of rdrand was entirely non-critical… sooo…

MarcoFalke added this to the milestone 0.18.0 on Feb 7, 2019

laanwj commented at 12:26 pm on February 12, 2019: member

So to be clear: this doesn’t result in a CPU with broken and/or backdoored RDRAND or RDSEED instruction to generate vulberable private keys?

sipa commented at 3:57 pm on February 12, 2019: member

@laanwj Not trivially, at least. The output of the rdrand/rdseed is mixed with other entropy using SHA512 to produce output and the new state.

In theory a backdoored CPU could try to infer what variable is going to store entropy, and try to control that directly.

laanwj commented at 4:46 pm on February 12, 2019: member

Thanks!

In theory a backdoored CPU could try to infer what variable is going to store entropy, and try to control that directly.

Sure—I meant when say, only those instructions were made to return a fixed value, not anything more advanced/complex. Like a bugdoor with plausible deniability.

laanwj commented at 6:10 pm on February 13, 2019: member

Code review utACK 1435fabc19d2143187efb493cbe23225eaf851ae I have no hardware to be able to test this on. (if you do, /proc/cpuinfo will show the rdrand and/or rdseed under flags)

JustinTArthur commented at 7:39 pm on February 13, 2019: contributor

ACK 1435fab on macOS x86_64 on Intel Core i7-7920HQ CPU @ 3.10GHz (has RDRAND & RDSEED).

2019-02-13T19:22:30Z Using the 'sse4(1way),sse41(4way),avx2(8way)' SHA256 implementation 2019-02-13T19:22:30Z Using RdSeed as additional entropy source 2019-02-13T19:22:30Z Using RdRand as an additional entropy source

Functional and unit tests pass.

sipa commented at 8:33 pm on February 13, 2019: member

Tested on a system with RdRand but no RdSeed; tests pass, bitcoind runs fine, and reports only “Using RdRand as an additional entropy source”.

gmaxwell commented at 3:29 am on February 14, 2019: contributor

Tested on a system with no rdrand/rdseed, and it didn’t catch fire. does @luke-jr maybe have a 32-bit system with rdrand/rdseed?

gmaxwell commented at 3:35 am on February 14, 2019: contributor

@laanwj To expand on Pieter’s response. Our mixing construction is designed to be secure against adversarial inputs generated by an attacker that can also inspect the entire process state unless the attacker can attack SHA512 in a very serious way (e.g. construct a suffix to a message being hashed that causes a specific hash value).

There are various standards for combining entropy sources that just xor the sources. The construction we’re using is obviously stronger than those. If we had just xored the state then a malicious rdrand could inspect and return the current state (perhaps xored with a constant) in order to override it, and even plausibly do so as a bugdoor (e.g. oops we accidentally failed to update the register). I just mention this to point out that what we’re doing is stronger than something other people think is acceptable…

pstratem commented at 9:37 pm on February 14, 2019: contributor

@gmaxwell


 Linux debian 4.9.0-8-686-pae [#1](/bitcoin-bitcoin/1/) SMP Debian 4.9.130-2 (2018-10-27) i686 GNU/Linux
 2019-02-14T21:30:29Z Using RdSeed as additional entropy source
 2019-02-14T21:30:29Z Using RdRand as an additional entropy source

laanwj merged this on Feb 18, 2019

laanwj closed this on Feb 18, 2019

laanwj referenced this in commit 29e82e460e on Feb 18, 2019

random-zebra referenced this in commit 93f43f0f81 on Apr 14, 2021

ogabrielides referenced this in commit cd73bf3297 on Sep 14, 2021

ogabrielides referenced this in commit 454e89b2e8 on Sep 15, 2021

ogabrielides referenced this in commit df7ec9004d on Sep 15, 2021

ogabrielides referenced this in commit 5167cbe911 on Sep 16, 2021

PastaPastaPasta referenced this in commit 0cb3dd5b87 on Sep 16, 2021

gades referenced this in commit d9a10dfe57 on May 15, 2022

DrahtBot locked this on Oct 30, 2022

Use RdSeed when available, and reduce RdRand load #15250