This PR want to fix #31817 by added a maximum retry limit (max_retries) to the GetRNDRRS function to prevent it from entering an infinite loop if the hardware random number generator fails to return a valid random number. This change improves stability and ensures that the function terminates after a predefined number of retries.
random: Check `GetRNDRRS` is supported in `InitHardwareRand` to avoid infinite loop #31826
pull eval-exec wants to merge 1 commits into bitcoin:master from eval-exec:exec/add-retry-limit-for-GetRNDRRS changing 1 files +34 −11-
eval-exec commented at 10:17 AM on February 8, 2025: contributor
-
DrahtBot commented at 10:17 AM on February 8, 2025: contributor
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--006a51241073e994b41acfe9ec718e94-->
Code Coverage & Benchmarks
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31826.
<!--021abf342d371248e50ceaed478a90ca-->
Reviews
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
<!--174a7506f384e20aa4161008e828411d-->
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
- bitcoin deleted a comment on Feb 8, 2025
-
maflcko commented at 2:34 PM on February 8, 2025: member
Instead of changing the code and repeating the code changes in the description, it would be better to actually explain why the problem happens and why this fix is the correct fix.
To me this seems like a problem where a system claims to support a feature, but doesn't? So it would be better to instead investigate and fix the feature test or feature reporting.
-
in src/random.cpp:245 in d2f540a5ff outdated
241 | __asm__ volatile("mrs %0, s3_3_c2_c4_1; cset %w1, ne;" 242 | : "=r"(r1), "=r"(ok)::"cc"); 243 | if (ok) break; 244 | __asm__ volatile("yield"); 245 | - } while (true); 246 | + } while (--max_retries > 0);
sipa commented at 2:35 PM on February 8, 2025:When this fails, do we want to fall back to
GetRNSRinstead, rather than returning whatever arbitrary value was left in the register?
eval-exec commented at 2:33 AM on February 9, 2025:Agree, I prefer fallback to
GetRNSR.
eval-exec commented at 2:40 AM on February 9, 2025:Maybe should fallback to
GetRNDR?
sipa commented at 2:51 AM on February 9, 2025:I think it would be even better to do a trial run inside
InitHardwareRand, so this condition can be detected, and the feature can be considered unsupported (so future calls to GetRNDRS will not even bother).eval-exec commented at 2:35 AM on February 9, 2025: contributorTo me this seems like a problem where a system claims to support a feature, but doesn't?
I read the
RNDRRS's documentation, We have checked if bitcoin support this feature by:g_rndr_supported:#elif defined(__aarch64__) && defined(HWCAP2_RNG) bool g_rndr_supported = false; void InitHardwareRand() { if (getauxval(AT_HWCAP2) & HWCAP2_RNG) { g_rndr_supported = true; } }it would be better to instead investigate and fix the feature test or feature reporting.
Agree, it need time to invistigate this bug.
And I did some research, this CPU have bug on
rndrrs:04:28 <HdkR> I found out yesterday that the bitcoin project uses rndrrs if it is available on aarch64 to fill its entropy pool. It spins in a loop until its pool is filled. 04:28 <HdkR> This means on Oryon it will hang at startup because of the bugged rndrrs :D
Ref:
DrahtBot commented at 3:23 AM on February 9, 2025: contributor<!--85328a0da195eb286784d51f73fa0af9-->
🚧 At least one of the CI tasks failed. <sub>Debug: https://github.com/bitcoin/bitcoin/runs/36907425995</sub>
<details><summary>Hints</summary>
Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
</details>
DrahtBot added the label CI failed on Feb 9, 2025eval-exec force-pushed on Feb 9, 2025eval-exec force-pushed on Feb 9, 2025DrahtBot removed the label CI failed on Feb 9, 2025eval-exec force-pushed on Feb 9, 2025eval-exec requested review from sipa on Feb 9, 2025eval-exec requested review from maflcko on Feb 9, 2025in src/random.cpp:201 in ccd8906c22 outdated
210 | 211 | void InitHardwareRand() 212 | { 213 | if (getauxval(AT_HWCAP2) & HWCAP2_RNG) { 214 | g_rndr_supported = true; 215 | + g_rndrrs_supported = VerifyRNDRRS();
sipa commented at 2:38 PM on February 9, 2025:You'll also need to actually use this variable inside
GetRNDRRS(), to fall back toGetRNDR()instead.in src/random.cpp:226 in ccd8906c22 outdated
221 | @@ -206,6 +222,9 @@ void ReportHardwareRand() 222 | // from global constructors, before logging is initialized. 223 | if (g_rndr_supported) { 224 | LogPrintf("Using RNDR and RNDRRS as additional entropy sources\n"); 225 | + if (!g_rndrrs_supported) { 226 | + LogPrintf("Did trial test by VerifyRNDRRS, RNDRRS failed 1000 times, falling back to RNDR\n");
sipa commented at 2:38 PM on February 9, 2025:Nit: perhaps just say "Using RNDR as additional entropy source" here, instead of claiming that RNDRRS is used, but then disabled?
eval-exec commented at 2:50 PM on February 9, 2025:Thank you.
eval-exec force-pushed on Feb 9, 2025in src/random.cpp:254 in 656aec4af8 outdated
250 | @@ -233,22 +251,18 @@ uint64_t GetRNDR() noexcept 251 | */ 252 | uint64_t GetRNDRRS() noexcept 253 | { 254 | + if (!g_rndrrs_supported) return GetRNDR();
sipa commented at 2:54 PM on February 9, 2025:Even better, use this directly in the only place this function is called:
SeedHardwareSlow, instead of theg_rndr_supportedthat is used there now.
eval-exec commented at 2:57 PM on February 9, 2025:Thank you, Updated.
eval-exec requested review from sipa on Feb 9, 2025eval-exec force-pushed on Feb 9, 2025in src/random.cpp:225 in daa1fbe881 outdated
221 | // This must be done in a separate function, as InitHardwareRand() may be indirectly called 222 | // from global constructors, before logging is initialized. 223 | - if (g_rndr_supported) { 224 | + if (g_rndr_supported && g_rndrrs_supported) { 225 | LogPrintf("Using RNDR and RNDRRS as additional entropy sources\n"); 226 | + }else if (g_rndr_supported) {
sipa commented at 3:05 PM on February 9, 2025:Coding style nit: space after
}.sipa commented at 3:05 PM on February 9, 2025: memberCode review ACK daa1fbe88171042ae39b257767137c8ba652e046. Please squash your commits.
eval-exec force-pushed on Feb 9, 2025eval-exec force-pushed on Feb 9, 2025sipa commented at 4:39 PM on February 9, 2025: memberutACK 03beb9c0bf2cb4a4022a1f9f94a6093fb4021154
in src/random.cpp:206 in 03beb9c0bf outdated
201 | + int max_retries = 1000; 202 | + do { 203 | + __asm__ volatile("mrs %0, s3_3_c2_c4_1; cset %w1, ne;" 204 | + : "=r"(test), "=r"(ok)::"cc"); 205 | + if (ok) return true; 206 | + __asm__ volatile("yield");
laanwj commented at 8:12 AM on February 11, 2025:yielding a 1000 timeshares to check if the RNG is working is probably overkill, i'd say if it fails like, 10 times in a row that's already enough indication we shouldn't use it?
in src/random.cpp:320 in 03beb9c0bf outdated
319 | + uint64_t out; 320 | + if (g_rndrrs_supported) { 321 | + out = GetRNDRRS(); 322 | + } else { 323 | + out = GetRNDR(); 324 | + }
laanwj commented at 8:19 AM on February 11, 2025:As we already use
GetRNDRinSeedHardwareFastwhich is called way more often thanSeedHardwareSlow(including inSeedSlow) i'm not sure if calling it here has any additional value. Let's just skip this loop if rndrrs isn't supported.
eval-exec commented at 1:53 PM on February 11, 2025:Thank you, updated.
sipa commented at 2:54 PM on February 11, 2025:There is one difference, in that
SeedHardwareFastonly tries to get 64 bits of entropy, whileSeedHardwareSlowtries to get a full (and fresh) 256 bits, if possible. But without the rndrrs instruction, the difference isn't all that significant (especially asSeedSlowalso gets 256 bit of entropy from the OS.laanwj commented at 8:23 AM on February 11, 2025: memberConcept ACK.
Please update the PR title to reflect the new set of changes.
eval-exec force-pushed on Feb 11, 2025eval-exec requested review from laanwj on Feb 11, 2025eval-exec requested review from sipa on Feb 11, 2025eval-exec renamed this:Limit retries in GetRNDRRS to avoid infinite loop
Check `GetRNDRRS` is supported in `InitHardwareRand` to avoid infinite loop
on Feb 11, 2025eval-exec renamed this:Check `GetRNDRRS` is supported in `InitHardwareRand` to avoid infinite loop
random: Check `GetRNDRRS` is supported in `InitHardwareRand` to avoid infinite loop
on Feb 11, 2025laanwj commented at 3:21 PM on February 11, 2025: memberCode review ACK a2ae62fb4db5e014c03a7ad41ba94ca49fd3806e i do not have RNDR-supporting hardware to test this
in src/random.cpp:318 in a2ae62fb4d outdated
314 | @@ -297,6 +315,9 @@ void SeedHardwareSlow(CSHA512& hasher) noexcept { 315 | #elif defined(__aarch64__) && defined(HWCAP2_RNG) 316 | if (g_rndr_supported) { 317 | for (int i = 0; i < 4; ++i) { 318 | + if (!g_rndrrs_supported) {
sipa commented at 3:23 PM on February 11, 2025:Nit: the entire loop can be skipped here (replace
if (g_rndr_supported) {withif (g_rndrrs_support) {).
laanwj commented at 3:24 PM on February 11, 2025:Eh yes, that would be better.
eval-exec commented at 3:29 PM on February 11, 2025:Yes, Updated.
DrahtBot requested review from sipa on Feb 11, 2025eval-exec force-pushed on Feb 11, 2025sipa commented at 3:30 PM on February 11, 2025: memberCode review ACK 585aba6eec858e5b1411ae9a8684ef8f82a7e435. I have not tested this.
DrahtBot requested review from laanwj on Feb 11, 2025laanwj commented at 3:34 PM on February 11, 2025: membergiahuy98 commented at 7:00 AM on February 12, 2025: nonere-ACK 585aba6
i could test it on amazon graviton again like i did the original PR, but only the good-weather case, that won't reproduce the edge case that led to this.
Would be good if @giahuy98 (who opened #31817) could test it on the specific target hardware.
I tested 585aba6 on a Samsung Galaxy S25, and it works. Thanks.
maflcko added this to the milestone 29.0 on Feb 12, 2025laanwj commented at 10:48 AM on February 12, 2025: memberChecked on amazon r7g.xlarge (Graviton 3):
2025-02-12T10:07:35Z Bitcoin Core version v28.99.0-585aba6eec85 (release build) 2025-02-12T10:07:35Z Using the 'arm_shani(1way,2way)' SHA256 implementation 2025-02-12T10:07:35Z Using RNDR and RNDRRS as additional entropy sourcesAlso verified that both
GetRNDRandGetRNDRRSare still being called.As baseline i also built and tested on rpi5 (ARM64 which has no support for RNG). It is still correctly not detected, no invalid instruction errors happen.
Combined with the report above that the workaround works, this should be safe to merge in 29.0 and maybe even backported to 28.x.
glozow added the label Bug on Feb 13, 2025glozow assigned achow101 on Feb 13, 2025glozow requested review from theuni on Feb 13, 2025in src/random.cpp:195 in 585aba6eec outdated
191 | @@ -192,20 +192,38 @@ uint64_t GetRdSeed() noexcept 192 | #elif defined(__aarch64__) && defined(HWCAP2_RNG) 193 | 194 | bool g_rndr_supported = false; 195 | +bool g_rndrrs_supported = false;
theuni commented at 6:39 PM on February 13, 2025:In case it helps other reviewers... no need for static for these because of the anon namespace.
in src/random.cpp:199 in 585aba6eec outdated
191 | @@ -192,20 +192,38 @@ uint64_t GetRdSeed() noexcept 192 | #elif defined(__aarch64__) && defined(HWCAP2_RNG) 193 | 194 | bool g_rndr_supported = false; 195 | +bool g_rndrrs_supported = false; 196 | + 197 | +bool VerifyRNDRRS() 198 | +{ 199 | + uint8_t ok;
theuni commented at 6:42 PM on February 13, 2025:I think
ok~andtest~ should be initialized to sane values because some sanitizers (and reviewers :p) aren't clever enough to know if they're guaranteed to be set by the asm.edit: I guess the same goes for the other copy of this.
eval-exec commented at 1:34 AM on February 14, 2025:Agree, I will append a commit to fix this.
theuni commented at 7:09 PM on February 13, 2025: memberCode review ACK 585aba6eec858e5b1411ae9a8684ef8f82a7e435. Untested. A few (non-blocking) notes, assuming this works as-is.
- It's unclear to me under what scenario the call might fail. OpenSSL does the same loop, I assume that's why it's done here? I guess it's pretty unlikely that it passes the <10 failures check at startup, then fails lots of times during execution.
- The code duplication is unfortunate. I would've preferred to see a single factored-out function that takes a reference to a
uint64_tand returns a bool. - gcc (since v10) and clang (since v13) have intrinsics for
__rndrand__rndrrsthat would imo be preferable.
in src/random.cpp:209 in 585aba6eec outdated
204 | + : "=r"(test), "=r"(ok)::"cc"); 205 | + if (ok) return true; 206 | + __asm__ volatile("yield"); 207 | + } while (--max_retries > 0); 208 | + return false; 209 | +}
achow101 commented at 8:48 PM on February 13, 2025:This function basically duplicates
GetRNDRRS(); it should instead be refactored to avoid duplicating.
eval-exec commented at 1:34 AM on February 14, 2025:Agreed, I'll append a commit to refactor to avoid duplication with
GetRNDRRS().
eval-exec commented at 3:05 AM on February 15, 2025:Thank you, I refactored, could you please take another look?
achow101 commented at 3:14 AM on February 15, 2025:I don't think there's any need to have these be separate commits, please squash.
eval-exec commented at 3:36 AM on February 15, 2025:Squashed.
laanwj commented at 10:08 PM on February 13, 2025: memberIt's unclear to me under what scenario the call might fail
The ARM documentation isn't very specific on that:
If the hardware returns a genuine random number, PSTATE.NZCV is set to 0b0000.
If the instruction cannot return a genuine random number in a reasonable period of time, PSTATE.NZCV is set to 0b0100 and the data value returned is 0.
i imagine there's some internal conditions under which the RNG module can't deliver a value quickly enough. What can be concluded is that
!okis a temporary failure, for chips working according to spec. So the normal usage (as we have before this PR, and in OpenSSL and such) is to keep trying forever. This simplifies the code paths, so that callers don't have to handle a "randomness failed" case.The broken chips always return
!okfor RNDRRS, the reason for retrying 10 times in the detection loop is to prevent false positives. That is, a working one might fail temporarily, but not for (say) 10 consecutive timeshares, so it won't be detected as broken.gcc (since v10) and clang (since v13) have intrinsics for __rndr and __rndrrs that would imo be preferable.
Yes. But that's imo out of scope for this workaround PR.
in src/random.cpp:210 in 585aba6eec outdated
220 | { 221 | // This must be done in a separate function, as InitHardwareRand() may be indirectly called 222 | // from global constructors, before logging is initialized. 223 | - if (g_rndr_supported) { 224 | + if (g_rndr_supported && g_rndrrs_supported) { 225 | LogPrintf("Using RNDR and RNDRRS as additional entropy sources\n");
Sjors commented at 2:33 PM on February 14, 2025:It seems neither of these are available on Apple Silicon (
HWCAP2_RNGisn't defined), so not much to test here.theuni commented at 8:46 PM on February 14, 2025: membergcc (since v10) and clang (since v13) have intrinsics for __rndr and __rndrrs that would imo be preferable.
Yes. But that's imo out of scope for this workaround PR.
For sure out of scope, yes, I didn't mean to imply that should be done here.
achow101 referenced this in commit 254fd89d39 on Feb 14, 2025eval-exec force-pushed on Feb 15, 2025eval-exec force-pushed on Feb 15, 2025eval-exec requested review from achow101 on Feb 15, 2025eval-exec requested review from theuni on Feb 15, 2025eval-exec requested review from Sjors on Feb 15, 2025eval-exec force-pushed on Feb 15, 202509b150bb8aIn `InitHardwareRand`, do trail test for `RNDRRS` by `VerifyRNDRRS`
Signed-off-by: Eval EXEC <execvy@gmail.com>
eval-exec force-pushed on Feb 15, 2025luke-jr referenced this in commit 06c0746152 on Feb 15, 2025sipa commented at 2:04 PM on February 17, 2025: memberutACK 09b150bb8adae00854f02ece69fc6ef222fb07d9
achow101 commented at 2:15 AM on February 19, 2025: memberACK 09b150bb8adae00854f02ece69fc6ef222fb07d9
achow101 merged this on Feb 19, 2025achow101 closed this on Feb 19, 2025in src/random.cpp:201 in 09b150bb8a
196 | 197 | void InitHardwareRand() 198 | { 199 | if (getauxval(AT_HWCAP2) & HWCAP2_RNG) { 200 | g_rndr_supported = true; 201 | + g_rndrrs_supported = VerifyRNDRRS();
maflcko commented at 9:41 AM on February 19, 2025:I don't think it is possible in C++ to use a function before the function signature has been given to the compiler.
I fail to see how this compiles on any compiler. See https://godbolt.org/z/cYPda9784
Locally I get:
[ 40%] Building CXX object src/util/CMakeFiles/bitcoin_util.dir/__/random.cpp.o /__w/b-c-nightly/b-c-nightly/src/random.cpp:201:30: error: use of undeclared identifier 'VerifyRNDRRS' 201 | g_rndrrs_supported = VerifyRNDRRS(); | ^(clang)
eval-exec commented at 9:54 AM on February 19, 2025:Thanks for pointing it out. I've submitted a fix https://github.com/bitcoin/bitcoin/pull/31902
glozow commented at 3:25 PM on February 19, 2025:Another option would be to revert this.
eval-exec referenced this in commit 82f3d89c6d on Feb 19, 2025darosior referenced this in commit 3e9b12b3e0 on Feb 19, 2025LabelsMilestone
29.0
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-26 09:13 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me