random: Check GetRNDRRS is supported in InitHardwareRand to avoid infinite loop #31826

pull eval-exec wants to merge 1 commits into bitcoin:master from eval-exec:exec/add-retry-limit-for-GetRNDRRS changing 1 files +34 −11
  1. eval-exec commented at 10:17 am on February 8, 2025: contributor
    This PR want to fix #31817 by added a maximum retry limit (max_retries) to the GetRNDRRS function to prevent it from entering an infinite loop if the hardware random number generator fails to return a valid random number. This change improves stability and ensures that the function terminates after a predefined number of retries.
  2. DrahtBot commented at 10:17 am on February 8, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31826.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sipa, achow101
    Stale ACK laanwj, theuni

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. bitcoin deleted a comment on Feb 8, 2025
  4. maflcko commented at 2:34 pm on February 8, 2025: member

    Instead of changing the code and repeating the code changes in the description, it would be better to actually explain why the problem happens and why this fix is the correct fix.

    To me this seems like a problem where a system claims to support a feature, but doesn’t? So it would be better to instead investigate and fix the feature test or feature reporting.

  5. in src/random.cpp:245 in d2f540a5ff outdated
    241         __asm__ volatile("mrs %0, s3_3_c2_c4_1; cset %w1, ne;"
    242                          : "=r"(r1), "=r"(ok)::"cc");
    243         if (ok) break;
    244         __asm__ volatile("yield");
    245-    } while (true);
    246+    } while (--max_retries > 0);
    


    sipa commented at 2:35 pm on February 8, 2025:
    When this fails, do we want to fall back to GetRNSR instead, rather than returning whatever arbitrary value was left in the register?

    eval-exec commented at 2:33 am on February 9, 2025:
    Agree, I prefer fallback to GetRNSR.

    eval-exec commented at 2:40 am on February 9, 2025:
    Maybe should fallback to GetRNDR?

    sipa commented at 2:51 am on February 9, 2025:
    I think it would be even better to do a trial run inside InitHardwareRand, so this condition can be detected, and the feature can be considered unsupported (so future calls to GetRNDRS will not even bother).
  6. eval-exec commented at 2:35 am on February 9, 2025: contributor

    To me this seems like a problem where a system claims to support a feature, but doesn’t?

    I read the RNDRRS’s documentation, We have checked if bitcoin support this feature by:g_rndr_supported:

    0#elif defined(__aarch64__) && defined(HWCAP2_RNG)
    1
    2bool g_rndr_supported = false;
    3
    4void InitHardwareRand()
    5{
    6    if (getauxval(AT_HWCAP2) & HWCAP2_RNG) {
    7        g_rndr_supported = true;
    8    }
    9}
    

    it would be better to instead investigate and fix the feature test or feature reporting.

    Agree, it need time to invistigate this bug.

    And I did some research, this CPU have bug on rndrrs:

    04:28 I found out yesterday that the bitcoin project uses rndrrs if it is available on aarch64 to fill its entropy pool. It spins in a loop until its pool is filled. 04:28 This means on Oryon it will hang at startup because of the bugged rndrrs :D

    Ref:

    1. https://oftc.irclog.whitequark.org/aarch64-laptops/2024-07-21#1721536104-1721536124;
    2. https://oftc.irclog.whitequark.org/aarch64-laptops/2024-06-18#33280488;
  7. DrahtBot commented at 3:23 am on February 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36907425995

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  8. DrahtBot added the label CI failed on Feb 9, 2025
  9. eval-exec force-pushed on Feb 9, 2025
  10. eval-exec force-pushed on Feb 9, 2025
  11. DrahtBot removed the label CI failed on Feb 9, 2025
  12. eval-exec force-pushed on Feb 9, 2025
  13. eval-exec requested review from sipa on Feb 9, 2025
  14. eval-exec requested review from maflcko on Feb 9, 2025
  15. in src/random.cpp:201 in ccd8906c22 outdated
    210 
    211 void InitHardwareRand()
    212 {
    213     if (getauxval(AT_HWCAP2) & HWCAP2_RNG) {
    214         g_rndr_supported = true;
    215+        g_rndrrs_supported = VerifyRNDRRS();
    


    sipa commented at 2:38 pm on February 9, 2025:
    You’ll also need to actually use this variable inside GetRNDRRS(), to fall back to GetRNDR() instead.
  16. in src/random.cpp:226 in ccd8906c22 outdated
    221@@ -206,6 +222,9 @@ void ReportHardwareRand()
    222     // from global constructors, before logging is initialized.
    223     if (g_rndr_supported) {
    224         LogPrintf("Using RNDR and RNDRRS as additional entropy sources\n");
    225+        if (!g_rndrrs_supported) {
    226+            LogPrintf("Did trial test by VerifyRNDRRS, RNDRRS failed 1000 times, falling back to RNDR\n");
    


    sipa commented at 2:38 pm on February 9, 2025:
    Nit: perhaps just say “Using RNDR as additional entropy source” here, instead of claiming that RNDRRS is used, but then disabled?

    eval-exec commented at 2:50 pm on February 9, 2025:
    Thank you.
  17. eval-exec force-pushed on Feb 9, 2025
  18. in src/random.cpp:254 in 656aec4af8 outdated
    250@@ -233,22 +251,18 @@ uint64_t GetRNDR() noexcept
    251  */
    252 uint64_t GetRNDRRS() noexcept
    253 {
    254+    if (!g_rndrrs_supported) return GetRNDR();
    


    sipa commented at 2:54 pm on February 9, 2025:
    Even better, use this directly in the only place this function is called: SeedHardwareSlow, instead of the g_rndr_supported that is used there now.

    eval-exec commented at 2:57 pm on February 9, 2025:
    Thank you, Updated.
  19. eval-exec requested review from sipa on Feb 9, 2025
  20. eval-exec force-pushed on Feb 9, 2025
  21. in src/random.cpp:225 in daa1fbe881 outdated
    221     // This must be done in a separate function, as InitHardwareRand() may be indirectly called
    222     // from global constructors, before logging is initialized.
    223-    if (g_rndr_supported) {
    224+    if (g_rndr_supported && g_rndrrs_supported) {
    225         LogPrintf("Using RNDR and RNDRRS as additional entropy sources\n");
    226+    }else if (g_rndr_supported) {
    


    sipa commented at 3:05 pm on February 9, 2025:
    Coding style nit: space after }.
  22. sipa commented at 3:05 pm on February 9, 2025: member
    Code review ACK daa1fbe88171042ae39b257767137c8ba652e046. Please squash your commits.
  23. eval-exec force-pushed on Feb 9, 2025
  24. eval-exec force-pushed on Feb 9, 2025
  25. sipa commented at 4:39 pm on February 9, 2025: member
    utACK 03beb9c0bf2cb4a4022a1f9f94a6093fb4021154
  26. in src/random.cpp:206 in 03beb9c0bf outdated
    201+    int max_retries = 1000;
    202+    do {
    203+        __asm__ volatile("mrs %0, s3_3_c2_c4_1; cset %w1, ne;"
    204+                         : "=r"(test), "=r"(ok)::"cc");
    205+        if (ok) return true;
    206+        __asm__ volatile("yield");
    


    laanwj commented at 8:12 am on February 11, 2025:
    yielding a 1000 timeshares to check if the RNG is working is probably overkill, i’d say if it fails like, 10 times in a row that’s already enough indication we shouldn’t use it?
  27. in src/random.cpp:320 in 03beb9c0bf outdated
    319+            uint64_t out;
    320+            if (g_rndrrs_supported) {
    321+                out = GetRNDRRS();
    322+            } else {
    323+                out = GetRNDR();
    324+            }
    


    laanwj commented at 8:19 am on February 11, 2025:
    As we already use GetRNDR in SeedHardwareFast which is called way more often than SeedHardwareSlow (including in SeedSlow) i’m not sure if calling it here has any additional value. Let’s just skip this loop if rndrrs isn’t supported.

    eval-exec commented at 1:53 pm on February 11, 2025:
    Thank you, updated.

    sipa commented at 2:54 pm on February 11, 2025:
    There is one difference, in that SeedHardwareFast only tries to get 64 bits of entropy, while SeedHardwareSlow tries to get a full (and fresh) 256 bits, if possible. But without the rndrrs instruction, the difference isn’t all that significant (especially as SeedSlow also gets 256 bit of entropy from the OS.
  28. laanwj commented at 8:23 am on February 11, 2025: member

    Concept ACK.

    Please update the PR title to reflect the new set of changes.

  29. eval-exec force-pushed on Feb 11, 2025
  30. eval-exec requested review from laanwj on Feb 11, 2025
  31. eval-exec requested review from sipa on Feb 11, 2025
  32. eval-exec renamed this:
    Limit retries in GetRNDRRS to avoid infinite loop
    Check `GetRNDRRS` is supported in `InitHardwareRand` to avoid infinite loop
    on Feb 11, 2025
  33. eval-exec renamed this:
    Check `GetRNDRRS` is supported in `InitHardwareRand` to avoid infinite loop
    random: Check `GetRNDRRS` is supported in `InitHardwareRand` to avoid infinite loop
    on Feb 11, 2025
  34. laanwj commented at 3:21 pm on February 11, 2025: member
    Code review ACK a2ae62fb4db5e014c03a7ad41ba94ca49fd3806e i do not have RNDR-supporting hardware to test this
  35. in src/random.cpp:318 in a2ae62fb4d outdated
    314@@ -297,6 +315,9 @@ void SeedHardwareSlow(CSHA512& hasher) noexcept {
    315 #elif defined(__aarch64__) && defined(HWCAP2_RNG)
    316     if (g_rndr_supported) {
    317         for (int i = 0; i < 4; ++i) {
    318+            if (!g_rndrrs_supported) {
    


    sipa commented at 3:23 pm on February 11, 2025:
    Nit: the entire loop can be skipped here (replace if (g_rndr_supported) { with if (g_rndrrs_support) {).

    laanwj commented at 3:24 pm on February 11, 2025:
    Eh yes, that would be better.

    eval-exec commented at 3:29 pm on February 11, 2025:
    Yes, Updated.
  36. DrahtBot requested review from sipa on Feb 11, 2025
  37. eval-exec force-pushed on Feb 11, 2025
  38. sipa commented at 3:30 pm on February 11, 2025: member
    Code review ACK 585aba6eec858e5b1411ae9a8684ef8f82a7e435. I have not tested this.
  39. DrahtBot requested review from laanwj on Feb 11, 2025
  40. laanwj commented at 3:34 pm on February 11, 2025: member

    re-ACK 585aba6eec858e5b1411ae9a8684ef8f82a7e435

    i could test it on amazon graviton again like i did the original PR, but only the good-weather case, that won’t reproduce the edge case that led to this.

    Would be good if @giahuy98 (who opened #31817) could test it on the specific target hardware.

  41. giahuy98 commented at 7:00 am on February 12, 2025: none

    re-ACK 585aba6

    i could test it on amazon graviton again like i did the original PR, but only the good-weather case, that won’t reproduce the edge case that led to this.

    Would be good if @giahuy98 (who opened #31817) could test it on the specific target hardware.

    I tested 585aba6 on a Samsung Galaxy S25, and it works. Thanks.

  42. maflcko added this to the milestone 29.0 on Feb 12, 2025
  43. laanwj commented at 10:48 am on February 12, 2025: member

    Checked on amazon r7g.xlarge (Graviton 3):

    02025-02-12T10:07:35Z Bitcoin Core version v28.99.0-585aba6eec85 (release build)
    12025-02-12T10:07:35Z Using the 'arm_shani(1way,2way)' SHA256 implementation
    22025-02-12T10:07:35Z Using RNDR and RNDRRS as additional entropy sources
    

    Also verified that both GetRNDR and GetRNDRRS are still being called.

    As baseline i also built and tested on rpi5 (ARM64 which has no support for RNG). It is still correctly not detected, no invalid instruction errors happen.

    Combined with the report above that the workaround works, this should be safe to merge in 29.0 and maybe even backported to 28.x.

  44. glozow added the label Bug on Feb 13, 2025
  45. glozow assigned achow101 on Feb 13, 2025
  46. glozow requested review from theuni on Feb 13, 2025
  47. in src/random.cpp:195 in 585aba6eec outdated
    191@@ -192,20 +192,38 @@ uint64_t GetRdSeed() noexcept
    192 #elif defined(__aarch64__) && defined(HWCAP2_RNG)
    193 
    194 bool g_rndr_supported = false;
    195+bool g_rndrrs_supported = false;
    


    theuni commented at 6:39 pm on February 13, 2025:
    In case it helps other reviewers… no need for static for these because of the anon namespace.
  48. in src/random.cpp:199 in 585aba6eec outdated
    191@@ -192,20 +192,38 @@ uint64_t GetRdSeed() noexcept
    192 #elif defined(__aarch64__) && defined(HWCAP2_RNG)
    193 
    194 bool g_rndr_supported = false;
    195+bool g_rndrrs_supported = false;
    196+
    197+bool VerifyRNDRRS()
    198+{
    199+    uint8_t ok;
    


    theuni commented at 6:42 pm on February 13, 2025:

    I think ok and test should be initialized to sane values because some sanitizers (and reviewers :p) aren’t clever enough to know if they’re guaranteed to be set by the asm.

    edit: I guess the same goes for the other copy of this.


    eval-exec commented at 1:34 am on February 14, 2025:
    Agree, I will append a commit to fix this.
  49. theuni commented at 7:09 pm on February 13, 2025: member

    Code review ACK 585aba6eec858e5b1411ae9a8684ef8f82a7e435. Untested. A few (non-blocking) notes, assuming this works as-is.

    1. It’s unclear to me under what scenario the call might fail. OpenSSL does the same loop, I assume that’s why it’s done here? I guess it’s pretty unlikely that it passes the <10 failures check at startup, then fails lots of times during execution.
    2. The code duplication is unfortunate. I would’ve preferred to see a single factored-out function that takes a reference to a uint64_t and returns a bool.
    3. gcc (since v10) and clang (since v13) have intrinsics for __rndr and __rndrrs that would imo be preferable.
  50. in src/random.cpp:209 in 585aba6eec outdated
    204+                         : "=r"(test), "=r"(ok)::"cc");
    205+        if (ok) return true;
    206+        __asm__ volatile("yield");
    207+    } while (--max_retries > 0);
    208+    return false;
    209+}
    


    achow101 commented at 8:48 pm on February 13, 2025:
    This function basically duplicates GetRNDRRS(); it should instead be refactored to avoid duplicating.

    eval-exec commented at 1:34 am on February 14, 2025:
    Agreed, I’ll append a commit to refactor to avoid duplication with GetRNDRRS().

    eval-exec commented at 3:05 am on February 15, 2025:
    Thank you, I refactored, could you please take another look?

    achow101 commented at 3:14 am on February 15, 2025:
    I don’t think there’s any need to have these be separate commits, please squash.

    eval-exec commented at 3:36 am on February 15, 2025:
    Squashed.
  51. laanwj commented at 10:08 pm on February 13, 2025: member

    It’s unclear to me under what scenario the call might fail

    The ARM documentation isn’t very specific on that:

    If the hardware returns a genuine random number, PSTATE.NZCV is set to 0b0000.

    If the instruction cannot return a genuine random number in a reasonable period of time, PSTATE.NZCV is set to 0b0100 and the data value returned is 0.

    i imagine there’s some internal conditions under which the RNG module can’t deliver a value quickly enough. What can be concluded is that !ok is a temporary failure, for chips working according to spec. So the normal usage (as we have before this PR, and in OpenSSL and such) is to keep trying forever. This simplifies the code paths, so that callers don’t have to handle a “randomness failed” case.

    The broken chips always return !ok for RNDRRS, the reason for retrying 10 times in the detection loop is to prevent false positives. That is, a working one might fail temporarily, but not for (say) 10 consecutive timeshares, so it won’t be detected as broken.

    gcc (since v10) and clang (since v13) have intrinsics for __rndr and __rndrrs that would imo be preferable.

    Yes. But that’s imo out of scope for this workaround PR.

  52. in src/random.cpp:210 in 585aba6eec outdated
    220 {
    221     // This must be done in a separate function, as InitHardwareRand() may be indirectly called
    222     // from global constructors, before logging is initialized.
    223-    if (g_rndr_supported) {
    224+    if (g_rndr_supported && g_rndrrs_supported) {
    225         LogPrintf("Using RNDR and RNDRRS as additional entropy sources\n");
    


    Sjors commented at 2:33 pm on February 14, 2025:
    It seems neither of these are available on Apple Silicon (HWCAP2_RNG isn’t defined), so not much to test here.
  53. theuni commented at 8:46 pm on February 14, 2025: member

    gcc (since v10) and clang (since v13) have intrinsics for __rndr and __rndrrs that would imo be preferable.

    Yes. But that’s imo out of scope for this workaround PR.

    For sure out of scope, yes, I didn’t mean to imply that should be done here.

  54. achow101 referenced this in commit 254fd89d39 on Feb 14, 2025
  55. eval-exec force-pushed on Feb 15, 2025
  56. eval-exec force-pushed on Feb 15, 2025
  57. eval-exec requested review from achow101 on Feb 15, 2025
  58. eval-exec requested review from theuni on Feb 15, 2025
  59. eval-exec requested review from Sjors on Feb 15, 2025
  60. eval-exec force-pushed on Feb 15, 2025
  61. In `InitHardwareRand`, do trail test for `RNDRRS` by `VerifyRNDRRS`
    Signed-off-by: Eval EXEC <execvy@gmail.com>
    09b150bb8a
  62. eval-exec force-pushed on Feb 15, 2025
  63. luke-jr referenced this in commit 06c0746152 on Feb 15, 2025
  64. sipa commented at 2:04 pm on February 17, 2025: member
    utACK 09b150bb8adae00854f02ece69fc6ef222fb07d9
  65. achow101 commented at 2:15 am on February 19, 2025: member
    ACK 09b150bb8adae00854f02ece69fc6ef222fb07d9
  66. achow101 merged this on Feb 19, 2025
  67. achow101 closed this on Feb 19, 2025

  68. in src/random.cpp:201 in 09b150bb8a
    196 
    197 void InitHardwareRand()
    198 {
    199     if (getauxval(AT_HWCAP2) & HWCAP2_RNG) {
    200         g_rndr_supported = true;
    201+        g_rndrrs_supported = VerifyRNDRRS();
    


    maflcko commented at 9:41 am on February 19, 2025:

    I don’t think it is possible in C++ to use a function before the function signature has been given to the compiler.

    I fail to see how this compiles on any compiler. See https://godbolt.org/z/cYPda9784

    Locally I get:

    0[ 40%] Building CXX object src/util/CMakeFiles/bitcoin_util.dir/__/random.cpp.o
    1/__w/b-c-nightly/b-c-nightly/src/random.cpp:201:30: error: use of undeclared identifier 'VerifyRNDRRS'
    2  201 |         g_rndrrs_supported = VerifyRNDRRS();
    3      |                              ^
    

    (clang)


    eval-exec commented at 9:54 am on February 19, 2025:
    Thanks for pointing it out. I’ve submitted a fix https://github.com/bitcoin/bitcoin/pull/31902

    glozow commented at 3:25 pm on February 19, 2025:
    Another option would be to revert this.
  69. eval-exec referenced this in commit 82f3d89c6d on Feb 19, 2025
  70. darosior referenced this in commit 3e9b12b3e0 on Feb 19, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-02-22 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me