Several randomness improvements #29625

pull sipa wants to merge 20 commits into bitcoin:master from sipa:202403_rand_rework changing 31 files +738 −554
  1. sipa commented at 7:36 pm on March 11, 2024: member

    This PR contains a number of vaguely-related improvements to the random module.

    The specific changes and more detailed rationale is in the commit messages, but the highlights are:

    • XoRoShiRo128PlusPlus (previously a test-only RNG) moves to random.h and becomes InsecureRandomContext, which is even faster than FastRandomContext but non-cryptographic. It also gets all helper randomness functions (randrange, fillrand, …), making it a lot more succinct to use.
    • During tests, all randomness is made deterministic (except for GetStrongRandBytes) but non-repeating (like GetRand() used to be when g_mock_deterministic_tests was used), either fixed, or from a random seed (overridden by env var).
    • Several infrequently used top-level functions (GetRandMillis, GetRandMicros, GetExponentialRand) are converted into member functions of FastRandomContext (and InsecureRandomContext).
    • GetRand<T>() (without argument) can now return the maximum value of the type (previously e.g. GetRand<uint32_t>() would never return 0xffffffff).
  2. DrahtBot commented at 7:36 pm on March 11, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK theuni
    Stale ACK dergoegge, EthanHeilman

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30277 ([DO NOT MERGE] Erlay: bandwidth-efficient transaction relay protocol (Full implementation) by sr-gi)
    • #30202 (netbase: extend CreateSock() to support creating arbitrary sockets by vasild)
    • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)
    • #29543 (refactor: Avoid unsigned integer overflow in script/interpreter.cpp by hebasto)
    • #29536 (fuzz: fuzz connman with non-empty addrman + ASMap by brunoerg)
    • #29480 (Drop log category in SeedStartup by hebasto)
    • #26114 (net: Make AddrFetch connections to fixed seeds by mzumsande)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label CI failed on Mar 12, 2024
  4. DrahtBot commented at 4:41 am on March 12, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/22530315845

  5. dergoegge commented at 10:09 am on March 12, 2024: member

    Concept ACK

    There are a few places in the fuzz tests where this will allow to easily replace FastRandomContext with a InsecureRandomContext, which is beneficial for performance (e.g. the addrman harnesses partially fill the addrman with addresses from rng) and we don’t need cryptographic rng there anyway.

    During tests, all randomness is made deterministic

    Great, this should help with #29018

  6. Sjors commented at 3:11 pm on March 12, 2024: member
    What’s the impact on the fuzz corpus of switching to a different (?) deterministic RNG?
  7. dergoegge commented at 3:53 pm on March 12, 2024: member

    What’s the impact on the fuzz corpus of switching to a different (?) deterministic RNG?

    I would expect that switching to a different rng should have no meaningful effect on the corpus itself. The corpus for a particular harness might change but the coverage for the code we intend to test should remain the same. This is because using rng in a fuzz harness only makes sense in very rare cases. It should never be used in a way that can significantly affect the coverage reached, otherwise there is no point in using a coverage-guided fuzzer, we could just pipe /dev/random to our harnesses.

    For example, if we need to populate some data that we don’t really expect to have an impact on the thing we are testing, we might use rng instead of consuming from the fuzz input (we do this in the p2p transport harnesses to fill message contents, which are essentially irrelevant to the transport logic).

    Switching to deterministic rng can cause a corpus’ coverage to grow because coverage-guided feedback loops start working more reliably when the code under test is deterministic. This can vary from harness to harness, but we’ve seen coverage-guided fuzzers find bugs once we’ve improved on determinism.

  8. sipa force-pushed on Mar 12, 2024
  9. sipa force-pushed on Mar 12, 2024
  10. sipa force-pushed on Mar 13, 2024
  11. sipa force-pushed on Mar 13, 2024
  12. sipa force-pushed on Mar 13, 2024
  13. sipa force-pushed on Mar 13, 2024
  14. sipa force-pushed on Mar 13, 2024
  15. sipa force-pushed on Mar 13, 2024
  16. sipa force-pushed on Mar 13, 2024
  17. sipa force-pushed on Mar 13, 2024
  18. sipa force-pushed on Mar 14, 2024
  19. sipa force-pushed on Mar 14, 2024
  20. sipa force-pushed on Mar 14, 2024
  21. sipa force-pushed on Mar 14, 2024
  22. sipa force-pushed on Mar 14, 2024
  23. sipa force-pushed on Mar 14, 2024
  24. sipa force-pushed on Mar 14, 2024
  25. sipa force-pushed on Mar 14, 2024
  26. sipa force-pushed on Mar 14, 2024
  27. sipa commented at 11:21 pm on March 14, 2024: member
    I’m baffled by what is happening here. In the fuzz test CI job, the data directory gets deleted in between two fuzz input runs??
  28. maflcko commented at 8:57 am on March 15, 2024: member

    This is expected, because all fuzz tests in this pull request branch use the same datadir (5a19fd2bc079272a9b138710c6e50c5ea512c1a0736333c9817c4df1d77c8518). Maybe the randomness is too deterministic and returns a constant?

    You can double check with something like:

     0diff --git a/src/test/util/setup_common.cpp b/src/test/util/setup_common.cpp
     1index 2c18184261..422f495246 100644
     2--- a/src/test/util/setup_common.cpp
     3+++ b/src/test/util/setup_common.cpp
     4@@ -153,6 +153,7 @@ BasicTestingSetup::BasicTestingSetup(const ChainType chainType, const std::vecto
     5         const auto rand_str{g_insecure_rand_ctx_temp_path.rand256().ToString()};
     6         m_path_root = fs::temp_directory_path() / "test_common_" PACKAGE_NAME / rand_str;
     7         TryCreateDirectories(m_path_root);
     8+        Assert(util::LockDirectory(m_path_root, ".lock", /*probe_only=*/false) == util::LockResult::Success) ;
     9     } else {
    10         // Custom data directory
    11         m_has_custom_datadir = true;
    

    which will print:

    0Run addrman_serdeser with args ['/bitcoin-core/src/test/fuzz/fuzz', '-runs=1', PosixPath('folder_3/addrman_serdeser')]test/util/setup_common.cpp:156 BasicTestingSetup: Assertion `util::LockDirectory(m_path_root, ".lock", false) == util::LockResult::Success' failed.
    

    Also, the issue should happen locally as well, not only on CI. Something like ./test/fuzz/test_runner.py -j 9 -l DEBUG ./folder_temp reproduces it for me.

    Edit:

    025d4e22d8b25c33f60b91193673336e4ad1c9406 is the first bad commit
    
  29. sipa force-pushed on Mar 15, 2024
  30. sipa commented at 12:55 pm on March 15, 2024: member
    @maflcko Thank you! I suspect that is it. I indeed made all fuzz test use the same deterministic seed. I’ve dropped that change now.
  31. DrahtBot removed the label CI failed on Mar 15, 2024
  32. sipa force-pushed on Mar 15, 2024
  33. sipa force-pushed on Mar 15, 2024
  34. sipa force-pushed on Mar 15, 2024
  35. DrahtBot added the label CI failed on Mar 15, 2024
  36. DrahtBot commented at 7:04 pm on March 15, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/22719690018

  37. DrahtBot removed the label CI failed on Mar 15, 2024
  38. sipa commented at 3:50 pm on March 17, 2024: member
    Ready for review.
  39. DrahtBot added the label Needs rebase on Mar 19, 2024
  40. sipa force-pushed on Mar 19, 2024
  41. sipa force-pushed on Mar 19, 2024
  42. DrahtBot removed the label Needs rebase on Mar 19, 2024
  43. sipa commented at 1:23 pm on March 23, 2024: member

    @Sjors

    What’s the impact on the fuzz corpus of switching to a different (?) deterministic RNG?

    Let’s break it down into cases:

    • The main “random.h” RNG (GetRand() and friends, default-constructed FastRandomContext objects, …). This used to be truly random in fuzz tests, and will now be deterministic with a fixed seed. In theory this should have no impact, because the fuzz tests shouldn’t be relying on this randomness in the first place. But possibly there are some which do, indirectly through code that wasn’t properly mocked or otherwise avoiding it, in which case making things deterministic should be a strict improvement by not making the fuzzer waste time on chasing the effects of that randomness.
    • XoRoShiRo128PlusPlus. There are a few fuzz tests (bip324, crypto_chacha20, p2p_transport_serialization, poolresource) that use this very fast deterministic RNG to construct certain data. This PR changes the behavior of some of them by replacing ad-hoc code to use that randomness with general helper functions that become available for all RNGs. In theory this might invalidate part of the fuzz corpus for those tests, but in practice I expect it won’t, because the data drawn from those RNGs is data that shouldn’t matter for the test much (if it did, it’d be drawn from the fuzz input instead).

    So overall, it might invalidate a few tests’ corpus (but probably not), and for others it should either have no effect or be a strict improvement.

  44. EthanHeilman commented at 10:10 pm on April 5, 2024: contributor
    @sipa I plan to do a review of this next week
  45. in src/net_processing.cpp:5506 in 1779a37972 outdated
    5502@@ -5501,7 +5503,7 @@ void PeerManagerImpl::MaybeSendFeefilter(CNode& pto, Peer& peer, std::chrono::mi
    5503             MakeAndPushMessage(pto, NetMsgType::FEEFILTER, filterToSend);
    5504             peer.m_fee_filter_sent = filterToSend;
    5505         }
    5506-        peer.m_next_send_feefilter = GetExponentialRand(current_time, AVG_FEEFILTER_BROADCAST_INTERVAL);
    5507+        peer.m_next_send_feefilter = current_time + FastRandomContext().rand_expo_duration(AVG_FEEFILTER_BROADCAST_INTERVAL);
    


    dergoegge commented at 9:18 am on April 30, 2024:

    #28558 made PeerManager own a FastRandomContext, so we could (should?) use m_rng here instead (otherwise PeerManager::Options::deterministic_rng still only applies to some of the randomness).

    Since this PR kind of makes individual “make this component deterministic” options redudant, we could consider reverting #28558 (not necessarily in this PR)?

    I was thinking that in the long run we could break the dependencies between components and the specific rng they use (maybe something like template<RandomNumberGenerator R> class PeerManager { ... }), which would allow more fine grained mocking than a global “make rng deterministic” in tests (e.g. we could have a “rng” type that consumes from a FuzzedDataProvider). I guess this can be done by using globals as well.


    sipa commented at 7:55 pm on April 30, 2024:

    #28558 made PeerManager own a FastRandomContext, so we could (should?) use m_rng here instead (otherwise PeerManager::Options::deterministic_rng still only applies to some of the randomness).

    I’ve changed the PR to reuse PeerManagerImpl::m_rng.

    Since this PR kind of makes individual “make this component deterministic” options redudant, we could consider reverting #28558 (not necessarily in this PR)?

    Maybe. I’ve opted to use it where possible for now as it’s a smaller change, and has some (possibly negligible) performance advantage (no need to lock the global RNG mutex to get randomness when you already hold g_msgproc_mutex), but I think that can be reconsidered.

    Independently, we may be able to just drop PeerManager::Options::deterministic_rng, relying on global deterministic mode instead.

    I was thinking that in the long run we could break the dependencies between components and the specific rng they use (maybe something like template<RandomNumberGenerator R> class PeerManager { ... }), which would allow more fine grained mocking than a global “make rng deterministic” in tests (e.g. we could have a “rng” type that consumes from a FuzzedDataProvider). I guess this can be done by using globals as well.

    Maybe, though that means testing something very different from what we’re doing here: testing under conditions where the RNG returns actually decidedly non-random results (which is different from a deterministic FastRandomContext which is still cryptographically-strong, just deterministic. I don’t know for how many things this makes sense.

  46. sipa force-pushed on Apr 30, 2024
  47. DrahtBot added the label CI failed on Apr 30, 2024
  48. DrahtBot removed the label CI failed on May 2, 2024
  49. dergoegge approved
  50. dergoegge commented at 12:38 pm on May 20, 2024: member
    Code review ACK 636e4864516782b7e6e33be2b2ec7743afaae8ba
  51. in src/random.h:137 in 91f0e3cc55 outdated
    140- * is completely deterministic and does not gather more entropy after that.
    141+// Forward declaration of RandomMixin, used in RandomNumberGenerator concept.
    142+template<typename T>
    143+class RandomMixin;
    144+
    145+/** A concept for RandomMixin-based random number generators. */
    


    theuni commented at 4:32 pm on May 28, 2024:
    Neat :)
  52. in src/random.h:227 in f9bf239a93 outdated
    218@@ -219,6 +219,40 @@ class RandomMixin
    219         return ret & ((uint64_t{1} << bits) - 1);
    220     }
    221 
    222+    /** Same as above, but with compile-time fixed bits count. */
    223+    template<int Bits>
    224+    uint64_t randbits() noexcept
    225+    {
    226+        static_assert(Bits >= 0 && Bits <= 64);
    227+        if constexpr (Bits == 64) {
    


    theuni commented at 6:24 pm on May 28, 2024:
    Add a quick zero-case here too?

    sipa commented at 8:06 pm on May 28, 2024:
    I think I decided that the compiler should be smart enough to optimize the final “else” case with Bits == 0 to effectively a return 0;, but it’s more obvious to just make it explicit. Done.
  53. in src/random.h:283 in 89fa561ab9 outdated
    271+        while (span.size() >= 8) {
    272+            uint64_t gen = Impl().rand64();
    273+            WriteLE64(UCharCast(span.data()), gen);
    274+            span = span.subspan(8);
    275+        }
    276+        while (span.size()) {
    


    theuni commented at 6:39 pm on May 28, 2024:
    Why not do a rand32() if size() >= 4 first?

    sipa commented at 8:06 pm on May 28, 2024:
    Done.
  54. in src/net_processing.cpp:516 in 8504bfc5dc outdated
    507@@ -508,7 +508,7 @@ class PeerManagerImpl final : public PeerManager
    508         EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_recent_confirmed_transactions_mutex, !m_most_recent_block_mutex, g_msgproc_mutex);
    509 
    510     /** Implement PeerManager */
    511-    void StartScheduledTasks(CScheduler& scheduler) override;
    512+    void StartScheduledTasks(CScheduler& scheduler) override EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
    


    theuni commented at 6:59 pm on May 28, 2024:

    Unrelated change?

    Edit: Or because of use of m_rng?


    sipa commented at 7:44 pm on May 28, 2024:
    It appeared to be necessary, IIRC.

    maflcko commented at 9:32 am on June 15, 2024:

    It appeared to be necessary, IIRC.

    Yes, it is required. However, adding the annotation will only silence the compiler warning.

    FastRandomContext is not thread safe, so the lock will actually have to be taken.

    Putting an EXCLUSIVE_LOCKS_REQUIRED only into the c++ file and leaving it out from the header file will not work, because the compiler has no way to see the annotation outside of the module. For example, it can not be seen from init.cpp.


    sipa commented at 12:52 pm on June 15, 2024:
    Ok, apparently these annotations are not nearly as smart as I assumed they were!

    theuni commented at 4:38 pm on June 15, 2024:

    Lol, I even spent a few hours looking directly at this and trying to figure out if this could be a threading problem.

    🤦

    I suppose I trusted the annotations so much I didn’t even notice that the code didn’t take the lock. That’s embarrassing!

    Looks like this is our issue.

    I’ll give that patched branch a shot next week and see if it turns up anything else in our codebase. Will chime in upstream with a +1 as well (assuming it works).

  55. theuni commented at 7:13 pm on May 28, 2024: member

    Concept ACK and first-pass code review ACK excluding tests.

    I don’t feel qualified to review the rand_expo_duration changes.

    I manually verified the randbits impls with some local tests as the shifts weren’t intuitive to me at first glance.

  56. in src/random.h:407 in 89fa561ab9 outdated
    362@@ -351,19 +363,19 @@ class FastRandomContext : public RandomMixin<FastRandomContext>
    363         return ReadLE64(UCharCast(buf.data()));
    364     }
    365 
    366-    /** Fill a byte Span with random bytes. */
    367+    /** Fill a byte Span with random bytes. This overrides the RandomMixin version. */
    


    theuni commented at 7:17 pm on May 28, 2024:

    This is kinda sneaky and surprising (to me at least) to see with CRTP.

    It would be nice to specify as part of the concept which functions are not allowed to be overridden, but sadly std::experimental::detected_t is.. well.. experimental.


    sipa commented at 7:29 pm on May 28, 2024:

    Well being able to ad-hoc override individual functions, while still being able to have non-overridden functions be able to refer to it, is sort of the point of CRTP (virtual functions have the same behavior, but CRTP gives it without runtime overhead). If we didn’t want this ability, it’d be possible to just have a template<typename R> class RNGUtility { R m_int; ...}, where there are low-level ChaCha20RNG and XoRoShiRo128PlusPlusRNG classes, and high-level using FastRandomContext = RNGUtility<ChaCha20RNG>; and using InsecureRandomContext = RNGUtility<XoRoShiRo128PlusPlusRNG>.

    Being able to selectively mark certain functions as non-overridable would be nice of course, but I don’t see a way to do that currently.

  57. sipa force-pushed on May 28, 2024
  58. bitcoin deleted a comment on May 28, 2024
  59. DrahtBot added the label CI failed on May 28, 2024
  60. maflcko commented at 2:02 pm on May 29, 2024: member
     0 node0 stderr random.h:218:26: runtime error: shift exponent -23 is negative
     1    [#0](/bitcoin-bitcoin/0/) 0x61cb8c410ee5 in RandomMixin<FastRandomContext>::randbits(int) src/./random.h:218:26
     2    [#1](/bitcoin-bitcoin/1/) 0x61cb8c460b2c in long RandomMixin<FastRandomContext>::randrange<long>(long) src/./random.h:270:35
     3    [#2](/bitcoin-bitcoin/2/) 0x61cb8c460b2c in std::chrono::duration<long, std::ratio<1l, 1000l>> RandomMixin<FastRandomContext>::randrange<std::chrono::duration<long, std::ratio<1l, 1000l>>>(std::common_type<std::chrono::duration<long, std::ratio<1l, 1000l>>>::type) src/./random.h:352:27
     4    [#3](/bitcoin-bitcoin/3/) 0x61cb8c460b2c in (anonymous namespace)::PeerManagerImpl::StartScheduledTasks(CScheduler&) src/net_processing.cpp:2080:38
     5    [#4](/bitcoin-bitcoin/4/) 0x61cb8c340ea6 in AppInitMain(node::NodeContext&, interfaces::BlockAndHeaderTipInfo*) src/init.cpp:1980:37
     6    [#5](/bitcoin-bitcoin/5/) 0x61cb8c2fa474 in AppInit(node::NodeContext&) src/bitcoind.cpp:227:43
     7    [#6](/bitcoin-bitcoin/6/) 0x61cb8c2fa474 in main src/bitcoind.cpp:273:10
     8    [#7](/bitcoin-bitcoin/7/) 0x77f2e3bd21c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 4d9090d61bf70e6b3225d583f0f08193f54670b2)
     9    [#8](/bitcoin-bitcoin/8/) 0x77f2e3bd228a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 4d9090d61bf70e6b3225d583f0f08193f54670b2)
    10    [#9](/bitcoin-bitcoin/9/) 0x61cb8c21cfe4 in _start (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/bitcoind+0xa1dfe4) (BuildId: 478a61c7d2f8deae95e82eb455a5b93dcb36ffa7)
    11SUMMARY: UndefinedBehaviorSanitizer: invalid-shift-exponent random.h:218:26 
    
  61. sipa commented at 4:12 pm on May 29, 2024: member
    @maflcko That makes absolutely no sense. The shift value is bits - bitbuf_size, and it’s inside a branch on conditional bits > bitbuf_size…?!
  62. in src/random.h:192 in 84dcdac575 outdated
    212+        other.bitbuf_size = 0;
    213+    }
    214 
    215-    /** Generate a random 64-bit integer. */
    216-    uint64_t rand64() noexcept
    217+    RandomMixin& operator=(RandomMixin&& other) noexcept
    


    theuni commented at 9:12 pm on May 29, 2024:

    While you’re at it, how about replacing these move assignments with Reseed() functions?

    I find the idea of moving an rng to be unintuitive. And as far as I can tell, every case of

    RandomMixin = foo and FastRandomContext = foo and InsecureRandomContext = foo (all in tests) really just means “reseed”.


    sipa commented at 2:39 pm on May 31, 2024:
    @theuni Nice idea, done.
  63. in src/random.h:218 in 6ce883eb1f outdated
    214@@ -215,6 +215,7 @@ class RandomMixin
    215             // number becomes the new bitbuf.
    216             uint64_t gen = Impl().rand64();
    217             ret = (gen << bitbuf_size) | bitbuf;
    218+            assert(bits >= bitbuf_size);
    


    maflcko commented at 11:06 am on May 30, 2024:
    0 node1 stderr bitcoind: random.h:218: uint64_t RandomMixin<FastRandomContext>::randbits(int) [T = FastRandomContext]: Assertion `bits >= bitbuf_size' failed. 
    

    I guess this is a compiler bug. I’ll try to take a look here.


    theuni commented at 3:32 pm on May 30, 2024:
    I tried to track down if/how it could be a race condition, but given that it’s in init and all locks seems to be in place, I don’t see how that could possibly be the case.

    maflcko commented at 6:52 am on May 31, 2024:
    Actually, I can’t reproduce it so far. So a corrupt CI machine is more likely.

    maflcko commented at 1:01 pm on May 31, 2024:
    Clearing the cache helped, for some reason. (Adding the assert should have already cleared the ccache entry for this translation unit, but now it seems to work :shrug: )

    sipa commented at 10:34 pm on May 31, 2024:
    @maflcko Thanks, problem seems fixed indeed!
  64. sipa force-pushed on May 31, 2024
  65. DrahtBot removed the label CI failed on May 31, 2024
  66. in src/random.h:200 in 309f00b594 outdated
    204-            return Impl().rand64() >> (64 - bits);
    205-        } else {
    206-            if (bitbuf_size < bits) FillBitBuffer();
    207-            uint64_t ret = bitbuf & (~uint64_t{0} >> (64 - bits));
    208+        // Requests for the full 64 bits are passed through.
    209+        if (bits == 64) return Impl().rand64();
    


    achow101 commented at 1:54 am on June 5, 2024:

    In 309f00b59485454117416dcfcae37e2c675069b7 “random: Improve RandomMixin::randbits”

    Should probably include bits > 64 here since it seems like larger numbers would be problematic below.

    0        if (bits >= 64) return Impl().rand64();
    

    Or maybe just assert that bits is in range.


    sipa commented at 1:32 pm on June 5, 2024:
    I’ve added an Assume that bits is in range.


    sipa commented at 1:24 pm on June 8, 2024:
    Seems I did not. Done now.
  67. achow101 commented at 2:17 am on June 5, 2024: member
    Reviewed up to 64bcc828cde51a5da551b49a31b9fe44e0a255c3
  68. in src/random.cpp:678 in 0b92cd2ae4 outdated
    673-    std::vector<B> ret(len);
    674-    fillrand(MakeWritableByteSpan(ret));
    675-    return ret;
    676-}
    677-template std::vector<unsigned char> FastRandomContext::randbytes(size_t);
    678-template std::vector<std::byte> FastRandomContext::randbytes(size_t);
    


    maflcko commented at 11:16 am on June 5, 2024:

    nit in 0b92cd2ae4a317be4fe0dfd208017a7d05d1eb2a:

    Looks like this commit is doing several things at once. It may be easier to review if move-only stuff (like moving rand256 and randbytes function bodies to the header file, and removing the explicit template instantiation) was done in a separate prepare commit?


    sipa commented at 12:25 pm on June 8, 2024:
    I’ve added a few preparatory commits before this one, please have a look.
  69. in src/random.h:144 in 0b92cd2ae4 outdated
    149+    { rng.rand64() } noexcept -> std::same_as<uint64_t>;
    150+    // A random number generator must provide randfill(Span<std::byte>).
    151+    { rng.fillrand(s) } noexcept;
    152+    // A random number generator must derive from RandomMixin, which adds other rand* functions.
    153+    requires std::derived_from<std::remove_reference_t<T>, RandomMixin<std::remove_reference_t<T>>>;
    154+};
    


    maflcko commented at 11:24 am on June 5, 2024:

    nit in 0b92cd2ae4a317be4fe0dfd208017a7d05d1eb2a: Not sure I understand this concept. Is there anything that is not already enforced normally in C++ that would be enforced by this concept?

    For example, static_cast in the Impl() method, which enforces this concept should already check for inheritance:

    0<source>:17:40: error: static_cast from 'Mixin<FRC> *' to 'FRC *', which are not related by inheritance, is not allowed
    

    Conversely, if the class wasn’t derived from RandomMixin, the RandomMixin::Impl() method wouldn’t be called and this check would be skipped anyway.

    Even when the code is rewritten to C++23 using __cpp_explicit_this_parameter, the same should hold.

    Similarly, calling rand64, when it is not provided should also result in a compile failure without this concept.

    I guess it enforces the function signature of rand64 is noexcept?


    sipa commented at 1:47 pm on June 5, 2024:
    Well in my view, concepts are really a kind of compiler-enforced documentation. Their primary purpose is giving a readable compiler error when template instantiation ought to be failing, but often also would be failing anyway. In this case, someone implementing a new RNG class FancyRNG : public RandomMixin<FancyRNG> but say missing rand64, would instead of getting an error in the instantiation of RandomMixin get an error message that references RandomNumberGenerator, which then explicitly lists what one is required to do to make it work.

    maflcko commented at 2:59 pm on June 5, 2024:
    Ok, fair enough.
  70. in src/random.h:447 in 6370a51aac outdated
    383+
    384+    constexpr explicit XoRoShiRo128PlusPlus(uint64_t seedval) noexcept
    385+        : m_s0(SplitMix64(seedval)), m_s1(SplitMix64(seedval)) {}
    386+
    387+    // no copy - that is dangerous, we don't want accidentally copy the RNG and then have two streams
    388+    // with exactly the same results. If you need a copy, call copy().
    


    maflcko commented at 1:08 pm on June 5, 2024:
    unrelated q in 6370a51aac06a430ec53f29fe03f5bbd08371b34: Can you explain where one finds this “copy()” method?

    sipa commented at 1:28 pm on June 8, 2024:
    It was added in #26153, with a commit originally taken from #25325. I assume it originally had an explicit copy() member function, which was removed before merge. I’ve added a commit to remove it.
  71. sipa force-pushed on Jun 5, 2024
  72. sipa force-pushed on Jun 5, 2024
  73. maflcko commented at 1:38 pm on June 5, 2024: member

    e7676b6c52672f557f0e2b036d5e877393f1ce46 🎩

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: e7676b6c52672f557f0e2b036d5e877393f1ce46 🎩
    3cxzra7fTi/Rsr6+8RvbXnVktwr1MJ/eJJsVn5hm1X2uUnnz3rNYpzIbtPyGWi9dcTAB7fkupq9pQeUC8sQG4AQ==
    
  74. in src/random.cpp:52 in c099cd2e3c outdated
    44@@ -45,13 +45,15 @@
    45 #include <sys/auxv.h>
    46 #endif
    47 
    48-[[noreturn]] static void RandFailure()
    49+namespace {
    50+
    51+[[noreturn]] void RandFailure()
    52 {
    53     LogPrintf("Failed to read randomness, aborting\n");
    


    maflcko commented at 1:39 pm on June 5, 2024:
    nit in c099cd2e3c20fc416e3f7fc1da4798bb8042af1c: Could increase the severity to LogError() while touching this?

    sipa commented at 12:26 pm on June 8, 2024:
    Done, in an additional commit.
  75. sipa force-pushed on Jun 8, 2024
  76. sipa force-pushed on Jun 8, 2024
  77. sipa commented at 1:29 pm on June 8, 2024: member
    I’ve rebased on top of the now-merged #30161, and believe I’ve addressed outstanding comments.
  78. in src/net.cpp:3479 in 462cca90c4 outdated
    3474@@ -3475,7 +3475,8 @@ std::vector<CAddress> CConnman::GetAddresses(CNode& requestor, size_t max_addres
    3475         // nodes to be "terrible" (see IsTerrible()) if the timestamps are older than 30 days,
    3476         // max. 24 hours of "penalty" due to cache shouldn't make any meaningful difference
    3477         // in terms of the freshness of the response.
    3478-        cache_entry.m_cache_entry_expiration = current_time + std::chrono::hours(21) + GetRandMillis(std::chrono::hours(6));
    3479+        cache_entry.m_cache_entry_expiration = current_time +
    3480+            21h + FastRandomContext().randrange<std::chrono::microseconds>(6h);
    


    EthanHeilman commented at 11:14 pm on June 8, 2024:
    Is the change from milliseconds to microseconds intentional?

    sipa commented at 1:09 pm on June 9, 2024:
    Yes, because m_cache_entry_expiration is expressed in microseconds anyway; no need to reduce accuracy. I’ve moved the behavior change to a separate commit.
  79. in src/random.h:360 in 462cca90c4 outdated
    473+     * certain type of message) to minimize leaking information to observers.
    474+     *
    475+     * The probability of an event occurring before time x is 1 - e^-(x/a) where a
    476+     * is the average interval between events.
    477+     * */
    478+    std::chrono::microseconds rand_expo_duration(std::chrono::microseconds mean) noexcept
    


    EthanHeilman commented at 11:22 pm on June 8, 2024:
    Nit: Why not rand_exp_duration instead? It is one character shorter and exp reads more clearly to me as exponential than expo?

    sipa commented at 1:10 pm on June 9, 2024:
    I think I was inspired by Python’s random.expovariate, but I agree “exp” is much more naturally associated with “exponential”. Done.
  80. in src/random.cpp:349 in 462cca90c4 outdated
    347@@ -345,64 +348,6 @@ static void Strengthen(const unsigned char (&seed)[32], SteadyClock::duration du
    348 }
    349 #endif
    350 
    351-/** Get 32 bytes of system entropy. */
    352-void GetOSRand(unsigned char *ent32)
    


    EthanHeilman commented at 11:38 pm on June 8, 2024:
    As far as I can tell GetOSRand is unchanged in this PR. Why move below the class RNGState declaration?

    sipa commented at 12:24 pm on June 9, 2024:
    This is done in commit e5658e7a40f03b0b07b1f83f22a9d834bc8fd5c6 which only improves the code organization. Specifically here the reason for moving is to have all the non-exposed functions in random.cpp within one anonymous namespace at the start of the file, and all public one below.

    EthanHeilman commented at 2:24 pm on June 9, 2024:
    That makes sense. This improved organization is worth it.
  81. in src/random.cpp:421 in 462cca90c4 outdated
    416     /** Extract up to 32 bytes of entropy from the RNG state, mixing in new entropy from hasher.
    417      *
    418      * If this function has never been called with strong_seed = true, false is returned.
    419+     *
    420+     * If allow_deterministic is true, and MakeDeterministic has been called before, output
    421+     * from the deterministic PRNG is output instead.
    


    EthanHeilman commented at 11:49 pm on June 8, 2024:
    This confused me because MixExtract is a deterministic function based on hasher, m_counter, and m_state. Unless I am mistaken the non-determinism comes from the calling function determining on the value of ret. Why not address strengthen ret value in the calling function rather than here?

    EthanHeilman commented at 11:51 pm on June 8, 2024:
    0     * If allow_deterministic is true, and MakeDeterministic has been called before, output
    1     * from the deterministic PRNG.
    

    sipa commented at 12:37 pm on June 9, 2024:

    I see how this is confusing; MixExtract is indeed a deterministic function, and arguably, all of RNGState is deterministic when considered in isolation.

    The “determinism” referred here is the introduction of a new deterministic mode for the entire RNG (module-wide, including seeding/initializations/hardware, …). In order to do that, a separate RNG is embedded within RNGState which is only ever explicitly initialized, and if it is, MixExtract can be asked to draw its output from there instead of from its normal SHA512-based mechanism. The entropy fed to MixExtract is still processed normally in this case because for some purposes (cryptographic keys) I think we always want normal high-quality randomness for belt-and-suspenders reasons. Finally, all of this is done inside MixExtract instead of elsewhere so that the same lock can be used for everything (RNGState::m_mutex), meaning the performance penalty of introducing this mechanism is limited to the if (allow_deterministic && m_deterministic_prng.has_value()) conditional, which involves no atomics or locks.

    So really I think we want to call the RNG-wide name for this feature “deterministic mode” (e.g. at ProcRand), but maybe using the same name at the level of RNGState is confusing? Would it help to call it call it something else?


    sipa commented at 1:10 pm on June 9, 2024:
    Done (I left the “instead” in place).

    EthanHeilman commented at 2:38 pm on June 9, 2024:
    1. If I understand, you don’t want to replace MixExtract entirely when the deterministic test RNG is used, because when MixExtract to still run and add/mix entropy in case someone decides to generate a keying material while the deterministic test RNG is on. This was what I was missing.

    2. I like the name deterministic mode because it makes it more clear this is a RNG wide change.

    3. Thought I had after waking up this morning was to change this to bool fail_on_deterministic_mode. Thus if someone had the deterministic test RNG activated and then tried to generate a key it would throw an error because fail_on_deterministic_mode=true. That said, I’m not sure this works with the current approach since if someone calls SeedStrengthen with the deterministic test RNG activated it would fail and it is reasonable to call SeedStrengthen in tests/fuzzing. Edit: You make a similar point in the next comment. I don’t think my proposed change here is worth making in this PR.


    sipa commented at 3:10 pm on June 9, 2024:
    How about having an always_use_real_rng bool instead, with meaning !allow_deterministic. That means the “deterministic” terminology would be more clearly referring to the mode (is the deterministic PRNG enabled), but not to the produced numbers.
  82. in src/random.cpp:521 in 462cca90c4 outdated
    518+void SeedStrengthen(CSHA512& hasher, RNGState& rng, SteadyClock::duration dur) noexcept
    519 {
    520     // Generate 32 bytes of entropy from the RNG, and a copy of the entropy already in hasher.
    521     unsigned char strengthen_seed[32];
    522-    rng.MixExtract(strengthen_seed, sizeof(strengthen_seed), CSHA512(hasher), false);
    523+    rng.MixExtract(strengthen_seed, sizeof(strengthen_seed), CSHA512(hasher), false, /*allow_deterministic=*/true);
    


    EthanHeilman commented at 11:59 pm on June 8, 2024:

    If I am understanding this logic correctly, if m_deterministic_prng.has_value() is true then MixExtract in SeedStrengthen will never use m_counter or m_state to do MixExtract?

    Strengthen is going to make the output of this call non-deterministic anyways, so what value does allow_deterministic=true as an argument provide in MixExtract? Shouldn’t this be false?

    Is the intention to make Strengthen deterministic at some later point?


    sipa commented at 1:11 pm on June 9, 2024:
    Nice catch. Done, and added a comment to explain.
  83. in src/random.cpp:442 in 462cca90c4 outdated
    437@@ -479,6 +438,13 @@ class RNGState {
    438             hasher.Finalize(buf);
    439             // Store the last 32 bytes of the hash output as new RNG state.
    440             memcpy(m_state, buf + 32, 32);
    441+            // Handle requests for deterministic randomness.
    442+            if (allow_deterministic && m_deterministic_prng.has_value()) {
    


    EthanHeilman commented at 0:02 am on June 9, 2024:
    Does allow_deterministic=false and m_deterministic_prng.has_value()=true imply that something has gone wrong? Should we log an error or take some action or are there cases where this is expected behavior?

    sipa commented at 1:06 pm on June 9, 2024:

    Hmm. The intent was that this is supported behavior; GetStrongRandBytes() never sets allow_deterministic, and thus in deterministic mode would still return from the normal fully-seeded RNG (because I don’t want to introduce anything in the code path of GetStrongRandBytes that could return anything but high quality randomness). This means that if GetStrongRandBytes is called anywhere in test code that enables deterministic randomness, the condition you state occurs.

    It’s probably possible to instead make the behavior to abort if allow_deterministic=false and m_deterministic_prng.has_value()=true, but it would involve going over all cases where tests use GetStrongRandBytes (directly, or indirectly). I don’t think I want to do that in this PR.

  84. sipa force-pushed on Jun 9, 2024
  85. DrahtBot added the label CI failed on Jun 9, 2024
  86. EthanHeilman approved
  87. EthanHeilman commented at 4:02 pm on June 9, 2024: contributor
    ACK 125509f395a214465ebf379164e41e4c6d19d443 I have not tested the code. All my comments have been addressed.
  88. DrahtBot requested review from dergoegge on Jun 9, 2024
  89. DrahtBot requested review from theuni on Jun 9, 2024
  90. DrahtBot removed the label CI failed on Jun 9, 2024
  91. DrahtBot added the label CI failed on Jun 9, 2024
  92. DrahtBot removed the label CI failed on Jun 9, 2024
  93. DrahtBot added the label CI failed on Jun 9, 2024
  94. DrahtBot removed the label CI failed on Jun 9, 2024
  95. maflcko commented at 11:30 am on June 10, 2024: member
    The false positive CI error still happens. I am taking another look.
  96. maflcko commented at 8:34 pm on June 10, 2024: member

    I could reproduce once outside the CI env on a 24.04 Ubuntu vm:

    0./configure CC='clang-18 -ftrivial-auto-var-init=pattern' CXX='clang++-18 -ftrivial-auto-var-init=pattern' --with-sanitizers=float-divide-by-zero,integer,undefined --enable-usdt --enable-zmq --with-incompatible-bdb --with-gui=qt5  && make
    1
    2while ( UBSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/ubsan:print_stacktrace=1:halt_on_error=1:report_error_type=1" ./test/functional/test_runner.py -j 22 --timeout-factor=2  ) ; do ( echo 1 >> /tmp/ok ) ; done
    

    Required diff to remove known UB from leveldb:

     0diff --git a/src/leveldb/db/db_impl.cc b/src/leveldb/db/db_impl.cc
     1index 65e31724b..f61b47195 100644
     2--- a/src/leveldb/db/db_impl.cc
     3+++ b/src/leveldb/db/db_impl.cc
     4@@ -1028,9 +1028,6 @@ Status DBImpl::DoCompactionWork(CompactionState* compact) {
     5       stats.bytes_read += compact->compaction->input(which, i)->file_size;
     6     }
     7   }
     8-  for (size_t i = 0; i < compact->outputs.size(); i++) {
     9-    stats.bytes_written += compact->outputs[i].file_size;
    10-  }
    11 
    12   mutex_.Lock();
    13   stats_[compact->compaction->level() + 1].Add(stats);
    

    Will try some more stuff later on.

  97. sipa commented at 11:17 pm on June 10, 2024: member
    @maflcko Thanks, I can also reproduce that way.
  98. DrahtBot added the label Needs rebase on Jun 11, 2024
  99. sipa force-pushed on Jun 11, 2024
  100. sipa commented at 11:19 pm on June 11, 2024: member
    Rebased.
  101. DrahtBot removed the label Needs rebase on Jun 12, 2024
  102. DrahtBot commented at 7:52 am on June 12, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/26102938676

  103. DrahtBot added the label CI failed on Jun 12, 2024
  104. random: write rand256() in function of fillrand() 23f5be594b
  105. random: move rand256() and randbytes() to .h file a8451f0fdd
  106. random: add a few noexcepts to FastRandomContext d83e47fbdc
  107. random: use BasicByte concept in randbytes eff953f59a
  108. random: refactor: move rand* utilities to RandomMixin
    Rather than make all the useful types of randomness be exclusive to
    FastRandomContext, move it to a separate RandomMixin class where it can be reused by
    other RNGs.
    
    A Curiously Recurring Template Pattern (CRTP) is used for this, to provide the ability
    for individual RNG classes to override one or more randomness functions, without
    needing the runtime-cost of virtual classes.
    
    Specifically, RNGs are expected to only provide fillrand and rand64, while all others
    are derived from those:
    - randbits
    - randrange
    - randbytes
    - rand32
    - rand256
    - randbool
    - rand_uniform_delay
    - rand_uniform_duration
    - min(), max(), operator()(), to comply with C++ URBG concept.
    3dfff39d9e
  109. random: Improve RandomMixin::randbits
    The previous randbits code would, when requesting more randomness than available
    in its random bits buffer, discard the remaining entropy and generate new.
    
    Benchmarks show that it's usually better to first consume the existing randomness
    and only then generate new ones. This adds some complexity to randbits, but it
    doesn't weigh up against the reduced need to generate more randomness.
    1dcade3586
  110. random: add RandomMixin::randbits with compile-known bits
    In many cases, it is known at compile time how many bits are requested from
    randbits. Provide a variant of randbits that accepts this number as a template,
    to make sure the compiler can make use of this knowledge. This is used immediately
    in rand32() and randbool(), and a few further call sites.
    6946e61268
  111. random: modernize XoRoShiRo128PlusPlus a bit
    Make use of C++20 functions in XoRoShiRo128PlusPlus.
    fac024cf23
  112. xoroshiro128plusplus: drop comment about nonexisting copy() a20a54cc38
  113. random: move XoRoShiRo128PlusPlus into random module
    This is preparation for making it more generally accessible.
    04bf44adae
  114. random: convert XoRoShiRo128PlusPlus into full RNG
    Convert XoRoShiRo128PlusPlus into a full RandomMixin-based RNG class,
    providing all utility functionality that FastRandomContext has. In doing so,
    it is renamed to InsecureRandomContext, highlighting its non-cryptographic
    nature.
    
    To do this, a fillrand fallback is added to RandomMixin (where it is used by
    InsecureRandomContext), but FastRandomContext still uses its own fillrand.
    d64ea826e5
  115. tests: overhaul deterministic test randomness
    The existing code provides two randomness mechanisms for test purposes:
    - g_insecure_rand_ctx (with its wrappers InsecureRand*), which during tests is
      initialized using either zeros (SeedRand::ZEROS), or using environment-provided
      randomness (SeedRand::SEED).
    - g_mock_deterministic_tests, which controls some (but not all) of the normal
      randomness output if set, but then makes it extremely predictable (identical
      output repeatedly).
    
    Replace this with a single mechanism, which retains the SeedRand modes to control
    all randomness. There is a new internal deterministic PRNG inside the random
    module, which is used in GetRandBytes() when in test mode, and which is also used
    to initialize g_insecure_rand_ctx. This means that during tests, all random numbers
    are made deterministic. There is one exception, GetStrongRandBytes(), which even
    in test mode still uses the normal PRNG state.
    
    This probably opens the door to removing a lot of the ad-hoc "deterministic" mode
    functions littered through the codebase (by simply running relevant tests in
    SeedRand::ZEROS mode), but this isn't done yet.
    2df7dc2dc2
  116. random: make GetRand() support entire range (incl. max)
    The existing code uses GetRand(nMax), with a default value for nMax, where nMax is the
    range of values (not the maximum!) that the output is allowed to take. This will always
    miss the last possible value (e.g. GetRand<uint32_t>() will never return 0xffffffff).
    
    Fix this, by moving the functionality largely in RandomMixin, and also adding a
    separate RandomMixin::rand function, which returns a value in the entire (non-negative)
    range of an integer.
    91193091de
  117. net: use GetRandMicros for cache expiration
    This matches the data type of m_cache_entry_expiration.
    2a53e6e232
  118. random: convert GetRand{Micros,Millis} into randrange
    There are only a few call sites of these throughout the codebase, so
    move the functionality into FastRandomContext, and rewrite all call sites.
    
    This requires the callers to explicit construct FastRandomContext objects,
    which do add to the verbosity, but also make potentially apparent locations
    where the code can be improved by reusing a FastRandomContext object.
    efe456a063
  119. random: convert GetExponentialRand into rand_exp_delay
    This simultaneously allows some queries to be redirected to the
    PeerManagerImpl::m_rng instance rather than a fresh context.
    8709e0f533
  120. random: cleanup order, comments, static 1069e428e2
  121. tests: make fuzz tests (mostly) deterministic with fixed seed fef44f271c
  122. random: use LogError for init failure f7431109de
  123. random: replace construct/assign with explicit Reseed() 6e905f86fe
  124. sipa force-pushed on Jun 12, 2024
  125. sipa commented at 11:23 am on June 12, 2024: member
    Now actually rebased, after merge of #30160.
  126. DrahtBot removed the label CI failed on Jun 12, 2024
  127. DrahtBot added the label CI failed on Jun 12, 2024
  128. DrahtBot removed the label CI failed on Jun 12, 2024
  129. DrahtBot added the label CI failed on Jun 12, 2024
  130. DrahtBot removed the label CI failed on Jun 12, 2024
  131. DrahtBot added the label CI failed on Jun 13, 2024
  132. DrahtBot removed the label CI failed on Jun 13, 2024
  133. maflcko commented at 7:41 am on June 13, 2024: member
    I am still looking into the clang error. I couldn’t find an issue with your code, but minimizing/reducing the clang bug will take some time, because it appears non-deterministic.
  134. maflcko commented at 9:33 am on June 15, 2024: member

    Ok, I think I found the bug: https://github.com/bitcoin/bitcoin/pull/29625/files#r1640954007

    It is in your code, and not a compiler bug :sweat_smile:

  135. sipa commented at 12:50 pm on June 15, 2024: member

    Ok, I think I found the bug: https://github.com/bitcoin/bitcoin/pull/29625/files#r1640954007

    It is in your code, and not a compiler bug :sweat_smile:

    Wow, nice catch, thank you. Will fix next week.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-18 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me