Safegcd-based modular inverses in MuHash3072 #21590

pull sipa wants to merge 3 commits into bitcoin:master from sipa:202101_muhash_safegcd changing 6 files +507 −92
  1. sipa commented at 4:26 am on April 4, 2021: member

    This implements a safegcd-based modular inverse for MuHash3072. It is a fairly straightforward translation of the libsecp256k1 implementation, with the following changes:

    • Generic for 32-bit and 64-bit
    • Specialized for the specific MuHash3072 modulus (2^3072 - 1103717).
    • A bit more C++ish
    • Far fewer sanity checks

    A benchmark is also included for MuHash3072::Finalize. The new implementation is around 100x faster on x86_64 for me (from 5.8 ms to 57 Ξs); for 32-bit code the factor is likely even larger.

    For more information:

    • Original paper by Daniel J. Bernstein and Bo-Yin Yang
    • Implementation for libsecp256k1 by Peter Dettman; and the final version
    • Explanation of the algorithm using Python snippets
    • Analysis of the maximum number of iterations the algorithm needs
    • Formal proof in Coq by Russell O’Connor (for the 256-bit version of the algorithm; here we use a 3072-bit one).
  2. DrahtBot added the label Build system on Apr 4, 2021
  3. DrahtBot added the label Utils/log/libs on Apr 4, 2021
  4. sipa commented at 11:11 pm on April 6, 2021: member

    Using libgmp for inverses is 1.5x-2x faster still, which is somewhat expected - there are several optimizations to safegcd that become more relevant for larger input sizes but aren’t useful in the 256-bit code which this is adapted from as well.

    I think it’s fine to leave those for future improvements, as this already gets hash finalization down to ~1 signature check worth, which is probably far below what we care about.

  5. laanwj commented at 2:01 pm on May 19, 2021: member
    Concept and high-level review ACK. Did not check the algorithm in detail.
  6. theStack commented at 6:56 pm on June 22, 2021: contributor
    Concept ACK
  7. DrahtBot commented at 9:07 am on October 15, 2021: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/21590.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK TheCharlatan
    Concept ACK laanwj, theStack
    Stale ACK dergoegge

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  8. DrahtBot added the label Needs rebase on Nov 13, 2021
  9. sipa force-pushed on Nov 15, 2021
  10. sipa commented at 4:22 pm on November 15, 2021: member
    Rebased.
  11. DrahtBot removed the label Needs rebase on Nov 15, 2021
  12. in configure.ac:975 in f1f4eae619 outdated
    966@@ -967,6 +967,30 @@ AC_CHECK_DECLS([bswap_16, bswap_32, bswap_64],,,
    967                  #include <byteswap.h>
    968                  #endif])
    969 
    970+AC_MSG_CHECKING([for __builtin_ctz])
    971+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[ ]], [[
    972+ (void) __builtin_clz(0);
    973+  ]])],
    974+ [ AC_MSG_RESULT(yes); AC_DEFINE(HAVE_BUILTIN_CTZ, 1, [Define this symbol if you have __builtin_ctz])],
    975+ [ AC_MSG_RESULT(no)]
    


    fanquake commented at 11:44 pm on November 15, 2021:

    Can you use the following here and below. Functionally there’s no real difference, but we’ve settled on this style for consistency across the build system.

    0 [ AC_MSG_RESULT([yes]); AC_DEFINE([HAVE_BUILTIN_CTZ], [1], [Define this symbol if you have __builtin_ctz])],
    1 [ AC_MSG_RESULT([no])]
    

    sipa commented at 9:17 pm on December 1, 2021:
    Done.
  13. sipa force-pushed on Dec 1, 2021
  14. sipa force-pushed on Dec 1, 2021
  15. sipa commented at 9:17 pm on December 1, 2021: member

    I added additional fuzz tests, which seem to pass.

    However, test/functional/feature_coinstatsindex.py now fails, and I haven’t figured out why. @fjahr ?

  16. in src/test/fuzz/muhash.cpp:35 in 58b108ebcd outdated
    30+    }
    31+
    32+    void Serialize(Span<uint8_t> bytes) {
    33+        assert(bytes.size() % 4 == 0);
    34+        assert(bytes.size() <= 768);
    35+        for (int i = 0; i*4 < bytes.size(); ++i) {
    


    maflcko commented at 3:07 pm on December 2, 2021:
    0        for (size_t i{0}; i*4 < bytes.size(); ++i) {
    
    0test/fuzz/muhash.cpp:27:29: error: comparison of integers of different signs: 'int' and 'std::size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
    1        for (int i = 0; i*4 < bytes.size(); ++i) {
    2                        ~~~ ^ ~~~~~~~~~~~~
    3test/fuzz/muhash.cpp:35:29: error: comparison of integers of different signs: 'int' and 'std::size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
    4        for (int i = 0; i*4 < bytes.size(); ++i) {
    5                        ~~~ ^ ~~~~~~~~~~~~
    6test/fuzz/muhash.cpp:131:13: error: unused variable 'buf' [-Werror,-Wunused-variable]
    7    uint8_t buf[384];
    8            ^
    93 errors generated.
    

    sipa commented at 4:40 pm on December 2, 2021:
    Fixed.
  17. in src/test/fuzz/muhash.cpp:110 in 58b108ebcd outdated
    105+} // namespace
    106+
    107+FUZZ_TARGET(num3072_mul)
    108+{
    109+    FuzzedDataProvider provider{buffer.data(), buffer.size()};
    110+    uint8_t buf[384];
    


    maflcko commented at 3:09 pm on December 2, 2021:
    unused

    sipa commented at 4:40 pm on December 2, 2021:
    Improved in various ways.
  18. sipa force-pushed on Dec 2, 2021
  19. sipa force-pushed on Dec 2, 2021
  20. maflcko referenced this in commit fd1c9e26d3 on Dec 3, 2021
  21. sidhujag referenced this in commit 25e1d70230 on Dec 3, 2021
  22. fjahr commented at 10:05 pm on December 5, 2021: contributor

    I added additional fuzz tests, which seem to pass.

    However, test/functional/feature_coinstatsindex.py now fails, and I haven’t figured out why. @fjahr ?

    This isn’t an issue with the implementation here but with me the test just being stupid. More info here #23681.

    I will take a closer look at the code here soon!

  23. maflcko referenced this in commit 42b35f17d5 on Dec 6, 2021
  24. sidhujag referenced this in commit 177261cf2d on Dec 6, 2021
  25. in src/crypto/muhash.cpp:147 in 9cb1bc3409 outdated
    148+    void FromNum3072(const Num3072& in)
    149+    {
    150+        double_limb_t c = 0;
    151+        int b = 0, outpos = 0;
    152+        for (int i = 0; i < LIMBS; ++i) {
    153+            c += ((double_limb_t)in.limbs[i]) << b;
    


    PastaPastaPasta commented at 1:44 pm on February 1, 2022:

    please avoid c-style cast, use c++11 style functional casts

    double_limb_t(in.limbs[i])


    sipa commented at 2:04 pm on October 16, 2022:
    Done.
  26. in src/crypto/muhash.cpp:150 in 9cb1bc3409 outdated
    151+        int b = 0, outpos = 0;
    152+        for (int i = 0; i < LIMBS; ++i) {
    153+            c += ((double_limb_t)in.limbs[i]) << b;
    154+            b += LIMB_SIZE;
    155+            while (b >= SIGNED_LIMB_SIZE) {
    156+                limbs[outpos++] = (limb_t)c & MAX_SIGNED_LIMB;
    


    PastaPastaPasta commented at 1:44 pm on February 1, 2022:
    please use functional cast

    sipa commented at 2:05 pm on October 16, 2022:
    Done.
  27. in src/crypto/muhash.cpp:167 in 9cb1bc3409 outdated
    169+    void ToNum3072(Num3072& out) const
    170+    {
    171+        double_limb_t c = 0;
    172+        int b = 0, outpos = 0;
    173+        for (int i = 0; i < SIGNED_LIMBS; ++i) {
    174+            c += ((double_limb_t)limbs[i]) << b;
    


    PastaPastaPasta commented at 1:45 pm on February 1, 2022:
    use functional cast

    sipa commented at 2:05 pm on October 16, 2022:
    Done.
  28. in src/crypto/muhash.cpp:170 in 9cb1bc3409 outdated
    172+        int b = 0, outpos = 0;
    173+        for (int i = 0; i < SIGNED_LIMBS; ++i) {
    174+            c += ((double_limb_t)limbs[i]) << b;
    175+            b += SIGNED_LIMB_SIZE;
    176+            if (b >= LIMB_SIZE) {
    177+                out.limbs[outpos++] = (limb_t)c;
    


    PastaPastaPasta commented at 1:45 pm on February 1, 2022:
    please use functional cast

    sipa commented at 2:05 pm on October 16, 2022:
    Done.
  29. PastaPastaPasta changes_requested
  30. PastaPastaPasta commented at 1:49 pm on February 1, 2022: contributor
    please change all the c-style casts to functional casts, I gave specific comments on a few instances, but stopped so it’s not as spammy
  31. elichai commented at 3:00 pm on October 4, 2022: contributor
    Another option here it to make arith_u256 generic over the integer size, and then we can get a generic u3072 and implement a simple egcd via extended euclidean algorithm (as this doesn’t require constant timness)
  32. achow101 commented at 6:26 pm on October 12, 2022: member
    Are you still working on this?
  33. sipa force-pushed on Oct 16, 2022
  34. sipa commented at 2:05 pm on October 16, 2022: member
    Rebased, and addressed comments.
  35. sipa commented at 2:07 pm on October 16, 2022: member
    @elichai That’s a possibility. I expect it’d be an order of magnitude slower, but still significantly faster than what we have. The arith_uint256 code is already templated in the number of bits, so this would not be much work. Are you interested in trying that approach?
  36. DrahtBot added the label Needs rebase on Oct 20, 2022
  37. sipa commented at 9:03 pm on January 18, 2023: member
    Rebased.
  38. sipa force-pushed on Jan 18, 2023
  39. DrahtBot removed the label Needs rebase on Jan 18, 2023
  40. DrahtBot added the label Needs rebase on Jan 30, 2023
  41. sipa force-pushed on Apr 25, 2023
  42. achow101 commented at 3:52 pm on April 25, 2023: member
  43. DrahtBot removed the label Needs rebase on Apr 25, 2023
  44. DrahtBot added the label CI failed on Apr 25, 2023
  45. maflcko commented at 2:21 pm on April 26, 2023: member

    On Windows CI this will eat the bench CPU without terminating?

    https://cirrus-ci.com/task/6655367648641024?logs=check#L1394

  46. Fabcien referenced this in commit a045b9e47d on Jun 22, 2023
  47. PastaPastaPasta referenced this in commit 25fcda0180 on Dec 25, 2023
  48. PastaPastaPasta referenced this in commit 9b388c8388 on Dec 25, 2023
  49. PastaPastaPasta referenced this in commit 7d601cfa85 on Dec 27, 2023
  50. theuni commented at 3:43 pm on January 18, 2024: member
    Seems the ctz builtins here can be switched to c++20’s countr_zero ? See the libc++ impl here, for an example of how it maps to builtins.
  51. DrahtBot added the label Needs rebase on Mar 1, 2024
  52. achow101 commented at 2:24 pm on April 9, 2024: member
    Are you still working on this?
  53. sipa force-pushed on Apr 9, 2024
  54. DrahtBot removed the label Needs rebase on Apr 9, 2024
  55. DrahtBot removed the label CI failed on Apr 9, 2024
  56. sipa commented at 8:44 pm on April 9, 2024: member
    Rebased, and switched to std::countr_zero instead of CTZ builtins.
  57. maflcko commented at 10:10 am on April 10, 2024: member
    Looks like this also fixed the Windows issue (https://github.com/bitcoin/bitcoin/pull/21590#issuecomment-1523507720), so I guess there may have been a bug in the previous implementation.
  58. fjahr commented at 3:34 pm on April 15, 2024: contributor
    @sipa The description still lists “Add more tests” as an open todo. Did you still want to address this before this could be merged? I think we have ok-ish coverage of MuHash but if you think something should be added could you tell us what you had in mind? Are there maybe test vectors we can port over?
  59. sipa commented at 3:36 pm on April 15, 2024: member
    @fjahr I’ve dropped the TODO. Feel free to contribute tests of course if you feel that’s helpful.
  60. dergoegge approved
  61. dergoegge commented at 3:42 pm on June 7, 2024: member

    tACK 030c9edf5b12033207da2bc0735f97840dc88056

    I differentially fuzzed the muhash implementation we have on master against the version in this PR. Given the coverage reached and assuming that the implementation on master is correct, I am confident that the muhash implementation in this PR is also correct. I tested across multiple compilers and optimization levels i.e. the following tuples: (master clang -O2, pr clang -O2), (master clang -O2, pr gcc -O2), (master clang-O2, pr gcc -O0) (all on x86_64).

    Given the amount of testing and work that has already gone into all of this, I’m not surprised that I didn’t find any bugs.

    The speedup for the muhash harness is very noticeable, ~60x faster for me.

  62. DrahtBot requested review from laanwj on Jun 7, 2024
  63. DrahtBot requested review from theStack on Jun 7, 2024
  64. hebasto added the label Needs CMake port on Aug 16, 2024
  65. DrahtBot added the label Needs rebase on Aug 28, 2024
  66. maflcko removed the label Needs CMake port on Aug 29, 2024
  67. Add benchmark for MuHash finalization e6186be843
  68. sipa force-pushed on Oct 15, 2024
  69. DrahtBot added the label CI failed on Oct 15, 2024
  70. DrahtBot commented at 2:49 pm on October 15, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/31561424673

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  71. maflcko commented at 2:54 pm on October 15, 2024: member
  72. DrahtBot removed the label Needs rebase on Oct 15, 2024
  73. TheCharlatan commented at 4:31 pm on October 15, 2024: contributor
    Concept ACK
  74. sipa force-pushed on Oct 16, 2024
  75. Safegcd based modular inverse for Num3072 310b778eb1
  76. Add a fuzz test for Num3072 multiplication and inversion ad67fd2e0b
  77. sipa force-pushed on Oct 16, 2024
  78. DrahtBot removed the label CI failed on Oct 16, 2024
  79. in src/crypto/muhash.cpp:227 in 310b778eb1 outdated
    251+ * f:   bottom SIGNED_LIMB_SIZE bits of initial f value
    252+ * g:   bottom SIGNED_LIMB_SIZE bits of initial g value
    253+ * out: resulting transformation matrix, scaled by 2^SIGNED_LIMB_SIZE
    254+ * return: eta value after SIGNED_LIMB_SIZE divsteps
    255+ */
    256+limb_t ComputeDivstepMatrix(signed_limb_t eta, limb_t f, limb_t g, SignedMatrix& out)
    


    TheCharlatan commented at 12:36 pm on October 31, 2024:
    Nit: Not important, but could inline these same as the other functions only used within this module?
  80. in src/crypto/muhash.cpp:398 in ad67fd2e0b
    451+    Num3072Signed d, e, f, g;
    452+    e.limbs[0] = 1;
    453+    // F is initialized as modulus, which in signed limb representation can be expressed
    454+    // simply as 2^3072 + -MAX_PRIME_DIFF.
    455+    f.limbs[0] = -MAX_PRIME_DIFF;
    456+    f.limbs[3072 / SIGNED_LIMB_SIZE] = ((limb_t)1) << (3072 % SIGNED_LIMB_SIZE);
    


    TheCharlatan commented at 2:40 pm on November 6, 2024:
    Nit: Both 3072 / SIGNED_LIMB_SIZE and 3072 % SIGNED_LIMB_SIZE are used a couple of times. Might it be more clear to give them their own descriptive constants? I was initially thinking FINAL_LIMB_POSITION and FINAL_LIMB_MODULUS_BITS, but not sure if that really feels clearer either.
  81. in src/crypto/muhash.cpp:304 in ad67fd2e0b
    340+    /* Begin computing t*[d,e]. */
    341+    signed_limb_t di = d.limbs[0], ei = e.limbs[0];
    342+    signed_double_limb_t cd = (signed_double_limb_t)u * di + (signed_double_limb_t)v * ei;
    343+    signed_double_limb_t ce = (signed_double_limb_t)q * di + (signed_double_limb_t)r * ei;
    344+    /* Correct md,me so that t*[d,e]+modulus*[md,me] has SIGNED_LIMB_SIZE zero bottom bits. */
    345+    md -= (limb_t(0x70a1421da087d93) * limb_t(cd) + md) & MAX_SIGNED_LIMB;
    


    TheCharlatan commented at 9:33 am on November 7, 2024:
    Nit: The constant here seems clearer in modinv32_impl.h. Could it get a name here too, or an explanation how it was computed?
  82. in src/crypto/muhash.cpp:311 in ad67fd2e0b
    347+    /* Update the beginning of computation for t*[d,e]+modulus*[md,me] now md,me are known. */
    348+    cd -= (signed_double_limb_t)1103717 * md;
    349+    ce -= (signed_double_limb_t)1103717 * me;
    350+    /* Verify that the low SIGNED_LIMB_SIZE bits of the computation are indeed zero, and then throw them away. */
    351+    assert((cd & MAX_SIGNED_LIMB) == 0);
    352+    assert((ce & MAX_SIGNED_LIMB) == 0);
    


    TheCharlatan commented at 9:41 am on November 7, 2024:
    Just a question: Compared to the code in modinv32_impl.h, there are fewer bound checks done here. Is this on purpose?
  83. in src/crypto/muhash.cpp:190 in ad67fd2e0b
    213+    void Normalize(bool negate)
    214+    {
    215+        // Add modulus if this was negative. This brings the range of *this to 1-2^3072..2^3072-1.
    216+        signed_limb_t cond_add = limbs[SIGNED_LIMBS-1] >> (LIMB_SIZE-1); // -1 if this is negative; 0 otherwise
    217+        limbs[0] += signed_limb_t(-MAX_PRIME_DIFF) & cond_add;
    218+        limbs[3072 / SIGNED_LIMB_SIZE] += (signed_limb_t(1) << (3072 % SIGNED_LIMB_SIZE)) & cond_add;
    


    TheCharlatan commented at 12:12 pm on November 7, 2024:
    Just a question: Compared to modinv32_normalize, this step seems to skip the inner limbs. Is there a reason why there is a difference between the implementations here? IIUC this works out in the end because of the carry step.
  84. in src/crypto/muhash.cpp:449 in ad67fd2e0b
    503+    d.ToNum3072(ret);
    504+    return ret;
    505 }
    506 
    507-void Num3072::Square()
    508+void Num3072::Multiply(const Num3072& a)
    


    TheCharlatan commented at 7:58 pm on November 7, 2024:
    Note for other reviewers: The diff here is confusing, but from what I can tell, this does not actually change the Multiply function, but just gets rid of Square(), muldbladd3, and square_n_mul, which were used in the old Inverse implementation.
  85. in src/crypto/muhash.cpp:308 in ad67fd2e0b
    344+    /* Correct md,me so that t*[d,e]+modulus*[md,me] has SIGNED_LIMB_SIZE zero bottom bits. */
    345+    md -= (limb_t(0x70a1421da087d93) * limb_t(cd) + md) & MAX_SIGNED_LIMB;
    346+    me -= (limb_t(0x70a1421da087d93) * limb_t(ce) + me) & MAX_SIGNED_LIMB;
    347+    /* Update the beginning of computation for t*[d,e]+modulus*[md,me] now md,me are known. */
    348+    cd -= (signed_double_limb_t)1103717 * md;
    349+    ce -= (signed_double_limb_t)1103717 * me;
    


    TheCharlatan commented at 8:19 pm on November 7, 2024:
    This tripped me up at first, because something is added instead of subtracted compared to the implementation in modinv32_impl.h. But I think this is correct, because it subtracts the distance (I don’t know what the correct terminology is for this) to the modulus instead of adding the modulus, which should work out to the same. Similarly, the operation in the for loop can also be moved to the final step, which I’m guessing is a further nice optimization.
  86. TheCharlatan approved
  87. TheCharlatan commented at 8:26 pm on November 7, 2024: contributor

    ACK ad67fd2e0bfa6f43f350066596b6cca146391362

    Just nits, and all of them can be ignored. This was fun to review, the explanations in the safegcd_implementation.md are excellent. I also profited from the two review clubs and their notes that were done on the original MuHash introduction.

  88. DrahtBot requested review from dergoegge on Nov 7, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me