sipa
commented at 4:26 am on April 4, 2021:
member
This implements a safegcd-based modular inverse for MuHash3072. It is a fairly straightforward translation of the libsecp256k1 implementation, with the following changes:
Generic for 32-bit and 64-bit
Specialized for the specific MuHash3072 modulus (2^3072 - 1103717).
A bit more C++ish
Far fewer sanity checks
A benchmark is also included for MuHash3072::Finalize. The new implementation is around 100x faster on x86_64 for me (from 5.8 ms to 57 μs); for 32-bit code the factor is likely even larger.
Explanation of the algorithm using Python snippets
Analysis of the maximum number of iterations the algorithm needs
Formal proof in Coq by Russell O’Connor (for the 256-bit version of the algorithm; here we use a 3072-bit one).
DrahtBot added the label
Build system
on Apr 4, 2021
DrahtBot added the label
Utils/log/libs
on Apr 4, 2021
sipa
commented at 11:11 pm on April 6, 2021:
member
Using libgmp for inverses is 1.5x-2x faster still, which is somewhat expected - there are several optimizations to safegcd that become more relevant for larger input sizes but aren’t useful in the 256-bit code which this is adapted from as well.
I think it’s fine to leave those for future improvements, as this already gets hash finalization down to ~1 signature check worth, which is probably far below what we care about.
laanwj
commented at 2:01 pm on May 19, 2021:
member
Concept and high-level review ACK. Did not check the algorithm in detail.
theStack
commented at 6:56 pm on June 22, 2021:
contributor
Concept ACK
DrahtBot
commented at 9:07 am on October 15, 2021:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#31507 ([POC] build: Use clang-cl to build on Windows natively by hebasto)
#31308 (ci, iwyu: Treat warnings as errors for specific targets by hebasto)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
DrahtBot added the label
Needs rebase
on Nov 13, 2021
sipa force-pushed
on Nov 15, 2021
sipa
commented at 4:22 pm on November 15, 2021:
member
Rebased.
DrahtBot removed the label
Needs rebase
on Nov 15, 2021
in
configure.ac:975
in
f1f4eae619outdated
966@@ -967,6 +967,30 @@ AC_CHECK_DECLS([bswap_16, bswap_32, bswap_64],,,
967 #include <byteswap.h>
968 #endif])
969970+AC_MSG_CHECKING([for __builtin_ctz])
971+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[ ]], [[
972+ (void) __builtin_clz(0);
973+ ]])],
974+ [ AC_MSG_RESULT(yes); AC_DEFINE(HAVE_BUILTIN_CTZ, 1, [Define this symbol if you have __builtin_ctz])],
975+ [ AC_MSG_RESULT(no)]
fanquake
commented at 11:44 pm on November 15, 2021:
Can you use the following here and below. Functionally there’s no real difference, but we’ve settled on this style for consistency across the build system.
0 [ AC_MSG_RESULT([yes]); AC_DEFINE([HAVE_BUILTIN_CTZ], [1], [Define this symbol if you have __builtin_ctz])],
1 [ AC_MSG_RESULT([no])]
in
src/crypto/muhash.cpp:170
in
9cb1bc3409outdated
172+ int b = 0, outpos = 0;
173+ for (int i = 0; i < SIGNED_LIMBS; ++i) {
174+ c += ((double_limb_t)limbs[i]) << b;
175+ b += SIGNED_LIMB_SIZE;
176+ if (b >= LIMB_SIZE) {
177+ out.limbs[outpos++] = (limb_t)c;
PastaPastaPasta
commented at 1:49 pm on February 1, 2022:
contributor
please change all the c-style casts to functional casts, I gave specific comments on a few instances, but stopped so it’s not as spammy
elichai
commented at 3:00 pm on October 4, 2022:
contributor
Another option here it to make arith_u256 generic over the integer size, and then we can get a generic u3072 and implement a simple egcd via extended euclidean algorithm (as this doesn’t require constant timness)
achow101
commented at 6:26 pm on October 12, 2022:
member
Are you still working on this?
sipa force-pushed
on Oct 16, 2022
sipa
commented at 2:05 pm on October 16, 2022:
member
Rebased, and addressed comments.
sipa
commented at 2:07 pm on October 16, 2022:
member
@elichai That’s a possibility. I expect it’d be an order of magnitude slower, but still significantly faster than what we have. The arith_uint256 code is already templated in the number of bits, so this would not be much work. Are you interested in trying that approach?
DrahtBot added the label
Needs rebase
on Oct 20, 2022
sipa
commented at 9:03 pm on January 18, 2023:
member
Rebased.
sipa force-pushed
on Jan 18, 2023
DrahtBot removed the label
Needs rebase
on Jan 18, 2023
DrahtBot added the label
Needs rebase
on Jan 30, 2023
sipa force-pushed
on Apr 25, 2023
achow101
commented at 3:52 pm on April 25, 2023:
member
fjahr
commented at 3:34 pm on April 15, 2024:
contributor
@sipa The description still lists “Add more tests” as an open todo. Did you still want to address this before this could be merged? I think we have ok-ish coverage of MuHash but if you think something should be added could you tell us what you had in mind? Are there maybe test vectors we can port over?
sipa
commented at 3:36 pm on April 15, 2024:
member
@fjahr I’ve dropped the TODO. Feel free to contribute tests of course if you feel that’s helpful.
dergoegge approved
dergoegge
commented at 3:42 pm on June 7, 2024:
member
tACK030c9edf5b12033207da2bc0735f97840dc88056
I differentially fuzzed the muhash implementation we have on master against the version in this PR. Given the coverage reached and assuming that the implementation on master is correct, I am confident that the muhash implementation in this PR is also correct. I tested across multiple compilers and optimization levels i.e. the following tuples: (master clang -O2, pr clang -O2), (master clang -O2, pr gcc -O2), (master clang-O2, pr gcc -O0) (all on x86_64).
Given the amount of testing and work that has already gone into all of this, I’m not surprised that I didn’t find any bugs.
The speedup for the muhash harness is very noticeable, ~60x faster for me.
DrahtBot requested review from laanwj
on Jun 7, 2024
DrahtBot requested review from theStack
on Jun 7, 2024
hebasto added the label
Needs CMake port
on Aug 16, 2024
DrahtBot added the label
Needs rebase
on Aug 28, 2024
maflcko removed the label
Needs CMake port
on Aug 29, 2024
sipa force-pushed
on Oct 15, 2024
DrahtBot added the label
CI failed
on Oct 15, 2024
DrahtBot
commented at 2:49 pm on October 15, 2024:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
maflcko
commented at 2:54 pm on October 15, 2024:
member
DrahtBot removed the label
Needs rebase
on Oct 15, 2024
TheCharlatan
commented at 4:31 pm on October 15, 2024:
contributor
Concept ACK
sipa force-pushed
on Oct 16, 2024
sipa force-pushed
on Oct 16, 2024
DrahtBot removed the label
CI failed
on Oct 16, 2024
in
src/crypto/muhash.cpp:227
in
310b778eb1outdated
251+ * f: bottom SIGNED_LIMB_SIZE bits of initial f value
252+ * g: bottom SIGNED_LIMB_SIZE bits of initial g value
253+ * out: resulting transformation matrix, scaled by 2^SIGNED_LIMB_SIZE
254+ * return: eta value after SIGNED_LIMB_SIZE divsteps
255+ */
256+limb_t ComputeDivstepMatrix(signed_limb_t eta, limb_t f, limb_t g, SignedMatrix& out)
TheCharlatan
commented at 12:36 pm on October 31, 2024:
Nit: Not important, but could inline these same as the other functions only used within this module?
in
src/crypto/muhash.cpp:398
in
ad67fd2e0boutdated
451+ Num3072Signed d, e, f, g;
452+ e.limbs[0] = 1;
453+ // F is initialized as modulus, which in signed limb representation can be expressed
454+ // simply as 2^3072 + -MAX_PRIME_DIFF.
455+ f.limbs[0] = -MAX_PRIME_DIFF;
456+ f.limbs[3072 / SIGNED_LIMB_SIZE] = ((limb_t)1) << (3072 % SIGNED_LIMB_SIZE);
TheCharlatan
commented at 2:40 pm on November 6, 2024:
Nit: Both 3072 / SIGNED_LIMB_SIZE and 3072 % SIGNED_LIMB_SIZE are used a couple of times. Might it be more clear to give them their own descriptive constants? I was initially thinking FINAL_LIMB_POSITION and FINAL_LIMB_MODULUS_BITS, but not sure if that really feels clearer either.
in
src/crypto/muhash.cpp:311
in
ad67fd2e0boutdated
347+ /* Update the beginning of computation for t*[d,e]+modulus*[md,me] now md,me are known. */
348+ cd -= (signed_double_limb_t)1103717 * md;
349+ ce -= (signed_double_limb_t)1103717 * me;
350+ /* Verify that the low SIGNED_LIMB_SIZE bits of the computation are indeed zero, and then throw them away. */
351+ assert((cd & MAX_SIGNED_LIMB) == 0);
352+ assert((ce & MAX_SIGNED_LIMB) == 0);
TheCharlatan
commented at 9:41 am on November 7, 2024:
Just a question: Compared to the code in modinv32_impl.h, there are fewer bound checks done here. Is this on purpose?
But I guess they are not really that important and it is not worth adding the required utilities?
in
src/crypto/muhash.cpp:190
in
ad67fd2e0boutdated
213+ void Normalize(bool negate)
214+ {
215+ // Add modulus if this was negative. This brings the range of *this to 1-2^3072..2^3072-1.
216+ signed_limb_t cond_add = limbs[SIGNED_LIMBS-1] >> (LIMB_SIZE-1); // -1 if this is negative; 0 otherwise
217+ limbs[0] += signed_limb_t(-MAX_PRIME_DIFF) & cond_add;
218+ limbs[3072 / SIGNED_LIMB_SIZE] += (signed_limb_t(1) << (3072 % SIGNED_LIMB_SIZE)) & cond_add;
TheCharlatan
commented at 12:12 pm on November 7, 2024:
Just a question: Compared to modinv32_normalize, this step seems to skip the inner limbs. Is there a reason why there is a difference between the implementations here? IIUC this works out in the end because of the carry step.
Yes and no. They aren’t skipped, the carry step is the equivalent of processing the inner limbs. They’re just easier, because the modulus here can be represented as [1 « FINAL_LIMB_MODULUS_BITS, 0, 0, 0, …, 0, 0, -MAX_PRIME_DIFF]. In the modinv32_impl.h code, the modulus is treated as generic, where any limb can be nonzero.
in
src/crypto/muhash.cpp:455
in
ad67fd2e0boutdated
TheCharlatan
commented at 7:58 pm on November 7, 2024:
Note for other reviewers: The diff here is confusing, but from what I can tell, this does not actually change the Multiply function, but just gets rid of Square(), muldbladd3, and square_n_mul, which were used in the old Inverse implementation.
in
src/crypto/muhash.cpp:314
in
ad67fd2e0boutdated
344+ /* Correct md,me so that t*[d,e]+modulus*[md,me] has SIGNED_LIMB_SIZE zero bottom bits. */
345+ md -= (limb_t(0x70a1421da087d93) * limb_t(cd) + md) & MAX_SIGNED_LIMB;
346+ me -= (limb_t(0x70a1421da087d93) * limb_t(ce) + me) & MAX_SIGNED_LIMB;
347+ /* Update the beginning of computation for t*[d,e]+modulus*[md,me] now md,me are known. */
348+ cd -= (signed_double_limb_t)1103717 * md;
349+ ce -= (signed_double_limb_t)1103717 * me;
TheCharlatan
commented at 8:19 pm on November 7, 2024:
This tripped me up at first, because something is added instead of subtracted compared to the implementation in modinv32_impl.h. But I think this is correct, because it subtracts the distance (I don’t know what the correct terminology is for this) to the modulus instead of adding the modulus, which should work out to the same. Similarly, the operation in the for loop can also be moved to the final step, which I’m guessing is a further nice optimization.
Same comment as above. The modulus here is 2^3072 - MAX_PRIME_DIFF, which is represented in signed-limb representation as [1 « FINAL_LIMB_MODULUS_BITS, 0, 0, 0, …, 0, -MAX_PRIME_DIFF], so we only need to do something for the bottom limb (where our modulus is negative) and the top limb.
TheCharlatan approved
TheCharlatan
commented at 8:26 pm on November 7, 2024:
contributor
ACKad67fd2e0bfa6f43f350066596b6cca146391362
Just nits, and all of them can be ignored. This was fun to review, the explanations in the safegcd_implementation.md are excellent. I also profited from the two review clubs and their notes that were done on the original MuHash introduction.
DrahtBot requested review from dergoegge
on Nov 7, 2024
theStack approved
theStack
commented at 5:21 pm on November 18, 2024:
contributor
With the friendly help of @dergoegge I managed to get differential fuzzing running last week and let that ran for the last ~77 hours. Here are the rough instructions for those who also want to give it a try:
Safegcd based modular inverse for Num3072a26ce62894
Add a fuzz test for Num3072 multiplication and inversionf5883286e3
sipa force-pushed
on Jan 9, 2025
sipa
commented at 3:12 pm on January 9, 2025:
member
Rebased, addressed a few comments, and changed some assert() to Assume().
TheCharlatan approved
TheCharlatan
commented at 8:13 pm on January 9, 2025:
contributor
Re-ACKf5883286e32b625aab3dd80c74d6adb4f37f0a80
Range-diff’ed from the last push, changes are marking functions as inline, checking asserts to Assume and adding some constants. There are some minor formatting issues, which can be fixed by running the commits through clang-format-diff.
DrahtBot requested review from theStack
on Jan 9, 2025
DrahtBot requested review from dergoegge
on Jan 9, 2025
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-01-22 03:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me