Add MuHash3072 implementation

fjahr commented at 7:37 pm on May 22, 2020: member

This is the first split of #18000 which implements the Muhash algorithm and uses it to calculate the UTXO set hash in gettxoutsetinfo.

DrahtBot added the label Build system on May 22, 2020

DrahtBot added the label RPC/REST/ZMQ on May 22, 2020

DrahtBot added the label Tests on May 22, 2020

DrahtBot added the label Utils/log/libs on May 22, 2020

jonatack commented at 11:38 am on May 26, 2020: member

Reviewers: This PR is being discussed in the review club at https://bitcoincore.reviews/19055

(A Review club tag for this PR would be helpful to indicate that review notes and discussion are available there.)

MarcoFalke removed the label Build system on May 26, 2020

MarcoFalke removed the label Tests on May 26, 2020

MarcoFalke removed the label Utils/log/libs on May 26, 2020

MarcoFalke added the label Review club on May 26, 2020

MarcoFalke added the label Validation on May 26, 2020

jonasschnelli commented at 12:02 pm on May 26, 2020: contributor

Some history for the context:

glozow commented at 4:24 pm on May 26, 2020: member

Concept ACK - this is so cool!

Regarding approaches (i.e. which hash function) I looked at your comparison in this doc, and have a question about the “rolling-ness” consideration and why it’s important. It also seems that ECMH (which is noted as non-rolling) performed better and has a maintainability plus since it’s part of Secp256k1.

From what I’ve gathered:

incremental hashing is efficient with arbitrary additions/deletions from a large set, at any point in the data
rolling hash functions have this same efficiency idea but are particularly optimized for adding/popping at the ends (i.e. like a “rolling” window) (Please correct me if I’m mistaken here, since you’ve definitely spent more time with this. I also understand that they aren’t mutually exclusive).

Since the UTXO set is not spent in any particular order and you can’t sort it ahead of time, I definitely see (theoretically as well as from the tangible performance wins) why an incremental hash is better, but I fail to understand why rolling is particularly important?

sipa commented at 4:45 pm on May 26, 2020: member

@gzhao408 I think there is a bit of a miscommunication. The term “rolling” here just means efficient addition as well as deletion from the set being hashed. There is no actual window, as sets are unordered. Perhaps the term was chosen badly (which would be my fault).

All of LtHash, MuHash, and ECMH support incremental addition and deletion. The “non-rolling” mention in @fjahr’s document refers to the fact that he hasn’t implemented continuous UTXO set hashing with these functions, only a from scratch computation. It has nothing to do with what these functions can or can’t do.

The primary difference between MuHash and ECMH is caching:

ECMH is more CPU time overall, but the time is mostly in computing the “effect” of a set of additions/deletions; applying that effect to the overall hash is extremely cheap. Furthermore, a 64-byte precomputed “patch” per set of additions/deletions can be created, which means a patch could be cached per-transaction, and then very cheaply applied when the transaction confirms. So it allows doing most of the work ahead of time (before the block arrives).
MuHash is cheaper overall, but the time is mostly in applying the changes to the overall hash. This means caching isn’t very useful (and also the caches would be 768 bytes, which is pretty large). If the intent is computing things in the background, or at block connection time, then this is a better (and simpler) approach.

glozow commented at 5:08 pm on May 26, 2020: member

AaaAHhhh, thank you @sipa for the clarification!

The term “rolling” here just means efficient addition as well as deletion from the set being hashed. The “non-rolling” mention in @fjahr’s document refers to the fact that he hasn’t implemented continuous UTXO set hashing with these functions

This makes a lot more sense; I had thought he meant the hash function itself is rolling/non-rolling. I didn’t have much info but was super interested and went for a textbook definition 😅 now I feel embarrassed. This information is super helpful, thanks!

fjahr commented at 6:44 pm on May 26, 2020: member

It also seems that ECMH (which is noted as non-rolling) performed better and has a maintainability plus since it’s part of Secp256k1.

Please also note that there is an implementation of ECMH that is an open pull request to Secp256k1 but it is not merged and has been stale for some time. So the work to get it merged is more or less the same and I don’t think there is a big difference in maintainability between having it in core or secp.

sipa commented at 9:14 pm on May 26, 2020: member

@fjahr I’ve written a Python implementation of MuHash3072, so it can be more easily tested: https://github.com/sipa/bitcoin/commits/202005_muhash_python (no tests are included, but I’ve verified it matches the C++ code in a few simple examples).

DrahtBot added the label Needs rebase on May 26, 2020

fjahr force-pushed on May 26, 2020

fjahr commented at 10:22 pm on May 26, 2020: member

@fjahr I’ve written a Python implementation of MuHash3072, so it can be more easily tested: sipa/bitcoin@202005_muhash_python (commits) (no tests are included, but I’ve verified it matches the C++ code in a few simple examples).

Awesome, I have just pulled it in here and will build further tests on top of it.

fjahr commented at 10:28 pm on May 26, 2020: member

Rebased and added a6cf0df104728a587283578735e7f11c5f0f2ae4 which allows the user to use the legacy hash with the use of a flag. This will definitely be squashed but I decided to keep it separate for the moment to make discussions easier in the review club tomorrow. I figured I would need to do something like this anyway in order go through a deprecation cycle with gettxoutsetinfo, the question is if that is still needed with the flag. Also added 4438aed09e87de0afb37da64ffb5e7489084e8ab which is @sipa ’s python implementation of Muhash, currently unused.

DrahtBot removed the label Needs rebase on May 26, 2020

in src/Makefile.am:409 in 4438aed09e outdated

387@@ -388,6 +388,8 @@ crypto_libbitcoin_crypto_base_a_SOURCES = \
388   crypto/hmac_sha512.h \
389   crypto/poly1305.h \
390   crypto/poly1305.cpp \
391+  crypto/muhash.cpp \
392+  crypto/muhash.h \

jonatack commented at 12:31 pm on May 27, 2020:

ced54242 nit: sort

fjahr commented at 11:37 am on May 29, 2020:

done

in src/bench/crypto_hash.cpp:172 in 4438aed09e outdated

165@@ -101,3 +166,8 @@ BENCHMARK(SipHash_32b, 40 * 1000 * 1000);
166 BENCHMARK(SHA256D64_1024, 7400);
167 BENCHMARK(FastRandom_32bit, 110 * 1000 * 1000);
168 BENCHMARK(FastRandom_1bit, 440 * 1000 * 1000);
169+
170+BENCHMARK(MuHash, 5000);
171+BENCHMARK(MuHashPrecompute, 5000);
172+BENCHMARK(MuHashAdd, 5000);
173+BENCHMARK(MuHashDiv, 100);

jonatack commented at 1:30 pm on May 27, 2020:

cda20f3 nit: perhaps sort the functions above and this list, e.g. like

 0$ src/bench/bench_bitcoin -list
 1...
 2MuHash, 5, 5000, 0, 0, 0, 0
 3MuHashAdd, 5, 5000, 0, 0, 0, 0
 4MuHashDiv, 5, 100, 0, 0, 0, 0
 5MuHashPrecompute, 5, 5000, 0, 0, 0, 0
 6
 7$ src/bench/bench_bitcoin -filter=MuHash*.*
 8# Benchmark, evals, iterations, total, min, max, median
 9MuHash, 5, 5000, 0.287426, 7.45603e-06, 1.72302e-05, 1.08339e-05
10MuHashAdd, 5, 5000, 0.243778, 7.93741e-06, 1.17234e-05, 9.61388e-06
11MuHashDiv, 5, 100, 8.62069, 0.0146344, 0.0216452, 0.0169485
12MuHashPrecompute, 5, 5000, 0.0557224, 1.55965e-06, 4.40565e-06, 1.65373e-06

fjahr commented at 11:37 am on May 29, 2020:

done

in src/bench/crypto_hash.cpp:150 in 4438aed09e outdated

145+        key[i] = randkey[i];
146+    }
147+
148+    MuHash3072 muhash = MuHash3072(key);
149+
150+    for (size_t i = 0; i < state.m_num_iters; i++) {

jonatack commented at 1:56 pm on May 27, 2020:

ced54242 and cda20f3f nit: in these four functions above ++i is preferred over i++ per doc/developer-notes.md

5d67c47 same for src/crypto/muhash.h::L71

fjahr commented at 11:37 am on May 29, 2020:

done

in src/node/coinstats.cpp:125 in 4438aed09e outdated

122+    unsigned char out[384];
123+    hash.Finalize(out);
124+    stats.hashSerialized = (TruncatedSHA512Writer(SER_DISK, 0) << out).GetHash();
125+}
126+
127+bool GetUTXOStats(CCoinsView *view, CCoinsStats &stats, const std::function<void()>& interruption_point, bool use_muhash)

jonatack commented at 2:53 pm on May 27, 2020:

a6cf0df1 could add the new bool GetUTXOStats() next to the existing one – to ease review, I moved them together.

fjahr commented at 11:37 am on May 29, 2020:

now out of scope of this PR but will reorg it in the follow-up

jonatack commented at 3:09 pm on May 27, 2020: member

Concept ACK / partial ACK 4438aed. Code review. Did not yet compare the C++ MuHash implementation to the Python one or look for other implementations to verify with. Built, ran tests, ran benchmarks. Verified that minor mutations to muhash.cpp duly broke the new unit test. Tested the new RPC gettxoutsetinfo help and boolean. Some comments:

This PR essentially makes gettxoutsetinfo too slow to be useable in testnet and mainnet (it times out and raises after 15 minutes); for that reason, until the -coinstatsindex in #18000 is merged, the MuHash algorithm should be opt-in for rpc gettxoutsetinfo and not the default
What are your plans regarding ECMH?
it looks like TruncatedSHA512Writer could use unit tests (perhaps just sanity checks if not viable to use test vectors, CHashWriter is also not tested directly but used by other unit tests: addrman, hash, sighash)
a fuzz harness would be ideal
do you plan to add tests using the just-added Python MuHash3072 implementation?
06ae4ab0 commit message s/256/512/?
04d088da commit message could mention the renaming of hash_serialized_2 to utxo_set_hash and why it was changed (I didn’t see any explanation anywhere)
04d088da and a6cf0df ought to be squashed to one commit; keeping them separate unnecessarily complicates reviewing – I squashed them locally in order to see the relevant diff, as ApplyStats() is weird enough to review already
at some point you’ll want to add a release note
some nits below; feel free to ignore

 0$ src/bench/bench_bitcoin -filter=MuHash*.*
 1# Benchmark, evals, iterations, total, min, max, median
 2MuHash, 5, 5000, 0.287426, 7.45603e-06, 1.72302e-05, 1.08339e-05
 3MuHashAdd, 5, 5000, 0.243778, 7.93741e-06, 1.17234e-05, 9.61388e-06
 4MuHashDiv, 5, 100, 8.62069, 0.0146344, 0.0216452, 0.0169485
 5MuHashPrecompute, 5, 5000, 0.0557224, 1.55965e-06, 4.40565e-06, 1.65373e-06
 6
 7$ src/bench/bench_bitcoin -filter=MuHash*.* -evals=10
 8# Benchmark, evals, iterations, total, min, max, median
 9MuHash, 10, 5000, 0.286697, 5.3029e-06, 6.24744e-06, 5.78305e-06
10MuHashAdd, 10, 5000, 0.227103, 4.1665e-06, 5.39662e-06, 4.52542e-06
11MuHashDiv, 10, 100, 12.7459, 0.0106584, 0.0182222, 0.0120859
12MuHashPrecompute, 10, 5000, 0.096089, 1.67035e-06, 2.37696e-06, 1.89207e-06
13
14$ src/bench/bench_bitcoin -filter=MuHash*.* -evals=50
15# Benchmark, evals, iterations, total, min, max, median
16MuHash, 50, 5000, 1.92384, 6.04054e-06, 1.34205e-05, 7.3487e-06
17MuHashAdd, 50, 5000, 1.26626, 3.9807e-06, 5.9453e-06, 5.61147e-06
18MuHashDiv, 50, 100, 59.2743, 0.00950589, 0.0151357, 0.011696
19MuHashPrecompute, 50, 5000, 0.386586, 1.4394e-06, 1.66618e-06, 1.55641e-06

Note to reviewers: you’ll need to build #18000 to build the index and test performance, as it contains the -coinstatsindex that this PR does not yet implement.

jonatack commented at 4:01 pm on May 27, 2020: member

Also verified that, like your added test in rpc_blockchain.py, only utxo_set_hash varies between gettxoutsetinfo in -regtest with each algo.

 0$ bitcoin-cli -regtest gettxoutsetinfo
 1{
 2  "height": 15599,
 3  "bestblock": "6e535bc570f9be1b86d21711b23391e4e8f001682b5c6243883744879cdc4f84",
 4  "transactions": 3216,
 5  "txouts": 3219,
 6  "bogosize": 234987,
 7  "utxo_set_hash": "c4926481130f6689950e07e0e1f2060277d349b106a94b83145d577d3db4d225",
 8  "disk_size": 226234,
 9  "total_amount": 14949.99998350
10}
11$ bitcoin-cli -regtest gettxoutsetinfo true
12{
13  "height": 15599,
14  "bestblock": "6e535bc570f9be1b86d21711b23391e4e8f001682b5c6243883744879cdc4f84",
15  "transactions": 3216,
16  "txouts": 3219,
17  "bogosize": 234987,
18  "utxo_set_hash": "1883e99e56927fd5e3b82ccb9e08acd1be7ed0256d4cafe142626ae2fb66ffb4",
19  "disk_size": 226234,
20  "total_amount": 14949.99998350
21}

narula commented at 5:10 pm on May 27, 2020: contributor

nit: “separate” is misspelled in the commit title of cda20f3f897ff88337756cb0c0345b41cec9014e

It might be nice to split this in two PRs: One which adds MuHash, and one which updates gettxoutsetinfo to support it. I would definitely at least rebase/squash to put all the MuHash additions first and then layer the functionality changes on top.

Edited to add thought: The next PR could add rolling as well, so this way we don’t have an interim state where gettxoutsetinfo is even slower than it is now.

DrahtBot commented at 8:04 pm on May 27, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#20828 (fuzz: Introduce CallOneOf helper to replace switch-case by MarcoFalke)
#19145 (Add hash_type MUHASH for gettxoutsetinfo by fjahr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in src/rpc/blockchain.cpp:975 in 4438aed09e outdated

970@@ -971,7 +971,9 @@ static UniValue gettxoutsetinfo(const JSONRPCRequest& request)
971             RPCHelpMan{"gettxoutsetinfo",
972                 "\nReturns statistics about the unspent transaction output set.\n"
973                 "Note this call may take some time.\n",
974-                {},
975+                {
976+                    {"legacy_hash", RPCArg::Type::BOOL, /* default */ "false", "Whether the UTXO set hash should be calculated using the legacy algorithm (not Muhash)."},

MarcoFalke commented at 11:53 am on May 28, 2020:

0                    {"hash_type", RPCArg::Type::STR, /* default */ "hash_serialized_2", "Which UTXO set hash should be calculated. Options: 'hash_serialized_2' (the legacy algorithm), 'muhash', 'none'."},

How much time is spent on hashing with the legacy hash vs muhash vs none? Imagine someone just wants the total_amount as fast as possible.

If hashing is slow in general, maybe there should be an option to skip it?

fjahr commented at 3:54 pm on May 28, 2020:

It’s a good question but based on my tests (with commented out hashing code) the performance impact of hash_serialized2 is only small. Results were volatile but on average gettxoutsetinfo without the hash was only a few seconds faster (5s - 20s faster, an improvement of ~10%). I think it’s not really worth adding an option for that if we can have the index instead.

fjahr commented at 11:36 am on May 29, 2020:

Thinking about it a little bit more, I think it would be good to have the option for later on when we want to remove the code for hash_serialized_2. So I will add it as an option.

fjahr commented at 8:14 pm on June 5, 2020:

This is now implemented in #19145.

laanwj added this to the "Blockers" column in a project

fjahr force-pushed on May 29, 2020

jonatack commented at 12:51 pm on May 29, 2020: member

It looks like you removed a bunch of code, and the functional tests, in the last push? If yes, review effort is being thrown away, and perhaps the PR description needs to be updated.

fjahr commented at 12:53 pm on May 29, 2020: member

It looks like you removed a bunch of code, and the functional tests, in the last push? If yes, review effort is being thrown away, and perhaps the PR description needs to be updated.

Yes, I am writing an update at the moment

fjahr commented at 12:53 pm on May 29, 2020: member

nit: “separate” is misspelled in the commit title of cda20f3

done

fjahr commented at 1:03 pm on May 29, 2020: member

Thanks for all the reviews so far! I have addressed comments and, based on feedback from the PR review club and others, split it up further to make reviews more manageable and keep discussions more focussed. This now only adds the implementation of Muhash3072 and TruncatedSHA512 in C++. The next in the series is #19105 and will only add the Python implementation. A PR review club on it will give another chance to dive deeper into implementation details of Muhash3072. The third PR will add let the user use gettxoutsetinfo with Muhash, taking feedback into account (WIP). Progress of the different pull requests is now tracked in #18000 (top of the description).

fjahr commented at 1:17 pm on May 29, 2020: member

This PR essentially makes gettxoutsetinfo too slow to be useable in testnet and mainnet (it times out and raises after 15 minutes); for that reason, until the -coinstatsindex in #18000 is merged, the MuHash algorithm should be opt-in for rpc gettxoutsetinfo and not the default

Will do so in the follow-up that adds the option to the RPC.

What are your plans regarding ECMH?

I don’t have plans for it unless there is a reason to reconsider it as the algorithm of choice over Muhash. The branch with the ECMH implementation in secp256k1 is still there if people want to play with it.

it looks like TruncatedSHA512Writer could use unit tests (perhaps just sanity checks if not viable to use test vectors, CHashWriter is also not tested directly but used by other unit tests: addrman, hash, sighash)

I have laid out some thoughts on it [here]( #18000 (comment)) why it is hard to add meaningful unit tests. I am still thinking about a better solution and might add an implementation of SHA512/256. There is a single sanity check at the bottom of the Muhash test at least.

a fuzz harness would be ideal

Yeah, as I mention in my answer to Sjors, I would like to add it as a follow-up.

do you plan to add tests using the just-added Python MuHash3072 implementation?

Yes

06ae4ab commit message s/256/512/?

done

04d088d commit message could mention the renaming of hash_serialized_2 to utxo_set_hash and why it was changed (I didn’t see any explanation anywhere)

I was going back and forth on this, which is probably why it is not mentioned. Initially it was named muhash, then I thought it should be more general name that makes more sense to the user. But now my tendency is back to naming it muhash, since we want to keep hash_serialized_2 for longer and there is no guarantee that there will not be another algo used some time in the future which would mean we would probably need to rename it again in order to avoid confusion.

04d088d and a6cf0df ought to be squashed to one commit; keeping them separate unnecessarily complicates reviewing – I squashed them locally in order to see the relevant diff, as ApplyStats() is weird enough to review already

Yes, done in follow-up pr

at some point you’ll want to add a release note

yepp

fjahr renamed this:
~~Calculate UTXO set hash using Muhash~~
Add MuHash3072 implementation
on Jun 2, 2020

jnewbery commented at 6:29 pm on June 5, 2020: member

I think it’d be easier to review this code if the PR didn’t include the ASM implementation.

fjahr commented at 8:16 pm on June 5, 2020: member

I think it’d be easier to review this code if the PR didn’t include the ASM implementation.

Yeah, I can split that out easily as well.

fjahr force-pushed on Jun 5, 2020

fjahr commented at 8:24 pm on June 5, 2020: member

ASM optimizations moved to https://github.com/bitcoin/bitcoin/pull/19181

gmaxwell commented at 7:22 am on June 7, 2020: contributor

Is there a particular reason for sha512? Although for some sizes it can be faster w/ a totally naive implementation It is a lot slower on hardware with sha-ni (and presumably somewhat slower than a parallel implementation using 8-way avx sha256).

sipa commented at 4:33 pm on June 7, 2020: member

@gmaxwell I assume it’s SHA512 because of my original code used that. I’ve pointed out to @fjahr that we should benchmark if SHA256 isn’t faster these days (on x86_64, I expect it to be). I hadn’t thought of the possibility of parallellizing; that should shift things even further in favor of SHA256.

FWIW, the original reason was that for typical UTXOs, only one SHA512 compression is enough (it’d be over 55 bytes but below 119), and at the time this was first written, a SHA256 compression on x86_64 was less than twice as fast as a SHA512 one.

fjahr force-pushed on Jun 7, 2020

fjahr commented at 3:24 pm on June 8, 2020: member

I finally got around to run these benchmarks and it appears that Truncated512 is still significantly faster on my hardware (Intel(R) Core(TM) i5-6287U CPU @ 3.10GHz). But I have seen some strange benchmarks before on this machine, so it would be great if others could run them as well. I pushed the code in a new commit, I can remove it again if it is not valuable enough to keep.

SHA256 results (100 bytes):

0$ src/bench/bench_bitcoin -filter=SHA256_100b -evals=50
1# Benchmark, evals, iterations, total, min, max, median
2SHA256_100b, 50, 1000000, 38.3667, 6.93112e-07, 9.39267e-07, 7.40312e-07
3SHA256_100b, 50, 1000000, 38.4717, 6.93757e-07, 1.08095e-06, 7.47834e-07
4SHA256_100b, 50, 1000000, 40.7169, 7.07437e-07, 1.11799e-06, 8.01852e-07

TruncatedSHA512Writer results (100 bytes):

0$ src/bench/bench_bitcoin -filter=TruncatedSHA512_100b -evals=50
1# Benchmark, evals, iterations, total, min, max, median
2TruncatedSHA512_100b, 50, 1000000, 21.6684, 4.17094e-07, 4.98784e-07, 4.30721e-07
3TruncatedSHA512_100b, 50, 1000000, 22.3288, 4.17564e-07, 5.26341e-07, 4.46433e-07
4TruncatedSHA512_100b, 50, 1000000, 21.9103, 4.17407e-07, 5.02515e-07, 4.33787e-07

sipa commented at 6:04 pm on June 8, 2020: member

@fjahr It seems the benchmarking framework isn’t calling the SHA256AutoDetect() function, which would enable hardware-accelerated versions of SHA256. See #19214 for a fix.

With that fixed, I get (on my SHA-NI enabled machine):

0$ ./src/bench/bench_bitcoin -filter='.*100b' -evals=50
1# Benchmark, evals, iterations, total, min, max, median
2SHA256_100b, 50, 1000000, 4.73236, 9.27549e-08, 1.35582e-07, 9.35812e-08
3TruncatedSHA512_100b, 50, 1000000, 18.3907, 3.63877e-07, 3.88716e-07, 3.67614e-07

And on a machine without SHA-NI (but with AVX2):

0$ ./src/bench/bench_bitcoin -filter='.*100b.*' -evals=50
1# Benchmark, evals, iterations, total, min, max, median
2SHA256_100b, 50, 1000000, 30.8421, 6.10718e-07, 6.40862e-07, 6.14576e-07
3TruncatedSHA512_100b, 50, 1000000, 28.2434, 5.57851e-07, 5.87092e-07, 5.62717e-07

So it seems @gmaxwell’s intuition above was right, that non-parallel SHA256 is only preferable on SHA-NI enabled machines. If we’d consider parallellizing updates (e.g. cache up 8 added/deleted UTXOs in the MuHash object itself, and then processing them all at once), SHA256 is probably a win over SHA512 on most modern x86_64 systems.

fjahr commented at 8:41 pm on June 8, 2020: member

@sipa Thanks for your help! Then I will change the code to use SHA256 which should simplify work for reviewers as well.

Sjors commented at 10:36 am on June 9, 2020: member

Concept ACK. I’m still worried about a lack of test vectors.

It would be useful to rebase #18000 whenever individual PRs are close to merge-ready. It allows reviewers to sanity check that the end result still produces a blazing fast gettxoutsetinfo and that the index builds in reasonable time.

I’m copying the discussion about TruncatedSHA256Writer unit tests here (if only to get it in the merge commit):

I am currently a bit unsure of what the right way to go is for this. A first observation is that CHashWriter is also untested. I think both these classes are pretty thin wrappers of the actual hash functions they use, so a test might not provide much value. But TruncatedSHA256Writer does a bit more, so I think it could still be valuable to have a sanity check test. I was looking for any test vectors provided for SHA-512/256 and while there are is no official set, at least these could be enough for a sanity check. However, the problem is that for SHA-512/256 different initializers are used. We reuse SHA-512 for simplicity reasons, I guess. It’s easy enough to add the other initializers. I am just not sure it worth the additional review effort. Another fact that I stumbled over, is that the serializer used here is writing the null character of a string into the hash, where I am not sure if that is a bug but it certainly makes working with test vectors hard.

So, I am not sure if adding SHA-512/256 is worth it. And I am also unsure if a test for TruncatedSHA256Writer would be valuable without test vectors and given the only limited logic inside the class.

And your comment above:

I am still thinking about a better solution and might add an implementation of SHA512/256. There is a single sanity check at the bottom of the Muhash test at least.

The performance discussion above suggests dropping 0104b7f33230f8bde09c5a2fb1d8ef4243f2ab1d entirely, which would render the above discussion moot. I’ll wait with reviewing until that’s settled.

in src/crypto/muhash.h:62 in d11910b38f outdated

25+    static constexpr int LIMB_SIZE = 32;
26+#endif
27+    limb_type limbs[LIMBS];
28+};
29+
30+/** A class representing MuHash sets

Sjors commented at 11:07 am on June 9, 2020:

Maybe link to the paper and mailinglist discussion here:

https://cseweb.ucsd.edu/~mihir/papers/inchash.pdf (explains MuHASH in general terms)
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014337.html (picks SHA512 and ChaCha20)

Sjors commented at 2:51 pm on June 9, 2020: member

We should consider switching to the latest ChaCha20 per RFC 8439, which may or may not produce a different MuHash3072. See #19225

in src/crypto/muhash.h:79 in d11910b38f outdated

41+ *
42+ * As the update operations are also associative, H(a)+H(b)+H(c)+H(d) can
43+ * in fact be computed as (H(a)+H(b)) + (H(c)+H(d)). This implies that
44+ * all of this is perfectly parallellizable: each thread can process an
45+ * arbitrary subset of the update operations, allowing them to be
46+ * efficiently combined later.

Sjors commented at 3:22 pm on June 9, 2020:

Let’s add some info about the choice of SHA256 + ChaCha20 here. It essentially compresses the item (e.g. a serialised UTXO) into 3072 bits. Because ChaCha20 doesn’t compress - it’s designed for encryption - we feed it a SHA256() hash; it then decompresses that 256 bits into 3072.

Why 3072 bits? Because it’s safe enough, according to @sipa’s mailinglist post:

0Thankfully, [6] also shows that the k-sum problem cannot be
1efficiently solved in groups in which the discrete logarithm problem
2is hard, as an efficient k-sum solver can be used to compute discrete
3logarithms. As a result, MuHash modulo a sufficiently large safe prime
4is provably secure under the DL assumption. Common guidelines on
5security parameters [7] say that 3072-bit DL has about 128 bits of
6security. A final 256-bit hash can be applied to the 3072-bit result
7without loss of security to reduce the final size.

IIUC we could apply any other digest on a serialised UTXO as long as it produces at least ~3000 bits. Since sha256 and ChaCha20 are well studied an we use them for other things, they do the job.

real-or-random commented at 10:35 am on June 10, 2020:

Because ChaCha20 doesn’t compress - it’s designed for encryption - we feed it a SHA256() hash; it then decompresses that 256 bits into 3072.

I think that argument is a little too ad-hoc for serious crypto. The MuHash paper assumes that elements are first hashed with a hash function that they model as a random oracle (RO). Now while SHA256 with fixed inputs itself is a good choice for something modeled as a RO, this is not immediately clear to me for ChaCha20-3072-bits(key=SHA256(.)), even though that looks okay.

If SHA256 is modeled as a RO and ChaCha20 as an ideal cipher, then this is probably good. And MuHash assumes a RO but AFACT then relies only on the fact that the outputs are uniformly and independently distributed over the target group (see Lemma 3.1 in the paper) and there is no RO-programming or other advanced stuff involved. And indeed, ChaCha20-3072-bits(key=SHA256(.)) should be negligibly close to this, even if we just assume that ChaCha20 is a PRF instead of an ideal cipher. But this requires some more attention and some more eyes.

Sjors commented at 1:00 pm on June 10, 2020:

I think that argument is a little too ad-hoc for serious crypto.

Completely agree. I’m trying to tease out what we know. For the purpose of an index it doesn’t matter too much. We can use MuHash to experiment with this concept, and abandon it for something better. By the time anyone proposes to use this for consensus, we’ll need a much stronger proof.

That said, there should be strongly worded warnings in the header, because other project might prematurely run with this if it “works in Bitcoin”.

Sjors commented at 4:35 pm on June 9, 2020: member

From the PR Review club minutes: https://bitcoincore.reviews/19055.html#l-195

sipa to fjahr: if you’re going to cache the hash for every block… that’s actually an argument in favor of ECMH, as the minimal “state” to keep for ECMH is 33 bytes only, while for MuHash it’s 384 bytes

And above:

The primary difference between MuHash and ECMH is caching:

ECMH is more CPU time overall, but the time is mostly in computing the “effect” of a set of additions/deletions; applying that effect to the overall hash is extremely cheap. Furthermore, a 64-byte precomputed “patch” per set of additions/deletions can be created, which means a patch could be cached per-transaction, and then very cheaply applied when the transaction confirms. So it allows doing most of the work ahead of time (before the block arrives).

MuHash is cheaper overall, but the time is mostly in applying the changes to the overall hash. This means caching isn’t very useful (and also the caches would be 768 bytes, which is pretty large). If the intent is computing things in the background, or at block connection time, then this is a better (and simpler) approach.

We should probably clarify what’s meant with “caching”.

With -coinstatsindex we store the UTXO set hash for every height. This makes the RPC blazing fast. With MuHash it uses ~500 MB for the current chain vs 39 MB with ECMH. The latter is so small I can see us turning that on by default one day (assuming ECMH calculation adds negligible overhead to IBD). But 500 MB isn’t unacceptable for an advanced user who needs gettxoutsetinfo (disclaimer: yours truly needs the index, but has no use for any hash).

When you don’t have the index enabled, MuHash is faster than ECMH, but both are unacceptably slow for anyone using the feature more than once. So in practice I expect anyone who needs this feature to turn on the index.

Assuming this feature is mostly used with an index, then ECMH seems the better option in terms of storage.

Conceptionally I find MuHash very simple and we have all the ingredients here. We might as well use it for the initial -coinstatsindex, and add ECMH later, as well as any briljant other ideas.

real-or-random commented at 12:05 pm on June 10, 2020: member

Another data point is the implementation complexity (code lines etc). If we want arithmetic incl. ASM here, this needs to be maintained whereas for ECMH the basic primitives may just be there already there in secp256k1. Not sure if that makes a large difference but it’s something to keep in mind.

fjahr commented at 1:01 pm on June 10, 2020: member

With -coinstatsindex we store the UTXO set hash for every height. This makes the RPC blazing fast. With MuHash it uses ~500 MB for the current chain vs 39 MB with ECMH. The latter is so small I can see us turning that on by default one day (assuming ECMH calculation adds negligible overhead to IBD). But 500 MB isn’t unacceptable for an advanced user who needs gettxoutsetinfo (disclaimer: yours truly needs the index, but has no use for any hash).

When you don’t have the index enabled, MuHash is faster than ECMH, but both are unacceptably slow for anyone using the feature more than once. So in practice I expect anyone who needs this feature to turn on the index.

Assuming this feature is mostly used with an index, then ECMH seems the better option in terms of storage.

There is no real difference in storage because I am not saving the MuHash3072 object for every block but the finalized hash. I didn’t even think about storage when I made that decision tbh. Intuitively this made more sense to me because if we access historical blocks to get their stats all we would do with the MuHash is finalize it anyway. So we save a little bit of CPU every time we do it. There is currently only one MuHash3072 object maintained at the tip and if there is a reorg I am rolling back the reorged blocks. This takes longer than accessing a stored MuHash3072 object of course but that trade-off seemed worth it to me even without storage because we don’t see too many reorgs these days and it’s really not that long to roll back a block and add another.

On my current mainnet node the whole indexes/coinstats folder is 83 MB large. That includes all the stats, not just the hash. Was the 500 MB number a theoretical calculation or did you see this in practice? If your folder is that large I would need to investigate.

Sjors commented at 1:10 pm on June 10, 2020: member

My calculation was based on 768 bytes of cache, but that’s off by a factor 2. A finalised MuHash3072 uses 3072 bits. So for 640,000 blocks that’s 234 MB.

I suppose you could store just the sha256 hash for historical blocks, only keep 3072 bits for the most recent block, and use that to move forward or roll back. In that case storage is the same as with ECMH. I can’t think of a use case for the full 3072 bits. Initially I was thinking of it as a set, so you potentially query it for the presence of a UTXO, but that’s not actually how it works. So perhaps there really is no point in keeping the full 3072 bits for historical blocks.

fjahr force-pushed on Jun 11, 2020

fjahr commented at 9:54 pm on June 11, 2020: member

Implemented the use of SHA256 and also added some clarification on the “set” question in the docs.

in src/bench/crypto_hash.cpp:7 in fe9ea729fc outdated

3@@ -4,6 +4,7 @@
4 
5 
6 #include <bench/bench.h>
7+#include <crypto/muhash.h>

Sjors commented at 4:49 pm on June 12, 2020:

nit fe9ea729fc67e012de773975018a0f749c03790f: might as well move all bench code to 5675d28f22b3e3745226f40aac023c0d689b5acd

fjahr commented at 10:17 am on June 14, 2020:

done

in src/hash.h:215 in fe9ea729fc outdated

211@@ -212,7 +212,6 @@ class SHA256Writer
212 
213     uint256 GetHash() {
214         uint256 result;
215-        unsigned char out[32];

Sjors commented at 4:59 pm on June 12, 2020:

unsigned char out[32]; is added in 106f9148848ffa64b433f1bc2dd6980930a4cb02 and dropped in fe9ea729fc67e012de773975018a0f749c03790f

fjahr commented at 10:17 am on June 14, 2020:

done

Sjors approved

Sjors commented at 6:06 pm on June 12, 2020: member

ACK 5675d28

Most of the code in muhash.cpp is for basic 3072 bit modular arithmetic (Num3072). It’s nice that Python handles that out of the box in #19105, which gives some confidence the c++ code is correct. I assume @sipa didn’t find a generic BigInt library that can do the job? Well, we can always split it into a library ourselves one day…

sipa commented at 6:09 pm on June 12, 2020: member

@Sjors If you mean a bigint library for the C++ side, sure - my original email points out that GMP is faster than this code, but I don’t want to add a new dependency for this. The Python code uses Python’s native bigint support.

Sjors commented at 6:18 pm on June 12, 2020: member

a bigint library for the C++ side

That’s what I meant. I agree adding all of GMP is overkill.

real-or-random commented at 7:17 pm on June 12, 2020: member

There’s some discussion about tradeoffs and here’s another small point to consider. MuHash3072 introduces cryptographic assumptions that are not used in Bitcoin yet. This is unusual but not a big thing given these are very mild assumptions (i.e., either finite field DL is hard enough or Wagner’s algorithm is the best method to find collisions here), and most importantly, this is currently just intended to be used for sanity checks. Still, everything else equal, EC-MuHash would not introduce new assumptions. If this functionality will ever be used in relation with chain validation, one should keep this in mind.

sipa commented at 1:57 am on June 13, 2020: member

@gmaxwell Solving the generalized birthday problem is provably as hard as the DL problem in the same group (but it may be much harder as well), I think. So under the assumption that DL is hard in secp256k1, then the generalized birthday problem is hard over it, and that implies collision resistance for the ECMH hash based on it.

real-or-random commented at 9:39 am on June 13, 2020: member

Sorry, some theory crypto stuff ahead…

I find the claim that the EC alternative does not introduce a new cryptographic assumption implausible: You can’t break a discrete log by finding a number of random points that you do not know the discrete log of that add to a selected point.

Oh, in the (programmable) random oracle model, you can! It’s in fact straight forward and it’s buried in the proof of Lemma 3.1 in the MuHash paper. I missed that this Lemma relies on programming the random oracle when I wrote #19055 (review). It’s not immediately obvious because they split the proof in Lemma 3.1 (reducing collisions to balance problem) and Lemma B.2 (reducing balance problem to DL). Say you’re given an algorithm C that finds collisions in MuHash, and it needs to access the prehashing function H as a random oracle. Then when C asks you for H(q_i) for some query q_i, you can draw a random r_i, set a_i = r_i*G, and reply with H(q_i) = a_i. Then you know the DL of all the a_i, and you can continue from that.

However this requires modeling the prehashing function as a programmable random oracle. In our case, this function is ChaCha20-3072-bits(key=SHA256(.)). This means that we should really have a closer look at this function. I still believe that this is as good as a random oracle if we assume SHA256 is a random oracle and ChaCha20 is an ideal cipher, and this may very well be clear to someone who has worked on random oracle indifferentiability and the ideal cipher model but I haven’t, so I’d need to do some reading or ask people who are more familiar with this stuff.

fjahr force-pushed on Jun 14, 2020

fjahr commented at 10:18 am on June 14, 2020: member

Addressed @Sjors comments.

in src/crypto/muhash.h:80 in d111135a3c outdated

73@@ -73,6 +74,13 @@ class MuHash3072
74 
75     /* Finalize into a 384-byte hash. Does not change this object's value. */
76     void Finalize(unsigned char* hash384) noexcept;
77+
78+    SERIALIZE_METHODS(MuHash3072, obj)
79+    {
80+        for(int i = 0; i < obj.LIMBS; ++i) {

promag commented at 11:37 pm on June 14, 2020:

d111135a3c928d1915f48cb4d46ca89a3d179686

nit, space after for.

fjahr commented at 2:35 pm on June 15, 2020:

done

in src/crypto/muhash.h:12 in d111135a3c outdated

 8@@ -9,6 +9,7 @@
 9 #include <config/bitcoin-config.h>
10 #endif
11 
12+#include <serialize.h>

promag commented at 11:39 pm on June 14, 2020:

d111135a3c928d1915f48cb4d46ca89a3d179686

nit, newline after, usually project headers are split from others.

fjahr commented at 2:35 pm on June 15, 2020:

done

in src/crypto/muhash.cpp:245 in 4b493677ab outdated

174+    muln2(c0, c1, MAX_PRIME_DIFF);
175+    for (int j = 0; j < Num3072::LIMBS; ++j) {
176+        add2(c0, c1, tmp.limbs[j]);
177+        extract2(c0, c1, out->limbs[j]);
178+    }
179+    assert(c1 == 0);

promag commented at 11:41 pm on June 14, 2020:

4b493677ab2c1d90cb8b11659660d18c3f61171f

Should these assertions be conditional to #ifdef DEBUG? Same in Multiply.

fjahr commented at 2:35 pm on June 15, 2020:

Makes sense to me. Added.

in src/crypto/muhash.cpp:260 in 4b493677ab outdated

227+    Multiply(&x, &x, &p[0]);
228+
229+    *out = x;
230+}
231+
232+}

promag commented at 11:41 pm on June 14, 2020:

4b493677ab2c1d90cb8b11659660d18c3f61171f

nit, add // namespace.

fjahr commented at 2:35 pm on June 15, 2020:

done

in src/hash.h:200 in 18c4d69f8d outdated

195+class SHA256Writer
196+{
197+private:
198+    CSHA256 ctx;
199+
200+    const int nType;

promag commented at 11:55 pm on June 14, 2020:

18c4d69f8d771aa596a321cf1d6c8e6e106d42a9

nit, should fix format.

fjahr commented at 2:35 pm on June 15, 2020:

done

real-or-random commented at 11:45 am on June 15, 2020: member

However this requires modeling the prehashing function as a programmable random oracle. In our case, this function is ChaCha20-3072-bits(key=SHA256(.)). This means that we should really have a closer look at this function. I still believe that this is as good as a random oracle if we assume SHA256 is a random oracle and ChaCha20 is an ideal cipher, and this may very well be clear to someone who has worked on random oracle indifferentiability and the ideal cipher model but I haven’t, so I’d need to do some reading or ask people who are more familiar with this stuff.

Okay, this is indeed pretty direct from the ideal cipher model, which is not less a stretch than the random oracle model in the end. In the ideal cipher model, the cipher is modeled as a function E(k,x), which is a perfectly random permutation for every key k, and by giving the attacker oracle access to this function (and it’s inverse). So it’s very similar to the random oracle model for some hash function H and you can play the same programming tricks. That is, if you get a query H(q_i), you can use a random H(q_i) and program E(H(q_i), 0)|| … ||E(H(x), 5) = r_i * G. (Note here that 3072/512 = 6, where 512 is the bitsize of the ChaCha permutation.) And if you get a query E(k, x) / E^-1(k, x) for some key k* you’ve never seen before, just give a random answer, the attacker will find a preimage of H that maps to k* only with negligible probability.

tl;dr (or if I lost you): If you believe in provable security of cryptographic constructions, and we compare to the common assumptions we already need for working consensus, then

ECMH (= MuHash on secp256k1) needs the additional assumption that ChaCha20 is a good PRG.
MuHash3072 needs the additional assumption that ChaCha20 is a good PRG and that either i) the finite field discrete logarithm is hard in 3072-bit fields or ii) Wagner’s algorithm is the best method to find collisions here.

As I all these assumptions are fine. However, there’s indeed a difference, so

If this functionality will ever be used in relation with chain validation, one should keep this in mind.

fjahr force-pushed on Jun 15, 2020

fjahr commented at 2:36 pm on June 15, 2020: member

Addressed @promag ’s review comments.

laanwj commented at 1:06 pm on June 18, 2020: member

That’s what I meant. I agree adding all of GMP is overkill.

Also mind that GMP is not license-compatible with bitcoin (it’s LGPL). So, that’s another stumbling block besides our desire to limit third-party dependencies.

As I all these assumptions are fine. However, there’s indeed a difference, so

I have a hard time reading how serious your concerns are. So to be clear: in your opinion, this is not a blocker for merging this?

in src/crypto/muhash.cpp:21 in 3585297e5e outdated

16+
17+/** 2^3072 - 1103717, the next largest 3072-bit safe prime number, is used as the order of the group. */
18+#define MAX_PRIME_DIFF 1103717
19+
20+/** Extract the lowest limb of [c0,c1,c2] into n, and left shift the number by 1 limb. */
21+#define extract3(c0,c1,c2,n) { \

laanwj commented at 1:07 pm on June 18, 2020:

Why are these implemented as macros instead of (inline) functions?

fjahr commented at 9:25 pm on July 1, 2020:

I am not aware of a reason why inline functions couldn’t be used here. From my side, the only reason is that this is @sipa ’s original code and I did not feel so strongly about it that it justified a bigger change. Of course, I will do it if that’s preferred by reviewers.

sipa commented at 9:39 pm on July 1, 2020:

It may have been that I copied this from code that was originally written for C.

Feel free to change it; inline functions would be far more C++ish.

real-or-random commented at 12:47 pm on June 19, 2020: member

I have a hard time reading how serious your concerns are. So to be clear: in your opinion, this is not a blocker for merging this?

Sorry if my posts are confusing. I don’t have serious concerns at all and no, I don’t believe this is a blocker.

Let me try to explain. After I looked at the details, I believe that both MuHash3072 and ECMH (= MuHash on secp256k1) are very solid choices and we do not need to worry about either. What I was pointing out is that ECMH is a slightly more conservative choice if you look at the cryptographic assumptions because ECMH is better aligned with the assumptions that we already make in Bitcoin. But this is just a very small point in all the tradeoffs discussed between MuHash3072 and ECMH, and it’s perfectly reasonable that we prefer performance over being super conservative here, in particular because MuHash is intended for gettxoutsetinfo, which serves merely as a sanity check. I believe if MuHash was planned to be used in the consensus implementation, then we might come to the conclusion that we prefer being a little more conservative over getting a little bit more performance out of it. But as far as I understand, there are currently no plans to use MuHash in consensus, so this is not relevant currently.

practicalswift commented at 1:28 pm on June 22, 2020: contributor

Could rebase on top of #19286 and add fuzzing of the MuHash3072 class to the src/test/fuzz/crypto.cpp fuzzing harness in order to demonstrate (some level of) code robustness (more specifically: absence of warnings from ASan, MSan and UBSan when processing malformed/unexpected inputs)? :)

fjahr force-pushed on Jul 1, 2020

fjahr commented at 9:06 pm on July 1, 2020: member

Could rebase on top of #19286 and add fuzzing of the MuHash3072 class to the src/test/fuzz/crypto.cpp fuzzing harness in order to demonstrate (some level of) code robustness (more specifically: absence of warnings from ASan, MSan and UBSan when processing malformed/unexpected inputs)? :)

Done. I am still learning more about fuzzing so looking forward to your feedback. :)

in src/test/fuzz/crypto.cpp:37 in 62e842c15e outdated

33@@ -33,6 +34,7 @@ void test_one_input(const std::vector<uint8_t>& buffer)
34     CSHA256 sha256;
35     CSHA512 sha512;
36     CSipHasher sip_hasher{fuzzed_data_provider.ConsumeIntegral<uint64_t>(), fuzzed_data_provider.ConsumeIntegral<uint64_t>()};
37+    MuHash3072 muhash = MuHash3072();

practicalswift commented at 9:40 pm on July 1, 2020:

Nit: MuHash3072 muhash; is enough :)

jnewbery commented at 10:00 pm on July 1, 2020:

Brevity is the soul of wit

fjahr commented at 2:40 pm on July 10, 2020:

done

in src/test/fuzz/crypto.cpp:65 in 62e842c15e outdated

59@@ -58,6 +60,14 @@ void test_one_input(const std::vector<uint8_t>& buffer)
60             (void)Hash160(data);
61             (void)Hash160(data.begin(), data.end());
62             (void)sha512.Size();
63+
64+            // MuHash3072 only accepts a fixed key length of 32 bytes
65+            data.resize(32);

practicalswift commented at 9:46 pm on July 1, 2020:

To not interfere with the other fuzzers by changing the data size, what about either creating a new case 3 specifically for this MuHash3072 code (the resize and the lines immediately below), or alternatively do something along the lines of std::vector<uint8_t> muhash_data = data; + muhash_data.resize(32);?

fjahr commented at 2:41 pm on July 10, 2020:

done, opted for your second suggestion.

in src/test/fuzz/crypto.cpp:81 in 62e842c15e outdated

76@@ -67,10 +77,11 @@ void test_one_input(const std::vector<uint8_t>& buffer)
77             (void)sha1.Reset();
78             (void)sha256.Reset();
79             (void)sha512.Reset();
80+            muhash = MuHash3072();

practicalswift commented at 9:50 pm on July 1, 2020:

Make sure you also exercise the other constructor MuHash3072::MuHash3072(const unsigned char* key32). Perhaps ConsumeBool() to choose which constructor to use?

Generally a good thing to do when testing a fuzzer is to add assert(false); to all code paths you want to reach and then fuzz/tweak/repeat until you’ve reached them all.

In this case we want to make sure that relevant code paths in the following functions are covered:

0MuHash3072::MuHash3072() noexcept
1MuHash3072::MuHash3072(const unsigned char* key32) noexcept
2void MuHash3072::Finalize(unsigned char* hash384) noexcept
3MuHash3072& MuHash3072::operator*=(const MuHash3072& x) noexcept
4MuHash3072& MuHash3072::operator/=(const MuHash3072& x) noexcept

fjahr commented at 2:43 pm on July 10, 2020:

That constructor is used for the MuHash objects that get added or removed. I checked all the public functions as you suggested and they seem to all be covered.

practicalswift changes_requested

practicalswift commented at 9:53 pm on July 1, 2020: contributor

Thanks a lot for adding a fuzzer! Great!

Fuzzing harness specific review follows :)

Sjors commented at 11:34 am on July 6, 2020: member

re-ACK 62e842c15eeb7af5d195200d6a605113f16a7e39 modulo “not interfere with the other fuzzers by changing the data size”. Complete code coverage can wait for a followup (it already covers the basics).

fjahr force-pushed on Jul 10, 2020

fjahr commented at 2:43 pm on July 10, 2020: member

Addressed @practicalswift ’s feedback and turned the macros into inline functions.

practicalswift commented at 7:03 pm on July 10, 2020: contributor

@fjahr Thanks for addressing the fuzzing feedback. The fuzzing changes look good (src/test/fuzz/crypto.cpp). (I haven’t reviewed the rest yet.)

fjahr force-pushed on Jul 11, 2020

fjahr commented at 9:23 pm on July 11, 2020: member

Replaced the first commit which added SHA256Writer with commit 284ca85e3fc4d30f60e7fbb903303b4dc935d82c from #17977 (Taproot) which provides the same functionality in fewer LOC. It uses CHashWriter instead of creating a new class, which is nice, and if this gets merged before #17977 there is one less commit to review.

in src/hash.h:143 in 00a578e9bc outdated

160      */
161     inline uint64_t GetCheapHash() {
162-        unsigned char result[CHash256::OUTPUT_SIZE];
163-        ctx.Finalize(result);
164-        return ReadLE64(result);
165+        uint256 result = GetHash();

Sjors commented at 1:06 pm on July 16, 2020:

Shouldn’t GetCheapHash be calling GetSHA256? I.e. “Cheap” refers to a single rather than a double hash. cc @JeremyRubin @sipa

JeremyRubin commented at 4:49 pm on July 24, 2020:

No because it’s a behavior change :/

If someone wants to make sure that we don’t rely on cheaphashes being consistent across boots then yes.

sipa commented at 4:52 pm on July 24, 2020:

Addrman’s bucketing relies on it.

With a new addrman version we could change it to siphash instead, which would let us get rid of GetCheapHash entirely.

Sjors commented at 1:10 pm on July 16, 2020: member

bb3098bbaf1305aa9091ee5e3cd92f2973e04c68 looks good, but the new hash writer commit does raise a question.

Sjors commented at 10:23 am on July 24, 2020: member

Let’s try to get #18071 in first (for the first commit)

laanwj commented at 12:57 pm on July 30, 2020: member

Code review ACK bb3098bbaf1305aa9091ee5e3cd92f2973e04c68 I have reviewed everything but the MuHash implementation in detail, and looked broadly at the MuHash code but could not follow all the specifics.

Let’s try to get #18071 in first (for the first commit)

Mind that there doesn’t seem to be agreement on that PR yet.

in src/crypto/muhash.cpp:215 in bb3098bbaf outdated

206+        Multiply(&p[i + 1], &p[i + 1], &p[i]);
207+    }
208+
209+    x = p[11];
210+
211+    for (int j = 0; j < 512; ++j) Square(&x, &x);

laanwj commented at 1:21 pm on July 30, 2020:

It would be nice to document here in a comment why this particular cascade is used, and how it works. It is not obvious to me.

fjahr commented at 3:36 pm on August 23, 2020:

I have added a reference to a paper that explains this algorithm. Unfortunately, I think there is no publicly accessible version of that paper that we can link to here. The paid version is here: https://link.springer.com/chapter/10.1007/978-3-540-69485-4_10.

DrahtBot added the label Needs rebase on Jul 30, 2020

fjahr force-pushed on Aug 10, 2020

DrahtBot removed the label Needs rebase on Aug 10, 2020

laanwj commented at 2:14 pm on August 20, 2020: member

Let’s try to get #18071 in first (for the first commit)

It’s closed now in favor of #19601.

fjahr force-pushed on Aug 23, 2020

fjahr force-pushed on Aug 27, 2020

fjahr commented at 9:55 am on August 27, 2020: member

Rebased since #19601 was merged.

laanwj commented at 2:35 pm on August 27, 2020: member

Code review re-ACK 24317aa5a1bd91b90e5971016e842b58f0b4ba62 Thanks for adding a comment to the Inverse function.

DrahtBot added the label Needs rebase on Sep 10, 2020

fjahr force-pushed on Sep 11, 2020

fjahr commented at 11:15 am on September 11, 2020: member

Rebased

DrahtBot removed the label Needs rebase on Sep 11, 2020

in src/crypto/muhash.h:82 in 4cb77c70d3 outdated

74@@ -73,6 +75,13 @@ class MuHash3072
75 
76     /* Finalize into a 384-byte hash. Does not change this object's value. */
77     void Finalize(unsigned char* hash384) noexcept;
78+
79+    SERIALIZE_METHODS(MuHash3072, obj)
80+    {
81+        for (int i = 0; i < obj.LIMBS; ++i) {
82+            READWRITE(obj.limbs[i]);

ajtowns commented at 2:05 am on September 16, 2020:

I think this should be obj.data.LIMBS and obj.data.limbs[i] ? Can add:

0#include <streams.h>
1...
2    CDataStream ss(SER_DISK, PROTOCOL_VERSION);
3    ss << acc;

to exercise the code.

If this is being serialized by directly dumping the limbs, is there a problem if the code that serializes it uses uint64_t limbs and the code that deserializes uses uint32_t limbs? I think we read/write everything in little endian and limbs[0] is the least-significant limb so this works out okay.

Might be good to add a test that serializing some constant ends up as a predictable constant, and deserializes back to the same thing, something like:

 0    MuHash3072 serchk = FromInt(1); serchk *= FromInt(2);
 1    std::string ser_exp = "ad5d5a19c789b21f95f9c2a09f264f8aab4ef29ca30824f7330a8c3f4c61c25a6035bc27476ade3a38f2f2bc576bfbeabdf69641a91b99631ef904d37103eff744129842b1fd83158db6466ab4b752278f04add175a7a41ab8ea1305f66855068730dd3baf1ad29d6e13c7ba3350864a8f96e2e9e5c1ca65e11ab3e361184213a3fb395ac5c86efe3997c671cc5d46df3e1b2f00bf09120ae252a0e985b2452be64d5f626c0bc03747355d91ff2aa31ea4087797b91c80af5003d4b3a2ba5c34cd4a8785712897cd2331bac1b749e4fe9d17a5bb68395e2571d1759dceb5609dfae38ec6186e8c2eb22ffeb9bb8330083ee15d4b816e117e6bc1399c707b3e9d01ae74d2f589d6eb8e9e58c572088476e0444e3dbc05c95867acdb7c18be551e0eb2189d3c39dd86dbc844dc70ac33f9328be9e3b0d792bb359498f5fb385e2d8a8e3a5cde7e3e4e8ba123d3e1eda0b859d3a56e548dd6d23a03ce82e01493f96a67abd3109f9f4315c966f8cba311d71899c04a7fd9618ed028580ffcd21263";
 2    CDataStream ss_chk(SER_DISK, PROTOCOL_VERSION);
 3    ss_chk << serchk;
 4    BOOST_CHECK_EQUAL(ser_exp, HexStr(ss_chk.str()));
 5
 6    MuHash3072 deserchk;
 7    ss_chk >> deserchk;
 8    unsigned char out2[384];
 9    serchk.Finalize(out); deserchk.Finalize(out2);
10    BOOST_CHECK_EQUAL(HexStr(out), HexStr(out2));

fjahr commented at 10:21 pm on September 21, 2020:

Thanks, this was a bug. I also added that test in an additional commit.

in src/crypto/muhash.h:66 in c540f8edc1 outdated

61+public:
62+    /* The empty set. */
63+    MuHash3072() noexcept;
64+
65+    /* A singleton with a single 32-byte key in it. */
66+    explicit MuHash3072(const unsigned char* key32) noexcept;

ajtowns commented at 2:55 am on September 16, 2020:

Might be better to accept a Span rather than having an implicit length? Also for Finalize perhaps – sha3.cpp does it that way.

fjahr commented at 10:18 pm on September 21, 2020:

Makes sense. Done.

in src/crypto/muhash.h:47 in c540f8edc1 outdated

42+ * in fact be computed as (H(a)+H(b)) + (H(c)+H(d)). This implies that
43+ * all of this is perfectly parallellizable: each thread can process an
44+ * arbitrary subset of the update operations, allowing them to be
45+ * efficiently combined later.
46+ *
47+ * This class does not enforce the use of a set as the data it represents.

ajtowns commented at 3:05 am on September 16, 2020:

Might be worth clarifying that muhash does not support checking if an element is a member of the set, and therefore it’s not possible to (efficiently) enforce that you don’t add members that are already in the set, or remove members that aren’t in the set?

Might also be worth being more explicit that the “represent the running value as a fraction” optimisation is a TODO.

fjahr commented at 10:18 pm on September 21, 2020:

Updated that part a bit, I hope this is more clear.

in src/test/crypto_tests.cpp:898 in c540f8edc1 outdated

893+
894+        MuHash3072 x = FromInt(g_insecure_rand_ctx.randbits(4)); // x=X
895+        MuHash3072 y = FromInt(g_insecure_rand_ctx.randbits(4)); // x=X, y=Y
896+        MuHash3072 z; // x=X, y=Y, z=1
897+        z *= x; // x=X, y=Y, z=X
898+        z /= y; // x=X, y=Y, z=X/Y

ajtowns commented at 3:12 am on September 16, 2020:

The headers suggest that removing things that aren’t in the set is a bad idea (“not clear if the security assumptions still hold”, “should for now only be used to represent a set of elements”) so seems a bit odd to be doing it in the tests. Could convert it to z *= x; z *= y; y *= x; z /= y; to only add things that aren’t present/remove things that are present.

I think the comment in the header doesn’t make sense though – as I understand it, it should be equally fine (in security terms even) to remove a set by saying:

0    MuHash3072 to_remove;
1    for (x : spent) { to_remove *= MuHash3072(x); }
2    set /= to_remove;

and

0    MuHash3072 changes;
1    for (x : arrived) { changes *= MuHash3072(x); }
2    for (x : spent) { changes /= MuHash3072(x); }
3    set *= changes;

even though that’s removing things from changes before they’ve been added.

fjahr commented at 10:18 pm on September 21, 2020:

Yeah, I think this makes sense. From what I have read I think there was just no research done on the security of this. I think it makes sense to be defensive but I amended the comment that this would just not the intended use. I also updated the test to follow this.

ajtowns commented at 3:20 am on September 16, 2020: member

Haven’t checked the bignum math or that the muhash impl matches the paper.

in src/crypto/muhash.cpp:114 in c540f8edc1 outdated

109+bool IsOverflow(const Num3072* d)
110+{
111+    for (int i = 1; i < Num3072::LIMBS - 1; ++i) {
112+        if (d->limbs[i] != std::numeric_limits<Num3072::limb_type>::max()) return false;
113+    }
114+    if (d->limbs[0] <= std::numeric_limits<Num3072::limb_type>::max() - MAX_PRIME_DIFF) return false;

ajtowns commented at 4:56 am on September 16, 2020:

Might as well move the d->limbs[0] case prior to the for loop.

fjahr commented at 10:17 pm on September 21, 2020:

Done

in src/crypto/muhash.cpp:111 in c540f8edc1 outdated

106+    c1 += (c0 < a) ? 1 : 0;
107+}
108+
109+bool IsOverflow(const Num3072* d)
110+{
111+    for (int i = 1; i < Num3072::LIMBS - 1; ++i) {

ajtowns commented at 5:12 am on September 16, 2020:

I think the for loop should be i = 1; i < LIMBS; ++i not LIMBS-1?

Might be clever to have data be protected rather than private so it can be initialised directly to edge cases via a subclass in the unit tests, without having to find a chacha-primage for them.

fjahr commented at 10:24 pm on September 21, 2020:

Yeah, that makes sense. I also took your suggestion and added a basic test that executes the IsOverflow function which wasn’t covered so far.

in src/crypto/muhash.cpp:162 in c540f8edc1 outdated

157+#endif
158+    /* Perform a potential third reduction. */
159+    if (c0) FullReduce(out);
160+}
161+
162+void Square(Num3072* out, const Num3072* a)

ajtowns commented at 5:28 am on September 16, 2020:

This is only ever called with out == a, so could take a single argument. Similarly for Multiply which is also always in-place.

fjahr commented at 10:17 pm on September 21, 2020:

Done

in src/crypto/muhash.cpp:15 in c540f8edc1 outdated

10+#include <assert.h>
11+#include <stdio.h>
12+
13+#include <limits>
14+
15+namespace {

ajtowns commented at 6:47 am on September 16, 2020:

Dropping all the Num3072:: prefixes might make the code a little less noisy:

0using limb_type = Num3072::limb_type;
1using double_limb_type = Num3072::double_limb_type;
2constexpr int LIMB_SIZE = Num3072::LIMB_SIZE;
3constexpr int LIMBS = Num3072::LIMBS;

(also, wouldn’t limb_t be more normal than limb_type?)

Probably overkill, but maybe consider checking the Num3072 constants make sense:

0static_assert(LIMB_SIZE * LIMBS == 3072, "Num3072 isn't 3072 bits");
1static_assert(sizeof(double_limb_type) == sizeof(limb_type) * 2, "bad size for double_limb_type");
2static_assert(sizeof(limb_type) * 8 == LIMB_SIZE, "LIMB_SIZE is incorrect");
3
4// hard coded values in MuHash3072 constructor and Finalize
5static_assert(sizeof(limb_type) == 4 || sizeof(limb_type) == 8, "bad size for limb_type");

fjahr commented at 10:17 pm on September 21, 2020:

I think all these suggestions are sensible. Added.

ajtowns changes_requested

ajtowns commented at 7:14 am on September 16, 2020: member

Didn’t fully dig into Multiply() and Square(), but they seem reasonable. The referenced paper for Inverse() seems to be behind a paywall, and there’s more exponentiation than I can follow unguided, so no review of that, but the fact the test cases pass seems promising at least.

I think there’s a bug in both IsOverflow and serialization (see above); everything else is nits. Would be nice to have more test vectors checking that the c++ and python code end up with the same results. ACK otherwise.

DrahtBot added the label Needs rebase on Sep 16, 2020

fjahr force-pushed on Sep 21, 2020

DrahtBot removed the label Needs rebase on Sep 21, 2020

fjahr commented at 10:32 pm on September 21, 2020: member

Thanks a lot for the review @ajtowns . I hope I have addressed all of your feedback.

Would be nice to have more test vectors checking that the c++ and python code end up with the same results. ACK otherwise.

There are more tests like this coming in the follow-up #19145 . Unless you disagree I would prefer to keep the scope of this PR were it is :)

in src/crypto/muhash.cpp:210 in ef53ca76cc outdated

205+
206+void Inverse(Num3072* out, const Num3072* a)
207+{
208+    // For fast exponentiation a sliding window exponentiation with repunit
209+    // precomputation is utilized. See "Fast Point Decompression for Standard
210+    // Elliptic Curves" (Brumley, Järvinen, 2008).

ajtowns commented at 3:30 pm on September 22, 2020:

Confirmed that this is calculating out = a ** (2**3072 - MAX_PRIME_DIFF - 2) by noting that all the operations are multiplicative and thus you can take the log of everything in base a, giving:

 0>>> p = [2**(2**i)-1 for i in range(12)]
 1>>> def double_n_add(x, n, pn):
 2...     for j in range(n): x *= 2
 3...     return x + p[pn]
 4... 
 5>>> x = p[11]
 6>>> x = double_n_add(x, 512, 9)
 7>>> x = double_n_add(x, 256, 8)
 8>>> x = double_n_add(x, 128, 7)
 9>>> x = double_n_add(x,  64, 6)
10>>> x = double_n_add(x,  32, 5)
11>>> x = double_n_add(x,   8, 3)
12>>> x = double_n_add(x,   2, 1)
13>>> x = double_n_add(x,   1, 0)
14>>> x = double_n_add(x,   5, 2)
15>>> x = double_n_add(x,   3, 0)
16>>> x = double_n_add(x,   2, 0)
17>>> x = double_n_add(x,   4, 0)
18>>> x = double_n_add(x,   4, 1)
19>>> x = double_n_add(x,   3, 0)
20>>> x == 2**3072 - 1103717 - 2
21True

jnewbery commented at 4:56 pm on October 18, 2020:

This is really helpful, and makes it easier to understand what the algorithm is doing.

I wonder if converting the c++ code to use a DOUBLE_N_ADD() macro and adding a comment on how to audit the numbers would make it easier to follow the code.

jnewbery commented at 4:57 pm on October 18, 2020:

Is there anywhere to get hold of this paper, or a write-up of this technique?

fjahr commented at 10:42 pm on October 18, 2020:

I did spend quite some time looking for something but the paper is only available behind a paywall and a friend told me there are other sources but that we might not want to link to those. I didn’t find any summaries or so on the topic so if we want more details I would need to write something up myself I think.

ajtowns commented at 2:01 am on October 19, 2020:

The C++ code would be square_n_mul. Just having an inline function to calculate x**(2**n) – ie the “square n times” part would probably already make it easier to follow.

ajtowns commented at 2:07 am on October 19, 2020:

This is just an explanation of how the constants were chosen in the first place. I think it’s only interesting in the same way references in an academic paper help trace the lineage of an idea rather than actually useful for reviewing the code as it stands – ie, handy if you want to reinvent it, eg if 3072 bits aren’t enough and you need to pick a new prime and hence new constants for efficient inversion, or if you want to see if you can find different constants that would involve even fewer square/multiply operations to get the same result. But not needed if you just want to check correctness.

jnewbery commented at 9:17 am on October 19, 2020:

Agree. I don’t think there’s anything more needed here.

elichai commented at 3:44 pm on October 22, 2020:

@sipa When safegcd? ;)

elichai commented at 3:59 pm on October 22, 2020:

The paper if anyone wants: https://sci-hub.do/10.1007/978-3-540-69485-4_10 as for the technique AFAIU(didn’t review the actual code here) it’s simply fermat little theorem (a^p-2=1/a) together with a simple square-and-multiply algorithm (in elliptic curves it’s called double-and-add) Good references: https://en.wikipedia.org/wiki/Exponentiation_by_squaring https://briansmith.org/ecc-inversion-addition-chains-01

sipa commented at 4:04 pm on October 22, 2020:

@elichai That’s the easy part. It’s specifically using a technique here to construct an efficient exponentiation ladder for an exponent with many consecutive 1s.

elichai commented at 4:07 pm on October 22, 2020:

@sipa you’re right. I ignored how you came up with that specific ladder because I didn’t actually review the ladder itself (ops)

in src/test/fuzz/crypto.cpp:136 in f9b3c4bf06 outdated

134@@ -122,6 +135,9 @@ void test_one_input(const std::vector<uint8_t>& buffer)
135             case 9: {
136                 data.resize(SHA3_256::OUTPUT_SIZE);
137                 sha3.Finalize(data);
138+            case 10: {

ajtowns commented at 4:18 pm on September 22, 2020:

Missing } for case 9.

fjahr commented at 0:03 am on October 3, 2020:

done

in src/test/fuzz/crypto.cpp:140 in f9b3c4bf06 outdated

134@@ -122,6 +135,9 @@ void test_one_input(const std::vector<uint8_t>& buffer)
135             case 9: {
136                 data.resize(SHA3_256::OUTPUT_SIZE);
137                 sha3.Finalize(data);
138+            case 10: {
139+                data.resize(384);
140+                muhash.Finalize(data.data());

ajtowns commented at 4:18 pm on September 22, 2020:

Finalize(data)

fjahr commented at 0:02 am on October 3, 2020:

done

in src/test/fuzz/crypto.cpp:73 in f9b3c4bf06 outdated

68+            muhash_data.resize(32);
69+
70+            if (fuzzed_data_provider.ConsumeBool()) {
71+                muhash *= MuHash3072(muhash_data.data());
72+            } else {
73+                muhash /= MuHash3072(muhash_data.data());

ajtowns commented at 4:18 pm on September 22, 2020:

MuHash3072(muhash_data)

fjahr commented at 0:02 am on October 3, 2020:

done

ajtowns commented at 4:20 pm on September 22, 2020: member

Looks good, fuzzer tests need fixing though.

ajtowns approved

fjahr force-pushed on Oct 3, 2020

fjahr commented at 0:02 am on October 3, 2020: member

Pushed fuzz test fixes, thanks @ajtowns !

jnewbery removed the label Validation on Oct 16, 2020

in src/crypto/muhash.h:16 in e19e500134 outdated

11+
12+#include <serialize.h>
13+
14+#include <stdint.h>
15+
16+struct Num3072 {

jnewbery commented at 2:31 pm on October 18, 2020:

It seems a shame to make Num3072 part of the public interface just so it can be used in a single test in crypt_tests.cpp. Is there a way the overflow test can be written without using the Num3072 class?

fjahr commented at 10:41 pm on October 18, 2020:

I have to think a little more about to how solve that best…

fjahr commented at 9:52 pm on November 25, 2020:

Finally had the (seemingly obvious) idea to solve this with a serialized MuHash3072 object.

in src/crypto/muhash.h:32 in e19e500134 outdated

27+#endif
28+    limb_t limbs[LIMBS];
29+};
30+
31+/** 2^3072 - 1103717, the next largest 3072-bit safe prime number, is used as the order of the group. */
32+constexpr Num3072::limb_t MAX_PRIME_DIFF = 1103717;

jnewbery commented at 2:32 pm on October 18, 2020:

This doesn’t need to be in the public interface and can be in the .cpp file’s unnamed namespace.

fjahr commented at 10:41 pm on October 18, 2020:

Done

in src/crypto/muhash.h:68 in e19e500134 outdated

63+ * https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014337.html.
64+ */
65+class MuHash3072
66+{
67+protected:
68+    Num3072 data;

jnewbery commented at 2:32 pm on October 18, 2020:

Any reason not to use the project’s style guide and name member variables with an m_ prefix?

fjahr commented at 10:41 pm on October 18, 2020:

Nope, done

in src/crypto/muhash.h:71 in e19e500134 outdated

66+{
67+protected:
68+    Num3072 data;
69+
70+    static constexpr size_t INPUT_SIZE = 32;
71+    static constexpr size_t OUTPUT_SIZE = 384;

jnewbery commented at 2:33 pm on October 18, 2020:

Again, it’d be nice if these weren’t exposed in the header file and were moved to the cpp file’s unnamed namespace.

fjahr commented at 10:41 pm on October 18, 2020:

Done

in src/crypto/muhash.h:123 in e19e500134 outdated

84+    MuHash3072& operator/=(const MuHash3072& sub) noexcept;
85+
86+    /* Finalize into a 384-byte hash. Does not change this object's value. */
87+    void Finalize(Span<unsigned char> hash384) noexcept;
88+
89+    SERIALIZE_METHODS(MuHash3072, obj)

jnewbery commented at 2:56 pm on October 18, 2020:

Is this platform independent (ie between HAVE___INT128 and !HAVE___INT128 platforms)? I guess so because the unit tests are asserting exact serializations.

fjahr commented at 10:42 pm on October 18, 2020:

I believe so, I definitely forced either option on my system at some point.

Sjors commented at 2:45 pm on December 17, 2020:

#20315 dropped our big endian Travis instance, but that shouldn’t be a problem. Took me a while to wrap my head around… Each limb is serialised as little endian, because serialize.h converts uint32_t and uint64_t to little endian, using htole32 / htole64. We also treat the collection of limbs as little endian, with the first limb having the least significant digit. This means we can serialise 32 bit limbs on one machine and deserialise the result as 64 bit limbs on another machine.

(in reply to #19055 (review))

in src/crypto/muhash.h:26 in e19e500134 outdated

12+#include <serialize.h>
13+
14+#include <stdint.h>
15+
16+struct Num3072 {
17+#ifdef HAVE___INT128

jnewbery commented at 3:00 pm on October 18, 2020:

Has the performance benefit of using __int128 been measured? My instinct would be to remove the platform-dependent code, including all the typedefs and static_asserts. There are benefits to having just a single implementation (simpler, smaller code), and the performance of the mushash implementation isn’t critical, since it’ll only be used in indexing and the rpc interface.

fjahr commented at 10:42 pm on October 18, 2020:

I don’t remember benchmarking this, will do so

sipa commented at 10:56 pm on October 18, 2020:

I expect a 4x speed difference approximately. Being able to use 64-bit multiplication hardware makes a huge difference.

fjahr commented at 10:15 pm on October 19, 2020:

Benchmarks on my machine with and without __int128 (values hardcoded instead) show a significant speedup:

With __int128:

0|               ns/op |                op/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|            3,313.23 |          301,820.21 |    0.1% |      0.00 | `MuHash`
3|            2,544.00 |          393,081.76 |    0.1% |      0.00 | `MuHashAdd`
4|        5,385,923.00 |              185.67 |    1.7% |      0.06 | `MuHashDiv`
5|              752.12 |        1,329,566.23 |    0.1% |      0.00 | `MuHashPrecompute`

Without __int128:

0|               ns/op |                op/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|           10,692.00 |           93,527.87 |    0.1% |      0.00 | `MuHash`
3|           10,279.00 |           97,285.73 |    0.2% |      0.00 | `MuHashAdd`
4|       27,921,963.00 |               35.81 |    2.2% |      0.32 | `MuHashDiv`
5|              931.21 |        1,073,873.55 |    2.6% |      0.00 | `MuHashPrecompute`

jnewbery commented at 9:59 am on October 21, 2020:

That’s almost exactly a 4x speedup for add and >5x speedup for div. Seems worthwhile!

in src/crypto/muhash.h:115 in e19e500134 outdated

88+
89+    SERIALIZE_METHODS(MuHash3072, obj)
90+    {
91+        for (int i = 0; i < obj.data.LIMBS; ++i) {
92+            READWRITE(obj.data.limbs[i]);
93+        }

jnewbery commented at 3:47 pm on October 18, 2020:

Consider using a range based loop:

0        for (auto& limb : obj.data.limbs) {
1            READWRITE(limb);
2        }

fjahr commented at 10:42 pm on October 18, 2020:

Done

in src/crypto/muhash.cpp:119 in e19e500134 outdated

114+{
115+    c0 += a;
116+    c1 += (c0 < a) ? 1 : 0;
117+}
118+
119+bool IsOverflow(const Num3072* d)

jnewbery commented at 3:52 pm on October 18, 2020:

All of these functions (IsOverflow(), FullReduce(), Multiply(), Square(), Inverse()) take pointers, but they’re always called with non-null ptrs and there’s no null checking in the function body. Consider changing the signatures to take references to better express that they must be called with an actual object.

fjahr commented at 10:42 pm on October 18, 2020:

Done

in src/crypto/muhash.h:57 in e19e500134 outdated

52+ * arbitrary subset of the update operations, allowing them to be
53+ * efficiently combined later.
54+ *
55+ * Muhash does not support checking if an element is already part of the
56+ * set. That is why this class does not enforce the use of a set as the
57+ * data it represents because there is no efficient way to do so..

jnewbery commented at 3:56 pm on October 18, 2020:

s/to do so../to do so./ (remove double .)

fjahr commented at 10:42 pm on October 18, 2020:

Done

in src/crypto/muhash.cpp:294 in e19e500134 outdated

289+    }
290+}
291+
292+MuHash3072& MuHash3072::operator*=(const MuHash3072& x) noexcept
293+{
294+    Multiply(&this->data, &x.data);

jnewbery commented at 4:27 pm on October 18, 2020:

Why not implicit this?

0    Multiply(&data, &x.data);

This would be even clearer if Multiply took references and data was renamed m_data:

Multiply(m_data, x.m_data);

Same below in the /=() operator

fjahr commented at 10:42 pm on October 18, 2020:

Done

in src/crypto/muhash.h:75 in e19e500134 outdated

76+
77+    /* A singleton with a single 32-byte key in it. */
78+    explicit MuHash3072(Span<const unsigned char> key32) noexcept;
79+
80+    /* Multiply (resulting in a hash for the union of the sets) */
81+    MuHash3072& operator*=(const MuHash3072& add) noexcept;

jnewbery commented at 4:38 pm on October 18, 2020:

It seems odd that the argument to the multiplication operator is called add and the argument to the division operator is called sub. Stick to either multiplicative or additive terminology!

fjahr commented at 10:42 pm on October 18, 2020:

I prefer the additive terminology personally and I thought they were both using it: add and subtract? What would you prefer? rem for remove?

jnewbery commented at 7:06 am on October 19, 2020:

I’d prefer mul and div for the multiplier and divisor operators. Alternatively change the operators to += and -=.

fjahr commented at 9:55 pm on October 19, 2020:

Hm, yeah, with the operators in mind that naming makes more sense. Changed to mul and div.

in src/crypto/muhash.cpp:86 in e19e500134 outdated

81+    limb_t tl, th;
82+    {
83+        double_limb_t t = (double_limb_t)a * b;
84+        th = t >> LIMB_SIZE;
85+        tl = t;
86+    }

jnewbery commented at 6:14 pm on October 18, 2020:

What are these indentations for? I think it’ll compile to the same thing without them, and this just makes the code harder to read. There are similar blocks in muldbladd3() and Multiply().

fjahr commented at 10:43 pm on October 18, 2020:

Removed them

fjahr force-pushed on Oct 18, 2020

fjahr commented at 10:45 pm on October 18, 2020: member

Thanks for the feedback @jnewbery . I implemented most comments while there are still some I need to spend some more time on (including #19055 (review) which I can not comment on for some reason).

in src/crypto/muhash.cpp:21 in b03e923c53 outdated

16+
17+/** 2^3072 - 1103717, the next largest 3072-bit safe prime number, is used as the order of the group. */
18+constexpr Num3072::limb_t MAX_PRIME_DIFF = 1103717;
19+
20+static constexpr size_t INPUT_SIZE = 32;
21+static constexpr size_t OUTPUT_SIZE = 384;

jnewbery commented at 9:13 am on October 19, 2020:

No need to mark these as static. Being in the unnamed namespace already means that they don’t have external linkage.

fjahr commented at 9:55 pm on October 19, 2020:

Done

in src/crypto/muhash.cpp:87 in b03e923c53 outdated

82+}
83+
84+/** [c0,c1,c2] += a * b */
85+inline void muladd3(limb_t& c0, limb_t& c1, limb_t& c2, const limb_t& a, const limb_t& b)
86+{
87+    limb_t tl, th;

jnewbery commented at 9:13 am on October 19, 2020:

No need to declare these variables up here. Just declare them where you need them. Same in other functions below.

fjahr commented at 9:55 pm on October 19, 2020:

Done

jnewbery commented at 7:05 am on October 20, 2020:

Same in other functions below.

Looks like you missed these.

fjahr commented at 3:42 pm on October 21, 2020:

Initally yes, should be done now.

in src/crypto/muhash.cpp:308 in b03e923c53 outdated

303+
304+MuHash3072& MuHash3072::operator/=(const MuHash3072& x) noexcept
305+{
306+    Num3072 tmp;
307+    Inverse(tmp, x.m_data);
308+    Multiply(this->m_data, tmp);

jnewbery commented at 9:18 am on October 19, 2020:

Implicit this:

0    Multiply(m_data, tmp);

fjahr commented at 9:55 pm on October 19, 2020:

Done

in src/crypto/muhash.cpp:219 in b03e923c53 outdated

214+    // For fast exponentiation a sliding window exponentiation with repunit
215+    // precomputation is utilized. See "Fast Point Decompression for Standard
216+    // Elliptic Curves" (Brumley, Järvinen, 2008).
217+
218+    Num3072 p[12]; // p[i] = a^(2^(2^i)-1)
219+    Num3072 x;

jnewbery commented at 9:21 am on October 19, 2020:

Why is this temporary needed? Can we just construct out here?

Changing the function signature to Num3072 Inverse(const Num3072& in) would use RVO and avoid copying x to out.

fjahr commented at 9:55 pm on October 19, 2020:

Done

in src/crypto/muhash.cpp:204 in b03e923c53 outdated

199+    muln2(c0, c1, MAX_PRIME_DIFF);
200+    for (int j = 0; j < LIMBS; ++j) {
201+        add2(c0, c1, tmp.limbs[j]);
202+        extract2(c0, c1, in_out.limbs[j]);
203+    }
204+#ifdef DEBUG

jnewbery commented at 9:27 am on October 19, 2020:

We don’t usually compile out asserts in our code (in fact, I can’t see any other examples in the codebase). I suggest removing this Ifdef.

fjahr commented at 9:55 pm on October 19, 2020:

Done

in src/crypto/muhash.cpp:117 in b03e923c53 outdated

114+    c1 += th;
115+    c2 += (c1 < th) ? 1 : 0;
116+}
117+
118+/** [c0,c1] += a */
119+inline void add2(limb_t& c0, limb_t& c1, limb_t& a)

jnewbery commented at 9:40 am on October 19, 2020:

add2() is always followed by extract2(). Does it make sense to refactor these two functions into a single function?

fjahr commented at 9:55 pm on October 19, 2020:

Hm, making an add_and_extract2() to use it in 3 places doesn’t feel like an improvement to me TBH.

jnewbery commented at 7:41 am on October 20, 2020:

My suggestion was to merge add2() and extract2() since they’re always called as a pair.

fjahr commented at 10:32 pm on November 8, 2020:

Done

in src/crypto/muhash.h:72 in b03e923c53 outdated

67+public:
68+    /* The empty set. */
69+    MuHash3072() noexcept;
70+
71+    /* A singleton with a single 32-byte key in it. */
72+    explicit MuHash3072(Span<const unsigned char> key32) noexcept;

jnewbery commented at 3:10 pm on October 19, 2020:

Did you consider making the argument to the constructor and Finalize() functions references to array rather than spans? Doing so reduces flexibility (eg you wouldn’t be able to Finalize into a vector, although I can’t imagine you’d ever want to do that) but increases safety (you can enforce the inputs at compile time rather than asserting at runtime).

elichai commented at 3:42 pm on October 22, 2020:

FWIW the “real span” can do std::span<const uint8_t, 32> to force the size possibly at compile time. (maybe we should look into extending our span impl to support that)

jnewbery commented at 5:49 pm on October 22, 2020:

Interesting. I wasn’t aware of that. Does that mean it can’t be initialized with a vector (since it’s not possible to know the size at compile time)?

sipa commented at 11:07 pm on October 22, 2020:

It can, but only explicitly. So you’d be able to write MuHash3072(std::span<const unsigned char, 32>(vec)) but not MuHash3072(vec).

jnewbery commented at 7:38 am on October 23, 2020:

I think I prefer that interface. It forces the client code to explicitly set the span size. Of course they can still create a span from a too-small vector, but that’s a client bug rather than an unexpected assert in the library.

In any case, my recommendation would be to make the MuHash class take any stream input and do the truncated SHA512 inside the class (https://github.com/bitcoin/bitcoin/pull/19055#discussion_r510232044) which would remove the assertion on input length.

fjahr force-pushed on Oct 19, 2020

in src/crypto/muhash.cpp:13 in b29a0fde87 outdated

 8+#include <crypto/common.h>
 9+
10+#include <assert.h>
11+#include <stdio.h>
12+
13+#include <limits>

jnewbery commented at 7:19 am on October 20, 2020:

No need for a blank line here. These are all standard library includes and can be grouped together (as well as using the C++ header names):

0#include <cassert>
1#include <cstdio>
2#include <limits>

fjahr commented at 8:27 pm on October 20, 2020:

Done

in src/crypto/muhash.cpp:17 in b29a0fde87 outdated

12+
13+#include <limits>
14+
15+namespace {
16+
17+/** 2^3072 - 1103717, the next largest 3072-bit safe prime number, is used as the order of the group. */

jnewbery commented at 7:19 am on October 20, 2020:

’next’ from what? Is this just the largest 3072 bit safe prime number?

fjahr commented at 8:27 pm on October 20, 2020:

Weird, I don’t know how that made it in there, removed next

in src/crypto/muhash.cpp:117 in b29a0fde87 outdated

112+    c1 += th;
113+    c2 += (c1 < th) ? 1 : 0;
114+}
115+
116+/** [c0,c1] += a */
117+inline void add2(limb_t& c0, limb_t& c1, limb_t& a)

jnewbery commented at 7:39 am on October 20, 2020:

limb_t& a can be const, to indicate that it’s an in-param (as is done for the in-params in the other helper functions)

fjahr commented at 8:26 pm on October 20, 2020:

Done

in src/crypto/muhash.cpp:101 in b29a0fde87 outdated

115+
116+/** [c0,c1] += a */
117+inline void add2(limb_t& c0, limb_t& c1, limb_t& a)
118+{
119+    c0 += a;
120+    c1 += (c0 < a) ? 1 : 0;

jnewbery commented at 7:41 am on October 20, 2020:

I think this function can overflow. If c1 is limb_t::max, and c0 + a overflows, then c1 + 1 overflows and the top bit is lost.

fjahr commented at 9:52 pm on November 25, 2020:

I think that is correct. At least I could not find evidence for the opposite. The way it is used it can only happen in Multiply() and Square() and only on the first limb. I think I have fixed this with a temporary internal c2 variable that catches this overflow if it happens.

in src/crypto/muhash.cpp:81 in b29a0fde87 outdated

76+    double_limb_t t = (double_limb_t)c0 * n;
77+    c0 = t;
78+    t >>= LIMB_SIZE;
79+    t += (double_limb_t)c1 * n;
80+    c1 = t;
81+    t >>= LIMB_SIZE;

jnewbery commented at 7:53 am on October 20, 2020:

t is a local variable so is dropped immediately after this bitshift.

fjahr commented at 8:26 pm on October 20, 2020:

Removed

jnewbery commented at 7:57 am on October 20, 2020: member

Many of the helper functions (mulnadd3, muln2, muladd3, muldbladd3 add add2) can overflow. I think in (very unlikely) scenarios, that’d cause the MuHash addition and subtraction to be incorrect.

fjahr force-pushed on Oct 20, 2020

in src/crypto/muhash.cpp:119 in 96a8d80f7b outdated

114+{
115+    c0 += a;
116+    c1 += (c0 < a) ? 1 : 0;
117+}
118+
119+bool IsOverflow(const Num3072& d)

jnewbery commented at 9:29 am on October 21, 2020:

This function name is slightly confusing. This isn’t returning whether the number has overflown the range that can be expressed in a Num3072. It’s returning whether the number d is greater than the group order g and can therefore be reduced to a congruent value < g. I’d suggest renaming the function to IsReduceable() or CanReduce(), and commenting what the function is doing.

real-or-random commented at 5:21 pm on October 21, 2020:

Hm, we use the same “overflow” terminology in secp256k1. The idea is that Num3072 represents a number in the range 0 to group order, so when it’s larger it has overflown this range. (Idk, this may be a matter of taste…)

“IsReduceable()” sounds also confusing to me. You can reduce a reduced number, it just won’t change.

jnewbery commented at 7:37 am on October 22, 2020:

Ah, I didn’t realise that ‘overflow’ was the terminology used for “d is larger than the modulus”. I suppose IsNotLeastResidue() would be most accurate, but as long as the terminology overflow is used consistently, I think just a comment is enough here.

fjahr commented at 10:32 pm on November 8, 2020:

Added a comment for clarification.

in src/crypto/muhash.cpp:128 in 96a8d80f7b outdated

123+        if (d.limbs[i] != std::numeric_limits<limb_t>::max()) return false;
124+    }
125+    return true;
126+}
127+
128+void FullReduce(Num3072& d)

jnewbery commented at 9:47 am on October 21, 2020:

This could also use a comment. We call this function in two places:

After IsOverflow where x ∈ [g, MAX_NUM3072). In this caseFullReduce(x) adds MAX_PRIME_DIFF to x and overflows Num3072, so FullReduce(x) = x + MAX_PRIME_DIFF - MAX_NUM3072 = x - g, which is congruent to x and is in [0, g)
After if (c0) i.e. where x has overflown and x = MAX_NUM3072 + d. In this case FullReduce(d) adds MAX_PRIME_DIFF to d, so x = MAX_NUM3072 + FullReduce(d) - MAX_PRIME_DIFF, which means that FullReduce(d) and x are congruent. I think there may be a bug here if d ∈ [g, MAX_NUM3072), i.e. if the FullReduce() function overflows. That would make x = MAX_NUM3072 + FullReduce(d) - MAX_PRIME_DIFF + MAX_NUM3072 so x and FullReduce(d) are not congruent.

fjahr commented at 9:52 pm on November 25, 2020:

If I got this right, 2. basically means we can have if (c0) or if (IsOverflow()) or both, and in the case of both there is a problem. If I understood that correctly I have addressed this in Square() and Multiply() with an additional comment.

in src/crypto/muhash.cpp:154 in 96a8d80f7b outdated

149+        for (int i = 0; i < j + 1; ++i) muladd3(c0, c1, c2, in_out.limbs[i], a.limbs[j - i]);
150+        extract3(c0, c1, c2, tmp.limbs[j]);
151+    }
152+
153+    /* Compute limb N-1 of a*b into tmp. */
154+    limb_t c2 = 0;

jnewbery commented at 9:51 am on October 21, 2020:

slightly confusing to have c0 and c1 declared at the top of the function, and c2 declared both in the for block above here and again here. Perhaps move this declaration to the top too and reuse the variable. c2 is always zero after extract3(), but you could assert that here to be clear.

fjahr commented at 10:31 pm on November 8, 2020:

Done

in src/crypto/muhash.cpp:188 in 96a8d80f7b outdated

183+        for (int i = 0; i < (j + 1) / 2; ++i) muldbladd3(c0, c1, c2, in_out.limbs[i], in_out.limbs[j - i]);
184+        if ((j + 1) & 1) muladd3(c0, c1, c2, in_out.limbs[(j + 1) / 2], in_out.limbs[j - (j + 1) / 2]);
185+        extract3(c0, c1, c2, tmp.limbs[j]);
186+    }
187+
188+    limb_t c2 = 0;

jnewbery commented at 9:52 am on October 21, 2020:

Same as above. Any reason not to declare this c2 at the top and reuse the variable?

fjahr commented at 10:31 pm on November 8, 2020:

Done

in src/bench/crypto_hash.cpp:121 in 96a8d80f7b outdated

116+        key[0] = ++i;
117+        acc *= MuHash3072(key);
118+    });
119+}
120+
121+static void MuHashAdd(benchmark::Bench& bench)

jnewbery commented at 10:05 am on October 21, 2020:

Change this to MuHashMul()?

ajtowns commented at 7:49 am on October 28, 2020:

“multiply” and “divide” (or mul/div) for the maths, and “insert” and “remove” for higher level (multi)set-like operations might make sense. “Add” definitely seems confusing.

jnewbery commented at 8:42 am on October 28, 2020:

My preferred naming for the operations is here: #19055 (review) (mul/div for Num3072, add/sub for MuHash), but I mostly just want it to be consistent.

fjahr commented at 10:31 pm on November 8, 2020:

Done

in src/bench/crypto_hash.cpp:111 in 96a8d80f7b outdated

105@@ -105,6 +106,70 @@ static void FastRandom_1bit(benchmark::Bench& bench)
106     });
107 }
108 
109+static void MuHash(benchmark::Bench& bench)
110+{
111+    FastRandomContext rng(true);

jnewbery commented at 10:05 am on October 21, 2020:

rng is unused. Remove it.

fjahr commented at 10:30 pm on November 8, 2020:

Done

in src/bench/crypto_hash.cpp:140 in 96a8d80f7b outdated

147+    MuHash3072 muhash = MuHash3072(key);
148+
149+    for (size_t i = 0; i < bench.epochIterations(); ++i) {
150+        acc *= muhash;
151+    }
152+

jnewbery commented at 10:08 am on October 21, 2020:

I don’t think this is needed. There’s nothing wrong with dividing by elements that aren’t in the muhash.

fjahr commented at 10:30 pm on November 8, 2020:

This is true but it’s not how we intend to use it. I have left this as is. Otherwise, it might be inconsistent with the docs and might give the wrong impression on the usage.

in src/bench/crypto_hash.cpp:131 in 96a8d80f7b outdated

126+    std::vector<unsigned char> randkey = rng.randbytes(32);
127+    for (size_t i = 0; i < randkey.size(); ++i) {
128+        key[i] = randkey[i];
129+    }
130+
131+    MuHash3072 muhash = MuHash3072(key);

jnewbery commented at 10:19 am on October 21, 2020:

The temporary array and vector variables are unnecessary here. The MuHash constructor takes a span, so we can just call it directly with the rvalue vector from randbytes().

0    MuHash3072 acc;
1    FastRandomContext rng(true);
2    MuHash3072 muhash{rng.randbytes(32)};

Same in the other functions below.

fjahr commented at 10:27 pm on November 8, 2020:

Done

in src/test/crypto_tests.cpp:933 in 96a8d80f7b outdated

926+    serchk.Finalize(out);
927+    deserchk.Finalize(out2);
928+    BOOST_CHECK_EQUAL(HexStr(out), HexStr(out2));
929+
930+    // Test MuHash3072 overflow
931+    class MuHashOverflow : public MuHash3072 {

jnewbery commented at 10:23 am on October 21, 2020:

Again, this isn’t actually overflowing the data type in MuHash. It’s simply reducing a number greater than the group order into its least residue. Consider renaming this function/class.

fjahr commented at 10:27 pm on November 8, 2020:

Since the overflow naming is consistent with secp256k1 I kept it but extended the comment for clarification.

jonatack commented at 4:01 pm on October 21, 2020: member

Did a fair amount of testing and review of this on May 27 and plan to re-review after the 0.21 branch-off.

in src/crypto/muhash.cpp:16 in 96a8d80f7b outdated

11+#include <cstdio>
12+#include <limits>
13+
14+namespace {
15+
16+/** 2^3072 - 1103717, the largest 3072-bit safe prime number, is used as the order of the group. */

jnewbery commented at 7:17 pm on October 21, 2020:

s/order of the group/modulus/

(the order of the group is 2^3072 - 1103717 - 1, since 0 isn’t in the multiplicative group)

fjahr commented at 10:25 pm on November 8, 2020:

Done

in src/crypto/muhash.h:91 in 96a8d80f7b outdated

57+ * is intended to represent a set of elements.
58+ *
59+ * See also https://cseweb.ucsd.edu/~mihir/papers/inchash.pdf and
60+ * https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014337.html.
61+ */
62+class MuHash3072

jnewbery commented at 2:58 pm on October 22, 2020:

This does seem like a strange interface: both high-level (the user doesn’t need to know/worry about the chacha20 operation) and low-level (the user needs to apply a truncated SHA-512 to all inputs, track numerator and denominator when doing a bulk update, and SHA256 the finalized 3072-bit output). I think it makes sense to refactor this so that there are two classes:

a Num3072 class that is simply a 3072 bit wide integer that supports setting and reading limbs and multiplication/division modulo p.
a MuHash3072 class that internally has a numerator and denominator, accepts arbitrary length inputs, supports set addition/subraction, and finalizes to a 256 bit digest.

See https://bitcoincore.reviews/19055-2#l-159.

fjahr commented at 10:43 pm on November 8, 2020:

I have refactored the code to have clearer boundaries between the two classes. Let me know if this matches your expectation.

fjahr force-pushed on Nov 8, 2020

fjahr commented at 10:44 pm on November 8, 2020: member

Pushed a fair amount of fixes and refactoring but still working on some of the overflow questions and another fuzz test.

jnewbery commented at 10:41 am on November 19, 2020: member

I’m going to move this out of high priority while you’re working on the branch. It can go back in once it’s ready for review again.

jnewbery removed this from the "Blockers" column in a project

fjahr force-pushed on Nov 25, 2020

fjahr commented at 9:53 pm on November 25, 2020: member

This is finally ready for review again. I have added a fuzz test for consistency testing which I am running currently.

laanwj added this to the "Blockers" column in a project

DrahtBot added the label Needs rebase on Dec 15, 2020

fjahr force-pushed on Dec 16, 2020

fjahr commented at 11:20 pm on December 16, 2020: member

Rebased and pushed some new comment I seem to have forgotten last time.

DrahtBot removed the label Needs rebase on Dec 17, 2020

in configure.ac:757 in 515ee0d0ca outdated

752@@ -753,6 +753,9 @@ if test x$use_lcov_branch != xno; then
753   AC_SUBST(LCOV_OPTS, "$LCOV_OPTS --rc lcov_branch_coverage=1")
754 fi
755 
756+dnl Check for __int128
757+AC_CHECK_TYPES([__int128])

Sjors commented at 1:27 pm on December 17, 2020:

515ee0d0ca91bfdf6a9cd844824e747291efabe9: build system stuff could warrant a separate commit. It gets picked up on my macOS machine, so that’s good.

I you end up splitting the commit, you might as well also introduce MuHash3072 in its own commit.

fjahr commented at 0:56 am on December 22, 2020:

Thanks, I have split up the large commit into four separate ones and melted the other unit test commit with one of them. I think this will help with reviews.

in src/crypto/muhash.h:48 in 515ee0d0ca outdated

43+
44+    // Hard coded values in MuHash3072 constructor and Finalize
45+    static_assert(sizeof(limb_t) == 4 || sizeof(limb_t) == 8, "bad size for limb_t");
46+
47+    void Multiply(const Num3072& a);
48+    void Divide(Num3072& a);

Sjors commented at 2:21 pm on December 17, 2020:

I suggest either making a const (like Multiply) or, if it hurts performance, Divide private with MuHash3072 as its friend.

fjahr commented at 0:55 am on December 22, 2020:

I do a reduction on a if it’s needed so when it’s const I would need an intermediary variable. But at least this should not happen too often so I implemented this although it complicates the code a little bit so I am still undecided if it’s an improvement.

Sjors commented at 11:28 am on December 22, 2020:

It should make it less likely for a future user of Num3072 to shoot themselves in the foot, by calling Divide(a) and using a afterwards.

Sjors commented at 3:51 pm on December 17, 2020: member

Partial review as of 0c8e885. Don’t forgot to update the Python tests so it matches again.

Perhaps Num3072 can have SERIALIZE_METHODS (and a test), so that MuHash3072 can serialise its numerator and denominator without poking into Num3072 internals.

fjahr commented at 11:21 pm on December 17, 2020: member

@Sjors Thanks, I updated the Python test, will address your other comments soon.

build: Check for 128 bit integer support

Used in MuHash3072 implementation.

Co-authored-by: Pieter Wuille <pieter.wuille@gmail.com>

589f958662

crypto: Add Num3072 implementation

Num3072 is a specialized bignum implementation used in MuHash3072.

Co-authored-by: Pieter Wuille <pieter.wuille@gmail.com>

0b4d290bf5

crypto: Add MuHash3072 implementation

Co-authored-by: Pieter Wuille <pieter.wuille@gmail.com>

adc708c98d

test: Add MuHash3072 unit tests

Co-authored-by: Pieter Wuille <pieter.wuille@gmail.com>
Co-authored-by: Anthony Towns <aj@erisian.com.au>

7b1242229d

bench: Add Muhash benchmarks

Co-authored-by: Pieter Wuille <pieter.wuille@gmail.com>

c122527385

test: Add MuHash3072 fuzz test b111410914

fuzz: Add MuHash consistency fuzz test 01297fb3ca

test: Change MuHash Python implementation to match cpp version again 9815332d51

in src/crypto/muhash.cpp:316 in 515ee0d0ca outdated

298+    }
299+
300+    out = (CHashWriter(SER_DISK, 0) << data).GetSHA256();
301+}
302+
303+MuHash3072& MuHash3072::operator*=(const MuHash3072& mul) noexcept

sipa commented at 9:49 pm on December 18, 2020:

If the “MuHash3072” type is an encapsulation of a set-hash state, then perhaps it’s better to give this name “Add” or “Union” (with a comment explaining the implications of it having multiset semantics)?

Also, using this with a pattern of “accumulator *= MuHash3072{data}” will involve two multiplications, but the one with the denominator is always 1. Perhaps it’s better to have “Insert” and “Remove” functions (separate from Add/Union and Subtract) that take a Span of a single element to add/remove?

elichai commented at 8:12 am on December 19, 2020:

Yep, I’m actually implementing a similar accumulator in Go and I’m using Add/Remove that both do a single multiplication and Combine that combines 2 “accumulators” and does 2 multiplications. also serializing/deserializing can be quite helpful (probably will work similar to finalize, by dividing, normalizing and outputting the numerator)

FWIW I also tested replacing hash+Chacha20 with a XOF hash (blake2b) and it was slower so I’m sticking with hash+Chacha20

fjahr commented at 0:55 am on December 22, 2020:

@jnewbery and @ajtowns didn’t like “Add” (https://github.com/bitcoin/bitcoin/pull/19055#discussion_r509153295) and personally I am not sure about “Union” yet so I have kept the namings of *= and /= for now but I have added Insert and Remove to insert and remove single elements with a single multiplication as suggested.

fjahr force-pushed on Dec 22, 2020

fjahr commented at 0:57 am on December 22, 2020: member

Addressed the latest comments.

Perhaps Num3072 can have SERIALIZE_METHODS (and a test), so that MuHash3072 can serialise its numerator and denominator without poking into Num3072 internals.

Done

Limpisey168 commented at 10:57 am on December 22, 2020: none

So good

in src/crypto/muhash.h:126 in 9815332d51

121+    void Finalize(uint256& out) noexcept;
122+
123+    SERIALIZE_METHODS(MuHash3072, obj)
124+    {
125+        READWRITE(obj.m_numerator);
126+        READWRITE(obj.m_denominator);

elichai commented at 11:03 am on December 22, 2020:

Is this really what we want? or do we want it to normalize it (like Finalize does) and then just serialize the numerator?

Sjors commented at 11:26 am on December 22, 2020:

That might be a good idea. If we want to use MuHash in combination with assumeutxo then the shorter the (marginally) better.

fjahr commented at 9:36 pm on December 22, 2020:

Right, that would be nicer and I had implemented it like that a few weeks ago but decided to kick it out again because code-wise it felt awkward to force normalization before every serialization. Maybe there is a better way to do it than what I tried code-wise. I don’t think I found another example of where something comparable is done IIRC. Overall, normalizing is an expensive operation that would need to run quite for every serialization to result in a 50% space-saving on disk. However, for CoinstatsIndex I only save a single MuHash3072 to disk at any time, so at least for that, it would only save 384 bytes total. That’s why I took the easy way and kept it as it is now.

I can try to revive my old branch and push it for the sake of discussion. But I am not sure why it makes a difference for assumeutxo, wouldn’t we use the finalized 32 byte hash for that as well?

elichai commented at 10:55 am on December 23, 2020:

it depends, if we ever add a commitment to this in the coinbase or something like that then we might want to also save the MuHash in the undoBlocks. But I think there’s no point in discussing this now, so IMO you can leave it as-is

jnewbery deleted a comment on Dec 23, 2020

laanwj approved

laanwj commented at 4:48 pm on January 7, 2021: member

Code review ACK 9815332d5158d69a94abeaf465a2c07bd8e43359

laanwj merged this on Jan 7, 2021

laanwj closed this on Jan 7, 2021

laanwj removed this from the "Blockers" column in a project

sidhujag referenced this in commit d93f2b4943 on Jan 7, 2021

in src/crypto/muhash.cpp:126 in 9815332d51

121+    in_out.Multiply(mul);
122+}
123+
124+} // namespace
125+
126+/** Indicates wether d is larger than the modulus. */

hebasto commented at 6:18 pm on January 21, 2021:

typo: wether -> whether

fjahr commented at 9:43 pm on January 21, 2021:

Thanks! Included a fix in #19145

martinus referenced this in commit bc50aebd8d on Sep 18, 2021

martinus referenced this in commit 83cad97f37 on Sep 18, 2021

martinus referenced this in commit 524382c27c on Sep 18, 2021

martinus referenced this in commit c4c2113e72 on Sep 20, 2021

Fabcien referenced this in commit 8986cfad54 on Sep 21, 2021

martinus referenced this in commit 468b232f71 on Sep 21, 2021

rebroad referenced this in commit 97bbb73143 on Oct 13, 2021

kittywhiskers referenced this in commit 24d63c2715 on Feb 26, 2022

kittywhiskers referenced this in commit c599c26069 on Feb 26, 2022

kittywhiskers referenced this in commit e5a81c5fd4 on Feb 26, 2022

Fabcien referenced this in commit 465584c0e5 on Mar 14, 2022

kittywhiskers referenced this in commit 9656cb4510 on Apr 3, 2022

kittywhiskers referenced this in commit e81e7c4394 on Apr 20, 2022

kittywhiskers referenced this in commit 3f63f084b7 on Apr 24, 2022

kittywhiskers referenced this in commit c7eb44a911 on Apr 27, 2022

DrahtBot locked this on Aug 16, 2022

Add MuHash3072 implementation #19055

Conflicts