fjahr
commented at 11:07 pm on January 24, 2020:
member
This PR will not be updated anymore because the project is now split up into multiple pull requests. But I will use it to keep track of the projects PRs:
This implements an index of coin statistics with the goal of making the response time of the gettxoutsetinfo RPC call dramatically faster. Currently, this RPC is scanning the full UTXO set every time it is called which makes it hard to use for users that want to continually check the coin supply or compare UTXO set hashes between different nodes. It is especially challenging in periods of multiple quickly mined blocks, even relatively fast machines.
Implementation overview:
The current serialization of the UTXO set for the purpose of hashing is changed based on sipa’s concept in proposed on the mailing list in 2017 and #10434
The hashing algorithm in for the UTXO set is changed to Muhash, which was also implemented by sipa
CoinStatsIndex is added which keeps an index of all values that would require gettxoutsetinfo to scan the UTXO set
CoinStatsIndex can be activated through the flag -coinstatsindex
Todos/Open questions:
The transactions count is currently not implemented as it seems not knowable ex-post without running a full IBD to build up the index. I am looking for a solution to this. Ideas are:
Require txindex and the transactions count from it
Remove transactions count or mark it as unreliable in a different way if coinstatsindex is enabled
The transactions count question is also interesting because it seems to be the only obstacle to providing access to historical coin statistics, which may be a nice follow-up feature if transaction counts can be ignored somehow
More benchmarking to evaluate potential switch from Muhash to ECMH as the hashing algorithm
Edit: 3c5c1ca should probably be squashed into the prior commits but I wanted to leave sipa’s commits unchanged for the start.
fanquake added the label
UTXO Db and Indexes
on Jan 24, 2020
DrahtBot
commented at 0:37 am on January 25, 2020:
member
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#18795 (Test: wallet issue with orphaned rewards by domob1812)
#18354 (Protect wallet by using shared pointers by bvbfan)
#17786 (refactor: Nuke policy/fees->mempool circular dependencies by hebasto)
#13462 (Make SER_GETHASH implicit for CHashWriter and SerializeHash by Empact)
#10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
fjahr force-pushed
on Jan 25, 2020
fjahr force-pushed
on Jan 26, 2020
fjahr force-pushed
on Jan 26, 2020
fjahr force-pushed
on Jan 26, 2020
fjahr force-pushed
on Jan 26, 2020
fjahr force-pushed
on Jan 27, 2020
emilengler
commented at 5:27 pm on January 27, 2020:
contributor
I’m reviewing and testing it right now but from some benchmarks I did, I saw how the time goes down by 50%. However the performance increases by 19% percent.
Regtest results only right now
On master (ef8e2cee9f5d157eeb3139b64e9c3a5fa4bf36f3):
0$ time src/bitcoin-cli --regtest gettxoutsetinfo
1...
2> src/bitcoin-cli --regtest gettxoutsetinfo 0,00s user 0,00s system 23% cpu 0,014 total
With this PR
0$ time src/bitcoin-cli --regtest gettxoutsetinfo
1...
2> src/bitcoin-cli --regtest gettxoutsetinfo 0,00s user 0,00s system 42% cpu 0,007 total
ryanofsky
commented at 8:03 pm on January 27, 2020:
member
Can you say more about current use cases of the gettxoutsetinfo function and maybe future use cases if it’s significantly faster with this change?
sipa
commented at 9:18 pm on January 27, 2020:
member
I think it’s perfectly fine to drop the “transactions” statistics. It doesn’t really mean anything; it just happened to be easy to compute pre-pertxout, but even now it’s mostly a hack.
fjahr
commented at 11:40 pm on January 27, 2020:
member
@emilengler Thanks for giving it a try! But I think the benchmarks you are comparing are not well suited in this case. I assume you are running a standard regtest network, maybe generated a few hundred blocks and made some transactions by hand. In this case the UTXO set is very small and iterating over it will be extremely fast, so that it would not surprise me if there is no measurable upside to using the index (maybe it will be even a little slower as you saw). But the goal of this index is that the time of the call will stay constant for a realistically sized UTXO set. So a better indicator would be to try it on a synced mainnet or testnet node, where gettxoutset currently takes several minutes without the index, even on pretty high powered hardware.
@ryanofsky Sure, I will go along the different statistics the call provides to highlight which ones I think are more or less important:
total_amount The total amount of the current bitcoin supply. Tracking this number on a continual basis allows users to detect potential inflation bugs, such as CVE-2018-17144 (Example: Bitmex Research). I think all users should be given the chance to continually check this, even on more affordable hardware. I am currently running a mainnet node on an Odroid HC2 and gettxoutsetinfo takes 7min 50s. Specs are roughly comparable to a Raspberry Pi 4, so it can keep up well with the chain otherwise. If I wanted to run a continual inflation detection on this hardware, I would not be able to do so when the UTXO set has grown another 20-25% (ignoring hashrate fluctuations).
hash_serialized_2 The serialized hash of the UTXO set can be used for a quick comparison of the UTXO set between different notes. Essentially, it can be a stand-in for the latest blockhash and compared for sanity checks. BTCPayServer has been using it for their FastSync feature, which will probably be replaced with assume_utxo, where a UTXO set hash (not necessarily this one) will play a role as well. I still need to sync up with @jamesob about the latest plans for the hash and how this proposal can be compatible with his or even help/support it.
txouts The total number of unspent outputs is an interesting number to track from an operations standpoint. Users can see if coins are being consolidated or at which rate the UTXO set is growing. Example: satoshi.info
bogosize and disk_size might be also interesting to track but not as actionable, disc usage is probably much more efficient to collect from the OS
transactions it seems to be a nice-to-have metric that does not serve any real purpose afaict, as @sipa also noted
height and bestblock can be ignored since they are available from other RPCs, but are necessary to compare numbers between different machines
For future use cases (aside from integration with assume_utxo potentially) I don’t have specific ideas yet, but it opens the door to track any other statistics of the UTXO set or block content in the index. So it may collect further metrics of interest where users currently parse the block data with their own custom software.
emilengler
commented at 7:39 pm on January 28, 2020:
contributor
@fjahr I’m a bit confused now, I ran the tests again on a full chain and master is MUCH faster than this Pull Request (4 MINUTES faster)
On master (2755b2b1092d0286022cf3cc3028e96f6bee2b34)
0$ time src/bitcoin-cli --datadir=/media/emil/Files/Bitcoin/ gettxoutsetinfo
1{
2 "height": 614943,
3 "bestblock": "0000000000000000000fa03a2ce4c6d7fb763e55402deac32371b898e8faba5e",
4 "transactions": 38747934,
5 "txouts": 65245740,
6 "bogosize": 4903711195,
7 "hash_serialized_2": "011d3eccd148b68a3bf039f0695bb4d77a23ba208dcd4d8ac073d867fd5bbef6",
8 "disk_size": 4035082036,
9 "total_amount": 18186617.32125282
10}
11src/bitcoin-cli --datadir=/media/emil/Files/Bitcoin/ gettxoutsetinfo 0,00s user 0,01s system 0% cpu 57,428 total
On fjahr:utxo-stats-index-rebase (663dbfbd1da60e91c9576629ace82b712d31668c)
0$ time src/bitcoin-cli --datadir=/media/emil/Files/Bitcoin/ gettxoutsetinfo
1{
2 "height": 614937,
3 "bestblock": "00000000000000000005973a14ac338b681e410922c7566e435107db1b1d807b",
4 "transactions": 38745882,
5 "txouts": 65244661,
6 "bogosize": 4903634025,
7 "utxo_set_hash": "87d3e7f807d6b428bf60e983a73137ef410cb510de1b5f4e650e081d044fc772",
8 "disk_size": 4082447626,
9 "total_amount": 18186542.32125282
10}
11src/bitcoin-cli --datadir=/media/emil/Files/Bitcoin/ gettxoutsetinfo 0,01s user 0,00s system 0% cpu 5:01,42 total
ryanofsky
commented at 7:22 pm on January 29, 2020:
member
@ryanofsky Sure, I will go along the different statistics the call provides to highlight which ones I think are more or less important:
Thanks, this information is really clarifying. I wonder if some of it could be incorporated in the reference documentation (without mentioning specific CVE’s or software projects), or maybe if it would be useful as part of some wiki. In any case, this definitely helps motivate the PR, so concept ACK from me
fjahr
commented at 5:53 pm on January 30, 2020:
member
@emilengler did you actually start bitcoind with -coinstatsindex? I mention it briefly in the PR description, but as @ryanofsky already pointed out as well, I need to do a much better job at documenting the intent and current state of the proposal. It seems so because otherwise you could not test it that quickly, building up the index takes a quite long time and while the index is building up gettxoutsetinfo is responding with “unable to read UTXO set”. In fact, on my machine, it is currently so slow that I might have to address it. But running a plain IBD is also incredibly slow right now so it could also be a local issue. What is probably happening is that you are running bitcoind without the flag and so the stats are still calculated by iterating over the UTXO set but the hash is using the new hashing algorithm (Muhash), which is slower and results in slower performance when not using the index.
emilengler
commented at 6:55 pm on January 30, 2020:
contributor
@emilengler did you actually start bitcoind with -coinstatsindex?
no, sorry oversaw that
emilengler
commented at 8:22 pm on January 30, 2020:
contributor
Hmmm I get this error message now:
0error code: -32603
1error message:
2Unable to read UTXO set
the chain state is about ~2 days old
fjahr
commented at 8:30 pm on January 30, 2020:
member
Hmmm I get this error message now:
0error code: -32603
1error message:
2Unable to read UTXO set
the chain state is about ~2 days old
Yeah, that means the index is syncing at the moment. You should see the progress in the logs of bitcoind. Something like Syncing coinstatsindex with block chain from height XXXXX.
in
src/node/coinstats.cpp:58
in
b5f112cb0aoutdated
ariard
commented at 2:25 am on February 25, 2020:
member
benchmark: have you run gettxoutsetinfo with-and-without this PR on few UTXO sets across the years to check than performance holds linearly with utxo set growth (or a least give an intuition of it) ?
assume-utxo: As security model of assume-utxo lays on public audit of a hardcoded hash of a utxo set, lowering the bar to compute it will increase review process trustworthiness with a higher number of participants. So IMO this point is worthy enough to Concept ACK this proposal (though @jamesob may already have work here?)
Quickly skimmed over the new code, it’s clean enough if we want later to memory-isolate indexes in their own space. Don’t have opinion on either MuHash or ECMH (but no lattice).
fjahr force-pushed
on Mar 6, 2020
fjahr marked this as ready for review
on Mar 6, 2020
fjahr renamed this:
[WIP] Coin Statistics Index
Index for Coin Statistics
on Mar 6, 2020
fjahr
commented at 11:16 pm on March 6, 2020:
member
First of all, thanks for taking early looks and comments @ariard , @ryanofsky, @emilengler , @jnewbery (offline). Your comments should be mostly addressed in the latest code changes and in further information below. With the latest fixes and performance improvements, this should be now ready for real review and testing.
Short recap of changes (this changed slightly)
Introduces MuHash3072 and TruncatedSHA256Writer implementations (code by Pieter Wuille)
Uses above for MuHash hashing alogrithm for the UTXO set hash in gettxoutsetinfo
Removed tracking of the transactions count statistic
Introduces CoinStatsIndex class, an index for all the statistics in gettxoutsetinfo
Allows activating the index using the flag -coinstatsindex. When activated gettxoutsetinfo responds within seconds after the initial sync phase. Otherwise it works as before but will be slower to parse the UTXO set due to the change in the hashing algorithm.
Benchmarking
For reference, on my benchmarking server and my local machine, IBD takes around 18-19 hours
I can provide more detailed numbers if required but so far I am pretty confident that it’s not a bottleneck even for users on under-powered nodes.
How long does the sync of the CoinStatsIndex take?
On my benchmarking server syncing the Coinstatsindex after IBD took between 8.5 and 9.5h hours
On my local machine, I also synced testnet and it took 8h 20min.
How about the growth of the UTXO set (ariard’s question above)?
For the sync time of the index should actually not make much of a difference if the UTXO set shrinks or grows, it should rather grow linearly with the size of the blockchain.
Instead, growth of the UTXO set causes gettxoutsetinfo without the index to run longer because it iterates over it every time and so it is one of the reasons why I think the index is worth integrating into core.
TBD
Although its a lot of LOC I think the changes are still ok to be reviewed as one PR but if people would prefer me to split it into two or three smaller PRs that is easily doable.
The newly introduced metric coins count that is only used in Assume UTXO is not in the index and I don’t think there is a practical way to integrate it (similar problem to transaction count). That means it will probably not be possible to have this index running and create a new assume utxo set on the same node at the same time. I will need to discuss this with @jamesob to make sure this dealt with correctly one Assume UTXO is finished.
fjahr force-pushed
on Mar 7, 2020
fjahr
commented at 0:08 am on March 7, 2020:
member
Rebased
fjahr renamed this:
Index for Coin Statistics
Index for UTXO Set Statistics
on Mar 7, 2020
fjahr force-pushed
on Mar 15, 2020
fjahr force-pushed
on Mar 16, 2020
fjahr force-pushed
on Mar 16, 2020
fjahr force-pushed
on Mar 16, 2020
fjahr force-pushed
on Mar 17, 2020
fjahr force-pushed
on Mar 19, 2020
fjahr force-pushed
on Mar 20, 2020
fjahr force-pushed
on Mar 20, 2020
fjahr force-pushed
on Mar 20, 2020
fjahr force-pushed
on Mar 20, 2020
fjahr force-pushed
on Mar 21, 2020
fjahr force-pushed
on Mar 21, 2020
fjahr
commented at 6:33 pm on March 21, 2020:
member
Latest changes fixed/clarified lots of comments/documentation and finally made Travis happy (when it runs again).
laanwj added this to the "Chasing Concept ACK" column in a project
DrahtBot added the label
Needs rebase
on Mar 31, 2020
fjahr force-pushed
on Apr 2, 2020
fjahr
commented at 11:57 am on April 2, 2020:
member
Rebased.
DrahtBot removed the label
Needs rebase
on Apr 2, 2020
DrahtBot added the label
Needs rebase
on Apr 15, 2020
fjahr force-pushed
on Apr 15, 2020
fjahr
commented at 8:20 pm on April 15, 2020:
member
Removed the last commit which was just a small test improvement that became redundant after a recently merged change.
DrahtBot removed the label
Needs rebase
on Apr 15, 2020
DrahtBot added the label
Needs rebase
on Apr 17, 2020
pierreN
commented at 1:38 am on April 18, 2020:
contributor
Nice PR. I just tested this on my laptop (slow SSD, i7-6700HQ). No issue to report.
On master, gettxoutsetinfo takes 55s to complete. On this branch less than a second.
The coinstatsindex thread took around 7h30mn to complete.
fjahr force-pushed
on Apr 19, 2020
fjahr
commented at 3:29 pm on April 19, 2020:
member
In d22951b3f8d79b5ec2069961c084c8d27b6b13fa: this may be a good time to explain what coin.nHeight * 2 + coin.fCoinBase is good for. I guess it reserves 1 bit to indicate if this is a coinbase, but why do we care about that? Maybe BIP30 related?
Hm, it’s a good question. I have not questioned it beyond the fact that it’s a neat way to document height and coinbase in one uint32. fCoinbase doesn’t make a difference for BIP30 because those were all coinbases and the duplicated txs are filtered earlier already. So I don’t see a reason why we would need the information but if we can have it without taking up any extra space, why not? I think based on the commit message here I would guess the reasoning is that this information should be added to hash and it was added in one to minimize change: https://github.com/bitcoin/bitcoin/commit/d342424301013ec47dc146a4beb49d5c9319d80a
in
src/index/coinstatsindex.h:25
in
8e17009dccoutdated
in
src/index/coinstatsindex.cpp:159
in
8e17009dccoutdated
154+ // Add the new utxos created from the block
155+ for (size_t i = 0; i < block.vtx.size(); ++i) {
156+ const auto& tx = block.vtx.at(i);
157+
158+ // Skip duplicate txid coinbase transactions (BIP30).
159+ if (is_bip30_block && tx->IsCoinBase()) {
If we wouldn’t skip these we would get a different hash from the index than we get from the non-index nodes because these outputs are also not in the UTXO set which the non-index nodes use to calculate the hash. My goal is that the hash is the same with or without the index.
in
src/index/coinstatsindex.cpp:169
in
8e17009dccoutdated
Yes but the coinbase does not have any undo data because it is not spending any outputs. This section is only dealing with the undo data of the transactions in the block, it is not dealing with reorgs etc. which are further below.
in
src/index/coinstatsindex.cpp:349
in
8e17009dccoutdated
I came up with do_muhash because I needed the contrary term to undo_muhash. Since we are talking about rolling back a block that has ‘do’ (utxo created in the block) and ‘undo’ data (utxos destroyed in the block). I was thinking if I should reverse these terms when a block is removed or keep the terms the same as they were when the block was added. For me it seemed less confusing to keep the naming the same as when the block was added. So ‘do’ data that was added when the block is added still called ‘do’ data when the block is rolled back, just that the ‘do’ data is removed from the utxo set this time. Either way I probably confusing so I will keep adding comments to make this more clear.
I should add that I only need these intermediate values because division is much slower than multiplication. Since these operations are cummutative I can multiply all the values that need to be removed from the muhash and then remove them all with just one division.
in
src/index/coinstatsindex.cpp:197
in
8e17009dccoutdated
It is the intermediate muhash for all the utxos that need to be removed/undone. See also my longer explanation for do_muhash.
Sjors
commented at 3:14 pm on April 26, 2020:
member
Concept ACK.
Is there any reference document for MuHash3072? If so, it would be good to link to that in the header file. In particular, does the (single) test vector exist anywhere else?
The mailinglist post you mentioned at the top also contains useful background.
This particular part of the post might be worth copying (leaving only the MuHash part):
Interestingly, both ECMH and MuHash not only support adding set
elements in any order but also deleting in any order. As a result, we
can simply maintain a running sum for the UTXO set as a whole, and
add/subtract when creating/spending an output in it. In the case of
MuHash it is slightly more complicated, as computing an inverse is
relatively expensive. This can be solved by representing the running
value as a fraction, and multiplying created elements into the
numerator and spent elements into the denominator. Only when the final
hash is desired, a single modular inverse and multiplication is needed
to combine the two.
As the update operations are also associative, H(a)+H(b)+H(c)+H(d) can
in fact be computed as (H(a)+H(b)) + (H(c)+H(d)). This implies that
all of this is perfectly parallellizable: each thread can process an
arbitrary subset of the update operations, allowing them to be
efficiently combined later.
I can’t speak to the cryptographic assumptions underlying MuHash, but at the least within the scope of this PR it’s not used for anything security critical. As the mailinglist discussion pointed out, this feature is a good way to gain more experience with this type of hashing in a non consensus-critical manner.
I also can’t say anything intelligent about the implementation, other than that it’s nice and short. Maybe @practicalswift can add a fuzz harness?
b208dfd5712bdc658217c84d276d82e92cb2cf47 Add TruncatedSHA256Writer needs a test.
I think you can safely squash f59d2223199d26ef591d9a3fc3282b19a4cb833a into the original and make yourself Co-Authored-By? You’re not changing the algorithm.
I’m currently running cf6ba169ae1999a98db831900d36e0571a9d933d and building a mainnet index on macOS 10.15.4. I’ll update my comment with an example result from a recent block. I ran the first 100K blocks with --enable-debug, stopped and continued without debug.
Concept ACK on dropping transaction count from gettxoutsetinfo. “The number of transactions with unspent outputs” seems a useless metric. If someone does have a use case for it, hopefully we’ll learn before the release. In that case it could be added to getblock.
I got very confused by the indexer code in 8e17009dcc4de977ced030e624b18ecc6ba19e63, sorry for any dumb questions…
If it’s not too big a code change, adding a height argument to gettxoutsetinfo would unleash the power of this index, and also make it easier to compare hashes between reviewers. It also makes it easier to manually compare against the hash generated without an index (with something a bit more complex than the functional test).
DrahtBot added the label
Needs rebase
on May 8, 2020
fjahr force-pushed
on May 15, 2020
Add TruncatedSHA256Writere5c842146b
MuHash3072 implementation
Co-authored-by: Fabian Jahr <fjahr@protonmail.com>
c46296006b
Add x86_64 assembly optimization for MuHashdf0a6e79ab
crypto: Add serialization methods to Muhash1a5928c393
bench: Add seperate benchmarks for Muhash precompute, add and div58426412f0
coinstats: Remove tracking of transaction count in UTXO set
Tracking this metric would be very complicated to implement when
using a coinstatsindex and it does not seem to be used for anything
meaningful.
8e7c45da95
coinstats: Use Muhash to calculate UTXO set hashdfba5c2bfa
coinstats: Extract GetTruncatedSHA512Hash function
This makes it possible to reuse this code in coinstatsindex.
c092588cfa
index: Add CoinStatsIndex
The index holds the values previously calculated in coinstats.cpp
for each block, representing the state of the UTXO set at each
height.
1502a08b4a
index: CoinStatsIndex can be activated with command line flagc692275243
test: Add functional test for UTXO Stats Index92f4985753
test: Add unit test for UTXO Stats Index9d1b1ecf0f
fjahr force-pushed
on May 15, 2020
DrahtBot removed the label
Needs rebase
on May 15, 2020
rpc: gettxoutsetinfo can be requested for specific blockheights58cf819caa
fjahr force-pushed
on May 15, 2020
fjahr
commented at 3:15 pm on May 15, 2020:
member
Thanks for the review and the great questions!
Concept ACK.
Is there any reference document for MuHash3072? If so, it would be good to link to that in the header file. In particular, does the (single) test vector exist anywhere else?
I did a bit of research when I started the work on this but I am not aware of any reference document or official test vectors. Maybe @sipa knows if I missed something?
The mailinglist post you mentioned at the top also contains useful background.
This particular part of the post might be worth copying (leaving only the MuHash part):
Interestingly, both ECMH and MuHash not only support adding set
elements in any order but also deleting in any order. As a result, we
can simply maintain a running sum for the UTXO set as a whole, and
add/subtract when creating/spending an output in it. In the case of
MuHash it is slightly more complicated, as computing an inverse is
relatively expensive. This can be solved by representing the running
value as a fraction, and multiplying created elements into the
numerator and spent elements into the denominator. Only when the final
hash is desired, a single modular inverse and multiplication is needed
to combine the two.
As the update operations are also associative, H(a)+H(b)+H(c)+H(d) can
in fact be computed as (H(a)+H(b)) + (H(c)+H(d)). This implies that
all of this is perfectly parallellizable: each thread can process an
arbitrary subset of the update operations, allowing them to be
efficiently combined later.
Added an adapted version of this text to muhash.h.
I can’t speak to the cryptographic assumptions underlying MuHash, but at the least within the scope of this PR it’s not used for anything security critical. As the mailinglist discussion pointed out, this feature is a good way to gain more experience with this type of hashing in a non consensus-critical manner.
I also can’t say anything intelligent about the implementation, other than that it’s nice and short. Maybe @practicalswift can add a fuzz harness?
Happy to work on a fuzz harness myself, I am trying to get some more experience with it :)
I am currently a bit unsure of what the right way to go is for this. A first observation is that CHashWriter is also untested. I think both these classes are pretty thin wrappers of the actual hash functions they use, so a test might not provide much value. But TruncatedSHA256Writer does a bit more, so I think it could still be valuable to have a sanity check test. I was looking for any test vectors provided for SHA-512/256 and while there are is no official set, at least these could be enough for a sanity check. However, the problem is that for SHA-512/256 different initializers are used. We reuse SHA-512 for simplicity reasons, I guess. It’s easy enough to add the other initializers. I am just not sure it worth the additional review effort. Another fact that I stumbled over, is that the serializer used here is writing the null character of a string into the hash, where I am not sure if that is a bug but it certainly makes working with test vectors hard.
So, I am not sure if adding SHA-512/256 is worth it. And I am also unsure if a test for TruncatedSHA256Writer would be valuable without test vectors and given the only limited logic inside the class.
I think you can safely squash f59d222 into the original and make yourself Co-Authored-By? You’re not changing the algorithm.
done
I’m currently running cf6ba16 and building a mainnet index on macOS 10.15.4. I’ll update my comment with an example result from a recent block. I ran the first 100K blocks with --enable-debug, stopped and continued without debug.
Concept ACK on dropping transaction count from gettxoutsetinfo. “The number of transactions with unspent outputs” seems a useless metric. If someone does have a use case for it, hopefully we’ll learn before the release. In that case it could be added to getblock.
I got very confused by the indexer code in 8e17009, sorry for any dumb questions…
I will try to add more comments to make it easier to understand.
If it’s not too big a code change, adding a height argument to gettxoutsetinfo would unleash the power of this index, and also make it easier to compare hashes between reviewers. It also makes it easier to manually compare against the hash generated without an index (with something a bit more complex than the functional test).
Done, I had planned to do this as a follow-up anyway and with the removal of some duplicate code, the number of LOC has hardly grown.
DrahtBot
commented at 12:54 pm on May 26, 2020:
member
🐙 This pull request conflicts with the target branch and needs rebase.
DrahtBot added the label
Needs rebase
on May 26, 2020
fjahr renamed this:
Index for UTXO Set Statistics
[WIP] Index for UTXO Set Statistics
on May 28, 2020
laanwj removed this from the "Chasing Concept ACK" column in a project
jonatack
commented at 10:49 am on May 29, 2020:
member
Tested Concept and approach ACK58cf819caabe98
This PR builds cleanly. I launched bitcoind with -coinstatsindex=1 to create the index, which runs in a separate thread and does not block the node. It regularly logs progress in the debug log.
The first 459k blocks took 19 hours (~400 blocks/minute) with debug build with many flags for reviewing.
I rebuilt without debug enabled and no flags and restarted bitcoind. The remaining blocks 459k-632k (173k blocks) took 3.5 hours (~800 blocks/minute).
The index was enabled when the debug log prints: “coinstatsindex is enabled at height 632059” and RPC gettxoutsetinfo then runs near-instantly.
in
src/rpc/blockchain.cpp:997
in
58cf819caa
996 {RPCResult::Type::STR_HEX, "bestblock", "The hash of the block at the tip of the chain"},
997- {RPCResult::Type::NUM, "transactions", "The number of transactions with unspent outputs"},
998 {RPCResult::Type::NUM, "txouts", "The number of unspent transaction outputs"},
999 {RPCResult::Type::NUM, "bogosize", "A meaningless metric for UTXO set size"},
1000- {RPCResult::Type::STR_HEX, "hash_serialized_2", "The serialized hash"},
1001+ {RPCResult::Type::STR_HEX, "utxo_set_hash", "The serialized hash"},
Nit, for when you work on the RPC PR: let’s call this utxo_set_muhash in case we want to try another algo later.
The hash_serialized_2 field should either be (instantly) deprecated, or remain in use when there’s no index. The documentation for -coinstatsindex should point this out.
We could additionally expose the full 384-byte hash (base64 encoded). IIUC someone could use that to check if a specific UTXO is present. Update: that particular example may not be terribly useful
height + coinbase: determines if it can be spent, e.g. given coinbase maturity and OP_CSV
out: amount + scriptPubKey
If you have the above information and all the block headers, you can validate a new block, without needing data from previous blocks. This makes it suitable for pruning and UTXO snapshots.
Sjors
commented at 4:39 pm on June 9, 2020:
member
In #19145 (missing in OP) you’re adding a hash_type option to gettxoutsetinfo, which can be none. This makes an index immediately useful, even without MuHash. So you make a PR for that independently.
Adding MuHash to the index later might be annoying, however if the plan is to support different hash types, it sounds like MuHash should be in a separate index from the coin stats anyway?
fjahr
commented at 12:57 pm on June 16, 2020:
member
In #19145 (missing in OP) you’re adding a hash_type option to gettxoutsetinfo, which can be none. This makes an index immediately useful, even without MuHash. So you make a PR for that independently.
Adding MuHash to the index later might be annoying, however if the plan is to support different hash types, it sounds like MuHash should be in a separate index from the coin stats anyway?
I needed to reflect a bit on this but now do I think it would be good to offer the index already without the hash. But I think to keep the hash in a different index and letting users set these with separate settings might be too much responsibility for the users. I would suggest a more opinionated approach. Rather than having -coinstatsindex=none and -coinstatsindex=muhash later I would suggest it stays -coinstatsindex and at first there would be no hash and users would only get a fast response when they use hash_type=none, then muhash can be added to the index and users will then be receiving responses from the index with hash_type=none and hash_type=muhash. Personally I think we don’t want to have a lot of hash types, too much unnecessary choice for the user. I think we use this now to support hash_serialized_2 for some time and maybe also to run ECMH in parallel but ideally we should be ending up at one hash_type that is the default and most of the other options can probably be removed.
I will prepare the alternate PRs with only hash_type=none and I hope I can make so people are not totally confused by all the different PRs. :)
MarcoFalke referenced this in commit
b52e25cc1b
on Jul 6, 2020
sidhujag referenced this in commit
e98a4e94d1
on Jul 8, 2020
laanwj referenced this in commit
27eeb0337b
on Aug 20, 2020
sidhujag referenced this in commit
defff93348
on Aug 20, 2020
laanwj referenced this in commit
48c1083632
on Sep 1, 2020
sidhujag referenced this in commit
a97b9a0281
on Sep 3, 2020
jnewbery
commented at 10:49 am on October 14, 2020:
member
PR 19105 is merged
laanwj referenced this in commit
b6a71b80d2
on Jan 7, 2021
sidhujag referenced this in commit
d93f2b4943
on Jan 7, 2021
laanwj referenced this in commit
8d82eddee6
on Feb 12, 2021
sidhujag referenced this in commit
266fcc8b6d
on Feb 12, 2021
Rspigler
commented at 2:44 am on April 19, 2021:
contributor
Chart can be updated as PR #19055 and #19145 have been merged
fjahr
commented at 10:33 pm on April 24, 2021:
member
Chart can be updated as PR #19055 and #19145 have been merged
Thanks for the reminder, done!
Rspigler
commented at 7:23 am on April 25, 2021:
contributor
I don’t mean to spam the PR, but the chart was updated incorrectly.
PR 19055 is merged but marked as open
PR 19181 is open but marked as merged
Thank you for all your work!
fjahr
commented at 10:20 am on April 25, 2021:
member
I don’t mean to spam the PR, but the chart was updated incorrectly.
PR 19055 is merged but marked as open
PR 19181 is open but marked as merged
Thank you for all your work!
Thanks, I was too hastily. Fixed.
laanwj referenced this in commit
2b45cf0bcd
on Apr 30, 2021
Fabcien referenced this in commit
8ae513e386
on Aug 31, 2021
DrahtBot
commented at 11:22 am on December 15, 2021:
member
Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
Sjors
commented at 12:16 pm on December 15, 2021:
member
@fjahr should this be closed? If #19181 is all that’s left, it can just be its own PR.
fjahr
commented at 4:51 pm on December 19, 2021:
member
Closing this as almost everything that was aimed for has been implemented by now. Thanks to everyone who supported this project with advice, code reviews, or just general words of encouragement!
There are still two open PRs that are related and could use some review:
#19792 adds an RPC to export the coinstats to a CSV file
#19181 adds ASM optimizations (in rebase at the moment)
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-01-21 09:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me