The other day @gmaxwell pointed out on IRC that we’re calculating the block hash many more times than we need to. This attempts to improve that and should result in better performance, but I haven’t run any benchmarks.
Contains: #9717
2402 // unless we're still syncing with the network.
2403 // Such an unrequested block may still be processed, subject to the
2404 // conditions in AcceptBlock().
2405 bool forceProcessing = pfrom->fWhitelisted && !IsInitialBlockDownload();
2406- const uint256 hash(pblock->GetHash());
2407+ const uint256 hash(blockHash);
2819@@ -2820,10 +2820,10 @@ bool FindUndoPos(CValidationState &state, int nFile, CDiskBlockPos &pos, unsigne
2820 return true;
2821 }
2822
2823-bool CheckBlockHeader(const CBlockHeader& block, CValidationState& state, const Consensus::Params& consensusParams, bool fCheckPOW)
2019@@ -2020,6 +2020,7 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2020 // Keep a CBlock for "optimistic" compactblock reconstructions (see
2021 // below)
2022 std::shared_ptr<CBlock> pblock = std::make_shared<CBlock>();
2023+ const uint256& blockHash = pblock->GetHash();
2819@@ -2820,10 +2820,10 @@ bool FindUndoPos(CValidationState &state, int nFile, CDiskBlockPos &pos, unsigne
2820 return true;
2821 }
2822
2823-bool CheckBlockHeader(const CBlockHeader& block, CValidationState& state, const Consensus::Params& consensusParams)
2824+static bool CheckProof(const Consensus::Params& consensusParams, const CBlockHeader& block, const uint256& blockHash, CValidationState& state)
I think a better approach would be to cache it, a bit like it is already done in CTransaction versus CMutableTransaction.
What is better about it? Anyway, this doesn’t preclude caching it too.
Needs rebase
Rebased
EDIT: @NicolasDorier I think even if we do that, it’s better that some functions take the hash directly, though maybe not all of the functions here. But I’m happy to review such a PR.
Hm, if the block were made immutable, caching the result would be trivial. The only place in the code where a block is not immutable is during block creation (where we don’t care about the hash). These changes require carrying a hash around, whereas with an immutable block it would just be a method that returns it.
So while these changes don’t preclude caching it, I think the caching patch would be smaller and simpler. Thoughts?
3235@@ -3235,7 +3236,8 @@ bool ProcessNewBlock(const CChainParams& chainparams, const std::shared_ptr<cons
3236 CValidationState state;
3237 // Ensure that CheckBlock() passes before calling AcceptBlock, as
3238 // belt-and-suspenders.
3239- bool ret = CheckBlock(*pblock, state, chainparams.GetConsensus());
3240+ const uint256 blockHash = pblock->GetHash();
In commit “Pow: Optimization: Pass const uint256& blockHash to CheckBlock”
Some places you are using const uint256 and other places you are using const uint256& for the blockHash local. Both ways should be functionally identical, but it might be nice to stick with a consistent style.
utACK 2069f9282cebb1c08b7e12c8155de3403dda39b4. Nice cleanup.
If a small caching implementation is possible, that would probably give a preferable end result. But this PR is very simple and an obvious improvement over the status quo.
1692@@ -1693,13 +1693,14 @@ static bool ConnectBlock(const CBlock& block, CValidationState& state, CBlockInd
1693 {
1694 AssertLockHeld(cs_main);
1695 assert(pindex);
1696+ const uint256& blockHash = block.GetHash();
hashBlock would be correct here.
3150@@ -3151,6 +3151,7 @@ bool ProcessNewBlockHeaders(const std::vector<CBlockHeader>& headers, CValidatio
3151 static bool AcceptBlock(const std::shared_ptr<const CBlock>& pblock, CValidationState& state, const CChainParams& chainparams, CBlockIndex** ppindex, bool fRequested, const CDiskBlockPos* dbp, bool* fNewBlock)
3152 {
3153 const CBlock& block = *pblock;
3154+ const uint256& blockHash = pblock->GetHash();
const uint256 hashBlock = here instead.
I don’t see why it would require #9717. At least here, it renames the function, although it looks like it’s doing some confusing overloading now :/
It is not strictly necessary; I can do this without any renaming and while keeping the stupid boolean argument in CheckBlockHeader, if that is preferred. Or squash the first 2 commits, or rename it again to something different from “CheckProofOfWork” (but not “CheckProof”, since that has been discarded already). The only thing I don’t want to do is remove CheckBlockHeader here or in #9717. Can we please move forward and leave that decision (whether to remove CheckBlockHeader [or whatever it is called after the rename]) for later? I thought we agreed on #9717 that it was an improvement, even if some people want to go further and remove CheckBlockHeader and others aren’t convinced.
- Pow: Pass const uint256& block_hash to s/CheckBlockHeader/CheckProofOfWork/
  Optimization: AcceptBlockHeader had the hash calculated already
- Pass const uint256& block_hash to CheckBlock
  Optimization: ConnectBlock now calls CBlock::GetHash() fewer times
- Pass const uint256& block_hash to ConnectBlock
  Optimization: TestBlockValidity had calculated block_hash already
- Pass const uint256& block_hash to AcceptBlock
  Optimization: AcceptBlock now reuses the CBlock::GetHash() calculation from ProcessNewBlock
- Pass const uint256& block_hash to ProcessNewBlock
- Call CBlock::GetHash() less in net_processing::ProcessMessage()
- Pass const uint256& block_hash to TestBlockValidity
  Optimization: getblocktemplate had the hash calculated already
Aim: Block 1..300000 -reindex-chainstate. Added instrumentation to CBlockHeader::GetHash():
There’s a slight bit of imprecision because -stopatheight=300000 didn’t work, so I had to use my reflexes. But I’m fairly sure the ~26% saving in block hash operations doesn’t result from only those 1130 blocks. Oh wait, it’s the other way around even :)
Let’s say I claim “In a rather naive benchmark (reindex-chainstate up to block 284k, with -assumevalid=0 and a very large dbcache), this connected blocks ~1.05x faster,” similarly to what @TheBlueMatt did in #10192. Would someone check this claim (did anyone check it for #10192)? If true, would that be enough to make this interesting to those who aren’t interested now? Is anyone working on doing that kind of benchmark automatically?
I assume he did it by looking at the logs. I haven’t tried that for either this PR or #10192, but if 1.05x is not enough… this probably gives less than that. It just seems strange that this is disregarded as “the performance improvement is not worth it” without anyone actually measuring it, besides laanwj’s tests, which only count the number of hashes (not saying that’s not helpful, but it doesn’t seem to be enough).
So while these changes don’t preclude caching it, I think the caching patch would be smaller and simpler. Thoughts?
Implicit caching, although its explicit code impact is smaller, is always somewhat error-prone. I think I prefer explicitly passing values in.
But if this isn’t worth doing, we should simply not do it. Unnecessary changes are only a risk. Closing.