remove TestBlockValidity from CreateNewBlock critical path

jameshilliard commented at 7:57 pm on February 25, 2017: contributor

Rational for removal:

This is effectively a fatal error to miners regardless, there is no meaningful downside to removing this because getting an invalid template is not practically any worse than getting no template.
Due to cs_main locks this causes a large variance in GBT response times and further encourages undesirable workarounds such as validationless mining.
Invalid blocks are much more likely to originate from stratum server bugs, something like this is probably a better way to test block validity.

I’ve been running with this TestBlockValidity call removed in production for a few months at this point.

jameshilliard force-pushed on Feb 25, 2017

remove TestBlockValidity from CreateNewBlock critical path 7555922a66

jameshilliard force-pushed on Feb 25, 2017

luke-jr changes_requested

luke-jr commented at 9:41 pm on February 25, 2017: member

This seems to remove the test entirely, not just from the critical path.

Getting an invalid template is worse than “no template”, because it causes validationless mining. As such, arguing that this check encourages validationless is meaningless: encouraging is less bad than causing.

(Note that BIP 23 Block Proposals would be better than the faux mining stuff to do a later re-test.)

Also note that CNB is not in the critical path for validationless mining. As soon as validation of a block completes (and no sooner), miners should switch to empty blocks on top of that, and not be waiting for CNB anyway. A non-zero delay here will always exist, and it’s just a matter of implementing the empty block logic correctly (waiting for validation of the prevblock).

That all being said, it might be okay to simply remove it from the critical path, provided it is still checked after GBT returns, and has a reasonably sufficient way of stopping mining if it’s invalid (eg, perhaps call blocknotify handlers and abort?)

theuni commented at 9:41 pm on February 25, 2017: member

I can agree with the idea that BlockAssembler shouldn’t need to resort to the heaviness of TestBlockValidity, but we do need some checks here.

As to your points

Grinding for a minute on a block that will be rejected is not worse than an extra quick call to gbt?
cs_main is already locked. This change does not reduce locking, only shaves down the time that it’s held. Also, technically, this change would make this validation-less mining :(
Your example uses TestBlockValidity :)

fanquake added the label Mining on Feb 25, 2017

jameshilliard commented at 10:48 pm on February 25, 2017: contributor

Getting an invalid template here should never happen unless there is a bug within bitcoind, this TestBlockValidity call is only an internal sanity check, my definition of validationless mining would be mining on top of a block that you haven’t validated, this does not change that. @luke-jr

Not all pool software is capable of bypassing GBT, that requires implementing p2p networking in the stratum server which adds quite a bit of complexity.
My point is that CNB is in the critical path for validating mining, at least for stratum servers that don’t implement p2p themselves. @theuni
My assumption is that if TestBlockValidity fails subsequent calls are also likely to fail.
Yes, removing TestBlockValidity does however significantly reduce CreateNewBlock times from my testing, I’m assuming locking is a major factor since TestBlockValidity introduces a large amount of variance in CreateNewBlock times as well.

luke-jr commented at 1:49 am on February 26, 2017: member

blocknotify should be sufficient to begin empty-block mining. But perhaps it would make sense to have bitcoind’s GBT itself handle the empty template?

jameshilliard commented at 2:37 am on February 26, 2017: contributor

@luke-jr Well you would have to cache some things like nversion if you just use blocknotify right? CreateNewBlock itself seems to be quite fast without TestBlockValidity so I’m not sure sending empty blocks out first is worth the extra bandwidth/complexity.

TheBlueMatt commented at 6:46 pm on February 26, 2017: member

I’m totally in favor of removing TBV from the critical-path, but only if we trigger some background checker that will print to debug.log (and probably assert) if it returned a template it later found to be invalid.

jameshilliard commented at 7:35 pm on February 26, 2017: contributor

@TheBlueMatt I made an attempt to do that before but the way I was doing it there didn’t look viable.

sipa commented at 9:33 pm on February 26, 2017: member

I believe that @jameshilliard assumption that if CNB ever produces an invalid block, it is likely that it will keep doing so is right, and if so, the only benefit of running TBV is better reporting (so developers can fix the bug), not actually changing what is being mined. The better reporting may be useful, as miners may use GBT under circumstances that aren’t covered by any tests during rcs, but it seems we have a pretty good track record the last few releases.

I think there are two reasonable approaches:

Run a background thread that occasionally runs TBV on the CNB results. It doesn’t have to test everything, and it can do so with a delay. Unfortunately, TBV’s locking of cs_main may interfere with other CNB calls.
Make the TBV call optional like in #9859, and leave it on in unit and RPC tests, but off by default in production.

jameshilliard commented at 9:58 pm on February 26, 2017: contributor

@sipa I think just making it optional is probably best, doing block proposals within the stratum server would be good since it can be done completely outside of the critical path, although I’m not sure how easy it would be to add support for that in the various stratum servers themselves. I like cory’s faux-mining for testing a little more at least for non-production testing since it can be done without modification of the stratum server.

gmaxwell commented at 11:41 pm on February 27, 2017: contributor

I don’t think making it optional is useful. It will almost never be run where it needs to be run. And more options mean more mostly untested code that will catch fire later when someone does happen to use it.

Run a background thread that occasionally runs TBV on the CNB results. It doesn’t have to test everything, and it can do so with a delay. Unfortunately, TBV’s locking of cs_main may interfere with other CNB calls.

We should be able to get the locking into a state where it can return a cached template without taking cs_main. In which case, running the test immediately after returning a result will run it during a time when its mostly only returning a cached result. Moreover the test could be made interruptible, in case a new block arrives– this is also important for BIP23 block proposal testing.

I do disagree that mining an invalid block is equal to or better than a delay: A shutdown lets you fall over to another daemon, a shutdown tells you something is broken so you fix it rather than days of fake mining, and a shutdown also prevents you from making blocks that cause false confirmations for SPV clients propagated around broken nodes that broadcast without validating. – but the whole question should be moot, since we should be able to perform this check without adding any meaningful delay.

jameshilliard commented at 0:37 am on February 28, 2017: contributor

@gmaxwell My thoughts are that this should probably be only used for testing and not in production under normal circumstances.

In regards to failover I would expect that a bug in CNB would likely be deterministic and cause failures on all other pool nodes as well that run similar mempool policy.

in src/miner.h: in 7555922a66

164@@ -165,7 +165,7 @@ class BlockAssembler
165 public:
166     BlockAssembler(const CChainParams& chainparams);
167     /** Construct a new block template with coinbase to scriptPubKeyIn */
168-    std::unique_ptr<CBlockTemplate> CreateNewBlock(const CScript& scriptPubKeyIn);
169+    std::unique_ptr<CBlockTemplate> CreateNewBlock(const CScript& scriptPubKeyIn, bool TestBlock = false);

jtimon commented at 5:40 pm on March 7, 2017:

Do we need a default value here? Do we really save that much disruption by not passing false explicitly where it’s needed?

Bikeshedding: s/TestBlock/fValidateBlock/ ?

MarcoFalke added the label Needs rebase on Jun 6, 2018

MarcoFalke commented at 4:30 pm on August 25, 2018: member

Closing for now. Let me know when you continue to work on this

MarcoFalke closed this on Aug 25, 2018

laanwj removed the label Needs rebase on Oct 24, 2019

MarcoFalke locked this on Dec 16, 2021

remove TestBlockValidity from CreateNewBlock critical path #9858