I’ve been investigating performance regressions in CreateNewBlock.
addPackageTxs()
is supposed to “fail fast” for transactions that don’t fit in a block. However while we skip the most expensive ancestor/descendant calculations for failing transactions (generally), it turns out the other work we’re doing (mostly map lookups?) can still be pretty slow.
After trying various approaches to speed up those operations (replacing maps with unordered_maps, changing the sort order on the ancestor score function to avoid needless re-sorting, and even getting rid of the maps altogether in favor of storing set information directly in the mempool entry), the one optimization that dominates all these is to just return earlier when the block is nearly full.
So in this PR: when we’re within 4000 weight of the block being full, if we consider and fail to add 1000 transactions in a row, then give up. I’ve benchmarked this as reducing the average run of CNB from 84ms to 63ms 74ms to 61ms, with negligible difference in fees. Most of CNB time is currently taken by the call to TestBlockValidity, which is unaffected by this PR; so focusing just on package selection, the improvement from this change is a reduction in average addPackageTxs()
time from 19ms to 7ms, over the time period I analyzed (first week of December, 2016).
I also added some commits that provide benchmarking of CNB when running with -debug=bench, which I thought might be generally useful/interesting.