- Replace -benchmark (and the related fBenchmark) with a regular debug option, -debug=bench.
- Increase coverage and granularity of individual block processing steps.
- Add cumulative times, shown in square brackets.
Example output:
    - Load block from disk: 3.11ms [51.13s]
    - Connect 484 transactions: 4.14ms (0.009ms/tx, 0.005ms/txin) [64.37s]
    - Verify 860 txins: 4.25ms (0.005ms/txin) [66.98s]
    - Index writing: 1.22ms [33.53s]
    - Callbacks: 0.10ms [2.15s]
    - Connect total: 14.17ms [203.69s]
    - Flush: 1.00ms [19.88s]
    - Writing chainstate: 0.09ms [33.30s]
    - Connect postprocess: 0.23ms [5.15s]
    - Connect block: 11.96ms [313.16s]
Note that the subdivisions work backwards: a total is printed after the steps it contains, so the grand total (‘Connect block’) is the last item. ‘Connect transactions’ is part of ‘Verify txins’, and ‘Verify txins’, ‘Index writing’, and ‘Callbacks’ are part of ‘Connect total’.