- Replace -benchmark (and the related fBenchmark) with a regular debug option, -debug=bench.
- Increase coverage and granularity of individual block processing steps.
- Add cumulative times.

Example output:

- Load block from disk: 3.11ms [51.13s]
- Connect 484 transactions: 4.14ms (0.009ms/tx, 0.005ms/txin) [64.37s]
- Verify 860 txins: 4.25ms (0.005ms/txin) [66.98s]
- Index writing: 1.22ms [33.53s]
- Callbacks: 0.10ms [2.15s]
- Connect total: 14.17ms [203.69s]
- Flush: 1.00ms [19.88s]
- Writing chainstate: 0.09ms [33.30s]
- Connect postprocess: 0.23ms [5.15s]
- Connect block: 11.96ms [313.16s]

Note that the subdivisions work backwards: the grand total ('Connect block') is the last item, 'Connect transactions' is part of 'Verify txins', and 'Verify txins', 'Index writing', and 'Callbacks' are part of 'Connect total'. The value in square brackets on each line is the cumulative time for that step.
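
As an illustration of the per-step and cumulative figures in the output above, here is a minimal sketch of how one such line could be produced. It is not code from this change: `BenchStep`, `Record`, and `total_us` are hypothetical names, and the sketch assumes measurements arrive in microseconds.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not taken from the patch: keeps the cumulative
// time of one processing step across all blocks handled so far.
struct BenchStep {
    int64_t total_us = 0;  // cumulative time in microseconds (assumed unit)

    // Add one measurement and return a line shaped like the example
    // output: "- <label>: <this call>ms [<cumulative>s]".
    std::string Record(const std::string& label, int64_t elapsed_us) {
        assert(elapsed_us >= 0);
        total_us += elapsed_us;
        char buf[256];
        std::snprintf(buf, sizeof(buf), "- %s: %.2fms [%.2fs]",
                      label.c_str(),
                      elapsed_us / 1000.0,     // microseconds to milliseconds
                      total_us / 1000000.0);   // microseconds to seconds
        return std::string(buf);
    }
};
```

With one `BenchStep` per step, calling `Record("Flush", elapsed)` after each block gives the time for that block followed by the running total in brackets. Keeping the counter as an integer number of microseconds avoids accumulating floating-point error over many blocks.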