I've been investigating the performance of CreateNewBlock(), and one regression I spotted was introduced in #9189, when we started always calculating the witness commitment (even before segwit has activated). This is particularly slow because we don't currently cache the witness hash of a transaction, so it is recomputed for every transaction each time the commitment is built.
Generating the coinbase commitment currently takes about 8ms; with this patch, that drops to about 1.5ms.
I think this also ought to speed up compact block processing, but I haven't tried to benchmark it.
(I think in an ideal world, we'd go further and cache these witness hashes for segwit transactions as well -- I proposed one approach in #9700, and I tried another approach for a similar kind of caching benefit in #9709, but I need reviewer feedback to figure out how to move forward.)
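
For reviewers who want the idea in isolation, here is a minimal, self-contained sketch, not the actual diff. It assumes the saving comes from the fact that a transaction without witness data has a witness hash equal to its txid, so the already-cached txid can be returned instead of rehashing. `ToyTx`, `ToyHash`, `WitnessLeaves` and `g_hash_calls` are hypothetical names for illustration only, and `std::hash` stands in for the real double-SHA256.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Counts how often the (expensive) hash is computed; illustration only.
static int g_hash_calls = 0;

// Stand-in for double-SHA256. NOT the real hash function.
static uint64_t ToyHash(const std::string& data)
{
    ++g_hash_calls;
    return std::hash<std::string>{}(data);
}

// Hypothetical, simplified transaction; not the real CTransaction.
struct ToyTx {
    std::string base;    // serialization without witness data
    std::string witness; // empty for non-segwit transactions
    uint64_t txid;       // computed once at construction and cached

    ToyTx(std::string base_in, std::string witness_in)
        : base(std::move(base_in)),
          witness(std::move(witness_in)),
          txid(ToyHash(base)) {}

    bool HasWitness() const { return !witness.empty(); }

    // Without a witness, the witness hash equals the txid, so the cached
    // value is returned and no hashing happens. Transactions that do carry
    // a witness are still rehashed on every call (assumed to be unchanged
    // by this sketch).
    uint64_t GetWitnessHash() const
    {
        if (!HasWitness()) return txid;
        return ToyHash(base + witness);
    }
};

// Collects the witness hashes that would form the leaves of the commitment.
static std::vector<uint64_t> WitnessLeaves(const std::vector<ToyTx>& txs)
{
    std::vector<uint64_t> leaves;
    leaves.reserve(txs.size());
    for (const ToyTx& tx : txs) leaves.push_back(tx.GetWitnessHash());
    assert(leaves.size() == txs.size());
    return leaves;
}
```

With a block template made up mostly of non-segwit transactions, `WitnessLeaves` in this sketch performs one hash per witness-carrying transaction rather than one per transaction, which is the kind of saving described above.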