Ran a quick gprof of bitcoin for a short period and I noticed this:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 37.21    573.47   573.47                             secp256k1_fe_mul_inner
 27.43    996.29   422.82                             secp256k1_fe_sqr_inner
  7.13   1106.24   109.95                             secp256k1_scalar_reduce_512
  4.89   1181.67    75.43                             secp256k1_der_parse_integer
  4.82   1255.90    74.23                             secp256k1_ge_set_xo_var
I figure the low-hanging fruit in the top three is probably exhausted, but secp256k1_der_parse_integer looks promising: at 4.89% of the samples it costs about as much as secp256k1_ge_set_xo_var. My guess is that it's slow because of all the branching? Bringing it up here in case anyone has ideas for optimizing it.
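To illustrate where the branching in a DER integer parse tends to come from, here is a rough sketch. It is not the actual libsecp256k1 code: the function name `der_parse_integer_sketch`, its signature, and the choice to reject long-form lengths and non-minimal encodings are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the libsecp256k1 implementation.
 * Parses one DER INTEGER (tag 0x02, short-form length only) starting at
 * *pos and writes it as a 32-byte big-endian value into out32.
 * Returns 1 on success and advances *pos; returns 0 on any failure. */
static int der_parse_integer_sketch(unsigned char out32[32],
                                    const unsigned char **pos,
                                    const unsigned char *end) {
    const unsigned char *p = *pos;
    const unsigned char *digits;
    size_t len, n;

    /* Need at least a tag byte and a length byte, and the tag must be INTEGER. */
    if (end - p < 2 || p[0] != 0x02) return 0;
    len = p[1];
    p += 2;

    /* Reject empty integers and long-form lengths (assumed unnecessary here,
     * since a 256-bit value needs at most 33 content bytes). */
    if (len == 0 || (len & 0x80)) return 0;
    /* Reject truncated input. */
    if (len > (size_t)(end - p)) return 0;
    /* Reject negative values (top bit of first content byte set). */
    if (p[0] & 0x80) return 0;
    /* Reject non-minimal encodings: a leading 0x00 is only allowed when the
     * next byte has its top bit set. */
    if (len > 1 && p[0] == 0x00 && !(p[1] & 0x80)) return 0;

    /* Skip the single sign-padding byte, if present. */
    digits = p;
    n = len;
    if (n > 1 && digits[0] == 0x00) {
        digits++;
        n--;
    }
    /* The value must fit in 32 bytes. */
    if (n > 32) return 0;

    /* Right-align the value in the output buffer. */
    memset(out32, 0, 32 - n);
    memcpy(out32 + 32 - n, digits, n);

    *pos = p + len;
    return 1;
}
```

Even in this stripped-down form, every validity check is a data-dependent branch, and there are two variable-length copies at the end. Whether those are what gprof is actually attributing the time to would need a finer-grained profile to confirm.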