Not submitting this as a PR because it's hackish and we're moving towards libsecp256k1 anyway. Logging it here in case it's useful for the 0.10 discussion, though.
See https://github.com/theuni/bitcoin/commit/3f74e704c6eea38c34132e273ba838473cf9eba9
Initialize a single EC_KEY with the correct underlying group, then call EC_KEY_precompute_mult on it to speed up future operations, and save the result. For each subsequent key, start from a duplicate of that saved key rather than paying the precomputation overhead every time. I stuffed it into a singleton-ish object as a quick hack.
On my machine it shaves about 5% off each signature verification.