This contains a rebased version of @peterdettman’s #21, adapted to two changes: lambda splitting has moved from the group code to the scalar code, and `secp256k1_num_get_bit` has been removed. It then simplifies the result into a purely scalar-based version.
This gives around a 0.8% speedup with `--enable-endomorphism CFLAGS=-O3`. It also makes it possible to use the endomorphism optimization without GMP (with a 28% performance hit).
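For context on what "lambda splitting" computes: the endomorphism optimization rewrites a scalar k as k1 + k2·λ (mod n), where n is the group order and λ is a cube root of unity mod n. Both k1 and k2 are roughly half the bit length of n, so k·P = k1·P + k2·λ(P) can be evaluated with two half-length multiplications. The sketch below is a hypothetical illustration in plain Python integers, not the C code in this PR. It derives a short lattice basis with the extended Euclidean algorithm, following the GLV approach, and rounds k against that basis. The names `glv_basis` and `split_scalar` are made up for this example. A real implementation would avoid general bignum division; this sketch does not show how the PR does that.

```python
# Hypothetical sketch of GLV-style lambda splitting: k == k1 + k2*lam (mod n)
# with |k1|, |k2| around sqrt(n). Plain Python integers, not libsecp256k1 code.

# secp256k1 group order n and lambda, a nontrivial cube root of unity mod n.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
LAMBDA = 0x5363AD4CC05C30E0A5261C028812645A122E22EA20816678DF02967C1B23BD72


def glv_basis(n, lam):
    """Return two short vectors (a, b) with a + b*lam == 0 (mod n).

    Runs the extended Euclidean algorithm on (n, lam). Every remainder r_i
    satisfies r_i == t_i * lam (mod n), so (r_i, -t_i) is a lattice vector.
    Assumes n is prime and 0 < lam < n.
    """
    rs, ts = [n, lam], [0, 1]
    while rs[-1] != 0:
        q = rs[-2] // rs[-1]
        rs.append(rs[-2] - q * rs[-1])
        ts.append(ts[-2] - q * ts[-1])
    # m is the last index whose remainder is still >= sqrt(n).
    m = max(i for i, r in enumerate(rs) if r * r >= n)
    v1 = (rs[m + 1], -ts[m + 1])
    cand_a = (rs[m], -ts[m])
    cand_b = (rs[m + 2], -ts[m + 2])
    # Keep whichever candidate has the smaller Euclidean norm.
    v2 = min(cand_a, cand_b, key=lambda v: v[0] ** 2 + v[1] ** 2)
    return v1, v2


def split_scalar(k, n, basis):
    """Return (k1, k2) with k1 + k2*lam == k (mod n), where lam is the value
    the basis was built from. Both halves are short when the basis is short."""
    (a1, b1), (a2, b2) = basis
    det = a1 * b2 - a2 * b1  # expected to be +n or -n for this lattice
    sign, d = (1, det) if det > 0 else (-1, -det)

    def div_round(x):
        # Nearest integer to x / d (d > 0), via floor division.
        return (2 * x + d) // (2 * d)

    # Solve (k, 0) = beta1*v1 + beta2*v2 over the rationals, then round.
    c1 = div_round(sign * b2 * k)
    c2 = div_round(-sign * b1 * k)
    # Subtract the nearby lattice point c1*v1 + c2*v2, which is 0 mod n.
    k1 = k - c1 * a1 - c2 * a2
    k2 = -c1 * b1 - c2 * b2
    return k1, k2


if __name__ == "__main__":
    basis = glv_basis(N, LAMBDA)
    k = 0x1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF
    k1, k2 = split_scalar(k, N, basis)
    assert (k1 + k2 * LAMBDA) % N == k % N
    print(k1.bit_length(), k2.bit_length())
```

The identity k1 + k2·λ ≡ k (mod n) holds for any rounding. Rounding to the nearest lattice point is what keeps k1 and k2 short.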