fe_get_b32 was taking 7 times longer than an fe_sqr because it assembled its output two bits at a time.
Rico666 on Bitcointalk noticed this and provided a patch for 5x52; I formatted it and wrote the rest.
This is perhaps a 0.5% speedup for ecdsa_verify using the 32-bit field code.
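
To illustrate the kind of change described here, below is a minimal sketch, not the patch itself. It assumes a normalized 10x26 layout (ten 26-bit limbs, least significant first) and big-endian output. The type `fe10x26_sketch` and both function names are hypothetical and exist only for this comparison. The first function builds each byte from four 2-bit pieces; the second reads each byte with one or two limb accesses.

```c
#include <stdint.h>

/* Hypothetical stand-in for a normalized 10x26 field element:
 * value = sum(n[i] << (26*i)), each n[i] < 2^26 and n[9] < 2^22. */
typedef struct { uint32_t n[10]; } fe10x26_sketch;

/* Slow variant: every output byte is built from four 2-bit pieces.
 * Because 26 is even, a 2-bit piece never straddles two limbs. */
static void get_b32_bitpairs(unsigned char *r, const fe10x26_sketch *a) {
    int i, j;
    for (i = 0; i < 32; i++) {
        unsigned int c = 0;
        for (j = 0; j < 4; j++) {
            int limb = (8 * i + 2 * j) / 26;
            int shift = (8 * i + 2 * j) % 26;
            c |= ((a->n[limb] >> shift) & 0x3u) << (2 * j);
        }
        r[31 - i] = (unsigned char)c; /* big-endian output */
    }
}

/* Faster variant (assumed shape): one or two limb reads per byte. */
static void get_b32_bytewise(unsigned char *r, const fe10x26_sketch *a) {
    int i;
    for (i = 0; i < 32; i++) {
        int limb = (8 * i) / 26;
        int shift = (8 * i) % 26;
        uint32_t c = a->n[limb] >> shift;
        if (shift > 18) {
            /* Fewer than 8 bits remain in this limb, so the byte
             * continues in the low bits of the next limb. */
            c |= a->n[limb + 1] << (26 - shift);
        }
        r[31 - i] = (unsigned char)(c & 0xffu);
    }
}
```

Both functions should produce identical output for any normalized input. A fully unrolled version with constant shifts would avoid the per-byte division and branch as well.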