While addressing the review suggestion in #1765 (review) (b10c mirror link), I noticed that we don't yet have an internal benchmark for the variable-time variant of _fe_normalize, so this PR adds one. IIUC, for benchmarking purposes it's fine to repeatedly apply the operation to the same field element (which is normalized after the first loop iteration at the latest) and not to put any effort into reaching the final reduction code path, considering how extremely unlikely that path is to be reached in practice.
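To illustrate the benchmarking approach described above, here is a minimal sketch of such a loop. It is not the code from this PR: the `toy_fe` type, the modulus `TOY_P` and both functions are hypothetical stand-ins for the library's field element and `_fe_normalize_var`, and timing is omitted. It only shows that once the element is normalized, every further iteration takes the early-return path.

```c
#include <stdint.h>

/* Hypothetical toy field: integers modulo TOY_P = 2^61 - 1, stored in a
 * uint64_t that may hold an unreduced value. This is NOT the library's
 * field element representation. */
#define TOY_P ((((uint64_t)1) << 61) - 1)

typedef struct {
    uint64_t n;
} toy_fe;

/* Variable-time normalization (assumed shape): fold the high bits back in,
 * then return early unless a final reduction is needed. */
static void toy_fe_normalize_var(toy_fe *r) {
    /* 2^61 == 1 (mod TOY_P), so the top 3 bits can be added to the low 61. */
    uint64_t t = (r->n & TOY_P) + (r->n >> 61);

    if (t < TOY_P) {
        /* Common path: already fully reduced, nothing more to do. */
        r->n = t;
        return;
    }
    /* Rare path: final reduction (t is at most TOY_P + 7 here). */
    r->n = t - TOY_P;
}

/* Benchmark body: repeatedly normalize the same element. After the first
 * iteration the value is normalized, so later iterations only exercise
 * the early-return path. */
static void toy_bench_normalize_var(toy_fe *fe, int iters) {
    int i;
    for (i = 0; i < iters; i++) {
        toy_fe_normalize_var(fe);
    }
}
```

In this toy model the final reduction is easy to trigger with a crafted input, whereas for the real field it is extremely unlikely to be reached, which is why the benchmark doesn't try to cover it.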
Results on my machine:
$ ./build/bin/bench_internal normalize
Benchmark , Min(us) , Avg(us) , Max(us)
field_normalize , 0.0103 , 0.0106 , 0.0128
field_normalize_var , 0.00545 , 0.00546 , 0.00547
field_normalize_weak , 0.00352 , 0.00354 , 0.00363