This PR addresses issue #1751 by calling check_arm32_assembly() by default, matching the existing default behavior of check_x86_64_assembly().
This speeds up the field_10x26_impl.h code paths on default builds. For example, the Bitcoin Core reference implementation currently compiles libsecp256k1 with default options, which leads to suboptimal builds.
This change could partially address https://github.com/bitcoin/bitcoin/issues/32832, since the flamegraph there shows ecdsa_verify taking 90% of initial block download (IBD) time.
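
As a rough back-of-the-envelope estimate of what "partially" could mean, the sketch below applies Amdahl's law to the ecdsa_verify figures from the benchmark table in this description. It assumes the 90% share from the flamegraph also holds on the arm32 machine being benchmarked, which has not been verified; the function name is purely illustrative.

```python
def overall_time_reduction(fraction: float, before_us: float, after_us: float) -> float:
    """Percent reduction in total runtime when only `fraction` of it gets faster.

    fraction  -- share of total time spent in the optimized code (assumed: 0.90)
    before_us -- per-call time without assembly
    after_us  -- per-call time with arm32 assembly
    """
    # Time that is left after the speedup, as a share of the original total.
    remaining = (1.0 - fraction) + fraction * (after_us / before_us)
    return (1.0 - remaining) * 100.0


# ecdsa_verify: 379.0 us with assembly OFF, 322.0 us with arm32 assembly.
print(f"{overall_time_reduction(0.90, 379.0, 322.0):.1f}%")
```

Under these assumptions the estimate comes out at roughly 13.5% less IBD time on arm32 hardware; real-world numbers would depend on the device and on how much of IBD is actually spent in signature verification there.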
Benchmark results, comparing assembly OFF with arm32 assembly enabled:

| Benchmark | Avg (us), asm OFF | Avg (us), asm arm32 | Improvement (%) |
|---|---|---|---|
| ecdsa_verify | 379.0 | 322.0 | 15.0 |
| ecdsa_sign | 184.0 | 170.0 | 7.6 |
| ec_keygen | 160.0 | 145.0 | 9.4 |
| ecdh | 382.0 | 332.0 | 13.1 |
| schnorrsig_sign | 162.0 | 148.0 | 8.6 |
| schnorrsig_verify | 380.0 | 323.0 | 15.0 |
| ellswift_encode | 109.0 | 95.1 | 12.7 |
| ellswift_decode | 60.2 | 50.8 | 15.6 |
| ellswift_keygen | 268.0 | 240.0 | 10.4 |
| ellswift_ecdh | 395.0 | 343.0 | 13.2 |
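
For reference, the Improvement (%) column is the relative reduction in average time, `(off - arm32) / off * 100`. A minimal sketch that recomputes it for a few rows of the table (the helper name is illustrative, not part of the benchmark tooling):

```python
def improvement_percent(off_us: float, asm_us: float) -> float:
    """Relative reduction in average time, in percent."""
    return (off_us - asm_us) / off_us * 100.0


# (Avg with assembly OFF, Avg with arm32 assembly), copied from the table above.
rows = {
    "ecdsa_verify": (379.0, 322.0),
    "ecdsa_sign": (184.0, 170.0),
    "ellswift_decode": (60.2, 50.8),
}

for name, (off_us, asm_us) in rows.items():
    print(f"{name}: {improvement_percent(off_us, asm_us):.1f}%")
```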