(Previous discussion of this topic in #438.)
explicit_bzero() is used when available on the platform (glibc 2.25+); otherwise the code falls back to an inline implementation. There are many possible ways to implement the inline fallback, but I have no reason to prefer one over another.
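As a rough illustration of the selection described above, here is a minimal sketch. The helper name demo_cleanse and the HAVE_EXPLICIT_BZERO macro are hypothetical and are not taken from the patch; the fallback branch assumes the volatile function pointer approach suggested by the volatile_memset symbol visible in the gcc 4.9.2 assembly listing in this description. It is one possible variant, not necessarily the exact code in the commit stack.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper name, used only for this sketch. */
static void demo_cleanse(void *ptr, size_t len) {
#if defined(HAVE_EXPLICIT_BZERO)
    /* HAVE_EXPLICIT_BZERO is an assumed build-time macro, e.g. set by a
     * configure check on platforms with glibc 2.25 or later. */
    explicit_bzero(ptr, len);
#else
    /* Fallback: call memset through a volatile function pointer. The
     * pointer has to be loaded at run time, so the compiler cannot treat
     * the call as a removable dead store. */
    void *(*volatile volatile_memset)(void *, int, size_t) = memset;
    volatile_memset(ptr, 0, len);
#endif
}
```

A caller would invoke demo_cleanse(buf, sizeof buf) on a secret-holding buffer just before it goes out of scope, which is the situation where a plain memset() is most likely to be eliminated by the optimizer.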
The commit stack does some refactoring in order to preserve the intended non-‘cleanse’ functionality done by several of the _clear() functions. I constructed the stack as unsquashed, incremental refactoring changes. Please let me know if a squash is preferred.
The measured performance delta appears negligible. Trials were run inside two minimal headless VMs: one with Debian 8 for glibc 2.19, and the other with the current base of Arch Linux for glibc 2.25. The full session output of the trials was logged and is available as a gist.
For comparison with bare-metal performance on the same CPU, here is the timing on the underlying native Debian 8 install (with glibc 2.19 and gcc 4.9.2) running master:
(wo/ endomorphism)
$ ./bench_sign
ecdsa_sign: min 55.3us / avg 55.4us / max 55.5us
$ ./bench_verify
ecdsa_verify: min 85.6us / avg 85.9us / max 86.3us
ecdsa_verify_openssl: min 476us / avg 484us / max 490us
Trials:
- master w/ glibc 2.19, gcc 4.9.2 w/ and wo/ endomorphism
(wo/ endomorphism)
$ ./bench_sign
ecdsa_sign: min 55.9us / avg 56.4us / max 56.7us
$ ./bench_verify
ecdsa_verify: min 86.2us / avg 86.8us / max 87.7us
ecdsa_verify_openssl: min 487us / avg 488us / max 491us

(w/ endomorphism)
$ ./bench_sign
ecdsa_sign: min 56.1us / avg 56.6us / max 57.2us
$ ./bench_verify
ecdsa_verify: min 66.5us / avg 67.4us / max 68.8us
ecdsa_verify_openssl: min 487us / avg 493us / max 504us
- a16034039f07572a64bc9704c23abb1fff9d70ad w/ glibc 2.19, gcc 4.9.2 w/ and wo/ endomorphism
(wo/ endomorphism)
$ ./bench_sign
ecdsa_sign: min 55.8us / avg 56.5us / max 57.0us
$ ./bench_verify
ecdsa_verify: min 86.3us / avg 86.6us / max 86.9us
ecdsa_verify_openssl: min 482us / avg 483us / max 484us

(w/ endomorphism)
$ ./bench_sign
ecdsa_sign: min 56.0us / avg 56.5us / max 56.7us
$ ./bench_verify
ecdsa_verify: min 65.7us / avg 66.0us / max 66.4us
ecdsa_verify_openssl: min 482us / avg 484us / max 489us
- master w/ glibc 2.25, gcc 6.3.1 w/ and wo/ endomorphism
(wo/ endomorphism)
$ ./bench_sign
ecdsa_sign: min 57.2us / avg 57.8us / max 58.1us
$ ./bench_verify
ecdsa_verify: min 76.1us / avg 76.7us / max 77.7us
ecdsa_verify_openssl: min 461us / avg 464us / max 482us

(w/ endomorphism)
$ ./bench_sign
ecdsa_sign: min 57.0us / avg 57.2us / max 57.5us
$ ./bench_verify
ecdsa_verify: min 54.8us / avg 55.1us / max 55.5us
ecdsa_verify_openssl: min 461us / avg 462us / max 467us
- a16034039f07572a64bc9704c23abb1fff9d70ad w/ glibc 2.25, gcc 6.3.1 w/ and wo/ endomorphism
(wo/ endomorphism)
$ ./bench_sign
ecdsa_sign: min 56.7us / avg 57.0us / max 57.5us
$ ./bench_verify
ecdsa_verify: min 75.6us / avg 76.1us / max 76.6us
ecdsa_verify_openssl: min 460us / avg 460us / max 461us

(w/ endomorphism)
$ ./bench_sign
ecdsa_sign: min 56.5us / avg 56.8us / max 57.1us
$ ./bench_verify
ecdsa_verify: min 54.7us / avg 54.8us / max 55.0us
ecdsa_verify_openssl: min 462us / avg 463us / max 464us
The desired behavior of not being optimized out was also verified by inspecting the generated assembly. With the inline implementation on glibc 2.19, the end section of secp256k1_ecmult_gen() from gcc 4.9.2 looks like:
    movl %r15d, 120(%rbx) # infinity, r_7(D)->infinity
    movq 128(%rsp), %rax # %sfp, ivtmp.249
    cmpq $64, %rax #, ivtmp.249
    je .L83 #,
    salq $34, %rax #, D.9134
    shrq $37, %rax #, D.9134
    movl 304(%rsp,%rax,4), %r9d # gnb.d, D.9141
    jmp .L84 #
.L83:
    movq $memset, 768(%rsp) #, volatile_memset
    leaq 300(%rsp), %rdi #, tmp5061
    movq 768(%rsp), %rax # volatile_memset, D.9135
    movl $4, %edx #,
    xorl %esi, %esi #
    call *%rax # D.9135
    movq $memset, 816(%rsp) #, volatile_memset
    leaq 976(%rsp), %rdi #, tmp5062
    movq 816(%rsp), %rax # volatile_memset, D.9135
    movl $84, %edx #,
    xorl %esi, %esi #
    call *%rax # D.9135
    movq $memset, 864(%rsp) #, volatile_memset
    leaq 304(%rsp), %rdi #, tmp5063
    movq 864(%rsp), %rax # volatile_memset, D.9135
    movl $32, %edx #,
    xorl %esi, %esi #
    call *%rax # D.9135
    addq $1080, %rsp #,
With explicit_bzero() from glibc 2.25 linked and with gcc 6.3.1, the same end section of secp256k1_ecmult_gen() looks like:
    movl $0, 240(%rsp) #, add.infinity
    call secp256k1_gej_add_ge #
    addq $1, (%rsp) #, %sfp
    movq 8(%rsp), %r10 # %sfp, tmp242
    movq (%rsp), %rax # %sfp, ivtmp.629
    movq 16(%rsp), %r9 # %sfp, _90
    movq 24(%rsp), %r8 # %sfp, _85
    cmpq $64, %rax #, ivtmp.629
    jne .L360 #,
    leaq 60(%rsp), %rdi #, tmp380
    movl $4, %esi #,
    call explicit_bzero #
    leaq 160(%rsp), %rdi #, tmp381
    movl $88, %esi #,
    call explicit_bzero #
    leaq 64(%rsp), %rdi #, tmp382
    movl $32, %esi #,
    call explicit_bzero #
    addq $264, %rsp #,