Unrolling the inner ChaCha20 loop gives a ~15% speedup for me in the CHACHA20_* benchmarks. It's a simple change, and the performance gain helps with RNG generation and will matter more for BIP324.
Unroll the ChaCha20 inner loop for performance #24946
sipa wants to merge 1 commit into bitcoin:master from sipa:202204_unrollchacha, changing 1 file (+28 −20)
sipa commented at 4:32 PM on April 22, 2022: member
- DrahtBot added the label Utils/log/libs on Apr 22, 2022
in src/crypto/chacha20.cpp:124 in 7f3f84c833 outdated
    - QUARTERROUND( x0, x5,x10,x15)
    - QUARTERROUND( x1, x6,x11,x12)
    - QUARTERROUND( x2, x7, x8,x13)
    - QUARTERROUND( x3, x4, x9,x14)
    - }
    +

kristapsk commented at 5:28 PM on April 22, 2022:
Maybe add a comment that loop unrolling was done for performance reasons?
sipa commented at 6:01 PM on April 22, 2022:
Done.
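A minimal sketch of what the change amounts to, assuming the standard ChaCha quarter round (rotations 16, 12, 8, 7). The names `State`, `DoubleRound`, `RoundsLooped`, and `RoundsUnrolled` are hypothetical and not taken from the PR; the code in `src/crypto/chacha20.cpp` works on local variables `x0`..`x15` rather than an array. The sketch only illustrates that the looped and unrolled forms compute the same thing.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 32-bit left rotation, valid for 0 < c < 32.
constexpr uint32_t rotl32(uint32_t v, int c) { return (v << c) | (v >> (32 - c)); }

// Standard ChaCha quarter round on four 32-bit words.
#define QUARTERROUND(a, b, c, d)   \
    a += b; d = rotl32(d ^ a, 16); \
    c += d; b = rotl32(b ^ c, 12); \
    a += b; d = rotl32(d ^ a, 8);  \
    c += d; b = rotl32(b ^ c, 7);

using State = std::array<uint32_t, 16>; // hypothetical: array instead of x0..x15

// One double round: four column quarter rounds, then four diagonal ones.
inline void DoubleRound(State& x)
{
    QUARTERROUND(x[0], x[4], x[8], x[12])
    QUARTERROUND(x[1], x[5], x[9], x[13])
    QUARTERROUND(x[2], x[6], x[10], x[14])
    QUARTERROUND(x[3], x[7], x[11], x[15])
    QUARTERROUND(x[0], x[5], x[10], x[15])
    QUARTERROUND(x[1], x[6], x[11], x[12])
    QUARTERROUND(x[2], x[7], x[8], x[13])
    QUARTERROUND(x[3], x[4], x[9], x[14])
}

// Loop form: 10 iterations of the double round (20 rounds in total).
inline State RoundsLooped(State x)
{
    for (int i = 0; i < 10; ++i) DoubleRound(x);
    return x;
}

// Manually unrolled form: the same 10 double rounds written out.
inline State RoundsUnrolled(State x)
{
    DoubleRound(x); DoubleRound(x); DoubleRound(x); DoubleRound(x); DoubleRound(x);
    DoubleRound(x); DoubleRound(x); DoubleRound(x); DoubleRound(x); DoubleRound(x);
    return x;
}
```

Both functions return identical states for any input; any speed difference comes only from how the compiler schedules and allocates registers for the two forms.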
kristapsk commented at 5:30 PM on April 22, 2022: contributor
Concept ACK, I see `./src/bench/bench_bitcoin` improvements with this change.
sipa force-pushed on Apr 22, 2022
MarcoFalke added the label DrahtBot Guix build requested on Apr 22, 2022
instagibbs commented at 5:58 PM on April 22, 2022: member
Can you give commands to run just those benches for those willing to replicate?
sipa commented at 5:59 PM on April 22, 2022: member
`./src/bench/bench_bitcoin -filter=.*CHACHA20_[1-9].*`
laanwj commented at 6:40 PM on April 22, 2022: member
I'm somewhat surprised unrolling a loop that is 10 times the same thing gives that much performance win on modern CPUs. But it's just a few ROL instructions I guess so the loop overhead easily dominates? Anyhow, concept ACK.
instagibbs commented at 6:42 PM on April 22, 2022: member
Getting a rough average of 15% speedup as well
sipa commented at 6:53 PM on April 22, 2022: member
@laanwj It may also have to do with better register scheduling when unrolling (the same variable doesn't need to stay in the same register every iteration), though I haven't investigated what the difference in emitted asm is.
This change may be very compiler and platform dependent, so it may be good to know what its impact is with modern clang versions and/or on arm64 systems.
jonatack commented at 7:29 PM on April 22, 2022: member
Debian testing clang 15, normal (non-debug) build, fixed CPU speed, I'm not sure I'm seeing a difference. Trying again after optimizing and tuning further.
laanwj commented at 8:24 PM on April 22, 2022: member
Gcc 11.2.0, x86_64:
- The function `ChaCha20::Keystream` grows in size from 992 bytes to 3840 (doesn't seem too bad, still fits in a page).
- One iteration of the loop looks like:

      370: 41 01 ed             add    %ebp,%r13d
      373: 41 01 db             add    %ebx,%r11d
      376: 41 01 f2             add    %esi,%r10d
      379: 44 31 e9             xor    %r13d,%ecx
      37c: 44 31 da             xor    %r11d,%edx
      37f: 44 31 d0             xor    %r10d,%eax
      382: c1 c1 10             rol    $0x10,%ecx
      385: c1 c2 10             rol    $0x10,%edx
      388: 41 01 c9             add    %ecx,%r9d
      38b: 01 d7                add    %edx,%edi
      38d: c1 c0 10             rol    $0x10,%eax
      390: 44 31 cd             xor    %r9d,%ebp
      393: 31 fb                xor    %edi,%ebx
      395: 41 01 c4             add    %eax,%r12d
      398: c1 c5 0c             rol    $0xc,%ebp
      39b: c1 c3 0c             rol    $0xc,%ebx
      39e: 44 31 e6             xor    %r12d,%esi
      3a1: 41 01 ed             add    %ebp,%r13d
      3a4: 41 01 db             add    %ebx,%r11d
      3a7: c1 c6 0c             rol    $0xc,%esi
      3aa: 44 31 e9             xor    %r13d,%ecx
      3ad: 44 31 da             xor    %r11d,%edx
      3b0: 41 01 f2             add    %esi,%r10d
      3b3: c1 c1 08             rol    $0x8,%ecx
      3b6: c1 c2 08             rol    $0x8,%edx
      3b9: 44 31 d0             xor    %r10d,%eax
      3bc: 41 01 c9             add    %ecx,%r9d
      3bf: 01 d7                add    %edx,%edi
      3c1: 44 31 cd             xor    %r9d,%ebp
      3c4: 31 fb                xor    %edi,%ebx
      3c6: 89 7c 24 08          mov    %edi,0x8(%rsp)
      3ca: c1 c5 07             rol    $0x7,%ebp
      3cd: c1 c3 07             rol    $0x7,%ebx
      3d0: 44 89 4c 24 04       mov    %r9d,0x4(%rsp)
      3d5: c1 c0 08             rol    $0x8,%eax
      3d8: 45 01 f8             add    %r15d,%r8d
      3db: 41 01 dd             add    %ebx,%r13d
      3de: 45 31 c6             xor    %r8d,%r14d
      3e1: 41 01 c4             add    %eax,%r12d
      3e4: 44 89 f7             mov    %r14d,%edi
      3e7: 44 8b 74 24 0c       mov    0xc(%rsp),%r14d
      3ec: 44 31 e6             xor    %r12d,%esi
      3ef: c1 c7 10             rol    $0x10,%edi
      3f2: c1 c6 07             rol    $0x7,%esi
      3f5: 41 01 fe             add    %edi,%r14d
      3f8: 41 01 f3             add    %esi,%r11d
      3fb: 45 31 f7             xor    %r14d,%r15d
      3fe: 45 89 f1             mov    %r14d,%r9d
      401: 44 31 d9             xor    %r11d,%ecx
      404: 41 c1 c7 0c          rol    $0xc,%r15d
      408: c1 c1 10             rol    $0x10,%ecx
      40b: 45 01 f8             add    %r15d,%r8d
      40e: 44 31 c7             xor    %r8d,%edi
      411: c1 c7 08             rol    $0x8,%edi
      414: 41 01 f9             add    %edi,%r9d
      417: 44 31 ef             xor    %r13d,%edi
      41a: c1 c7 10             rol    $0x10,%edi
      41d: 45 31 cf             xor    %r9d,%r15d
      420: 41 01 c9             add    %ecx,%r9d
      423: 41 01 fc             add    %edi,%r12d
      426: 41 c1 c7 07          rol    $0x7,%r15d
      42a: 44 31 e3             xor    %r12d,%ebx
      42d: c1 c3 0c             rol    $0xc,%ebx
      430: 41 01 dd             add    %ebx,%r13d
      433: 44 31 ef             xor    %r13d,%edi
      436: 41 89 fe             mov    %edi,%r14d
      439: 41 c1 c6 08          rol    $0x8,%r14d
      43d: 45 01 f4             add    %r14d,%r12d
      440: 44 31 e3             xor    %r12d,%ebx
      443: c1 c3 07             rol    $0x7,%ebx
      446: 44 31 ce             xor    %r9d,%esi
      449: 45 01 fa             add    %r15d,%r10d
      44c: 41 01 e8             add    %ebp,%r8d
      44f: c1 c6 0c             rol    $0xc,%esi
      452: 44 31 d2             xor    %r10d,%edx
      455: 44 31 c0             xor    %r8d,%eax
      458: c1 c2 10             rol    $0x10,%edx
      45b: 41 01 f3             add    %esi,%r11d
      45e: c1 c0 10             rol    $0x10,%eax
      461: 44 31 d9             xor    %r11d,%ecx
      464: c1 c1 08             rol    $0x8,%ecx
      467: 41 8d 3c 09          lea    (%r9,%rcx,1),%edi
      46b: 44 8b 4c 24 04       mov    0x4(%rsp),%r9d
      470: 31 fe                xor    %edi,%esi
      472: 89 7c 24 0c          mov    %edi,0xc(%rsp)
      476: 8b 7c 24 08          mov    0x8(%rsp),%edi
      47a: 41 01 d1             add    %edx,%r9d
      47d: c1 c6 07             rol    $0x7,%esi
      480: 01 c7                add    %eax,%edi
      482: 45 31 cf             xor    %r9d,%r15d
      485: 31 fd                xor    %edi,%ebp
      487: 41 c1 c7 0c          rol    $0xc,%r15d
      48b: c1 c5 0c             rol    $0xc,%ebp
      48e: 45 01 fa             add    %r15d,%r10d
      491: 41 01 e8             add    %ebp,%r8d
      494: 44 31 d2             xor    %r10d,%edx
      497: 44 31 c0             xor    %r8d,%eax
      49a: c1 c2 08             rol    $0x8,%edx
      49d: c1 c0 08             rol    $0x8,%eax
      4a0: 41 01 d1             add    %edx,%r9d
      4a3: 01 c7                add    %eax,%edi
      4a5: 45 31 cf             xor    %r9d,%r15d
      4a8: 31 fd                xor    %edi,%ebp
      4aa: 41 c1 c7 07          rol    $0x7,%r15d
      4ae: c1 c5 07             rol    $0x7,%ebp
      4b1: 83 6c 24 10 01       subl   $0x1,0x10(%rsp)
      4b6: 0f 85 b4 fe ff ff    jne    370 <ChaCha20::Keystream(unsigned char*, unsigned long)+0x140>

- The unrolling indeed causes different register allocation, as well as instructions from multiple iterations to be interspersed (maybe better for scheduling, maybe it's possible to combine?).
- Benchmarks before on old AMD Phenom(tm) II X6 1075T:

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.18 | 459,187,395.75 | 0.3% | 0.03 | `CHACHA20_1MB`
| 2.21 | 452,155,530.63 | 0.2% | 0.01 | `CHACHA20_256BYTES`
| 2.34 | 427,257,435.31 | 0.0% | 0.01 | `CHACHA20_64BYTES`

- Benchmarks after on same (~12% speedup):

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1.91 | 523,324,820.67 | 0.4% | 0.02 | `CHACHA20_1MB`
| 1.94 | 516,638,576.63 | 0.0% | 0.01 | `CHACHA20_256BYTES`
| 2.22 | 451,258,216.13 | 4.6% | 0.01 | `CHACHA20_64BYTES`

jonatack commented at 9:30 PM on April 22, 2022: member
Restarted and tuned (i7 6500U CPU @ 2.5 GHz) with
`pyperf system tune`, non-debug build, seeing roughly a 3 to 4% improvement.

Linux 5.16.0-6-amd64 #1 SMP PREEMPT Debian 5.16.18-1 (2022-03-29) x86_64 GNU/Linux.

    Debian clang version 15.0.0-++20220422111431+ba46ae7bd853-1~exp1~20220422111525.449
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin

master

| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 2.43 | 410,814,309.33 | 0.3% | 18.61 | 6.29 | 2.957 | 0.20 | 0.0% | 0.03 | `CHACHA20_1MB`
| 2.46 | 406,907,108.96 | 0.0% | 18.89 | 6.37 | 2.965 | 0.22 | 0.0% | 0.01 | `CHACHA20_256BYTES`
| 2.59 | 385,499,110.76 | 1.0% | 19.72 | 6.68 | 2.952 | 0.28 | 0.0% | 0.01 | `CHACHA20_64BYTES`

branch

| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 2.35 | 425,969,024.53 | 0.7% | 16.70 | 6.07 | 2.752 | 0.05 | 0.0% | 0.03 | `CHACHA20_1MB`
| 2.37 | 422,279,272.14 | 0.0% | 17.14 | 6.14 | 2.792 | 0.07 | 0.0% | 0.01 | `CHACHA20_256BYTES`
| 2.52 | 396,803,365.77 | 0.1% | 18.45 | 6.53 | 2.825 | 0.13 | 0.0% | 0.01 | `CHACHA20_64BYTES`

Edit: re-ran the bench a dozen times each to verify that these results are representative.
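The percentages quoted in this thread can be derived from the ns/byte columns in more than one way. A small sketch, using hypothetical helper names (`TimeReductionPercent`, `ThroughputGainPercent`) that do not appear in the PR or in the benchmark tool:

```cpp
#include <cassert>
#include <cmath>

// Percentage by which the time per byte drops when going from `before`
// to `after` (both in ns/byte).
inline double TimeReductionPercent(double before, double after)
{
    return (before - after) / before * 100.0;
}

// Percentage by which throughput (bytes per second) rises for the same
// pair of measurements; always a bit larger than the time reduction.
inline double ThroughputGainPercent(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}
```

With the `CHACHA20_1MB` figures from the AMD Phenom run above (2.18 and 1.91 ns/byte), `TimeReductionPercent` gives about 12.4, in line with the "~12%" quoted there, while `ThroughputGainPercent` gives about 14.1. It is worth stating which of the two a "speedup" figure refers to when comparing results between machines.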
ajtowns commented at 8:50 AM on April 23, 2022: member
I'm seeing much smaller improvements (0%-2.5% with gcc 11; 1.3%-7% with clang 13) on an old i7. (And very slightly worse performance compared to master with debug enabled.)
Did you consider just changing the `for() { ... }` loop to `REPEAT10( ... )` with `#define REPEAT10(a) a a a a a a a a a a`?
laanwj commented at 9:50 AM on April 23, 2022: member
- gcc 11.2.0, RISC-V 64-bit (SiFive Unmatched, 1.2 GHz): speedup is there, but much less pronounced (~5%):

| | ns/byte | byte/s | err% | ins/byte | cyc/byte | bra/byte | miss% | total | benchmark
|:-------|--------------------:|--------------------:|--------:|----------------:|----------------:|---------------:|--------:|----------:|:----------
| Before | 22.29 | 44,862,631.89 | 0.8% | 0.00 | 0.00 | 0.00 | 0.0% | 0.26 | `CHACHA20_1MB`
| After | 21.23 | 47,101,646.21 | 0.9% | 0.00 | 0.00 | 0.00 | 0.0% | 0.25 | `CHACHA20_1MB`

- gcc 10.2.1, aarch64 (custom i.MX8MQ board, 1 GHz), ~8% speedup:

| | ns/byte | byte/s | err% | ins/byte | bra/byte | miss% | total | benchmark
|:-------|--------------------:|--------------------:|--------:|----------------:|---------------:|--------:|----------:|:----------
| Before | 6.04 | 165,526,246.91 | 0.1% | 16.84 | 0.16 | 11.8% | 0.07 | `CHACHA20_1MB`
| After | 5.58 | 179,185,196.22 | 0.1% | 15.86 | 0.02 | 0.0% | 0.06 | `CHACHA20_1MB`

It's a nice speedup, and a simple change, tested ACK 4f3a18906880b065b6119ccf32b2875748b297b2

> Did you consider just changing the `for() { ... }` loop to `REPEAT10( ... )` with `#define REPEAT10(a) a a a a a a a a a a`?

I like this idea: it is more elegant than copy/pasting, and it makes it immediately clear that it's the same code each time. I would guess the generated code is exactly the same.
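The `REPEAT10` suggestion can be sketched as follows. Only the macro definition is quoted from the comment above; the function names and the mixing statements are hypothetical stand-ins for the real round body.

```cpp
#include <cassert>
#include <cstdint>

// As proposed in the thread: paste the argument ten times in a row.
#define REPEAT10(a) a a a a a a a a a a

// Reference version with an ordinary loop (hypothetical round body).
inline uint32_t MixLooped(uint32_t v)
{
    for (int i = 0; i < 10; ++i) {
        v = v * 1664525u + 1013904223u;
        v ^= v >> 13;
    }
    return v;
}

// Same computation, but the body is repeated by the preprocessor.
// A macro argument may span several lines and contain semicolons;
// only commas outside parentheses would split it into several arguments.
inline uint32_t MixRepeated(uint32_t v)
{
    REPEAT10(
        v = v * 1664525u + 1013904223u;
        v ^= v >> 13;
    )
    return v;
}
```

Compared with copy/pasting the body ten times, the macro keeps a single copy of the source, so the repeated blocks cannot drift apart.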
MarcoFalke commented at 5:10 PM on April 23, 2022: member
Not seeing a large difference on an i7. (Maybe a 1%-3% speedup?)

gcc-12 Before:

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.23 | 447,617,214.06 | 0.2% | 0.03 | `CHACHA20_1MB`
| 2.26 | 441,653,947.12 | 0.1% | 0.01 | `CHACHA20_256BYTES`
| 2.50 | 399,993,391.82 | 6.1% | 0.01 | :wavy_dash: `CHACHA20_64BYTES` (Unstable with ~6,241.4 iters. Increase `minEpochIterations` to e.g. 62414)
| 7.03 | 142,173,319.29 | 10.1% | 0.09 | :wavy_dash: `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
| 3.26 | 307,218,931.17 | 1.7% | 0.04 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
| 8.83 | 113,259,198.67 | 1.3% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
| 4.28 | 233,685,573.34 | 0.4% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
| 15.78 | 63,391,055.77 | 0.6% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
| 7.71 | 129,684,901.52 | 0.4% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

gcc-12 After:

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.20 | 454,707,913.08 | 0.8% | 0.03 | `CHACHA20_1MB`
| 2.36 | 424,359,263.25 | 4.9% | 0.01 | `CHACHA20_256BYTES`
| 2.41 | 414,622,602.59 | 0.4% | 0.01 | `CHACHA20_64BYTES`
| 6.99 | 143,089,808.99 | 7.2% | 0.09 | :wavy_dash: `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
| 3.26 | 306,926,493.73 | 4.2% | 0.04 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
| 9.59 | 104,251,645.58 | 8.6% | 0.01 | :wavy_dash: `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT` (Unstable with ~402.1 iters. Increase `minEpochIterations` to e.g. 4021)
| 4.33 | 230,986,007.33 | 0.6% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
| 16.23 | 61,602,235.65 | 1.7% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
| 9.63 | 103,830,365.13 | 9.9% | 0.01 | :wavy_dash: `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT` (Unstable with ~1,639.9 iters. Increase `minEpochIterations` to e.g. 16399)

gcc-10 Before:

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.26 | 442,527,877.02 | 0.5% | 0.03 | `CHACHA20_1MB`
| 2.30 | 435,535,172.72 | 1.9% | 0.01 | `CHACHA20_256BYTES`
| 2.39 | 418,262,709.74 | 0.4% | 0.01 | `CHACHA20_64BYTES`
| 6.93 | 144,210,951.65 | 5.9% | 0.09 | :wavy_dash: `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
| 3.16 | 316,109,217.24 | 4.8% | 0.04 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
| 8.43 | 118,625,079.49 | 0.3% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
| 4.18 | 239,143,934.28 | 0.2% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
| 16.05 | 62,308,156.96 | 5.2% | 0.01 | :wavy_dash: `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT` (Unstable with ~961.0 iters. Increase `minEpochIterations` to e.g. 9610)
| 7.63 | 131,070,821.81 | 0.1% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

gcc-10 After:

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.20 | 454,351,689.08 | 0.2% | 0.03 | `CHACHA20_1MB`
| 2.40 | 416,825,911.73 | 4.4% | 0.01 | `CHACHA20_256BYTES`
| 2.40 | 416,369,054.39 | 0.2% | 0.01 | `CHACHA20_64BYTES`
| 6.58 | 151,882,394.04 | 10.5% | 0.08 | :wavy_dash: `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
| 3.03 | 329,600,644.76 | 0.9% | 0.04 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
| 9.40 | 106,431,172.41 | 10.1% | 0.01 | :wavy_dash: `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT` (Unstable with ~434.9 iters. Increase `minEpochIterations` to e.g. 4349)
| 4.30 | 232,776,146.25 | 0.2% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
| 16.17 | 61,831,918.45 | 1.3% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
| 8.83 | 113,301,205.50 | 5.8% | 0.01 | :wavy_dash: `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT` (Unstable with ~1,771.5 iters. Increase `minEpochIterations` to e.g. 17715)

martinus commented at 5:45 PM on April 23, 2022: contributor
I get the same 1-3% speedup on my i7. In my test adding
`#pragma GCC unroll 10` in front of the loop seems to produce exactly the same unrolled loop as the hand-coded one; this works for GCC and clang.

Side note 1: use e.g. `./src/bench/bench_bitcoin -filter="CHACHA20.*" -min_time=2000` to run each test for 2 seconds to get more stable results.

Side note 2: No need to quote the result, it's markdown :slightly_smiling_face:

My results on i7-8700, with clang 13.0.1:

master

| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 1.91 | 523,770,793.06 | 0.1% | 18.52 | 6.08 | 3.043 | 0.20 | 0.0% | 1.09 | `CHACHA20_1MB`
| 1.94 | 515,227,758.97 | 0.3% | 18.79 | 6.16 | 3.048 | 0.22 | 0.0% | 1.10 | `CHACHA20_256BYTES`
| 2.02 | 494,527,885.82 | 0.2% | 19.61 | 6.44 | 3.046 | 0.28 | 0.0% | 1.10 | `CHACHA20_64BYTES`

branch

| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 1.83 | 547,223,233.51 | 0.0% | 17.08 | 5.83 | 2.931 | 0.05 | 0.0% | 1.07 | `CHACHA20_1MB`
| 1.87 | 535,851,391.81 | 0.1% | 17.51 | 5.95 | 2.942 | 0.07 | 0.0% | 1.10 | `CHACHA20_256BYTES`
| 1.98 | 504,774,917.46 | 0.0% | 18.81 | 6.32 | 2.977 | 0.13 | 0.0% | 1.10 | `CHACHA20_64BYTES`

Empact commented at 11:15 PM on April 23, 2022: member
+1 for `#pragma unroll` or similar
laanwj commented at 9:03 AM on April 27, 2022: member
So I think the conclusion here is that on i7 there's no (or not much) difference but on other platforms it varies. But it never becomes worse. I think a performance optimization like this is mostly interesting for slower CPUs with less effective branch prediction so that's OK with me.
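The pragma-based alternative mentioned above can be sketched like this. The function names and the quarter-round stand-in are hypothetical; only the `#pragma GCC unroll 10` line comes from the thread. The pragma is a request to the compiler, so whether the emitted code matches a hand-unrolled version depends on the compiler and its version.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t rotl32(uint32_t v, int c) { return (v << c) | (v >> (32 - c)); }

// Hypothetical stand-in for the loop body: one ChaCha-style quarter round.
inline void QuarterRound(uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d)
{
    a += b; d = rotl32(d ^ a, 16);
    c += d; b = rotl32(b ^ c, 12);
    a += b; d = rotl32(d ^ a, 8);
    c += d; b = rotl32(b ^ c, 7);
}

// Plain loop; the compiler decides on its own whether to unroll.
inline uint32_t MixPlain(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    for (int i = 0; i < 10; ++i) QuarterRound(a, b, c, d);
    return a ^ b ^ c ^ d;
}

// Same loop with an explicit unroll request. The pragma must directly
// precede the loop statement it applies to.
inline uint32_t MixPragma(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
#pragma GCC unroll 10
    for (int i = 0; i < 10; ++i) QuarterRound(a, b, c, d);
    return a ^ b ^ c ^ d;
}
```

Unlike manual unrolling or a repeat macro, the source stays a loop, and the results are identical by construction; the trade-off is that the effect relies on compiler support for the pragma.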
MarcoFalke commented at 6:28 PM on May 4, 2022: member
TIL that it is possible to pass multiple lines as an argument to a macro
sipa commented at 6:29 PM on May 4, 2022: member
> TIL that it is possible to pass multiple lines as an argument to a macro

You clearly never saw the original serialization code this codebase had ;)
sipa force-pushed on May 4, 2022
in src/crypto/chacha20.cpp:21 in 266bf15ddc outdated

    @@ -18,6 +18,8 @@ constexpr static inline uint32_t rotl32(uint32_t v, int c) { return (v << c) | (
         a += b; d = rotl32(d ^ a, 8); \
         c += d; b = rotl32(b ^ c, 7);

    +#define REPEAT10(a) a a a a a a a a a a

MarcoFalke commented at 6:51 PM on May 4, 2022:

    #define REPEAT10(a) do { a a a a a a a a a a } while (0)

nit: Shouldn't this use do-while?
Otherwise writing `if (blub) REPEAT10(bla());` will do the wrong thing?
Also, omitting the semicolon after the do-while in the definition makes the compiler enforce that one is placed after the call.
sipa commented at 6:53 PM on May 4, 2022:
Done.
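A short sketch of the pitfall raised in this review comment. `REPEAT10_NAIVE`, `CountNaive`, and `CountGuarded` are hypothetical names used only for illustration; the guarded definition is the one suggested above.

```cpp
#include <cassert>

// Unguarded form: expands to ten statements in a row.
#define REPEAT10_NAIVE(a) a a a a a a a a a a

// Guarded form from the review: the ten copies form one statement, and the
// missing trailing semicolon forces callers to write one after the call.
#define REPEAT10(a) do { a a a a a a a a a a } while (0)

// With the naive macro only the first ++n belongs to the `if`;
// the other nine always run. Compilers may warn about this pattern.
inline int CountNaive(bool flag)
{
    int n = 0;
    if (flag) REPEAT10_NAIVE(++n;)
    return n;
}

// With the do-while wrapper all ten copies are controlled by the `if`.
inline int CountGuarded(bool flag)
{
    int n = 0;
    if (flag) REPEAT10(++n;);
    return n;
}
```

`CountNaive(false)` returns 9 rather than 0, which is the "wrong thing" described above, while `CountGuarded(false)` returns 0; both return 10 when the flag is true.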
Unroll the ChaCha20 inner loop for performance 81c09ee45c
sipa force-pushed on May 4, 2022
martinus commented at 5:47 AM on May 5, 2022: contributor
tested ACK 81c09ee with clang++ 13.0.1, test `CHACHA20_1MB`:
- 4.3% faster on i9-9960X
- 4.5% faster on i9-9980HK
- 4.4% faster on i7-8700
DrahtBot commented at 1:53 AM on May 8, 2022: member
Guix builds
DrahtBot removed the label DrahtBot Guix build requested on May 8, 2022
MarcoFalke commented at 11:54 AM on May 9, 2022: member
- A few percent faster on AMD EPYC as well with gcc-9/gcc-11.2/gcc-12.1/clang-14
- Same on AMD EPYC with guix built bench
- Same on Cortex-A72 with guix built bench

MarcoFalke commented at 11:56 AM on May 9, 2022: member
ACK 81c09ee45caecf8d9daf6766b94cebf54f3f08cd 🍟
<details><summary>Show signature</summary>
Signature:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 ACK 81c09ee45caecf8d9daf6766b94cebf54f3f08cd 🍟 -----BEGIN PGP SIGNATURE----- iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p pUgxzQv9FMC3MiK58jmwXRv26Mf41HrwpXJawhRSU/j+VM0Vq9JI6RlIkZ3E5Biy EKOxtL9cMKv6cMOyE5bihZF3uIqnwJCMAx+8cb+/6RYm33UseEMHxX/S8T+Q8/vy 4r5BU/kisbX77yAjooN7Lr0/nKSv2E8APFjvcp7NIkWkx89W2zrk9z4eoFS5Dri/ yAbMpc95eTtu4gmsbjNNE73/Q1MsdfXiBgzwP8ToV/grzoZPpBTt7dsb1QRRjn1N NAY/xG1p1kFo7ORbJ0ZHiKE4waat0Erqi8MX35f5mkMVa47X5VdDuP1FGn191f9K oS6cfgSZr4d+SE3SFer56/3QOVToa06VmxjmKoRv0j12S7NVOxnjRNjwN6XkhgoK wlpkNa3HxNxdMNmaUDqxXk5Z1zH5RCjZwiPQuMG5sExjemAAJXOFQ8WYnJFGp04R dFlXeMTy2ZQWMWoEMhdJ2jCDjvggjMW8t51VA3+GQvr8ZZmN10dzXPA+Qi1c25es QNkpUvPg =2W4Z -----END PGP SIGNATURE-----</details>
MarcoFalke merged this on May 9, 2022
MarcoFalke closed this on May 9, 2022
sidhujag referenced this in commit 346bcd37d7 on May 9, 2022
DrahtBot locked this on May 9, 2023
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.