Summary
The current CSipHasher::Write() implementation iterates over the input span byte by byte. For large inputs this results in significant overhead from repeated bounds checking and span manipulation, which the compiler does not optimize away.
This PR replaces the byte-by-byte loop in CSipHasher::Write() with a chunked approach that processes data in 8-byte aligned blocks when possible.
These improvements are particularly beneficial for wallet operations and block filter construction.
Details
The new implementation is divided into three stages that process:
- initial unaligned bytes to reach an 8-byte boundary
- aligned 8-byte chunks directly using memcpy for efficiency
- remaining bytes at the end
Every change was thoroughly tested and benchmarked to avoid overfitting, but replication is welcome and encouraged.
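The three stages listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class `SketchSipHasher` and the methods `WriteBytewise` and `WriteChunked` are hypothetical names, raw pointers stand in for the span, and the `memcpy` load assumes a little-endian host. "Aligned" here is taken to mean aligned to the hasher's internal 8-byte word, which is also an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for CSipHasher; names and layout are illustrative only.
class SketchSipHasher {
    uint64_t v[4];
    uint64_t tmp{0};  // partially filled 8-byte word
    uint8_t count{0}; // total bytes written, modulo 256

    static uint64_t Rotl(uint64_t x, int b) { return (x << b) | (x >> (64 - b)); }

    void Round() {
        v[0] += v[1]; v[1] = Rotl(v[1], 13); v[1] ^= v[0]; v[0] = Rotl(v[0], 32);
        v[2] += v[3]; v[3] = Rotl(v[3], 16); v[3] ^= v[2];
        v[0] += v[3]; v[3] = Rotl(v[3], 21); v[3] ^= v[0];
        v[2] += v[1]; v[1] = Rotl(v[1], 17); v[1] ^= v[2]; v[2] = Rotl(v[2], 32);
    }

    // Absorb one complete 8-byte word (two rounds, as in SipHash-2-4).
    void Compress(uint64_t word) { v[3] ^= word; Round(); Round(); v[0] ^= word; }

public:
    SketchSipHasher(uint64_t k0, uint64_t k1)
        : v{0x736f6d6570736575ULL ^ k0, 0x646f72616e646f6dULL ^ k1,
            0x6c7967656e657261ULL ^ k0, 0x7465646279746573ULL ^ k1} {}

    // Reference path: one byte at a time, kept here only for comparison.
    SketchSipHasher& WriteBytewise(const unsigned char* data, size_t len) {
        for (size_t i = 0; i < len; ++i) {
            tmp |= uint64_t{data[i]} << (8 * (count & 7));
            if ((++count & 7) == 0) { Compress(tmp); tmp = 0; }
        }
        return *this;
    }

    SketchSipHasher& WriteChunked(const unsigned char* data, size_t len) {
        size_t i = 0;
        // Stage 1: complete a partially filled word left by earlier writes.
        while (i < len && (count & 7) != 0) {
            tmp |= uint64_t{data[i++]} << (8 * (count & 7));
            if ((++count & 7) == 0) { Compress(tmp); tmp = 0; }
        }
        // Stage 2: whole 8-byte words, loaded with memcpy.
        // Assumes a little-endian host; other hosts would need a byte swap.
        for (; len - i >= 8; i += 8) {
            uint64_t word;
            std::memcpy(&word, data + i, 8);
            Compress(word);
            count += 8;
        }
        // Stage 3: keep the tail (fewer than 8 bytes) for a later write or Finalize().
        for (; i < len; ++i, ++count) {
            tmp |= uint64_t{data[i]} << (8 * (count & 7));
        }
        return *this;
    }

    uint64_t Finalize() const {
        SketchSipHasher h = *this;
        h.Compress(tmp | (uint64_t{count} << 56));
        h.v[2] ^= 0xFF;
        for (int r = 0; r < 4; ++r) h.Round();
        return h.v[0] ^ h.v[1] ^ h.v[2] ^ h.v[3];
    }
};
```

In this sketch, `WriteChunked` is intended to produce the same state as `WriteBytewise` for any sequence of writes, so comparing the two over many lengths and split points is one way to check a change like this for regressions.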
Benchmarks
taskset -c 1 ./bin/bench_bitcoin -filter="(WalletIsMineMigratedDescriptors|WalletIsMineDescriptors|GCSFilterConstruct|AddrManSelect)" -output-csv=bench_old.csv --min-time=60000
Before:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
214.55 | 4,660,983.40 | 0.2% | 65.89 | AddrManSelect |
12,983,090.72 | 77.02 | 0.1% | 66.00 | GCSFilterConstruct |
100.29 | 9,971,046.61 | 0.0% | 66.02 | WalletIsMineDescriptors |
115.42 | 8,664,379.92 | 0.0% | 66.02 | WalletIsMineMigratedDescriptors |
After:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
210.60 | 4,748,271.82 | 0.1% | 65.93 | AddrManSelect |
11,155,751.42 | 89.64 | 0.1% | 65.99 | GCSFilterConstruct |
89.87 | 11,126,702.73 | 0.0% | 66.01 | WalletIsMineDescriptors |
72.67 | 13,761,145.85 | 0.0% | 66.01 | WalletIsMineMigratedDescriptors |
Compared to master:
- AddrManSelect: +1.85% faster
- GCSFilterConstruct: +16% faster
- WalletIsMineDescriptors: +11.6% faster
- WalletIsMineMigratedDescriptors: +59% faster
Arguably the most impactful improvement would be to GCSFilterConstruct during IBD, but this still has to be tested.