crypto: optimize SipHash Write() method with chunked processing #33325

pull Raimo33 wants to merge 1 commits into bitcoin:master from Raimo33:siphash-write-chunked changing 1 files +46 −9
  1. Raimo33 commented at 6:52 pm on September 6, 2025: none

    Summary

    The current Write() implementation of Siphash uses a byte-by-byte approach to iterate the span. This resulted in significant overhead for large inputs due to repeated bounds checking and span manipulations, without any help from the compiler.

    This PR aims at optimizing Siphash by replacing byte-by-byte processing in CSipHasher::Write() with an optimized chunked approach that processes data in 8-byte aligned blocks when possible.

    These improvements are particularly beneficial for wallet operations and block filter construction.

    Details

    The new implementation is divided in 3 stages that process:

    1. initial unaligned bytes to reach an 8-byte boundary
    2. aligned 8-byte chunks directly using memcpy for efficiency
    3. remaining bytes at the end

    every change was thoroughly tested and benchmarked to avoid overfitting, but replicating is welcomed and encouraged.

    Benchmarks

    0taskset -c 1 ./bin/bench_bitcoin -filter="(WalletIsMineMigratedDescriptors|WalletIsMineDescriptors|GCSFilterConstruct|AddrManSelect)" -output-csv=bench_old.csv --min-time=60000
    

    Before:

    ns/op op/s err% total benchmark
    214.55 4,660,983.40 0.2% 65.89 AddrManSelect
    12,983,090.72 77.02 0.1% 66.00 GCSFilterConstruct
    100.29 9,971,046.61 0.0% 66.02 WalletIsMineDescriptors
    115.42 8,664,379.92 0.0% 66.02 WalletIsMineMigratedDescriptors

    After:

    ns/op op/s err% total benchmark
    210.60 4,748,271.82 0.1% 65.93 AddrManSelect
    11,155,751.42 89.64 0.1% 65.99 GCSFilterConstruct
    89.87 11,126,702.73 0.0% 66.01 WalletIsMineDescriptors
    72.67 13,761,145.85 0.0% 66.01 WalletIsMineMigratedDescriptors

    compared to master:

    • AddrManSelect +1.85% faster
    • GCSFilterConstruct +16% faster
    • WalletIsMineDescriptors +11.6% faster
    • WalletIsMineMigratedDescriptors +59% faster

    Arguably the most impacting improvement would be to GCSFilterConstruct during IBD, but has to be tested.

  2. DrahtBot added the label Utils/log/libs on Sep 6, 2025
  3. DrahtBot commented at 6:52 pm on September 6, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33325.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  4. l0rinc commented at 7:55 pm on September 6, 2025: contributor

    This seems like a good idea to me, I will review this in more detail soon.

    To set your expectations, these fundamental changes can take a long time to review - and we need very good reasons to risk introducing a potential error and making the code more complicated for a speedup that the users might never notice.

    I like that you’re explaining it in a lot of detail in the commit message, it’s not a trivial change! We may be able to use more C++20 alignment-specific methods there (e.g. like the std::assume_aligned we used in https://github.com/bitcoin/bitcoin/pull/31144/files#diff-f26d4597a7f5a5d4aa30053032d49427b402ef4fd7a1b3194cb75b2551d58ca4R44) - but I haven’t gone deep yet, I could be wrong here, I just glanced at it so far.

    In the meantime, please check if there’s any way to split this into even smaller commits to make it even easier for reviewers, check if the current tests cover each branch (e.g. the big-endian ones), different architectures, etc.

    But as far as I can tell this doesn’t affect most caching scenarios since we have dedicated siphash methods for those. To be sure, I can run some reindexes once the change is settled to make sure it doesn’t make it slower at least.

  5. Raimo33 force-pushed on Sep 6, 2025
  6. crypto: optimize SipHash Write() method with chunked processing
    Replace byte-by-byte processing in CSipHasher::Write() with an optimized
    chunked approach that processes data in 8-byte aligned blocks when possible.
    
    The previous implementation processed input data one byte at a time using
    data.front() and data.subspan(1), which resulted in significant overhead
    for large inputs due to repeated bounds checking and span manipulations.
    
    The new implementation:
    - Handles initial unaligned bytes to reach an 8-byte boundary
    - Processes aligned 8-byte chunks directly using memcpy for efficiency
    - Handles remaining bytes at the end
    
    This optimization provides substantial performance improvements across
    Bitcoin operations that rely on SipHash:
    - GCSFilterConstruct: 16% improvement
    - WalletIsMineDescriptors: 11.6% improvement
    - WalletIsMineMigratedDescriptors: 59.0% improvement
    
    These improvements are particularly beneficial for wallet operations and block filter construction.
    
    ./bin/bench_bitcoin -filter="(WalletIsMineMigratedDescriptors|WalletIsMineDescriptors|GCSFilterConstruct|AddrManSelect)" -output-csv=bench_old.csv --min-time=60000
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |              214.55 |        4,660,983.40 |    0.2% |     65.89 | `AddrManSelect`
    |       12,983,090.72 |               77.02 |    0.1% |     66.00 | `GCSFilterConstruct`
    |              100.29 |        9,971,046.61 |    0.0% |     66.02 | `WalletIsMineDescriptors`
    |              115.42 |        8,664,379.92 |    0.0% |     66.02 | `WalletIsMineMigratedDescriptors`
    
    ./bin/bench_bitcoin -filter="(WalletIsMineMigratedDescriptors|WalletIsMineDescriptors|GCSFilterConstruct|AddrManSelect)" -output-csv=bench_new.csv --min-time=60000
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |              210.60 |        4,748,271.82 |    0.1% |     65.93 | `AddrManSelect`
    |       11,155,751.42 |               89.64 |    0.1% |     65.99 | `GCSFilterConstruct`
    |               89.87 |       11,126,702.73 |    0.0% |     66.01 | `WalletIsMineDescriptors`
    |               72.67 |       13,761,145.85 |    0.0% |     66.01 | `WalletIsMineMigratedDescriptors`
    f4fc6cec82
  7. Raimo33 commented at 1:44 am on September 7, 2025: none
    I just realized this was attempted in the past. merged in Bitcoin knots 19. restored to match Core’s implementation in knots 29. 🤔. #18014
  8. Raimo33 commented at 2:15 am on September 7, 2025: none

    We may be able to use more C++20 alignment-specific methods there (e.g. like the std::assume_aligned we used in

    I tried with

    0- uint64_t chunk;
    1- std::memcpy(&chunk, ptr, 8);
    2+ auto aligned_ptr = std::assume_aligned<8>(ptr);
    3+ uint64_t chunk = *reinterpret_cast<const uint64_t *>(aligned_ptr);
    

    but no luck. performance is identical. memcpy is extremely optimized.

  9. Raimo33 force-pushed on Sep 7, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-12 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me