IBD performance regression in 27.0rc1 on Windows #29785

vostrnad opened this issue on April 2, 2024
  1. vostrnad commented at 9:49 am on April 2, 2024: none

    While investigating the variance in IBD when synchronizing from network nodes and comparing it to synchronizing from a local node, I noticed a significant slowdown when I switched the synchronizing node to 27.0rc1. When connected to a local node (also 27.0rc1), it reaches block 120,000 about 10% slower on average than 26.0, with a much higher variance. I measured around 100 runs of each, alternating between the versions every time. Headers (pre-)sync is not included in the measurement. OS is Windows 10, I’m using pre-built binaries for both versions.

    26.0: [graph]

    27.0rc1: [graph]

    EDIT: The issue seems to be only with the pre-built release binary for Windows. Tests with a pre-built Linux binary on WSL and a self-built MSVC binary don’t show the regression (see later posts).
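    The measurement described above (many partial-IBD runs from a single local peer, alternating versions each run) can be sketched as a small Python driver. This is a minimal sketch, not the script used here: the binary names, the local peer address 127.0.0.1:8333 and the ports 28333/28332 are hypothetical, and it assumes a fully synced node is already listening locally. It uses only options mentioned in this thread (-datadir, -connect, -stopatheight, -port, -rpcport) and a fresh temporary datadir per run.

```python
import shutil
import subprocess
import tempfile
import time


def build_schedule(versions, runs_per_version):
    """Alternate versions run by run (A, B, A, B, ...) so slow drift in
    machine state (thermals, caches, background jobs) affects both equally."""
    return [v for _ in range(runs_per_version) for v in versions]


def bitcoind_args(binary, datadir, peer, stop_height, port, rpcport):
    """Command line for one partial-IBD run that syncs only from `peer`."""
    return [
        binary,
        f"-datadir={datadir}",
        f"-connect={peer}",              # sync only from the local node
        f"-stopatheight={stop_height}",  # shut down once this height is reached
        f"-port={port}",                 # avoid clashing with the local node's port
        f"-rpcport={rpcport}",           # same for the RPC port
    ]


def run_once(binary, peer="127.0.0.1:8333", stop_height=120_000,
             port=28333, rpcport=28332):
    """Time one run in a throwaway datadir (hypothetical defaults).

    Wall-clock time here includes startup and headers sync, which the
    measurements in this thread exclude."""
    datadir = tempfile.mkdtemp(prefix="ibd-bench-")
    try:
        start = time.monotonic()
        subprocess.run(bitcoind_args(binary, datadir, peer, stop_height, port, rpcport))
        return time.monotonic() - start
    finally:
        shutil.rmtree(datadir, ignore_errors=True)  # next run starts from scratch
```

    Calling run_once(binaries[version]) for each entry of build_schedule(["26.0", "27.0rc1"], 100), with a hypothetical binaries dict mapping each version to its executable path, yields per-version wall-clock times. To exclude headers sync, parse debug.log instead (and keep the datadir until it has been read).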

  2. maflcko added the label Windows on Apr 2, 2024
  3. maflcko added the label Resource usage on Apr 2, 2024
  4. maflcko removed the label Resource usage on Apr 2, 2024
  5. maflcko commented at 9:59 am on April 2, 2024: member
    Does it also happen on another type of operating system?
  6. vostrnad commented at 10:12 am on April 2, 2024: none
    I haven’t tested it with a different operating system. I can try WSL.
  7. maflcko commented at 10:52 am on April 2, 2024: member
    Yes, that’d be useful to check, so that it is easier to tell whether it is a Windows build system bug or caused by something else.
  8. vostrnad commented at 12:21 pm on April 2, 2024: none

    Tested pre-built binaries on Debian 12 WSL on the same system, measured 45 runs of each. No performance regression, 27.0rc1 is slightly faster on average.

    26.0: [graph]

    27.0rc1: [graph]

  9. vostrnad renamed this:
    IBD performance regression in 27.0rc1
    IBD performance regression in 27.0rc1 on Windows
    on Apr 2, 2024
  10. maflcko commented at 12:34 pm on April 2, 2024: member
    Can you also try a self-built Windows executable?
  11. fanquake added this to the milestone 27.0 on Apr 3, 2024
  12. vostrnad commented at 10:52 am on April 3, 2024: none

    Tested self-built MSVC binaries for 26.0 and 27.0rc1, definitely no regression.

    26.0: [graph]

    27.0rc1: [graph]

    Note that I made corrections to my previous graphs because of a bug that caused some runs to leak from one graph to the other. The regression is even more prominent now.

  13. maflcko commented at 10:57 am on April 3, 2024: member

    Tested self-built MSVC binaries for 26.0 and 27.0rc1, definitely no regression.

    I forgot to mention that optimization was enabled in commit 41e378a0a1f0729b423034f47c28a4e83287a1e8. So for a fair comparison, one would have to backport 41e378a0a1f0729b423034f47c28a4e83287a1e8 to 26.0.

    I presume that benchmarking IBD is expensive. So maybe comparing the micro-benchmarks can provide a hint at which code is slower?

  14. fanquake commented at 11:03 am on April 3, 2024: member

    Tested pre-built binaries on Debian 12 WSL on the same system, measured 45 runs of each. No performance regression

    Tested self-built MSVC binaries for 26.0 and 27.0rc1, definitely no regression.

    It’s not clear to me if there is still an issue here or not. Or is there still a regression when comparing our 26.x and 27.x Windows release binaries?

    If the only issue is in relation to self-compiled MSVC binaries, then this isn’t a blocker for 27.x.

  15. hebasto commented at 11:04 am on April 3, 2024: member
    Speaking of MSVC builds, it’s worth noting that they still don’t have a hardware-accelerated SHA256 implementation (see #24773).
  16. vostrnad commented at 11:12 am on April 3, 2024: none

    @maflcko

    I forgot to mention that optimization was enabled in commit https://github.com/bitcoin/bitcoin/commit/41e378a0a1f0729b423034f47c28a4e83287a1e8. So for a fair comparison, one would have to backport https://github.com/bitcoin/bitcoin/commit/41e378a0a1f0729b423034f47c28a4e83287a1e8 to 26.0.

    The original regression also shows up as much higher variance, which doesn’t appear with the MSVC binary, so I don’t think it’s necessary to test the backport.

    @fanquake

    It’s not clear to me if there is still an issue here or not. Or is there still a regression when comparing our 26.x and 27.x Windows release binaries?

    The regression is in the pre-built 27.0rc1 binary for Windows (which isn’t MSVC), see original post. Pre-built binary for Linux tested on WSL and self-built MSVC binary for Windows don’t have the regression, see later posts.

  17. maflcko commented at 11:25 am on April 3, 2024: member

    I presume that benchmarking IBD is expensive. So maybe comparing the micro-benchmarks can provide a hint at which code is slower?

    I forgot to mention that benchmarks are disabled in guix. So something like sed -i -e 's/--disable-bench //g' $( git grep -l disable-bench ./contrib/guix/ ) may be needed before a guix build.

  18. fanquake commented at 12:10 pm on April 3, 2024: member

    The regression is in the pre-built 27.0rc1 binary for Windows

    Ok. It would be good if someone else on Windows can confirm this. Can you also let us know if you’re using any particular config options etc.

  19. m3dwards commented at 1:28 pm on April 3, 2024: contributor
    I can try and recreate this. @vostrnad do you have a script that records / plots the graphs? Also interested in how are you able to perform so many runs in such a short amount of time? Do you have access to a lot of compute?
  20. m3dwards commented at 3:26 pm on April 3, 2024: contributor
    Running IBD in a loop, will report back tomorrow.
  21. vostrnad commented at 4:38 pm on April 3, 2024: none

    Can you also let us know if you’re using any particular config options etc.

    Beyond the bare minimum (-connect, -datadir, RPC user/pass etc.) I’m just increasing dbcache and enabling pruning, but neither of those should kick in at this block height. I’ll test again using the defaults just to be sure.

    how are you able to perform so many runs in such a short amount of time? Do you have access to a lot of compute?

    I’m not doing full IBD, just to block 120,000. As much as I’d like to benchmark hundreds of full IBD runs, I don’t have that kind of compute.

  22. mzumsande commented at 4:43 pm on April 3, 2024: contributor
    Does it also appear with -reindex (without a second node)?
  23. sipa commented at 4:57 pm on April 3, 2024: member
    Or even with -reindex-chainstate (which does even less)?
  24. vostrnad commented at 5:44 pm on April 3, 2024: none
    I’ve set up a benchmark that performs a partial IBD and then runs -reindex; however, the reindexing takes about ten times as long as the IBD, with CPU and disk usage sitting near zero. The same happens with -reindex-chainstate. Is this normal? Even if it is, is it worth benchmarking for a difference between 27.0rc1 and 26.0? At first glance it seems about as slow.
  25. m3dwards commented at 10:19 am on April 4, 2024: contributor

    I was also syncing to block 120,000, but from public nodes. I re-read your initial post and see that you are syncing from a local node, which I assume means the x-axis of these charts is in seconds, not minutes.

    I’ll re-run my test connecting to a local node.

  26. vostrnad commented at 12:52 pm on April 4, 2024: none

    Ran the original benchmark again (pre-built binaries on Windows) with only these configuration options:

    • -connect
    • -datadir
    • -port
    • -rpcport
    • -rpcuser
    • -rpcpassword

    This time 27.0rc1 was about 5% slower on average than 26.0, still with a much higher variance. Measured around 250 runs of each.

    26.0: [graph]

    27.0rc1: [graph]
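    One way to check whether a mean difference like this exceeds run-to-run noise is a two-sample permutation test on the per-run times. This technique is not used anywhere in this thread; the sketch below assumes two lists of per-run durations in seconds and uses a fixed seed so its results are reproducible.

```python
import random
from statistics import fmean


def permutation_p_value(baseline, candidate, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean run time.

    Repeatedly shuffles the pooled runs into two groups of the original
    sizes and counts how often the shuffled mean difference is at least
    as large as the observed one."""
    rng = random.Random(seed)  # fixed seed: reproducible p-values
    observed = abs(fmean(candidate) - fmean(baseline))
    pooled = list(baseline) + list(candidate)
    k = len(baseline)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[k:]) - fmean(pooled[:k]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

    A small p-value suggests the two sets of runs are unlikely to come from the same distribution of times; it says nothing about the cause.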

  27. m3dwards commented at 2:12 pm on April 5, 2024: contributor

    I can partially replicate this regression using pre-built binaries on Windows 11. For me, 27.0rc1 is 15% slower than 26.0. I can’t quite replicate the higher variability of 27.0rc1 as it had a slightly higher range but a lower standard deviation.

    26.0

    Mean: 35.21 seconds, Range: 9 seconds, Standard Deviation: 2.25

    [screenshot]

    27.0rc1

    Mean: 40.77 seconds, Range: 10 seconds, Standard Deviation: 2.00

    [screenshot]
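    The figures above can be recomputed from per-run durations with Python's statistics module. This is a minimal sketch, assuming the durations are in seconds; the comment doesn't say whether the standard deviation is the sample or population value, so using stdev (sample, n - 1) is an assumption.

```python
from statistics import fmean, median, stdev


def summarize(durations):
    """Summary statistics for per-run sync times in seconds."""
    return {
        "mean": fmean(durations),
        "median": median(durations),
        "range": max(durations) - min(durations),
        "stdev": stdev(durations),  # sample standard deviation (n - 1)
    }


def relative_slowdown(baseline_mean, candidate_mean):
    """Fractional increase of candidate over baseline; 0.10 means 10% slower."""
    return (candidate_mean - baseline_mean) / baseline_mean
```

    With the two means above, relative_slowdown(35.21, 40.77) is about 0.158, i.e. 27.0rc1 took roughly 15.8% longer than 26.0.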

    Method

    Machine: 14th Gen Intel Core i9 processor, 96 GB RAM, 2 TB NVMe storage, Windows 11

    Binaries: downloaded from bitcoincore.org

    • Ran a fully synced 26.0 node locally and connected a test 26.0 or 27.0rc1 node to it.
    • Disabled antivirus for the data directories of all three bitcoind instances
    • Alternated between 26.0 and 27.0rc1 each run
    • Used these flags on bitcoind under test: -datadir, -stopatheight, -port, -connect
    • Didn’t include data from first two runs as they were both a lot slower (machine must have been warming up)
    • Start time was taken from the log line that included both UpdateTip and Height=1 (this occurs after headers syncing reaches 98%)
    • End time was taken from the last log line that included UpdateTip
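    The start/end rules above can be applied to a debug.log with a short parser. This is a sketch under assumptions: it expects the default -logtimestamps prefix (e.g. 2024-04-05T13:44:01Z, with microseconds if -logtimemicros is set) and matches UpdateTip lines containing height=1 case-insensitively; the exact UpdateTip line format is assumed rather than quoted from the thread.

```python
from datetime import datetime


def log_time(line):
    """Parse the leading timestamp of a debug.log line (assumed ISO 8601 + 'Z')."""
    token = line.split(" ", 1)[0]
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in token else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(token, fmt)


def sync_seconds(lines):
    """Seconds from the UpdateTip line for height 1 to the last UpdateTip line."""
    start = end = None
    for line in lines:
        if "UpdateTip" not in line:
            continue
        if start is None:
            # Surrounding spaces keep height=10, height=100, ... from matching.
            if " height=1 " in line.lower():
                start = log_time(line)
            continue
        end = log_time(line)
    if start is None or end is None:
        raise ValueError("no UpdateTip lines after the one for height 1")
    return (end - start).total_seconds()
```

    For example, open the file with open(path, encoding="utf-8") and pass the file object to sync_seconds to get the duration of one run in seconds.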

    I have a copy of the debug.log from all 200 runs, so I can make charts that show progress over time if we think that’s helpful, or choose different start and end times.

  28. maflcko commented at 2:56 pm on April 5, 2024: member

    Without knowing the cause, there is little that can be done. Given that WSL-built binaries didn’t regress, it hints at some build flags inside of guix. My suggestion would be to bisect while doing guix builds. An alternative would be to try to match the guix compile flags and compiler version in WSL.

    Also, instead of IBD, the benchmarks can be used, if they differ enough. Though, they’ll need to be enabled: #29785 (comment)

  29. vostrnad commented at 3:07 pm on April 5, 2024: none
    @maflcko Just to clarify, what didn’t regress was the release binary for Linux (which I assume is built with guix, not WSL) running in WSL.
  30. fanquake commented at 3:26 pm on April 5, 2024: member

    which I assume is built with guix

    All of the release binaries are built using Guix. Windows is produced in Guix using GCC+Mingw-w64. We don’t produce any release binaries using WSL or MSVC.

  31. achow101 commented at 8:25 am on April 8, 2024: member

    I’m not quite seeing the same issue on my Windows machine, although IBD is generally a lot slower.

    (50 runs each, sync to 120,000)

    • On 26.1:

      • Median 359.9265245
      • Mean 374.7166118
      • Std. Dev. 24.05475889010843
    • On 27.0rc1:

      • Median 375.675126
      • Mean 375.79323382
      • Std. Dev. 2.7244149066583287

    Considering that it is rather difficult to figure out what is wrong, and that it might not be happening for everyone, I think it would be okay to deal with this later and move forward with the 27.0 final release.

  32. maflcko commented at 7:22 am on April 15, 2024: member
    Did someone with Windows try to enable the benchmarks in guix and compile them, and run them?
  33. m3dwards commented at 11:35 am on April 15, 2024: contributor

    Did someone with Windows try to enable the benchmarks in guix and compile them, and run them?

    I can do

  34. achow101 commented at 3:28 pm on April 15, 2024: member

    Did someone with Windows try to enable the benchmarks in guix and compile them, and run them?

    Yes, I did that. I could not find any benchmark that had a significant difference.

  35. maflcko commented at 3:51 pm on April 15, 2024: member
    Thanks for confirming. If the benchmarks can not show a difference for someone who could reproduce, bisecting guix builds may be the only option left to debug this, but that will probably take some time.
  36. fanquake removed this from the milestone 27.0 on Apr 16, 2024
  37. m3dwards commented at 4:28 pm on April 17, 2024: contributor

    I did get a bit of variation but hard to know what’s important. BlockAssemblerAddPackageTxns was about 10-15% slower on 27.0rc1 vs 26.0. The bench also throws a filesystem error on 27.0rc1 but completes on 26.0.

    Full bench results

  38. vostrnad commented at 1:56 pm on September 3, 2024: none

    I saw the same high variance in 27.0 and 27.1, but it seems to have gone away in 28.0rc1. I’ll close the issue for now and will look out for future regressions.

    Last set of graphs for your enjoyment:

    27.0: [graph]

    27.1: [graph]

    28.0rc1: [graph]

  39. vostrnad closed this on Sep 3, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me