util/blockstorage: Add TRACE_RAII, slightly faster -reindex-chainstate with CBufferedFile #28226

pull martinus wants to merge 4 commits into bitcoin:master from martinus:2023-08-more-CBufferedFile changing 4 files +84 −2
  1. martinus commented at 5:38 am on August 6, 2023: contributor

    TLDR:

    • Adds a TRACE_RAII macro to easily trace runtime of a code block.
    • Switch to CBufferedFile in BlockManager::ReadBlockFromDisk is slightly faster:
    • 9% faster unserialization => 1.2% faster -reindex-chainstate

    While profiling -reindex-changestate I saw lots of fread() calls in in BlockManager::ReadBlockFromDisk. This replaces the use of CAutoFile with CBufferedFile with a small buffer, leading to much fewer calls to fread(), which gives a little speedup.

    I measured runtime of the synchronization with the TRACE_RAII macro. I ran this command which took about 30 minutes on my PC, with and without CBufferedFile:

    0sync && sudo /sbin/sysctl vm.drop_caches=3 && ~/git/github.com/martinus/bitcoin/src/bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000
    

    The measured time spent in unserializing blocks are:

    • 308.227s CAutoFile
    • 277.827s CBufferedFile

    It is a bit hard to measure the total effect on -reindex-chainstate due to random fluctuations in the benchmark. For somewhat reliable results I’ve run the benchmark 10 times, using hyperfine:

    0hyperfine \                                                                                                                                                                                                                                                                                         --parameter-list commit 0f3a6a74a2889817df0f52e19c37de19c664daac,ce26cb6025c47bb4b3e51a579689635da7ca1f6b \
    1--setup 'git checkout {commit} && make -j$(nproc)' \
    2--prepare 'sync; sudo /sbin/sysctl vm.drop_caches=3' \
    3'./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000'
    

    The results are:

    0Benchmark 1: ./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (CAutoFile)
    1  Time (mean ± σ):     1847.622 s ± 13.448 s    [User: 1702.768 s, System: 54.059 s]
    2  Range (min … max):   1835.319 s … 1877.065 s    10 runs
    3
    4Benchmark 2: ./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (CBufferedFile)
    5  Time (mean ± σ):     1824.660 s ± 14.394 s    [User: 1679.417 s, System: 53.965 s]
    6  Range (min … max):   1798.764 s … 1848.940 s    10 runs
    

    So the 9% improvement in unserializing seems to translate to roughly 1.2% total runtime improvement in -reindex-chainstate on my machine.

  2. DrahtBot commented at 5:38 am on August 6, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28195 (blockstorage: Drop legacy -txindex check by MarcoFalke)
    • #28052 (blockstorage: XOR blocksdir *.dat files by MarcoFalke)
    • #25541 (tracing: macOS USDT support by kouloumos)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. tracing: Add TRACE_RAII to easily get statistics about a code block 18cba0eac8
  4. util: Add CBufferedFile::IsNull, same as in CAutoFile
    So that CAutoFile is replaceable with CBufferedFile
    3bbbf6de49
  5. blockstorage: Add TRACE_RAII to trace serialization code
    That way we can easily get reliable statistics about the serialization
    code, without interference.
    39f11a2505
  6. blockstorage: Use CBufferedFile in ReadBlockFromDisk instead of CAutoFile
    While profiling `-reindex-changestate` I saw lots of fread() calls in
    in ReadBlockFromDisk which can be slow. This replaces the use of CAutoFile
    with CBufferedFile with a small buffer, leading to much fewer calls to
    fread().
    
    With the TRACE_RAII in place it is easier to reliably measure performance
    of the critical part, just measuring `-reindex-chainstate` runtime can be
    quite noisy.
    
    Nevertheless, I ran this command which took about 30 minutes on my PC:
    
    ```
    sync && sudo /sbin/sysctl vm.drop_caches=3 && ~/git/github.com/martinus/bitcoin/src/bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000
    ```
    
    The measured time spent in unserializing blocks are:
    * 308.227s mergebase
    * 277.827s this commit
    
    This 9% improvement seems to translate into roughly 1.2% total runtime improvement,
    when running the benchmark 10 times and showing the average runtime:
    
    ```
    Benchmark 1: ./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (mergebase)
      Time (mean ± σ):     1847.622 s ± 13.448 s    [User: 1702.768 s, System: 54.059 s]
      Range (min … max):   1835.319 s … 1877.065 s    10 runs
    
    Benchmark 2: ./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (branch)
      Time (mean ± σ):     1824.660 s ± 14.394 s    [User: 1679.417 s, System: 53.965 s]
      Range (min … max):   1798.764 s … 1848.940 s    10 runs
    ```
    
    I have produced these results with `hyperfine`, with this command:
    
    ```
    hyperfine \                                                                                                                                                                                                                                                                                          400 Mbps
    --parameter-list commit 0f3a6a74a2889817df0f52e19c37de19c664daac,ce26cb6025c47bb4b3e51a579689635da7ca1f6b \
    --setup 'git checkout {commit} && make -j$(nproc)' \
    --prepare 'sync; sudo /sbin/sysctl vm.drop_caches=3' \
    './bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000'
    ```
    e926795801
  7. martinus force-pushed on Aug 6, 2023
  8. DrahtBot added the label CI failed on Aug 6, 2023
  9. DrahtBot removed the label CI failed on Aug 6, 2023
  10. maflcko commented at 7:40 am on August 6, 2023: member
    Shouldn’t std::fread be already cached? I could imagine the result to be different on different hardware? Also, -reindex-chainstate seems rare enough to not spend time to optimize its runtime? Going further, if the only goal of CBufferedFile is to speed up -reindex by 1%, we could consider removing it. Since the implementation is put in the header file of streams.h, removing it could reduce peak compile memory usage by .5% and compilation speed by .3% on every fresh compilation IIRC. Obviously this is subjective, but to me it seems a better result.
  11. martinus commented at 12:27 pm on August 6, 2023: contributor

    I think most of the slowdown is due to the locking required by each fread() call. I did a quick benchmark by switching back to CAutoFile and and using fread_unlocked instead of std::fread (and call flockfile in the ctor and funlockfile before fclose), and the performance is practically the same as with CBufferedFile.

    I don’t mind dropping the change to CBufferedFile, but I think the TRACE_RAII macro is nice (maybe with a better name, I couldn’t think of anything else…). It simplifies getting accurate measurements from parts of the code.

  12. maflcko commented at 6:53 am on August 7, 2023: member

    It may be better to modify CAutoFile in that case. The file handle in the auto-file should[1] always be unique and pinned to a single thread, so I using fread_unlocked seems better, as it improves performance everywhere. (Though, it looks like std::fread_unlocked does not exist in C++, so I wonder if it exists on Windows and macOS?)

    [1] I may submit a related pull to better enforce this at compile-time.

  13. martinus marked this as a draft on Aug 7, 2023
  14. DrahtBot added the label CI failed on Sep 4, 2023
  15. DrahtBot commented at 11:38 am on September 5, 2023: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  16. DrahtBot added the label Needs rebase on Sep 5, 2023
  17. DrahtBot commented at 0:59 am on December 4, 2023: contributor

    There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  18. maflcko commented at 7:18 am on December 4, 2023: member
    I tried to remove BufferedFile, but it seemed non-trivial. I think it is needed to recover from datadir corruption caused by previous releases of Bitcoin Core. I wonder how relevant this still is, and whether it is simpler to require those affected to re-do an IBD. Another small benefit would be that the fuzz timeout in load_external_block_file would likely be fixed, see #28812 (comment). (Attempting to read a block at every byte position of the file in a loop may be CPU and memory intensive)
  19. DrahtBot commented at 0:28 am on March 3, 2024: contributor

    ⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  20. DrahtBot commented at 0:28 am on March 3, 2024: contributor

    🤔 There hasn’t been much activity lately and the CI seems to be failing.

    If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn’t been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

  21. DrahtBot commented at 1:35 am on May 31, 2024: contributor

    ⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  22. DrahtBot commented at 1:38 am on August 28, 2024: contributor

    ⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me