checkqueue: set MAX_SCRIPTCHECK_THREADS to nCores - 1 #32692

pull HowHsu wants to merge 1 commits into bitcoin:master from HowHsu:master changing 1 files +2 −1
  1. HowHsu commented at 1:53 pm on June 6, 2025: none
    A fixed MAX_SCRIPTCHECK_THREADS value is not flexible for users to leverage their cpu resources, and a value large than nCores - 1 doesn’t make sense since it only adds some context switch overhead. Set it to nCores - 1. Assumption: A user who sets the number of script verification workers is aware of how this affects the system performance, otherwise he/she leaves it as default (which is 0)
  2. checkqueue: set MAX_SCRIPTCHECK_THREADS to nCores - 1
    A fixed MAX_SCRIPTCHECK_THREADS value is not flexible for users to leverage their
    cpu resources, and a value large than nCores - 1 doesn't make sense since it only
    adds some context switch overhead. Set it to nCores - 1.
    Assumption: A user who sets the number of script verification workers is aware of
    how this affects the system performance, otherwise he/she leaves it as default
    (which is 0)
    
    Signed-off-by: Hao Xu <hao.xu@linux.dev>
    ef71fe4e19
  3. DrahtBot commented at 1:53 pm on June 6, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32692.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  4. sipa commented at 1:58 pm on June 6, 2025: member
    I think up to nCores makes sense, and we need to be sure that that value is an actual reflection of hardware concurrency on all platforms. It may be worth benchmarking to see if there is actually a gain above certain values still (because thread contention will start to become more and more noticeable, which may at some point overtake the gains from more parallellism).
  5. maflcko added the label Needs Benchmark on Jun 6, 2025
  6. DrahtBot added the label CI failed on Jun 6, 2025
  7. DrahtBot commented at 2:42 pm on June 6, 2025: contributor

    🚧 At least one of the CI tasks failed. Task no wallet, libbitcoinkernel: https://github.com/bitcoin/bitcoin/runs/43620218125 LLM reason (✨ experimental): Linker error due to undefined reference to GetNumCores() causing build failure.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  8. HowHsu commented at 3:55 pm on June 6, 2025: none

    I think up to nCores makes sense, and we need to be sure that that value is an actual reflection of hardware concurrency on all platforms. It may be worth benchmarking to see if there is actually a gain above certain values still (because thread contention will start to become more and more noticeable, which may at some point overtake the gains from more parallellism).

    There is a master worker, plus nCores-1 normal workers, so setting it to nCores actually allows nCores+1 worker threads. I now think maybe it should even be smaller than nCores-1, since there may be some important bitcoind threads running (as a newbie, I can’t point them out clearly). About thread contention, I assume users know how many and which cores are idle enough to run these workers and they will bind workers to them(for example, use taskset on linux). Maybe more workers bring more lock contention, that’s a issue to be investigated.

    be sure that that value is an actual reflection of hardware concurrency on all platforms

    I’ll look into it.

  9. sipa commented at 4:04 pm on June 6, 2025: member

    There is a master worker, plus nCores-1 normal workers, so setting it to nCores actually allows nCores+1 worker threads.

    That is incorrect. With -par=N, there is 1 master plus N-1 workers, which both perform script validation work, but the master only joins the other threads when it is done with other work. If you have N perfectly parallel cores, and nothing else running on your machine, par=N is expected, so the software should allow up to that, if there is no reason to assume that this would result in degraded performance.

    About thread contention, I assume users know which cores are idle enough to run these workers and they will bind workers to them(for example, use taskset on linux).

    Thread contention is the overhead inside bitcoind caused by coordination between the threads, and is unrelated to what the hardware provides or how loaded it is. The reason for the existing limit of 15 is because benchmarks showed that even on an otherwise unloaded machine with more cores than that, adding more script validation threads was net slower after about that many. That benchmark dates from 2013, so there is no reason to assume it still holds, as both hardware and typical blockchain script usage has changed significantly, as well as other parts of the codebase. But it would still be worth investigating if there isn’t just some number above which it makes no sense.

  10. in src/validation.h:84 in ef71fe4e19
    80@@ -80,7 +81,7 @@ static constexpr int DEFAULT_CHECKLEVEL{3};
    81 static const uint64_t MIN_DISK_SPACE_FOR_BLOCK_FILES = 550 * 1024 * 1024;
    82 
    83 /** Maximum number of dedicated script-checking threads allowed */
    84-static constexpr int MAX_SCRIPTCHECK_THREADS{15};
    85+static int MAX_SCRIPTCHECK_THREADS = GetNumCores();
    


    sipa commented at 4:14 pm on June 6, 2025:
    Don’t do this. Upper-case variable names are for constants, and this is no longer a constant. If there is reason for it to be a variable, the constant and its name should be removed, and the code using it should be adapted to compute it on the fly.
  11. HowHsu commented at 4:31 pm on June 6, 2025: none

    There is a master worker, plus nCores-1 normal workers, so setting it to nCores actually allows nCores+1 worker threads.

    That is incorrect. With -par=N, there is 1 master plus N-1 workers, which both perform script validation work, but the master only joins the other threads when it is done with other work. If you have N perfectly parallel cores, and nothing else running on your machine, par=N is expected, so the software should allow up to that, if there is no reason to assume that this would result in degraded performance.

    From the code I read from checkqueue.h ( https://raw.githubusercontent.com/bitcoin/bitcoin/refs/heads/master/src/checkqueue.h#:~:text=std%3A%3Aoptional%3CR%3E%20Complete()%20EXCLUSIVE_LOCKS_REQUIRED(!m_mutex) ) I think par=N means N workers, plus one master worker which is actually the main bitcoind thread itself. And I run benchmark in ben/connectblock.cpp to debug it, shows the same thing.

    About thread contention, I assume users know which cores are idle enough to run these workers and they will bind workers to them(for example, use taskset on linux).

    Thread contention is the overhead inside bitcoind caused by coordination between the threads, and is unrelated to what the hardware provides or how loaded it is. The reason for the existing limit of 15 is because benchmarks showed that even on an otherwise unloaded machine with more cores than that, adding more script validation threads was net slower after about that many. That benchmark dates from 2013, so there is no reason to assume it still holds, as both hardware and typical blockchain script usage has changed significantly, as well as other parts of the codebase. But it would still be worth investigating if there isn’t just some number above which it makes no sense.

    I’ll try to find a machine which has enough cpu resource to test it.

  12. sipa commented at 4:33 pm on June 6, 2025: member

    I think par=N means N workers, plus one master worker which is actually the main bitcoind thread itself.

    That’s not correct. The number of worker threads is one less than the argument to -par: https://github.com/bitcoin/bitcoin/blob/v29.0/src/node/chainstatemanager_args.cpp#L62.

  13. HowHsu commented at 4:40 pm on June 6, 2025: none

    I think par=N means N workers, plus one master worker which is actually the main bitcoind thread itself.

    That’s not correct. The number of worker threads is one less than the argument to -par: https://github.com/bitcoin/bitcoin/blob/v29.0/src/node/chainstatemanager_args.cpp#L62.

    I see, sorry for missing this part, but MAX_SCRIPTCHECK_THREADS is to limit opt.worker_threads_num, so it still should be nCores-1

  14. sipa commented at 4:42 pm on June 6, 2025: member
    @HowHsu I am so sorry! You are right.
  15. HowHsu commented at 1:50 pm on June 8, 2025: none

    Hi Sipa,

    The reason for the existing limit of 15 is because benchmarks showed that even on an otherwise unloaded machine with more cores than that, adding more script validation threads was net slower after about that many.

    Any doc about this test?

    That benchmark dates from 2013

    Is the benchmark you mentioned /bench/connectblock.cpp

    I’ve done some investigation, and found the bottleneck may be the ECC calculation itself, not lock contension or something else. I did some test on my Intel i5-13600KF(13th Gen)PC which has 6 P-cores and 8 E-cores (6 * 2 + 8 = 20 logical cores) with ./bench/connectblock.cpp, the producer adds tasks to the queue fast, lock contention is low from perf report. But the TPS doesn’t go higher after thread_num=14 or 15, which is equal to the number of physical cores. From internet, I learned that ECC computation highly uses ALU、FPU etc. and logical cpu threads on one core shares these stuff, which means ECC cannot get benefit from SMT. So I guess a user can set MAX_SCRIPTCHECK_THREADS to nPhysicalCores - 1, but still needs more research. Unfortunately I don’t have a test environment with a bunch of CPUs (Would be super great if someone can do this test on a machine with different number of physical cores)

  16. sigmaemilio commented at 10:59 pm on June 8, 2025: none
    Gg

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-15 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me