Use std::thread::hardware_concurrency, instead of Boost, to determine available cores #10271

pull fanquake wants to merge 1 commit into bitcoin:master from fanquake:replace-boost-getnumcores changing 2 files +4 −8
  1. fanquake commented at 1:40 am on April 25, 2017: member

    Following discussion on IRC about replacing Boost usage for detecting available system cores, I’ve opened this to collect some benchmarks + further discussion.

    The current method for detecting available cores was introduced in #6361.
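
    For reference, the change itself is small: `GetNumCores()` simply switches from Boost's core detection to the standard library. A minimal sketch of the before/after (the exact fallback handling in the tree may differ):

        #include <thread>

        // Before (per #6361): ask Boost for the number of physical cores.
        // int GetNumCores() { return boost::thread::physical_concurrency(); }

        // After: use the standard library. Note that hardware_concurrency()
        // counts logical (hyperthreaded) cores and may return 0 when the count
        // cannot be determined, so callers should treat 0 as "unknown".
        int GetNumCores()
        {
            return std::thread::hardware_concurrency();
        }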

    Recap of the IRC chat:

     21:14:08 fanquake: Since we seem to be giving Boost removal a good shot for 0.15, does anyone have suggestions for replacing GetNumCores?
     21:14:26 fanquake: There is std::thread::hardware_concurrency(), but that seems to count virtual cores, which I don't think we want.
     21:14:51 BlueMatt: fanquake: I doubt we'll do boost removal for 0.15
     21:14:58 BlueMatt: shit like BOOST_FOREACH, sure
     21:15:07 BlueMatt: but all of boost? doubtful, there are still things we need
     21:16:36 fanquake: Yea sorry, not the whole lot, but we can remove a decent chunk. Just looking into what else needs to be done to replace some of the less involved Boost usage.
     21:16:43 BlueMatt: fair
     21:17:14 wumpus: yes, it makes sense to plan ahead a bit, without immediately doing it
     21:18:12 wumpus: right, don't count virtual cores, that used to be the case but it makes no sense for our usage
     21:19:15 wumpus: it'd create a swarm of threads overwhelming any machine with hyperthreading (+accompanying thread stack overhead), for script validation, and there was no gain at all for that
     21:20:03 sipa: BlueMatt: don't worry, there is no hurry
     21:59:10 morcos: wumpus: i don't think that is correct
     21:59:24 morcos: suppose you have 4 cores (8 virtual cores)
     21:59:24 wumpus: fanquake: indeed seems that std has no equivalent to physical_concurrency, on any standard. That's annoying as it is non-trivial to implement
     21:59:35 morcos: i think running par=8 (if it let you) would be notably faster
     21:59:59 morcos: jeremyrubin and i discussed this at length a while back... i think i commented about it on irc at the time
     22:00:21 wumpus: morcos: I think the conclusion at the time was that it made no difference, but sure would make sense to benchmark
     22:00:39 morcos: perhaps historical testing on the virtual vs actual cores was polluted by concurrency issues that have now improved
     22:00:47 wumpus: I think there are not more ALUs, so there is not really a point in having more threads
     22:01:40 wumpus: hyperthreads are basically just a stored register state right?
     22:02:23 sipa: wumpus: yes but it helps the scheduler
     22:02:27 wumpus: in which case the only speedup using "number of cores" threads would give you is, possibly, excluding other software from running on the cores at the same time
     22:02:37 morcos: well this is where i get out of my depth
     22:02:50 sipa: if one of the threads is waiting on a read from ram, the other can use the arithmetic unit for example
     22:02:54 morcos: wumpus: i'm pretty sure though that the speed up is considerably more than what you might expect from that
     22:02:59 wumpus: sipa: ok, I back down, I didn't want to argue this at all
     22:03:35 morcos: the reason i haven't tested it myself, is the machine i usually use has 16 cores... so not easy due to remaining concurrency issues to get much more speedup
     22:03:36 wumpus: I'm fine with restoring it to number of virtual threads if that's faster
     22:03:54 morcos: we should have someone with 4 cores (and actually test it though, i agree
     22:03:58 sipa: i would expect (but we should benchmark...) that if 8 script validation threads instead of 4 on a quadcore hyperthreading is not faster, it's due to lock contention
     22:04:20 morcos: sipa: yeah thats my point, i think lock contention isn't that bad with 8 now
     22:04:22 wumpus: on 64-bit systems the additional thread overhead wouldn't be important at least
     22:04:23 gmaxwell: I previously benchmarked, a long time ago, it was faster.
     22:04:33 gmaxwell: (to use the HT core count)
     22:04:44 wumpus: why was this changed at all then?
     22:04:47 wumpus: I'm confused
     22:05:04 sipa: good question!
     22:05:06 gmaxwell: I had no idea we changed it.
     22:05:25 wumpus: sigh
     22:05:54 gmaxwell: What PR changed it?
     22:06:51 gmaxwell: In any case, on 32-bit it's probably a good tradeoff... the extra ram overhead is worth avoiding.
     22:07:22 wumpus: [#6361](/bitcoin-bitcoin/6361/)
     22:07:28 gmaxwell: PR 6461 btw.
     22:07:37 gmaxwell: er lol at least you got it right.
     22:07:45 wumpus: the complaint was that systems became unusably slow when using that many threads
     22:07:51 wumpus: so at least I got one thing right, woohoo
     22:07:55 sipa: seems i even acked it!
     22:07:57 BlueMatt: wumpus: there are more alus
     22:08:38 BlueMatt: but we need to improve lock contention first
     22:08:40 morcos: anyway, i think in the past the lock contention made 8 threads regardless of cores a bit dicey.. now that is much better (although more still to be done)
     22:09:01 BlueMatt: or we can just merge [#10192](/bitcoin-bitcoin/10192/), thats fee
     22:09:04 gribble: [#10192](/bitcoin-bitcoin/10192/) | Cache full script execution results in addition to signatures by TheBlueMatt · Pull Request [#10192](/bitcoin-bitcoin/10192/) · bitcoin/bitcoin · GitHub
     22:09:11 BlueMatt: s/fee/free/
     22:09:21 morcos: no, we do not need to improve lock contention first. but we should probably do that before we increase the max beyond 16
     22:09:26 BlueMatt: then we can toss concurrency issues out the window and get more speedup anyway
     22:09:35 gmaxwell: wumpus: yea, well in QT I thought we also diminished the count by 1 or something? but yes, if the motivation was to reduce how heavily the machine was used, thats fair.
     22:09:56 sipa: the benefit of using HT cores is certainly not a factor 2
     22:09:58 wumpus: gmaxwell: for the default I think this makes a lot of sense, yes
     22:10:10 gmaxwell: morcos: right now on my 24/28 physical core hosts going beyond 16 still reduces performance.
     22:10:11 wumpus: gmaxwell: do we also restrict the maximum par using this? that'd make less sense
     22:10:51 wumpus: if someone *wants* to use the virtual cores they should be able to by setting -par=
     22:10:51 sipa: *flies to US*
     22:10:52 BlueMatt: sipa: sure, but the shared cache helps us get more out of it than some others, as morcos points out
     22:11:30 BlueMatt: (because it means our thread contention issues are less)
     22:12:05 morcos: gmaxwell: yeah i've been bogged down in fee estimation as well (and the rest of life) for a while now.. otherwise i would have put more effort into jeremy's checkqueue
     22:12:36 BlueMatt: morcos: heh, well now you can do other stuff while the rest of us get bogged down in understanding fee estimation enough to review it
     22:12:37 wumpus: [to answer my own question: no, the limit for par is MAX_SCRIPTCHECK_THREADS, or 16]
     22:12:54 morcos: but to me optimizing for more than 16 cores is pretty valuable as miners could use beefy machines and be less concerned by block validation time
     22:14:38 BlueMatt: morcos: i think you may be surprised by the number of mining pools that are on VPSes that do not have 16 cores
     22:15:34 gmaxwell: I assume right now most of the time block validation is bogged in the parts that are not as concurrent, simply because caching makes the concurrent parts so fast. (and soon to hopefully increase with bluematt's patch)
     22:17:55 gmaxwell: improving sha2 speed, or transaction malloc overhead are probably bigger wins now for connection at the tip than parallelism beyond 16 (though I'd like that too).
     22:18:21 BlueMatt: sha2 speed is big
     22:18:27 morcos: yeah lots of things to do actually...
     22:18:57 gmaxwell: BlueMatt: might be a tiny bit less big if we didn't hash the block header 8 times for every block.
     22:21:27 BlueMatt: ehh, probably, but I'm less rushed there
     22:21:43 BlueMatt: my new cache thing is about to add a bunch of hashing
     22:21:50 BlueMatt: 1 sha round per tx
     22:22:25 BlueMatt: and sigcache is obviously a ton
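
    As wumpus notes above, `-par` is capped at `MAX_SCRIPTCHECK_THREADS` (16), so the detected core count only sets the default. A simplified, self-contained sketch of that clamping logic (function and variable names are approximate, not the exact init code):

        #include <algorithm>
        #include <thread>

        static const int MAX_SCRIPTCHECK_THREADS = 16;

        // Approximation of GetNumCores() after this PR.
        static int GetNumCores() { return std::thread::hardware_concurrency(); }

        // par == 0 means "auto" (use the detected core count); negative values
        // leave that many cores free; the result is clamped to the maximum.
        int ScriptCheckThreadsFromPar(int par)
        {
            int threads = par;
            if (threads <= 0) threads += GetNumCores();
            if (threads <= 1) return 0; // no dedicated verification threads
            return std::min(threads, MAX_SCRIPTCHECK_THREADS);
        }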
    
  2. fanquake added the label Utils and libraries on Apr 25, 2017
  3. laanwj commented at 5:26 am on April 25, 2017: member

    I think defaulting to the number of physical cores is reasonable, and I'm not sure we should change this default. #6361 was done to mitigate issues like #6358, where so many threads were created that the whole system became overloaded (that was reported a lot at the time!).

    So any benchmarking here should not just take speed into account but also user experience.

  4. fanquake added this to the "In progress" column in a project

  5. jtimon commented at 10:15 pm on June 22, 2017: contributor

    utACK e011c3b

    I’m not sure I understand why using the number of hyperthreads instead of the number of cores causes issues. In my experience hyperthreads have gotten very good in recent years (not sure about the timeframes, just from my previous laptop to my current one and my desktop). For all practical purposes I tend to treat the number of hyperthreads as if that were the number of cores I have.

  6. laanwj commented at 6:29 pm on June 25, 2017: member

    I’m not sure I understand why using number of hyperthreads instead of number of cores causes issues.

    It might not cause issues for you personally on your specific hardware, but it is clear it caused issues for many people if you read the PRs/issues I linked above.

  7. fanquake force-pushed on Oct 12, 2017
  8. Use std::thread::hardware_concurrency, instead of Boost, to determine available cores 937bf4335b
  9. fanquake force-pushed on Dec 15, 2017
  10. kallewoof approved
  11. kallewoof commented at 9:01 am on February 22, 2018: member
    utACK 937bf4335bc58c443645dc29b8d7ceadc81e74e5
  12. TheBlueMatt commented at 3:41 pm on March 2, 2018: member
    Concept ACK.
  13. eklitzke commented at 5:13 pm on March 6, 2018: contributor

    utACK 937bf4335bc58c443645dc29b8d7ceadc81e74e5

    With respect to issues like #6358, the right way to handle this on Linux is to use SCHED_BATCH on the loadblk thread. I don’t know what scheduler options are available on other platforms. I will put up a PR for this, as it can be done separately anyway.
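
    For illustration, `SCHED_BATCH` can be applied per-thread on Linux via `pthread_setschedparam`; a minimal sketch (the function name here is hypothetical, and this is not necessarily how the follow-up PR implements it):

        #include <pthread.h>
        #include <sched.h>
        #include <cstring>

        // Linux-only hint that the calling thread is a non-interactive batch
        // workload (e.g. loadblk). SCHED_BATCH requires a static priority of 0.
        // Returns 0 on success, an errno value otherwise.
        static int SetThreadBatchScheduling()
        {
        #ifdef SCHED_BATCH
            sched_param param;
            std::memset(&param, 0, sizeof(param));
            param.sched_priority = 0;
            return pthread_setschedparam(pthread_self(), SCHED_BATCH, &param);
        #else
            return 0; // scheduler hint not available on this platform
        #endif
        }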

  14. laanwj merged this on Mar 6, 2018
  15. laanwj closed this on Mar 6, 2018

  16. laanwj referenced this in commit bc679829e2 on Mar 6, 2018
  17. fanquake moved this from the "In progress" to the "Done" column in a project

  18. laanwj referenced this in commit becd8dd2ec on Apr 7, 2018
  19. fanquake deleted the branch on Jan 22, 2020
  20. deadalnix referenced this in commit 5e4e672619 on Apr 26, 2020
  21. furszy referenced this in commit 4ed15cc69d on Jun 8, 2020
  22. PastaPastaPasta referenced this in commit 098ccb3aaa on Jun 10, 2020
  23. PastaPastaPasta referenced this in commit a12c3950ac on Jun 12, 2020
  24. PastaPastaPasta referenced this in commit d40a7323cb on Jun 13, 2020
  25. PastaPastaPasta referenced this in commit 1a40ff1e6e on Jun 14, 2020
  26. PastaPastaPasta referenced this in commit a00da7fb0e on Jun 14, 2020
  27. ftrader referenced this in commit 2cc34da8d7 on Aug 17, 2020
  28. PastaPastaPasta referenced this in commit 1f46de1973 on Dec 16, 2020
  29. PastaPastaPasta referenced this in commit 02edd7066e on Dec 18, 2020
  30. DrahtBot locked this on Feb 15, 2022
