Apple Silicon massive performance degredation in function tests / rpc (m1) #24120

issue PastaPastaPasta openend this issue on January 21, 2022
  1. PastaPastaPasta commented at 7:53 am on January 21, 2022: contributor

    I discovered this significant slowdown when functional tests took substancially longer than expected on my new M1 macbook pro. (completes in ~250 seconds on my 1950x, and 3000 seconds on m1)

    To validate this issue, I deployed two new regtest networks (both compiled locally), one on an Amd 1950x CPU running ubuntu 21.10, and one on a 16 in m1 macbook pro 10 core.

    I found that the m1 macbook had substantially better results in benchmarks compared to x86, ~32% faster in the AssembleBlock benchmark for example. However, the functional tests were very slow, and RPC was very slow. It is my current believe that there is some issues in RPC handling / code that is very heavily slowing down the functional tests.

    I compared some rpc calls in m1 and x86.

    I found m1 rpc calls were around 10x slower.

    ~For example, getblockchaininfo took 0.5 seconds on m1, 0.003 seconds on x86.~ I’ve not been able to replicate this specific issue now.. Non-db / disk involved rpc calls seem mostly comparable between the two systems. Generating 100 blocks took 23 seconds on m1, and 2.4 seconds on x86.

    If someone has any ideas as to why this is so slow, or how to fix it, I am all ears!

    Also, if someone on mac (non-m1) could replicate this test and post their information (so we can compare m1 mac to non-m1 mac) that’d be highly appreciated.

    If other developers using m1 could share their experiences, that’d be helpful.

  2. PastaPastaPasta added the label Bug on Jan 21, 2022
  3. MarcoFalke added the label macOS on Jan 21, 2022
  4. PastaPastaPasta commented at 12:43 pm on January 22, 2022: contributor
    @sipa would you be able to attempt to replicate this on your non-m1 arm machine? @prusnak would you be able to attempt to replicate this on your m1 machine?
  5. MarcoFalke commented at 12:52 pm on January 22, 2022: member
    Can you debug where the time is spent? There is certain “anti virus” software that slows down RPC connections.
  6. sipa commented at 1:52 pm on January 22, 2022: member
    @PastaPastaPasta I’m afraid that any comparison will be difficult. The machine I have is a 24-core Cortex-A53 (an architecture designed in 2012) running at 1 GHz, with 4 GiB RAM. In terms of single-threaded performance, even a RPi 3 is faster.
  7. PastaPastaPasta commented at 2:47 pm on January 22, 2022: contributor

    Can you debug where the time is spent? There is certain “anti virus” software that slows down RPC connections. @MarcoFalke this is a brand new M1 install, so no custom AV or anything like that has been installed.

    See attached screen shot for high level profiling. It seems like 80% of this thread (which is a rpc worker that did exclusively getnewaddress) was spent in wallet::DatabaseBatch::Write. The vast majority of this time ended up being spent in libsystem_kernel.dylib`__fcntl

    Notes for replication, start bitcoind –regtest and attach profiler to it. Then, repeatidely run getnewaddress I used a bash script

    0while True
    1do
    2    time ./src/bitcoin-cli --regtest getnewaddress
    3    sleep 0
    4done
    

    I did this for about a minute.

  8. MarcoFalke commented at 2:48 pm on January 22, 2022: member
    Are you using an external hard drive? I presume RPCs that don’t write to disk are faster/immediate?
  9. PastaPastaPasta commented at 2:49 pm on January 22, 2022: contributor
    Nope, everything is on the internal ssd
  10. PastaPastaPasta commented at 3:29 pm on January 22, 2022: contributor

    Are you using an external hard drive? I presume RPCs that don’t write to disk are faster/immediate?

    Even non-writing rpc calls are slightly slower than I’d expect.

    getblockchaininfo takes .002 seconds on my amd 1950x machine, ~0.01 seconds on m1, or about 5x slower (not sure if there’s a difference between mac time and unix time commands)

    listtransaction "*" 1000 in a wallet that generated 1310 blocks takes 0.055s on amd ~0.06s on m1

    It seems like the issue is in disk / database involving rpc calls

  11. brunoerg commented at 10:13 pm on January 22, 2022: contributor
    I have a Macbook Pro (16GB + Intel arch) and a Macbook Air M1, I will compare the results and post here.
  12. brunoerg commented at 11:30 am on January 24, 2022: contributor
    Tested some functional tests here (wallet_listtransactions and rpc_blockchain) and both executes faster on M1. 60% faster.
  13. MarcoFalke commented at 11:38 am on January 24, 2022: member
    @PastaPastaPasta It might be worth to check your ssd for defects and performance
  14. MarcoFalke removed the label Bug on Jan 24, 2022
  15. MarcoFalke added the label Questions and Help on Jan 24, 2022
  16. PastaPastaPasta commented at 1:19 pm on January 24, 2022: contributor

    @brunoerg

    Tested some functional tests here (wallet_listtransactions and rpc_blockchain) and both executes faster on M1. 60% faster.

    Did you try anything involving wallet? getnewaddress or loadwallet or generatetoaddress?

    These are where the slowdowns are

    EDIT: I see know your just talking about the functional tests, I’ve added a comment below with my speeds on both. I will investigate potential ssd issues

  17. PastaPastaPasta commented at 1:25 pm on January 24, 2022: contributor

    On m1 wallet_listtransactions took 31.126 seconds, rpc_blockchain took 9.715 seconds

    on ubuntu amd 1900x wallet_listtransactions took 23.176 seconds, rpc_blockchain took 6.880s

    EDIT: In regards to SSD defects, S.M.A.R.T. Status according to System Information is “Verified”

  18. PastaPastaPasta commented at 8:03 am on January 25, 2022: contributor
    @MarcoFalke I tried to benchmark the disk. Using Blackmagic Disk Speed Test makes me think there is no raw ssd issue
  19. PastaPastaPasta commented at 9:18 am on January 26, 2022: contributor

    Tested some functional tests here (wallet_listtransactions and rpc_blockchain) and both executes faster on M1. 60% faster. @brunoerg ~Could you try feature_block.py?~

  20. PastaPastaPasta commented at 3:41 pm on January 26, 2022: contributor

    I have used a second MacBookPro to validate this issue (courtesy of @Munkybooty), as well as a 2017 macbook air

    We found the brand new macbook pro to be 131% slower than the 5 year old entry level mac book air.

    We compared two systems.

    Machine 2021 Macbook Pro 14"
    OS MacOS Monterey
    CPU 8/14 Core M1 Pro
    Memory 32GB
    Machine 2017 Macbook Air
    OS MacOS Monterey
    CPU 1.8GHz Dual core i5
    Memory 8GB

    We ran test_runner.py single threaded on each machine. The macbook air took 2206 seconds. The m1 took 5107 seconds. Over 80 minutes!

    You can view the raw data here: https://docs.google.com/spreadsheets/d/1uIcgoIg0SVF5DMz0htVwOEsqTcZPleBxiWYg1iMi-Rw/edit#gid=1443924285

    As you can see, some of the tests run substantially faster on m1, however others run exponentially slower on M1, as of note mempool_package_onemore.py takes 3400% longer on m1 compared to a macbook air. @brunoerg Please try running mempool_package_onemore.py.

    I think it is clear that there is obviously a massive issue with certain functional tests on m1 based platforms.

    These problematic tests are

    1. mempool_package_onemore.py
    2. mempool_packages.py
    3. mempool_updatefromblock.py
    4. mining_prioritisetransaction.py
    5. wallet_startup.py
    6. wallet_taproot.py

    Each of which take 3x (or more) as long on m1 compared to a 2017 macbook air

  21. hebasto commented at 8:03 pm on January 26, 2022: member

    Indeed, result comparison looks interesting (medians of 3 runs):

    • a VirtualBox VM (2 cores) on an Intel host, macOS Monterey 12.2 (21D48):
    0% time ./test/functional/mempool_package_onemore.py
    12022-01-26T19:53:31.410000Z TestFramework (INFO): Initializing test directory /var/folders/9n/wb_bl1vs509d0dy850m7l9_m0000gn/T/bitcoin_func_test_8mxxh75y
    22022-01-26T19:53:52.802000Z TestFramework (INFO): Stopping nodes
    32022-01-26T19:53:53.016000Z TestFramework (INFO): Cleaning up /var/folders/9n/wb_bl1vs509d0dy850m7l9_m0000gn/T/bitcoin_func_test_8mxxh75y on exit
    42022-01-26T19:53:53.016000Z TestFramework (INFO): Tests successful
    5./test/functional/mempool_package_onemore.py  1.74s user 2.30s system 18% cpu 21.838 total
    
    • Mac mini (M1 2020) + macOS Monterey 12.1 (21C52):
    0% time ./test/functional/mempool_package_onemore.py
    12022-01-26T19:42:07.867000Z TestFramework (INFO): Initializing test directory /var/folders/4f/xd_qxxhx35569jp258s803_w0000gn/T/bitcoin_func_test_l99ehi5l
    22022-01-26T19:43:14.092000Z TestFramework (INFO): Stopping nodes
    32022-01-26T19:43:14.476000Z TestFramework (INFO): Cleaning up /var/folders/4f/xd_qxxhx35569jp258s803_w0000gn/T/bitcoin_func_test_l99ehi5l on exit
    42022-01-26T19:43:14.476000Z TestFramework (INFO): Tests successful
    5./test/functional/mempool_package_onemore.py  2.03s user 0.80s system 4% cpu 1:06.73 total
    
  22. Munkybooty commented at 10:43 pm on January 26, 2022: none

    The numbers we’ve gotten were from running them through test_runner.py thankfully the results are replicable via single test runs

    • 2017 Macbook Air, Intel macOS Monterey 12.0.1:
    0% time ./test/functional/mempool_package_onemore.py 
    12022-01-26T22:28:58.175000Z TestFramework (INFO): Initializing test directory /var/folders/pq/dx302qc9017285xl75wyp64m0000gn/T/bitcoin_func_test__6g3bi1h
    22022-01-26T22:29:00.028000Z TestFramework (INFO): Stopping nodes
    32022-01-26T22:29:00.239000Z TestFramework (INFO): Cleaning up /var/folders/pq/dx302qc9017285xl75wyp64m0000gn/T/bitcoin_func_test__6g3bi1h on exit
    42022-01-26T22:29:00.239000Z TestFramework (INFO): Tests successful
    5./test/functional/mempool_package_onemore.py  1.33s user 0.61s system 59% cpu 3.263 total
    
    • 2021 Macbook Pro 14", M1 Pro macOS Monterey 12.1:
    0time ./test/functional/mempool_package_onemore.py
    12022-01-26T22:33:18.019000Z TestFramework (INFO): Initializing test directory /var/folders/82/2fk9xtkn1k3b5kbhvv937p0h0000gn/T/bitcoin_func_test_5i4zs9d_
    22022-01-26T22:34:25.599000Z TestFramework (INFO): Stopping nodes
    32022-01-26T22:34:26.209000Z TestFramework (INFO): Cleaning up /var/folders/82/2fk9xtkn1k3b5kbhvv937p0h0000gn/T/bitcoin_func_test_5i4zs9d_ on exit
    42022-01-26T22:34:26.209000Z TestFramework (INFO): Tests successful
    5./test/functional/mempool_package_onemore.py  1.46s user 1.24s system 3% cpu 1:08.59 total
    
  23. PastaPastaPasta commented at 4:11 pm on March 7, 2022: contributor
    I have found that doing ./configure --without-bdb --with-sqlite resolves the performance issues, for anyone else that may want to run functional tests on m1 @hebasto @brunoerg
  24. bitcoin deleted a comment on Apr 27, 2022
  25. landabaso commented at 10:16 am on May 28, 2022: none

    Generating 100 blocks took 23 seconds on m1, and 2.4 seconds on x86.

    I might have hit a similar problem on Intel.

    I am experiencing x10 slower integration tests (on the regtest) for my project with v23 vs v22. I also mine the initial 101 blocks on each of my tests on the regtest. This can be reproduced using the binaries distributed at bitcoincore.org.

    I described the issue on stack exchange.

    I have found that doing ./configure –without-bdb –with-sqlite resolves the performance issues, for anyone else that may want to run functional tests on m1 @hebasto @brunoerg

    After reading your post, I compiled with:

    0./configure --enable-hardening --with-gui=no  --without-bdb --with-sqlite
    

    , but that did not solve the performance issues in my case.

    I am using macos Monterey on Intel. I also tested on BigSur and it was the same. So it was not OS-version specific either.

    I cannot run comparisons in Linux, though, but it should be pretty easy to replicate by anyone:

     0rm -rf /tmp/regtest1/bitcoind
     1mkdir -p /tmp/regtest1/bitcoind
     2
     3bitcoind_v22_vs_v23 \
     4  -datadir=/tmp/regtest1/bitcoind \
     5  -regtest \
     6  -server \
     7  -txindex \
     8  -fallbackfee=0.0002 \
     9  -zmqpubhashtx=tcp://127.0.0.1:30001 \
    10  -zmqpubhashblock=tcp://127.0.0.1:30001
    11
    12bitcoin-cli -regtest -datadir=/tmp/regtest1/bitcoind createwallet performance_tests
    13GEN_ADDRESS=$(bitcoin-cli -regtest -datadir=/tmp/regtest1/bitcoind getnewaddress)
    14bitcoin-cli -regtest -datadir=/tmp/regtest1/bitcoind generatetoaddress 101 $GEN_ADDRESS
    

    This takes less than 2 secs on an average Intel Mac computer (f.ex.: macbook pro mid 2015) using this binary. But it takes 17 secs when compiling v23 (or using this binary).

  26. adam2k commented at 6:15 pm on August 10, 2022: none

    I have found that doing ./configure --without-bdb --with-sqlite resolves the performance issues, for anyone else that may want to run functional tests on m1 @hebasto @brunoerg @PastaPastaPasta I’m trying with your configuration and it seems to work when running:

    0$ ./test/functional/test_runner.py
    

    I’m testing with the --extended option and the same configuration, there appears to be some flaky tests. When running independently they pass.

     0$ ./test/functional/test_runner.py --extended
     1
     2...
     3
     4feature_index_prune.py                                 | ✖ Failed  | 209 s
     5p2p_eviction.py                                        | ✖ Failed  | 0 s
     6p2p_leak_tx.py                                         | ✖ Failed  | 0 s
     7rpc_signmessagewithprivkey.py                          | ✖ Failed  | 0 s
     8wallet_signmessagewithaddress.py                       | ✖ Failed  | 0 s
     9
    10ALL                                                    | ✖ Failed  | 13299 s (accumulated)
    11Runtime: 5049 s
    

    Running only feature_index_prune.py

     0./test/functional/feature_index_prune.py
     12022-08-10T18:10:59.423000Z TestFramework (INFO): Initializing test directory /var/folders/sk/74ql9fy11fx_q0y09x6bld9r0000gn/T/bitcoin_func_test_ry9rnzgx
     22022-08-10T18:11:02.010000Z TestFramework (INFO): check if we can access blockfilters and coinstats when pruning is enabled but no blocks are actually pruned
     32022-08-10T18:11:07.239000Z TestFramework (INFO): prune some blocks
     42022-08-10T18:11:07.380000Z TestFramework (INFO): check if we can access the tips blockfilter and coinstats when we have pruned some blocks
     52022-08-10T18:11:07.507000Z TestFramework (INFO): check if we can access the blockfilter and coinstats of a pruned block
     62022-08-10T18:11:11.365000Z TestFramework (INFO): make sure trying to access the indices throws errors
     72022-08-10T18:11:15.786000Z TestFramework (INFO): prune exactly up to the indices best blocks while the indices are disabled
     82022-08-10T18:11:18.310000Z TestFramework (INFO): make sure that we can continue with the partially synced indices after having pruned up to the index height
     92022-08-10T18:11:22.160000Z TestFramework (INFO): prune further than the indices best blocks while the indices are disabled
    102022-08-10T18:11:32.704000Z TestFramework (INFO): make sure we get an init error when starting the nodes again with the indices
    112022-08-10T18:11:34.274000Z TestFramework (INFO): make sure the nodes start again with the indices and an additional -reindex arg
    122022-08-10T18:11:55.329000Z TestFramework (INFO): ensure that prune locks don't prevent indices from failing in a reorg scenario
    132022-08-10T18:11:56.519000Z TestFramework (INFO): Stopping nodes
    142022-08-10T18:11:57.241000Z TestFramework (INFO): Cleaning up /var/folders/sk/74ql9fy11fx_q0y09x6bld9r0000gn/T/bitcoin_func_test_ry9rnzgx on exit
    152022-08-10T18:11:57.241000Z TestFramework (INFO): Tests successful
    
  27. hebasto commented at 8:04 am on September 23, 2022: member
    Is it related to (or the same as) #25724?
  28. adam2k commented at 6:05 pm on September 23, 2022: none
    @hebasto yes, this issue appears to be related/the same as #25724. I think it’s safe to close this issue when #25724 is closed and we can come back and confirm this is no longer an issue.
  29. fanquake commented at 1:39 pm on September 24, 2022: member
    Closing this as a duplicate of #25724.
  30. fanquake closed this on Sep 24, 2022

  31. PastaPastaPasta commented at 2:10 pm on September 24, 2022: contributor
    I’m not currently convinced that this is a duplicate, however, I don’t care too much if this is closed. In the testing that I and others I talked with performed, this issue was purely an arm vs x86 issue. When compiling with default compilation configuration (which does not use sqlite), there were significant performance differences in certain rpcs. However, when enabling sqlite, the arm systems were much faster than x86.
  32. fanquake commented at 2:13 pm on September 24, 2022: member

    When compiling with default compilation configuration (which does not use sqlite),

    Sqlite will be used automatically if it’s available (which is always the case on macOS), it doesn’t need to be enabled specifically.

  33. PastaPastaPasta commented at 2:16 pm on September 24, 2022: contributor

    Hmm, that is news to me, but it still doesn’t fully make sense to me..

    (on arm ~linux~ Mac) without --enable-sqlite lead to slow performance with --enable-sqlite lead to fast performance

    can you help me understand that? when based on what you said there should be no difference between the two right?

  34. fanquake commented at 2:23 pm on September 24, 2022: member

    can you help me understand that?

    Not without more information. What branch are you building. Where are the config.logs etc.

    there should be no difference between the two right?

    Correct. On a system that has a sqlite lib available, ./configure & ./configure --enable-sqlite, you will get the same result.

  35. PastaPastaPasta commented at 3:13 pm on September 27, 2022: contributor

    Okay, so I reproduced this on latest master ./configure leads to slow behavior (mempool_packages.py took 1min) config.log https://gist.github.com/PastaPastaPasta/048279264b94a542e9b29bf6f9fea436 ./configure --enable-sqlite --without-bdb leads to fast behavior (7 seconds ^) config.log https://gist.github.com/PastaPastaPasta/dcabff2fbbc2ccfbc62c46aeea836384

    Does this still sound like the other issue to you? I’m not sure myself

  36. hebasto commented at 4:08 pm on September 27, 2022: member

    @PastaPastaPasta

    Okay, so I reproduced this on latest master ./configure leads to slow behavior (mempool_packages.py took 1min) config.log https://gist.github.com/PastaPastaPasta/048279264b94a542e9b29bf6f9fea436 ./configure --enable-sqlite --without-bdb leads to fast behavior (7 seconds ^) config.log https://gist.github.com/PastaPastaPasta/dcabff2fbbc2ccfbc62c46aeea836384

    Does this still sound like the other issue to you? I’m not sure myself

    Which commit or tag you compiled?

  37. fanquake commented at 5:08 pm on September 27, 2022: member

    Does this still sound like the other issue to you? I’m not sure myself

    --enable-sqlite makes no difference here. Only --without-bdb changes anything in the config.log.

    The test framework defaults to running with bdb wallets, if support for both wallet types is compiled, and no option (--descriptors or --legacy-wallet) is given when running the test.

    So the performance difference you’re seeing when configuring with --without-bdb is the test framework defaulting to using descriptor wallets, vs using bdb.

    You should see the same difference if you (after building with ./configure and having sqlite and bdb support compiled) run ./test/functional/mempool_packages.py --descriptors and ./test/functional/mempool_packages.py --legacy-wallet.

  38. PastaPastaPasta commented at 6:45 pm on September 29, 2022: contributor

    @PastaPastaPasta

    Okay, so I reproduced this on latest master ./configure leads to slow behavior (mempool_packages.py took 1min) config.log https://gist.github.com/PastaPastaPasta/048279264b94a542e9b29bf6f9fea436 ./configure --enable-sqlite --without-bdb leads to fast behavior (7 seconds ^) config.log https://gist.github.com/PastaPastaPasta/dcabff2fbbc2ccfbc62c46aeea836384 Does this still sound like the other issue to you? I’m not sure myself

    Which commit or tag you compiled?

    9fcdb9f3a044330d3d7515fa35709102c98534d2

  39. hebasto commented at 9:12 am on October 4, 2022: member

    @PastaPastaPasta

    Which commit or tag you compiled?

    9fcdb9f

    Tested on macOS 12.6 (21G115, M1):

     0% git checkout 9fcdb9f3a044330d3d7515fa35709102c98534d2
     1% ./autogen.sh
     2% ./configure --enable-sqlite --without-bdb
     3% make clean
     4% make
     5% time ./test/functional/mempool_packages.py                                                                          
     62022-10-04T09:06:33.875000Z TestFramework (INFO): Initializing test directory /var/folders/4f/xd_qxxhx35569jp258s803_w0000gn/T/bitcoin_func_test_4ut0063u
     72022-10-04T09:06:38.756000Z TestFramework (INFO): Stopping nodes
     82022-10-04T09:06:39.035000Z TestFramework (INFO): Cleaning up /var/folders/4f/xd_qxxhx35569jp258s803_w0000gn/T/bitcoin_func_test_4ut0063u on exit
     92022-10-04T09:06:39.035000Z TestFramework (INFO): Tests successful
    10./test/functional/mempool_packages.py  0.97s user 0.31s system 24% cpu 5.319 total
    
     0% ./configure --enable-sqlite --with-bdb
     1% make clean
     2% make
     3% time ./test/functional/mempool_packages.py                                                                       
     42022-10-04T09:08:26.434000Z TestFramework (INFO): Initializing test directory /var/folders/4f/xd_qxxhx35569jp258s803_w0000gn/T/bitcoin_func_test_68lfhida
     52022-10-04T09:09:22.387000Z TestFramework (INFO): Stopping nodes
     62022-10-04T09:09:23.098000Z TestFramework (INFO): Cleaning up /var/folders/4f/xd_qxxhx35569jp258s803_w0000gn/T/bitcoin_func_test_68lfhida on exit
     72022-10-04T09:09:23.098000Z TestFramework (INFO): Tests successful
     8./test/functional/mempool_packages.py  2.91s user 1.00s system 6% cpu 56.819 total
     9% time ./test/functional/mempool_packages.py --descriptors
    102022-10-04T09:09:45.151000Z TestFramework (INFO): Initializing test directory /var/folders/4f/xd_qxxhx35569jp258s803_w0000gn/T/bitcoin_func_test_myxq01wj
    112022-10-04T09:09:49.752000Z TestFramework (INFO): Stopping nodes
    122022-10-04T09:09:50.140000Z TestFramework (INFO): Cleaning up /var/folders/4f/xd_qxxhx35569jp258s803_w0000gn/T/bitcoin_func_test_myxq01wj on exit
    132022-10-04T09:09:50.140000Z TestFramework (INFO): Tests successful
    14./test/functional/mempool_packages.py --descriptors  0.98s user 0.28s system 24% cpu 5.113 total
    15% time ./test/functional/mempool_packages.py --legacy-wallet
    162022-10-04T09:10:33.537000Z TestFramework (INFO): Initializing test directory /var/folders/4f/xd_qxxhx35569jp258s803_w0000gn/T/bitcoin_func_test_89kgc1sw
    172022-10-04T09:11:28.517000Z TestFramework (INFO): Stopping nodes
    182022-10-04T09:11:29.279000Z TestFramework (INFO): Cleaning up /var/folders/4f/xd_qxxhx35569jp258s803_w0000gn/T/bitcoin_func_test_89kgc1sw on exit
    192022-10-04T09:11:29.279000Z TestFramework (INFO): Tests successful
    20./test/functional/mempool_packages.py --legacy-wallet  2.81s user 0.89s system 6% cpu 55.865 total
    

    Confirming @fanquake’s comment.

  40. bitcoin locked this on Oct 4, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me