Functional test suite bottlenecks

Sjors commented at 2:47 pm on August 14, 2019: member

After applying #15363, and configuring without debug, when I run the functional test suite on a 3 GB RAM disk with --jobs=100 --cachedir=/Volumes/RAMDisk/cache --tmpdir=/Volumes/RAMDisk/tmp:

CPU usage hovers between 50 and 80% for the first 45 seconds and then drops.
RAM usage peaks at about 12 GB in the same time period and then drops.
test suite takes about 120 seconds

The above suggests that the total run time is determined by a handful of slow running tests:

wallet_backup.py: ~~120s~~ 58s
feature_block.py: ~~115s~~ 85s
wallet_address_types.py: ~~102s~~ 48s
feature_maxuploadtarget.py: 85s
p2p_segwit.py: ~~83s~~ 60s
p2p_tx_download.py: 84s
mining_getblocktemplate_longpoll.py: 70s
feature_fee_estimation.py: ~~68s~~ 41s

Getting the above tests to run in under 60 seconds should therefore cut total run time in half for me. Whether this benefits slower systems depends on the reason for the bottleneck.

A bit more detail about the slow running tests:

wallet_backup.py

“Creating transactions” takes almost a minute. Backup is nearly instant. “More transactions” takes another 50 seconds. The various restores take 22 seconds in total.

The test could be made faster by:

generating fewer transactions
put pre-generated the wallets, use extended test suite to verify their contents. However, it’s useful to check that the backup feature actually works for a freshly generated wallet.
figuring out why 30 transactions are so slow. I tried setting wait=0.1 for both sync_mempool and sync_blocks but that didn’t help.

Fixed by #17121

feature_block.py

When running this test individually it took only 25 seconds (in contrast to the wallet_backup where running individually didn’t make much difference). The slowest checks are Reject a block with too many P2SH sigops (7s)

This test could be split (in two). This requires moving some helper code into the test framework, as well as studying the sequence of checks for inter-dependencies.

See #16613 (comment)

wallet_address_types.py

Runs in 58 seconds individually.

This test seems trivial to split. But like wallet_backup.py it may be worth investigating why a just a few dozen transactions take so long.

Fixed by #17124

feature_maxuploadtarget.py

Runs in 1 minute individually, could perhaps be split in two.

p2p_segwit.py

TODO

p2p_tx_download.py

TODO

mining_getblocktemplate_longpoll.py

TODO

feature_fee_estimation.py

Fixed by #16535

fanquake added the label Tests on Aug 15, 2019

MarcoFalke added the label good first issue on Oct 12, 2019

MarcoFalke commented at 1:18 pm on October 12, 2019: member

p2p_tx_download.py

This is supposed to run 1 minute in real time. I didn’t bother to (or could not) change the tx fallback timeout to mocktime.

MarcoFalke commented at 1:19 pm on October 12, 2019: member

feature_fee_estimation.py

This has been fixed by #16535. The same could be applied to other tx-relay tests with a lot of txs.

theStack commented at 0:59 am on October 13, 2019: member

wallet_backup.py

The majority of the time for the “Creating transactions” / “More transactions” parts is spent in sync_mempools() (called 5 times each), i.e. the bottleneck is in relaying transactions. By whitelisting the peers (via -whitelist, like in #16535), the inventory is transmissioned immediately rather than on average every 5 seconds (see https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L105), speeding up the test by at least a factor of two:

0$ time ./wallet_backup.py
1real	2m2.523s
2user	0m6.093s
3sys	0m2.454s

0$ time ./wallet_backup_with_whitelist.py
1real	0m36.570s
2user	0m5.365s
3sys	0m1.696s

theStack referenced this in commit 581c9be0d8 on Oct 13, 2019

fanquake referenced this in commit e82f874a23 on Oct 13, 2019

theStack referenced this in commit 0d2a94c0b1 on Oct 13, 2019

theStack commented at 7:10 pm on October 13, 2019: member

wallet_address_types.py

Same as for feature_fee_estimation.py and wallet_backup.py, this one can be sped up significantly (>3x on my machine) by whitelisting peers: #17124 As far as I can see, there is no slow running test (>60 sec.) left now for which the bottleneck is mempool synchronization and a whitelisting could help.

theStack referenced this in commit fba4baa4fa on Oct 13, 2019

Sjors commented at 8:31 pm on October 13, 2019: member

Thanks for all the improvement PRs @theStack! I updated the description above to link to them. I’ll run my initial benchmark again when a few more have landed.

MarcoFalke referenced this in commit b33c03b0cb on Oct 14, 2019

sidhujag referenced this in commit 3c44d686b5 on Oct 15, 2019

sidhujag referenced this in commit 3c332fbde4 on Oct 15, 2019

theStack commented at 2:52 pm on October 15, 2019: member

Thanks for all the improvement PRs @theStack! I updated the description above to link to them. I’ll run my initial benchmark again when a few more have landed.

You’re welcome – working on those tasks provided me a great way to learn more about the test systems as well as about the internals of bitcoind and I recommend them to every new contributor (like me) trying to get one’s feet wet with Bitcoin Core development. Definitely deserves the label “good first issue”. :+1:

feature_block.py

On my machine most of the time is spent for the check Test a re-org of one week's worth of blocks (1088 blocks) (57.64sec / 90.94 sec, or >60% of total time). Considering that a relatively large total block data amount over 1 GB (1088 full blocks, hence 1088 MB) is created and sent to bitcoind, the CPU is mostly occupied with hashing, signing and waiting for bitcoind to respond, as shown by profiler:

 0$ python3 -m cProfile -s time feature_block.py
 1         11216456 function calls (10853148 primitive calls) in 87.749 seconds
 2
 3   Ordered by: internal time
 4
 5   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 6     1041   52.693    0.051   52.693    0.051 {built-in method time.sleep}
 7  1152256    8.820    0.000    8.820    0.000 key.py:112(double)
 8   572356    5.840    0.000    5.840    0.000 key.py:132(add_mixed)
 9   162592    4.093    0.000    4.093    0.000 {built-in method _hashlib.new}
10      350    2.126    0.006   54.649    0.156 util.py:206(wait_until)
11   105746    1.359    0.000    1.804    0.000 messages.py:338(serialize)
12151641/115735    0.979    0.000    4.257    0.000 messages.py:136(ser_vector)
13     4501    0.949    0.000   15.963    0.004 key.py:202(mul)
14   143473    0.784    0.000    1.000    0.000 messages.py:101(ser_uint256)
15     9001    0.711    0.000    0.711    0.000 key.py:11(modinv)

So I guess there is not much room for a performance improvement without changing parameters (like e.g. number of blocks to re-org).

MarcoFalke commented at 5:02 pm on November 1, 2019: member

Anything left to do here?

MarcoFalke removed the label good first issue on Nov 1, 2019

Sjors commented at 9:03 pm on November 1, 2019: member

I updated the measurements in the OP. Several tests are still > 60 seconds, e.g. feature_block.py. It seems non trivial to improve that, but not hopeless enough to close the issue either?

jnewbery commented at 9:10 pm on November 1, 2019: member

I think the best thing that can be done for feature_block is to split the week-long-reorg test into its own test file. That at least makes the test more tractable and easier to work with.

theStack commented at 9:06 pm on November 4, 2019: member

Now that several functional tests have been optimized recently, would it probably a good opportunity to update the order of the test script list (ordered descending by running time)? https://github.com/bitcoin/bitcoin/blob/bc38bb9a6036d73f507e66a2f588547f3f17e4b1/test/functional/test_runner.py#L76-L218 Also the comments don’t seem to be accurate anymore, “Tests less than 5m”/“Tests less than 2m” can probably be substitued by “Tests less than 2m”/“Tests less than 1m”.

jnewbery commented at 9:57 pm on November 4, 2019: member

would it probably a good opportunity to update the order of the test script list

Concept ACK. Seems reasonable to do this every so often.

markblundeberg commented at 5:57 am on February 7, 2020: none

p2p_feefilter also can be made 10x faster by:

whitelisting to remove relay delays.
changing the sleep timer from 1 to 0.1 seconds.

(The test time is dominated by waiting for invs from nodes, which are subject to the trickle relay mechanism.)

I’m not sure if that is totally desired though, is there any chance that whitelisting will modify fee filter mechanism and cause an interaction?

In fact that test can rarely fail with a ~ 1-in-30000 chance because there is no upper bound on the trickle relay time but the test only waits 60 seconds. With a tx that is created on node1 and related node1 -> node0 -> mininode, the mean time to propagate is ~ 6.7 seconds (node1 and node0 are bidirectionally connected but mininode is only inbound). We actually got lucky and had such a failure recently in ABC. :D

Sjors closed this on Jan 6, 2021

PastaPastaPasta referenced this in commit 2b1b020dd2 on Dec 22, 2021

PastaPastaPasta referenced this in commit 5201f45782 on Dec 22, 2021

DrahtBot locked this on Aug 18, 2022

Functional test suite bottlenecks #16613

wallet_backup.py

feature_block.py

wallet_address_types.py

feature_maxuploadtarget.py

p2p_segwit.py

p2p_tx_download.py

mining_getblocktemplate_longpoll.py

feature_fee_estimation.py