[IBD] Raspberry Pi: 90% CPU time for 1.5% of block processing

l0rinc commented at 9:37 am on June 30, 2025: contributor

There are numerous reports about how slow syncing on Raspberry Pi is, a few recent ones from different platforms:

#32455 (comment)

I’m running a Raspberry Pi 5 with 8GB of RAM and a 4TB SSD on RPi’s Debian Lite Bookworm. This is a fresh install of v29 and after more than two weeks I’m only at blockheight of 829564 of 901531.

https://stacker.news/items/978269

IBD now, I am sitting at 7 days and only 78% synced. The progress has whittled down to only ~1-2% per day. At this rate I think it will take me another 2-3 weeks to finish initial sync. […] I’m running bitcoin core 29.0 on 64bit Raspberry Pi OS (debian bookworm)

https://bitcoin.stackexchange.com/questions/125178/first-synchronized-has-been-running-for-7-days-and-is-at-82

have a raspberry pi 5 and the sync has been running for over a week and is only at 82%

https://www.reddit.com/r/getumbrel/comments/1chqna1/bitcoin_node_too_slow_to_sync

i have setup a raspberry pi 5 with umbrel and installed the bitcoin node and it is downloading the blockchain for more then 1 and half week. Its very very slow when it reached 91%.

https://community.umbrel.com/t/extremely-slow-initial-blockchain-sync-1-a-day/2777

download is VERY slow on my setup, like 1% a day

As part of #32043 I’m measuring IBD and -reindex performance regularly. Displaying the resulting stack as flames (where the call graph of the methods is represented as the callee at the bottom and the final method at the top, deduplicating the parent methods, resulting in multiple similar leaves based on the call paths) indicates which areas of the code take most time:

On high-end devices we see that multithreaded script validation is a minor part of IBD (partially because of more threads, but the total is also a lot less dominant):

Recently I got access to several lower-end Raspberry Pi 4 devices. IBD took surprisingly long indeed, and the resulting flame graphs are quite surprising:

Even the hidden ConnectTip looks quite different… even though it seems we do have SHANI optimizations that I first suspected as a possible culprit:

(I’ll add the details of the server later, but currently my Pi servers are frozen, they’re very sensitive creatures)

Doing an inverse flame to show the own time (where the final methods are at the top and their different callers are below, deduplicating the final methods, resulting in duplicated parents) is also very revealing:

Hetzner I7/i9 server:

Corresponding to:

 0=== Top 20 functions by self-time ===
 124,367,833,683,883 (12.53%) Lloop1_30
 216,614,495,487,954 ( 8.54%) secp256k1_fe_mul_inner
 39,671,690,320,529 ( 4.97%) std::_Hashtable<COutPoint, std::pair<COutPoint const, CCoinsCacheEntry>, Pool...
 48,048,128,525,665 ( 4.14%) Lloop2_30
 57,479,154,097,547 ( 3.85%) secp256k1_gej_double
 64,875,602,514,443 ( 2.51%) SipHashUint256Extra
 74,848,058,903,029 ( 2.49%) leveldb::(anonymous namespace)::BloomFilterPolicy::CreateFilter
 84,794,743,071,310 ( 2.47%) __memcmp_avx2_movbe
 94,683,317,778,516 ( 2.41%) __memmove_avx_unaligned_erms
104,469,407,701,360 ( 2.30%) _int_malloc
113,409,056,546,109 ( 1.75%) CSHA256::Write
123,135,255,483,667 ( 1.61%) leveldb::SkipList<char const*, leveldb::MemTable::KeyComparator>::FindGreater...
132,957,747,949,622 ( 1.52%) secp256k1_fe_sqrt
142,473,661,676,032 ( 1.27%) leveldb::BlockBuilder::Add
152,432,229,937,337 ( 1.25%) AutoFile::write_buffer
162,309,378,132,280 ( 1.19%) _int_free
172,272,582,522,437 ( 1.17%) malloc
182,140,700,000,760 ( 1.10%) leveldb::MemTable::KeyComparator::operator
192,088,998,156,937 ( 1.07%) AutoFile::detail_fread
202,033,602,002,215 ( 1.05%) leveldb::InternalKeyComparator::Compare

Pi - showing very heavy signature validation:

I.e. almost 90% of the time is spent in signature validation even though only 1.5% of the blocks had them ((900000-886157)/900000=1.5%)

 0=== Top 20 functions by self-time ===
 145,200,219,094,553 (49.07%) secp256k1_fe_mul_inner
 217,059,129,066,358 (18.52%) secp256k1_gej_double
 34,958,246,221,561 ( 5.38%) secp256k1_fe_sqrt
 44,537,566,553,791 ( 4.93%) secp256k1_gej_add_ge_var
 52,679,543,872,014 ( 2.91%) secp256k1_fe_sqr_inner
 61,776,534,490,509 ( 1.93%) CSHA256::Write
 71,423,064,476,505 ( 1.54%) secp256k1_ecmult_strauss_wnaf.constprop.0
 81,406,501,260,817 ( 1.53%) sha256_arm_shani::Transform
 91,160,517,562,371 ( 1.26%) __memcpy_generic
101,087,798,904,524 ( 1.18%) secp256k1_ge_from_storage
11967,511,395,459 ( 1.05%) secp256k1_gej_add_zinv_var
12837,850,817,824 ( 0.91%) secp256k1_modinv64_var
13613,159,858,561 ( 0.67%) cfree@GLIBC_2.17
14490,119,900,000 ( 0.53%) secp256k1_ecmult_wnaf.constprop.0.isra.0
15474,344,476,011 ( 0.51%) uint256 SignatureHash<CTransaction>
16462,336,566,142 ( 0.50%) SignatureCache::Get
17455,625,547,742 ( 0.49%) GetScriptOp
18393,204,249,970 ( 0.43%) MurmurHash3
19387,212,664,603 ( 0.42%) CRollingBloomFilter::insert
20375,980,794,732 ( 0.41%) secp256k1_modinv64_update_de_62.isra.0

(it’s possible I ran the two measurements with slightly different configs since it took weeks to convince the Pi not to overheat or OOM in the middle of measurements. I think the findings are still enough to continue the investigation)

Zooming in the signature verification doesn’t immediately reveal any obvious reason for the huge performance difference:

Hetzner:

Pi:

I’ll investigate whether there’s any room for obvious optimizations for Pi in https://github.com/bitcoin-core/secp256k1 - any other questions or suggestions are welcome.

maflcko commented at 10:35 am on June 30, 2025: member

((900000-886157)/900000=1.5%)

Could clarify in the title or in the text that this is for blocks after assumevalid?

maflcko added the label Resource usage on Jun 30, 2025

l0rinc commented at 1:26 pm on June 30, 2025: contributor

Could clarify in the title or in the text that this is for blocks after assumevalid?

Did I miscalculate it? Since the benchmark was running until 900k blocks of which only 13843 blocks needed script validation (unless I ran into #31494), isn’t that 13843/900000=0.0153811111 i.e. 1.5% of the blocks had any script validation, yet almost 90% of the time was spent there?

[IBD] Raspberry Pi: 90% CPU time for 1.5% of block processing #32832