With Bitcoin core v23 (Qt versin on Windows, or with the daemon from commandline), the IBD process can stall indefinitely after receiving a large block:
02022-08-07T13:24:50Z [net.cpp:3043] [CNode] Added connection to 104.248.143.83:8333 peer=0
12022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending version (102 bytes) peer=0
22022-08-07T13:24:50Z [net_processing.cpp:1146] [PushNodeVersion] send version message: version 70016, blocks=239242, them=104.248.143.83:8333, txrelay=1, peer=0
32022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: version (102 bytes) peer=0
42022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending verack (0 bytes) peer=0
52022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending getaddr (0 bytes) peer=0
62022-08-07T13:24:50Z [net_processing.cpp:2754] [ProcessMessage] receive version message: /Satoshi:0.20.1/: version 70015, blocks=748403, us=104.248.143.83:8333, txrelay=1, peer=0, peeraddr=104.248.143.83:8333
72022-08-07T13:24:50Z [timedata.cpp:57] [AddTimeData] added time data, samples 2, offset -1 (+0 minutes)
82022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: verack (0 bytes) peer=0
92022-08-07T13:24:50Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70015, blocks=748403, peer=0, peeraddr=104.248.143.83:8333 (manual)
102022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending sendheaders (0 bytes) peer=0
112022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending sendcmpct (9 bytes) peer=0
122022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending sendcmpct (9 bytes) peer=0
132022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending ping (8 bytes) peer=0
142022-08-07T13:24:50Z [net_processing.cpp:4650] [SendMessages] initial getheaders (319216) to peer=0 (startheight:748403)
152022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending getheaders (997 bytes) peer=0
162022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending feefilter (8 bytes) peer=0
172022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: sendheaders (0 bytes) peer=0
182022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: sendcmpct (9 bytes) peer=0
192022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: sendcmpct (9 bytes) peer=0
202022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: ping (8 bytes) peer=0
212022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending pong (8 bytes) peer=0
222022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: getheaders (1029 bytes) peer=0
232022-08-07T13:24:50Z [net_processing.cpp:3223] [ProcessMessage] Ignoring getheaders from peer=0 because node is in initial block download
242022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: feefilter (8 bytes) peer=0
252022-08-07T13:24:50Z [net_processing.cpp:4051] [ProcessMessage] received: feefilter of 0.00001000 BTC/kvB from peer=0
262022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: pong (8 bytes) peer=0
272022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: headers (162003 bytes) peer=0
Then any other attemps to connect to other peers (or accept incoming connections) stalls permanently (we can waitfior hours: no network activity, no disk activity, but some thread uses 100% of a CPU core, probably in a tight loop generated by a deadlock and infinite attemps to recover from it).
If we try to shutdown, al most all databases thread are closed, the RPC services are closed, But the conenction manager remains locked for about ~15 minutes, not doing anything.
In fact the received headers are not even starting to be parse until AFTER these ~15 minutes. At which time we see that in logs:
(note: I’ve enabled logtimestamps=1
, logsourcelocations=1
and logthreadnames=1
in bitcoin.conf to have more traceable traces; and run it with debug=all
).
It seems that a lock is kept somewhere in “net_processing.cpp” after the ProcessMessage() exists with “Ignoring getheaders from peer=0 because node is in initial block download”
Only when we shut down the node we see instantly:
02022-08-07T13:26:17Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: requestShutdown : Requesting shutdown
12022-08-07T13:26:17Z [rpc/server.cpp:302] [operator()] Interrupting RPC
22022-08-07T13:26:17Z [rpc/server.cpp:314] [operator()] Stopping RPC
32022-08-07T13:26:17Z [init.cpp:371] [OnRPCStopped] RPC stopped.
42022-08-07T13:26:17Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: Running Shutdown in thread
52022-08-07T13:26:17Z [httpserver.cpp:436] [InterruptHTTPServer] Interrupting HTTP server
62022-08-07T13:26:17Z [httprpc.cpp:312] [InterruptHTTPRPC] Interrupting HTTP RPC server
72022-08-07T13:26:17Z [init.cpp:205] [Shutdown] Shutdown: In progress...
82022-08-07T13:26:17Z [httprpc.cpp:317] [StopHTTPRPC] Stopping HTTP RPC server
92022-08-07T13:26:17Z [httpserver.cpp:657] [UnregisterHTTPHandler] Unregistering HTTP handler for / (exactmatch 1)
102022-08-07T13:26:17Z [httpserver.cpp:657] [UnregisterHTTPHandler] Unregistering HTTP handler for /wallet/ (exactmatch 0)
112022-08-07T13:26:17Z [httpserver.cpp:448] [StopHTTPServer] Stopping HTTP server
122022-08-07T13:26:17Z [httpserver.cpp:450] [StopHTTPServer] Waiting for HTTP worker threads to exit
132022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] opencon thread exit
142022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] dnsseed thread exit
152022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] addcon thread exit
162022-08-07T13:26:17Z [httpserver.cpp:291] [ThreadHTTP] Exited http event loop
172022-08-07T13:26:17Z [httpserver.cpp:463] [StopHTTPServer] Waiting for HTTP event thread to exit
182022-08-07T13:26:17Z [httpserver.cpp:475] [StopHTTPServer] Stopped HTTP server
192022-08-07T13:26:17Z [mapport.cpp:210] [ProcessUpnp] UPNP_DeletePortMapping() returned: 0
202022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] mapport thread exit
And then we wait for about 15-20 minutes to finally see this: the received block headers are processed, an attempt for sending another requests for headers is emitted, but all peers are finally disconnected (because we were shutting down), and all remaining data in moemory is flushed to disk and the node finally exits correctly.
02022-08-07T13:43:47Z [validation.cpp:3606] [ProcessNewBlockHeaders] Synchronizing blockheaders, height: 321216 (~43.64%)
12022-08-07T13:43:47Z [net_processing.cpp:2198] [ProcessHeadersMessage] more getheaders (321216) to end to peer=0 (startheight:748403)
22022-08-07T13:43:47Z [net.cpp:1262] [CreateNodeFromAcceptedSocket] connection from [2604:d500:4:1::2]:63949 accepted
32022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] net thread exit
42022-08-07T13:43:47Z [net.cpp:3060] [PushMessage] sending getheaders (997 bytes) peer=0
52022-08-07T13:43:47Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=txindex, before=0.0MiB, after=0.0MiB
62022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] msghand thread exit
72022-08-07T13:43:47Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=txindex, before=0.0MiB, after=0.0MiB
82022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] txindex thread exit
92022-08-07T13:43:47Z [random.cpp:520] [SeedPeriodic] Feeding 1009016 bytes of dynamic environment data into RNG
102022-08-07T13:43:47Z [net.cpp:1846] [DumpAddresses] Flushed 6626 addresses to peers.dat 701ms
112022-08-07T13:43:47Z [net.cpp:567] [CloseSocketDisconnect] disconnecting peer=0
122022-08-07T13:43:47Z [net_processing.cpp:1282] [FinalizeNode] Cleared nodestate for peer=0
132022-08-07T13:43:47Z [net.cpp:567] [CloseSocketDisconnect] disconnecting peer=1
142022-08-07T13:43:47Z [net_processing.cpp:1282] [FinalizeNode] Cleared nodestate for peer=1
152022-08-07T13:43:47Z [net.cpp:1846] [DumpAddresses] Flushed 6626 addresses to peers.dat 15ms
162022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] scheduler thread exit
172022-08-07T13:43:47Z [validation.cpp:4644] [DumpMempool] Writing 0 unbroadcast transactions to disk.
182022-08-07T13:43:47Z [validation.cpp:4654] [DumpMempool] Dumped mempool: 0s to copy, 0.036892s to dump
192022-08-07T13:43:47Z [policy/fees.cpp:998] [FlushUnconfirmed] Recorded 0 unconfirmed txs from mempool in 0s
202022-08-07T13:43:47Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk started
212022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk completed (38.02ms)
222022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk started
232022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=index, before=0.0MiB, after=0.3MiB
242022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk completed (36.98ms)
252022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) started
262022-08-07T13:43:48Z [txdb.cpp:161] [BatchWrite] Writing final batch of 0.00 MiB
272022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB
282022-08-07T13:43:48Z [txdb.cpp:163] [BatchWrite] Committed 0 changed transaction outputs (out of 0) to coin database...
292022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) completed (0.00ms)
302022-08-07T13:43:48Z [validationinterface.cpp:243] [ChainStateFlushed] Enqueuing ChainStateFlushed: block hash=000000000000012a873661c59cd6301ace5a9108786284693eb0e35038713f4d
312022-08-07T13:43:48Z [validationinterface.cpp:243] [operator()] ChainStateFlushed: block hash=000000000000012a873661c59cd6301ace5a9108786284693eb0e35038713f4d
322022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=db, before=0.0MiB, after=0.0MiB
332022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=db, before=0.0MiB, after=0.0MiB
342022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk started
352022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk completed (31.27ms)
362022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk started
372022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=index, before=0.3MiB, after=0.3MiB
382022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk completed (15.63ms)
392022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) started
402022-08-07T13:43:48Z [txdb.cpp:161] [BatchWrite] Writing final batch of 0.00 MiB
412022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB
422022-08-07T13:43:48Z [txdb.cpp:163] [BatchWrite] Committed 0 changed transaction outputs (out of 0) to coin database...
432022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) completed (0.00ms)
442022-08-07T13:43:48Z [validationinterface.cpp:243] [ChainStateFlushed] Enqueuing ChainStateFlushed: block hash=000000000000012a873661c59cd6301ace5a9108786284693eb0e35038713f4d
452022-08-07T13:43:48Z [init.cpp:322] [Shutdown] Shutdown: done
462022-08-07T13:43:48Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: Shutdown finished
472022-08-07T13:43:48Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: ~InitExecutor : Stopping thread
482022-08-07T13:43:48Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: ~InitExecutor : Stopped thread
If we relaunch the node, it loads all data successfully, connects to a new peer=0 and requests another set of headers, then stalls again exactly as above, and we have to shutdown and wait 15-20 minutes exactly the same way as above. A single connection is made instantly to download a single nex set of headers, and then stalls forever. Other connections are added to the list (the UPnP discovery occurs, other peers are also discovered), but will never start. These extra connections are locked again until….
We shutdown the UI and then wait for 15-20 minutes when the RPC services are closed. These pending connections are then open and closed immediately and the UI closes.
Result: we have been able to download a single set of headers at each run (less than 1 minute to start and then seeing the lock after the 1st download of headers, just about 180KB, then no network or disk activity at all dor an infinite time, then shutdown and wait 15-20 minutes).
Expected behavior
IBD should not stall. It should also connect to other peers, and not just to the initial one (the QT interface dispays constantly: “connecting to peers…”. System information
Bitcoin v23 for Windows (from https://bitcoincore.org/en/download/, extracted from the ZIP file, because the current EXE installer does not run on a Windows system enforcing ASLT)
Windows 11 x64 (Intel Core i7), 32 GB RAM