Unable to stop bitcoin-qt. ThreadDNSAddressSeed hangs.

Nikolay-Po commented at 11:24 pm on August 17, 2019: none

The problem: Sometimes, usually after Internet connection disturbances, it is impossible to stop bitcoin-qt or daemon.

I’m expecting: That on close command bitcoin-qt in GUI or daemon mode will finish it’s operation in reasonable time and the database will contain the last synchronized blocks and is ready to continue the operation from it’s current state.

Actual behavior: Qt-interface is hanging on “Bitcoin Core is shutting down” window, daemon just remains as a process. None CPU or I/O load is created when the process stalls. None reads/writes to the storage, none percents of CPU load. Have confirmed by looking at htop and iotop terminals. Bitcoin Core will running normally after WAN connection change or temporary problems. Newer peers are connected both way, the traffic is OK, all seems OK. But it will not stop operation normally any more. Only one way to stop is to send SIGQUIT to the process. The worst thing is the loose of last Bitcoin Core session data. For example, if Bitcoin Core was running two weeks and was failed to stop correctly, at the next start it will continue only from two weeks ago state. The blocks need to be rescanned form HDD again. This taking rather long time. After shutdown problem it may be impossible to perform shutdown even if there were none problems with WAN. I had to restart the client several times to get normal close. After it closed correctly, Bitcoin Core operating normally and shutting down normally. Here is a log from unsuccessful shutdown:

 02019-08-17T21:13:34Z GUI: requestShutdown : Requesting shutdown
 12019-08-17T21:13:34Z GUI: shutdown : Running Shutdown in thread
 22019-08-17T21:13:34Z Interrupting HTTP server
 32019-08-17T21:13:34Z Interrupting HTTP RPC server
 42019-08-17T21:13:34Z Interrupting RPC
 52019-08-17T21:13:34Z Shutdown: In progress...
 62019-08-17T21:13:34Z Stopping HTTP RPC server
 72019-08-17T21:13:34Z addcon thread exit
 82019-08-17T21:13:34Z opencon thread exit
 92019-08-17T21:13:34Z Stopping RPC
102019-08-17T21:13:34Z Stopping HTTP server
112019-08-17T21:13:34Z Stopped HTTP server
122019-08-17T21:13:34Z BerkeleyEnvironment::Flush: [/media/nikolaypo/bitcoindb] Flush(false)
132019-08-17T21:13:34Z BerkeleyEnvironment::Flush: Flushing wallet.dat (refcount = 0)...
142019-08-17T21:13:34Z BerkeleyEnvironment::Flush: wallet.dat checkpoint
152019-08-17T21:13:34Z BerkeleyEnvironment::Flush: wallet.dat detach
162019-08-17T21:13:34Z net thread exit
172019-08-17T21:13:34Z msghand thread exit
182019-08-17T21:13:34Z BerkeleyEnvironment::Flush: wallet.dat closed
192019-08-17T21:13:34Z BerkeleyEnvironment::Flush: Flush(false) took              86ms
202019-08-17T21:13:34Z BerkeleyEnvironment::Flush: [/media/nikolaypo/bitcoindb] Flush(false)
212019-08-17T21:13:34Z BerkeleyEnvironment::Flush: Flush(false) took               0ms
222019-08-17T21:27:43Z Flushed 66639 addresses to peers.dat  656ms
232019-08-17T21:42:44Z Flushed 66639 addresses to peers.dat  635ms
242019-08-17T21:50:27Z Potential stale tip detected, will try using extra outbound peer (last tip update: 2260 seconds ago)
252019-08-17T21:50:27Z net: setting try another outbound peer=true
262019-08-17T21:57:44Z Flushed 66639 addresses to peers.dat  622ms
272019-08-17T22:00:57Z Potential stale tip detected, will try using extra outbound peer (last tip update: 2890 seconds ago)
282019-08-17T22:00:57Z net: setting try another outbound peer=true
292019-08-17T22:11:27Z Potential stale tip detected, will try using extra outbound peer (last tip update: 3520 seconds ago)
302019-08-17T22:11:27Z net: setting try another outbound peer=true
312019-08-17T22:12:45Z Flushed 66639 addresses to peers.dat  605ms
322019-08-17T22:21:57Z Potential stale tip detected, will try using extra outbound peer (last tip update: 4150 seconds ago)
332019-08-17T22:21:57Z net: setting try another outbound peer=true
342019-08-17T22:58:17Z

Again, when the client hangs, it does not producing CPU or storage load.

Reproduction: Keep Bitcoin Core running normally. Then change WAN IP on upstream router by restarting PPTP interface. Let the client run some more to confirm normal behavior before stopping. Then try to close bitcoin-qt or send stop command to the daemon.

Sometimes the problem is not manifesting itself. But with about 10-30% of WAN connection disturbances or reconnects leads to impossibility of correct shutting down.

Have tried v0.18.0 and v0.18.1, have tried amd64 and arm architectures. Have tried GCC 6.3 and GCC 9.1.0 compilers. The problem persisting with different builds and on different machines. All sources was from bitcoin project on github.com, the place were I’m leaving this issue description. I have not tested pre-compiled versions.

Intel Core2Duo, Debian 9 Linux, amd64. 2) Rockchip RK3399, Armbian Linux, aarch64-linux-gnu. With both machines I have used the same USB 3.0/SATA adapter with the same 3TB Seagate (rotating) HDD. EXT4.

I had to rescan the database three times because of hardware failures, two times after a power loss, one time after USB interface sudden disconnection. May be there are the problems with blocks but fsck with -P parameter doesn’t found an errors.

As for me, I need more feedback from Bitcoin Core client in case of problems and more robustness. One time I had a problem with one chainstate file corruption on the media I had to rescan the data base again from block zero. It is a pity that single file corruption cannot be recovered automatically without spending a lot of IO operations and CPU power for rescanning.

Thank you for your attention!

Nikolay

fanquake deleted a comment on Aug 18, 2019

MarcoFalke commented at 3:10 pm on August 21, 2019: member

Are you sure this is due to the stale tip detection? It looks more like a thread that does not join on shutdown

Nikolay-Po commented at 3:18 pm on August 21, 2019: none

I have almost zero Bitcoin competence. I’m not sure in anything. What can I do next to investigate the problem? Why Bitcoin Core rejects all information of whole running period if was shut down not gracefully? среда, 21 августа 2019г., 18:13 +03:00 от MarcoFalke notifications@github.com :

Are you sure this is due to the stale tip detection? It looks more like a thread that does not join on shutdown — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub , or mute the thread .

MarcoFalke commented at 4:19 pm on August 21, 2019: member

Based on the debug.log, it is stuck in this line (or any later line):

https://github.com/bitcoin/bitcoin/blob/2472733a24a9364e4c6233ccd04166a26a68cc65/src/init.cpp#L221-L304

The Flushed 66639 addresses to peers.dat (happens inside of CConnman::Stop).

If you feel comfortable debugging this yourself, you can do that (for example) by adding log statements to the shutdown function. So maybe add LogPrintf("%s: Passed %s\n", __func__, __LINE__); in every other line after line if (g_connman) g_connman->Stop(); and the next time this happens, at least you will see which line caused the deadlock.

Nikolay-Po commented at 8:19 pm on August 24, 2019: none

Have inserted additional debug printouts after each function call inside of Shutdown() function. The last line which is not reaching it’s end is

if (g_connman) g_connman->Stop();

The log after this line execution looks like this:

 02019-08-24T20:08:17Z       - Connect 2576 transactions: 9925.32ms (3.853ms/tx, 2.221ms/txin) [199.79s (10515.11ms/blk)]
 12019-08-24T20:08:17Z     - Verify 4468 txins: 9948.58ms (2.227ms/txin) [200.34s (10544.47ms/blk)]
 22019-08-24T20:08:17Z     - Index writing: 0.40ms [0.01s (0.28ms/blk)]
 32019-08-24T20:08:17Z     - Callbacks: 0.32ms [0.00s (0.09ms/blk)]
 42019-08-24T20:08:17Z   - Connect total: 9964.71ms [200.75s (10565.71ms/blk)]
 52019-08-24T20:08:17Z   - Flush: 35.89ms [0.38s (20.02ms/blk)]
 62019-08-24T20:08:17Z   - Writing chainstate: 0.33ms [0.01s (0.33ms/blk)]
 72019-08-24T20:08:17Z Blockpolicy estimates updated by 0 of 0 block txs, since last block 0 of 0 tracked, mempool map size 0, max target 39 from historical
 82019-08-24T20:08:17Z UpdateTip: new best=000000000000000000195bb8347e4d48858145b696756b28e77843e6fcecf23c height=590456 version=0x20000000 log2_work=90.972688 tx=445847403 date='2019-08-17T00:28:39Z' progress=0.994447 cache=12.1MiB(91185txo)
 92019-08-24T20:08:17Z   - Connect postprocess: 4.38ms [0.11s (5.81ms/blk)]
102019-08-24T20:08:17Z - Connect block: 10064.16ms [202.51s (10658.52ms/blk)]
112019-08-24T20:08:17Z msghand thread exit
122019-08-24T20:08:17Z sending version (102 bytes) peer=3
132019-08-24T20:08:17Z send version message: version 70015, blocks=590455, us=[::]:0, peer=3
142019-08-24T20:08:17Z opencon thread exit
152019-08-24T20:08:17Z WriteBatch memory usage: db=txindex, before=2.8MiB, after=3.0MiB

Nikolay-Po commented at 11:53 am on August 25, 2019: none

Dug down to CConnman::Stop() method. It seems to me the core hangs at threadDNSAddressSeed.join() function.

void CConnman::Stop() { ... if (threadDNSAddressSeed.joinable()) threadDNSAddressSeed.join();

fanquake commented at 11:57 am on August 25, 2019: member

@promag is this similar to what you saw in #15657 ?

promag commented at 11:58 am on August 25, 2019: member

@fanquake I was about to comment that but I don’t think it is. In #15657 it eventually quits.

Nikolay-Po commented at 12:06 pm on August 25, 2019: none

Mine doesn’t quit even being left for tonight.

Nikolay-Po commented at 2:12 pm on August 25, 2019: none

My C++ (and in general, OOP) understanding absence prevents further debugging. I can’t find what is this “threadDNSAddressSeed” and where it’s “join” thing. And why there is a “join” if I ask to stop. It should be “leave” but not “join”. Do not understand. :(

MarcoFalke commented at 3:19 pm on August 25, 2019: member

Can you check the debug log if any of the following lines exist:

P2P peers available. Skipped DNS seeding
Loading addresses from DNS seeds (could take a while)
addresses found from DNS seeds

Nikolay-Po commented at 4:33 pm on August 25, 2019: none

After deleting debug log, P2P word is present only one time: “init message: Loading P2P addresses…”. Word “DNS” two times: “dnsseed thread start” and “Loading addresses from DNS seeds (could take a while)”. Word “seed” only two times, the same places as “DNS”.

Nikolay-Po commented at 6:40 pm on August 26, 2019: none

This is how looks process list from htop when close command was issued but not executed:

 0  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  
 119931 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:01.44 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
 219932 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
 319933 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
 419934 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
 519935 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
 619936 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
 719937 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
 819938 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.04 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
 919939 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:18.10 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
1019940 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6 14:41.51 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
1119941 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6 14:34.99 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
1219942 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6 14:34.51 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
1319943 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6 14:35.58 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
1419944 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6 14:37.79 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
1519945 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  2:08.95 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
1619994 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  1:46.61 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
1720011 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
1820073 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
1920080 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:10.75 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
2019930 nikolaypo  20   0 6117M  993M  138M S  0.6 25.6  2h05:14 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200

CPU is 6-core. Why there are so much processes?

MarcoFalke commented at 6:48 pm on August 26, 2019: member

Note, in markdown you can use ``` to start a block of code (edited your comments)

MarcoFalke commented at 6:50 pm on August 26, 2019: member

So it seems to be stuck somewhere in

https://github.com/bitcoin/bitcoin/blob/00ffe5aca134f424351b05a85da4eef9c3e4553a/src/net.cpp#L1549-L1586

MarcoFalke renamed this:
~~Unable to stop bitcoin-qt. Potential stale tip detected in a loop.~~
Unable to stop bitcoin-qt. ThreadDNSAddressSeed hangs.
on Aug 26, 2019

MarcoFalke commented at 6:51 pm on August 26, 2019: member

Adjusted title.

MarcoFalke commented at 6:54 pm on August 26, 2019: member

CPU is 6-core. Why there are so much processes?

There are about 10 threads for httprpc and about 6 threads for script validation, then one for the qt GUI and one for the main application, as well as some other threads such as ThreadDNSAddressSeed. Note you can get the thread names in htop via F2

MarcoFalke added this to the milestone 0.19.0 on Aug 26, 2019

MarcoFalke added the label P2P on Aug 26, 2019

Nikolay-Po commented at 8:14 pm on August 26, 2019: none

Before close command:

 00 S  1000 30271  1609 30271  1  80   0 - 1118215 poll_s pts/0  00:00:28 bitcoin-qt
 11 S  1000 30271  1609 30272  0  80   0 - 1118215 poll_s pts/0  00:00:00 QXcbEventReader
 21 S  1000 30271  1609 30273  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-0
 31 S  1000 30271  1609 30274  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-1
 41 S  1000 30271  1609 30275  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-2
 51 S  1000 30271  1609 30276  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-3
 61 S  1000 30271  1609 30277  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-4
 71 S  1000 30271  1609 30278  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-5
 81 S  1000 30271  1609 30279  0  80   0 - 1118215 poll_s pts/0  00:00:00 QDBusConnection
 91 S  1000 30271  1609 30280  1  80   0 - 1118215 poll_s pts/0  00:00:25 QThread
101 S  1000 30271  1609 30281  5  80   0 - 1118215 futex_ pts/0  00:02:04 bitcoin-scriptc
111 S  1000 30271  1609 30282  5  80   0 - 1118215 futex_ pts/0  00:02:03 bitcoin-scriptc
121 S  1000 30271  1609 30283  5  80   0 - 1118215 futex_ pts/0  00:02:03 bitcoin-scriptc
131 S  1000 30271  1609 30284  5  80   0 - 1118215 futex_ pts/0  00:02:05 bitcoin-scriptc
141 S  1000 30271  1609 30285  5  80   0 - 1118215 futex_ pts/0  00:02:05 bitcoin-scriptc
151 S  1000 30271  1609 30286  0  80   0 - 1118215 futex_ pts/0  00:00:09 bitcoin-schedul
161 S  1000 30271  1609 30288  0  80   0 - 1118215 futex_ pts/0  00:00:00 QThread
171 D  1000 30271  1609 30320  8  80   0 - 1118215 wait_o pts/0  00:02:44 bitcoin-loadblk
181 S  1000 30271  1609 30321  0  80   0 - 1118215 futex_ pts/0  00:00:03 bitcoin-net
191 S  1000 30271  1609 30322  0  80   0 - 1118215 sk_wai pts/0  00:00:00 bitcoin-dnsseed
201 S  1000 30271  1609 30323  0  80   0 - 1118215 futex_ pts/0  00:00:00 bitcoin-addcon
211 S  1000 30271  1609 30324  0  80   0 - 1118215 futex_ pts/0  00:00:00 bitcoin-opencon
221 S  1000 30271  1609 30325  0  80   0 - 1118215 futex_ pts/0  00:00:02 bitcoin-msghand
231 S  1000 30271  1609 30336  0  80   0 - 1118215 poll_s pts/0  00:00:00 QThread
241 S  1000 30271  1609 30339  0  80   0 - 1118215 poll_s pts/0  00:00:00 QThread
251 S  1000 30271  1609 30346  0  80   0 - 1118215 poll_s pts/0  00:00:00 Qt bearer threa

And after close hang:

 00 S  1000 30271  1609 30271  1  80   0 - 1141085 poll_s pts/0  00:00:31 bitcoin-qt
 11 S  1000 30271  1609 30272  0  80   0 - 1141085 poll_s pts/0  00:00:00 QXcbEventReader
 21 S  1000 30271  1609 30273  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-0
 31 S  1000 30271  1609 30274  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-1
 41 S  1000 30271  1609 30275  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-2
 51 S  1000 30271  1609 30276  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-3
 61 S  1000 30271  1609 30277  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-4
 71 S  1000 30271  1609 30278  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-5
 81 S  1000 30271  1609 30279  0  80   0 - 1141085 poll_s pts/0  00:00:00 QDBusConnection
 91 S  1000 30271  1609 30280  1  80   0 - 1141085 futex_ pts/0  00:00:25 bitcoin-shutoff
101 S  1000 30271  1609 30281  5  80   0 - 1141085 futex_ pts/0  00:02:25 bitcoin-scriptc
111 S  1000 30271  1609 30282  5  80   0 - 1141085 futex_ pts/0  00:02:23 bitcoin-scriptc
121 S  1000 30271  1609 30283  5  80   0 - 1141085 futex_ pts/0  00:02:23 bitcoin-scriptc
131 S  1000 30271  1609 30284  5  80   0 - 1141085 futex_ pts/0  00:02:26 bitcoin-scriptc
141 S  1000 30271  1609 30285  5  80   0 - 1141085 futex_ pts/0  00:02:25 bitcoin-scriptc
151 S  1000 30271  1609 30286  0  80   0 - 1141085 futex_ pts/0  00:00:11 bitcoin-schedul
161 S  1000 30271  1609 30288  0  80   0 - 1141085 futex_ pts/0  00:00:01 QThread
171 S  1000 30271  1609 30322  0  80   0 - 1141085 sk_wai pts/0  00:00:00 bitcoin-dnsseed
181 S  1000 30271  1609 30336  0  80   0 - 1141085 poll_s pts/0  00:00:00 QThread
191 S  1000 30271  1609 30346  0  80   0 - 1141085 poll_s pts/0  00:00:00 Qt bearer threa

Have tried to close PID 30322 “bitcoin-dnsseed” by sending to it the signal SIGQUIT from htop. All bitcoin threads are closing momentary. Restarted bitcoin-qt after some minutes. Have got:

0************************
1EXCEPTION: 15dbwrapper_error       
2Fatal LevelDB error: IO error: /media/nikolaypo/bitcoindb/indexes/txindex/159773.ldb: Input/output error       
3bitcoin in scheduler

Nikolay-Po commented at 4:53 am on August 27, 2019: none

Have to say my media have a problems. Any system I mentioned in this thread was using the same HDD connected though the same USB3.0-SATA adapter. Have started badblocks and got several of:

 0221404
 1221405
 2221406
 3221407
 4570580248
 5570580249
 6570580250
 7570580251
 8570582296
 9570582297
10570582298
11570582299
12570582300
13570582301

May be even more bads because the test was not finished. So the cause of a problem may be the data corruption. Does DNSseed depends on data on disk somehow? May be these files need to be examined?

Nikolay-Po closed this on Aug 27, 2019

DrahtBot locked this on Dec 16, 2021

Unable to stop bitcoin-qt. ThreadDNSAddressSeed hangs. #16642