Unable to stop bitcoin-qt. ThreadDNSAddressSeed hangs. #16642

issue Nikolay-Po openend this issue on August 17, 2019
  1. Nikolay-Po commented at 11:24 pm on August 17, 2019: none

    The problem: Sometimes, usually after Internet connection disturbances, it is impossible to stop bitcoin-qt or daemon.

    I’m expecting: That on close command bitcoin-qt in GUI or daemon mode will finish it’s operation in reasonable time and the database will contain the last synchronized blocks and is ready to continue the operation from it’s current state.

    Actual behavior: Qt-interface is hanging on “Bitcoin Core is shutting down” window, daemon just remains as a process. None CPU or I/O load is created when the process stalls. None reads/writes to the storage, none percents of CPU load. Have confirmed by looking at htop and iotop terminals. Bitcoin Core will running normally after WAN connection change or temporary problems. Newer peers are connected both way, the traffic is OK, all seems OK. But it will not stop operation normally any more. Only one way to stop is to send SIGQUIT to the process. The worst thing is the loose of last Bitcoin Core session data. For example, if Bitcoin Core was running two weeks and was failed to stop correctly, at the next start it will continue only from two weeks ago state. The blocks need to be rescanned form HDD again. This taking rather long time. After shutdown problem it may be impossible to perform shutdown even if there were none problems with WAN. I had to restart the client several times to get normal close. After it closed correctly, Bitcoin Core operating normally and shutting down normally. Here is a log from unsuccessful shutdown:

     02019-08-17T21:13:34Z GUI: requestShutdown : Requesting shutdown
     12019-08-17T21:13:34Z GUI: shutdown : Running Shutdown in thread
     22019-08-17T21:13:34Z Interrupting HTTP server
     32019-08-17T21:13:34Z Interrupting HTTP RPC server
     42019-08-17T21:13:34Z Interrupting RPC
     52019-08-17T21:13:34Z Shutdown: In progress...
     62019-08-17T21:13:34Z Stopping HTTP RPC server
     72019-08-17T21:13:34Z addcon thread exit
     82019-08-17T21:13:34Z opencon thread exit
     92019-08-17T21:13:34Z Stopping RPC
    102019-08-17T21:13:34Z Stopping HTTP server
    112019-08-17T21:13:34Z Stopped HTTP server
    122019-08-17T21:13:34Z BerkeleyEnvironment::Flush: [/media/nikolaypo/bitcoindb] Flush(false)
    132019-08-17T21:13:34Z BerkeleyEnvironment::Flush: Flushing wallet.dat (refcount = 0)...
    142019-08-17T21:13:34Z BerkeleyEnvironment::Flush: wallet.dat checkpoint
    152019-08-17T21:13:34Z BerkeleyEnvironment::Flush: wallet.dat detach
    162019-08-17T21:13:34Z net thread exit
    172019-08-17T21:13:34Z msghand thread exit
    182019-08-17T21:13:34Z BerkeleyEnvironment::Flush: wallet.dat closed
    192019-08-17T21:13:34Z BerkeleyEnvironment::Flush: Flush(false) took              86ms
    202019-08-17T21:13:34Z BerkeleyEnvironment::Flush: [/media/nikolaypo/bitcoindb] Flush(false)
    212019-08-17T21:13:34Z BerkeleyEnvironment::Flush: Flush(false) took               0ms
    222019-08-17T21:27:43Z Flushed 66639 addresses to peers.dat  656ms
    232019-08-17T21:42:44Z Flushed 66639 addresses to peers.dat  635ms
    242019-08-17T21:50:27Z Potential stale tip detected, will try using extra outbound peer (last tip update: 2260 seconds ago)
    252019-08-17T21:50:27Z net: setting try another outbound peer=true
    262019-08-17T21:57:44Z Flushed 66639 addresses to peers.dat  622ms
    272019-08-17T22:00:57Z Potential stale tip detected, will try using extra outbound peer (last tip update: 2890 seconds ago)
    282019-08-17T22:00:57Z net: setting try another outbound peer=true
    292019-08-17T22:11:27Z Potential stale tip detected, will try using extra outbound peer (last tip update: 3520 seconds ago)
    302019-08-17T22:11:27Z net: setting try another outbound peer=true
    312019-08-17T22:12:45Z Flushed 66639 addresses to peers.dat  605ms
    322019-08-17T22:21:57Z Potential stale tip detected, will try using extra outbound peer (last tip update: 4150 seconds ago)
    332019-08-17T22:21:57Z net: setting try another outbound peer=true
    342019-08-17T22:58:17Z
    

    Again, when the client hangs, it does not producing CPU or storage load.

    Reproduction: Keep Bitcoin Core running normally. Then change WAN IP on upstream router by restarting PPTP interface. Let the client run some more to confirm normal behavior before stopping. Then try to close bitcoin-qt or send stop command to the daemon.

    Sometimes the problem is not manifesting itself. But with about 10-30% of WAN connection disturbances or reconnects leads to impossibility of correct shutting down.

    Have tried v0.18.0 and v0.18.1, have tried amd64 and arm architectures. Have tried GCC 6.3 and GCC 9.1.0 compilers. The problem persisting with different builds and on different machines. All sources was from bitcoin project on github.com, the place were I’m leaving this issue description. I have not tested pre-compiled versions.

    1. Intel Core2Duo, Debian 9 Linux, amd64. 2) Rockchip RK3399, Armbian Linux, aarch64-linux-gnu. With both machines I have used the same USB 3.0/SATA adapter with the same 3TB Seagate (rotating) HDD. EXT4.

    I had to rescan the database three times because of hardware failures, two times after a power loss, one time after USB interface sudden disconnection. May be there are the problems with blocks but fsck with -P parameter doesn’t found an errors.

    As for me, I need more feedback from Bitcoin Core client in case of problems and more robustness. One time I had a problem with one chainstate file corruption on the media I had to rescan the data base again from block zero. It is a pity that single file corruption cannot be recovered automatically without spending a lot of IO operations and CPU power for rescanning.

    Thank you for your attention!

    Nikolay

  2. fanquake deleted a comment on Aug 18, 2019
  3. MarcoFalke commented at 3:10 pm on August 21, 2019: member
    Are you sure this is due to the stale tip detection? It looks more like a thread that does not join on shutdown
  4. Nikolay-Po commented at 3:18 pm on August 21, 2019: none

    I have almost zero Bitcoin competence. I’m not sure in anything. What can I do next to investigate the problem? Why Bitcoin Core rejects all information of whole running period if was shut down not gracefully? среда, 21 августа 2019г., 18:13 +03:00 от MarcoFalke notifications@github.com :

    Are you sure this is due to the stale tip detection? It looks more like a thread that does not join on shutdown — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub , or mute the thread .

  5. MarcoFalke commented at 4:19 pm on August 21, 2019: member

    Based on the debug.log, it is stuck in this line (or any later line):

    https://github.com/bitcoin/bitcoin/blob/2472733a24a9364e4c6233ccd04166a26a68cc65/src/init.cpp#L221-L304

    The Flushed 66639 addresses to peers.dat (happens inside of CConnman::Stop).

    If you feel comfortable debugging this yourself, you can do that (for example) by adding log statements to the shutdown function. So maybe add LogPrintf("%s: Passed %s\n", __func__, __LINE__); in every other line after line if (g_connman) g_connman->Stop(); and the next time this happens, at least you will see which line caused the deadlock.

  6. Nikolay-Po commented at 8:19 pm on August 24, 2019: none

    Have inserted additional debug printouts after each function call inside of Shutdown() function. The last line which is not reaching it’s end is

    if (g_connman) g_connman->Stop();

    The log after this line execution looks like this:

     02019-08-24T20:08:17Z       - Connect 2576 transactions: 9925.32ms (3.853ms/tx, 2.221ms/txin) [199.79s (10515.11ms/blk)]
     12019-08-24T20:08:17Z     - Verify 4468 txins: 9948.58ms (2.227ms/txin) [200.34s (10544.47ms/blk)]
     22019-08-24T20:08:17Z     - Index writing: 0.40ms [0.01s (0.28ms/blk)]
     32019-08-24T20:08:17Z     - Callbacks: 0.32ms [0.00s (0.09ms/blk)]
     42019-08-24T20:08:17Z   - Connect total: 9964.71ms [200.75s (10565.71ms/blk)]
     52019-08-24T20:08:17Z   - Flush: 35.89ms [0.38s (20.02ms/blk)]
     62019-08-24T20:08:17Z   - Writing chainstate: 0.33ms [0.01s (0.33ms/blk)]
     72019-08-24T20:08:17Z Blockpolicy estimates updated by 0 of 0 block txs, since last block 0 of 0 tracked, mempool map size 0, max target 39 from historical
     82019-08-24T20:08:17Z UpdateTip: new best=000000000000000000195bb8347e4d48858145b696756b28e77843e6fcecf23c height=590456 version=0x20000000 log2_work=90.972688 tx=445847403 date='2019-08-17T00:28:39Z' progress=0.994447 cache=12.1MiB(91185txo)
     92019-08-24T20:08:17Z   - Connect postprocess: 4.38ms [0.11s (5.81ms/blk)]
    102019-08-24T20:08:17Z - Connect block: 10064.16ms [202.51s (10658.52ms/blk)]
    112019-08-24T20:08:17Z msghand thread exit
    122019-08-24T20:08:17Z sending version (102 bytes) peer=3
    132019-08-24T20:08:17Z send version message: version 70015, blocks=590455, us=[::]:0, peer=3
    142019-08-24T20:08:17Z opencon thread exit
    152019-08-24T20:08:17Z WriteBatch memory usage: db=txindex, before=2.8MiB, after=3.0MiB
    
  7. Nikolay-Po commented at 11:53 am on August 25, 2019: none

    Dug down to CConnman::Stop() method. It seems to me the core hangs at threadDNSAddressSeed.join() function.

    void CConnman::Stop() { ... if (threadDNSAddressSeed.joinable()) threadDNSAddressSeed.join();

  8. fanquake commented at 11:57 am on August 25, 2019: member
    @promag is this similar to what you saw in #15657 ?
  9. promag commented at 11:58 am on August 25, 2019: member
    @fanquake I was about to comment that but I don’t think it is. In #15657 it eventually quits.
  10. Nikolay-Po commented at 12:06 pm on August 25, 2019: none
    Mine doesn’t quit even being left for tonight.
  11. Nikolay-Po commented at 2:12 pm on August 25, 2019: none
    My C++ (and in general, OOP) understanding absence prevents further debugging. I can’t find what is this “threadDNSAddressSeed” and where it’s “join” thing. And why there is a “join” if I ask to stop. It should be “leave” but not “join”. Do not understand. :(
  12. MarcoFalke commented at 3:19 pm on August 25, 2019: member

    Can you check the debug log if any of the following lines exist:

    • P2P peers available. Skipped DNS seeding
    • Loading addresses from DNS seeds (could take a while)
    • addresses found from DNS seeds
  13. Nikolay-Po commented at 4:33 pm on August 25, 2019: none
    After deleting debug log, P2P word is present only one time: “init message: Loading P2P addresses…”. Word “DNS” two times: “dnsseed thread start” and “Loading addresses from DNS seeds (could take a while)”. Word “seed” only two times, the same places as “DNS”.
  14. Nikolay-Po commented at 6:40 pm on August 26, 2019: none

    This is how looks process list from htop when close command was issued but not executed:

     0  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  
     119931 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:01.44 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
     219932 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
     319933 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
     419934 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
     519935 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
     619936 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
     719937 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
     819938 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.04 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
     919939 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:18.10 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    1019940 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6 14:41.51 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    1119941 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6 14:34.99 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    1219942 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6 14:34.51 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    1319943 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6 14:35.58 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    1419944 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6 14:37.79 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    1519945 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  2:08.95 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    1619994 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  1:46.61 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    1720011 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    1820073 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:00.00 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    1920080 nikolaypo  20   0 6117M  993M  138M S  0.0 25.6  0:10.75 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    2019930 nikolaypo  20   0 6117M  993M  138M S  0.6 25.6  2h05:14 bitcoin-qt -datadir=/media/nikolaypo/bitcoindb -debug -dbcache=1200
    

    CPU is 6-core. Why there are so much processes?

  15. MarcoFalke commented at 6:48 pm on August 26, 2019: member
    Note, in markdown you can use ``` to start a block of code (edited your comments)
  16. MarcoFalke commented at 6:50 pm on August 26, 2019: member
  17. MarcoFalke renamed this:
    Unable to stop bitcoin-qt. Potential stale tip detected in a loop.
    Unable to stop bitcoin-qt. ThreadDNSAddressSeed hangs.
    on Aug 26, 2019
  18. MarcoFalke commented at 6:51 pm on August 26, 2019: member
    Adjusted title.
  19. MarcoFalke commented at 6:54 pm on August 26, 2019: member

    CPU is 6-core. Why there are so much processes?

    There are about 10 threads for httprpc and about 6 threads for script validation, then one for the qt GUI and one for the main application, as well as some other threads such as ThreadDNSAddressSeed. Note you can get the thread names in htop via F2

  20. MarcoFalke added this to the milestone 0.19.0 on Aug 26, 2019
  21. MarcoFalke added the label P2P on Aug 26, 2019
  22. Nikolay-Po commented at 8:14 pm on August 26, 2019: none

    Before close command:

     00 S  1000 30271  1609 30271  1  80   0 - 1118215 poll_s pts/0  00:00:28 bitcoin-qt
     11 S  1000 30271  1609 30272  0  80   0 - 1118215 poll_s pts/0  00:00:00 QXcbEventReader
     21 S  1000 30271  1609 30273  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-0
     31 S  1000 30271  1609 30274  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-1
     41 S  1000 30271  1609 30275  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-2
     51 S  1000 30271  1609 30276  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-3
     61 S  1000 30271  1609 30277  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-4
     71 S  1000 30271  1609 30278  0  80   0 - 1118215 futex_ pts/0  00:00:00 llvmpipe-5
     81 S  1000 30271  1609 30279  0  80   0 - 1118215 poll_s pts/0  00:00:00 QDBusConnection
     91 S  1000 30271  1609 30280  1  80   0 - 1118215 poll_s pts/0  00:00:25 QThread
    101 S  1000 30271  1609 30281  5  80   0 - 1118215 futex_ pts/0  00:02:04 bitcoin-scriptc
    111 S  1000 30271  1609 30282  5  80   0 - 1118215 futex_ pts/0  00:02:03 bitcoin-scriptc
    121 S  1000 30271  1609 30283  5  80   0 - 1118215 futex_ pts/0  00:02:03 bitcoin-scriptc
    131 S  1000 30271  1609 30284  5  80   0 - 1118215 futex_ pts/0  00:02:05 bitcoin-scriptc
    141 S  1000 30271  1609 30285  5  80   0 - 1118215 futex_ pts/0  00:02:05 bitcoin-scriptc
    151 S  1000 30271  1609 30286  0  80   0 - 1118215 futex_ pts/0  00:00:09 bitcoin-schedul
    161 S  1000 30271  1609 30288  0  80   0 - 1118215 futex_ pts/0  00:00:00 QThread
    171 D  1000 30271  1609 30320  8  80   0 - 1118215 wait_o pts/0  00:02:44 bitcoin-loadblk
    181 S  1000 30271  1609 30321  0  80   0 - 1118215 futex_ pts/0  00:00:03 bitcoin-net
    191 S  1000 30271  1609 30322  0  80   0 - 1118215 sk_wai pts/0  00:00:00 bitcoin-dnsseed
    201 S  1000 30271  1609 30323  0  80   0 - 1118215 futex_ pts/0  00:00:00 bitcoin-addcon
    211 S  1000 30271  1609 30324  0  80   0 - 1118215 futex_ pts/0  00:00:00 bitcoin-opencon
    221 S  1000 30271  1609 30325  0  80   0 - 1118215 futex_ pts/0  00:00:02 bitcoin-msghand
    231 S  1000 30271  1609 30336  0  80   0 - 1118215 poll_s pts/0  00:00:00 QThread
    241 S  1000 30271  1609 30339  0  80   0 - 1118215 poll_s pts/0  00:00:00 QThread
    251 S  1000 30271  1609 30346  0  80   0 - 1118215 poll_s pts/0  00:00:00 Qt bearer threa
    

    And after close hang:

     00 S  1000 30271  1609 30271  1  80   0 - 1141085 poll_s pts/0  00:00:31 bitcoin-qt
     11 S  1000 30271  1609 30272  0  80   0 - 1141085 poll_s pts/0  00:00:00 QXcbEventReader
     21 S  1000 30271  1609 30273  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-0
     31 S  1000 30271  1609 30274  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-1
     41 S  1000 30271  1609 30275  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-2
     51 S  1000 30271  1609 30276  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-3
     61 S  1000 30271  1609 30277  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-4
     71 S  1000 30271  1609 30278  0  80   0 - 1141085 futex_ pts/0  00:00:00 llvmpipe-5
     81 S  1000 30271  1609 30279  0  80   0 - 1141085 poll_s pts/0  00:00:00 QDBusConnection
     91 S  1000 30271  1609 30280  1  80   0 - 1141085 futex_ pts/0  00:00:25 bitcoin-shutoff
    101 S  1000 30271  1609 30281  5  80   0 - 1141085 futex_ pts/0  00:02:25 bitcoin-scriptc
    111 S  1000 30271  1609 30282  5  80   0 - 1141085 futex_ pts/0  00:02:23 bitcoin-scriptc
    121 S  1000 30271  1609 30283  5  80   0 - 1141085 futex_ pts/0  00:02:23 bitcoin-scriptc
    131 S  1000 30271  1609 30284  5  80   0 - 1141085 futex_ pts/0  00:02:26 bitcoin-scriptc
    141 S  1000 30271  1609 30285  5  80   0 - 1141085 futex_ pts/0  00:02:25 bitcoin-scriptc
    151 S  1000 30271  1609 30286  0  80   0 - 1141085 futex_ pts/0  00:00:11 bitcoin-schedul
    161 S  1000 30271  1609 30288  0  80   0 - 1141085 futex_ pts/0  00:00:01 QThread
    171 S  1000 30271  1609 30322  0  80   0 - 1141085 sk_wai pts/0  00:00:00 bitcoin-dnsseed
    181 S  1000 30271  1609 30336  0  80   0 - 1141085 poll_s pts/0  00:00:00 QThread
    191 S  1000 30271  1609 30346  0  80   0 - 1141085 poll_s pts/0  00:00:00 Qt bearer threa
    

    Have tried to close PID 30322 “bitcoin-dnsseed” by sending to it the signal SIGQUIT from htop. All bitcoin threads are closing momentary. Restarted bitcoin-qt after some minutes. Have got:

    0************************
    1EXCEPTION: 15dbwrapper_error       
    2Fatal LevelDB error: IO error: /media/nikolaypo/bitcoindb/indexes/txindex/159773.ldb: Input/output error       
    3bitcoin in scheduler       
    
  23. Nikolay-Po commented at 4:53 am on August 27, 2019: none

    Have to say my media have a problems. Any system I mentioned in this thread was using the same HDD connected though the same USB3.0-SATA adapter. Have started badblocks and got several of:

     0221404
     1221405
     2221406
     3221407
     4570580248
     5570580249
     6570580250
     7570580251
     8570582296
     9570582297
    10570582298
    11570582299
    12570582300
    13570582301
    

    May be even more bads because the test was not finished. So the cause of a problem may be the data corruption. Does DNSseed depends on data on disk somehow? May be these files need to be examined?

  24. Nikolay-Po closed this on Aug 27, 2019

  25. DrahtBot locked this on Dec 16, 2021


Nikolay-Po MarcoFalke fanquake promag

Labels
P2P

Milestone
0.19.0


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me