ThreadDNSAddressSeed hangs on sk_wait_data and doesn’t stop on exit #16778

issue Nikolay-Po opened this issue on September 1, 2019
  1. Nikolay-Po commented at 8:48 am on September 1, 2019: none

    This is a continuation of the previously opened issue [Unable to stop bitcoin-qt. ThreadDNSAddressSeed hangs. #16642](https://github.com/bitcoin/bitcoin/issues/16642#issue-481935673). The hardware problems are now sorted out: the single consumer-grade HDD on USB 3.0 was replaced with a RAID1 of HGST Ultrastar® 7K4000 drives on SATA, and the database was replaced with a known-working archived copy. Apart from the already repaired HDD problem, the hardware is stable; no failures or glitches were found over several months of operation under different operating systems.

    The problem is that Bitcoin Core, particularly bitcoin-qt, is sometimes unable to finish its operation on close. Expected behavior: bitcoin-qt exits normally after I click the “close” button at the top right corner of the main GUI window. Actual behavior: after several successful application shutdowns, the application hangs with the message “Bitcoin Core is shutting down… Don’t shut down the computer until this window disappears.” Natively compiled on 01 Sep 2019 for arm-linux-gnueabihf, master branch.

    Debug output:

        2019-09-01T07:44:54Z GUI: requestShutdown : Requesting shutdown
        2019-09-01T07:44:54Z GUI: shutdown : Running Shutdown in thread
        2019-09-01T07:44:54Z Interrupting HTTP server
        2019-09-01T07:44:54Z Interrupting HTTP RPC server
        2019-09-01T07:44:54Z Interrupting RPC
        2019-09-01T07:44:54Z addcon thread exit
        2019-09-01T07:44:54Z opencon thread exit
        2019-09-01T07:44:54Z msghand thread exit
        2019-09-01T07:44:54Z Shutdown: In progress...
        2019-09-01T07:44:54Z Stopping HTTP RPC server
        2019-09-01T07:44:54Z Stopping RPC
        2019-09-01T07:44:54Z Stopping HTTP server
        2019-09-01T07:44:54Z Stopped HTTP server
        2019-09-01T07:44:54Z net thread exit
        2019-09-01T07:44:54Z BerkeleyEnvironment::Flush: [/bdb] Flush(false)
        2019-09-01T07:44:54Z BerkeleyEnvironment::Flush: Flushing wallet.dat (refcount = 0)...
        2019-09-01T07:44:55Z BerkeleyEnvironment::Flush: wallet.dat checkpoint
        2019-09-01T07:44:55Z BerkeleyEnvironment::Flush: wallet.dat detach
        2019-09-01T07:44:55Z BerkeleyEnvironment::Flush: wallet.dat closed
        2019-09-01T07:44:55Z BerkeleyEnvironment::Flush: Flush(false) took             271ms
        2019-09-01T07:58:54Z Flushed 64452 addresses to peers.dat  1854ms
        2019-09-01T08:13:56Z Flushed 64452 addresses to peers.dat  2216ms
        2019-09-01T08:22:11Z Potential stale tip detected, will try using extra outbound peer (last tip update: 2245 seconds ago)
        2019-09-01T08:22:11Z net: setting try another outbound peer=true
        2019-09-01T08:28:58Z Flushed 64452 addresses to peers.dat  1895ms
    

    Threads after shutdown request:

        rock@Debian-Desktop:~$ ps -eLl
        ...
        F S   UID   PID  PPID   LWP  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
        0 S  1000  3094  1336  3094  4  80   0 - 139807 poll_s pts/0   00:02:43 bitcoin-main
        1 S  1000  3094  1336  3095  0  80   0 - 139807 poll_s pts/0   00:00:01 QXcbEventReader
        1 S  1000  3094  1336  3097  0  80   0 - 139807 futex_ pts/0   00:00:00 mali-mem-purge
        1 S  1000  3094  1336  3098  0  80   0 - 139807 futex_ pts/0   00:00:00 mali-utility-wo
        1 S  1000  3094  1336  3099  0  80   0 - 139807 futex_ pts/0   00:00:00 mali-utility-wo
        1 S  1000  3094  1336  3100  0  80   0 - 139807 futex_ pts/0   00:00:00 mali-utility-wo
        1 S  1000  3094  1336  3101  0  80   0 - 139807 futex_ pts/0   00:00:00 mali-utility-wo
        1 S  1000  3094  1336  3102  0  80   0 - 139807 futex_ pts/0   00:00:00 mali-utility-wo
        1 S  1000  3094  1336  3103  0  80   0 - 139807 futex_ pts/0   00:00:00 mali-utility-wo
        1 S  1000  3094  1336  3104  0  80   0 - 139807 poll_s pts/0   00:00:00 mali-cmar-backe
        1 S  1000  3094  1336  3105  0  80   0 - 139807 futex_ pts/0   00:00:00 mali-hist-dump
        1 S  1000  3094  1336  3106  0  80   0 - 139807 poll_s pts/0   00:00:00 QDBusConnection
        1 S  1000  3094  1336  3112  3  80   0 - 139807 futex_ pts/0   00:01:54 bitcoin-shutoff
        1 S  1000  3094  1336  3117  0  80   0 - 139807 futex_ pts/0   00:00:26 bitcoin-scriptc
        1 S  1000  3094  1336  3118  0  80   0 - 139807 futex_ pts/0   00:00:17 bitcoin-schedul
        1 S  1000  3094  1336  3122 17  80   0 - 139807 futex_ pts/0   00:09:23 bitcoin-qt-init
        1 S  1000  3094  1336  3151  0  80   0 - 139807 sk_wai pts/0   00:00:00 bitcoin-dnsseed
        1 S  1000  3094  1336  3156  0  80   0 - 139807 poll_s pts/0   00:00:00 QThread
        1 S  1000  3094  1336  3157  0  80   0 - 139807 poll_s pts/0   00:00:00 Qt bearer threa
        ...
    

    The issue reproduces quite reliably. With 0.18 versions, on different computers and with different builds, I had this problem at least three times, even after a DB rescan. For some time Core works normally, but at some point, usually after a network disturbance, it hangs on shutdown.

    This time it is self-compiled from https://github.com/bitcoin/bitcoin.git, Bitcoin Core version v0.18.99.0-495db72ee-dirty. The machine is a ROCKPro64 with an RK3399 CPU and 4GB RAM. The configure command was:

        ./configure --enable-debug --enable-werror BDB_LIBS="-L${BDB_PREFIX}/lib -ldb_cxx-4.8" BDB_CFLAGS="-I${BDB_PREFIX}/include" --with-boost-libdir=/usr/lib/arm-linux-gnueabihf
    

    This time I compiled Core with debug enabled. If provided with further debugging instructions, I will try to investigate more deeply. After compilation and installation, test_bitcoin completed with no errors.

    As I wrote in #16642 before, the problem persists not only on my ARM CPU but on an Intel CPU too.

    Thank you.

  2. fanquake added the label P2P on Sep 1, 2019
  3. Nikolay-Po commented at 8:57 pm on September 3, 2019: none

    Is this issue #10210 and comment still valid? Commented on Apr 14, 2017:

    The hanging thread here is the dnsseed thread. Because glibc’s async dns lookup is horribly broken (see https://sourceware.org/bugzilla/show_bug.cgi?id=20874), we stopped using it in #9229. #10215 will make the dnsseed thread exit a bit faster, but until we move to libevent’s dns lookup, we’re stuck with this as the best we can do.

  4. TheBlueMatt commented at 9:12 pm on September 3, 2019: member

    I believe so, yes. We may be able to do some magic to exit even if that thread is running if it’s the only one (and in DNS lookup), but that’s likely a more involved change.
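
The idea of not letting a blocked lookup hold up shutdown can be illustrated outside Bitcoin Core. The following is a minimal Python sketch, not Bitcoin Core's code: the function name `lookup_with_deadline`, its parameters, and the stand-in resolver are hypothetical. It runs a blocking resolver call in a daemon thread and stops waiting after a deadline, so the caller can carry on while the lookup is still stuck.

```python
import socket
import threading
import time


def lookup_with_deadline(host, timeout_s, resolver=socket.getaddrinfo):
    """Run a blocking resolver call in a daemon thread.

    Returns the resolver's result, an empty list if the lookup failed,
    or None if the lookup is still blocked after timeout_s seconds.
    """
    result = {}

    def worker():
        try:
            result["addrs"] = resolver(host, None)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result["error"] = exc

    # A daemon thread does not keep the interpreter alive at exit.
    thread = threading.Thread(target=worker, name="dnsseed-sketch", daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        return None  # lookup still blocked; the caller proceeds without it
    return result.get("addrs", [])


if __name__ == "__main__":
    # Hypothetical stand-in for a resolver that never gets its reply in time.
    def stalled_resolver(host, port):
        time.sleep(2)
        return []

    print(lookup_with_deadline("seed.example.invalid", 0.1, resolver=stalled_resolver))  # None
```

The trade-off of this design is that the stuck thread is abandoned rather than cancelled, which is only acceptable when nothing else depends on its result.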


  5. Nikolay-Po commented at 9:57 pm on September 3, 2019: none

    Right before the splash screen disappears, the status of the address seed thread is futex_wait. But when the main GUI window appears, the seed thread status becomes sk_wait_data.

        F S   UID   PID  PPID   LWP  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
        1 S  1000 25719 24555 25974  0  80   0 - 122507 futex_ pts/1   00:00:00 bitcoin-dnsseed

        1 S  1000 25719 24555 25974  0  80   0 - 137291 sk_wai pts/1   00:00:00 bitcoin-dnsseed
    

    I can’t complete the update of the last 240 (or even more) blocks because Bitcoin Core is unable to finish shutting down. While searching for a solution, I found a thread where a man claimed he lost a year of bitcoin(-qt?) uptime after an unsuccessful shutdown (probably caused by dnsseed). After he restarted Core, the node had to re-scan all blocks downloaded during that year. Are there any hints for:

    1. How can I terminate the dnsseed thread without terminating bitcoin-qt, so that it can finish the DB update? If I send SIGQUIT to the thread PPID from htop, it interrupts the whole of bitcoin-qt, and the database block count remains as it was before bitcoin-qt started. Is there a different way, maybe some Bitcoin Core RPC request or operating system message? Anything that will close dnsseed but let bitcoin-qt finish.
    2. Can I change the order of the shutdown operations to let the database finish its work before ThreadDNSAddressSeed? If yes, how do I do that in the sources?
    3. I’m not satisfied with the node database behavior on unexpected exits. Why, after SIGQUIT or any exit other than a normal stop, are none of the already loaded and scanned blocks present in the chain state? After an unexpected stop, the blocks acquired during the last run are not in the chain and need to be re-scanned. Maybe some mechanism for periodic database updates is needed so that a lot of CPU time, for example a year of uptime, is not lost to a rescan? I’d prefer a periodic, say daily, database maintenance procedure that keeps the blocks validated even if node operation is interrupted unexpectedly.
  6. Nikolay-Po commented at 5:16 pm on September 6, 2019: none

    It seems I have found a workaround to get out of the dnsseed hang and let Core update the database on exit (but at the next run):

    1. If the ps status of the dnsseed thread is “sk_wai”, there is a problem: Bitcoin Core cannot be stopped normally and the database cannot be updated completely. The node will definitely hang on exit. The blocks will remain on the media but will not be indexed.
    2. In case of (1), to recover from the ThreadDNSAddressSeed lockup I kill Core by sending SIGKILL. SIGQUIT doesn’t cure the hang: if I send SIGQUIT, dnsseed hangs right at the next bitcoin-qt start and the chain state update is not performed.
    3. After a kill by SIGKILL, ThreadDNSAddressSeed does not hang at the next start, at least the first time. At the next bitcoin-qt start after the sudden kill, the dnsseed thread is present for some time while the splash screen is displayed, between the “Loading data” status disappearing and the GUI window appearing. The wait channel of dnsseed is “poll_s”. After the GUI window appears, dnsseed is gone and I can wait for database synchronization to complete. At this stage the blocks left unindexed on disk by the previous run become indexed, and the database becomes fully synced without a dnsseed hang.
    4. When database synchronization is complete and ThreadDNSAddressSeed is neither running nor hanging, I close Core normally. Bitcoin-qt finishes its operation normally and completely. The database on the media remains in a good, current state, without losing the chainstate update from the last run.
    5. I start bitcoin-qt again to continue node operation for the next few days.
    6. While running for some time, ThreadDNSAddressSeed becomes active and will most probably hang. I periodically check the status of the dnsseed thread. If it is hanging and several days have passed, I repeat the procedure from the start, point 1.

    Of course this workaround is not a solution, but it at least avoids losing, say, a year of chainstate updates. I can monitor for the hang and kill and restart Core before it accumulates a lot of blocks without being able to store a correct chainstate.
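
The periodic check in step 6 could be scripted. Below is a minimal Python sketch, assuming `ps -eLl` output in the column layout shown earlier in this thread; the function name `find_stuck_threads` and its parameters are hypothetical and not part of Bitcoin Core. It returns the LWP ids of threads named `bitcoin-dnsseed` whose wait channel starts with `sk_wai`.

```python
def find_stuck_threads(ps_output, thread_name="bitcoin-dnsseed", wchan_prefix="sk_wai"):
    """Return LWP ids of threads called thread_name that sleep in wchan_prefix.

    ps_output is the text printed by `ps -eLl`; columns are located by the
    header line, so rows before the header and malformed rows are skipped.
    """
    columns = None
    stuck = []
    for line in ps_output.splitlines():
        if columns is None:
            fields = line.split()
            if "LWP" in fields and "WCHAN" in fields and "CMD" in fields:
                columns = fields  # header found; following lines are rows
            continue
        # CMD is the last column and may contain spaces, so limit the split.
        fields = line.split(None, len(columns) - 1)
        if len(fields) != len(columns):
            continue
        row = dict(zip(columns, fields))
        if row["CMD"].strip() == thread_name and row["WCHAN"].startswith(wchan_prefix):
            stuck.append(int(row["LWP"]))
    return stuck


# Three rows copied from the ps listing in the original report.
SAMPLE = """\
F S   UID   PID  PPID   LWP  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
1 S  1000  3094  1336  3122 17  80   0 - 139807 futex_ pts/0   00:09:23 bitcoin-qt-init
1 S  1000  3094  1336  3151  0  80   0 - 139807 sk_wai pts/0   00:00:00 bitcoin-dnsseed
1 S  1000  3094  1336  3157  0  80   0 - 139807 poll_s pts/0   00:00:00 Qt bearer threa
"""

print(find_stuck_threads(SAMPLE))  # [3151]
```

On a live system the text could come from `subprocess.run(["ps", "-eLl"], capture_output=True, text=True).stdout`. A non-empty result corresponds to the condition in step 1.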

  7. Nikolay-Po commented at 12:36 pm on September 15, 2019: none

    Things got worse. I was able to recover from the dnsseed thread hang as described above. The next time dnsseed hung, I had to kill it about 8 times before it returned to normal operation with a normal thread exit. Now I have tried to kill dnsseed about 20 times with no luck. I have captured the packets from before DNSseed hangs.

    DNSseed_failure.pcapng.gz

    I see normal DNS requests over UDP. Then bitcoin-qt opens a TCP connection to the DNS server’s port 53 and requests the next address. The router’s DNS replies with an ACK to the TCP packet carrying the DNS request but doesn’t respond with an address. So the reason DNSseed hangs is the absence of a TCP reply from the DNS server. But that obviously shouldn’t block bitcoin-qt.

  8. Nikolay-Po commented at 1:21 pm on September 15, 2019: none
    Oh good! On the router side, for the LAN segment where bitcoin-qt is running, I have completely blocked outgoing TCP port 53. Now the DNSseed thread passes through the futex_ and then poll_s states and then silently exits. So the problem is mitigated. Who knows why DNSseed starts to query DNS over TCP when UDP was successful just before?
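
One plausible explanation for the switch to TCP, offered as an assumption because the capture was not analysed in this thread: a stub resolver normally retries a query over TCP when the UDP response arrives with the TC (truncation) bit set in the DNS header, which can happen when an answer does not fit in a UDP reply. A minimal Python sketch for checking that bit in a raw DNS message, for example one exported from a packet capture; the function name `is_truncated` is hypothetical.

```python
import struct

TC_BIT = 0x0200  # truncation flag within the 16-bit DNS header flags field


def is_truncated(dns_message: bytes) -> bool:
    """Return True if the TC bit is set in the header of a raw DNS message."""
    if len(dns_message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # The header starts with a 16-bit ID followed by the 16-bit flags field,
    # both in network byte order.
    _msg_id, flags = struct.unpack("!HH", dns_message[:4])
    return bool(flags & TC_BIT)


# Hand-built 12-byte headers: flags 0x8380 is a response with TC, RD and RA set,
# flags 0x8180 is the same response without TC.
truncated = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)
complete = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)
print(is_truncated(truncated), is_truncated(complete))  # True False
```

If the last UDP response before the TCP connection in the capture has this bit set, the TCP retry is ordinary resolver behaviour and the missing TCP answer from the router's DNS is what stalls the lookup.
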
  9. MarcoFalke added this to the milestone 0.19.0 on Sep 15, 2019
  10. achow101 commented at 3:42 pm on September 16, 2019: member
    There appears to be another person reporting the same issue: https://bitcointalk.org/index.php?topic=5180523.msg52321747#msg52321747
  11. Nikolay-Po commented at 4:02 pm on September 16, 2019: none
    Not sure it is the same. In my case only the block indexing since the last start is lost when it hangs. I did not have to re-index the whole blockchain. I had to before, when I had problems with the consumer HDD’s high read error rate.
  12. achow101 commented at 4:19 pm on September 16, 2019: member

    Not sure it is the same. In my case only the block indexing since the last start is lost when it hangs. I did not have to re-index the whole blockchain. I had to before, when I had problems with the consumer HDD’s high read error rate.

    I don’t think he had to reindex the blockchain, that was just a suggestion from another user.

  13. Nikolay-Po commented at 4:32 pm on September 16, 2019: none
    Ok, agreed now, thanks! P.S. I have read the topic to its end. Indeed, the same problem.
  14. alsxz-bit commented at 9:08 am on September 21, 2019: none

    Hi! I am the topic starter on bitcointalk. Sure, it’s the same problem. I had to reindex not the whole blockchain but only from the last start. Changing the local DNS to 8.8.8.8 fixed the issue.

    Note that the problem occurs not only with bitcoin-qt but with bitcoind too.

  15. laanwj removed this from the milestone 0.19.0 on Sep 26, 2019
  16. laanwj added this to the milestone 0.20.0 on Sep 26, 2019
  17. instagibbs commented at 2:25 pm on September 26, 2019: member
    Right before this issue, are you getting blocks and operating normally? I’ve heard reports where the node stops asking for blocks, and then on shutdown exactly this happens.
  18. MarcoFalke removed this from the milestone 0.20.0 on Apr 12, 2020
  19. pinheadmz commented at 4:03 pm on March 7, 2023: member
    @Nikolay-Po is this still an issue on the latest release of bitcoin core?
  20. pinheadmz assigned pinheadmz on Jun 2, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC
