Add two interruption points in dnsseed thread to make shutdown in early … #4215

pull 4tar wants to merge 10 commits into bitcoin:master from 4tar:master changing 2 files +59 −1
  1. 4tar commented at 5:31 pm on May 22, 2014: contributor

    …stage quickly

    The dnsseed thread is starting by default and takes quick some time, if the user wants to stop the bitcoind by Ctrl-C in command line or a kill in daemon mode when the dnsseed is still working, s/he would have to wait for it to finish all the job, in such situation, an extra ‘kill -9’ would be expected.

    By adding two interruption points just before the time-consuming operations we give the dnsseed thread a chance to realize the user’s intention and quit its job much quickly, usually in seconds instead of several minutes or more depends on network situation.

    Signed-off-by: Huang Le 4tarhl@gmail.com

  2. Add two interruption points in dnsseed thread to make shutdown in early stage quickly
    Signed-off-by: Huang Le <4tarhl@gmail.com>
    cc8ee6a187
  3. laanwj commented at 9:41 am on May 23, 2014: member
    untested ACK, code changes OK
  4. sipa commented at 10:35 am on May 23, 2014: member
    untested ACK
  5. 4tar commented at 12:15 pm on May 23, 2014: contributor
    I did the testing locally, any thing I need to do for pushing the commit?
  6. Use the async getaddrinfo_a() instead of getaddrinfo() to improve the responsiveness of net threads, typically the dnsseed thread
    Signed-off-by: Huang Le <4tarhl@gmail.com>
    d9843c9963
  7. 4tar commented at 4:21 am on May 24, 2014: contributor

    By using async getaddrinfo_a() instead of getaddrinfo() in netbase.cpp::LookupIntern(), the responsiveness issue of dnsseed thread is finally resolved.

    And to note: With the 2nd commit in this pull request, the 1st one is not necessary now, although leaving it there would cause no problem.

  8. fanquake commented at 4:34 am on May 24, 2014: member
    @laanwj @sipa You’ll need to re-review/re-ACK this pull.
  9. Add -lanl (asynchronous name lookup library) checking in configure script, and modify corresponding conditional directive in netbase.cpp to ensure we can use the lib properly.
    Signed-off-by: Huang Le <4tarhl@gmail.com>
    f721cca442
  10. 4tar commented at 10:55 am on May 24, 2014: contributor

    Sorry, the previous building error was caused by missing the libanl, added it in last commit.

    Have done a clean rebuilding and testing locally, please kindly have a review. Thanks.

  11. In configure script check for getaddrinfo_a() directly instead of libanl, for better compatibility with other libc implementations different from glibc
    Signed-off-by: Huang Le <4tarhl@gmail.com>
    07fd1d63a6
  12. in configure.ac: in f721cca442 outdated
    366@@ -367,6 +367,7 @@ if test x$TARGET_OS = xdarwin; then
    367 fi
    368 
    369 AC_CHECK_HEADERS([stdio.h stdlib.h unistd.h strings.h sys/types.h sys/stat.h])
    370+AC_CHECK_LIB(anl, getaddrinfo_a)
    


    theuni commented at 4:20 pm on May 24, 2014:

    Please tie this check to the function and not glibc’s usage, in case other libc’s (uclibc, musl, etc) implement it as part of libc itself:

    0AC_SEARCH_LIBS([getaddrinfo_a], [anl], [AC_DEFINE(HAVE_GETADDRINFO_A, 1,[Define this symbol if you have getaddrinfo_a])])
    

    Then in code, use: #if defined(HAVE_GETADDRINFO_A)

  13. 4tar commented at 5:26 pm on May 24, 2014: contributor
    @theuni Thanks, done as you suggested in last commit.
  14. in src/netbase.cpp: in 07fd1d63a6 outdated
    87@@ -82,8 +88,32 @@ bool static LookupIntern(const char *pszName, std::vector<CNetAddr>& vIP, unsign
    88 #else
    89     aiHint.ai_flags = fAllowLookup ? AI_ADDRCONFIG : AI_NUMERICHOST;
    90 #endif
    91+
    92     struct addrinfo *aiRes = NULL;
    93+#if defined(_GNU_SOURCE) && defined(HAVE_GETADDRINFO_A)
    


    laanwj commented at 10:48 am on May 25, 2014:
    Is the defined(_GNU_SOURCE) necessary here? (i.e. isn’t HAVE_GETADDRINFO_A in itself enough to know that the function exists?)

    4tar commented at 11:21 am on May 25, 2014:

    Indeed, it’s unnecessary in my dev environment, but I just added it for if someone was building with an older toolchain, although such a toolchain might fail to get HAVE_GETADDRINFO_A defined by configure script at the first place, but just in case.

    Sure it should be pretty safe to remove the macro definition check, I’d like to do that if you think that is desired, but it could be even safer (ok, quite little) to leave it there:)


    4tar commented at 12:20 pm on May 25, 2014:

    @laanwj Yes, I consider the situation for a while and come to agree on your point: we should remove the _GNU_SOURCE macro definition checking because someone might build it with another libc implementation, and in such a case, the _GNU_SOURCE was unlikely defined by the compiler but the getaddrinfo_a() could have been implemented in the c library itself.

    Will submit a commit immediately. Thanks for your kind review!

  15. Remove two unnecessary modifications
    1. The _GNU_SOURCE macro definition checking in netbase.cpp, the HAVE_GETADDRINFO_A is enough for using getaddrinfo_a();
    2. The interruption points added in net.cpp::ThreadDNSAddressSeed(), using getaddrinfo_a() instead of getaddrinfo() is enough to make the dnsseed thread responsive.
    
    Signed-off-by: Huang Le <4tarhl@gmail.com>
    b5721372b9
  16. 4tar commented at 12:31 pm on May 25, 2014: contributor
    @laanwj Done. Hope it is the last commit for this pull request. :)
  17. Merge branch 'master' of https://github.com/bitcoin/bitcoin 6f5f19e1e6
  18. theuni commented at 6:51 pm on May 26, 2014: member

    Does this launch a new thread for each resolve? If so, I think it would be necessary to do a quick probe at random times to see what the thread count looks like before/after this change. If it’s a drastic change, this will need some sort of rate-limiting.

    Also (not necessary for this PR), it might be beneficial to maintain a global resolve list in some class so that a single tight loop can be checking the status of all resolves simultaneously, rather than each resolve getting its own loop. From the examples of getaddrinfo_a in the man pages, it seems that’s how it is meant to be used.

  19. 4tar commented at 5:22 am on May 27, 2014: contributor

    @theuni Yes, the getaddrinfo_a() call does kick off a new thread for resolving the name and poll on it in its own thread. So when there is one resolving job, there would be one more thread than before this change.

    However, the real name resolving job happens mostly, if not only, in dnsseed thread. To all other calls to the LookupIntern() function, they are generally kicked off by the constructor of CNetAddr which actually just want to convert a string address into a network address structure, and this kind of job, which is not a real name resolving one, can be done without getaddrinfo(), but a lighter inet_aton(). By detecting this case, we can dramatically reduce the overhead of getaddrinfo(), and in general there should be only one more thread and only when dnsseed thread is running.

    Will submit a patch for this PR according to the above soon.

    For the global resolver idea, one thing need to pointed out is that it doesn’t reduce the thread count but add one (the global resolver itself) instead, while it is extremely helpful for the throughput if there were lots of real async name resolving jobs kicked off simultaneously and the kickers don’t need the result immediately. But as mentioned above, the dnsseed seems the only one needs the result of a real name resolving, and it works in a sync manner, so the global resolver doesn’t help here. Actually why we switch to a async getaddrinfo_a() is just for responsiveness purpose but not throughput. We can absolutely rewrite the dnsseed thread to let it work in an async manner and improve its throughput, but given it’s only running for several minutes, or tens of minutes if the network situation is really bad, before exiting, it doesn’t worth the effort I guess. Or maybe later if we figure out the effort is really needed.

  20. Use inet_pton() in LookupIntern() first
    In LookupIntern(), the getaddrinfo_a() call would be issued if it's avaiable, which in turn would kick off a new thread for async name resolving job.  In our code, most calls go to LookupIntern() are just trying to convert a text ip address into a struct but not a real name resolving request, so we should do that first to avoid extra thread overhead.
    
    Signed-off-by: Huang Le <4tarhl@gmail.com>
    144beceeb1
  21. Explicitly include <arpa/inet.h> for windows building
    Signed-off-by: Huang Le <4tarhl@gmail.com>
    ecb23fb4da
  22. laanwj commented at 7:30 am on May 27, 2014: member

    @theuni, @4tar This is starting to be too complicated. Don’t get too deep in this rabbit hole. I would accept this simple patch, but I don’t want to merge a whole thread/job management system just so that interruption can happen sooner.

    We already have boost::asio for that, if really necessary. No need to replicate its functionality.

    But is the number of threads created by the async resolve really that bad that we need some complex system to work around it? Are we using the resolver this intensively at all? Does it create DoS risk?

  23. 4tar commented at 7:41 am on May 27, 2014: contributor
    Interesting, looks like the i586-mingw32msvc-g++ building env doesn’t have inet_pton() defined. Will submit a patch to AC_SEARCH_LIB for it.
  24. Search for inet_pton() in configure
    Looks like mingw env doesn't define this function yet, so we have to search it first to avoid breaking building in mingw env.
    
    Signed-off-by: Huang Le <4tarhl@gmail.com>
    762152ff0f
  25. 4tar commented at 8:09 am on May 27, 2014: contributor

    @laanwj I agree with you that we don’t really need an async resolver now, so I just add a inet_pton()/inet_addr() call in the LookupIntern() before issuing the getaddrinfo_a(), so that we can avoid kicking off a new resolving thread inside the latter, when the caller just want to convert an ip addr text to an ip addr structure, which, as we have discussed above, is the most cases that the LookupIntern() get called. By doing this, we have already removed the possible thread count problems.

    Hope I have cleared mingw env building error in last commit…

  26. theuni commented at 3:31 pm on May 27, 2014: member

    @4tar: I’m not very familiar with the networking code, so I’ll give your analysis the benefit of the doubt. Thanks for addressing my thread-count concern.

    As for inet_pton, please add “nsl resolv” to the list of libs where it may be found (the empty 2nd param).

  27. 4tar commented at 4:28 pm on May 27, 2014: contributor
    @theuni Thanks for your review! Setting up a correct lib list is always a mystery to me…
  28. Add "nsl resolve" to the search lib list for inet_pton()
    Signed-off-by: Huang Le <4tarhl@gmail.com>
    978d2881b0
  29. BitcoinPullTester commented at 4:58 pm on May 27, 2014: none
    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/978d2881b00fbc98789b64e8ed2ec4bd2e8fadc5 for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.
  30. 4tar commented at 4:11 am on May 29, 2014: contributor

    @laanwj Anything I can do to make this PR be merged a little bit soon? It passed the test and I believe it is ready.

    The reason I’m asking is that I made a mistake to commit directly on my master branch and send the PR, so the pending causes big issue on my branch management…

  31. 4tar commented at 12:51 pm on May 30, 2014: contributor
    OK, I’m closing this RP and will re-submit it from another branch…
  32. 4tar closed this on May 30, 2014

  33. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 10:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me