Using Boost Asio to do async resolving #4299

pull 4tar wants to merge 3 commits into bitcoin:master from 4tar:asio_async_resolve changing 6 files +68 −77

4tar commented at 7:36 PM on June 6, 2014: contributor
Also create a new CNetAddr() constructor from the BoostAsioToCNetAddr() method.

NOTE: This PR is related to another one (https://github.com/bitcoin/bitcoin/pull/4259).

Using boost::asio but not getaddrinfo_a() might improve the portability (I'm wondering actually...) but there are two regressions:
1. Doesn't like the getaddrinfo_a(), the asio interface provides no timed wait/loop method, so we have to explicitly sleep when waiting for the result, which is known to be not efficient; And refactor all places which need a dns resolving would be an obvious overkill;
2. Doesn't like the getaddrinfo_a(), the asio::async_resolve() is implemented by calling getaddrinfo() in a newly created thread which would block on the call, so for our original purpose, to improve the net thread responsiveness, it only helps when more than one resolving jobs are issued by one thread sequentially (e.g. dnsseed thread), because only when after a resolving job (the blocking call) the thread gets its chance to response for interruption, while the getaddrinfo_a() doesn't block and can be interrupted at any time.
Base on these two issues above, I personally prefer the original PR 4259, and I'd like to submit another PR for the change of a new CNetAddr() constructor which I think is clearer than an independent method.

Signed-off-by: Huang Le 4tarhl@gmail.com
Using Boost Asio to do async resolving
Signed-off-by: Huang Le <4tarhl@gmail.com>
797aea6f95
A null commit for triggering auto testing
Signed-off-by: Huang Le <4tarhl@gmail.com>
7c3b15975e
4tar commented at 3:03 AM on June 7, 2014: contributor

Aha, can't understand the failure log, just pushed a null commit to trigger it again for checking whether it was a random accident in the test box...
4tar commented at 3:55 AM on June 7, 2014: contributor

Great, looks I guess right:)
4tar cross-referenced this on Jun 7, 2014 from issue Add another CNetAddr constructor which accepts boost::asio::ip::address by 4tar
in src/netbase.cpp:None in 7c3b15975e outdated
```
 117 | -        if (aiTrav->ai_family == AF_INET6)
 118 | -        {
 119 | -            assert(aiTrav->ai_addrlen >= sizeof(sockaddr_in6));
 120 | -            vIP.push_back(CNetAddr(((struct sockaddr_in6*)(aiTrav->ai_addr))->sin6_addr));
 121 | +        while (!ios.poll(ec) && !ec) {
 122 | +            boost::this_thread::interruption_point();
```
laanwj commented at 8:22 AM on June 7, 2014:

The whole reason for proposing the use of boost::asio was that you don't need a polling loop. There must surely be a better way to wait?

4tar commented at 8:52 AM on June 7, 2014:

If we want to fully leverage the asio facility, then we have to refactor all caller to be working in a real async way, so when the result has not been delivered, the caller can do some other useful work without waiting, which unfortunately is not our case. Now they just want an simple but working utility function which gives back the resolving result. To refactor them all looks an overkill for our purpose.

To be more specific, there must be somewhere in the code to wait for the result by calling asio::poll()/asio::run()/etc., and we can only choose to put it inside LookupIntern(), for avoiding the overkill of refactoring of all callers.

laanwj commented at 9:21 AM on June 7, 2014:

Well, let's keep the impact of this change as small as possible. Let's not refactor anything. But a boost::asio event loop stops after all work items are completed (or at least can be configured as such) and AFAIK is naturally interruptible. Wouldn't it be enough to just replace this with ios.run() ?

4tar commented at 11:15 AM on June 7, 2014:

@laanwj Yes the asio internal event loop is naturally interruptible, the problem is that it really is only when the interruption is natural, e.g. a SIGTERM signal from the OS. But in our cases, where the event loop resides, say dnsseed thread, generally speaking, would have no chance to receive such a signal, but rely on boost::thread::interrupt() call to notify it the interruption request, so we have to explicitly place one interruption point inside the event loop.

A slightly better / or cleaner manner to avoid a explicit wait loop with ios.poll() in LookupIntern() is to place one interruption point inside the handler, while using ios.run() simply. By this way, we actually leverage the internal event loop inside ios.run() itself.

However, no matter which way we take, as the discussion below, we still have to wait for the current outstanding call to getaddrinfo() returns, that may take quite a long time depends on the network and requested dns name, e.g. on my test box, the "dnsseed.bluematt.me" would cost about 5 minutes before return nothing.
laanwj commented at 8:23 AM on June 7, 2014: member

Looks good apart from my above nit. A lot less cluttered than the previous implementation.
4tar commented at 9:26 AM on June 7, 2014: contributor
@laanwj Yes the code is simpler than calling getaddrinfo_a(), but as I've mentioned in the PR message, it doesn't fully achieve our purpose because:
1. There is a lacking of timed wait in asio model, so without refactoring the caller we cannot make it work in an really efficient way, but fall into the waiting loop and face the problem of choosing a suitable waiting interval;
2. Inside the asio::async_resolve(), a new thread would be created to call getaddrinfo(), which is blocking, and the asio::io_service have to wait the thread exit before it could be destructed, which makes the LookupIntern() uninterruptable during this waiting period, while getaddrinfo_a() doesn't have this issue.
And let's go back to the original situation, first I noticed that the dnsseed thread always prevented my bitcoind from quitting immediately, because it works in a loop to collect addresses for all dns seeds. So I made the first change to add two interruption points inside the dnsseed thread, just around the places where a call placed would go down into LookupIntern(). It does help, by reducing the waiting time from about one quarter to tens of seconds or several minutes, because every call to LookupIntern() would block for seconds or minutes depends on network and requesting name. However a several minute waiting is still not good enough so I go down to LookupIntern() and change it to use getaddrinfo_a() instead of getaddrinfo(), that change dramatically reduced the waiting to any short time interval you can specified, another benefit is that the getaddrinfo_a() provides a timed wait interface to let the caller get its expected result whenever it is available without having to wait for the fixed time interval.

As the second issue mentioned above, to the dns seed, choosing asio::async_resolve() inside LookupIntern() is the same as my first change of adding two interruption points around its call going to LookupIntern().

Base on the above reason, I'd rather suggest to pick PR 4259, although this may looks simpler or more portable...

My two cents.
sipa commented at 10:24 AM on June 7, 2014: member

I know boost::asio to judge whether we should be using it for DNS resolving, but I'd really prefer not to make netbase dependent on it.
laanwj commented at 10:45 AM on June 7, 2014: member

@sipa Agreed @4tar So boost::asio doesn't even provide a solution in this case, we can't do without the polling loop? Too bad we realize it so late, let's go back to the other pull then.
4tar commented at 11:16 AM on June 7, 2014: contributor

@laanwj @sipa Yes, the boost::asio really doesn't fit in our case, I'd still suggest we pick PR 4259 (https://github.com/bitcoin/bitcoin/pull/4259) for the solution, although it looks a littble bit complicated, or even ugly...
Insert one interruption_point inside the resolve handler to avoid the wait loop in LookupIntern()
Signed-off-by: Huang Le <4tarhl@gmail.com>
387bfd2c26
4tar commented at 11:26 AM on June 7, 2014: contributor

This latest commit doesn't fullfill our purpose as discussed above, but just push it to show that we can remove the wait loop with ios.poll() inside LookupIntern() by placing interruption_point in the resolve_handler, which makes the code a little bit cleaner and simpler...
BitcoinPullTester commented at 11:58 AM on June 7, 2014: none

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/387bfd2c268bc3d49c21019e16c3cbded67ea4a8 for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.
laanwj cross-referenced this on Jun 9, 2014 from issue Add another CNetAddr constructor which accepts boost::asio::ip::address by 4tar
4tar closed this on Jun 10, 2014
bitcoin locked this on Sep 8, 2021

Contributors

Linked (view graph)

#4259 Use async name resolving to improve net thread responsiveness #4301 Add another CNetAddr constructor which accepts boost::asio::ip::address #4303 Add another CNetAddr constructor which accepts boost::asio::ip::address