Implement poll() on systems which support it properly.
This eliminates the restriction on maximum socket descriptor number.
Implement poll() on systems which support it properly.
This eliminates the restriction on maximum socket descriptor number.
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--174a7506f384e20aa4161008e828411d-->
No conflicts as of last run.
It seems that lots of the functional tests have races cause this pull request keeps triggering random seeming failures.
1517 | + if (interruptNet) 1518 | + return; 1519 | + 1520 | + std::vector<struct pollfd> pollfds; 1521 | + struct pollfd pollfd; 1522 | + memset(&pollfd, 0, sizeof(struct pollfd));
struct pollfd pollfd = {} instead of memset?
done
1605 | - for (CNode* pnode : vNodesCopy) 1606 | - pnode->AddRef(); 1607 | + LOCK(pnode->cs_hSocket); 1608 | + if (pnode->hSocket == INVALID_SOCKET) 1609 | + continue; 1610 | + recvSet = recv_set.count(pnode->hSocket);
recvSet = recv_set.count(pnode->hSocket) > 0 to avoid implicit conversion from unsigned long to bool. Applies also to the following to lines. Explicit is better than implicit :-)
done
1507 | +void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set) 1508 | +{ 1509 | + std::set<SOCKET> recv_select_set, send_select_set, error_select_set; 1510 | + GenerateSelectSet(recv_select_set, send_select_set, error_select_set); 1511 | + 1512 | + if (recv_select_set.empty() && send_select_set.empty() and error_select_set.empty()) {
Use && instead of and :-)
done
1540 | + } 1541 | + 1542 | + if(poll(&pollfds[0], pollfds.size(), 50) < 0) 1543 | + LogPrint(BCLog::NET, "poll failure %d %s\n", errno, strerror(errno)); 1544 | + 1545 | + for (struct pollfd pollfd : pollfds) {
Use another variable name: pollfd shadows the already existing local variable with the same name :-)
done
1346 | @@ -1347,8 +1347,6 @@ void CConnman::SocketHandler() 1347 | 1348 | int nSelect = select(have_fds ? hSocketMax + 1 : 0, 1349 | &fdsetRecv, &fdsetSend, &fdsetError, &timeout); 1350 | - if (interruptNet)
Move this into SocketHandler after the SocketEvents call? Or this method cool return a bool, true if it completes, false if it is interrupted.
NOTE: this was fixed in the commit maybe fix things?
1380 | +void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set) 1381 | +{ 1382 | + std::set<SOCKET> recv_select_set, send_select_set, error_select_set; 1383 | + GenerateSelectSet(recv_select_set, send_select_set, error_select_set); 1384 | + 1385 | + std::vector<struct pollfd> pollfds;
pollfds.reserve(recv_select_set.size() + send_select_set.size() + error_select_set.size())
1402 | + pollfd.fd = hSocket; 1403 | + pollfd.events = POLLERR|POLLHUP; 1404 | + pollfds.push_back(pollfd); 1405 | + } 1406 | + 1407 | + if(poll(&pollfds[0], pollfds.size(), 50) < 0)
nit: vectors now have .data(), which returns &pollfds[0] in a more object-oriented way.
nit: Make the 50 a named constant along with timeout.tv_usec in the select impl.
1327 | 1328 | std::set<SOCKET> recv_select_set, send_select_set, error_select_set; 1329 | GenerateSelectSet(recv_select_set, send_select_set, error_select_set); 1330 | 1331 | + if (recv_select_set.empty() && send_select_set.empty() && error_select_set.empty()) { 1332 | + interruptNet.sleep_for(std::chrono::milliseconds(50));
nit: Replace 50 with timeout.tv_usec/1000
1365 | - FD_SET(i, &fdsetRecv); 1366 | - } 1367 | + int nErr = WSAGetLastError(); 1368 | + LogPrintf("socket select error %s\n", NetworkErrorString(nErr)); 1369 | + for (unsigned int i = 0; i <= hSocketMax; i++) 1370 | + FD_SET(i, &fdsetRecv);
I see this is the same as the code currently, but it seems like this should be fdsetError instead of fdsetRecv, even though they are handled exactly the same.
I don't think it should be there at all really. select() failures dont mean the sockets themselves have failed, but i guess it could mean we made a mistake and called select() with a closed socket?
Yeah, then iterating through each one and calling recv would identify which peer to disconnect. Though, if it's fdsetError and not fdsetRecv, it probably shouldn't disconnect when nBytes == 0.
Concept ACK
Concept ACK, will review in detail when my brain works again.
Concept ACK. Will also review shortly.
1377 | @@ -1377,6 +1378,58 @@ void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_s 1378 | if(FD_ISSET(hSocket, &fdsetError)) 1379 | error_set.insert(hSocket); 1380 | } 1381 | +#else 1382 | +void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set) 1383 | +{ 1384 | + std::set<SOCKET> recv_select_set, send_select_set, error_select_set; 1385 | + GenerateSelectSet(recv_select_set, send_select_set, error_select_set);
Could be worth having GenerateSelectSet return a bool that indicates whether or not the sets have any content so that we can avoid the .empty() checks below (and the repetition necessary for the legacy select impl).
1407 | + pollfds.push_back(pollfd); 1408 | + } 1409 | + 1410 | + for (SOCKET hSocket : error_select_set) { 1411 | + pollfd.fd = hSocket; 1412 | + pollfd.events = POLLERR|POLLHUP;
This is ignored and reported on revents in any case, no? I guess no harm in being explicit.
I'm reusing the struct pollfd so it's necessary to overwrite the previous value.
1412 | + pollfd.events = POLLERR|POLLHUP; 1413 | + pollfds.push_back(pollfd); 1414 | + } 1415 | + } 1416 | + 1417 | + if(poll(pollfds.data(), pollfds.size(), 50) < 0)
Should we make use of SELECT_TIMEOUT_MILLISECONDS here?
nit: space after if + braces
yeah
1307 | @@ -1308,6 +1308,7 @@ void CConnman::GenerateSelectSet(std::set<SOCKET> &recv_set, std::set<SOCKET> &s 1308 | } 1309 | } 1310 | 1311 | +#ifdef WIN32
Apparently poll has been flaky on previous versions of MacOS. Do we want to fall back to select on Darwin systems just to be safe?
I'm avoiding using poll() or select() to sleep specifically because of this class of bug.
Some minor comments and a question about conservatively falling back to select for MacOS systems. Code looks good!
1312 | } 1313 | } 1314 | } 1315 | +} 1316 | + 1317 | +#ifndef HAVE_POLL
(from a previous comment on the outdated #ifdef WIN32) Apparently poll has been flaky on previous versions of MacOS. Do we want to fall back to select on Darwin systems just to be safe?
i'm specifically avoiding calling poll (or select) with nothing to listen for because of that class of bugs, i'll add a comment
1379 | + // 1380 | + // Find which sockets have data to receive 1381 | + // 1382 | + struct timeval timeout; 1383 | + timeout.tv_sec = 0; 1384 | + timeout.tv_usec = SELECT_TIMEOUT_MILLISECONDS * 1000; // frequency to poll pnode->vSend
don't know how long the old code waited, but isn't 50 milliseconds a bit short? if there are no events, why not sleep as long as possible? (saving CPU cycles) can you please add a comment explaining the reasoning for this specific value
I carried over the existing value.
There are two reasons for this to be a short value:
i just checked and this is the value from satoshi, so im sure it was entirely arbitrarily chosen
1263 | @@ -1262,28 +1264,10 @@ void CConnman::InactivityCheck(CNode *pnode) 1264 | } 1265 | } 1266 | 1267 | -void CConnman::SocketHandler() 1268 | +bool CConnman::GenerateSelectSet(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
I guess we want these sets to be sorted by fd, or at least predictably sorted, as it's iterated over--so using set over unordered_set makes sense
regardless of iterating i cant imagine we'd see any change in a benchmark here
It's probably slightly more efficient to return sorted vectors, and use std::binary_search to look up things in it (as std::map needs one allocation per element), but agree that it is unlikely to matter here.
some minor comments but otherwise utACK from me
101 | @@ -102,12 +102,12 @@ typedef void* sockopt_arg_type; 102 | typedef char* sockopt_arg_type; 103 | #endif 104 | 105 | +#if not defined WIN32
nit: MSVC does not like this. I would suggest to use #ifndef
I think the official syntax, which works with any C++ compiler, is
#if !defined(WIN32)
I haven't reviewed the code at all, but this doesn't seem to work on my mac (macos 10.13.3). It looks like I can't maintain a connection to any peers, here's an example of debug.log output when I grep for a single peer, to show an example:
2018-11-06T18:20:54Z Added connection peer=1788
2018-11-06T18:20:54Z sending version (103 bytes) peer=1788
2018-11-06T18:20:54Z send version message: version 70015, blocks=549011, us=[::]:0, peer=1788
2018-11-06T18:21:55Z socket no message in first 60 seconds, 0 1 from 1788
2018-11-06T18:21:55Z disconnecting peer=1788
2018-11-06T18:21:55Z Cleared nodestate for peer=1788
This pattern is holding for all my peers: my node is making an outbound connection and then disconnecting after a minute.
<!--a722867cd34abeea1fadc8d60700f111-->
Gitian builds for commit d864e45730be82879abe9c096c4d577975fdda7d (master):
4d597d216dc0ed5dae23214891153729... bitcoin-0.17.99-aarch64-linux-gnu-debug.tar.gz574346fdb743373d59742354b44fa6eb... bitcoin-0.17.99-aarch64-linux-gnu.tar.gz917414332974c093e1fb03d36d35fdb5... bitcoin-0.17.99-arm-linux-gnueabihf-debug.tar.gz4372cf9de88a47095cb0866dac04397f... bitcoin-0.17.99-arm-linux-gnueabihf.tar.gz64c30780e32ebbd18ed91f046108b1c9... bitcoin-0.17.99-i686-pc-linux-gnu-debug.tar.gz716ded863700852fd791ce342bd7eb64... bitcoin-0.17.99-i686-pc-linux-gnu.tar.gz9c19a015c3539ced19dc25eeee429122... bitcoin-0.17.99-osx-unsigned.dmgea03114a99a8798350c98a2add480a4c... bitcoin-0.17.99-osx64.tar.gza00c10b250688578673cb6feb1c9c5a5... bitcoin-0.17.99-riscv64-linux-gnu-debug.tar.gz0f46798a6847cf75e15ffc306fc24a6c... bitcoin-0.17.99-riscv64-linux-gnu.tar.gz419ec7a32ed070ea385fc391438333e6... bitcoin-0.17.99-win32-debug.zip38cac7b13da79148605e1ffa2d965b88... bitcoin-0.17.99-win32-setup-unsigned.exeedfde8af30b58d47b04c871be6e147b3... bitcoin-0.17.99-win32.zipccdaa8ed341aaeb4b9fbfd6b257d45c2... bitcoin-0.17.99-win64-debug.zip41b211f9af5270a93241c87e01eaacfb... bitcoin-0.17.99-win64-setup-unsigned.exe9aa1d2e836bd5cbf67430b9b83095364... bitcoin-0.17.99-win64.zipdb115e1ffd79c5812374eea03f815a17... bitcoin-0.17.99-x86_64-linux-gnu-debug.tar.gz2dd403a3fd11d031ea1bb55b87ba99ad... bitcoin-0.17.99-x86_64-linux-gnu.tar.gz3e8f143e26c8add3da4c5948fa51a12e... bitcoin-0.17.99.tar.gzf6936b93cd9d332d2f410c19cce3dada... bitcoin-linux-0.18-res.yml6c37e9b444f5fb010ebeeb0f06da281e... bitcoin-linux-build.log18db39e5556a3cd2361dddce08a018b9... bitcoin-osx-0.18-res.yml96cc3b3dd38c2b20a4f100bd793f3d4e... bitcoin-osx-build.loga559b90ddaaf2302da8c81742383fbd9... bitcoin-win-0.18-res.yml65fe18f73672c6fa6fec1122b1c1876b... bitcoin-win-build.logGitian builds for commit 66701dc4abf7c110b7048ddd5249225f8b406ac2 (master and this pull):
99e6aa42d98f3e2aec6718af6dbafa67... bitcoin-0.17.99-aarch64-linux-gnu-debug.tar.gz31d9e44d80e9f90b64c8763e6bcb31a1... bitcoin-0.17.99-aarch64-linux-gnu.tar.gze35d67fe58402668be22ec1377773eb7... bitcoin-0.17.99-arm-linux-gnueabihf-debug.tar.gzc68fe8e761fae31932d89783ffe5fba1... bitcoin-0.17.99-arm-linux-gnueabihf.tar.gz51471edabd41bb7f256b4acd37293bde... bitcoin-0.17.99-i686-pc-linux-gnu-debug.tar.gzc6a18bb020a6f9e3edbaca7c37881354... bitcoin-0.17.99-i686-pc-linux-gnu.tar.gzef54fa3a12a8c87c82965e7e2a4e0321... bitcoin-0.17.99-osx-unsigned.dmge365b13b2a426dac778b4270b86bb843... bitcoin-0.17.99-osx64.tar.gza1a5f624c0316d0ae630d39f5261af8e... bitcoin-0.17.99-riscv64-linux-gnu-debug.tar.gzdd5546cbce433b9b3a1d21dc3cf23cc1... bitcoin-0.17.99-riscv64-linux-gnu.tar.gz046dd6c2667c6476a717b83f89b7dc90... bitcoin-0.17.99-win32-debug.zip6396d563a21f3fad234a962e9dfcf26b... bitcoin-0.17.99-win32-setup-unsigned.exe56f75690f0efb752506b15280ca6a63e... bitcoin-0.17.99-win32.zipa1161a97597027d3b18bc2a12bd7a450... bitcoin-0.17.99-win64-debug.zip3b530f6a0c4f4e5dab78c4cb0de894d8... bitcoin-0.17.99-win64-setup-unsigned.exe61b470fb10300e3fecfbf441302b54dc... bitcoin-0.17.99-win64.zipad5034a3e08b08ce12763b7b2d3b4e05... bitcoin-0.17.99-x86_64-linux-gnu-debug.tar.gz9b7ed96dc0c838b916d3eba6bcd8ff9a... bitcoin-0.17.99-x86_64-linux-gnu.tar.gz9c1a23e6aa3add785bc3a109dd78c5a4... bitcoin-0.17.99.tar.gz3f659f9aed59eff53e4837b1849e1a3f... bitcoin-linux-0.18-res.yml42d88d5ab445f5a854f50e54defe9eda... bitcoin-linux-build.log795ad5e1c9832b5eb0d6986df608f23a... bitcoin-osx-0.18-res.yml03640b8b01160cd1b400db30a302a544... bitcoin-osx-build.log784ac7bd9e2c03acfd0abe29e5855fb9... bitcoin-win-0.18-res.yml8931546da31fb2709126b32dd191e55d... bitcoin-win-build.log1344 | + pollfds[hSocket].fd = hSocket; 1345 | + // These flags are ignored, but we set them for clarity 1346 | + pollfds[hSocket].events |= POLLERR|POLLHUP; 1347 | + } 1348 | + 1349 | + std::vector<struct pollfd> vpollfds;
vpollfds.reserve(pollfds.size())?
i believe std::map::size is o(n)
Nope, guaranteed to be O(1). https://en.cppreference.com/w/cpp/container/map/size
ok changing now
1346 | + pollfds[hSocket].events |= POLLERR|POLLHUP; 1347 | + } 1348 | + 1349 | + std::vector<struct pollfd> vpollfds; 1350 | + for(auto it : pollfds) 1351 | + vpollfds.push_back(it.second);
.push_back(std::move(it.second))?
1345 | + // These flags are ignored, but we set them for clarity 1346 | + pollfds[hSocket].events |= POLLERR|POLLHUP; 1347 | + } 1348 | + 1349 | + std::vector<struct pollfd> vpollfds; 1350 | + for(auto it : pollfds)
Could use std::transform?
Transform looks overly complicated for this: https://stackoverflow.com/a/771482
I've been trying to figure out why an earlier iteration of this PR fails on macOS (Darwin). The earlier version passes in a separate pollfd struct for each (fd, event type) pair, potentially having duplicate entries for file descriptors listed in multiple *_select_set sets. The current version deduplicates pollfd structs across file descriptors, passing a single struct per fd which consolidates all events we're interested in.
I did a little reading yesterday to try and figure out why the first approach works on some BSD platforms (per @sdaftuar's testing on FreeBSD), but not macOS. Bear with me while we briefly venture into the weeds.
Darwin's poll() implementation basically wraps kqueue use; for each pollfd struct passed into Darwin's poll(), 3 separate kevent_register() calls are made and then waited upon using kqueue_scan() up to the timeout.
I found an apparent bug in Darwin's kqueue implementation: for each kevent registered, a knote is created and enqueued on a per-fd list on the process' file descriptor table. These knotes track which events we're waiting on and are supposed to be unique per (fd, kevent filter) pair, where the kevent filter is either EVFILT_READ (for POLLIN) or EVFILT_WRITE (for POLLOUT), per the kqueue API. However, knote_fdfind() looks to be improperly implemented so that the first knote per fd is always returned, effectively ignoring the difference in filter. As far as I can tell, this could result in a situation where output events are ignored for an fd also being polled for input events (because the POLLIN kevent_register() call comes before the POLLOUT one in poll_nocancel()).
Unfortunately, this doesn't explain why POLLIN events are ignored in the previous version of this change - in @sdaftuar's logs above and in my own testing on macOS, we always disconnect peers because we think there's nothing to read on their sockets. This isn't explained by the supposed bug I note above, though my confidence in macOS' implementation of poll() isn't very high given the complexity of the kqueue implementation backing it.
Other versions of BSD implement poll in a much simpler way, iterating through all the pollfd structs passed in and calling a filetype-specific poll function which returns revents immediately without having to deal with kevents, kqueues, or callbacks. This delegation eventually reduces to a socket-specific poll function.
Anyway, this is a longwinded way of saying that macOS's poll() doesn't inspire confidence and that we should consider disabling its use on that platform. The other BSDs look okay to me.
I've modified to not use poll() on OSX. However that should probably by an automake m4 of some kind, but I'm not sure how to do that.
1328 | + interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS)); 1329 | + return; 1330 | + } 1331 | + 1332 | + std::map<SOCKET, struct pollfd> pollfds; 1333 | + for (SOCKET hSocket : recv_select_set) {
Tiny nit: variable naming style for hSocket.
1346 | + pollfds[hSocket].events |= POLLERR|POLLHUP; 1347 | + } 1348 | + 1349 | + std::vector<struct pollfd> vpollfds; 1350 | + vpollfds.reserve(pollfds.size()); 1351 | + for(auto it : pollfds)
Tiny nit: space after for (and after if further).
1356 | + 1357 | + if (interruptNet) 1358 | + return; 1359 | + 1360 | + for (struct pollfd pollfd : vpollfds) { 1361 | + if (pollfd.revents & POLLIN)
Style nit: braces whenever the next lines are indented (or put the then clause on the same line as the if).
1428 | - if (!interruptNet.sleep_for(std::chrono::milliseconds(timeout.tv_usec/1000))) 1429 | + if (!interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS))) 1430 | return; 1431 | } 1432 | 1433 | + for (SOCKET hSocket : recv_select_set)
Style nit: braces when indenting, and spaces after if.
16 | @@ -17,6 +17,10 @@ 17 | #include <string> 18 | #include <vector> 19 | 20 | +#ifndef WIN32
It doesn't seem like the current code is using poll on Windows.
it isnt?
#if defined(__linux__)
#define USE_POLL
#endif
Oh, damn, I misread ifndef.
Seems this include should be in netbase.cpp, given there's no publicly exposed reference.
30 | @@ -31,6 +31,7 @@ 31 | 32 | #ifndef WIN32 33 | #include <arpa/inet.h> 34 | +#include <poll.h>
It doesn't seem like the current code is using poll on Windows.
it isnt?
101 | @@ -102,8 +102,12 @@ typedef void* sockopt_arg_type; 102 | typedef char* sockopt_arg_type; 103 | #endif 104 | 105 | +#if defined(__linux__)
Perhaps add a comment here listen the reason for excluding every platform not included here, so it is easy to re-asses later.
1346 | + 1347 | + if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) return; 1348 | + 1349 | + if (interruptNet) return; 1350 | + 1351 | + for (struct pollfd pollfd : vpollfds) {
net.cpp: In member function ‘void CConnman::SocketEvents(std::set<unsigned int>&, std::set<unsigned int>&, std::set<unsigned int>&)’:
net.cpp:1351:10: warning: types may not be defined in a for-range-declaration
for (struct pollfd pollfd : vpollfds) {
^~~~~~
Concept ACK
You'll want to make the maxconnections limiting code in init.cpp:
nMaxConnections = std::max(std::min<int>(nMaxConnections, FD_SETSIZE - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0);
not depend on FD_SETSIZE when USE_POLL is enabled (just limit to 1000 or so).
31 | @@ -32,6 +32,7 @@ 32 | 33 | #ifndef WIN32 34 | #include <arpa/inet.h> 35 | +#include <poll.h>
Should this be behind #ifdef USE_POLL instead? To match the pollfd reference below.
948 | nFD = RaiseFileDescriptorLimit(nMaxConnections + MIN_CORE_FILEDESCRIPTORS + MAX_ADDNODE_CONNECTIONS); 949 | +#ifdef USE_POLL 950 | + nMaxConnections = std::max(std::min<int>(nMaxConnections, nFD - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0); 951 | +#else 952 | + nMaxConnections = std::max(std::min<int>(nMaxConnections, FD_SETSIZE - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0); 953 | +#endif
nit: would be nice to avoid the duplication between these lines, something like:
#ifdef USE_POLL
int fd_max = nFD;
#else
int fd_max = FD_SETSIZE;
#endif
nMaxConnections = std::max(std::min<int>(nMaxConnections, fd_max - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0);
1325 | +void CConnman::BuildPollfds(std::vector<struct pollfd> &vpollfds) { 1326 | + std::set<SOCKET> recv_select_set, send_select_set, error_select_set; 1327 | + if (!GenerateSelectSet(recv_select_set, send_select_set, error_select_set)) { 1328 | + interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS)); 1329 | + return; 1330 | + }
If this fails, we probs don't want to call poll in SocketEvents, right? Seems better to return false and handle the error in the caller.
1320 | - &fdsetRecv, &fdsetSend, &fdsetError, &timeout); 1321 | + return !recv_set.empty() || !send_set.empty() || !error_set.empty(); 1322 | +} 1323 | + 1324 | +#ifdef USE_POLL 1325 | +void CConnman::BuildPollfds(std::vector<struct pollfd> &vpollfds) {
This could overload GenerateSelectSet, maybe under a different name - e.g. GenerateEventSet
This is really just temporary to ensure there's no performance issues, the last commit will probably be removed actually.
1429 | + if (!interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS))) 1430 | return; 1431 | } 1432 | 1433 | + for (SOCKET hSocket : recv_select_set) { 1434 | + if(FD_ISSET(hSocket, &fdsetRecv)) {
There are a few whitespace issues, could do with running clang-format.
This separates the socket event collection logic from the logic
deciding which events we're interested in at all.
This separates the select() logic from the socket handling logic, setting up
for a switch to poll().
With 87 connections the effect of generating the pollfd list shows up on a profile but isn't significant.
The poll() implementation is on the order of 1000 ns slower per call than the select() implementation.
I don't think that's significant.
1350 | + vpollfds.reserve(pollfds.size()); 1351 | + for (auto it : pollfds) { 1352 | + vpollfds.push_back(std::move(it.second)); 1353 | + } 1354 | + 1355 | + if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) return;
(non-blocking) Wondering if it's worth logging a negative error code here, eg
diff --git a/src/net.cpp b/src/net.cpp
index 7b8b6e5ea2..3b05aad6d1 100644
--- a/src/net.cpp
+++ b/src/net.cpp
@@ -1344,7 +1344,10 @@ void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_s
vpollfds.push_back(std::move(it.second));
}
- if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) return;
+ if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) {
+ LogPrint(BCLog::NET, "Poll failed (errno %d)\n", WSAGetLastError());
+ return;
+ }
if (interruptNet) return;
It might be, but I'd like to cleanup the error logging more in the future, the select() code randomly spams the log with EINTR errors as well, so I'd say it's a separate issue.
1361 | + if (pollfd_entry.revents & POLLOUT) send_set.insert(pollfd_entry.fd); 1362 | + if (pollfd_entry.revents & (POLLERR|POLLHUP)) error_set.insert(pollfd_entry.fd); 1363 | + } 1364 | +} 1365 | +#else 1366 | +void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
(non-blocking)
diff --git a/src/net.cpp b/src/net.cpp
index 7b8b6e5ea2..03d165d701 100644
--- a/src/net.cpp
+++ b/src/net.cpp
@@ -1355,6 +1355,7 @@ void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_s
}
}
#else
+// Use select() on platforms where poll() is unavailable.
void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
{
std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
272 | + struct pollfd pollfd = {}; 273 | + pollfd.fd = hSocket; 274 | + pollfd.events = POLLIN | POLLOUT; 275 | + int nRet = poll(&pollfd, 1, std::min(endTime - curTime, maxWait)); 276 | +#else 277 | struct timeval tval = MillisToTimeval(std::min(endTime - curTime, maxWait));
(non-blocking) std::min(endTime - curTime, maxWait) could be deduplicated.
agreed
278 | fd_set fdset;
279 | FD_ZERO(&fdset);
280 | FD_SET(hSocket, &fdset);
281 | int nRet = select(hSocket + 1, &fdset, nullptr, nullptr, &tval);
282 | +#endif
283 | if (nRet == SOCKET_ERROR) {
(non-blocking) This might be academic, but since this clause is handling both select() and poll() might be safer to say if (nRet < 0)? Though right now there's no difference in the return value for both APIs (and I doubt that'll ever change...).
utACK https://github.com/bitcoin/bitcoin/pull/14336/commits/20c9601e6b7c4297f39de149d26292c368b63ff8
Code looks good, feel free to ignore the small comments. I'm running tests at the moment and will report back with results within the next day - testnet happy-path behavior looks good so far.
This eliminates the restriction on maximum socket descriptor number.
Tested ACK https://github.com/bitcoin/bitcoin/pull/14336/commits/4927bf2f257ac53569978980eaf1f61c2c6b04cc
Tested on Linux 4.15.0-39-generic and macOS 10.12.1. Tested IBD on testnet as well as tweaking -maxconnections and observing connections with lsof -a -itcp -p $(pidof bitcoind).
Only code change since my last utACK is a small cleanup in netbase.cpp.
Thanks for the work here @pstratem.
utACK 4927bf2f257ac53569978980eaf1f61c2c6b04cc
Crashes on android 7 8 and 9 armv7a. May need to implement a work around