net: implement poll #14336

pull pstratem wants to merge 5 commits into bitcoin:master from pstratem:2018-09-24-socket-handler-poll changing 5 files +172 −43
  1. pstratem commented at 10:37 pm on September 26, 2018: contributor

    Implement poll() on systems which support it properly.

    This eliminates the restriction on maximum socket descriptor number.

  2. fanquake added the label P2P on Sep 26, 2018
  3. pstratem force-pushed on Sep 27, 2018
  4. pstratem force-pushed on Sep 27, 2018
  5. pstratem force-pushed on Sep 27, 2018
  6. DrahtBot commented at 2:37 am on September 27, 2018: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  7. pstratem force-pushed on Sep 27, 2018
  8. pstratem force-pushed on Sep 27, 2018
  9. pstratem force-pushed on Sep 27, 2018
  10. pstratem force-pushed on Sep 28, 2018
  11. pstratem force-pushed on Sep 28, 2018
  12. pstratem force-pushed on Sep 29, 2018
  13. pstratem commented at 0:51 am on September 29, 2018: contributor
    It seems that lots of the functional tests have races cause this pull request keeps triggering random seeming failures.
  14. in src/net.cpp:1392 in 7fd976b2f0 outdated
    1517+    if (interruptNet)
    1518+        return;
    1519+
    1520+    std::vector<struct pollfd> pollfds;
    1521+    struct pollfd pollfd;
    1522+    memset(&pollfd, 0, sizeof(struct pollfd));
    


    practicalswift commented at 7:18 am on September 29, 2018:
    struct pollfd pollfd = {} instead of memset?

    pstratem commented at 3:50 am on November 11, 2018:
    done
  15. in src/net.cpp:1470 in 7fd976b2f0 outdated
    1605-            for (CNode* pnode : vNodesCopy)
    1606-                pnode->AddRef();
    1607+            LOCK(pnode->cs_hSocket);
    1608+            if (pnode->hSocket == INVALID_SOCKET)
    1609+                continue;
    1610+            recvSet = recv_set.count(pnode->hSocket);
    


    practicalswift commented at 7:20 am on September 29, 2018:
    recvSet = recv_set.count(pnode->hSocket) > 0 to avoid implicit conversion from unsigned long to bool. Applies also to the following to lines. Explicit is better than implicit :-)

    pstratem commented at 3:50 am on November 11, 2018:
    done
  16. in src/net.cpp:1382 in 7fd976b2f0 outdated
    1507+void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    1508+{
    1509+    std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    1510+    GenerateSelectSet(recv_select_set, send_select_set, error_select_set);
    1511+
    1512+    if (recv_select_set.empty() && send_select_set.empty() and error_select_set.empty()) {
    


    practicalswift commented at 7:22 am on September 29, 2018:
    Use && instead of and :-)

    pstratem commented at 3:50 am on November 11, 2018:
    done
  17. in src/net.cpp:1424 in 7fd976b2f0 outdated
    1540+    }
    1541+
    1542+    if(poll(&pollfds[0], pollfds.size(), 50) < 0)
    1543+        LogPrint(BCLog::NET, "poll failure %d %s\n", errno, strerror(errno));
    1544+
    1545+    for (struct pollfd pollfd : pollfds) {
    


    practicalswift commented at 7:23 am on September 29, 2018:
    Use another variable name: pollfd shadows the already existing local variable with the same name :-)

    pstratem commented at 3:50 am on November 11, 2018:
    done
  18. in src/net.cpp:1326 in d9ae35592c outdated
    1346@@ -1347,8 +1347,6 @@ void CConnman::SocketHandler()
    1347 
    1348     int nSelect = select(have_fds ? hSocketMax + 1 : 0,
    1349                          &fdsetRecv, &fdsetSend, &fdsetError, &timeout);
    1350-    if (interruptNet)
    


    jimpo commented at 11:16 pm on September 29, 2018:

    Move this into SocketHandler after the SocketEvents call? Or this method cool return a bool, true if it completes, false if it is interrupted.

    NOTE: this was fixed in the commit maybe fix things?

  19. in src/net.cpp:1393 in 5ab601601d outdated
    1380+void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    1381+{
    1382+    std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    1383+    GenerateSelectSet(recv_select_set, send_select_set, error_select_set);
    1384+
    1385+    std::vector<struct pollfd> pollfds;
    


    jimpo commented at 11:25 pm on September 29, 2018:
    pollfds.reserve(recv_select_set.size() + send_select_set.size() + error_select_set.size())
  20. in src/net.cpp:1424 in 5ab601601d outdated
    1402+        pollfd.fd = hSocket;
    1403+        pollfd.events = POLLERR|POLLHUP;
    1404+        pollfds.push_back(pollfd);
    1405+    }
    1406+
    1407+    if(poll(&pollfds[0], pollfds.size(), 50) < 0)
    


    jimpo commented at 11:26 pm on September 29, 2018:
    nit: vectors now have .data(), which returns &pollfds[0] in a more object-oriented way.

    jimpo commented at 11:31 pm on September 29, 2018:
    nit: Make the 50 a named constant along with timeout.tv_usec in the select impl.
  21. in src/net.cpp:1331 in 7fd976b2f0 outdated
    1327 
    1328     std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    1329     GenerateSelectSet(recv_select_set, send_select_set, error_select_set);
    1330 
    1331+    if (recv_select_set.empty() && send_select_set.empty() && error_select_set.empty()) {
    1332+        interruptNet.sleep_for(std::chrono::milliseconds(50));
    


    jimpo commented at 11:34 pm on September 29, 2018:
    nit: Replace 50 with timeout.tv_usec/1000
  22. in src/net.cpp:1406 in 7fd976b2f0 outdated
    1365-                FD_SET(i, &fdsetRecv);
    1366-        }
    1367+        int nErr = WSAGetLastError();
    1368+        LogPrintf("socket select error %s\n", NetworkErrorString(nErr));
    1369+        for (unsigned int i = 0; i <= hSocketMax; i++)
    1370+            FD_SET(i, &fdsetRecv);
    


    jimpo commented at 11:37 pm on September 29, 2018:
    I see this is the same as the code currently, but it seems like this should be fdsetError instead of fdsetRecv, even though they are handled exactly the same.

    pstratem commented at 5:35 pm on October 1, 2018:
    I don’t think it should be there at all really. select() failures dont mean the sockets themselves have failed, but i guess it could mean we made a mistake and called select() with a closed socket?

    jimpo commented at 9:53 pm on October 2, 2018:
    Yeah, then iterating through each one and calling recv would identify which peer to disconnect. Though, if it’s fdsetError and not fdsetRecv, it probably shouldn’t disconnect when nBytes == 0.
  23. jimpo commented at 11:39 pm on September 29, 2018: contributor
    Concept ACK
  24. pstratem force-pushed on Oct 1, 2018
  25. pstratem force-pushed on Oct 1, 2018
  26. pstratem force-pushed on Oct 1, 2018
  27. laanwj commented at 7:37 pm on October 2, 2018: member
    Concept ACK, will review in detail when my brain works again.
  28. pstratem force-pushed on Oct 2, 2018
  29. theuni commented at 8:19 pm on October 2, 2018: member
    Concept ACK. Will also review shortly.
  30. pstratem force-pushed on Oct 16, 2018
  31. pstratem force-pushed on Oct 16, 2018
  32. in src/net.cpp:1386 in 5f99a7861c outdated
    1377@@ -1377,6 +1378,58 @@ void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_s
    1378         if(FD_ISSET(hSocket, &fdsetError))
    1379             error_set.insert(hSocket);
    1380 }
    1381+#else
    1382+void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    1383+{
    1384+    std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    1385+    GenerateSelectSet(recv_select_set, send_select_set, error_select_set);
    


    jamesob commented at 8:10 pm on October 16, 2018:
    Could be worth having GenerateSelectSet return a bool that indicates whether or not the sets have any content so that we can avoid the .empty() checks below (and the repetition necessary for the legacy select impl).
  33. in src/net.cpp:1413 in 5f99a7861c outdated
    1407+            pollfds.push_back(pollfd);
    1408+        }
    1409+
    1410+        for (SOCKET hSocket : error_select_set) {
    1411+            pollfd.fd = hSocket;
    1412+            pollfd.events = POLLERR|POLLHUP;
    


    jamesob commented at 8:13 pm on October 16, 2018:
    This is ignored and reported on revents in any case, no? I guess no harm in being explicit.

    pstratem commented at 11:55 pm on October 18, 2018:
    I’m reusing the struct pollfd so it’s necessary to overwrite the previous value.
  34. in src/net.cpp:1417 in 5f99a7861c outdated
    1412+            pollfd.events = POLLERR|POLLHUP;
    1413+            pollfds.push_back(pollfd);
    1414+        }
    1415+    }
    1416+
    1417+    if(poll(pollfds.data(), pollfds.size(), 50) < 0)
    


    jamesob commented at 8:19 pm on October 16, 2018:
    Should we make use of SELECT_TIMEOUT_MILLISECONDS here?

    jamesob commented at 8:19 pm on October 16, 2018:
    nit: space after if + braces

    pstratem commented at 11:55 pm on October 18, 2018:
    yeah
  35. in src/net.cpp:1311 in 5f99a7861c outdated
    1307@@ -1308,6 +1308,7 @@ void CConnman::GenerateSelectSet(std::set<SOCKET> &recv_set, std::set<SOCKET> &s
    1308     }
    1309 }
    1310 
    1311+#ifdef WIN32
    


    jamesob commented at 8:26 pm on October 16, 2018:
    Apparently poll has been flaky on previous versions of MacOS. Do we want to fall back to select on Darwin systems just to be safe?

    pstratem commented at 11:53 pm on October 18, 2018:
    I’m avoiding using poll() or select() to sleep specifically because of this class of bug.
  36. jamesob commented at 8:28 pm on October 16, 2018: member
    Some minor comments and a question about conservatively falling back to select for MacOS systems. Code looks good!
  37. in src/net.cpp:1313 in 84fae8d4cc outdated
    1312             }
    1313         }
    1314     }
    1315+}
    1316+
    1317+#ifndef HAVE_POLL
    


    jamesob commented at 8:29 pm on October 16, 2018:
    (from a previous comment on the outdated #ifdef WIN32) Apparently poll has been flaky on previous versions of MacOS. Do we want to fall back to select on Darwin systems just to be safe?

    pstratem commented at 7:53 pm on October 20, 2018:
    i’m specifically avoiding calling poll (or select) with nothing to listen for because of that class of bugs, i’ll add a comment
  38. pstratem renamed this:
    net: implement poll
    wip: net: implement poll
    on Oct 20, 2018
  39. pstratem force-pushed on Oct 25, 2018
  40. pstratem force-pushed on Oct 29, 2018
  41. pstratem force-pushed on Oct 30, 2018
  42. pstratem force-pushed on Oct 30, 2018
  43. pstratem force-pushed on Oct 30, 2018
  44. pstratem force-pushed on Oct 31, 2018
  45. pstratem force-pushed on Oct 31, 2018
  46. pstratem force-pushed on Oct 31, 2018
  47. pstratem force-pushed on Oct 31, 2018
  48. pstratem force-pushed on Oct 31, 2018
  49. pstratem force-pushed on Oct 31, 2018
  50. pstratem force-pushed on Oct 31, 2018
  51. pstratem force-pushed on Oct 31, 2018
  52. pstratem force-pushed on Oct 31, 2018
  53. pstratem renamed this:
    wip: net: implement poll
    net: implement poll
    on Oct 31, 2018
  54. in src/net.cpp:1371 in ec652e12ee outdated
    1379+    //
    1380+    // Find which sockets have data to receive
    1381+    //
    1382+    struct timeval timeout;
    1383+    timeout.tv_sec  = 0;
    1384+    timeout.tv_usec = SELECT_TIMEOUT_MILLISECONDS * 1000; // frequency to poll pnode->vSend
    


    laanwj commented at 12:32 pm on November 1, 2018:
    don’t know how long the old code waited, but isn’t 50 milliseconds a bit short? if there are no events, why not sleep as long as possible? (saving CPU cycles) can you please add a comment explaining the reasoning for this specific value

    pstratem commented at 5:52 pm on November 1, 2018:

    I carried over the existing value.

    There are two reasons for this to be a short value:

    • we cant modify the list of sockets while waiting, so the sleep time is potentially the delay before a new connection is serviced at all
    • the disconnectnodes and inactivitychecks are time based and independent of socketevents

    pstratem commented at 5:57 pm on November 1, 2018:
    i just checked and this is the value from satoshi, so im sure it was entirely arbitrarily chosen
  55. in src/net.cpp:1269 in ec652e12ee outdated
    1263@@ -1262,28 +1264,10 @@ void CConnman::InactivityCheck(CNode *pnode)
    1264     }
    1265 }
    1266 
    1267-void CConnman::SocketHandler()
    1268+bool CConnman::GenerateSelectSet(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    


    laanwj commented at 12:52 pm on November 1, 2018:
    I guess we want these sets to be sorted by fd, or at least predictably sorted, as it’s iterated over–so using set over unordered_set makes sense

    pstratem commented at 5:55 pm on November 1, 2018:
    regardless of iterating i cant imagine we’d see any change in a benchmark here

    sipa commented at 7:35 pm on November 10, 2018:
    It’s probably slightly more efficient to return sorted vectors, and use std::binary_search to look up things in it (as std::map needs one allocation per element), but agree that it is unlikely to matter here.
  56. laanwj commented at 1:10 pm on November 1, 2018: member
    some minor comments but otherwise utACK from me
  57. pstratem force-pushed on Nov 1, 2018
  58. sipa added this to the "Blockers" column in a project

  59. pstratem force-pushed on Nov 5, 2018
  60. in src/compat.h:105 in efb25c0a31 outdated
    101@@ -102,12 +102,12 @@ typedef void* sockopt_arg_type;
    102 typedef char* sockopt_arg_type;
    103 #endif
    104 
    105+#if not defined WIN32
    


    ken2812221 commented at 1:09 am on November 6, 2018:
    nit: MSVC does not like this. I would suggest to use #ifndef

    laanwj commented at 7:02 am on November 6, 2018:

    I think the official syntax, which works with any C++ compiler, is

    0#if !defined(WIN32)
    
  61. sdaftuar commented at 6:25 pm on November 6, 2018: member

    I haven’t reviewed the code at all, but this doesn’t seem to work on my mac (macos 10.13.3). It looks like I can’t maintain a connection to any peers, here’s an example of debug.log output when I grep for a single peer, to show an example:

    02018-11-06T18:20:54Z Added connection peer=1788
    12018-11-06T18:20:54Z sending version (103 bytes) peer=1788
    22018-11-06T18:20:54Z send version message: version 70015, blocks=549011, us=[::]:0, peer=1788
    32018-11-06T18:21:55Z socket no message in first 60 seconds, 0 1 from 1788
    42018-11-06T18:21:55Z disconnecting peer=1788
    52018-11-06T18:21:55Z Cleared nodestate for peer=1788
    

    This pattern is holding for all my peers: my node is making an outbound connection and then disconnecting after a minute.

  62. MarcoFalke deleted a comment on Nov 6, 2018
  63. pstratem force-pushed on Nov 6, 2018
  64. pstratem force-pushed on Nov 6, 2018
  65. pstratem commented at 11:25 pm on November 6, 2018: contributor

    @sdaftuar the last commit insures that there is a single struct pollfd per fd and adds significantly more logging

    can you give that a try, i dont have an os x system

  66. MarcoFalke added the label Needs gitian build on Nov 6, 2018
  67. sdaftuar commented at 1:54 am on November 7, 2018: member
    @pstratem New version seems to be working, I’ll let it run overnight and report back.
  68. DrahtBot commented at 5:51 pm on November 7, 2018: member

    Gitian builds for commit d864e45730be82879abe9c096c4d577975fdda7d (master):

    Gitian builds for commit 66701dc4abf7c110b7048ddd5249225f8b406ac2 (master and this pull):

  69. DrahtBot removed the label Needs gitian build on Nov 7, 2018
  70. pstratem commented at 8:48 pm on November 7, 2018: contributor
    @sdaftuar that’s interesting, seems there’s a bug in the os x implementation of poll regarding the same fd having multiple pollfd entries
  71. pstratem force-pushed on Nov 7, 2018
  72. pstratem force-pushed on Nov 8, 2018
  73. in src/net.cpp:1341 in 44f9c1ab13 outdated
    1344+        pollfds[hSocket].fd = hSocket;
    1345+        // These flags are ignored, but we set them for clarity
    1346+        pollfds[hSocket].events |= POLLERR|POLLHUP;
    1347+    }
    1348+
    1349+    std::vector<struct pollfd> vpollfds;
    


    jamesob commented at 2:42 am on November 9, 2018:
    vpollfds.reserve(pollfds.size())?

    pstratem commented at 2:45 am on November 9, 2018:
    i believe std::map::size is o(n)

    sipa commented at 2:46 am on November 9, 2018:

    pstratem commented at 6:48 pm on November 9, 2018:
    ok changing now
  74. in src/net.cpp:1343 in 44f9c1ab13 outdated
    1346+        pollfds[hSocket].events |= POLLERR|POLLHUP;
    1347+    }
    1348+
    1349+    std::vector<struct pollfd> vpollfds;
    1350+    for(auto it : pollfds)
    1351+        vpollfds.push_back(it.second);
    


    jamesob commented at 2:42 am on November 9, 2018:
    .push_back(std::move(it.second))?
  75. in src/net.cpp:1343 in 44f9c1ab13 outdated
    1345+        // These flags are ignored, but we set them for clarity
    1346+        pollfds[hSocket].events |= POLLERR|POLLHUP;
    1347+    }
    1348+
    1349+    std::vector<struct pollfd> vpollfds;
    1350+    for(auto it : pollfds)
    


    practicalswift commented at 8:33 am on November 9, 2018:
    Could use std::transform?

    jamesob commented at 3:21 pm on November 9, 2018:
    Transform looks overly complicated for this: https://stackoverflow.com/a/771482
  76. jamesob commented at 2:58 pm on November 9, 2018: member

    I’ve been trying to figure out why an earlier iteration of this PR fails on macOS (Darwin). The earlier version passes in a separate pollfd struct for each (fd, event type) pair, potentially having duplicate entries for file descriptors listed in multiple *_select_set sets. The current version deduplicates pollfd structs across file descriptors, passing a single struct per fd which consolidates all events we’re interested in.

    I did a little reading yesterday to try and figure out why the first approach works on some BSD platforms (per @sdaftuar’s testing on FreeBSD), but not macOS. Bear with me while we briefly venture into the weeds.

    Darwin’s poll() implementation basically wraps kqueue use; for each pollfd struct passed into Darwin’s poll(), 3 separate kevent_register() calls are made and then waited upon using kqueue_scan() up to the timeout.

    I found an apparent bug in Darwin’s kqueue implementation: for each kevent registered, a knote is created and enqueued on a per-fd list on the process’ file descriptor table. These knotes track which events we’re waiting on and are supposed to be unique per (fd, kevent filter) pair, where the kevent filter is either EVFILT_READ (for POLLIN) or EVFILT_WRITE (for POLLOUT), per the kqueue API. However, knote_fdfind() looks to be improperly implemented so that the first knote per fd is always returned, effectively ignoring the difference in filter. As far as I can tell, this could result in a situation where output events are ignored for an fd also being polled for input events (because the POLLIN kevent_register() call comes before the POLLOUT one in poll_nocancel()).

    Unfortunately, this doesn’t explain why POLLIN events are ignored in the previous version of this change - in @sdaftuar’s logs above and in my own testing on macOS, we always disconnect peers because we think there’s nothing to read on their sockets. This isn’t explained by the supposed bug I note above, though my confidence in macOS’ implementation of poll() isn’t very high given the complexity of the kqueue implementation backing it.

    Other versions of BSD implement poll in a much simpler way, iterating through all the pollfd structs passed in and calling a filetype-specific poll function which returns revents immediately without having to deal with kevents, kqueues, or callbacks. This delegation eventually reduces to a socket-specific poll function.

    Anyway, this is a longwinded way of saying that macOS’s poll() doesn’t inspire confidence and that we should consider disabling its use on that platform. The other BSDs look okay to me.

  77. pstratem force-pushed on Nov 9, 2018
  78. pstratem commented at 7:06 pm on November 9, 2018: contributor
    I’ve modified to not use poll() on OSX. However that should probably by an automake m4 of some kind, but I’m not sure how to do that.
  79. pstratem force-pushed on Nov 10, 2018
  80. in src/net.cpp:1325 in 4963f3110e outdated
    1328+        interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS));
    1329+        return;
    1330+    }
    1331+
    1332+    std::map<SOCKET, struct pollfd> pollfds;
    1333+    for (SOCKET hSocket : recv_select_set) {
    


    sipa commented at 7:37 pm on November 10, 2018:
    Tiny nit: variable naming style for hSocket.
  81. in src/net.cpp:1343 in 4963f3110e outdated
    1346+        pollfds[hSocket].events |= POLLERR|POLLHUP;
    1347+    }
    1348+
    1349+    std::vector<struct pollfd> vpollfds;
    1350+    vpollfds.reserve(pollfds.size());
    1351+    for(auto it : pollfds)
    


    sipa commented at 7:37 pm on November 10, 2018:
    Tiny nit: space after for (and after if further).
  82. in src/net.cpp:1353 in 4963f3110e outdated
    1356+
    1357+    if (interruptNet)
    1358+        return;
    1359+
    1360+    for (struct pollfd pollfd : vpollfds) {
    1361+        if (pollfd.revents & POLLIN)
    


    sipa commented at 7:39 pm on November 10, 2018:
    Style nit: braces whenever the next lines are indented (or put the then clause on the same line as the if).
  83. in src/net.cpp:1413 in 4963f3110e outdated
    1428-        if (!interruptNet.sleep_for(std::chrono::milliseconds(timeout.tv_usec/1000)))
    1429+        if (!interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS)))
    1430             return;
    1431     }
    1432 
    1433+    for (SOCKET hSocket : recv_select_set)
    


    sipa commented at 7:42 pm on November 10, 2018:
    Style nit: braces when indenting, and spaces after if.
  84. in src/netbase.h:20 in 4963f3110e outdated
    16@@ -17,6 +17,10 @@
    17 #include <string>
    18 #include <vector>
    19 
    20+#ifndef WIN32
    


    sipa commented at 7:43 pm on November 10, 2018:
    It doesn’t seem like the current code is using poll on Windows.

    pstratem commented at 9:36 pm on November 10, 2018:
    it isnt?

    sipa commented at 9:38 pm on November 10, 2018:
    0#if defined(__linux__)
    1#define USE_POLL
    2#endif
    

    sipa commented at 9:38 pm on November 10, 2018:
    Oh, damn, I misread ifndef.

    Empact commented at 4:21 pm on November 22, 2018:
    Seems this include should be in netbase.cpp, given there’s no publicly exposed reference.
  85. in src/net.h:35 in 4963f3110e outdated
    30@@ -31,6 +31,7 @@
    31 
    32 #ifndef WIN32
    33 #include <arpa/inet.h>
    34+#include <poll.h>
    


    sipa commented at 7:44 pm on November 10, 2018:
    It doesn’t seem like the current code is using poll on Windows.

    pstratem commented at 9:36 pm on November 10, 2018:
    it isnt?
  86. pstratem force-pushed on Nov 10, 2018
  87. pstratem force-pushed on Nov 10, 2018
  88. pstratem force-pushed on Nov 10, 2018
  89. pstratem force-pushed on Nov 10, 2018
  90. pstratem force-pushed on Nov 10, 2018
  91. pstratem force-pushed on Nov 10, 2018
  92. in src/compat.h:98 in ce79907bd7 outdated
    101@@ -102,8 +102,12 @@ typedef void* sockopt_arg_type;
    102 typedef char* sockopt_arg_type;
    103 #endif
    104 
    105+#if defined(__linux__)
    


    sipa commented at 2:54 am on November 11, 2018:
    Perhaps add a comment here listen the reason for excluding every platform not included here, so it is easy to re-asses later.
  93. in src/net.cpp:1351 in ce79907bd7 outdated
    1346+
    1347+    if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) return;
    1348+
    1349+    if (interruptNet) return;
    1350+
    1351+    for (struct pollfd pollfd : vpollfds) {
    


    sipa commented at 2:57 am on November 11, 2018:
    0net.cpp: In member function ‘void CConnman::SocketEvents(std::set<unsigned int>&, std::set<unsigned int>&, std::set<unsigned int>&)’:
    1net.cpp:1351:10: warning: types may not be defined in a for-range-declaration
    2     for (struct pollfd pollfd : vpollfds) {
    3          ^~~~~~
    
  94. sipa commented at 3:01 am on November 11, 2018: member

    Concept ACK

    You’ll want to make the maxconnections limiting code in init.cpp:

    0nMaxConnections = std::max(std::min<int>(nMaxConnections, FD_SETSIZE - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0);
    

    not depend on FD_SETSIZE when USE_POLL is enabled (just limit to 1000 or so).

  95. pstratem force-pushed on Nov 11, 2018
  96. pstratem force-pushed on Nov 11, 2018
  97. pstratem force-pushed on Nov 11, 2018
  98. pstratem force-pushed on Nov 11, 2018
  99. pstratem force-pushed on Nov 11, 2018
  100. pstratem force-pushed on Nov 11, 2018
  101. pstratem force-pushed on Nov 13, 2018
  102. pstratem force-pushed on Nov 16, 2018
  103. in src/net.h:35 in feca5fb7e4 outdated
    31@@ -32,6 +32,7 @@
    32 
    33 #ifndef WIN32
    34 #include <arpa/inet.h>
    35+#include <poll.h>
    


    Empact commented at 4:18 pm on November 22, 2018:
    Should this be behind #ifdef USE_POLL instead? To match the pollfd reference below.
  104. in src/init.cpp:960 in feca5fb7e4 outdated
    948     nFD = RaiseFileDescriptorLimit(nMaxConnections + MIN_CORE_FILEDESCRIPTORS + MAX_ADDNODE_CONNECTIONS);
    949+#ifdef USE_POLL
    950+    nMaxConnections = std::max(std::min<int>(nMaxConnections, nFD - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0);
    951+#else
    952+    nMaxConnections = std::max(std::min<int>(nMaxConnections, FD_SETSIZE - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0);
    953+#endif
    


    Empact commented at 4:30 pm on November 22, 2018:

    nit: would be nice to avoid the duplication between these lines, something like:

    0#ifdef USE_POLL
    1    int fd_max = nFD;
    2#else
    3    int fd_max = FD_SETSIZE;
    4#endif
    5    nMaxConnections = std::max(std::min<int>(nMaxConnections, fd_max - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS), 0);
    
  105. in src/net.cpp:1322 in feca5fb7e4 outdated
    1325+void CConnman::BuildPollfds(std::vector<struct pollfd> &vpollfds) {
    1326+    std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    1327+    if (!GenerateSelectSet(recv_select_set, send_select_set, error_select_set)) {
    1328+        interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS));
    1329+        return;
    1330+    }
    


    Empact commented at 4:37 pm on November 22, 2018:
    If this fails, we probs don’t want to call poll in SocketEvents, right? Seems better to return false and handle the error in the caller.
  106. in src/net.cpp:1316 in feca5fb7e4 outdated
    1320-                         &fdsetRecv, &fdsetSend, &fdsetError, &timeout);
    1321+    return !recv_set.empty() || !send_set.empty() || !error_set.empty();
    1322+}
    1323+
    1324+#ifdef USE_POLL
    1325+void CConnman::BuildPollfds(std::vector<struct pollfd> &vpollfds) {
    


    Empact commented at 4:40 pm on November 22, 2018:
    This could overload GenerateSelectSet, maybe under a different name - e.g. GenerateEventSet

    pstratem commented at 11:02 pm on November 29, 2018:
    This is really just temporary to ensure there’s no performance issues, the last commit will probably be removed actually.
  107. in src/net.cpp:1419 in feca5fb7e4 outdated
    1429+        if (!interruptNet.sleep_for(std::chrono::milliseconds(SELECT_TIMEOUT_MILLISECONDS)))
    1430             return;
    1431     }
    1432 
    1433+    for (SOCKET hSocket : recv_select_set) {
    1434+        if(FD_ISSET(hSocket, &fdsetRecv)) {
    


    Empact commented at 4:43 pm on November 22, 2018:
    There are a few whitespace issues, could do with running clang-format.
  108. pstratem force-pushed on Nov 29, 2018
  109. pstratem force-pushed on Nov 29, 2018
  110. pstratem force-pushed on Nov 29, 2018
  111. pstratem force-pushed on Nov 30, 2018
  112. Introduce and use constant SELECT_TIMEOUT_MILLISECONDS. 1e6afd0dbc
  113. Move GenerateSelectSet logic to private method.
    This separates the socket event collection logic from the logic
    deciding which events we're interested in at all.
    7e403c0ae7
  114. Move SocketEvents logic to private method.
    This separates the select() logic from the socket handling logic, setting up
    for a switch to poll().
    28211a4bc9
  115. pstratem force-pushed on Nov 30, 2018
  116. pstratem commented at 11:04 pm on November 30, 2018: contributor

    With 87 connections the effect of generating the pollfd list shows up on a profile but isn’t significant.

    The poll() implementation is on the order of 1000 ns slower per call than the select() implementation.

    I don’t think that’s significant.

  117. in src/net.cpp:1347 in 20c9601e6b outdated
    1350+    vpollfds.reserve(pollfds.size());
    1351+    for (auto it : pollfds) {
    1352+        vpollfds.push_back(std::move(it.second));
    1353+    }
    1354+
    1355+    if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) return;
    


    jamesob commented at 3:50 pm on December 3, 2018:

    (non-blocking) Wondering if it’s worth logging a negative error code here, eg

     0diff --git a/src/net.cpp b/src/net.cpp
     1index 7b8b6e5ea2..3b05aad6d1 100644
     2--- a/src/net.cpp
     3+++ b/src/net.cpp
     4@@ -1344,7 +1344,10 @@ void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_s
     5         vpollfds.push_back(std::move(it.second));
     6     }
     7 
     8-    if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) return;
     9+    if (poll(vpollfds.data(), vpollfds.size(), SELECT_TIMEOUT_MILLISECONDS) < 0) {
    10+        LogPrint(BCLog::NET, "Poll failed (errno %d)\n", WSAGetLastError());
    11+        return;
    12+    }
    13 
    14     if (interruptNet) return;
    15 
    

    pstratem commented at 5:09 pm on December 3, 2018:
    It might be, but I’d like to cleanup the error logging more in the future, the select() code randomly spams the log with EINTR errors as well, so I’d say it’s a separate issue.
  118. in src/net.cpp:1358 in 20c9601e6b outdated
    1361+        if (pollfd_entry.revents & POLLOUT)           send_set.insert(pollfd_entry.fd);
    1362+        if (pollfd_entry.revents & (POLLERR|POLLHUP)) error_set.insert(pollfd_entry.fd);
    1363+    }
    1364+}
    1365+#else
    1366+void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    


    jamesob commented at 3:56 pm on December 3, 2018:

    (non-blocking)

     0diff --git a/src/net.cpp b/src/net.cpp
     1index 7b8b6e5ea2..03d165d701 100644
     2--- a/src/net.cpp
     3+++ b/src/net.cpp
     4@@ -1355,6 +1355,7 @@ void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_s
     5     }
     6 }
     7 #else
     8+// Use select() on platforms where poll() is unavailable.
     9 void CConnman::SocketEvents(std::set<SOCKET> &recv_set, std::set<SOCKET> &send_set, std::set<SOCKET> &error_set)
    10 {
    11     std::set<SOCKET> recv_select_set, send_select_set, error_select_set;
    
  119. in src/netbase.cpp:277 in 20c9601e6b outdated
    272+                struct pollfd pollfd = {};
    273+                pollfd.fd = hSocket;
    274+                pollfd.events = POLLIN | POLLOUT;
    275+                int nRet = poll(&pollfd, 1, std::min(endTime - curTime, maxWait));
    276+#else
    277                 struct timeval tval = MillisToTimeval(std::min(endTime - curTime, maxWait));
    


    jamesob commented at 4:07 pm on December 3, 2018:
    (non-blocking) std::min(endTime - curTime, maxWait) could be deduplicated.

    pstratem commented at 7:26 pm on December 3, 2018:
    agreed
  120. in src/netbase.cpp:284 in 20c9601e6b outdated
    278                 fd_set fdset;
    279                 FD_ZERO(&fdset);
    280                 FD_SET(hSocket, &fdset);
    281                 int nRet = select(hSocket + 1, &fdset, nullptr, nullptr, &tval);
    282+#endif
    283                 if (nRet == SOCKET_ERROR) {
    


    jamesob commented at 4:12 pm on December 3, 2018:
    (non-blocking) This might be academic, but since this clause is handling both select() and poll() might be safer to say if (nRet < 0)? Though right now there’s no difference in the return value for both APIs (and I doubt that’ll ever change…).
  121. jamesob approved
  122. jamesob commented at 4:18 pm on December 3, 2018: member

    utACK https://github.com/bitcoin/bitcoin/pull/14336/commits/20c9601e6b7c4297f39de149d26292c368b63ff8

    Code looks good, feel free to ignore the small comments. I’m running tests at the moment and will report back with results within the next day - testnet happy-path behavior looks good so far.

  123. Implement poll() on systems which support it properly.
    This eliminates the restriction on maximum socket descriptor number.
    11cc491a28
  124. Increase maxconnections limit when using poll. 4927bf2f25
  125. pstratem force-pushed on Dec 3, 2018
  126. jamesob approved
  127. jamesob commented at 3:38 pm on December 4, 2018: member

    Tested ACK https://github.com/bitcoin/bitcoin/pull/14336/commits/4927bf2f257ac53569978980eaf1f61c2c6b04cc

    Tested on Linux 4.15.0-39-generic and macOS 10.12.1. Tested IBD on testnet as well as tweaking -maxconnections and observing connections with lsof -a -itcp -p $(pidof bitcoind).

    Only code change since my last utACK is a small cleanup in netbase.cpp.

    Thanks for the work here @pstratem.

  128. laanwj commented at 6:41 pm on December 13, 2018: member
    utACK 4927bf2f257ac53569978980eaf1f61c2c6b04cc
  129. laanwj merged this on Jan 2, 2019
  130. laanwj closed this on Jan 2, 2019

  131. laanwj referenced this in commit 62cf608e93 on Jan 2, 2019
  132. fanquake removed this from the "Blockers" column in a project

  133. BlockMechanic commented at 9:47 am on October 12, 2019: contributor
    Crashes on android 7 8 and 9 armv7a. May need to implement a work around
  134. codablock referenced this in commit d901baebc7 on Apr 8, 2020
  135. codablock referenced this in commit 8481d6c110 on Apr 8, 2020
  136. deadalnix referenced this in commit 7ef2a7b761 on Jun 9, 2020
  137. deadalnix referenced this in commit 716ac6e899 on Jun 9, 2020
  138. deadalnix referenced this in commit 96827c2176 on Jun 9, 2020
  139. deadalnix referenced this in commit 898802efaa on Jun 9, 2020
  140. deadalnix referenced this in commit 6f41fa19a3 on Jun 10, 2020
  141. hebasto referenced this in commit ee7891a0c4 on Sep 25, 2021
  142. laanwj referenced this in commit 8b523f2e55 on Sep 27, 2021
  143. sidhujag referenced this in commit e84c360598 on Sep 27, 2021
  144. laanwj referenced this in commit dbbb7fbcc0 on Sep 30, 2021
  145. sidhujag referenced this in commit a2b232de94 on Sep 30, 2021
  146. rebroad referenced this in commit 147cd3022a on Oct 13, 2021
  147. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me