Net: Massive speedup. Net locks overhaul #9441

pull theuni wants to merge 16 commits into bitcoin:master from theuni:connman-locks changing 4 files +107 −125
  1. theuni commented at 8:07 am on December 29, 2016: member

    Depends on (includes) #9289. This is what I ultimately set out to fix with the net refactor.

    In my (short) tests, it cuts network latency to ~50%. In some cases, more like 30%.

    Test method: 1 fresh node (macbook) connected to a single other node (desktop). Testnet. Running until synced to block 10000. Results:

    0new = patched with this PR.
    1old = running [#9289](/bitcoin-bitcoin/9289/).
    2client / server
    3old / old: 100000 in 8:05
    4new / old: 100000 in 3:24
    5new / new: 100000 in 2:16
    

    The results are very reproducible, always within a few seconds. Not only is it a nice improvement for the fresh node, but it compounds when its peer is running the updated code as well.

    I had hoped to have the abstraction complete in time for 0.14, but it looks like that’s unlikely at this point. For reference, it would look something more like https://github.com/theuni/bitcoin/commit/1a6b10aea129ac363727c2d68fae809c2861e4da.

    Anyway, this is an attempt to fix the actual issue in time for 0.14, and putting off the last bit of refactor until after that. This addresses the issue observed in #9415, but also cleans up the nasty locking issues.

    I labeled this WIP because there are probably still several racy bits. Quite a bit of test-writing remains.

    See the individual commits for the details. tl;dr: We currently either process a peer’s message or read from their socket, but never both simultaneously. The changes here remove that restriction.

  2. fanquake added the label Refactoring on Dec 29, 2016
  3. fanquake added this to the milestone 0.14.0 on Dec 29, 2016
  4. fanquake added the label P2P on Dec 29, 2016
  5. in src/net_processing.cpp: in 65e916c499 outdated
    2452+    if (pfrom->fDisconnect)
    2453+        return false;
    2454 
    2455-    std::deque<CNetMessage>::iterator it = pfrom->vRecvMsg.begin();
    2456-    while (!pfrom->fDisconnect && it != pfrom->vRecvMsg.end()) {
    2457         // Don't bother if send buffer is too full to respond anyway
    


    rebroad commented at 4:37 pm on December 29, 2016:
    I’m not sure if github is messing with the formatting, but this line looks unnecessarily indented.

    theuni commented at 4:47 pm on December 29, 2016:
    Yes, the indentation was just kept to avoid creating a huge whitespace diff. This can be cleaned up in a move-only commit as a next step.
  6. theuni force-pushed on Dec 29, 2016
  7. theuni commented at 2:09 am on December 31, 2016: member
    @sipa I’ve spent all day on changing this around some and breaking this up into logical commits in order to satisfy myself (and i hope others) that it’s safe. I’ll push up a fresh version in a few min, and drop the WIP tag.
  8. theuni force-pushed on Dec 31, 2016
  9. theuni renamed this:
    WIP: Massive net speedup. Net locks overhaul.
    Net: Massive speedup. Net locks overhaul
    on Dec 31, 2016
  10. theuni commented at 4:39 am on December 31, 2016: member

    Updated. Don’t let the number of commits scare you off, the diffstat isn’t too bad.

    I attempted to break this up into small/simple commits, in order to slowly remove the need for cs_vRecvMsg. The last commit actually removes it.

    I’m satisfied that this shouldn’t be introducing any new races, as only a handful of vars were actually touched on the SocketHandler thread.

    Edit: time-ordered.

  11. theuni force-pushed on Dec 31, 2016
  12. in src/net.cpp: in 3ab40f619d outdated
    1242@@ -1243,9 +1243,10 @@ void CConnman::ThreadSocketHandler()
    1243                             bool notify = false;
    1244                             if (!pnode->ReceiveMsgBytes(pchBuf, nBytes, notify))
    1245                                 pnode->CloseSocketDisconnect();
    1246+                            RecordBytesRecv(nBytes);
    1247                             if(notify)
    1248                                 condMsgProc.notify_one();
    1249-                            RecordBytesRecv(nBytes);
    1250+                            }
    


    theuni commented at 7:39 am on December 31, 2016:
    Whoops, rebase gone bad here. It ends up right in the end. Will fix.
  13. in src/net_processing.cpp: in 9959988414 outdated
    2449-    if (!pfrom->vRecvGetData.empty()) return fOk;
    2450+    if (!pfrom->vRecvGetData.empty()) return true;
    2451 
    2452     auto it = pfrom->vRecvMsg.begin();
    2453-    while (!pfrom->fDisconnect && it != pfrom->vRecvMsg.end()) {
    2454+    if (!pfrom->fDisconnect && it != pfrom->vRecvMsg.end()) {
    


    theuni commented at 7:57 am on December 31, 2016:

    Need to clarify in the commit message: It was almost always the case that only one message was processed in this while loop. See the break at line 2538.

    This commit changes the behavior so that it’s always one message processed per-loop. The only messages that show different behavior are the ones that used to “continue” the loop. I think it’s perfectly reasonable to skip to the next node for processing in that case.


    sipa commented at 8:55 pm on January 5, 2017:
    Can you clarify what the rationale is for switching from a ProcessMessages that processes multiple messages to just a single one? I’m not opposed to the change, but I’d like to understand what the reason for the change is.

    TheBlueMatt commented at 9:10 pm on January 5, 2017:
    Point is this is not a change. See #9441 (comment)
  14. dcousens commented at 0:18 am on January 3, 2017: contributor
    @theuni is this worth testing now?
  15. theuni commented at 4:16 am on January 3, 2017: member
    @dcousens Very much so!
  16. in src/net.cpp: in b1b6772240 outdated
    680+    }
    681+
    682+    int64_t nTime = GetTimeMicros();
    683+    LOCK(cs_vRecvMsg);
    684+    nRecvBytes += nBytes;
    685+    nLastRecv = nTime * 1000;
    


    TheBlueMatt commented at 10:48 am on January 3, 2017:
    I think you meant /, not * here?

    theuni commented at 3:11 pm on January 3, 2017:
    Heh, sure did
  17. in src/net.cpp: in b1b6772240 outdated
    679+            complete = true;
    680+    }
    681+
    682+    int64_t nTime = GetTimeMicros();
    683+    LOCK(cs_vRecvMsg);
    684+    nRecvBytes += nBytes;
    


    TheBlueMatt commented at 10:51 am on January 3, 2017:
    It might be nice in the RPC for mapRecvBytesPerMsgCmd to add up to nRecvBytes (ie to increment this in the loop per-msg instead of after receiving (which would also mean no api change)).

    theuni commented at 3:13 pm on January 3, 2017:
    Sure, that just means more locking. Though this is completely uncontested now, so that’s not really a problem.
  18. TheBlueMatt commented at 11:11 am on January 3, 2017: member

    See IRC discussion:

     0<BlueMatt> cfields: hmmm, regarding [#9441](/bitcoin-bitcoin/9441/), do we really want to run the entire ProcessMessages loop (calling SendMessages potentially umpteen times) just to process multiple messages from the same node?
     1<BlueMatt> (ie remove the loop inside ProcessMessages and move it to ThreadProcessMessages)
     2<sipa> BlueMatt: i was wondering about that too
     3<sipa> BlueMatt: but there is hardly any (contentious) locking going on anymore in between
     4<BlueMatt> yea, but SendMessages.....
     5<sipa> Ah.
     6<sipa> i hadn't considered that
     7<sipa> that defeats the purpose
     8<sipa> hmm, i was about to say that it negatively impacts batching of invs and addrs
     9<sipa> but since we have explicit random delays for those, i don't think that's really an issue anymore
    10<BlueMatt> yea, I dont think it breaks anything
    11<BlueMatt> just repeatedly calls SendMessages to do nothing
    12<sipa> oh, it won't break anything
    
  19. theuni commented at 3:19 pm on January 3, 2017: member

    @TheBlueMatt If I understand you correctly, as mentioned in the summary, that’s the part that was cut out here for simplicity. The next set of changes would combine the loops and move them into net_processing. See here for how it looked in a previous iteration: https://github.com/theuni/bitcoin/commit/1a6b10aea129ac363727c2d68fae809c2861e4da#diff-eff7adeaec73a769788bb78858815c91R271

    I’d be happy to add that here, but I’m afraid it adds a significant review burden to this PR.

  20. theuni commented at 4:15 pm on January 3, 2017: member

    Also from IRC:

    0<BlueMatt> I assume you didnt touch cs_vSend due to https://github.com/bitcoin/bitcoin/pull/9419/commits/c214d120a363a05ba9afdccff6b4bda6e29ae7c4, cfields?
    1<cfields> BlueMatt: yes, cs_vSend wasn't touched because of your PR. I've just been operating under the assumption that the lock around SendMessages will be removed by one PR or another.
    
  21. theuni commented at 6:27 pm on January 3, 2017: member

    Digging further into the IRC conversation above, I’d like to clarify that the interaction here between ProcessMessages and SendMessages is not changed substantially in this PR. @TheBlueMatt had missed a subtle part of the current behavior, and maybe @sipa too, so I’d just like to explicitly point out the current break here: https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L2543, added in #3180

    For the sake of processing in a round-robin fashion, we currently end up running through SendMessages multiple times while draining a node’s received message queue. That behavior is far from ideal, and needs to be addressed, but is not changed here*. I have plans to work on this in next steps, but not for 0.14.

    *It’s only changed in the case of an invalid header, or bad message checksum.

  22. gmaxwell commented at 3:15 am on January 4, 2017: contributor

    cfields: hmmm, regarding #9441, do we really want to run the entire ProcessMessages loop (calling SendMessages potentially umpteen times) just to process multiple messages from the same node? (ie remove the loop inside ProcessMessages and move it to ThreadProcessMessages)

    Clearly people weren’t reading PRs on this or you would have noticed that it only processes one at a time right now. I actually fixed it to process as many as it could until it encountered lock contention, then specifically pinged people for input on the effect. After no response, I unfixed it because of concern about the latency impact. Handling multiple messages at a time is good, if they’re fast ones…

  23. sipa commented at 3:21 am on January 4, 2017: member
    I certainly noticed it only processed one message at a time. I missed that it calls SendMessages between each.
  24. net: fix typo causing the wrong receive buffer size
    Surprisingly this hasn't been causing me any issues while testing, probably
    because it requires lots of large blocks to be flying around.
    
    Send/Recv corks need tests!
    53ad9a133a
  25. net: make vRecvMsg a list so that we can use splice() e5bcd9c84f
  26. net: make GetReceiveFloodSize public
    This will be needed so that the message processor can cork incoming messages
    5b4a8ac6d6
  27. net: only disconnect if fDisconnect has been set
    These conditions are problematic to check without locking, and we shouldn't be
    relying on the refcount to disconnect.
    f6315e07f9
  28. net: wait until the node is destroyed to delete its recv buffer
    when vRecvMsg becomes a private buffer, it won't make sense to allow other
    threads to mess with it anymore.
    60425870d7
  29. sipa commented at 2:42 pm on January 4, 2017: member
    Needs rebase.
  30. net: remove redundant max sendbuffer size check
    This is left-over from before there was proper accounting. Hitting 2x the
    sendbuffer size should not be possible.
    0e973d970a
  31. net: set message deserialization version when it's actually time to deserialize
    We'll soon no longer have access to vRecvMsg, and this is more intuitive anyway.
    56212e20ac
  32. theuni force-pushed on Jan 4, 2017
  33. theuni commented at 3:25 pm on January 4, 2017: member

    Rebased, and I believe I addressed all comments so far.

    I did a full mainnet sync last night with these (pre-rebase) changes, with no issues. I had extra logging to observe the recv pausing behavior, and verified that it’s working as intended. It is paused quite frequently during IBD. Obviously that’s intended, as it signals for the peer to slow down, but it may be worth discussing the consequences of a higher (or lower!) default for -maxreceivebuffer now that our recv is no longer blocked by processing. Out of scope for this PR, though.

  34. in src/net.cpp: in f6315e07f9 outdated
    1050@@ -1051,8 +1051,7 @@ void CConnman::ThreadSocketHandler()
    1051             std::vector<CNode*> vNodesCopy = vNodes;
    1052             BOOST_FOREACH(CNode* pnode, vNodesCopy)
    1053             {
    1054-                if (pnode->fDisconnect ||
    1055-                    (pnode->GetRefCount() <= 0 && pnode->vRecvMsg.empty() && pnode->nSendSize == 0))
    


    TheBlueMatt commented at 9:38 pm on January 4, 2017:
    Is it just me, or were these always impossible to hit? Both ConnectNode and AcceptConnection do a AddRef() which is only countered by the Release a few lines further down here.

    TheBlueMatt commented at 9:39 pm on January 4, 2017:
    (obviously not an issue, just checking my logic).
  35. theuni commented at 9:47 pm on January 4, 2017: member
    I believe you’re correct, that’s my reading of it as well. Regardless, I wasn’t able to track down a path where fDisconnect wouldn’t be set as necessary.
  36. in src/net.cpp: in 1c779a8cfe outdated
    648 bool CNode::ReceiveMsgBytes(const char *pch, unsigned int nBytes, bool& complete)
    649 {
    650+    LOCK(cs_vRecvMsg);
    651     complete = false;
    652+    int64_t nTime = GetTimeMicros();
    653+    nLastRecv = nTime / 1000;
    


    TheBlueMatt commented at 10:03 pm on January 4, 2017:
    I think you actually meant / 1000 / 1000 :p

    theuni commented at 8:00 pm on January 5, 2017:
    I completely missed that this was a full-second value. Thanks for catching this (twice!). Will fix.
  37. in src/net.cpp: in 4157c99cd8 outdated
    1877@@ -1882,10 +1878,9 @@ void CConnman::ThreadMessageHandler()
    1878                 pnode->Release();
    1879         }
    1880 
    1881-        if (fSleep) {
    1882-            std::unique_lock<std::mutex> lock(mutexMsgProc);
    1883-            condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100));
    1884-        }
    1885+        std::unique_lock<std::mutex> lock(mutexMsgProc);
    1886+        condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100), [this, &fMoreWork] { return fMoreWork || fMsgProcWake; });
    


    sipa commented at 8:50 pm on January 5, 2017:

    It seems that during this wait, fMoreWork will never change (and it isn’t covered by the lock either), so this line could instead be written as

    0if (!fMoreWork) {
    1    condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100), [this] { return fMsgProcWake; });
    2}
    

    theuni commented at 9:21 pm on January 5, 2017:
    Yep. Will fix.
  38. in src/net.cpp: in b5ea4f2199 outdated
    1238@@ -1239,8 +1239,18 @@ void CConnman::ThreadSocketHandler()
    1239                             if (!pnode->ReceiveMsgBytes(pchBuf, nBytes, notify))
    1240                                 pnode->CloseSocketDisconnect();
    1241                             RecordBytesRecv(nBytes);
    1242-                            if (notify)
    1243+                            if (notify) {
    1244+                                auto it(pnode->vRecvMsg.begin());
    1245+                                for (; it != pnode->vRecvMsg.end(); ++it) {
    


    sipa commented at 8:58 pm on January 5, 2017:

    I find it generally more readable to write something like this as

    0while (it != pnode->vRecvMsg.end() && it->complete()) ++it;
    

    Also, isn’t iterating backwards more efficient?


    theuni commented at 9:19 pm on January 5, 2017:

    Sure, will change it to a while.

    Iterating backwards would be preferable, but we end up hitting each msg anyway later (to add up message sizes) in a later commit, so i didn’t bother (see https://github.com/bitcoin/bitcoin/pull/9441/commits/a109f7bd1801392bf99994a7165d495200156c4a#diff-9a82240fe7dfe86564178691cc57f2f1R1246). The sizes are added ahead of time in order to avoid iterating the list under cs_vProcessMsg.

  39. in src/net_processing.cpp: in 4157c99cd8 outdated
    2501         CMessageHeader& hdr = msg.hdr;
    2502         if (!hdr.IsValid(chainparams.MessageStart()))
    2503         {
    2504             LogPrintf("PROCESSMESSAGE: ERRORS IN HEADER %s peer=%d\n", SanitizeString(hdr.GetCommand()), pfrom->id);
    2505-            continue;
    2506+            return fMoreWork;
    


    TheBlueMatt commented at 9:19 pm on January 5, 2017:
    This is always false here.

    theuni commented at 9:27 pm on January 5, 2017:

    This is unclear due to rebasing. In the end-result, this will return true if there are more messages.

    In this commit, there’s an interim chunk that’s missing. line 2479 should have:

    0fMoreWork = !pfrom->vRecvMsg.empty()
    

    I’ll fix it in the next rebase so that it’s more clear.


    pstratem commented at 5:03 pm on January 10, 2017:

    This should set fDisconnect and return false unconditionally actually.

    (Yeah that changes the current behavior)


    theuni commented at 7:38 pm on January 10, 2017:
    Yes, that would be a change of behavior, and I’m not entirely sure that we want it changed. Presumably this was done to avoid partitioning the network in the event that the header structure (or magic bytes) ever needs to change.
  40. in src/net.h: in a109f7bd18 outdated
    650@@ -650,6 +651,7 @@ class CNode
    651     const NodeId id;
    652 
    653     const uint64_t nKeyedNetGroup;
    654+    std::atomic_bool fPauseRecv;
    


    TheBlueMatt commented at 9:43 pm on January 5, 2017:
    This is never set to true until a later commit. I’m generally ok with this, but it is a bit strange.

    theuni commented at 9:50 pm on January 5, 2017:
    Grr, another artifact of breaking up an end-result into smaller commits. I’ll add this where it belongs in the next rebase.
  41. in src/net.cpp: in b5ea4f2199 outdated
    1238@@ -1239,8 +1239,18 @@ void CConnman::ThreadSocketHandler()
    1239                             if (!pnode->ReceiveMsgBytes(pchBuf, nBytes, notify))
    1240                                 pnode->CloseSocketDisconnect();
    1241                             RecordBytesRecv(nBytes);
    1242-                            if (notify)
    1243+                            if (notify) {
    1244+                                auto it(pnode->vRecvMsg.begin());
    


    TheBlueMatt commented at 9:50 pm on January 5, 2017:
    You’re not holding cs_vRecvMsg here, which hyou need to access vRecvMsg.

    theuni commented at 10:03 pm on January 5, 2017:
    vRecvMsg no longer needs cs_vRecvMsg starting here. It’s only ever used on this thread. In reality, cs_vRecvMsg isn’t needed at all, so it would probably be more clear to just remove it. copyStats is the only thing that accesses some of these vars, but it already operates without any locks, so the situation isn’t changed from before.

    TheBlueMatt commented at 10:14 pm on January 5, 2017:
    Hum, ok, can you just add a comment somewhere to note that cs_vRecvMsg no longer protects vRecvMsg, and instead protects other stuff.
  42. theuni commented at 3:12 pm on January 6, 2017: member

    From @sipa

    Can you clarify what the rationale is for switching from a ProcessMessages that processes multiple messages to just a single one? I’m not opposed to the change, but I’d like to understand what the reason for the change is.

    In nearly all cases in the current code, only 1 message is processed. I thought that making this explicit would clarify the situation, but it seems it’s only causing confusion in this PR. I’ll change this back and we can address that later.

  43. sipa commented at 5:26 pm on January 6, 2017: member

    @theuni Please ignore my request to keep processing multiple messages at the same time.

    In nearly all cases in the current code, only 1 message is processed.

    I thought you were trying to say our send buffer was usually full, not that the code actually never loops (which is apparently behaviour that was intentionally changed in #3180).

  44. theuni force-pushed on Jan 6, 2017
  45. theuni commented at 6:01 pm on January 6, 2017: member

    Updated to address feedback. Most of it is just fixing up interim commits to be more correct for easier commit-by-commit review.

    Actual changes:

    • Fixed up the condvar as @sipa suggested
    • Completely removed cs_vRecvMsg. See the commit message for more detail.

    Diff from before is here: https://gist.github.com/theuni/9c18405455a2569156671aadcf242500

  46. in src/net.cpp: in ddac068fa5 outdated
    1158@@ -1167,10 +1159,7 @@ void CConnman::ThreadSocketHandler()
    1159                     }
    1160                 }
    1161                 {
    1162-                    TRY_LOCK(pnode->cs_vRecvMsg, lockRecv);
    1163-                    if (lockRecv && (
    1164-                        pnode->vRecvMsg.empty() || !pnode->vRecvMsg.front().complete() ||
    1165-                        pnode->GetTotalRecvSize() <= GetReceiveFloodSize()))
    1166+                    if (!pnode->fPauseRecv)
    


    morcos commented at 6:44 pm on January 6, 2017:
    The comment above this code section describing the logic could be updated.
  47. morcos commented at 7:29 pm on January 6, 2017: member

    Per IRC discussion, I think it would be wise to add something that shortens/skips the wait time at the end if the loop has taken a long time in particular for the case of connecting a new tip. But that is a preexisting condition not a regression.

    Overall looks good to me. newbie quality code review ACK

  48. theuni commented at 7:49 pm on January 6, 2017: member
    @morcos I meant to expand on the commit message for c5b4e31fb87a9819330b83f4ac6cc92066e8337d, but apparently neglected to do so before pushing. I expect that we’ll want to add connman.WakeMessageHandler() in a few places to take care of that as a follow-up.
  49. sipa commented at 2:21 am on January 8, 2017: member
    utACK ddac068fa5d6b43925a96149f2d9b81ec736a5e7
  50. in src/net.cpp: in f39dd53470 outdated
    647@@ -648,6 +648,9 @@ void CNode::copyStats(CNodeStats &stats)
    648 bool CNode::ReceiveMsgBytes(const char *pch, unsigned int nBytes, bool& complete)
    649 {
    650     complete = false;
    651+    int64_t nTime = GetTimeMicros();
    


    pstratem commented at 4:37 pm on January 10, 2017:
    It would be nice if this was nTimeMicros instead of nTime

    theuni commented at 7:35 pm on January 10, 2017:
    will do
  51. in src/net.cpp: in ba4cae284f outdated
    1877@@ -1882,10 +1878,10 @@ void CConnman::ThreadMessageHandler()
    1878                 pnode->Release();
    1879         }
    1880 
    1881-        if (fSleep) {
    1882-            std::unique_lock<std::mutex> lock(mutexMsgProc);
    1883-            condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100));
    1884-        }
    1885+        std::unique_lock<std::mutex> lock(mutexMsgProc);
    1886+        if (!fMoreWork)
    


    pstratem commented at 4:50 pm on January 10, 2017:
    Please add the {} back even though it’s now 1 line

    theuni commented at 7:34 pm on January 10, 2017:
    will do
  52. in src/net_processing.cpp: in ba4cae284f outdated
    2459     if (!pfrom->vRecvGetData.empty())
    2460         ProcessGetData(pfrom, chainparams.GetConsensus(), connman, interruptMsgProc);
    2461 
    2462     // this maintains the order of responses
    2463-    if (!pfrom->vRecvGetData.empty()) return fOk;
    2464+    if (!pfrom->vRecvGetData.empty()) return true;
    


    pstratem commented at 4:52 pm on January 10, 2017:
    Probably should swap these around.

    theuni commented at 7:34 pm on January 10, 2017:
    ok
  53. theuni force-pushed on Jan 11, 2017
  54. theuni commented at 9:58 pm on January 11, 2017: member
    Addressed outstanding nits, will squash at the end.
  55. laanwj commented at 10:54 am on January 12, 2017: member
    (Lightly) tested ACK 1c779a8. This is an older version but it’s been running stable on a busy node for a week.
  56. theuni commented at 2:23 pm on January 12, 2017: member
    @TheBlueMatt @sipa @morcos care to re-ACK before final squash?
  57. morcos commented at 2:54 pm on January 12, 2017: member
    utACK and ran on some nodes without error for several days ddac068 utACK 7b44e18
  58. TheBlueMatt commented at 8:47 pm on January 12, 2017: member
    Dont think its actually possible due to the sizes involved, but, theoretically, PushMessage could increment nSendSize to be > nSendBufferMaxSize, set fPauseSend to true, and then call opportunistic SocketSendData, which flushed the entire buffer, but didnt reset fPauseSend. I’d suggest moving the fPauseSend update to right beside the nSendSize decrement as it is for the increment.
  59. in src/net.cpp: in 975fc9dd6f outdated
    1254-                                        pnode->fPauseRecv = true;
    1255-                                }
    1256-                                WakeMessageHandler();
    1257+                                completeMessages.splice(completeMessages.end(), pnode->vRecvMsg, pnode->vRecvMsg.begin(), it);
    1258+                                if (!QueueReceivedMessages(pnode, std::move(completeMessages), nSizeAdded))
    1259+                                    pnode->fPauseRecv = true;
    


    TheBlueMatt commented at 9:06 pm on January 12, 2017:
    By moving the setting of fPauseRecv outside of the cs_vProcessMsg lock you’ve introduced a race where the message processing thread may have already taken the next message and set fPauseRecv to false before this line (seems pretty unlikely, but its not unheard of). I’d suggest simply dropping this commit.

    theuni commented at 9:27 pm on January 12, 2017:
    Good catch!
  60. TheBlueMatt commented at 9:41 pm on January 12, 2017: member
    utACK 7b44e18b2f3a39657699adff5dd39dcfde24f1fb if 975fc9dd6f55a87f7d0db62898e58169b2816baf is reverted and the fPauseSend updates are all placed directly beside the nSendSize updates.
  61. TheBlueMatt commented at 0:37 am on January 13, 2017: member
    utACK 0168bea73772a707c53952956a364262cea63cd3 (+/- squash-needed).
  62. net: handle message accounting in ReceiveMsgBytes
    This allows locking to be pushed down to only where it's needed
    
    Also reuse the current time rather than checking multiple times.
    60befa3997
  63. net: record bytes written before notifying the message processor f5c36d19b6
  64. net: Add a simple function for waking the message handler
    This may be used publicly in the future
    ef7b5ecbb7
  65. net: remove useless comments c72cc88ed3
  66. net: rework the way that the messagehandler sleeps
    In order to sleep accurately, the message handler needs to know if _any_ node
    has more processing that it should do before the entire thread sleeps.
    
    Rather than returning a value that represents whether ProcessMessages
    encountered a message that should trigger a disconnnect, interpret the return
    value as whether or not that node has more work to do.
    
    Also, use a global fProcessWake value that can be set by other threads,
    which takes precedence (for one cycle) over the messagehandler's decision.
    
    Note that the previous behavior was to only process one message per loop
    (except in the case of a bad checksum or invalid header). That was changed in
    PR #3180.
    
    The only change here in that regard is that the current node now falls to the
    back of the processing queue for the bad checksum/invalid header cases.
    c5a8b1b946
  67. net: add a new message queue for the message processor
    This separates the storage of messages from the net and queued messages for
    processing, allowing the locks to be split.
    4d712e366c
  68. net: add a flag to indicate when a node's process queue is full
    Messages are dumped very quickly from the socket handler to the processor, so
    it's the depth of the processing queue that's interesting.
    
    The socket handler checks the process queue's size during the brief message
    hand-off and pauses if necessary, and the processor possibly unpauses each time
    a message is popped off of its queue.
    c6e8a9bcff
  69. net: add a flag to indicate when a node's send buffer is full
    Similar to the recv flag, but this one indicates whether or not the net's send
    buffer is full.
    
    The socket handler checks the send queue when a new message is added and pauses
    if necessary, and possibly unpauses after each message is drained from its buffer.
    991955ee81
  70. net: remove cs_vRecvMsg
    vRecvMsg is now only touched by the socket handler thread.
    
    The accounting vars (nRecvBytes/nLastRecv/mapRecvBytesPerMsgCmd) are also
    only used by the socket handler thread, with the exception of queries from
    rpc/gui. These accesses are not threadsafe, but they never were. This needs to
    be addressed separately.
    
    Also, update comment describing data flow
    e60360e139
  71. theuni force-pushed on Jan 13, 2017
  72. theuni commented at 4:45 am on January 13, 2017: member
    Squashed, and I believe this is now merge-ready. @sipa / @morcos See https://github.com/theuni/bitcoin/tree/connman-locks-tmp for the unsquashed version if you’d like. It’s codewise-identical to e60360e139852c655930e99d4bb4db554cd8385e.
  73. morcos commented at 3:43 pm on January 13, 2017: member
    re-utACK e60360e
  74. TheBlueMatt commented at 4:48 pm on January 13, 2017: member
    Confirmed there is no diff-tree to e60360e139852c655930e99d4bb4db554cd8385e from my previous utACK
  75. sipa commented at 5:50 pm on January 13, 2017: member
    utACK e60360e139852c655930e99d4bb4db554cd8385e
  76. sipa merged this on Jan 13, 2017
  77. sipa closed this on Jan 13, 2017

  78. sipa referenced this in commit 8b66bf74e2 on Jan 13, 2017
  79. fanquake moved this from the "In progress" to the "Done" column in a project

  80. random-zebra referenced this in commit 777638e7bc on Aug 27, 2020
  81. random-zebra referenced this in commit cbd9271afb on Sep 7, 2020
  82. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me