Net: Massive speedup. Net locks overhaul #9441

pull theuni wants to merge 16 commits into bitcoin:master from theuni:connman-locks changing 4 files +107 −125
  1. theuni commented at 8:07 am on December 29, 2016: member

    Depends on (includes) #9289. This is what I ultimately set out to fix with the net refactor.

    In my (short) tests, it cuts network latency to ~50% of the unpatched time; in some cases, more like 30%.

    Test method: 1 fresh node (macbook) connected to a single other node (desktop). Testnet. Running until synced to block 100000. Results:

    new = patched with this PR.
    old = running #9289.

    client / server
    old / old: 100000 in 8:05
    new / old: 100000 in 3:24
    new / new: 100000 in 2:16
    

    The results are very reproducible, always within a few seconds. Not only is it a nice improvement for the fresh node, but it compounds when its peer is running the updated code as well.

    I had hoped to have the abstraction complete in time for 0.14, but it looks like that’s unlikely at this point. For reference, it would look something more like https://github.com/theuni/bitcoin/commit/1a6b10aea129ac363727c2d68fae809c2861e4da.

    Anyway, this is an attempt to fix the actual issue in time for 0.14, and putting off the last bit of refactor until after that. This addresses the issue observed in #9415, but also cleans up the nasty locking issues.

    I labeled this WIP because there are probably still several racy bits. Quite a bit of test-writing remains.

    See the individual commits for the details. tl;dr: We currently either process a peer’s message or read from their socket, but never both simultaneously. The changes here remove that restriction.

  2. fanquake added the label Refactoring on Dec 29, 2016
  3. fanquake added this to the milestone 0.14.0 on Dec 29, 2016
  4. fanquake added the label P2P on Dec 29, 2016
  5. in src/net_processing.cpp: in 65e916c499 outdated
    2452+    if (pfrom->fDisconnect)
    2453+        return false;
    2454 
    2455-    std::deque<CNetMessage>::iterator it = pfrom->vRecvMsg.begin();
    2456-    while (!pfrom->fDisconnect && it != pfrom->vRecvMsg.end()) {
    2457         // Don't bother if send buffer is too full to respond anyway
    


    rebroad commented at 4:37 pm on December 29, 2016:
    I’m not sure if github is messing with the formatting, but this line looks unnecessarily indented.

    theuni commented at 4:47 pm on December 29, 2016:
    Yes, the indentation was just kept to avoid creating a huge whitespace diff. This can be cleaned up in a move-only commit as a next step.
  6. theuni force-pushed on Dec 29, 2016
  7. theuni commented at 2:09 am on December 31, 2016: member
    @sipa I’ve spent all day on changing this around some and breaking this up into logical commits in order to satisfy myself (and i hope others) that it’s safe. I’ll push up a fresh version in a few min, and drop the WIP tag.
  8. theuni force-pushed on Dec 31, 2016
  9. theuni renamed this:
    WIP: Massive net speedup. Net locks overhaul.
    Net: Massive speedup. Net locks overhaul
    on Dec 31, 2016
  10. theuni commented at 4:39 am on December 31, 2016: member

    Updated. Don’t let the number of commits scare you off, the diffstat isn’t too bad.

    I attempted to break this up into small/simple commits, in order to slowly remove the need for cs_vRecvMsg. The last commit actually removes it.

    I’m satisfied that this shouldn’t be introducing any new races, as only a handful of vars were actually touched on the SocketHandler thread.

    Edit: time-ordered.

  11. theuni force-pushed on Dec 31, 2016
  12. in src/net.cpp: in 3ab40f619d outdated
    1242@@ -1243,9 +1243,10 @@ void CConnman::ThreadSocketHandler()
    1243                             bool notify = false;
    1244                             if (!pnode->ReceiveMsgBytes(pchBuf, nBytes, notify))
    1245                                 pnode->CloseSocketDisconnect();
    1246+                            RecordBytesRecv(nBytes);
    1247                             if(notify)
    1248                                 condMsgProc.notify_one();
    1249-                            RecordBytesRecv(nBytes);
    1250+                            }
    


    theuni commented at 7:39 am on December 31, 2016:
    Whoops, rebase gone bad here. It ends up right in the end. Will fix.
  13. in src/net_processing.cpp: in 9959988414 outdated
    2449-    if (!pfrom->vRecvGetData.empty()) return fOk;
    2450+    if (!pfrom->vRecvGetData.empty()) return true;
    2451 
    2452     auto it = pfrom->vRecvMsg.begin();
    2453-    while (!pfrom->fDisconnect && it != pfrom->vRecvMsg.end()) {
    2454+    if (!pfrom->fDisconnect && it != pfrom->vRecvMsg.end()) {
    


    theuni commented at 7:57 am on December 31, 2016:

    Need to clarify in the commit message: It was almost always the case that only one message was processed in this while loop. See the break at line 2538.

    This commit changes the behavior so that it’s always one message processed per-loop. The only messages that show different behavior are the ones that used to “continue” the loop. I think it’s perfectly reasonable to skip to the next node for processing in that case.


    sipa commented at 8:55 pm on January 5, 2017:
    Can you clarify what the rationale is for switching from a ProcessMessages that processes multiple messages to just a single one? I’m not opposed to the change, but I’d like to understand what the reason for the change is.

    TheBlueMatt commented at 9:10 pm on January 5, 2017:
    Point is this is not a change. See #9441 (comment)
  14. dcousens commented at 0:18 am on January 3, 2017: contributor
    @theuni is this worth testing now?
  15. theuni commented at 4:16 am on January 3, 2017: member
    @dcousens Very much so!
  16. in src/net.cpp: in b1b6772240 outdated
    680+    }
    681+
    682+    int64_t nTime = GetTimeMicros();
    683+    LOCK(cs_vRecvMsg);
    684+    nRecvBytes += nBytes;
    685+    nLastRecv = nTime * 1000;
    


    TheBlueMatt commented at 10:48 am on January 3, 2017:
    I think you meant /, not * here?

    theuni commented at 3:11 pm on January 3, 2017:
    Heh, sure did
  17. in src/net.cpp: in b1b6772240 outdated
    679+            complete = true;
    680+    }
    681+
    682+    int64_t nTime = GetTimeMicros();
    683+    LOCK(cs_vRecvMsg);
    684+    nRecvBytes += nBytes;
    


    TheBlueMatt commented at 10:51 am on January 3, 2017:
    It might be nice in the RPC for mapRecvBytesPerMsgCmd to add up to nRecvBytes (ie to increment this in the loop per-msg instead of after receiving (which would also mean no api change)).

    theuni commented at 3:13 pm on January 3, 2017:
    Sure, that just means more locking. Though this is completely uncontested now, so that’s not really a problem.
  18. TheBlueMatt commented at 11:11 am on January 3, 2017: member

    See IRC discussion:

     <BlueMatt> cfields: hmmm, regarding #9441, do we really want to run the entire ProcessMessages loop (calling SendMessages potentially umpteen times) just to process multiple messages from the same node?
     <BlueMatt> (ie remove the loop inside ProcessMessages and move it to ThreadProcessMessages)
     <sipa> BlueMatt: i was wondering about that too
     <sipa> BlueMatt: but there is hardly any (contentious) locking going on anymore in between
     <BlueMatt> yea, but SendMessages.....
     <sipa> Ah.
     <sipa> i hadn't considered that
     <sipa> that defeats the purpose
     <sipa> hmm, i was about to say that it negatively impacts batching of invs and addrs
     <sipa> but since we have explicit random delays for those, i don't think that's really an issue anymore
     <BlueMatt> yea, I dont think it breaks anything
     <BlueMatt> just repeatedly calls SendMessages to do nothing
     <sipa> oh, it won't break anything
    
  19. theuni commented at 3:19 pm on January 3, 2017: member

    @TheBlueMatt If I understand you correctly, as mentioned in the summary, that’s the part that was cut out here for simplicity. The next set of changes would combine the loops and move them into net_processing. See here for how it looked in a previous iteration: https://github.com/theuni/bitcoin/commit/1a6b10aea129ac363727c2d68fae809c2861e4da#diff-eff7adeaec73a769788bb78858815c91R271

    I’d be happy to add that here, but I’m afraid it adds a significant review burden to this PR.

  20. theuni commented at 4:15 pm on January 3, 2017: member

    Also from IRC:

    <BlueMatt> I assume you didnt touch cs_vSend due to https://github.com/bitcoin/bitcoin/pull/9419/commits/c214d120a363a05ba9afdccff6b4bda6e29ae7c4, cfields?
    <cfields> BlueMatt: yes, cs_vSend wasn't touched because of your PR. I've just been operating under the assumption that the lock around SendMessages will be removed by one PR or another.
    
  21. theuni commented at 6:27 pm on January 3, 2017: member

    Digging further into the IRC conversation above, I’d like to clarify that the interaction here between ProcessMessages and SendMessages is not changed substantially in this PR. @TheBlueMatt had missed a subtle part of the current behavior, and maybe @sipa too, so I’d just like to explicitly point out the current break here: https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L2543, added in #3180

    For the sake of processing in a round-robin fashion, we currently end up running through SendMessages multiple times while draining a node’s received message queue. That behavior is far from ideal, and needs to be addressed, but is not changed here*. I have plans to work on this in next steps, but not for 0.14.

    *It’s only changed in the case of an invalid header, or bad message checksum.

  22. gmaxwell commented at 3:15 am on January 4, 2017: contributor

    cfields: hmmm, regarding #9441, do we really want to run the entire ProcessMessages loop (calling SendMessages potentially umpteen times) just to process multiple messages from the same node? (ie remove the loop inside ProcessMessages and move it to ThreadProcessMessages)

    Clearly people weren’t reading PRs on this or you would have noticed that it only processes one at a time right now. I actually fixed it to process as many as it could until it encountered lock contention, then specifically pinged people for input on the effect. After no response, I unfixed it because of concern about the latency impact. Handling multiple messages at a time is good, if they’re fast ones…

  23. sipa commented at 3:21 am on January 4, 2017: member
    I certainly noticed it only processed one message at a time. I missed that it calls SendMessages between each.
  24. net: fix typo causing the wrong receive buffer size
    Surprisingly this hasn't been causing me any issues while testing, probably
    because it requires lots of large blocks to be flying around.
    
    Send/Recv corks need tests!
    53ad9a133a
  25. net: make vRecvMsg a list so that we can use splice() e5bcd9c84f
  26. net: make GetReceiveFloodSize public
    This will be needed so that the message processor can cork incoming messages
    5b4a8ac6d6
  27. net: only disconnect if fDisconnect has been set
    These conditions are problematic to check without locking, and we shouldn't be
    relying on the refcount to disconnect.
    f6315e07f9
  28. net: wait until the node is destroyed to delete its recv buffer
    when vRecvMsg becomes a private buffer, it won't make sense to allow other
    threads to mess with it anymore.
    60425870d7
  29. sipa commented at 2:42 pm on January 4, 2017: member
    Needs rebase.
  30. net: remove redundant max sendbuffer size check
    This is left-over from before there was proper accounting. Hitting 2x the
    sendbuffer size should not be possible.
    0e973d970a
  31. net: set message deserialization version when it's actually time to deserialize
    We'll soon no longer have access to vRecvMsg, and this is more intuitive anyway.
    56212e20ac
  32. theuni force-pushed on Jan 4, 2017
  33. theuni commented at 3:25 pm on January 4, 2017: member

    Rebased, and I believe I addressed all comments so far.

    I did a full mainnet sync last night with these (pre-rebase) changes, with no issues. I had extra logging to observe the recv pausing behavior, and verified that it’s working as intended. It is paused quite frequently during IBD. Obviously that’s intended, as it signals for the peer to slow down, but it may be worth discussing the consequences of a higher (or lower!) default for -maxreceivebuffer now that our recv is no longer blocked by processing. Out of scope for this PR, though.

  34. in src/net.cpp: in f6315e07f9 outdated
    1050@@ -1051,8 +1051,7 @@ void CConnman::ThreadSocketHandler()
    1051             std::vector<CNode*> vNodesCopy = vNodes;
    1052             BOOST_FOREACH(CNode* pnode, vNodesCopy)
    1053             {
    1054-                if (pnode->fDisconnect ||
    1055-                    (pnode->GetRefCount() <= 0 && pnode->vRecvMsg.empty() && pnode->nSendSize == 0))
    


    TheBlueMatt commented at 9:38 pm on January 4, 2017:
    Is it just me, or were these always impossible to hit? Both ConnectNode and AcceptConnection do an AddRef() which is only countered by the Release() a few lines further down here.

    TheBlueMatt commented at 9:39 pm on January 4, 2017:
    (obviously not an issue, just checking my logic).
  35. theuni commented at 9:47 pm on January 4, 2017: member
    I believe you’re correct, that’s my reading of it as well. Regardless, I wasn’t able to track down a path where fDisconnect wouldn’t be set as necessary.
  36. in src/net.cpp: in 1c779a8cfe outdated
    648 bool CNode::ReceiveMsgBytes(const char *pch, unsigned int nBytes, bool& complete)
    649 {
    650+    LOCK(cs_vRecvMsg);
    651     complete = false;
    652+    int64_t nTime = GetTimeMicros();
    653+    nLastRecv = nTime / 1000;
    


    TheBlueMatt commented at 10:03 pm on January 4, 2017:
    I think you actually meant / 1000 / 1000 :p

    theuni commented at 8:00 pm on January 5, 2017:
    I completely missed that this was a full-second value. Thanks for catching this (twice!). Will fix.
  37. in src/net.cpp: in 4157c99cd8 outdated
    1877@@ -1882,10 +1878,9 @@ void CConnman::ThreadMessageHandler()
    1878                 pnode->Release();
    1879         }
    1880 
    1881-        if (fSleep) {
    1882-            std::unique_lock<std::mutex> lock(mutexMsgProc);
    1883-            condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100));
    1884-        }
    1885+        std::unique_lock<std::mutex> lock(mutexMsgProc);
    1886+        condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100), [this, &fMoreWork] { return fMoreWork || fMsgProcWake; });
    


    sipa commented at 8:50 pm on January 5, 2017:

    It seems that during this wait, fMoreWork will never change (and it isn’t covered by the lock either), so this line could instead be written as

    if (!fMoreWork) {
        condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100), [this] { return fMsgProcWake; });
    }
    

    theuni commented at 9:21 pm on January 5, 2017:
    Yep. Will fix.
  38. in src/net.cpp: in b5ea4f2199 outdated
    1238@@ -1239,8 +1239,18 @@ void CConnman::ThreadSocketHandler()
    1239                             if (!pnode->ReceiveMsgBytes(pchBuf, nBytes, notify))
    1240                                 pnode->CloseSocketDisconnect();
    1241                             RecordBytesRecv(nBytes);
    1242-                            if (notify)
    1243+                            if (notify) {
    1244+                                auto it(pnode->vRecvMsg.begin());
    1245+                                for (; it != pnode->vRecvMsg.end(); ++it) {
    


    sipa commented at 8:58 pm on January 5, 2017:

    I find it generally more readable to write something like this as

    while (it != pnode->vRecvMsg.end() && it->complete()) ++it;
    

    Also, isn’t iterating backwards more efficient?


    theuni commented at 9:19 pm on January 5, 2017:

    Sure, will change it to a while.

    Iterating backwards would be preferable, but we end up hitting each msg anyway (to add up message sizes) in a later commit, so I didn’t bother (see https://github.com/bitcoin/bitcoin/pull/9441/commits/a109f7bd1801392bf99994a7165d495200156c4a#diff-9a82240fe7dfe86564178691cc57f2f1R1246). The sizes are added ahead of time in order to avoid iterating the list under cs_vProcessMsg.

  39. in src/net_processing.cpp: in 4157c99cd8 outdated
    2501         CMessageHeader& hdr = msg.hdr;
    2502         if (!hdr.IsValid(chainparams.MessageStart()))
    2503         {
    2504             LogPrintf("PROCESSMESSAGE: ERRORS IN HEADER %s peer=%d\n", SanitizeString(hdr.GetCommand()), pfrom->id);
    2505-            continue;
    2506+            return fMoreWork;
    


    TheBlueMatt commented at 9:19 pm on January 5, 2017:
    This is always false here.

    theuni commented at 9:27 pm on January 5, 2017:

    This is unclear due to rebasing. In the end-result, this will return true if there are more messages.

    In this commit, there’s an interim chunk that’s missing. line 2479 should have:

    0fMoreWork = !pfrom->vRecvMsg.empty()
    

    I’ll fix it in the next rebase so that it’s more clear.


    pstratem commented at 5:03 pm on January 10, 2017:

    This should set fDisconnect and return false unconditionally actually.

    (Yeah that changes the current behavior)


    theuni commented at 7:38 pm on January 10, 2017:
    Yes, that would be a change of behavior, and I’m not entirely sure that we want it changed. Presumably this was done to avoid partitioning the network in the event that the header structure (or magic bytes) ever needs to change.
  40. in src/net.h: in a109f7bd18 outdated
    650@@ -650,6 +651,7 @@ class CNode
    651     const NodeId id;
    652 
    653     const uint64_t nKeyedNetGroup;
    654+    std::atomic_bool fPauseRecv;
    


    TheBlueMatt commented at 9:43 pm on January 5, 2017:
    This is never set to true until a later commit. I’m generally ok with this, but it is a bit strange.

    theuni commented at 9:50 pm on January 5, 2017:
    Grr, another artifact of breaking up an end-result into smaller commits. I’ll add this where it belongs in the next rebase.
  41. in src/net.cpp: in b5ea4f2199 outdated
    1238@@ -1239,8 +1239,18 @@ void CConnman::ThreadSocketHandler()
    1239                             if (!pnode->ReceiveMsgBytes(pchBuf, nBytes, notify))
    1240                                 pnode->CloseSocketDisconnect();
    1241                             RecordBytesRecv(nBytes);
    1242-                            if (notify)
    1243+                            if (notify) {
    1244+                                auto it(pnode->vRecvMsg.begin());
    


    TheBlueMatt commented at 9:50 pm on January 5, 2017:
    You’re not holding cs_vRecvMsg here, which you need to access vRecvMsg.

    theuni commented at 10:03 pm on January 5, 2017:
    vRecvMsg no longer needs cs_vRecvMsg starting here. It’s only ever used on this thread. In reality, cs_vRecvMsg isn’t needed at all, so it would probably be more clear to just remove it. copyStats is the only thing that accesses some of these vars, but it already operates without any locks, so the situation isn’t changed from before.

    TheBlueMatt commented at 10:14 pm on January 5, 2017:
    Hum, ok, can you just add a comment somewhere to note that cs_vRecvMsg no longer protects vRecvMsg, and instead protects other stuff.
  42. theuni commented at 3:12 pm on January 6, 2017: member

    From @sipa

    Can you clarify what the rationale is for switching from a ProcessMessages that processes multiple messages to just a single one? I’m not opposed to the change, but I’d like to understand what the reason for the change is.

    In nearly all cases in the current code, only 1 message is processed. I thought that making this explicit would clarify the situation, but it seems it’s only causing confusion in this PR. I’ll change this back and we can address that later.

  43. sipa commented at 5:26 pm on January 6, 2017: member

    @theuni Please ignore my request to keep processing multiple messages at the same time.

    In nearly all cases in the current code, only 1 message is processed.

    I thought you were trying to say our send buffer was usually full, not that the code actually never loops (which is apparently behaviour that was intentionally changed in #3180).

  44. theuni force-pushed on Jan 6, 2017
  45. theuni commented at 6:01 pm on January 6, 2017: member

    Updated to address feedback. Most of it is just fixing up interim commits to be more correct for easier commit-by-commit review.

    Actual changes:

    • Fixed up the condvar as @sipa suggested
    • Completely removed cs_vRecvMsg. See the commit message for more detail.

    Diff from before is here: https://gist.github.com/theuni/9c18405455a2569156671aadcf242500

  46. in src/net.cpp: in ddac068fa5 outdated
    1158@@ -1167,10 +1159,7 @@ void CConnman::ThreadSocketHandler()
    1159                     }
    1160                 }
    1161                 {
    1162-                    TRY_LOCK(pnode->cs_vRecvMsg, lockRecv);
    1163-                    if (lockRecv && (
    1164-                        pnode->vRecvMsg.empty() || !pnode->vRecvMsg.front().complete() ||
    1165-                        pnode->GetTotalRecvSize() <= GetReceiveFloodSize()))
    1166+                    if (!pnode->fPauseRecv)
    


    morcos commented at 6:44 pm on January 6, 2017:
    The comment above this code section describing the logic could be updated.
  47. morcos commented at 7:29 pm on January 6, 2017: member

    Per IRC discussion, I think it would be wise to add something that shortens/skips the wait time at the end if the loop has taken a long time in particular for the case of connecting a new tip. But that is a preexisting condition not a regression.

    Overall looks good to me. newbie quality code review ACK

  48. theuni commented at 7:49 pm on January 6, 2017: member
    @morcos I meant to expand on the commit message for c5b4e31fb87a9819330b83f4ac6cc92066e8337d, but apparently neglected to do so before pushing. I expect that we’ll want to add connman.WakeMessageHandler() in a few places to take care of that as a follow-up.
  49. sipa commented at 2:21 am on January 8, 2017: member
    utACK ddac068fa5d6b43925a96149f2d9b81ec736a5e7
  50. in src/net.cpp: in f39dd53470 outdated
    647@@ -648,6 +648,9 @@ void CNode::copyStats(CNodeStats &stats)
    648 bool CNode::ReceiveMsgBytes(const char *pch, unsigned int nBytes, bool& complete)
    649 {
    650     complete = false;
    651+    int64_t nTime = GetTimeMicros();
    


    pstratem commented at 4:37 pm on January 10, 2017:
    It would be nice if this was nTimeMicros instead of nTime

    theuni commented at 7:35 pm on January 10, 2017:
    will do
  51. in src/net.cpp: in ba4cae284f outdated
    1877@@ -1882,10 +1878,10 @@ void CConnman::ThreadMessageHandler()
    1878                 pnode->Release();
    1879         }
    1880 
    1881-        if (fSleep) {
    1882-            std::unique_lock<std::mutex> lock(mutexMsgProc);
    1883-            condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100));
    1884-        }
    1885+        std::unique_lock<std::mutex> lock(mutexMsgProc);
    1886+        if (!fMoreWork)
    


    pstratem commented at 4:50 pm on January 10, 2017:
    Please add the {} back even though it’s now 1 line

    theuni commented at 7:34 pm on January 10, 2017:
    will do
  52. in src/net_processing.cpp: in ba4cae284f outdated
    2459     if (!pfrom->vRecvGetData.empty())
    2460         ProcessGetData(pfrom, chainparams.GetConsensus(), connman, interruptMsgProc);
    2461 
    2462     // this maintains the order of responses
    2463-    if (!pfrom->vRecvGetData.empty()) return fOk;
    2464+    if (!pfrom->vRecvGetData.empty()) return true;
    


    pstratem commented at 4:52 pm on January 10, 2017:
    Probably should swap these around.

    theuni commented at 7:34 pm on January 10, 2017:
    ok
  53. theuni force-pushed on Jan 11, 2017
  54. theuni commented at 9:58 pm on January 11, 2017: member
    Addressed outstanding nits, will squash at the end.
  55. laanwj commented at 10:54 am on January 12, 2017: member
    (Lightly) tested ACK 1c779a8. This is an older version but it’s been running stable on a busy node for a week.
  56. theuni commented at 2:23 pm on January 12, 2017: member
    @TheBlueMatt @sipa @morcos care to re-ACK before final squash?
  57. morcos commented at 2:54 pm on January 12, 2017: member
    utACK and ran on some nodes without error for several days ddac068 utACK 7b44e18
  58. TheBlueMatt commented at 8:47 pm on January 12, 2017: member
    Dont think its actually possible due to the sizes involved, but, theoretically, PushMessage could increment nSendSize to be > nSendBufferMaxSize, set fPauseSend to true, and then call opportunistic SocketSendData, which flushed the entire buffer, but didnt reset fPauseSend. I’d suggest moving the fPauseSend update to right beside the nSendSize decrement as it is for the increment.
  59. in src/net.cpp: in 975fc9dd6f outdated
    1254-                                        pnode->fPauseRecv = true;
    1255-                                }
    1256-                                WakeMessageHandler();
    1257+                                completeMessages.splice(completeMessages.end(), pnode->vRecvMsg, pnode->vRecvMsg.begin(), it);
    1258+                                if (!QueueReceivedMessages(pnode, std::move(completeMessages), nSizeAdded))
    1259+                                    pnode->fPauseRecv = true;
    


    TheBlueMatt commented at 9:06 pm on January 12, 2017:
    By moving the setting of fPauseRecv outside of the cs_vProcessMsg lock you’ve introduced a race where the message processing thread may have already taken the next message and set fPauseRecv to false before this line (seems pretty unlikely, but its not unheard of). I’d suggest simply dropping this commit.

    theuni commented at 9:27 pm on January 12, 2017:
    Good catch!
  60. TheBlueMatt commented at 9:41 pm on January 12, 2017: member
    utACK 7b44e18b2f3a39657699adff5dd39dcfde24f1fb if 975fc9dd6f55a87f7d0db62898e58169b2816baf is reverted and the fPauseSend updates are all placed directly beside the nSendSize updates.
  61. TheBlueMatt commented at 0:37 am on January 13, 2017: member
    utACK 0168bea73772a707c53952956a364262cea63cd3 (+/- squash-needed).
  62. net: handle message accounting in ReceiveMsgBytes
    This allows locking to be pushed down to only where it's needed
    
    Also reuse the current time rather than checking multiple times.
    60befa3997
  63. net: record bytes written before notifying the message processor f5c36d19b6
  64. net: Add a simple function for waking the message handler
    This may be used publicly in the future
    ef7b5ecbb7
  65. net: remove useless comments c72cc88ed3
  66. net: rework the way that the messagehandler sleeps
    In order to sleep accurately, the message handler needs to know if _any_ node
    has more processing that it should do before the entire thread sleeps.
    
    Rather than returning a value that represents whether ProcessMessages
    encountered a message that should trigger a disconnect, interpret the return
    value as whether or not that node has more work to do.
    
    Also, use a global fProcessWake value that can be set by other threads,
    which takes precedence (for one cycle) over the messagehandler's decision.
    
    Note that the previous behavior was to only process one message per loop
    (except in the case of a bad checksum or invalid header). That was changed in
    PR #3180.
    
    The only change here in that regard is that the current node now falls to the
    back of the processing queue for the bad checksum/invalid header cases.
    c5a8b1b946
  67. net: add a new message queue for the message processor
    This separates the storage of messages from the net and queued messages for
    processing, allowing the locks to be split.
    4d712e366c
  68. net: add a flag to indicate when a node's process queue is full
    Messages are dumped very quickly from the socket handler to the processor, so
    it's the depth of the processing queue that's interesting.
    
    The socket handler checks the process queue's size during the brief message
    hand-off and pauses if necessary, and the processor possibly unpauses each time
    a message is popped off of its queue.
    c6e8a9bcff
  69. net: add a flag to indicate when a node's send buffer is full
    Similar to the recv flag, but this one indicates whether or not the net's send
    buffer is full.
    
    The socket handler checks the send queue when a new message is added and pauses
    if necessary, and possibly unpauses after each message is drained from its buffer.
    991955ee81
  70. net: remove cs_vRecvMsg
    vRecvMsg is now only touched by the socket handler thread.
    
    The accounting vars (nRecvBytes/nLastRecv/mapRecvBytesPerMsgCmd) are also
    only used by the socket handler thread, with the exception of queries from
    rpc/gui. These accesses are not threadsafe, but they never were. This needs to
    be addressed separately.
    
    Also, update comment describing data flow
    e60360e139
  71. theuni force-pushed on Jan 13, 2017
  72. theuni commented at 4:45 am on January 13, 2017: member
    Squashed, and I believe this is now merge-ready. @sipa / @morcos See https://github.com/theuni/bitcoin/tree/connman-locks-tmp for the unsquashed version if you’d like. It’s codewise-identical to e60360e139852c655930e99d4bb4db554cd8385e.
  73. morcos commented at 3:43 pm on January 13, 2017: member
    re-utACK e60360e
  74. TheBlueMatt commented at 4:48 pm on January 13, 2017: member
    Confirmed there is no diff-tree to e60360e139852c655930e99d4bb4db554cd8385e from my previous utACK
  75. sipa commented at 5:50 pm on January 13, 2017: member
    utACK e60360e139852c655930e99d4bb4db554cd8385e
  76. sipa merged this on Jan 13, 2017
  77. sipa closed this on Jan 13, 2017

  78. sipa referenced this in commit 8b66bf74e2 on Jan 13, 2017
  79. fanquake moved this from the "In progress" to the "Done" column in a project

  80. random-zebra referenced this in commit 777638e7bc on Aug 27, 2020
  81. random-zebra referenced this in commit cbd9271afb on Sep 7, 2020
  82. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 19:12 UTC
