ThreadMessageHandler: replace MilliSleep() with condition variable #4230

pull ashleyholman wants to merge 1 commit into bitcoin:master from ashleyholman:no_sleep changing 5 files +214 −108
  1. ashleyholman commented at 8:16 AM on May 25, 2014: contributor

    I'm hoping to get some feedback on this. It was mentioned by @gmaxwell that there is a 100ms sleep inside ThreadMessageHandler() which can delay the receiving and sending of network messages. By replacing this with a semaphore, it can hopefully improve block propagation times.

    I haven't experienced any connections stalling with this patch, so it seems to work, and the ThreadMessageHandler() loop executed about 10 times per second on average when I tested with 8 peers.

    There is still a lot of overhead since each execution of the loop does a scan of all nodes to check for new messages to send/receive, which I believe will happen every time there is a post() done on the semaphore. But, I haven't noticed any additional CPU load with this patch.

    I have yet to test how this affects block propagation. Does anyone have a testing environment where you can roll out patches and test their effect on block latency?

  2. gmaxwell commented at 8:33 AM on May 25, 2014: contributor

    One way to test the latency is just to use the ping rpc (then getpeerinfo to see the results) and have a node with this patch and a peer with the patch. The effect should be visible (at least, it's very visible if you simply adjust the sleep!).

  3. ashleyholman commented at 11:05 AM on May 25, 2014: contributor

    Test Results

    Two nodes were running, peered with each other via localhost, both running the same code. For each test, 100 pings were generated.

    TEST 1: Unpatched master branch Average ping time: 101ms Median: 101ms Std Dev: 1.19ms

    TEST 2: Patched with no_sleep branch Average ping time: 12.6ms Median: 10.9ms Std Dev: 6.4ms

    Implementing the semaphore shows a clear drop in latency although there is added variance.

    I'm surprised to see that the first test consistently had 100ms pings. I would have thought there would be more variance based on the timing of the sleep.

    I took a look at the Jenkins build log. I'm not sure why the test is failing. It looks like it timed out generating the test blockchain.

  4. laanwj commented at 2:02 PM on May 25, 2014: member

    If it doesn't seem related to your own code, the error in pulltester could be an intermittent one.

  5. ashleyholman commented at 2:33 PM on May 25, 2014: contributor

    Pushed an amended commit to trigger the test again.

  6. ashleyholman commented at 2:47 AM on May 26, 2014: contributor

    The test has failed twice now. There must be a bug in this patch. I did notice when doing my ping test that the nPingTime value would only update about 30% of the time. But without the patch, it updated every time. At the time, I thought this must have been some race condition in the ping code, which gets more pronounced with the lower ping time. But now I'm thinking that perhaps it's dropping messages, or not draining the send queue fast enough. I'll do some further investigation.

  7. mikehearn commented at 4:38 PM on May 26, 2014: contributor

    The end of the log says:

        /mnt/bitcoin/qa/rpc-tests/wallet.sh /mnt/bitcoin/linux-build/src
        Generating test blockchain...
        Timeout: aborting command '/mnt/bitcoin/qa/pull-tester/pull-tester.sh' with signal 9

    So you could try running that test script and see if it has issues.

  8. theuni commented at 7:33 PM on May 26, 2014: member

    This seems like a scary change to make since it means that the messenger could stall if a post is forgotten. As a means of easing into this, how about teaching CSemaphore something like timed_wait(), which would time out at 100msec as before, or immediately wake when signaled?

  9. ashleyholman commented at 1:01 AM on May 27, 2014: contributor

    @theuni - I agree with you there. However first I would like to get the timerless version working perfectly without any known bugs, and then add the 100msec timeout as pure safety net. If it relies on the timer for normal function, then the patch hasn't really achieved its purpose.

  10. theuni commented at 3:13 PM on May 27, 2014: member

    @ashleyholman Agreed, good plan.

  11. sipa commented at 11:58 AM on May 31, 2014: member

    I think you want a condition variable here, not a semaphore. Calling post() twice means wait() will return twice on a semaphore, which we don't want.

    Note that CSemaphore is implemented on top of boost condition variables.

  12. ashleyholman commented at 12:51 PM on May 31, 2014: contributor

    @sipa Ah, that is exactly the problem I've been wanting to solve. I basically need the semaphore's counter to reset to 0 when wait() returns. Should I add a new class to sync.h that just does this?

  13. sipa commented at 1:39 PM on May 31, 2014: member

    @ashleyholman What you really need is a condition-variable protected timestamp of the last addition to the receive queue or send state.

    Basically:

    boost::mutex cs_condMessageHandler;
    boost::condition_variable condMessageHandler;
    int64_t nMessageHandlerLastUpdate;
    int64_t nMessageHandlerLastCheck;
    
    void MessageHandlerWaiter() {
        boost::unique_lock<boost::mutex> lock(cs_condMessageHandler);
        while (nMessageHandlerLastUpdate <= nMessageHandlerLastCheck) {
            condMessageHandler.wait(lock);
        }
        nMessageHandlerLastCheck = GetTimeMicros();
    }
    
    void MessageHandlerUpdate() {
        boost::unique_lock<boost::mutex> lock(cs_condMessageHandler);
        nMessageHandlerLastUpdate = GetTimeMicros();
        condMessageHandler.notify_one();
    }
    

    Also, for SendMessages, I think you need indeed more, and just a waiting condition isn't enough. Some of the messages sent there are periodic, and necessary even if nothing was received in a while (for example sending pings). So you'll need something like:

    if (!condMessageHandler.timed_wait(lock, boost::posix_time::milliseconds(5000)))
      break;
    

    instead (which never blocks for more than 5 seconds).

  14. sipa commented at 1:50 PM on May 31, 2014: member

    Ugh, instead of the two timestamps you could just have a single boolean too, set to true when updating, and set to false after returning from wait.

  15. sipa commented at 5:46 PM on May 31, 2014: member

    To deal better with the ping latency variance, we could add a timestamp to CNetMessage, filled in when it is received. It's probably also useful for auto-disconnect logic of slow peers (currently, it's possible to trigger it if we're just too slow to react to an incoming message).

  16. ashleyholman commented at 4:14 AM on June 1, 2014: contributor

    @sipa do you think there should be a new class added to sync.h with a simple interface for this? or just implement it all in the thread message handler?

    Could be something like

    void CCondition::wait()
    void CCondition::timed_wait(int milliseconds)
    void CCondition::post()

    Something else that would be nice to have is a global message queue for new messages that need to be processed. So rather than having to loop over every node and check its vRecvMsg, there could be one queue that combines messages from all nodes. That might be more of a stage 2 thing though.

  17. sipa commented at 6:44 AM on June 1, 2014: member

    On Jun 1, 2014 6:14 AM, "Ashley Holman" notifications@github.com wrote:

    @sipa do you think there should be a new class added to sync.h with a simple interface for this? or just implement it all in the thread message handler?

    Could be something like

    void CCondition::wait()
    void CCondition::timed_wait(int milliseconds)
    void CCondition::post()

    Perhaps, but you have to encapsulate the state as well, as it needs to be mutex protected (and the condition variable wait needs to be able to temporarily release it), so either the timestamps or the boolean must be inside that class. I guess CCondition is a bit too general as a name. CTimeoutCondition maybe?

    Also, the code that I gave above may not work if the waiting is regularly woken up spuriously (which the spec says can happen). The timed wait would need to decrement the timeout based on how much time passed in previous loops.

    Something else that would be nice to have is a global message queue for new messages that need to be processed. So rather than having to loop over every node and check its vRecvMsg, there could be one queue that combines messages from all nodes. That might be more of a stage 2 thing though.

    That may be possible, but we need to take the different parts of state being locked into account, and make sure that we can skip a busy node's messages without violating ordering (you need to process messages received from a single node in order). Also, we need per-node resource tracking (how much of the receive buffer is one peer using?).

  18. ashleyholman commented at 10:02 AM on June 1, 2014: contributor

    @sipa how is this? uses a class to encapsulate the mutex + boolean flag + timeout.

  19. sipa commented at 10:55 AM on June 1, 2014: member

    This segfaults for me during tests, because condMessageHandler is not necessarily initialized in those.

    How about not making it a pointer, so it's initialized together with other global variables at startup?

  20. ashleyholman commented at 11:17 AM on June 1, 2014: contributor

    Done.

  21. ashleyholman commented at 3:54 PM on June 4, 2014: contributor

    I've realised that changing the frequency of ThreadMessageHandler's loop has implications for the "trickle" activity. The trickling is designed around sending messages to 1 trickle node every 100 ms. To make the message handling real-time without affecting the trickling, should these be split into 2 separate threads?

  22. ashleyholman commented at 4:00 PM on June 4, 2014: contributor

    Or, it could just track the last time it trickled, and use this to determine whether to assign a trickle node or not for that iteration. This, combined with a 100ms sleep timeout, should keep the behaviour the same.

  23. laanwj commented at 7:51 AM on July 8, 2014: member

    Looks good to me, going to test this.

  24. ashleyholman commented at 8:42 AM on July 8, 2014: contributor

    @laanwj good to hear :). I'm also looking for thoughts on handling the trickle behaviour as per my comments above. I think in its current state, the patch will not trickle at the same rate. The two options I could think of were either having a separate thread for trickling, or tracking a "last trickle" timestamp in ThreadMessageHandler and using that to calculate how long to do the timed_wait() for so that it wakes up at least once every 100ms.

  25. laanwj commented at 9:52 AM on July 8, 2014: member

    To me, splitting it into two different threads sounds like the cleanest and least brittle solution here. A trickle thread that does its operations exactly every 100ms, and a message handler that wakes up only when necessary. After all, these are different concerns.

    At some point we could switch to an async framework with deadline timers and such for the P2P code so that it can all be handled in one thread, but trying to do that by hand sounds somewhat error prone and hard to maintain.

  26. sipa added this to the milestone 0.10.0 on Jul 8, 2014
  27. sipa commented at 6:07 PM on July 8, 2014: member

    I marked this as a milestone for 0.10. Reducing propagation latency is essential imho, and just shaving off ~40ms on average for every hop is huge.

    Regarding trickling: I think we don't nearly need 100ms intervals for that. How about once every 5s?

    I'm not sure about every other processing happening in SendMessages, but a few seconds doesn't sound too bad.

  28. sipa commented at 6:15 PM on July 8, 2014: member

    Just had a look at how frequently addr messages are effectively trickled out on bitcoin.sipa.be... turns out to be one every few seconds.

  29. ashleyholman commented at 8:48 PM on July 8, 2014: contributor

    There is some analysis of the trickling code in this paper: http://arxiv.org/pdf/1405.7418.pdf in the "Address propagation" and "Transaction propagation" sections. Based on this paper and also reading the code, I think there might be a potential to impact transaction propagation times if the trickle interval were increased.

  30. ashleyholman renamed this:
    ThreadMessageHandler: use semaphore instead of MilliSleep() [feedback only]
    ThreadMessageHandler: replace MilliSleep() with condition variable
    on Jul 8, 2014
  31. ashleyholman commented at 12:52 AM on July 9, 2014: contributor

    More specifically, when a transaction is received, it has a 25% chance of being relayed to all peers immediately upon first call to SendMessages(). The other 75% of transactions will be delayed and pushed into the vInvWait queue which will only be sent to each peer when they are randomly chosen as the trickle node. So as a transaction propagates through the network, there might be some unlucky nodes who don't get any peers instantly relaying it to them, so they have to wait for 1 or more trickle hops before they see it. There might even be nodes who have to wait 2 or more trickle hops to see it. So I believe having a low trickle interval should ensure fast worst-case propagation. Fast propagation of unconfirmed transactions should be important for responsiveness when making a payment (eg. PoS or online checkout).

  32. ashleyholman commented at 11:33 AM on July 9, 2014: contributor

    Just pushed a new patch which splits the trickle activity into its own thread.

    I've kept the 100ms MilliSleep in this thread, but can change that if required.

    Note that the ThreadMessageHandler code currently causes a potential deadlock warning (Not introduced by this pull, but already there in master - reported in #4493). Since this patch makes ThreadMessageHandler() fire more often, it could increase the risk of that deadlock. But as per my notes on the issue, I think it might be a harmless warning with no actual deadlock potential.

  33. mikehearn commented at 12:01 PM on July 9, 2014: contributor

    I'm not totally convinced that this trickle behaviour is worth it. Performance is super critical for happy users and the trickle seems to be intended to make it harder to track the flow of transactions across the network. But:

    • It's quite feasible to just connect to all publicly reachable nodes and watch propagation this way
    • Government level adversaries would be unlikely to be impacted by timing randomisation anyway

    Should we just remove it?

  34. laanwj commented at 12:33 PM on July 9, 2014: member

    NACK on removing it. It is a sensible precaution. With the payment protocol, which is the way forward, the payer gives the merchant the transaction through a direct channel so propagation time through the network is not important for user experience.

    Going to test this new version of the pull.

  35. gavinandresen commented at 1:00 PM on July 9, 2014: contributor

    Simplicity is better than complexity-- I vote remove it.

    The trickle code was implemented before we supported running over Tor; Tor is a much better (but still not foolproof) way to get more private transactions.

  36. ashleyholman commented at 11:49 PM on July 9, 2014: contributor

    I have done an audit to find events that should require SendMessages() to run:

    • CNode::fPingQueued set (ping())
    • CNode::vAddrToSend pushed (CNode::PushAddress())
    • CNodeState::fShouldBan set (Misbehaving())
    • CNodeState::rejects pushed (InvalidBlockFound())
    • CNode::fStartSync set (StartSync())
    • CNode::vInventoryToSend pushed (CNode::PushInventory())
    • CNodeState::nBlocksInFlight decreased (MarkBlockAsReceived())
    • CNode::mapAskFor pushed (CNode::AskFor())

    So I will patch those to wake up ThreadMessageHandler().

    However, there is a problem in that SendMessages() is not guaranteed to do all the work required of it. If it fails its TRY_LOCK on cs_main, it only sends pings, and postpones sending of all other message types until its next invocation when cs_main can "hopefully" be obtained. So there is already going to be the possibility of messages being delayed for arbitrary lengths of time - not good.

    Given that SendMessages() doesn't guarantee sending, it is not going to be enough to just fire it once per relevant event and sleep inbetween. SendMessages() needs to be called regularly so that it can re-try on cs_main and send any pending messages that were previously postponed. Furthermore, SendMessages() is responsible for stalled peer detection which is not really event-based, so that also requires SendMessages() to be called at least somewhat regularly.

    Given this need to run regularly, the timed_wait() in ThreadMessageHandler is going to be strictly required rather than acting as a safety net, and the sleep time should probably be kept low.

    In order to have a better guarantee on propagation times, I think SendMessages() should always send all outgoing messages every time it is invoked, ideally without requiring cs_main at all. It currently needs cs_main to call IsInitialBlockDownload() and obtain the CNodeState information. As discussed in #4493, IsInitialBlockDownload could be modified to not require the lock. But, cs_main would still be needed for sending rejection messages, banning peers, and requesting blocks. Delaying the banning/rejection messages is probably not too bad, but it would be good if requests for blocks were not delayable.

  37. ashleyholman commented at 12:40 AM on July 10, 2014: contributor

    Pros of removing trickling:

    • Faster transaction propagation (All peers blast the tx 100% of the time)
    • Simpler to understand code

    Cons of removing trickling:

    • Reduced privacy for non-tor users
    • Increased burst of inv messages across the network may consume more bandwidth
    • Increased risk from changing protocol behaviour

    This paper is worth a read for anyone who hasn't read it: http://arxiv.org/pdf/1405.7418.pdf. Their deanonymisation attack involves forcing people off of tor by getting the exit nodes banned.

    I might try contacting those researchers, because I would be curious to get their opinion on this given they have obviously studied the topic in depth and implemented an attack.

  38. laanwj commented at 6:23 AM on July 10, 2014: member

    @ashleyholman Thanks for the work on analyzing the network code! Maybe an intermediate solution, as long as cs_main is still needed for some actions, would be to do a timed wait only when it failed to acquire the lock, and a non-timed wait otherwise?

    Let's move the discussion about removing or not removing trickle to the mailing list. Don't combine major behaviour changes with refactoring. Having a separate trickle thread is fine for now. It can be easily disabled, even as an option.

  39. in src/sync.h:None in 3df57a19af outdated
       5 | @@ -6,6 +6,8 @@
       6 |  #ifndef BITCOIN_SYNC_H
       7 |  #define BITCOIN_SYNC_H
       8 |  
       9 | +#include "util.h"
    


    laanwj commented at 7:10 AM on July 10, 2014:

    I'd like to avoid introducing this include-file dependency if possible

  40. ashleyholman commented at 8:12 AM on July 10, 2014: contributor

    Maybe an intermediate solution, as long as cs_main is still needed for some actions, would be to do a timed wait only when it failed to acquire the lock, and a non-timed wait otherwise?

    @laanwj That sounds good. It could at least do a shorter timed wait when the lock failed, and a longer one when it succeeds. The longer one is needed so that it can still do stalled peer detection. However, there is an added complexity: each time ThreadMessageHandler wakes up, it makes many calls to SendMessages() - once for each individual peer. So what should it do if, say, the lock failed for only 1 out of the N peers?

  41. laanwj commented at 9:43 AM on July 10, 2014: member

    I think if it failed to acquire the lock for at least one peer, it needs a timed re-try.

    In principle it is really inefficient that it has to acquire the lock many times, but that's harder to solve.

  42. ashleyholman commented at 11:15 AM on July 10, 2014: contributor

    It could perhaps acquire the lock once before the SendMessages loop. That way, all the SendMessages calls will succeed on the locks. I don't think the message handler currently spends much time without the lock anyway.

    The only difference is that there wouldn't be a gap inbetween calls for other threads to obtain cs_main. If that's a problem, it could perhaps acquire the lock, then do SendMessages for the first 10 peers, then free the lock, then reacquire, do the next 10 peers, etc.

    What do you think about that? The good thing about doing it that way is that messages won't have any chance of being "delayed".

    From my work on the peers GUI, I recall that cs_main was actually often in use. I was doing TRY_LOCK on it every second, and it would fail maybe 20% of the time, even when my node wasn't very busy. In the case of having to run through 120 peers, I bet at least one of them would fail most of the time.

  43. ashleyholman commented at 11:19 AM on July 10, 2014: contributor

    Coincidentally, that would fix the potential deadlock issue as well.

  44. ashleyholman commented at 12:22 PM on July 10, 2014: contributor

    My latest patch implements some more trigger points to wake up ThreadMessageHandler. This covers off on all cases I can identify where messages need to be sent.

    I've also removed the dependence on util.h.

  45. laanwj commented at 12:26 PM on July 10, 2014: member

    @ashleyholman Indeed - the intent behind releasing the main lock once in a while is to give other threads a chance. For example the GUI can appear to hang on some actions if the main lock is held for too long. I'm not sure if the network thread will hold the lock for long, though. The longest main-lock holders have always been wallet rescanning and some block processing.

  46. sipa commented at 12:28 PM on July 10, 2014: member

    I think we should be pushing cs_main down, not up.

    The whole reason for worrying about it is because the validation logic locks cs_main for very long times, so we need workarounds for dealing with it.

    Also, this code should not cause a deadlock:

    • thread 1: lock(a) ... lock(b)
    • thread 2: lock(b) ... try_lock(a)

    The typical deadlock scenario would be 1 acquiring a, 2 acquiring b, followed by 1 trying to get b, and 2 trying to get a, causing both to wait for each other. As the second step in thread 2 is a try, it will eventually release b without acquiring a, allowing thread 1 to continue.

  47. ashleyholman commented at 12:34 PM on July 10, 2014: contributor

    @sipa I concur. It does produce a "potential deadlock" warning though, because I guess the deadlock detection doesn't know that the 2nd lock is conditional on try_lock().

    What I mean about ThreadMessageHandler is that I think it already spends most of its waking time holding cs_main, but agreed that it's not much time overall. So if it acquired cs_main upon wakeup and freed it before sleeping, this would allow all pending messages to be sent every time rather than randomly failing for some peers, causing arbitrary message delays.

    So what about either, acquiring once, processing all SendMessages, then releasing. Or, acquiring and releasing in batches, doing X peers at a time?

  48. ashleyholman commented at 12:47 PM on July 10, 2014: contributor

    To be clear about the change I'm describing:

    • Make cs_main a precondition for calling SendMessages()
    • Remove the TRY_LOCK / early return logic from SendMessages. It now does all work required of it, guaranteed.
    • Acquire cs_main in ThreadMessageHandler, prior to calling SendMessages().
    • The above acquisition of cs_main can be batched to hold cs_main for many invocations of SendMessages() (because it has to call SendMessages for each peer).
  49. ashleyholman commented at 1:32 PM on July 10, 2014: contributor

    Just got another idea: what about splitting SendMessages into two separate functions, one that requires cs_main and one that doesn't. Most work can go into the lockless one, eg. ping, inv, and addr messages. The cs_main function will be responsible for the other parts that need the lock, the most important of these being getdata messages, which impact block propagation time.

    I'm still suggesting cs_main be mandatory, but at least it would be over a smaller section of code.

  50. ivanpustogarov commented at 4:36 PM on July 10, 2014: contributor

    We think that removing trickling will definitely make Bitcoin less anonymous for non-Tor users. The attack has three (independent) parts:

    1. Linking a user's IP address to his entry nodes (his 8 outgoing connections). Possible only if Tor is not used.
    2. Correlating transactions to entry nodes.
    3. Linking together transactions of the same user during one session.

    The trickling delay for 75% of transactions considerably decreases the rate of linkability for the transactions, i.e. parts 2) and 3). Without trickling the success rate of these parts would significantly increase (probably would become close to perfect). Thus linking different transactions of the same user during the session would be much easier. Moreover for parts 2) and 3) the attacker would need to make only 1 connection to each peer, instead of 20-50 in the current attack. This is much less noticeable. Also Tor is easy to ban in the current protocol.

    Possible countermeasures we are considering now are to

    1. increase the percentage of trickled transactions (from 75% to 90%, this is of course not in the spirit of this pull request, and will increase tx propagation delays);
    2. decrease the number of outgoing connections from 8 to 3 (this has an implication that the network becomes less connected).

    Minor thought: Not having privacy issues in mind and completely relying on an external tool (Tor) might cause that anonymity is not achieved at all (https://blog.torproject.org/blog/bittorrent-over-tor-isnt-good-idea https://www.torproject.org/docs/faq.html.en#TBBFlash)

    Alex, Ivan https://www.cryptolux.org

  51. laanwj commented at 1:02 PM on July 11, 2014: member

    @ivanpustogarov I agree, but as said please move the trickling discussion to the mailing list, removing trickling is off the table in this pull.

  52. ashleyholman commented at 5:28 AM on July 12, 2014: contributor

    @sipa @laanwj Please take a look at my latest commit if you get a chance. It's doing what I described above - splitting SendMessages() into two functions and requiring the lock for one of them.

    There is the potential to move the sending of non-block getdata messages into the lockless function, but that would cause 2 getdata messages to be sent (1 getdata for blocks, another getdata for non-blocks) when the old code would combine them into one. So I've left all getdata messages in the locked function to keep the protocol behaviour unchanged.

    If you don't like this approach, I can revert it and look at some of the alternatives discussed earlier.

    PS: I've observed that when this patch is applied, ThreadTrickle() ends up running only a few times per second during times when cs_main is busy (eg. when validating a new block), instead of the target 10 times per second. This is nothing new though, because the existing code is already affected in the same way: it would run 10 times per second, but 8 of those times its TRY_LOCK on cs_main would fail so no messages would be trickled.

  53. sipa commented at 4:00 PM on July 12, 2014: member

    Adding a SendMessagesCSMain seems wrong from a design point. Main is a client of net, registering some handlers to be called when certain events occur. If there are some parts of SendMessages logic that do not rely on cs_main, that means that code shouldn't have been in main in the first place. And net certainly shouldn't know about or use cs_main itself.

    How about moving the non-cs_main-dependent part of SendMessages to net.cpp (sort of as a base message handler, independent of the validation engine), and register it as a second SendMessages handler?

  54. ashleyholman commented at 7:39 PM on July 12, 2014: contributor

    @sipa Thanks for checking it out. It felt like a kludgy hack as I was doing it, but I wanted to get the concept into code. That sounds like a much better way to do it. As for having two handlers on SendMessages, I don't know how I would be able to batch up the cs_main-dependent work to run under one acquisition of the lock. I'd have to acquire cs_main inside the main.cpp SendMessages handler, once for each peer. I figured it would be better to get the lock once and do all the work at once. Do you think it's going to add any/much overhead to be acquiring and releasing it up to 125 times every time the handler fires?

  55. sipa commented at 8:09 PM on July 12, 2014: member

    I think we should just get the lock once per peer, and if necessary, move the acquiring of the sender or receiver node lock inside sendmessages too, if the ordering of locks requires that.

  56. ashleyholman commented at 12:34 AM on July 13, 2014: contributor

    I've reverted my previous two commits and pushed a new patch with the cleaner design. cs_main is now acquired inside SendMessages() for each peer. Note that SendMessages() now waits to acquire cs_main instead of returning when TRY_LOCK fails. I think this will result in lower latency since it doesn't delay messages till the next ThreadMessageHandler iteration.

    edit: moving the send receive locks will be necessary to prevent deadlock now that there's no TRY_LOCK on cs_main.. fixed in next patch.

  57. ashleyholman commented at 3:42 AM on July 13, 2014: contributor

    I think this pull is ready for a closer review now, so I've pushed a fresh rebase and squashed everything into a single patch.

    Thanks.

  58. sipa commented at 8:13 PM on July 14, 2014: member

    Testing.

    Needs a rebase, though.

  59. ashleyholman commented at 9:56 PM on July 14, 2014: contributor

    Rebased.

  60. sipa commented at 3:21 PM on July 16, 2014: member

    We may need a separate per-node lock now for the fields used by the non-main message handler thread code, as otherwise, for example, the ping variables may be accessed by both the trickle and normal handler threads?

  61. ashleyholman commented at 9:45 PM on July 16, 2014: contributor

    SendMessagesNet locks pto->cs_vSend (node-specific lock) before it checks the ping flag and inventory list. So I think that should exclude the two threads from racing?

  62. sipa commented at 9:52 PM on July 16, 2014: member

    But ProcessMessage (which also accesses those variables) does not lock cs_vSend. The regular message handler can be running ProcessMessage, while the trickle thread is running SendMessages.

  63. ashleyholman commented at 9:57 PM on July 16, 2014: contributor

    ProcessMessages() is only called by ThreadMessageHandler, and it locks the node's cs_vRecvMsg before calling it. In ThreadTrickle it's only doing sending because that's the only part that uses the trickle flag.

  64. sipa commented at 10:00 PM on July 16, 2014: member

    So? What prevents ThreadMessageHandler -> ProcessMessages -> ProcessMessage -> "pong" from running simultaneously with ThreadTrickle -> SendMessages -> "ping" ?

  65. ashleyholman commented at 10:03 PM on July 16, 2014: contributor

    Ahh yes, I get it now. Sorry. Any suggestions on naming this new lock?

  66. laanwj added the label P2P on Jul 31, 2014
  67. ashleyholman commented at 8:27 AM on August 10, 2014: contributor

    I have added a new lock, CNode::cs_ping, to protect the race condition in the ping/pong processing.

  68. sipa commented at 9:47 AM on August 10, 2014: member

    Untested ACK (and testing right now). I would abstract all the condMessageHandler.notify_one() calls behind a function in net, though.

  69. ashleyholman commented at 12:02 PM on August 10, 2014: contributor

    Replaced all message handler notifications with WakeMessageHandler().

  70. ashleyholman commented at 8:32 AM on August 24, 2014: contributor

    Does anyone need anything further from me on this? I need another ACK on it. How about @laanwj, since you have had some involvement?

    I'll need to do one final squash/sign before the merge.

  71. sipa commented at 7:55 PM on August 24, 2014: member

    Tested ACK.

  72. in src/sync.h:None in c4879b1bc3 outdated
     122 | +
     123 | +    void timed_wait(int milliseconds) {
     124 | +        boost::unique_lock<boost::mutex> lock(mutex);
     125 | +        alarm = boost::posix_time::microsec_clock::universal_time() + boost::posix_time::milliseconds(milliseconds);
     126 | +        while (!fHasWork && boost::posix_time::microsec_clock::universal_time() < alarm) {
     127 | +            condition.timed_wait(lock, alarm - boost::posix_time::microsec_clock::universal_time());
    


    laanwj commented at 7:39 AM on August 25, 2014:

    Due to using relative durations there's a slight window for a race condition here:

    • boost::posix_time::microsec_clock::universal_time() called the first time, in the while() clause, returns a value < alarm
    • boost::posix_time::microsec_clock::universal_time() called the second time, in the subtraction, returns a value > alarm

    In that case either a negative value would be passed to timed_wait, or an integer overflow would happen, resulting in a very long wait.

    BTW: there is also a timed_wait overload that takes an absolute system time; would that be useful here?


    ashleyholman commented at 11:30 AM on August 25, 2014:

    I just pushed a patch which changes it to use the absolute version. I tested and it is still waking up after 1 second as expected.

  73. BitcoinPullTester commented at 11:40 AM on August 25, 2014: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4230_5eef5a482ce8be3c4d34605bf2dc1e57c5b4bb55/ for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.

  74. in src/sync.h:None in 5eef5a482c outdated
     120 | +        fHasWork = false;
     121 | +    }
     122 | +
     123 | +    void timed_wait(int milliseconds) {
     124 | +        boost::unique_lock<boost::mutex> lock(mutex);
     125 | +        alarm = boost::posix_time::microsec_clock::universal_time() + boost::posix_time::milliseconds(milliseconds);
    


    sipa commented at 2:17 PM on August 26, 2014:

    An alternative:

    auto now = boost::posix_time::microsec_clock::universal_time();
    alarm = now + boost::posix_time::milliseconds(milliseconds);
    while (!fHasWork && now < alarm) {
        condition.timed_wait(lock, alarm);
        now = boost::posix_time::microsec_clock::universal_time();
    }
    fHasWork = false;
    

    ashleyholman commented at 9:54 AM on August 28, 2014:

    I suppose that will mean one less call to boost::posix_time::microsec_clock::universal_time(), so I will use that.

  75. sipa commented at 8:00 PM on August 27, 2014: member

    Needs rebase.

  76. BitcoinPullTester commented at 10:20 AM on August 28, 2014: none

    Automatic sanity-testing: FAILED MERGE, see http://jenkins.bluematt.me/pull-tester/p4230_eebbb0be605d37e9a4dd5079695ab59258886f67/ for test log.

    This pull does not merge cleanly onto current master.

  77. ashleyholman force-pushed on Aug 28, 2014
  78. ashleyholman force-pushed on Aug 28, 2014
  79. ashleyholman force-pushed on Aug 28, 2014
  80. ashleyholman commented at 11:24 AM on August 28, 2014: contributor

    Rebased and squashed. Let me know if you need anything else!

  81. BitcoinPullTester commented at 11:26 AM on August 28, 2014: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4230_fe5491a91d7e500df90ef1d69bdc3d5dbabc6dc9/ for binaries and test log.

  82. BitcoinPullTester commented at 12:08 PM on August 28, 2014: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4230_81da9c77b5e9a93fe2687061f68c29e0b94f36eb/ for binaries and test log.

  83. in src/net.cpp:None in 81da9c77b5 outdated
    1595 | @@ -1594,14 +1596,138 @@ void ThreadMessageHandler()
    1596 |          }
    1597 |  
    1598 |          if (fSleep)
    1599 | -            MilliSleep(100);
    1600 | +            condMessageHandler.timed_wait(1000);
    1601 |      }
    1602 |  }
    1603 |  
    1604 | +void ThreadTrickle () {
    


    Diapolo commented at 9:50 PM on August 28, 2014:

    Nit: As this is a function, can you start with { in the next line?


    ashleyholman commented at 9:58 AM on August 31, 2014:

    Pushed a new patch with the brace moved.

  84. ashleyholman force-pushed on Aug 31, 2014
  85. BitcoinPullTester commented at 10:21 AM on August 31, 2014: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4230_c75bae764b936fde9d691bbe39115286d486ad61/ for binaries and test log.

  86. Refactor message handling for lower latency #4230
    - ThreadMessageHandler woken up by events that require it
    - TRY_LOCKs replaced with LOCKs to limit worst-case delays
    - Trickle behaviour split into separate thread
    - SendMessages() split in two: original now only handles cs_main-dependent work
    28a7464bb7
  87. in src/net.cpp:None in c75bae764b outdated
     662 | @@ -662,6 +663,10 @@ bool CNode::ReceiveMsgBytes(const char *pch, unsigned int nBytes)
     663 |          if (handled < 0)
     664 |                  return false;
     665 |  
     666 | +        if (msg.complete())
    


    TheBlueMatt commented at 6:44 AM on September 2, 2014:

    Why not push this conditional to the end with the other one?


    ashleyholman commented at 9:47 AM on September 2, 2014:

    Done

  88. ashleyholman force-pushed on Sep 2, 2014
  89. BitcoinPullTester commented at 10:00 AM on September 2, 2014: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4230_28a7464bb7e851800c4924162ecd9d1ec5961288/ for binaries and test log.

  90. sipa commented at 9:19 PM on September 5, 2014: member

    @laanwj @gmaxwell @jgarzik Care to review/test this?

  91. gmaxwell commented at 9:22 PM on September 5, 2014: contributor

    I'm on it now. Thanks for the heads-up; this fell off my radar after the initial issues. Thanks for your patience @ashleyholman.

  92. laanwj commented at 2:44 AM on September 6, 2014: member

    I've been testing this for quite a while already, no problems. ACK.

  93. sipa commented at 5:24 PM on September 8, 2014: member

    Just did an (almost) full resync from this + #4834 (as one of the servers), with #4468 + #4834 as the client. No problems, and the client reports low (~30ms) ping latencies during sync for that server.

  94. gmaxwell commented at 10:27 PM on September 8, 2014: contributor

    I've had this running in valgrind for a few days and run it through some paces, e.g. bouncing connections, flooding it with inbound connections and messages... It seems to be holding up well. Hurray.

    I'm somewhat concerned with peppering the code with wakeups. It's not always clear where they're needed and why, and I worry that if a future patch adds code that needs a wake, it will be missed. Is there some clean rule about where they're needed that I've missed?

    Can you help me understand why the wakes outside of ReceiveMsg are actually required?

    (In theory, if the timeout were reduced to 100ms we could merge the trickle thread back in, with just a check to prevent it from running too often... but I think it actually makes long-term design sense to have a thread for periodic events, so I'm not opposed to that even though it does mean another couple of megabytes of memory usage.)

  95. ashleyholman commented at 8:43 AM on September 9, 2014: contributor

    The wakeups were placed according to an audit I did, detailed in this comment. I was looking for places in the code that change state that would cause new messages to be sent the next time SendMessages() runs. On further thought, though, most of that state is changed during ProcessMessage(), which is always followed by SendMessages() anyway, so most of those wakeups can be removed. In the remaining cases the wakeup may not matter much; for example, when an RPC 'ping' command is issued, it's probably fine to wait until the next ThreadMessageHandler run (at most 1 second) to send the pings.

    However there is one event that I think is important to wake up on, which is PushInventory(). This gets called from ActivateBestChain() when we have a new block. To speed up propagation time, it should wake the message handler immediately so that it can send the INVs out to all peers right away.

    Re: the separate trickle thread, both options were considered and I ended up taking advice from @laanwj to have a new thread.

  96. sipa commented at 3:00 PM on September 9, 2014: member

    I would leave the trickle as a separate thread (it's some extra memory, but it's a simple-to-understand model), and moving part of the protocol handling out of main is also nice.

    If every ProcessMessage call is followed by a SendMessages call, I would suggest removing the Wake calls in the ProcessMessage path (note that ActivateBestChain is also only called from within ProcessMessage...). Are there any wakes left that are 1) outside of this path and 2) for which sub-second latency matters?

  97. sipa commented at 6:58 PM on September 9, 2014: member

    Undo ACK.

    I missed something. The trickle thread will call SendMessages with fSendTrickle, but all non-trickle behaviour therein will also run. This means many more data structures that are potentially accessed from multiple threads (just saw a use-after-reallocate in valgrind).

    If we need the trickle thread, I would suggest creating a new entry in CNodeSignals for the trickling, and for handlers that need it, they can independently register that. This means that the current (in this PR) code from main.cpp's SendMessages will be split up over two functions, which potentially run in parallel (so all data structures shared between them need locking).

  98. ashleyholman commented at 7:41 PM on September 9, 2014: contributor

    For some reason I thought ActivateBestChain() would have been called from another thread that does block validation. But I still think it requires the wakeup, because it needs to send messages to all peers. We can only guarantee that SendMessages is going to run for the current peer who sent the block, plus any other peers that come after it in the vNodes list.

  99. ashleyholman commented at 10:50 AM on September 15, 2014: contributor

    @sipa: SendMessages in main is locking pto->cs_vSend, so shouldn't that prevent two threads from running it concurrently for the same CNode? SendMessagesNet does the same with pto->cs_vSend.

    The only potential concurrency I can see from code outside of those functions is with access to CNode::vAddrToSend. There are a few places that are either accessing vAddrToSend directly, or using PushAddress. If I add a lock for all vAddrToSend operations, would that fix it?

  100. ashleyholman commented at 8:08 PM on October 13, 2014: contributor

    I'm ready to give this some attention again. It feels like it is close, so it would be good to try to get it over the line and integrated. The only issue I am aware of is the race condition identified by @sipa. If there needs to be significant refactoring to deal with this then let me know. Otherwise if there's some simpler way to fix it then that would be good, considering all the testing that has already been done.

  101. sipa commented at 9:00 PM on October 13, 2014: member

    Independently, I would really prefer to see a separate Trickle handler, run from the trickle thread, in which the trickle behavior and nothing else runs. It's hard enough to reason about the correctness already.

  102. gmaxwell commented at 9:03 PM on October 13, 2014: contributor

    I'd really like to see this get in soon, but I'm still uncomfortable with the manual wakeups, which I think we'll never manage to keep updated correctly.

  103. laanwj added this to the milestone 0.11.0 on Oct 18, 2014
  104. laanwj removed this from the milestone 0.10.0 on Oct 18, 2014
  105. laanwj commented at 9:13 AM on October 18, 2014: member

    Moved the milestone to 0.11 - I agree this should get in, but this is not ready yet, and a high-impact network code change like this needs some time to be tested in master after being merged.

  106. gmaxwell commented at 6:58 PM on October 20, 2014: contributor

    @laanwj I think we were thinking that it's possible to achieve this with a much smaller and simpler diff (e.g. one that wakes immediately on receive but still polls, so that we're sure it's safe with no sprinkled wakeups).

    I think this delay is currently the biggest latency source for block propagation on the network.

  107. laanwj commented at 12:58 PM on March 18, 2015: member

    This pull has been inactive for almost 6 months. It would be great if someone picked this up, but there isn't enough confidence in this implementation as-is, so it won't be merged in this state and I'm closing the pull. @ashleyholman let me know if you intend to work on this again and I'll reopen.

  108. laanwj closed this on Mar 18, 2015

  109. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 18:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me