net: drop boost::thread

theuni commented at 3:02 am on December 6, 2016: member

(Split out and rewritten chunk of #8631. I’ll rebase that on top of this post-merge and reopen.)

This is a prerequisite for async network handling. As-is, we don’t have enough control over the shutdown process to be able to deal with async connecting and send/recv.

In particular, we need to be sure that message processing has shut down before forcing all networking down, otherwise we run the risk of trying to process a node’s messages during its destruction. This is not a problem now because the current sync model works fine with all threads being interrupted at the same time.

And if that’s not a satisfactory reason for the change, it also gets rid of a nice chunk of boost threading (and simplifies the rebase of #8631 :)

To accomplish this, we need to:

Set a flag to interrupt all net threads
notify all blocked condvars
replace MilliSleep (since they’re boost interruption points)
teach init to Interrupt/Stop, similar to some of the other subsystems.

The MilliSleeps are replaced with condvars that check for the thread’s interrupt flag. Since the flags are atomic, there’s no need for real locking, so these condvars use a dummy CNullLock.

With all of that done, we may as well switch away from boost threads, since we’re no longer dependent on interruption. @TheBlueMatt 3f3f0b4bce0fb34e803b4893594b44df1f2effe3 is the awkward change I mentioned on IRC. The global will be cleaned up when we move processing into a class as a next step (something like https://github.com/theuni/bitcoin/commit/3f598dbe7100c7c6c7bfb7e10210585327ed9d31)

MarcoFalke added the label Refactoring on Dec 6, 2016

laanwj commented at 12:24 pm on December 6, 2016: member

Concept ACK

sipa commented at 7:35 pm on December 6, 2016: member

Would it make sense to introduce a utility function like MilliSleep which takes a reference to a condition variable and an atomic flag? The pattern that replaces the MilliSleep calls is pretty complicated and repeats several times.

fanquake commented at 10:57 am on December 7, 2016: member

Concept ACK

theuni commented at 2:05 pm on December 8, 2016: member

@sipa Sure, will do.

theuni commented at 1:07 am on December 9, 2016: member

@sipa done. I’m not sure if it’s generic enough to add to utiltime, happy to move it back to net if that makes more sense.

Also fixed up the TODO that I was mistaken about.

theuni force-pushed on Dec 16, 2016

theuni commented at 0:51 am on December 16, 2016: member

Just noticed that this PR would’ve broken interruptible proxy recvs, so I pushed a commit to fix that. The proxy code will soon be dumped anyway, so I’m ok with the hackish global there for now.

Also squashed down the header fix while I was at it.

TheBlueMatt commented at 6:06 am on December 21, 2016: member

Does each thread need its own atomic_flag? We currently dont use it and I’m not sure I envision a scenario where you want to interrupt only some of the threads. Would simplify the code a decent chunk, I think.

I super dont like using atomic_flags and resetting them to dont-wake every time you check them…I think its fine in this PR, but it just seems like a bad idea generally. atomic_bool should be just fine, too :).

JeremyRubin commented at 6:51 am on December 21, 2016: contributor

@TheBlueMatt I think you might be confused as to how the atomic flags are used in this code. They’re used in “clear triggered mode”, so there is no need to reset them after every check. The other usage (“test_and_set triggered mode”) would introduce race conditions when the checking thread clears the flag as a concurrent interrupt could be cleared.

TheBlueMatt commented at 8:50 am on December 21, 2016: member

@JeremyRubin I see how they’re used, am only commenting on the likelihood of this pattern introducing future bugs. Eg someone uses break instead of return and then a loop gets moved around, making the thread no longer exit.

On December 20, 2016 10:51:14 PM PST, Jeremy Rubin notifications@github.com wrote:

@TheBlueMatt I think you might be confused as to how the atomic flags are used in this code. They’re used in “clear triggered mode”, so there is no need to reset them after every check. The other usage (“test_and_set triggered mode”) would introduce race conditions when the checking thread clears the flag as a concurrent interrupt could be cleared.

– You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/pull/9289#issuecomment-268450262

JeremyRubin commented at 7:19 pm on December 21, 2016: contributor

@TheBlueMatt ok I was responding to “atomic_flags and resetting them to dont-wake every time you check them.”, not sure what you meant by that if not my earlier interpretation.

I agree with you on the semantics being off if your worry is future code breaking it. We should either make it throw an Interruption, or call std::terminate. Changing it to an atomic_bool will not have any effect on reducing likelihood future code changes don’t break intended behavior. Implemented here for comparison https://github.com/theuni/bitcoin/compare/connman-threads...JeremyRubin:connman-threads The only negative of this approach is that local cleanup can’t be handled as neatly.

TheBlueMatt commented at 7:41 pm on December 21, 2016: member

@JeremyRubin well my note was that if you use an atomic_bool instead of a flag (which means you can check without resetting the state to the dont-exit-now state) then even if there is a bug the worst that happens is you will exit the thread at the next check.

TheBlueMatt commented at 8:06 pm on December 21, 2016: member

@JeremyRubin sadly I’m not sure about the execution propagation from within condition_variable…cppreference seems to indicate it was cleaned up in C++14, but its unclear if its guaranteed to propagation from the predicate or if it just “may”.

JeremyRubin commented at 8:59 pm on December 21, 2016: contributor

Thanks for clarifying the kind of bug you were imagining, I had a different type of bug in mind (there being no code flow which can cause a termination, rather than being catchable on the next iteration).

You’re absolutely right on the “may” for exceptions thrown from wait_for. Semantics should be fixed now (moved InterruptibleSleep into interruption_point and made the throwing external to wait_for).

TheBlueMatt commented at 9:28 pm on December 21, 2016: member

@JeremyRubin ehh, I mean yes regaring wrappers, no regarding exceptions - I think its generally accepted that using exceptions in C++ is bad, and I’m not sure we want to add any more uses than we already have.

theuni commented at 2:43 am on December 22, 2016: member

@JeremyRubin and @TheBlueMatt Thanks for having a look.

After a debate (probably more than necessary), and running it by a third party, I think I’ll concede and just use an atomic bool instead. The convincing argument was that a bool allows us to deviate from the current “got an interrupt, bail from thread!”, and lets us do quick on-thread teardown post-interruption. It will need to stay in check though, our usages of Interrupt() assume that it won’t block.

theuni commented at 7:19 am on December 22, 2016: member

@TheBlueMatt I pushed up a new attempt to https://github.com/theuni/bitcoin/commits/connman-threads-redo. I think you’ll like that much better.

I need to split a few non-obvious things into different commits, but mind having a quick look at the concept?

I borrowed @JeremyRubin’s wrapper idea but implemented it a bit differently: https://github.com/theuni/bitcoin/commit/7b4d8f608cca17fba7311c310e74c4cbd7671b73#diff-09404f8fdd1f69609d329cb0b1f1252e

The usage is (imo) pretty straightforward: https://github.com/theuni/bitcoin/commit/7b4d8f608cca17fba7311c310e74c4cbd7671b73#diff-9a82240fe7dfe86564178691cc57f2f1L1660

TheBlueMatt commented at 6:17 pm on December 22, 2016: member

@theuni Yup, really like that version.

theuni force-pushed on Dec 22, 2016

in src/threadinterrupt.h: in e0bdb1d62c outdated

 5+
 6+#include <mutex>
 7+#include <condition_variable>
 8+#include <atomic>
 9+
10+class CThreadInterrupt

TheBlueMatt commented at 11:34 pm on December 22, 2016:

Generally, I think it might make sense to just rename this “InterruptibleSleeper” and have it own the mutex and condition_variable. Then you dont have to have its owner own memory it points to and should still be able to keep +/- the same API.

theuni commented at 11:45 pm on December 22, 2016:

Sometimes you’ll want a real interruptible condvar, though. See https://github.com/theuni/bitcoin/commit/da68e3cf5832878fbb1951eb1eeb01d8cd22d1e1#diff-9a82240fe7dfe86564178691cc57f2f1R1887.

The alternative would be to add wait_for/wait_until/notify_one/notify_all to CThreadInterrupt, then just forward them to the condvar. But for that, you’d still need to expose the mutex.

theuni commented at 11:53 pm on December 22, 2016:

I suppose this could be rigged up with a separate condvar/mutex and some additional manual isinterruped checks.

in src/threadinterrupt.h: in e0bdb1d62c outdated

42+ * @param   rel_time maximum time to wait. Should be a std::chrono::duration.
43+ * @param   threadInterrupt The interrupt that may wake the sleep
44+ * @returns false if the sleep was interrupted, true otherwise
45+ */
46+template <typename Duration>
47+bool InterruptibleSleep(const Duration& rel_time, CThreadInterrupt& threadInterrupt)

TheBlueMatt commented at 11:35 pm on December 22, 2016:

Can we just move this into a member function of the class?

Also, does it need to be a template? I think if you set Duration to something that the various durations can be converted into it should do so intelligently. Would mean less stuff to compile in headers.

theuni commented at 11:48 pm on December 22, 2016:

This started out as a member function, but interrupt.sleep_for() felt awkward conceptually.

We can probably just force this to milliseconds. If we need microsecond precision later, we can just add another function.

in src/threadinterrupt.h: in e0bdb1d62c outdated

 9+
10+class CThreadInterrupt
11+{
12+public:
13+    CThreadInterrupt(std::condition_variable& condIn, std::mutex& mutIn) : cond(condIn), mut(mutIn), val(false) {}
14+    void reset()

TheBlueMatt commented at 11:36 pm on December 22, 2016:

Please no new implementations-in-headers. I know its mostly single-line stuff but we’re fighting to reduce compile-time memory usage, so every little bit helps. :)

theuni commented at 11:46 pm on December 22, 2016:

Sure. Though some of these need to be inlined. I figured someone would yell if I added a .cpp with 2 functions :p

TheBlueMatt commented at 11:37 pm on December 22, 2016: member

Why did the socket handler startup need to move up? Did I miss a bug that woule be cause by it not or did you just prefer it that way? (dont care, just asking to make sure I’m aware)

There is still a MilliSleep call in ThreadSocketHandler. Is that intentional?

theuni commented at 11:51 pm on December 22, 2016: member

Socket handler moved up just so that there’s a consistent start/stop order.

Thanks for pointing out the MilliSleep, definitely not intentional!

TheBlueMatt commented at 0:04 am on December 23, 2016: member

Yes, I guess my conception is different - if we make it a generic wrapper for condition variables then moving sleep into it would make sense. I thought some about what that would mean for net_processing, but missed that you’d need to expose the lock (exposing interrupt seems otherwise reasonable to me).

No longer convinced which one is nicer, up to you.

On December 22, 2016 3:48:21 PM PST, Cory Fields notifications@github.com wrote:

theuni commented on this pull request.

std::condition_variable& cond;

std::mutex& mut;

std::atomic val;

template

friend bool InterruptibleSleep(const Duration&, CThreadInterrupt&); +};

+/*

Sleep for the stated period of time, interruptible by clearing the flag and notifying the condvar.

@param rel_time maximum time to wait. Should be a std::chrono::duration.

@param threadInterrupt The interrupt that may wake the sleep

@returns false if the sleep was interrupted, true otherwise

*/ +template +bool InterruptibleSleep(const Duration& rel_time, CThreadInterrupt& threadInterrupt)

This started out as a member function, but interrupt.sleep_for() felt awkward conceptually.

We can probably just force this to milliseconds. If we need microsecond precision later, we can just add another function.

– You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/pull/9289

TheBlueMatt commented at 0:19 am on December 23, 2016: member

Actually, I’m not sure you need to expose the lock, you just need to take the lock on the notifying thread prior to the notify but after setting the wakeup condition.

If you want to try it this way, I’d say you’d want to have two wakeup conditions, a wake-up and an interrupt condition, and just expose the various operations that you need. Not sure it’s worth it, but I think that would be the cleanest API (modulo the too-much-wrapper potential issue).

On December 22, 2016 3:53:28 PM PST, Cory Fields notifications@github.com wrote:

theuni commented on this pull request.

@@ -0,0 +1,51 @@ +// Copyright (c) 2009-2010 Satoshi Nakamoto +// Copyright (c) 2009-2016 The Bitcoin Core developers +// Distributed under the MIT software license, see the accompanying +// file COPYING or http://www.opensource.org/licenses/mit-license.php.

+#include +#include <condition_variable> +#include + +class CThreadInterrupt

I suppose this could be rigged up with a separate condvar/mutex and some additional manual isinterruped checks.

– You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/pull/9289

theuni commented at 1:02 am on December 23, 2016: member

That would require the wakeup condition to be atomic, no?

TheBlueMatt commented at 2:16 am on December 23, 2016: member

Yes, I believe that implies the wakeup condition must be atomic, but it can be release/acquire since a mutex unlock implies a full memory barrier.

In the other case that I proposed (doing it with two atomics - one for wake-up and one for interruption all hidden inside the object they would both be atomic and you’d want ClearWakeupState() and Wake() functions).

On December 22, 2016 5:02:46 PM PST, Cory Fields notifications@github.com wrote:

That would require the wakeup condition to be atomic, no?

– You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/pull/9289#issuecomment-268923628

theuni commented at 1:49 am on December 24, 2016: member

@TheBlueMatt Pushed a round of changes after last night’s IRC discussion. I hope we can compromise on this approach.

TheBlueMatt commented at 2:05 am on December 24, 2016: member

Looks good, will do a more thourough review when you add threadinterrupt.cpp to git :)

theuni force-pushed on Dec 24, 2016

theuni commented at 4:00 am on December 24, 2016: member

grr.. added.

TheBlueMatt commented at 6:39 pm on December 24, 2016: member

utACK 992bbc01a0d33fdf47a55336f82825f43ad29522 You can remove the boost/thread.hpp include in netbase now, I believe.

in src/netbase.cpp: in e0bdb1d62c outdated

716@@ -715,3 +717,8 @@ bool SetSocketNonBlocking(SOCKET& hSocket, bool fNonBlocking)
717 
718     return true;
719 }
720+
721+void InterruptSocks5(bool interrupt)

gmaxwell commented at 10:34 pm on December 24, 2016:

Do we really want symbols that differ only in case?

theuni commented at 5:21 am on December 25, 2016:

Ah yes, probably unwise. Will change the atomic.

theuni force-pushed on Dec 26, 2016

theuni commented at 7:22 pm on December 26, 2016: member

Updated and squashed. The diff from 992bbc0 can be seen here: https://gist.github.com/theuni/32409c39e045cd3f2a6d6e4316d228ec.

The only things changed in the squash (can be seen in the diff above):

Rename interruptSocks5 to avoid a symbol with same-name-different-case (and make it static while we’re at it)
Initialize the atomics in DoS_tests

One commit was added on top to cleanup includes.

in src/net_processing.cpp: in c827c1ac15 outdated

900@@ -901,7 +901,9 @@ void static ProcessGetData(CNode* pfrom, const Consensus::Params& consensusParam
901 
902         const CInv &inv = *it;
903         {
904-            boost::this_thread::interruption_point();
905+            if(interruptMsgProc)

sipa commented at 4:51 pm on December 27, 2016:

Style nit (in many places): space after if.

sipa commented at 4:55 pm on December 27, 2016: member

I’d like to review this commit by commit, but GitHub shows them out of order. Can you reorder them?

theuni force-pushed on Dec 27, 2016

theuni commented at 10:15 pm on December 27, 2016: member

@sipa reordered and fixed the “if(“s. I didn’t rebase for now to avoid invalidating @TheBlueMatt’s review.

fanquake added this to the milestone 0.14.0 on Dec 29, 2016

theuni force-pushed on Dec 29, 2016

theuni commented at 7:44 pm on December 29, 2016: member

rebased at @TheBlueMatt’s request.

in src/threadinterrupt.cpp: in bf2b0ad258 outdated

0@@ -0,0 +1,41 @@
1+// Copyright (c) 2009-2010 Satoshi Nakamoto

sipa commented at 7:53 pm on December 29, 2016:

Satoshi did not write this… as far as I know at least? Nor was it written in 2009.

theuni commented at 6:50 pm on December 30, 2016:

Caught me!

in src/net.cpp: in 24e17858f4 outdated

1886@@ -1877,8 +1887,10 @@ void CConnman::ThreadMessageHandler()
1887                 pnode->Release();
1888         }
1889 
1890-        if (fSleep)
1891-            messageHandlerCondition.timed_wait(lock, boost::posix_time::microsec_clock::universal_time() + boost::posix_time::milliseconds(100));
1892+        if (fSleep) {
1893+            std::unique_lock<std::mutex> lock(mutexMsgProc);
1894+            condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100));

sipa commented at 1:32 pm on December 30, 2016:

It seems that mutexMsgProc is only used for locking flagInterruptMsgProc and waiting, making it very similar to CThreadInterrupt. If you’d add a wake() method to that class, which breaks from sleeps, but doesn’t set the interrupt point, wouldn’t you be able to replace the explicit mutex/flag/condvar with a CThreadInterrupt>

theuni commented at 6:50 pm on December 30, 2016:

I tried to make CThreadInterrupt do this at first (see some of the back+forth with @TheBlueMatt) above. Your suggestion works fine here, I think, but it’s not generic due to the lack of accounting for wake conditions. I think it could be done by having wait_until accept an optional predicate, which would be used internally as “while !internal_wake && !pred()”.

The compromise was to only use it for plain sleeps. But it’s bounced back and forth enough times that I really have no preference. If you’d prefer that, I’m happy to make the change.

theuni commented at 6:53 pm on December 30, 2016:

Whoops, that’s right, that didn’t work because it would force the internal mutex to be exposed so the caller could actually set the wake condition.

sipa commented at 7:24 pm on December 30, 2016:

Ok, I see. Let’s think about generalizing this later. No need to hold this PR up.

in src/net.cpp: in bf2b0ad258 outdated

2216             semOutbound->post();
2217+}
2218+
2219+void CConnman::Stop()
2220+{
2221+    LogPrintf("%s\n",__func__);

sipa commented at 1:34 pm on December 30, 2016:

Leftover debug statement?

theuni commented at 6:50 pm on December 30, 2016:

Will remove.

theuni commented at 7:32 pm on December 30, 2016: member

Pushed a commit for @sipa’s nits. I’ve been running my dev branches on top of this for several days, killing bitcoind in all kinds of rude ways while debugging, and it has continued to shutdown reliably.

sipa commented at 7:33 pm on December 30, 2016: member

utACK

theuni force-pushed on Dec 30, 2016

theuni commented at 8:20 pm on December 30, 2016: member

@sipa I figured you’d have more nits :)

Went ahead and squashed.

sipa commented at 9:55 pm on January 3, 2017: member

Needs rebase.

net: a few small cleanups before replacing boost threads

- Drop the interruption point directly after the pnode allocation. This would
    be leaky if hit.
- Rearrange thread creation so that the socket handler comes first

7325b15566

net: add CThreadInterrupt and InterruptibleSleep 799df9115f

net: make net interruptible

Also now that net threads are interruptible, switch them to use std
threads/binds/mutexes/condvars.

0985052319

net: make net processing interruptible d3d7056d2a

net: remove thread_interrupted catch

This is now a std::thread, so there's no hope of catching a boost interruption
point.

5cb0fcee81

net: make proxy receives interruptible 8b3159ef0a

net: misc header cleanups 67ee4ec901

theuni force-pushed on Jan 3, 2017

theuni commented at 0:06 am on January 4, 2017: member

Done.

laanwj commented at 11:20 am on January 4, 2017: member

lightly-tested ACK 67ee4ec

laanwj merged this on Jan 4, 2017

laanwj closed this on Jan 4, 2017

laanwj referenced this in commit d9ae1cefa0 on Jan 4, 2017

random-zebra referenced this in commit 777638e7bc on Aug 27, 2020

MarcoFalke locked this on Sep 8, 2021

net: drop boost::thread_group #9289