[net] [validation] Call ProcessNewBlock() asynchronously

skeees commented at 9:54 pm on April 10, 2018: contributor

Update

I think this is now in a state for code review Summary of discussion of overall design as well as some concept acks: https://bitcoincore.org/en/meetings/2018/05/03/

Description

~~This is still in progress and not fully completed yet, but wanted to put it out for review in terms of overall design/architecture~~

The high level goal here (in this, and if accepted, subsequent PRs) is to allow for net and validation to live on separate threads and communicate mostly via message passing - both for the efficiency benefits that further parallelism in the net layer might provide, but also perhaps moreso as a step towards the goal of reducing the amount of shared state and forcing a cleaner separation between the net and validation layers in the core node.

To keep this PR as self contained as possible - this set of commits does the following:

defines ProducerConsumerQueue() / ConsumerThread(): infrastructure to facilitate async communication between the net and validation layers
defines ValidationLayer(): an interface where requests for (just CBlock for now) validation can be submitted and processed asynchronously
replaces synchronous calls of ProcessNewBlock() in net_processing with the new async interface ValidationLayer::SubmitForValidation(CBlock) -> std::future<BlockValidationResult>

Because the P2P layer assumes that for a given node every message is fully processed before any subsequent messages are processed, when an asynchronous validation request is submitted for a block coming from a node - that node is “frozen” until that request has been fully validated. In the meantime - the net layer may continue servicing other nodes that do not have pending asynchronous validation requests.

The ProducerConsumerQueue() was left sufficiently generic so that it may be interposed in other places where separation of components via asynchronous message passing might make sense from a design perspective.

fanquake commented at 4:28 am on April 11, 2018: member

cc @theuni @skeees Have you looked through the current work being done to refactor the P2P code? See here for an overview.

skeees commented at 1:27 pm on April 11, 2018: contributor

Thanks, yes I have looked through those. This is more focused on separation between net_processing (PeerLogicValidation) and validation, whereas those primarily tackle socket handling and other ConnMan stuff. I don’t think there’s anything here that’s redundant or incompatible with those refactors

instagibbs commented at 2:26 am on April 13, 2018: member

cc @TheBlueMatt

in src/core/consumerthread.h:20 in 5b880d2590 outdated

12+class ConsumerThread;
13+
14+//! A WorkItem() encapsulates a task that can be processed by a ConsumerThread()
15+//! @see ConsumerThread()
16+template<WorkerMode MODE>
17+class WorkItem

ryanofsky commented at 6:44 pm on April 23, 2018:

I think you could replace this class with using WorkItem = std::function<void()>;. Less code and would make queue interface more generic.

in src/core/consumerthread.h:88 in 5b880d2590 outdated

83+    ConsumerThread() :m_active(false) {};
84+
85+    //! Constructs a ConsumerThread: RAII
86+    //! @param queue the queue from which this thread will pull work
87+    ConsumerThread(std::shared_ptr<WorkQueue<PRODUCER_POLICY>> queue, const std::string id = "")
88+        :m_id(id), m_queue(queue), m_active(true)

ryanofsky commented at 6:50 pm on April 23, 2018:

Could maybe use TraceThread from util.h to make the thread name visible to the os.

in src/core/consumerthread.h:32 in 5b880d2590 outdated

27+template<WorkerMode MODE>
28+class ShutdownPill : public WorkItem<MODE> {
29+    friend ConsumerThread<MODE>;
30+
31+private:
32+    ShutdownPill(ConsumerThread<MODE>& consumer) : m_consumer(consumer) {};

ryanofsky commented at 6:58 pm on April 23, 2018:

ShutdownPill seems a little complicated. What advantages does it provide over just adding bool m_active to ProducerConsumerQueue with a simple method to set it to false and cancel blocked Pop() calls?

skeees commented at 11:03 pm on April 23, 2018:

Yeah - it is pretty complicated. I didn’t want to poke through the queue api just to enable shutdown - you’d have to have T Pop() potentially not return a T (i.e. throw an exception) - which seemed less desirable and maybe equally complicated.

Having said that - a lot of the complexity here is introduced to handle:

being able to shutdown a specific ConsumerThread without shutting down others (in reality you probably only ever want to shut down all of them when you terminate the process)
allowing for a queue with fewer slots than the number of threads servicing it (unlikely)

Only reason I allowed for these is so that the APIs to the queue work the way they sound like they should work and for unit test completeness. If I discard the above the code gets much simpler and it will just be broken for two use cases that seem pretty unlikely to ever happen right now (but you never know - and then maybe somebody gets frustrated one day).

in src/validation_layer.h:102 in c10466c7a2 outdated

57+    typedef WorkQueue<WorkerMode::BLOCKING> ValidationQueue;
58+    typedef ConsumerThread<WorkerMode::BLOCKING> ValidationThread;
59+
60+public:
61+    ValidationLayer(const CChainParams& chainparams)
62+        :m_chainparams(chainparams), m_validation_queue(std::shared_ptr<ValidationQueue>(new ValidationQueue(100))) {}

ryanofsky commented at 7:09 pm on April 23, 2018:

Could use make_shared here, MakeUnique below

in src/validation_layer.h:124 in c10466c7a2 outdated

88+        return BlockValidationResponse(block_valid, is_new);
89+    };
90+
91+ private:
92+    template<typename RESPONSE>
93+    std::future<RESPONSE> SubmitForValidation(ValidationRequest<RESPONSE> * request)

ryanofsky commented at 7:13 pm on April 23, 2018:

Probably should take unique_ptr instead of raw argument to clarify ownership

ryanofsky commented at 7:25 pm on April 23, 2018: member

@TheBlueMatt, could you take a quick look at the last commit (“Call ProcessNewBlock() asynchronously in a separate thread”), and give a concept ACK/NACK if you think the approach makes sense? (All the commits before the last one are pretty straightforward util code.)

It seems like a clean change that disentangles network & validation code and could make bitcoin more responsive to other network requests when blocks are coming in.

skeees commented at 10:52 pm on April 23, 2018: contributor

Thank you for the review - one thing (general design related) to add to the discussion here:

Since I’ve submitted this request - I happened to stumble upon two race conditions in validation that stem from concurrent calls to ProcessNewBlock (#12988, #13023) This pr should simplify the concurrency model for block validation (a single validation thread pulls a block to validate from the queue and validates it completely before moving on to the next block) and would have inadvertently fixed those two referenced race conditions.

Explicitly simplifying the concurrency model hopefully reduces a bit the cognitive burden of future code changes in validation and I don’t think makes anything substantially less efficient - much of validation is already single threaded (because of cs_main), and certain pieces fundamentally cannot be concurrent (i.e. connecttip). Validation is already complicated enough to understand on its own without worrying about concurrency.

Seems like the clarity gains will outweigh the minor efficiency hit here - +the async api into should allow all the stuff around validation to be more easily be parallelized with less risk of inadvertently introducing a consensus bug. And it makes process separation / alternate p2p more natural if that’s ever to be a thing in the future.

If this design seems useful - my intention is to finish this pr up (some stuff around compact blocks that I still have to work through + refit the couple of places in rpc that call ProcessNewBlock) and explore subsequent prs to put a similar model in place around the mempool. I’d also like to explore feasibility for header processing.

TheBlueMatt commented at 1:57 am on April 28, 2018: member

I do think the general approach is fine. It’s not going to be really at all useful until we do a ton of locking cleanups in net_processing (it’ll let us run ahead to the next message, which almost always requires cs_main, and we’ll end up blocking on validation anyway). It’s probably simpler than multiple net_processing threads (with the same cleanups required) which was most of my previous work, but they’ll end up looking pretty similar on the net_processing end, we were gonna need this blocking-on-response logic either way. Should definitely get brought up at a meeting, though, to get wider feedback. @theuni probably has some thoughts, too.

skeees force-pushed on Jun 4, 2018

skeees force-pushed on Jun 5, 2018

skeees renamed this:
~~[WIP] [net] [validation] Call ProcessNewBlock() asynchronously~~
[net] [validation] Call ProcessNewBlock() asynchronously
on Jun 5, 2018

skeees commented at 11:54 am on June 5, 2018: contributor

PR updated with latest commits, ready for review Also, for reference, a discussion on high level design for this PR at the IRC meeting a couple weeks ago: https://bitcoincore.org/en/meetings/2018/05/03/

in src/core/producerconsumerqueue.h:81 in d8106146e7 outdated

76+                }
77+
78+                m_producer_cv.wait(l, [&]() { return m_data.size() < m_capacity; });
79+            }
80+
81+            m_data.push_back(std::forward<TT>(data));

ryanofsky commented at 5:11 pm on June 14, 2018:

In commit “Implement a thread-safe FIFO (producer/consumer style) queue” (d8106146e7511837decb4b45de35dec654454b69)

Should probably emplace_back to avoid creating a temporary.

skeees commented at 1:59 pm on July 2, 2018:

not sure i completely understand - data is already constructed when it is passed into the function - thought the most economical thing to do was std::forward which should use the move constructor when possible? is that different from what emplace_back would do?

jamesob commented at 3:03 pm on July 2, 2018:

Talked about this offline with @ryanofsky, @skeees; this form is equivalent (in terms of constructor calls) to what @ryanofsky initially suggested so no need to change.

DrahtBot added the label Needs rebase on Jun 15, 2018

in src/core/producerconsumerqueue.h:32 in d8106146e7 outdated

27+ * @param m_producer_mode queue behavior when calling Push() on a full queue (block till space becomes available, or immediately fail)
28+ * @param m_consumer_mode queue behavior when calling Pop() on an empty queue (block until there is data, or immediately fail)
29+ *
30+ * @see WorkerMode
31+ */
32+template <typename T, WorkerMode m_producer_mode = WorkerMode::BLOCKING, WorkerMode m_consumer_mode = WorkerMode::BLOCKING>

ryanofsky commented at 6:28 pm on June 19, 2018:

In commit “Implement a thread-safe FIFO (producer/consumer style) queue” (d8106146e7511837decb4b45de35dec654454b69)

I don’t think it’s a good idea for blocking and nonblocking defaults to be attributes of the Queue data structure, instead of arguments to (or variants of) the push and pop methods. Advantages to dropping these template arguments:

Readability. It would be nice to be able to see if a push or pop call is blocking just by looking at the call, without have to check another part of the code to see how the queue data structure was initially declared.
Code size. Dropping these arguments would avoid compiler potentially having to instantiate many copies of this code for different combinations of template arguments.
Extensibility. There could be other useful blocking methods added in the future (like methods to wait for low/high water marks or for empty/full events) and it would either be verbose to have to add new classwide blocking/nonblocking defaults for new methods, or confusing to have to somehow tie existing defaults to new methods.
Consistency. If you look at other C++ objects that support optional blocking like std::mutex or std::future, the blocking behaviour is determined only by the particular method call, not by template arguments from where the object was declared.

skeees commented at 2:06 pm on July 2, 2018:

I can see this either way - i wrote it with defaults essentially as constructor args because - at least for the use cases i can imagine you almost always want a default mode of operation for a given queue except for certain edge cases (shutdown is the most apparent one) - and i’ve seen data structures that handle this sort of initialization both ways (defaults on construction vs with every method call) but i’m also happy to change this up if the prevailing opinion is the other way.

ryanofsky commented at 1:33 pm on July 3, 2018:

Re: #12934 (review)

I can see this either way - i wrote it with defaults essentially as constructor args because - at least for the use cases i can imagine you almost always want a default mode of operation for a given queue except for certain edge cases (shutdown is the most apparent one) - and i’ve seen data structures that handle this sort of initialization both ways (defaults on construction vs with every method call) but i’m also happy to change this up if the prevailing opinion is the other way.

What about having Push, Pop, TryPush, and TryPop methods and dropping the enum entirely? I think this would make code using the queue easier to understand, since blocking would be explicit, instead of based on an enum value determined by a combination of method argument, class template argument, method argument default, and class template argument default values. It could also make code using the queue easier to write, since there would no longer be any chance of hitting various compile and runtime checks for invalid enum values.

I do understand the enum is useful in tests for listing all possible combinations of behavior, but for that purpose, you could just define the enum in test code.

promag commented at 10:27 pm on July 4, 2018:

Readability. It would be nice to be able to see if a push or pop call is blocking just by looking at the call

I think this comment is spot on.

If you look at other C++ objects that support optional blocking like std::mutex or std::future, the blocking behaviour is determined only by the particular method call

What about having Push, Pop, TryPush, and TryPop methods and dropping the enum entirely?

Agree.

ryanofsky commented at 7:10 pm on August 13, 2018:

Re: #12934 (review)

Another advantage to add to list above:

If you make blocking an attribute of push/pop methods rather than an attibutes of the queue you can drop the consumer/producer terminology, which I’m finding confusing now that I’m looking at downstream code. E.g. if I push an item into the queue, that seems like producing from my perspective, but it’s consuming from the queue/worker thread perspective and makes that code a bit strange, IMO.

in src/core/producerconsumerqueue.h:124 in d8106146e7 outdated

119+     * This must always succeed and thus may only be called in producer blocking mode
120+     * @return the element popped
121+     */
122+    T Pop()
123+    {
124+        static_assert(m_consumer_mode == WorkerMode::BLOCKING, "");

ryanofsky commented at 6:32 pm on June 19, 2018:

In commit “Implement a thread-safe FIFO (producer/consumer style) queue” (d8106146e7511837decb4b45de35dec654454b69)

Could return std::future<T> in the case of a non-blocking Pop() to support it instead of having this asymmetry.

skeees commented at 2:04 pm on July 2, 2018:

Doing that would complicate internal implementation a bit - you’d have to hang on to the promise that satisfies the future internally - and either you’d have too many of those and need to block - or potentially allow your buffer holding promises to grow unbounded - so actually it might not be safely implementable.

Also I don’t really see an immediate use case for this - you’d have to later wait on the future (blocking or non-blocking) - but you could just alternately wait on the queue. I can’t think of any use cases right now where you’d want to reserve a place in line in a non-blocking fashion and then later claim that item.

ryanofsky commented at 6:46 pm on June 19, 2018: member

(edited 2018-08-16)

utACK 3d6f03814cde98869ca5b8ad365bb3a0aae522d9. Code looks good, and from the linked IRC discussion, this seems like a promising direction to move in. All of my comments are just suggestions which you should feel free to ignore.

c6396d9ac2d01c43fca7f7730b710de4ba86229c Implement a thread-safe FIFO (producer/consumer style) queue (1/9)
3f09adf2ab63b813727b5b8e22f8a8129c43ed49 Unit tests for ProducerConsumerQueue (2/9)
7f8a8889564b6509285fbab64825fecb983fca6d Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue (3/9)
134d47a263620afae26f9966a74465bf896ff530 ConsumerThread unit tests (4/9)
b4717281d43b1e0ff9e52d979a3899220ad11eea ValidationLayer() - interface for calls into block validation (5/9)
1c9f74170f9c35d7bc5ba3e9e079b70aafae4094 Call ProcessNewBlock() asynchronously in a separate thread from p2p layer (6/9)
ce1d845a975d11b3f4e439dfd3ab2823895cf419 Replace all instances of ProcessNewBlock() with ValidationLayer.Validate() (7/9)
377873614b3fa143147ecaba5b750f8be1bdd8a4 Limit available scope of ProcessNewBlock to ValidationLayer (move-only) (8/9)
3d6f03814cde98869ca5b8ad365bb3a0aae522d9 Fix whitespace in test_bitcoin.cpp (whitespace,move-only) (9/9)

skeees force-pushed on Jun 19, 2018

DrahtBot removed the label Needs rebase on Jun 19, 2018

in src/test/producerconsumerqueue_tests.cpp:90 in 52ac3b5469 outdated

85+void QueueTest(int capacity, int n_elements, int n_producers, int n_consumers)
86+{
87+    int bucket_size = n_elements / n_producers;
88+
89+    Q push(capacity);
90+    std::vector<Q*> pull;

ryanofsky commented at 10:25 am on July 3, 2018:

In commit “Unit tests for ProducerConsumerQueue” (52ac3b5469e514f9288683bcb944e58c02606346)

Might be nice to use vector<unique_ptr<Q» to make this code exception safe and avoid need for manual deletion.

in src/test/producerconsumerqueue_tests.cpp:21 in 52ac3b5469 outdated

16+typedef ProducerConsumerQueue<int, WorkerMode::NONBLOCKING, WorkerMode::NONBLOCKING> QueueNN;
17+
18+typedef boost::mpl::list<QueueBB, QueueBN, QueueNB, QueueNN> queue_types;
19+
20+template <typename Q>
21+void Producer(Q& push, Q& recv, int id, int elements_to_push)

ryanofsky commented at 11:13 am on July 3, 2018:

In commit “Unit tests for ProducerConsumerQueue” (52ac3b5469e514f9288683bcb944e58c02606346)

Maybe give this function a more descriptive name like PushThenPop.

Maybe add a summary comment like “Push `elements_pushed` consecutive ints to the `push` queue, starting from (id * elements_to_values). Then pop `elements_pushed` values from the `recv` queue and verify all ints in the range (-elements_pushed, 0] are received, in any order.”

in src/test/producerconsumerqueue_tests.cpp:54 in 52ac3b5469 outdated

49+        assert(received.count(-i));
50+    }
51+}
52+
53+template <typename Q>
54+void Consumer(Q& work, std::vector<Q*> push, int id, int n_producers, int bucket_size, int elements_to_receive)

ryanofsky commented at 11:19 am on July 3, 2018:

In commit “Unit tests for ProducerConsumerQueue” (52ac3b5469e514f9288683bcb944e58c02606346)

Maybe give this function a more descriptive name like PopAndPush.

Maybe add a summary comment like “Pop `elements_pushed` values from the `work` queue. For each value `w` received, push value -(w % bucket_size) to the push[w / bucket_size] output queue.”

in src/test/producerconsumerqueue_tests.cpp:63 in 52ac3b5469 outdated

58+
59+    while (elements_received != elements_to_receive) {
60+        int w;
61+        if (work.Pop(w)) {
62+            int producer_id = w / bucket_size;
63+            int i = w % bucket_size;

ryanofsky commented at 11:28 am on July 3, 2018:

In commit “Unit tests for ProducerConsumerQueue” (52ac3b5469e514f9288683bcb944e58c02606346)

It could be nice to avoid this encoding (and the negative int encoding) by declaring a simple struct like:

0struct TestValue { int producer_id; int value; };

and using ProducerConsumerQueue<TestValue> in the test.

jamesob commented at 1:50 pm on July 4, 2018:

Completely agree. This is pretty hard to follow for me as-is.

in src/test/producerconsumerqueue_tests.cpp:66 in 52ac3b5469 outdated

61+        if (work.Pop(w)) {
62+            int producer_id = w / bucket_size;
63+            int i = w % bucket_size;
64+
65+            assert(producer_id < n_producers);
66+            assert(i > latest[producer_id]);

ryanofsky commented at 11:40 am on July 3, 2018:

In commit “Unit tests for ProducerConsumerQueue” (52ac3b5469e514f9288683bcb944e58c02606346)

Seems like this could more specifically assert (latest == -1) or (i == latest + 1). But perhaps the current way is more compact or less fragile.

in src/test/producerconsumerqueue_tests.cpp:120 in 52ac3b5469 outdated

115+{
116+    Q q;
117+    BOOST_CHECK(q.GetCapacity() == 0);
118+}
119+
120+BOOST_AUTO_TEST_CASE(basic_operation)

ryanofsky commented at 12:05 pm on July 3, 2018:

In commit “Unit tests for ProducerConsumerQueue” (52ac3b5469e514f9288683bcb944e58c02606346)

Maybe add a simple test description like “Test simple queue pushes and pops which don’t block.”

in src/test/producerconsumerqueue_tests.cpp:85 in 52ac3b5469 outdated

80+        }
81+    }
82+}
83+
84+template <typename Q>
85+void QueueTest(int capacity, int n_elements, int n_producers, int n_consumers)

ryanofsky commented at 1:44 pm on July 3, 2018:

In commit “Unit tests for ProducerConsumerQueue” (52ac3b5469e514f9288683bcb944e58c02606346)

Maybe add a simple test description like “Generate messages in `n_producers` producer threads and push them on to a single `push` queue. Pop messages from the queue in `n_consumers` consumer threads, and forward them back to the original producer threads through `n_producers` different `pull` queues. Verify all the queues are empty after the threads exit.”

in src/test/producerconsumerqueue_tests.cpp:157 in 52ac3b5469 outdated

152+    int ret;
153+    BOOST_CHECK(!qBN.Pop(ret));
154+    BOOST_CHECK(!qNN.Pop(ret));
155+}
156+
157+BOOST_AUTO_TEST_CASE_TEMPLATE(multithreaded_operation, Q, queue_types)

ryanofsky commented at 1:47 pm on July 3, 2018:

In commit “Unit tests for ProducerConsumerQueue” (52ac3b5469e514f9288683bcb944e58c02606346)

Maybe add a simple test description like “Run QueueTest with different numbers of messages and threads and different queue capacities.”

in src/test/producerconsumerqueue_tests.cpp:39 in 52ac3b5469 outdated

34+
35+    std::set<int> received;
36+    while (received.size() < (unsigned)elements_to_push) {
37+        int e;
38+        if (recv.Pop(e)) {
39+            assert(!received.count(e));

jamesob commented at 11:14 am on July 4, 2018:

Could also assert that items were popped off in the order they were pushed.

skeees commented at 1:56 pm on July 4, 2018: contributor

Thakns @ryanofsky - agree with basically everything you’ve said re commit 52ac3b5 - going to wait for some more comments to accumulate and address all at once so i don’t mess up any other reviews that might be in progress

in src/core/consumerthread.h:54 in 6c624232a7 outdated

49+{
50+    friend ConsumerThread<MODE>;
51+
52+private:
53+    ShutdownPill(ConsumerThread<MODE>& consumer) : m_consumer(consumer){};
54+    void operator()()

jamesob commented at 1:56 pm on July 4, 2018:

Any reason this is protected in superclasses but private here?

ryanofsky commented at 6:28 pm on August 13, 2018:

In commit “Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue” (7f8a8889564b6509285fbab64825fecb983fca6d)

Should this be marked “override?”

in src/core/producerconsumerqueue.h:71 in 9668666b27 outdated

66+        // TT needed for perfect forwarding to vector::push_back
67+
68+        // attempting a push to a queue of capacity 0 is likely unintended
69+        assert(m_capacity > 0);
70+
71+        {

promag commented at 10:11 pm on July 4, 2018:

Why this scope? I think notify_one() below can be called while m_queue_lock is held. — same in Pop().

skeees commented at 7:30 am on July 5, 2018:

You can - but its less efficient to notify on a cv while holding a lock - the thread that you notify will wake up and immediately block again on the lock until you release it. It is safe to call notify without the lock held.

from: https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one

The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock. However, some implementations (in particular many implementations of pthreads) recognize this situation and avoid this “hurry up and wait” scenario by transferring the waiting thread from the condition variable’s queue directly to the queue of the mutex within the notify call, without waking it up.

in src/core/producerconsumerqueue.h:78 in 9668666b27 outdated

73+            if (m_data.size() >= m_capacity) {
74+                if (mode == WorkerMode::NONBLOCKING) {
75+                    return false;
76+                }
77+
78+                m_producer_cv.wait(l, [&]() { return m_data.size() < m_capacity; });

promag commented at 10:20 pm on July 4, 2018:

nit, could drop (). — same in Pop().

DrahtBot added the label Needs rebase on Jul 7, 2018

in src/core/consumerthread.h:48 in 6c624232a7 outdated

43+    std::function<void()> m_f;
44+};
45+
46+//! A special WorkItem() that is used to interrupt a blocked ConsumerThread() so that it can terminate
47+template <WorkerMode MODE>
48+class ShutdownPill : public WorkItem<MODE>

jamesob commented at 5:15 pm on July 13, 2018:

This thing is pretty complex, and right now it looks like we don’t do granular per-thread per-queue shutdowns. The only instance of this being used is shutting down the single thread associated with ValidationQueue. When would a situation arise where we need to shut off some but not all of the threads consuming from a queue? I can’t think of a case where that’d be necessary.

Could we just have the queue itself raise a shutdown signal for the threads? Have queue->Pop() return (or set) a sentinel shutdown value of some kind instead? Might save a good bit of code, and would remove the need for some of the locking in ConsumerThread.

ryanofsky commented at 9:27 pm on August 13, 2018:

In commit “Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue” (7f8a8889564b6509285fbab64825fecb983fca6d)

I think I agree with James. Even if you want to support shutting down specific threads, an approach more like “Everybody wake up, check if you are supposed to exit, and if not go back to sleep again” might be simpler than “Random thread wake up, check if you are supposed to exit, and if not wake up another random thread, but not if you already seen this particular notification before,” or whatever the correct description is. (If you do want to stick with the current approach, it could be nice to add a high level comment explaining it like this.)

in src/validation_layer.h:122 in 5d40aa8e96 outdated

117+    //! Submit a block for validation and block on the response
118+    BlockValidationResponse Validate(const std::shared_ptr<const CBlock> block, bool force_processing);
119+
120+private:
121+    //! Internal utility method - sets up and calls ProcessNewBlock
122+    BlockValidationResponse ValidateInternal(const std::shared_ptr<const CBlock> block, bool force_processing) const;

jamesob commented at 6:03 pm on July 13, 2018:

Any reason this doesn’t just live on BlockValidationRequest? Could avoid some friendliness that way.

in src/validation_layer.h:49 in 5d40aa8e96 outdated

44+struct BlockValidationResponse {
45+    //! Is this the first time this block has been validated
46+    const bool is_new;
47+
48+    //! Did initial validation pass (a block can still pass initial validation but then later fail to connect to an active chain)
49+    const bool block_valid;

jamesob commented at 6:14 pm on July 13, 2018:

Based on the call site, should this be block_accepted? “valid” is a little ambiguous if you take into account all the different states in chain.h:BlockStatus.

in src/net.cpp:2052 in 8f0d086d53 outdated

2039-            bool fMoreNodeWork = m_msgproc->ProcessMessages(pnode, flagInterruptMsgProc);
2040-            fMoreWork |= (fMoreNodeWork && !pnode->fPauseSend);
2041+            bool request_was_queued = pnode->IsAwaitingInternalRequest();
2042+
2043+            // If an internal request was queued and it's not done yet, skip this node
2044+            if (request_was_queued && !pnode->ProcessInternalRequestResults(m_msgproc))

jamesob commented at 6:19 pm on July 13, 2018:

braces

in src/net.h:38 in 8f0d086d53 outdated

36 #ifndef WIN32
37 #include <arpa/inet.h>
38 #endif
39 
40-
41+struct BlockValidationResponse;

jamesob commented at 6:24 pm on July 13, 2018:

Why not just include validation_layer? Circular dep somehow?

in src/net_processing.cpp:2706 in 8f0d086d53 outdated

2692-            pfrom->nLastBlockTime = GetTime();
2693-        } else {
2694-            LOCK(cs_main);
2695-            mapBlockSource.erase(pblock->GetHash());
2696-        }
2697+        SubmitBlock(connman, validation_layer, pfrom, pblock, forceProcessing);

jamesob commented at 6:27 pm on July 13, 2018:

Nice cleanup!

in src/net_processing.cpp:3082 in 8f0d086d53 outdated

3059+
3060+    // If we've reconstructed this block via compactblocks then
3061+    // Clear download state for this block, which is in
3062+    // process from some other peer.  We do this after calling
3063+    // ProcessNewBlock so that a malleated cmpctblock announcement
3064+    // can't be used to interfere with block relay.

jamesob commented at 6:30 pm on July 13, 2018:

I’m out of my depth here - should be looked at by someone with lots of net_processing experience.

jamesob commented at 6:37 pm on July 13, 2018: member

I think this is a great change, but I wonder if we can cut down on some of the complexity. Once this is rebased and maybe some of the feedback addressed, I’ll do some manual testing.

sipa referenced this in commit 1e90862f5d on Jul 14, 2018

Implement a thread-safe FIFO (producer/consumer style) queue c6396d9ac2

Unit tests for ProducerConsumerQueue 3f09adf2ab

Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue 7f8a888956

ConsumerThread unit tests 134d47a263

ValidationLayer() - interface for calls into block validation b4717281d4

skeees force-pushed on Jul 25, 2018

Call ProcessNewBlock() asynchronously in a separate thread from p2p layer 1c9f74170f

Replace all instances of ProcessNewBlock() with ValidationLayer.Validate() ce1d845a97

Limit available scope of ProcessNewBlock to ValidationLayer (move-only) 377873614b

Fix whitespace in test_bitcoin.cpp (whitespace,move-only) 3d6f03814c

skeees force-pushed on Jul 25, 2018

DrahtBot removed the label Needs rebase on Jul 25, 2018

DrahtBot commented at 3:07 pm on August 8, 2018: member

DrahtBot added the label Needs rebase on Aug 8, 2018

in src/Makefile.am:148 in c6396d9ac2 outdated

144@@ -145,6 +145,7 @@ BITCOIN_CORE_H = \
145   policy/policy.h \
146   policy/rbf.h \
147   pow.h \
148+  core/producerconsumerqueue.h \

ryanofsky commented at 5:43 pm on August 13, 2018:

In commit “Implement a thread-safe FIFO (producer/consumer style) queue” (c6396d9ac2d01c43fca7f7730b710de4ba86229c)

Just noticed this PR is creating a new src/core/ subdirectory to hold the queue and thread code. This seems good, but it may also be good to add a short core/README.md to say what the directory is supposed to be for. For example, if it’s meant to hold utility code that isn’t bitcoin specific, or if it might make sense in the future to move other code there.

in src/core/consumerthread.h:26 in 7f8a888956 outdated

21+{
22+    friend ConsumerThread<MODE>; //<! only a consumer thread can execute a WorkItem
23+
24+protected:
25+    WorkItem(){};
26+    virtual void operator()(){};

ryanofsky commented at 6:31 pm on August 13, 2018:

In commit “Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue” (7f8a8889564b6509285fbab64825fecb983fca6d)

It might be a good idea to make this abstract (= 0) to trigger a compile error in case a subclass declares this the wrong way and fails to override.

in src/core/consumerthread.h:84 in 7f8a888956 outdated

79+    ConsumerThread<MODE>& m_consumer;
80+    std::set<std::thread::id> m_threads_observed;
81+};
82+
83+template <WorkerMode PRODUCER_MODE>
84+class WorkQueue : public BlockingConsumerQueue<std::unique_ptr<WorkItem<PRODUCER_MODE>>, PRODUCER_MODE>

ryanofsky commented at 7:35 pm on August 13, 2018:

In commit “Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue” (7f8a8889564b6509285fbab64825fecb983fca6d)

Calling this variable PRODUCER_MODE here, but passing it as the consumer_mode argument to BlockingConsumerQueue/ProducerConsumerQueue is a little unexpected. Could you maybe note this in a short comment to avoid confusion? Alternately it might be clearer just to use ProducerConsumerQueue directly, and not have BlockingConsumerQueue as a thing.

TBH, also, I don’t actually understand why PRODUCER_MODE exists as a parameter. Why would code constructing WorkQueue want to control the default consumer mode of the queue when it doesn’t consume from the queue (ConsumerThread does)?

in src/core/consumerthread.h:121 in 7f8a888956 outdated

116+    ConsumerThread() : m_active(false){};
117+
118+    //! Constructs a ConsumerThread: RAII
119+    //! @param queue the queue from which this thread will pull work
120+    ConsumerThread(std::shared_ptr<WorkQueue<PRODUCER_POLICY>> queue, const std::string id = "worker")
121+        : m_id(id), m_queue(queue), m_active(true)

ryanofsky commented at 7:39 pm on August 13, 2018:

In commit “Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue” (7f8a8889564b6509285fbab64825fecb983fca6d)

Might be better to std::move(id) to avoid a copy if caller passes a temporary.

in src/core/consumerthread.h:175 in 7f8a888956 outdated

170+
171+    //! the thread that this class wraps
172+    std::thread m_thread;
173+
174+    //! whether this thread should continue running: behaves like a latch
175+    //! initialized to true in the constructor

ryanofsky commented at 7:42 pm on August 13, 2018:

In commit “Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue” (7f8a8889564b6509285fbab64825fecb983fca6d)

Can the comment explain why this is volatile rather than std::atomic or similar? I thought volatile wasn’t really meaningful for thread synchronization.

Also maybe just write m_active = true here so no need to initialize it elsewhere and then describe the initialization in a comment.

in src/test/consumerthread_tests.cpp:35 in 134d47a263 outdated

30+};
31+
32+void ConsumerThreadTest(int n_elements, int n_threads)
33+{
34+    std::vector<int> work(n_elements);
35+    auto queue = std::shared_ptr<WorkQueue<WorkerMode::BLOCKING>>(new WorkQueue<WorkerMode::BLOCKING>(n_elements + 1));

ryanofsky commented at 8:04 pm on August 13, 2018:

In commit “ConsumerThread unit tests” (134d47a263620afae26f9966a74465bf896ff530)

Would it make sense to add tests for WorkerMode::NONBLOCKING? Maybe say one way or the other in a comment here even if it is not worth writing more tests now.

in src/core/consumerthread.h:179 in 7f8a888956 outdated

174+    //! whether this thread should continue running: behaves like a latch
175+    //! initialized to true in the constructor
176+    //! can be set to false by calling Terminate()
177+    volatile bool m_active;
178+
179+    //! protects Terminate()

ryanofsky commented at 8:27 pm on August 13, 2018:

In commit “Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue” (7f8a8889564b6509285fbab64825fecb983fca6d)

Would dropping the mutex and just making m_active a std::atomic have the same effect? It seems like this would be equivalent and simpler, but maybe you could explain if the mutex is covering something else besides m_active in the comment.

in src/test/consumerthread_tests.cpp:39 in 134d47a263 outdated

34+    std::vector<int> work(n_elements);
35+    auto queue = std::shared_ptr<WorkQueue<WorkerMode::BLOCKING>>(new WorkQueue<WorkerMode::BLOCKING>(n_elements + 1));
36+
37+    std::vector<std::unique_ptr<ConsumerThread<WorkerMode::BLOCKING>>> threads;
38+    for (int i = 0; i < n_threads; i++) {
39+        threads.emplace_back(std::unique_ptr<ConsumerThread<WorkerMode::BLOCKING>>(new ConsumerThread<WorkerMode::BLOCKING>(queue, std::to_string(i))));

ryanofsky commented at 8:30 pm on August 13, 2018:

In commit “ConsumerThread unit tests” (134d47a263620afae26f9966a74465bf896ff530)

Maybe shorten this with MakeUnique (also in Push below).

in src/test/consumerthread_tests.cpp:47 in 134d47a263 outdated

42+    for (int i = 0; i < n_elements; i++) {
43+        work[i] = i;
44+        queue->Push(std::unique_ptr<TestWorkItem>(new TestWorkItem(work[i])));
45+    }
46+
47+    while (queue->size() > 0) {

ryanofsky commented at 8:40 pm on August 13, 2018:

In commit “ConsumerThread unit tests” (134d47a263620afae26f9966a74465bf896ff530)

Can you add a comment to say what the yield loop is for? It seems like just calling Sync without yielding beforehand, then then checking size == 0 would be simpler, and also a better check that the Sync method works.

in src/test/consumerthread_tests.cpp:80 in 134d47a263 outdated

75+    for (int i = 0; i < n_elements; i++) {
76+        BOOST_CHECK_EQUAL(work[i], i + 4);
77+    }
78+}
79+
80+BOOST_AUTO_TEST_CASE(foo)

ryanofsky commented at 8:53 pm on August 13, 2018:

In commit “ConsumerThread unit tests” (134d47a263620afae26f9966a74465bf896ff530)

Should replace foo.

in src/test/consumerthread_tests.cpp:44 in 134d47a263 outdated

39+        threads.emplace_back(std::unique_ptr<ConsumerThread<WorkerMode::BLOCKING>>(new ConsumerThread<WorkerMode::BLOCKING>(queue, std::to_string(i))));
40+    }
41+
42+    for (int i = 0; i < n_elements; i++) {
43+        work[i] = i;
44+        queue->Push(std::unique_ptr<TestWorkItem>(new TestWorkItem(work[i])));

ryanofsky commented at 8:57 pm on August 13, 2018:

In commit “ConsumerThread unit tests” (134d47a263620afae26f9966a74465bf896ff530)

I think there is a race here between work[i] being assigned and then incremented in the worker thread. You could avoid it by setting work values after creating the vector but before creating the queue.

in src/test/consumerthread_tests.cpp:58 in 134d47a263 outdated

53+        threads[i]->Terminate();
54+    }
55+
56+    BOOST_CHECK_LT(queue->size(), n_threads + 1);
57+    for (int i = 0; i < n_elements; i++) {
58+        BOOST_CHECK_EQUAL(work[i], i + 2);

ryanofsky commented at 9:01 pm on August 13, 2018:

In commit “ConsumerThread unit tests” (134d47a263620afae26f9966a74465bf896ff530)

Might be good to move this check above Terminate() to make the test more strict.

in src/core/consumerthread.h:75 in 7f8a888956 outdated

70+                m_consumer.m_queue->Push(MakeUnique<ShutdownPill<MODE>>(std::move(*this)), WorkerMode::NONBLOCKING);
71+            }
72+
73+            // if the same pill has been seen by the same thread previously then it can safely be discarded
74+            // the intended thread has either terminated or is currently processing a work item and will terminate
75+            // after completing that item and before blocking on the queue

ryanofsky commented at 9:32 pm on August 13, 2018:

In commit “Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue” (7f8a888)

Why couldn’t the intended thread just be blocked calling Pop() and not terminated or currently processing anything? It seems like this is assuming threads are notified in a circular order.

in src/validation_layer.h:115 in b4717281d4 outdated

110+
111+    //! Stops the validation layer (stopping the validation thread)
112+    void Stop();
113+
114+    //! Submit a block for asynchronous validation
115+    std::future<BlockValidationResponse> SubmitForValidation(const std::shared_ptr<const CBlock> block, bool force_processing, std::function<void()> on_ready = []() {});

ryanofsky commented at 2:54 pm on August 14, 2018:

In commit “ValidationLayer() - interface for calls into block validation” (b4717281d43b1e0ff9e52d979a3899220ad11eea)

Would be slightly more efficient to make default on_ready value nullptr instead of a no-op lambda.

It’s also kind of unclear what m_ready is supposed to be used for in this context. You might want to move down your other comment from m_on_ready about c++11 futures here.

in src/validation_layer.cpp:10 in b4717281d4 outdated

 5+#include <validation_layer.h>
 6+#include <validation.h>
 7+
 8+void BlockValidationRequest::operator()()
 9+{
10+    LogPrint(BCLog::VALIDATION, "%s: validating request=%s\n", __func__, GetId());

ryanofsky commented at 8:20 pm on August 14, 2018:

In commit “ValidationLayer() - interface for calls into block validation” (b4717281d43b1e0ff9e52d979a3899220ad11eea)

Maybe add the word “block” somewhere in here to make it clear what this is validating and obvious this is a block hash.

in src/validation_layer.cpp:28 in b4717281d4 outdated

23+    return strprintf("BlockValidationRequest[%s]", m_block->GetHash().ToString());
24+}
25+
26+void ValidationLayer::Start()
27+{
28+    assert(!m_thread || !m_thread->IsActive());

ryanofsky commented at 8:23 pm on August 14, 2018:

In commit “ValidationLayer() - interface for calls into block validation” (b4717281d43b1e0ff9e52d979a3899220ad11eea):

I think it would be better to just assert !m_thread to simplify and be more conservative. If m_thread is allowed to be non-null, then I think you would need to add more synchronization here to make sure join is called before the thread is destroyed to prevent a crash in the destructor: https://en.cppreference.com/w/cpp/thread/thread/%7Ethread

in src/core/consumerthread.h:157 in 7f8a888956 outdated

152+            m_queue->Push(std::unique_ptr<ShutdownPill<PRODUCER_POLICY>>(new ShutdownPill<PRODUCER_POLICY>(*this)), WorkerMode::NONBLOCKING);
153+        }
154+    }
155+
156+    //! Waits until this thread terminates
157+    //! RequestTerminate() must have been previously called or be called by a different thread

ryanofsky commented at 8:42 pm on August 14, 2018:

In commit “Add ConsumerThread: to consumer and operate on work from a ProducerConsumerQueue” (7f8a8889564b6509285fbab64825fecb983fca6d)

I think it would be clearer to just say this will block until RequestTerminate() is called. It should be perfectly fine to call RequestTerminate before or after this call and from any thread.

in src/validation_layer.cpp:40 in b4717281d4 outdated

35+    m_thread->Terminate();
36+}
37+
38+std::future<BlockValidationResponse> ValidationLayer::SubmitForValidation(const std::shared_ptr<const CBlock> block, bool force_processing, std::function<void()> on_ready)
39+{
40+    BlockValidationRequest* req = new BlockValidationRequest(*this, block, force_processing, on_ready);

ryanofsky commented at 8:45 pm on August 14, 2018:

In commit “ValidationLayer() - interface for calls into block validation” (b4717281d43b1e0ff9e52d979a3899220ad11eea)

Would be safer / more efficient to use std::make_shared here.

in src/validation_layer.h:141 in b4717281d4 outdated

136+
137+    //! a queue that holds validation requests that are sequentially processed by m_thread
138+    const std::shared_ptr<ValidationQueue> m_validation_queue;
139+
140+    //! the validation thread - sequentially processes validation requests from m_validation_queue
141+    std::unique_ptr<ValidationThread> m_thread;

ryanofsky commented at 8:54 pm on August 14, 2018:

In commit “ValidationLayer() - interface for calls into block validation” (b4717281d43b1e0ff9e52d979a3899220ad11eea)

Can this just be a ValidationThread instead of a pointer to one? The extra indirection doesn’t seem helpful.

in src/validation_layer.h:106 in b4717281d4 outdated

101+    typedef ConsumerThread<WorkerMode::BLOCKING> ValidationThread;
102+
103+public:
104+    ValidationLayer(const CChainParams& chainparams)
105+        : m_chainparams(chainparams), m_validation_queue(std::make_shared<ValidationQueue>(100)) {}
106+    ~ValidationLayer(){};

ryanofsky commented at 8:57 pm on August 14, 2018:

In commit “ValidationLayer() - interface for calls into block validation” (b4717281d43b1e0ff9e52d979a3899220ad11eea)

Maybe assert thread is not joinable here, to help debugging in case this is not shut down correctly.

in src/init.cpp:214 in 1c9f74170f outdated

210@@ -209,6 +211,7 @@ void Shutdown()
211     // using the other before destroying them.
212     if (peerLogic) UnregisterValidationInterface(peerLogic.get());
213     if (g_connman) g_connman->Stop();
214+    if (g_validation_layer) g_validation_layer->Stop();

ryanofsky commented at 4:27 pm on August 15, 2018:

In commit “Call ProcessNewBlock() asynchronously in a separate thread from p2p layer” (1c9f74170f9c35d7bc5ba3e9e079b70aafae4094)

Should this also free g_validation_layer? (or have a comment saying why it shouldn’t be freed)

in src/net.cpp:2052 in 1c9f74170f outdated

2050-            bool fMoreNodeWork = m_msgproc->ProcessMessages(pnode, flagInterruptMsgProc);
2051-            fMoreWork |= (fMoreNodeWork && !pnode->fPauseSend);
2052+            bool request_was_queued = pnode->IsAwaitingInternalRequest();
2053+
2054+            // If an internal request was queued and it's not done yet, skip this node
2055+            if (request_was_queued && !pnode->ProcessInternalRequestResults(m_msgproc))

ryanofsky commented at 5:11 pm on August 15, 2018:

In commit “Call ProcessNewBlock() asynchronously in a separate thread from p2p layer” (1c9f74170f9c35d7bc5ba3e9e079b70aafae4094)

I think this code might be clearer if the IsAwaitingInternalRequest were call were dropped and ProcessInternalRequestResults just returned requested_was_queued directly. It seems awkward how IsAwaitingInternalRequest and ProcessInternalRequestResults are checking some of the same things and then this code is combining their return values.

in src/test/test_bitcoin.h:21 in 1c9f74170f outdated

17@@ -18,6 +18,8 @@
18 
19 #include <boost/thread.hpp>
20 
21+class ValidationLayer;

ryanofsky commented at 5:21 pm on August 15, 2018:

In commit “Call ProcessNewBlock() asynchronously in a separate thread from p2p layer” (1c9f74170f9c35d7bc5ba3e9e079b70aafae4094)

Probably remove, doesn’t look like this is used right now.

in src/net_processing.cpp:1582 in 1c9f74170f outdated

1578@@ -1572,7 +1579,7 @@ bool static ProcessHeadersMessage(CNode *pfrom, CConnman *connman, const std::ve
1579     return true;
1580 }
1581 
1582-bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStream& vRecv, int64_t nTimeReceived, const CChainParams& chainparams, CConnman* connman, const std::atomic<bool>& interruptMsgProc, bool enable_bip61)
1583+    bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStream& vRecv, int64_t nTimeReceived, const CChainParams& chainparams, CConnman* connman, ValidationLayer& validation_layer, const std::atomic<bool>& interruptMsgProc, bool enable_bip61)

ryanofsky commented at 5:32 pm on August 15, 2018:

In commit “Call ProcessNewBlock() asynchronously in a separate thread from p2p layer” (1c9f74170f9c35d7bc5ba3e9e079b70aafae4094)

Unintended indent?

in src/net_processing.cpp:3083 in 1c9f74170f outdated

3078+    // If we've reconstructed this block via compactblocks then
3079+    // Clear download state for this block, which is in
3080+    // process from some other peer.  We do this after calling
3081+    // ProcessNewBlock so that a malleated cmpctblock announcement
3082+    // can't be used to interfere with block relay.
3083+    if (!pindex || pindex->IsValid(BLOCK_VALID_TRANSACTIONS)) {

ryanofsky commented at 5:46 pm on August 15, 2018:

In commit “Call ProcessNewBlock() asynchronously in a separate thread from p2p layer” (1c9f74170f9c35d7bc5ba3e9e079b70aafae4094)

Maybe drop the !pindex check here, or say in a comment whether this would ever be expected? It seems surprising to treat null pindex like the block is valid.

in src/net_processing.cpp:3087 in 1c9f74170f outdated

3082+    // can't be used to interfere with block relay.
3083+    if (!pindex || pindex->IsValid(BLOCK_VALID_TRANSACTIONS)) {
3084+        MarkBlockAsReceived(pblock->GetHash());
3085+    }
3086+
3087+    if (validation_response.is_new) {

ryanofsky commented at 5:49 pm on August 15, 2018:

In commit “Call ProcessNewBlock() asynchronously in a separate thread from p2p layer” (1c9f74170f9c35d7bc5ba3e9e079b70aafae4094)

This seems ok, but I wanted to note that previous code in the fBlockReconstructed case updated nLastBlockTime/mapBlockSource before calling MarkBlockAsReceived, instead of after.

in src/net_processing.cpp:2649 in 1c9f74170f outdated

2647@@ -2656,20 +2648,13 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2648             }
2649         } // Don't hold cs_main when we call into ProcessNewBlock

ryanofsky commented at 5:55 pm on August 15, 2018:

In commit “Call ProcessNewBlock() asynchronously in a separate thread from p2p layer” (1c9f74170f9c35d7bc5ba3e9e079b70aafae4094)

Should s/ProcessNewBlock/SubmitBlock/

in src/net_processing.cpp:2648 in 1c9f74170f outdated

2637@@ -2645,7 +2638,6 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2638                 // though the block was successfully read, and rely on the
2639                 // handling in ProcessNewBlock to ensure the block index is
2640                 // updated, reject messages go out, etc.
2641-                MarkBlockAsReceived(resp.blockhash); // it is now an empty pointer

ryanofsky commented at 8:10 pm on August 15, 2018:

In commit “Call ProcessNewBlock() asynchronously in a separate thread from p2p layer” (1c9f741)

I guess with this line removed, the block will be marked received later, from the worker thread, after it is processed. This seems ok, though I could see why you might want to mark the block received when its received but before it’s processed, so it’s clearer what “received” and “processed” actually refer to.

in src/net.cpp:2860 in 1c9f74170f outdated

2855+    return all_cleared;
2856+}
2857+
2858+void CNode::SetPendingInternalRequest(const std::shared_ptr<const CBlock> block, std::future<BlockValidationResponse>&& pending_response, const CBlockIndex* pindex)
2859+{
2860+    m_block_validating = block;

ryanofsky commented at 7:32 pm on August 16, 2018:

In commit “Call ProcessNewBlock() asynchronously in a separate thread from p2p layer” (1c9f74170f9c35d7bc5ba3e9e079b70aafae4094)

Would it be possible to assert m_block_validating variables are null/invalid here before overwriting them? It seems like it would help debugging if SetPendingInternalRequest were called for a new request before the previous request completed.

in src/validation_layer.cpp:27 in 377873614b outdated

23+ * @param[in]   pblock  The block we want to process.
24+ * @param[in]   fForceProcessing Process this block even if unrequested; used for non-network block sources and whitelisted peers.
25+ * @param[out]  fNewBlock A boolean which is set to indicate if the block was first received via this call
26+ * @return True if state.IsValid()
27+ */
28+bool ProcessNewBlock(const CChainParams& chainparams, const std::shared_ptr<const CBlock> pblock, bool fForceProcessing, bool* fNewBlock);

ryanofsky commented at 7:40 pm on August 16, 2018:

In commit “Limit available scope of ProcessNewBlock to ValidationLayer (move-only)” (377873614b3fa143147ecaba5b750f8be1bdd8a4)

It’s unusual that ProcessNewBlock is documented and declared in validation_layer.cpp but defined in src/validation.cpp. It seems like it will make the documentation hard to find. There are a bunch of other options that seem like they would be better to me:

Moving ProcessNewBlock declaration to validate_layer.cpp but moving the documentation to validate.cpp near the definition.
Leaving ProcessNewBlock where it is but renaming it to ProcessNewBlock internal.
Making ProcessNewBlock a private method of a class that is friends with ValidationLayer

ryanofsky commented at 7:41 pm on August 16, 2018:

In commit “Limit available scope of ProcessNewBlock to ValidationLayer (move-only)” (377873614b3fa143147ecaba5b750f8be1bdd8a4)

LOCKS_EXCLUDED(cs_main) annotation seems to have been dropped here.

in src/test/test_bitcoin.cpp:82 in 3d6f03814c

100-        {
101-            CValidationState state;
102-            if (!ActivateBestChain(state, chainparams)) {
103-                throw std::runtime_error(strprintf("ActivateBestChain failed. (%s)", FormatStateMessage(state)));
104-            }
105+    // Ideally we'd move all the RPC tests to the functional testing framework

ryanofsky commented at 7:56 pm on August 16, 2018:

In commit “Fix whitespace in test_bitcoin.cpp (whitespace,move-only)” (3d6f03814cde98869ca5b8ad365bb3a0aae522d9)

Maybe drop “move-only” from commit description, since this is actually just a whitespace change.

in src/core/consumerthread.h:66 in 3d6f03814c

61+            if (!m_threads_observed.count(id)) {
62+                m_threads_observed.insert(std::this_thread::get_id());
63+
64+                // resubmit it so that it gets a chance to get to the right thread
65+                // when resubmitting, do not block and do not care about failures
66+                // theres a potential deadlock where we try to push this to a queue thats

practicalswift commented at 8:09 am on September 2, 2018:

Typo found by codespell: “theres” should be “there is” :-)

practicalswift commented at 8:09 am on September 2, 2018:

Typo found by codespell: “thats” should be “that is” :-)

in src/core/producerconsumerqueue.h:128 in 3d6f03814c

123+    {
124+        static_assert(m_consumer_mode == WorkerMode::BLOCKING, "");
125+
126+        T ret;
127+
128+        // use a temporary so theres no side effecting code inside an assert which could be disabled

practicalswift commented at 8:09 am on September 2, 2018:

Typo found by codespell: “theres” should be “there is” :-)

DrahtBot commented at 4:19 pm on December 3, 2018: member

DrahtBot added the label Up for grabs on Dec 3, 2018

DrahtBot closed this on Dec 3, 2018

laanwj removed the label Needs rebase on Oct 24, 2019

PastaPastaPasta referenced this in commit 5c6fb6e75c on Apr 15, 2020

PastaPastaPasta referenced this in commit a3f0f3cf63 on Apr 16, 2020

PastaPastaPasta referenced this in commit 504b696b78 on Apr 16, 2020

PastaPastaPasta referenced this in commit 7e588c7594 on Apr 19, 2020

PastaPastaPasta referenced this in commit 6cce7b22db on May 10, 2020

PastaPastaPasta referenced this in commit abc246a698 on May 12, 2020

fanquake removed the label Up for grabs on May 13, 2020

PastaPastaPasta referenced this in commit 5b72c199ff on Jun 9, 2020

PastaPastaPasta referenced this in commit 9c6d174c4d on Jun 9, 2020

PastaPastaPasta referenced this in commit cfac0a9f3f on Jun 10, 2020

PastaPastaPasta referenced this in commit c26f14a99c on Jun 11, 2020

DrahtBot locked this on Feb 15, 2022

[net] [validation] Call ProcessNewBlock() asynchronously #12934

Update

Description