refactor: replace CConnman::SocketEvents() with mockable Sock::WaitMany()

vasild commented at 2:03 pm on February 16, 2022: member

This is a piece of #21878, chopped off to ease review.

Sock::Wait() waits for IO events on one socket. Introduce a similar virtual method WaitMany() that waits simultaneously for IO events on more than one socket.

Use WaitMany() instead of CConnman::SocketEvents() (and ditch the latter). Given that the former is a virtual method, it can be mocked by unit and fuzz tests. This will help to make bigger parts of CConnman testable (unit and fuzz).

DrahtBot added the label P2P on Feb 16, 2022

DrahtBot added the label Refactoring on Feb 16, 2022

DrahtBot added the label Utils/log/libs on Feb 16, 2022

DrahtBot commented at 8:18 am on February 17, 2022: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#21878 (Make all networking code mockable by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in src/util/sock.h:154 in af45eeea03 outdated

143      * @param[out] occurred If not nullptr and `true` is returned, then upon return this
144-     * indicates which of the requested events occurred. A timeout is indicated by return
145-     * value of `true` and `occurred` being set to 0.
146+     * indicates which of the requested events occurred (`ERR` will be added, even if
147+     * not requested, if an exceptional event occurs on the socket).
148+     * A timeout is indicated by return value of `true` and `occurred` being set to 0.

jonatack commented at 6:20 pm on February 20, 2022:

c1349291159b97de03160dfe4cbd9be7053b9dba maybe, if I understand correctly:

 0      * Wait for readiness for input (recv) or output (send).
 1      * [@param](/bitcoin-bitcoin/contributor/param/)[in] timeout Wait this much for at least one of the requested events to occur.
 2      * [@param](/bitcoin-bitcoin/contributor/param/)[in] requested Wait for those events, bitwise-or of `RECV` and `SEND`.
 3-     * [@param](/bitcoin-bitcoin/contributor/param/)[out] occurred If not nullptr and `true` is returned, then upon return this
 4+     * [@param](/bitcoin-bitcoin/contributor/param/)[out] occurred If not nullptr and the function returns `true`, then this
 5      * indicates which of the requested events occurred (`ERR` will be added, even if
 6      * not requested, if an exceptional event occurs on the socket).
 7-     * A timeout is indicated by return value of `true` and `occurred` being set to 0.
 8-     * [@return](/bitcoin-bitcoin/contributor/return/) true on success and false otherwise
 9+     * [@return](/bitcoin-bitcoin/contributor/return/) true on success (or timeout, if `occurred` of 0 is returned), false otherwise
10      */

vasild commented at 10:17 am on March 10, 2022:

Done.

in src/util/sock.h:208 in af45eeea03 outdated

198+     * @param[in,out] what Wait for the requested events on these sockets and set `occurred`
199+     * to the events that actually occur.
200+     * A timeout is indicated by return value of `true` and all `what[].occurred`
201+     * being set to 0.
202+     * @return true on success and false otherwise
203+     */

jonatack commented at 7:05 pm on February 20, 2022:

fc4bfdf maybe, if I understand correctly:

0-     * to the events that actually occur.
1-     * A timeout is indicated by return value of `true` and all `what[].occurred`
2-     * being set to 0.
3-     * [@return](/bitcoin-bitcoin/contributor/return/) true on success and false otherwise
4+     * for the events that actually occurred.
5+     * [@return](/bitcoin-bitcoin/contributor/return/) true on success (or timeout, if all `what[].occurred` are returned as 0), false otherwise
6      */

vasild commented at 10:18 am on March 10, 2022:

Done.

in src/util/sock.h:193 in af45eeea03 outdated

192+     */
193+    using WaitData = std::unordered_map<std::shared_ptr<const Sock>, Events, Hash, Equal>;
194+
195+    /**
196+     * Same as `Wait()`, but wait on many sockets within the same timeout.
197+     * @param[in] timeout Wait this much for at least one of the requested events to occur.

jonatack commented at 7:06 pm on February 20, 2022:

fc4bfdf s/much/long|amount of time/

vasild commented at 10:18 am on March 10, 2022:

Done.

in src/util/sock.h:184 in af45eeea03 outdated

183+    };
184+
185+    /**
186+     * On which socket to wait for what events in `WaitMany()`.
187+     * The `shared_ptr` is copied into the map to ensure that the `Sock` object
188+     * is not destroyed and the underlying socket closed. If this happens

jonatack commented at 7:09 pm on February 20, 2022:

fc4bfdf the writing here seems ambiguous to me as to whether “not” applies only to “destroyed” or also to “the underlying socket closed”

vasild commented at 10:18 am on March 10, 2022:

Reworded.

in src/net.h:1018 in af45eeea03 outdated

1024      * Accept incoming connections, one from each read-ready listening socket.
1025-     * @param[in] recv_set Sockets that are ready for read.
1026+     * @param[in] what Sockets that are ready for IO.
1027      */
1028-    void SocketHandlerListening(const std::set<SOCKET>& recv_set);
1029+    void SocketHandlerListening(Sock::WaitData& what);

jonatack commented at 7:40 pm on February 20, 2022:

af45eee what is an in-param

0    void SocketHandlerListening(const Sock::WaitData& what);

vasild commented at 10:18 am on March 10, 2022:

Done.

in src/net.cpp:1540 in af45eeea03 outdated

1536@@ -1659,13 +1537,14 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
1537     }
1538 }
1539 
1540-void CConnman::SocketHandlerListening(const std::set<SOCKET>& recv_set)
1541+void CConnman::SocketHandlerListening(Sock::WaitData& what)

jonatack commented at 7:40 pm on February 20, 2022:

af45eee what is an in-param

0void CConnman::SocketHandlerListening(const Sock::WaitData& what);

vasild commented at 10:18 am on March 10, 2022:

Done.

in src/net.h:1012 in af45eeea03 outdated

1016      */
1017-    void SocketHandlerConnected(const std::vector<CNode*>& nodes,
1018-                                const std::set<SOCKET>& recv_set,
1019-                                const std::set<SOCKET>& send_set,
1020-                                const std::set<SOCKET>& error_set);
1021+    void SocketHandlerConnected(const std::vector<CNode*>& nodes, Sock::WaitData& what);

jonatack commented at 7:43 pm on February 20, 2022:

af45eee what is an in-param

0    void SocketHandlerConnected(const std::vector<CNode*>& nodes, const Sock::WaitData& what);

vasild commented at 10:19 am on March 10, 2022:

Done.

in src/net.cpp:1448 in af45eeea03 outdated

1583 
1584-void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
1585-                                      const std::set<SOCKET>& recv_set,
1586-                                      const std::set<SOCKET>& send_set,
1587-                                      const std::set<SOCKET>& error_set)
1588+void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes, Sock::WaitData& what)

jonatack commented at 7:43 pm on February 20, 2022:

af45eee what is an in-param

0void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes, const Sock::WaitData& what)

vasild commented at 10:19 am on March 10, 2022:

Done.

jonatack commented at 7:58 pm on February 20, 2022: member

Approach ACK, first pass tested almost-ACK af45eeea037770229b7358b93d4cb8e256d7de19. Read the code, each commit debug builds cleanly / unit tests green, ran signet bitcoind on this branch rebased to master. As well as improving testability, this seems like a nice clean-up too. Reviewers may find man select helpful.

vasild force-pushed on Mar 10, 2022

vasild commented at 10:17 am on March 10, 2022: member

af45eeea03...c9420b3cce: address suggestions

jonatack commented at 10:40 am on March 10, 2022: member

ACK c9420b3ccea8cd5b2f5d77aabe58bcf34da28517 per re-review of git diff af45eee c9420b3 following my previous #24356#pullrequestreview-888069918, rebased to master at 05957a8, debug build with Debian clang version 15, unit tests

DrahtBot added the label Needs rebase on Apr 26, 2022

vasild force-pushed on Apr 27, 2022

vasild commented at 3:55 pm on April 27, 2022: member

c9420b3cce...ca8dcfabb7: rebase due to conflicts

Invalidates ACK from @jonatack

DrahtBot removed the label Needs rebase on Apr 27, 2022

in src/net.h:1005 in ca8dcfabb7 outdated

1010         EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex);
1011 
1012     /**
1013      * Accept incoming connections, one from each read-ready listening socket.
1014-     * @param[in] recv_set Sockets that are ready for read.
1015+     * @param[in] what Sockets that are ready for IO.

jonatack commented at 9:29 am on April 28, 2022:

If you agree, maybe change the names of the what method params to ones that are more descriptive and unique (e.g. socket_data, wait_sockets, wait_data, sockets_ready_for_io, etc.) and for the what localvars, either the same or i.e. s where unimportant.

vasild commented at 1:25 pm on May 19, 2022:

Renamed to EventsPerSock. That map contains, per socket, the requested and the occurred events (just like poll(2)). And the variables then become the blatant events_per_sock ;-)

jonatack commented at 5:12 pm on May 20, 2022:

Thanks, that seems better. In the last commit, I sort of hesitated in the two places where you do @param[in] events_per_sock Sockets that are ready for IO. … thinking it should be something like “Events for each socket that is ready for IO.”

jonatack commented at 9:47 am on April 28, 2022: member

re-ACK ca8dcfabb795e94ee085ba5e6cf9ab1edb1d3a06 per git range-diff 132d5f8 c9420b3 ca8dcfa, rebase-only for added thread safety annotations and assertions

DrahtBot added the label Needs rebase on May 16, 2022

vasild force-pushed on May 19, 2022

vasild commented at 1:22 pm on May 19, 2022: member

ca8dcfabb7...6747729cb8: rebase due to conflicts and rename WaitData to EventsPerSock

Invalidates ACK from @jonatack

DrahtBot removed the label Needs rebase on May 19, 2022

jonatack commented at 5:09 pm on May 20, 2022: member

re-ACK 6747729cb850914d925f9c4a7a809fff547da746

laanwj added this to the "Blockers" column in a project

jonatack commented at 9:36 am on June 8, 2022: member

Anyone else like to have a look here?

laanwj commented at 10:34 am on June 8, 2022: member

Yes, I’ll have a look.

in src/i2p.cpp:154 in 6747729cb8 outdated

149@@ -150,7 +150,7 @@ bool Session::Accept(Connection& conn)
150                 throw std::runtime_error("wait on socket failed");
151             }
152 
153-            if ((occurred & Sock::RECV) == 0) {
154+            if (occurred == 0) {
155                 // Timeout, no incoming connections within MAX_WAIT_FOR_IO.

laanwj commented at 10:38 am on June 8, 2022:

Maybe update this to “no incoming connections or errors” to match the code change.

vasild commented at 12:26 pm on June 9, 2022:

Done

in src/net.h:985 in 6747729cb8 outdated

1001-     */
1002-    void SocketEvents(const std::vector<CNode*>& nodes,
1003-                      std::set<SOCKET>& recv_set,
1004-                      std::set<SOCKET>& send_set,
1005-                      std::set<SOCKET>& error_set);
1006+    Sock::EventsPerSock GenerateWaitSockets(const std::vector<CNode*>& nodes);

laanwj commented at 10:41 am on June 8, 2022:

Instead of const std::vector<CNode*>& could use a std::span<const CNode*> which is more general?

Edit: although this is symmetrical with CConnman::GenerateWaitSockets.

vasild commented at 11:55 am on June 9, 2022:

Leaving it as is - I tried to make it Span, but it can’t be to const CNode* because the method does LOCK(pnode->cs_vSend); for each CNode element. Span<CNode*> argument results in the error: No viable conversion from 'const std::vector<CNode *>' to 'Span<CNode *>'.

It is symmetrical with CConnman::SocketHandlerConnected().

laanwj commented at 12:50 pm on June 9, 2022:

Fairly sure that it needs to be std::span<const CNode*> to be convertable from const std::vector<CNode *> Otherwise it’s a span of modifiable CNode pointers?

vasild commented at 2:25 pm on June 9, 2022:

Alright, having it Span<CNode* const> works (not Span<const CNode*>)! The pointer itself has to be constant - should not be able to re-point it to another CNode object. And that pointer is pointing to a non-constant CNode object, so that we can modify it (or at least lock its member mutex which is equivalent to modifying). Changed!

PS std::span is a C++20 thing, we are using our own Span while on C++17.

laanwj commented at 4:40 pm on June 9, 2022:

Great!

PS std::span is a C++20 thing, we are using our own Span while on C++17.

Eh, yes, true.

in src/util/sock.h:209 in 6747729cb8 outdated

210+     * @param[in,out] events_per_sock Wait for the requested events on these sockets and set
211+     * `occurred` for the events that actually occurred.
212+     * @return true on success (or timeout, if all `what[].occurred` are returned as 0),
213+     * false otherwise
214+     */
215+    [[nodiscard]] virtual bool WaitMany(std::chrono::milliseconds timeout,

laanwj commented at 10:47 am on June 8, 2022:

Maybe return std::optional<EventsPerSock> instead of using output parameter.

Edit: never mind, it’s good like this, as it’s an in+out parameter.

in src/util/sock.h:171 in 6747729cb8 outdated

172+        Event requested;
173+        Event occurred;
174+    };
175+
176+    struct Hash {
177+        size_t operator()(const std::shared_ptr<const Sock>& s) const

laanwj commented at 10:54 am on June 8, 2022:

As this code is templated in the header anyway, could we do something more general than hardcoding std::shared_ptr<const Sock> here? I mean, it would work the same for any kind of “pointer to Sock”, whether shared, unique, or raw. Same for Equal below.

Edit: OK, so looking further, Hash and Equal are implementation details internal to the EventsPerSock structure. Another option would be to name them less generally.

vasild commented at 12:25 pm on June 9, 2022:

Yeah, it has to take shared_ptr because it is used for std::unordered_map with keys that are shared_ptr. Renamed to HashSharedPtrSock and EqualSharedPtrSock, better now?

laanwj commented at 12:52 pm on June 9, 2022:

Yes, LGTM.

net: also wait for exceptional events in Sock::Wait()

This mimics closely `CConnman::SocketEvents()` and the underlying
`poll(2)`.

cc74459768

net: introduce Sock::WaitMany()

It allows waiting concurrently on more than one socket. Being a
`virtual` `Sock` method it can be overriden by tests.

Will be used to replace `CConnman::SocketEvents()`.

ae263460ba

in src/net.cpp:1459 in 6747729cb8 outdated

1588         // listening sockets in one call ("readiness" as in poll(2) or
1589         // select(2)). If none are ready, wait for a short while and return
1590         // empty sets.
1591-        SocketEvents(snap.Nodes(), recv_set, send_set, error_set);
1592+        events_per_sock = GenerateWaitSockets(snap.Nodes());
1593+        if (events_per_sock.empty() || !events_per_sock.begin()->first->WaitMany(timeout, events_per_sock)) {

laanwj commented at 12:06 pm on June 8, 2022:

This looks slightly circuitous to me. Is there a reason to have Sock::WaitMany be an instance method instead of a static one? It acts on multiple sockets at once, not one socket, after all, does it matter which one you call it on?

vasild commented at 12:47 pm on June 9, 2022:

This looks slightly circuitous to me.

It is!

Is there a reason to have Sock::WaitMany be an instance method instead of a static one?

Yes. It is easier to override/mock it this way in tests.

It acts on multiple sockets at once, not one socket, after all, does it matter which one you call it on?

No it does not matter. Indeed it does not access any of the member variables and acts only on the sockets that are passed as arguments. As such it is more natural for it to be a standalone function or a static method of the Sock class.

I chose to have it as virtual method of the Sock class because this way it is easier for tests to provide a mocked alternative of it - just by providing a mocked Sock class, e.g. FuzzedSock.

If WaitMany() is a standalone function it would have to be done in a similar way like CreateSock() and g_dns_lookup() - global variables that point to “real” functions but are re-pointed to mocked functions in tests. So, each test that wants to mock those would have to override WaitMany() in addition to CreateSock(). Should I try to do it like that?

laanwj commented at 12:53 pm on June 9, 2022:

That’s a good point. We don’t really have a parent object (e.g. some EventManager or such) that this could be a method on, and abusing a global function would be quite ugly too.

vasild commented at 2:28 pm on June 9, 2022:

Ok, leaving it as is now. I do not have a strong opinion, could change it later and am open to other ideas/suggestions.

laanwj commented at 4:40 pm on June 9, 2022:

Agree

vasild force-pushed on Jun 9, 2022

vasild commented at 12:25 pm on June 9, 2022: member

6747729cb8...358dab76aa: address some suggestions

Invalidates ACK from @jonatack

net: use Sock::WaitMany() instead of CConnman::SocketEvents()

Rename `GenerateSelectSet()` to `GenerateWaitSockets()` and adapt it to
generate a wait data suitable for `Sock::WaitMany()`. Then call it from
`CConnman::SocketHandler()` and feed the generated data to
`Sock::WaitMany()`.

This way `CConnman::SocketHandler()` can be unit tested because
`Sock::WaitMany()` is mockable, so the usage of real sockets can be
avoided.

Resolves https://github.com/bitcoin/bitcoin/issues/21744

6e68ccbefe

vasild force-pushed on Jun 9, 2022

vasild commented at 2:27 pm on June 9, 2022: member

358dab76aa...6e68ccbefe: use Span for CConnman::GenerateWaitSockets() argument.

Previously invalidated ACK from @jonatack.

jonatack commented at 4:30 pm on June 9, 2022: member

There’s a fuzz addressSAN issue, but it seems unrelated and there is the same one on the last push on master https://cirrus-ci.com/task/4544256453902336.

jonatack commented at 5:02 pm on June 9, 2022: member

re-ACK 6e68ccbefea6509c61fc4405a391a517c6057bb0 per git range-diff e18fd47 6747729 6e68ccb, and verified rebase to master and debug build

laanwj commented at 5:09 pm on June 9, 2022: member

Code review ACK 6e68ccbefea6509c61fc4405a391a517c6057bb0

laanwj merged this on Jun 16, 2022

laanwj closed this on Jun 16, 2022

laanwj removed this from the "Blockers" column in a project

sidhujag referenced this in commit 10043f624c on Jun 17, 2022

vasild deleted the branch on Jun 17, 2022

Empact referenced this in commit b095ac573b on Apr 25, 2023

DrahtBot locked this on Jun 17, 2023

refactor: replace CConnman::SocketEvents() with mockable Sock::WaitMany() #24356

Conflicts