No need the RecursiveMutex type for the CNode::cs_vSend, CNode::cs_hSocket and CNode::cs_vRecv.
Related to #19303.
std::mutex locks across a thread and within a thread. i.e. if a mutex is locked in a call stack on a given thread, it can no longer be acquired on any thread.
std::recursive_mutex locks across a thread but not within a thread. i.e. if a recursive_mutex is locked in a call stack on a given thread, it can still be re-acquired on the same thread further in the call stack. recursive_mutex behaves similar to mutex across threads.
So when we change a recursive_mutex to a mutex, this reviewer sees the risk of a deadlock within a given call stack as mutex is more restrictive. Reviewing each lock acquisition site and tracing backward to build a call graph, if we determine that the call graph is acyclic, and no other lock site is in the path, the conversion should be safe.
@hebasto:
cs_vSend, but found way too many call sites for CConnman::PushMessage() to proceed manually. Do you think it'd be useful if I tried to spend some time building automation for such analysis (not sure how)? Could be re-usable across #19303 ?cs_vSend call site tracing:
src/net.cpp: CNode::copyStats()
|-- src/net.cpp: CConnman:GetNodeStats()
|-- src/interfaces/node.cpp: NodeImpl::getNodeStats()
|-- src/qt/peertablemodel.cpp: PeerTablePriv::refreshPeers()
|-- src/qt/peertablemodel.cpp: PeerTableModel::refresh()
|-- src/qt/peertablemodel.cpp: PeerTableModel::PeerTableModel()
|-- src/qt/clientmodel.cpp: ClientModel::ClientModel()
|-- src/qt/bitcoin.cpp: BitcoinApplication::initializeResult()
|-- src/qt/bitcoin.cpp: BitcoinApplication::startThread()
|-- src/qt/bitcoin.cpp: BitcoinApplication::requestInitialize()
|-- src/qt/bitcoin.cpp: GuiMain()
|-- src/qt/main.cpp: main()
|-- src/qt/bitcoin.cpp: BitcoinApplication::requestShutdown()()
|-- src/qt/bitcoin.cpp: GuiMain()
|-- src/qt/main.cpp: main()
|-- src/rpc/net.cpp: getpeerinfo()
src/net.cpp: CConnman::DisconnectNodes()
|-- src/net.cpp: CConnman::ThreadSocketHandler()
|-- src/net.cpp: CConnman::Start(): Spins up a `b-net` thread for `ThreadSocketHandler`
src/net.cpp: CConnman::GenerateSelectSet()
|-- src/net.cpp: CConnman::SocketEvents()
|-- src/net.cpp: CConnman::SocketHandler() [IS LOCK SITE: lock is acquired after call to SocketEvents is complete]
|-- src/net.cpp: CConnman::ThreadSocketHandler()
|-- src/net.cpp: CConnman::Start(): Spins up a `b-net` thread for `ThreadSocketHandler`
src/net.cpp: CConnman::SocketHandler()
|-- src/net.cpp: CConnman::ThreadSocketHandler()
|-- src/net.cpp: CConnman::Start(): Spins up a `b-net` thread for `ThreadSocketHandler`
src/net.cpp: CConnman::PushMessage()
|-- **55 different call sites in net_processing.cpp**
std::mutexlocks...std::recursive_mutexlocks...
TBH, I don't think that "customizing" technical terms could be helpful. A mutex does not lock anything, but it could be locked by a thread.
So when we change a
recursive_mutexto amutex, this reviewer sees the risk of a deadlock within a given call stack asmutexis more restrictive.
I think that the transition from a recursive_mutex to a mutex cannot introduce new deadlocks as it does not change locking order. The real risk is the undefined behavior: "If lock is called by a thread that already owns the mutex, the behavior is undefined: for example, the program may deadlock".
Reviewing each lock acquisition site and tracing backward to build a call graph, if we determine that the call graph is acyclic, and no other lock site is in the path, the conversion should be safe.
To be correct we need a call graph while a particular mutex is held. A cycle in the call graph means UB.
Let's consider cs_vSend, for example:
CNode::copyStats()LOCK(cs_vSend); -- no other calls in the scope. It is safeCConnman::SocketSendData()GetSystemTimeInSeconds() does not try to lock cs_vSendCNode::CloseSocketDisconnect() does not try to lock cs_vSend... and so on.
Thank you for the response and corrections, @hebasto - we are on the same page now. In the coming days, I'll tinker with doxygen or CastXML and try to write a script that can find the potential UB. In the meantime, if you have better ideas for the review, or think it's best left for manual review, I would love to hear about it.
@troygiorshev @jnewbery @MarcoFalke Mind looking into this PR?
It would be good to have the debug-lockorder change merged first, which checks that a mutex is not already held by the thread that tries to take it
utACK 0e51a35512
~The data guarded by cs_hSocket is one bit of information (has the hSocket been changed to INVALID_SOCKET by CloseSocket()). I think we could do better by:~
hSocket member const and setting it in the CNode's initializer list~CloseSocket() take a const SOCKET& or by-value SOCKET (i.e. it doesn't update the value of the passed in socket)~socket_closed that gets initialized to false and set to true in CloseSocketDisconnect()~LOCK(cs_hSocket); if (hSocket == INVALID_SOCKET) ... instances to if (socket_closed.load())~~Perhaps that could be done in a follow-up. Changing recursive mutexes to mutexes is good. Getting rid of them entirely is even better.~
EDIT: Ignore that. The cs_hSocket mutex also prevents multiple threads from sending/receiving/closing the same socket.
review ACK 0e51a355128a825d428fe2b9017c25085731fc04 🔊
<details><summary>Show signature and timestamp</summary>
Signature:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
review ACK 0e51a355128a825d428fe2b9017c25085731fc04 🔊
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYEACgkQzit1aX5p
pUhe9AwAsp8/vU4EpiwanorQlFjYBvJTOuT1tJpPr9aI8Ryeeyv1q718YvoSkphp
Q7VPKHFHqLxhV0l0okDjSg22xVMkrBF7G8eyIaNyHbTDirOjX3qdM4uY2FHz/gGC
dw3a5apTs2vYBoaA0IPKIIk3emFKpo1yq6haqWVfih3PWG2EDQce/LRg4mqZfkM8
OcgQOAFPCpyJkMkV5a/W5piur80aOz+RggpQK/odL10/rn5FeNqN5+4LpQZk/Liq
dhzz70NkivpvdyP16tRwcTMmI2TjDrcGVVTd+tfTNnCaNqHmjeNEf7pc1M6JMo1T
jCgiYy7Za04TXQdar+KacVVpONGzFZvvxMKFOKBhadVz0jVQh41vlkOBf5e+AXyy
Rz3R+o8Z/0xlBZ7tzEfPuZbCN0kqkemzDucsVKXyv40TZxzNEGrXa9Fd9Q9qUbH/
OUYAC45Sq0ZMjlBdzZJImRgSzjw8ohkt4MZtoSREHv5wqp453/B89X9O9uihPeeu
RC2PY1Fp
=PApH
-----END PGP SIGNATURE-----
Timestamp of file with hash 0ec71e4b61f398f812ed971bafc7847ee070d12c4f1d8a0e7c3d97d010c764f3 -
</details>