p2p, refactor: Use Mutex type for some mutexes in CNode class #19915

pull hebasto wants to merge 1 commits into bitcoin:master from hebasto:200908-netmx changing 1 files +3 −3
  1. hebasto commented at 8:30 AM on September 8, 2020: member

    No need the RecursiveMutex type for the CNode::cs_vSend, CNode::cs_hSocket and CNode::cs_vRecv.

    Related to #19303.

  2. refactor: Use Mutex type for some mutexes in CNode class 0e51a35512
  3. hebasto renamed this:
    refactor: Use Mutex type for some mutexes in CNode class
    p2p, refactor: Use Mutex type for some mutexes in CNode class
    on Sep 8, 2020
  4. DrahtBot added the label P2P on Sep 8, 2020
  5. MarcoFalke added the label Refactoring on Sep 10, 2020
  6. dhruv commented at 6:16 AM on September 16, 2020: member

    Review methodology

    std::mutex locks across a thread and within a thread. i.e. if a mutex is locked in a call stack on a given thread, it can no longer be acquired on any thread.

    std::recursive_mutex locks across a thread but not within a thread. i.e. if a recursive_mutex is locked in a call stack on a given thread, it can still be re-acquired on the same thread further in the call stack. recursive_mutex behaves similar to mutex across threads.

    So when we change a recursive_mutex to a mutex, this reviewer sees the risk of a deadlock within a given call stack as mutex is more restrictive. Reviewing each lock acquisition site and tracing backward to build a call graph, if we determine that the call graph is acyclic, and no other lock site is in the path, the conversion should be safe. @hebasto:

    • Would you agree with the review method? Am I missing something?
    • I started doing this manually with the results below for cs_vSend, but found way too many call sites for CConnman::PushMessage() to proceed manually. Do you think it'd be useful if I tried to spend some time building automation for such analysis (not sure how)? Could be re-usable across #19303 ?

    Results from call tracing

    cs_vSend call site tracing:

    src/net.cpp: CNode::copyStats()
    |-- src/net.cpp: CConnman:GetNodeStats()
    	|-- src/interfaces/node.cpp: NodeImpl::getNodeStats()
    		|-- src/qt/peertablemodel.cpp: PeerTablePriv::refreshPeers()
    			|-- src/qt/peertablemodel.cpp: PeerTableModel::refresh()
    				|-- src/qt/peertablemodel.cpp: PeerTableModel::PeerTableModel()
    					|-- src/qt/clientmodel.cpp: ClientModel::ClientModel()
    						|-- src/qt/bitcoin.cpp: BitcoinApplication::initializeResult()
    							|-- src/qt/bitcoin.cpp: BitcoinApplication::startThread()
    								|-- src/qt/bitcoin.cpp: BitcoinApplication::requestInitialize()
    									|-- src/qt/bitcoin.cpp: GuiMain()
    										|-- src/qt/main.cpp: main()
    								|-- src/qt/bitcoin.cpp: BitcoinApplication::requestShutdown()()
    									|-- src/qt/bitcoin.cpp: GuiMain()
    										|-- src/qt/main.cpp: main()
    							
    	|-- src/rpc/net.cpp: getpeerinfo()
    	
    src/net.cpp: CConnman::DisconnectNodes()
    	|-- src/net.cpp: CConnman::ThreadSocketHandler()
    		|-- src/net.cpp: CConnman::Start(): Spins up a `b-net` thread for `ThreadSocketHandler`
    
    src/net.cpp: CConnman::GenerateSelectSet()
    	|-- src/net.cpp: CConnman::SocketEvents()
    		|-- src/net.cpp: CConnman::SocketHandler() [IS LOCK SITE: lock is acquired after call to SocketEvents is complete]
    			|-- src/net.cpp: CConnman::ThreadSocketHandler()
    				|-- src/net.cpp: CConnman::Start(): Spins up a `b-net` thread for `ThreadSocketHandler`
    
    src/net.cpp: CConnman::SocketHandler()
    	|-- src/net.cpp: CConnman::ThreadSocketHandler()
    		|-- src/net.cpp: CConnman::Start(): Spins up a `b-net` thread for `ThreadSocketHandler`
    
    src/net.cpp: CConnman::PushMessage()
    |-- **55 different call sites in net_processing.cpp**
    
  7. hebasto commented at 10:56 AM on September 19, 2020: member

    @dhruv

    std::mutex locks... std::recursive_mutex locks...

    TBH, I don't think that "customizing" technical terms could be helpful. A mutex does not lock anything, but it could be locked by a thread.

    So when we change a recursive_mutex to a mutex, this reviewer sees the risk of a deadlock within a given call stack as mutex is more restrictive.

    I think that the transition from a recursive_mutex to a mutex cannot introduce new deadlocks as it does not change locking order. The real risk is the undefined behavior: "If lock is called by a thread that already owns the mutex, the behavior is undefined: for example, the program may deadlock".

    Reviewing each lock acquisition site and tracing backward to build a call graph, if we determine that the call graph is acyclic, and no other lock site is in the path, the conversion should be safe.

    To be correct we need a call graph while a particular mutex is held. A cycle in the call graph means UB.

    Let's consider cs_vSend, for example:

    • CNode::copyStats()
      • LOCK(cs_vSend); -- no other calls in the scope. It is safe
    • CConnman::SocketSendData()
      • GetSystemTimeInSeconds() does not try to lock cs_vSend
      • CNode::CloseSocketDisconnect() does not try to lock cs_vSend

    ... and so on.

  8. dhruv commented at 5:17 PM on September 19, 2020: member

    Thank you for the response and corrections, @hebasto - we are on the same page now. In the coming days, I'll tinker with doxygen or CastXML and try to write a script that can find the potential UB. In the meantime, if you have better ideas for the review, or think it's best left for manual review, I would love to hear about it.

  9. hebasto commented at 12:11 PM on October 14, 2020: member

    @troygiorshev @jnewbery @MarcoFalke Mind looking into this PR?

  10. MarcoFalke commented at 12:27 PM on October 14, 2020: member

    It would be good to have the debug-lockorder change merged first, which checks that a mutex is not already held by the thread that tries to take it

  11. hebasto commented at 12:30 PM on October 14, 2020: member

    It would be good to have the debug-lockorder change merged first, which checks that a mutex is not already held by the thread that tries to take it

    Agree. Let #19337 go first.

  12. jnewbery commented at 1:00 PM on December 31, 2020: member

    utACK 0e51a35512

    ~The data guarded by cs_hSocket is one bit of information (has the hSocket been changed to INVALID_SOCKET by CloseSocket()). I think we could do better by:~

    • ~Making the hSocket member const and setting it in the CNode's initializer list~
    • ~Make CloseSocket() take a const SOCKET& or by-value SOCKET (i.e. it doesn't update the value of the passed in socket)~
    • ~Adding an atomic bool socket_closed that gets initialized to false and set to true in CloseSocketDisconnect()~
    • ~Changing all of the LOCK(cs_hSocket); if (hSocket == INVALID_SOCKET) ... instances to if (socket_closed.load())~

    ~Perhaps that could be done in a follow-up. Changing recursive mutexes to mutexes is good. Getting rid of them entirely is even better.~

    EDIT: Ignore that. The cs_hSocket mutex also prevents multiple threads from sending/receiving/closing the same socket.

  13. hebasto commented at 12:35 PM on January 1, 2021: member

    @MarcoFalke

    It would be good to have the debug-lockorder change merged first, which checks that a mutex is not already held by the thread that tries to take it

    #19337 has been merged :)

  14. MarcoFalke commented at 11:18 AM on January 5, 2021: member

    review ACK 0e51a355128a825d428fe2b9017c25085731fc04 🔊

    <details><summary>Show signature and timestamp</summary>

    Signature:

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA512
    
    review ACK 0e51a355128a825d428fe2b9017c25085731fc04 🔊
    -----BEGIN PGP SIGNATURE-----
    
    iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYEACgkQzit1aX5p
    pUhe9AwAsp8/vU4EpiwanorQlFjYBvJTOuT1tJpPr9aI8Ryeeyv1q718YvoSkphp
    Q7VPKHFHqLxhV0l0okDjSg22xVMkrBF7G8eyIaNyHbTDirOjX3qdM4uY2FHz/gGC
    dw3a5apTs2vYBoaA0IPKIIk3emFKpo1yq6haqWVfih3PWG2EDQce/LRg4mqZfkM8
    OcgQOAFPCpyJkMkV5a/W5piur80aOz+RggpQK/odL10/rn5FeNqN5+4LpQZk/Liq
    dhzz70NkivpvdyP16tRwcTMmI2TjDrcGVVTd+tfTNnCaNqHmjeNEf7pc1M6JMo1T
    jCgiYy7Za04TXQdar+KacVVpONGzFZvvxMKFOKBhadVz0jVQh41vlkOBf5e+AXyy
    Rz3R+o8Z/0xlBZ7tzEfPuZbCN0kqkemzDucsVKXyv40TZxzNEGrXa9Fd9Q9qUbH/
    OUYAC45Sq0ZMjlBdzZJImRgSzjw8ohkt4MZtoSREHv5wqp453/B89X9O9uihPeeu
    RC2PY1Fp
    =PApH
    -----END PGP SIGNATURE-----
    

    Timestamp of file with hash 0ec71e4b61f398f812ed971bafc7847ee070d12c4f1d8a0e7c3d97d010c764f3 -

    </details>

  15. MarcoFalke merged this on Jan 5, 2021
  16. MarcoFalke closed this on Jan 5, 2021

  17. hebasto deleted the branch on Jan 5, 2021
  18. sidhujag referenced this in commit e01d44f50b on Jan 5, 2021
  19. Fabcien referenced this in commit 7ae0a8a188 on Apr 27, 2021
  20. DrahtBot locked this on Aug 16, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-22 06:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me