CConnman::SocketEvents() may poll the wrong socket

vasild commented at 3:13 pm on April 21, 2021: member

Adverse scenario:

thread “net”

CConnman::ThreadSocketHandler()
  CConnman::SocketHandler()
    CConnman::SocketEvents()
      CConnman::GenerateSelectSet()
        Cycles through CConnman::vNodes and remembers some sockets to
        be polled for readiness later by CConnman::SocketEvents().
        Let's say socket (file descriptor) 10 is remembered.

thread “msghand”

CConnman::ThreadMessageHandler()
  PeerManagerImpl::ProcessMessages()
    PeerManagerImpl::ProcessMessage()
      PeerManagerImpl::PushNodeVersion()
        CConnman::PushMessage()
          CConnman::SocketSendData()
            CNode::CloseSocketDisconnect()
              Closes socket 10.

thread “opencon”

CConnman::ThreadOpenConnections()
  CConnman::OpenNetworkConnection()
    CConnman::ConnectNode()
      CreateSock()
        Creates a new socket, reusing file descriptor 10.

thread “net”

CConnman::ThreadSocketHandler()
  CConnman::SocketHandler()
    CConnman::SocketEvents()
      polls the remembered socket 10, which corresponds to a different
      connection now (bug)

As a result the following logic may not hold:

https://github.com/bitcoin/bitcoin/blob/e16f8720dca2de1040478968c9f3ca07644a04b7/src/net.cpp#L1306-L1315

I think the severity of this is low and that it has a low chance of happening. However, the pattern of “remember a socket by its file descriptor number and use it later, concurrently with other threads that might close it” should be avoided. That pattern could lead to closing or writing to the wrong socket, which would be more severe.
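
The descriptor reuse at the heart of the scenario above can be reproduced in isolation. The following is a minimal single-threaded sketch, not Bitcoin Core code: `FdPair` and `CloseThenCreate()` are hypothetical names, and the three steps that the issue attributes to the “net”, “msghand” and “opencon” threads are simply run one after another. It assumes a POSIX system, where a newly created descriptor normally gets the lowest number that is not currently open.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper for illustration; not part of Bitcoin Core.
struct FdPair {
    int closed_fd; // descriptor number that was remembered and then closed
    int new_fd;    // descriptor number handed out to the next socket
};

FdPair CloseThenCreate()
{
    // "net" thread: remember a socket by its descriptor number.
    const int remembered = socket(AF_UNIX, SOCK_STREAM, 0);
    if (remembered < 0) return {-1, -1};

    // "msghand" thread: close that socket.
    close(remembered);

    // "opencon" thread: create a socket for a new connection.
    const int fresh = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fresh < 0) return {remembered, -1};

    // Only the numbers are reported, so the new socket can be closed here.
    close(fresh);
    return {remembered, fresh};
}
```

When no other descriptor is opened or closed in between, both members of the returned pair are expected to hold the same number, which is why a remembered number alone cannot identify a connection.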

vasild added the label Bug on Apr 21, 2021

vasild commented at 9:30 am on May 12, 2021: member

This is surprisingly difficult to fix (without making a huge amount of changes). I see two ways to address it:

1. Hold the socket mutex while poll()ing, so that other threads that want to close the socket would have to wait for the poll() to finish. Not acceptable IMO.
2. Lazy close with reference counting. Increment the reference count of the socket during poll(). Whenever a thread wants to close the socket (disconnect a peer), it should disable the socket for send/recv (e.g. shutdown(2)) but not close(2) it if the reference count is >0. Somebody else should then close all disabled sockets with a reference count of 0.
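
The lazy close idea could look roughly like the sketch below. `LazyCloseSocket` and all of its members are hypothetical and do not exist in Bitcoin Core. As a simplification, the descriptor is closed by whichever call drops the last reference after the socket was disabled, rather than by a separate cleanup pass.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <mutex>

// Hypothetical wrapper illustrating "lazy close"; not a Bitcoin Core class.
class LazyCloseSocket
{
public:
    explicit LazyCloseSocket(int fd) : m_fd{fd} {}
    LazyCloseSocket(const LazyCloseSocket&) = delete;
    LazyCloseSocket& operator=(const LazyCloseSocket&) = delete;
    ~LazyCloseSocket() { if (m_fd >= 0) close(m_fd); }

    // A poller calls this before poll() and Release() afterwards.
    int Acquire()
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        ++m_refs;
        return m_fd;
    }

    void Release()
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        if (m_refs > 0) --m_refs;
        MaybeClose();
    }

    // Disconnect a peer: stop send/recv now, close(2) only when unreferenced.
    void Disconnect()
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        if (m_fd >= 0 && !m_disabled) {
            shutdown(m_fd, SHUT_RDWR); // the descriptor number stays reserved
            m_disabled = true;
        }
        MaybeClose();
    }

    bool IsClosed() const
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        return m_fd < 0;
    }

private:
    // Caller must hold m_mutex.
    void MaybeClose()
    {
        if (m_disabled && m_refs == 0 && m_fd >= 0) {
            close(m_fd);
            m_fd = -1;
        }
    }

    mutable std::mutex m_mutex;
    int m_fd;
    int m_refs{0};
    bool m_disabled{false};
};
```

Because the descriptor remains open while a poller holds a reference, the kernel cannot hand its number to a new connection during poll().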

practicalswift commented at 9:43 am on May 12, 2021: contributor

@vasild

Interesting find! Thanks for reporting.

May I ask how you found this issue?

Was it found as part of the Sock encapsulation project, or a related fuzzing find? :)

Impressive find either way.

vasild commented at 10:14 am on May 12, 2021: member

Was it found as part of the Sock encapsulation project…

Correct! :)

In #21878, while replacing SOCKET with Sock in CConnman::SocketEvents() one option was to keep pointers to the Sock objects and close the sockets by destroying those objects, but (because of this bug) this would have ended up with dangling pointers, pointing to possibly-deleted-by-other-threads objects.

This is the reason that PR keeps the Sock objects alive and calls Sock::Reset() from CNode::CloseSocketDisconnect() instead of something like delete m_sock.

vasild commented at 10:27 am on May 12, 2021: member

Hmm, maybe the following changes in #21878 would fix this bug elegantly (doing the lazy close I mentioned above):

1. Change CNode::m_sock from std::unique_ptr<Sock> to std::shared_ptr<Sock>.
2. In CNode::CloseSocketDisconnect() do m_sock.reset() (reducing the ref count stored in the shared_ptr and destroying the Sock object if ref count is 0).
3. Take a copy of std::shared_ptr<Sock> in CConnman::SocketEvents() or its replacement Sock::WaitMany(). The copy will automatically keep the ref count >0 during poll().

I will consider this more carefully.
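
The shared_ptr approach described above can be sketched with stand-in types. `MockSock`, `MockNode` and `SockSurvivesWhilePolled()` are hypothetical and only mimic the ownership behavior. The real Sock closes its descriptor in the destructor, while the mock merely sets a flag. The sketch is single-threaded, so the locking that real code would need around m_sock is left out.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for Sock: records when it is destroyed ("closed").
struct MockSock {
    explicit MockSock(bool& closed) : m_closed{closed} {}
    ~MockSock() { m_closed = true; }
    bool& m_closed;
};

// Stand-in for CNode, holding the socket through a shared_ptr.
struct MockNode {
    std::shared_ptr<MockSock> m_sock;
    void CloseSocketDisconnect() { m_sock.reset(); } // drops one reference
};

bool SockSurvivesWhilePolled()
{
    bool closed = false;
    MockNode node{std::make_shared<MockSock>(closed)};

    bool alive_during_poll = false;
    {
        // The poller takes copies, raising the reference count to 2.
        std::vector<std::shared_ptr<MockSock>> to_poll{node.m_sock};

        // Another thread disconnects the peer while the poll is pending.
        node.CloseSocketDisconnect();

        // The poller's copy still owns the object, so it is not destroyed yet.
        alive_during_poll = !closed;
    } // The last copy goes away here and the MockSock is destroyed.

    return alive_during_poll && closed;
}
```

The point of the design is that the socket is destroyed only once both the node and the poller have let go of it, in either order.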

vasild referenced this in commit 6a809ae907 on May 31, 2021

vasild commented at 10:19 am on May 31, 2021: member

maybe the following changes in #21878 would fix this bug elegantly…

Done. In addition to the above, CConnman::ListenSocket::sock also had to be changed from unique_ptr to shared_ptr because we feed those sockets to Sock::WaitMany() too.

This issue is now resolved by the commit “net: use Sock::WaitMany() instead of CConnman::SocketEvents()” in #21878.

laanwj closed this on Jun 16, 2022

janus referenced this in commit 65a5f1271b on Aug 4, 2022

DrahtBot locked this on Jun 16, 2023

CConnman::SocketEvents() may poll the wrong socket #21744