CConnman::SocketEvents() may poll the wrong socket #21744

issue vasild openend this issue on April 21, 2021
  1. vasild commented at 3:13 pm on April 21, 2021: member

    Adverse scenario:

    1. thread “net”
    0CConnman::ThreadSocketHandler()
    1  CConnman::SocketHandler()
    2    CConnman::SocketEvents()
    3      CConnman::GenerateSelectSet()
    4        Cycles through CConnman::vNodes and remembers some sockets to
    5        be polled for readiness later by CConnman::SocketEvents().
    6        Lets say socket (file descriptor) 10 is remembered.
    
    1. thread “msghand”
    0CConnman::ThreadMessageHandler()
    1  PeerManagerImpl::ProcessMessages()
    2    PeerManagerImpl::ProcessMessage()
    3      PeerManagerImpl::PushNodeVersion()
    4        CConnman::PushMessage()
    5          CConnman::SocketSendData()
    6            CNode::CloseSocketDisconnect()
    7              Closes socket 10.
    
    1. thread “opencon”
    0CConnman::ThreadOpenConnections()
    1  CConnman::OpenNetworkConnection()
    2    CConnman::ConnectNode()
    3      CreateSock()
    4        Creates a new socket, reusing file descriptor 10.
    
    1. thread “net”
    0CConnman::ThreadSocketHandler()
    1  CConnman::SocketHandler()
    2    CConnman::SocketEvents()
    3      polls the remembered socket 10, which corresponds to a different
    4      connection now (bug)
    

    As a result the following logic may not hold:

    https://github.com/bitcoin/bitcoin/blob/e16f8720dca2de1040478968c9f3ca07644a04b7/src/net.cpp#L1306-L1315

    I think the severity of this is low and that it has a low chance of happening. However the pattern of “remember a socket by its file descriptor number and use it later, concurrently with other threads that might close it” better be avoided. That pattern could lead to closing or writing to the wrong socket which would be more severe.

  2. vasild added the label Bug on Apr 21, 2021
  3. vasild commented at 9:30 am on May 12, 2021: member

    This is surprisingly difficult to fix (without doing huge amount of changes). I see two ways to address it:

    1. Hold the socket mutex while poll()ing, so that other threads that want to close the socket would have to wait for the poll() to finish. Not acceptable IMO.
    2. Lazy close with reference counting. Increment the reference count of the socket during poll(). Whenever a thread wants to close the socket (disconnect a peer) it should disable the socket for send/recv (e.g. shutdown(2)) but not close(2) it if reference count is >0 and somebody else should close all disabled sockets with reference count of 0.
  4. practicalswift commented at 9:43 am on May 12, 2021: contributor

    @vasild

    Interesting find! Thanks for reporting.

    May I ask how you found this issue?

    Was it found as part of the Sock encapsulation project, or a related fuzzing find? :)

    Impressive find either way.

  5. vasild commented at 10:14 am on May 12, 2021: member

    Was it found as part of the Sock encapsulation project…

    Correct! :)

    In #21878, while replacing SOCKET with Sock in CConnman::SocketEvents() one option was to keep pointers to the Sock objects and close the sockets by destroying those objects, but (because of this bug) this would have ended up with dangling pointers, pointing to possibly-deleted-by-other-threads objects.

    This is the reason that PR keeps the Sock objects alive and calls Sock::Reset() from CNode::CloseSocketDisconnect() instead of something like delete m_sock.

  6. vasild commented at 10:27 am on May 12, 2021: member

    Hmm, maybe the following changes in #21878 would fix this bug elegantly (doing the lazy close I mentioned above):

    • Change CNode::m_sock from std::unique_ptr<Sock> to std::shared_ptr<Sock>.
    • In CNode::CloseSocketDisconnect() do m_sock.reset() (reducing the ref count stored in the shared_ptr and destroying the Sock object if ref count is 0).
    • Take a copy of std::shared_ptr<Sock> in CConnman::SocketEvents() or its replacement Sock::WaitMany(). The copy will automatically keep the ref count >0 during poll().

    I will consider this more carefully.

  7. vasild referenced this in commit 6a809ae907 on May 31, 2021
  8. vasild commented at 10:19 am on May 31, 2021: member

    maybe the following changes in #21878 would fix this bug elegantly…

    Done. In addition to the above, CConnman::ListenSocket::sock also had to be changed from unique_ptr to shared_ptr because we feed those sockets to Sock::WaitMany() too.

    This issue is now resolved by commit net: use Sock::WaitMany() instead of CConnman::SocketEvents() in #21878.

  9. vasild referenced this in commit 3e2b3eb10d on May 31, 2021
  10. vasild referenced this in commit 1770567602 on Jun 3, 2021
  11. vasild referenced this in commit 6a8c8c110a on Jun 22, 2021
  12. vasild referenced this in commit c3acb68b2c on Aug 24, 2021
  13. vasild referenced this in commit ca7a5053ab on Aug 27, 2021
  14. vasild referenced this in commit e04d8508ec on Aug 30, 2021
  15. vasild referenced this in commit 538152fa46 on Sep 23, 2021
  16. vasild referenced this in commit f6e3c597bc on Sep 27, 2021
  17. vasild referenced this in commit 26b58387b3 on Sep 28, 2021
  18. vasild referenced this in commit 888eea8c79 on Sep 28, 2021
  19. vasild referenced this in commit 3a8d37ac64 on Oct 8, 2021
  20. vasild referenced this in commit fa8cdbf2d5 on Oct 25, 2021
  21. vasild referenced this in commit 72b1685219 on Nov 10, 2021
  22. vasild referenced this in commit 89dbf42921 on Nov 12, 2021
  23. vasild referenced this in commit 0ee3b6b108 on Nov 18, 2021
  24. vasild referenced this in commit a21f07c76b on Nov 25, 2021
  25. vasild referenced this in commit d7ad4f72e5 on Nov 26, 2021
  26. vasild referenced this in commit aa77871d72 on Nov 26, 2021
  27. vasild referenced this in commit 0a6920fda8 on Dec 1, 2021
  28. vasild referenced this in commit 86fe131893 on Dec 7, 2021
  29. vasild referenced this in commit 56ea9160c3 on Dec 15, 2021
  30. vasild referenced this in commit 22a60e9f39 on Jan 6, 2022
  31. vasild referenced this in commit 505d008c77 on Feb 11, 2022
  32. vasild referenced this in commit af45eeea03 on Feb 15, 2022
  33. vasild referenced this in commit c9420b3cce on Mar 10, 2022
  34. vasild referenced this in commit eeded18a64 on Apr 15, 2022
  35. vasild referenced this in commit 9cfa1a7d1c on Apr 19, 2022
  36. vasild referenced this in commit ca8dcfabb7 on Apr 27, 2022
  37. vasild referenced this in commit 933068a0ca on Apr 27, 2022
  38. vasild referenced this in commit 6747729cb8 on May 19, 2022
  39. vasild referenced this in commit f0fc63894f on May 27, 2022
  40. vasild referenced this in commit 358dab76aa on Jun 9, 2022
  41. vasild referenced this in commit 6e68ccbefe on Jun 9, 2022
  42. laanwj closed this on Jun 16, 2022

  43. janus referenced this in commit 65a5f1271b on Aug 4, 2022
  44. DrahtBot locked this on Jun 16, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me