bugfix: Do not lock EventLoop::mutex after EventLoop is done #159

pull ryanofsky wants to merge 1 commits into bitcoin-core:master from ryanofsky:pr/macbug changing 2 files +10 −7
  1. ryanofsky commented at 4:24 am on February 10, 2025: collaborator

    Currently, there are two cases where code may attempt to lock EventLoop::mutex after the EventLoop::loop() method has finished running. This is not a problem by itself, but is a problem if there is external code which deletes the EventLoop object immediately after the method returns, which is the case in Bitcoin Core. Both cases of locking after the loop method returns happen due to uses of the Unlock() utility function which unlocks a mutex temporarily, runs a callback function and relocks it.

    The first case happens in EventLoop::removeClient when the EventLoop::done() condition is reached and it calls Unlock() in order to be able to call write(post_fd, ...) without blocking and wake the EventLoop thread. In this case, since EventLoop execution is done there is not really any point to using Unlock() and relocking, so the code is changed to use a simple lock.unlock() call and not try to relock.

    The second case happens in EventLoop::post where Unlock() is used in a similar way, and depending on thread scheduling (if the thread is delayed for a long time before relocking) can result in failing to relock EventLoop::m_mutex after calling write(). In this case, since relocking the mutex is actually necessary for the code that follows, the fix is different: new addClient/removeClient calls are added to this code, so the EventLoop::loop() function will never exit while the post() function is waiting.

    These two changes are being labeled as a bugfix even though there is not technically a bug here in the libmultiprocess code or API. The EventLoop object itself was safe before these changes as long as outside code waited for EventLoop methods to finish executing before deleting the EventLoop object. Originally the multiprocess code added in https://github.com/bitcoin/bitcoin/pull/10102 did this, but later as more features were added for binding and connecting to unix sockets, and unit tests were added which would immediately delete the EventLoop object after EventLoop::loop() returned, it meant this code could now start failing to relock after unlocking.

    A previous attempt was made to fix this bug in https://github.com/bitcoin/bitcoin/pull/31815 outside of libmultiprocess code. But it only addressed the first case of a failing Unlock() in EventLoop::removeClient(), not the second case in EventLoop::post().

    This first case described above was not observed but is theoretically possible. The second case happens intermittently on macos and was reported https://github.com/chaincodelabs/libmultiprocess/issues/154

  2. bugfix: Do not lock EventLoop::mutex after EventLoop is done
    Currently, there are two cases where code may attempt to lock
    `EventLoop::mutex` after the `EventLoop::loop()` method has finished running.
    This is not a problem by itself, but is a problem if there is external code
    which deletes the `EventLoop` object immediately after the method returns,
    which is the case in Bitcoin Core. Both cases of locking after the loop method
    returns happen due to uses of the `Unlock()` utility function which unlocks a
    mutex temporarily, runs a callback function and relocks it.
    
    The first case happens in `EventLoop::removeClient` when the
    `EventLoop::done()` condition is reached and it calls `Unlock()` in order to be
    able to call `write(post_fd, ...)` without blocking and wake the EventLoop
    thread. In this case, since `EventLoop` execution is done there is not really
    any point to using `Unlock()` and relocking, so the code is changed to use a
    simple `lock.unlock()` call and not try to relock.
    
    The second case happens in `EventLoop::post` where `Unlock()` is used in a
    similar way, and depending on thread scheduling (if the thread is delayed for a
    long time before relocking) can result in failing to relock
    `EventLoop::m_mutex` after calling `write()`. In this case, since relocking the
    mutex is actually necessary for the code that follows, the fix is different:
    new `addClient`/`removeClient` calls are added to this code, so the
    `EventLoop::loop()` function will never exit while the `post()` function is
    waiting.
    
    These two changes are being labeled as a bugfix even though there is not
    technically a bug here in the libmultiprocess code or API. The `EventLoop`
    object itself was safe before these changes as long as outside code waited for
    `EventLoop` methods to finish executing before deleting the `EventLoop` object.
    Originally the multiprocess code added in
    https://github.com/bitcoin/bitcoin/pull/10102 did this, but later as more
    features were added for binding and connecting to unix sockets, and unit tests
    were added which would immediately delete the `EventLoop` object after
    `EventLoop::loop()` returned, it meant this code could now start failing to
    relock after unlocking.
    
    A previous attempt was made to fix this bug in
    https://github.com/bitcoin/bitcoin/pull/31815 outside of libmultiprocess code.
    But it only addressed the first case of a failing `Unlock()` in
    `EventLoop::removeClient()`, not the second case in `EventLoop::post()`.
    
    This first case described above was not observed but is theoretically possible.
    The second case happens intermittently on macos and was reported
    https://github.com/chaincodelabs/libmultiprocess/issues/154
    48d01bcca7
  3. ryanofsky merged this on Feb 10, 2025
  4. ryanofsky closed this on Feb 10, 2025

  5. ryanofsky referenced this in commit 3a95817ece on Feb 10, 2025
  6. ryanofsky commented at 5:28 am on February 10, 2025: collaborator
    #160 contains some followups to try to make this code a little safer and easier to maintain.
  7. Sjors referenced this in commit 6aabfcb615 on Feb 10, 2025
  8. Sjors referenced this in commit 1746618e08 on Feb 13, 2025
  9. ryanofsky referenced this in commit 83e40d3b52 on Feb 24, 2025
  10. ryanofsky referenced this in commit 8619f03ec2 on Feb 24, 2025
  11. ryanofsky referenced this in commit cbb7b41c20 on Feb 24, 2025
  12. fanquake referenced this in commit 01f7715766 on Feb 25, 2025
  13. fanquake referenced this in commit ba0a4391ff on Feb 25, 2025
  14. ryanofsky referenced this in commit 258a617c1e on Jun 19, 2025
  15. janus referenced this in commit 86cb86b050 on Sep 1, 2025


ryanofsky


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/libmultiprocess. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-04 19:30 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me