[deleted] #158

pull ryanofsky wants to merge 1 commits into bitcoin-core:master from ryanofsky:pr/macbug changing 2 files +10 −7
  1. ryanofsky commented at 4:14 am on February 10, 2025: collaborator
    [deleted]
  2. bugfix: Do not lock EventLoop::mutex after EventLoop is done
    Currently, there are two cases where code may attempt to lock
    `EventLoop::mutex` after the `EventLoop::loop()` method has finished running.
    This is not a problem by itself, but is a problem if there is external code
    which deletes the `EventLoop` object immediately after the method returns,
    which is the case in Bitcoin Core. Both cases of locking after the loop method
    returns happen due to uses of the `Unlock()` utility function which unlocks a
    mutex temporarily, runs a callback function and relocks it.
    
    The first case happens in `EventLoop::removeClient` when the
    `EventLoop::done()` condition is reached and it calls `Unlock()` in order to be
    able to call `write(post_fd, ...)` without blocking and wake the EventLoop
    thread. In this case, since `EventLoop` execution is done there is not really
    any point to using `Unlock()` and relocking, so the code is changed to use a
    simple `lock.unlock()` call and not try to relock.
    
    The second case happens in `EventLoop::post` where `Unlock()` is used in a
    similar way, and depending on thread scheduling (if the thread is delayed for a
    long time before relocking) can result in failing to relock
    `EventLoop::m_mutex` after calling `write()`. In this case, since relocking the
    mutex is actually necessary for the code that follows, the fix is different:
    new `addClient`/`removeClient` calls are added to this code, so the
    `EventLoop::loop()` function will never exit while the `post()` function is
    waiting.
    
    These two changes are being labeled as a bugfix even though there is not
    technically a bug here in the libmultiprocess code or API. The `EventLoop`
    object itself was safe before these changes as long as outside code waited for
    `EventLoop` methods to finish executing before deleting the `EventLoop` object.
    Originally the multiprocess code added in
    https://github.com/bitcoin/bitcoin/pull/10102 did this, but later as more
    features were added for binding and connecting to unix sockets, and unit tests
    were added which would immediately delete the `EventLoop` object after
    `EventLoop::loop()` returned, it meant this code could now start failing to
    relock after unlocking.
    
    A previous attempt was made to fix this bug in
    https://github.com/bitcoin/bitcoin/pull/31815 outside of libmultiprocess code.
    But it only addressed the first case of a failing `Unlock()` in
    `EventLoop::removeClient()`, not the second case in `EventLoop::post()`.
    
    This first case described above was not observed but is theoretically possible.
    The second case happens intermittently on macos and was reported
    https://github.com/chaincodelabs/libmultiprocess/issues/154
    f8cbd0d34a
  3. ryanofsky merged this on Feb 10, 2025
  4. ryanofsky closed this on Feb 10, 2025

  5. ryanofsky renamed this:
    bugfix: Do not lock EventLoop::mutex after EventLoop is done
    [deleted]
    on Feb 10, 2025


ryanofsky


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/libmultiprocess. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-04 19:30 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me