Currently, there are two cases where code may attempt to lock EventLoop::mutex after the EventLoop::loop() method has finished running. This is not a problem by itself, but is a problem if there is external code which deletes the EventLoop object immediately after the method returns, which is the case in Bitcoin Core. Both cases of locking after the loop method returns happen due to uses of the Unlock() utility function which unlocks a mutex temporarily, runs a callback function and relocks it.
The first case happens in EventLoop::removeClient when the EventLoop::done() condition is reached and it calls Unlock() in order to be able to call write(post_fd, ...) without blocking and wake the EventLoop thread. In this case, since EventLoop execution is done there is not really any point to using Unlock() and relocking, so the code is changed to use a simple lock.unlock() call and not try to relock.
The second case happens in EventLoop::post where Unlock() is used in a similar way, and depending on thread scheduling (if the thread is delayed for a long time before relocking) can result in failing to relock EventLoop::m_mutex after calling write(). In this case, since relocking the mutex is actually necessary for the code that follows, the fix is different: new addClient/removeClient calls are added to this code, so the EventLoop::loop() function will never exit while the post() function is waiting.
These two changes are being labeled as a bugfix even though there is not technically a bug here in the libmultiprocess code or API. The EventLoop object itself was safe before these changes as long as outside code waited for EventLoop methods to finish executing before deleting the EventLoop object. Originally the multiprocess code added in
https://github.com/bitcoin/bitcoin/pull/10102 did this, but later as more features were added for binding and connecting to unix sockets, and unit tests were added which would immediately delete the EventLoop object after EventLoop::loop() returned, it meant this code could now start failing to relock after unlocking.
A previous attempt was made to fix this bug in
https://github.com/bitcoin/bitcoin/pull/31815 outside of libmultiprocess code. But it only addressed the first case of a failing Unlock() in EventLoop::removeClient(), not the second case in EventLoop::post().
This first case described above was not observed but is theoretically possible. The second case happens intermittently on macos and was reported https://github.com/chaincodelabs/libmultiprocess/issues/154