util: Make LockDirectory thread-safe, consistent, and fix OpenBSD 6.2 build #12422

pull laanwj wants to merge 2 commits into bitcoin:master from laanwj:2018_01_openbsd_util_fix changing 3 files +165 −6
  1. laanwj commented at 11:17 am on February 13, 2018: member

    Wrap the boost::interprocess::file_lock in a std::unique_ptr inside the map that keeps track of per-directory locks.

    This fixes a build issue with the clang 4.0.0+boost-1.58.0p8 version combo on OpenBSD 6.2, and should have no effect otherwise.

    Also add a unit test, make the function thread-safe, and fix Linux versus Windows behavior inconsistency.

    Meant to fix #12413.

  2. laanwj added the label Linux/Unix on Feb 13, 2018
  3. laanwj added the label Utils/log/libs on Feb 13, 2018
  4. laanwj renamed this:
    util: Use unique_ptr in locks map
    util: Use unique_ptr in locks map (fix OpenBSD 6.2 build)
    on Feb 13, 2018
  5. laanwj added this to the milestone 0.16.0 on Feb 13, 2018
  6. laanwj requested review from meshcollider on Feb 13, 2018
  7. laanwj commented at 11:46 am on February 13, 2018: member

    @MeshCollider A question about this code - what is supposed to be the semantics of LockDirectory if it’s called with the same directory multiple times? I understand that the map is just to hold on to the per-directory lock, but right now, if it happens to be called multiple times it will try to re-lock the same lock again*. We don’t check if it already exists, and emplace will return the current object with that key.

    According to the documentation http://www.boost.org/doc/libs/master/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.file_lock

    Effects: The calling thread tries to acquire exclusive ownership of the file lock without waiting. If no other thread has exclusive or sharable ownership of the file lock, this succeeds.

    The result of that will depend if the lock was taken in the same thread (? I’d expect process, in the case of interprocess locks).

    I think the intent of the function is to return true if the process acquired the lock, or already had it, and false if it didn’t have the lock already and failed to acquire it? Correct?

    * Or will it? or will it first try to construct a new boost::interprocess::file_lock>(pathLockFile.string().c_str()) with the directory, then discard it when it cannot be inserted? (will that constructor raise an exception if the lock already exists?)

    Edit: I’ve tested this, and it seems

    • a second LockDirectory with the same directory and lock file name, from the same thread, returns true.
    • a second LockDirectory with the same directory and lock file name, from another thread within the same process, returns true.
  8. laanwj commented at 11:53 am on February 13, 2018: member
    Also (as noted by goatpig on IRC) this function is currently not thread-safe. That’s no issue right now for 0.16, I think, as this is only used by the init thread at init time, but will be with dynamic loading of wallets.
  9. laanwj requested review from jnewbery on Feb 13, 2018
  10. meshcollider approved
  11. meshcollider commented at 12:12 pm on February 13, 2018: contributor

    I think the intent of the function is to return true if the process acquired the lock, or already had it, and false if it didn’t have the lock already and failed to acquire it? Correct?

    Yep, exactly.

    A bit of background: when only a single directory was being locked (i.e. when the datadir locking function was separate to the walletdir locking function), the lock was declared as static. But because of the abstraction into a more general directory locking function, @ryanofsky suggested changing it to a map of locks, one per directory. Didn’t realize this would cause issues, the proposed change looks good to me. utACK https://github.com/bitcoin/bitcoin/pull/12422/commits/dce11b86dd4df76167222235c79c178cf6f2a661

  12. laanwj removed review request from jnewbery on Feb 13, 2018
  13. laanwj requested review from ryanofsky on Feb 13, 2018
  14. laanwj force-pushed on Feb 13, 2018
  15. laanwj commented at 1:13 pm on February 13, 2018: member
    @MeshCollider I’ve added a unit test, test_LockDirectory. Can you please verify I’m testing the intended behavior of the function?
  16. laanwj force-pushed on Feb 13, 2018
  17. laanwj force-pushed on Feb 13, 2018
  18. laanwj commented at 2:02 pm on February 13, 2018: member

    So this is interesting: my test already found a divergence between the behavior on Linux and Windows.

    On Linux,

    • Another lock on the directory from the same thread succeeds
    • Another lock on the directory from a different thread within the same process succeeds

    On Windows, both cases fail:

    0test/util_tests.cpp(655): error: in "util_tests/test_LockDirectory": check LockDirectory(dirname, LOCKNAME) == true has failed [false != true]
    1test/util_tests.cpp(661): error: in "util_tests/test_LockDirectory": check threadresult == true has failed [false != true]
    
  19. ryanofsky commented at 3:17 pm on February 13, 2018: member

    On Windows, both cases fail:

    I guess this is due to inconsistent behavior of try_lock. Probably best not to rely on this behavior. Maybe we should replace:

    0lock = locks.emplace(...).first->second;
    1if (!lock->try_lock()) {
    2    return false;
    3}
    

    with something like:

    0auto inserted = locks.emplace(...);
    1if (inserted.second && !inserted.first.second->try_lock()) {
    2    locks.erase(inserted.first)
    3    return false;
    4}
    
  20. laanwj commented at 3:25 pm on February 13, 2018: member

    I guess this is due to inconsistent behavior of try_lock. Probably best not to rely on this behavior. Maybe we should replace:

    I agree - I think we shouldn’t insert locks that aren’t held into the map at all. This makes it possible to use ‘already in map’ as early-out.

    While implementing and testing this, I ran into something really frustrating, I found the FILE* file = fsbridge::fopen(pathLockFile, "a"); destroys the lock if it already exists! (but apparently, only if the current process holds it, not if another does, so it wipes the ‘we own this lock’ administration)

  21. laanwj commented at 4:00 pm on February 13, 2018: member

    Pushed a new commit, which hopefully fixes the tests on windows:

    util: Fix multiple use of LockDirectory

    This commit fixes problems with calling LockDirectory multiple times on the same directory, or from multiple threads.

    • Protect the locks map using a mutex
    • Make sure that only locks that are successfully acquired (and when not probing) are inserted in the map
    • Open the lock file for appending only if we know we don’t have the lock yet - The FILE* file = fsbridge::fopen(pathLockFile, "a"); (reopening the file) wipes the ‘we own this lock’ administration.
  22. in src/util.cpp:402 in 0a671a733c outdated
    402+        if (!lock->try_lock()) {
    403             return false;
    404         }
    405         if (probe_only) {
    406-            lock.unlock();
    407+            lock->unlock();
    


    laanwj commented at 4:08 pm on February 13, 2018:
    It think this explicit unlock is unnecessary now, as the lock will fall out of scope when locks.emplace is not called to move it to the map.
  23. in src/util.cpp:384 in cb80237c75 outdated
    379+    // successful locking, these will be held here until the global
    380+    // destructor cleans them up and thus automatically unlocks them.
    381+    static std::map<std::string, std::unique_ptr<boost::interprocess::file_lock>> locks;
    382+    // Protect the map with a mutex
    383+    static std::mutex cs;
    384+    std::unique_lock<std::mutex> ulock(cs);
    


    bpay commented at 6:58 pm on February 13, 2018:
    std::lock_guard should suffice for this
  24. in src/test/util_tests.cpp:612 in ca4c802ac0 outdated
    606@@ -603,4 +607,71 @@ BOOST_AUTO_TEST_CASE(test_ParseFixedPoint)
    607     BOOST_CHECK(!ParseFixedPoint("1.", 8, &amount));
    608 }
    609 
    610+static const std::string LOCKNAME = ".lock";
    611+
    612+static void TestOtherThread(fs::path dirname, bool *result)
    


    ryanofsky commented at 10:46 pm on February 13, 2018:
    Maybe define TestOtherThread as a lambda instead of an external function. TestOther code would seem easier to understand in the context where it launches instead of out here. Also this would allow test case to be self contained and not need external LOCKNAME/TestOther declarations.

    laanwj commented at 7:08 am on February 14, 2018:
    Conceptually I do prefer (especially) the TestOtherProcess code to be in a self-contained function instead of in-line, because the stuff runs in a separate process. Also making the separate-process test work on windows (not going to do so in this pull) would involve some factoring in that direction anyhow. If the LOCKNAME constant is a problem it could be passed in as parameter.

    laanwj commented at 7:26 am on February 14, 2018:

    But yeah I do see the argument for making it self-contained, too. Really not sure here…

    Edit: however, unlike for the thread, inlining TestOtherProcess won’t put the code in a context where it’s easier to understand, but at the beginning of the function in fork() else.

  25. ryanofsky commented at 10:53 pm on February 13, 2018: member
    utACK ca4c802ac05c651559ca23c6f614c50b5f161a92. Code looks good though obviously can be squashed. Left minor comment but feel free to ignore it (or complain about it, as may be your wont).
  26. laanwj renamed this:
    util: Use unique_ptr in locks map (fix OpenBSD 6.2 build)
    util: Make LockDirectory thread-safe, consistent, and fix OpenBSD 6.2 build
    on Feb 14, 2018
  27. laanwj force-pushed on Feb 14, 2018
  28. laanwj commented at 7:48 am on February 14, 2018: member

    added commit:

    • e548d69: lockname is a parameter to inner functions instead of a constant

    squashed e548d69 2018_01_openbsd_util_fix_v0 -> 6a0a3d7

  29. laanwj commented at 8:11 am on February 14, 2018: member

    Hmm though it compiles, the new test is failing on OpenBSD, which was the original goal of this PR. Talking about scope drift. Will investigate :)

    0unknown location(0): fatal error in "test_LockDirectory": signal: generated by kill() (or family); uid=0; pid=0
    1test/util_tests.cpp(666): last checkpoint
    

    Fixed: this had to do with different handling of SIGCHLD on BSD versus Linux.

  30. in src/util.cpp:390 in 2207709e6e outdated
    386 {
    387+    std::lock_guard<std::mutex> ulock(cs_dir_locks);
    388     fs::path pathLockFile = directory / lockfile_name;
    389-    FILE* file = fsbridge::fopen(pathLockFile, "a"); // empty lock file; created if it doesn't exist.
    390+
    391+    // If a lock for this directory already exists in the map, don't try to re-lock it
    


    theuni commented at 9:14 pm on February 14, 2018:

    Maybe note that this just working around the lack of std::map::try_emplace, which was added in c++17.

    Edit: Heh, they even use this exact construction as an example:

    Unlike insert or emplace, these functions do not move from rvalue arguments if the insertion does not happen, which makes it easy to manipulate maps whose values are move-only types, such as std::map<std::string, std::unique_ptr>.


    laanwj commented at 5:02 pm on February 15, 2018:
    I’m a bit confused here, to be honest. Does try_emplace avoid constructing the object entirely if it already exists in the map?
  31. in src/util.cpp:397 in 2207709e6e outdated
    393+        return true;
    394+    }
    395+
    396+    // Create empty lock file if it doesn't exist.
    397+    FILE* file = fsbridge::fopen(pathLockFile, "a");
    398     if (file) fclose(file);
    


    theuni commented at 1:16 am on February 15, 2018:
    May as well bail early if the fopen fails, we’re just going to end up in a boost exception.

    laanwj commented at 9:21 am on February 15, 2018:
    I played with that thought, but decided against it - that would just add extra code for an error case that is handled by boost, later?
  32. in src/util.cpp:402 in 2207709e6e outdated
    401-        static std::map<std::string, boost::interprocess::file_lock> locks;
    402-        boost::interprocess::file_lock& lock = locks.emplace(pathLockFile.string(), pathLockFile.string().c_str()).first->second;
    403-        if (!lock.try_lock()) {
    404+        auto lock = MakeUnique<boost::interprocess::file_lock>(pathLockFile.string().c_str());
    405+        if (!lock->try_lock()) {
    406             return false;
    


    theuni commented at 1:18 am on February 15, 2018:
    I realize it isn’t a regression, but this really needs to be logged.

    laanwj commented at 9:19 am on February 15, 2018:
    In both cases it’s used, it’s logged in the caller function, so I think that is redundant?
  33. in src/util.cpp:401 in 2207709e6e outdated
    400     try {
    401-        static std::map<std::string, boost::interprocess::file_lock> locks;
    402-        boost::interprocess::file_lock& lock = locks.emplace(pathLockFile.string(), pathLockFile.string().c_str()).first->second;
    403-        if (!lock.try_lock()) {
    404+        auto lock = MakeUnique<boost::interprocess::file_lock>(pathLockFile.string().c_str());
    405+        if (!lock->try_lock()) {
    


    theuni commented at 1:20 am on February 15, 2018:
    In the probe_only case, this won’t be unlocked when lock destructs, as interprocess::file_lock isn’t RAII. Was it intended to work that way?

    laanwj commented at 9:17 am on February 15, 2018:

    Eh are you sure? the whole intent of keeping the things in a std::map is that they get unlocked when they are destructed. Anecdotally this seems to work on windows, at least, the call to ReleaseDirectoryLocks() seems to release the locks. I can try further.

    Edit: shouldn’t -daemon be broken if true? It’s what uses probe_only to probe (acquire and release) the lock from the parent process, before getting it in the child process. At least on Linux it seems to work…


    laanwj commented at 10:31 am on February 15, 2018:

    The documentation is confusing here (http://www.boost.org/doc/libs/1_66_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.file_lock):

    A file locking is a class that has process lifetime. This means that if a process holding a file lock ends or crashes, the operating system will automatically unlock it. This feature is very useful in some situations where we want to assure automatic unlocking even when the process crashes and avoid leaving blocked resources in the system. A file lock is constructed using the name of the file as an argument:

    This would suggest we don’t even need to hold on to the file_lock after locking them, as the locks always have process lifetime. But the test outcomes seem to suggest RAII behavior.


    theuni commented at 11:58 am on February 15, 2018:

    Grr, the “theuni requested changes” is a bit overstated… There are a few fixes here and I can’t produce any real-world issues, so utACK from me for 0.16 for the sake of not dragging it out.


    Further discussion for master, or 0.16 if you still feel like messing with it:

    Strange, your tests results (RAII behavior) don’t line up with what I see from the docs/code. I’ll play with tests.

    You mentioned:

    The FILE* file = fsbridge::fopen(pathLockFile, "a"); wipes the ‘we own this lock’ administration

    And from the Boost docs:

    In POSIX, when two file descriptors are used to lock a file if a descriptor is closed, all file locks set by the calling process are cleared.

    Is it possible that it was the fclose rather than the fopen? Because that would explain everything neatly, I think. In the probe_only case, You create a lock, lock it, then destroy it. Then on the next invocation you fopen/fclose it, which basically gives a clean start before you create a new lock, lock again, etc.

    To be clear, my concern is that in the near future we might adapt LockDirectory() and add UnlockDirectory() for some feature like multi-wallet-dirs, only to be hit with a hard-to-diagnose bug when UnlockDirectory() quietly doesnt’ work as intended.

    The boost docs suggest using RAII generics here:

    scoped_lock and sharable_lock can be used to make file locking easier in the presence of exceptions, just like with mutexes.

    And file_lock’s destructor is pretty straightforward, I don’t see how it could be doing any unlocking:

    0// m_file_hnd: typedef int (void* for win)
    1// ipcdetail::close_file: ::close (CloseHandle() for win)
    2inline file_lock::~file_lock()
    3{
    4   if(m_file_hnd != ipcdetail::invalid_file()){
    5      ipcdetail::close_file(m_file_hnd);
    6      m_file_hnd = ipcdetail::invalid_file();
    7   }
    8}
    

    laanwj commented at 12:05 pm on February 15, 2018:

    Is it possible that it was the fclose rather than the fopen? Because that would explain everything neatly, I think. In the probe_only case, You create a lock, lock it, then destroy it. Then on the next invocation you fopen/fclose it, which basically gives a clean start before you create a new lock, lock again, etc.

    Whoa, sneaky. Yes, that’s possible. Let’s try the tests that by using an already-existing directory with an existing lock file, without that fopen part.

  34. theuni referenced this in commit aa04588d9c on Feb 15, 2018
  35. theuni changes_requested
  36. theuni commented at 1:47 am on February 15, 2018: member
  37. meshcollider commented at 2:05 am on February 15, 2018: contributor

    I’ve added a unit test, test_LockDirectory. Can you please verify I’m testing the intended behavior of the function?

    LGTM, yep. Ugh OS inconsistencies make things like this so fun. Code changes look good though. cfields changes too, utACK

  38. laanwj commented at 9:49 am on February 15, 2018: member
    Now that I had to add ReleaseDirectoryLocks() for testing anyhow, I’m going to extend the cross-process test to see if locks are given up as expected when the map is cleared, or when using probe-only.
  39. laanwj commented at 10:18 am on February 15, 2018: member
    In the new commit I added unit tests for lock probing, to see if ReleaseDirectoryLocks() succesfully releases the lock, and whether exiting the child prices releases the lock. At least locally this all passes, let’s see what Travis makes of it.
  40. theuni approved
  41. laanwj commented at 12:38 pm on February 15, 2018: member

    Whoa, sneaky. Yes, that’s possible. Let’s try the tests that by using an already-existing directory with an existing lock file, without that fopen part.

    To check this, I applied the following patch, removing all manual file and directory handling:

     0diff --git a/src/test/util_tests.cpp b/src/test/util_tests.cpp
     1index 4b2da3e..80d76e6 100644
     2--- a/src/test/util_tests.cpp
     3+++ b/src/test/util_tests.cpp
     4@@ -647,7 +647,7 @@ static void TestOtherProcess(fs::path dirname, std::string lockname, int fd)
     5 
     6 BOOST_AUTO_TEST_CASE(test_LockDirectory)
     7 {
     8-    fs::path dirname = fs::temp_directory_path() / fs::unique_path();
     9+    fs::path dirname("/tmp/locktest");
    10     const std::string lockname = ".lock";
    11 #ifndef WIN32
    12     // Revert SIGCHLD to default, otherwise boost.test will catch and fail on
    13@@ -667,10 +667,6 @@ BOOST_AUTO_TEST_CASE(test_LockDirectory)
    14     }
    15     BOOST_CHECK_EQUAL(close(fd[0]), 0); // Parent: close child end
    16 #endif
    17-    // Lock on non-existent directory should fail
    18-    BOOST_CHECK_EQUAL(LockDirectory(dirname, lockname), false);
    19-
    20-    fs::create_directories(dirname);
    21 
    22     // Probing lock on new directory should succeed
    23     BOOST_CHECK_EQUAL(LockDirectory(dirname, lockname, true), true);
    24@@ -730,7 +726,6 @@ BOOST_AUTO_TEST_CASE(test_LockDirectory)
    25 #endif
    26     // Clean up
    27     ReleaseDirectoryLocks();
    28-    fs::remove_all(dirname);
    29 }
    30 
    31 BOOST_AUTO_TEST_SUITE_END()
    32diff --git a/src/util.cpp b/src/util.cpp
    33index dcf7ed3..aa29a04 100644
    34--- a/src/util.cpp
    35+++ b/src/util.cpp
    36@@ -392,10 +392,6 @@ bool LockDirectory(const fs::path& directory, const std::string lockfile_name, b
    37         return true;
    38     }
    39 
    40-    // Create empty lock file if it doesn't exist.
    41-    FILE* file = fsbridge::fopen(pathLockFile, "a");
    42-    if (file) fclose(file);
    43-
    44     try {
    45         auto lock = MakeUnique<boost::interprocess::file_lock>(pathLockFile.string().c_str());
    46         if (!lock->try_lock()) {
    

    Then manually created the lock directory and file:

    0mkdir /tmp/locktest && touch /tmp/locktest/.lock
    

    Result:

    0$ test/test_bitcoin --run_test=util_tests/test_LockDirectory
    1Running 1 test case...
    2
    3*** No errors detected
    

    … yep, it just works. Curious.

    There are a few fixes here and I can’t produce any real-world issues, so utACK from me for 0.16 for the sake of not dragging it out.

    If you can’t find a problem with my methodology above, I agree, let’s get back to this discussion after 0.16.

  42. Sjors commented at 1:19 pm on February 15, 2018: member
    make check is still happy on MacOS (46d46323).
  43. laanwj added the label Needs backport on Feb 15, 2018
  44. util: Fix multiple use of LockDirectory
    This commit fixes problems with calling LockDirectory multiple times on
    the same directory, or from multiple threads. It also fixes the build on
    OpenBSD.
    
    - Wrap the boost::interprocess::file_lock in a std::unique_ptr inside
      the map that keeps track of per-directory locks. This fixes a build
      issue with the clang 4.0.0+boost-1.58.0p8 version combo on OpenBSD
      6.2, and should have no observable effect otherwise.
    
    - Protect the locks map using a mutex.
    
    - Make sure that only locks that are successfully acquired are inserted
      in the map.
    
    - Open the lock file for appending only if we know we don't have the
      lock yet - The `FILE* file = fsbridge::fopen(pathLockFile, "a");`
      wipes the 'we own this lock' administration, likely because it opens
      a new fd for the locked file then closes it.
    fc888bfcac
  45. test: Add unit test for LockDirectory
    Add a unit test for LockDirectory, introduced in #11281.
    1d4cbd26e4
  46. laanwj force-pushed on Feb 15, 2018
  47. laanwj commented at 3:27 pm on February 15, 2018: member
    squashed into two commits (one that makes the changes to LockDatadirectory, one that adds the unit test) 46d4632 2018_01_openbsd_util_fix_v11d4cbd2
  48. laanwj merged this on Feb 15, 2018
  49. laanwj closed this on Feb 15, 2018

  50. laanwj referenced this in commit 58715f6d07 on Feb 15, 2018
  51. laanwj referenced this in commit 32a726846d on Feb 15, 2018
  52. laanwj referenced this in commit 4d54e7ad41 on Feb 15, 2018
  53. laanwj removed the label Needs backport on Feb 15, 2018
  54. HashUnlimited referenced this in commit 20b1dca709 on Mar 16, 2018
  55. HashUnlimited referenced this in commit deb02e491b on Mar 16, 2018
  56. ccebrecos referenced this in commit 89e5863863 on Sep 14, 2018
  57. ccebrecos referenced this in commit 2ecd9297dd on Sep 14, 2018
  58. PastaPastaPasta referenced this in commit debd209c46 on Apr 3, 2020
  59. PastaPastaPasta referenced this in commit 40e4a1a45c on Apr 3, 2020
  60. ckti referenced this in commit 65e535a7d3 on Mar 28, 2021
  61. gades referenced this in commit 78ffe97fcd on Jun 30, 2021
  62. random-zebra referenced this in commit 61a098a775 on Aug 5, 2021
  63. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-02-22 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me