util: Make LockDirectory thread-safe, consistent, and fix OpenBSD 6.2 build

laanwj commented at 11:17 am on February 13, 2018: member

Wrap the boost::interprocess::file_lock in a std::unique_ptr inside the map that keeps track of per-directory locks.

This fixes a build issue with the clang 4.0.0+boost-1.58.0p8 version combo on OpenBSD 6.2, and should have no effect otherwise.

Also add a unit test, make the function thread-safe, and fix Linux versus Windows behavior inconsistency.

Meant to fix #12413.

laanwj added the label Linux/Unix on Feb 13, 2018

laanwj added the label Utils/log/libs on Feb 13, 2018

laanwj renamed this:
~~util: Use unique_ptr in locks map~~
util: Use unique_ptr in locks map (fix OpenBSD 6.2 build)
on Feb 13, 2018

laanwj added this to the milestone 0.16.0 on Feb 13, 2018

laanwj requested review from meshcollider on Feb 13, 2018

laanwj commented at 11:46 am on February 13, 2018: member

@MeshCollider A question about this code - what is supposed to be the semantics of LockDirectory if it’s called with the same directory multiple times? I understand that the map is just to hold on to the per-directory lock, but right now, if it happens to be called multiple times it will try to re-lock the same lock again*. We don’t check if it already exists, and emplace will return the current object with that key.

According to the documentation http://www.boost.org/doc/libs/master/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.file_lock

Effects: The calling thread tries to acquire exclusive ownership of the file lock without waiting. If no other thread has exclusive or sharable ownership of the file lock, this succeeds.

The result of that will depend if the lock was taken in the same thread (? I’d expect process, in the case of interprocess locks).

I think the intent of the function is to return true if the process acquired the lock, or already had it, and false if it didn’t have the lock already and failed to acquire it? Correct?

* Or will it? or will it first try to construct a new boost::interprocess::file_lock>(pathLockFile.string().c_str()) with the directory, then discard it when it cannot be inserted? (will that constructor raise an exception if the lock already exists?)

Edit: I’ve tested this, and it seems

a second LockDirectory with the same directory and lock file name, from the same thread, returns true.
a second LockDirectory with the same directory and lock file name, from another thread within the same process, returns true.

laanwj commented at 11:53 am on February 13, 2018: member

Also (as noted by goatpig on IRC) this function is currently not thread-safe. That’s no issue right now for 0.16, I think, as this is only used by the init thread at init time, but will be with dynamic loading of wallets.

laanwj requested review from jnewbery on Feb 13, 2018

meshcollider approved

meshcollider commented at 12:12 pm on February 13, 2018: contributor

I think the intent of the function is to return true if the process acquired the lock, or already had it, and false if it didn’t have the lock already and failed to acquire it? Correct?

Yep, exactly.

A bit of background: when only a single directory was being locked (i.e. when the datadir locking function was separate to the walletdir locking function), the lock was declared as static. But because of the abstraction into a more general directory locking function, @ryanofsky suggested changing it to a map of locks, one per directory. Didn’t realize this would cause issues, the proposed change looks good to me. utACK https://github.com/bitcoin/bitcoin/pull/12422/commits/dce11b86dd4df76167222235c79c178cf6f2a661

laanwj removed review request from jnewbery on Feb 13, 2018

laanwj requested review from ryanofsky on Feb 13, 2018

laanwj force-pushed on Feb 13, 2018

laanwj commented at 1:13 pm on February 13, 2018: member

@MeshCollider I’ve added a unit test, test_LockDirectory. Can you please verify I’m testing the intended behavior of the function?

laanwj force-pushed on Feb 13, 2018

laanwj commented at 2:02 pm on February 13, 2018: member

So this is interesting: my test already found a divergence between the behavior on Linux and Windows.

On Linux,

Another lock on the directory from the same thread succeeds
Another lock on the directory from a different thread within the same process succeeds

On Windows, both cases fail:

0test/util_tests.cpp(655): error: in "util_tests/test_LockDirectory": check LockDirectory(dirname, LOCKNAME) == true has failed [false != true]
1test/util_tests.cpp(661): error: in "util_tests/test_LockDirectory": check threadresult == true has failed [false != true]

ryanofsky commented at 3:17 pm on February 13, 2018: member

On Windows, both cases fail:

I guess this is due to inconsistent behavior of try_lock. Probably best not to rely on this behavior. Maybe we should replace:

0lock = locks.emplace(...).first->second;
1if (!lock->try_lock()) {
2    return false;
3}

with something like:

0auto inserted = locks.emplace(...);
1if (inserted.second && !inserted.first.second->try_lock()) {
2    locks.erase(inserted.first)
3    return false;
4}

laanwj commented at 3:25 pm on February 13, 2018: member

I guess this is due to inconsistent behavior of try_lock. Probably best not to rely on this behavior. Maybe we should replace:

I agree - I think we shouldn’t insert locks that aren’t held into the map at all. This makes it possible to use ‘already in map’ as early-out.

While implementing and testing this, I ran into something really frustrating, I found the FILE* file = fsbridge::fopen(pathLockFile, "a"); destroys the lock if it already exists! (but apparently, only if the current process holds it, not if another does, so it wipes the ‘we own this lock’ administration)

laanwj commented at 4:00 pm on February 13, 2018: member

Pushed a new commit, which hopefully fixes the tests on windows:

util: Fix multiple use of LockDirectory

This commit fixes problems with calling LockDirectory multiple times on the same directory, or from multiple threads.

Protect the locks map using a mutex
Make sure that only locks that are successfully acquired (and when not probing) are inserted in the map
Open the lock file for appending only if we know we don’t have the lock yet - The FILE* file = fsbridge::fopen(pathLockFile, "a"); (reopening the file) wipes the ‘we own this lock’ administration.

in src/util.cpp:402 in 0a671a733c outdated

402+        if (!lock->try_lock()) {
403             return false;
404         }
405         if (probe_only) {
406-            lock.unlock();
407+            lock->unlock();

laanwj commented at 4:08 pm on February 13, 2018:

It think this explicit unlock is unnecessary now, as the lock will fall out of scope when locks.emplace is not called to move it to the map.

in src/util.cpp:384 in cb80237c75 outdated

379+    // successful locking, these will be held here until the global
380+    // destructor cleans them up and thus automatically unlocks them.
381+    static std::map<std::string, std::unique_ptr<boost::interprocess::file_lock>> locks;
382+    // Protect the map with a mutex
383+    static std::mutex cs;
384+    std::unique_lock<std::mutex> ulock(cs);

bpay commented at 6:58 pm on February 13, 2018:

std::lock_guard should suffice for this

in src/test/util_tests.cpp:612 in ca4c802ac0 outdated

606@@ -603,4 +607,71 @@ BOOST_AUTO_TEST_CASE(test_ParseFixedPoint)
607     BOOST_CHECK(!ParseFixedPoint("1.", 8, &amount));
608 }
609 
610+static const std::string LOCKNAME = ".lock";
611+
612+static void TestOtherThread(fs::path dirname, bool *result)

ryanofsky commented at 10:46 pm on February 13, 2018:

Maybe define TestOtherThread as a lambda instead of an external function. TestOther code would seem easier to understand in the context where it launches instead of out here. Also this would allow test case to be self contained and not need external LOCKNAME/TestOther declarations.

laanwj commented at 7:08 am on February 14, 2018:

Conceptually I do prefer (especially) the TestOtherProcess code to be in a self-contained function instead of in-line, because the stuff runs in a separate process. Also making the separate-process test work on windows (not going to do so in this pull) would involve some factoring in that direction anyhow. If the LOCKNAME constant is a problem it could be passed in as parameter.

laanwj commented at 7:26 am on February 14, 2018:

But yeah I do see the argument for making it self-contained, too. Really not sure here…

Edit: however, unlike for the thread, inlining TestOtherProcess won’t put the code in a context where it’s easier to understand, but at the beginning of the function in fork() else.

ryanofsky commented at 10:53 pm on February 13, 2018: member

utACK ca4c802ac05c651559ca23c6f614c50b5f161a92. Code looks good though obviously can be squashed. Left minor comment but feel free to ignore it (or complain about it, as may be your wont).

laanwj renamed this:
~~util: Use unique_ptr in locks map (fix OpenBSD 6.2 build)~~
util: Make LockDirectory thread-safe, consistent, and fix OpenBSD 6.2 build
on Feb 14, 2018

laanwj force-pushed on Feb 14, 2018

laanwj commented at 7:48 am on February 14, 2018: member

added commit:

e548d69: lockname is a parameter to inner functions instead of a constant

squashed e548d69 2018_01_openbsd_util_fix_v0 -> 6a0a3d7

laanwj commented at 8:11 am on February 14, 2018: member

Hmm though it compiles, the new test is failing on OpenBSD, which was the original goal of this PR. Talking about scope drift. Will investigate :)

0unknown location(0): fatal error in "test_LockDirectory": signal: generated by kill() (or family); uid=0; pid=0
1test/util_tests.cpp(666): last checkpoint

Fixed: this had to do with different handling of SIGCHLD on BSD versus Linux.

in src/util.cpp:390 in 2207709e6e outdated

386 {
387+    std::lock_guard<std::mutex> ulock(cs_dir_locks);
388     fs::path pathLockFile = directory / lockfile_name;
389-    FILE* file = fsbridge::fopen(pathLockFile, "a"); // empty lock file; created if it doesn't exist.
390+
391+    // If a lock for this directory already exists in the map, don't try to re-lock it

theuni commented at 9:14 pm on February 14, 2018:

Maybe note that this just working around the lack of std::map::try_emplace, which was added in c++17.

Edit: Heh, they even use this exact construction as an example:

Unlike insert or emplace, these functions do not move from rvalue arguments if the insertion does not happen, which makes it easy to manipulate maps whose values are move-only types, such as std::map<std::string, std::unique_ptr>.

laanwj commented at 5:02 pm on February 15, 2018:

I’m a bit confused here, to be honest. Does try_emplace avoid constructing the object entirely if it already exists in the map?

in src/util.cpp:397 in 2207709e6e outdated

393+        return true;
394+    }
395+
396+    // Create empty lock file if it doesn't exist.
397+    FILE* file = fsbridge::fopen(pathLockFile, "a");
398     if (file) fclose(file);

theuni commented at 1:16 am on February 15, 2018:

May as well bail early if the fopen fails, we’re just going to end up in a boost exception.

laanwj commented at 9:21 am on February 15, 2018:

I played with that thought, but decided against it - that would just add extra code for an error case that is handled by boost, later?

in src/util.cpp:402 in 2207709e6e outdated

401-        static std::map<std::string, boost::interprocess::file_lock> locks;
402-        boost::interprocess::file_lock& lock = locks.emplace(pathLockFile.string(), pathLockFile.string().c_str()).first->second;
403-        if (!lock.try_lock()) {
404+        auto lock = MakeUnique<boost::interprocess::file_lock>(pathLockFile.string().c_str());
405+        if (!lock->try_lock()) {
406             return false;

theuni commented at 1:18 am on February 15, 2018:

I realize it isn’t a regression, but this really needs to be logged.

laanwj commented at 9:19 am on February 15, 2018:

In both cases it’s used, it’s logged in the caller function, so I think that is redundant?

in src/util.cpp:401 in 2207709e6e outdated

400     try {
401-        static std::map<std::string, boost::interprocess::file_lock> locks;
402-        boost::interprocess::file_lock& lock = locks.emplace(pathLockFile.string(), pathLockFile.string().c_str()).first->second;
403-        if (!lock.try_lock()) {
404+        auto lock = MakeUnique<boost::interprocess::file_lock>(pathLockFile.string().c_str());
405+        if (!lock->try_lock()) {

theuni commented at 1:20 am on February 15, 2018:

In the probe_only case, this won’t be unlocked when lock destructs, as interprocess::file_lock isn’t RAII. Was it intended to work that way?

laanwj commented at 9:17 am on February 15, 2018:

Eh are you sure? the whole intent of keeping the things in a std::map is that they get unlocked when they are destructed. Anecdotally this seems to work on windows, at least, the call to ReleaseDirectoryLocks() seems to release the locks. I can try further.

Edit: shouldn’t -daemon be broken if true? It’s what uses probe_only to probe (acquire and release) the lock from the parent process, before getting it in the child process. At least on Linux it seems to work…

laanwj commented at 10:31 am on February 15, 2018:

The documentation is confusing here (http://www.boost.org/doc/libs/1_66_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.file_lock):

A file locking is a class that has process lifetime. This means that if a process holding a file lock ends or crashes, the operating system will automatically unlock it. This feature is very useful in some situations where we want to assure automatic unlocking even when the process crashes and avoid leaving blocked resources in the system. A file lock is constructed using the name of the file as an argument:

This would suggest we don’t even need to hold on to the file_lock after locking them, as the locks always have process lifetime. But the test outcomes seem to suggest RAII behavior.

theuni commented at 11:58 am on February 15, 2018:

Grr, the “theuni requested changes” is a bit overstated… There are a few fixes here and I can’t produce any real-world issues, so utACK from me for 0.16 for the sake of not dragging it out.

Further discussion for master, or 0.16 if you still feel like messing with it:

Strange, your tests results (RAII behavior) don’t line up with what I see from the docs/code. I’ll play with tests.

You mentioned:

The FILE* file = fsbridge::fopen(pathLockFile, "a"); wipes the ‘we own this lock’ administration

And from the Boost docs:

In POSIX, when two file descriptors are used to lock a file if a descriptor is closed, all file locks set by the calling process are cleared.

Is it possible that it was the fclose rather than the fopen? Because that would explain everything neatly, I think. In the probe_only case, You create a lock, lock it, then destroy it. Then on the next invocation you fopen/fclose it, which basically gives a clean start before you create a new lock, lock again, etc.

To be clear, my concern is that in the near future we might adapt LockDirectory() and add UnlockDirectory() for some feature like multi-wallet-dirs, only to be hit with a hard-to-diagnose bug when UnlockDirectory() quietly doesnt’ work as intended.

The boost docs suggest using RAII generics here:

scoped_lock and sharable_lock can be used to make file locking easier in the presence of exceptions, just like with mutexes.

And file_lock’s destructor is pretty straightforward, I don’t see how it could be doing any unlocking:

0// m_file_hnd: typedef int (void* for win)
1// ipcdetail::close_file: ::close (CloseHandle() for win)
2inline file_lock::~file_lock()
3{
4   if(m_file_hnd != ipcdetail::invalid_file()){
5      ipcdetail::close_file(m_file_hnd);
6      m_file_hnd = ipcdetail::invalid_file();
7   }
8}

laanwj commented at 12:05 pm on February 15, 2018:

Is it possible that it was the fclose rather than the fopen? Because that would explain everything neatly, I think. In the probe_only case, You create a lock, lock it, then destroy it. Then on the next invocation you fopen/fclose it, which basically gives a clean start before you create a new lock, lock again, etc.

Whoa, sneaky. Yes, that’s possible. Let’s try the tests that by using an already-existing directory with an existing lock file, without that fopen part.

theuni referenced this in commit aa04588d9c on Feb 15, 2018

theuni changes_requested

theuni commented at 1:47 am on February 15, 2018: member

See https://github.com/theuni/bitcoin/commit/aa04588d9cfdef72d6ac6aa3a6ad4e94ac9e38a4 for my attempt at addressing the notes above.

meshcollider commented at 2:05 am on February 15, 2018: contributor

I’ve added a unit test, test_LockDirectory. Can you please verify I’m testing the intended behavior of the function?

LGTM, yep. Ugh OS inconsistencies make things like this so fun. Code changes look good though. cfields changes too, utACK

laanwj commented at 9:49 am on February 15, 2018: member

Now that I had to add ReleaseDirectoryLocks() for testing anyhow, I’m going to extend the cross-process test to see if locks are given up as expected when the map is cleared, or when using probe-only.

laanwj commented at 10:18 am on February 15, 2018: member

In the new commit I added unit tests for lock probing, to see if ReleaseDirectoryLocks() succesfully releases the lock, and whether exiting the child prices releases the lock. At least locally this all passes, let’s see what Travis makes of it.

theuni approved

laanwj commented at 12:38 pm on February 15, 2018: member

Whoa, sneaky. Yes, that’s possible. Let’s try the tests that by using an already-existing directory with an existing lock file, without that fopen part.

To check this, I applied the following patch, removing all manual file and directory handling:

 0diff --git a/src/test/util_tests.cpp b/src/test/util_tests.cpp
 1index 4b2da3e..80d76e6 100644
 2--- a/src/test/util_tests.cpp
 3+++ b/src/test/util_tests.cpp
 4@@ -647,7 +647,7 @@ static void TestOtherProcess(fs::path dirname, std::string lockname, int fd)
 5 
 6 BOOST_AUTO_TEST_CASE(test_LockDirectory)
 7 {
 8-    fs::path dirname = fs::temp_directory_path() / fs::unique_path();
 9+    fs::path dirname("/tmp/locktest");
10     const std::string lockname = ".lock";
11 #ifndef WIN32
12     // Revert SIGCHLD to default, otherwise boost.test will catch and fail on
13@@ -667,10 +667,6 @@ BOOST_AUTO_TEST_CASE(test_LockDirectory)
14     }
15     BOOST_CHECK_EQUAL(close(fd[0]), 0); // Parent: close child end
16 #endif
17-    // Lock on non-existent directory should fail
18-    BOOST_CHECK_EQUAL(LockDirectory(dirname, lockname), false);
19-
20-    fs::create_directories(dirname);
21 
22     // Probing lock on new directory should succeed
23     BOOST_CHECK_EQUAL(LockDirectory(dirname, lockname, true), true);
24@@ -730,7 +726,6 @@ BOOST_AUTO_TEST_CASE(test_LockDirectory)
25 #endif
26     // Clean up
27     ReleaseDirectoryLocks();
28-    fs::remove_all(dirname);
29 }
30 
31 BOOST_AUTO_TEST_SUITE_END()
32diff --git a/src/util.cpp b/src/util.cpp
33index dcf7ed3..aa29a04 100644
34--- a/src/util.cpp
35+++ b/src/util.cpp
36@@ -392,10 +392,6 @@ bool LockDirectory(const fs::path& directory, const std::string lockfile_name, b
37         return true;
38     }
39 
40-    // Create empty lock file if it doesn't exist.
41-    FILE* file = fsbridge::fopen(pathLockFile, "a");
42-    if (file) fclose(file);
43-
44     try {
45         auto lock = MakeUnique<boost::interprocess::file_lock>(pathLockFile.string().c_str());
46         if (!lock->try_lock()) {

Then manually created the lock directory and file:

0mkdir /tmp/locktest && touch /tmp/locktest/.lock

Result:

0$ test/test_bitcoin --run_test=util_tests/test_LockDirectory
1Running 1 test case...
2
3*** No errors detected

… yep, it just works. Curious.

There are a few fixes here and I can’t produce any real-world issues, so utACK from me for 0.16 for the sake of not dragging it out.

If you can’t find a problem with my methodology above, I agree, let’s get back to this discussion after 0.16.

Sjors commented at 1:19 pm on February 15, 2018: member

make check is still happy on MacOS (46d46323).

laanwj added the label Needs backport on Feb 15, 2018

util: Fix multiple use of LockDirectory

This commit fixes problems with calling LockDirectory multiple times on
the same directory, or from multiple threads. It also fixes the build on
OpenBSD.

- Wrap the boost::interprocess::file_lock in a std::unique_ptr inside
  the map that keeps track of per-directory locks. This fixes a build
  issue with the clang 4.0.0+boost-1.58.0p8 version combo on OpenBSD
  6.2, and should have no observable effect otherwise.

- Protect the locks map using a mutex.

- Make sure that only locks that are successfully acquired are inserted
  in the map.

- Open the lock file for appending only if we know we don't have the
  lock yet - The `FILE* file = fsbridge::fopen(pathLockFile, "a");`
  wipes the 'we own this lock' administration, likely because it opens
  a new fd for the locked file then closes it.

fc888bfcac

test: Add unit test for LockDirectory

Add a unit test for LockDirectory, introduced in #11281.

1d4cbd26e4

laanwj force-pushed on Feb 15, 2018

laanwj commented at 3:27 pm on February 15, 2018: member

squashed into two commits (one that makes the changes to LockDatadirectory, one that adds the unit test) 46d4632 2018_01_openbsd_util_fix_v1 → 1d4cbd2

laanwj merged this on Feb 15, 2018

laanwj closed this on Feb 15, 2018

laanwj referenced this in commit 58715f6d07 on Feb 15, 2018

laanwj referenced this in commit 32a726846d on Feb 15, 2018

laanwj referenced this in commit 4d54e7ad41 on Feb 15, 2018

laanwj removed the label Needs backport on Feb 15, 2018

HashUnlimited referenced this in commit 20b1dca709 on Mar 16, 2018

HashUnlimited referenced this in commit deb02e491b on Mar 16, 2018

ccebrecos referenced this in commit 89e5863863 on Sep 14, 2018

ccebrecos referenced this in commit 2ecd9297dd on Sep 14, 2018

PastaPastaPasta referenced this in commit debd209c46 on Apr 3, 2020

PastaPastaPasta referenced this in commit 40e4a1a45c on Apr 3, 2020

ckti referenced this in commit 65e535a7d3 on Mar 28, 2021

gades referenced this in commit 78ffe97fcd on Jun 30, 2021

random-zebra referenced this in commit 61a098a775 on Aug 5, 2021

MarcoFalke locked this on Sep 8, 2021

util: Make LockDirectory thread-safe, consistent, and fix OpenBSD 6.2 build #12422

util: Fix multiple use of LockDirectory