Prevent concurrent savemempool

promag commented at 5:01 pm on March 30, 2018: member

Follow up of #12172, this change prevents calling savemempool RPC concurrently.

promag commented at 5:03 pm on March 30, 2018: member

In order to test the new check and error add:

 0--- a/src/rpc/blockchain.cpp
 1+++ b/src/rpc/blockchain.cpp
 2@@ -1619,6 +1619,8 @@ UniValue savemempool(const JSONRPCRequest& request)
 3         throw JSONRPCError(RPC_MISC_ERROR, "Currently dumping the mempool");
 4     }
 5
 6+    MilliSleep(10000);
 7+
 8     if (!g_is_mempool_loaded) {
 9         throw JSONRPCError(RPC_MISC_ERROR, "The mempool was not loaded yet");
10     }

Then run

0bitcoin-cli -regtest savemempool & bitcoin-cli -regtest savemempool

The second should wait for the first to terminate.

fanquake added the label RPC/REST/ZMQ on Mar 31, 2018

fanquake added the label Mempool on Mar 31, 2018

in src/rpc/blockchain.cpp:1619 in 7f722a1f0c outdated

1613@@ -1614,6 +1614,11 @@ UniValue savemempool(const JSONRPCRequest& request)
1614         );
1615     }
1616 
1617+    TxMempoolDumpReserver reserver(mempool);
1618+    if (!reserver.reserve()) {
1619+        throw JSONRPCError(RPC_MISC_ERROR, "Currently dumping the mempool");

conscott commented at 6:08 am on April 2, 2018:

Nit: maybe instead,

“savemempool already in progress”

promag commented at 8:43 am on April 2, 2018:

Sounds better.

conscott commented at 6:14 am on April 2, 2018: contributor

Tested ACK 7f722a1f0c82e1cc8dd9f6af9159bf7d7b8f2626

laanwj commented at 8:12 am on April 3, 2018: member

After #12863 I’m even more convinced that this might be the wrong approach. Protecting access using a mutex instead of returning these kind of ‘busy errors’ would avoid having to implement retry-poll-loops client-side (including in the tests).

If you want to stick with this you’d at least need to define a new error code that means ’transient error’, similar to RPC_IN_WARMUP. But I think pushing this responsibility to the client is unnecessary.

promag commented at 10:14 am on April 3, 2018: member

@laanwj indeed it’s very arguable. I think it’s preferable to have the client retrying than reserving resources on the server side. IMO having the client waiting is also not desirable because the client can wait indefinitely and also timeout (but then he can raise the timeout) and we would process the request anyway (?). Getting an error instead of waiting is also more informative and allows the client to explicitly log and retry when he feels like.

laanwj commented at 11:17 am on April 3, 2018: member

OTOH

Then the API wouldn’t be changed. Users of savemempool can rely on the call working, or fatally failing.
Hanging indefinitely is a bug no matter what. If it takes too long, something is wrong.
Interrupt semantics usually require less resources than polling. In this case: sleeping on a mutex requires virtually no resources. Whereas repetitive RPC requests take up CPU cycles and I/O.

MarcoFalke commented at 4:03 pm on April 3, 2018: member

I think we had reports that loading the mempool took 1.5 hours or so, just noting without further comment.

promag commented at 10:25 pm on April 3, 2018: member

Then the API wouldn’t be changed. Users of savemempool can rely on the call working, or fatally failing.

The caller should always have error handling, and this new error would fall in his “unkown error”.

Hanging indefinitely is a bug no matter what. If it takes too long, something is wrong.

We have calls that can hang a lot of time, for instance when the wallet is really big (as in lots of keys, lots of transactions).

Interrupt semantics usually require less resources than polling.

Right, but at least a thread and a socket is on hold. And when the mutex is acquired, it might do something that is no longer relevant - in this case dumping the mempool again..

We have this semantic in rescanblockchain, do you think we should change that too?

And regarding #12863, having the client blindingly wait for 1.5 hours should not be an option (?)

sipa commented at 11:28 pm on April 3, 2018: member

I’m also more a fan of blocking until the existing save has finished. Given that no modifications in the mempool can happen in the mean time anyway (I think?), we could even just wait until the existing save operation finishes and then return from both RPC calls (without an additional save).

promag commented at 11:50 pm on April 3, 2018: member

Given that no modifications in the mempool can happen in the mean time anyway

Actually it can, DumpMempool takes a copy of the mempool contents and then releases the lock to then write to file. So, while writing to disk, the mempool can change.

promag commented at 2:04 pm on June 7, 2018: member

What should be done here? I’ve to address @conscott comment above, but I guess I should wait for more concept ACKs.

luke-jr commented at 6:19 pm on June 12, 2018: member

I agree with @laanwj - it should just hold a lock on the file to avoid problems, but otherwise succeed.

promag force-pushed on Jun 14, 2018

promag commented at 10:06 am on June 14, 2018: member

Rebased and updated to block/wait while there is another dump.

promag force-pushed on Jun 14, 2018

promag commented at 10:44 am on June 15, 2018: member

@luke do you think the current code is enough?

MarcoFalke commented at 1:12 pm on June 15, 2018: member

utACK 35bdaf7489cf33d120b3c66f87067c2698ed2e8c

DrahtBot commented at 12:38 pm on August 10, 2018: member

DrahtBot closed this on Aug 10, 2018

DrahtBot reopened this on Aug 10, 2018

in src/txmempool.h:488 in 35bdaf7489 outdated

484@@ -485,6 +485,7 @@ class CTxMemPool
485 
486     mutable CCriticalSection cs;
487     indexed_transaction_set mapTx GUARDED_BY(cs);
488+    std::mutex m_dump_mutex;

ajtowns commented at 1:25 pm on September 18, 2018:

(a) Shouldn’t this be a global mutex, since it’s specific to the “mempool.dat.new” file whose path is based on the (global) GetDataDir()? (b) Why a std::mutex instead of a CCriticalSection?

MarcoFalke commented at 7:31 pm on September 18, 2018:

Why a CCriticalSection instead of a Mutex?

promag commented at 10:16 pm on September 18, 2018:

a) Not sure why that matters. To reduce exposure the mutex could be a static variable in DumpMempool() function; b) we don’t need a recursive mutex.

ajtowns commented at 3:21 pm on September 19, 2018:

a) Yeah, a static variable in/next to the DumpMempool() function was what I was thinking. I don’t think it matters in practice though. b) More about trying to just use one set of locking primitives; Mutex/LOCK per MarcoFalke’s suggestion would make more sense to me.

in src/validation.cpp:4780 in 35bdaf7489 outdated

4776@@ -4777,6 +4777,8 @@ bool DumpMempool(void)
4777     std::map<uint256, CAmount> mapDeltas;
4778     std::vector<TxMempoolInfo> vinfo;
4779 
4780+    std::lock_guard<std::mutex> lock(mempool.m_dump_mutex);

MarcoFalke commented at 0:08 am on September 19, 2018:

Could rebase on master, make the type Mutex and use LOCK(...) here?

MarcoFalke added this to the milestone 0.18.0 on Sep 19, 2018

MarcoFalke removed this from the milestone 0.18.0 on Sep 19, 2018

rpc: Prevent concurrent savemempool 585b47cfe1

promag force-pushed on Oct 20, 2018

promag commented at 10:14 am on October 20, 2018: member

@MarcoFalke @achow101 rebased on master and changed to Mutex/LOCK and static variable in DumpMempool.

MarcoFalke commented at 12:06 pm on October 24, 2018: member

re-utACK 585b47cfe133fae112782ad0a88fe25c71d465fa

laanwj commented at 1:20 pm on October 24, 2018: member

I really like this clean and simple solution

utACK 585b47cfe133fae112782ad0a88fe25c71d465fa

laanwj merged this on Oct 24, 2018

laanwj closed this on Oct 24, 2018

laanwj referenced this in commit 2e15fa16cd on Oct 24, 2018

laanwj added the label Backport on Oct 26, 2018

laanwj added this to the milestone 0.17.1 on Oct 26, 2018

PastaPastaPasta referenced this in commit d130278aaa on Jun 27, 2021

PastaPastaPasta referenced this in commit 1fd9861979 on Jun 28, 2021

PastaPastaPasta referenced this in commit 9cbcb9629b on Jun 29, 2021

PastaPastaPasta referenced this in commit 36579d9a5b on Jul 1, 2021

PastaPastaPasta referenced this in commit 49b0e46fdc on Jul 1, 2021

PastaPastaPasta referenced this in commit 0e81e96d4d on Jul 1, 2021

PastaPastaPasta referenced this in commit 77b76ef0a5 on Jul 3, 2021

MarcoFalke locked this on Sep 8, 2021

Prevent concurrent savemempool #12842