Bugfix: RPC: savemempool: Don't save until LoadMempool() is finished

jtimon commented at 9:00 pm on January 12, 2018: contributor

The tests are a little bit slow, mempool_persist.py goes from about 20 s to about 120 s in my hardware. Perhaps there’s a better way to test this.

jtimon force-pushed on Jan 12, 2018

in src/rpc/blockchain.cpp:1610 in f70efa5d16 outdated

1582@@ -1583,14 +1583,14 @@ UniValue savemempool(const JSONRPCRequest& request)
1583     if (request.fHelp || request.params.size() != 0) {
1584         throw std::runtime_error(
1585             "savemempool\n"
1586-            "\nDumps the mempool to disk.\n"
1587+            "\nDumps the mempool to disk. It will fail until the previous dump is fully loaded.\n"

instagibbs commented at 9:21 pm on January 12, 2018:

this also cannot complete when -persistmempool is false, yes? Might want to add that to RPC notes, or set it to true always if persist is false.

luke-jr commented at 9:32 pm on January 12, 2018:

I think it’s probably important that this works if -persistmempool is false.

TheBlueMatt commented at 6:38 pm on January 14, 2018:

Indeed, definitely needs to work with -persistmempool false, that’s the whole point of the savemempool command. AFAICT the current implementation does work in that case.

jtimon commented at 2:29 pm on January 15, 2018:

I agree it’s best it works in that case, just didn’t thought about it initially. But for the record, I’m using savemempool with -persistmempool to true, because I want to read it on restart too! I think the whole point of savemempool is to cover use cases where for whatever reason (in my case, an insufficient knowledge of docker-compose) you can’t or don’t always let the daemon exit gracefully.

in test/functional/mempool_persist.py:124 in e2b17a5822 outdated

118+        self.start_node(0)
119+        self.nodes[0].generate(12)
120+        for i in range(NUM_TXS_IN_MEMPOOL):
121+            self.nodes[0].sendtoaddress(self.nodes[0].getnewaddress(), Decimal("0.00001"))
122+        self.stop_nodes()
123+        self.start_node(0)

instagibbs commented at 9:24 pm on January 12, 2018:

is this test essentially racing the mempool acceptance?

jtimon commented at 9:30 pm on January 12, 2018:

stopping the node the mempool is dumped and starting the node, LoadMempool gets called again, hopefully with enough work ahead to catch the error when calling savemempool on the next line.

jtimon force-pushed on Jan 12, 2018

in src/rpc/blockchain.cpp:1586 in 4b30711566 outdated

1582@@ -1583,14 +1583,14 @@ UniValue savemempool(const JSONRPCRequest& request)
1583     if (request.fHelp || request.params.size() != 0) {
1584         throw std::runtime_error(
1585             "savemempool\n"
1586-            "\nDumps the mempool to disk.\n"
1587+            "\nDumps the mempool to disk. It will fail until the previous dump is fully loaded or when -persistmempool=0.\n"

MarcoFalke commented at 9:45 pm on January 12, 2018:

I don’t see a reason to make it fail when nopersistmempool

in src/init.cpp:682 in 4b30711566 outdated

678@@ -679,6 +679,7 @@ void ThreadImport(std::vector<fs::path> vImportFiles)
679     } // End scope of CImportingNow
680     if (gArgs.GetArg("-persistmempool", DEFAULT_PERSIST_MEMPOOL)) {
681         LoadMempool();
682+        g_is_mempool_loaded = true;

MarcoFalke commented at 9:46 pm on January 12, 2018:

Move down two lines to break out of this scope?

jtimon force-pushed on Jan 12, 2018

jtimon commented at 9:55 pm on January 12, 2018: contributor

Modified so that savemempool keeps working when -persistmempool=0 as discussed.

in src/validation.cpp:230 in d494001d79 outdated

226@@ -227,6 +227,7 @@ CAmount maxTxFee = DEFAULT_TRANSACTION_MAXFEE;
227 
228 CBlockPolicyEstimator feeEstimator;
229 CTxMemPool mempool(&feeEstimator);
230+std::atomic_bool g_is_mempool_loaded(false);

MarcoFalke commented at 10:58 pm on January 12, 2018:

style-nit: Might use g_is_mempool_loaded{false} for compile-time narrowing check.

jtimon commented at 11:19 pm on January 12, 2018:

Fixed

jtimon force-pushed on Jan 12, 2018

fanquake added the label RPC/REST/ZMQ on Jan 13, 2018

fanquake added the label Mempool on Jan 13, 2018

in src/init.cpp:684 in e685d0b7ea outdated

680@@ -681,6 +681,7 @@ void ThreadImport(std::vector<fs::path> vImportFiles)
681         LoadMempool();
682         fDumpMempoolLater = !fRequestShutdown;
683     }
684+    g_is_mempool_loaded = true;

TheBlueMatt commented at 6:37 pm on January 14, 2018:

It would be nice to drop fDumpMempoolLater and only have one global for “mempool was loaded”. Should be able to set g_is_mempool_loaded to !fRequestShutdown here, and then check -persistmempool before flushing during shutdown to avoid changing behavior without adding another boolean.

jtimon commented at 2:32 pm on January 15, 2018:

Perhaps I’m not understanding the code, but what I assumed is that fDumpMempoolLater is set to !fRequestShutdown instead of true to avoid more than one thread dumping the mempool. But in this case we want all threads to work with savemempool once the mempool is loaded. Am I confused about this or missing something?

TheBlueMatt commented at 7:00 pm on January 15, 2018:

There is only one shutdown thread, not sure what you mean “more than one thread dumping the mempool”. !fRequestShutdown is a check to make sure LoadMempool() didn’t quit early, but its also safe to re-use that in RPC as if fRequestShutdown is set we’ll be shutting down soon anyway, so RPCs can fail.

jtimon commented at 8:42 pm on January 15, 2018:

Alright, I wasn’t understanding why !fRequestShutdown was set, it seems we can have a single boolean then and drop fDumpMempoolLater.

in src/rpc/blockchain.cpp:1593 in e685d0b7ea outdated

1590             + HelpExampleRpc("savemempool", "")
1591         );
1592     }
1593 
1594-    if (!DumpMempool()) {
1595+    if (!g_is_mempool_loaded || !DumpMempool()) {

TheBlueMatt commented at 6:37 pm on January 14, 2018:

Would be nice to give a different error message here.

promag commented at 2:46 pm on January 15, 2018:

Agree, with the corresponding test.

promag commented at 2:46 pm on January 15, 2018:

Should the g_is_mempool_loaded check be inside of DumpMempool? No need to see g_is_mempool_loaded in the RPC code.

Note that with this change there is no need for a functional test, just a c++ test DumpMempool.

jtimon commented at 8:45 pm on January 15, 2018:

Well, if I put it inside DumpMempool() then I can’t have the different meesage, right? I don’t have a strong opinion about putting it inside and removing the functional test, I would like to hear what others think before doing it. I’ll change the error message for now.

promag commented at 2:53 pm on January 15, 2018: member

Concept ACK.

BTW, Should the RPC server start after the mempool is loaded?

jtimon force-pushed on Jan 15, 2018

jtimon commented at 9:09 pm on January 15, 2018: contributor

I guess starting the rpc server only after the mempool is loaded is another solution, but I’m not sure it is better. Unified the 2 bool globals and separated the error message.

in src/rpc/blockchain.cpp:1593 in 4e9f522fd9 outdated

1589             + HelpExampleCli("savemempool", "")
1590             + HelpExampleRpc("savemempool", "")
1591         );
1592     }
1593 
1594+    if (!g_is_mempool_loaded && gArgs.GetArg("-persistmempool", DEFAULT_PERSIST_MEMPOOL)) {

gmaxwell commented at 5:01 am on January 16, 2018:

Why is there still an argument check here? The setting of g_is_mempool_loaded appears to me to be unconditional now.

MarcoFalke commented at 4:44 pm on January 16, 2018:

The code seems correct, but is just badly documented.

What about

0const bool is_supposed_to_load_mempool{gArgs.GetArg("-persistmempool", DEFAULT_PERSIST_MEMPOOL)};
1if (!g_is_mempool_loaded && is_supposed_to_load_mempool) {

jtimon commented at 5:16 pm on January 16, 2018:

Well, it is now not needed since the g_is_mempool_loaded is set out of the if gArgs.GetArg("-persistmempool".... I was thinking that perhaps we should allow to dump even before g_is_mempool_loaded is set if -persistmempool=0 (since in that case it’s not loading it anyway). But perhaps it’s better that it waits for that line to be run when -persistmempool=0 too. Or perhaps the we can set g_is_mempool_loaded only conditionally like fDumpMempoolLater was.

No strong opinion, I’m happy with any of the possible variations on this. But I don’t think it’s a matter of documentation. It’s just choosing what we want here.

MarcoFalke commented at 6:22 pm on January 16, 2018:

Why would there be anything in mempool, when -persistmempool=0 and you are not yet connected to any peers? So there wouldn’t be any need to allow savemempool.

We are mostly nitpicking here, and I don’t care too much what variation you pick, but the current code seems confusion/hard to read, otherwise we woulnd’t have this conversation.

jtimon commented at 9:08 pm on January 16, 2018:

Alright, I’ll just remove the gArgs.GetArg("-persistmempool", DEFAULT_PERSIST_MEMPOOL) condition from here and leave it as it was right before.

jtimon commented at 9:14 pm on January 16, 2018:

Fixed.

jtimon force-pushed on Jan 16, 2018

MarcoFalke commented at 0:26 am on January 17, 2018: member

utACK 5b80af6322036c141ffce8fb7f10f5d40b8bc0f0 (didn’t look at tests)

in test/functional/mempool_persist.py:118 in 733315992a outdated

111@@ -112,5 +112,17 @@ def run_test(self):
112         assert_raises_rpc_error(-1, "Unable to dump mempool to disk", self.nodes[1].savemempool)
113         os.remove(mempooldotnew1)
114 
115+        NUM_TXS_IN_MEMPOOL = 900
116+        self.log.debug("Send %d transactions from node0 (to its own address)" % NUM_TXS_IN_MEMPOOL)
117+        self.stop_nodes()

MarcoFalke commented at 0:11 am on January 19, 2018:

A comment explaining why this is needed?

jtimon commented at 9:04 pm on January 21, 2018:

I guess perhaps it is not needed…

MarcoFalke commented at 9:31 pm on January 21, 2018:

Please remove it, when it is not needed. Starting and stopping nodes is where a ton of time is wasted in the test framework.

MarcoFalke commented at 9:47 pm on January 21, 2018:

I assume you can modify the self.stop_nodes() a few lines above to say self.stop_node(1)

in test/functional/mempool_persist.py:122 in 733315992a outdated

116+        self.log.debug("Send %d transactions from node0 (to its own address)" % NUM_TXS_IN_MEMPOOL)
117+        self.stop_nodes()
118+        self.start_node(0)
119+        self.nodes[0].generate(12)
120+        for i in range(NUM_TXS_IN_MEMPOOL):
121+            self.nodes[0].sendtoaddress(self.nodes[0].getnewaddress(), Decimal("0.00001"))

MarcoFalke commented at 0:11 am on January 19, 2018:

How will this not create too long mempool chains?

jtimon commented at 9:04 pm on January 21, 2018:

I’m not sure I understand the question. What’s the problem?

MarcoFalke commented at 9:52 pm on January 21, 2018:

Currently the mempool only accepts a chain of up to 26 unconfirmed transactions, since coin selection is not deterministic, I assume we will run into travis failures at some point.

Also, IIRC the wallet is extremely slow with coin selection, when a lot of the coins are unconfirmed change. There might be some helper functions in the test framework that can create a ton of spam transactions.

MarcoFalke commented at 9:54 pm on January 21, 2018: member

Test takes 7-8 Minutes on travis. Should be fixed before merge.

MarcoFalke commented at 9:56 pm on January 21, 2018: member

Test is failing for me locally. You might as well leave out the test and ask for some Tested ACKs instead.

jtimon commented at 2:22 am on January 23, 2018: contributor

@MarcoFalke Sure, I’m happy to leave out the test if I get some tested acks. It’s an edge case anyway and seems expensive to test this way, filling the mempool enough so that the loading takes long enough to get the error (plus that varies with the machine and machine load [this seems close to the minimum in my machine but it’s not enough on yours and it’s too much on travis]).
Additionally it is quite simple IMO to see this is correct without the tests with very little grep since it’s only +9-4 without the tests.

jtimon force-pushed on Jan 31, 2018

jtimon commented at 10:09 pm on January 31, 2018: contributor

Needed rebase

MarcoFalke commented at 10:17 pm on January 31, 2018: member

I am guessing you wouldn’t have to rebase if you just dropped the test commit. Since I am going to NACK on merging the test commit for the reasons provided above, you might as well remove it.

jtimon commented at 10:24 pm on January 31, 2018: contributor

Yes, I’m also for removing it. I was just waiting for more people to agree with us and some tested ACKs as you said.

EDIT: but, no, it needed rebase and not for the tests, new code close to the one I was changing was changed in https://github.com/bitcoin/bitcoin/pull/12266/files#diff-c865a8939105e6350a50af02766291b7R204

greenaddress commented at 10:37 am on February 4, 2018: contributor

BTW, Should the RPC server start after the mempool is loaded?

Seems to me this issue is not just about saving mempool but also about any RPC that interacts with the mempool like fetching raw transactions and so on - thus I also think it may be a good idea to not start the rpc server until the service is up and loaded the mempool - am I missing something?

MarcoFalke commented at 5:51 pm on February 4, 2018: member

@greenaddress Loading the mempool takes several minutes or hours (c.f. #12106 (comment)).

jtimon commented at 6:19 pm on February 4, 2018: contributor

@greenaddress Other calls interacting with the mempool, already give their errors asociated to the mempool not being fully loaded from disk (or don’t need to give errors at all). getrawmempool, for example, starts returning an empty list and then more items as the mempool is loaded. getmempoolentry will give you an error if you ask for a tx that has not been loaded yet. This is simply to solve the issue #12142 which is specific to savemempool, I don’t see why would we want to block the rpc server until the mempool loads and it is clearly not needed to solve this issue specific to savemempool.

laanwj added this to the "Blockers" column in a project

jtimon force-pushed on Feb 8, 2018

jtimon commented at 9:26 pm on February 8, 2018: contributor

Test removed

MarcoFalke commented at 9:29 pm on February 8, 2018: member

re-utACK ab8329601ceba739d5ea86f2713216659153dd25

in src/init.cpp:695 in ab8329601c outdated

685@@ -687,8 +686,8 @@ void ThreadImport(std::vector<fs::path> vImportFiles)
686     } // End scope of CImportingNow
687     if (gArgs.GetArg("-persistmempool", DEFAULT_PERSIST_MEMPOOL)) {
688         LoadMempool();
689-        fDumpMempoolLater = !fRequestShutdown;
690     }
691+    g_is_mempool_loaded = !fRequestShutdown;

promag commented at 0:08 am on February 9, 2018:

Is it me or after the above StartShutdown() calls we should return?

in src/init.cpp:693 in ab8329601c outdated

685@@ -687,8 +686,8 @@ void ThreadImport(std::vector<fs::path> vImportFiles)
686     } // End scope of CImportingNow
687     if (gArgs.GetArg("-persistmempool", DEFAULT_PERSIST_MEMPOOL)) {
688         LoadMempool();

promag commented at 0:12 am on February 9, 2018:

IMO we should do this instead:

0g_is_mempool_loaded = LoadMempool();

because, for instance, in LoadMempool(): https://github.com/bitcoin/bitcoin/blob/67447ba06057b8e83f962c82491d2fe6c5211f50/src/validation.cpp#L4592-L4593

promag commented at 0:15 am on February 9, 2018: member

AFAICT LoadMempool() doesn’t lock the mempool and since the node starts concurrently it is possible to have new transactions in the mempool after LoadMempool() finishes. If that’s the case (?) I guess it’s ok to discard them in this fast startup-shutdown case.

promag commented at 0:39 am on February 9, 2018: member

Tested ACK The mempool was not loaded yet error.

I also think we should prevent concurrent savemempool calls, currently the 2nd waits for the first to end, but IMO the 2nd should fail. Much like rescanblockchain: https://github.com/bitcoin/bitcoin/blob/67447ba06057b8e83f962c82491d2fe6c5211f50/src/wallet/rpcwallet.cpp#L3458-L3461

MarcoFalke deleted a comment on Feb 12, 2018

jtimon commented at 4:42 am on February 21, 2018: contributor

Rebased and added several commits inspired by @promag ’s last suggestion but without looking much at WalletRescanReserver::reserver() and trying to do things with atomic booleans in mempool instead.

We can squash, leave for later PRs or reject forever each of the newly proposed commits.

I wish I could go as far as s/LoadMempool(/mempool.Load(/ and s/DumpMempool(/mempool.Dump(/ , moving code from validation.cpp to txmempool.cpp, but at some point we need to just fix #12142 and leave later improvements for later PRs.

jtimon force-pushed on Feb 21, 2018

promag commented at 10:42 am on February 21, 2018: member

without looking much at WalletRescanReserver::reserver() and trying to do things with atomic booleans in mempool instead.

Any reason to do differently?

Not sure about representing the state with 2 atomic booleans. Will review later.

jtimon commented at 4:33 am on February 28, 2018: contributor

@promag Yeah, sorry for pushing a half-baked thought. Please, let me come back to this again before you review it, perhaps I can convince myself it doesn’t make sense and I save you some time.

EDIT: Or perhaps we can just leave it as it previously was and leave your request for a following PR, but I think it makes a lot of sense and it shouldn’t be that hard if I hadn’t tried to simplify the wheel.

laanwj removed this from the "Blockers" column in a project

in src/txmempool.h:458 in f2c13eb6f6 outdated

454@@ -452,6 +455,7 @@ class CTxMemPool
455 public:
456 
457     static const int ROLLING_FEE_HALFLIFE = 60 * 60 * 12; // public only for testing
458+    static const uint64_t MEMPOOL_DUMP_VERSION = 1;

MarcoFalke commented at 0:30 am on March 19, 2018:

Please remove unrelated move of variables from a bug fix pull request

MarcoFalke commented at 0:42 am on March 19, 2018: member

What is the status of this? Seems stale and can be closed?

jtimon commented at 9:25 am on March 22, 2018: contributor

I’m sorry @MarcoFalke IIRC the status is that the first commit has been around for long and tested by myself many times (but I’m trusting my rebases on the same ting now), but then after @promag ’s review and test (he gave a tested ack for the first commit), I got very excited about a suggestion he made to upgrade this PR to fix more related things. I think he is totally right in his suggestion of apart from not allowing saving the mempool until it has been loaded, also just return an error as suggested when you call the rpc to save the mempool and it’s already in the process of saving.

But instead of listening to @promag and copy a part of the codebase that already works for the same king of thing, I tried to re-invent the wheel because I thought I could implement it simpler and I failed so far.

I suggest that we merge the first commit which is very simple and even has had tests in the past (until people asked them to be removed for being slow and non-deterministic) and also been tested manually (not very hard to test manually, btw). Then someone (ehem, promag) could create another so that no thread dares to save the mempool while already saving it. Please ping me to test that second part, since it’s as easy to test manually as this one,

As a user that never gives more the daemon more than 60 secs to gracefully exit and thus depends on this rpc call to persist the mempool, I can just always wait longer for the first save, I can at most miss mempool data while re-deploying too often. Being prevented from calling savemempool twice is I think a cool improvement but not that much for users that are calling savemempool with a cron process like me (and I imagine, most savemempool users).

promag commented at 11:12 am on March 22, 2018: member

I’m fine with @jtimon suggestion to move the fix forward. I can then submit a PR to improve concurrency.

Bugfix: RPC: savemempool: Don't save until LoadMempool() is finished cb1e319fe9

jtimon force-pushed on Mar 29, 2018

jtimon commented at 4:30 am on March 29, 2018: contributor

I’m assuming an agreement from @MarcoFalke ’s lack of response. Just in case, the branch that is not tested and that is trying to solve @promag ’s very legitimate concern is in https://github.com/jtimon/bitcoin/tree/b16-bugfix-savemempool-extend. I will review or at a minimum test his more reasonable future proposal. But I tried to make it in an unorthodox way I would like to study more than I know the way @promag ’s knows how to do it. But please let’s fix this first.

laanwj commented at 10:25 pm on March 29, 2018: member

utACK cb1e319

laanwj merged this on Mar 29, 2018

laanwj closed this on Mar 29, 2018

laanwj referenced this in commit 3b62a91386 on Mar 29, 2018

jtimon deleted the branch on May 23, 2018

laanwj referenced this in commit 2e15fa16cd on Oct 24, 2018

PastaPastaPasta referenced this in commit be018722f1 on Sep 27, 2020

PastaPastaPasta referenced this in commit be87ffbad8 on Oct 22, 2020

PastaPastaPasta referenced this in commit d130278aaa on Jun 27, 2021

PastaPastaPasta referenced this in commit 1fd9861979 on Jun 28, 2021

PastaPastaPasta referenced this in commit 9cbcb9629b on Jun 29, 2021

PastaPastaPasta referenced this in commit 36579d9a5b on Jul 1, 2021

PastaPastaPasta referenced this in commit 49b0e46fdc on Jul 1, 2021

PastaPastaPasta referenced this in commit 0e81e96d4d on Jul 1, 2021

PastaPastaPasta referenced this in commit 77b76ef0a5 on Jul 3, 2021

MarcoFalke locked this on Sep 8, 2021

Bugfix: RPC: savemempool: Don’t save until LoadMempool() is finished #12172