This PR fixes an early relocking race condition from #18811 where old relock callback runs after new wallet unlock code and nRelockTime update but before rpcRunLater call, causing early relock and incorrect nRelockTime time
rpc: Relock wallet only if most recent callback #18814
pull promag wants to merge 2 commits into bitcoin:master from promag:2020-04-rpc-wallet-relock changing 3 files +14 −4-
promag commented at 9:48 am on April 29, 2020: member
-
fanquake added the label RPC/REST/ZMQ on Apr 29, 2020
-
rpc: Add mutex to guard deadlineTimers a2e6db5c4f
-
promag force-pushed on Apr 29, 2020
-
DrahtBot commented at 10:56 am on April 29, 2020: member
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #18452 (qt: Fix shutdown when waitfor* cmds are called from RPC console by hebasto)
- #18354 (Protect wallet by using shared pointers by bvbfan)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
-
in src/wallet/rpcwallet.cpp:1968 in 5a0486cd15 outdated
1958@@ -1957,9 +1959,11 @@ static UniValue walletpassphrase(const JSONRPCRequest& request) 1959 // wallet before the following callback is called. If a valid shared pointer 1960 // is acquired in the callback then the wallet is still loaded. 1961 std::weak_ptr<CWallet> weak_wallet = wallet; 1962- pwallet->chain().rpcRunLater(strprintf("lockwallet(%s)", pwallet->GetName()), [weak_wallet] { 1963+ pwallet->chain().rpcRunLater(strprintf("lockwallet(%s)", pwallet->GetName()), [weak_wallet, relock_time] { 1964 if (auto shared_wallet = weak_wallet.lock()) { 1965 LOCK(shared_wallet->cs_wallet); 1966+ // Skip if this is not the most recent rpcRunLater callback. 1967+ if (shared_wallet->nRelockTime != relock_time) return;
ryanofsky commented at 12:04 pm on April 29, 2020:I think this could introduce a bug where wallet never relocks.
In case where walletpassphrase is called from two threads, first thread sets
pwallet->nRelockTime
and is interrupted after constructing lambda but before invoking rpcRunLater. Second thread setspwallet->nRelockTime
to a different value and callsrpcRunLater
, and then first thread wakes up and callsrpcRunLater
.Lambda from second thread never runs because lambda from first thread replaced it. But lambda from first thread has the wrong
relock_time
value so never relocks the wallet.Simplest fix might be adding relock time to timer names “lockwallet(%s,%i)” above to make them unique
promag commented at 12:29 pm on April 29, 2020:That would make timers list grow - they are never removed, just replaced.
promag commented at 9:09 pm on April 29, 2020:Added a mutex to prevent concurrent calls with the same wallet. The lock is held during walletUnlock
andRPCRunLater
so the case you describe can’t happen.
promag commented at 10:30 pm on May 3, 2020:Nice catch BTW.MarcoFalke commented at 12:22 pm on April 29, 2020: memberWould be nice to write a test for thispromag commented at 12:38 pm on April 29, 2020: memberThanks for reviewing @ryanofsky. I’ll address that case. @MarcoFalke probably a unit test no?MarcoFalke commented at 12:56 pm on April 29, 2020: memberYes, I haven’t looked at the code changes, but I assume some kind of unit test makes sense.MarcoFalke commented at 1:17 pm on April 29, 2020: memberAlso, this needs backport, right?MarcoFalke added this to the milestone 0.20.0 on Apr 29, 2020MarcoFalke added the label Needs backport (0.20) on Apr 29, 2020promag force-pushed on Apr 29, 2020ryanofsky approvedryanofsky commented at 9:52 pm on April 29, 2020: memberCode review ACK b0f1e19f82d725dbefb4b0ac69e4d74f2deb1d44. I think all the problems we’ve seen so far are avoided:
- Original deadlock from #18487 where rpcRunLater blocks waiting for old relock callback in timer thread which is deadlocked waiting for cs_wallet
- Early relocking race condition from #18811 where old relock callback runs after new wallet unlock code and nRelockTime update but before rpcRunLater call, causing early relock and incorrect nRelockTime time
- Failure to relock race condition in 5a0486cd15c26e5e587f56f331dec3a7756407b2 where two relock callbacks could each cancel the other out without either of them relocking
Everything would be simpler if rpcRunLater would just activate the new callback after the old callback if the old callback happened to be running, instead of blocking waiting for the old callback to finish. Then walletpassphrase could go back to using cs_wallet and multiple mutexes wouldn’t be required. But I think the approach in this PR should work too and is probably a smaller change on top of existing code.
Only additional suggestion would be to write a PR description that summarizes what this is fixing without having to read the other thread. A unit test could be a good idea, but might be difficult to write, so I don’t think it would be crazy to merge this without a new test if it’s difficult to come up with one. Especially since the bug this is fixing was caught by a python test.
promag commented at 9:58 pm on April 29, 2020: memberA unit test could be a good idea, but might be difficult to write
FWIW I’m still trying this.
promag commented at 10:11 pm on April 29, 2020: memberinstead of blocking waiting for the old callback to finish.
BTW If we bump minimum libevent then we could use this https://github.com/libevent/libevent/blob/29cc8386a2f7911eaa9336692a2c5544d8b4734f/whatsnew-2.1.txt#L248-L265
MarcoFalke commented at 10:30 pm on April 29, 2020: memberBumping libevent should be discussed separately. See also #18676ryanofsky commented at 10:37 pm on April 29, 2020: memberAlso, I haven’t looked but I doubt there’s a need for a special libevent feature to improve rpcRunLater. We should just need an extra bit of code at the end of the existing libevent callback to register a new callback if there’s one pendingpromag commented at 10:34 pm on May 3, 2020: memberFWIW I’m still trying this.
And I quit. Tried with unit test but it required to start HTTP/RPC server because of libevent and timer stuff. Then tried funcional test but can’t figure how to hit the problem. @ryanofsky I’ll borrow your comment above to improve PR description.
ryanofsky commented at 0:38 am on May 7, 2020: memberThis still looks good but it might be simpler to squash second and third commits to avoid having a bug in the second commit which is fixed by the third commit.
This would simplify the PR description so you could drop the paragraph “Failure to relock race condition in 59a50e7” about the temporary bug.
rpc: Relock wallet only if most recent callback 9f59dde974promag force-pushed on May 7, 2020promag commented at 0:45 am on May 7, 2020: memberDone.ryanofsky commented at 1:04 am on May 7, 2020: memberCode review ACK 9f59dde9740d065118bdddde75ef9f4e4603a7b1. No changes since last review except squashing commits.
Two other thoughts:
-
It might be nice to add a comment for
RPCRunLater
that mentions the risk of deadlocks:0/** 1 * Run func nSeconds from now. 2 * Overrides previous timer <name> (if any). 3 * 4 * If previous timer has started executing, this will block waiting for the call 5 * to return. Therefore, to avoid deadlocks, RPCRunLater can't be called while 6 * holding locks that a previous timer might be waiting for. 7 * 8 * [@todo](/bitcoin-bitcoin/contributor/todo/) Make implementation nonblocking to avoid risk of deadlocks and simplify 9 * usage. 10 */ 11void RPCRunLater(const std::string& name, std::function<void()> func, int64_t nSeconds);
- This is an idea for a new PR, but maybe there should be an RPC function to return a timer’s scheduled time so we could drop
CWallet::nRelockTime
MarcoFalke commented at 2:45 pm on May 11, 2020: memberACK 9f59dde9740d065118bdddde75ef9f4e4603a7b1jonatack commented at 2:32 pm on May 12, 2020: memberACK 9f59dde9740d065118bdddde75ef
I like @ryanofsky’s suggestions in #18814 (comment).
fanquake merged this on May 13, 2020fanquake closed this on May 13, 2020
promag deleted the branch on May 13, 2020promag commented at 10:01 am on May 13, 2020: member@ryanofsky honestly I’d suggest to avoid using it. Instead of timeout what I think would be best is to exclusively “bind” the unlock to the current RPC connection. In the GUI we could have a timer. I don’t see other uses of the timer in the server.sidhujag referenced this in commit c7204ec263 on May 14, 2020fanquake referenced this in commit 0342f4ad67 on May 14, 2020fanquake referenced this in commit 2e31113f8c on May 14, 2020fanquake removed the label Needs backport (0.20) on May 14, 2020fanquake referenced this in commit ca4dac48c5 on May 15, 2020fanquake referenced this in commit 251e321ad7 on May 15, 2020MarcoFalke referenced this in commit 17bdf2afae on May 15, 2020Fabcien referenced this in commit afc4263288 on Jan 6, 2021backpacker69 referenced this in commit 6adf27d814 on Mar 28, 2021backpacker69 referenced this in commit f75b8d27c0 on Mar 28, 2021DrahtBot locked this on Feb 15, 2022
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 09:12 UTC
More mirrored repositories can be found on mirror.b10c.me