Call wallet notify callbacks in scheduler thread (without cs

TheBlueMatt commented at 12:59 AM on April 27, 2017: member

Based on #10179, this effectively reverts #9583, regaining most of the original speedups of #7946.

This concludes the work of #9725, #10178, and #10179.

See individual commit messages for more information.

TheBlueMatt force-pushed on Apr 27, 2017

in src/txmempool.h:518 in 1b6b0c9911 outdated

 510 | @@ -511,6 +511,9 @@ class CTxMemPool
 511 |      // to track size/count of descendant transactions.  First version of
 512 |      // addUnchecked can be used to have it call CalculateMemPoolAncestors(), and
 513 |      // then invoke the second version.
 514 | +    // Note that addUnchecked is ONLY called from ATMP during normal operation,
 515 | +    // and any other callers may break wallet's in-mempool tracking (due to
 516 | +    // lack of CValidationInterface::TransactionAddedToMempool callbacks).

ryanofsky commented at 9:01 PM on May 2, 2017:

In commit "Add a CValidationInterface::TransactionRemovedFromMempool"

What does this imply? Just that if there are any new calls to addUnchecked, the caller also needs to signal TransactionAddedToMempool not to break the wallet? Would say this in the comment explicitly if this is the case.

TheBlueMatt commented at 7:20 PM on May 3, 2017:

Updated the comment to mention that addUnchecked is only called from ATMP outside of tests period. I think the implication is that we need to fix the strong-coupling here.

in src/txmempool.h:514 in 1b6b0c9911 outdated

 510 | @@ -511,6 +511,9 @@ class CTxMemPool
 511 |      // to track size/count of descendant transactions.  First version of
 512 |      // addUnchecked can be used to have it call CalculateMemPoolAncestors(), and
 513 |      // then invoke the second version.
 514 | +    // Note that addUnchecked is ONLY called from ATMP during normal operation,

ryanofsky commented at 9:05 PM on May 2, 2017:

In commit "Add a CValidationInterface::TransactionRemovedFromMempool"

Unclear to me what a normal operation is. Comment might be clearer mentioning a not normal counterexample.

in src/wallet/wallet.cpp:1189 in f651d58a3b outdated

1184 | +            TRY_LOCK(cs_main, mainLocked);
1185 | +            if (mainLocked) {
1186 | +                if (this->lastBlockProcessed == chainActive.Tip()) {
1187 | +                    return true;
1188 | +                }
1189 | +                // If the user called invalidatechain some things might block

ryanofsky commented at 9:16 PM on May 2, 2017:

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

Does "some things might block forever" just mean this wait might block forever? If so, maybe be more concrete and say something like "lastBlockProcessed will not be rewound back to chainActive.Tip()." Otherwise it would be good to clarify what some things is referring to.

in src/wallet/wallet.h:1125 in f651d58a3b outdated

1116 | @@ -1108,6 +1117,14 @@ class CWallet : public CCryptoKeyStore, public CValidationInterface
1117 |         If possibleOldChain is provided, the parameters from the old chain (version)
1118 |         will be preserved. */
1119 |      bool SetHDMasterKey(const CPubKey& key, CHDChain *possibleOldChain = nullptr);
1120 | +
1121 | +    /**
1122 | +     * Blocks until the wallet state is up-to-date to /at least/ the current
1123 | +     * chain at the time this function is entered
1124 | +     * Obviously holding cs_main/cs_wallet when going into this call may cause
1125 | +	 * deadlock

ryanofsky commented at 9:17 PM on May 2, 2017:

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

Stray tab here

in src/wallet/rpcwallet.cpp:3005 in ec4df8c633 outdated

2711 | @@ -2648,6 +2712,10 @@ UniValue fundrawtransaction(const JSONRPCRequest& request)
2712 |  
2713 |      RPCTypeCheck(request.params, boost::assign::list_of(UniValue::VSTR));
2714 |  
2715 | +    // Make sure the results are valid at least up to the most recent block

ryanofsky commented at 9:30 PM on May 2, 2017:

In commit "Add calls to CWallet::BlockUntilSyncedToCurrentChain() in RPCs"

Can you give an example of specific bug that could occur without these BlockUntilSynced calls and is prevented by adding them? I looked at some of the old issues (#9584, #9148, etc), but they're confusing and I don't know how much of the information is up to date.

It would be great if BlockUntilSyncedToCurrentChain had a comment that made it clearer when it does and doesn't need to be called, and what consistency issues it is and isn't supposed to solve.

Maybe there should also be a bullet point in the new RPC interface guidelines about what kind of consistency wallet RPCs are expected to have.

in test/functional/zmq_test.py:83 in 6fb571977d outdated

  66 | +            blkhash = bytes_to_hex_str(body)
  67 | +        else:
  68 | +            assert_equal(topic, b"hashtx")
  69 |  
  70 |          msg = self.zmqSubSocket.recv_multipart()
  71 |          topic = msg[0]

ryanofsky commented at 9:52 PM on May 2, 2017:

In commit "Fix zmq tests now that txn/blocks are unordered"

Maybe assert msg[0] != topic above this line to confirm actually receive distinct hashtx and hashblock messages (not two hashblocks).

ryanofsky commented at 10:06 PM on May 2, 2017: member

utACK 6fb571977d9cc41793a594688e5071dd5bbd864d.

Code changes mostly seem great, though as you can tell from my comments I have a somewhat hazy understanding of the semantics and assumptions being made. A little more documentation would make everything clear, I think.

TheBlueMatt force-pushed on May 3, 2017

TheBlueMatt commented at 7:20 PM on May 3, 2017: member

Rebased and fixed @ryanofsky's mostly-comment nits :).

in src/qt/test/rpcnestedtests.cpp:12 in a14fa884f7 outdated

  11 | @@ -12,6 +12,7 @@
  12 |  #include "rpc/server.h"

mchrostowski commented at 7:30 PM on May 3, 2017:

These changes do not appear to be related to the rest. Am I missing something or should this be in its own PR? I believe @sipa made a similar comment on #10179

TheBlueMatt commented at 1:18 AM on May 4, 2017:

Without them test_bitcoin-qt segfaults.

in src/validationinterface.cpp:25 in f481ce1109 outdated

  20 | +    boost::signals2::signal<void (const CBlockIndex *, const std::shared_ptr<const CBlock>&)> NewPoWValidBlock;
  21 | +};
  22 | +
  23 |  static CMainSignals g_signals;
  24 |  
  25 | +CMainSignals::CMainSignals() {

mchrostowski commented at 7:33 PM on May 3, 2017:

This would be safer/faster/cleaner with : internals(new CMainSignalsInstance()) {} instead of the body.

Initializer lists guarantee proper cross-thread visibility, otherwise you might init twice and have sharing issues.

TheBlueMatt commented at 1:35 AM on May 4, 2017:

Does that compile? CMainSignalsInstance() is not defined at that time, only declared.

mchrostowski commented at 6:00 PM on May 4, 2017:

Since it is only a declaration I believe it should have been fine. I see this code isn't present in the final commit so I suppose it doesn't matter either way.

TheBlueMatt commented at 6:58 PM on May 4, 2017:

Ahh, yes, since the rebase on the latest version of #10179 this code no longer exists, as the scheduler has to be passed into the creation of the internals object.

in src/scheduler.h:44 in d923d41826 outdated

  42 | @@ -43,6 +43,9 @@ class CScheduler
  43 |      // Call func at/after time t
  44 |      void schedule(Function f, boost::chrono::system_clock::time_point t);

mchrostowski commented at 8:03 PM on May 3, 2017:

You can accomplish this entire commit by changing this line to void schedule(Function f, boost::chrono::system_clock::time_point t = boost::chrono::system_clock::now());

TheBlueMatt commented at 1:37 AM on May 4, 2017:

Took this on #10179, will be here when I next rebase.

in src/validationinterface.cpp:41 in 8daf243979 outdated

  36 | +    bool fCallbacksRunning = false;
  37 | +
  38 | +    void MaybeScheduleProcessQueue() {
  39 | +        {
  40 | +            LOCK(cs_callbacksPending);
  41 | +            // Try to avoid scheduling too many copies here, but if we

mchrostowski commented at 8:13 PM on May 3, 2017:

This comment and issue can be avoided entirely if you move line 56 up to 46. After that lines 54 and 55 (which will be 55 and 56) can be removed.

ryanofsky commented at 1:04 AM on May 4, 2017:

This comment and issue can be avoided entirely if you move line 56 up to 46. After that lines 54 and 55 (which will be 55 and 56) can be removed.

I think this is right (line numbers apply to commit 8daf2439796dfdee41c1a32787e0ec9726daf6be). It also seems like you could eliminate the fCallbacksRunning variable if you change ProcessQueue to call pop_front after running the callback and condition the AddToProcessQueue schedule() call on the queue being previously empty.

TheBlueMatt commented at 1:23 AM on May 4, 2017:

Looks like you commented on an outdated version and github wont show me full context, so I have no idea what those line numbers refer to :/

mchrostowski commented at 7:39 PM on May 4, 2017:

@ryanofsky I believe the point is to avoid duplicate calls to the scheduler since it may be multi-threaded. So a call to AddToProcessQueue should not schedule() anything if we're already scheduled; it should only schedule() if our previously scheduled function has completed execution (at least beyond the point of it calling schedule() again). Can't see implementing that without knowing if fCallbacksRunning is true.

Still, the code does not guarantee single threaded execution in its current form due to the fCallbacksRunning state being set AFTER the call to schedule(). Being that all the locks are placed appropriately we might as well make this guarantee or else it seems like a bug because the SingleThreadedClient won't be.

I'll see if I can't get a test showing this behavior.

ryanofsky commented at 7:53 PM on May 4, 2017:

@ryanofsky I believe the point is to avoid duplicate calls to the scheduler since it may be multi-threaded.

I know, this is why the second half of my suggestion was "condition the AddToProcessQueue schedule() call on the queue being previously empty." Anyway, I don't think Matt's particularly interested in these simplifications, and it's easier to communicate these changes as patches rather than english descriptions, so I'd rather just leave any simplifications to followup PRs.

Still, the code does not guarantee single threaded execution in its current form due to the fCallbacksRunning state being set AFTER the call to schedule()

The reason it works in its current form is because of the if (fCallbacksRunning) return; line at the top of ProcessQueue()

Again, I don't think the code in it's current form is the simplest it could be, but it seems safe and easy to clean up later in a followup PR. Also this whole discussion really should be moved to #10179. #10286 is only building on the changes in #10179.

mchrostowski commented at 9:02 PM on May 4, 2017:

@ryanofsky I see that now, the extra check does prevent the execution.

Being that this is new code I wouldn't call it a simplification. Here's a patch of the proposed change, less logic with the same function: scheduler.patch.txt

which reads better if you rename fCallbacksRunning to fCallbacksScheduled and this patch: scheduler.patch2.txt which can be argued reduces code reuse but I think the readability is improved.

mchrostowski commented at 9:05 PM on May 4, 2017:

@TheBlueMatt If you're open to these changes in a PR to your branch I can do that, I assume they'll be squashed so either way works.

TheBlueMatt commented at 9:20 PM on May 4, 2017:

@mchrostowski hmm, really, I find that it decreases readability (though that may be NIH). It looks harder to reason about whether some callbacks might accidentally get missed to me.

(Other random note, we dont use tabs in our codebase, which your patch added).

mchrostowski commented at 9:42 PM on May 4, 2017:

@TheBlueMatt Well, in that case I feel like either patch gets funky, especially since the use of fCallbacksRunning becomes inconsistent if you apply the first patch without the second (unless some alternative name for fCallbacksRunning works).

The extra safety check and inconsistency of scheduling bothers me but I wouldn't expect it to actually cause issues so I have no grounds for objection.

I think my inquiry stemmed from it not being immediately apparent that ProcessQueue() only runs once and the extra check is just an extra check. Perhaps the "not a big deal" part of the comment could be "because ProcessQueue() already checks" for clarity, I would not have looked so deeply into the code except that I thought "not a big deal" meant "sometimes interweaving calls is okay."

in src/validationinterface.cpp:62 in 8daf243979 outdated

  57 | +
  58 | +            callback = callbacksPending.front();
  59 | +            callbacksPending.pop_front();
  60 | +        }
  61 | +
  62 | +        // RAII the setting of fCallbacksRunning and calling MaybeScheduleProcessQueue

mchrostowski commented at 8:26 PM on May 3, 2017:

RAII is great and all but exists for the acquisition of resources. Why not try{} catch{}? try { callback(); } catch(...) { { LOCK(cs_callbacksPending); fCallbacksRunning = false; } MaybeScheduleProcessQueue(); }

ryanofsky commented at 1:13 AM on May 4, 2017:

Why not try{} catch{}?

My guess about this was that it allows the processqueue to take advantage of whatever error handling or reporting cscheduler provides, and to not have to repeat the finalization logic both inside and after the catch clause. Either approach seems fine to me, though.

TheBlueMatt commented at 1:23 AM on May 4, 2017:

@laanwj previously requested that any exceptions be thrown all the way up, so this was an easier way to do that. That request seemed reasonable.

mchrostowski commented at 7:01 PM on May 4, 2017:

Understood, makes perfect sense. Didn't occur to me we'd have to duplicate the logic, spoiled by finally.

in src/validationinterface.cpp:72 in 8daf243979 outdated

  67 | +            ~RAIICallbacksRunning() {
  68 | +                {
  69 | +                    LOCK(instance->cs_callbacksPending);
  70 | +                    instance->fCallbacksRunning = false;
  71 | +                }
  72 | +                instance->MaybeScheduleProcessQueue();

mchrostowski commented at 8:35 PM on May 3, 2017:

Is this really what we want? A scheduler call for each callback? It does prevent starving any other scheduled tasks in case of a long queue, but it also generates a lot of lock contention which can be a performance killer.

Unless there is evidence of this queue processing messing with other scheduling I feel strongly we should avoid this design. It will be much harder to detect performance issues from lock contention than performance issues from the processing of a long queue.

I recommend replacing line 45 with a size() query, -- the size (to account for your pop), and put the entire thing in a do {} while (size > 0);. You can then avoid calling MaybeScheduleProcessQueue() inside of ProcessQueue() itself.

ryanofsky commented at 1:20 AM on May 4, 2017:

Is this really what we want? A scheduler call for each callback?

I'm guessing the work done in any of these callbacks far outweighs the cost of scheduling one of them but I could be wrong.

TheBlueMatt commented at 1:25 AM on May 4, 2017:

Right now we only have one scheduler thread, so there should be limited lock contention, not to mention anything running is gonna take much longer than any locks held (locks are only held to push/pop/whatever, which should be very, very fast).

mchrostowski commented at 7:20 PM on May 4, 2017:

Okay, sounds like blocking/starving is the biggest risk. Can't disagree with that.

Feels like a bit of wasted work to call schedule on things you know are intended to execute now, but that's just a performance (not important) concern. A bit surprising there is only one scheduler thread when the scheduler counts threads, but indeed it is only one.

in src/scheduler.h:90 in 0703075a71 outdated

  88 | + * Class used by CScheduler clients which may schedule multiple jobs
  89 | + * which are required to be run serially. Does not require such jobs
  90 | + * to be executed on the same thread, but no two jobs will be executed
  91 | + * at the same time.
  92 | + */
  93 | +class CSingleThreadedSchedulerClient {

mchrostowski commented at 8:40 PM on May 3, 2017:

This is no longer a scheduler. It has one public method, void AddToProcessQueue(std::function<void (void)> func);, which does not take any 'schedule' information.

This class is neat, more of a SingleThreadedExecutor that happens to use a scheduler to execute. Really its treating the scheduler as a thread pool.

I'm all for keeping this if it's not named 'scheduler' and if a thread pool abstraction can be extracted from CScheduler then both this class and CScheduler can use that pool for execution. Also to consider, is this used anywhere else yet or is it expected to be used?

TheBlueMatt commented at 1:39 AM on May 4, 2017:

Its unlikely to be used elsewhere, but as it is more intimate with the CScheduler than the validation interface, it was abstracted out and put here. We should probably tweak up how it all works in a later PR (as we move off of the big boost threadGroup in init), but for now I'll leave it.

in src/wallet/wallet.h:1129 in bc3f415e09 outdated

1112 | @@ -1104,6 +1113,14 @@ class CWallet : public CCryptoKeyStore, public CValidationInterface
1113 |         caller must ensure the current wallet version is correct before calling
1114 |         this function). */
1115 |      bool SetHDMasterKey(const CPubKey& key);
1116 | +
1117 | +    /**
1118 | +     * Blocks until the wallet state is up-to-date to /at least/ the current
1119 | +     * chain at the time this function is entered
1120 | +     * Obviously holding cs_main/cs_wallet when going into this call may cause

mchrostowski commented at 8:46 PM on May 3, 2017:

Odd comment, "Obviously... may..." is concerning enough that we should have a comment explaining how to avoid a deadlock rather than this vagueness or perhaps removing the statement altogether?

TheBlueMatt commented at 1:31 AM on May 4, 2017:

Any suggestions? The statement indicates that "holding cs_main/cs_wallet may cause deadlock", this is true, deadlock is not guaranteed, but may appear, thus you should obviously never call with cs_main or cs_wallet held.

mchrostowski commented at 8:00 PM on May 4, 2017:

I'd just drop "Obviously" now that I'm more familiar with the method.

in src/wallet/wallet.cpp:1265 in bc3f415e09 outdated

1152 | @@ -1147,6 +1153,50 @@ void CWallet::BlockDisconnected(const std::shared_ptr<const CBlock>& pblock) {
1153 |  
1154 |  
1155 |  
1156 | +void CWallet::BlockUntilSyncedToCurrentChain() {

mchrostowski commented at 8:55 PM on May 3, 2017:

This method is concerning. It may be that it is being used in a safe manner but the method itself is quite dangerous. Preliminary observation suggests this can be called from both the command line RPC and JSON RPC at the same time but I don't know how true this is.

Calling it from two different threads appears to be not okay, so it is "Not thread safe" and should likely be labeled as such (though I don't see this as a standard in the project codebase so maybe that's going a bit far).

TheBlueMatt commented at 1:30 AM on May 4, 2017:

I'm not sure how it is not thread safe? It blocks the current thread, not any other work.

mchrostowski commented at 8:06 PM on May 4, 2017:

I thought unsafe due to the "AssertLockNotHeld" that I misunderstood. My above comment is totally wrong and can be disregarded.

in src/wallet/wallet.cpp:1267 in bc3f415e09 outdated

1158 | +    {
1159 | +        LOCK(cs_main);
1160 | +        initialChainTip = chainActive.Tip();
1161 | +    }
1162 | +    AssertLockNotHeld(cs_main);
1163 | +    AssertLockNotHeld(cs_wallet);

mchrostowski commented at 9:11 PM on May 3, 2017:

If these assertions need to be held for this method to execute correctly then the method cannot be thread safe as itself being called twice, in two threads, is enough to cause a failure.

TheBlueMatt commented at 1:29 AM on May 4, 2017:

The AssertLockNotHeld call only fails if the current thread holds the lock, not any thread.

mchrostowski commented at 8:03 PM on May 4, 2017:

boost::thread_specific_ptr<LockStack> lockstack; Didn't see that, makes perfect sense then.

in src/validationinterface.cpp:107 in b9acf32eb6 outdated

 103 | @@ -104,7 +104,9 @@ void UnregisterAllValidationInterfaces() {
 104 |  
 105 |  void CMainSignals::MempoolEntryRemoved(CTransactionRef ptx, MemPoolRemovalReason reason) {
 106 |      if (reason != MemPoolRemovalReason::BLOCK && reason != MemPoolRemovalReason::CONFLICT) {
 107 | -        internals->TransactionRemovedFromMempool(ptx);
 108 | +        internals->schedulerClient.AddToProcessQueue([ptx, this] {

mchrostowski commented at 9:24 PM on May 3, 2017:

This is a fine solution. All these lock inversion concerns make me wonder if there isn't a more serious issue regarding lack of proper encapsulation with some of these locks. I'm sure global locks (cs_main) don't help either, can't imagine actually needing a global lock.

TheBlueMatt commented at 1:19 AM on May 4, 2017:

This is being run in a background thread, so there are no possible lock inversions? We've had great success with DEBUG_LOCKORDER and havent had any serious deadlock issues afair since the 0.3.X era.

mchrostowski changes_requested

mchrostowski commented at 11:16 PM on May 3, 2017: contributor

Overall appears to be on a good track. It looks to me like the global lock (cs_main) is causing some serious confusion/issues and there is some general mistake in the pattern of locks or their encapsulation that makes this all difficult.

I reviewed everything pretty closely aside from the ZMQ test changes, that went over my head.

Please don't overlook the outdated validationinterface.cpp comments, those took some time to put together.

in doc/developer-notes.md:574 in 943217460b outdated

 567 | @@ -568,3 +568,16 @@ A few guidelines for introducing and reviewing new RPC interfaces:
 568 |    - *Rationale*: as well as complicating the implementation and interfering
 569 |      with the introduction of multi-wallet, wallet and non-wallet code should be
 570 |      separated to avoid introducing circular dependencies between code units.
 571 | +
 572 | +- Wallet RPCs call BlockUntilSyncedToCurrentChain to maintain consistency with
 573 | +  `getblockchaininfo`'s state immediately prior to the call's execution. Wallet
 574 | +  RPCs who's behavior does *not* depend on the current chainstate may omit this

ryanofsky commented at 8:05 PM on May 4, 2017:

In commit "Add a dev notes document describing the new wallet RPC blocking"

s/who's/whose

in doc/developer-notes.md:580 in 943217460b outdated

 575 | +  call.
 576 | +
 577 | +  - *Rationale*: In previous versions of Bitcoin Core, the wallet was always
 578 | +    in-sync with the chainstate (by virtue of them all being updated in the
 579 | +    same cs_main lock). In order to maintain the behavior that wallet RPCs
 580 | +    return restults as of at least the highest best-known-block an RPC

ryanofsky commented at 8:06 PM on May 4, 2017:

In commit "Add a dev notes document describing the new wallet RPC blocking"

s/restults/results, s/best-known-block/best-known block,/

mchrostowski commented at 8:13 PM on May 4, 2017: contributor

So far looks good to me, I'm going to poke around wallet<->blockchain interaction so I can better understand the wallet.cpp changes you made before I comment further.

That said I feel like there is something fundamentally wrong with the interaction between CWallet and the blockchain (I don't even know where that code lives yet). This feels like a solution to current issues but I would hope a refactor prevents needing such frequently occurring checks and thread specific execution.

That said, it would be interesting if a single "blockchain operations" thread is needed to get the code working in a reasonable manner. GUI systems tend to need such threads because of the nature of problem which is events coming from both ends of the system (rendering thread vs input thread) which would have to, in any logical design, invert lock ordering or else excessively block.

ryanofsky commented at 8:16 PM on May 4, 2017: member

utACK 943217460bc527c4003868415c264e4a77a6e55a.

Changes since previous ACK were rebasing, adding developer notes, tweaking some comments, adding a check to the zmq test

TheBlueMatt commented at 8:25 PM on May 4, 2017: member

@mchrostowski thanks for the review! Generally, wallet and blockchain (essentially validation.cpp's stuff) have historically been pretty tightly coupled (updated all under the same cs_main lock). This PR is a step, however small, towards decoupling that a bit. Because the wallet still relies on "is it in our mempool?" as a proxy for "is this possibly going to get confirmed/is it spendable with the result making it into my mempool", the chainstate and mempool still need to be updated in-sync and the wallet notified of the updates in a single notification, which come in in the order they happened. Still, this is much better than the wallet actually querying the mempool/validation logic directly instead of tracking the stuff it cares about out of them.

TheBlueMatt force-pushed on May 4, 2017

TheBlueMatt commented at 8:55 PM on May 4, 2017: member

Rebased on latest #10179, current master, and fixed @ryanofsky's english corrections.

TheBlueMatt force-pushed on May 5, 2017

ryanofsky commented at 9:00 PM on May 5, 2017: member

utACK 2c306d7876fb57ff26d217f97415a79942094002. Changes since previous were some documentation tweaks and new block calls in "Add calls to CWallet::BlockUntilSyncedToCurrentChain()", along with the rebase.

mchrostowski commented at 10:12 PM on May 13, 2017: contributor

@TheBlueMatt The stated purpose of this PR is to reduce locking on cs_main so as to reducing code coupling. I see one change in this PR that actually deletes a LOCK(cs_main) which is in CWallet::InMempool(). This looks like a step in the right direction.

That said, the remaining changes seem to be all about getting the signals into a background thread. What does this gain us for decoupling?

If the purpose is to not hold cs_main from whatever call sites hit GetMainSignals() then I don't see a benefit in using new threads when we can release the lock before making the call. That is, running in another thread is not decoupling synchronization or interface dependencies. The goal of "move wallet updates out of cs_main into a background thread" seems unrelated to decoupling because "using a background thread" and "not holding cs_main" are not dependent on each other, at least in the cases I observed.

Of particular concern is the re-ordering of calls, which you're avoiding with the single threaded scheduler, but if this isn't required for decoupling then it is just added complexity and overhead.

ryanofsky commented at 4:14 PM on May 30, 2017: member

This needs rebase due to a minor conflict in listunspent.

in src/wallet/wallet.cpp:1189 in 6cbe04c9cf outdated

1184 | +                    return true;
1185 | +                }
1186 | +                // If the user called invalidateblock our wait here might block
1187 | +                // forever, so we check if we're ahead of the tip (a state
1188 | +                // which should otherwise never be exposed outside of validation)
1189 | +                return this->lastBlockProcessed->nChainWork > chainActive.Tip()->nChainWork;

ryanofsky commented at 8:00 PM on May 31, 2017:

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

I don't understand why we would wait forever without this check. Does InvalidateBlock not trigger notifications that would lead to lastBlockProcessed being updated? And if it doesn't, shouldn't this just be fixed so the right notifications are sent?

in src/wallet/wallet.cpp:1181 in 6cbe04c9cf outdated

1176 | +            // moved past initialChainTip through a reorg before we could get
1177 | +            // lastBlockProcessedMutex.
1178 | +            // This should be exceedingly rare in regular usage, so potentially
1179 | +            // eating 100ms to retry this lock should be fine (not TRY_LOCKing
1180 | +            // here would be a lock inversion against lastBlockProcessedMutex)
1181 | +            TRY_LOCK(cs_main, mainLocked);

ryanofsky commented at 8:05 PM on May 31, 2017:

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

Maybe consider dropping the try-lock and replacing it with lastBlockProcessedMutex.unlock(); LOCK(cs_main); lastBlockProcessedMutex.lock();. Maybe this would be a little slower in the average case where this code runs (which is rare to begin with), but it would avoid the 100ms worst case, and make the code simpler because you could also replace the while loop and timeout with a plain cv.wait(lock, pred) call.

TheBlueMatt force-pushed on Jun 8, 2017

TheBlueMatt commented at 3:32 PM on June 8, 2017: member

Rebased and rewrote CWallet::BlockUntilSyncedToCurrentChain(). Instead of the complicated fallback logic, it now just tests if it is caught up, and if it is not, it puts a callback into the CValidationInterface queue and waits for it to trigger. I wanted to avoid having this function previously, but I ended up needing it for a different branch which moves more CValidationInterface callbacks to the background and the logic is so simple, that I went ahead with it.

in src/validationinterface.cpp:106 in 09c604afe7 outdated

 101 | @@ -102,6 +102,10 @@ void UnregisterAllValidationInterfaces() {
 102 |      g_signals.m_internals->NewPoWValidBlock.disconnect_all_slots();
 103 |  }
 104 |  
 105 | +void CallFunctionInValidationInterfaceQueue(std::function<void ()> func) {
 106 | +    g_signals.m_internals->m_schedulerClient.AddToProcessQueue(func);

ryanofsky commented at 8:40 PM on June 8, 2017:

In commit "Add CallFunctionInQueue to wait on validation interface queue drain"

Would std::move(func)

in src/wallet/wallet.cpp:1169 in 5302405652 outdated

1164 | +        // Skip the queue-draining stuff if we know we're caught up with
1165 | +        // chainActive.Tip()
1166 | +        LOCK(cs_main);
1167 | +        const CBlockIndex* initialChainTip = chainActive.Tip();
1168 | +
1169 | +        if (m_last_block_processed == initialChainTip) {

ryanofsky commented at 8:48 PM on June 8, 2017:

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

Maybe drop this check. Seems to be a special case of the check below which isn't actually more expensive.

in src/wallet/wallet.cpp:1177 in 5302405652 outdated

1172 | +        if (m_last_block_processed->GetAncestor(initialChainTip->nHeight) == initialChainTip) {
1173 | +            return;
1174 | +        }
1175 | +    }
1176 | +
1177 | +    std::condition_variable callbacks_done_cv;

ryanofsky commented at 9:03 PM on June 8, 2017:

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

I think all the mutex/cv/lambda/looping stuff below could be replaced by:

std::promise<void> promise;
CallFunctionInValidationInterfaceQueue([&promise]() { promise.set_value(); });
promise.get_future().wait();

ryanofsky commented at 9:06 PM on June 8, 2017: member

utACK 91aad9fa3b33ae387145e4b84a14c5f61cbd2494. Changes since last review were rebase, style guide fixes, BlockUntilSyncedToCurrentChain rewrite.

TheBlueMatt force-pushed on Jun 9, 2017

ryanofsky commented at 5:23 PM on June 13, 2017: member

utACK ae0c83326bdae2b61c1bcffd122f78468aa2aefb. Same as previous except the last 3 review suggestions are now incorporated. There is a minor conflict now with master, but since this depends on #10179 anyway, there's probably no hurry to rebase.

in src/rpc/rawtransaction.cpp:890 in ef52453c96 outdated

 885 | +            // If wallet is enabled, ensure that the wallet has been made aware
 886 | +            // of the new transaction prior to returning. This prevents a race
 887 | +            // where a user might call sendrawtransaction with a transaction
 888 | +            // to/from their wallet, immediately call some wallet RPC, and get
 889 | +            // a stale result because callbacks have not yet been processed.
 890 | +            pwalletMain->TransactionAddedToMempool(tx);

ryanofsky commented at 7:53 PM on June 19, 2017:

In commit "Fix wallet RPC race by informing wallet of tx in sendrawtransaction"

Seems like this will result in the wallet getting two TransactionAddedToMempool notifications, which is fine but might be worth noting in the comment.

Also, not asking for this change, but would another way to do this without referencing the wallet here be to release cs_main and then wait for the other notification to be processed? (Maybe using CallFunctionInValidationInterfaceQueue like in BlockUntilSyncedToCurrentChain?)

TheBlueMatt commented at 1:22 AM on June 21, 2017:

Yes, much better to CallFunctionInValidatioInterface. Done.

TheBlueMatt force-pushed on Jun 21, 2017

in src/sync.cpp:161 in 176612666d outdated

 155 | @@ -156,6 +156,15 @@ void AssertLockHeldInternal(const char* pszName, const char* pszFile, int nLine,
 156 |      abort();
 157 |  }
 158 |  
 159 | +void AssertLockNotHeldInternal(const char* pszName, const char* pszFile, int nLine, void* cs)
 160 | +{
 161 | +    BOOST_FOREACH (const PAIRTYPE(void*, CLockLocation) & i, *lockstack)

ryanofsky commented at 2:27 PM on June 21, 2017:

In commit "Add ability to assert a lock is not held in DEBUG_LOCKORDER"

Looks like there are travis errors compiling this code (undefined FOREACH/PAIRTYPE).

in src/rpc/rawtransaction.cpp:896 in c05c1c216a outdated

 891 | +            // a stale result because callbacks have not yet been processed.
 892 | +            std::promise<void> promise;
 893 | +            CallFunctionInValidationInterfaceQueue([&promise] {
 894 | +                promise.set_value();
 895 | +            });
 896 | +            promise.get_future().wait();

ryanofsky commented at 2:38 PM on June 21, 2017:

In commit "Fix wallet RPC race by waiting for callbacks in sendrawtransaction"

I think you might need to release cs_main before waiting for the promise, because the wallet handler in the notification thread will want to acquire it.

FWIW, the way I implemented this in my wallet ipc branch was to call CallFunctionInValidationInterfaceQueue the same way you are doing here, but release cs_main before the PushInventory calls, and wait for the promise after the PushInventory calls. I also updated the comment.

--- a/src/rpc/rawtransaction.cpp
+++ b/src/rpc/rawtransaction.cpp
@@ -29,6 +29,7 @@
 #include "wallet/wallet.h"
 #endif
 
+#include <future>
 #include <stdint.h>
 
 #include <univalue.h>
@@ -846,7 +847,6 @@ UniValue sendrawtransaction(const JSONRPCRequest& request)
             + HelpExampleRpc("sendrawtransaction", "\"signedhex\"")
         );
 
-    LOCK(cs_main);
     RPCTypeCheck(request.params, {UniValue::VSTR, UniValue::VBOOL});
 
     // parse hex string from parameter
@@ -860,6 +860,9 @@ UniValue sendrawtransaction(const JSONRPCRequest& request)
     if (request.params.size() > 1 && request.params[1].get_bool())
         nMaxRawTxFee = 0;
 
+    std::promise<void> sent_notifications;
+    {
+    LOCK(cs_main);
     CCoinsViewCache &view = *pcoinsTip;
     bool fHaveChain = false;
     for (size_t o = 0; !fHaveChain && o < tx->vout.size(); o++) {
@@ -881,15 +884,6 @@ UniValue sendrawtransaction(const JSONRPCRequest& request)
                 }
                 throw JSONRPCError(RPC_TRANSACTION_ERROR, state.GetRejectReason());
             }
-#ifdef ENABLE_WALLET
-        } else {
-            // If wallet is enabled, ensure that the wallet has been made aware
-            // of the new transaction prior to returning. This prevents a race
-            // where a user might call sendrawtransaction with a transaction
-            // to/from their wallet, immediately call some wallet RPC, and get
-            // a stale result because callbacks have not yet been processed.
-            pwalletMain->TransactionAddedToMempool(tx);
-#endif
         }
     } else if (fHaveChain) {
         throw JSONRPCError(RPC_TRANSACTION_ALREADY_IN_CHAIN, "transaction already in block chain");
@@ -897,11 +891,22 @@ UniValue sendrawtransaction(const JSONRPCRequest& request)
     if(!g_connman)
         throw JSONRPCError(RPC_CLIENT_P2P_DISABLED, "Error: Peer-to-peer functionality missing or disabled");
 
+    CallFunctionInValidationInterfaceQueue([&sent_notifications] { sent_notifications.set_value(); });
+    } // LOCK(cs_main)
+
     CInv inv(MSG_TX, hashTx);
     g_connman->ForEachNode([&inv](CNode* pnode)
     {
         pnode->PushInventory(inv);
     });
+
+    // Wait for any TransactionAddedToMempool notifications sent above to be
+    // processed by thw wallet. This prevents a race where a user might call
+    // sendrawtransaction with a transaction to/from their wallet, immediately
+    // call some wallet RPC, and get a stale result because callbacks have not
+    // yet been processed.
+    sent_notifications.get_future().wait();
+
     return hashTx.GetHex();
 }

TheBlueMatt commented at 2:55 PM on June 21, 2017:

Heh, beat you to it.

TheBlueMatt force-pushed on Jun 21, 2017

ryanofsky commented at 2:53 PM on June 21, 2017: member

utACK db1d712d5668496ca83947cf9d6c4a39d3a3af1f. Changes since last review: rebase, sendraw change, naming style updates, added std::function includes.

in src/rpc/rawtransaction.cpp:917 in b5410f8480 outdated

 912 | +        // a stale result because callbacks have not yet been processed.
 913 | +        std::promise<void> promise;
 914 | +        CallFunctionInValidationInterfaceQueue([&promise] {
 915 | +            promise.set_value();
 916 | +        });
 917 | +        promise.get_future().wait();

ryanofsky commented at 3:10 PM on June 21, 2017:

In commit "Fix wallet RPC race by waiting for callbacks in sendrawtransaction"

This seems right. Possible tweaks:

should_wait_on_validationinterface seems like it is always true at this point, maybe the variable is not needed.
Might be better to call CallFunctionInValidationInterfaceQueue before releasing cs_main, to avoid waiting for notifications that might be queued in the meantime that we don't care about.
Might be cheaper / simpler just to hold on to the tx reference (remove std::move) instead of copying the hash to a temporary variable.
Maybe release cs_main before the PushInventory loop.

ryanofsky commented at 3:12 PM on June 21, 2017: member

utACK e3617741430df7d21d807549041f7251ba5b91e5. Only change since last review was sendraw locking fix.

TheBlueMatt force-pushed on Jun 21, 2017

in src/rpc/rawtransaction.cpp:977 in a40c312e6b outdated

 901 |          throw JSONRPCError(RPC_TRANSACTION_ALREADY_IN_CHAIN, "transaction already in block chain");
 902 |      }
 903 | +
 904 | +    } // cs_main
 905 | +
 906 | +    promise.get_future().wait();

ryanofsky commented at 5:38 PM on June 22, 2017:

In commit "Fix wallet RPC race by waiting for callbacks in sendrawtransaction"

Might be more efficient to wait for the promise after the PushInventory calls so they aren't blocked waiting for wallets.

TheBlueMatt commented at 5:43 PM on June 22, 2017:

I figured put it above the !g_connman check to make sure we block even if at some point in the future we support running without net/connman. Dont feel strongly either way, though

theuni commented at 9:14 PM on November 15, 2017:

Without net/connman, I'd expect this RPC to fail immediately anyway.

ryanofsky commented at 5:40 PM on June 22, 2017: member

utACK 68201ec021944b1216e321a572060a7fd5183cbe. Just AssertLockNotHeldInternal foreach fix and sendrawtransaction tweaks since last review.

laanwj added the label Wallet on Jun 28, 2017

TheBlueMatt force-pushed on Jul 7, 2017

TheBlueMatt force-pushed on Jul 11, 2017

TheBlueMatt commented at 3:15 PM on July 11, 2017: member

Rebased. Would be nice to get this in early in 16 to let it simmer on master.

ryanofsky commented at 5:17 PM on July 25, 2017: member

utACK 964f376b6a34294d328b634ef045b11079b0859d. No changes since last review, only rebase.

jnewbery commented at 8:58 PM on August 11, 2017: member

I haven't fully reviewed yet, but @ryanofsky pointed me to this for m_last_block_processed. Should that be rewound in DisconnectBlock()?

ryanofsky commented at 9:06 PM on August 11, 2017: member

I think it should be safe to update m_last_block_processed in disconnectblock. It just isn't needed for this pr, because m_last_block_processed is used here in a pretty limited way.

in src/validationinterface.cpp:111 in 3e3dfecccc outdated

  96 | @@ -85,10 +97,17 @@ void UnregisterAllValidationInterfaces() {
  97 |      g_signals.m_internals->TransactionAddedToMempool.disconnect_all_slots();
  98 |      g_signals.m_internals->BlockConnected.disconnect_all_slots();
  99 |      g_signals.m_internals->BlockDisconnected.disconnect_all_slots();
 100 | +    g_signals.m_internals->TransactionRemovedFromMempool.disconnect_all_slots();
 101 |      g_signals.m_internals->UpdatedBlockTip.disconnect_all_slots();
 102 |      g_signals.m_internals->NewPoWValidBlock.disconnect_all_slots();
 103 |  }
 104 |  
 105 | +void CMainSignals::MempoolEntryRemoved(CTransactionRef ptx, MemPoolRemovalReason reason) {

promag commented at 2:45 AM on August 12, 2017:

const CTransactionRef& ptx, like TransactionAddedToMempool?

TheBlueMatt commented at 7:30 PM on August 14, 2017:

This isnt a new function, it would have to be changed in txmempool.h

in src/validationinterface.cpp:112 in 3e3dfecccc outdated

 101 |      g_signals.m_internals->UpdatedBlockTip.disconnect_all_slots();
 102 |      g_signals.m_internals->NewPoWValidBlock.disconnect_all_slots();
 103 |  }
 104 |  
 105 | +void CMainSignals::MempoolEntryRemoved(CTransactionRef ptx, MemPoolRemovalReason reason) {
 106 | +    if (reason != MemPoolRemovalReason::BLOCK && reason != MemPoolRemovalReason::CONFLICT) {

promag commented at 2:45 AM on August 12, 2017:

Why filtering?

TheBlueMatt commented at 7:30 PM on August 14, 2017:

Because BLOCK and CONFLICT are passed out in BlockConnectedDisconnected (see comment in validationinterface.h which describes in what cases this is called).

in src/wallet/wallet.cpp:1288 in 71f168a2a6 outdated

1184 | +            return;
1185 | +        }
1186 | +    }
1187 | +
1188 | +    std::promise<void> promise;
1189 | +    CallFunctionInValidationInterfaceQueue([&promise] {

promag commented at 2:56 AM on August 12, 2017:

This function is only introduced in a later commit.

TheBlueMatt commented at 7:32 PM on August 14, 2017:

I think you've been bitten by GitHub's obnoxious order-commits-by-date-instead-of-git-ordering bug?

in src/wallet/wallet.h:338 in b9bd72959b outdated

 334 | @@ -337,6 +335,7 @@ class CWalletTx : public CMerkleTx
 335 |      mutable CAmount nImmatureWatchCreditCached;
 336 |      mutable CAmount nAvailableWatchCreditCached;
 337 |      mutable CAmount nChangeCached;
 338 | +    mutable bool fInMempool;

promag commented at 3:00 AM on August 12, 2017:

Move up for better packing?

promag commented at 3:02 AM on August 12, 2017:

Instead of fInMempool for each wallet transaction, maybe a std::set<WalletTx*> with the mempool transactions takes less memory?

TheBlueMatt commented at 7:34 PM on August 14, 2017:

Moved it up. At some point someone can go through and mark all the bools as 1 bit, I'd think.

in src/rpc/misc.cpp:81 in 15e5d05f84 outdated

  74 | @@ -75,6 +75,9 @@ UniValue getinfo(const JSONRPCRequest& request)
  75 |  #ifdef ENABLE_WALLET
  76 |      CWallet * const pwallet = GetWalletForJSONRPCRequest(request);
  77 |  
  78 | +    if (pwallet) {

promag commented at 3:04 AM on August 12, 2017:

Move this to EnsureWalletIsAvailable?

TheBlueMatt commented at 7:33 PM on August 14, 2017:

Not all functions which want wallet need to do this, though.

promag commented at 3:07 AM on August 12, 2017: member

Concept ACK.

TheBlueMatt force-pushed on Aug 14, 2017

TheBlueMatt commented at 7:34 PM on August 14, 2017: member

Rebased.

in src/wallet/wallet.cpp:4398 in daab5ea6d8 outdated

4395 | +    bool ret = ::AcceptToMemoryPool(mempool, state, tx, true, nullptr, nullptr, false, nAbsurdFee);
4396 | +    fInMempool = ret;
4397 | +    return ret;
4398 | +}
4399 | +
4400 | +bool CalculateEstimateType(FeeEstimateMode mode, bool opt_in_rbf) {

ryanofsky commented at 5:09 PM on August 16, 2017:

In commit "Use callbacks to cache whether wallet transactions are in mempool"

Looks like you accidentally resurrected this CalculateEstimateType function in the rebase. Should remove.

TheBlueMatt commented at 6:52 PM on August 17, 2017:

Oops, fixed.

ryanofsky commented at 5:13 PM on August 16, 2017: member

utACK 9adaf5f2c5a28e338dcf99d873f0bd40000696e7. Only change sinces the last review were rebase and fInMempool move

TheBlueMatt force-pushed on Aug 17, 2017

laanwj added this to the "Blockers" column in a project

TheBlueMatt force-pushed on Sep 12, 2017

TheBlueMatt commented at 5:07 PM on September 12, 2017: member

(Trivially) rebased.

in src/validationinterface.h:10 in 340164291f outdated

   6 | @@ -7,6 +7,7 @@
   7 |  #define BITCOIN_VALIDATIONINTERFACE_H
   8 |  
   9 |  #include <memory>
  10 | +#include <functional>

danra commented at 3:17 PM on September 29, 2017:

Since you're adding system includes, might as well move them to their correct place, after the user includes. Also, better if they're sorted

in src/wallet/wallet.h:728 in 901eb9f433 outdated

 723 | @@ -724,6 +724,14 @@ class CWallet final : public CCryptoKeyStore, public CValidationInterface
 724 |  
 725 |      std::unique_ptr<CWalletDBWrapper> dbw;
 726 |  
 727 | +    /**
 728 | +     * The following are used to keep track of how far behind the wallet is

danra commented at 3:53 PM on September 29, 2017:

are used->is used?

in src/wallet/wallet.cpp:3899 in 901eb9f433 outdated

3894 | @@ -3869,6 +3895,12 @@ CWallet* CWallet::CreateWalletFromFile(const std::string walletFile)
3895 |          if (walletdb.ReadBestBlock(locator))
3896 |              pindexRescan = FindForkInGlobalIndex(chainActive, locator);
3897 |      }
3898 | +
3899 | +    //We must set m_last_block_processed prior to registering the wallet as a validation interface

danra commented at 3:56 PM on September 29, 2017:

Missing space after //

in src/validationinterface.h:44 in 340164291f outdated

  31 | @@ -31,6 +32,11 @@ void RegisterValidationInterface(CValidationInterface* pwalletIn);
  32 |  void UnregisterValidationInterface(CValidationInterface* pwalletIn);
  33 |  /** Unregister all wallets from core */
  34 |  void UnregisterAllValidationInterfaces();
  35 | +/**
  36 | + * Pushes a function to callback onto the notification queue, guaranteeing any
  37 | + * callbacks generated prior to now are finished when the function is called.
  38 | + */
  39 | +void CallFunctionInValidationInterfaceQueue(std::function<void ()> func);

danra commented at 4:10 PM on September 29, 2017:

I think a better interface would be std::future<void> SyncWithValidationInterfaceQueue();

Whose implementation sets up the delayed call to a promise.set_value(), instead of this being the responsibility of the caller as it is now.

The sync functionality is the only purpose, so rather than have an overly-generic interface which allows calling any function, it's better to have a more specific interface which simplifies the interface as well as the setup work on the caller side.

TheBlueMatt commented at 6:21 PM on September 30, 2017:

Hmm..I have a few more branches which use this function for other stuff (eg one that will set a "allowed to continue processing" boolean on a CNodeState then call WakeMessageHandler). I think most of the use-cases for now could almost do fine with a future-like return value if std::future allowed you to query whether the future had yet completed. Still, since the promise-create-future-wait pattern isnt too many LoC, I suppose I'll leave it for now instead of adding a bunch of machinery just to create a special future-like return type.

danra commented at 7:44 PM on September 30, 2017:

I think most of the use-cases for now could almost do fine with a future-like return value if std::future allowed you to query whether the future had yet completed.

Wouldn't std::future::wait_for() with argument provide that functionality?

Orthogonally, if the intended interface really is to allow running arbitrary functions on the queue, than the current interface is fine. If it's just to sync with it, I think the interface I suggested is better.

TheBlueMatt commented at 8:21 PM on September 30, 2017:

Yea, could pull some hack like wait_for(1ns), but that's pretty gross. I think the purpose is somewhat of both - really it shouldn't be to do much more than sync, but sync may require rather arbitrary things (like calling WakeMessageHandler to wake up another thread which is waiting on some other unrelated condition in the example above). For now probably best to just leave it IMO.

danra commented at 8:23 PM on September 30, 2017:

Ok. btw I meant to write "with argument zero" above, so it's not hacky - it just checks if the std::future is done, without waiting.

in src/wallet/rpcwallet.cpp:2894 in 1b67af2ca0 outdated

2856 | @@ -2769,6 +2857,10 @@ UniValue listunspent(const JSONRPCRequest& request)
2857 |              nMaximumCount = options["maximumCount"].get_int64();
2858 |      }
2859 |  
2860 | +    // Make sure the results are valid at least up to the most recent block
2861 | +    // the user could have gotten from another RPC command prior to now
2862 | +    pwallet->BlockUntilSyncedToCurrentChain();

danra commented at 4:58 PM on September 29, 2017:

This should come after assert(pwallet != nullptr);

TheBlueMatt commented at 4:22 AM on October 1, 2017:

Removed that line instead (we dont have similar checks anywhere else and the case is already covered by EnsureWalletIsAvailable).

danra commented at 5:09 PM on September 29, 2017: contributor

BlockConnectedDisconnected in comments and commit messages should say BlockConnected/Disconnected, or Block[Connected|Disconnected]`, or something of the sort. The way it's written like now looks like there's an actual single function by that name.
In commit message 975f071b49b642b069c5e1de8d0050b663ca7c1c, "CSceduler background threa" -> "CScheduler background thread"

in src/validationinterface.cpp:146 in 4b6a0c9aad outdated

 142 |  
 143 |  void CMainSignals::SetBestChain(const CBlockLocator &locator) {
 144 | -    m_internals->SetBestChain(locator);
 145 | +    m_internals->m_schedulerClient.AddToProcessQueue([locator, this] {
 146 | +        m_internals->SetBestChain(locator);
 147 | +    });

danra commented at 8:32 PM on September 29, 2017:

I think this introduces a bug. CMainSignals::SetBestChain() might be called with a CBlockLocator object argument which goes out of scope and gets destructed before the queued m_internals->SetBestChain() is called, causing an invalid memory access. For example, this can happen in the walletInstance->SetBestChain(chainActive.GetLocator()); call in wallet.cpp).

danra commented at 8:33 PM on September 29, 2017:

On second thought I'm probably wrong because locator is captured by the lambda by value, not reference.

in src/rpc/rawtransaction.cpp:34 in a9629cb652 outdated

  31 | @@ -31,6 +32,7 @@
  32 |  #endif
  33 |  
  34 |  #include <stdint.h>
  35 | +#include <future>

danra commented at 8:36 PM on September 29, 2017:

sort

danra commented at 9:04 PM on September 29, 2017: contributor

utACK, looks like a very nice improvement. Reviewed everything except zmq which I have no idea about :) I think CallFunctionInValidationInterfaceQueue interface can be improved, detailed further in a comment above.

TheBlueMatt force-pushed on Oct 1, 2017

TheBlueMatt commented at 4:26 AM on October 1, 2017: member

Rebased and address @danra's comments.

TheBlueMatt force-pushed on Oct 1, 2017

in src/wallet/wallet.cpp:1288 in ac204f8f87 outdated

1276 | +            return;
1277 | +        }
1278 | +    }
1279 | +
1280 | +    std::promise<void> promise;
1281 | +    CallFunctionInValidationInterfaceQueue([&promise] {

jnewbery commented at 5:45 PM on October 11, 2017:

This is explained here: #10286 (comment)

it now just tests if it is caught up, and if it is not, it puts a callback into the CValidationInterface queue and waits for it to trigger

I think this should be a code comment rather than a github comment!

jnewbery commented at 6:56 PM on October 13, 2017:

New comment looks good. Thanks.

in src/validationinterface.cpp:120 in ac204f8f87 outdated

 114 | +    }
 115 | +}
 116 | +
 117 |  void CMainSignals::UpdatedBlockTip(const CBlockIndex *pindexNew, const CBlockIndex *pindexFork, bool fInitialDownload) {
 118 | -    m_internals->UpdatedBlockTip(pindexNew, pindexFork, fInitialDownload);
 119 | +    m_internals->m_schedulerClient.AddToProcessQueue([pindexNew, pindexFork, fInitialDownload, this] {

jnewbery commented at 6:52 PM on October 11, 2017:

There's a global variable in net_processing called nTimeBestReceived which gets updated to the current time in this callback. Now that the callback is asynchronous, it won't necessarily be updated immediately. Does that cause a problem when nTimeBestReceived is used in the Broadcast callback?

TheBlueMatt commented at 4:25 PM on October 13, 2017:

I believe this to be fine. nTimeBestReceived is passed to wallet as the first parameter to ResendWalletTransactions, and is effectively just called on a loop to resend wallet transactions when we need to. If it gets called with an nTimeBestReceived that is further back than we are, that should be fine, it just won't rebroadcast until the next time its called.

jnewbery commented at 6:55 PM on October 13, 2017:

Yep, sounds good. I thought this was ok, but I wasn't certain and wanted to check.

in src/validationinterface.cpp:111 in ac204f8f87 outdated

 104 |  
 105 | +void CallFunctionInValidationInterfaceQueue(std::function<void ()> func) {
 106 | +    g_signals.m_internals->m_schedulerClient.AddToProcessQueue(std::move(func));
 107 | +}
 108 | +
 109 | +void CMainSignals::MempoolEntryRemoved(CTransactionRef ptx, MemPoolRemovalReason reason) {

jnewbery commented at 7:00 PM on October 11, 2017:

Is there a reason that you've named the validation interface callback TransactionRemovedFromMempool() but the signal is named MempoolEntryRemoved. For all other callbacks in the validation interface, the callback name matches the signal name.

TheBlueMatt commented at 4:22 PM on October 13, 2017:

For symmetry with TransactionAddedToMempool. I'd rather rename the MempoolEntryRemoved signal in CTxMempool isntead, but, really, I cant say I have hugely strong feelings. Considering I have a PR lined up after this one to rework some of the mempool parts of this interface, maybe lets leave that for the next PR?

in src/wallet/wallet.h:732 in ac204f8f87 outdated

 724 | @@ -722,6 +725,14 @@ class CWallet final : public CCryptoKeyStore, public CValidationInterface
 725 |  
 726 |      std::unique_ptr<CWalletDBWrapper> dbw;
 727 |  
 728 | +    /**
 729 | +     * The following is used to keep track of how far behind the wallet is
 730 | +     * from the chain sync, and to allow clients to block on us being caught up
 731 | +     *
 732 | +     * Protected by cs_main

jnewbery commented at 7:00 PM on October 11, 2017:

I don't understand why this is protected by cs_main. Can you explain?

TheBlueMatt commented at 4:28 PM on October 13, 2017:

The code in BlockUntilSyncedToCurrentChain accesses this in a place that's going to need cs_main anyway (for chainActive.Tip()), could add a cs_wallet and call it protected by cs_wallet, but that just means an extra lock in BlockUntilSyncedToCurrentChain.

TheBlueMatt commented at 4:29 PM on October 13, 2017:

I added comments to explain this.

in src/wallet/wallet.cpp:1670 in ac204f8f87 outdated

1665 | @@ -1625,6 +1666,8 @@ CBlockIndex* CWallet::ScanForWalletTransactions(CBlockIndex* pindexStart, bool f
1666 |          ShowProgress(_("Rescanning..."), 100); // hide progress dialog in GUI
1667 |  
1668 |          fScanningWallet = false;
1669 | +
1670 | +        m_last_block_processed = chainActive.Tip();

jnewbery commented at 7:05 PM on October 11, 2017:

This will be set incorrectly if the rescan was aborted.

TheBlueMatt commented at 9:17 PM on October 12, 2017:

I just removed this and set it to chainActive.Tip() in CreateWalletFromFile. It doesnt make sense to try to use this to block wallet until we're caught up with the current tip in reorg, it isnt a regression so we should just leave it. When/if we expose last_block_processed via RPC (and allow wallet to return data that is stale) we should change that, I suppose.

jnewbery commented at 7:00 PM on October 13, 2017:

Yes - I agree. m_last_block_processed would be a really nice value to expose in the getwalletinfo RPC.

I have a weak preference to make m_last_block_processed work as you'd expect (tracking the last block processed by the wallet), but this PR is already doing a lot, so I think it's also fine to keep this as is. Please update the comment in wallet.h to say that it's not actually tracking how far the wallet is behind chain sync and shouldn't be publicly exposed until that's changed.

TheBlueMatt commented at 7:29 PM on October 13, 2017:

OK, added more info to the wallet.h comment.

in src/rpc/rawtransaction.cpp:926 in ac204f8f87 outdated

 918 | @@ -917,14 +919,19 @@ UniValue sendrawtransaction(const JSONRPCRequest& request)
 919 |          );
 920 |  
 921 |      ObserveSafeMode();
 922 | +
 923 | +    CTransactionRef tx;
 924 | +    std::promise<void> promise;
 925 | +
 926 | +    { // cs_main scope

jnewbery commented at 7:21 PM on October 11, 2017:

Can you move this cs_main scope further down (to avoid having to declare the CTransactionRef above and then recompute the GetHash() at the end of this function?

TheBlueMatt commented at 4:32 PM on October 13, 2017:

CTransactionRef->GetHash() is (virtually) free because we cache the hash in the CTransaction. Honestly we should probably just go the other way and remove the hashTx local. Additionally, if I'm correctly understanding your request, that would result in a deadlock as the validation interface queue may not be able to make progress as long as cs_main is held in the RPC thread. I've added a comment to validationinterface.h above CallFunctionInValidationInterfaceQueue to this effect.

TheBlueMatt commented at 6:57 PM on October 13, 2017:

Misunderstood your point, moved the scope start down.

jnewbery commented at 7:02 PM on October 13, 2017:

Yes, sorry - my comment was unclear.

Change looks good.

jnewbery commented at 7:21 PM on October 11, 2017: member

Lightly tested ac204f8f87bbc761a4e682c084838a6f47349bae. A few comments and questions inline.

TheBlueMatt force-pushed on Oct 13, 2017

TheBlueMatt commented at 4:37 PM on October 13, 2017: member

Addressed @jnewbery's comments, and added more code comments where things were not entirely clear.

TheBlueMatt force-pushed on Oct 13, 2017

jnewbery commented at 7:35 PM on October 13, 2017: member

Most recent round of changes looks good to me. utACK 024e7b51ee4eeaf21ecda71915438241dd666705

Add a CValidationInterface::TransactionRemovedFromMempool

This is currently unused, but will by used by wallet to cache when
transactions are in the mempool, obviating the need for calls to
mempool from CWalletTx::InMempool()

a7d3936de8

Call TransactionRemovedFromMempool in the CScheduler thread

This is both good practice (we want to move all such callbacks
into a background thread eventually) and prevents a lock inversion
when we go to use this in wallet (mempool.cs->cs_wallet and
cs_wallet->mempool.cs would otherwise both be used).

0343676ce3

Add ability to assert a lock is not held in DEBUG_LOCKORDER 2b4b34503f

Add CallFunctionInQueue to wait on validation interface queue drain 0b2f42d737

Add CWallet::BlockUntilSyncedToCurrentChain()

This blocks until the wallet has synced up to the current height.

5ee3172636

Add calls to CWallet::BlockUntilSyncedToCurrentChain() in RPCs

This prevents the wallet-RPCs-return-stale-info issue from being
re-introduced when new-block callbacks no longer happen in the
block-connection cs_main lock

5d67a7868d

Use callbacks to cache whether wallet transactions are in mempool

This avoid calling out to mempool state during coin selection,
balance calculation, etc. In the next commit we ensure all wallet
callbacks from CValidationInterface happen in the same queue,
serialized with each other. This helps to avoid re-introducing one
of the issues described in #9584 [1] by further disconnecting
wallet from current chain/mempool state.

Thanks to @morcos for the suggestion to do this.

Note that there are several race conditions introduced here:

 * If a user calls sendrawtransaction from RPC, adding a
   transaction which is "trusted" (ie from them) and pays them
   change, it may not be immediately used by coin selection until
   the notification callbacks finish running. No such race is
   introduced in normal transaction-sending RPCs as this case is
   explicitly handled.

 * Until Block{Connected,Disconnected} and
   TransactionAddedToMempool calls also run in the CSceduler
   background thread, there is a race where
   TransactionAddedToMempool might be called after a
   Block{Connected,Disconnected} call happens.

 * Wallet will write a new best chain from the SetBestChain
   callback prior to having processed the transaction from that
   block.

[1] "you could go to select coins, need to use 0-conf change, but
such 0-conf change may have been included in a block who's
callbacks have not yet been processed - resulting in thinking they
are not in mempool and, thus, not selectable."

17220d6325

Also call other wallet notify callbacks in scheduler thread

This runs Block{Connected,Disconnected}, SetBestChain, Inventory,
and TransactionAddedToMempool on the background scheduler thread.

Of those, only BlockConnected is used outside of Wallet/ZMQ, and
is used only for orphan transaction removal in net_processing,
something which does not need to be synchronous with anything
else.

This partially reverts #9583, re-enabling some of the gains from
 #7946. This does not, however, re-enable the gains achieved by
repeatedly releasing cs_main between each transaction processed.

e545dedf72

Fix wallet RPC race by waiting for callbacks in sendrawtransaction cb06edf938

Give ZMQ consistent order with UpdatedBlockTip on scheduler thread

Note that UpdatedBlockTip is also used in net_processing to
announce new blocks to peers. As this may need additional review,
this change is included in its own commit.

3ea8b75281

Add a dev notes document describing the new wallet RPC blocking c4784b5065

Remove redundant pwallet nullptr check 89f03120a0

TheBlueMatt commented at 11:30 PM on October 13, 2017: member

Rebased.

TheBlueMatt force-pushed on Oct 13, 2017

laanwj commented at 3:25 PM on November 15, 2017: member

utACK 89f0312

laanwj merged this on Nov 15, 2017

laanwj closed this on Nov 15, 2017

laanwj referenced this in commit 927a1d7d08 on Nov 15, 2017

in src/wallet/wallet.cpp:1276 in 5ee3172636 outdated

1271 | +
1272 | +    std::promise<void> promise;
1273 | +    CallFunctionInValidationInterfaceQueue([&promise] {
1274 | +        promise.set_value();
1275 | +    });
1276 | +    promise.get_future().wait();

theuni commented at 7:18 PM on November 15, 2017:

Isn't the scheduler queue abandoned at shutdown? What keeps this from blocking forever if timed just right?

TheBlueMatt commented at 10:23 PM on November 15, 2017:

No, init.cpp calls CMainSignals().FlushBackgroundCallbacks() which will empty the CValidationInterfaceQueue. The scheduler itself is abandoned, but we shouldn't generate any new notifications by that point.

in src/wallet/wallet.cpp:3948 in 5ee3172636 outdated

3943 | @@ -3913,6 +3944,10 @@ CWallet* CWallet::CreateWalletFromFile(const std::string walletFile)
3944 |          if (walletdb.ReadBestBlock(locator))
3945 |              pindexRescan = FindForkInGlobalIndex(chainActive, locator);
3946 |      }
3947 | +
3948 | +    walletInstance->m_last_block_processed = chainActive.Tip();

theuni commented at 7:23 PM on November 15, 2017:

Doesn't this entire function need cs_main ?

TheBlueMatt commented at 10:25 PM on November 15, 2017:

It would appear so (though that isn't introduced here)....I'll let @practicalswift take a look since that appears to be his project atm.

practicalswift commented at 8:42 AM on November 16, 2017:

Yes, according to my annotations (#11226) calling CreateWalletFromFile requires holding cs_main:

src/wallet/wallet.h:    static CWallet* CreateWalletFromFile(const std::string walletFile)
                            EXCLUSIVE_LOCKS_REQUIRED(cs_main);

This is due to the following underlying locking requirements:

CreateWalletFromFile is reading the variable chainActive which requires holding the mutex cs_main.
CreateWalletFromFile calls FindForkInGlobalIndex which reads the variable mapBlockIndex. Reading the variable mapBlockIndex requires holding the mutex cs_main.

in src/wallet/wallet.cpp:3029 in 17220d6325 outdated

3023 | @@ -3010,14 +3024,18 @@ bool CWallet::CommitTransaction(CWalletTx& wtxNew, CReserveKey& reservekey, CCon
3024 |          // Track how many getdata requests our transaction gets
3025 |          mapRequestCount[wtxNew.GetHash()] = 0;
3026 |  
3027 | +        // Get the inserted-CWalletTx from mapWallet so that the
3028 | +        // fInMempool flag is cached properly
3029 | +        CWalletTx& wtx = mapWallet[wtxNew.GetHash()];

theuni commented at 8:45 PM on November 15, 2017:

Somewhat unrelated nit: AddToWallet's return value should be checked before using it here, but that was already the case.

TheBlueMatt commented at 10:26 PM on November 15, 2017:

Indeed, though in such cases not much we can do - bdb got upset, we're probably just going to shutdown soon anyway.

laanwj removed this from the "Blockers" column in a project

in src/init.cpp:268 in a7d3936de8 outdated

 264 | @@ -265,6 +265,7 @@ void Shutdown()
 265 |  #endif
 266 |      UnregisterAllValidationInterfaces();
 267 |      GetMainSignals().UnregisterBackgroundSignalScheduler();
 268 | +    GetMainSignals().UnregisterWithMempoolSignals(mempool);

morcos commented at 5:43 PM on November 20, 2017:

@TheBlueMatt Shouldn't this line be before the UnregisterBackgroundSignalScheduler?

TheBlueMatt commented at 8:22 PM on November 20, 2017:

I suppose it would be more correct, yes, though I do not believe this is a bug - mempool should absolutely, absolutely not be generating any events by this point, so if it does (which I believe would result in an assert(false) or so) it would represent some other shutdown-order bug.

practicalswift referenced this in commit a256e6a2f1 on Nov 22, 2017

laanwj referenced this in commit bf72888e72 on Nov 28, 2017

laanwj referenced this in commit d31e5c1d0f on Nov 28, 2017

laanwj referenced this in commit e1fae70700 on Nov 28, 2017

laanwj referenced this in commit 4b5f4f5b42 on Nov 29, 2017

laanwj referenced this in commit 16fff80257 on Nov 30, 2017

ryanofsky commented at 6:42 PM on December 4, 2017: member

According to #11822 (comment), this may be causing a memory leak

HashUnlimited referenced this in commit 050cfaf11f on Mar 14, 2018

jkczyz referenced this in commit 5c3f8e14bb on Aug 22, 2019

jkczyz referenced this in commit dc86c73cf7 on Oct 31, 2019

jkczyz referenced this in commit a54833a54b on Oct 31, 2019

jkczyz referenced this in commit dd2855d6a7 on Oct 31, 2019

jnewbery referenced this in commit d7c23aa1fc on Nov 8, 2019

jkczyz referenced this in commit 3ec3eb55b9 on Nov 8, 2019

jkczyz referenced this in commit f7a527e19f on Nov 11, 2019

jkczyz referenced this in commit 9dd5013542 on Nov 11, 2019

PastaPastaPasta referenced this in commit a5c3d3331b on Jan 17, 2020

PastaPastaPasta referenced this in commit 05bc8343a7 on Jan 22, 2020

PastaPastaPasta referenced this in commit e4996fc941 on Jan 22, 2020

PastaPastaPasta referenced this in commit 0211cbff80 on Jan 29, 2020

PastaPastaPasta referenced this in commit 696cd2869c on Jan 29, 2020

PastaPastaPasta referenced this in commit 5091ab5bcc on Jan 29, 2020

PastaPastaPasta referenced this in commit 61758874b8 on Feb 13, 2020

PastaPastaPasta referenced this in commit 76ed385c03 on Feb 13, 2020

PastaPastaPasta referenced this in commit 3eb80fdcd5 on Feb 29, 2020

PastaPastaPasta referenced this in commit 803530c83d on Feb 29, 2020

furszy referenced this in commit 7d688a3152 on Nov 15, 2020

furszy referenced this in commit 59876a4873 on Dec 17, 2020

furszy referenced this in commit a60a7ee8e1 on Dec 17, 2020

furszy referenced this in commit 4188f0475f on Dec 17, 2020

furszy referenced this in commit 7343143695 on Feb 5, 2021

furszy referenced this in commit 218747678d on Feb 5, 2021

furszy referenced this in commit e7e9d4375a on Feb 10, 2021

furszy referenced this in commit 9e523a7b3e on Feb 14, 2021

furszy referenced this in commit 307d7b1e5c on Feb 26, 2021

LarryRuane referenced this in commit cfc69ede91 on Mar 4, 2021

LarryRuane referenced this in commit e180aa5b04 on Mar 4, 2021

ckti referenced this in commit 8ac6f5c0b4 on Mar 28, 2021

ckti referenced this in commit 38a9ad2b41 on Mar 28, 2021

LarryRuane referenced this in commit 756a33ace4 on Apr 13, 2021

LarryRuane referenced this in commit 3145dce2d8 on Apr 13, 2021

zkbot referenced this in commit 0e36226271 on Apr 17, 2021

gades referenced this in commit 1459aa5a72 on Jun 30, 2021

gades referenced this in commit c9cbceecdd on Feb 12, 2022

gades referenced this in commit b80463372f on Feb 15, 2022

DrahtBot locked this on Feb 15, 2022

Call wallet notify callbacks in scheduler thread (without cs_main) #10286