This should fix #11822.
It ended up bigger than I hoped for, but its not too gnarly. Note that " Require no cs_main lock for ProcessNewBlock/ActivateBestChain" is mostly pure code-movement.
This should fix #11822.
It ended up bigger than I hoped for, but its not too gnarly. Note that " Require no cs_main lock for ProcessNewBlock/ActivateBestChain" is mostly pure code-movement.
2474 | CBlockIndex *pindexNewTip = nullptr;
2475 | int nStopAtHeight = gArgs.GetArg("-stopatheight", DEFAULT_STOPATHEIGHT);
2476 | do {
2477 | boost::this_thread::interruption_point();
2478 | +
2479 | + if (GetMainSignals().CallbacksPending() > 10) {
Should this be parameterized? Or at least a configurable constant? Is there any advantage to tweaking it?
During normal (read: non-IBD/reindex) usage, this should virtually never reach 10. During IBD/reindex, your memory is probably better speint in more dbcache than storing blocks you recently connected to eventually give to your wallet. I dont think more knobs is helpful here.
In commit "Block ActivateBestChain to empty validationinterface queue"
During normal (read: non-IBD/reindex) usage, this should virtually never reach 10. During IBD/reindex, your memory is probably better speint in more dbcache than storing blocks you recently connected to eventually give to your wallet.
Please write this in a code comment. (How is anybody supposed to know this?)
Done.
It would be nice to introduce a named constant for this instead.
1179 | + { 1180 | + LOCK(cs_main); 1181 | 1182 | - const CInv &inv = *it; 1183 | - { 1184 | + while (it != pfrom->vRecvGetData.end() && (it->type == MSG_TX || it->type == MSG_WITNESS_TX)) {
It seems to me that this expects to see transactions first and then blocks. But can't a getdata message have the types in any order?
No, we are allowed to (in fact always have) not finish the full queue before returning, and we do for blocks (always did). A peer shouldn't be able to make us spin giving them 100 blocks before processing any other peers' requests.
Ok, I see now. I didn't see the break for when the type was a block.
In commit "Block ActivateBestChain to empty validationinterface queue"
Please add a code comment here saying this, I had exactly the same question as achow.
ACK-ish
This definitely is fixing the problem I was seeing, although I have one question below.
tACK 3113573c847c76943c065b72d3e7a9edc643fed1
once-over ACK
tACK https://github.com/bitcoin/bitcoin/pull/11824/commits/3113573c847c76943c065b72d3e7a9edc643fed1
I've been running -reindex for over twelve hours and haven't seen an OOM:
job@ali:~$ sudo egrep "Kill.*bitcoind" /var/log/syslog || echo "cool"
cool
Just hit another OOM, but also realized I was on an outdated version of this branch (HEAD was 151b17da1 Block ActivateBestChain to empty validationinterface queue). @TheBlueMatt do you recall what changed between 151b17da1 and current HEAD and if it would explain the difference between another OOM or not?
In any case, I'm pulling down the tip of this branch and starting another -reindex.
@jamesob No, the only differences there were in test_bitcoin.cpp, so that should have had no affect.
I haven't seen any OOMs and my node is almost fully reindexed. Granted it did go down sometime last night as it ran out of disk space. It's been running for ~6 hours now and was running for ~14 hours before then.
Well this definitely fixes a bug...just did a reindex to 150k with 0.15, 0.15.1 master and this branch, peak memory usage on this branch and 0.15 and 0.15.1 were all about the same, master was 3x higher...I'll look more into our memory footprint and see what else is hiding.
tACK (again) https://github.com/bitcoin/bitcoin/pull/11824/commits/3113573c847c76943c065b72d3e7a9edc643fed1
Definitely hitting fewer OOMs reindexing on this branch than I was a week ago. :)
My bitcoind process has been running for 37 hours on this branch; memory usage has held steady through and after a partial reindex.
Rebased.
This should (marginally) speed up validationinterface queue
draining by avoiding a cs_main lock in one client.
Rebased.
Haven't looked at the code, but can confirm that a -reindex OOM/leak issue (#11822) was fixed once I merged in that PR.
1037 | @@ -1038,180 +1038,193 @@ static void RelayAddress(const CAddress& addr, bool fReachable, CConnman* connma 1038 | connman->ForEachNodeThen(std::move(sortfunc), std::move(pushfunc)); 1039 | } 1040 | 1041 | -void static ProcessGetData(CNode* pfrom, const Consensus::Params& consensusParams, CConnman* connman, const std::atomic<bool>& interruptMsgProc) 1042 | +// returns true if we need to call ActivateBestChain before responding 1043 | +bool static ProcessGetBlockData(CNode* pfrom, const Consensus::Params& consensusParams, const CInv& inv, CConnman* connman, const std::atomic<bool>& interruptMsgProc, bool best_chain_active)
In commit "Avoid cs_main in net_processing ActivateBestChain calls"
This change would be easier to understand if ProcessGetBlockData refactoring happened in a different commit than than the cs_main ActivateBestChain unlock.
In commit "Avoid cs_main in net_processing ActivateBestChain calls"
best_chain_active argument is unused, should remove or use or explain
Oops, was a bug, fixed in the refactor.
Done. I think.
1065 | + // If we have the block and all of its parents, but have not yet validated it, 1066 | + // we might be in the middle of connecting it (ie in the unlock of cs_main 1067 | + // before ActivateBestChain but after AcceptBlock). 1068 | + // In this case, we need to run ActivateBestChain prior to checking the relay 1069 | + // conditions below. 1070 | + return true;
In commit "Avoid cs_main in net_processing ActivateBestChain calls"
This is pretty convoluted. If the point is just to release cs_main before calling ActivateBestChain, why not just use LEAVE_CRITICAL_SECTION / ENTER_CRITICAL_SECTION to release it? Is the problem just lack of a reverse_lock RAII helper?
More importantly, if the point is to release cs_main before calling ActivateBestChain, you really need to say this in a code comment, because without looking through code history, I don't think this would be clear at all.
Valid point, refactored a bit, though LEAVE_CRITICAL_SECTION also isnt sufficient as you need to do the mapBlockIndex lookup again, which I went ahead and did.
632 | + LOCK(wallet->cs_wallet); 633 | + auto it = wallet->mapWallet.find(wtx.GetHash()); 634 | + BOOST_CHECK(it != wallet->mapWallet.end()); 635 | + blocktx = CMutableTransaction(*it->second.tx); 636 | + } 637 | + CreateAndProcessBlock({CMutableTransaction(blocktx)}, GetScriptForRawPubKey(coinbaseKey.GetPubKey()));
In commit "Require no cs_main lock for ProcessNewBlock/ActivateBestChain"
Maybe simplify this using mapWallet.at() instead of .find() and BOOST_CHECK.
Done.
2572 | + // Block until the validation queue drains 2573 | + std::promise<void> promise; 2574 | + CallFunctionInValidationInterfaceQueue([&promise] { 2575 | + promise.set_value(); 2576 | + }); 2577 | + promise.get_future().wait();
In commit "Block ActivateBestChain to empty validationinterface queue"
Could add a CValidationInterface::WaitForPendingCallbacks() method to wrap this up since this is also needed in other places.
Done, in a new commit.
utACK da8903142f807be9295797c0e65f228230ffd467. Locking changes look good, and nice to break up ProcessGetData.
Don't send me anymore pub mail. How can't unsubscribe.
On Dec 24, 2017 9:22 AM, "Matt Corallo" notifications@github.com wrote:
@TheBlueMatt commented on this pull request.
In src/validation.cpp https://github.com/bitcoin/bitcoin/pull/11824#discussion_r158606648:
@@ -2566,6 +2567,16 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams& int nStopAtHeight = gArgs.GetArg("-stopatheight", DEFAULT_STOPATHEIGHT); do { boost::this_thread::interruption_point();
if (GetMainSignals().CallbacksPending() > 10) {// Block until the validation queue drainsstd::promise<void> promise;CallFunctionInValidationInterfaceQueue([&promise] {promise.set_value();});promise.get_future().wait();Done, in a new commit.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/pull/11824#discussion_r158606648, or mute the thread https://github.com/notifications/unsubscribe-auth/AhH-NWCScYWtFJ0hUPdgOs8P8xXy58kkks5tDohCgaJpZM4Q1hJ- .
216 | @@ -218,6 +217,8 @@ BOOST_AUTO_TEST_CASE(CreateNewBlock_validity) 217 | for (unsigned int i = 0; i < sizeof(blockinfo)/sizeof(*blockinfo); ++i) 218 | { 219 | CBlock *pblock = &pblocktemplate->block; // pointer for convenience 220 | + { 221 | + LOCK(cs_main); 222 | pblock->nVersion = 1;
Indentation?
626 | @@ -627,10 +627,15 @@ class ListCoinsTestingSetup : public TestChain100Setup 627 | BOOST_CHECK(wallet->CreateTransaction({recipient}, wtx, reservekey, fee, changePos, error, dummy)); 628 | CValidationState state; 629 | BOOST_CHECK(wallet->CommitTransaction(wtx, reservekey, nullptr, state)); 630 | + CMutableTransaction blocktx; 631 | + { 632 | + LOCK(wallet->cs_wallet);
Indentation?
49 | + * CallFunctionInValidationInterfaceQueue([&promise] { 50 | + * promise.set_value(); 51 | + * }); 52 | + * promise.get_future().wait(); 53 | + */ 54 | +void BlockOnValidationInterfaceQueueDrain();
If I understand this correctly, it's not actually waiting until a drain, as new events can be added to the queue before the callback added in this function is executed.
It's more a synchronization point (guaranteeing that at least all pre-existing events are executed first).
utACK af5a003d3cfdd3f3a26dfd7c2db5dde347b4ab3c (with a few non-blocking nits)
This requires the removal of some very liberal (incorrect) cs_mains
sprinkled in some tests. It adds some chainActive.Tip() races, but
the tests are all single-threaded anyway.
utACK 97d2b09c124e6e5803f7fd4503348d9710d1260f (and also some experimental evidence that this indeed removes the runaway memory usage).
Confirmed only indentation/naming changes since af5a003.
I'm about to try this as well, will let you know if I still see extreme memory use during a reindex.