Fix some concurrency issues in ActivateBestChain() #13023

pull skeees wants to merge 3 commits into bitcoin:master from skeees:event-tests changing 6 files +251 −31
  1. skeees commented at 8:09 pm on April 18, 2018: contributor

    Originally this PR was just to add tests around concurrency in block validation - those tests seem to have uncovered another bug in ActivateBestChain - this now fixes that bug and adds tests.

    ActivateBestChain (invoked after a new block is validated) proceeds in steps - acquiring and releasing cs_main while incrementally disconnecting and connecting blocks to sync to the most work chain known (FindMostWorkChain()). Every time cs_main is released the result of FindMostWorkChain() can change - but currently that value is cached across acquisitions of cs_main and only refreshed when an invalid chain is explored. It needs to be refreshed every time cs_main is reacquired. The test added in https://github.com/bitcoin/bitcoin/pull/13023/commits/6094ce73045fe0b4654ff94327c2059512af88fb will occasionally fail without the commit fixing this issue https://github.com/bitcoin/bitcoin/pull/13023/commits/26bfdbaddbb9f13864deb7241c6d513f22c5ab62

    Original description below

    After a bug discovered where UpdatedBlockTip() notifications could be triggered out of order (#12978), these unit tests check certain invariants about these signals.

    The scheduler test asserts that a SingleThreadedSchedulerClient processes callbacks fully and sequentially.

    The block validation test generates a random chain and calls ProcessNewBlock from multiple threads at random and in parallel. ValidationInterface callbacks verify that the ordering of BlockConnected BlockDisconnected and UpdatedBlockTip events occur as expected.

  2. in src/Makefile.test.include:91 in 5f8f1bed6c outdated
    86@@ -87,7 +87,8 @@ BITCOIN_TESTS =\
    87   test/txvalidationcache_tests.cpp \
    88   test/versionbits_tests.cpp \
    89   test/uint256_tests.cpp \
    90-  test/util_tests.cpp
    91+  test/util_tests.cpp \
    92+  test/validation_block_tests.cpp
    


    jamesob commented at 8:11 pm on April 18, 2018:
    Looks like there’s an alphabetic ordering here that may be good to preserve.
  3. in src/test/scheduler_tests.cpp:133 in 5f8f1bed6c outdated
    128+    for (int i = 0; i < 5; i++) {
    129+        threads.create_thread(boost::bind(&CScheduler::serviceQueue, &scheduler));
    130+    }
    131+
    132+    // these are not atomic, if SinglethreadedSchedulerClient prevents
    133+    // parallel execution at the queue level no synchronization should required here
    


    jamesob commented at 8:13 pm on April 18, 2018:

    should be required here

  4. in src/test/validation_block_tests.cpp:86 in 5f8f1bed6c outdated
    81+    auto pblock = Block(prev_hash);
    82+    pblock->vtx.push_back(pblock->vtx[0]);
    83+    pblock->hashMerkleRoot = BlockMerkleRoot(*pblock);
    84+
    85+    while (!CheckProofOfWork(pblock->GetHash(), pblock->nBits, Params().GetConsensus()))
    86+        ++(pblock->nNonce);
    


    jamesob commented at 8:23 pm on April 18, 2018:
    Could be worth parameterizing Block with a bool make_invalid option to avoid the duplication here, but that’s your call.

    skeees commented at 3:25 pm on April 20, 2018:
    Extracted into FinalizeBlock - its probably worth refactoring BlockAssembler into a builder style class - would make unit tests much easier to write … maybe one day

    jamesob commented at 5:14 pm on April 20, 2018:
    Yep, definitely agree.
  5. in src/test/validation_block_tests.cpp:103 in 5f8f1bed6c outdated
    86+        ++(pblock->nNonce);
    87+
    88+    return pblock;
    89+}
    90+
    91+void BuildChain(const uint256& root, int height, const unsigned int invalid_rate, const unsigned int branch_rate, const unsigned int max_size, std::vector<std::shared_ptr<const CBlock>>& blocks)
    


    jamesob commented at 8:24 pm on April 18, 2018:
    I know we don’t have columnar limits in the styleguide, but this line’s pretty long…

    skeees commented at 3:25 pm on April 20, 2018:
    i agree with you - i originally had this on two lines - and then the linter put it all back on one
  6. skeees renamed this:
    Add unit tests for signals generated by ProcessNewBlock
    [WIP] Add unit tests for signals generated by ProcessNewBlock
    on Apr 18, 2018
  7. in src/test/validation_block_tests.cpp:100 in 5f8f1bed6c outdated
     95+    bool gen_invalid = GetRand(100) < invalid_rate;
     96+    bool gen_fork = GetRand(100) < branch_rate;
     97+
     98+    const std::shared_ptr<const CBlock> pblock = gen_invalid ? BadBlock(root) : GoodBlock(root);
     99+    blocks.push_back(pblock);
    100+    if (!gen_invalid)
    


    jamesob commented at 8:25 pm on April 18, 2018:
    Braces needed.

    Empact commented at 9:37 pm on April 18, 2018:
    clang-format would put those in I believe

    skeees commented at 3:25 pm on April 20, 2018:
    it doesn’t! but i put them in now
  8. jamesob commented at 8:39 pm on April 18, 2018: member
    Concept ACK, just a few nits. Nice test!
  9. in src/test/validation_block_tests.cpp:135 in 5f8f1bed6c outdated
    130+    // this will create parallelism and randomness inside validation - the ValidationInterface
    131+    // will subscribe to events generated during block validation and assert on ordering invariance
    132+    boost::thread_group threads;
    133+    for (int i = 0; i < 10; i++) {
    134+        threads.create_thread([&blocks]() {
    135+            bool blah;
    


    Empact commented at 9:37 pm on April 18, 2018:
    Calling this ignored might be more straightforward
  10. fanquake added the label Tests on Apr 18, 2018
  11. skeees force-pushed on Apr 20, 2018
  12. skeees force-pushed on Apr 20, 2018
  13. skeees renamed this:
    [WIP] Add unit tests for signals generated by ProcessNewBlock
    Always refresh most work chain when reacquiring cs_main in ActivateBestChain()
    on Apr 20, 2018
  14. skeees commented at 3:27 pm on April 20, 2018: contributor
    Updated to address reviewer comments and fix a bug that this test seems to have uncovered
  15. skeees force-pushed on Apr 20, 2018
  16. MarcoFalke removed the label Tests on Apr 22, 2018
  17. MarcoFalke added the label Validation on Apr 22, 2018
  18. in src/validation.cpp:2676 in 26bfdbaddb outdated
    2672@@ -2673,9 +2673,7 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
    2673             ConnectTrace connectTrace(mempool); // Destructed before cs_main is unlocked
    2674 
    2675             CBlockIndex *pindexOldTip = chainActive.Tip();
    2676-            if (pindexMostWork == nullptr) {
    2677-                pindexMostWork = FindMostWorkChain();
    2678-            }
    2679+            pindexMostWork = FindMostWorkChain();
    


    ryanofsky commented at 6:07 pm on April 23, 2018:
    Might be good to add a comment here saying why it’s important to recompute pindexMostWork each loop iteration (how sync could get stuck otherwise).

    sdaftuar commented at 4:13 pm on April 26, 2018:
    This might cause substantial slow-down during IBD or reindex, because currently FindMostWorkChain walks from the candidate tip to the fork point on the current chain to ensure none of the blocks are invalid. We may be able to optimize that, but we also might want to consider another solution.
  19. in src/test/validation_block_tests.cpp:61 in 6094ce7304 outdated
    56+
    57+    CScript pubKey;
    58+    pubKey << i++ << OP_TRUE;
    59+
    60+    auto ptemplate = BlockAssembler(Params()).CreateNewBlock(pubKey, false);
    61+    CBlock* pblock = new CBlock(ptemplate->block);
    


    ryanofsky commented at 6:13 pm on April 23, 2018:

    I’d think this could be written more simply as:

    0auto block = make_shared<CBlock>(ptemplate->block);
    1block->hashPrevBlock = ...
    2return block;
    

    It looks like pblock is leaked currently, or is this not the case?

  20. ryanofsky commented at 6:27 pm on April 23, 2018: member
    utACK 6094ce73045fe0b4654ff94327c2059512af88fb. Fun tests!
  21. sdaftuar commented at 3:53 pm on April 26, 2018: member

    Thanks for opening this PR. I thought it would be helpful for other reviewers to clarify the (IMO significant) bug here, so that we can properly evaluate fixes.

    If ActivateBestChain is invoked simultaneously in separate threads, then we can end up at a lower-work tip, and remain stuck until the next block comes in.

    • Suppose in thread 1, we have just been delivered a block that causes us to try to activate two blocks in ABC. We connect 1 of them in ABCStep, and then release cs_main before connecting the second.
    • In thread 2, suppose we have been delivered a 3rd block (say via rpc) that builds on the first two. It invokes ABC and gets to run after the first block has been connected in thread 1. It connects one block, releases cs_main, and then connects one more, and finishes.
    • When thread 1 gets to run again, the most work chain has advanced, but (before this PR) we don’t refresh pindexMostWork (except when we find an invalid block). Consequently we would invoke ABCStep with a less work tip than our current tip(!). This would cause us to disconnect our tip and return.

    Some travis failures have been observed due to this bug, as seen for instance here: https://travis-ci.org/bitcoin/bitcoin/jobs/370848272. The test that sometimes fails, rpc_deprecated.py, generates blocks on two nodes roughly simultaneously, so one of the nodes is generating blocks in an rpc thread while also processing blocks on the network thread, which I believe is enough to trigger this bug.

  22. in src/test/validation_block_tests.cpp:91 in 6094ce7304 outdated
    86+const std::shared_ptr<const CBlock> BadBlock(const uint256& prev_hash)
    87+{
    88+    auto pblock = Block(prev_hash);
    89+
    90+    // a second coinbase will make this block invalid
    91+    pblock->vtx.push_back(pblock->vtx[0]);
    


    sdaftuar commented at 4:28 pm on April 26, 2018:

    This method of constructing a bad block is not ideal, because this kind of invalidity is detected well before ConnectBlock. In order to get better code coverage, I think it would be better to generate different kinds of invalid blocks, to test failure at different points in the validation process.

    (As it is, I believe this method of making an invalid block is detectable in CheckBlock (merkle-tree fails validation due to duplicate transactions). It would also fail in CheckBlock for having more than one coinbase, and it would fail in ContextualCheckBlock for having an invalid witness commitment.)

  23. sdaftuar changes_requested
  24. sdaftuar commented at 4:31 pm on April 26, 2018: member
    Nice work finding this issue and good start on the new test. I haven’t finished my review yet, but here’s my initial feedback.
  25. sdaftuar commented at 3:22 pm on April 27, 2018: member

    I have a branch where I changed the test you wrote to produce invalid blocks that would be detected in ConnectBlock, and where I have an alternate fix for the bug you found here, as well as a fix for the bug I mentioned in #13092: https://github.com/sdaftuar/bitcoin/commits/2018-04-alternate-abc-fix.

    For fixing the bug here, rather than invoke FindMostWorkChain on every loop iteration (which I expect to be very slow during reindex), I instead added a new test: if the tip has changed since the last loop iteration, or if pindexMostWork is no longer in setBlockIndexCandidates, then update pindexMostWork. (I believe that only the first criteria is actually necessary, but included the second as belt-and-suspenders.)

    I compared this approach vs master for doing a reindex on my workstation, and saw < 3% slowdown (some slowdown was expected due to checking setBlockIndexCandidates each time).

  26. skeees commented at 4:29 pm on April 27, 2018: contributor
    Nice - thanks for this. I wonder if it might be a bit more readable to explore feasibility of caching results internally in findmostworkchain instead of doing it in the activatebestchain loop - the control flow there is getting harder and harder to follow. I’m happy to explore that.
  27. sdaftuar commented at 6:07 pm on April 27, 2018: member
    @TheBlueMatt and I discussed that idea offline (of rewriting FindMostWorkChain to use g_failed_blocks, our internal data structure for caching invalid blocks, to speed it up) but our initial reaction was that the review overhead to ensure correctness might be very high. I’m open to other ideas though (and if you think reworking FMWC is practical please feel free to take a shot at it).
  28. laanwj added this to the milestone 0.16.1 on May 3, 2018
  29. laanwj added the label Needs backport on May 3, 2018
  30. skeees force-pushed on May 4, 2018
  31. skeees commented at 3:57 pm on May 4, 2018: contributor

    @sdaftuar - pulled in your commits.

    Its still hard for me to convince myself that the set of conditions you’ve defined are the only cases in which pindexMostWork needs to be refreshed. Maybe I’m just not familiar enough with those areas - I’ll leave it to the people with more experience in that area to review.

    Just wanted to throw out one alternate way to do determine when to refresh most work tip that’s (imo) a bit easier to convince myself behaves correctly. Instead of caching pindexOldTip and using that as well as a couple of other things to determine whether to refresh - how about giving setBlockIndexCandidates a sequence number that gets incremented every time it gets updated. Then all ABC has to do is record the last tip candidate set version it saw (instead of the old tip) and use that to determine whether a refresh may be necessary. You could do this in ABC or even internally in FindMostWorkChain.

  32. MarcoFalke commented at 6:04 pm on May 4, 2018: member

    Copy of travis output:

     0Running tests: validation_block_tests from test/validation_block_tests.cpp
     1Running 1 test case...
     2Test cases order is shuffled using seed: 1374948318
     3Entering test module "Bitcoin Test Suite"
     4test/validation_block_tests.cpp(24): Entering test suite "validation_block_tests"
     5test/validation_block_tests.cpp(125): Entering test case "processnewblock_signals_ordering"
     6test/validation_block_tests.cpp(160): info:  has passed
     7test/validation_block_tests.cpp(160): info: ProcessNewBlock(Params(), block, true, &ignored)
     8test/validation_block_tests.cpp(160): info: check 
     9test/validation_block_tests.cpp(160): info: ProcessNewBlock(Params(), block, true, &ignored)
    10test/validation_block_tests.cpp(160): info: ProcessNewBlock(Params(), block, true, &ignored)
    11test/validation_block_tests.cpp(160): info: check 
    12test/validation_block_tests.cpp(160): info:  has passed
    13test/validation_block_tests.cpp(160): info: ProcessNewBlock(Params(), block, true, &ignored)
    14terminate called after throwing an instance of 'std::length_error'
    15  what():  basic_string::_S_create
    16make[3]: *** [test/validation_block_tests.cpp.test] Error 1
    
  33. sdaftuar commented at 7:11 pm on May 4, 2018: member
    From some quick web searching, it seems that BOOST_CHECK may not be thread safe, so invoking it in each thread is likely the problem here. Perhaps we should just assert() on the return value from ProcessNewBlock() (at line 160).
  34. sdaftuar commented at 7:42 pm on May 4, 2018: member

    I do occasionally see this failure when I run validation_block_tests in a tight loop (after fixing the BOOST_CHECK issue above):

    0test/validation_block_tests.cpp(38): error in "processnewblock_signals_ordering": check m_expected_tip == block->hashPrevBlock failed [0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206 != 0000000000000000000000000000000000000000000000000000000000000000]
    1Segmentation fault (core dumped)
    

    Edit: perhaps this is also a threading issue though?

  35. skeees force-pushed on May 4, 2018
  36. skeees commented at 7:53 pm on May 4, 2018: contributor
    Hmmm thats the genesis block which connects through a different path I believe
  37. skeees commented at 8:07 pm on May 4, 2018: contributor
    @sdaftuar - thats the genesis block getting activated. Feels like it may be more of a test artefact (wasn’t there a recent change that randomizes test case execution order) than a threading issue? I think solution for that is to just call PNB(genesis) at the very beginning.
  38. sdaftuar commented at 8:08 pm on May 4, 2018: member

    @skeees My reasoning is this:

    • Assume ABC never releases cs_main until the new tip has work >= that of the old tip..
    • Then it should be safe to always try to connect to pindexMostWork as long as it is still a candidate to be our tip (ie it’s still in setBlockIndexCandidates). Since we prune entries from setBlockIndexCandidates in ABCStep, if an entry is in setBlockIndexCandidates, then working towards it is still making progress (even if we might end up scrapping it for an even-more-work tip later – it’s no different from just learning about the more work tip after you had finished connecting the other blocks, rather than before).
    • However there is a small risk that we might have two threads that are simultaneously trying to connect several blocks towards two different tips, one of which has more work than the other, and both of which have more work than our prior tip. Because ABCStep continues until it arrives at a block that has more work than the prior tip, this must converge, but it could annoyingly take a while (eg thread 1 connects 1 block out of 10, then thread 2 connects 2 blocks out of the 11 it needs to get to its desired tip, then thread 1 connects 3 blocks, and so on). This is the convergence issue I alluded to in the code comment.
    • So as an optimization, we can just check to see if the tip has changed since we last left off. If the tip has not changed we can use the assumption that we never release cs_main at a less-work tip than we were at before to be sure that whatever our pindexMostWork target from before is must still be a valid goal.
      In particular, it’s not possible for (say) other threads invoking ABC concurrently to leave us at our tip if they have been connecting towards more-work tips.
    • I believe that checking for the tip change is sufficient for this to be safe, but as belt-and-suspenders I also included the check for pindexMostWork being in setBlockIndexCandidates, just in case. I think the worst-case behavior if there’s a hole in the logic would be the slow-convergence-issue in the event that multiple threads are trying to connect lots of blocks towards competing tips, which seems pretty low risk to me anyway.

    I believe that the fix to not release cs_main if we encounter an invalid block achieves the assumption listed at the top, and thus the rest follows.

    Just wanted to throw out one alternate way to do determine when to refresh most work tip that’s (imo) a bit easier to convince myself behaves correctly. Instead of caching pindexOldTip and using that as well as a couple of other things to determine whether to refresh - how about giving setBlockIndexCandidates a sequence number that gets incremented every time it gets updated. Then all ABC has to do is record the last tip candidate set version it saw (instead of the old tip) and use that to determine whether a refresh may be necessary. You could do this in ABC or even internally in FindMostWorkChain.

    This might also be a reasonable solution; it seems like a bigger refactor to me since we don’t have great encapsulation on setBlockIndexCandidates at the moment, but I agree that it could be a clearer logical change. @sipa Any thoughts on this issue?

  39. skeees force-pushed on May 4, 2018
  40. in src/test/validation_block_tests.cpp:144 in 04176fc02f outdated
    139+    // Process all the headers so we understand the toplogy of the chain
    140+    BOOST_CHECK(ProcessNewBlockHeaders(headers, state, Params()));
    141+    ProcessNewBlock(Params(), std::make_shared<CBlock>(Params().GenesisBlock()), true, &ignored);
    142+
    143+    // subscribe to events (this subscriber will validate event ordering)
    144+    TestSubscriber sub(chainActive.Tip()->GetBlockHash());
    


    sdaftuar commented at 0:35 am on May 5, 2018:

    As discussed on IRC, there’s a race condition where callbacks scheduled prior to a client subscribing to callbacks can be delivered to those clients, if the scheduler didn’t finish draining its queue before the new subscriber is added.

    That causes this test to (rarely) fail, because we initialize the TestSubscriber with the genesis block, even though it might be notified of the genesis block’s connection (which has already happened at this point!) if the callbacks were delayed. This can be reproduced by adding a MilliSleep() call to SingleThreadedSchedulerClient::ProcessQueue().

    The call to ProcessNewBlock at line 141 above doesn’t fix this issue; instead we need to ensure that there are no outstanding callbacks at the time the TestSubscriber is created, for instance by calling SyncWithValidationInterfaceQueue(). Also we should hold cs_main before accessing chainActive:

    0    SyncWithValidationInterfaceQueue();
    1
    2    const CBlockIndex *initial_tip = nullptr;
    3    {
    4        LOCK(cs_main);
    5        initial_tip = chainActive.Tip();
    6    }
    7
    8    // subscribe to events (this subscriber will validate event ordering)
    9    TestSubscriber sub(initial_tip->GetBlockHash());
    
  41. skeees force-pushed on May 7, 2018
  42. skeees renamed this:
    Always refresh most work chain when reacquiring cs_main in ActivateBestChain()
    Fix some concurrency issues in ActivateBestChain()
    on May 7, 2018
  43. in src/validation.cpp:151 in c496b7b386 outdated
    143@@ -144,6 +144,12 @@ class CChainState {
    144       */
    145     std::set<CBlockIndex*> g_failed_blocks;
    146 
    147+    /**
    148+     * the ChainState CriticalSection
    149+     * A lock that must be held when modifying this ChainState - held in ActivateBestChain()
    150+     */
    151+    CCriticalSection m_cs_cs;
    


    sipa commented at 5:46 pm on May 7, 2018:
    I believe it may be worth adding 16 bytes to the source code to give this the mildly less confusing name m_chainstate_cs instead.
  44. sipa commented at 6:28 pm on May 7, 2018: member

    utACK code changes; I didn’t review the tests.

    Also, when introducing a test to demonstrate the problem, put it after the fix. Doing otherwise means introducing temporarily broken commits, which may break the ability to bisect.

  45. in src/validation.cpp:2661 in c496b7b386 outdated
    2657@@ -2652,6 +2658,7 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
    2658     // us in the middle of ProcessNewBlock - do not assume pblock is set
    2659     // sanely for performance or correctness!
    2660     AssertLockNotHeld(cs_main);
    2661+    LOCK(m_cs_cs);
    


    sdaftuar commented at 6:38 pm on May 7, 2018:
    nit: I think it’s worth commenting here with some explanation of the types of race conditions that this is designed to protect (along the lines of what is in the commit message, perhaps)
  46. in src/test/test_bitcoin.h:124 in 9dd61f272a outdated
    118@@ -119,5 +119,6 @@ struct TestMemPoolEntryHelper
    119 };
    120 
    121 CBlock getBlock13b8a();
    122+std::ostream& operator<<(std::ostream& os, const uint256& num);
    


    sdaftuar commented at 6:55 pm on May 7, 2018:
    nit: perhaps add a comment explaining why this is added? (guessing it’s because you want to use uint256’s in BOOST_CHECK_EQUAL())
  47. in src/test/validation_block_tests.cpp:7 in 9dd61f272a outdated
    0@@ -0,0 +1,171 @@
    1+// Copyright (c) 2018 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#include <iomanip>
    6+#include <memory>
    7+#include <sstream>
    


    sdaftuar commented at 7:06 pm on May 7, 2018:
    nit: doesn’t seem like these includes are necessary at all, am I missing some dependency?
  48. sdaftuar approved
  49. sdaftuar commented at 7:12 pm on May 7, 2018: member
    ACK c496b7b386a66fb4fc73d3bacd3db2a5c4bf7872, though I left some nits for you to consider.
  50. skeees force-pushed on May 7, 2018
  51. skeees commented at 7:44 pm on May 7, 2018: contributor
    @sipa - commits now correctly ordered and m_cs_cs renamed to m_cs_chainstate
  52. sdaftuar commented at 7:47 pm on May 7, 2018: member
    re-ACK ed1828c202c9f4376492c173e361814320a01ffa
  53. skeees force-pushed on May 7, 2018
  54. skeees force-pushed on May 8, 2018
  55. skeees force-pushed on May 11, 2018
  56. in src/validation.cpp:2719 in 8febf4850b outdated
    2735+                for (const PerBlockConnectTrace& trace : connectTrace.GetBlocksConnected()) {
    2736+                    assert(trace.pblock && trace.pindex);
    2737+                    GetMainSignals().BlockConnected(trace.pblock, trace.pindex, trace.conflictedTxs);
    2738+                }
    2739+            } while (!chainActive.Tip() || (starting_tip && CBlockIndexWorkComparator()(chainActive.Tip(), starting_tip)));
    2740+            if(!blocks_connected) return true;
    


    jamesob commented at 7:31 pm on May 11, 2018:
    nit: if (...
  57. in src/validation.cpp:149 in 8febf4850b outdated
    143@@ -144,6 +144,12 @@ class CChainState {
    144       */
    145     std::set<CBlockIndex*> g_failed_blocks;
    146 
    147+    /**
    148+     * the ChainState CriticalSection
    149+     * A lock that must be held when modifying this ChainState - held in ActivateBestChain()
    


    jamesob commented at 7:34 pm on May 11, 2018:

    must be held when modifying this ChainState

    Seems like this statement is too broad; many aspects of this object are modified without this lock held (e.g. g_failed_blocks, setBlockIndexCandidates).


    sdaftuar commented at 7:18 pm on May 16, 2018:
    I agree – this is more of a lock on the logic in ABC than any particular data structure. The comment in ABC itself is pretty good though.
  58. in src/validation.cpp:2663 in 8febf4850b outdated
    2659@@ -2653,6 +2660,12 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
    2660     // sanely for performance or correctness!
    2661     AssertLockNotHeld(cs_main);
    2662 
    2663+    // ABC maintains a fair degree of expensive to calculate internal state
    


    jamesob commented at 7:35 pm on May 11, 2018:
    I had a bit of trouble parsing this comment initially - might be easier to read if you wrote “expensive-to-calculate” instead?
  59. jamesob approved
  60. jamesob commented at 7:37 pm on May 11, 2018: member

    utACK https://github.com/bitcoin/bitcoin/pull/13023/commits/8febf4850b11406883d457d3c63e713acfacf682

    The tests you’ve written are great (we need more like these) and the changeset fixes a definite bug, so I’m :+1: on this. I left a few nits but feel free to ignore or fix later.

  61. in src/validation.cpp:2708 in 4931c6d298 outdated
    2728 
    2729-            for (const PerBlockConnectTrace& trace : connectTrace.GetBlocksConnected()) {
    2730-                assert(trace.pblock && trace.pindex);
    2731-                GetMainSignals().BlockConnected(trace.pblock, trace.pindex, trace.conflictedTxs);
    2732-            }
    2733+            const CBlockIndex * pindexFork = chainActive.FindFork(pindexOldTip);
    


    TheBlueMatt commented at 7:40 pm on May 11, 2018:
    Not sure if this is my fault or not, but this definitely needs to be finding the fork since starting_tip for the notifications, not just since the last loop iteration. Also, we should probably move the UpdatedBlockTip callback generation inside the pindexFork != pindexNewTip check as otherwise you’ll get rather nonsense callbacks (and none of the callers seem to rely on any specific behavior). Either way behavior needs documentation in validationinterface.h (or I can do it in #12979 once this gets merged).
  62. in src/validation.cpp:2665 in 1eee5ef33b outdated
    2659@@ -2653,6 +2660,12 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
    2660     // sanely for performance or correctness!
    2661     AssertLockNotHeld(cs_main);
    2662 
    2663+    // ABC maintains a fair degree of expensive to calculate internal state
    2664+    // because this function periodically releases cs_main so that it does not lock up other threads for too long
    2665+    // during large re-orgs - and to allow for e.g. the callback queue to drain
    


    TheBlueMatt commented at 7:41 pm on May 11, 2018:
    nit: comment is slightly wrong now, we dont release cs_main during large reorgs, only large connects.
  63. in src/test/scheduler_tests.cpp:133 in 9c31fa3228 outdated
    128+    for (int i = 0; i < 5; i++) {
    129+        threads.create_thread(boost::bind(&CScheduler::serviceQueue, &scheduler));
    130+    }
    131+
    132+    // these are not atomic, if SinglethreadedSchedulerClient prevents
    133+    // parallel execution at the queue level no synchronization should be required here
    


    TheBlueMatt commented at 7:57 pm on May 11, 2018:
    This isnt sufficient - just because two things are ordered does not mean they have a consistent view into memory unless there was something that triggered a flush. In practice I believe the lock inside of the SingleThreadedSchedulerClient provides this, but that’s certainly not guaranteed in the future. A release-acquire should be sufficient, however.

    skeees commented at 6:04 pm on May 12, 2018:
    don’t quite understand what you mean here - isn’t the whole purpose of SingleThreadedSchedulerClient to ensure internally consistent ordering/views of callbacks on a single client? if a memory barrier is somehow necessary to achieve this (like you I think the lock is sufficient for now) then that should be done in the SingleThreadedSchedulerClient and not be the caller’s responsibility. This test was intended to demonstrate that the api guarantees this behavior to users.

    TheBlueMatt commented at 4:35 pm on May 13, 2018:
    The C++ memory model does not provide any ordering guarantees across variables at different memory locations unless you have a) some operation dependency tree which creates such an order, or b) a rel_acq/seq_cst operation somewhere which implicitly creates such an order (which a mutex lock/unlock essentially implicitly creates). So if, eg, the SingleThreadedSchedulerClient used only relaxed operations to load the next callback to call, and both callback clients update the variable, there is no happens-before relationship created on the variable and you may not have consistent results. Having the rel_acq implied by the internal lock provides what you want implicitly, but its not a guarantee. A release/acquire operation should be sufficient here.

    skeees commented at 2:35 pm on May 14, 2018:
    oh hmm - but each client only interacts with its own “counter” there’s no sharing across clients in this tests. I’m simply checking that for each client, its own callbacks are processed sequentially - which i think is something that the singlethreadedschedulerclient api is supposed to guarantee (otherwise might as well just use the raw scheduler) - and no current test existed

    TheBlueMatt commented at 4:29 pm on May 14, 2018:
    Ugh, my memory of the C++ memory model was slightly faulty, I’ve updated the comment above, though its much less a concern now (however should sitll be fixed). You could use relaxed operations to try to optimistically fetch the next callback to execute, succeed, and then have one thread run a callback, the other thread run a second callback but for the same client, and create no dependancy. On many non-x86 platforms the CPU is free to reorder (apparently) unrelated stores/loads.
  64. TheBlueMatt commented at 7:59 pm on May 11, 2018: member
    Didnt review the last new test (awesome, btw).
  65. Do not unlock cs_main in ABC unless we've actually made progress.
    Technically, some internal datastructures may be in an inconsistent
    state if we do this, though there are no known bugs there. Still,
    for future safety, its much better to only unlock cs_main if we've
    made progress (not just tried a reorg which may make progress).
    ecc3c4a019
  66. Fix concurrency-related bugs in ActivateBestChain
    If multiple threads are invoking ActivateBestChain, it was possible to have
    them working towards different tips, and we could arrive at a less work tip
    than we should.  Fix this by introducing a ChainState lock which must
    be held for the entire duration of ActivateBestChain to enforce
    exclusion in ABC.
    a3ae8e6873
  67. skeees force-pushed on May 12, 2018
  68. TheBlueMatt commented at 4:33 pm on May 14, 2018: member
    utACK non-test changes through a3ae8e68739023e5dba9e5cb190e707ed4603316, a3070a861bb3a76303b7859c4dd96ba886029111 would be fine with some release/acquire atomics, and 51539506577e5439c58b2e36c07d84f33bb4d4c5 looks awesome, though I haven’t reviewed it fully.
  69. skeees commented at 6:02 pm on May 14, 2018: contributor
    I will follow up offline in case I’m totally missing something - but I feel pretty strongly that sequential consistency should be guaranteed by the SingleThreadedSchedulerClient - whether the callbacks are actually run by one or multiple threads internally, it should feel single threaded (it’s even named that way) and the caller shouldn’t have to worry about it. Otherwise you open up a ton of surface area for more bugs like the ones being fixed here to creep in. Of course if there’s something I’m missing in the current implementation of the SingleThreadedSchedulerClient that doesn’t guarantee this ordering, I’m happy to fix it here, but it appears that the lock guarantees it now and I’d strongly advocate that any future implementations without the lock still maintain that guarantee.
  70. TheBlueMatt commented at 6:05 pm on May 14, 2018: member
    @skeees hmm, maybe we’re talking past each other, but my point was that you can fully guarantee both ordering and only one callback executing at once without implying seq_cst/rel_acq in the C++ memory model. I dont think the memory consistency of a client should be implied by an API as it can impose runtime cost.
  71. skeees force-pushed on May 14, 2018
  72. skeees commented at 8:19 pm on May 14, 2018: contributor

    Ok - I finally understand what you’re saying. I still disagree. I think it makes the API pretty unusable if you drop that guarantee - and the theoretical (because its single threaded right now) performance tradeoff is worth the usability - and certainly it should not be called SingleThreaded anything since it drops the single threaded memory model. But in the interest of not delaying 0.16.1 I’ve removed that commit and will follow up in a separate PR.

    Btw though - I think the second test still has the same issue (different callbacks referencing the same unsynchronized memory)

  73. TheBlueMatt commented at 7:49 pm on May 15, 2018: member
    utACK 7b4be5089938b188ecca86742aa6aa4d2ec0c40e, though I didn’t review the last test in detail (thanks for making that!).
  74. laanwj commented at 11:56 am on May 16, 2018: member

    Compiling this locally gives me:

    0/.../bitcoin/src/test/validation_block_tests.cpp:73:13: error: use of undeclared identifier 'CheckProofOfWork'
    1    while (!CheckProofOfWork(pblock->GetHash(), pblock->nBits, Params().GetConsensus())) {
    2            ^
    31 error generated.
    

    Adding #include <pow.h> fixes it.

  75. Add unit tests for signals generated by ProcessNewBlock()
    After a recent bug discovered in callback ordering in MainSignals,
    this test checks invariants in ordering of
    BlockConnected / BlockDisconnected / UpdatedChainTip signals
    dd435ad402
  76. skeees force-pushed on May 16, 2018
  77. skeees commented at 12:29 pm on May 16, 2018: contributor
    thanks @laanwj - updated
  78. in src/validation.cpp:2691 in ecc3c4a019 outdated
    2696-                return true;
    2697+                bool fInvalidFound = false;
    2698+                std::shared_ptr<const CBlock> nullBlockPtr;
    2699+                if (!ActivateBestChainStep(state, chainparams, pindexMostWork, pblock && pblock->GetHash() == pindexMostWork->GetBlockHash() ? pblock : nullBlockPtr, fInvalidFound, connectTrace))
    2700+                    return false;
    2701+                blocks_connected = true;
    


    sdaftuar commented at 1:57 pm on May 16, 2018:
    nit: This is not a very good variable name, as ABCStep can return true even if no blocks were connected.
  79. in src/validation.cpp:2703 in ecc3c4a019 outdated
    2719-            fInitialDownload = IsInitialBlockDownload();
    2720+                for (const PerBlockConnectTrace& trace : connectTrace.GetBlocksConnected()) {
    2721+                    assert(trace.pblock && trace.pindex);
    2722+                    GetMainSignals().BlockConnected(trace.pblock, trace.pindex, trace.conflictedTxs);
    2723+                }
    2724+            } while (!chainActive.Tip() || (starting_tip && CBlockIndexWorkComparator()(chainActive.Tip(), starting_tip)));
    


    sdaftuar commented at 2:03 pm on May 16, 2018:

    I believe this means we’ll infinite loop if we ever had a bug like the race condition that this PR fixes, that could cause us to invoke ABCStep with a pindexMostWork that is less work than our tip.

    I think an infinite loop would be strictly worse than an assert failure, so could we also add an assert above that pindexMostWork has more work than chainActive.Tip(), before we invoke ABCStep?

  80. laanwj commented at 4:24 pm on May 16, 2018: member
    utACK dd435ad40267f5c50ff17533c696f9302829a6a6
  81. laanwj merged this on May 16, 2018
  82. laanwj closed this on May 16, 2018

  83. laanwj referenced this in commit 11e7bdfd90 on May 16, 2018
  84. sdaftuar commented at 7:21 pm on May 16, 2018: member
    Post-merge ACK, with some nits.
  85. fanquake added this to the "For backport" column in a project

  86. skeees deleted the branch on May 24, 2018
  87. MarcoFalke referenced this in commit 3663900717 on May 24, 2018
  88. MarcoFalke referenced this in commit 8a7cfd5557 on May 24, 2018
  89. MarcoFalke referenced this in commit e8a283f554 on May 24, 2018
  90. MarcoFalke referenced this in commit 0948153ea6 on May 24, 2018
  91. MarcoFalke referenced this in commit bb79aaf93a on May 24, 2018
  92. MarcoFalke referenced this in commit d6c3a08c48 on May 24, 2018
  93. fanquake removed the label Needs backport on May 28, 2018
  94. fanquake commented at 9:37 am on May 28, 2018: member
    Backported in #13317
  95. fanquake removed this from the "For backport" column in a project

  96. HashUnlimited referenced this in commit f8f81c2d53 on Jun 29, 2018
  97. HashUnlimited referenced this in commit 2413612b85 on Jun 29, 2018
  98. HashUnlimited referenced this in commit c80100e3ff on Jun 29, 2018
  99. MarcoFalke referenced this in commit e83d82a85c on Aug 1, 2018
  100. ccebrecos referenced this in commit 85ac9b7168 on Sep 14, 2018
  101. ccebrecos referenced this in commit 27876d57d5 on Sep 14, 2018
  102. ccebrecos referenced this in commit d4f80b9367 on Sep 14, 2018
  103. PastaPastaPasta referenced this in commit c5af021d59 on Jul 17, 2020
  104. PastaPastaPasta referenced this in commit 82d64aef95 on Jul 17, 2020
  105. PastaPastaPasta referenced this in commit f9aa9ad39f on Jul 17, 2020
  106. LarryRuane referenced this in commit b19c9ed41b on Mar 22, 2021
  107. LarryRuane referenced this in commit 25fa8176e2 on Mar 22, 2021
  108. random-zebra referenced this in commit b993c56afe on Apr 4, 2021
  109. LarryRuane referenced this in commit d9b7f45ce8 on Apr 26, 2021
  110. LarryRuane referenced this in commit c45596e287 on Apr 26, 2021
  111. UdjinM6 referenced this in commit 4a67d6ae09 on May 21, 2021
  112. LarryRuane referenced this in commit 897b4fabdd on May 27, 2021
  113. LarryRuane referenced this in commit 638b1e94be on May 27, 2021
  114. UdjinM6 referenced this in commit d3d242f2b1 on Jun 5, 2021
  115. UdjinM6 referenced this in commit bb44ac04c9 on Jun 5, 2021
  116. str4d referenced this in commit ef61fe6dbc on Sep 23, 2021
  117. str4d referenced this in commit bbaff2ab1e on Sep 23, 2021
  118. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me