[WIP] “Lockfree” Checkqueue Implementation #8464

pull JeremyRubin wants to merge 49 commits into bitcoin:master from JeremyRubin:lockfree-checkqueue-squashed changing 18 files +1282 −360
  1. JeremyRubin commented at 5:16 pm on August 5, 2016: contributor

    This commit introduces a new Checkqueue that should have much better performance than the previous one. Lockfree in scare quotes because the whole checkqueue doesn’t meet the strict academic definition of lockfree, but it does have lockfree operations.

    The PR includes 1 new consensus rule, some refactoring of the checkqueue interface, reimplementation of the checkqueue, and tests, and one optional optimization (could be PR’d separately).

    Please see https://gist.github.com/JeremyRubin/6eec0427edec6e68755ae274b27baae5 for more information.

    edit: forgot WIP label, needs more testing and review. Opening PR for that purpose. edit edit: removed WIP label, terminology confusion, it is indeed ready for testing.

    Todo

    • Stronger Testing Suite
    • Figure out better wakeup mechanism
    • Add Benchmarking to the old checkqueue compatible with the new one.
    • cross platform testing (ARM, 32 bit,…)
    • remove static asserts for lock-free ness
  2. JeremyRubin renamed this:
    "Lockfree" Checkqueue Implementation
    [WIP] "Lockfree" Checkqueue Implementation
    on Aug 5, 2016
  3. in src/checkqueue.h: in 3517995753 outdated
    25@@ -26,7 +26,7 @@ class CCheckQueueControl;
    26   * the master is done adding work, it temporarily joins the worker pool
    27   * as an N'th worker, until all jobs are done.
    28   */
    29-template <typename T>
    30+template <typename T, size_t J size_t W>
    


    sipa commented at 5:29 pm on August 5, 2016:
    Can you add comments explaining what J and W are?

    JeremyRubin commented at 5:34 pm on August 5, 2016:
    You’re viewing an outdated diff, I just did this for a clean interface change then implementation change set of commits. They are unused here.
  4. JeremyRubin renamed this:
    [WIP] "Lockfree" Checkqueue Implementation
    "Lockfree" Checkqueue Implementation
    on Aug 5, 2016
  5. in src/checkqueue.h: in d8ba250b5e outdated
    13+#include <thread>
    14+#include <chrono>
    15+#include <sstream>
    16+#include <iostream>
    17+// This should be ignored eventually, but needs testing to ensure this works on more platforms
    18+static_assert(ATOMIC_BOOL_LOCK_FREE, "shared_status not lock free");
    


    sipa commented at 5:44 pm on August 5, 2016:
    These error messages can indicate the failed type (bool/long/llong)?

    JeremyRubin commented at 2:02 am on August 6, 2016:
    I added a commit which changes them from static_asserts to warnings. I’m unsure if the warning is properly x-platform; seems to not be a consistent thing. In any case, not being lockfree only effects performance, not correctness.
  6. JeremyRubin commented at 6:17 pm on August 7, 2016: contributor
    Pull tester failure seems to be semi-related to #8425.
  7. sipa commented at 7:44 pm on August 7, 2016: member
    @JeremyRubin I don’t think so. bitcoind segfaults in Travis with this PR (and all rpc tests fail). In #8425 there is just a failure in one test.
  8. JeremyRubin commented at 8:55 pm on August 7, 2016: contributor
    @sipa So when I run tests one by one they seem to not fail; when I run them using rpc_tests.py they do fail. Let me know if you have any ideas why this might be the case.
  9. MarcoFalke commented at 9:10 pm on August 7, 2016: member
    @JeremyRubin You can try qa/pull-tester/rpc-tests.py -parallel=1 to mimic running them by one by one.
  10. JeremyRand commented at 9:48 pm on August 7, 2016: contributor
    @sipa I’m flattered that you tagged me (always an honor to be tagged by you or any other big name in Bitcoin dev), but this isn’t my pull request. :)
  11. JeremyRubin commented at 1:15 am on August 8, 2016: contributor

    Thanks @MarcoFalke.

    Running with -parallel=1 all tests pass, except for rawtransactions which failed with

    0rawtransactions.py:
    1Initializing test directory /tmp/testricvjc8u/29
    2Stopping nodes
    3Cleaning up
    4Tests successful
    5
    6stderr:
    7 Error: Unable to start HTTP server. See debug log for details.
    8
    9Pass: False, Duration: 21 s
    

    There’s nothing in the tmpdir.

    Rerunning rawtransactions alone (many times) it didn’t fail so I think it’s random failure as seen in #8425.

  12. jonasschnelli added the label Refactoring on Aug 8, 2016
  13. JeremyRubin closed this on Aug 8, 2016

  14. JeremyRubin reopened this on Aug 8, 2016

  15. laanwj renamed this:
    "Lockfree" Checkqueue Implementation
    [WIP] "Lockfree" Checkqueue Implementation
    on Aug 13, 2016
  16. JeremyRubin force-pushed on Aug 17, 2016
  17. JeremyRubin force-pushed on Aug 18, 2016
  18. JeremyRubin force-pushed on Aug 25, 2016
  19. JeremyRubin force-pushed on Aug 29, 2016
  20. Add the MAX_SCRIPTCHECKS const to consensus. Can be safely reduced to the minimum of various consensus parameters f9f86d71f3
  21. refactor templating and initialization 61540f7d47
  22. Re-Implement the CheckQueue to only use lockfree atomics 62e05da902
  23. add tests for lockfree checkqueue 9d10b67aec
  24. directly emplace jobs rather than create & swap 9c6b136473
  25. Remove BOOST_MESSAGE in test/checkqueue_tests.cpp because it's deprecated and breaks the build f93524ed48
  26. turn compiler static_assert to pragma message 849f5ca6f5
  27. Make the testing setup of CCheckQueue threads only occur one time, which can cause tests to hang otherwise. (Maybe the one time setup is something that should make it into the SetupCCheckQueue code itself) ec12a438bb
  28. Explicit shutdown of checkqueue as implicit thread collection seemed to cause segfaults 211d4bb2b6
  29. Decrease number of threads for testing depending on platform fec0fdceb5
  30. Move benchmarking code into benchmark suite f4c719910e
  31. Forgot to add the new benchmark 93c3742d4c
  32. Remove static instances of big things in tests. Remove some type hackery ff667f996a
  33. Make checkqueue tests use smaller queues 1c2e855b0c
  34. Simplify some code removing uneeded complexity and fix an initialization error that could cause hangs fc453c8753
  35. remove deprecated Add method. Update tests to use new Add method. Some trivial refactors. 112b42b516
  36. remove useless status wrapper 37ea0e3964
  37. disable tests printouts 41e87d821b
  38. Minor patch to quitting code, not certain it's race-free yet e2686e1998
  39. Critical Fix: prev_total was checked too early in the event of a fAllOk quit. Added a test which checks that unique jobs are called. Refactored a lot of code to make emplacement happen RAII style 20a739f26e
  40. Added code to clear memory after each round. Otherwise, a sequence of blocks with n, n-1, n-2, n-3... inputs with max data stored on n, n-1, n-2, n-3... could cause lots of memory to be used! (e.g., starting with n = 100,000 could use 0.2 TB of memory! (total block memory minus memory used for m inputs summed for m = 0 to 100,000) 47aa2b1583
  41. Add a test that runs jobs with HUGE allocations, and makes sure they are released (by virtue of not crashing) a7842e9cd3
  42. reformat/move only in checkqueue tests 39c1113964
  43. go back to using placement new for main.cpp emplacement rather than swap 43615c0502
  44. Free memory immediately after evaluating check 9cb002b3fe
  45. rework some tests 147bdcc84c
  46. Cleanup Failure test and add a new test which checks fAllOk resetting a63fd901fc
  47. Added a test that checks that cleanup happens before new job and refactored requires to give better printout 11dc3b34d8
  48. Fix cleanup waiting position. Change test_dump_log to use field RT_N_THREADS fd9da3305b
  49. Refactor some code to be a bit cleaner for inserter use 11ea533f27
  50. Add core dump callstack dumps to travis (see http://jsteemann.github.io/blog/2014/10/30/getting-core-dumps-of-failed-travisci-builds/)
    Conflicts:
    	.travis.yml
    ef8af35062
  51. remove uneeded include 0fbcf7aaab
  52. Fixed travis file for finding core dump more likely 82147ff377
  53. make travis search root for corefile 1381483148
  54. Use unique_ptr in miner_tests 677f91502d
  55. Made CBlockIndex and uint256 hashes allocated in RAII containers d2c886df1b
  56. add test-suite.log printing to travis 1ca2941679
  57. Fixed a bug where a worker could observe nAvail and fAllOk from previous round
    This fixes a bug where a deallocated check could be called.
    6a0582dd4f
  58. remove boost::thread from test/test_bitcoin cc85c7ed79
  59. Ensure that at least 2 cores are used for benchmarking 50c0ac835f
  60. add benchmarking to travis af2bd76294
  61. Got rid of manual placement new in favor of using vector emplace_back. 2acb1ec81c
  62. Restore RAII style for emplacement 05b14517f1
  63. Fixed OSX build 969c524482
  64. Make benchmarking cross platform friendlier by: removing boost requirement, switching to chrono over sys/time d0fb3b87f9
  65. Ugly hack to print out tests as they are run to mitigate travis timeouts. Someone with more knowledge here should fix this 1a57ba3a2c
  66. remove uneccesary travis stuff and disable benching till it works 3bedc5d798
  67. speed up prevector tests by parallelization 5210b405ce
  68. BOOST_CHECK_MESSAGE doesn't exist in older versions of boost, so replacing it with it's definition 72e42967b0
  69. JeremyRubin force-pushed on Aug 31, 2016
  70. in src/main.cpp: in 72e42967b0
    1999-                    pvChecks->push_back(CScriptCheck());
    2000-                    check.swap(pvChecks->back());
    2001-                } else if (!check()) {
    2002+                if (emplacer) {
    2003+                    emplacer(CScriptCheck(*coins, tx, i, flags, cacheStore));
    2004+                    continue;
    


    jtimon commented at 10:50 pm on September 1, 2016:
    I tend to prefer else over continue, but it is understandable is just to avoid the re-indent and minimize disruption. I would the continue be gone “for free” when/if CheckInputs needs a reindent (btw, unrelated, we’ve been owing one to Consensus::CheckTxInputs for a while…)

    JeremyRubin commented at 3:47 am on September 2, 2016:
    Yeah it was a little bit crowded there.
  71. jtimon commented at 10:54 pm on September 1, 2016: contributor

    Randomly noting this won’t interfere with #8498 at all (well, probably in obvious-to-resolve ways). utACK main.o is all I can give for now.

    I heard %10 to 20% benchmark improvement…happy to benchmark if someone provides dumbed down instructions or they exist already and I missed them. ping @TheBlueMatt

  72. JeremyRubin commented at 11:15 pm on September 1, 2016: contributor
    @jtimon in an hour or less i should have a refactored version which has benchmarks added as commit one so you can test benching on all versions.
  73. JeremyRubin commented at 0:57 am on September 2, 2016: contributor

    @laanwj I’d like to get #8632 and some variant of #8633 merged if possible before I PR the new version, otherwise when I push the cleaned up version I won’t pass tests due to timeouts.

    You can see the restructured version here if anyone wants to review right away: https://github.com/bitcoin/bitcoin/compare/master...JeremyRubin:lockfree-checkqueue-restructured

  74. JeremyRubin commented at 1:09 am on September 2, 2016: contributor

    Some benchmarks from the new version BTW:

    Before

    0#Benchmark,count,min,max,average
    1CCheckQueueSpeed,6144,0.000019055791199,0.000413492321968,0.000275931825551
    2CCheckQueueSpeedPrevectorJob,416,0.001668065786362,0.004345431923866,0.002541266954862
    

    After:

    0#Benchmark,count,min,max,average
    1CCheckQueueSpeed,6144,0.000050746137276,0.000491065904498,0.000192628746542
    2CCheckQueueSpeedPrevectorJob,768,0.000427763909101,0.003052964806557,0.001347809719543
    

    CCheckQueueSpeed is a trivial fake job (benchmark slightly unfair, as the emplacement vs resize call probably costs the time). New code takes 70% of the time.

    CCheckQueueSpeedPrevectorJob is a job that contains a prevector that (deterministically randomly) is either direct or indirect. My code takes 50% of the time.

    edit: just noting, there was a small performance bug (using wrong form of emplace_back) and with that the vector test is at 20% :)

  75. in src/consensus/consensus.h: in 72e42967b0
    18@@ -19,6 +19,7 @@ static const int64_t MAX_BLOCK_SIGOPS_COST = 80000;
    19 /** Coinbase transaction outputs can only be spent after this number of new blocks (network rule) */
    20 static const int COINBASE_MATURITY = 100;
    21 
    22+static const unsigned long long MAX_SCRIPTCHECKS = static_cast<unsigned long long>(MAX_BLOCK_SERIALIZED_SIZE)/41;
    


    luke-jr commented at 9:00 am on September 10, 2016:
    Why is 41 safe? Does this consider that invalid signatures/keys could be smaller?

    sipa commented at 9:02 am on September 10, 2016:
    41 bytes is the minimum size of a serialized CTxIn, even when the scriptSig is empty.
  76. jtimon commented at 2:15 am on October 20, 2016: contributor
    Needs rebase
  77. fanquake commented at 1:23 pm on January 12, 2017: member
    @JeremyRubin What’s the status of this?
  78. JeremyRubin commented at 2:14 pm on January 12, 2017: contributor
    @fanquake I think I have a rebased version of this laying around somewhere. Have been doing experimentation to see if there’s something simpler (read: more reviewable) that accomplishes much of the same. I’ll try to see if there’s a reasonable version around. If I recall correctly, most of the benefit of the faster CheckQueue was not observable without the cuckoocache, so I focused on that first (lock free queue doesn’t matter much if all jobs lock).
  79. JeremyRubin closed this on Mar 7, 2017

  80. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me