JeremyRubin commented at 5:16 pm on August 5, 2016: contributor

This commit introduces a new Checkqueue that should have much better performance than the previous one. Lockfree in scare quotes because the whole checkqueue doesn’t meet the strict academic definition of lockfree, but it does have lockfree operations.

The PR includes 1 new consensus rule, some refactoring of the checkqueue interface, reimplementation of the checkqueue, and tests, and one optional optimization (could be PR’d separately).

Please see https://gist.github.com/JeremyRubin/6eec0427edec6e68755ae274b27baae5 for more information.

edit: forgot WIP label, needs more testing and review. Opening PR for that purpose. edit edit: removed WIP label, terminology confusion, it is indeed ready for testing.

Todo

Stronger Testing Suite
Figure out better wakeup mechanism
Add Benchmarking to the old checkqueue compatible with the new one.
cross platform testing (ARM, 32 bit,…)
remove static asserts for lock-free ness

JeremyRubin renamed this:
~~"Lockfree" Checkqueue Implementation~~
[WIP] "Lockfree" Checkqueue Implementation
on Aug 5, 2016

in src/checkqueue.h: in 3517995753 outdated

25@@ -26,7 +26,7 @@ class CCheckQueueControl;
26   * the master is done adding work, it temporarily joins the worker pool
27   * as an N'th worker, until all jobs are done.
28   */
29-template <typename T>
30+template <typename T, size_t J size_t W>

sipa commented at 5:29 pm on August 5, 2016:

Can you add comments explaining what J and W are?

JeremyRubin commented at 5:34 pm on August 5, 2016:

You’re viewing an outdated diff, I just did this for a clean interface change then implementation change set of commits. They are unused here.

JeremyRubin renamed this:
~~[WIP] "Lockfree" Checkqueue Implementation~~
"Lockfree" Checkqueue Implementation
on Aug 5, 2016

in src/checkqueue.h: in d8ba250b5e outdated

13+#include <thread>
14+#include <chrono>
15+#include <sstream>
16+#include <iostream>
17+// This should be ignored eventually, but needs testing to ensure this works on more platforms
18+static_assert(ATOMIC_BOOL_LOCK_FREE, "shared_status not lock free");

sipa commented at 5:44 pm on August 5, 2016:

These error messages can indicate the failed type (bool/long/llong)?

JeremyRubin commented at 2:02 am on August 6, 2016:

I added a commit which changes them from static_asserts to warnings. I’m unsure if the warning is properly x-platform; seems to not be a consistent thing. In any case, not being lockfree only effects performance, not correctness.

JeremyRubin commented at 6:17 pm on August 7, 2016: contributor

Pull tester failure seems to be semi-related to #8425.

sipa commented at 7:44 pm on August 7, 2016: member

@JeremyRubin I don’t think so. bitcoind segfaults in Travis with this PR (and all rpc tests fail). In #8425 there is just a failure in one test.

JeremyRubin commented at 8:55 pm on August 7, 2016: contributor

@sipa So when I run tests one by one they seem to not fail; when I run them using rpc_tests.py they do fail. Let me know if you have any ideas why this might be the case.

MarcoFalke commented at 9:10 pm on August 7, 2016: member

@JeremyRubin You can try qa/pull-tester/rpc-tests.py -parallel=1 to mimic running them by one by one.

JeremyRand commented at 9:48 pm on August 7, 2016: contributor

@sipa I’m flattered that you tagged me (always an honor to be tagged by you or any other big name in Bitcoin dev), but this isn’t my pull request. :)

JeremyRubin commented at 1:15 am on August 8, 2016: contributor

Thanks @MarcoFalke.

Running with -parallel=1 all tests pass, except for rawtransactions which failed with

0rawtransactions.py:
1Initializing test directory /tmp/testricvjc8u/29
2Stopping nodes
3Cleaning up
4Tests successful
5
6stderr:
7 Error: Unable to start HTTP server. See debug log for details.
8
9Pass: False, Duration: 21 s

There’s nothing in the tmpdir.

Rerunning rawtransactions alone (many times) it didn’t fail so I think it’s random failure as seen in #8425.

jonasschnelli added the label Refactoring on Aug 8, 2016

JeremyRubin closed this on Aug 8, 2016

JeremyRubin reopened this on Aug 8, 2016

laanwj renamed this:
~~"Lockfree" Checkqueue Implementation~~
[WIP] "Lockfree" Checkqueue Implementation
on Aug 13, 2016

JeremyRubin force-pushed on Aug 17, 2016

JeremyRubin force-pushed on Aug 18, 2016

JeremyRubin force-pushed on Aug 25, 2016

JeremyRubin force-pushed on Aug 29, 2016

Add the MAX_SCRIPTCHECKS const to consensus. Can be safely reduced to the minimum of various consensus parameters f9f86d71f3

refactor templating and initialization 61540f7d47

Re-Implement the CheckQueue to only use lockfree atomics 62e05da902

add tests for lockfree checkqueue 9d10b67aec

directly emplace jobs rather than create & swap 9c6b136473

Remove BOOST_MESSAGE in test/checkqueue_tests.cpp because it's deprecated and breaks the build f93524ed48

turn compiler static_assert to pragma message 849f5ca6f5

Make the testing setup of CCheckQueue threads only occur one time, which can cause tests to hang otherwise. (Maybe the one time setup is something that should make it into the SetupCCheckQueue code itself) ec12a438bb

Explicit shutdown of checkqueue as implicit thread collection seemed to cause segfaults 211d4bb2b6

Decrease number of threads for testing depending on platform fec0fdceb5

Move benchmarking code into benchmark suite f4c719910e

Forgot to add the new benchmark 93c3742d4c

Remove static instances of big things in tests. Remove some type hackery ff667f996a

Make checkqueue tests use smaller queues 1c2e855b0c

Simplify some code removing uneeded complexity and fix an initialization error that could cause hangs fc453c8753

remove deprecated Add method. Update tests to use new Add method. Some trivial refactors. 112b42b516

remove useless status wrapper 37ea0e3964

disable tests printouts 41e87d821b

Minor patch to quitting code, not certain it's race-free yet e2686e1998

Critical Fix: prev_total was checked too early in the event of a fAllOk quit. Added a test which checks that unique jobs are called. Refactored a lot of code to make emplacement happen RAII style 20a739f26e

Added code to clear memory after each round. Otherwise, a sequence of blocks with n, n-1, n-2, n-3... inputs with max data stored on n, n-1, n-2, n-3... could cause lots of memory to be used! (e.g., starting with n = 100,000 could use 0.2 TB of memory! (total block memory minus memory used for m inputs summed for m = 0 to 100,000) 47aa2b1583

Add a test that runs jobs with HUGE allocations, and makes sure they are released (by virtue of not crashing) a7842e9cd3

reformat/move only in checkqueue tests 39c1113964

go back to using placement new for main.cpp emplacement rather than swap 43615c0502

Free memory immediately after evaluating check 9cb002b3fe

rework some tests 147bdcc84c

Cleanup Failure test and add a new test which checks fAllOk resetting a63fd901fc

Added a test that checks that cleanup happens before new job and refactored requires to give better printout 11dc3b34d8

Fix cleanup waiting position. Change test_dump_log to use field RT_N_THREADS fd9da3305b

Refactor some code to be a bit cleaner for inserter use 11ea533f27

Add core dump callstack dumps to travis (see http://jsteemann.github.io/blog/2014/10/30/getting-core-dumps-of-failed-travisci-builds/)

Conflicts:
	.travis.yml

ef8af35062

remove uneeded include 0fbcf7aaab

Fixed travis file for finding core dump more likely 82147ff377

make travis search root for corefile 1381483148

Use unique_ptr in miner_tests 677f91502d

Made CBlockIndex and uint256 hashes allocated in RAII containers d2c886df1b

add test-suite.log printing to travis 1ca2941679

Fixed a bug where a worker could observe nAvail and fAllOk from previous round

This fixes a bug where a deallocated check could be called.

6a0582dd4f

remove boost::thread from test/test_bitcoin cc85c7ed79

Ensure that at least 2 cores are used for benchmarking 50c0ac835f

add benchmarking to travis af2bd76294

Got rid of manual placement new in favor of using vector emplace_back. 2acb1ec81c

Restore RAII style for emplacement 05b14517f1

Fixed OSX build 969c524482

Make benchmarking cross platform friendlier by: removing boost requirement, switching to chrono over sys/time d0fb3b87f9

Ugly hack to print out tests as they are run to mitigate travis timeouts. Someone with more knowledge here should fix this 1a57ba3a2c

remove uneccesary travis stuff and disable benching till it works 3bedc5d798

speed up prevector tests by parallelization 5210b405ce

BOOST_CHECK_MESSAGE doesn't exist in older versions of boost, so replacing it with it's definition 72e42967b0

JeremyRubin force-pushed on Aug 31, 2016

in src/main.cpp: in 72e42967b0

1999-                    pvChecks->push_back(CScriptCheck());
2000-                    check.swap(pvChecks->back());
2001-                } else if (!check()) {
2002+                if (emplacer) {
2003+                    emplacer(CScriptCheck(*coins, tx, i, flags, cacheStore));
2004+                    continue;

jtimon commented at 10:50 pm on September 1, 2016:

I tend to prefer else over continue, but it is understandable is just to avoid the re-indent and minimize disruption. I would the continue be gone “for free” when/if CheckInputs needs a reindent (btw, unrelated, we’ve been owing one to Consensus::CheckTxInputs for a while…)

JeremyRubin commented at 3:47 am on September 2, 2016:

Yeah it was a little bit crowded there.

jtimon commented at 10:54 pm on September 1, 2016: contributor

Randomly noting this won’t interfere with #8498 at all (well, probably in obvious-to-resolve ways). utACK main.o is all I can give for now.

I heard %10 to 20% benchmark improvement…happy to benchmark if someone provides dumbed down instructions or they exist already and I missed them. ping @TheBlueMatt

JeremyRubin commented at 11:15 pm on September 1, 2016: contributor

@jtimon in an hour or less i should have a refactored version which has benchmarks added as commit one so you can test benching on all versions.

JeremyRubin commented at 0:57 am on September 2, 2016: contributor

@laanwj I’d like to get #8632 and some variant of #8633 merged if possible before I PR the new version, otherwise when I push the cleaned up version I won’t pass tests due to timeouts.

You can see the restructured version here if anyone wants to review right away: https://github.com/bitcoin/bitcoin/compare/master...JeremyRubin:lockfree-checkqueue-restructured

JeremyRubin commented at 1:09 am on September 2, 2016: contributor

Some benchmarks from the new version BTW:

Before

0#Benchmark,count,min,max,average
1CCheckQueueSpeed,6144,0.000019055791199,0.000413492321968,0.000275931825551
2CCheckQueueSpeedPrevectorJob,416,0.001668065786362,0.004345431923866,0.002541266954862

After:

0#Benchmark,count,min,max,average
1CCheckQueueSpeed,6144,0.000050746137276,0.000491065904498,0.000192628746542
2CCheckQueueSpeedPrevectorJob,768,0.000427763909101,0.003052964806557,0.001347809719543

CCheckQueueSpeed is a trivial fake job (benchmark slightly unfair, as the emplacement vs resize call probably costs the time). New code takes 70% of the time.

CCheckQueueSpeedPrevectorJob is a job that contains a prevector that (deterministically randomly) is either direct or indirect. My code takes 50% of the time.

edit: just noting, there was a small performance bug (using wrong form of emplace_back) and with that the vector test is at 20% :)

in src/consensus/consensus.h: in 72e42967b0

18@@ -19,6 +19,7 @@ static const int64_t MAX_BLOCK_SIGOPS_COST = 80000;
19 /** Coinbase transaction outputs can only be spent after this number of new blocks (network rule) */
20 static const int COINBASE_MATURITY = 100;
21 
22+static const unsigned long long MAX_SCRIPTCHECKS = static_cast<unsigned long long>(MAX_BLOCK_SERIALIZED_SIZE)/41;

luke-jr commented at 9:00 am on September 10, 2016:

Why is 41 safe? Does this consider that invalid signatures/keys could be smaller?

sipa commented at 9:02 am on September 10, 2016:

41 bytes is the minimum size of a serialized CTxIn, even when the scriptSig is empty.

jtimon commented at 2:15 am on October 20, 2016: contributor

Needs rebase

fanquake commented at 1:23 pm on January 12, 2017: member

@JeremyRubin What’s the status of this?

JeremyRubin commented at 2:14 pm on January 12, 2017: contributor

@fanquake I think I have a rebased version of this laying around somewhere. Have been doing experimentation to see if there’s something simpler (read: more reviewable) that accomplishes much of the same. I’ll try to see if there’s a reasonable version around. If I recall correctly, most of the benefit of the faster CheckQueue was not observable without the cuckoocache, so I focused on that first (lock free queue doesn’t matter much if all jobs lock).

JeremyRubin closed this on Mar 7, 2017

MarcoFalke locked this on Sep 8, 2021

[WIP] “Lockfree” Checkqueue Implementation #8464

Todo