furszy
commented at 1:36 pm on January 25, 2023:
member
The current procedure for building the block filter index involves processing filters one at a time;
Reading blocks, undo data, and previous headers from disk sequentially.
This PR introduces a new mechanism to perform the work concurrently. Dividing the filters
generation workload among a pool of workers that can be configured by the user,
significantly increasing the speed of the index construction process.
The same concurrent processing model has been applied to the transactions index as well.
The newly introduced init flag -indexworkers=<n> enables the concurrent sync
behavior.
Where “n” is the number of worker threads that will be spawned at startup to create ranges
of block filters during the initial sync process. Destroying the workers pool once the
initial sync completes.
Note: by default, the parallelized sync process is not enabled.
Now the juicy part:
In my computer, with the node in debug mode and on IBD, with -indexworkers=4, the
block filter index generation took less than an hour. While, in master, the sync took more than 7 hours.
Important Note:
As the access to the block data on disk is protected by cs_main, this new feature runs substantially
faster when the node is not in IBD.
Footnote:
This is part of a larger project :).
DrahtBot
commented at 1:36 pm on January 25, 2023:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#30469 (index: Fix coinstats overflow and introduce index versioning by fjahr)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
DrahtBot added the label
UTXO Db and Indexes
on Jan 25, 2023
furszy force-pushed
on Jan 25, 2023
furszy force-pushed
on Jan 25, 2023
furszy force-pushed
on Jan 25, 2023
Sjors
commented at 2:38 pm on January 26, 2023:
member
Cool, will take it for a spin…
furszy force-pushed
on Jan 28, 2023
furszy
commented at 3:18 pm on January 28, 2023:
member
Cool @Sjors, just pushed a small update. Found a little bug.
Going to add an important note to the PR description (because otherwise testing results will vary a lot):
As the access to the block data on disk is protected by cs_main, this new feature runs substantially
faster when the node is not in IBD.
(where “substantially” here means full index sync, with 5 workers, in less than 20 minutes in my computer).
furszy force-pushed
on Jan 28, 2023
mzumsande
commented at 10:49 pm on January 29, 2023:
contributor
It’s probably much easier suggested than done, but did you attempt to implement parallelization in a more general way so that other indices could benefit from it as well? On first glance, txindex, and the indices suggested in PRs (#24539, #26951) seem to be parallelizable as well (not coinstatsindex though).
furszy
commented at 1:37 pm on January 30, 2023:
member
It’s probably much easier suggested than done, but did you attempt to implement parallelization in a more general way so that other indices could benefit from it as well?
Yeah, that is part of the plan. I started with the block filter index because it requires an special treatment that txindex parallelization does not require (block/undo data reading and block filters creation can be parallelized but writing must be done sequentially due the need to link filter headers to their predecessors to create the filters-chain on disk).
My idea was to start reviewing this one, so the process gets as clean as possible, and then move forward with the generalization step. It’s usually more natural to abstract processes when the specific cases are well-defined.
furszy force-pushed
on Jan 30, 2023
w0xlt
commented at 4:42 pm on January 30, 2023:
contributor
Concept ACK
Perhaps it could have separate parallelization and index logic. So it could be reused in other indexes.
furszy force-pushed
on Jan 30, 2023
furszy force-pushed
on Jan 30, 2023
Sjors
commented at 1:47 pm on February 7, 2023:
member
Building the index on (AMD Ryzen 7950X, blocks stored on SSD):
master @ fe86616bb4ad0c4296d34299bc2e2f0fca1fe936: 35'15" (mostly 1 alternating CPU thread)
this PR (rebased)
n=8: 5'20" (uses about 8 of 32 CPU threads as expected)
n=32: 4'26" (pleasantly close to 100% CPU usage with a dip every 10 seconds, but it drops to only 1 CPU in the last minute or two)
I made sure to not load any wallets and disabled other indexes.
I didn’t test if the index was correct.
I wonder if, for users without this index, it would be faster to generate the index, rescan the wallet and then delete it again. Combined with #26951 you would only have to generate filters up to the age of the wallet (IIUC, cc @pstratem).
Note to self for future benchmarks: -txindex takes 1:07'14", -coinstatsindex takes 3.5 hours.
furszy
commented at 2:18 pm on February 7, 2023:
member
Could also give it a run rebased on top #27006.
On master, the index initial sync is slower when the node is in IBD because the index thread has to compete for access to block data on disk through cs_main acquisition.
I didn’t test if the index was correct.
The PR contain a test verifying it.
Side note:
I’m working on generalizing the parallelization flow so other indexes, like the txindex and #26951 can make use of it too.
DrahtBot added the label
Needs rebase
on Feb 17, 2023
furszy force-pushed
on Feb 21, 2023
furszy force-pushed
on Feb 21, 2023
DrahtBot removed the label
Needs rebase
on Feb 21, 2023
furszy force-pushed
on Feb 22, 2023
furszy force-pushed
on Feb 27, 2023
furszy renamed this:
index: blockfilter initial sync speedup, parallelize process
index: blockfilter and txindex initial sync speedup, parallelize process
on Feb 27, 2023
furszy renamed this:
index: blockfilter and txindex initial sync speedup, parallelize process
index: blockfilter initial sync speedup, parallelize process
on Feb 27, 2023
furszy force-pushed
on Feb 27, 2023
furszy
commented at 12:40 pm on February 28, 2023:
member
PR updated, most of it implementation has changed.
The news are:
Decreased ThreadSync cs_main lock contention.
Removed CBlockIndex access from the child indexes internals.
Implemented generic workers pool.
Introduced a last header cache for the Block Filter index. Avoiding disk reads on every new processed block.
Enabled parallel sync on the tx index.
Important Note:
The introduced workers pool spawned by the -indexworkers init arg is shared among all the enabled indexes that support parallel sync.
The implementation uses std::any mainly to simplify the patch-set, the base class template form of it requires a larger set of changes.
Side note: in a first glance and without going too far over the coinstats index implementation, I would say that it could also be parallelized. But will leave it for a follow-up to not continue expanding the PR size.
Future (doesn’t need to be included here, just mentioning so the path is clear):
I’m working on decoupling the initial sync logic into a separate structure, so indexes subscribe to events instead of reading blocks from disk by themselves.
DrahtBot added the label
Needs rebase
on May 11, 2023
DrahtBot removed the label
Needs rebase
on May 11, 2023
furszy force-pushed
on May 11, 2023
DrahtBot added the label
Needs rebase
on Jun 12, 2023
furszy force-pushed
on Jun 21, 2023
DrahtBot removed the label
Needs rebase
on Jun 21, 2023
DrahtBot added the label
Needs rebase
on Jun 30, 2023
furszy force-pushed
on Jun 30, 2023
DrahtBot removed the label
Needs rebase
on Jun 30, 2023
DrahtBot added the label
CI failed
on Jun 30, 2023
DrahtBot removed the label
CI failed
on Jul 4, 2023
DrahtBot added the label
Needs rebase
on Jul 6, 2023
furszy force-pushed
on Aug 9, 2023
DrahtBot removed the label
Needs rebase
on Aug 9, 2023
DrahtBot added the label
CI failed
on Aug 29, 2023
furszy force-pushed
on Aug 29, 2023
DrahtBot
commented at 8:34 pm on September 4, 2023:
contributor
Could mark as draft while CI is red?
furszy force-pushed
on Sep 4, 2023
DrahtBot
commented at 5:47 am on September 6, 2023:
contributor
It looks like tsan failed, but there is no log, when there should be a log. Maybe it was accidentally removed by #27667 ?
DrahtBot added the label
Needs rebase
on Sep 12, 2023
furszy force-pushed
on Sep 12, 2023
DrahtBot removed the label
CI failed
on Sep 12, 2023
DrahtBot removed the label
Needs rebase
on Sep 12, 2023
DrahtBot added the label
Needs rebase
on Oct 2, 2023
furszy force-pushed
on Oct 18, 2023
DrahtBot removed the label
Needs rebase
on Oct 18, 2023
DrahtBot added the label
CI failed
on Oct 18, 2023
maflcko
commented at 9:16 am on October 25, 2023:
member
CI is still red. Also, how would this work with AU?
furszy force-pushed
on Oct 27, 2023
furszy force-pushed
on Oct 27, 2023
DrahtBot removed the label
CI failed
on Oct 27, 2023
Sjors
commented at 11:09 am on November 7, 2023:
member
That PR currently does not cleanly cherry-pick on top of this PR, not can I (trivially) rebase this PR on top top of it. Happy to try if you can make a branch.
I just tried it again, deleting the blockfilterindex and rebuilding it. My impression is that it’s going slower than before and I’m not seeing much CPU activity.
I also noticed the shutdown takes a very long time, with the indexer (?) threads sticking around for many minutes:
in
src/index/blockfilterindex.h:46
in
003f076a15outdated
41@@ -42,6 +42,9 @@ class BlockFilterIndex final : public BaseIndex
42 /** cache of block hash to filter header, to avoid disk access when responding to getcfcheckpt. */
43 std::unordered_map<uint256, uint256, FilterHeaderHasher> m_headers_cache GUARDED_BY(m_cs_headers_cache);
4445+ // Last computed header to avoid disk reads at every new block.
46+ uint256 last_header{};
TheCharlatan
commented at 12:58 pm on November 24, 2023:
NIt: Might take this opportunity to add the <uint256.h> header?
TheCharlatan
commented at 3:32 pm on November 24, 2023:
If I run IWYU locally, it reports the following headers as missing:
0#include<cstddef>// for size_t
1#include<algorithm>// for max
2#include<atomic>// for atomic
3#include<functional>// for function
4#include<memory>// for make_shared
5#include<stdexcept>// for runtime_error
6#include<utility>// for move, swap
7#include<vector>// for vector
Because I initially thought on pushing extra work into the thread pool queue so then the originator thread, once it finishes calculating the different tasks, can take the workload on its active wait. But I ended up implementing it differently to re-use the same piece of code for the single-thread approach (Can check it looking for where it says “// Otherwise, this is an active-wait, so we process blocks until all workers finish.”)
Still, I think that having a method to process tasks manually make sense for a general thread pool implementation. But could remove it if you are strong on it.
andrewtoth
commented at 1:00 pm on October 26, 2024:
This would make sense if we wanted to reuse this for CCheckQueue. Then, we could just loop and ProcessTask on the main thread once when we call Wait.
in
src/util/threadpool.h:56
in
46e74257f2outdated
45+ m_work_queue.pop();
46+ }
47+
48+ // Execute the task without the lock
49+ WITH_REVERSE_LOCK(wait_lock, task());
50+ }
TheCharlatan
commented at 12:37 pm on November 25, 2023:
I don’t think it would be necessary for the tasks themselves, since you already demonstrate in the tests how to retrieve exceptions, but I think some kind of exception handling and infomation logging similar to the one of util::TraceThread would still be nice. Did you choose not to use TraceThread on purpose?
I don’t think it would be necessary for the tasks themselves, since you already demonstrate in the tests how to retrieve exceptions, but I think some kind of exception handling and infomation logging similar to the one of util::TraceThread would still be nice. Did you choose not to use TraceThread on purpose?
I probably wasn’t aware of TraceThread when this was implemented. Changing it.. Thanks!
in
src/test/threadpool_tests.cpp:25
in
46e74257f2outdated
19+ // 6) Busy workers, help them by processing tasks from outside.
20+
21+ // Test case 1, submit tasks and verify completion.
22+ {
23+ int num_workers = 3;
24+ int num_tasks = 50;
TheCharlatan
commented at 12:44 pm on November 25, 2023:
Could use this test to stress test the pool a bit with more tasks?
Could use this test to stress test the pool a bit with more tasks?
What about creating a fuzzing test instead? I’m not sure about the benefits of adding more tasks here. I can only foresee other developers complaining about the increased unit test times.
in
src/test/threadpool_tests.cpp:29
in
46e74257f2outdated
TheCharlatan
commented at 12:54 pm on November 25, 2023:
contributor
This all looks pretty promising. I left some feedback before continuing with the latter half of the PR.
in
src/index/base.h:29
in
7b2d4d471aoutdated
23 namespace Consensus {
24 struct Params;
25 }
2627+/** Number of concurrent jobs during the initial sync process */
28+const int16_t INDEX_WORKERS_COUNT = 0;
TheCharlatan
commented at 4:22 pm on November 25, 2023:
1980+ }
1981+
1982 // Start threads
1983- for (auto index : node.indexes) if (!index->StartBackgroundSync()) return false;
1984+ for (auto index : node.indexes) {
1985+ // todo: Only provide thread pool to indexes that supports parallel sync
TheCharlatan
commented at 4:46 pm on November 25, 2023:
In commit 7b2d4d471ae251d4c22184dda26b73993a65eff8: Could the AllowParallelSync method be moved to this commit?
Nitty nit: Keep the order of workers_count and work_chunk consistent?
What do you mean? Set workers_count first, then work_chunk always in the same order?
in
src/index/base.cpp:247
in
727be9d500outdated
244+ it_start = WITH_LOCK(::cs_main, return NextSyncBlock(it_end, m_chainstate->m_chain));
245+ }
246+ }
247248+ // If we have only one block to process, run it directly.
249+ // Otherwise, this is an active-wait, so we process blocks until all workers finish.
TheCharlatan
commented at 10:38 pm on November 25, 2023:
Would so we also process blocks in this thread until all workers finish. be more accurate?
TheCharlatan
commented at 10:52 pm on November 25, 2023:
In commit de69506c090f530f34d78478db39898bd4e2cbba: I think the test introduction should be squashed into the following commit, or do you want to demonstrate something here first?
In commit de69506: I think the test introduction should be squashed into the following commit, or do you want to demonstrate something here first?
hmm, will squash them. I don’t remember why I split them (2 years ago).
TheCharlatan
commented at 11:01 pm on November 25, 2023:
contributor
Concept ACK.
Done with my first pass, still want to think some of the approaches here over a bit.
in
src/init.cpp:2085
in
373b0bb077outdated
1974+ if (node.args->IsArgSet("-indexworkers")) {
1975+ int index_workers = node.args->GetIntArg("-indexworkers", INDEX_WORKERS_COUNT);
1976+ if (index_workers < 0 || index_workers > 100) return InitError(_("Invalid -indexworkers arg"));
1977+
1978+ thread_pool = std::make_shared<ThreadPool>();
1979+ thread_pool->Start(index_workers);
TheCharlatan
commented at 4:08 pm on November 26, 2023:
Since the constructor and Start are always called in succession in this patch set, you could make ThreadPool more RAII styled if the Start method were removed and instead placed in the constructor. Could also make sense to do the same with the destructor and Stop.
DrahtBot added the label
Needs rebase
on Dec 14, 2023
furszy
commented at 3:22 pm on February 5, 2024:
member
Focus is on #28955, which contains a good number of commits decoupled from this PR.
Will come back here after it.
achow101 referenced this in commit
0b96a1925e
on Mar 20, 2024
furszy force-pushed
on Mar 20, 2024
DrahtBot removed the label
Needs rebase
on Mar 20, 2024
DrahtBot added the label
CI failed
on Mar 21, 2024
DrahtBot
commented at 1:36 am on March 21, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
Sjors
commented at 5:44 pm on March 21, 2024:
member
Running it again, let’s see how quick it is…
Since I might set indexworkers=32 in my config file, which is great for the initial sync and if it needs to do a big catchup. But does it cause much overhead when it’s up to date? Maybe the threads should spin down if there’s not much work.
What happens when there are multiple readBlockFromDisk calls around the same time? I don’t see any (obvious) thread locking happening in CAutoFile. Since I keep block files on a spinning disk (-blocksdir), I wonder if that potentially slows things down - compared to fetching one file in a single uninterrupted operation.
So far (block 200K) my spinning disk is making a ton of noise and CPU activity is negligible.
It took 4 hours and 23 minutes. That’s an improvement over the 5 hours 46 minutes without: #28955 (comment)
Note that last year I tested with SSD - which resulted in a 16x improvement #26966 (comment). This time I used a spinning disk. CPU activity was negligible all the way.
furszy
commented at 5:54 pm on March 21, 2024:
member
Running it again, let’s see how quick it is…
Since I might set indexworkers=32 in my config file, which is great for the initial sync and if it needs to do a big catchup. But does it cause much overhead when it’s up to date? Maybe the threads should spin down if there’s not much work.
Yeah sure. The thread pool can be destructed once all indexes initial sync finish (once the index is synced, it starts receiving blocks through the validation signals and does not use the initial sync workers anymore).
I’m currently tackling theCharlatan’s feedback, and thinking about some improvements. Will add this change too on the next push.
DrahtBot removed the label
CI failed
on Mar 21, 2024
furszy force-pushed
on Mar 23, 2024
furszy
commented at 1:34 pm on March 23, 2024:
member
Thanks for the in-depth review theCharlatan! Most comments were tackled. And thanks for testing Sjors!
Now that we’ve reached this point (after #28955 merge), I’m rethinking and polishing the design. I’m not totally convinced about the current implementation anymore. We’ve grown a lot since this was implemented two years ago.
Other than that, Sjors had a nice idea that I want to try out (or at least design this in a way so that the idea can be implemented in isolation in the future) –> Instead of dividing the work based on block ranges, it can be divided based on block file ranges. This would minimize the needle movement on spinning disks.
DrahtBot added the label
CI failed
on Mar 23, 2024
DrahtBot
commented at 2:10 pm on March 23, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
Sjors
commented at 9:06 am on March 25, 2024:
member
Instead of dividing the work based on block ranges, it can be divided based on block file ranges. This would minimize the needle movement on spinning disks.
It’s possible that this can be achieved with block ranges too. But you have to make sure only one block range is read at any given time. I.e. the other threads should wait while a disk read is in progress. I suspect the problem lies in having 32 threads trying to read different things at the same time, and then operating system goes and fetches a few kilobytes, a few kilobytes there, etc. Though I haven’t measured this.
DrahtBot removed the label
CI failed
on Apr 4, 2024
DrahtBot
commented at 7:54 am on April 5, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot removed the label
CI failed
on Apr 5, 2024
DrahtBot
commented at 8:37 am on April 5, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot removed the label
CI failed
on Apr 5, 2024
DrahtBot added the label
CI failed
on Apr 5, 2024
furszy force-pushed
on Apr 7, 2024
DrahtBot
commented at 5:19 pm on April 17, 2024:
contributor
Are you still working on this? If not, this could be moved to draft for as long as the tsan CI is failing.
furszy marked this as a draft
on Apr 18, 2024
DrahtBot added the label
Needs rebase
on Apr 30, 2024
furszy force-pushed
on May 1, 2024
DrahtBot removed the label
Needs rebase
on May 1, 2024
DrahtBot removed the label
CI failed
on May 1, 2024
DrahtBot added the label
CI failed
on Jun 14, 2024
DrahtBot
commented at 2:16 pm on June 14, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot added the label
Needs rebase
on Aug 5, 2024
furszy renamed this:
index: blockfilter initial sync speedup, parallelize process
index: initial sync speedup, parallelize process
on Aug 7, 2024
furszy force-pushed
on Aug 14, 2024
furszy force-pushed
on Aug 14, 2024
DrahtBot removed the label
Needs rebase
on Aug 14, 2024
hebasto added the label
Needs CMake port
on Aug 16, 2024
maflcko removed the label
Needs CMake port
on Aug 29, 2024
DrahtBot added the label
Needs rebase
on Sep 2, 2024
index: remove CBlockIndex access from node internals
Moved CBlockUndo disk read lookups from child
indexes to base index class.
544ee5af8f
util: introduce general purpose thread pool401f21bfd7
init: provide thread pool to block filter index
And add option to customize thread pool workers count
32dffe6d06
index: implement index parallel sync5d0720e6ab
index: enable block filter index parallel sync
It also adds coverage for initial sync from a particular block.
Mimicking a node restart.
86f76df878
txindex: enable parallel sync81dcee9464
furszy force-pushed
on Oct 1, 2024
DrahtBot removed the label
Needs rebase
on Oct 1, 2024
in
src/util/threadpool.h:45
in
81dcee9464
40+ {
41+ // Wait for the task or until the stop flag is set
42+ m_condition.wait(wait_lock,[&]() EXCLUSIVE_LOCKS_REQUIRED(cs_work_queue) { return m_interrupt || !m_work_queue.empty(); });
43+
44+ // If stopped, exit worker.
45+ if (m_interrupt && m_work_queue.empty()) {
andrewtoth
commented at 2:54 pm on October 23, 2024:
If we receive an interrupt, we don’t also want to wait for the work queue to be empty right?
andrewtoth
commented at 4:52 pm on October 27, 2024:
I tried this and it causes memory errors, since the remaining futures will have a dangling ref to m_condition.
andrewtoth
commented at 3:02 pm on October 23, 2024:
Can we modify this to be more generic and return a type? And add logic to collect all returned values into a shared vector which can then be atomically swapped out by an observer? Possibly not in this PR, but if this will be split out into a generic thread pool.
andrewtoth
commented at 3:37 pm on November 18, 2024:
tidy doesn’t like this
0 ThreadPool() = default;
andrewtoth
commented at 6:11 pm on November 18, 2024:
contributor
I’m using the ThreadPool here in #31132 as a cherry-picked commit, modulo changing ThreadPool() {} to ThreadPool() = default;. Perhaps we could pull this out to a separate PR since it would be useful for both changes.
One request for the ThreadPool would be to track in flight tasks being executed. That way we could write tests that ensure that all tasks have been completed before continuing, even if we don’t have access to the futures.
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me