reduce cs_main scope, guard block index 'nFile' under a local mutex

furszy commented at 4:10 pm on January 31, 2023: member

By not having to lock cs_main every time we try to access block position on disk, we can avoid slowing down concurrent tasks due to threads waiting for the global lock to be released.

Also, solves a edge case race inside ReadBlockFromDisk where we obtain the file position locking cs_main and then try to access the file without it (the file could no longer exist if pruning occurred just in-between these two actions).

Context from where this comes from #26966.

DrahtBot commented at 4:10 pm on January 31, 2023: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	stickies-v
Approach NACK	dergoegge

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#31551 (optimization: bulk reads(27%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD by l0rinc)
#31539 (optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD by l0rinc)
#31490 (refactor: cache block[undo] serialized size for consecutive calls by l0rinc)
#31144 (optimization: batch XOR operations 12% faster IBD by l0rinc)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot renamed this:
~~reduce cs_main scope, guard block index 'nFile' under a local mutex~~
reduce cs_main scope, guard block index 'nFile' under a local mutex
on Jan 31, 2023

in src/chain.h:165 in 6f41153d3c outdated

161@@ -162,14 +162,16 @@ class CBlockIndex
162     //! height of the entry in the chain. The genesis block has height 0
163     int nHeight{0};
164 
165+    const std::shared_ptr<Mutex> cs_data = std::make_shared<Mutex>();

maflcko commented at 4:32 pm on January 31, 2023:

Huh? Does this create a mutex for every block? What is the memory footprint?

furszy commented at 7:22 pm on January 31, 2023:

yeah, good topic. I was so focused on accessing disk concurrently that forgot that we have 800k block indexes loaded in memory all the time. The mutex size is 64 bytes, so.. a mutex per block index is a no go.

Could move it to a single global mutex for block indexes file data access, which would still decouple it from cs_main (which is a good step forward) but.. it’s still not the best solution for concurrent access.

hebasto commented at 9:26 pm on January 31, 2023:

Could move it to a single global mutex for block indexes file data access, which would still decouple it from cs_main (which is a good step forward) but.. it’s still not the best solution for concurrent access.

Readers/Writer lock?

furszy commented at 2:39 pm on February 2, 2023:

Working on it 👌🏼

Sjors commented at 2:27 pm on February 7, 2023:

^ is this done, or still in progress?

furszy commented at 2:35 pm on February 7, 2023:

^ is this done, or still in progress?

Yeah, have pushed the readers/writer lock in the last update.

Still, have to say that code elegancy wise, I’m not totally happy with the sync.h shared lock wrapper. Planning to give it another look later this week.

jamesob commented at 2:53 pm on February 7, 2023:

Why not do a mutex per blockfile (rather than CBlockIndex entry, which would indeed be crazy)? There are hundreds to thousands of those, so keeping a map of blockfile number to mutex shouldn’t be unreasonable.

furszy commented at 3:26 pm on February 7, 2023:

While that would be better than the initial per CBlockIndex entry mutex, it’s not as good as a single RW lock (which is the current PR state).

Essentially, we aim to allow concurrent access for read-only operations, while write operations that are not performed regularly require exclusive access (which is the RW lock purpose).

To place it more in our code. The only time where the CBlockIndex file position can be modified during the entire software lifecycle is pruning, and it’s done only once. And because of that single infrequent scenario, we have been forced to lock cs_main every time that a process accesses the block index file position.

So, by guarding this single specific case with an exclusive lock, the rest of the read-only operations can be processed concurrently by sharing ownership of the readers-only lock.

furszy commented at 4:02 pm on February 7, 2023:

A little add to my last comment:

I think that if we ever add other use cases that modify the block index file position more frequently (like re-ordering/compacting block data on disk or re-downloading blocks), then a more fine-grained RW lock per block file would be a nice idea.

– the comment came to my mind because one of my long-term projects introduces the pruned blocks re-download feature.. –

furszy force-pushed on Feb 2, 2023

in src/node/blockstorage.cpp:130 in 47bc99cddc outdated

126@@ -127,7 +127,7 @@ void BlockManager::PruneOneBlockFile(const int fileNumber)
127         if (pindex->nFile == fileNumber) {
128             pindex->nStatus &= ~BLOCK_HAVE_DATA;
129             pindex->nStatus &= ~BLOCK_HAVE_UNDO;
130-            pindex->nFile = 0;
131+            pindex->nFile = -1;

Sjors commented at 9:41 am on March 1, 2023:

47bc99cddc518dc41741c5dcc17f9e629e162449 This is probably fine, but there’s a bunch a places that we have to make sure are never reached. For example in FindBlockPos( we do _blockfile_info[nFile] where nFile = ... pos.nFile. I’ll have to review that a bit more.

in src/kernel/chain.cpp:22 in acddd42046 outdated

15@@ -16,9 +16,8 @@ interfaces::BlockInfo MakeBlockInfo(const CBlockIndex* index, const CBlock* data
16     if (index) {
17         info.prev_hash = index->pprev ? index->pprev->phashBlock : nullptr;
18         info.height = index->nHeight;
19-        LOCK(::cs_main);
20-        info.file_number = index->nFile;
21-        info.data_pos = index->nDataPos;
22+        info.file_number = index->GetFileNum();
23+        info.data_pos = index->GetDataPos();

LarryRuane commented at 6:10 pm on March 15, 2023:

Shouldn’t these be fetched atomically? Maybe there should be a getter that returns all three values, similar to how there’s a setter (CBlockIndex::SetFileData) than sets all three atomically.

stickies-v commented at 11:26 am on March 16, 2023:

Starting to feel like a struct for those 3 members would be appropriate

furszy commented at 1:30 pm on March 23, 2023:

Shouldn’t these be fetched atomically? Maybe there should be a getter that returns all three values, similar to how there’s a setter (CBlockIndex::SetFileData) than sets all three atomically. @LarryRuane I didn’t add it merely to not introduce that many changes but could be done too. The members set is guarded by cs_main (for now) so there is no possible race.

Starting to feel like a struct for those 3 members would be appropriate

Why exactly?. So far in the code, we either want the block data position or the block undo data position, we don’t get both at the same time because they represent different files on disk (thus why they are returned inside FlatFilePos, the struct groups only one of them with the file number).

In other words, nData + nFile maps to a “blk.dat”, while nUndoData + nFile maps to an specific “rev.dat”.

LarryRuane commented at 7:35 pm on March 31, 2023:

Are you sure these are protected by cs_main, is it? Running gdb test_bitcoin on the PR branch (leaving some irrelevant stuff out):

 0(gdb) b MakeBlockInfo
 1(gdb) run
 2Thread 29 "b-basic block f" hit Breakpoint 1, kernel::MakeBlockInfo (index=0x55555749b018, data=0x0) at kernel/chain.cpp:14
 3[#0](/bitcoin-bitcoin/0/)  kernel::MakeBlockInfo (index=0x55555749b018, data=0x0) at kernel/chain.cpp:14
 4[#1](/bitcoin-bitcoin/1/)  0x000055555623fa81 in BaseIndex::ThreadSync (this=0x7fffffffad00) at index/base.cpp:201
 5(gdb) p cs_main
 6$1 = {
 7  <std::recursive_mutex> = {
 8    <std::__recursive_mutex_base> = {
 9      _M_mutex = {
10        __data = {
11          __lock = 0,
12          __count = 0,
13          __owner = 0,
14          }
15        }
16      }
17    }
18  }

I agree with you that a new struct should just be nFile and nDataPos (or nPos). This struct can be used for either block data or rev data.

furszy commented at 8:59 pm on April 3, 2023:

Are you sure these are protected by cs_main, is it?

The members’ write process, not the read. If the read would be protected by cs_main, this PR wouldn’t make any sense :p. The two places are: the CBlockIndex creation time and CBlockIndex pruning time.

Still, those cs_main locks can also be removed in a follow-up PR. I haven’t removed them here because they require further changes.

Not sure if I’m understanding your concern here. You are thinking on a race in-between GetFileNum and GetDataPos? Because, if that is the concern, the error that could arise from it is harmless. The read block call would just return false, and callers would continue. But, to not even have to think about such error, we could just call GetFilePos() instead of the two getters.

I agree with you that a new struct should just be nFile and nDataPos (or nPos). This struct can be used for either block data or rev data.

There is already a method for that: CBlockIndex::GetFilePos.

stickies-v commented at 6:17 pm on March 15, 2023: contributor

Concept ACK

Is my understanding correct that 1) replacing cs_main with g_cs_blockindex_data and 2) having g_cs_blockindex_data be a shared instead of recursive/exclusive mutex are orthogonal improvements? I think it makes sense for both of them to happen in this PR, just wondering.

furszy commented at 1:44 pm on March 23, 2023: member

Concept ACK

Is my understanding correct that 1) replacing cs_main with g_cs_blockindex_data and 2) having g_cs_blockindex_data be a shared instead of recursive/exclusive mutex are orthogonal improvements? I think it makes sense for both of them to happen in this PR, just wondering.

Moving cs_main to an blockstorage local exclusive mutex would still be an improvement from the current state, yeah. The networking threads (and pretty much the entire node) shouldn’t get blocked because the user called the getblock RPC command.

But, the usage of an exclusive mutex here still means no concurrent block data access. Which is a shame as accessing block 124 on disk shouldn’t block the entire node from accessing block 750,000 on disk (nor any other block). They are on different files.

dergoegge changes_requested

dergoegge commented at 10:34 am on April 3, 2023: member

Concept ACK on reducing cs_main scope.

Approach NACK for adding new globals. It is not worth going with a worse design to “rush in” an improvement for a feature that may or may not happen (#26966). I would prefer to go with a more sustainable approach from the get go, otherwise this will need to be refactored in the long run.

I don’t see why the mutex could not be a member of BlockManager. Like you say in #27006 (comment), giving BlockManager its own internal lock(s) instead of cs_main is the preferable long term design.

furszy commented at 3:39 pm on April 3, 2023: member

Hey dergoegge, thanks for the feedback.

Approach NACK for adding new globals. It is not worth going with a worse design to “rush in” an improvement for a feature that may or may not happen (#26966). I would prefer to go with a more sustainable approach from the get go, otherwise this will need to be refactored in the long run.

No rush here. #26966 is only one clear use case. It isn’t the main one actually, there are other areas that shouldn’t be locking the net messages reception / validation thread because of a disk read: e.g. “getblock”, “getblocktxn” and compact block net messages, the current indexes sync process (all of them), getblock/gettxoutproof RPC commands. They all are locking the node at every single block read.

I don’t see why the mutex could not be a member of BlockManager. Like you say in #27006 (comment), giving BlockManager its own internal lock(s) instead of cs_main is the preferable long term design.

I’m not opposed to it but there are few problems with that approach:

BlockStorage depends on CBlockIndex and not in the other way around, so moving the mutex to BlockStorage would leave CBlockIndex members with no thread safety analysis annotations.
The ReadBlockFromDisk and UndoReadFromDisk functions are currently global functions. They do not have access to the BlockManager class members.

The second point can be fixed by refactoring the functions into the class but the first one is more complex. We have to avoid adding a chain.h <-> blockstorage.h cyclic dependency.

dergoegge commented at 2:08 pm on April 4, 2023: member

It isn’t the main one actually, there are other areas that shouldn’t be locking the net messages reception / validation thread because of a disk read: e.g. “getblock”, “getblocktxn” and compact block net messages, the current indexes sync process (all of them), getblock/gettxoutproof RPC commands. They all are locking the node at every single block read.

This has been the case for 10+ years, if we’re gonna change it then we should take the time to do it without adding to our technical debt.

I’m not opposed to it but there are few problems with that approach:

I agree that it is more work but it is worth doing imo and we are clearly in no rush.

The ReadBlockFromDisk and UndoReadFromDisk functions are currently global functions. They do not have access to the BlockManager class members.

This is being done in #27125.

the first one is more complex. We have to avoid adding a chain.h <-> blockstorage.h cyclic dependency.

Move CBlockIndex to a new header file and have chain and blockstorage include that?

furszy commented at 2:56 pm on April 4, 2023: member

It isn’t the main one actually, there are other areas that shouldn’t be locking the net messages reception / validation thread because of a disk read: e.g. “getblock”, “getblocktxn” and compact block net messages, the current indexes sync process (all of them), getblock/gettxoutproof RPC commands. They all are locking the node at every single block read.

This has been the case for 10+ years, if we’re gonna change it then we should take the time to do it without adding to our technical debt.

I never said the opposite. Not sure why the “rush” term is being used here really. I haven’t said anything like “this is great, we must have it now”.. (nor here nor anywhere in the repository).

I was trying to give you more context, answering to your “It is not worth going with a worse design to “rush in” an improvement for a feature that may or may not happen (https://github.com/bitcoin/bitcoin/pull/26966)".

I’m happy to continue discussing stuff and improve what needs to be improved.

the first one is more complex. We have to avoid adding a chain.h <-> blockstorage.h cyclic dependency.

Move CBlockIndex to a new header file and have chain and blockstorage include that?

How that solves the problem?

Placing the mutex inside block storage means that any field that is guarded by this new mutex will need block storage dependency. So don’t care where we place CBlockIndex, it will need blockstorage dependency to use the thread analysis annotation. Which will cause a cyclic dependency.

A possible solution would be to place the mutex in a separate file. So blockstorage and CBlockIndex can include it independently.

Or.. thinking out loud: something somewhat “radical” could be move nFile, nData and nUndoData members out of CBlockIndex, into the blockstorage class. We could place them into an unordered map <block_id, FileData>.

Which, in a first glance, sounds quite positive. As we are currently mixing two concepts into the same container. CBlockIndex currently is used to store the chain in a sort of linked list structure while it also contains the position of the block data on disk.

dergoegge commented at 4:10 pm on April 4, 2023: member

I never said the opposite. Not sure why the “rush” term is being used here really. I haven’t said anything like “this is great, we must have it now”.. (nor here nor anywhere in the repository).

I’m sorry didn’t mean to put words in your mouth. It seemed rushed to me because you are going with an “easier” solution here rather than the more complex but (imo) worth while long term design.

it will need blockstorage dependency to use the thread analysis annotation

I was thinking of not adding the annotations in this case. Which is not great but seems better to me than adding more globals.

Or.. thinking out loud: something somewhat “radical” could be move nFile, nData and nUndoData members out of CBlockIndex, into the blockstorage class. We could place them into an unordered map <block_id, FileData>.

This separates the concerns nicely and lets us add annotations, I would be happy with this :) The only thing that’s a bit weird then is maintaining the on-disk format for the block index entries but that seems workable.

furszy commented at 6:28 pm on April 5, 2023: member

I never said the opposite. Not sure why the “rush” term is being used here really. I haven’t said anything like “this is great, we must have it now”.. (nor here nor anywhere in the repository).

I’m sorry didn’t mean to put words in your mouth. It seemed rushed to me because you are going with an “easier” solution here rather than the more complex but (imo) worth while long term design.

Well, this PR didn’t have any long term design goal discussion prior to your arrival. I’m happy that it happened as it pushed us to think about the “radical” change mentioned above.

it will need blockstorage dependency to use the thread analysis annotation

I was thinking of not adding the annotations in this case. Which is not great but seems better to me than adding more globals.

I’m on the other side of the fence actually. While globals aren’t ideal, particularly for testing, I prefer them over loosing thread safety analysis as it is essential to prevent deadlocks.

Or.. thinking out loud: something somewhat “radical” could be move nFile, nData and nUndoData members out of CBlockIndex, into the blockstorage class. We could place them into an unordered map <block_id, FileData>.

This separates the concerns nicely and lets us add annotations, I would be happy with this :) The only thing that’s a bit weird then is maintaining the on-disk format for the block index entries but that seems workable.

Yeah, the read shouldn’t be that hard. And a db upgrade mechanism is doable too.

DrahtBot added the label Needs rebase on May 5, 2023

furszy marked this as a draft on Jun 22, 2023

furszy force-pushed on Nov 7, 2023

DrahtBot removed the label Needs rebase on Nov 7, 2023

DrahtBot added the label CI failed on Nov 7, 2023

DrahtBot added the label Needs rebase on Nov 17, 2023

furszy force-pushed on Dec 26, 2023

DrahtBot removed the label Needs rebase on Dec 27, 2023

DrahtBot added the label Needs rebase on Feb 2, 2024

furszy force-pushed on Feb 5, 2024

DrahtBot removed the label Needs rebase on Feb 5, 2024

maflcko commented at 3:22 pm on February 15, 2024: member

0test/sharedlock_tests.cpp:18:17: error: use emplace_back instead of push_back [modernize-use-emplace,-warnings-as-errors]
1   18 |         workers.push_back(std::thread(task));
2      |                 ^~~~~~~~~~~~~~~~~~~~~~    ~
3      |                 emplace_back(

DrahtBot added the label Needs rebase on Mar 12, 2024

furszy force-pushed on Mar 12, 2024

DrahtBot removed the label Needs rebase on Mar 12, 2024

DrahtBot added the label Needs rebase on Mar 20, 2024

furszy force-pushed on Apr 7, 2024

DrahtBot removed the label Needs rebase on Apr 7, 2024

DrahtBot added the label Needs rebase on Jul 15, 2024

blockindex: do not point to file 0 if data does not exist

better to store an invalid file number to describe
block indexes with no data on disk.

967b18b148

sync: implement annotated shared_mutex 1e72ef51dd

reduce cs_main scope, guard block index 'nFile' under a read/write mutex

By not having to lock cs_main every time we access block data on disk,
we can avoid slowing down concurrent tasks due to threads waiting
for the lock to be released.

2a50bce015

furszy force-pushed on Oct 13, 2024

DrahtBot removed the label Needs rebase on Oct 13, 2024

furszy commented at 7:30 pm on December 30, 2024: member

Closing for now. Might revive it in the future, post #26966.

furszy closed this on Dec 30, 2024

reduce cs_main scope, guard block index ’nFile’ under a local mutex #27006

Code Coverage

Reviews

Conflicts