ThreadSanitizer: Fix #29767 #29776
pull nanlour wants to merge 1 commits into bitcoin:master from nanlour:master changing 1 files +1 −1-
nanlour commented at 3:24 am on April 1, 2024: contributorI think this problem #29767#issue-2216373048 is because of in BaseIndex::Sync https://github.com/bitcoin/bitcoin/blob/61de64df6790077857faba84796bb874b59c5d15/src/index/base.cpp#L163-L168 Setup m_synced = true; before Commit(); So this may cause a race condition window to BaseIndex::BlockConnected https://github.com/bitcoin/bitcoin/blob/61de64df6790077857faba84796bb874b59c5d15/src/index/base.cpp#L271-L274 So i try to fix it with move m_synced = true after Commit(). Also see comment of Sync(): https://github.com/bitcoin/bitcoin/blob/61de64df6790077857faba84796bb874b59c5d15/src/index/base.h#L151-L156 I am a newcomer interested in Bitcoin, trying to become a member of the Bitcoin Core development team. Please give me some feedback if you could, as I may be doing something wrong. Thank you!
-
Fix #29767, set m_synced = true after Commit() bbe82c116e
-
DrahtBot commented at 3:24 am on April 1, 2024: contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage
For detailed information about the code coverage, see the test coverage report.
Reviews
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
-
fanquake renamed this:
ThreadSanitizer: Fix #29767, ThreadSanitizer: data race src/flatfile.cpp:47:13 in FlatFileSeq::Open(FlatFilePos const&, bool)
ThreadSanitizer: Fix #29767
on Apr 1, 2024 -
furszy commented at 4:45 pm on April 1, 2024: member
Setup m_synced = true; before Commit(); So this may cause a race condition window to BaseIndex::BlockConnected
If that’s the case, sleeping the thread after setting
m_synced
and beforeCommit()
should trigger the failure. Have you tried it? -
mzumsande commented at 9:45 pm on April 1, 2024: contributor
Concept ACK
Seems like this is not just a theoretical ThreadSanitizer error but a bug that could lead to index corruption in the case of unfortunate timing between the threads running
Sync()
andBlockConnected()
. -
nanlour force-pushed on Apr 2, 2024
-
nanlour commented at 3:37 pm on April 2, 2024: contributor
If that’s the case, sleeping the thread after setting
m_synced
and beforeCommit()
should trigger the failure. Have you tried it?Yes, I have tried to recreate this bug with the original
feature_assumeutxo
test from #29767, but simply sleeping between settingm_synced
andCommit()
cannot trigger it. I’ve attempted to recreate the data race by only pause Sync thread after it setm_synced
and before it callFlatFileSeq::Open
, this trigger data race when runfeature_assumeutxo
test with a probability of around 10%. This is my way to recreate it: 5931fb8f0966f435e9b975ee1c4c4357e88e9e56. I am pretty sure there will be a better solution, but I can only come up with this one for now. -
ryanofsky approved
-
ryanofsky commented at 4:16 pm on April 2, 2024: contributor
Code review ACK bbe82c116e72ca0638751e063bf564cd1fe5c4d5
It seems like the race condition between
m_synced = true
and committing has been around since 4368384f1d267b011e03a805f934f5c47e2ca1b2 from #14121, and the race condition accessing them_next_filter_pos
variable existed then too. -
DrahtBot requested review from mzumsande on Apr 2, 2024
-
DrahtBot added the label CI failed on Apr 2, 2024
-
fjahr commented at 10:40 pm on April 3, 2024: contributor
Code review ACK bbe82c116e72ca0638751e063bf564cd1fe5c4d5
This looks like the correct fix to me. CI failure is unrelated.
-
ryanofsky merged this on Apr 4, 2024
-
ryanofsky closed this on Apr 4, 2024
-
ryanofsky commented at 1:30 pm on April 4, 2024: contributorMerged this since it is a small straightfoward fix, only affects indexing, has 2 acks (ryanofsky, fjahr) and positive comments from other reviewers (mzumsande, sjors, furszy)
-
Pttn referenced this in commit 037f05580a on Apr 13, 2024
-
luke-jr commented at 6:50 pm on April 21, 2024: member
It seems like the race condition between m_synced = true and committing has been around since 4368384 from #14121, and the race condition accessing the m_next_filter_pos variable existed then too.
Until #28955,
cs_main
was held across all this to prevent a race.Not going to look at it right now, but I suspect this fix just creates a different race instead.
-
nanlour commented at 11:49 am on April 26, 2024: contributor
Not going to look at it right now, but I suspect this fix just creates a different race instead. Looks like furszy already got this in #29867
I think the race condition fixed in #29867 was not introduced by my change; it was created by #28955, although my change makes the race condition window bigger.
-
ryanofsky commented at 12:29 pm on April 26, 2024: contributor
Until #28955,
cs_main
was held across all this to prevent a race.The race existed before #28955 because even before #28955, only the
BaseIndex::Sync
thread was lockingcs_main
. TheBaseIndex::BlockConnected
notification thread was never lockingcs_main
. So settingm_synced = true;
before callingCommit();
was always unsafe, even withcs_main
held because theBlockConnected
thread could run in parallel and both threads could try to update index state at the same time. #28955 would make the race more likely to happen though, which was probably why it was caught more recently. -
fanquake referenced this in commit 0fcceefe22 on Apr 26, 2024
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 00:12 UTC
More mirrored repositories can be found on mirror.b10c.me