After #21016, `boost::shared_mutex` will basically be the last component of Boost Thread that Bitcoin Core is using. However, [a comment in #16684](/bitcoin-bitcoin/16684/#issuecomment-726214696) pointed out that `std::shared_mutex` may be unsafe to use when coupled with a range of glibc versions (roughly 2.26 through 2.29), which may block our adoption of it and the removal of Boost Thread.
Note that the comment first links to “rdlock stalls indefinitely on an unlocked pthread rwlock”, but then to the Ubuntu bugtracker, which is actually for a different pthread bug: “pthread_rwlock_trywrlock results in hang”. This PR contains a modified version of the code to reproduce the second bug, for which a fix was backported to Ubuntu 18.04's glibc (2.27-3ubuntu1.3).
You can reproduce the hanging behaviour with the steps below. (You could also use any of the demos from the linked bug reports.)
Run a Bionic container and install the build dependencies. Clone the source and check out this branch:

    docker run -it --rm ubuntu:18.04 /bin/bash

    apt update && apt upgrade -y
    apt install git build-essential libtool autotools-dev automake pkg-config bsdmainutils python3 libevent-dev libboost-system-dev libboost-filesystem-dev libboost-test-dev libboost-thread-dev -y

    git clone https://github.com/bitcoin/bitcoin
    cd bitcoin
    git fetch origin pull/21022/head:21022
    git checkout 21022

Configure, disabling the wallet (unrelated) and enabling glibc back compat, so that our sanity checks are enabled:

    ./autogen.sh
    ./configure --disable-wallet --enable-glibc-back-compat
    make src/bitcoind -j8

Before running `bitcoind`, check which version of libc you have installed. Bug 23844 was fixed in `1.3`, so running `1.4` should mean no issues:

    apt-cache policy libc6
    libc6:
     Installed: 2.27-3ubuntu1.4
     Candidate: 2.27-3ubuntu1.4

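
The same check can be scripted. The following is a minimal sketch and not part of this PR: `version_ge` and `FIXED` are hypothetical names, and GNU `sort -V` is used only as an approximation of Debian version ordering (on Debian or Ubuntu, `dpkg --compare-versions` is the stricter choice). The result is only meaningful inside the Bionic container, since other releases use unrelated package versions.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 sorts at or after version $2.
# GNU `sort -V` approximates, but is not identical to, dpkg's ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# First Bionic libc6 package that carries the backported fix (from the text above).
FIXED="2.27-3ubuntu1.3"

# Installed libc6 package version; left empty if dpkg-query is unavailable.
installed=$(dpkg-query -W -f='${Version}\n' libc6 2>/dev/null | head -n 1)

if [ -z "$installed" ]; then
    echo "could not determine the libc6 package version"
elif version_ge "$installed" "$FIXED"; then
    echo "libc6 $installed: backport present, no hang expected"
else
    echo "libc6 $installed: predates $FIXED, expect the hang"
fi
```
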
Run `bitcoind`:

    src/bitcoind
    Testing pthread trylock_wr

    2021-01-28T13:24:39Z Bitcoin Core version v21.99.0-1e601c9d60f9 (release build)
    2021-01-28T13:24:39Z Assuming ancestors of block 0000000000000000000b9d2ec5a352ecba0592946514a92f14319dc2b367fc72 have valid signatures.
    2021-01-28T13:24:39Z Setting nMinimumChainWork=00000000000000000000000000000000000000001533efd8d716a517fe2c5008
    ...
    # quit

Downgrade libc to `1.2`. You may need to uninstall `libc6-dev` first, otherwise it will block the downgrade of `libc6`:

    apt remove libc6-dev -y
    apt install libc6=2.27-3ubuntu1.2 -y

    # check that you're running 1.2
    apt-cache policy libc6
    libc6:
     Installed: 2.27-3ubuntu1.2
     Candidate: 2.27-3ubuntu1.4

Run `bitcoind` again. Note that this will hang. You will not be able to quit.

    # if it doesn't hang, quit and retry
    ./src/bitcoind
    Testing pthread trylock_wr

    .... hanging

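
Because the hung process cannot be quit normally, it can be convenient to detect the hang with a hard deadline instead of by hand. The following is a minimal sketch, assuming GNU `timeout` is available in the container. `started_ok` is a hypothetical helper, not something this PR provides.

```shell
#!/bin/sh
# Hypothetical helper: run a command for at most SECS seconds, then report
# whether its output ever contained PATTERN.
# usage: started_ok PATTERN SECS COMMAND [ARGS...]
started_ok() {
    pattern=$1
    secs=$2
    shift 2
    log=$(mktemp)
    # Send SIGKILL at the deadline, because a process stuck in the rwlock
    # hang does not respond to a normal quit. A healthy process is killed too.
    timeout -s KILL "$secs" "$@" >"$log" 2>&1 || true
    if grep -q -- "$pattern" "$log"; then rc=0; else rc=1; fi
    rm -f "$log"
    return "$rc"
}
```

For example, `started_ok 'Bitcoin Core version' 30 ./src/bitcoind` should succeed with a fixed libc6 and fail with `1.2`, where only `Testing pthread trylock_wr` is printed before the hang. The 30-second deadline is an arbitrary choice. Since `bitcoind` is killed hard even in the healthy case, this is only suitable for a throwaway container.
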
I think it would be good to have more discussion around how we approach these sorts of “lower down the stack” issues:
- What sort of bug or issue warrants excluding a standard library feature, i.e. `std::shared_mutex`, from use? If you looked at the glibc issue tracker right now, you’d no doubt find half a dozen bugs across multiple versions of glibc that you may think warrant us excluding various things.
- How widespread does the issue have to be? If it’s in a single glibc version that is nearly EOL, that is obviously less of an issue compared to something affecting most recent versions.
- How (in)tolerant are we of assuming that users are taking backports?
- Do we need to be more aggressive with runtime sanity checks? Should we be trying to detect more “broken things” that are applicable to us?
It would certainly be unfortunate if migrating to more standard library components was blocked for extended periods (Ubuntu Bionic is LTS until mid 2023) due to these kinds of bugs.