[Not for merge] pthread sanity check #21022

pull fanquake wants to merge 1 commits into bitcoin:master from fanquake:broken_shared_mutex_sanity_check changing 5 files +66 −1
  1. fanquake commented at 2:24 pm on January 28, 2021: member

    After #21016, boost::shared_mutex will basically be the last component of Boost Thread that Bitcoin Core is using. However, [a comment in #16684](/bitcoin-bitcoin/16684/#issuecomment-726214696), pointed out that std::shared_mutex may be unsafe to use, when coupled with a range of glibc versions ~2.26->2.29, which may block our adoption and the removal of Boost Thread.

    Note that the comment first links to “rdlock stalls indefinitely on an unlocked pthread rwlock”, but then to the Ubuntu bugtracker, which is actually for a different pthread bug: “pthread_rwlock_trywrlock results in hang”. This PR contains a modified version of the code to reproduce the second bug, for which a backport was done for Ubuntu 18.04s glibc (2.27-3ubuntu1.3).

    You can reproduce the hanging behaviour using the following. (You could also use any demos from the linked bug reports).

    Run a Bionic container, and install build dependencies. Clone the source and checkout this branch:

    0docker run -it --rm ubuntu:18.04 /bin/bash
    1
    2apt update && apt upgrade -y
    3apt install git build-essential libtool autotools-dev automake pkg-config bsdmainutils python3 libevent-dev libboost-system-dev libboost-filesystem-dev libboost-test-dev libboost-thread-dev -y
    4
    5git clone https://github.com/bitcoin/bitcoin
    6git fetch origin pull/21022/head:21022
    7git checkout 21022
    

    Configure, disabling the wallet (unrelated) and enabling glibc back compat, so that our sanity checks are enabled:

    0./autogen.sh
    1./configure --disable-wallet --enable-glibc-back-compat
    2make src/bitcoind -j8
    

    Before running bitcoind, check which version of libc you have installed. 23844 was fixed in 1.3, so running 1.4 should mean no issues:

    0apt-cache policy libc6
    1libc6:
    2  Installed: 2.27-3ubuntu1.4
    3  Candidate: 2.27-3ubuntu1.4
    

    Run bitcoind

    0src/bitcoind
    1Testing pthread trylock_wr
    2
    32021-01-28T13:24:39Z Bitcoin Core version v21.99.0-1e601c9d60f9 (release build)
    42021-01-28T13:24:39Z Assuming ancestors of block 0000000000000000000b9d2ec5a352ecba0592946514a92f14319dc2b367fc72 have valid signatures.
    52021-01-28T13:24:39Z Setting nMinimumChainWork=00000000000000000000000000000000000000001533efd8d716a517fe2c5008
    6...
    7# quit
    

    Downgrade libc to 1.2. You may need to uninstall libc6-dev first, otherwise it will block the downgrade of libc6:

    0apt remove libc6-dev -y
    1apt install libc6=2.27-3ubuntu1.2 -y
    2
    3# check that you're running 1.2
    4apt-cache policy libc6
    5libc6:
    6  Installed: 2.27-3ubuntu1.2
    7  Candidate: 2.27-3ubuntu1.4
    

    Run bitcoind again. Note that this will hang. You will not be able to quit.

    0# if it doesn't hang, quit and retry
    1./src/bitcoind
    2Testing pthread trylock_wr
    3
    4.... hanging
    

    I think it would be good to have more discussion around how we approach these sort of “lower down the stack” issues:

    • what sort of bug or issue warrants us from excluding a std library feature, i.e std::shared_mutex, from use? If you looked at the glibc issue tracker right now, you’d no doubt find half a dozen bugs across multiple versions of glibc that you may think warrant us excluding various things.
    • How widespread does the issue have to be? If it’s in a single glibc version that is nearly EOL, that is obviously less of an issue compared to something affecting most recent versions.
    • how (in)tolerant are we of assuming that users are taking backports?
    • do we need to be more aggressive with runtime sanity checks? Should we be trying to detect more “broken things” that are applicable to us?

    It would certainly be unfortunate if migrating to more standard library components was blocked for extended periods (Ubuntu Bionic is LTS until mid 2023), due to these kinds of bugs.

  2. [WIP] pthread sanity check 1e601c9d60
  3. DrahtBot added the label Build system on Jan 28, 2021
  4. laanwj commented at 5:42 pm on January 28, 2021: member

    may be unsafe to use, when coupled with a range of glibc versions ~2.26->2.29, which may block our adoption and the removal of Boost Thread.

    plugs #18110

    Edit: with regard to shared_mutex specifically, these are the only uses, of which one is in the tests:

    0src/script/sigcache.cpp:#include <boost/thread/shared_mutex.hpp>
    1src/script/sigcache.cpp:    boost::shared_mutex cs_sigcache;
    2src/script/sigcache.cpp:        boost::shared_lock<boost::shared_mutex> lock(cs_sigcache);
    3src/script/sigcache.cpp:        boost::unique_lock<boost::shared_mutex> lock(cs_sigcache);
    4src/test/cuckoocache_tests.cpp:    boost::shared_mutex mtx;
    5src/test/cuckoocache_tests.cpp:        boost::unique_lock<boost::shared_mutex> l(mtx);
    6src/test/cuckoocache_tests.cpp:            boost::shared_lock<boost::shared_mutex> l(mtx);
    7src/test/cuckoocache_tests.cpp:    boost::unique_lock<boost::shared_mutex> l(mtx);
    

    Can’t we get rid of it? That’s resolve this particular issue at least (though not the wider discussion around glibc versions).

  5. theuni commented at 6:09 pm on January 28, 2021: member

    plugs #18110

    I know this is really pushing it, but I don’t think it’s totally crazy to consider vendoring musl in the very long term (not now).

    Certainly building against it for releases is worth considering.

    Can’t we get rid of it? That’s resolve this particular issue at least (though not the wider discussion around glibc versions).

    Without looking at the code (assuming it wouldn’t require a redesign), this seems like a reasonable temporary punt. It also seems reasonable to me to worry that boost hits the same underlying glibc bug, and we’ve just not happened to trigger the right conditions yet.

    Shared locks are really handy to have sometimes, though, so we’ll definitely want a path back to enabling them if we take that route. @fanquake’s approach here makes perfect sense to me in that regard.

  6. jnewbery commented at 10:45 am on February 2, 2021: member

    Can’t we get rid of it? That’s resolve this particular issue at least (though not the wider discussion around glibc versions).

    Without looking at the code (assuming it wouldn’t require a redesign), this seems like a reasonable temporary punt.

    I think this is actually very reasonable. The shared_mutex in the sigcache was introduced in ef0f422519de4a3ce47d923e5f8f90cd12349f3e. @sipa’s commit log:

    Remove contention on signature cache during block validation
    
    Since block validation happens in parallel, multiple threads may be
    accessing the signature cache simultaneously. To prevent contention:
    * Turn the signature cache lock into a shared mutex
    * Make reading from the cache only acquire a shared lock
    * Let block validations not store their results in the cache
    

    I believe that since #10192 was merged, block validation will now very rarely use the signature cache. From that PR:

    This is somewhat duplicative with the sigcache, as entries in the new cache will also have several entries in the sigcache. However, the sigcache is kept both as ATMP relies on it and because it prevents malleability-based DoS attacks on the new higher-level cache. Instead, the -sigcachesize option is re-used - cutting the sigcache size in half and using the newly freed memory for the script execution cache.

    If the only use for sigcache is now in ATMP (which is single-threaded and requires cs_main), then changing this to a simple mutex wouldn’t have any impact on performance. I don’t understand the “prevents malleability-based DoS attacks on the new higher-level cache” part of @TheBlueMatt’s description so I may be missing something subtle here.

  7. laanwj referenced this in commit 9996b1806a on Feb 12, 2021
  8. fanquake commented at 7:51 am on February 15, 2021: member
    Closing this now that #21064 has been merged, however I think there is still scope for us to increase our runtime sanity checks going forward. Will propose some other changes.
  9. fanquake closed this on Feb 15, 2021

  10. DrahtBot locked this on Aug 16, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me