test: Potential deadlock in wallet_tests/CreateWalletFromFile #19049

issue hebasto openend this issue on May 22, 2020
  1. hebasto commented at 10:29 am on May 22, 2020: member

    Building master (df303ceb650521dc7b1ba91e0eea383c387a5860) on Linux Mint 19.3 (x86_64, bionic codebase):

     0$ ./configure --with-sanitizers=thread && make clean && make
     1$ TSAN_OPTIONS="suppressions=test/sanitizer_suppressions/tsan" ./src/test/test_bitcoin --run_test=wallet_tests/CreateWalletFromFile
     2Running 1 test case...
     3WARNING: too long mutex cycle found
     4WARNING: too long mutex cycle found
     5WARNING: too long mutex cycle found
     6WARNING: too long mutex cycle found
     7WARNING: too long mutex cycle found
     8==================
     9WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=31793)
    10  Cycle in lock order graph: M134059 (0x7b6400004cc8) => M133094 (0x56527c390c60) => M134059
    11
    12  Mutex M133094 acquired here while holding mutex M134059 in main thread:
    13    [#0](/bitcoin-bitcoin/0/) pthread_mutex_lock <null> (libtsan.so.0+0x3fadb)
    14    [#1](/bitcoin-bitcoin/1/) __gthread_mutex_lock /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:748 (test_bitcoin+0xd17c11)
    15    [#2](/bitcoin-bitcoin/2/) __gthread_recursive_mutex_lock /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:810 (test_bitcoin+0xd17c11)
    16    [#3](/bitcoin-bitcoin/3/) std::recursive_mutex::lock() /usr/include/c++/7/mutex:107 (test_bitcoin+0xd17c11)
    17    [#4](/bitcoin-bitcoin/4/) std::unique_lock<std::recursive_mutex>::lock() /usr/include/c++/7/bits/std_mutex.h:267 (test_bitcoin+0xd17c11)
    18    [#5](/bitcoin-bitcoin/5/) UniqueLock<AnnotatedMixin<std::recursive_mutex>, std::unique_lock<std::recursive_mutex> >::Enter(char const*, char const*, int) sync.h:131 (test_bitcoin+0xd17c11)
    19    [#6](/bitcoin-bitcoin/6/) UniqueLock<AnnotatedMixin<std::recursive_mutex>, std::unique_lock<std::recursive_mutex> >::UniqueLock(AnnotatedMixin<std::recursive_mutex>&, char const*, char const*, int, bool) sync.h:152 (test_bitcoin+0xd17c11)
    20    [#7](/bitcoin-bitcoin/7/) CWallet::CreateWalletFromFile(interfaces::Chain&, WalletLocation const&, bilingual_str&, std::vector<bilingual_str, std::allocator<bilingual_str> >&, unsigned long) wallet/wallet.cpp:4004 (test_bitcoin+0xd397de)
    21    [#8](/bitcoin-bitcoin/8/) TestLoadWallet wallet/test/wallet_tests.cpp:37 (test_bitcoin+0x703587)
    22    [#9](/bitcoin-bitcoin/9/) wallet_tests::CreateWalletFromFile::test_method() wallet/test/wallet_tests.cpp:782 (test_bitcoin+0x72013e)
    23    [#10](/bitcoin-bitcoin/10/) CreateWalletFromFile_invoker wallet/test/wallet_tests.cpp:700 (test_bitcoin+0x721452)
    24    [#11](/bitcoin-bitcoin/11/) boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:118 (test_bitcoin+0xe05fb)
    25    [#12](/bitcoin-bitcoin/12/) boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) <null> (libboost_unit_test_framework.so.1.65.1+0x4b2cd)
    26    [#13](/bitcoin-bitcoin/13/) __libc_start_main <null> (libc.so.6+0x21b96)
    27
    28    Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message
    29
    30  Mutex M134059 acquired here while holding mutex M133094 in main thread:
    31    [#0](/bitcoin-bitcoin/0/) pthread_mutex_lock <null> (libtsan.so.0+0x3fadb)
    32    [#1](/bitcoin-bitcoin/1/) __gthread_mutex_lock /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:748 (test_bitcoin+0x71e053)
    33    [#2](/bitcoin-bitcoin/2/) __gthread_recursive_mutex_lock /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:810 (test_bitcoin+0x71e053)
    34    [#3](/bitcoin-bitcoin/3/) std::recursive_mutex::lock() /usr/include/c++/7/mutex:107 (test_bitcoin+0x71e053)
    35    [#4](/bitcoin-bitcoin/4/) AnnotatedMixin<std::recursive_mutex>::lock() sync.h:90 (test_bitcoin+0x71e053)
    36    [#5](/bitcoin-bitcoin/5/) operator() wallet/test/wallet_tests.cpp:780 (test_bitcoin+0x71e053)
    37    [#6](/bitcoin-bitcoin/6/) _M_invoke /usr/include/c++/7/bits/std_function.h:316 (test_bitcoin+0x71e677)
    38    [#7](/bitcoin-bitcoin/7/) std::function<void (std::unique_ptr<interfaces::Wallet, std::default_delete<interfaces::Wallet> >)>::operator()(std::unique_ptr<interfaces::Wallet, std::default_delete<interfaces::Wallet> >) const /usr/include/c++/7/bits/std_function.h:706 (test_bitcoin+0xd39850)
    39    [#8](/bitcoin-bitcoin/8/) CWallet::CreateWalletFromFile(interfaces::Chain&, WalletLocation const&, bilingual_str&, std::vector<bilingual_str, std::allocator<bilingual_str> >&, unsigned long) wallet/wallet.cpp:4006 (test_bitcoin+0xd39850)
    40    [#9](/bitcoin-bitcoin/9/) TestLoadWallet wallet/test/wallet_tests.cpp:37 (test_bitcoin+0x703587)
    41    [#10](/bitcoin-bitcoin/10/) wallet_tests::CreateWalletFromFile::test_method() wallet/test/wallet_tests.cpp:782 (test_bitcoin+0x72013e)
    42    [#11](/bitcoin-bitcoin/11/) CreateWalletFromFile_invoker wallet/test/wallet_tests.cpp:700 (test_bitcoin+0x721452)
    43    [#12](/bitcoin-bitcoin/12/) boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:118 (test_bitcoin+0xe05fb)
    44    [#13](/bitcoin-bitcoin/13/) boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) <null> (libboost_unit_test_framework.so.1.65.1+0x4b2cd)
    45    [#14](/bitcoin-bitcoin/14/) __libc_start_main <null> (libc.so.6+0x21b96)
    46
    47SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/usr/lib/x86_64-linux-gnu/libtsan.so.0+0x3fadb) in __interceptor_pthread_mutex_lock
    48==================
    49WARNING: too long mutex cycle found
    50WARNING: too long mutex cycle found
    51WARNING: too long mutex cycle found
    52WARNING: too long mutex cycle found
    53WARNING: too long mutex cycle found
    54WARNING: too long mutex cycle found
    55WARNING: too long mutex cycle found
    56WARNING: too long mutex cycle found
    57WARNING: too long mutex cycle found
    58WARNING: too long mutex cycle found
    59
    60*** No errors detected
    61ThreadSanitizer: reported 1 warnings
    

    The ThreadSanitizer does not report this issue on CI as the binaries are built with the --disable-wallet option: https://github.com/bitcoin/bitcoin/blob/df303ceb650521dc7b1ba91e0eea383c387a5860/ci/test/00_setup_env_native_tsan.sh#L14

    UPDATE: This issue is caused by #16426 that has been merged on 2020-05-01.

  2. hebasto added the label Bug on May 22, 2020
  3. ryanofsky commented at 9:04 pm on May 26, 2020: member

    I tried to debug this but it is very awkward because the mutexes are unnamed:

    M134059 (0x7b6400004cc8) => M133094 (0x56527c390c60)

    Is there some way we can provide tsan with better names than M134059 and M133094 for mutexes? Or alternately is it possible to find out from gdb if 0x7b6400004cc8 and 0x56527c390c60 correspond to symbols?

    M134059 seems like it is the cs_wallets mutex since it is locked in wallet.cpp:4004

    M133094 could be anything because second stack trace is useless and just shows some lock being acquired in some HandleLoadWallet callback. The HandleLoadWallet callback in the test is mining a new block, so maybe the mutex in question is cs_main or mempool.cs. But I don’t see a case where cs_wallets would ever be locked while holding cs_main or cs_mempool. I do see cs_wallets is locked while holding cs_wallet in CreateWalletFromFile. So maybe M133094 is cs_wallet? cs_wallet is temporarily unlocked in HandleLoadWallet so I guess that technically could cause a deadlock if some other future nonexistent thread was added.

    I guess feedback would here would be that I don’t think tsan is doing anything useful for this test and should probably be suppressed. Also it would be good if it could display less confusing mutex names.

  4. vasild commented at 6:45 pm on June 2, 2020: member

    Is there some way we can provide tsan with better names than M134059 and M133094 for mutexes?

    Looking at the clang source code - I think not :(

    Or alternately is it possible to find out from gdb if 0x7b6400004cc8 and 0x56527c390c60 correspond to symbols?

    (gdb) info symbol 0x7b6400004cc8 should be able to give some useful information, but for me it just prints No symbol matches 0x7fffffffd1b0 even though I have AnnotatedMixin<std::__1::mutex> object at that address. If you have a suspect you can inspect its address and see if it matches the address printed in the clang diagnostic.

  5. ryanofsky commented at 7:01 pm on June 2, 2020: member

    Looking at the clang source code - I think not :(

    Very interesting, thanks for looking into this!

  6. MarcoFalke commented at 7:16 pm on June 2, 2020: member
    The issue shouldn’t be in any of our annotated mutexes right? Otherwise --enable-debug would detect the issue. Thus, at least one of the mutexes must be a mutex in bdb (or some other library).?
  7. ryanofsky commented at 7:50 pm on June 2, 2020: member

    The issue shouldn’t be in any of our annotated mutexes right? Otherwise --enable-debug would detect the issue. Thus, at least one of the mutexes must be a mutex in bdb (or some other library).?

    Oh wow, that’s a really good point. I think it strongly suggests the log mutex is involved, because it isn’t annotated and this test relies a lot on log hooks. I think BDB probably isn’t involved because there aren’t really cases where BDB code would be calling back into our code while holding a mutex.

    I think I might not be too concerned about debugging this issue right now because #15719 rewrites a lot the test and related code. But knowing that one of the mutexes probably isn’t annotated definitely helps narrow the bug down.

  8. MarcoFalke commented at 8:21 pm on June 3, 2020: member
    It could also be a false-positive
  9. MarcoFalke commented at 10:12 pm on June 3, 2020: member
    The cycle is cs_wallet -> cs_wallets -> cs_wallet
  10. MarcoFalke commented at 10:21 pm on June 3, 2020: member

    TestLoadWallet here

    https://github.com/bitcoin/bitcoin/blob/a1c0e5fce13911252a135bdce4ffc09c24d23791/src/wallet/test/wallet_tests.cpp#L771-L782

    will call into CreateWalletFromFile, which locks walletInstance->cs_wallet and then cs_wallets:

    https://github.com/bitcoin/bitcoin/blob/a1c0e5fce13911252a135bdce4ffc09c24d23791/src/wallet/wallet.cpp#L4020-L4027

    However one of the handlers temporarily unlocks cs_wallet and thus creates a “hole” in the lock stack, which can lead to a deadlock (in theory, not in this test in practice of course).

  11. MarcoFalke commented at 10:41 pm on June 3, 2020: member
    And the reason that --enable-debug doesn’t detect this is because it can’t deal with “holes” (ripping out a lock from the lock stack).
  12. hebasto commented at 6:39 am on June 4, 2020: member

    @MarcoFalke

    The cycle is cs_wallet -> cs_wallets -> cs_wallet

    Is this a test design problem to be fixed?

  13. ryanofsky commented at 4:44 am on June 5, 2020: member

    I’m seeing another unreadable tsan error in this test here:

    https://travis-ci.org/github/bitcoin/bitcoin/jobs/694884461

    From the configure output, it seems like debug symbols (-g) are not enabled in the build. Is this an oversight or is there a reason debug symbols couldn’t be enabled?

  14. vasild commented at 8:48 am on June 5, 2020: member

    Is this a test design problem to be fixed?

    I am not sure exactly which mutex gets locked when in this case, but any code that does “lock A, lock B, unlock A, lock A” is buggy and should be fixed (it locks B while holding A and later it locks A while holding B, so it could deadlock with another thread executing the same code).

    From the configure output, it seems like debug symbols (-g) are not enabled in the build. Is this an oversight or is there a reason debug symbols couldn’t be enabled?

    --with-sanitizers=thread was introduced without -g and without --with-debug in 89bf196c8. -O2 could also screw up backtrace reports (inlining some functions and reorganizing the code, so that line numbers do not correspond to source code).

     0--- i/ci/test/00_setup_env_native_tsan.sh
     1+++ w/ci/test/00_setup_env_native_tsan.sh
     2@@ -8,7 +8,7 @@ export LC_ALL=C.UTF-8
     3 
     4 export CONTAINER_NAME=ci_native_tsan
     5 export DOCKER_NAME_TAG=ubuntu:20.04
     6 export PACKAGES="clang llvm libc++abi-dev libc++-dev python3-zmq"
     7 export DEP_OPTS="CC=clang CXX='clang++ -stdlib=libc++'"
     8 export GOAL="install"
     9-export BITCOIN_CONFIG="--enable-zmq --disable-wallet --with-gui=no CPPFLAGS='-DARENA_DEBUG -DDEBUG_LOCKORDER' --with-sanitizers=thread CC=clang CXX='clang++ -stdlib=libc++'"
    10+export BITCOIN_CONFIG="--enable-zmq --disable-wallet --with-gui=no --enable-debug --with-sanitizers=thread CPPFLAGS='-DARENA_DEBUG' CC=clang CXX='clang++ -stdlib=libc++'"
    

    open PR, @MarcoFalke?

  15. hebasto commented at 9:03 am on June 5, 2020: member

    @vasild

    -O2 could also screw up backtrace reports (inlining some functions and reorganizing the code, so that line numbers do not correspond to source code).

    It seems https://clang.llvm.org/docs/ThreadSanitizer.html does not mention such behavior. Or did I miss something?

  16. vasild commented at 9:44 am on June 5, 2020: member

    Well, that is just in general. It is not specific to TSan.

    https://www.gnu.org/software/libc/manual/html_node/Backtraces.html mentions something about it:

    Note that certain compiler optimizations may interfere with obtaining a valid backtrace. Function inlining causes the inlined function to not have a stack frame; tail call optimization replaces one stack frame with another; frame pointer elimination will stop backtrace from interpreting the stack contents correctly.

  17. MarcoFalke commented at 11:23 am on June 5, 2020: member

    open PR, @MarcoFalke?

    Sure as long as there is one build with DARENA_DEBUG

  18. ryanofsky commented at 11:36 am on June 5, 2020: member

    Is this a test design problem to be fixed?

    I am not sure exactly which mutex gets locked when in this case, but any code that does “lock A, lock B, unlock A, lock A” is buggy and should be fixed (it locks B while holding A and later it locks A while holding B, so it could deadlock with another thread executing the same code).

    For it to deadlock, there needs to be another thread not just executing the same code, but also using the same mutex instances. In a unit test, if the mutexes are per-test instance (not global variables), then there can’t be a deadlock unless the test itself is written to have two or more of these threads running at the same time with the same mutexes. TSAN will complain, but there will not be any possibility of a deadlock if the test isn’t creating multiple threads internally.

    In this case, the cs_wallet mutex is unique to the test instance, and the other cs_wallets mutex is currently global, but #19101 updates it to be per-test instance.

    --with-sanitizers=thread was introduced without -g and without --with-debug in 89bf196. -O2 could also screw up backtrace reports (inlining some functions and reorganizing the code, so that line numbers do not correspond to source code).

    Thanks for finding that. To be sure, this only explains why the test is built without the other debug options, and there should be no problem adding -g? In any case I’ll experiment with adding -g in the setup you pointed to and see if it helps makes TSAN output more readable.

  19. ryanofsky commented at 11:42 am on June 5, 2020: member
    I’m also confused about how the suppression file test/sanitizer_suppressions/tsan seems to be working with some mutex and variable names when tsan itself seems to not know these names
  20. MarcoFalke commented at 11:43 am on June 5, 2020: member

    when tsan itself seems to not know these names

    The llvm symbolizer is required to translate symbol addresses to names. Is that what you mean?

  21. MarcoFalke commented at 11:44 am on June 5, 2020: member
  22. ryanofsky commented at 11:52 am on June 5, 2020: member

    when tsan itself seems to not know these names

    The llvm symbolizer is required to translate symbol addresses to names. Is that what you mean?

    I just mean that TSAN output contains “Mutex M262581 (0x7b3800000670) created at:” “Mutex M129 (0x55a6939ca390)” “Mutex M135 (0x55a6939ca578) created at” “https://travis-ci.org/github/bitcoin/bitcoin/jobs/694884461" without readable names. If the tsan error printer doesn’t know the real mutex names, why would the tsan error suppressor know the mutex names?

  23. ryanofsky referenced this in commit 8db3520954 on Jun 5, 2020
  24. ryanofsky commented at 1:33 pm on June 5, 2020: member

    re: me #19049 (comment)

    In any case I’ll experiment with adding -g in the setup you pointed to and see if it helps makes TSAN output more readable.

    Seems like with -g, there are line numbers, but still no mutex names:

    https://travis-ci.org/github/bitcoin/bitcoin/jobs/694884461 (without -g) https://travis-ci.org/github/bitcoin/bitcoin/jobs/695004018 (with -g)

  25. ryanofsky closed this on Jun 5, 2020

  26. ryanofsky reopened this on Jun 5, 2020

  27. ryanofsky referenced this in commit 182df0f94a on Jul 10, 2020
  28. ryanofsky referenced this in commit 66cbbb4ab0 on Jul 11, 2020
  29. ryanofsky referenced this in commit 3704d3b97e on Jul 11, 2020
  30. ryanofsky referenced this in commit 111a2ebec6 on Jul 13, 2020
  31. ryanofsky referenced this in commit 0d41ed87fc on Jul 14, 2020
  32. hebasto referenced this in commit 0cdf2a77dd on Jul 17, 2020
  33. sidhujag referenced this in commit d6badf5f80 on Jul 18, 2020
  34. Warchant referenced this in commit 1ec9fb1d4d on Aug 6, 2020
  35. ryanofsky referenced this in commit 834d7d61dd on Aug 7, 2020
  36. hebasto commented at 7:41 pm on September 19, 2020: member
    Note for myself: the warning is silenced in #19164.
  37. janus referenced this in commit ee423b18f2 on Nov 22, 2020
  38. fanquake closed this on Dec 11, 2020

  39. sidhujag referenced this in commit 9d8bc3e82d on Dec 11, 2020
  40. DrahtBot locked this on Feb 15, 2022
  41. knst referenced this in commit 37aeaa9ae5 on Feb 22, 2023
  42. knst referenced this in commit 7441e3c21a on Feb 22, 2023
  43. knst referenced this in commit c4051e4f3c on Feb 23, 2023
  44. knst referenced this in commit 7d3899fa0a on Feb 23, 2023
  45. knst referenced this in commit ab454c80a4 on Feb 28, 2023
  46. knst referenced this in commit b19186bd55 on Feb 28, 2023
  47. PastaPastaPasta referenced this in commit 25bdc2670e on Mar 19, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 13:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me