CI failure: ThreadSanitizer: data race /usr/lib/llvm-12/bin/../include/c++/v1/ios:523:12 in std::__1::ios_base::width() const #23366

issue ryanofsky openend this issue on October 27, 2021
  1. ryanofsky commented at 4:43 am on October 27, 2021: contributor

    I’m seeing this race failure in two unrelated PRs during the validation_chainstate_tests/chainstate_update_tip case:

    https://cirrus-ci.com/task/5125235994263552?logs=ci#L4140 from #22702 https://cirrus-ci.com/task/5269464468946944?logs=ci#L4142 from #21206

     0test/validation_chainstate_tests.cpp(83): Entering test case "chainstate_update_tip"
     1
     2...
     3
     4[Makefile:18480: test/validation_chainstate_tests.cpp.test] Error 1
     5
     6...
     7
     8WARNING: ThreadSanitizer: data race (pid=23076)
     9Read of size 8 at 0x7f23d669c6d0 by main thread:
    10[#0](/bitcoin-bitcoin/0/) std::__1::ios_base::width() const /usr/lib/llvm-12/bin/../include/c++/v1/ios:523:12 (test_bitcoin+0x17826c)
    11
    12...
    13
    14  Location is global 'std::__1::cout' of size 160 at 0x7f23d669c6b0 (libc++.so.1+0x0000000cd6d0)
    15
    16...
    17
    18SUMMARY: ThreadSanitizer: data race /usr/lib/llvm-12/bin/../include/c++/v1/ios:523:12 in std::__1::ios_base::width() const
    

    I’m not sure if this is a real bug or spurious, but I did work around a similar problem previously (in #15719) by adding a suppression:

    https://github.com/bitcoin/bitcoin/blob/81be7ff5215eb2f21ca7bc73e565c13215dda7bb/test/sanitizer_suppressions/tsan#L33-L40

  2. ryanofsky added the label Bug on Oct 27, 2021
  3. hebasto commented at 5:06 am on October 27, 2021: member
  4. maflcko commented at 8:21 am on October 27, 2021: member
    Indeed, this is not an issue with #15719, but with master
  5. hebasto commented at 8:45 am on October 27, 2021: member

    Indeed, this is not an issue with #15719, but with master

    Suggested suppression in #23370.

  6. ryanofsky commented at 11:36 am on October 27, 2021: contributor

    Looking at https://cirrus-ci.com/github/bitcoin/bitcoin/master, I guess this problem does not happen reliably on master, but is nondeterministic? IIRC, when I was seeing similar TSAN errors months ago in #15719, the problem seemed to be deterministic and would happen every time.

    The chainstate_update_tip was added a month ago in 673a5bd3377929a0a6a62eda8b560e47bc2cca0c from #21526 by @jamesob, so maybe James could look at stack traces and clarify whether it is expected that there might be multiple threads writing to cout at same time.

    I think the TSAN suppression seems safe, but am not completely confident about it. And even if it is thread safe it still means multiple threads could be writing to stdout at the same time and producing garbled output.

    Also, I’m not sure why this bug would start happening now if #21526 was merged a month ago, so wondering if something else might have changed recently? Or if this happens infrequently enough to not be noticed? Or if this was noticed but was just ignored a little while?

    Would be nice if there was a way to easily grep cirrus logs and find all the instances of this.

  7. martinus commented at 4:45 pm on November 7, 2021: contributor

    As mentioned std::cout is supposed to be threadsafe: https://eel.is/c++draft/iostream.objects#overview-7

    The problem is, it seems that io manipulators are not threadsafe: the stream members are not atomic, and there are no locks: https://github.com/llvm/llvm-project/blob/release/12.x/libcxx/include/ios#L372

    But many methods call ios.width(0) so that std::setw() is only valid for one output: https://en.cppreference.com/w/cpp/io/ios_base/width

    This is done here: https://github.com/llvm/llvm-project/blob/main/libcxx/include/locale#L1400 which is called whenever e.g. operator<<(char const*) is called.

    So even when the standard says that concurrent access to cout isn’t supposed to have any data races, this is one right there. Not sure what to do about that.

  8. maflcko commented at 9:01 am on November 8, 2021: member
    Interesting that no one else ran into this before us. Maybe a bug should be reported, since I couldn’t find any https://bugs.llvm.org/buglist.cgi?quicksearch=race%20ios_base&list_id=226152
  9. laanwj closed this on Nov 10, 2021

  10. sidhujag referenced this in commit 6d02f3b61f on Nov 10, 2021
  11. ryanofsky commented at 5:51 pm on November 15, 2021: contributor

    I guess #23370 resolved CI failures, but from Martin’s comment #23366 (comment), it seems like the suppression isn’t ideal because there is a real data race. I think good followups would be to:

    • Improve comment added in #23370, so instead of being a blind link, it has some explanation and at least mentions std::cout. A good template could be comment from 81be7ff5 with updated information from #23366 (comment)
    • Add a new lock around uses of std::cout to remove the TSAN suppression. Or replace cout calls triggering this with calls that go through the logging framework to remove the suppression. Doing either of these would also prevent interleaved output from multiple threads in addition to avoiding need for the suppression.
    • Maybe check upstream and see if library implementation is violating the standard. From Martin’s comment, it does seem like there is a bug in either the library or the standard, though it’s not clear what a good fix would be.

    Could maybe mark this issue up for grabs if this makes sense and is of interest.

  12. maflcko commented at 8:10 am on November 16, 2021: member

    Or replace cout calls triggering this with calls that go through the logging framework to remove the suppression.

    If the boost test logging has internal locking, that seems a fine alternative. Are you referring to BOOST_TEST_MESSAGE https://www.boost.org/doc/libs/1_77_0/libs/test/doc/html/boost_test/utf_reference/testout_reference/test_output_macro_message.html ?

  13. maflcko reopened this on Nov 16, 2021

  14. ryanofsky commented at 2:48 am on November 17, 2021: contributor

    Or replace cout calls triggering this with calls that go through the logging framework to remove the suppression.

    If the boost test logging has internal locking, that seems a fine alternative. Are you referring to BOOST_TEST_MESSAGE https://www.boost.org/doc/libs/1_77_0/libs/test/doc/html/boost_test/utf_reference/testout_reference/test_output_macro_message.html ?

    I was being vague because I didn’t look into where these cout writes were coming from. But from the cirrus TSAN output above, one cout call is coming G_TEST_LOG_FUN and the other one is coming from a boost method boost::unit_test::unit_test_log_t::operator<<(boost::unit_test::lazy_ostream const&). We also do have a few scattered BOOST_TEST_MESSAGE calls which would be a third source of cout writes.

    To be able to remove the suppression, all these these cout writes would need to be protect with a lock. We can lock G_TEST_LOG_FUN ourselves since it is our own code. It looks like we could synchronize the boost cout output by redirecting it with boost::unit_test::unit_test_log.set_stream( std::ostream& ); https://www.boost.org/doc/libs/1_77_0/libs/test/doc/html/boost_test/test_output/logging_api/log_ct_output_stream_redirection.html with a custom std::ostream subclass that locks a mutex and forwards to cout.

    I’m not sure if this would be worth it, but it seems something like this would be needed to remove the suppression.

  15. martinus commented at 5:39 am on November 17, 2021: contributor

    Someone (not me) created a bug for exactly this issue, with a nice short reproducer: https://bugs.llvm.org/show_bug.cgi?id=52509

    In C++20 there will be std::osyncstream: https://en.cppreference.com/w/cpp/io/basic_osyncstream So I think it would be nice to have something with the same API that’s eventually replaced with std::osyncstream once we switch to C++20

  16. maflcko commented at 5:05 pm on January 11, 2022: member
    bugs.llvm was deleted. The bug is now here: https://github.com/llvm/llvm-project/issues/51851
  17. PastaPastaPasta referenced this in commit d9a186ca9a on Apr 16, 2023
  18. PastaPastaPasta referenced this in commit aa34bd6a49 on Apr 16, 2023
  19. PastaPastaPasta referenced this in commit 076e0528ef on Apr 17, 2023
  20. maflcko commented at 12:12 pm on June 12, 2023: member

    Maybe we can set https://en.cppreference.com/w/cpp/io/ios_base/sync_with_stdio as a work-around to get thread-safety?

    Edit: nvm, this should already be set by default?

  21. maflcko commented at 1:09 pm on January 3, 2024: member

    To be able to remove the suppression, all these these cout writes would need to be protect with a lock. We can lock G_TEST_LOG_FUN ourselves since it is our own code. It looks like we could synchronize the boost cout output by redirecting it with boost::unit_test::unit_test_log.set_stream( std::ostream& ); https://www.boost.org/doc/libs/1_77_0/libs/test/doc/html/boost_test/test_output/logging_api/log_ct_output_stream_redirection.html with a custom std::ostream subclass that locks a mutex and forwards to cout.

    I’m not sure if this would be worth it, but it seems something like this would be needed to remove the suppression.

    In C++20 there will be std::osyncstream: https://en.cppreference.com/w/cpp/io/basic_osyncstream So I think it would be nice to have something with the same API that’s eventually replaced with std::osyncstream once we switch to C++20

    I guess std::osyncstream does not work out of the box with boost::unit_test::unit_test_log.set_stream, because boost would internally need to call emit?

  22. maflcko added the label Tests on Jan 3, 2024
  23. maflcko added the label CI failed on Jan 24, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me