Use thread names in logs and deadlock diagnostics #13099

pull jamesob wants to merge 3 commits into bitcoin:master from jamesob:2018-04-26-use-threadnames changing 12 files +325 −33
  1. jamesob commented at 10:22 pm on April 26, 2018: member

    While trying to debug the apparent deadlock revealed in #12873, I realized it’d be nice to have the POTENTIAL DEADLOCK DETECTED message display which thread acquired each lock. I also think it’d be generally useful to have log lines display the name of the originating thread.

    This changeset does both of those things by introducing a class which manages thread naming, ThreadNameRegistry. The class abstracts process-control calls responsible for thread naming and provides automatic number suffixing to threads which use the same name (e.g. httpworker.0, httpworker.1, …).

    With this patch, thread names look like this

     0 $ pstree -p `pgrep bitcoind`
     1
     2bitcoind(3415)─┬─{addcon}(3465)
     3               ├─{httpworker.0}(3454)
     4               ├─{httpworker.1}(3455)
     5               ├─{httpworker.2}(3456)
     6               ├─{httpworker.3}(3457)
     7               ├─{http}(3453)
     8               ├─{msghand}(3467)
     9               ├─{net}(3463)
    10               ├─{opencon}(3466)
    11               ├─{scheduler}(3452)
    12               ├─{scriptch.0}(3445)
    13               ├─{scriptch.1}(3446)
    14               ├─{scriptch.2}(3447)
    15               ├─{scriptch.3}(3448)
    16               ├─{scriptch.4}(3449)
    17               ├─{scriptch.5}(3450)
    18               ├─{scriptch.6}(3451)
    19               └─{torcontrol}(3462)
    

    and the debug log looks like this

     02018-04-26T21:54:23Z [bitcoind] init message: Loading wallet...
     12018-04-26T21:54:24Z [bitcoind] mapBlockIndex.size() = 1
     22018-04-26T21:54:24Z [bitcoind] nBestHeight = 0
     32018-04-26T21:54:24Z [loadblk] Imported mempool transactions from disk: 0 succeeded, 0 failed, 0 expired, 0 already there
     42018-04-26T21:54:24Z [torcontrol] torcontrol thread start
     52018-04-26T21:54:24Z [bitcoind] Bound to [::]:18444
     62018-04-26T21:54:24Z [torcontrol] tor: Error connecting to Tor control socket
     72018-04-26T21:54:24Z [torcontrol] tor: Not connected to Tor control port 127.0.0.1:9051, trying to reconnect
     82018-04-26T21:54:24Z [bitcoind] Bound to 0.0.0.0:18444
     92018-04-26T21:54:24Z [bitcoind] init message: Loading P2P addresses...
    102018-04-26T21:54:24Z [bitcoind] Loaded 0 addresses from peers.dat  0ms
    112018-04-26T21:54:24Z [dnsseed] dnsseed thread start
    122018-04-26T21:54:24Z [bitcoind] init message: Done loading
    132018-04-26T21:54:24Z [addcon] addcon thread start
    142018-04-26T21:54:24Z [dnsseed] Loading addresses from DNS seeds (could take a while)
    152018-04-26T21:54:24Z [net] net thread start
    162018-04-26T21:54:24Z [dnsseed] 0 addresses found from DNS seeds
    172018-04-26T21:54:24Z [dnsseed] dnsseed thread exit
    182018-04-26T21:54:24Z [msghand] msghand thread start
    192018-04-26T21:54:24Z [opencon] opencon thread start
    202018-04-26T21:54:25Z [torcontrol] tor: Error connecting to Tor control socket
    212018-04-26T21:54:25Z [torcontrol] tor: Not connected to Tor control port 127.0.0.1:9051, trying to reconnect
    

    Note that child thread names have changed; I’m no longer using the bitcoin- prefix. Because we’re limited to 16 characters for thread name (on Linux, anyway), that prefix was causing the numeric suffix some names have to be hidden. If it’s desirable to keep the prefix, I can revert this change.

    Implementation considerations

    A basic version of this change could be done pretty trivially with something like

    0thread_local std::string threadname;
    

    but per @theuni (https://github.com/bitcoin/bitcoin/pull/11722#pullrequestreview-79322658), we can’t rely on having thread_local. Also note that with a basic implementation like that, we wouldn’t be able to do the automatic numbering scheme to differentiate between threads with the same basename (e.g. httpworker, scriptch).

    We could also rely solely on thread-related system calls. I don’t like this much either because of how poorly defined or unavailable getname is on some platforms, e.g. OS X, FreeBSD; not to mention the 16 character limit.

    Tests

    If this gets a Concept ACK or two, I’ll implement some. Unittests attached.

  2. in src/threadnames.cpp:28 in 121002fddf outdated
    23+
    24+extern int LogPrintStr(const std::string &str);
    25+
    26+std::unique_ptr<ThreadNameRegistry> g_thread_name_registry(new ThreadNameRegistry());
    27+
    28+namespace {
    


    jamesob commented at 10:25 pm on April 26, 2018:
    Is there somewhere more general or useful I should put the two functions in this namespace?
  3. in src/threadnames.cpp:67 in 121002fddf outdated
    62+    }
    63+
    64+    std::string process_name(name);
    65+
    66+    /*
    67+     * Uncomment if we want to retain the `bitcoin-` system thread name prefix.
    


    jamesob commented at 10:25 pm on April 26, 2018:
    Remove or uncomment this block before merge.
  4. jamesob force-pushed on Apr 27, 2018
  5. jamesob force-pushed on Apr 27, 2018
  6. jamesob force-pushed on Apr 27, 2018
  7. jamesob force-pushed on Apr 27, 2018
  8. fanquake added the label Refactoring on Apr 27, 2018
  9. fanquake added the label Utils/log/libs on Apr 27, 2018
  10. in src/threadnames.cpp:104 in fcfba12253 outdated
     97+    return true;
     98+}
     99+
    100+const std::string& ThreadNameRegistry::GetThreadName()
    101+{
    102+    auto thread_id = boost::this_thread::get_id();
    


    promag commented at 8:46 am on April 27, 2018:
    Should lock mutex.
  11. in src/threadnames.cpp:113 in fcfba12253 outdated
    108+        auto insert_ret = m_id_to_name.emplace(thread_id, GetProcessThreadName());
    109+        return insert_ret.first->second;
    110+    }
    111+}
    112+
    113+void ThreadNameRegistry::RenameProcessThread(const char* name)
    


    promag commented at 8:47 am on April 27, 2018:
    This could be a static function or a function in a anonymous namespace.

    promag commented at 8:48 am on April 27, 2018:
    Argument could be const std::string& name?

    jamesob commented at 3:14 pm on April 27, 2018:
    I’d like to be able to unittest the *Process* functions eventually, so maybe I should just make them public.
  12. in src/threadnames.cpp:129 in fcfba12253 outdated
    124+    // Prevent warnings for unused parameters...
    125+    (void)name;
    126+#endif
    127+}
    128+
    129+std::string ThreadNameRegistry::GetProcessThreadName()
    


    promag commented at 8:47 am on April 27, 2018:
    This could be a static function or a function in a anonymous namespace.
  13. in src/threadnames.cpp:109 in fcfba12253 outdated
    104+
    105+    if (found != m_id_to_name.end()) {
    106+        return found->second;
    107+    } else {
    108+        auto insert_ret = m_id_to_name.emplace(thread_id, GetProcessThreadName());
    109+        return insert_ret.first->second;
    


    promag commented at 8:52 am on April 27, 2018:
    Shoud also add to m_name_to_id?

    jamesob commented at 2:49 pm on April 27, 2018:
    Good catch, thanks.
  14. in src/threadnames.h:20 in fcfba12253 outdated
    15+ * Keeps a map of thread IDs to string names and handles system-level thread naming.
    16+ */
    17+class ThreadNameRegistry
    18+{
    19+public:
    20+    const std::string& GetThreadName();
    


    promag commented at 8:52 am on April 27, 2018:
    Why return reference?
  15. promag commented at 8:53 am on April 27, 2018: member
    Some comments.
  16. skeees commented at 2:49 pm on April 27, 2018: contributor
    Thoughts on pulling the TraceThread class out of util into the new files you’re creating here? Seems like a good home for tracethread and would reduce the clutter in util
  17. practicalswift commented at 4:18 pm on April 27, 2018: contributor
    Concept ACK
  18. jamesob force-pushed on Apr 27, 2018
  19. jamesob commented at 5:38 pm on April 27, 2018: member
    @promag thanks for the review; I’ve pushed changes incorporating your feedback. @skeees I like that idea but maybe we can save it for a future PR? That entails changing a bunch of call sites and I’m worried about introducing a circular dependency between util and threadnames (by way of logging).
  20. TheBlueMatt commented at 0:40 am on April 28, 2018: member
    How does the use of boost::thread::id interact with eventual boost::thread removal? IIRC there was something about no guarantees being made that you can interact with boost::thread if the thread was started by std, though it usually works in practice. We do have a HAVE_THREAD_LOCAL defined, which should be set everywhere but OSX, so maybe we could just turn it on there if boost::thread::id is an issue?
  21. in src/threadnames.h:50 in 8379557572 outdated
    45+    std::string GetProcessName();
    46+
    47+private:
    48+    std::mutex m_map_lock;
    49+    std::map<boost::thread::id, std::string> m_id_to_name;
    50+    std::map<std::string, boost::thread::id> m_name_to_id;
    


    conscott commented at 5:46 pm on April 30, 2018:
    Bit of a nit, but maybe m_name_to_id could instead be a map<string, int> - where it just stores the occurrences of that name/prefix, then you don’t have to call CountMapPrefix, but can just update the count each time.

    jamesob commented at 1:03 am on May 1, 2018:
    Thanks for the look and great suggestion. I think maintaining a map of counts will also allow me to remove all of the utility functions in the anonymous namespace - nice catch!
  22. conscott commented at 5:53 pm on April 30, 2018: contributor
    Concept ACK- just a comment on how the maps are used.
  23. jamesob force-pushed on May 1, 2018
  24. jamesob force-pushed on May 1, 2018
  25. jamesob commented at 6:39 pm on May 1, 2018: member

    @TheBlueMatt good points, thanks for looking. I’ve pushed changes which

    • only make use of boost::thread when thread_local is not available. When it is available, we assign each thread a thread_local-cached ID based on a static counter.
    • incorporate @conscott’s suggestion of using m_name_to_count which nicely cuts out a bunch of code.
    • add tests.

    (jamesob/threadnames.1 -> jamesob/threadnames.3)

    This changeset is ready for review-for-merge IMO.

  26. sipa commented at 6:41 pm on May 1, 2018: member
    @theuni What’s the status of thread_local on OSX? I remember seeing https://stackoverflow.com/questions/28094794/why-does-apple-clang-disallow-c11-thread-local-when-official-clang-supports, but which Xcode version’s patches does our clang run with, or…?
  27. jamesob force-pushed on May 1, 2018
  28. jamesob commented at 6:59 pm on May 1, 2018: member

    I’ve pushed a small change removing GetProcessName support for *BSD systems. After double-checking pthread_getname support for OpenBSD, I found that it doesn’t seem to exist. As a result, any threads which are not explicitly named with Rename() will be labeled "<unknown>" on BSD systems.

     0 $ diff -uw <(git diff master..threadnames.3) <(git diff master..threadnames.4)
     1
     2--- /proc/self/fd/12    2018-05-01 14:56:59.561313752 -0400
     3+++ /proc/self/fd/13    2018-05-01 14:56:59.549313568 -0400
     4@@ -2215,10 +2215,10 @@
     5      void reset();
     6 diff --git a/src/threadnames.cpp b/src/threadnames.cpp
     7 new file mode 100644
     8-index 0000000..305c3b4
     9+index 0000000..92e3f75
    10 --- /dev/null
    11 +++ b/src/threadnames.cpp
    12-@@ -0,0 +1,135 @@
    13+@@ -0,0 +1,133 @@
    14 +// Copyright (c) 2018 The Bitcoin Core developers
    15 +// Distributed under the MIT software license, see the accompanying
    16 +// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    17@@ -2345,8 +2345,6 @@
    18 +
    19 +#if defined(PR_GET_NAME)
    20 +    ::prctl(PR_GET_NAME, pthreadname_buff);
    21-+#elif (defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__DragonFly__))
    22-+    pthread_get_name_np(pthread_self(), pthreadname_buff);
    23 +#elif defined(MAC_OSX)
    24 +    pthread_getname_np(pthread_self(), pthreadname_buff, sizeof(threadname_buff));
    25 +#else
    
  29. jamesob force-pushed on May 1, 2018
  30. jamesob force-pushed on May 1, 2018
  31. jamesob force-pushed on May 1, 2018
  32. jamesob force-pushed on May 2, 2018
  33. jamesob force-pushed on May 2, 2018
  34. jamesob force-pushed on May 2, 2018
  35. jamesob force-pushed on May 2, 2018
  36. jamesob commented at 9:42 pm on May 2, 2018: member
    After spending the day figuring out that thread_local is apparently broken on mingw32 compilers, I’m now falling back to boost thread IDs for WIN32. Tests are passing and this is once again ready for review.
  37. jamesob force-pushed on May 3, 2018
  38. Introduce and use ThreadNameRegistry for thread name management
    Abstracts system thread name setting and tracks thread names in memory. Allows
    automatic thread numbering for reused names (e.g. http.0, http.1, ...).
    
    NB: as-written, this removes the `bitcoin-` prefix from child threads.
    
    Incorporates feedback from @conscott @TheBlueMatt @sipa.
    c3d398ac44
  39. Add thread names to log lines 7422741dcb
  40. Add thread names to deadlock debugging tools b09e25b0da
  41. jamesob force-pushed on May 3, 2018
  42. jamesob commented at 1:45 pm on May 3, 2018: member

    Per @sipa’s advice, I’m now using std::thread to determine thread IDs for all platforms instead of either boost or the ham-fisted thread_local code.

    jamesob/threadnames.11 -> jamesob/threadnames.13

  43. theuni commented at 4:25 pm on May 3, 2018: member
    If platform-specific functions are being used anyway, why not just use pthread_getspecific/pthread_setspecific, which are meant for exactly this? That also makes it a trivial transition when we’re finally ready to use thread_local.
  44. jamesob commented at 5:45 pm on May 3, 2018: member
    @theuni what are you talking about replacing at this point? We’re now platform agnostic with std::thread, and it seems to work fine with boost threads. Any reason you can see that it won’t work?
  45. theuni commented at 6:36 pm on May 3, 2018: member

    @jamesob See the top two commits here for a quick attempt: https://github.com/theuni/bitcoin/commits/thread_names

    That addresses a few things that make me hesitate about this PR:

    • system thread id/name are conflated with internal use. This drags in system requirements (name length, for ex) and overhead for internal functions.
    • Heavy map lookup with locking. #12970 demonstrated how painful overhead in logging can be.
    • Most can go away once we can use thread_local unconditionally.

    I realize that my changes are oversimplified, it’s just a quick POC for discussion.

  46. jamesob commented at 8:08 pm on May 3, 2018: member

    @theuni I think your simple implementation looks great. Very much agree with your concern re: map lookup and locking overhead, and I think there’s a way to combine our approaches to avoid it. Let me know if this sounds right:

    IMO lock acquisition during the single Rename() call per thread is acceptable, but we want to avoid any subsequent locking during GetName() (since it is called per logline). So: during Rename() we can use Registry’s count map for numeric suffix generation and then use your method for caching the name of the thread for use during GetName().

    If that sounds good to you, I’ll cherry-pick your two commits onto this branch and reorganize accordingly. Otherwise if you think the numeric suffix isn’t worthwhile, I’m happy to close this out and proceed on your commits only.

  47. theuni commented at 9:14 pm on May 3, 2018: member
    @jamesob Looking again, I think handling the suffix at this layer is unnecessary and adds a significant amount of complication. Why not just add a suffix before passing the name into TraceThread() ?
  48. jamesob commented at 9:20 pm on May 3, 2018: member
    @theuni ugh, you’re right, that’s dumb. Let’s go off of your commits.
  49. jamesob closed this on May 3, 2018

  50. theuni commented at 9:46 pm on May 3, 2018: member

    From IRC:

    <cfields> jamesob: noo! <cfields> jamesob: I hope you don’t think I’m being NIH about that PR, that wasn’t my intention at all. Coding up an alternative myself helps me to understand the challenges better, it wasn’t meant to replace your work.

  51. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me