While trying to debug the apparent deadlock revealed in #12873, I realized it’d be nice to have the POTENTIAL DEADLOCK DETECTED
message display which thread acquired each lock. I also think it’d be generally useful to have log lines display the name of the originating thread.
This changeset does both of those things by introducing a class which manages thread naming, ThreadNameRegistry. The class abstracts process-control calls responsible for thread naming and provides automatic number suffixing to threads which use the same name (e.g. httpworker.0, httpworker.1, …).
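To illustrate the suffixing idea, here is a minimal sketch of a registry that hands out numbered names. It is not the class from this changeset: the name ThreadNameRegistrySketch, the Rename/GetName methods, and the `numbered` flag are all hypothetical, and the platform call that would actually rename the thread is left out.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical sketch only; names and interface are assumptions,
// not the ThreadNameRegistry introduced by this changeset.
class ThreadNameRegistrySketch
{
public:
    // Register the calling thread under `basename`. When `numbered` is true,
    // append ".N", where N counts earlier numbered registrations of the
    // same basename (starting at 0).
    std::string Rename(const std::string& basename, bool numbered = false)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::string name = basename;
        if (numbered) {
            name += "." + std::to_string(m_next_suffix[basename]++);
        }
        m_names[std::this_thread::get_id()] = name;
        // A real implementation would also make the platform-specific
        // thread-rename call here.
        return name;
    }

    // Return the calling thread's registered name, or "" if it has none.
    std::string GetName() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_names.find(std::this_thread::get_id());
        return it == m_names.end() ? "" : it->second;
    }

private:
    mutable std::mutex m_mutex;                      // guards both maps
    std::map<std::thread::id, std::string> m_names;  // thread id -> name
    std::map<std::string, int> m_next_suffix;        // basename -> next suffix
};
```

Keeping the names in a mutex-guarded map keyed by thread id means a logger can look up the current thread's name without thread_local storage or a system call.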
With this patch, thread names look like this
$ pstree -p `pgrep bitcoind`

bitcoind(3415)─┬─{addcon}(3465)
               ├─{httpworker.0}(3454)
               ├─{httpworker.1}(3455)
               ├─{httpworker.2}(3456)
               ├─{httpworker.3}(3457)
               ├─{http}(3453)
               ├─{msghand}(3467)
               ├─{net}(3463)
               ├─{opencon}(3466)
               ├─{scheduler}(3452)
               ├─{scriptch.0}(3445)
               ├─{scriptch.1}(3446)
               ├─{scriptch.2}(3447)
               ├─{scriptch.3}(3448)
               ├─{scriptch.4}(3449)
               ├─{scriptch.5}(3450)
               ├─{scriptch.6}(3451)
               └─{torcontrol}(3462)
and the debug log looks like this
2018-04-26T21:54:23Z [bitcoind] init message: Loading wallet...
2018-04-26T21:54:24Z [bitcoind] mapBlockIndex.size() = 1
2018-04-26T21:54:24Z [bitcoind] nBestHeight = 0
2018-04-26T21:54:24Z [loadblk] Imported mempool transactions from disk: 0 succeeded, 0 failed, 0 expired, 0 already there
2018-04-26T21:54:24Z [torcontrol] torcontrol thread start
2018-04-26T21:54:24Z [bitcoind] Bound to [::]:18444
2018-04-26T21:54:24Z [torcontrol] tor: Error connecting to Tor control socket
2018-04-26T21:54:24Z [torcontrol] tor: Not connected to Tor control port 127.0.0.1:9051, trying to reconnect
2018-04-26T21:54:24Z [bitcoind] Bound to 0.0.0.0:18444
2018-04-26T21:54:24Z [bitcoind] init message: Loading P2P addresses...
2018-04-26T21:54:24Z [bitcoind] Loaded 0 addresses from peers.dat 0ms
2018-04-26T21:54:24Z [dnsseed] dnsseed thread start
2018-04-26T21:54:24Z [bitcoind] init message: Done loading
2018-04-26T21:54:24Z [addcon] addcon thread start
2018-04-26T21:54:24Z [dnsseed] Loading addresses from DNS seeds (could take a while)
2018-04-26T21:54:24Z [net] net thread start
2018-04-26T21:54:24Z [dnsseed] 0 addresses found from DNS seeds
2018-04-26T21:54:24Z [dnsseed] dnsseed thread exit
2018-04-26T21:54:24Z [msghand] msghand thread start
2018-04-26T21:54:24Z [opencon] opencon thread start
2018-04-26T21:54:25Z [torcontrol] tor: Error connecting to Tor control socket
2018-04-26T21:54:25Z [torcontrol] tor: Not connected to Tor control port 127.0.0.1:9051, trying to reconnect
Note that child thread names have changed; I’m no longer using the bitcoin- prefix. Because we’re limited to 16 characters for a thread name (on Linux, anyway), that prefix was causing the numeric suffix on some names to be cut off. If it’s desirable to keep the prefix, I can revert this change.
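To make the truncation concrete, here is a small sketch of what remains visible once a name is cut to the Linux limit. On Linux the 16 bytes include the terminating null, so 15 characters survive. VisibleThreadName is a hypothetical helper, and the sketch assumes silent truncation, as with prctl(PR_SET_NAME).

```cpp
#include <cstddef>
#include <string>

// Linux stores a thread name in 16 bytes including the terminating null,
// so at most 15 characters are visible.
constexpr std::size_t MAX_VISIBLE_THREAD_NAME = 15;

// Hypothetical helper: the part of `name` that survives, assuming the
// platform silently truncates over-long names.
std::string VisibleThreadName(const std::string& name)
{
    return name.substr(0, MAX_VISIBLE_THREAD_NAME);
}
```

With the prefix, "bitcoin-httpworker.0" comes out as "bitcoin-httpwor", so every worker looks identical; without it, "httpworker.0" fits unchanged.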
Implementation considerations
A basic version of this change could be done pretty trivially with something like
thread_local std::string threadname;
but per @theuni (https://github.com/bitcoin/bitcoin/pull/11722#pullrequestreview-79322658), we can’t rely on having thread_local. Also note that with a basic implementation like that, we wouldn’t be able to do the automatic numbering scheme to differentiate between threads with the same basename (e.g. httpworker, scriptch).
We could also rely solely on thread-related system calls. I don’t like this much either because of how poorly defined or unavailable getname is on some platforms, e.g. OS X, FreeBSD; not to mention the 16 character limit.
Tests
If this gets a Concept ACK or two, I’ll implement some. Unittests attached.