Merely compiling Bitcoin Core with thread_local seems to correlate with massively degraded performance of both the RPC and P2P interfaces on low-end machines. See for example:
- #18538 (comment)
- Maybe #14669
- Maybe #17247
If thread_local really is the source of the performance degradation, it should be disabled by default and enabled only by --enable-debug (for use in lock order debugging).
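To make the proposal concrete, here is a minimal sketch of what gating thread_local behind a debug-only define could look like. The macro `EXAMPLE_DEBUG_THREAD_LOCAL` and the `Example*` functions are hypothetical names made up for illustration; they are not existing Bitcoin Core identifiers, and the sketch assumes the build system would define the macro only when --enable-debug is passed.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical macro, for illustration only. Assumption: the build system
// would define it only for --enable-debug builds.
#ifdef EXAMPLE_DEBUG_THREAD_LOCAL

// Debug builds: each thread stores its own name in thread-local storage,
// e.g. so lock order diagnostics can report which thread took a lock.
static thread_local std::string g_example_thread_name;

void ExampleSetThreadName(std::string name) { g_example_thread_name = std::move(name); }
std::string ExampleGetThreadName() { return g_example_thread_name; }
constexpr bool EXAMPLE_HAS_THREAD_LOCAL = true;

#else

// Default builds: no thread_local variable is compiled in at all.
// Setting a name is a no-op and the getter returns an empty string.
void ExampleSetThreadName(std::string) {}
std::string ExampleGetThreadName() { return ""; }
constexpr bool EXAMPLE_HAS_THREAD_LOCAL = false;

#endif
```

With this shape, callers would not need their own `#ifdef`s: they call `ExampleSetThreadName("net")` unconditionally, and only debug builds pay whatever cost thread_local turns out to have. Whether that cost actually explains the slowdowns linked above is still unconfirmed.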