Please describe the feature you’d like to see added.
I’d like some way to exert a bit more control over the levelDB compaction processes. Specifically, if I could control when compactions are scheduled (hourly in the background, for example), and possibly limit the resources they consume (I/O, in particular), I think it would help.
Is your feature related to a problem, if so please describe it.
The problem I’m having is that when I reboot my bitcoin-core node (which runs with txindex=1), the RPC listener comes up pretty promptly, so my healthchecks (currently just TCP) pass. However, the service is not in its usual baseline state; it is doing a lot of read I/O, and logging about levelDB compaction. This condition lasts for about an hour.
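For context on the healthcheck gap, here is roughly the deeper check I'm considering instead of the plain TCP probe: time a cheap RPC and only report healthy if it answers within a budget. This is just a sketch; the URL, credentials, method choice and threshold are placeholders for my setup, not anything bitcoin-core specific.

```python
import base64
import json
import time
import urllib.request

# Placeholder endpoint/credentials for my node -- substitute your own.
RPC_URL = "http://127.0.0.1:8332"
RPC_AUTH = base64.b64encode(b"user:password").decode()

def rpc_healthy(method="getblockcount", budget_s=0.5):
    """Healthy only if a cheap RPC answers within `budget_s` seconds.

    A plain TCP check passes as soon as the listener is up, even while the
    node is still busy with post-restart compaction; timing the RPC call
    catches that degraded state.
    """
    payload = json.dumps({"jsonrpc": "1.0", "id": "health",
                          "method": method, "params": []}).encode()
    req = urllib.request.Request(RPC_URL, data=payload, headers={
        "Content-Type": "application/json",
        "Authorization": "Basic " + RPC_AUTH,
    })
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=budget_s) as resp:
            json.load(resp)
    except Exception:
        return False
    return (time.monotonic() - start) <= budget_s
```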
Here is an illustration of the I/O level relative to baseline. The left side is the restart, and the lower right side is after this abates. Dark blue is read:
Here’s a summary of the logs at this time:
The “Compacting” log line is extremely elevated during this period (though it does still occur, at a much lower rate, afterwards):
Here’s a graph of RPC trace P50 duration before, during, and after this phase:
The real problem is the last graph, the elevated RPC latency. The tail and head latencies are also much worse than normal, so it’s not just a tail latency issue I could solve with timeouts / hedging. The server goes from microsecond/millisecond latency to tens of seconds, especially for sendrawtransaction (10-30s max), with listunspent a distant second (3-4s max).
This latency abates as soon as the high I/O and compaction logging stops. I am therefore making the intuitive leap (so experts, please consider this critically and I welcome other explanations) that resource utilization during compaction is causing some RPCs to be very slow. I also considered lock contention (perhaps cs_main) but I couldn’t see it in the code.
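For what it's worth, this is roughly how I'm lining up the two signals (RPC latency versus "Compacting" log volume). It's only a sketch that shells out to bitcoin-cli and tails debug.log under my datadir, so the paths and sampling interval are specific to my setup.

```python
import os
import subprocess
import time

# Sample the latency of a cheap RPC once a minute and count new "Compacting"
# lines appended to debug.log over the same window.
DATADIR = "/home/bitcoin/data"
DEBUG_LOG = os.path.join(DATADIR, "debug.log")

def timed_rpc(method="getblockcount"):
    start = time.monotonic()
    subprocess.run(["bitcoin-cli", f"-datadir={DATADIR}", method],
                   check=True, capture_output=True)
    return time.monotonic() - start

def new_compacting_lines(offset):
    """Count 'Compacting' occurrences appended past byte `offset`."""
    with open(DEBUG_LOG, "rb") as f:
        f.seek(offset)
        chunk = f.read()
        return chunk.count(b"Compacting"), f.tell()

offset = os.path.getsize(DEBUG_LOG)   # only count lines written from now on
while True:
    latency = timed_rpc()
    count, offset = new_compacting_lines(offset)
    print(f"rpc_latency={latency * 1000:.1f}ms compacting_lines={count}")
    time.sleep(60)
```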
Describe the solution you’d like
I’d like the experts to recommend a solution. Intuitively, it seems like levelDB could amortize this compaction work during normal operation as a background task (the README seems to imply it already should?). Or maybe some way to limit resources used for compaction?
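To make the "limit resources" half of the ask concrete: as far as I can tell, stock levelDB exposes sizing knobs (write buffer, block cache, open files, bloom filters) that influence how much compaction work piles up, but no direct cap on compaction I/O rate. The sketch below uses the plyvel Python binding purely to illustrate which knobs exist today; it is not how bitcoin-core configures its embedded levelDB.

```python
import plyvel

# Standalone illustration (via the plyvel binding) of the tuning knobs stock
# LevelDB exposes today.  These shape how much compaction work accumulates,
# but none of them rate-limits the compaction I/O itself.
db = plyvel.DB(
    "/tmp/example-leveldb",
    create_if_missing=True,
    write_buffer_size=64 * 1024 * 1024,   # larger memtables -> fewer, bigger flushes
    max_open_files=1000,                  # table file handles kept open
    lru_cache_size=256 * 1024 * 1024,     # block cache; helps reads, not compaction
    bloom_filter_bits=10,                 # cuts read amplification during lookups
)
db.close()
```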
Describe any alternatives you’ve considered
Currently, I’m looking at alternative ways to do the RPC I enabled txindex for, which is getrawtransaction without the blockhash. But, that will only sidestep this issue with compaction and RPC latency.
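For completeness, the shape of that alternative: getrawtransaction can still be served without txindex=1 if the caller supplies the containing block hash as the third argument. The open question is where that block hash comes from (an external index of some kind); the snippet below only sketches the RPC call itself, via bitcoin-cli against my datadir.

```python
import json
import subprocess

DATADIR = "/home/bitcoin/data"

def cli(*args):
    """Thin wrapper around bitcoin-cli against my datadir."""
    result = subprocess.run(["bitcoin-cli", f"-datadir={DATADIR}", *args],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()

def get_tx_without_txindex(txid, blockhash):
    # With an explicit block hash, the node reads the transaction out of that
    # block directly, so txindex=1 is not required.
    return json.loads(cli("getrawtransaction", txid, "true", blockhash))
```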
Please leave any additional context
I am using a slower filesystem than most. It is a regionally-replicated NFS store, which we chose for resiliency reasons. Intuitively, I’d expect this problem to be less severe (or shorter duration) with lower-latency storage, but still present.
Command line args:
txindex="1", rpcworkqueue="1024", rpc_*="redacted", debug="coindb", debug="estimatefee", debug="reindex", debug="leveldb", debug="walletdb", debug="lock", debug="rpc", dbcache="5734", datadir="/home/bitcoin/data", chain="main"