large wallet: Bitcoind freezes

RomanBelkov commented at 2:28 pm on January 11, 2019: none

We are experiencing the following node failure on one of our big wallets (about 300k addresses) from time to time: the node stops performing any I/O operations, but the bitcoind process stays alive. The node does not recover on its own. We are not sure that constant reboots are a way to resolve this issue and are looking for help.

The issue occurs from time to time with no real steps to reproduce. We currently do not know how to reproduce the issue on our own.

Bitcoin Core 0.16.3 (bitcoin-0.16.3-x86_64-linux-gnu), grabbed from https://bitcoincore.org/bin/bitcoin-core-0.16.3/

Ubuntu 16.04 instance with 8 vCPU, 16 GB RAM and SSD. The machine also runs some other crypto wallets; however, the disk is not overloaded with activity and handles the load (or at least it seems so) just fine.

We are making quite a lot of RPC requests to this node: about 25 requests per second on average, mostly getrawtransaction and validateaddress. Is this an OK load for bitcoind? Are there any suggestions on how to tweak RPC or related node params? I haven’t seen any specific suggestions on the Internet except for “max them out and pray that your machine handles it”.

maflcko added the label Wallet on Jan 12, 2019

maflcko added the label RPC/REST/ZMQ on Jan 12, 2019

benthecarman commented at 6:15 am on January 12, 2019: contributor

Possibly related to #15015

promag commented at 6:59 pm on January 28, 2019: member

@RomanBelkov some questions:

Does the RPC interface stop responding?
I assume you have -txindex?
Have you tweaked -rpcthreads and -rpcworkqueue?
How big is the wallet? (transaction count, address count)

If it happens again please dump the process threads.

RomanBelkov commented at 12:12 pm on January 29, 2019: none

@promag

We receive timeouts (no answer for 60 seconds) for all RPC calls at some point in time. About 4-5 minutes after the first timed-out request, bitcoind stops all disk I/O.
No, actually we do not use txindex.
I have tweaked -rpcthreads in the range from 8 to 64, but it had no visible effect on the freezes. I have never tweaked -rpcworkqueue, because I failed to find any good advice on how to tune this parameter.
Can you please give some advice on how to track these values? I only have a rough estimate of 300,000 addresses.

Sorry if this is a newbie question, but how do you dump the process threads for bitcoind? If there is a special guide for fine tuning/debugging the node, can you please share it with me so I won’t ask any basic questions?

promag commented at 2:28 pm on January 29, 2019: member

bitcoind stops any disk I/O.

How do you check that? And what happens with new calls? Are they rejected, or do they also time out?

we do not use txindex

It doesn’t matter for validateaddress but calls to getrawtransaction are serialized, in other words, you don’t have concurrency there. And your wallet might be big enough to cause slow getrawtransactions.
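If calls serialize inside the node, sending many of them in parallel mostly lengthens the queue in front of the HTTP worker threads. One client-side mitigation is to cap the number of requests in flight. The following is a minimal sketch of that idea, not something proposed in this thread: `BoundedCaller`, `fake_rpc` and the limit of 2 are hypothetical names and values chosen for illustration, and `fake_rpc` stands in for a real RPC call so the example runs without a node.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedCaller:
    """Cap the number of RPC calls that are in flight at any moment."""

    def __init__(self, call, max_in_flight):
        self._call = call  # function that performs one RPC call
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def __call__(self, method, *params):
        with self._slots:  # blocks while every slot is taken
            return self._call(method, *params)


def fake_rpc(method, *params):
    """Hypothetical stand-in for a real RPC call (no node required)."""
    return f"{method}:{params[0]}"


# Eight client threads, but at most two calls reach the node at once.
limited = BoundedCaller(fake_rpc, max_in_flight=2)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda txid: limited("getrawtransaction", txid),
                            ["aa", "bb", "cc"]))
```

A sensible limit probably depends on -rpcthreads and -rpcworkqueue on the node side; the value here is arbitrary.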

how to track these values

You can call getwalletinfo to see transaction count. You can run dumpwallet to see how many keys you have.
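For reference, the transaction count can also be read over JSON-RPC. This is a rough sketch using only the Python standard library: the URL, credentials and helper names are placeholders (assumptions, not values from this thread), and `txcount` is the field of the getwalletinfo result that holds the number of wallet transactions.

```python
import base64
import json
import urllib.request


def build_rpc_request(url, user, password, method, params=()):
    """Build (but do not send) a JSON-RPC POST request for bitcoind."""
    body = json.dumps({"jsonrpc": "1.0", "id": "walletsize",
                       "method": method, "params": list(params)}).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"})


def call_rpc(request, timeout=60):
    """Send the request and return the "result" field of the reply."""
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)["result"]


def wallet_tx_count(url, user, password):
    """Return the wallet's transaction count as reported by getwalletinfo."""
    info = call_rpc(build_rpc_request(url, user, password, "getwalletinfo"))
    return info["txcount"]


# Placeholder endpoint and credentials; 8332 is the default mainnet RPC port.
request = build_rpc_request("http://127.0.0.1:8332/", "user", "pass",
                            "getwalletinfo")
```

Calling `wallet_tx_count("http://127.0.0.1:8332/", "user", "pass")` against a running node would return the same number that `bitcoin-cli getwalletinfo` shows as `txcount`.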

dump the process threads for bitcoind

Try ps -T -p <pid> to view process threads. You could also enable -debug and check debug.log.

I’d be surprised if your node is deadlocked, but it’s hard to tell from your feedback. Have you tried running a newer version?

RomanBelkov commented at 2:50 pm on January 30, 2019: none

How do you check that? And what happens with new calls? Are they rejected, or do they also time out?

I believe that we use some collectd monitoring for this node right now and it shows clearly that no I/O is performed by the process. If we check the debug log (we started to run it with -debug=1), the last messages would be something like

2019-01-25 13:55:33 ping timeout: 1200.021239s
2019-01-25 13:55:33 disconnecting peer=3
2019-01-25 14:40:03
-----a considerable amount of newlines-----
2019-01-25 14:40:03 Bitcoin Core version v0.16.3 (release build)

All the RPC requests are timed out as soon as we hit the freeze.
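One way to pin down when a freeze started is to look for the longest silence between consecutive debug.log entries. The sketch below assumes the `YYYY-MM-DD HH:MM:SS` timestamp prefix seen in the excerpt above; lines with a different prefix are skipped, so logs written in another timestamp format would need a different pattern. The helper name `largest_gap` is hypothetical.

```python
from datetime import datetime


def largest_gap(log_lines):
    """Return (gap_seconds, earlier_line, later_line) for the longest silence."""
    stamped = []
    for line in log_lines:
        try:
            when = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # blank line or a line without the expected timestamp
        stamped.append((when, line))
    best = (0.0, None, None)
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, before, after)
    return best


# Lines taken from the log excerpt above.
excerpt = [
    "2019-01-25 13:55:33 ping timeout: 1200.021239s",
    "2019-01-25 13:55:33 disconnecting peer=3",
    "",
    "2019-01-25 14:40:03 Bitcoin Core version v0.16.3 (release build)",
]
gap_seconds, last_before, first_after = largest_gap(excerpt)
```

For this excerpt the longest gap is the 44.5 minutes between the peer disconnect and the restart banner.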

Regarding the validateaddress/getrawtransaction – we’ve introduced the caches on our side to reduce the load on the node and it did not help.

Transaction count is 230160. I am not yet able to run dumpwallet on this wallet, for various reasons. However, I tried running the listaddressgroupings RPC method and the node froze. The node thread dump after the I/O stoppage:

  PID  SPID TTY          TIME CMD
 9603  9603 ?        00:00:33 bitcoind
 9603  9604 ?        00:00:28 bitcoin-scriptc
 9603  9605 ?        00:00:28 bitcoin-scriptc
 9603  9606 ?        00:00:28 bitcoin-scriptc
 9603  9607 ?        00:04:50 bitcoin-schedul
 9603  9608 ?        00:00:19 bitcoin-http
 9603  9609 ?        00:17:44 bitcoin-httpwor
 9603  9610 ?        00:14:01 bitcoin-httpwor
 9603  9611 ?        00:12:45 bitcoin-httpwor
 9603  9612 ?        00:14:01 bitcoin-httpwor
 9603  9613 ?        00:24:09 bitcoin-httpwor
 9603  9614 ?        00:15:27 bitcoin-httpwor
 9603  9615 ?        00:14:23 bitcoin-httpwor
 9603  9616 ?        00:14:58 bitcoin-httpwor
 9603  9617 ?        00:13:25 bitcoin-httpwor
 9603  9618 ?        00:13:34 bitcoin-httpwor
 9603  9619 ?        00:15:15 bitcoin-httpwor
 9603  9620 ?        00:13:24 bitcoin-httpwor
 9603  9621 ?        00:15:10 bitcoin-httpwor
 9603  9622 ?        00:14:34 bitcoin-httpwor
 9603  9623 ?        00:15:52 bitcoin-httpwor
 9603  9624 ?        00:13:47 bitcoin-httpwor
 9603  9836 ?        00:00:39 bitcoind
 9603 10578 ?        00:00:00 bitcoin-torcont
 9603 10585 ?        00:00:18 bitcoin-net
 9603 10587 ?        00:00:00 bitcoin-addcon
 9603 10588 ?        00:00:01 bitcoin-opencon
 9603 10589 ?        00:01:38 bitcoin-msghand
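A listing like this is easier to compare across freezes when the CPU time is summed per thread name. The following is a small sketch under the assumption that the input has the five columns printed by `ps -T -p <pid>` (PID, SPID, TTY, TIME, CMD); the function names are hypothetical and the sample rows are copied from the dump above. Note that TIME is cumulative CPU time, so it shows where the work went, not whether a thread is currently blocked.

```python
from collections import defaultdict


def cpu_seconds(stamp):
    """Convert a ps TIME value ([DD-]HH:MM:SS) to seconds."""
    days, _, clock = stamp.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_by_thread_name(ps_output):
    """Sum CPU time per thread name from `ps -T -p <pid>` output."""
    totals = defaultdict(int)
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a thread row
        totals[fields[4]] += cpu_seconds(fields[3])
    return dict(totals)


sample = """\
  PID  SPID TTY          TIME CMD
 9603  9603 ?        00:00:33 bitcoind
 9603  9607 ?        00:04:50 bitcoin-schedul
 9603  9609 ?        00:17:44 bitcoin-httpwor
 9603  9610 ?        00:14:01 bitcoin-httpwor
"""
totals = cpu_by_thread_name(sample)
```

In these sample rows the HTTP worker threads account for most of the accumulated CPU time.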

RomanBelkov commented at 10:59 am on January 31, 2019: none

@promag Here is also the thread dump from when the node freezes 'naturally':

  PID  SPID TTY          TIME CMD
 1858  1858 ?        00:00:38 bitcoind
 1858  1859 ?        00:00:44 bitcoin-scriptc
 1858  1860 ?        00:00:44 bitcoin-scriptc
 1858  1861 ?        00:00:44 bitcoin-scriptc
 1858  1862 ?        00:07:16 bitcoin-schedul
 1858  1863 ?        00:00:40 bitcoin-http
 1858  1864 ?        00:30:45 bitcoin-httpwor
 1858  1865 ?        00:28:41 bitcoin-httpwor
 1858  1866 ?        00:31:43 bitcoin-httpwor
 1858  1867 ?        00:30:00 bitcoin-httpwor
 1858  1868 ?        00:35:15 bitcoin-httpwor
 1858  1869 ?        00:31:24 bitcoin-httpwor
 1858  1870 ?        00:33:17 bitcoin-httpwor
 1858  1871 ?        00:38:23 bitcoin-httpwor
 1858  1872 ?        00:35:16 bitcoin-httpwor
 1858  1873 ?        00:34:52 bitcoin-httpwor
 1858  1874 ?        00:30:29 bitcoin-httpwor
 1858  1875 ?        00:32:15 bitcoin-httpwor
 1858  1876 ?        00:36:47 bitcoin-httpwor
 1858  1877 ?        00:33:09 bitcoin-httpwor
 1858  1878 ?        00:28:32 bitcoin-httpwor
 1858  1879 ?        00:30:34 bitcoin-httpwor
 1858  2159 ?        00:00:00 bitcoind
 1858  3907 ?        00:00:00 bitcoin-torcont
 1858  3911 ?        00:00:46 bitcoin-net
 1858  3913 ?        00:00:00 bitcoin-addcon
 1858  3914 ?        00:00:02 bitcoin-opencon
 1858  3915 ?        00:03:30 bitcoin-msghand

I have also found out that calling listaddressgroupings during our usual node load always causes the node to freeze.

Have you tried running a newer version?

No, we have not, unfortunately.

cryptozeny referenced this in commit b6f0a9d6c1 on Feb 8, 2019

cryptozeny referenced this in commit 2a96f22c92 on Feb 8, 2019

cryptozeny referenced this in commit 50403a93c6 on Feb 8, 2019

cryptozeny referenced this in commit ea78562840 on Feb 8, 2019

cryptozeny referenced this in commit fff78b9b6f on Feb 8, 2019

cryptozeny referenced this in commit 72436c90b2 on Feb 8, 2019

RomanBelkov commented at 9:50 am on February 28, 2019: none

@promag what would be the next logical step to overcome the issue? Trying to upgrade to 0.17.x?

cryptozeny commented at 12:50 pm on February 28, 2019: none

Same issue on a testnet mining pool. It has large txs and qt stops when it's updating balances…

RomanBelkov commented at 10:36 am on June 26, 2019: none

Hello, just in case someone wondered: at first, I upgraded the node to 0.17 and it did not help. I ended up cleaning the wallet on this node (removed the change addresses) and was able to reduce the number of freezes. They still occur from time to time, however.

If someone has any performance tuning suggestions/links/comments, I would be very glad to receive them.

maflcko renamed this:
~~Bitcoind freezes~~
large wallet: Bitcoind freezes
on Jun 26, 2019

promag commented at 10:40 pm on February 6, 2020: member

Same issue on a testnet mining pool. It has large txs and qt stops when it's updating balances…

There are some open pulls to improve bitcoin-qt in that regard. But @RomanBelkov is not referring to bitcoin-qt, I think? @RomanBelkov to be clear, the freeze happens during some RPC? Are you able to reproduce while running in the debugger? The above debug.log doesn’t help at all.
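Short of running the node under a debugger from the start, a stack trace of every thread can be captured from an already frozen process by attaching gdb in batch mode. This is only a sketch: it assumes gdb is installed and the caller is allowed to attach to the process, the helper names are hypothetical, and a release build without debug symbols will show limited frame information.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Command line that attaches gdb, prints every thread's stack, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid, timeout=120):
    """Run the command and return gdb's text output."""
    done = subprocess.run(gdb_backtrace_command(pid), capture_output=True,
                          text=True, timeout=timeout)
    return done.stdout


# 9603 is the bitcoind PID from the first thread dump in this thread.
command = gdb_backtrace_command(9603)
```

If several threads turn out to be waiting on the same lock, that would point to contention or a deadlock rather than to slow disk I/O.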

RomanBelkov commented at 10:33 am on February 19, 2020: none

@promag I refer to bitcoind only. Yes, the freezes did happen after some of the RPCs. We did not run bitcoind in the debugger, unfortunately. After cleaning the wallet I have been getting a decent run since June 2019, with freezes occurring approximately once or twice a month.

willcl-ark commented at 9:57 am on April 19, 2023: member

@RomanBelkov is this issue still present on the current release of Bitcoin Core (v24.0.1), as there have been many wallet improvements in the years since this issue was opened?

RomanBelkov commented at 10:02 am on April 19, 2023: none

@willcl-ark unfortunately I’m unable to comment on the issue as I haven’t worked with Bitcoin Core for quite a while. I guess the issue can be closed as there have been no developments or reports from other people for 3 years :)

RomanBelkov closed this on Apr 19, 2023

bitcoin locked this on Apr 18, 2024

large wallet: Bitcoind freezes #15148