RPC sockets get stuck in CLOSE_WAIT state

espringe commented at 10:16 PM on March 27, 2014: none

When RPC requests get interrupted, the sockets can end up stuck in CLOSE_WAIT state. After 3 are in this state, bitcoind is unable to respond to RPC requests. This happens with 0.8.6 and 0.9.0, and it lasts far longer than rpctimeout would suggest.

$ lsof -p `pidof bitcoind` | grep 8332
bitcoind 872 bitcoin   23u  IPv6  10179      0t0     TCP *:8332 (LISTEN)
bitcoind 872 bitcoin   30u  IPv6  11153      0t0     TCP 107.101.212.239:8332->customer-GDL-125-2.megared.net.mx:61668 (CLOSE_WAIT)
bitcoind 872 bitcoin   38u  IPv6  11243      0t0     TCP 107.101.212.239:8332->customer-GDL-125-2.megared.net.mx:61670 (CLOSE_WAIT)

Googling this issue, it seems lots of people have been bitten by it -- without any real solutions
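
To keep an eye on this over time, the lsof output above can be tallied by socket state. The following is only a minimal sketch: the helper name `count_tcp_states` is hypothetical, and it assumes the lsof column layout shown in this report (address in the second-to-last field, state in parentheses as the last field).

```python
from collections import Counter


def count_tcp_states(lsof_output, port=8332):
    """Count TCP socket states for sockets whose local port matches `port`.

    Assumes lsof lines shaped like the ones in this report:
    ... TCP <local>[-><remote>] (<STATE>)
    """
    states = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or "TCP" not in fields:
            continue
        address, state = fields[-2], fields[-1]
        # The local endpoint is the part before "->" (or the whole field for LISTEN).
        local = address.split("->")[0]
        if not local.endswith(f":{port}"):
            continue
        if state.startswith("(") and state.endswith(")"):
            states[state.strip("()")] += 1
    return states


# Sample taken from the lsof output quoted in this report.
sample = """\
bitcoind 872 bitcoin   23u  IPv6  10179      0t0     TCP *:8332 (LISTEN)
bitcoind 872 bitcoin   30u  IPv6  11153      0t0     TCP 107.101.212.239:8332->customer-GDL-125-2.megared.net.mx:61668 (CLOSE_WAIT)
bitcoind 872 bitcoin   38u  IPv6  11243      0t0     TCP 107.101.212.239:8332->customer-GDL-125-2.megared.net.mx:61670 (CLOSE_WAIT)
"""

states = count_tcp_states(sample)
print(dict(states))
```

Running something like this periodically against live lsof output would show whether the CLOSE_WAIT count keeps growing or clears on its own.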

laanwj added the label Bug on Mar 28, 2014

laanwj added the label Priority Medium on Mar 28, 2014

laanwj commented at 10:53 AM on March 28, 2014: member

So it appears that there is a leak of file descriptors somewhere in the RPC code?

What version of boost?

espringe commented at 4:22 PM on March 28, 2014: none

1.53.0

laanwj commented at 6:46 AM on March 31, 2014: member

Are you using SSL?

espringe commented at 4:12 PM on March 31, 2014: none

Nope

laanwj commented at 6:46 AM on April 1, 2014: member

Maybe doing anything else that may make your RPC usage behaviour different from others?

What do you mean by RPC requests 'getting interrupted'?

espringe commented at 1:07 PM on April 1, 2014: none

As in, if a program crashes or the network is disconnected while a client is making an RPC call. Only tested for remote connections.

gavinandresen commented at 1:15 PM on April 1, 2014: contributor

This would be a good candidate for a regression test in qa/rpc-tests. You should be able to reproduce with kill -9 while servicing requests...

ajweiss commented at 7:03 PM on May 6, 2014: contributor

Hmm... So I attempted to reproduce this one in 0.9.0 on Linux/x64 with boost 1.53 by writing a script that banged on the RPC server pretty heavily with random SIGKILLs to interrupt the requests. I got lots of CLOSE_WAITS, but they all seemed to clean themselves up and the server never stopped servicing requests. However, it would get pretty backed up response time wise when I would bang really hard. I'm starting to think that perhaps the server is getting wedged in a way that is specific to your chosen RPC requests.
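
For context on why interrupted clients produce CLOSE_WAIT at all: a socket sits in that state after the peer has closed its end and until the local process calls close() on its own end. The sketch below is not the test script described above and does not touch bitcoind; it uses a throwaway loopback listener (the function name `peer_close_demo` is hypothetical) to show a client vanishing mid-request and the server side observing end-of-stream.

```python
import socket


def peer_close_demo():
    """Show how a server sees a client that disappears mid-request."""
    # Ephemeral loopback port; a stand-in for the RPC port, not bitcoind itself.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # The client sends only part of a request and then goes away.
    # A killed process has its sockets closed by the kernel in a similar way.
    client.sendall(b"POST / HTTP/1.1\r\n")
    client.close()

    server_side.settimeout(5)
    received = b""
    while True:
        chunk = server_side.recv(4096)
        if not chunk:
            # Empty read: the peer has closed its end. From here until
            # server_side.close() runs, this socket is in CLOSE_WAIT.
            break
        received += chunk

    # Closing promptly is what lets the socket leave CLOSE_WAIT.
    server_side.close()
    listener.close()
    return received


print(peer_close_demo())
```

If a server never reaches the equivalent of `server_side.close()` for such a connection, the descriptor stays open and shows up in lsof as CLOSE_WAIT.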

Can you provide some more detail on what your RPC request activity looks like? What functions are you calling and with what frequency are you calling them? Even better, would you be willing to run with "-debug=rpc" and attach the debug.log file from the data directory?

Incidentally, rpctimeout was deprecated in 0.7.0 when multithreaded JSON-RPC was implemented. The manpage for bitcoind erroneously still mentions it.

laanwj added the label RPC on May 9, 2014

laanwj commented at 7:52 AM on August 19, 2014: member

Please do not +1 issues. Only post if you have something substantial to add.

rubensayshi commented at 6:05 PM on March 18, 2015: contributor

I was experiencing similar issues: when hammering the RPC with requests as fast as possible (from a single-threaded client process), it would at some point get stuck and hit the client-side timeout (60s). After a while it gets unstuck and I can do a big batch of requests again.

Client process is PHP CURL (or HHVM, same result)
Also experienced the same issue when hammering the 0.10 rest interface.
Could not see anything suspicious in the lsof, so I'm not sure if it's the same issue.
Nothing special in the debug.log, with -debug or debug=rpc.

I could not reproduce it with a simple while [ True ]; do bitcoin-cli -testnet getrawtransaction 1030c244faabd45f6750d7c7254fa1aa87158deb900a329ef50a6dcb3aa0228e; done, nor with while [ True ]; do curl http://localhost:18332/rest/tx/1030c244faabd45f6750d7c7254fa1aa87158deb900a329ef50a6dcb3aa0228e.hex; done;

I managed to fix the issue by adding a Connection: close header to the client CURL request. After that I can hammer it for hours without interruptions.
Or alternatively reuse the connection handler. @espringe what are you using to do the requests and can you reproduce the RPC getting stuck by doing an infinite loop for getrawtransaction 1030c244faabd45f6750d7c7254fa1aa87158deb900a329ef50a6dcb3aa0228e?
If so, can you try adding the Connection: close header and see if that fixes it for you?
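
For anyone not using PHP, here is a rough Python sketch of the same workaround: a JSON-RPC request that explicitly sends Connection: close. The helper names, the placeholder credentials, and the default host and port (testnet, as in the examples above) are assumptions for illustration, not anything taken from bitcoind.

```python
import base64
import http.client
import json


def build_rpc_request(method, params, user, password):
    """Build headers and body for a JSON-RPC call that disables keep-alive."""
    body = json.dumps(
        {"jsonrpc": "1.0", "id": "example", "method": method, "params": params}
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
        # Ask the server to close the connection after this response.
        "Connection": "close",
    }
    return headers, body


def call_rpc(method, params, host="127.0.0.1", port=18332,
             user="rpcuser", password="rpcpassword"):
    """Send one request on a fresh connection and close it afterwards.

    host, port, user and password are placeholders; adjust to your setup.
    """
    headers, body = build_rpc_request(method, params, user, password)
    conn = http.client.HTTPConnection(host, port, timeout=60)
    try:
        conn.request("POST", "/", body=body, headers=headers)
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

A call would look like `call_rpc("getrawtransaction", ["<txid>"])` against a running node. The alternative mentioned above, reusing a single connection for all requests, avoids the reconnect cost but depends on keep-alive behaving well on the server side.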

sipa commented at 10:15 AM on April 11, 2015: member

Since 0.10, rpc keepalive is optional and off by default (in 0.8 and 0.9 it was always on). Does that resolve the issue?

mcelrath commented at 3:05 PM on October 29, 2015: none

This is likely the same as #6454 and fixed by #5677

laanwj closed this on Nov 11, 2015

DrahtBot locked this on Sep 8, 2021

RPC sockets get stuck in CLOSE_WAIT state #3968