setpill
commented at 11:48 am on August 8, 2019:
contributor
bitcoind can take a long time to flush its db cache to disk upon
shutdown. Systemd sends a SIGKILL after a timeout, causing unclean
shutdowns and triggering a long “Rolling forward” at the next startup.
Disabling the timeout should prevent this from happening, and does not
break systemd’s restart logic.
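As a rough sketch of the change described above, assuming it is made in the `[Service]` section of the systemd unit shipped with bitcoind (the exact file is not shown in this thread), disabling the stop timeout could look like this:

```ini
[Service]
# Assumption: this line goes into the existing [Service] section of the bitcoind unit.
# "infinity" disables the stop timeout, so systemd keeps waiting for bitcoind
# to finish flushing its db cache instead of sending SIGKILL.
TimeoutStopSec=infinity
```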
fanquake added the label
Scripts and tools
on Aug 8, 2019
MarcoFalke assigned dongcarl
on Aug 8, 2019
DrahtBot
commented at 12:52 pm on August 8, 2019:
member
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#15268 (doc: suggest using timeoutstopsec in systemd file during IBD by d3spwn)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
dongcarl
commented at 3:55 pm on August 8, 2019:
member
> …it configures the time to wait for the service itself to stop. If it doesn’t terminate in the specified time, it will be forcibly terminated by SIGKILL (see KillMode= in systemd.kill(5)).
> Takes a unit-less value in seconds, or a time span value such as “5min 20s”. Pass “infinity” to disable the timeout logic. Defaults to DefaultTimeoutStopSec= from the manager configuration file (see systemd-system.conf(5)).
I believe it’s warranted for us to set our own TimeoutStopSec= instead of relying on the fallback to DefaultTimeoutStopSec= (which has a default of 90 seconds); however, infinity is perhaps overkill. I believe 10 minutes is probably enough for most systems on which we expect Bitcoin Core to operate healthily, but that’s just my speculation, and others can/should chime in.
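For illustration, a 10-minute limit could be written in either of the two forms the man page excerpt mentions. This is only a sketch of the suggestion, not the final patch:

```ini
[Service]
# Unit-less value, interpreted as seconds (600 s = 10 minutes).
TimeoutStopSec=600
# Equivalent time span form; use one or the other, not both:
# TimeoutStopSec=10min
```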
setpill
commented at 12:10 pm on August 9, 2019:
contributor
I have recently encountered a situation on a system with decent specs (4 cores, 16 GB ram, 15 GB db cache, blocksdir on HDD, datadir on SSD) where the clean shutdown took almost 9 minutes. 10 minutes seems like it would be too close for comfort, especially considering the myriad of lower-end hardware that might run bitcoind. I have not attempted to analyse the bounds, but in a situation where the shutdown would hang truly indefinitely manual intervention would be required anyway, I assume, to inspect the error before continuing to use the software.
Is there a potential scenario in which the timeout of infinity would function as a regression compared to the status quo?
instagibbs
commented at 1:10 pm on August 12, 2019:
member
@setpill if the shutdown gets wedged somehow, would infinity cause this to never recover? Unclean shutdown at some point may be preferable.
setpill
commented at 9:51 am on August 16, 2019:
contributor
@instagibbs It is true that it will never automatically recover if it gets wedged; but, like I said, if that happens one probably wants to inspect manually what is going on and whether the node is still safe to use, and perhaps file a bug report.
The systemd service file can also be overridden by admins to have a shutdown timeout in case it is needed. IMHO the default one provided with the program should err on the side of caution.
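A minimal sketch of such an admin override, assuming the unit is installed as `bitcoind.service` and using a purely hypothetical 30-minute value. Running `systemctl edit bitcoind.service` creates a drop-in file of this kind:

```ini
# /etc/systemd/system/bitcoind.service.d/override.conf
# Assumptions: the unit is named bitcoind.service; the 30min value is only an example.
[Service]
# Overrides whatever TimeoutStopSec= the packaged unit file sets.
TimeoutStopSec=30min
```

If the drop-in is written by hand rather than through `systemctl edit`, `systemctl daemon-reload` is needed before systemd picks it up.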
luke-jr
commented at 6:59 pm on August 23, 2019:
member
> I have recently encountered a situation on a system with decent specs (4 cores, 16 GB ram, 15 GB db cache, blocksdir on HDD, datadir on SSD) where the clean shutdown took almost 9 minutes.
What caused this? It certainly doesn’t look normal.
IMO some timeout should be kept.
> if that happens one probably wants to inspect manually what is going on
Not possible during shutdown…?
dongcarl
commented at 5:43 pm on September 5, 2019:
member
So perhaps what @setpill is saying is that for those who enabled the systemd debug shell (sudo systemctl enable --now debug-shell.service), even when the service is stuck, they can debug using a root shell on TTY9?
setpill
commented at 1:33 pm on September 6, 2019:
contributor
My opinion is that if Bitcoin Core truly does hang in a way that does not recover automatically, it should not just be killed and restarted. It’s (imho) better to require manual intervention and let whoever runs the node decide whether it is still ok to trust that node.
But that is my preference. At the very least I think the current timeout should be reconsidered, as it appears to be far too short for common use cases, and (especially during IBD) waiting for a clean shutdown saves a lot of time compared to the long “rolling forward” sequence triggered by an unclean shutdown.
instagibbs
commented at 1:40 pm on September 6, 2019:
member
> at the very least I think the current timeout should be reconsidered
I think everyone is in agreement on this point. Mind compromising and just picking a “fairly long” shutdown period of say 10 minutes as @dongcarl suggests?
setpill
commented at 3:02 pm on September 6, 2019:
contributor
I suppose a 15 GB dbcache is a somewhat exceptional situation; perhaps 10 minutes is a decent default. I’m also looking into implementing it in the other init files for consistency’s sake.
Set init stop timeout to 10 min
`bitcoind` can take a long time to flush its db cache to disk upon
shutdown. Most init files send a `SIGKILL` after a timeout of 1 minute,
causing unclean shutdowns and triggering a long "Rolling forward" at the
next startup. Increasing this timeout to 10 minutes should reduce how
often this occurs, especially during IBD.
fixup! Set ProtectHome in systemd service file
setpill
commented at 3:08 pm on September 6, 2019:
contributor
NB: the init files other than the systemd one are untested, especially the CentOS bitcoind.init, since there the change is not merely an increase of an existing parameter but the setting of a new one.
instagibbs
commented at 3:12 pm on September 6, 2019:
member