Increase init file stop timeout #16569

setpill wants to merge 1 commit into bitcoin:master from setpill:fix-systemd-shutdown, changing 4 files (+4 −3)
  1. setpill commented at 11:48 am on August 8, 2019: contributor

    bitcoind can take a long time to flush its db cache to disk upon shutdown. Systemd sends a SIGKILL after a timeout, causing unclean shutdowns and triggering a long “Rolling forward” at the next startup. Disabling the timeout should prevent this from happening, and does not break systemd’s restart logic.

    Addresses #13736.

  2. fanquake added the label Scripts and tools on Aug 8, 2019
  3. MarcoFalke assigned dongcarl on Aug 8, 2019
  4. DrahtBot commented at 12:52 pm on August 8, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #15268 (doc: suggest using timeoutstopsec in systemd file during IBD by d3spwn)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. dongcarl commented at 3:55 pm on August 8, 2019: member

    Documentation for TimeoutStopSec=: https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStopSec=

    Relevant quotations:

    …it configures the time to wait for the service itself to stop. If it doesn’t terminate in the specified time, it will be forcibly terminated by SIGKILL (see KillMode= in systemd.kill(5)).

    Takes a unit-less value in seconds, or a time span value such as “5min 20s”. Pass “infinity” to disable the timeout logic. Defaults to DefaultTimeoutStopSec= from the manager configuration file (see systemd-system.conf(5)).

    I believe it’s warranted for us to set our own TimeoutStopSec= instead of relying on the fallback to DefaultTimeoutStopSec= (which defaults to 90 seconds). However, infinity is perhaps overkill. I believe 10 minutes is probably enough for most systems on which we expect Bitcoin Core to operate healthily, but that’s just my speculation, and others can/should chime in.
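
    As a rough illustration of the 10-minute suggestion (a sketch, not the diff from this pull request), the setting would sit in the [Service] section of the unit file. Only TimeoutStopSec= and its accepted values come from the documentation quoted above; the unit's other sections and directives are omitted here.

    ```ini
    [Service]
    # Wait up to 10 minutes for the service to stop before systemd sends SIGKILL.
    # Accepts a unit-less number of seconds or a time span such as "10min".
    TimeoutStopSec=600
    # The alternative originally proposed in this thread, which disables the timeout:
    # TimeoutStopSec=infinity
    ```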

  6. setpill commented at 12:10 pm on August 9, 2019: contributor

    I have recently encountered a situation on a system with decent specs (4 cores, 16 GB ram, 15 GB db cache, blocksdir on HDD, datadir on SSD) where the clean shutdown took almost 9 minutes. 10 minutes seems like it would be too close for comfort, especially considering the myriad of lower-end hardware that might run bitcoind. I have not attempted to analyse the bounds, but in a situation where the shutdown would hang truly indefinitely manual intervention would be required anyway, I assume, to inspect the error before continuing to use the software.

    Is there a potential scenario in which the timeout of infinity would function as a regression compared to the status quo?

  7. instagibbs commented at 1:10 pm on August 12, 2019: member
    @setpill if the shutdown gets wedged somehow, would infinity cause this to never recover? Unclean shutdown at some point may be preferable.
  8. setpill commented at 9:51 am on August 16, 2019: contributor

    @instagibbs It is true that it will never recover automatically if it gets wedged, but like I said, if that happens one probably wants to inspect manually what is going on and whether the node is still safe to use, and perhaps file a bug report.

    The systemd service file can also be overridden by admins to add a shutdown timeout if one is needed. IMHO, the default one provided with the program should err on the side of caution.
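
    A minimal sketch of such an override, assuming the unit is installed as bitcoind.service (the unit name and the 20-minute value are illustrative assumptions, not taken from this thread). A drop-in like this can be created with `sudo systemctl edit bitcoind.service`, which stores it under /etc/systemd/system/bitcoind.service.d/:

    ```ini
    [Service]
    # Local override of the packaged stop timeout; takes precedence over the value in the unit file.
    TimeoutStopSec=20min
    ```

    Changes made through systemctl edit are picked up automatically; a hand-edited drop-in needs `sudo systemctl daemon-reload` before it takes effect.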

  9. luke-jr commented at 6:59 pm on August 23, 2019: member

    > I have recently encountered a situation on a system with decent specs (4 cores, 16 GB ram, 15 GB db cache, blocksdir on HDD, datadir on SSD) where the clean shutdown took almost 9 minutes.

    What caused this? It certainly doesn’t look normal.

    IMO some timeout should be kept.

    > if that happens one probably wants to inspect manually what is going on

    Not possible during shutdown…?

  10. dongcarl commented at 5:43 pm on September 5, 2019: member
    So perhaps what @setpill is saying is that for those who enabled the systemd debug shell (sudo systemctl enable --now debug-shell.service), even when the service is stuck, they can debug using a root shell on TTY9?
  11. setpill commented at 1:33 pm on September 6, 2019: contributor

    My opinion is that if Bitcoin Core truly does hang in a way that does not recover automatically, it should not just be killed and restarted. It’s (imho) better to require manual intervention and let whoever runs the node decide whether that node can still be trusted.

    But that is my preference; at the very least I think the current timeout should be reconsidered, as it appears to be far too short for common use cases, and (especially during IBD) waiting for a clean shutdown saves a lot of time compared to the long “rolling forward” sequence triggered by an unclean shutdown.

  12. instagibbs commented at 1:40 pm on September 6, 2019: member

    > at the very least I think the current timeout should be reconsidered

    I think everyone is in agreement on this point. Mind compromising and just picking a “fairly long” shutdown period of say 10 minutes as @dongcarl suggests?

  13. setpill commented at 3:02 pm on September 6, 2019: contributor
    I suppose a 15 GB dbcache is a somewhat exceptional situation, so perhaps 10 minutes is a decent default. I’m also looking into implementing it in the other init files for consistency’s sake.
  14. Set init stop timeout to 10 min
    `bitcoind` can take a long time to flush its db cache to disk upon
    shutdown. Most init files send a `SIGKILL` after a timeout of 1 minute,
    causing unclean shutdowns and triggering a long "Rolling forward" at the
    next startup. Increasing this timeout to 10 minutes should reduce how
    often this occurs, especially during IBD.
    
    fixup! Set ProtectHome in systemd service file
    7fb7acfc20
  15. setpill force-pushed on Sep 6, 2019
  16. setpill renamed this:
    from: Disable systemd stop timeout
    to: Increase init file stop timeout
    on Sep 6, 2019
  17. setpill commented at 3:08 pm on September 6, 2019: contributor
    NB: the init files other than the systemd one are untested, especially the CentOS bitcoind.init, since that change is not merely an increase of an existing parameter but the setting of a new one.
  18. instagibbs commented at 3:12 pm on September 6, 2019: member
  19. laanwj commented at 9:37 am on October 8, 2019: member
    Agree on increasing this timeout. Better be safe than sorry. It is frustrating to lose your entire sync state (with a high dbcache) on shutdown.
  20. laanwj referenced this in commit 99cebc922c on Oct 8, 2019
  21. laanwj merged this on Oct 8, 2019
  22. laanwj closed this on Oct 8, 2019

  23. sidhujag referenced this in commit a46e4acabc on Oct 8, 2019
  24. MarkLTZ referenced this in commit 64a2993987 on Nov 17, 2019
  25. deadalnix referenced this in commit 161dd286ff on Aug 24, 2021
  26. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 19:12 UTC
