Handle stale pid file for crashing daemons#489
Merged
Conversation
With `pid:!/path` Finit does not manage the file -- the daemon creates it on start and removes it on graceful exit. If the daemon dies before cleanup (SIGKILL, OOM, segfault, exit during startup) the file lingers and can block the next instance from starting, e.g. dbus-daemon refuses with EEXIST and the restart loop fails. Remove the file when it still names the just-reaped PID and that PID is no longer alive (the liveness check guards against reuse). Called from service_cleanup(), and from service_monitor()'s forking+starting branch where cleanup was previously skipped. Signed-off-by: Joachim Wiberg <[email protected]>
Replace the bare signal number ("by signal: 9") with the symbolic
name ("killed by SIGKILL") and annotate when the kernel wrote a
core:("killed by SIGSEGV, core dumped"). Makes the restart line
self-explanatory and gives operators a strong breadcrumb when a
daemon dies unexpectedly.
Signed-off-by: Joachim Wiberg <[email protected]>
The fallback for unknown signal numbers in sig_name() returned the
misspelled "SIGUNKOWN". Now that this string surfaces in user-
facing logs ("killed by …"), fix the typo.
Signed-off-by: Joachim Wiberg <[email protected]>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
test/src/serv.c:159
serv -P FILEsetspidfn, butpidfile()never stores that path in the globalfn. As a result, the laterpidfile(NULL)call (triggered immediately on first loop iteration becausereloadingstarts at 1) falls back to the default_PATH_VARRUN/serv.pidinstead of the-Pfile, and theutimensat()/cleanup()paths operate onfnrather than thepidfnargument. This can lead to touching/removing the wrong pidfile and potentially creating an extra pidfile when-Pis used (as in stale-pidfile.sh). Consider persisting the chosen pidfile path (copypidfnintofnwhen-Pis provided) and using that stored path consistently forutimensat()andcleanup().
if (!pidfn) {
if (fn[0] == 0)
snprintf(fn, sizeof(fn), "%s%s.pid", _PATH_VARRUN, ident);
pidfn = fn;
}
if (exclusive && !once && checkfn(pidfn))
errx(EX_SOFTWARE, "PID file %s exists, refusing to start", pidfn);
once = 1;
if (!checkfn(pidfn)) {
pid_t pid;
pid = getpid();
inf("Creating PID file %s with %d", pidfn, pid);
writefn(pidfn, pid);
atexit(cleanup);
} else {
inf("Touching PID file %s", pidfn);
utimensat(0, fn, NULL, 0);
}
Cover the scenario fixed in "service: clean stale pidfile after unclean daemon exit": a daemon with a pid:!/path config dies via SIGKILL, leaving its pidfile behind, and the next instance must still come up. Add a 'serv -x' flag (refuse to start when the pidfile already exists, dbus-style) so the test actually exercises the cleanup -- without it, plain 'serv' would happily overwrite the file and the test would pass with or without the fix. Signed-off-by: Joachim Wiberg <[email protected]>
* src/pid.c: note the stale-pidfile-cleanup exception to the documented "Finit does not touch pid:! pidfiles" rule. * doc/config/services.md: add a user-facing paragraph on the same. * doc/ChangeLog.md: add Unreleased section covering this PR -- stale pidfile cleanup, restart log with signal name and core dump flag, and the SIGUNKOWN typo fix. Signed-off-by: Joachim Wiberg <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two fixes around daemon death handling, surfaced by a report where
dbus-daemonfailed to restart after an unclean exit left/run/messagebus.pidbehind.service: clean stale pidfile after unclean daemon exit
A daemon-owned (pid:!) pidfile survives SIGKILL/OOM/segfault and
dbus-daemonrefuses to start when its pidfile already exists, so the restart loop fails until Finit gives up. Drop the file when it still names the reaped PID and that PID is gone (PID-reuse guard). Also wired into the forking+starting branch ofservice_monitor(), which previously bypassed cleanup entirely.service: log signal name and core dumps in death message
"by signal: 9" → "killed by SIGKILL", with ", core dumped" appended when the kernel wrote a core. Stronger breadcrumb when a daemon suddenly dies.
Test:
kill -9 $(cat /run/messagebus.pid), dbus restarts cleanly within one retry interval, log shows "killed by SIGKILL".