Skip to content

Handle stale pid file for crashing daemons#489

Merged
troglobit merged 5 commits into
masterfrom
stale-pidfile
May 12, 2026
Merged

Handle stale pid file for crashing daemons#489
troglobit merged 5 commits into
masterfrom
stale-pidfile

Conversation

@troglobit
Copy link
Copy Markdown
Collaborator

Two fixes around daemon death handling, surfaced by a report where dbus-daemon failed to restart after an unclean exit left /run/messagebus.pid behind.

  • service: clean stale pidfile after unclean daemon exit

    A daemon-owned (pid:!) pidfile survives SIGKILL/OOM/segfault and dbus-daemon refuses to start when its pidfile already exists, so the restart loop fails until Finit gives up. Drop the file when it still names the reaped PID and that PID is gone (PID-reuse guard). Also wired into the forking+starting branch of service_monitor(), which previously bypassed cleanup entirely.

  • service: log signal name and core dumps in death message

    "by signal: 9" → "killed by SIGKILL", with ", core dumped" appended when the kernel wrote a core. Stronger breadcrumb when a daemon suddenly dies.

Test: kill -9 $(cat /run/messagebus.pid), dbus restarts cleanly within one retry interval, log shows "killed by SIGKILL".

This comment was marked as outdated.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread test/src/serv.c
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread src/service.c
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment thread src/service.c
troglobit added 3 commits May 11, 2026 21:08
With `pid:!/path` Finit does not manage the file -- the daemon
creates it on start and removes it on graceful exit.  If the daemon
dies before cleanup (SIGKILL, OOM, segfault, exit during startup)
the file lingers and can block the next instance from starting,
e.g. dbus-daemon refuses with EEXIST and the restart loop fails.

Remove the file when it still names the just-reaped PID and that
PID is no longer alive (the liveness check guards against reuse).
Called from service_cleanup(), and from service_monitor()'s
forking+starting branch where cleanup was previously skipped.

Signed-off-by: Joachim Wiberg <[email protected]>
Replace the bare signal number ("by signal: 9") with the symbolic
name ("killed by SIGKILL") and annotate when the kernel wrote a
core:("killed by SIGSEGV, core dumped").  Makes the restart line
self-explanatory and gives operators a strong breadcrumb when a
daemon dies unexpectedly.

Signed-off-by: Joachim Wiberg <[email protected]>
The fallback for unknown signal numbers in sig_name() returned the
misspelled "SIGUNKOWN".  Now that this string surfaces in user-
facing logs ("killed by …"), fix the typo.

Signed-off-by: Joachim Wiberg <[email protected]>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

test/src/serv.c:159

  • serv -P FILE sets pidfn, but pidfile() never stores that path in the global fn. As a result, the later pidfile(NULL) call (triggered immediately on first loop iteration because reloading starts at 1) falls back to the default _PATH_VARRUN/serv.pid instead of the -P file, and the utimensat()/cleanup() paths operate on fn rather than the pidfn argument. This can lead to touching/removing the wrong pidfile and potentially creating an extra pidfile when -P is used (as in stale-pidfile.sh). Consider persisting the chosen pidfile path (copy pidfn into fn when -P is provided) and using that stored path consistently for utimensat() and cleanup().
	if (!pidfn) {
		if (fn[0] == 0)
			snprintf(fn, sizeof(fn), "%s%s.pid", _PATH_VARRUN, ident);
		pidfn = fn;
	}

	if (exclusive && !once && checkfn(pidfn))
		errx(EX_SOFTWARE, "PID file %s exists, refusing to start", pidfn);
	once = 1;

	if (!checkfn(pidfn)) {
		pid_t pid;

		pid = getpid();
		inf("Creating PID file %s with %d", pidfn, pid);
		writefn(pidfn, pid);
		atexit(cleanup);
	} else {
		inf("Touching PID file %s", pidfn);
		utimensat(0, fn, NULL, 0);
	}

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment thread test/Makefile.am
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment thread test/stale-pidfile.sh
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment thread test/src/serv.c
Comment thread test/src/serv.c Outdated
troglobit added 2 commits May 12, 2026 10:11
Cover the scenario fixed in "service: clean stale pidfile after
unclean daemon exit": a daemon with a pid:!/path config dies via
SIGKILL, leaving its pidfile behind, and the next instance must
still come up.

Add a 'serv -x' flag (refuse to start when the pidfile already
exists, dbus-style) so the test actually exercises the cleanup --
without it, plain 'serv' would happily overwrite the file and the
test would pass with or without the fix.

Signed-off-by: Joachim Wiberg <[email protected]>
* src/pid.c: note the stale-pidfile-cleanup exception to the
  documented "Finit does not touch pid:! pidfiles" rule.
* doc/config/services.md: add a user-facing paragraph on the same.
* doc/ChangeLog.md: add Unreleased section covering this PR --
  stale pidfile cleanup, restart log with signal name and core
  dump flag, and the SIGUNKOWN typo fix.

Signed-off-by: Joachim Wiberg <[email protected]>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

@troglobit troglobit merged commit a3eac3a into master May 12, 2026
9 checks passed
@troglobit troglobit deleted the stale-pidfile branch May 12, 2026 08:42
@troglobit troglobit mentioned this pull request May 12, 2026
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants