fix(core): graceful process tree shutdown for continuous tasks#1

Open
agcty wants to merge 6 commits into nxc-3480 from fix/32438-graceful-process-tree-kill

@agcty agcty commented Feb 26, 2026

Summary

This builds on top of Leo's nxc-3480 branch (PR nrwl#33655) to fix the graceful shutdown regression that was blocking the PR from landing.

Problem: PR nrwl#33655 added killProcessTree using sysinfo to kill all descendants on Ctrl+C, but it was fire-and-forget — processes were killed so fast that cleanup handlers (e.g. server.close()) never had a chance to run.

Solution: Added killProcessTreeGraceful() — an async Rust function that:

  1. Snapshots the entire process tree via BFS before sending any signals
  2. Sends SIGTERM to all descendants (leaves first)
  3. Polls every 100ms, waiting up to 5s (configurable) for processes to exit
  4. Force-kills survivors with SIGKILL after the grace period
  5. Detects reparented descendants (when the root exits quickly, its children are reparented to init)
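
For reviewers who want the shape of steps 1 to 4 without reading the Rust, here is a minimal TypeScript sketch. It is not the code in process_killer/mod.rs: `ProcessOps`, `collectTree`, and `killTreeGraceful` are hypothetical names, and the OS access is injected so the logic can be exercised with a fake process table. It only shows why snapshotting first keeps descendants known after the root exits; it does not model how the PR detects descendants that were already reparented (step 5).

```typescript
// Hypothetical sketch: names and signatures are illustrative, not the PR's Rust API.
type Pid = number;

interface ProcessOps {
  // Returns a pid -> parent pid snapshot of the live processes.
  snapshot(): Map<Pid, Pid>;
  // Sends a signal to one process; assumed not to throw if it is already gone.
  signal(pid: Pid, sig: 'SIGTERM' | 'SIGKILL'): void;
  isAlive(pid: Pid): boolean;
  sleep(ms: number): Promise<void>;
}

// Breadth-first walk from `root`: the root comes first, deeper levels later.
function collectTree(root: Pid, parentOf: Map<Pid, Pid>): Pid[] {
  const children = new Map<Pid, Pid[]>();
  parentOf.forEach((ppid, pid) => {
    const list = children.get(ppid) ?? [];
    list.push(pid);
    children.set(ppid, list);
  });
  const seen = new Set<Pid>([root]);
  const order: Pid[] = [root];
  for (let i = 0; i < order.length; i++) {
    for (const child of children.get(order[i]) ?? []) {
      if (!seen.has(child)) {
        seen.add(child);
        order.push(child);
      }
    }
  }
  return order;
}

// SIGTERM, then poll, then SIGKILL. Resolves with the pids that had to be force-killed.
async function killTreeGraceful(
  root: Pid,
  ops: ProcessOps,
  graceMs = 5000,
  pollMs = 100
): Promise<Pid[]> {
  // Step 1: snapshot before any signal is sent.
  const tree = collectTree(root, ops.snapshot());
  // Step 2: reversed BFS order puts every child before its parent.
  const leavesFirst = tree.slice().reverse();
  for (const pid of leavesFirst) ops.signal(pid, 'SIGTERM');
  // Step 3: poll until everything has exited or the grace period is used up.
  let waited = 0;
  let alive = leavesFirst.filter((pid) => ops.isAlive(pid));
  while (alive.length > 0 && waited < graceMs) {
    await ops.sleep(pollMs);
    waited += pollMs;
    alive = alive.filter((pid) => ops.isAlive(pid));
  }
  // Step 4: force-kill whatever is left.
  for (const pid of alive) ops.signal(pid, 'SIGKILL');
  return alive;
}
```

Because the pid list is fixed before the first SIGTERM, a child that gets reparented once the root exits is still polled and, if needed, force-killed.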

All TypeScript kill() methods across the codebase now use the graceful variant, and signal handlers properly await the kill promises before calling process.exit().
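
The "await before process.exit()" pattern can be sketched roughly as follows. This is an illustration, not the code in running-tasks.ts: `Killable` and `makeShutdownHandler` are hypothetical names, and `exit` and the exit code are passed in rather than taken from `process` so the ordering can be checked in isolation.

```typescript
// Hypothetical sketch of a single parent-level signal handler.
interface Killable {
  kill(): Promise<void>;
}

function makeShutdownHandler(
  tasks: Killable[],
  exit: (code: number) => void,
  exitCode: number
): () => Promise<void> {
  return async () => {
    // Start every kill at once, then wait for all of them.
    // A failed kill is swallowed so it cannot cut the wait short for the others.
    await Promise.all(tasks.map((task) => task.kill().catch(() => undefined)));
    // Only now is it safe to exit: every process tree has been shut down.
    exit(exitCode);
  };
}

// Possible wiring in Node (assumption, not taken from the PR):
//   process.once('SIGINT', makeShutdownHandler(tasks, process.exit, 130));
```

Keeping the exit call in one parent-level handler, rather than in each child's handler, means the first child to finish cannot end the process while its siblings are still cleaning up.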

Key changes

  • process_killer/mod.rs: Added killProcessTreeGraceful, collect_descendants_of, and helper functions
  • running-tasks.ts: Per-child SIGINT handlers no longer call process.exit() (prevents first-child-exits race); parent-level handler awaits all kills
  • forked-process-task-runner.ts: cleanup() is now async, and the SIGINT handler awaits it
  • task-orchestrator.ts: Awaits forked runner cleanup alongside other running tasks
  • pseudo-terminal.ts: kill() uses graceful tree kill directly; shutdown() uses sync killProcessTree for exit handler
  • node-child-process.ts: Both kill methods use killProcessTreeGraceful
  • run-script.impl.ts: Exit handler uses graceful kill with .finally()
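
The split between the async graceful kill and the sync kill follows from how Node treats 'exit' listeners: they can only do synchronous work, so an awaited grace period is not possible there. A rough sketch of that division is below. `SignalSource`, `TreeKiller`, and `installHandlers` are hypothetical names, the two kill methods are stand-ins for the native functions, and the exit codes 130 and 143 follow the usual 128 + signal number convention rather than anything stated in this PR.

```typescript
// Minimal shape of an event source such as Node's `process` (assumption).
interface SignalSource {
  on(event: 'exit' | 'SIGINT' | 'SIGTERM', listener: () => void): unknown;
}

interface TreeKiller {
  killSync(): void; // stand-in for the fire-and-forget killProcessTree
  killGraceful(): Promise<void>; // stand-in for killProcessTreeGraceful
}

function installHandlers(
  source: SignalSource,
  killer: TreeKiller,
  exit: (code: number) => void
): void {
  let done = false;

  const graceful = (code: number) => () => {
    killer
      .killGraceful()
      // If the graceful path fails, fall back to the sync kill.
      .catch(() => killer.killSync())
      .then(() => {
        done = true;
        exit(code);
      });
  };

  // Signal handlers may be async: the process stays alive while they wait.
  source.on('SIGINT', graceful(130));
  source.on('SIGTERM', graceful(143));

  // 'exit' listeners must be synchronous, so this is the last-resort path.
  source.on('exit', () => {
    if (!done) killer.killSync();
  });
}
```

The `done` flag keeps the 'exit' listener from signalling the tree a second time after a graceful shutdown has already finished.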

Test results

Tested against the reproduction repo (agcty/nx-32438-continuous-tasks-repro) with 3 concurrent services (service-a:3001, service-b:3002, frontend):

  • ✅ No orphaned processes after Ctrl+C / SIGTERM
  • ✅ All ports freed (servers closed gracefully)
  • ✅ Cleanup handlers confirmed running (log files written by onCleanup callbacks)
  • ✅ Shutdown completes in ~1s (well within 5s grace period)

Added 6 integration tests in kill_process_tree.spec.ts covering tree killing, graceful SIGTERM with cleanup verification, slow cleanup within grace period, and already-dead process fast path.

Review by OpenAI Codex (gpt-5.3-codex, xhigh reasoning)

The fix was reviewed by Codex CLI, which found and helped address:

  1. Unawaited async kills in forked-process-task-runner → Fixed
  2. Per-child SIGINT listener race condition → Fixed
  3. PTY root-exits-quickly before tree snapshot → Fixed
  4. TaskOrchestrator not awaiting forked cleanup → Fixed

Note: The nx.json change (removing @nx/gradle) was needed to build locally without Java and should be dropped before merging upstream.

Related

Test plan

  • Automated: npx jest packages/nx/src/native/tests/kill_process_tree.spec.ts (6/6 pass)
  • Manual: Multi-service reproduction with cleanup handler verification
  • nx affected -t build,test,lint on nrwl/nx monorepo
  • e2e tests

🤖 Generated with Claude Code

agcty and others added 6 commits February 26, 2026 15:36
Building on the Rust-based process tree killer, this adds a
`killProcessTreeGraceful` async function that implements
SIGTERM → wait → SIGKILL escalation, preventing two issues:

1. Orphaned processes: the tree walker sends signals to all
   descendants regardless of process group, so deep child processes
   (e.g. doppler → bunx → bun → app) all receive SIGTERM.

2. Premature cleanup interruption: the main process now stays alive
   during a configurable grace period (default 5s), keeping the PTY
   master open and preventing the kernel from sending SIGHUP to
   children before they finish their cleanup handlers.

The fire-and-forget `killProcessTree` is kept for synchronous
`process.on('exit')` handlers as a last resort.

Fixes nrwl#32438

Co-Authored-By: Claude Opus 4.6 <[email protected]>

- forked-process-task-runner: await kill promises before process.exit()
- running-tasks: per-child SIGINT handlers no longer call process.exit()
  to prevent first-child-exits race
- pseudo-terminal: shutdown() uses sync killProcessTree for exit handler
- pseudo-terminal: kill() uses killProcessTreeGraceful directly
- process_killer: graceful kill detects reparented descendants

Co-Authored-By: Claude Opus 4.6 <[email protected]>

TaskOrchestrator.cleanup() now awaits the forked process runner's
cleanup promise alongside other running tasks, ensuring all process
trees are fully shut down before the orchestrator continues.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

…reeGraceful

Tests cover:
- Killing a simple process
- Killing a multi-level process tree (parent + children)
- Graceful SIGTERM with cleanup handler verification
- Slow cleanup within grace period (verifies process isn't killed prematurely)
- Multi-level tree graceful shutdown
- Already-dead process (no-op fast path)

Skipped on Windows where Unix signals don't apply.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@agcty agcty force-pushed the fix/32438-graceful-process-tree-kill branch from 1d68855 to 888a97c Compare February 26, 2026 14:36