What happened:
When a deployment is cancelled or times out during a ScriptRun, CustomSync, or Kubernetes rollback stage, the Piped agent stops tracking the stage but does not terminate the underlying shell process. This leaves orphaned processes (e.g., kubectl, terraform, sleep) and leaked goroutines on the Piped agent host. The orphaned processes keep running and can continue to mutate cluster state after a rollback has started, which risks corrupting that state and gradually exhausts the agent's resources.
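A common way this class of leak arises in Go is shown below. It is an illustration only, not a claim about how PipeCD's executors are written. `exec.CommandContext`'s default cancellation sends SIGKILL to the direct child (here the shell) and nothing else, so any process the shell spawned is re-parented and keeps running. `leaksGrandchild` is a hypothetical helper; the sketch is Unix-only and assumes `sh` and `sleep` are on `PATH`.

```go
package main

import (
	"bufio"
	"context"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

// leaksGrandchild is a hypothetical helper (not PipeCD code). It starts a
// shell that backgrounds a `sleep`, cancels the command using
// exec.CommandContext's default behaviour (SIGKILL to the shell's PID only),
// and reports whether the backgrounded sleep is still alive afterwards.
func leaksGrandchild() (bool, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// The shell prints the PID of the backgrounded sleep, then waits for it.
	cmd := exec.CommandContext(ctx, "sh", "-c", "sleep 30 & echo $!; wait")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return false, err
	}
	if err := cmd.Start(); err != nil {
		return false, err
	}
	line, readErr := bufio.NewReader(stdout).ReadString('\n')

	// Simulate a stage cancellation: only the shell receives SIGKILL.
	cancel()
	_ = cmd.Wait() // returns once the shell itself has been killed and reaped

	if readErr != nil {
		return false, readErr
	}
	pid, err := strconv.Atoi(strings.TrimSpace(line))
	if err != nil {
		return false, err
	}

	// Signal 0 only checks whether the process exists; nothing is delivered.
	alive := syscall.Kill(pid, 0) == nil
	if alive {
		// Clean up so the illustration does not leave its own orphan behind.
		_ = syscall.Kill(pid, syscall.SIGKILL)
	}
	return alive, nil
}
```

On Linux, `leaksGrandchild()` is expected to return `true`: the shell is gone, but its `sleep` child survives, which matches the `ps aux | grep sleep` observation in the reproduction steps.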
What you expected to happen:
When a stage is cancelled or times out, the Piped agent should immediately and definitively terminate the entire OS process tree started by that stage's command. This ensures that no further mutations occur after cancellation and that the agent's resources are reclaimed.
How to reproduce it:
- Create a PipeCD application with a ScriptRun stage that executes a long-running command (e.g., `run: sleep 300`).
- Trigger a deployment for this application.
- Once the ScriptRun stage is active, click the "Cancel" button in the PipeCD web UI.
- Observe the PipeCD UI and logs; the stage is reported as `CANCELLED`.
- Check the process list on the Piped agent host (e.g., `ps aux | grep sleep`). The `sleep` process is still running in the background.
Environment:
- piped version: master / latest
- control-plane version: master / latest
- Others: This is a platform-agnostic bug affecting any Piped agent utilizing ScriptRun or CustomSync executors.