Describe the bug
In flyteplugins/go/tasks/pluginmachinery/flytek8s/copilot.go, the CoPilot timeout is assigned directly to TerminationGracePeriodSeconds without converting from nanoseconds to seconds:
coPilotPod.TerminationGracePeriodSeconds = (*int64)(&cfg.Timeout.Duration)
cfg.Timeout.Duration is a time.Duration, which stores nanoseconds as int64. Kubernetes expects terminationGracePeriodSeconds in seconds.
For example, a 1-hour copilot timeout produces terminationGracePeriodSeconds: 3600000000000.
Expected behavior
The duration should be converted to seconds before assignment:
seconds := int64(cfg.Timeout.Duration.Seconds())
coPilotPod.TerminationGracePeriodSeconds = &seconds
How to reproduce
- Run a Flyte task that ignores SIGTERM, with CoPilot enabled (i.e., with output handling)
- Cancel the execution or let it time out
- Observe that the pod enters the Terminating state and never finishes terminating until it is manually force-deleted
- Inspect the pod:
kubectl get pod <name> -o jsonpath='{.spec.terminationGracePeriodSeconds}' returns 3600000000000
- Inspect the deletion timestamp: it will be decades/centuries in the future
Environment
- FlytePropeller version: v1.16.3 (also present on master/v2.0.9)