Helm hook race condition: ServiceAccount deleted before CRD upgrade job runs #2003

@Guerlielton

Description

What steps did you take and what happened:

While installing the secrets-store-csi-driver Helm chart using ArgoCD + Helm, the secrets-store-csi-driver-upgrade-crds job sometimes fails during installation.

The chart defines the following annotations on the ServiceAccount and RBAC resources:

helm.sh/hook: pre-install,pre-upgrade
helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
helm.sh/hook-weight: "1"

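Because these resources are Helm hooks and are not Jobs, there is nothing for them to run to completion: they are marked as Succeeded immediately after creation, which triggers the hook-succeeded delete policy and causes them to be deleted before the later hooks run.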

When the secrets-store-csi-driver-upgrade-crds job starts (hook weight 10), Kubernetes attempts to create the pod but fails because the ServiceAccount was already deleted.

Observed error:

Error creating: pods "secrets-store-csi-driver-upgrade-crds-" is forbidden:
error looking up service account kube-system/secrets-store-csi-driver-upgrade-crds:
serviceaccount "secrets-store-csi-driver-upgrade-crds" not found

This results in the CRD upgrade job failing during installation.


What did you expect to happen:

The ServiceAccount and RBAC resources should remain available until the CRD upgrade job finishes so that the job pod can start successfully.


Anything else you would like to add:

This issue is reproducible when deploying the chart with ArgoCD, especially during new cluster bootstrap, where timing differences expose this race condition.

A potential fix would be to remove hook-succeeded from the delete policy of the ServiceAccount and RBAC resources:

helm.sh/hook-delete-policy: before-hook-creation

Cleanup can remain on the upgrade job using:

helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded,hook-failed
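
For illustration, the proposed policies could look like this on the two resources. This is a minimal sketch, not the chart's actual templates: the names and namespace are taken from the error message above, the weights from the description, and the hook events on the Job plus everything outside metadata.annotations are assumptions.

```yaml
# Sketch only: metadata of the hook resources with the proposed delete policies.
# Names and namespace come from the observed error; the rest is illustrative.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: secrets-store-csi-driver-upgrade-crds
  namespace: kube-system
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-weight: "1"
    # hook-succeeded removed, so the account is not deleted right after creation
    helm.sh/hook-delete-policy: before-hook-creation
---
apiVersion: batch/v1
kind: Job
metadata:
  name: secrets-store-csi-driver-upgrade-crds
  namespace: kube-system
  annotations:
    helm.sh/hook: pre-install,pre-upgrade   # assumed to match the ServiceAccount
    helm.sh/hook-weight: "10"
    # cleanup stays on the Job itself
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded,hook-failed
# spec omitted here; it would stay as the chart defines it
```

The trade-off is that the ServiceAccount and RBAC resources would stay in the cluster after the job finishes, and would only be removed and recreated by before-hook-creation on the next install or upgrade.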

Which provider are you using:

Not using a specific provider (driver installation only).


Environment:

  • Secrets Store CSI Driver version: v1.5.6
  • Kubernetes version: Client Version v1.35.x, Server Version v1.34.x
  • Deployment method: ArgoCD + Helm

Labels: kind/bug, needs-triage