Error when upgrading the jobset-controller-manager Deployment #3428

@christian-heusel

Description

Validation Checklist

  • I confirm that this is a Kubeflow-related issue.
  • I am reporting this in the appropriate repository.
  • I have followed the Kubeflow installation guidelines.
  • The issue report is detailed and includes version numbers where applicable.
  • I have considered adding my company to the adopters page to support Kubeflow and help the community, since I expect help from the community for my issue (see 1. and 2.).
  • This issue pertains to Kubeflow development.
  • I am available to work on this issue.
  • You can join the CNCF Slack and access our meetings at the Kubeflow Community website. Our channel on the CNCF Slack is here #kubeflow-platform.

Version

master

Detailed Description

When upgrading a cluster from 26.03 to master, the following error appears:

The Deployment "jobset-controller-manager" is invalid: spec.selector: Invalid value: {"matchLabels":{"app.kubernetes.io/instance":"jobset","app.kubernetes.io/managed-by":"kustomize","app.kubernetes.io/name":"jobset","control-plane":"controller-manager"}}: field is immutable

This is most likely related to #3413.

The manifest it tries to apply is the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: manager
    app.kubernetes.io/created-by: jobset
    app.kubernetes.io/instance: jobset
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: jobset
    app.kubernetes.io/part-of: jobset
    control-plane: controller-manager
  name: jobset-controller-manager
  namespace: kubeflow-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: jobset
      app.kubernetes.io/managed-by: kustomize
      app.kubernetes.io/name: jobset
      control-plane: controller-manager
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: manager
        traffic.sidecar.istio.io/excludeInboundPorts: "9443"
      labels:
        app.kubernetes.io/instance: jobset
        app.kubernetes.io/managed-by: kustomize
        app.kubernetes.io/name: jobset
        control-plane: controller-manager
    spec:
      containers:
      - args:
        - --config=/controller_manager_config.yaml
        - --zap-log-level=2
        command:
        - /manager
        image: us-central1-docker.pkg.dev/k8s-staging-images/jobset/jobset:v0.11.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8081
          initialDelaySeconds: 15
          periodSeconds: 20
        name: manager
        ports:
        - containerPort: 8443
          name: metrics
          protocol: TCP
        - containerPort: 9443
          name: webhook-server
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8081
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          limits:
            memory: 4Gi
          requests:
            cpu: 500m
            memory: 128Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /controller_manager_config.yaml
          name: manager-config
          subPath: controller_manager_config.yaml
        - mountPath: /tmp/k8s-webhook-server/serving-certs
          name: cert
          readOnly: true
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: jobset-controller-manager
      terminationGracePeriodSeconds: 10
      volumes:
      - configMap:
          name: jobset-manager-config
        name: manager-config
      - name: cert
        secret:
          defaultMode: 420
          secretName: jobset-webhook-server-cert
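The new manifest is internally consistent: in apps/v1, spec.selector.matchLabels must be satisfied by spec.template.metadata.labels, and here it is. The minimal Python sketch below checks exactly that, using labels copied by hand from the manifest above. The helper name selector_matches is hypothetical, and it only handles equality-based matchLabels (matchExpressions are ignored).

```python
# Labels copied by hand from the new jobset Deployment manifest above.
new_selector = {
    "app.kubernetes.io/instance": "jobset",
    "app.kubernetes.io/managed-by": "kustomize",
    "app.kubernetes.io/name": "jobset",
    "control-plane": "controller-manager",
}
new_template_labels = {
    "app.kubernetes.io/instance": "jobset",
    "app.kubernetes.io/managed-by": "kustomize",
    "app.kubernetes.io/name": "jobset",
    "control-plane": "controller-manager",
}


def selector_matches(match_labels: dict[str, str], labels: dict[str, str]) -> bool:
    """Hypothetical helper: True if every matchLabels entry appears in labels.

    Only equality-based matchLabels are handled; matchExpressions are ignored.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())


if __name__ == "__main__":
    # The new selector is satisfied by the new pod template labels.
    print(selector_matches(new_selector, new_template_labels))
```

So the problem is not the new manifest on its own, but how its selector relates to the one already in the cluster.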

The version that is currently deployed in the cluster is the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2026-03-19T19:22:01Z"
  generation: 1
  labels:
    app.kubernetes.io/component: manager
    app.kubernetes.io/created-by: jobset
    app.kubernetes.io/instance: controller-manager
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: deployment
    app.kubernetes.io/part-of: jobset
    control-plane: controller-manager
  name: jobset-controller-manager
  namespace: kubeflow-system
  resourceVersion: "6427"
  uid: bc81f873-a1a1-4db1-bf07-c8f400f53c6e
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      control-plane: controller-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: manager
        traffic.sidecar.istio.io/excludeInboundPorts: "9443"
      labels:
        control-plane: controller-manager
    spec:
      containers:
      - args:
        - --config=/controller_manager_config.yaml
        - --zap-log-level=2
        command:
        - /manager
        image: registry.k8s.io/jobset/jobset:v0.10.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 1
        name: manager
        ports:
        - containerPort: 8443
          name: metrics
          protocol: TCP
        - containerPort: 9443
          name: webhook-server
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 4Gi
          requests:
            cpu: 500m
            memory: 128Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /controller_manager_config.yaml
          name: manager-config
          subPath: controller_manager_config.yaml
        - mountPath: /tmp/k8s-webhook-server/serving-certs
          name: cert
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: jobset-controller-manager
      serviceAccountName: jobset-controller-manager
      terminationGracePeriodSeconds: 10
      volumes:
      - configMap:
          defaultMode: 420
          name: jobset-manager-config
        name: manager-config
      - name: cert
        secret:
          defaultMode: 420
          secretName: jobset-webhook-server-cert
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2026-03-19T19:27:33Z"
    lastUpdateTime: "2026-03-19T19:27:33Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2026-03-19T19:22:02Z"
    lastUpdateTime: "2026-03-19T19:27:33Z"
    message: ReplicaSet "jobset-controller-manager-d988cfd45" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas:
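Comparing the two manifests explains the error: the deployed Deployment selects only on control-plane: controller-manager, while the new one adds three app.kubernetes.io/* labels to spec.selector.matchLabels, and the selector of an existing apps/v1 Deployment is immutable. The Python sketch below diffs the two matchLabels maps, copied by hand from the manifests in this issue; selector_diff is a hypothetical helper, not part of any Kubernetes client.

```python
# matchLabels copied by hand from the two manifests in this issue.
deployed_selector = {"control-plane": "controller-manager"}
new_selector = {
    "app.kubernetes.io/instance": "jobset",
    "app.kubernetes.io/managed-by": "kustomize",
    "app.kubernetes.io/name": "jobset",
    "control-plane": "controller-manager",
}


def selector_diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical helper: list keys added, removed, or changed between selectors."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    diff = selector_diff(deployed_selector, new_selector)
    print(diff)
    # Any non-empty entry means the selector would change, which the API
    # server rejects for an existing Deployment ("field is immutable").
    print("selector changed:", any(diff.values()))
```

Only keys are added here (nothing is removed or changed), but any difference at all is enough to trigger the "field is immutable" rejection.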

Steps to Reproduce

  1. Set up a Kubeflow platform deployment with kind on version 26.03
  2. Check out the latest master branch (at the time of writing, commit 46f3142)
  3. Re-deploy via the while [...] command and observe the error mentioned above

Screenshots or Videos (Optional)

No response

Metadata

Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
