What I have done
I added a toleration to the DcgmExporter resource, based on what I found here, but using the DcgmExporter definition from your repository:
```yaml
apiVersion: cloudwatch.aws.amazon.com/v1alpha1
kind: DcgmExporter
metadata:
  name: dcgm-exporter
  namespace: amazon-cloudwatch
  labels:
    k8s-app: dcgm-exporter
    version: v1
spec:
  image: nvcr.io/nvidia/k8s/dcgm-exporter:3.3.3-3.3.1-ubuntu22.04
  nodeSelector:
    kubernetes.io/os: linux
  serviceAccount: dcgm-exporter-service-acct
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
  [...]
```

Problem
The toleration is missing from the generated DaemonSet.
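One way to confirm this is to inspect the pod template of the generated DaemonSet, for example the output of `kubectl get daemonset dcgm-exporter -n amazon-cloudwatch -o json`. Below is a minimal Python sketch, assuming that JSON has already been parsed into a dict; the helper name `has_toleration` is hypothetical and not part of any AWS tooling.

```python
import json

# Toleration declared in the DcgmExporter spec above.
EXPECTED = {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}


def has_toleration(daemonset: dict, expected: dict = EXPECTED) -> bool:
    """Return True if the DaemonSet pod template carries the expected toleration.

    Hypothetical helper: a toleration matches when every field listed in
    `expected` has the same value (extra fields on the toleration are ignored).
    """
    tolerations = (
        daemonset.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("tolerations")
        or []  # field absent or null: no tolerations at all
    )
    return any(
        all(t.get(field) == wanted for field, wanted in expected.items())
        for t in tolerations
    )
```

For example, `has_toleration(json.loads(output))`, where `output` is the text printed by the `kubectl get` command above, returns `False` for the DaemonSet described in this issue.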
Workaround
Our AWS Solutions Architect found a workaround: manually adding the toleration to the DaemonSet.
```shell
kubectl patch daemonset dcgm-exporter -n amazon-cloudwatch --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/template/spec/tolerations",
    "value": [
      { "key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule" }
    ]
  }
]'
```
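Because this `add` operation targets the whole `tolerations` array, JSON Patch (RFC 6902) semantics mean it replaces any tolerations the DaemonSet already has. If that matters, the patch can be generated rather than hand-quoted. The Python sketch below (hypothetical helpers `build_toleration_patch` and `kubectl_patch_args`) appends with the `/-` array index when the array exists, creates it only when it is absent, and builds an argument list for the same `kubectl patch ... --type=json` call.

```python
import json

GPU_TOLERATION = {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
TOLERATIONS_PATH = "/spec/template/spec/tolerations"


def build_toleration_patch(daemonset: dict, toleration: dict = GPU_TOLERATION) -> list:
    """Return a JSON Patch (RFC 6902) adding `toleration` to the pod template.

    - exact match already present: no operations (empty list)
    - array exists: append with the "-" index so existing tolerations are kept
    - array absent or null: create it with a single element
    """
    existing = (
        daemonset.get("spec", {}).get("template", {}).get("spec", {}).get("tolerations")
    )
    if existing is None:
        return [{"op": "add", "path": TOLERATIONS_PATH, "value": [toleration]}]
    if toleration in existing:
        return []
    return [{"op": "add", "path": TOLERATIONS_PATH + "/-", "value": toleration}]


def kubectl_patch_args(patch: list, name: str = "dcgm-exporter",
                       namespace: str = "amazon-cloudwatch") -> list:
    # argv for subprocess.run(); passing a list avoids shell-quoting the JSON.
    return ["kubectl", "patch", "daemonset", name, "-n", namespace,
            "--type=json", "-p", json.dumps(patch)]
```

For example, `subprocess.run(kubectl_patch_args(patch), check=True)` applies the patch; skip the call when `build_toleration_patch` returns an empty list.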