From efe02495659ac0c0555447006ffad03aeb56f5f6 Mon Sep 17 00:00:00 2001
From: Wu Sheng
Date: Wed, 10 Jun 2026 09:49:02 +0800
Subject: [PATCH] Run OAP init job in the main phase to fix `helm --wait` deadlock

The OAP init job was a `post-install,post-upgrade,post-rollback` hook.
Under `helm upgrade --install --wait`, Helm waits for all release
resources to become Ready before running post-* hooks, but the OAP
Deployment runs in `-Dmode=no-init` and never becomes Ready until the
init job creates the storage schema. The hook therefore never runs and
the install deadlocks until it times out (hits new users on a fresh
install/storage).

Hooks cannot fix this with embedded storage subcharts: a pre-* hook init
job cannot reach main-phase storage, and a post-* hook deadlocks under
`--wait`. So the init job now runs as a normal main-phase resource
alongside storage and the OAP Deployment, which blocks in no-init mode
until the schema appears.

To avoid `spec.template is immutable` failures on upgrade (a Job's pod
template cannot be patched), the Job name carries an 8-char hash of the
chart values, so a changed spec yields a new Job and Helm prunes the
previous one. A new optional `oapInit.ttlSecondsAfterFinished` can
auto-clean finished Jobs (off by default; left off for GitOps tools that
would otherwise recreate the Job).

The OAP Deployment startupProbe default failureThreshold is raised
9 -> 30 (90s -> 300s) so the pod waits for the init job during a cold
start instead of being restarted.

Docs (values.yaml, chart README, root README) updated accordingly.
---
 README.md                                     | 27 ++++++++++---------
 chart/skywalking/README.md                    |  3 ++-
 .../skywalking/templates/oap-deployment.yaml  |  5 +++-
 chart/skywalking/templates/oap-init.job.yaml  | 19 ++++++++++---
 chart/skywalking/values.yaml                  | 14 +++++++---
 5 files changed, 47 insertions(+), 21 deletions(-)

diff --git a/README.md b/README.md
index 011c576..6698a29 100644
--- a/README.md
+++ b/README.md
@@ -316,22 +316,25 @@ You can set those environment variables by `--set oap.env.=
 
 > The environment variables take priority over the overrode configuration files.
 
-## Rerun OAP init job
+## OAP init job
 
-Kubernetes Job cannot be rerun by default, if you want to rerun the OAP init
-job, you need to delete the Job and recreate it.
+The OAP storage schema (Elasticsearch indices / SQL tables / BanyanDB groups) is created by a
+one-shot `*-oap-init-*` Job that runs OAP in `-Dmode=init`. The main OAP Deployment runs in
+`-Dmode=no-init` and blocks (its `12800` port stays closed, so it is not Ready) until that schema
+exists. The init Job is a **normal release resource** that runs in the main install/upgrade phase,
+so `helm upgrade --install --wait` works: the Job creates the schema while OAP waits for it. To get
+Helm to surface init-Job failures directly (instead of only seeing OAP fail to become Ready), add
+`--wait-for-jobs` alongside `--wait`.
+
+The Job name carries a hash of the chart values, so any `helm upgrade` that changes a value
+re-creates the Job and re-runs init automatically (Helm prunes the previous one).
+
+To **force a rerun** without changing any value — delete the Job and re-run `helm upgrade`; Helm
+recreates the (now missing) Job and init runs again:
 
 ```shell
-# Make sure to export the Job manifest to a file before deleting it.
-kubectl get job -n "${SKYWALKING_RELEASE_NAMESPACE}" -l release=$SKYWALKING_RELEASE_NAME -o yaml > oap-init.job.yaml
-# Trim the Job manifest to keep only the Job part, you can either download yq from https://github.com/mikefarah/yq or
-# manually remove the fields that are not needed.
-yq 'del(.items[0].metadata.creationTimestamp,.items[0].metadata.resourceVersion,.items[0].metadata.uid,.items[0].status,.items[0].spec.template.metadata.labels."batch.kubernetes.io/controller-uid",.items[0].spec.template.metadata.labels."controller-uid",.items[0].spec.selector.matchLabels."batch.kubernetes.io/controller-uid")' oap-init.job.yaml > oap-init.job.trimmed.yaml
-# Check the file oap-init.job.trimmed.yaml to make sure it has correct content
-# Delete the Job
 kubectl delete job -n "${SKYWALKING_RELEASE_NAMESPACE}" -l release=$SKYWALKING_RELEASE_NAME
-# Create the Job
-kubectl -n "${SKYWALKING_RELEASE_NAMESPACE}" apply -f oap-init.job.trimmed.yaml
+helm upgrade "$SKYWALKING_RELEASE_NAME" -n "${SKYWALKING_RELEASE_NAMESPACE}" --reuse-values
 ```
 
 # Contact Us
diff --git a/chart/skywalking/README.md b/chart/skywalking/README.md
index 21c838b..89be301 100644
--- a/chart/skywalking/README.md
+++ b/chart/skywalking/README.md
@@ -68,7 +68,7 @@ The following table lists the configurable parameters of the Skywalking chart an
 | `oap.nodeSelector` | OAP labels for master pod assignment | `{}` |
 | `oap.tolerations` | OAP tolerations | `[]` |
 | `oap.resources` | OAP node resources requests & limits | `{} - cpu limit must be an integer` |
-| `oap.startupProbe` | Configuration fields for the [startupProbe](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) | `tcpSocket.port: 12800`<br>`failureThreshold: 9`<br>`periodSeconds: 10`
+| `oap.startupProbe` | Configuration fields for the [startupProbe](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/). The default budget (`failureThreshold` * `periodSeconds` = 300s) is large enough for OAP to wait in no-init mode while the OAP init Job creates the storage schema. | `tcpSocket.port: 12800`<br>`failureThreshold: 30`<br>`periodSeconds: 10`
 | `oap.livenessProbe` | Configuration fields for the [livenessProbe](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) | `tcpSocket.port: 12800`<br>`initialDelaySeconds: 5`<br>`periodSeconds: 10`
 | `oap.readinessProbe` | Configuration fields for the [readinessProbe](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) | `tcpSocket.port: 12800`<br>`initialDelaySeconds: 5`<br>`periodSeconds: 10`
 | `oap.env` | OAP environment variables | `[]` |
@@ -109,6 +109,7 @@ The following table lists the configurable parameters of the Skywalking chart an
 | `oapInit.nodeSelector` | OAP init job labels for master pod assignment | `{}` |
 | `oapInit.tolerations` | OAP init job tolerations | `[]` |
 | `oapInit.extraPodLabels` | OAP init job metadata labels | `[]` |
+| `oapInit.ttlSecondsAfterFinished` | Seconds after which the finished OAP init Job (and its Pod) is auto-deleted by the Kubernetes TTL-after-finished controller. Empty keeps the Job. Leave empty with GitOps tools (Argo CD/Flux), which would recreate it after deletion. | `""` |
 | `satellite.name` | Satellite deployment name | `satellite` |
 | `satellite.replicas` | Satellite k8s deployment replicas | `1` |
 | `satellite.enabled` | Is enable Satellite | `false` |
diff --git a/chart/skywalking/templates/oap-deployment.yaml b/chart/skywalking/templates/oap-deployment.yaml
index d88c6ef..8ab826d 100644
--- a/chart/skywalking/templates/oap-deployment.yaml
+++ b/chart/skywalking/templates/oap-deployment.yaml
@@ -105,7 +105,10 @@ spec:
 {{ else }}
           tcpSocket:
             port: 12800
-          failureThreshold: 9
+          # In no-init mode OAP blocks (port 12800 stays closed) until the init Job has created
+          # the storage schema. Give it a generous budget (30 * 10s = 300s) so the pod waits for
+          # the init Job instead of being restarted during a cold start.
+          failureThreshold: 30
           periodSeconds: 10
 {{- end }}
         readinessProbe:
diff --git a/chart/skywalking/templates/oap-init.job.yaml b/chart/skywalking/templates/oap-init.job.yaml
index 2bf1874..8b4f05d 100644
--- a/chart/skywalking/templates/oap-init.job.yaml
+++ b/chart/skywalking/templates/oap-init.job.yaml
@@ -18,17 +18,28 @@
 apiVersion: batch/v1
 kind: Job
 metadata:
-  name: "{{ template "skywalking.oap.fullname" . }}-init"
+  # NOTE: This Job is intentionally a normal release resource, NOT a Helm hook.
+  # Running it as a post-install/post-upgrade hook deadlocks `helm upgrade --install --wait`:
+  # Helm waits for every release resource to become Ready before it runs post-* hooks, but the
+  # OAP Deployment runs in `-Dmode=no-init` and never becomes Ready until this Job has created
+  # the storage schema -- so the hook (and therefore the schema) would never run. As a main-phase
+  # resource the Job runs alongside the OAP Deployment, which blocks in no-init mode until the
+  # schema appears, so `--wait` resolves instead of deadlocking.
+  #
+  # The name carries a hash of the chart values: a Job's `spec.template` is immutable, so a stable
+  # name would make `helm upgrade` fail with "field is immutable" whenever the pod template changes.
+  # Hashing yields a fresh Job whenever a relevant value changes; Helm prunes the previous one.
+  name: "{{ printf "%s-init-%s" (include "skywalking.oap.fullname" . | trunc 40 | trimSuffix "-") (.Values | toYaml | sha256sum | trunc 8) }}"
   labels:
     app: {{ template "skywalking.name" . }}
     chart: {{ .Chart.Name }}-{{ .Chart.Version }}
     component: "{{ template "skywalking.fullname" . }}-job"
     heritage: {{ .Release.Service }}
     release: {{ .Release.Name }}
-  annotations:
-    "helm.sh/hook": post-install,post-upgrade,post-rollback
-    "helm.sh/hook-weight": "1"
 spec:
+  {{- if .Values.oapInit.ttlSecondsAfterFinished }}
+  ttlSecondsAfterFinished: {{ .Values.oapInit.ttlSecondsAfterFinished }}
+  {{- end }}
   template:
     metadata:
       name: "{{ .Release.Name }}-oap-init"
diff --git a/chart/skywalking/values.yaml b/chart/skywalking/values.yaml
index e83947d..5681cc2 100644
--- a/chart/skywalking/values.yaml
+++ b/chart/skywalking/values.yaml
@@ -75,11 +75,13 @@ oap:
   # initialDelaySeconds: 5
   # periodSeconds: 20
   startupProbe: {}
-  # Time to boot the application is set to:
-  # 9 (failureThreshold) * 10 (periodSeconds) = 90 seconds in this case.
+  # Boot budget defaults to 30 (failureThreshold) * 10 (periodSeconds) = 300 seconds.
+  # In no-init mode OAP keeps port 12800 closed until the OAP init Job has created the storage
+  # schema, so the budget must be large enough to cover storage startup + schema creation;
+  # otherwise the pod is restarted while it is legitimately waiting for the init Job.
   # tcpSocket:
   #   port: 12800
-  # failureThreshold: 9
+  # failureThreshold: 30
   # periodSeconds: 10
   readinessProbe: {}
   # tcpSocket:
@@ -301,6 +303,12 @@ oapInit:
   tolerations: []
   extraPodLabels: {}
   # sidecar.istio.io/inject: false
+  # Auto-delete the completed init Job (and its Pod) this many seconds after it finishes, via the
+  # Kubernetes TTL-after-finished controller. Leave empty to keep the completed Job around.
+  # NOTE: leave this empty when using GitOps tools (e.g. Argo CD, Flux) -- they would recreate the
+  # Job after the TTL controller deletes it, re-running init on every reconcile. The Job name is
+  # value-hashed, so upgrades already work without TTL; this is only for tidying finished Jobs.
+  ttlSecondsAfterFinished: ""
 
 # Elasticsearch managed by ECK (eck-elasticsearch chart)
 # When enabled, the ECK operator is also installed as a dependency.