Fix replica escalation during rolling updates with local recommender (#334)
Conversation
During rolling updates with maxSurge >= 1, the WPA enters a +1 feedback loop that escalates replicas on every reconciliation cycle:

1. Status.Replicas = Spec.Replicas + 1 (surge pod)
2. WPA sends Status.Replicas as CurrentReplicas to the recommender
3. Recommender returns "hold at CurrentReplicas" (metric between watermarks)
4. WPA compares the response against Spec.Replicas and sees an upscale
5. Sets Spec.Replicas = Status.Replicas; Kubernetes creates a new surge pod, and the loop repeats

Fix: send Spec.Replicas (the intended replica count) instead of Status.Replicas (which includes transient surge pods) as CurrentReplicas to the recommender. This aligns the recommender's input with what adjustReplicaCount uses as its baseline (target.Spec.Replicas at line 319), breaking the feedback loop.

When there is no rolling update, Spec.Replicas == Status.Replicas, so this change has no effect. The recommender still receives ReadyReplicas via CurrentReadyReplicas if it needs the actual running pod count.

Fixes: DataDog#333

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@codex review
value.With(labelsWithMetricName).Set(float64(utilizationQuantity))

replicaCount, metricPos := adjustReplicaCount(logger, target.Spec.Replicas, currentReadyReplicas, wpa, int32(reco.Replicas), int32(reco.ReplicasLowerBound), int32(reco.ReplicasUpperBound))
recommendedReplicas := int32(reco.Replicas)
Another, simpler option, but this would be a breaking change, since the contract with the clients would change. Not sure if that is okay. If so, we can do that instead.
As mentioned on Slack, I can't find any reference to this contract in the docs or git history. I think this was more likely an oversight, possibly because the original change, introduced over two years ago, wasn't tested with maxSurge configured.
This approach would be cleaner, so I think we should proceed with it.
Thanks, reverted the PR to the simpler approach.
@clamoriniere @sblumenthal @vboulineau: since you are listed as maintainers for WPA, can I get feedback from you on this? Due to this bug, our deployments are currently a bit messy.
Force-pushed 41113fa to a77d70d
Merged 70e8425 into DataDog:main
Fix replica escalation during rolling updates with local recommender (#334)
Co-authored-by: steven.blumenthal <[email protected]>
(cherry picked from commit 70e8425)

Fix replica escalation during rolling updates with local recommender (#334) (#342)
(cherry picked from commit 70e8425)
Co-authored-by: piyushjindal-dd <[email protected]>
Co-authored-by: steven.blumenthal <[email protected]>
Summary
Fixes a feedback loop where the WPA escalates the replica count on every reconciliation cycle during rolling updates when using a local recommender. The WPA goes from steady state (e.g., 18 replicas) to `maxReplicas` (e.g., 40+) during routine deployments, even when the metric is stable and within watermark bounds.

Fixes #333
The Bug
During a rolling update with `maxSurge >= 1`, Kubernetes maintains `Status.Replicas = Spec.Replicas + maxSurge`. The WPA sends `Status.Replicas` as `CurrentReplicas` to the recommender, but `adjustReplicaCount` compares the recommender's response against `Spec.Replicas`. This creates a +1 escalation on every cycle.

Observed Production Impact (authenticator, us1.prod.dog, 2026-03-16)
Dashboards:
The Fix
Send `Spec.Replicas` instead of `Status.Replicas` as `CurrentReplicas` to the recommender (replica_calculator.go:283). `Status.Replicas` includes transient surge pods that Kubernetes creates during rolling updates. `Spec.Replicas` is the intended replica count, and it is already what `adjustReplicaCount` uses as its baseline (line 319). Sending the same value to the recommender breaks the feedback loop.

When there is no rolling update in progress, `Spec.Replicas == Status.Replicas`, so this change has no effect in steady state. The recommender still receives the actual running pod count via `CurrentReadyReplicas` if it needs it.

Why not a post-hoc clamp?
An earlier approach added a clamp after `adjustReplicaCount` to suppress the false-upscale result. That approach fixed the symptom but left the recommender receiving stale, inflated input during every rollout. Sending `Spec.Replicas` is the correct fix at the source: the recommender gets accurate input and returns an accurate recommendation.