fix(kubernetes): harden Deployment + Namespace reconcile + add lifecycle convergence tests #220
Closed
sam-goodwin wants to merge 1 commit into main
…cle convergence tests

Replace the single KubernetesApiError envelope with status-specific tagged errors so retries are scoped instead of blanket-catching auth/validation failures, wait for Deployment rollouts to converge, and wait for Namespace finalizers before recreating downstream objects.

- Tagged errors: KubernetesNotFound (404), KubernetesConflict (409), KubernetesThrottled (429), KubernetesNetworkError (transport), KubernetesDeploymentNotReady, KubernetesDeleteNotComplete
- 429 + transport errors retry with bounded exponential backoff
- 409 on apply retries with bounded backoff (resourceVersion races and "namespace is being terminated" both resolve here without looping forever)
- After applying a Deployment, poll until status.observedGeneration catches metadata.generation and ready/updatedReplicas reach spec.replicas
- After deleting a Namespace, poll until GET 404s so a subsequent apply doesn't race the finalizer
- Add unit tests for path building, key/sort/chunk helpers, and isDeploymentReady covering observedGeneration lag, ready/updated lag, fresh-status, and replicas defaulting
- Stub describe.skip suites for live-cluster lifecycle (redeploy no-op, OOB drift recovery, OOB delete recovery, rename-replace, double-destroy idempotency); wire up once an EKS test fixture lands

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
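To make the scoped-retry idea in the commit message concrete, here is a minimal sketch of how a status code might map to a tagged error and how a bounded exponential backoff could be driven from that tag. It is not the PR's actual code: the tag names come from the commit message, but `classifyStatus`, `isRetryable`, `backoffDelayMs`, `retryWithBackoff`, `TaggedError`, the `KubernetesOtherError` fallback tag, and the delay and attempt limits are all hypothetical choices made for illustration.

```typescript
// Hypothetical sketch; helper names, the fallback tag, and limits are assumptions.
type KubernetesErrorTag =
  | "KubernetesNotFound"
  | "KubernetesConflict"
  | "KubernetesThrottled"
  | "KubernetesNetworkError"
  | "KubernetesOtherError"; // assumed catch-all for auth/validation failures

// Map an HTTP status to a tag; `undefined` stands for a transport failure
// where no response was received at all.
function classifyStatus(status: number | undefined): KubernetesErrorTag {
  if (status === undefined) return "KubernetesNetworkError";
  switch (status) {
    case 404:
      return "KubernetesNotFound";
    case 409:
      return "KubernetesConflict";
    case 429:
      return "KubernetesThrottled";
    default:
      return "KubernetesOtherError";
  }
}

// Only throttling, transport errors, and (on apply) conflicts are retried.
const RETRYABLE = new Set<KubernetesErrorTag>([
  "KubernetesThrottled",
  "KubernetesNetworkError",
  "KubernetesConflict",
]);

function isRetryable(tag: KubernetesErrorTag): boolean {
  return RETRYABLE.has(tag);
}

// Exponential delay, doubled per attempt and capped so it stays bounded.
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

class TaggedError extends Error {
  tag: KubernetesErrorTag;
  constructor(tag: KubernetesErrorTag, message: string) {
    super(message);
    this.tag = tag;
  }
}

// Retry `op` on retryable tags only, giving up after `maxAttempts` tries.
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const retryable = err instanceof TaggedError && isRetryable(err.tag);
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Under these assumptions a 403 or 422 falls through to the non-retryable fallback and is rethrown immediately, which is the behavior the commit message contrasts with blanket-catching.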
Website Preview Deployed
URL: https://alchemyeffectwebsite-worker-pr-220-2pq6zd3sikqxayhw.testing-2b2.workers.dev
Built from commit
This comment updates automatically with each push.
Superseded by #249 (consolidated hardening sweep). Closing — the equivalent commit landed on |
Replace the single `KubernetesApiError` envelope with status-specific tagged errors so retries are scoped instead of blanket-catching auth/validation failures, wait for Deployment rollouts to converge, and wait for Namespace finalizers before recreating downstream objects.

Reconciler changes

- … `resourceVersion` races and `namespace is being terminated`.
- After applying a `Deployment`, poll until `status.observedGeneration === metadata.generation` and `readyReplicas`/`updatedReplicas` reach `spec.replicas` (5 minute cap). Surfaces `KubernetesDeploymentNotReady` instead of returning before pods come up.
- After deleting a `Namespace`, poll GET until 404 so a subsequent apply doesn't race the finalizer. Surfaces `KubernetesDeleteNotComplete` if the namespace stays stuck.
- `DELETE` is idempotent on 404 (already gone) and bounded-retry on 409.

New lifecycle tests

`packages/alchemy/test/Kubernetes/client.test.ts`, pure unit coverage:

- `buildKubernetesObjectPath`: core vs `apis` group, cluster-scoped vs namespaced, missing-namespace throws
- `kubernetesObjectKey` / `toKubernetesObjectRef`: identity encoding, `_cluster` sentinel
- `chunkByApplyRank` / `sortRefsForDelete`: Namespace before Deployment on apply, reverse on delete
- `isDeploymentReady`: covers `observedGeneration` lag, `readyReplicas` lag, `updatedReplicas` lag (rolling), fresh-status, and `spec.replicas` default

Live-cluster lifecycle scenarios (redeploy no-op, OOB drift recovery for replicas/image/env, OOB-delete recovery, rename-triggers-replace, double-destroy idempotency for both Deployment and Namespace) are stubbed as `describe.skip` blocks; this repo has no kind/minikube fixture yet, so they wire up once an EKS test cluster lands.
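For readers who want to picture the readiness check described above, the following is a rough sketch of what a predicate like `isDeploymentReady` could look like. The function name appears in the PR, but this body, the `DeploymentLike` type, and the exact comparisons are assumptions rather than the code under review. It relies on the Kubernetes default of `spec.replicas = 1` when the field is omitted, and treats a missing `status` as not ready.

```typescript
// Hypothetical shape covering only the fields the check needs.
interface DeploymentLike {
  metadata?: { generation?: number };
  spec?: { replicas?: number };
  status?: {
    observedGeneration?: number;
    readyReplicas?: number;
    updatedReplicas?: number;
  };
}

// Sketch of a readiness predicate; not the PR's actual implementation.
function isDeploymentReady(deployment: DeploymentLike): boolean {
  // Kubernetes defaults spec.replicas to 1 when it is not set.
  const desired = deployment.spec?.replicas ?? 1;
  const generation = deployment.metadata?.generation ?? 0;
  const status = deployment.status;

  // Freshly created object: the controller has not written status yet.
  if (status === undefined) return false;

  // The controller has not yet observed the latest spec.
  if ((status.observedGeneration ?? 0) < generation) return false;

  // During a rolling update, updatedReplicas can lag behind readyReplicas,
  // so both must reach the desired count.
  const updated = status.updatedReplicas ?? 0;
  const ready = status.readyReplicas ?? 0;
  return updated >= desired && ready >= desired;
}

// Example: spec changed (generation 3) but status still reflects generation 2.
const lagging: DeploymentLike = {
  metadata: { generation: 3 },
  spec: { replicas: 2 },
  status: { observedGeneration: 2, readyReplicas: 2, updatedReplicas: 2 },
};
console.log(isDeploymentReady(lagging)); // false
```

A polling loop with a time cap would call a predicate like this after each GET and surface `KubernetesDeploymentNotReady` once the cap is exceeded.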