
fix(aws/rds): harden DBCluster reconcile + add lifecycle convergence tests#224

Draft
sam-goodwin wants to merge 1 commit into main from claude/harden-rds-dbcluster

@sam-goodwin
Contributor

Hardens the AWS RDS DBCluster reconciler in line with the per-resource hardening sweep started by #184 and tracked by sibling PR #201 for DBInstance. Cluster control-plane calls fail with InvalidDBClusterStateFault while the cluster is mid-transition. The previous reconciler had three problems: it called modifyDBCluster unconditionally without diffing observed against desired state, it didn't wait for a stable status before mutating, and it surfaced transient state-machine races as terminal errors.

Reconciler changes

- // unconditional modify, no observed-vs-desired diff
- yield* rds.modifyDBCluster({ DBClusterIdentifier: identifier, EngineVersion: news.engineVersion, ... });
- observed = yield* waitForCluster(identifier);
+ // wait for stable status before mutating
+ if (!isStableClusterStatus(observed.Status)) {
+   observed = yield* waitForStableCluster(identifier, session);
+ }
+ // diff observed → desired, skip the API on no-op
+ const modifyPayload = computeModifyPayload(observed, news, credentials.MasterUserPassword);
+ if (modifyPayload) {
+   yield* retryControlPlane(rds.modifyDBCluster(modifyPayload));
+   observed = yield* waitForStableCluster(identifier, session);
+ }
  delete: Effect.fn(function* ({ output }) {
    yield* rds.deleteDBCluster({ ... }).pipe(
+     Effect.retry({
+       while: (e) => e._tag === "InvalidDBClusterStateFault",
+       schedule: controlPlaneRetryPolicy,
+     }),
      Effect.catchTag("DBClusterNotFoundFault", () => Effect.void),
    );
+   yield* waitForClusterDeleted(output.dbClusterIdentifier);
  }),
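The wait-then-mutate shape in the diff above can be sketched without Effect. This is a minimal sketch under stated assumptions. A synchronous `describe` callback stands in for DescribeDBClusters, and only `available` is treated as stable. `TaggedError`, `tagOf`, `isStable`, `waitForStable`, and `retryWhileTag` are hypothetical names, not the PR's helpers. The sketch shows the two properties the PR relies on: polling happens before any mutation, and the retry is scoped to a single error tag.

```typescript
// Hypothetical sketch (not the PR's code): wait for a stable status before
// mutating, and retry InvalidDBClusterStateFault only inside this scope.

class TaggedError extends Error {
  readonly _tag: string;
  constructor(tag: string, message: string = tag) {
    super(message);
    this._tag = tag;
  }
}

// Read an Effect-style `_tag` off any thrown value.
const tagOf = (e: unknown): string | undefined =>
  typeof e === "object" && e !== null && "_tag" in e
    ? String((e as { _tag: unknown })._tag)
    : undefined;

// Assumption: only "available" counts as stable here; the PR's
// isStableClusterStatus may accept other statuses as well.
const isStable = (status: string): boolean => status === "available";

// Poll until the cluster reports a stable status. A real implementation
// would sleep with backoff between polls; omitted to keep this synchronous.
function waitForStable(describe: () => string, maxPolls = 30): string {
  for (let i = 0; i < maxPolls; i++) {
    const status = describe();
    if (isStable(status)) return status;
  }
  throw new TaggedError("WaitTimeout", `not stable after ${maxPolls} polls`);
}

// Retry `op` only while it fails with `retryTag`. Any other error, or
// running out of attempts, propagates to the caller unchanged.
function retryWhileTag<A>(op: () => A, retryTag: string, maxAttempts = 5): A {
  for (let attempt = 1; ; attempt++) {
    try {
      return op();
    } catch (e) {
      if (tagOf(e) !== retryTag || attempt >= maxAttempts) throw e;
    }
  }
}
```

Because the retry predicate is passed in at the call site, a DBClusterNotFoundFault or a genuine conflict still fails immediately. Only the transition fault is ridden out.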
  • InvalidDBClusterStateFault is treated as retryable only in scoped contexts where we know we're polling a transitioning resource. It is never tagged retryable globally, because whether it's transient depends on context (a writer being modified vs. a genuine ConflictError).
  • New props/attrs: backupRetentionPeriod, preferredBackupWindow, preferredMaintenanceWindow. Existing deletionProtection and copyTagsToSnapshot are now reflected in attrs.
  • computeModifyPayload diffs each mutable field (engine version, parameter group, security groups, port, IAM/HTTP endpoint, serverless v2 scaling, backup window/retention, deletion protection, copy-tags-to-snapshot) and returns undefined on a clean no-op so the modify call is skipped entirely.
  • Tags are diffed against observed cloud tags (not output.tags) so adoption converges.
  • waitForClusterDeleted polls until RDS drops the cluster, so a subsequent reconcile or replace doesn't race against a cluster still in the deleting state.
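The computeModifyPayload bullet above can be illustrated with a reduced field set. This is a minimal sketch, not the PR's function. The `ObservedCluster`/`DesiredProps` shapes are hypothetical subsets of the DescribeDBClusters output and the resource props. The master-password argument is omitted. The sketch also assumes an unset desired field means "no opinion" and never forces a modify.

```typescript
// Hypothetical shapes: subsets of DescribeDBClusters output, the resource's
// desired props, and ModifyDBCluster request parameters.
interface ObservedCluster {
  DBClusterIdentifier: string;
  EngineVersion?: string;
  BackupRetentionPeriod?: number;
  PreferredBackupWindow?: string;
  DeletionProtection?: boolean;
  CopyTagsToSnapshot?: boolean;
  VpcSecurityGroups?: { VpcSecurityGroupId?: string }[];
}

interface DesiredProps {
  engineVersion?: string;
  backupRetentionPeriod?: number;
  preferredBackupWindow?: string;
  deletionProtection?: boolean;
  copyTagsToSnapshot?: boolean;
  vpcSecurityGroupIds?: string[];
}

interface ModifyPayload {
  DBClusterIdentifier: string;
  EngineVersion?: string;
  BackupRetentionPeriod?: number;
  PreferredBackupWindow?: string;
  DeletionProtection?: boolean;
  CopyTagsToSnapshot?: boolean;
  VpcSecurityGroupIds?: string[];
}

// Security groups are an unordered set: compare membership, not order.
function sameSet(a: readonly string[], b: readonly string[]): boolean {
  const sa = new Set(a);
  const sb = new Set(b);
  return sa.size === sb.size && Array.from(sa).every((x) => sb.has(x));
}

// Returns only the fields that drifted, or undefined on a clean no-op so
// the caller can skip modifyDBCluster entirely.
function computeModifyPayload(observed: ObservedCluster, desired: DesiredProps): ModifyPayload | undefined {
  const payload: ModifyPayload = { DBClusterIdentifier: observed.DBClusterIdentifier };
  let changed = false;

  if (desired.engineVersion !== undefined && desired.engineVersion !== observed.EngineVersion) {
    payload.EngineVersion = desired.engineVersion;
    changed = true;
  }
  if (desired.backupRetentionPeriod !== undefined && desired.backupRetentionPeriod !== observed.BackupRetentionPeriod) {
    payload.BackupRetentionPeriod = desired.backupRetentionPeriod;
    changed = true;
  }
  if (desired.preferredBackupWindow !== undefined && desired.preferredBackupWindow !== observed.PreferredBackupWindow) {
    payload.PreferredBackupWindow = desired.preferredBackupWindow;
    changed = true;
  }
  if (desired.deletionProtection !== undefined && desired.deletionProtection !== observed.DeletionProtection) {
    payload.DeletionProtection = desired.deletionProtection;
    changed = true;
  }
  if (desired.copyTagsToSnapshot !== undefined && desired.copyTagsToSnapshot !== observed.CopyTagsToSnapshot) {
    payload.CopyTagsToSnapshot = desired.copyTagsToSnapshot;
    changed = true;
  }
  if (desired.vpcSecurityGroupIds !== undefined) {
    const current = (observed.VpcSecurityGroups ?? [])
      .map((g) => g.VpcSecurityGroupId)
      .filter((id): id is string => id !== undefined);
    if (!sameSet(current, desired.vpcSecurityGroupIds)) {
      payload.VpcSecurityGroupIds = desired.vpcSecurityGroupIds;
      changed = true;
    }
  }
  return changed ? payload : undefined;
}
```

Sending only drifted fields keeps a redeploy with identical props from calling the API at all. That is the property the "redeploy with same props is a no-op" test asserts.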

New lifecycle tests

packages/alchemy/test/AWS/RDS/DBCluster.test.ts runs destroy → deploy → ... → destroy on ScratchStack and asserts convergence at every step. All tests are marked test.provider.skip because creating an Aurora cluster takes 5–15 minutes per test. They are meant to be unskipped by hand and run against an isolated test account.

  • redeploy with same props is a no-op
  • reconcile resets backupRetentionPeriod / preferredBackupWindow / copyTagsToSnapshot / internal alchemy tags mutated out-of-band
  • changing dbClusterIdentifier triggers replace; old cluster is deleted
  • in-place minor-version change to engineVersion
  • destroying an already-deleted cluster is a no-op
  • adopt(true) re-tags a foreign cluster
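Two of the tests above depend on diffing tags against what is actually on the cluster rather than on the previous output: the out-of-band tag reset and adopt(true). This is a minimal sketch of that diff, not the PR's implementation. `diffTags` and `isNoop` are hypothetical names. The sketch assumes the resource owns every tag except those with the reserved `aws:` prefix, which users cannot modify or delete.

```typescript
// Hypothetical sketch: diff desired tags against the tags actually on the
// cluster (as ListTagsForResource would report them).
type Tags = Record<string, string>;

interface TagDiff {
  upsert: Tags; // keys to add or overwrite (AddTagsToResource)
  removeKeys: string[]; // keys to delete (RemoveTagsFromResource)
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Assumption: every non-"aws:" tag on the cluster is owned by the resource.
function diffTags(observed: Tags, desired: Tags): TagDiff {
  const upsert: Tags = {};
  for (const [key, value] of Object.entries(desired)) {
    // Missing or tampered values are re-applied.
    if (!hasOwn(observed, key) || observed[key] !== value) upsert[key] = value;
  }
  // Reserved "aws:" keys are skipped: they cannot be removed by users.
  const removeKeys = Object.keys(observed).filter(
    (key) => !hasOwn(desired, key) && !key.startsWith("aws:"),
  );
  return { upsert, removeKeys };
}

const isNoop = (d: TagDiff): boolean =>
  Object.keys(d.upsert).length === 0 && d.removeKeys.length === 0;
```

Diffing against stored output would miss a tag edited in the console, because the output still holds the old, correct value. Diffing against the cloud catches it on the next reconcile.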

Distilled patch

No patch needed. The existing distilled error tagging is sufficient for two reasons. First, InvalidDBClusterStateFault is intentionally not marked globally retryable because it's context-dependent (only the reconciler knows whether the transition is one it can ride out). Second, recovery from InsufficientDBClusterCapacityFault takes minutes to hours, which exceeds the default retry budget.
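The retry-budget argument can be put in numbers. The actual default policy isn't stated here, so this sketch assumes a plain exponential backoff. Its total wait is the sum of the per-retry delays. `retryBudgetMs` is a hypothetical helper.

```typescript
// Hypothetical illustration, not the library's actual retry policy: total
// time an exponential backoff spends waiting across `retries` retries,
// where retry i waits baseMs * factor^i, optionally capped per retry.
function retryBudgetMs(baseMs: number, factor: number, retries: number, capMs = Infinity): number {
  let total = 0;
  for (let i = 0; i < retries; i++) {
    total += Math.min(baseMs * factor ** i, capMs);
  }
  return total;
}
```

For example, 10 retries starting at 1 s and doubling wait 1,023 s (about 17 minutes) in total. That is still short of the hours a capacity shortage can last, so retrying InsufficientDBClusterCapacityFault inside such a budget would likely just delay the failure.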

…tests

Replace the unconditional modify pattern with observed-vs-desired diffing.
Wait for stable cluster status before mutating, retry
InvalidDBClusterStateFault only in scoped polling contexts, and wait for
deletion to converge so replaces don't race against `deleting`.
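The delete path described in the commit message can be sketched as follows. The delete call treats "already gone" as success, then polling continues until RDS stops returning the cluster. This is a minimal sketch, not the PR's waitForClusterDeleted. Synchronous `deleteCall`/`describe` callbacks stand in for DeleteDBCluster and DescribeDBClusters, and `destroyCluster` is a hypothetical name.

```typescript
// Hypothetical sketch: idempotent delete followed by deletion convergence.
class TaggedError extends Error {
  readonly _tag: string;
  constructor(tag: string, message: string = tag) {
    super(message);
    this._tag = tag;
  }
}

const tagOf = (e: unknown): string | undefined =>
  typeof e === "object" && e !== null && "_tag" in e
    ? String((e as { _tag: unknown })._tag)
    : undefined;

// `describe` returns the current status while the cluster exists and throws
// DBClusterNotFoundFault once RDS has dropped it.
function waitForClusterDeleted(describe: () => string, maxPolls = 60): void {
  for (let i = 0; i < maxPolls; i++) {
    try {
      describe(); // still present (typically "deleting"); poll again
    } catch (e) {
      if (tagOf(e) === "DBClusterNotFoundFault") return; // converged
      throw e; // any other failure propagates
    }
    // A real implementation would sleep with backoff between polls.
  }
  throw new TaggedError("WaitTimeout", `cluster still present after ${maxPolls} polls`);
}

// Destroying an already-deleted cluster is a no-op, and destroy returns
// only once the cluster is fully gone, so a replace cannot race it.
function destroyCluster(deleteCall: () => void, describe: () => string): void {
  try {
    deleteCall();
  } catch (e) {
    if (tagOf(e) !== "DBClusterNotFoundFault") throw e;
  }
  waitForClusterDeleted(describe);
}
```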

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@alchemy-version-bot
Contributor

Install the packages built from this commit:

alchemy

bun add alchemy@https://pkg.ing/alchemy/b147f49

@alchemy.run/better-auth

bun add @alchemy.run/better-auth@https://pkg.ing/@alchemy.run/better-auth/b147f49

@alchemy.run/pr-package

bun add @alchemy.run/pr-package@https://pkg.ing/@alchemy.run/pr-package/b147f49

@alchemy-version-bot
Contributor

alchemy-version-bot Bot commented May 5, 2026

Website Preview Deployed

URL: https://alchemyeffectwebsite-worker-pr-224-zdaotegsh7bknl4j.testing-2b2.workers.dev

Built from commit b147f49.


This comment updates automatically with each push.
