Skip to content

fix(aws/ec2): harden SecurityGroup reconcile + add lifecycle convergence tests#222

Open
sam-goodwin wants to merge 1 commit intomainfrom
claude/harden-ec2-securitygroup
Open

fix(aws/ec2): harden SecurityGroup reconcile + add lifecycle convergence tests#222
sam-goodwin wants to merge 1 commit intomainfrom
claude/harden-ec2-securitygroup

Conversation

@sam-goodwin
Copy link
Copy Markdown
Contributor

Harden the EC2 SecurityGroup reconciler against eventual-consistency races and rule drift, then add lifecycle convergence tests on top of ScratchStack.

Reconciler changes

-      const describeSecurityGroup = (groupId: string) =>
-        ec2.describeSecurityGroups({ GroupIds: [groupId] }).pipe(
-          Effect.map((r) => r.SecurityGroups?.[0]),
-          Effect.flatMap((sg) =>
-            sg
-              ? Effect.succeed(sg)
-              : Effect.fail(new Error(`Security Group ${groupId} not found`)),
-          ),
-        );
+      const retryEventuallyConsistent = <A, E, R>(
+        eff: Effect.Effect<A, E, R>,
+      ) =>
+        eff.pipe(
+          Effect.retry({
+            while: (e) =>
+              e?._tag === "InvalidGroup.NotFound" ||
+              e?._tag === "SecurityGroupNotVisible",
+            schedule: Schedule.exponential(100).pipe(
+              Schedule.both(Schedule.recurs(10)),
+            ),
+          }),
+        );
+      const describeSecurityGroup = (groupId: string) =>
+        retryEventuallyConsistent(
+          ec2.describeSecurityGroups({ GroupIds: [groupId] }).pipe(
+            Effect.flatMap((r) =>
+              r.SecurityGroups?.[0]
+                ? Effect.succeed(r.SecurityGroups[0])
+                : Effect.fail({
+                    _tag: "SecurityGroupNotVisible" as const,
+                    groupId,
+                  }),
+            ),
+          ),
+        );

Rule sync used to revoke every observed rule and re-authorize the desired set every reconcile. Replaced with a canonical-key diff (protocol + ports + source + description) so we only apply the delta:

-          if (currentIngress.length > 0) {
-            yield* ec2.revokeSecurityGroupIngress({ ... all observed ... });
-          }
-          if (news.ingress) {
-            yield* ec2.authorizeSecurityGroupIngress({ ... all desired ... });
-          }
+          const observedKeys = new Map(observed.map((r) => [canonicalRuleKey(...), r]));
+          const desiredKeys = new Map(desired.map((r) => [canonicalRuleKey(...), r]));
+          const toRevoke = [...observedKeys].filter(([k]) => !desiredKeys.has(k));
+          const toAuthorize = [...desiredKeys].filter(([k]) => !observedKeys.has(k));

read now searches by alchemy tag filters when state is missing, returns Unowned(attrs) for foreign-tagged SGs (gating adoption behind adopt(true)), and treats a vanished SG as undefined so the engine recreates it.

-        read: Effect.fn(function* ({ output }) {
-          if (!output) return undefined;
-          const sg = yield* describeSecurityGroup(output.groupId);
-          ...
+        read: Effect.fn(function* ({ id, output }) {
+          const sg = output?.groupId
+            ? yield* findSecurityGroup(output.groupId)
+            : (yield* ec2.describeSecurityGroups({
+                Filters: yield* createAlchemyTagFilters(id),
+              })).SecurityGroups?.[0];
+          if (!sg) return undefined;
+          ...
+          return (yield* hasAlchemyTags(id, tagRecord)) ? attrs : Unowned(attrs);

delete previously also matched e._tag === "ValidationError" && e.message?.includes("DependencyViolation"). That branch was dead — distilled tags DependencyViolation directly via withDependencyViolationError, so the substring match never fired and any real ValidationError was being swallowed silently. Drop it.

New lifecycle tests

  • redeploy with same props is a no-op (reconcile is idempotent)
  • reconcile resets ingress + egress rules mutated out-of-band
  • reconcile re-creates an SG deleted out-of-band
  • changing groupName triggers replace; old SG is deleted
  • adding/removing/changing rules diffs against cloud state
  • destroying an already-deleted SG is a no-op
  • foreign-tagged SG requires adopt(true) to take over

Distilled patch

No patch needed — error categorization for SG flows (InvalidGroup.NotFound, InvalidPermission.NotFound, DependencyViolation) is already correct in distilled.

.worktrees/ joins .gitignore (matching the VPC hardening PR) so per-resource hardening worktrees stay untracked.

…nce tests

- Wrap describeSecurityGroups in a bounded eventual-consistency retry that rides out InvalidGroup.NotFound and the SG-not-yet-visible window immediately after createSecurityGroup. Both the post-create lookup and the final re-read now use the same helper.
- Replace the full-revoke/full-reauthorize rule sync with a canonical-key diff (protocol + port range + source + description), so reconcile only applies the delta. AWS deduplicates identical rule shapes, so this canonicalization is what AWS uses internally.
- read now searches by alchemy tag filters when state lacks a groupId, treats a vanished SG as undefined (so reconcile recreates it), and returns Unowned(attrs) for foreign-tagged SGs to gate adoption behind adopt(true).
- delete previously matched DependencyViolation by both its proper tag AND a brittle e._tag === "ValidationError" && e.message?.includes("DependencyViolation") substring check. Distilled tags DependencyViolation directly via withDependencyViolationError, so the substring branch is dead code that swallowed unrelated ValidationErrors. Drop it.
- Add lifecycle convergence tests: idempotent redeploy, drift convergence after out-of-band ingress/egress mutation, recreate after out-of-band deletion (rides eventual-consistency), replace on groupName change, rule-set diff on add/remove/change, double-destroy idempotency, foreign-tag takeover via adopt(true).

.worktrees/ joins .gitignore (matching the VPC hardening PR) so per-resource hardening worktrees stay untracked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@alchemy-version-bot
Copy link
Copy Markdown
Contributor

Install the packages built from this commit:

alchemy

bun add alchemy@https://pkg.ing/alchemy/df3833b

@alchemy.run/better-auth

bun add @alchemy.run/better-auth@https://pkg.ing/@alchemy.run/better-auth/df3833b

@alchemy.run/pr-package

bun add @alchemy.run/pr-package@https://pkg.ing/@alchemy.run/pr-package/df3833b

@alchemy-version-bot
Copy link
Copy Markdown
Contributor

alchemy-version-bot Bot commented May 5, 2026

Website Preview Deployed

URL: https://alchemyeffectwebsite-worker-pr-222-eeeczzzx4ipb72uw.testing-2b2.workers.dev

Built from commit df3833b.


This comment updates automatically with each push.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant