
fix: make instance profile naming deterministic to prevent churn#9009

Open
tal19987 wants to merge 1 commit into aws:main from tal19987:fix/deterministic-instance-profile-naming


@tal19987 tal19987 commented Mar 9, 2026

Summary

Fixes #8838

  • Replace uuid.New() with in.Spec.Role in InstanceProfileName() hash input, making profile names deterministic per (nodeclass, role) pair. This prevents runaway instance profile creation caused by IAM eventual consistency triggering the creation branch with a new UUID-based name each time.
  • Normalize role name comparison in the reconciler by stripping IAM path prefixes before comparing currentRole with spec.Role, preventing false mismatches when roles use IAM paths.
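The path normalization in the second bullet can be illustrated with a minimal sketch. The helper names `normalizeRoleName` and `rolesMatch` are hypothetical and not taken from the PR diff; the sketch assumes the role is given either as a bare name or with a leading IAM path such as `/team/platform/`.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRoleName is a hypothetical helper: it drops any IAM path prefix
// (for example "/team/platform/") and keeps only the final role name.
func normalizeRoleName(role string) string {
	if i := strings.LastIndex(role, "/"); i >= 0 {
		return role[i+1:]
	}
	return role
}

// rolesMatch compares two role references after normalization, so a role
// reported without its path still matches a spec that includes the path.
func rolesMatch(currentRole, specRole string) bool {
	return normalizeRoleName(currentRole) == normalizeRoleName(specRole)
}

func main() {
	// Same role, one side carries an IAM path: treated as equal.
	fmt.Println(rolesMatch("KarpenterNodeRole", "/team/platform/KarpenterNodeRole"))
	// Genuinely different roles still compare as different.
	fmt.Println(rolesMatch("KarpenterNodeRole", "OtherRole"))
}
```

Comparing only the final path segment is one possible normalization; the actual reconciler change may differ in detail.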

Root Cause

Two reinforcing bugs caused a sawtooth pattern in instance profile counts after upgrading from v1.6 to v1.7+:

  1. Non-deterministic naming: InstanceProfileName() used uuid.New(), so every call produced a different name
  2. False empty currentRole: IAM eventual consistency (empty roles, NotFound), empty status, or controller restarts caused the reconciler to think no profile existed

Together: cache expires → IAM returns stale data → role mismatch → new UUID-named profile created → old profile orphaned → GC cleans up → repeat every ~15 min.

Fix

Making InstanceProfileName() deterministic by including spec.Role in the hash means that even when the reconciler falsely enters the creation branch, it generates the same name. The provider's Create finds the existing profile, calls ensureRole (idempotent), and returns, so no orphan is created.
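Why a stable name makes a spurious creation attempt harmless can be shown with a small in-memory model. This is a sketch under stated assumptions: `fakeIAM` is a hypothetical stand-in for the IAM API, and its `Create` and `ensureRole` methods only mimic the behavior described above, not the provider's real code.

```go
package main

import "fmt"

// fakeIAM is a hypothetical in-memory stand-in for the IAM API.
type fakeIAM struct {
	profiles map[string]string // profile name -> attached role
	creates  int               // number of profiles actually created
}

// Create is idempotent: an existing profile is reused and only the role
// attachment is (re)applied.
func (f *fakeIAM) Create(name, role string) {
	if _, ok := f.profiles[name]; !ok {
		f.profiles[name] = ""
		f.creates++
	}
	f.ensureRole(name, role)
}

// ensureRole attaches the role; calling it repeatedly has no further effect.
func (f *fakeIAM) ensureRole(name, role string) {
	f.profiles[name] = role
}

func main() {
	iam := &fakeIAM{profiles: map[string]string{}}
	// The reconciler enters the creation branch three times, but the
	// (illustrative) deterministic name is the same on every pass.
	for i := 0; i < 3; i++ {
		iam.Create("cluster_1234567890", "KarpenterNodeRole")
	}
	fmt.Println(len(iam.profiles), iam.creates) // prints "1 1"
}
```

With a UUID-based name, the same loop would pass a fresh name to each call and leave three profiles behind.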

Role changes still produce different profile names (different role → different hash), preserving zero-downtime role switching.
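The naming property described here (same inputs give the same name, a different role gives a different name) can be sketched as follows. The function `profileName`, its parameters, and the use of FNV-1a from the standard library are assumptions for illustration; the real InstanceProfileName() may use a different hash function and different inputs.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// profileName is a hypothetical stand-in for InstanceProfileName(): it hashes
// region, nodeclass name, and role instead of a random UUID.
func profileName(clusterName, region, nodeClassName, role string) string {
	h := fnv.New64a()
	for _, part := range []string{region, nodeClassName, role} {
		h.Write([]byte(part))
		h.Write([]byte{0}) // separator so ("ab","c") differs from ("a","bc")
	}
	return fmt.Sprintf("%s_%d", clusterName, h.Sum64())
}

func main() {
	a := profileName("my-cluster", "us-east-1", "default", "RoleA")
	b := profileName("my-cluster", "us-east-1", "default", "RoleA")
	c := profileName("my-cluster", "us-east-1", "default", "RoleB")

	fmt.Println(a == b) // same (nodeclass, role) pair: same name on every call
	fmt.Println(a == c) // a different role is expected to hash to a different name
}
```

Because the role is part of the hash input, switching roles yields a new profile alongside the old one, which is what keeps the role switch free of downtime.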

Edge Cases

| Scenario | Behavior |
| --- | --- |
| Upgrading from v1.6 (legacy names) | Get(legacyName) finds profile, role matches, no creation |
| Upgrading from v1.7/v1.8 (UUID names) | Get(uuidName) finds profile, role matches, no creation |
| Status patch failure + controller restart | Deterministic name regenerated, Create finds existing profile |
| Role change | New deterministic name, new profile, old one cleaned by GC |
| Role with IAM path prefix | Path normalization prevents false mismatches |

…#8838)

Replace uuid.New() with in.Spec.Role in InstanceProfileName() hash input,
making profile names deterministic per (nodeclass, role) pair. This prevents
runaway instance profile creation caused by IAM eventual consistency
triggering the creation branch with a new UUID-based name each time.

Additionally, normalize role name comparison in the reconciler by stripping
IAM path prefixes before comparing currentRole with spec.Role, preventing
false mismatches when roles use IAM paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tal19987 tal19987 requested a review from a team as a code owner March 9, 2026 13:50
@tal19987 tal19987 requested a review from jmdeal March 9, 2026 13:50
Development

Successfully merging this pull request may close these issues.

Karpenter creates hundreds of InstanceProfiles
