Summary
When a User resource is deleted, OrganizationMemberships referencing that user persist with Ready=False (UserNotFound), leaving organizations in an orphaned state with no accessible owner. These organizations cannot be managed by anyone and require manual admin intervention to clean up.
Current Behavior
User Deleted
↓
OrganizationMembership persists (Ready=False, UserNotFound)
↓
Organization exists with "owner" membership pointing to deleted user
↓
No one can access the organization
↓
Webhook blocks cleanup because it sees an "owner" membership exists
↓
Organization is permanently orphaned ❌
Observed in Production
Three OrganizationMemberships were found with Ready=False and UserNotFound:
- Memberships referencing user IDs that no longer exist
- Organizations aged 81-133 days in orphaned state
- Manual deletion of organizations was required to clean up
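To check whether other memberships are in the same state, a filter along the following lines could be used. This is only a sketch: `Condition` and `Membership` are simplified stand-ins invented for illustration, not the real API types, and the only things taken from this issue are the `Ready=False` status and the `UserNotFound` reason.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// Membership is a hypothetical, trimmed-down view of an OrganizationMembership.
type Membership struct {
	Name       string
	UserName   string
	Conditions []Condition
}

// isOrphaned reports whether the membership carries Ready=False with
// reason UserNotFound, the state described in this issue.
func isOrphaned(m Membership) bool {
	for _, c := range m.Conditions {
		if c.Type == "Ready" && c.Status == "False" && c.Reason == "UserNotFound" {
			return true
		}
	}
	return false
}

// orphanedMemberships returns the subset of memberships that are orphaned.
func orphanedMemberships(all []Membership) []Membership {
	var out []Membership
	for _, m := range all {
		if isOrphaned(m) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Illustrative data only; names are made up.
	all := []Membership{
		{Name: "m-1", UserName: "gone", Conditions: []Condition{{Type: "Ready", Status: "False", Reason: "UserNotFound"}}},
		{Name: "m-2", UserName: "alice", Conditions: []Condition{{Type: "Ready", Status: "True", Reason: "Ready"}}},
	}
	for _, m := range orphanedMemberships(all) {
		fmt.Println("orphaned membership:", m.Name, "user:", m.UserName)
	}
}
```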
Root Cause Analysis
| Component | File | Issue |
|---|---|---|
| OrganizationMembership Controller | organization_membership_controller.go:131-149 | When user not found, marks Ready=False and returns; does NOT delete the membership |
| OrganizationMembership Webhook | organizationmembership_webhook.go:80-117 | Blocks deletion of "owner" membership without verifying the owner user actually exists |
| Organization Controller | organization_controller.go | No logic to detect orgs with no valid owners |
| User Controller/Webhook | user_controller.go, user_webhook.go | No cleanup of dependent resources on user deletion |
Proposed Solution
Keep cleanup logic in the ResourceManager domain (not in User/IAM domain) to maintain separation of concerns:
1. Update OrganizationMembership Webhook
Allow deletion when the referenced user doesn't exist:

    func (v *OrganizationMembershipValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
    	membership, ok := obj.(*resourcemanagerv1alpha1.OrganizationMembership)
    	if !ok {
    		return nil, fmt.Errorf("expected an OrganizationMembership, got %T", obj)
    	}

    	// Allow deletion if the referenced user no longer exists.
    	var user iamv1alpha1.User
    	if err := v.client.Get(ctx, client.ObjectKey{Name: membership.Spec.UserRef.Name}, &user); err != nil {
    		if apierrors.IsNotFound(err) {
    			return nil, nil // User gone, allow cleanup
    		}
    		// Any other lookup error falls through to the existing validation below.
    	}

    	// ... rest of existing validation ...
    }

2. Update OrganizationMembership Controller
Delete memberships when the referenced user no longer exists:

    // In Reconcile, after the user lookup returns NotFound:
    if apierrors.IsNotFound(err) {
    	logger.Info("user deleted, cleaning up membership", "user", organizationMembership.Spec.UserRef.Name)
    	if err := r.Client.Delete(ctx, &organizationMembership); err != nil {
    		// The membership may already be gone; treat NotFound as success.
    		return ctrl.Result{}, client.IgnoreNotFound(err)
    	}
    	return ctrl.Result{}, nil
    }

3. Update Organization Controller
Watch OrganizationMemberships and delete organization when no valid owners remain:

    func (r *OrganizationController) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	// ... existing code ...

    	// Delete the organization if it has no valid owners.
    	hasNoValidOwners, err := r.checkForValidOwners(ctx, &organization)
    	if err != nil {
    		return ctrl.Result{}, err
    	}
    	if hasNoValidOwners {
    		logger.Info("organization has no valid owners, deleting", "org", organization.Name)
    		if err := r.Client.Delete(ctx, &organization); err != nil {
    			// The organization may already be gone; treat NotFound as success.
    			return ctrl.Result{}, client.IgnoreNotFound(err)
    		}
    		return ctrl.Result{}, nil
    	}

    	// ... rest of existing code ...
    }

Expected Flow After Fix
User Deleted
↓
OrganizationMembership controller detects user gone
↓
Deletes the membership (webhook allows this since user doesn't exist)
↓
Organization controller detects no valid owners
↓
Deletes the organization ✅
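Step 3 relies on a `checkForValidOwners` helper that this issue does not define. The sketch below shows one way its core decision could work, using plain Go types so it runs without a cluster. Everything here is an assumption for illustration: the `Membership` struct, the `Role` string with the value `"owner"`, and the `userExists` callback are hypothetical stand-ins for the real API types and client lookups.

```go
package main

import "fmt"

// Membership is a hypothetical, simplified view of an OrganizationMembership.
type Membership struct {
	OrgName  string
	UserName string
	Role     string // assumed role encoding; the real API may differ
}

// noValidOwners reports whether the organization has no "owner" membership
// whose referenced user still exists. userExists stands in for a client
// lookup of the User resource.
func noValidOwners(org string, memberships []Membership, userExists func(name string) bool) bool {
	for _, m := range memberships {
		if m.OrgName != org || m.Role != "owner" {
			continue
		}
		if userExists(m.UserName) {
			return false // at least one owner can still manage the org
		}
	}
	return true
}

func main() {
	// Illustrative data only; names are made up.
	users := map[string]bool{"alice": true}
	exists := func(name string) bool { return users[name] }

	memberships := []Membership{
		{OrgName: "acme", UserName: "alice", Role: "owner"},
		{OrgName: "orphan", UserName: "deleted-user", Role: "owner"},
	}

	for _, org := range []string{"acme", "orphan"} {
		fmt.Printf("org=%s noValidOwners=%t\n", org, noValidOwners(org, memberships, exists))
	}
}
```

Note that this sketch also returns true for an organization that has no memberships at all, so a real controller would probably need to guard against deleting organizations that were only just created.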
Additional Considerations
- Grace period: Consider adding a grace period before deleting memberships/orgs to handle temporary user states
- Audit logging: Log organization deletions triggered by this cleanup for audit purposes
- Notifications: Consider notifying other org members before deletion if the org has non-owner members
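The grace period idea could be expressed as a small decision function, sketched below. The names `cleanupDecision`, `defaultGrace`, and `demoStart` are hypothetical, and the 15-minute value is an arbitrary placeholder. Where the "missing since" timestamp comes from is also an assumption; one option would be the time the Ready condition last transitioned to False.

```go
package main

import (
	"fmt"
	"time"
)

// defaultGrace is a placeholder value chosen for illustration only.
const defaultGrace = 15 * time.Minute

// demoStart is a fixed timestamp so the example is deterministic.
var demoStart = time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)

// cleanupDecision reports whether the grace period has elapsed since the
// user was first observed missing. If it has not, it also returns how long
// to wait before checking again.
func cleanupDecision(missingSince, now time.Time, grace time.Duration) (deleteNow bool, requeueAfter time.Duration) {
	elapsed := now.Sub(missingSince)
	if elapsed >= grace {
		return true, 0
	}
	return false, grace - elapsed
}

func main() {
	for _, elapsed := range []time.Duration{5 * time.Minute, 20 * time.Minute} {
		del, wait := cleanupDecision(demoStart, demoStart.Add(elapsed), defaultGrace)
		fmt.Printf("elapsed=%s delete=%t requeueAfter=%s\n", elapsed, del, wait)
	}
}
```

In a controller-runtime reconciler, the second return value could feed `ctrl.Result{RequeueAfter: ...}` so the membership is re-checked once the grace period ends, instead of being deleted the first time the user lookup fails.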
Labels
- bug
- resourcemanager
- iam