Skip to content

HYPERFLEET-1043 - feat: E2E tests for force-delete lifecycle#111

Merged
openshift-merge-bot[bot] merged 3 commits into
openshift-hyperfleet:mainfrom
pnguyen44:HYPERFLEET-1043/force-delete
Jun 2, 2026
Merged

HYPERFLEET-1043 - feat: E2E tests for force-delete lifecycle#111
openshift-merge-bot[bot] merged 3 commits into
openshift-hyperfleet:mainfrom
pnguyen44:HYPERFLEET-1043/force-delete

Conversation

@pnguyen44
Copy link
Copy Markdown
Contributor

@pnguyen44 pnguyen44 commented Jun 2, 2026

Summary

Adds E2E test for the force-delete lifecycle. Validates the escape hatch for clusters stuck in Finalizing state.

HYPERFLEET-1043

Changes

  • E2E tests for the force-delete lifecycle:
    • E2E test for force-deleting a cluster stuck in Finalizing, verifying it and its nodepools are removed
    • Force-deleting a single nodepool does not affect the parent cluster
    • Adapter handles 404 gracefully without crashing or retrying
    • Negative cases: 409, 400, 404
  • Bumped hyperfleet-api-spec to v1.0.15 for force-delete API operations

Test Plan

  • Unit tests added/updated
  • make test-all passes
  • make lint passes
  • Helm chart changes validated with make test-helm (if applicable)
  • Deployed to a development cluster and verified (if Helm/config changes)
  • E2E tests passed (if cross-component or major changes)

@openshift-ci openshift-ci Bot requested review from kuudori and ma-hill June 2, 2026 15:43
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR implements a force-delete escape hatch for clusters stuck in finalization. The change adds an HTTP response validation helper (handleHTTPNoBodyResponse), client API methods to force-delete clusters and nodepools via ForceDeleteRequest with a reason, enhances test cleanup to recognize 404 responses, introduces deferred cleanup registration for Ginkgo tests, and includes a comprehensive e2e test that simulates adapter unavailability, verifies soft-delete persistence, and validates hard deletion with cascade removal of child nodepools.

Sequence Diagram

sequenceDiagram
  participant Test as E2E Test
  participant API as HyperFleet API
  participant Adapter as Cluster Adapter
  
  Test->>API: Create cluster + nodepool
  API->>Adapter: Provision resources
  Adapter-->>API: Reconciled=True
  
  Test->>Adapter: Scale down to 0 (simulate outage)
  
  Test->>API: DELETE /clusters/{id} (soft-delete)
  API-->>Test: Cluster accessible, Reconciled=False
  Test->>API: Poll for hard-deletion
  API-->>Test: Still exists (Adapter unavailable)
  
  Test->>API: POST /clusters/{id}/force-delete
  API-->>Test: 204 No Content (hard-deleted)
  Test->>API: Verify cluster + nodepool
  API-->>Test: HTTP 404 (cascade removal)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding E2E tests for force-delete lifecycle with a clear feature label and issue reference.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Sec-02: Secrets In Log Output ✅ Passed No secrets, tokens, passwords, or credentials found in any log statements across modified files. All logged fields contain safe metadata (IDs, names, paths) and user-provided justification text.
Description check ✅ Passed The PR description directly relates to the changeset: E2E tests for force-delete lifecycle, API spec dependency bump, and helper methods to validate the escape hatch for clusters stuck in Finalizing state.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/client/client.go (1)

78-99: ⚡ Quick win

Extract the shared status-mismatch error construction.

Lines 82-96 duplicate the mismatch path in handleHTTPResponse (54-68) verbatim. A small helper keeps both in sync as the error shape evolves.

♻️ Proposed extraction
+// newHTTPError builds an HTTPError from a non-matching response, reading the body when possible.
+func newHTTPError(resp *http.Response, action string) *HTTPError {
+	body, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return &HTTPError{
+			StatusCode: resp.StatusCode,
+			Action:     action,
+			Body:       fmt.Sprintf("failed to read error response body: %v", err),
+		}
+	}
+	return &HTTPError{
+		StatusCode: resp.StatusCode,
+		Action:     action,
+		Body:       string(body),
+	}
+}
+
 // handleHTTPNoBodyResponse validates the status code for responses with no body (e.g. 204).
 func handleHTTPNoBodyResponse(resp *http.Response, expectedStatus int, action string) error {
 	defer func() { _ = resp.Body.Close() }()
 
 	if resp.StatusCode != expectedStatus {
-		body, err := io.ReadAll(resp.Body)
-		if err != nil {
-			return &HTTPError{
-				StatusCode: resp.StatusCode,
-				Action:     action,
-				Body:       fmt.Sprintf("failed to read error response body: %v", err),
-			}
-		}
-		return &HTTPError{
-			StatusCode: resp.StatusCode,
-			Action:     action,
-			Body:       string(body),
-		}
+		return newHTTPError(resp, action)
 	}
 
 	return nil
 }

handleHTTPResponse can reuse the same helper for its 54-68 block.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/client/client.go` around lines 78 - 99, The status-mismatch error
construction in handleHTTPNoBodyResponse and handleHTTPResponse should be
extracted into a single helper (e.g., buildStatusMismatchError) that takes (resp
*http.Response, expectedStatus int, action string) and returns error; implement
the helper to read and close resp.Body, handle read errors by returning an
*HTTPError with Body set to the read-error message, or otherwise return
*HTTPError with the response body string, and then replace the duplicated blocks
in both handleHTTPNoBodyResponse and handleHTTPResponse to call this helper
instead of duplicating the logic.
e2e/cluster/force_delete.go (1)

110-130: ⚖️ Poor tradeoff

Fix Ginkgo cleanup ordering to avoid leaving a soft-deleted cluster behind on failures

In e2e/cluster/force_delete.go, the ginkgo.DeferCleanup that restores API required adapters is registered before h.DeferClusterCleanup(clusterID); with Ginkgo’s LIFO DeferCleanup, cluster cleanup runs first. That means CleanupTestCluster (DeleteCluster + wait for hard-delete/404) can run while the stuck-adapter is still scaled down (and still present in the required-adapters set), so the DELETE won’t complete until the restore happens—risking a cleanup timeout and leaving the cluster in Finalizing for later specs. Reorder/depend the cleanups so RestoreAPIRequiredAdaptersWithRetry runs before DeferClusterCleanup (similar to how the nodepool suite does it).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e/cluster/force_delete.go` around lines 110 - 130, The cleanup that
restores adapters (ginkgo.DeferCleanup calling
h.RestoreAPIRequiredAdaptersWithRetry) must be registered after the cluster
cleanup registration so it runs first (Ginkgo runs DeferCleanup in LIFO order);
swap the registration order around the cluster creation so
h.DeferClusterCleanup(clusterID) is called before ginkgo.DeferCleanup(...
RestoreAPIRequiredAdaptersWithRetry ...) (or otherwise ensure
RestoreAPIRequiredAdaptersWithRetry is the last DeferCleanup registered) to
guarantee adapters are restored before CleanupTestCluster/DeleteCluster runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@e2e/cluster/force_delete.go`:
- Around line 110-130: The cleanup that restores adapters (ginkgo.DeferCleanup
calling h.RestoreAPIRequiredAdaptersWithRetry) must be registered after the
cluster cleanup registration so it runs first (Ginkgo runs DeferCleanup in LIFO
order); swap the registration order around the cluster creation so
h.DeferClusterCleanup(clusterID) is called before ginkgo.DeferCleanup(...
RestoreAPIRequiredAdaptersWithRetry ...) (or otherwise ensure
RestoreAPIRequiredAdaptersWithRetry is the last DeferCleanup registered) to
guarantee adapters are restored before CleanupTestCluster/DeleteCluster runs.

In `@pkg/client/client.go`:
- Around line 78-99: The status-mismatch error construction in
handleHTTPNoBodyResponse and handleHTTPResponse should be extracted into a
single helper (e.g., buildStatusMismatchError) that takes (resp *http.Response,
expectedStatus int, action string) and returns error; implement the helper to
read and close resp.Body, handle read errors by returning an *HTTPError with
Body set to the read-error message, or otherwise return *HTTPError with the
response body string, and then replace the duplicated blocks in both
handleHTTPNoBodyResponse and handleHTTPResponse to call this helper instead of
duplicating the logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1d341ad6-5251-4ddd-98af-d42af6bd2dc7

📥 Commits

Reviewing files that changed from the base of the PR and between 28bad99 and 5a97794.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum, !**/go.sum
📒 Files selected for processing (9)
  • e2e/cluster/force_delete.go
  • e2e/nodepool/force_delete.go
  • go.mod
  • pkg/client/client.go
  • pkg/client/cluster.go
  • pkg/client/nodepool.go
  • pkg/helper/helper.go
  • test-design/testcases/delete-cluster.md
  • test-design/testcases/delete-nodepool.md

@pnguyen44 pnguyen44 changed the title HYPERFLEET-1043 - feat: add e3e tests for HYPERFLEET-1043 - feat: E2E tests for force-delete lifecycle Jun 2, 2026
@pnguyen44
Copy link
Copy Markdown
Contributor Author

@CodeRabbit The reason is a required field in the API contract. The log mirrors what a real client would emit for observability.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Tip

For best results, initiate chat on the files or code changes.

@pnguyen44 Acknowledged — makes sense. Required field + observability parity with production clients is a solid justification. I'll update my understanding going forward.


📝 Learnings were identified and not saved (knowledge base disabled). Enable

@pnguyen44 pnguyen44 force-pushed the HYPERFLEET-1043/force-delete branch from 5a97794 to 9b073ea Compare June 2, 2026 16:15
@pnguyen44 pnguyen44 force-pushed the HYPERFLEET-1043/force-delete branch from 9b073ea to 232a8da Compare June 2, 2026 16:35
Comment thread e2e/nodepool/force_delete.go Outdated
Comment thread e2e/nodepool/force_delete.go Outdated
// Restore the adapter after the test regardless of outcome
ginkgo.DeferCleanup(func(ctx context.Context) {
ginkgo.By("Restore nodepool adapter to 1 replica")
if err := h.ScaleDeployment(ctx, h.Cfg.Namespace, deploymentName, 1); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip

nit — non-blocking suggestion

Category: Pattern

This restores the adapter to a hardcoded 1 replica, but the original deployment might have a different count. Consider capturing the replica count before scaling down and restoring to that value:

// Before scaling down
originalReplicas, err := h.GetDeploymentReplicas(ctx, h.Cfg.Namespace, deploymentName)
Expect(err).NotTo(HaveOccurred())

// In DeferCleanup
h.ScaleDeployment(ctx, h.Cfg.Namespace, deploymentName, originalReplicas)

If test environments always run 1 replica, feel free to ignore.

Comment thread test-design/testcases/delete-cluster.md Outdated
Comment thread test-design/testcases/delete-cluster.md Outdated
Comment thread test-design/testcases/delete-cluster.md Outdated
Comment thread test-design/testcases/delete-nodepool.md Outdated
@pnguyen44
Copy link
Copy Markdown
Contributor Author

/retest-required

1 similar comment
@pnguyen44
Copy link
Copy Markdown
Contributor Author

/retest-required

@pnguyen44
Copy link
Copy Markdown
Contributor Author

/retest

1 similar comment
@pnguyen44
Copy link
Copy Markdown
Contributor Author

/retest

@pnguyen44
Copy link
Copy Markdown
Contributor Author

/retest-required

2 similar comments
@kuudori
Copy link
Copy Markdown
Contributor

kuudori commented Jun 2, 2026

/retest-required

@pnguyen44
Copy link
Copy Markdown
Contributor Author

/retest-required

@kuudori
Copy link
Copy Markdown
Contributor

kuudori commented Jun 2, 2026

/lgtm

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Jun 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kuudori

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label Jun 2, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 918dbf9 into openshift-hyperfleet:main Jun 2, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants