OCPEDGE-1829: Add e2e tests for pacemaker out-of-service taint and annotation on two node fencing clusters#31304
OCPEDGE-1829: Add e2e tests for pacemaker out-of-service taint and annotation on two node fencing clusters#31304vimauro wants to merge 2 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughAdds two new files implementing TNF (Two-Node with Fencing) fencing taint support. ChangesTNF Fencing Taint Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@vimauro: This pull request references OCPEDGE-1829 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified by @vimauro on a live TNF cluster with the code changes from those PRs included: openshift/machine-config-operator#6092 and openshift/cluster-etcd-operator#1625 |
|
/label tide/merge-method-squash |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/edge_topologies/tnf_taint.go`:
- Around line 161-166: Add explicit validation of the node count after
retrieving nodes with utils.GetNodes. Before accessing nodes.Items with
rand.Intn() and the modulo arithmetic, verify that the slice contains at least 2
nodes using an expectation check (similar to the existing error check pattern).
This ensures a clear failure message rather than a panic if nodes.Items is
empty, and confirms there are enough nodes for the randomIndex and targetNode
selection logic that follows.
- Around line 95-97: The Stop() method in the taintObserver type will panic if
called more than once because it attempts to close t.stopCh without guarding
against multiple invocations. To fix this, protect the close operation using
sync.Once by adding a sync.Once field to the taintObserver struct and wrapping
the close(t.stopCh) call in the Stop() method with a once.Do() guard to ensure
it only executes on the first call.
In `@test/extended/edge_topologies/utils/services/taint.go`:
- Around line 167-169: The node Update call in the cleanup path is susceptible
to resourceVersion conflicts on this high-churn resource, and a single failed
update can leave taints/annotations behind and contaminate subsequent tests.
Wrap the oc.AdminKubeClient().CoreV1().Nodes().Update() call in a retry loop
that retries on conflict errors (check for ResourceVersion conflict scenarios),
fetching the latest node state before each retry attempt. This ensures the
cleanup operation is more robust and reduces the chance of test contamination
from failed cleanup attempts.
- Around line 120-123: Replace context.Background() with timeout-bounded
contexts in the Kubernetes API calls to prevent indefinite blocking during
control-plane degradation. In the FetchNodeObject function at lines 120-123,
replace the context.Background() passed to the Get() call with a context that
includes a timeout (such as context.WithTimeout). Additionally, apply the same
fix at lines 167-168 where an Update() call also uses context.Background(). Both
calls should use a reasonably bounded timeout context (typically a few seconds)
instead of the unbounded background context to ensure the test can complete
within acceptable timeframes.
- Around line 87-90: The JournalGrepViaDebug and SystemdServiceJournalGrep
functions construct bash commands by directly interpolating unvalidated
variables (tag, unitName, sinceTimestamp, pattern) into fmt.Sprintf calls that
are passed to bash -c commands, creating a potential shell injection vector.
Harden this by adding allow-list validation for each interpolated variable
before use in command construction. Define validation rules (e.g., regex
patterns or character allowlists) appropriate for each variable type to ensure
only expected formats are accepted, and reject any input that does not match the
allow-list. Apply this validation consistently at both the initial
JournalGrepViaDebug function (around the journalctl command construction at
lines 87-90) and the SystemdServiceJournalGrep function (around the systemctl
command construction at lines 128-130).
- Around line 99-100: The date command in the bash expression used by
exutil.DebugNodeRetryWithOptionsAndChroot currently returns a UTC timestamp
without timezone information (using '+%Y-%m-%d %H:%M:%S'), which can be
misinterpreted as local time by journalctl --since. Update the date format
string to include timezone qualification by appending '+00:00' or another
timezone marker to indicate UTC, ensuring journalctl correctly interprets the
timestamp regardless of the host's local timezone setting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: a1b9cc77-3a79-473d-8757-c387c9f66bca
📒 Files selected for processing (2)
test/extended/edge_topologies/tnf_taint.gotest/extended/edge_topologies/utils/services/taint.go
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@vimauro: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@vimauro: This pull request references OCPEDGE-1829 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vimauro The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Scheduling required tests: Scheduling tests matching the |
|
@vimauro: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary by CodeRabbit