Implement EnabledPeriod for scheduled scaling activity#243
Implement EnabledPeriod for scheduled scaling activity#243artazar wants to merge 3 commits intotruefoundry:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds scheduled enabled periods to ElastiService (cron Changes
Sequence DiagramsequenceDiagram
participant ScaleHandler as ScaleHandler
participant CronUtil as CronUtil
participant CronParser as CronParser
ScaleHandler->>ScaleHandler: calculateScaleDirection()
alt enabledPeriod present
ScaleHandler->>ScaleHandler: isInEnabledPeriod(enabledPeriod)
ScaleHandler->>CronUtil: ValidateDuration(duration)
CronUtil-->>ScaleHandler: duration or error
alt duration invalid
ScaleHandler->>ScaleHandler: log warning, assume enabled (prevent scale-down)
else duration valid
ScaleHandler->>CronUtil: IsInEnabledPeriod(schedule, duration)
CronUtil->>CronParser: ParseCronSchedule(5-field)
CronParser-->>CronUtil: Schedule or error
alt schedule invalid
CronUtil-->>ScaleHandler: error
else schedule valid
CronUtil->>CronUtil: findLastTriggerTime(now, lookback)
CronUtil-->>ScaleHandler: bool (inPeriod)
end
end
alt not in enabled period
ScaleHandler->>ScaleHandler: return ScaleUp (block scale-down)
else in enabled period
ScaleHandler->>ScaleHandler: continue normal scaling logic
end
else no enabledPeriod
ScaleHandler->>ScaleHandler: continue normal scaling logic
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@docs/src/gs-configure-elastiservice.md`:
- Around line 146-189: Add a single trailing newline at the end of the markdown
file that contains the "### **5. EnabledPeriod: Control when scale-to-zero is
active (Optional)**" section so the file ends with a newline character (fixes
the markdownlint missing-trailing-newline warning); simply open the document and
ensure the last line is followed by one newline and save.
In `@pkg/cronutil/cronutil_test.go`:
- Around line 122-195: TestIsInEnabledPeriod currently only checks for errors
and ignores tt.want and tt.mockNow; add a time-injectable helper
IsInEnabledPeriodAt(schedule string, duration time.Duration, now time.Time)
(bool, error) in pkg/cronutil/cronutil.go that parses the schedule (use
ParseCronSchedule), finds the last trigger (reuse/findLastTriggerTime) within
duration+24h, computes enabledUntil := lastTrigger.Add(duration), and returns
now.UTC().Before(enabledUntil); make IsInEnabledPeriod call
IsInEnabledPeriodAt(time.Now().UTC()). Update TestIsInEnabledPeriod to call
IsInEnabledPeriodAt(tt.schedule, tt.duration, tt.mockNow) and assert the
returned bool equals tt.want (and still check error expectations for invalid
schedules).
In `@pkg/cronutil/cronutil.go`:
- Around line 39-57: The current findLastTriggerTime implementation may iterate
every occurrence in the lookback window for high-frequency schedules; update
findLastTriggerTime to avoid linear scans by performing an exponential/backoff
search using schedule.Next to quickly find a time window that contains the last
trigger, then perform a binary search (by probing schedule.Next at midpoints)
between the last known-before-now and the first-after-now to locate the exact
lastTrigger; keep references to schedule, earliestTime/now/lookbackWindow,
triggerTime and lastTrigger when describing the probes and ensure you still
return lastTrigger (zero if none found).
cf8c414 to
80168e6
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@operator/api/v1alpha1/elastiservice_types.go`:
- Around line 29-41: The EnabledPeriod struct accepts arbitrary Schedule and
Duration strings; add CRD-level validation so malformed cron or duration values
are rejected early: add kubebuilder validation tags to the Schedule and Duration
fields on EnabledPeriod (e.g.,
+kubebuilder:validation:Pattern="<cron-5-field-regex>" for Schedule and
+kubebuilder:validation:Pattern="<duration-regex>" for Duration) to enforce a
5-item UTC cron and supported time-unit formats, or alternatively implement an
admission webhook validating EnabledPeriod.Schedule and EnabledPeriod.Duration
in the webhook handler (validate functions that parse/regex-check values) so
invalid values are denied at create/update.
In `@pkg/cronutil/cronutil_test.go`:
- Around line 197-225: The subtests in TestIsInEnabledPeriodEdgeCases are
non-deterministic because they rely on time.Now(); change both subtests to use
fixed UTC timestamps (e.g., time.Date(...).UTC()) to construct the cron
"schedule" and to compute any expected boundaries, call
IsInEnabledPeriod(schedule, duration) and assert the returned boolean (use the
returned value for the "just after trigger time" test and for the "long
duration" test assert true). Update references in the test to use the fixed time
instead of time.Now() and ensure you check the returned got value (and err) in
each subtest so the assertions are deterministic and explicit.
In `@pkg/scaling/scale_handler_test.go`:
- Around line 53-71: The test helper testPollingIntervalParsing duplicates
production logic and diverges on env var name/validation; extract the
parsing/validation into a single production function (e.g., ParsePollingInterval
or GetPollingInterval) that reads the correct env var (with optional legacy
fallback), applies the same rules (ParseDuration, positive check, default 30s,
and emits warnings via the same logger), update StartScaleDownWatcher to call
that new function, and make the test call the production function instead of
using testPollingIntervalParsing so both code paths share identical behavior.
In `@pkg/scaling/scale_handler.go`:
- Around line 184-197: The enabled-period check currently logs and falls through
on error from h.isInEnabledPeriod, which can allow scale-down; change the error
path to "fail closed" by treating parse/validation errors as not enabled: when
h.isInEnabledPeriod(es.Spec.EnabledPeriod) returns an error, log the warning
(h.logger.Warn...) and return ScaleUp, nil (or otherwise prevent scale-down /
enforce min replicas) so the handler blocks scale-down on EnabledPeriod
evaluation failures.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pkg/scaling/scale_handler.go`:
- Around line 187-201: The inline comment in the enabled-period check is stale:
when isInEnabledPeriod returns an error the code returns ScaleUp (preventing
scale-down) but the comment says "assume enabled to avoid disruption." Update
that comment to reflect the actual behavior in the block (e.g., "On error,
default to preventing scale-down" or "assume disabled — prevent scale-down")
near the isInEnabledPeriod call and the return ScaleUp to avoid confusion; no
logic change to the return or h.logger usage.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
pkg/scaling/scale_handler.go (1)
187-200: The fail-closed behavior (Lines 190–194) was previously flagged and addressed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/scaling/scale_handler.go` around lines 187 - 200, The current fail-closed behavior in the enabled-period check (using es.Spec.EnabledPeriod and h.isInEnabledPeriod) returns ScaleUp on error which prevents scale-down; change this to fail-open so that on error you log the warning via h.logger.Warn (including service and err) but return ScaleDown, nil instead of ScaleUp, nil, while keeping the existing branch that returns ScaleUp when enabled==false (outside period) unchanged.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
pkg/scaling/scale_handler.go
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/scaling/scale_handler.go`:
- Around line 195-199: The code currently returns ScaleUp when
calculateScaleDirection indicates scaling outside the enabled window, which
refreshes UpdateLastScaledUpTime and prevents subsequent scale-to-zero; change
that branch in calculateScaleDirection/handleScaleFromZero to return NoScale
instead of ScaleUp (and ensure you do not call UpdateLastScaledUpTime when no
actual scale-up will occur) so the cooldown timestamp isn't updated on these
polling cycles and scale-down can proceed when the window opens.
---
Duplicate comments:
In `@pkg/scaling/scale_handler.go`:
- Around line 187-200: The current fail-closed behavior in the enabled-period
check (using es.Spec.EnabledPeriod and h.isInEnabledPeriod) returns ScaleUp on
error which prevents scale-down; change this to fail-open so that on error you
log the warning via h.logger.Warn (including service and err) but return
ScaleDown, nil instead of ScaleUp, nil, while keeping the existing branch that
returns ScaleUp when enabled==false (outside period) unchanged.
|
@ramantehlan good from my side, please check once more, thank you! |
|
Thank you @artazar, will have a look |
Description
Introduce scheduled scale-down activity of KubeElasti.
In short: add enabledPeriod specification for ElastiService custom object that supports adding a cron schedule and duration for the KubeElasti activity. Outside of this configured schedule no scale-down should happen. By default this is set to "always".
Fixes #242
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist:
Additional notes:
Note this has been implemented with the help of Claude Code but under strict supervision and solid integration testing.
Summary by CodeRabbit
New Features
Documentation
Behavior
CRD & Samples
Tests