Skip to content

Comments

Implement EnabledPeriod for scheduled scaling activity#243

Open
artazar wants to merge 3 commits intotruefoundry:mainfrom
artazar:main
Open

Implement EnabledPeriod for scheduled scaling activity#243
artazar wants to merge 3 commits intotruefoundry:mainfrom
artazar:main

Conversation

@artazar
Copy link
Contributor

@artazar artazar commented Feb 4, 2026

Description

Introduce scheduled scale-down activity of KubeElasti.

In short: add enabledPeriod specification for ElastiService custom object that supports adding a cron schedule and duration for the KubeElasti activity. Outside of this configured schedule no scale-down should happen. By default this is set to "always".

Fixes #242

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

  • [+] Added comprehensive tests
  • [+] Build and deployed a custom image and verified that it works as expected in my cluster

Checklist:

  • [+] I have performed a self-review of my own code
  • [+] I have made corresponding changes to the documentation
  • [+] I have added tests that prove my fix is effective or that my feature works
  • [+] New and existing unit tests pass locally with my changes
  • [+] A PR is open to update the helm chart (link)[https://github.com/truefoundry/elasti/tree/main/charts/elasti] if applicable
  • [+] I have updated the CHANGELOG.md file with the changes I made

Additional notes:

Note this has been implemented with the help of Claude Code but under strict supervision and solid integration testing.

Summary by CodeRabbit

  • New Features

    • Scheduled activity for ElastiService: add enabledPeriod (cron schedule + duration) to control when scale-to-zero is allowed.
  • Documentation

    • Configuration guide, examples, defaults (daily midnight / 24h), UTC note, and behavior when omitted or invalid.
  • Behavior

    • Outside enabledPeriod prevents scale-down; invalid/unparsable schedule or duration logs warnings and is treated fail-open.
  • CRD & Samples

    • CRD schema and sample manifests updated to include enabledPeriod and examples.
  • Tests

    • Added unit tests covering parsing and enabled-period logic.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds scheduled enabled periods to ElastiService (cron schedule + duration): new API/CRD fields and deepcopy, a cron utility (parsing/validation/window checks) with tests, integrates enabled-period checks into scaling decision to gate scale-downs, updates docs/samples, and adds the cron dependency.

Changes

Cohort / File(s) Summary
API & CRD
operator/api/v1alpha1/elastiservice_types.go, operator/api/v1alpha1/zz_generated.deepcopy.go, operator/config/crd/bases/elasti.truefoundry.com_elastiservices.yaml
Add EnabledPeriod type and spec.enabledPeriod (optional) with schedule and duration; generate deepcopy funcs and extend CRD schema with defaults.
Documentation & Samples
docs/src/gs-configure-elastiservice.md, operator/config/samples/elasti_v1alpha1_elastiservice.yaml, CHANGELOG.md
Document enabledPeriod fields, semantics, examples, defaults, UTC/cron guidance; add sample manifest using enabledPeriod; add changelog entry.
Cron utility & tests
pkg/cronutil/cronutil.go, pkg/cronutil/cronutil_test.go, pkg/go.mod
New cron utility: ParseCronSchedule, ValidateDuration, IsInEnabledPeriod (and injectable IsInEnabledPeriodAt) plus helper to find last trigger; comprehensive unit tests; add github.com/robfig/cron/v3 dependency.
Scaling logic & tests
pkg/scaling/scale_handler.go, pkg/scaling/scale_handler_test.go
Integrate enabled-period check into scale decision flow to prevent scale-down outside the enabled window; add ScaleHandler.isInEnabledPeriod with defaults/validation and handle errors by warning and assuming enabled; update scale-handler tests.
Operator modules
operator/go.mod
Add indirect dependency on github.com/robfig/cron/v3 required by packages.

Sequence Diagram

sequenceDiagram
    participant ScaleHandler as ScaleHandler
    participant CronUtil as CronUtil
    participant CronParser as CronParser

    ScaleHandler->>ScaleHandler: calculateScaleDirection()
    alt enabledPeriod present
        ScaleHandler->>ScaleHandler: isInEnabledPeriod(enabledPeriod)
        ScaleHandler->>CronUtil: ValidateDuration(duration)
        CronUtil-->>ScaleHandler: duration or error
        alt duration invalid
            ScaleHandler->>ScaleHandler: log warning, assume enabled (prevent scale-down)
        else duration valid
            ScaleHandler->>CronUtil: IsInEnabledPeriod(schedule, duration)
            CronUtil->>CronParser: ParseCronSchedule(5-field)
            CronParser-->>CronUtil: Schedule or error
            alt schedule invalid
                CronUtil-->>ScaleHandler: error
            else schedule valid
                CronUtil->>CronUtil: findLastTriggerTime(now, lookback)
                CronUtil-->>ScaleHandler: bool (inPeriod)
            end
        end
        alt not in enabled period
            ScaleHandler->>ScaleHandler: return ScaleUp (block scale-down)
        else in enabled period
            ScaleHandler->>ScaleHandler: continue normal scaling logic
        end
    else no enabledPeriod
        ScaleHandler->>ScaleHandler: continue normal scaling logic
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble cron lines under moonlit charts,
Windows of running stitched with tiny hearts.
Schedules whisper, durations counted clear,
Nightly hush, daytime cheer.
Hop—scale sleeps softly through the year.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Implement EnabledPeriod for scheduled scaling activity' accurately summarizes the main change: adding EnabledPeriod functionality for controlling when scale-down occurs based on schedule.
Linked Issues check ✅ Passed All coding requirements from issue #242 are met: scheduled scale-down capability via enabledPeriod field with cron schedule support, duration configuration, fail-open defaults, and comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes are in scope. Beyond core EnabledPeriod implementation, sample YAML updates provide helpful documentation context, and go.mod updates add only the required cron library dependency.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@artazar
Copy link
Contributor Author

artazar commented Feb 4, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 4, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@docs/src/gs-configure-elastiservice.md`:
- Around line 146-189: Add a single trailing newline at the end of the markdown
file that contains the "### **5. EnabledPeriod: Control when scale-to-zero is
active (Optional)**" section so the file ends with a newline character (fixes
the markdownlint missing-trailing-newline warning); simply open the document and
ensure the last line is followed by one newline and save.

In `@pkg/cronutil/cronutil_test.go`:
- Around line 122-195: TestIsInEnabledPeriod currently only checks for errors
and ignores tt.want and tt.mockNow; add a time-injectable helper
IsInEnabledPeriodAt(schedule string, duration time.Duration, now time.Time)
(bool, error) in pkg/cronutil/cronutil.go that parses the schedule (use
ParseCronSchedule), finds the last trigger (reuse/findLastTriggerTime) within
duration+24h, computes enabledUntil := lastTrigger.Add(duration), and returns
now.UTC().Before(enabledUntil); make IsInEnabledPeriod call
IsInEnabledPeriodAt(time.Now().UTC()). Update TestIsInEnabledPeriod to call
IsInEnabledPeriodAt(tt.schedule, tt.duration, tt.mockNow) and assert the
returned bool equals tt.want (and still check error expectations for invalid
schedules).

In `@pkg/cronutil/cronutil.go`:
- Around line 39-57: The current findLastTriggerTime implementation may iterate
every occurrence in the lookback window for high-frequency schedules; update
findLastTriggerTime to avoid linear scans by performing an exponential/backoff
search using schedule.Next to quickly find a time window that contains the last
trigger, then perform a binary search (by probing schedule.Next at midpoints)
between the last known-before-now and the first-after-now to locate the exact
lastTrigger; keep references to schedule, earliestTime/now/lookbackWindow,
triggerTime and lastTrigger when describing the probes and ensure you still
return lastTrigger (zero if none found).

@artazar artazar force-pushed the main branch 2 times, most recently from cf8c414 to 80168e6 Compare February 5, 2026 02:03
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@operator/api/v1alpha1/elastiservice_types.go`:
- Around line 29-41: The EnabledPeriod struct accepts arbitrary Schedule and
Duration strings; add CRD-level validation so malformed cron or duration values
are rejected early: add kubebuilder validation tags to the Schedule and Duration
fields on EnabledPeriod (e.g.,
+kubebuilder:validation:Pattern="<cron-5-field-regex>" for Schedule and
+kubebuilder:validation:Pattern="<duration-regex>" for Duration) to enforce a
5-item UTC cron and supported time-unit formats, or alternatively implement an
admission webhook validating EnabledPeriod.Schedule and EnabledPeriod.Duration
in the webhook handler (validate functions that parse/regex-check values) so
invalid values are denied at create/update.

In `@pkg/cronutil/cronutil_test.go`:
- Around line 197-225: The subtests in TestIsInEnabledPeriodEdgeCases are
non-deterministic because they rely on time.Now(); change both subtests to use
fixed UTC timestamps (e.g., time.Date(...).UTC()) to construct the cron
"schedule" and to compute any expected boundaries, call
IsInEnabledPeriod(schedule, duration) and assert the returned boolean (use the
returned value for the "just after trigger time" test and for the "long
duration" test assert true). Update references in the test to use the fixed time
instead of time.Now() and ensure you check the returned got value (and err) in
each subtest so the assertions are deterministic and explicit.

In `@pkg/scaling/scale_handler_test.go`:
- Around line 53-71: The test helper testPollingIntervalParsing duplicates
production logic and diverges on env var name/validation; extract the
parsing/validation into a single production function (e.g., ParsePollingInterval
or GetPollingInterval) that reads the correct env var (with optional legacy
fallback), applies the same rules (ParseDuration, positive check, default 30s,
and emits warnings via the same logger), update StartScaleDownWatcher to call
that new function, and make the test call the production function instead of
using testPollingIntervalParsing so both code paths share identical behavior.

In `@pkg/scaling/scale_handler.go`:
- Around line 184-197: The enabled-period check currently logs and falls through
on error from h.isInEnabledPeriod, which can allow scale-down; change the error
path to "fail closed" by treating parse/validation errors as not enabled: when
h.isInEnabledPeriod(es.Spec.EnabledPeriod) returns an error, log the warning
(h.logger.Warn...) and return ScaleUp, nil (or otherwise prevent scale-down /
enforce min replicas) so the handler blocks scale-down on EnabledPeriod
evaluation failures.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/scaling/scale_handler.go`:
- Around line 187-201: The inline comment in the enabled-period check is stale:
when isInEnabledPeriod returns an error the code returns ScaleUp (preventing
scale-down) but the comment says "assume enabled to avoid disruption." Update
that comment to reflect the actual behavior in the block (e.g., "On error,
default to preventing scale-down" or "assume disabled — prevent scale-down")
near the isInEnabledPeriod call and the return ScaleUp to avoid confusion; no
logic change to the return or h.logger usage.

Copy link
Collaborator

@ramantehlan ramantehlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/scaling/scale_handler.go (1)

187-200: The fail-closed behavior (Lines 190–194) was previously flagged and addressed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scaling/scale_handler.go` around lines 187 - 200, The current fail-closed
behavior in the enabled-period check (using es.Spec.EnabledPeriod and
h.isInEnabledPeriod) returns ScaleUp on error which prevents scale-down; change
this to fail-open so that on error you log the warning via h.logger.Warn
(including service and err) but return ScaleDown, nil instead of ScaleUp, nil,
while keeping the existing branch that returns ScaleUp when enabled==false
(outside period) unchanged.
ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71f0129 and 5c026fd.

📒 Files selected for processing (1)
  • pkg/scaling/scale_handler.go
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/scaling/scale_handler.go`:
- Around line 195-199: The code currently returns ScaleUp when
calculateScaleDirection indicates scaling outside the enabled window, which
refreshes UpdateLastScaledUpTime and prevents subsequent scale-to-zero; change
that branch in calculateScaleDirection/handleScaleFromZero to return NoScale
instead of ScaleUp (and ensure you do not call UpdateLastScaledUpTime when no
actual scale-up will occur) so the cooldown timestamp isn't updated on these
polling cycles and scale-down can proceed when the window opens.

---

Duplicate comments:
In `@pkg/scaling/scale_handler.go`:
- Around line 187-200: The current fail-closed behavior in the enabled-period
check (using es.Spec.EnabledPeriod and h.isInEnabledPeriod) returns ScaleUp on
error which prevents scale-down; change this to fail-open so that on error you
log the warning via h.logger.Warn (including service and err) but return
ScaleDown, nil instead of ScaleUp, nil, while keeping the existing branch that
returns ScaleUp when enabled==false (outside period) unchanged.

@artazar
Copy link
Contributor Author

artazar commented Feb 23, 2026

@ramantehlan good from my side, please check once more, thank you!

@artazar artazar requested a review from ramantehlan February 24, 2026 04:41
@ramantehlan
Copy link
Collaborator

Thank you @artazar, will have a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Scheduled scale-down activity

2 participants