
Add scheduling options and strategies in CA #9205

Open
walidghallab wants to merge 2 commits into kubernetes:master from walidghallab:scheduler

Conversation

@walidghallab
Contributor

@walidghallab walidghallab commented Feb 10, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add scheduling options and strategies in CA.

To be more exact:

  • An iterator to control the order of nodes to go through.
  • Prefer the smallest number of iterations in case of multiple matches.
  • Add a way of exiting early if no more nodes should be checked.
  • Add some predefined strategies ready to use for simplicity.

Why do we need it? Currently, RunFiltersUntilPassingNode has only one scheduling strategy, which is to always start from the last node it tried. While this works in many cases, there are some cases where we don't want this behavior.

Adding this option is a way to:

  • Provide some version of node scoring.
  • Provide a simple way to avoid spreading pods across nodes without affecting efficiency.
  • Customize the order of nodes that scheduling goes through.
  • Due to parallelism, prefer the match with the earliest iteration (to decrease unnecessary spreading and indeterminism).

The only alternative that exists right now is to run CheckPredicates N times, but this is very slow because:

  • It runs prefilters N times (instead of just once), and prefilters are much slower than filters.
  • It doesn't use parallelism.
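The cost argument can be illustrated with a toy model (the constants and function names are made up; only the shape of the comparison matters):

```go
package main

import "fmt"

// Toy cost model: prefilters are much slower than filters.
const prefilterCost, filterCost = 100, 1

// runOnce models running prefilters once, then filtering each of n nodes.
func runOnce(n int) int { return prefilterCost + n*filterCost }

// checkPredicatesN models calling CheckPredicates once per node, paying
// the prefilter cost every time (and without parallelism).
func checkPredicatesN(n int) int { return n * (prefilterCost + filterCost) }

func main() {
	fmt.Println(runOnce(50), checkPredicatesN(50)) // 150 5050
}
```

With these stand-in costs, the per-node approach pays the dominant prefilter cost N times over.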

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/needs-area labels Feb 10, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: walidghallab
Once this PR has been reviewed and has the lgtm label, please assign x13n for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 10, 2026
@k8s-ci-robot
Contributor

Hi @walidghallab. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added area/cluster-autoscaler size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed do-not-merge/needs-area labels Feb 10, 2026
@walidghallab
Contributor Author

/cc x13n

@k8s-ci-robot k8s-ci-robot requested a review from x13n February 10, 2026 19:04
@walidghallab
Contributor Author

/uncc elmiko

@k8s-ci-robot k8s-ci-robot removed the request for review from elmiko February 10, 2026 19:04
@walidghallab
Contributor Author

/uncc aleksandra-malinowska

@elmiko
Contributor

elmiko commented Feb 10, 2026

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 10, 2026
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 14, 2026
@walidghallab walidghallab force-pushed the scheduler branch 3 times, most recently from 27ab375 to 526fd85 Compare February 18, 2026 22:56
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2026
@walidghallab walidghallab changed the title from Add scheduling strategies in CA to Add scheduling options in CA Feb 18, 2026
@walidghallab walidghallab changed the title from Add scheduling options in CA to Add scheduling options and strategies in CA Feb 18, 2026
@walidghallab walidghallab force-pushed the scheduler branch 2 times, most recently from 3901633 to 8dd10e5 Compare February 19, 2026 21:13
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 19, 2026
@walidghallab walidghallab force-pushed the scheduler branch 9 times, most recently from a831585 to a4606c0 Compare February 20, 2026 12:45
To be more exact:
- An iterator to control the order of nodes to go through.
- Prefer the smallest number of iterations in case of
  multiple matches.
- Add a way of exiting early if no more nodes should be checked.
// NodeOrderMapping defines the order in which nodes are iterated during scheduling simulation.
type NodeOrderMapping interface {
	// Init initializes the mapping with the list of nodes and the index of the last successful match.
	Init(collection []*framework.NodeInfo, lastMatch int)
Member


I wonder if we could avoid passing lastMatch in the argument here, to avoid leaking abstractions. I understand the current implementation starts from last match + 1, but if we changed the semantics to last accessed + 1 instead, you could encapsulate the last index logic entirely in lastIndexOrderMapping rather than splitting the responsibility between it and RunFiltersUntilPassingNode. The point of last match + 1 is to prevent starting from the same node every time, and that property is maintained if you change the semantics to last accessed + 1.

Contributor Author

@walidghallab walidghallab Feb 20, 2026


The thing is, I want to allow having 2 strategies:

  • One that starts from lastMatch + 1 (current).
  • One that starts from lastMatch (this can be useful in the ASN bug we found, for example). This is highly efficient while limiting spreading.

So the current way is more flexible and allows both.

Member


Makes sense. I wonder, though, whether in this case there should be a separate MarkLastMatch(int) method in the interface. Then you could drop the lastIndex field from the plugin runner - no need for it to be stored there.

Contributor Author

@walidghallab walidghallab Feb 20, 2026


I see what you mean. But in that case lastIndex won't be communicated between 2 different NodeOrderings, and some NodeOrderings will need to be global (e.g. one that starts from the last node and one that starts from last node + 1).

Maybe that's okay, but I wanted to have flexibility. Callers who don't need it (e.g. priorityNodeOrderMapping) can simply ignore it in Init calls.

Member


Right, with the current code structure it is possible to sometimes pass one ordering and sometimes another. And you want to pass the information about the last match between these orderings? Actually, the whole last index idea is quite fragile, because you have to ensure the list of nodes returned from the snapshot is identical between calls. Do you have a use case for running the predicates twice for the same pod, on the same snapshot, but with a different ordering?

Contributor Author

@walidghallab walidghallab Feb 20, 2026


you have to ensure the list of nodes returned from snapshot is identical between the calls

True. The detailed answer is:

  • In the basic snapshot, the order always changes (because it is based on a map without a cache, AFAICS).
  • In the delta snapshot (which I assume cloud providers are using), it depends:
    -- The part that comes from before Fork will always be at the beginning of nodeInfoList and in the same order (due to the cache in the parent snapshot).
    -- The whole list will be the same unless add/remove/modify operations are done (due to the cache in the current snapshot).

Do you have a use case for running the predicates twice for the same pod, on the same snapshot, but with different ordering?

No. Because once a pod is matched, I assume it will be cached for the rest of the CA loop.

Actually the whole last index idea is quite fragile

I agree.

Okay, I have made the changes:

  • Removed lastIndex from Init.
  • Added a MarkMatch method to the interface.
  • Added a global DefaultNodeOrdering.
  • Renamed Init to Reset, because Init indicates that it is only done once, but in reality it is called every time a new pod is being scheduled.

I made them in a separate commit so you can compare the diff.
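The revised shape could look roughly like this. This is a hypothetical simplified sketch: the real interface operates on []*framework.NodeInfo and the exact signatures may differ.

```go
package main

import "fmt"

// nodeOrdering mirrors the revised interface shape: Reset is called every
// time a new pod is scheduled, MarkMatch records where a match happened,
// and the ordering itself decides how to use that information.
type nodeOrdering interface {
	Reset(numNodes int)
	MarkMatch(index int)
	Next() (index int, ok bool)
}

// lastIndexOrdering encapsulates the "start after the last match" logic
// entirely, so the plugin runner no longer needs its own lastIndex field.
type lastIndexOrdering struct {
	n, lastMatch, served int
}

func (o *lastIndexOrdering) Reset(numNodes int) {
	o.n = numNodes
	o.served = 0
}

func (o *lastIndexOrdering) MarkMatch(index int) { o.lastMatch = index }

func (o *lastIndexOrdering) Next() (int, bool) {
	if o.served == o.n {
		return 0, false // no more nodes should be checked
	}
	idx := (o.lastMatch + 1 + o.served) % o.n
	o.served++
	return idx, true
}

func main() {
	var o nodeOrdering = &lastIndexOrdering{lastMatch: -1}
	o.Reset(3)
	first, _ := o.Next() // first pod starts at index 0
	o.MarkMatch(first)
	o.Reset(3) // next pod starts right after the previous match
	second, _ := o.Next()
	fmt.Println(first, second)
}
```

Because the last-match state lives inside the ordering, the caller only ever invokes Reset, Next, and MarkMatch.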

Contributor Author


Note: I made defaultNodeOrdering a field in the pluginRunner (not a global one) to avoid race conditions (e.g. when 2 snapshots are created and running in parallel).

  • This way we have the ability to modify the default one in the future for a specific cloud provider (or even for just a specific instance of clustersnapshot) without affecting the others (e.g. by adding SetDefaultNodeOrdering, passing it in the constructor, and/or having it in AutoscalingOptions).

To be more detailed, the changes are:
- Removed lastIndex from Init.
- Added MarkMatch method to the interface.
- Stored defaultNodeOrdering (didn't have a global one, to avoid race
  conditions between different clustersnapshots).
- Renamed Init to Reset, because Init indicates that it is only done once but in reality it is called every time a new pod is being scheduled.

Labels

area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants