Introduce registry for caching and exposing TemplateNodeInfos #8911
/assign @towca
/cherry-pick cluster-autoscaler-release-1.35
// NewTestProcessors returns a set of simple processors for use in tests.
// Note: This function injects a default TemplateNodeInfoRegistry into the provided AutoscalingContext.
// This is a necessary workaround for synthetic tests that manually construct the context without using NewStaticAutoscaler, ensuring they have access to the registry.
func NewTestProcessors(autoscalingCtx *ca_context.AutoscalingContext) *processors.AutoscalingProcessors {
I get that this was the easiest change to make the tests pass, but unfortunately little hacks like these make the tests really hard to understand and extend.
Looking at the usages of this function, it's ~always called after NewScaleTestAutoscalingContext(). IMO the order should be switched to match the prod path: processors are a dependency of the context, not the other way around. NewScaleTestAutoscalingContext() should either take the processors as a parameter, or call NewTestProcessors() internally. NewTestProcessors() technically depends on the full context now, but it only uses a small subset of it, config.AutoscalingOptions, which is also used as a parameter to NewScaleTestAutoscalingContext(). Have you explored something like that?
I decided to:
- decouple NewTestProcessors from autoscalingCtx and depend only on config.AutoscalingOptions
- update NewScaleTestAutoscalingContext to accept TemplateNodeInfoRegistry, as in the original NewAutoscalingContext
- reorder test initialization: create options -> create processors & registry -> create context
This aligns the test setup with the production architecture and improves readability and safety.
Adding it in a separate commit to streamline review. Let me know if you want it squashed eventually.
LGTM, thanks a lot for this!
towca left a comment:
Some final comments for the test code, but in general LGTM!
"node_5_Dra_Unready": false,
},
},
"2 DRA node group, single driver multiple pools, more pools published including template pools": {
Was this case intentionally removed? Why?
I guess it should not have been removed. Reverted.
/test pull-cluster-autoscaler-e2e-azure-master
…NodeInfos
This change introduces a new component, TemplateNodeInfoRegistry, which wraps the existing TemplateNodeInfoProvider. It caches the computed template NodeInfos and exposes them via a thread-safe interface. This registry is added to the AutoscalingContext, allowing processors (like the DRA processor) to access the cached templates instead of relying on the less reliable NodeGroup.TemplateNodeInfo().
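For readers unfamiliar with the pattern, a minimal sketch of a caching wrapper with a thread-safe read path could look like the following. This is not the code from this PR: `NodeInfo`, `TemplateProvider`, `Registry`, `Get`, and `staticProvider` are simplified, hypothetical stand-ins, and only the idea of a `Recompute` step that refreshes the cache is taken from the commit messages.

```go
package main

import (
	"fmt"
	"sync"
)

// NodeInfo is a simplified, hypothetical stand-in for the autoscaler's template NodeInfo.
type NodeInfo struct {
	NodeGroupID string
	CPUMillis   int64
}

// TemplateProvider is a hypothetical stand-in for the wrapped TemplateNodeInfoProvider.
type TemplateProvider interface {
	Process() (map[string]*NodeInfo, error)
}

// Registry caches the templates computed by the wrapped provider (illustrative only).
type Registry struct {
	provider TemplateProvider
	mu       sync.RWMutex
	cache    map[string]*NodeInfo
}

// NewRegistry wraps a provider; the cache stays empty until Recompute is called.
func NewRegistry(p TemplateProvider) *Registry {
	return &Registry{provider: p, cache: map[string]*NodeInfo{}}
}

// Recompute refreshes the cache; on error the previous cache is kept.
func (r *Registry) Recompute() error {
	infos, err := r.provider.Process()
	if err != nil {
		return err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.cache = infos
	return nil
}

// Get returns the cached template for a node group ID, if present.
func (r *Registry) Get(id string) (*NodeInfo, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	info, ok := r.cache[id]
	return info, ok
}

// staticProvider is a fake provider that always returns the same templates.
type staticProvider struct {
	infos map[string]*NodeInfo
}

func (s staticProvider) Process() (map[string]*NodeInfo, error) {
	out := make(map[string]*NodeInfo, len(s.infos))
	for id, info := range s.infos {
		out[id] = info
	}
	return out, nil
}

func main() {
	reg := NewRegistry(staticProvider{infos: map[string]*NodeInfo{
		"ng-1": {NodeGroupID: "ng-1", CPUMillis: 4000},
	}})
	if err := reg.Recompute(); err != nil {
		fmt.Println("recompute failed:", err)
		return
	}
	if info, ok := reg.Get("ng-1"); ok {
		fmt.Println(info.NodeGroupID, info.CPUMillis)
	}
}
```

The read/write lock lets many processors read the cache concurrently while a recompute swaps in a new map in one step.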
…gistry
Key changes:
- Updated NewScaleTestAutoscalingContext to accept TemplateNodeInfoRegistry as a parameter.
- Refactored NewTestProcessors to take AutoscalingOptions and return both Processors and TemplateNodeInfoRegistry.
- Reordered test initialization to follow the production path: Options -> Processors/Registry -> AutoscalingContext.
These changes make the tests easier to read and extend by keeping the test setup of the autoscaling environment consistent with the production logic.
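The initialization order described in this commit can be sketched roughly as follows. The types and signatures here are heavily simplified assumptions for illustration; the real NewTestProcessors and NewScaleTestAutoscalingContext take and return the actual cluster-autoscaler types and more parameters than shown.

```go
package main

import (
	"errors"
	"fmt"
)

// All types below are simplified, hypothetical stand-ins for the real ones.
type AutoscalingOptions struct {
	MaxNodesTotal int
}

type TemplateNodeInfoRegistry struct {
	Name string
}

type AutoscalingProcessors struct {
	Name string
}

type AutoscalingContext struct {
	Options  AutoscalingOptions
	Registry *TemplateNodeInfoRegistry
}

// NewTestProcessors depends only on the options and returns both the
// processors and the registry (assumed shape, per the commit message).
func NewTestProcessors(opts AutoscalingOptions) (*AutoscalingProcessors, *TemplateNodeInfoRegistry) {
	_ = opts // a real helper would configure the processors from the options
	return &AutoscalingProcessors{Name: "test"}, &TemplateNodeInfoRegistry{Name: "test"}
}

// NewScaleTestAutoscalingContext receives the registry as a parameter instead
// of having it injected into an existing context afterwards.
func NewScaleTestAutoscalingContext(opts AutoscalingOptions, registry *TemplateNodeInfoRegistry) (*AutoscalingContext, error) {
	if registry == nil {
		return nil, errors.New("registry must be created before the context")
	}
	return &AutoscalingContext{Options: opts, Registry: registry}, nil
}

func main() {
	// 1. Options.
	opts := AutoscalingOptions{MaxNodesTotal: 10}
	// 2. Processors and registry.
	procs, registry := NewTestProcessors(opts)
	// 3. Context, built from the options and the registry.
	ctx, err := NewScaleTestAutoscalingContext(opts, registry)
	if err != nil {
		fmt.Println("context setup failed:", err)
		return
	}
	fmt.Println(procs.Name, ctx.Registry.Name, ctx.Options.MaxNodesTotal)
}
```

With this order, a test cannot build a context that silently lacks a registry, which is the problem the earlier injection workaround was hiding.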
The DRACustomResourcesProcessor now attempts to retrieve NodeInfo from the TemplateNodeInfoRegistry before falling back to the NodeGroup. This ensures the processor uses the canonical TemplateNodeInfo for the current autoscaling loop. Crucially, this preserves any enrichments (such as custom DRA resource slices) that are computed during the registry's Recompute phase but might be absent in a fresh, raw template from the CloudProvider.
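As a rough illustration of the lookup order described in this commit message, the registry-first lookup with a node group fallback might be sketched like this. The `TemplateRegistry` interface, its `GetNodeInfo` method, `templateFor`, and the fake types are hypothetical names for this sketch; only `NodeGroup.TemplateNodeInfo()` is named in the PR text.

```go
package main

import "fmt"

// NodeInfo is a simplified, hypothetical stand-in for the template NodeInfo.
type NodeInfo struct {
	Source         string
	ResourceSlices []string
}

// NodeGroup mirrors only the two methods this sketch needs.
type NodeGroup interface {
	Id() string
	TemplateNodeInfo() (*NodeInfo, error)
}

// TemplateRegistry is a hypothetical read-only view of the registry.
type TemplateRegistry interface {
	GetNodeInfo(nodeGroupID string) (*NodeInfo, bool)
}

// templateFor prefers the cached (possibly enriched) template from the
// registry and falls back to the raw node group template otherwise.
func templateFor(reg TemplateRegistry, ng NodeGroup) (*NodeInfo, error) {
	if reg != nil {
		if info, ok := reg.GetNodeInfo(ng.Id()); ok {
			return info, nil
		}
	}
	return ng.TemplateNodeInfo()
}

// mapRegistry is a fake registry backed by a plain map.
type mapRegistry map[string]*NodeInfo

func (m mapRegistry) GetNodeInfo(id string) (*NodeInfo, bool) {
	info, ok := m[id]
	return info, ok
}

// fakeGroup is a fake node group returning a raw template without enrichments.
type fakeGroup struct {
	id string
}

func (g fakeGroup) Id() string { return g.id }

func (g fakeGroup) TemplateNodeInfo() (*NodeInfo, error) {
	return &NodeInfo{Source: "node-group"}, nil
}

func main() {
	reg := mapRegistry{
		"ng-1": {Source: "registry", ResourceSlices: []string{"pool-a"}},
	}
	for _, id := range []string{"ng-1", "ng-2"} {
		info, err := templateFor(reg, fakeGroup{id: id})
		if err != nil {
			fmt.Println("lookup failed:", err)
			continue
		}
		fmt.Println(id, info.Source, len(info.ResourceSlices))
	}
}
```

In this sketch the cached entry for ng-1 keeps its resource slices, while ng-2, which is absent from the registry, gets the raw node group template.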
@towca PTAL.
LGTM! Thanks a lot @Choraden, this was an important missing part for DRA support in Cluster Autoscaler.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Choraden, towca
@jackfrancis: new pull request created: #9032
What type of PR is this?
/kind feature
What this PR does / why we need it:
This change introduces a new component, TemplateNodeInfoRegistry, which wraps the existing TemplateNodeInfoProvider. It caches the computed template NodeInfos and exposes them via a thread-safe interface.
This registry is added to the AutoscalingContext, allowing processors (like the DRA processor) to access the cached templates instead of relying on the less reliable NodeGroup.TemplateNodeInfo().
Which issue(s) this PR fixes:
Fixes #8881
Fixes #8882
Special notes for your reviewer:
--
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: