Description
What happened?
When setting minReplicas: 0 on a component (Engine or Router), the controller incorrectly triggers Serverless deployment mode even when the user has explicitly set deploymentMode: RawDeployment via annotation.
The component-level deployment mode determination ignores the user's explicit global deployment mode annotation.
What did you expect to happen?
When deploymentMode: RawDeployment is explicitly set via annotation, minReplicas: 0 should NOT trigger Serverless mode. The explicit annotation should take precedence over component-level inference from minReplicas.
Users setting minReplicas: 0 with RawDeployment mode expect a standard Kubernetes Deployment that can scale to zero (e.g., with KEDA), not a Knative Service.
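The expected precedence can be sketched in Go, the language of the controller code cited under Root Cause. This is a minimal sketch, not OME's implementation. DeploymentMode, determineComponentMode, the defaultMode parameter and intPtr are hypothetical stand-ins, not the real types or the signatures of DetermineEngineDeploymentMode() or determineComponentDeploymentMode(). The sketch assumes the global mode comes from the ome.io/deployment-mode annotation used in the reproduction below, and that any non-empty value counts as explicit.

```go
package main

import "fmt"

// DeploymentMode is a hypothetical stand-in for OME's deployment-mode
// constants; only the two modes named in this issue are modeled.
type DeploymentMode string

const (
	RawDeployment DeploymentMode = "RawDeployment"
	Serverless    DeploymentMode = "Serverless"
)

// deploymentModeAnnotation is the annotation key used in the reproduction.
const deploymentModeAnnotation = "ome.io/deployment-mode"

// determineComponentMode (hypothetical) applies the expected precedence:
//  1. an explicit, non-empty annotation always wins;
//  2. otherwise minReplicas == 0 infers Serverless;
//  3. otherwise the caller-supplied (assumed) default is used.
func determineComponentMode(annotations map[string]string, minReplicas *int, defaultMode DeploymentMode) DeploymentMode {
	// Reading from a nil map is safe in Go and simply reports ok == false.
	if mode, ok := annotations[deploymentModeAnnotation]; ok && mode != "" {
		return DeploymentMode(mode)
	}
	if minReplicas != nil && *minReplicas == 0 {
		return Serverless
	}
	return defaultMode
}

// intPtr is a small helper for building *int values inline.
func intPtr(v int) *int { return &v }

func main() {
	explicitRaw := map[string]string{deploymentModeAnnotation: "RawDeployment"}

	// Explicit RawDeployment with minReplicas: 0 stays RawDeployment.
	fmt.Println(determineComponentMode(explicitRaw, intPtr(0), RawDeployment)) // prints RawDeployment

	// No annotation: minReplicas: 0 still infers Serverless.
	fmt.Println(determineComponentMode(nil, intPtr(0), RawDeployment)) // prints Serverless

	// No annotation and minReplicas: 1 falls back to the default.
	fmt.Println(determineComponentMode(nil, intPtr(1), RawDeployment)) // prints RawDeployment
}
```

The explicit-annotation check runs before the minReplicas inference, so scale-to-zero inference only applies when the user has not chosen a mode. Routing both the engine and router paths through one helper of this shape would also keep the two functions from diverging.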
How can we reproduce it (as minimally and precisely as possible)?
- Deploy OME on a cluster (with or without Knative installed)
- Create an InferenceService with explicit RawDeployment mode but minReplicas: 0:
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: test-deployment-mode
  namespace: default
  annotations:
    ome.io/deployment-mode: RawDeployment # Explicit: use RawDeployment
spec:
  predictor:
    engine:
      minReplicas: 0 # This incorrectly triggers Serverless!
      maxReplicas: 3
      container:
        image: test-image:latest
- Observe that the controller:
  - Tries to create a Knative Service instead of a Deployment
  - If Knative is not installed, fails with error: no kind is registered for the type v1.Service in scheme
  - If Knative is installed, creates a Knative Service when the user expected a Deployment
The same issue occurs with the Router component:
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: test-router-mode
  annotations:
    ome.io/deployment-mode: RawDeployment
spec:
  predictor:
    engine:
      minReplicas: 1
      container:
        image: engine:latest
    router:
      minReplicas: 0 # Also incorrectly triggers Serverless for Router
      container:
        image: router:latest
Anything else we need to know?
Root Cause
In pkg/controller/v1beta1/inferenceservice/utils/deployment.go, the functions DetermineEngineDeploymentMode() and determineComponentDeploymentMode() check for minReplicas == 0 and return Serverless without considering the global deployment mode set via annotation.
// Current behavior - ignores global mode
if engine.MinReplicas != nil && *engine.MinReplicas == 0 {
	return constants.Serverless // Ignores explicit deploymentMode annotation!
}
Impact
- Users cannot use minReplicas: 0 with external autoscalers (like KEDA) when using RawDeployment mode
- Confusing error messages when Knative is not installed
- The user's explicit configuration is silently ignored
Workaround
Set minReplicas: 1 or higher when using RawDeployment mode. This prevents scale-to-zero functionality with external autoscalers.
Environment
- OME version: v0.1.x
- Kubernetes version: v1.28+
- Cloud provider or hardware configuration: Any
- OS: Any
- Runtime: Any (SGLang, vLLM, etc.)
- Model being served: Any
- Install method: Helm or kubectl