
[BUG] minReplicas=0 ignores explicit deploymentMode annotation #445

@jskswamy

Description


What happened?

When minReplicas: 0 is set on a component (Engine or Router), the controller selects Serverless deployment mode even when the user has explicitly set deploymentMode: RawDeployment via the ome.io/deployment-mode annotation.

The component-level deployment-mode logic ignores the user's explicit global deployment-mode annotation.

What did you expect to happen?

When deploymentMode: RawDeployment is explicitly set via annotation, minReplicas: 0 should NOT trigger Serverless mode. The explicit annotation should take precedence over component-level inference from minReplicas.

Users setting minReplicas: 0 with RawDeployment mode expect a standard Kubernetes Deployment that can scale to zero (e.g., with KEDA), not a Knative Service.
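The expected precedence can be sketched in Go as follows. This is a minimal illustration, not OME's code: resolveComponentMode and its defaultMode parameter are hypothetical, and the mode strings stand in for OME's constants. Only the annotation key ome.io/deployment-mode comes from this issue.

```go
package main

import "fmt"

// Stand-ins for OME's deployment-mode constants (assumed string values).
const (
	Serverless    = "Serverless"
	RawDeployment = "RawDeployment"
)

// Annotation key taken from the reproduction steps in this issue.
const deploymentModeAnnotation = "ome.io/deployment-mode"

// resolveComponentMode is a hypothetical helper showing the expected order:
// 1. an explicit ome.io/deployment-mode annotation always wins;
// 2. only when no mode is set does minReplicas == 0 imply Serverless;
// 3. otherwise fall back to defaultMode.
func resolveComponentMode(annotations map[string]string, minReplicas *int, defaultMode string) string {
	if mode, ok := annotations[deploymentModeAnnotation]; ok && mode != "" {
		return mode
	}
	if minReplicas != nil && *minReplicas == 0 {
		return Serverless
	}
	return defaultMode
}

func main() {
	zero := 0
	explicit := map[string]string{deploymentModeAnnotation: RawDeployment}
	// Explicit RawDeployment plus minReplicas: 0 must stay RawDeployment.
	fmt.Println(resolveComponentMode(explicit, &zero, RawDeployment))
	// Without the annotation, minReplicas: 0 may still infer Serverless.
	fmt.Println(resolveComponentMode(nil, &zero, RawDeployment))
}
```

Under this ordering, an Engine or Router with minReplicas: 0 would be rendered as a plain Deployment that an external autoscaler such as KEDA can scale to zero.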

How can we reproduce it (as minimally and precisely as possible)?

  1. Deploy OME on a cluster (with or without Knative installed)
  2. Create an InferenceService with an explicit RawDeployment mode but minReplicas: 0:
```yaml
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: test-deployment-mode
  namespace: default
  annotations:
    ome.io/deployment-mode: RawDeployment # Explicit: use RawDeployment
spec:
  predictor:
    engine:
      minReplicas: 0 # This incorrectly triggers Serverless!
      maxReplicas: 3
      container:
        image: test-image:latest
```
  3. Observe that the controller:
    • Tries to create a Knative Service instead of a Deployment
    • If Knative is not installed, fails with the error: no kind is registered for the type v1.Service in scheme
    • If Knative is installed, creates a Knative Service where the user expected a Deployment

The same issue occurs with the Router component:

```yaml
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: test-router-mode
  annotations:
    ome.io/deployment-mode: RawDeployment
spec:
  predictor:
    engine:
      minReplicas: 1
      container:
        image: engine:latest
  router:
    minReplicas: 0 # Also incorrectly triggers Serverless for Router
    container:
      image: router:latest
```

Anything else we need to know?

Root Cause

In pkg/controller/v1beta1/inferenceservice/utils/deployment.go, the functions DetermineEngineDeploymentMode() and determineComponentDeploymentMode() check for minReplicas == 0 and return Serverless without considering the global deployment mode set via annotation.

```go
// Current behavior: ignores the global mode
if engine.MinReplicas != nil && *engine.MinReplicas == 0 {
    return constants.Serverless // Ignores explicit deploymentMode annotation!
}
```
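One possible shape of the fix is to consult the explicit global mode before inferring Serverless from minReplicas. The sketch below is self-contained and hypothetical: Component, currentMode, patchedMode and the explicitMode/defaultMode parameters are illustrative names, not OME's API. The mode strings stand in for constants.Serverless and constants.RawDeployment, and what the real functions do after the minReplicas check is assumed.

```go
package main

import "fmt"

// Stand-ins for constants.Serverless and constants.RawDeployment (assumed values).
const (
	Serverless    = "Serverless"
	RawDeployment = "RawDeployment"
)

// Component is a hypothetical stand-in for the Engine/Router specs; only the
// field involved in this bug is modeled.
type Component struct {
	MinReplicas *int
}

// currentMode mirrors the quoted snippet: minReplicas == 0 returns Serverless
// before the explicit mode is ever consulted. The fall-through is assumed.
func currentMode(c Component, explicitMode, defaultMode string) string {
	if c.MinReplicas != nil && *c.MinReplicas == 0 {
		return Serverless // the bug: explicitMode is ignored here
	}
	if explicitMode != "" {
		return explicitMode
	}
	return defaultMode
}

// patchedMode checks the explicit mode first ("" means "not set"), so
// minReplicas == 0 only infers Serverless when the user gave no mode.
func patchedMode(c Component, explicitMode, defaultMode string) string {
	if explicitMode != "" {
		return explicitMode
	}
	if c.MinReplicas != nil && *c.MinReplicas == 0 {
		return Serverless
	}
	return defaultMode
}

func main() {
	zero := 0
	router := Component{MinReplicas: &zero}
	fmt.Println(currentMode(router, RawDeployment, RawDeployment)) // Serverless
	fmt.Println(patchedMode(router, RawDeployment, RawDeployment)) // RawDeployment
}
```

The same reordering would apply to both DetermineEngineDeploymentMode() and determineComponentDeploymentMode(), since both contain the minReplicas == 0 shortcut.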

Impact

  1. Users cannot use minReplicas: 0 with external autoscalers (like KEDA) when using RawDeployment mode
  2. Confusing error messages when Knative is not installed
  3. The user's explicit configuration is silently ignored

Workaround

Set minReplicas: 1 or higher when using RawDeployment mode. However, this rules out scale-to-zero with external autoscalers such as KEDA.

Environment

  • OME version: v0.1.x
  • Kubernetes version: v1.28+
  • Cloud provider or hardware configuration: Any
  • OS: Any
  • Runtime: Any (SGLang, vLLM, etc.)
  • Model being served: Any
  • Install method: Helm or kubectl
