[BUG] minReplicas=0 ignores explicit deploymentMode annotation #445

## What happened?

When `minReplicas: 0` is set on a component (Engine or Router), the controller incorrectly selects Serverless deployment mode even when the user has explicitly requested `RawDeployment` via the `ome.io/deployment-mode` annotation.

In other words, the component-level mode inference overrides the user's explicit global deployment mode annotation instead of deferring to it.

## What did you expect to happen?

When `deploymentMode: RawDeployment` is explicitly set via annotation, `minReplicas: 0` should NOT trigger Serverless mode. The explicit annotation should take precedence over component-level inference from `minReplicas`.

Users setting `minReplicas: 0` with `RawDeployment` mode expect a standard Kubernetes Deployment that can scale to zero (e.g., with KEDA), not a Knative Service.

## How can we reproduce it (as minimally and precisely as possible)?

1. Deploy OME on a cluster (with or without Knative installed)
2. Create an InferenceService with explicit `RawDeployment` mode but `minReplicas: 0`:

```yaml
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: test-deployment-mode
  namespace: default
  annotations:
    ome.io/deployment-mode: RawDeployment # Explicit: use RawDeployment
spec:
  predictor:
    engine:
      minReplicas: 0 # This incorrectly triggers Serverless!
      maxReplicas: 3
      container:
        image: test-image:latest
```

3. Observe that the controller:
   - Tries to create a Knative Service instead of a Deployment
   - If Knative is not installed, fails with the error: `no kind is registered for the type v1.Service in scheme`
   - If Knative is installed, creates a Knative Service although the user expected a Deployment

The same issue occurs with the Router component:

```yaml
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: test-router-mode
  annotations:
    ome.io/deployment-mode: RawDeployment
spec:
  predictor:
    engine:
      minReplicas: 1
      container:
        image: engine:latest
  router:
    minReplicas: 0 # Also incorrectly triggers Serverless for Router
    container:
      image: router:latest
```

## Anything else we need to know?

### Root Cause

In `pkg/controller/v1beta1/inferenceservice/utils/deployment.go`, the functions `DetermineEngineDeploymentMode()` and `determineComponentDeploymentMode()` check for `minReplicas == 0` and return `Serverless` immediately, without first checking whether a global deployment mode was set explicitly via annotation.

```go
// Current behavior - ignores global mode
if engine.MinReplicas != nil && *engine.MinReplicas == 0 {
    return constants.Serverless  // Ignores explicit deploymentMode annotation!
}
```
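
For discussion, here is a minimal sketch of the precedence this issue asks for: an explicit annotation wins, and `minReplicas == 0` is used for inference only when no mode was set. The `DeploymentMode` type, the `resolveComponentMode` helper, and the `RawDeployment` fallback are assumptions made for illustration; they are not the controller's actual types, signatures, or defaults.

```go
package main

import "fmt"

// DeploymentMode and the constants below are hypothetical stand-ins for the
// values in OME's constants package.
type DeploymentMode string

const (
	RawDeployment DeploymentMode = "RawDeployment"
	Serverless    DeploymentMode = "Serverless"
)

// deploymentModeAnnotation is the annotation key used in the reproduction YAML.
const deploymentModeAnnotation = "ome.io/deployment-mode"

// resolveComponentMode is a hypothetical helper, not the controller's API.
// An explicit annotation takes precedence; minReplicas == 0 only triggers
// Serverless when no mode was set explicitly.
func resolveComponentMode(annotations map[string]string, minReplicas *int) DeploymentMode {
	if mode, ok := annotations[deploymentModeAnnotation]; ok && mode != "" {
		return DeploymentMode(mode)
	}
	if minReplicas != nil && *minReplicas == 0 {
		return Serverless
	}
	// Assumed fallback for this sketch; the real default may differ.
	return RawDeployment
}

func main() {
	zero := 0
	explicit := map[string]string{deploymentModeAnnotation: "RawDeployment"}

	// Explicit annotation plus minReplicas: 0 (the case from this issue).
	fmt.Println(resolveComponentMode(explicit, &zero))
	// No annotation: the mode is still inferred from minReplicas.
	fmt.Println(resolveComponentMode(nil, &zero))
}
```

The same check would apply to both the Engine and Router paths, so that inference from `minReplicas` stays available when the user has not chosen a mode.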

### Impact

1. Users cannot use `minReplicas: 0` with external autoscalers (like KEDA) when using `RawDeployment` mode
2. Error messages are confusing when Knative is not installed
3. The user's explicit configuration is silently ignored

### Workaround

Set `minReplicas: 1` or higher when using `RawDeployment` mode. Note that this workaround gives up scale-to-zero with external autoscalers.

## Environment

- OME version: v0.1.x
- Kubernetes version: v1.28+
- Cloud provider or hardware configuration: Any
- OS: Any
- Runtime: Any (SGLang, vLLM, etc.)
- Model being served: Any
- Install method: Helm or kubectl
