
[BUG] Operator reverts manual StatefulSet replica scaling back to 1 #3329

@olamide226


Bug Description

When a StatefulSet's replica count is scaled manually (e.g., from 1 to 3), the ToolHive operator automatically reverts it back to 1. This behavior prevents horizontal scaling of MCP servers.

Steps to Reproduce

  1. Deploy an MCP server via the ToolHive operator (creates a StatefulSet with 1 replica)
  2. Manually scale the StatefulSet:
    kubectl scale statefulset <mcpserver-name> --replicas=3
  3. Observe that the operator reverts the replicas back to 1

Expected Behavior

The operator should NOT automatically revert manual scaling changes. The manually set replica count should persist.

Actual Behavior

The operator overrides the manual scaling and resets replicas to 1.

Root Cause

The MCPServer CRD lacks a replicas field in which to persist the desired replica count. Without this field, the operator has no way to know that the replica count was changed intentionally, so reconciliation reverts the StatefulSet to its default of 1.
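
For illustration, here is a minimal controller-runtime-style sketch of the kind of reconcile logic that produces this behavior. This is not the actual ToolHive code; the function and its wiring are hypothetical:

package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/utils/ptr"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// reconcileStatefulSet is a hypothetical sketch: a reconciler that rebuilds
// the desired StatefulSet from the CR alone. Because the CRD carries no
// replicas field, the desired count is always the hard-coded default, and
// any manual `kubectl scale` is silently overwritten on the next reconcile.
func reconcileStatefulSet(ctx context.Context, c client.Client, current *appsv1.StatefulSet) error {
	desired := current.DeepCopy()
	desired.Spec.Replicas = ptr.To(int32(1)) // default restored unconditionally
	return c.Update(ctx, desired)
}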

Proposed Solution

  1. Add a replicas field to the MCPServer CRD spec to allow users to declare the desired replica count
  2. Update the operator to respect this field and not override manual scaling changes
  3. Make the field optional with a default value of 1 for backward compatibility (a sketch of the corresponding CRD change follows the example below)

Example:

apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: my-mcp-server
spec:
  replicas: 3  # New field
  image: my-image:latest
  # ... other fields
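
As a sketch of what the CRD change could look like in Go, using standard kubebuilder markers for defaulting; MCPServerSpec's other fields here are illustrative, not the actual ToolHive types:

package v1alpha1

// MCPServerSpec is a hypothetical sketch of the proposed spec change,
// not the actual ToolHive types.
type MCPServerSpec struct {
	// Image is the container image for the MCP server.
	Image string `json:"image"`

	// Replicas is the desired number of StatefulSet replicas. It is
	// optional and defaults to 1 for backward compatibility.
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:default=1
	// +optional
	Replicas *int32 `json:"replicas,omitempty"`
}

// DesiredReplicas resolves the effective count, falling back to the
// default for pre-existing objects where the field is unset.
func (s *MCPServerSpec) DesiredReplicas() int32 {
	if s.Replicas == nil {
		return 1
	}
	return *s.Replicas
}

The reconciler would then set the StatefulSet's spec.replicas from DesiredReplicas() rather than a constant, so a replica count declared on the MCPServer persists across reconciles.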

Additional Context

  • For scaling purposes, typically only one proxy/runner is needed, since it routes traffic through a headless service and can load-balance across the pods in the StatefulSet
  • This is especially relevant for stateless MCP servers and Streamable HTTP MCP servers, where load balancing works well
  • For stateful MCP servers, scaling considerations may be more complex

Environment

  • ToolHive Operator version: v0.6.12
  • Kubernetes version: v1.33.3+k3s1

Related discussion: the community has confirmed that this is a bug and that the operator should NOT revert manual scaling changes.

Labels

  • bug: Something isn't working
  • kubernetes: Items related to Kubernetes
  • operator
