Summary
The ToolHive Kubernetes operator has admission webhook code for validating VirtualMCPServer, VirtualMCPCompositeToolDefinition, and MCPExternalAuthConfig resources, but these webhooks have never been functional. The controller-runtime v0.23.0 upgrade exposed this issue.
Background
What Happened
During the upgrade to controller-runtime v0.23.0, the operator began failing at startup with:
"error":"open /tmp/k8s-webhook-server/serving-certs/tls.crt: no such file or directory"
Root Cause Analysis
Investigation revealed that the webhooks were never actually working:
- In controller-runtime v0.22.x: The old webhook API (`ctrl.NewWebhookManagedBy(mgr).For(r).Complete()`) silently failed to register webhooks. The webhook server never started because no webhooks were registered with it.
- In controller-runtime v0.23.0: The new generic webhook API (`builder.WebhookManagedBy[T](mgr, r).WithValidator(r).Complete()`) properly registers webhooks, which triggers the webhook server to start, which then fails because TLS certificates are not available.
Missing Infrastructure
Even if the webhook server started, the webhooks would not function because:
| Component | Required | Status |
|---|---|---|
| ValidatingWebhookConfiguration | ✓ | Not deployed by helm chart |
| Webhook Service | ✓ | Not deployed by helm chart |
| Port 9443 exposed | ✓ | Not in deployment spec |
| TLS certificates | ✓ | No cert-manager integration |
The config/webhook/manifests.yaml file exists (kubebuilder-generated) but is never deployed.
Impact of Missing Webhooks
The webhooks perform validation only (no mutation). Without them:
| Resource | Webhook Validation | Controller Validation | Risk |
|---|---|---|---|
| VirtualMCPServer | Disabled | Partial (during reconcile) | Low - caught at reconcile |
| MCPExternalAuthConfig | Disabled | None | High - invalid configs silently accepted |
| VirtualMCPCompositeToolDefinition | Disabled | None | High - invalid configs silently accepted |
Example Validations Not Enforced
MCPExternalAuthConfig:
- Can create `tokenExchange` type without required `tokenExchange` config
- Can set conflicting configs (both `tokenExchange` and `headerInjection`)
- Unsupported auth types are accepted
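The MCPExternalAuthConfig rules listed above could be expressed roughly as follows. This is a minimal sketch, not the operator's actual validator: `ExternalAuthSpec`, `TokenExchangeConfig`, `HeaderInjectionConfig`, their fields, and `validateExternalAuthSpec` are hypothetical stand-ins for the real CRD types, and it assumes `tokenExchange` and `headerInjection` are the only supported types.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the MCPExternalAuthConfig spec.
// The real API types in api/v1alpha1 may differ.
type TokenExchangeConfig struct{ TokenURL string }
type HeaderInjectionConfig struct{ HeaderName string }

type ExternalAuthSpec struct {
	Type            string
	TokenExchange   *TokenExchangeConfig
	HeaderInjection *HeaderInjectionConfig
}

// validateExternalAuthSpec collects every rule violation instead of
// stopping at the first one, and returns nil when the spec is valid.
func validateExternalAuthSpec(s ExternalAuthSpec) error {
	var errs []error

	switch s.Type {
	case "tokenExchange":
		if s.TokenExchange == nil {
			errs = append(errs, errors.New("type tokenExchange requires a tokenExchange config"))
		}
	case "headerInjection":
		if s.HeaderInjection == nil {
			errs = append(errs, errors.New("type headerInjection requires a headerInjection config"))
		}
	default:
		// Assumption: only the two types above are supported.
		errs = append(errs, fmt.Errorf("unsupported auth type %q", s.Type))
	}

	if s.TokenExchange != nil && s.HeaderInjection != nil {
		errs = append(errs, errors.New("tokenExchange and headerInjection are mutually exclusive"))
	}

	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	bad := ExternalAuthSpec{Type: "tokenExchange"}
	if err := validateExternalAuthSpec(bad); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Keeping the rules in a plain function like this would let the same checks run from an admission webhook or from the reconcile loop, which is relevant to Option 3 below.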
VirtualMCPServer:
- Missing required `spec.config.groupRef` (caught at reconcile, but not at admission)
- Invalid auth configurations
Proposed Solution
Option 1: Full Webhook Support (Recommended for Production)
- Add cert-manager as a dependency or optional integration
- Deploy ValidatingWebhookConfiguration via helm chart
- Create webhook Service in helm chart
- Expose port 9443 in deployment
- Configure cert-manager Certificate resource
Option 2: Self-Signed Certificates (Development/Simple Deployments)
- Generate self-signed certificates at operator startup
- Mount emptyDir volume for certificate storage
- Deploy ValidatingWebhookConfiguration with `caBundle` injection
- Create webhook Service
Option 3: Keep Webhooks Disabled (Current State)
- Document that webhooks are not functional
- Add controller-level validation for MCPExternalAuthConfig and VirtualMCPCompositeToolDefinition
- Accept that invalid resources can be created (will fail at runtime)
Current Workaround
Webhook registration has been disabled in cmd/thv-operator/main.go to allow the operator to start. The webhook server is not created.
References
- controller-runtime v0.23.0 breaking change: Generic Validator and Defaulter
- Webhook manifest location: `config/webhook/manifests.yaml`
- Affected files:
  - `cmd/thv-operator/main.go`
  - `cmd/thv-operator/api/v1alpha1/*_webhook.go`