* Add Kubernetes manifests and CI workflows for de.NBI migration
Decompose the monolithic Docker container into Kubernetes workloads:
- Streamlit Deployment with health probes and session affinity
- Redis Deployment + Service for job queue
- RQ Worker Deployment for background workflows
- CronJob for workspace cleanup
- Ingress with WebSocket support and cookie-based sticky sessions
- Shared PVC (ReadWriteMany) for workspace data
- ConfigMap for runtime configuration (replaces build-time settings)
- Kustomize base + template-app overlay for multi-app deployment
Code changes:
- Remove unsafe enableCORS=false and enableXsrfProtection=false from config.toml
- Make workspace path configurable via WORKSPACES_DIR env var in clean-up-workspaces.py
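A minimal sketch of how the WORKSPACES_DIR lookup in clean-up-workspaces.py might work. The function name `resolve_workspaces_dir` and the fallback path are hypothetical; this commit does not state the script's actual default.

```python
import os
from pathlib import Path

# Hypothetical fallback; the real default used by clean-up-workspaces.py
# is not stated in this commit.
DEFAULT_WORKSPACES_DIR = "/workspaces"


def resolve_workspaces_dir(environ=None) -> Path:
    """Return the workspace root, preferring WORKSPACES_DIR when it is set and non-empty."""
    env = os.environ if environ is None else environ
    # `or` also covers the case where the variable is set to an empty string.
    return Path(env.get("WORKSPACES_DIR") or DEFAULT_WORKSPACES_DIR)
```

Passing the environment as an argument keeps the lookup easy to test; in the CronJob, the variable would presumably be set through the pod spec or the ConfigMap.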
CI/CD:
- Add build-and-push-image.yml to push Docker images to ghcr.io
- Add k8s-manifests-ci.yml for manifest validation and kind integration tests
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* Fix kubeconform validation to skip kustomization.yaml
kustomization.yaml is a Kustomize config file, not a standard K8s resource,
so kubeconform has no schema for it. Exclude it via -ignore-filename-pattern.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* Add matrix strategy to test both Dockerfiles in integration tests
The integration-test job now uses a matrix with Dockerfile_simple and
Dockerfile. Each matrix entry checks whether its Dockerfile exists before
running; all steps are guarded with an `if` condition so they skip
gracefully when a Dockerfile is absent. This allows downstream forks
that only have one Dockerfile to pass CI without errors.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* Adapt K8s base manifests for de.NBI Cinder CSI storage
- Switch workspace PVC from ReadWriteMany to ReadWriteOnce with
cinder-csi storage class (required by de.NBI KKP cluster)
- Increase PVC storage to 500Gi
- Add namespace: openms to kustomization.yaml
- Reduce pod resource requests (1Gi memory, 500m CPU) and limits
(8Gi memory, 4 CPU) so all workspace-mounting pods fit on a single node
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* Add pod affinity rules to co-locate all workspace pods on same node
The workspaces PVC uses ReadWriteOnce (Cinder CSI block storage), which
requires all pods mounting it to run on the same node. Without explicit
affinity rules, the scheduler was failing silently, leaving pods in
Pending state with no events.
Adds a `volume-group: workspaces` label and podAffinity with
requiredDuringSchedulingIgnoredDuringExecution to streamlit deployment,
rq-worker deployment, and cleanup cronjob. This ensures the scheduler
explicitly co-locates all workspace-consuming pods on the same node.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* Fix CI: wait for ingress-nginx admission webhook before deploying
The controller pod being Ready doesn't guarantee the admission webhook
service is accepting connections. Add a polling loop that waits for the
webhook endpoint to have an IP assigned before applying the Ingress
resource, preventing "connection refused" errors during kustomize apply.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* Fix CI: add -n openms namespace to integration test steps
The kustomize overlay deploys into the openms namespace, but the
verification steps (Redis wait, Redis ping, deployment checks) were
querying the default namespace, causing "no matching resources found".
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* Fix CI: retry kustomize deploy for webhook readiness
Replace the unreliable endpoint-IP polling with a retry loop on
kubectl apply (up to 5 attempts with backoff). This handles the race
where the ingress-nginx admission webhook has an endpoint IP but isn't
yet accepting TCP connections.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
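The retry-with-backoff idea from the commit above, sketched in Python rather than the workflow's shell. The function name `retry_command`, the linear backoff schedule, and the 5-second base delay are assumptions; the commit only says "up to 5 attempts with backoff".

```python
import subprocess
import time


def retry_command(cmd, attempts=5, base_delay=5.0, run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0, waiting longer after each failure.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if run(cmd).returncode == 0:
            return attempt
        if attempt < attempts:
            # Assumed linear backoff: 5s, 10s, 15s, ...
            sleep(base_delay * attempt)
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd!r}")


# Example call (the overlay path is hypothetical):
# retry_command(["kubectl", "apply", "-k", "path/to/overlay"])
```

Retrying the apply itself covers the case the endpoint-IP check missed: the command only succeeds once the admission webhook actually accepts connections.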
* Fix REDIS_URL to use prefixed service name in overlay
Kustomize namePrefix renames the Redis service to template-app-redis,
but the REDIS_URL env var in streamlit and rq-worker deployments still
referenced the unprefixed name "redis", causing the rq-worker to
CrashLoopBackOff with "Name or service not known".
Add JSON patches in the overlay to set the correct prefixed hostname.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* Add Traefik IngressRoute for direct LB IP access
The cluster uses Traefik, not nginx, so the nginx Ingress annotations
are ignored. Add a Traefik IngressRoute with PathPrefix(/) catch-all
routing and sticky session cookie for Streamlit session affinity.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* Fix CI: skip Traefik IngressRoute CRD in validation and integration tests
kubeconform doesn't know the Traefik IngressRoute CRD schema, and the
kind cluster in integration tests doesn't have Traefik installed. Skip
the IngressRoute in kubeconform validation and filter it out with yq
before applying to the kind cluster.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* Fix IngressRoute service name for kustomize namePrefix
Kustomize namePrefix doesn't rewrite service references inside custom
resources, so the IngressRoute was pointing to 'streamlit' instead of
'template-app-streamlit', causing Traefik to return 404.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
* fix: use ConfigMap as settings override instead of full replacement
The ConfigMap was replacing the entire settings.json, losing keys like
"version" and "repository-name" that the app expects (causing KeyError).
Now the ConfigMap only contains deployment-specific overrides, which are
merged into the Docker image's base settings.json at container startup
using jq.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
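The difference between replacing settings.json and merging overrides into it, illustrated in Python. The container itself does the merge with jq; this sketch only shows the intended semantics. The recursive merge behaviour, the `online_deployment` key, and all values are assumptions; only the "version" and "repository-name" key names come from the commit.

```python
def merge_settings(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied; nested dicts are merged recursively."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative values only; the real settings.json contents are not shown in this commit.
base = {"version": "0.0.0", "repository-name": "example-app", "online_deployment": False}
overrides = {"online_deployment": True}
merged = merge_settings(base, overrides)
```

With a full replacement, `merged` would contain only the override keys, and a lookup of "version" would raise the KeyError described above; with the merge, the base keys survive and only the overridden value changes.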
* fix: add set -euo pipefail to fail fast on settings merge error
Addresses CodeRabbit review: if jq merge fails, the container should
not start with unmerged settings.
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
---------
Co-authored-by: Claude <[email protected]>