Replace envoy proxy with Traefik ingress in sandbox #7134

Merged
pingsutw merged 9 commits into v2 from remove-envoy-proxy-from-sandbox on Apr 1, 2026

pingsutw commented Apr 1, 2026

Summary

  • Remove the envoy proxy from the flyte sandbox and replace it with k3s's built-in Traefik ingress controller
  • Reduces sandbox image size by ~50MB compressed (~130-170MB uncompressed) by removing the envoyproxy/envoy image
  • Preserves the single-port localhost:30080 experience using Kubernetes Ingress resources + Traefik Middleware CRDs

Motivation

With the flyte-sdk replacing gRPC with connectRPC (flyteorg/flyte-sdk#844), envoy's HTTP/2 protocol bridging is no longer needed. ConnectRPC works over plain HTTP/1.1, so any standard L7 ingress controller can handle routing. Traefik is already bundled in k3s at zero additional image cost.

Changes

  • Delete envoy proxy Helm templates (configmap, deployment, service)
  • Add Traefik HelmChartConfig to configure NodePort 30080 with unlimited streaming timeout
  • Add Kubernetes Ingress resources for all routed services (/flyteidl2.*, /v2, /kubernetes-dashboard/, /minio/)
  • Add Traefik Middleware CRDs for prefix stripping (dashboard, minio)
  • Remove envoyproxy/envoy from sandbox image manifest
  • Re-enable Traefik in k3s (remove --disable=traefik from Dockerfile and k3d config)
  • Remove unused sandbox.proxy values and stale README entries
  • Regenerate bundled manifests
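The prefix-stripping pattern described above can be sketched as a Traefik Middleware plus an Ingress that references it. This is a minimal illustration, not the manifests from this PR: the resource names, the `flyte` namespace, the backend service name, and the backend port are all assumptions. Only the `/minio` path comes from the change list. Depending on the Traefik version bundled with k3s, the CRD group may be `traefik.containo.us/v1alpha1` instead of `traefik.io/v1alpha1`.

```yaml
# Hypothetical names, namespace, and port; only the /minio prefix is from the PR.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: strip-minio
  namespace: flyte
spec:
  stripPrefix:
    prefixes:
      - /minio            # removed from the request path before forwarding
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minio
  namespace: flyte
  annotations:
    # Format is <namespace>-<middleware name>@kubernetescrd
    traefik.ingress.kubernetes.io/router.middlewares: flyte-strip-minio@kubernetescrd
spec:
  rules:
    - http:
        paths:
          - path: /minio
            pathType: Prefix
            backend:
              service:
                name: minio        # assumed service name
                port:
                  number: 9001     # assumed port
```

With a setup along these lines, a request to `localhost:30080/minio/...` would reach the backend with the `/minio` prefix stripped, which is the behavior the dashboard and minio routes rely on.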

Test plan

  • helm lint charts/flyte-sandbox passes
  • helm template renders correctly in standard and dev modes
  • No envoy references remain in rendered output
  • Sandbox starts successfully with flytectl demo start or equivalent
  • All services accessible via localhost:30080 (API, console, dashboard, minio)
  • Streaming RPCs (e.g., log tailing) work without timeout

pingsutw added 2 commits April 1, 2026 12:01
Remove the envoy proxy from the sandbox and replace it with k3s's
built-in Traefik ingress controller. This reduces the sandbox image
size by ~50MB compressed while preserving the single-port
localhost:30080 experience.

Changes:
- Delete envoy proxy templates (configmap, deployment, service)
- Add Traefik HelmChartConfig for NodePort 30080 with streaming timeout
- Add Kubernetes Ingress resources for all routed services
- Add Traefik Middleware CRDs for prefix stripping (dashboard, minio)
- Remove envoy image from sandbox image manifest
- Re-enable Traefik in k3s (Dockerfile and k3d config)
- Remove unused proxy values and README entries
- Regenerate bundled manifests

Signed-off-by: Kevin Su <[email protected]>
Resolve conflicts:
- _helpers.tpl: v2 removed buildkit helpers, we removed envoy helpers — both removals kept
- values.yaml: v2 removed buildkit section, we removed proxy section — both removals kept
- manifests: regenerated from merged templates

Signed-off-by: Kevin Su <[email protected]>
github-actions bot mentioned this pull request Apr 1, 2026
pingsutw added 7 commits April 1, 2026 12:12
Use a template variable for the backend service name instead of
repeating the if/else conditional in each path entry.

Signed-off-by: Kevin Su <[email protected]>
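
As a rough illustration of the refactor in the commit above, a Helm template can compute the backend service name once and reuse it in every path entry. The variable name, the `.Values.sandbox.dev` switch, the service names, and the port below are assumptions made for the sketch, not values taken from the chart.

```yaml
{{- /* Hypothetical sketch: pick the backend once instead of per path. */}}
{{- $backendService := "flyte-backend" }}
{{- if .Values.sandbox.dev }}
{{- $backendService = "flyte-backend-dev" }}
{{- end }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flyte-api
spec:
  rules:
    - http:
        paths:
          - path: /v2
            pathType: Prefix
            backend:
              service:
                name: {{ $backendService }}   # same variable reused for each path
                port:
                  number: 8090                # assumed port
```

Rendering with `helm template` in both standard and dev modes is a quick way to confirm that every path resolves to the intended service.
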
…bility

- Change websecure.expose from nested object format to boolean (v25.0.2
  Traefik chart uses `expose: false`, not `expose.default: false`)
- Move streaming timeout into ports.web.transport instead of
  additionalArguments

Signed-off-by: Kevin Su <[email protected]>
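
The two fixes in the commit above can be pictured as a k3s `HelmChartConfig` that overrides the bundled Traefik chart's values. This is a sketch under assumptions rather than the file from this PR: the `respondingTimeouts.readTimeout` key and the value `0` are a guess at how "unlimited streaming timeout" is expressed, and should be checked against the `values.yaml` of the Traefik chart version k3s ships.

```yaml
# Sketch only; verify keys against the bundled Traefik chart's values.yaml.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik            # must match the name of the k3s-managed HelmChart
  namespace: kube-system
spec:
  valuesContent: |-
    ports:
      web:
        nodePort: 30080    # single entry point on localhost:30080
        transport:
          respondingTimeouts:
            readTimeout: 0 # assumed: 0 disables the read timeout for streaming
      websecure:
        expose: false      # boolean form, per the v25.0.2 chart note above
```
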
Resolve conflicts:
- manifest.txt: v2 removed postgresql, we removed envoy — both kept
- manifests: regenerated from merged templates

Signed-off-by: Kevin Su <[email protected]>
The embedded-postgres binary was added but never launched in the
entrypoint script. Start it in the background and wait for its ready
file before proceeding, so PostgreSQL is available when the
flyte-binary pod's wait-for-db init container runs.

Signed-off-by: Kevin Su <[email protected]>
When reusing a docker volume from a previous sandbox run that used the
old bitnami PostgreSQL (uid 1001), the embedded-postgres (uid 999)
cannot clean up the data directory. Fix by chowning the directory as
root in the entrypoint before starting embedded-postgres.

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
pingsutw merged commit 279c32e into v2 Apr 1, 2026
17 checks passed
pingsutw deleted the remove-envoy-proxy-from-sandbox branch April 1, 2026 21:26

2 participants