Description
Describe the bug
The Flux GitRepository resource consistently fails to clone the repository from GitLab with a context deadline exceeded error, even though the same SSH key and repository URL work perfectly when tested from a debug pod with the secret mounted in the same way.
The error appears in the GitRepository status:
failed to checkout and determine revision: unable to clone 'ssh://git@gitlab.com/<group>/<subgroup>/<project>.git': context deadline exceeded
Despite multiple attempts to regenerate the SSH secret, adjust timeouts, and verify connectivity, the issue persists. The debug pod confirms that the SSH key, known_hosts, and network connectivity to GitLab are all functioning correctly.
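For anyone reproducing this, the conditions quoted in this report can be printed in compact form with a small helper along these lines. It is only a sketch: the function name gitrepo_conditions is made up for illustration, it assumes kubectl is already pointed at the affected cluster, and the resource name is taken from the CRD list in the flux check output.

```shell
# Print type, reason and message for each status condition of a GitRepository.
# Usage: gitrepo_conditions <name> [namespace]   (namespace defaults to flux-system)
gitrepo_conditions() {
  name="$1"
  namespace="${2:-flux-system}"
  kubectl get gitrepositories.source.toolkit.fluxcd.io "$name" -n "$namespace" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}
```

Calling `gitrepo_conditions flux-system` would list the Reconciling, Ready and FetchFailed conditions one per line, which makes it easier to see when the FetchFailed message changes between attempts.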
Steps to reproduce
- Create SSH secret with identity and known_hosts:
kubectl create secret generic git-ssh-auth \
  --from-file=identity="$HOME/.ssh/id_ed25519" \
  --from-file=known_hosts=/tmp/known_hosts \
  -n flux-system
- Apply the GitRepository manifest:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    name: refs/heads/main
  secretRef:
    name: git-ssh-auth
  timeout: 60s
  url: ssh://git@gitlab.com/<group>/<subgroup>/<project>.git
- Check the GitRepository status:
...
status:
  conditions:
  - lastTransitionTime: "2026-03-26T17:38:16Z"
    message: building artifact
    observedGeneration: 1
    reason: Progressing
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2026-03-26T17:38:16Z"
    message: building artifact
    observedGeneration: 1
    reason: Progressing
    status: Unknown
    type: Ready
  - lastTransitionTime: "2026-03-26T17:37:12Z"
    message: 'failed to checkout and determine revision: unable to clone ''ssh://git@gitlab.com/<group>/<subgroup>/<project>.git'':
      context deadline exceeded'
    observedGeneration: 1
    reason: GitOperationFailed
    status: "True"
    type: FetchFailed
  observedGeneration: -1
Expected behavior
The GitRepository should successfully clone the repository from GitLab using the provided SSH secret within the specified timeout, similar to how the debug pod successfully clones.
Screenshots and recordings
No response
OS / Distro
ArchLinux
Flux version
flux: v2.8.3
Flux check
$ flux check
► checking prerequisites
✔ Kubernetes 1.35.1 >=1.33.0-0
► checking version in cluster
✔ distribution: flux-v2.8.3
✔ bootstrapped: false
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v1.5.3@sha256:b150af0cd7a501dafe2374b1d22c39abf0572465df4fa1fb99b37927b0d95d75
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v1.1.1@sha256:43617c9fbb4cf32aed7458647f62589575237ccb810f45bd7cb31f24126d4f22
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v1.1.1@sha256:4c12c4046dee6e32e11b7c6afeaf7910406b67ff0182d46eeedb128d367908cd
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.8.2@sha256:c480b89e26e42f6c112a4f683244a7979de3a2ca299bed7d5367ddf4fed706f0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.8.2@sha256:87806dc20caff40b37280ea3155cc9ef3e995402997c49a8f9f9c6bff57e1499
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.8.1@sha256:7382d002cffeed2d877331353f95797e89c0aa7ecb432e661eeeda3e590b3293
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta3
✔ buckets.source.toolkit.fluxcd.io/v1
✔ externalartifacts.source.toolkit.fluxcd.io/v1
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1
✔ helmreleases.helm.toolkit.fluxcd.io/v2
✔ helmrepositories.source.toolkit.fluxcd.io/v1
✔ imagepolicies.image.toolkit.fluxcd.io/v1
✔ imagerepositories.image.toolkit.fluxcd.io/v1
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1
✔ providers.notification.toolkit.fluxcd.io/v1beta3
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed
Git provider
No response
Container Registry provider
No response
Additional context
SSH Secret Structure
The git-ssh-auth secret is structured as follows:
apiVersion: v1
data:
  identity: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    ...
    -----END OPENSSH PRIVATE KEY-----
  known_hosts: |
    gitlab.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFSMqzJeV9rUzU4kWitGjeR4PWSa29SPqJ1fVkhtj3Hw9xjLVXVYrU9QlYWrOLXBpQ6KWjbjTDTdDkoohFzgbEY=
kind: Secret
metadata:
  name: git-ssh-auth
  namespace: flux-system
type: Opaque
Debug pod verification
I created a debug pod to verify the SSH key works correctly with GitLab using the exact same secret mount:
Debug pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: git-debug
  namespace: flux-system
spec:
  containers:
  - name: debug
    image: alpine/git:v2.52.0
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: ssh-key
      mountPath: /root/.ssh/id_ed25519
      subPath: id_ed25519
      readOnly: true
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  volumes:
  - name: ssh-key
    secret:
      secretName: git-ssh-auth
      defaultMode: 0400
      items:
      - key: identity
        path: id_ed25519
  securityContext:
    runAsNonRoot: false
    seccompProfile:
      type: RuntimeDefault
  restartPolicy: Never
Successful clone from GitLab inside debug pod:
~ # ssh-keyscan gitlab.com >> ~/.ssh/known_hosts
~ # git clone ssh://git@gitlab.com/<group>/<subgroup>/<project>.git /tmp/test
Cloning into '/tmp/test'...
remote: Enumerating objects: 505, done.
remote: Counting objects: 100% (457/457), done.
remote: Compressing objects: 100% (449/449), done.
remote: Total 505 (delta 161), reused 0 (delta 0), pack-reused 48 (from 1)
Receiving objects: 100% (505/505), 194.01 KiB | 1.16 MiB/s, done.
Resolving deltas: 100% (170/170), done.
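One difference between the two setups is visible in the transcript: the debug pod mounts only the identity key and builds its known_hosts with ssh-keyscan, whereas the secret carries a single ecdsa-sha2-nistp256 entry. To compare the two files, a helper like the following could list the key types each one records for a host. This is a rough sketch, not a diagnosis: the function name known_hosts_key_types is invented here, and it only understands plain, unhashed known_hosts entries.

```shell
# List the key types recorded for a host in a known_hosts file.
# Only plain entries are handled; hashed "|1|..." hosts and
# "@cert-authority" / "@revoked" marker lines are skipped.
# Usage: known_hosts_key_types <host> <known_hosts file>
known_hosts_key_types() {
  host="$1"
  file="$2"
  awk -v host="$host" '
    /^[ \t]*(#|$)/ { next }            # skip comments and blank lines
    $1 ~ /^[@|]/   { next }            # skip marker lines and hashed hosts
    {
      # The first field may be a comma-separated list of host names.
      n = split($1, names, ",")
      for (i = 1; i <= n; i++) {
        if (names[i] == host) { print $2; break }
      }
    }
  ' "$file" | sort -u
}
```

For example, `known_hosts_key_types gitlab.com /tmp/known_hosts` can be run against the file used to create the secret, and `known_hosts_key_types gitlab.com ~/.ssh/known_hosts` inside the debug pod, to see whether both list the same key types.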
Additional observations
- The SSH key (ed25519) works perfectly from the debug pod with the same secret mount
- known_hosts is properly configured with GitLab's host keys
- Network connectivity to GitLab is confirmed from the cluster
- The issue persists even after:
  - Regenerating the secret with different key formats (ed25519, ecdsa)
  - Increasing timeout to 120s
  - Verifying the secret is correctly mounted in the source-controller pod
- The error consistently occurs at the Flux source-controller level within the timeout period
- This is not a network policy or firewall issue, as confirmed by the debug pod's successful clone from GitLab
Code of Conduct
- I agree to follow this project's Code of Conduct