GitRepository clone fails with "context deadline exceeded" on GitLab despite SSH key working in debug pod #5789

@haruops

Describe the bug

The Flux GitRepository resource consistently fails to clone the repository from GitLab with a context deadline exceeded error, even though the same SSH key and repository URL work perfectly when tested from a debug pod with the secret mounted in the same way.

The error appears in the GitRepository status:

failed to checkout and determine revision: unable to clone 'ssh://git@gitlab.com/<group>/<subgroup>/<project>.git': context deadline exceeded

Despite multiple attempts to regenerate the SSH secret, adjust timeouts, and verify connectivity, the issue persists. The debug pod confirms that the SSH key, known_hosts, and network connectivity to GitLab are all functioning correctly.

Steps to reproduce

  1. Create SSH secret with identity and known_hosts:
kubectl create secret generic git-ssh-auth \
  --from-file=identity="$HOME/.ssh/id_ed25519" \
  --from-file=known_hosts=/tmp/known_hosts \
  -n flux-system
  2. Apply the GitRepository manifest:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    name: refs/heads/main
  secretRef:
    name: git-ssh-auth
  timeout: 60s
  url: ssh://git@gitlab.com/<group>/<subgroup>/<project>.git

  3. Check the GitRepository status:
...
status:
  conditions:
  - lastTransitionTime: "2026-03-26T17:38:16Z"
    message: building artifact
    observedGeneration: 1
    reason: Progressing
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2026-03-26T17:38:16Z"
    message: building artifact
    observedGeneration: 1
    reason: Progressing
    status: Unknown
    type: Ready
  - lastTransitionTime: "2026-03-26T17:37:12Z"
    message: 'failed to checkout and determine revision: unable to clone ''ssh://git@gitlab.com/<group>/<subgroup>/<project>.git'':
      context deadline exceeded'
    observedGeneration: 1
    reason: GitOperationFailed
    status: "True"
    type: FetchFailed
  observedGeneration: -1

Expected behavior

The GitRepository should successfully clone the repository from GitLab using the provided SSH secret within the specified timeout, similar to how the debug pod successfully clones.

Screenshots and recordings

No response

OS / Distro

ArchLinux

Flux version

flux: v2.8.3

Flux check

$ flux check
► checking prerequisites
✔ Kubernetes 1.35.1 >=1.33.0-0
► checking version in cluster
✔ distribution: flux-v2.8.3
✔ bootstrapped: false
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v1.5.3@sha256:b150af0cd7a501dafe2374b1d22c39abf0572465df4fa1fb99b37927b0d95d75
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v1.1.1@sha256:43617c9fbb4cf32aed7458647f62589575237ccb810f45bd7cb31f24126d4f22
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v1.1.1@sha256:4c12c4046dee6e32e11b7c6afeaf7910406b67ff0182d46eeedb128d367908cd
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.8.2@sha256:c480b89e26e42f6c112a4f683244a7979de3a2ca299bed7d5367ddf4fed706f0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.8.2@sha256:87806dc20caff40b37280ea3155cc9ef3e995402997c49a8f9f9c6bff57e1499
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.8.1@sha256:7382d002cffeed2d877331353f95797e89c0aa7ecb432e661eeeda3e590b3293
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta3
✔ buckets.source.toolkit.fluxcd.io/v1
✔ externalartifacts.source.toolkit.fluxcd.io/v1
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1
✔ helmreleases.helm.toolkit.fluxcd.io/v2
✔ helmrepositories.source.toolkit.fluxcd.io/v1
✔ imagepolicies.image.toolkit.fluxcd.io/v1
✔ imagerepositories.image.toolkit.fluxcd.io/v1
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1
✔ providers.notification.toolkit.fluxcd.io/v1beta3
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed

Git provider

No response

Container Registry provider

No response

Additional context

SSH Secret Structure

The git-ssh-auth secret is structured as follows:

apiVersion: v1
data:
  identity: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    ...
    -----END OPENSSH PRIVATE KEY-----
  known_hosts: |
    gitlab.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFSMqzJeV9rUzU4kWitGjeR4PWSa29SPqJ1fVkhtj3Hw9xjLVXVYrU9QlYWrOLXBpQ6KWjbjTDTdDkoohFzgbEY=
kind: Secret
metadata:
  name: git-ssh-auth
  namespace: flux-system
type: Opaque
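
One way to check which host-key types the known_hosts entry in the secret actually covers is to list them. The sketch below is illustrative only: `known_hosts_key_types` is a hypothetical helper, not part of Flux or OpenSSH, and it assumes plain (unhashed) entries without `@cert-authority` or `@revoked` markers. Note that the debug pod described below builds its own known_hosts with ssh-keyscan, so this check is separate from that test.

```shell
# Hypothetical helper: print "<host> <key-type>" for every entry in a
# known_hosts file. Comment lines and blank lines are skipped.
# Assumes plain (unhashed) entries with no leading @marker field.
known_hosts_key_types() {
  awk '!/^#/ && NF >= 3 { print $1, $2 }' "$1" | sort -u
}
```

To run it against the live secret, the file could first be extracted with `kubectl -n flux-system get secret git-ssh-auth -o jsonpath='{.data.known_hosts}' | base64 -d > /tmp/known_hosts.from-secret` (the output path is arbitrary) and then passed to `known_hosts_key_types`. For the single entry shown above, the result would be one `gitlab.com ecdsa-sha2-nistp256` line.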

Debug pod verification

I created a debug pod to verify the SSH key works correctly with GitLab using the exact same secret mount:

Debug pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: git-debug
  namespace: flux-system
spec:
  containers:
    - name: debug
      image: alpine/git:v2.52.0
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: ssh-key
          mountPath: /root/.ssh/id_ed25519
          subPath: id_ed25519
          readOnly: true
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
  volumes:
    - name: ssh-key
      secret:
        secretName: git-ssh-auth
        defaultMode: 0400
        items:
          - key: identity
            path: id_ed25519
  securityContext:
    runAsNonRoot: false
    seccompProfile:
      type: RuntimeDefault
  restartPolicy: Never

Successful clone from GitLab inside debug pod:

~ # ssh-keyscan gitlab.com >> ~/.ssh/known_hosts
~ # git clone ssh://git@gitlab.com/<group>/<subgroup>/<project>.git /tmp/test
Cloning into '/tmp/test'...
remote: Enumerating objects: 505, done.
remote: Counting objects: 100% (457/457), done.
remote: Compressing objects: 100% (449/449), done.
remote: Total 505 (delta 161), reused 0 (delta 0), pack-reused 48 (from 1)
Receiving objects: 100% (505/505), 194.01 KiB | 1.16 MiB/s, done.
Resolving deltas: 100% (170/170), done.
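
The clone above is not timed, so it does not show how its duration compares with the 60s timeout set on the GitRepository. A minimal sketch for running any command under a deadline follows. `run_with_deadline` is a hypothetical helper, and it assumes a `timeout` binary that exits with status 124 on expiry (GNU coreutils behaviour; other implementations, such as some BusyBox builds, may differ).

```shell
# Hypothetical helper: run a command under a deadline (in seconds) and
# report whether it finished, failed, or hit the deadline.
# Assumes GNU coreutils `timeout`, which exits with 124 when the deadline expires.
run_with_deadline() {
  deadline="$1"
  shift
  if timeout "$deadline" "$@"; then
    echo "finished within ${deadline}s"
  else
    rc=$?  # exit status of the `timeout` invocation above
    if [ "$rc" -eq 124 ]; then
      echo "deadline of ${deadline}s exceeded"
    else
      echo "command failed with exit code $rc"
    fi
  fi
}
```

Inside the debug pod this could be used as `run_with_deadline 60 git clone <repository-url> /tmp/test-timed`, where `<repository-url>` is the same URL as in the clone above and the target directory is arbitrary.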

Additional observations

  • The SSH key (ed25519) works perfectly from the debug pod with the same secret mount
  • known_hosts is properly configured with GitLab's host keys
  • Network connectivity to GitLab is confirmed from the cluster
  • The issue persists even after:
    • Regenerating the secret with different key formats (ed25519, ecdsa)
    • Increasing timeout to 120s
    • Verifying the secret is correctly mounted in the source-controller pod
  • The error consistently occurs at the Flux source-controller level within the timeout period
  • This is not a network policy or firewall issue, as confirmed by the debug pod's successful clone from GitLab

Code of Conduct

  • I agree to follow this project's Code of Conduct
