mmalyska/home-ops

Home k8s infrastructure

Deploying a cluster with Talos and Terraform backed by ArgoCD and Bitwarden Secrets Manager.

Overview

🧱 Core components

🚚 Provisioning

The following tools are used for provisioning:

  • Talos - provisions every node in the cluster with a uniform OS and configuration, managed GitOps-style
  • Terraform - manages the DNS settings of an already existing Cloudflare domain

📦 Kubernetes

  • cert-manager - SSL certificates - with Cloudflare DNS challenge
  • Cilium - CNI (container network interface), kube-proxy replacement, L2 load balancer announcements
  • ArgoCD - GitOps tool for deploying manifests from the cluster directory
  • rook.io - ceph storage for k8s
  • nfs - used for cold storage on QNAP
  • Envoy Gateway - Kubernetes Gateway API implementation with two gateways:
    • envoy-external (192.168.48.20) - internet-facing via Cloudflare Tunnel
    • envoy-internal (192.168.48.21) - internal network only
  • Cloudflared - Cloudflare Tunnel client for external access
  • external-dns (cloudflare) - publishes external HTTPRoutes and DNSEndpoints to Cloudflare DNS
  • external-dns (adguard) - publishes internal HTTPRoutes and DNSEndpoints to AdGuard Home DNS
  • Keycloak - identity provider (OIDC)
  • External Secrets Operator - secret synchronization from Bitwarden Secrets Manager
  • kube-prometheus-stack - monitoring (Prometheus + Grafana)
  • CloudNative-PG - PostgreSQL operator
  • VolSync - persistent volume backup and restore

📝 Setup

💻 Systems

  • Nodes running Talos on bare metal.
  • A Cloudflare account with a domain, which will be managed by Terraform.
  • QNAP used as NFS and S3 storage.

🧠 Devcontainer

For fast setup I use a devcontainer to get the same environment across devices. See more inside .devcontainer and at Devcontainers.

🔧 Tools

  1. Install the most recent versions of the following command-line tools on your workstation. If you are using Homebrew on macOS or Linux, skip to steps 3 and 4.

  2. This guide heavily relies on go-task as a framework for setting things up. It is advised to learn and understand the commands it is running under the hood.

  3. Install go-task via Brew

    brew install go-task/tap/go-task
  4. Install workstation dependencies via Brew

    task init

⚠️ pre-commit

It is advisable to install pre-commit and the pre-commit hooks that come with this repository. gitleaks will check to make sure you are not accidentally committing secrets.

  1. Enable Pre-Commit

    task precommit:init
  2. Update Pre-Commit, though it will occasionally make mistakes, so verify its results.

    task precommit:update

📂 Repository structure

The Git repository contains the following directories under cluster, listed in the order Argo CD applies them.

📁 cluster
├──📄 bootstrap-application.yaml - root app-of-apps entry point
├──📁 projects   - ArgoCD AppProject definitions (core/system/default/games/home-automation)
├──📁 appsets    - ArgoCD ApplicationSet definitions (auto-discover app-config.yaml files)
├──📁 apps       - application manifests organized by category
│   ├──📁 core   - cluster core (cilium, argocd, rook-ceph)
│   ├──📁 system - platform services (traefik, cert-manager, monitoring, keycloak, external-secrets...)
│   ├──📁 default - workload apps (jellyfin, gitea, n8n, open-webui, gethomepage...)
│   ├──📁 games  - game servers (minecraft-bedrock, vintagestory)
│   └──📁 home-automation - home automation (vernemq, ollama, whisper, piper, openwakeword)
└──📁 .tools     - utility manifests (rook wipe jobs, etc.)

🚀 Deployment

☁️ Global Cloudflare API Token

In order to use Terraform and cert-manager with the Cloudflare DNS challenge you will need to create an API Token.

  1. Head over to Cloudflare and create an API Token by going here.

  2. Under the API Tokens section, create a scoped API Token.

  3. Store the API Token in Bitwarden Secrets Manager and reference it by UUID in:

    • provision/terraform/cloudflare/bitwarden_secrets.tf (via bitwarden-secrets Terraform provider)
    • cluster/apps/system/cert-manager/resources/api-token-externalsecret.yaml (via ESO ExternalSecret)

⚑ Preparing Talos nodes

  1. Get an ISO image of the installer from the latest release

  2. Configure nodes inside provision/talos/talconfig.yaml

  3. Run task talos:init to generate Talos configs for each node

  4. Follow the Getting Started guide for details on Talos installation

☁️ Configuring Cloudflare DNS with Terraform

📍 Review the Terraform scripts under ./provision/terraform/cloudflare/ and make sure you understand what they do (no, really, review them). If your domain already has DNS records, be sure to export them before you continue. If you choose to, you can extend the Terraform scripts to manage all of the domain's DNS records.

  1. Pull in the Terraform deps by running task terraform:init:cloudflare

  2. Review the changes Terraform will make to your Cloudflare domain by running task terraform:plan:cloudflare

  3. Finally, have Terraform apply the changes by running task terraform:apply:cloudflare

If Terraform ran successfully, you can log into Cloudflare and validate that the DNS records are present.

🐙 Bootstrapping the cluster

📍 Before running bootstrap, make sure your .envrc is sourced (via direnv) so that BWS_TOKEN, KUBECONFIG, and TALOSCONFIG are all set in your environment.
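
A quick pre-flight check can catch an unsourced .envrc before anything touches the cluster. The sketch below is not part of the repository's tasks; the function name missing_env_vars is hypothetical, and only the three variable names come from the note above.

```shell
# Print the name of every required variable that is unset or empty.
# The variable names are the ones this README says bootstrap needs.
missing_env_vars() {
  for name in BWS_TOKEN KUBECONFIG TALOSCONFIG; do
    # Indirect lookup of the variable named by $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

If missing_env_vars prints nothing, all three variables are set; any name it prints should be fixed in .envrc (followed by direnv allow) before running the bootstrap task.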

  1. Apply Talos config to each node (first boot, use --insecure before the PKI is established):

    talosctl apply-config --nodes 192.168.48.2 --insecure -f provision/talos/clusterconfig/home-mc1.yaml
    talosctl apply-config --nodes 192.168.48.3 --insecure -f provision/talos/clusterconfig/home-mc2.yaml
    talosctl apply-config --nodes 192.168.48.4 --insecure -f provision/talos/clusterconfig/home-mc3.yaml
  2. Push your configuration to git so ArgoCD can read it:

    git add -A
    git commit -m "chore: initial cluster configuration"
    git push
  3. Run the full bootstrap (etcd → kubeconfig → Cilium → ESO secret injection → Rook wipe → ArgoCD):

    task bootstrap:kubernetes

    This single command automates the following phases in order:

    Phase            What happens
    etcd             Bootstraps the etcd leader on the first control-plane node
    kubeconfig       Fetches kubeconfig from Talos into $KUBECONFIG
    apps (helmfile)  Installs Cilium CNI and kubelet-csr-approver; waits for all nodes Ready
    eso-bootstrap    Creates the external-secrets namespace and injects the bitwarden-access-token K8s Secret from $BWS_TOKEN
    rook             Wipes Rook data directories and raw disks on every node (destructive; disks must be clean for Ceph)
    argocd           Creates an empty cluster-secrets placeholder so the repo-server CMP sidecar can start, applies the ArgoCD kustomize, applies the root bootstrap-application.yaml, and waits for argocd-server to be ready

    After task bootstrap:kubernetes completes, ArgoCD is running and the root app-of-apps is applied. ApplicationSets auto-discover all enabled: "true" apps and begin syncing them. External Secrets Operator deploys first (sync-wave -5), authenticates with Bitwarden using the injected token, and populates the cluster-secrets K8s Secret. At that point ArgoCD can fully resolve <secret:key> tokens in all app manifests.

  4. Log in to ArgoCD with the local admin account (OIDC is unavailable until Keycloak finishes deploying):

    task argocd:login
    # When prompted, use local credentials; SSO will not work yet
  5. Sync the Rook Ceph operator and cluster (manual gate: storage is not auto-synced to prevent accidental disk claims):

    task bootstrap:rook-sync
  6. Once Keycloak has deployed and its realm/client are configured, SSO login works automatically. You can re-run task argocd:login to switch to SSO.
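
Step 1 above repeats the same talosctl command once per node, so it can be generated from a single node list. The following is a minimal sketch that only prints the commands for review. The node IPs and config file names are the ones shown in step 1; the NODES variable and the print_apply_commands function are hypothetical helpers, not part of this repository.

```shell
# Node IP and generated config name pairs, copied from step 1.
NODES="192.168.48.2=home-mc1 192.168.48.3=home-mc2 192.168.48.4=home-mc3"

# Print one `talosctl apply-config` command per node (dry run only).
print_apply_commands() {
  for entry in $NODES; do
    ip=${entry%%=*}     # text before the first '='
    host=${entry#*=}    # text after the first '='
    printf 'talosctl apply-config --nodes %s --insecure -f provision/talos/clusterconfig/%s.yaml\n' "$ip" "$host"
  done
}
```

Calling print_apply_commands shows exactly what would be run; once the output looks right, it can be piped to sh to execute the commands.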

🎉 Congratulations, you have a Kubernetes cluster managed by ArgoCD, and your Git repository is driving the state of your cluster.

📣 Post installation

👉 Cluster maintenance

This section will cover upgrading k8s and other components on your cluster using Talos.

About

Repository for home infrastructure and monorepo for kubernetes cluster
