deviant101/rke2-ha-traefik-aws-ccm


RKE2 HA Cluster with Traefik Ingress & AWS CCM

A production-ready, highly available RKE2 Kubernetes cluster deployed on AWS with Traefik as the ingress controller and AWS Cloud Controller Manager (CCM) for seamless AWS integration.

Architecture Overview

┌────────────────────────────────────────────────────────┐
│                  AWS Region (us-east-1)                │
├────────────────────────────────────────────────────────┤
│  ┌──────────────────────────────────────────────────┐  │
│  │  Network Load Balancer (NLB)                     │  │
│  │  - Exposes Traefik on port 80/443                │  │
│  │  - Distributes traffic across worker nodes       │  │
│  └──────────────────────────────────────────────────┘  │
│              ↓         ↓         ↓                     │
│  ┌─────────────────────────────────────────────────┐   │
│  │    RKE2 HA Control Plane (3 nodes)              │   │
│  │    - Multi-AZ deployment                        │   │
│  │    - etcd HA cluster                            │   │
│  └─────────────────────────────────────────────────┘   │
│              ↓         ↓         ↓                     │
│  ┌─────────────────────────────────────────────────┐   │
│  │    RKE2 Worker Nodes (3+ nodes)                 │   │
│  │    - Traefik ingress controller                 │   │
│  │    - AWS CCM for volume/network provisioning    │   │
│  │    - Application workloads                      │   │
│  └─────────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────────┘

Prerequisites

  • AWS Account with appropriate permissions
  • Terraform >= 1.5.0
  • AWS CLI configured
  • SSH key pair for EC2 instance access
  • kubectl installed locally (optional, for cluster management)

Project Structure

.
├── README.md                          # This file
├── terraform/                         # Infrastructure as Code
│   ├── main.tf                        # Main Terraform configuration
│   ├── variables.tf                   # Input variables
│   ├── outputs.tf                     # Cluster outputs (NLB endpoint, kubeconfig)
│   ├── terraform.tfvars               # Variable values (customize here)
│   ├── terraform.tfvars.example       # Example terraform variables
│   └── modules/
│       ├── vpc/                       # VPC, subnets, security groups
│       ├── iam/                       # IAM roles for RKE2 and AWS CCM
│       ├── security-groups/           # Security group rules
│       ├── nlb/                       # Network Load Balancer
│       ├── control-plane/             # Control plane EC2 instances
│       └── workers/                   # Worker node EC2 instances
├── scripts/                           # Initialization scripts
│   ├── control-plane-init.sh          # Control plane bootstrap
│   └── worker-init.sh                 # Worker node bootstrap
├── app-deployment/                    # Example Kubernetes manifests
│   ├── test-apps.yaml                 # Sample nginx + whoami deployments
│   └── ingress.yaml                   # Traefik ingress configuration
└── important/                         # Configuration files
    ├── traefik-values.yaml            # Traefik Helm chart values
    └── values.yaml                    # Additional Helm values

Quick Start

1. Prepare Infrastructure

Clone this repository and customize variables:

cd terraform/
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values:
# - aws_region: AWS region for deployment
# - cluster_name: Name for your RKE2 cluster
# - control_plane_count: Number of control plane nodes (default: 3)
# - worker_count: Number of worker nodes (default: 3)
# - ssh_public_key_path: Path to your SSH public key
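The variables above can be collected into a minimal terraform.tfvars sketch. The values below are placeholders for illustration; terraform.tfvars.example in the repo is the authoritative template:

```hcl
# Placeholder values - substitute your own.
aws_region          = "us-east-1"
cluster_name        = "rke2-ha"
control_plane_count = 3
worker_count        = 3
ssh_public_key_path = "~/.ssh/id_rsa.pub"
```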

2. Deploy Infrastructure

# Initialize Terraform
terraform init

# Review planned infrastructure
terraform plan

# Deploy cluster
terraform apply

Expected time: 10-15 minutes

3. Retrieve Cluster Credentials

After Terraform completes:

# Get NLB endpoint
terraform output nlb_endpoint

# Retrieve kubeconfig
terraform output -raw kubeconfig > ~/.kube/rke2-cluster.yaml
export KUBECONFIG=~/.kube/rke2-cluster.yaml

# Verify cluster is ready
kubectl get nodes
kubectl get pods -A
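All control plane and worker nodes should report Ready. A small helper (hypothetical, not part of the repo) makes this easy to script:

```shell
# Hypothetical helper: count nodes whose STATUS column is not "Ready"
# in `kubectl get nodes` output (0 means the cluster is fully up).
not_ready_count() {
  awk 'NR > 1 && $2 != "Ready" { n++ } END { print n + 0 }'
}

# Live usage: kubectl get nodes | not_ready_count
# Demo against captured output (prints 1):
not_ready_count <<'EOF'
NAME       STATUS     ROLES                       AGE   VERSION
cp-1       Ready      control-plane,etcd,master   12m   v1.34.6+rke2r1
worker-1   Ready      <none>                      9m    v1.34.6+rke2r1
worker-2   NotReady   <none>                      9m    v1.34.6+rke2r1
EOF
```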

4. Verify Traefik Ingress

The Traefik ingress controller is automatically deployed during cluster initialization.

# Check Traefik deployment
kubectl -n kube-system get deployment traefik

# Get NLB endpoint
NLB_ENDPOINT=$(terraform output -raw nlb_endpoint)
echo "Access your cluster at: http://$NLB_ENDPOINT"

5. Deploy Sample Applications

# Create namespace
kubectl create namespace my-apps

# Deploy test applications (nginx + whoami)
kubectl apply -f app-deployment/test-apps.yaml

# Deploy Traefik ingress routes
kubectl apply -f app-deployment/ingress.yaml

# Test endpoints
curl http://$NLB_ENDPOINT/nginx
curl http://$NLB_ENDPOINT/whoami
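If a curl check fails, the HTTP status code usually points at the layer that is broken. A small helper (hypothetical, for illustration) encodes the common cases:

```shell
# Hypothetical helper: interpret an HTTP status code from the checks above.
# Usage: check_status "$(curl -s -o /dev/null -w '%{http_code}' "http://$NLB_ENDPOINT/nginx")"
check_status() {
  case "$1" in
    200)     echo "OK" ;;
    404)     echo "no matching route - check the Ingress path rules" ;;
    502|503) echo "no healthy backend - check the Service endpoints" ;;
    *)       echo "unexpected status: $1" ;;
  esac
}

check_status 200   # -> OK
check_status 503   # -> no healthy backend - check the Service endpoints
```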

Configuration Details

Cluster Specifications

| Component          | Configuration                                      |
|--------------------|----------------------------------------------------|
| RKE2 Version       | v1.34.6+rke2r1 (customizable in terraform.tfvars)  |
| Control Planes     | 3 nodes across AZs (HA etcd cluster)               |
| Workers            | 3+ nodes (adjustable in terraform.tfvars)          |
| Instance Types     | t3.medium (control plane & workers)                |
| Network            | Custom VPC (10.0.0.0/16) with public subnets       |
| Ingress Controller | Traefik v2.x (NLB-backed LoadBalancer service)     |
| Cloud Integration  | AWS CCM for volumes & load balancers               |

AWS CCM Features

  • Automatic Volume Provisioning: EBS volumes provisioned and attached automatically
  • Load Balancer Integration: Services with type: LoadBalancer create AWS NLBs/CLBs
  • Node Lifecycle Management: Node objects are cleaned up automatically when their backing EC2 instances are terminated
  • IAM-based Authentication: Uses instance IAM roles (no hardcoded credentials)
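As an example of the load balancer integration, a plain Service of type LoadBalancer is enough for the CCM to provision an AWS load balancer. The name and selector below are illustrative, not from the repo:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-lb            # illustrative name
  namespace: my-apps
  annotations:
    # Request an NLB instead of the default Classic ELB.
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: demo              # illustrative selector
  ports:
    - port: 80
      targetPort: 8080
```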

Traefik Ingress Configuration

Traefik is configured to:

  • Service Type: LoadBalancer (creates NLB in AWS)
  • Replicas: 2 for HA (configurable in important/traefik-values.yaml)
  • Entrypoints:
    • HTTP (port 80)
    • HTTPS (port 443, requires cert configuration)
  • Default Ingress Class: Traefik is the default ingress controller
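A sketch of how the settings above map onto the Traefik Helm chart's value keys (illustrative, not the repo's actual important/traefik-values.yaml):

```yaml
deployment:
  replicas: 2            # HA: a single replica is a single point of failure
service:
  type: LoadBalancer     # AWS CCM provisions the NLB
ports:
  web:
    exposedPort: 80
  websecure:
    exposedPort: 443     # TLS still requires certificate configuration
ingressClass:
  enabled: true
  isDefaultClass: true   # make Traefik the default ingress controller
```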

Custom Ingress Routes

Define ingress routes using Traefik annotations or Kubernetes Ingress resources:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  namespace: my-apps
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
    # Add Traefik-specific middleware, load balancer config, etc.
spec:
  ingressClassName: traefik
  rules:
    - host: your-domain.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 8080

For path rewriting with Traefik, use Middleware resources:

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: strip-prefix
  namespace: my-apps
spec:
  stripPrefix:
    prefixes:
      - /api
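A Middleware is inert until a router references it. On a Kubernetes Ingress this is done with an annotation, naming the middleware as `<namespace>-<name>@kubernetescrd`:

```yaml
metadata:
  annotations:
    # Attaches the strip-prefix Middleware defined above (my-apps namespace).
    traefik.ingress.kubernetes.io/router.middlewares: my-apps-strip-prefix@kubernetescrd
```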

Cluster Access & Management

SSH into Nodes

# Get node IPs from Terraform output or AWS console
NODE_IP="10.0.1.100"  # Replace with actual IP
ssh -i ~/.ssh/your-key.pem ubuntu@$NODE_IP

Access Kubernetes Cluster

# Set kubeconfig
export KUBECONFIG=~/.kube/rke2-cluster.yaml

# View cluster nodes
kubectl get nodes

# View all pods
kubectl get pods -A

# View Traefik service
kubectl -n kube-system get svc traefik

# View Traefik logs
kubectl -n kube-system logs -f deployment/traefik

Verify AWS CCM

# Check AWS CCM pods
kubectl -n kube-system get pods | grep aws

# View node provider IDs (set by AWS CCM)
kubectl get nodes -o json | jq '.items[].spec.providerID'
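Each node's providerID should look like aws:///&lt;az&gt;/&lt;instance-id&gt; once the CCM has initialized it. A small helper (hypothetical) counts nodes where that has not happened yet:

```shell
# Hypothetical helper: count providerIDs that are missing or not AWS-shaped.
# Input: one providerID (or "null") per line, e.g. from
#   kubectl get nodes -o json | jq -r '.items[].spec.providerID'
missing_provider_id() {
  grep -cv '^aws:///'
}

# Demo: one initialized node, one not yet initialized (prints 1).
printf 'aws:///us-east-1a/i-0123456789abcdef0\nnull\n' | missing_provider_id
```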

Production Considerations

Security

  • Enable HTTPS/TLS certificates (Cert-Manager + Let's Encrypt)
  • Implement RBAC policies for least privilege access
  • Use a private VPC with bastion host for SSH access
  • Enable VPC Flow Logs for network monitoring
  • Encrypt Kubernetes secrets at rest (etcd encryption)

High Availability

  • Configure cluster autoscaling for workers
  • Use pod disruption budgets for critical workloads
  • Implement cross-AZ pod affinity rules
  • Monitor etcd health and replication

Monitoring & Logging

  • Deploy Prometheus + Grafana for metrics
  • Configure CloudWatch agent for OS-level monitoring
  • Set up centralized logging (ELK, Loki, etc.)
  • Enable AWS CloudTrail for audit logging

Cost Optimization

  • Right-size EC2 instances based on workload
  • Use Reserved Instances or Savings Plans
  • Clean up unused EBS volumes and load balancers
  • Monitor AWS billing and set up alerts

Troubleshooting

Cluster Not Ready

# Check node status
kubectl get nodes

# Check system pod status
kubectl get pods -n kube-system

# View node logs
kubectl describe node <node-name>

# SSH and check RKE2 logs
ssh ubuntu@<node-ip>
sudo journalctl -u rke2-server -f  # Control plane
sudo journalctl -u rke2-agent -f   # Worker

Ingress Not Routing Traffic

# Verify Traefik is running
kubectl -n kube-system get pods -l app.kubernetes.io/name=traefik

# Check ingress resource
kubectl -n my-apps describe ingress test-ingress

# Verify service endpoints
kubectl -n my-apps get endpoints

# Test service connectivity from pod
kubectl -n my-apps run -it test-pod --image=curlimages/curl -- sh
# Then, inside the pod's shell:
curl http://nginx-svc:80

NLB Not Responding

# Check Traefik service has external IP
kubectl -n kube-system get svc traefik

# Verify security groups allow port 80/443
# - Check AWS Console > Security Groups
# - Ensure inbound rules allow 80/443 from 0.0.0.0/0

# Check NLB health checks
# AWS Console > EC2 > Load Balancers > Select NLB > Target Groups

AWS CCM Issues

# Check AWS CCM logs
kubectl -n kube-system logs -f deployment/aws-cloud-controller-manager

# Verify IAM role permissions
# Attached policy should include:
# - ec2:DescribeInstances
# - ec2:DescribeRegions
# - ec2:DescribeVolumes
# - ec2:CreateTags
# - etc.
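A minimal policy sketch covering just the actions listed above (the terraform/modules/iam module defines the authoritative policy, and the full AWS CCM policy requires more actions than this):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeRegions",
        "ec2:DescribeVolumes",
        "ec2:CreateTags"
      ],
      "Resource": "*"
    }
  ]
}
```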

Cleanup

To destroy the entire cluster and AWS resources:

cd terraform/
terraform destroy

Warning: This will permanently delete all resources including:

  • EC2 instances
  • VPC and networking
  • Load balancers
  • EBS volumes
  • All Kubernetes workloads

Architecture Decisions

Why RKE2?

  • Simplified HA control plane setup
  • Built-in Kubernetes components
  • Single binary deployment
  • Better resource efficiency than kubeadm on HA clusters

Why Traefik?

  • Lightweight ingress controller
  • Excellent Kubernetes integration
  • Support for advanced routing (via CRDs)
  • Built-in metrics and observability
  • Lower resource footprint vs NGINX/HAProxy

Why AWS CCM?

  • Seamless AWS integration without custom provisioning
  • Automatic EBS volume attachment
  • AWS load balancer service type support
  • Instance metadata-based IAM auth (secure, no credentials in code)

License

This project is provided as-is for educational and reference purposes.


Last Updated: April 2026
Tested With: RKE2 v1.34.6, Traefik v2.x, Terraform 1.5+
