A production-ready, highly available RKE2 Kubernetes cluster deployed on AWS with Traefik as the ingress controller and AWS Cloud Controller Manager (CCM) for seamless AWS integration.
┌────────────────────────────────────────────────────────┐
│ AWS Region (us-east-1) │
├────────────────────────────────────────────────────────┤
│ ┌──────────────────────────────────────────────────┐ │
│ │ Network Load Balancer (NLB) │ │
│ │ - Exposes Traefik on port 80/443 │ │
│ │ - Distributes traffic across worker nodes │ │
│ └──────────────────────────────────────────────────┘ │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ RKE2 HA Control Plane (3 nodes) │ │
│ │ - Multi-AZ deployment │ │
│ │ - etcd HA cluster │ │
│ └─────────────────────────────────────────────────┘ │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ RKE2 Worker Nodes (3+ nodes) │ │
│ │ - Traefik ingress controller │ │
│ │ - AWS CCM for volume/network provisioning │ │
│ │ - Application workloads │ │
│ └─────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
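
The three-node control plane in the diagram follows etcd's majority-quorum rule. As a rough illustration only (the function names are hypothetical and not part of this repository), the arithmetic behind choosing an odd node count can be sketched as:

```shell
# etcd accepts writes only while a majority (quorum) of members is healthy.
# quorum = floor(n / 2) + 1
etcd_quorum() {
  local members=$1
  echo $(( members / 2 + 1 ))
}

# Number of members that can fail while quorum is still met.
etcd_fault_tolerance() {
  local members=$1
  echo $(( members - (members / 2 + 1) ))
}
```

Calling `etcd_fault_tolerance 3` and `etcd_fault_tolerance 4` both print 1, which is why a fourth control plane node adds cost without adding resilience; five nodes are needed to tolerate two failures.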
- AWS Account with appropriate permissions
- Terraform >= 1.5.0
- AWS CLI configured
- SSH key pair for EC2 instance access
- kubectl installed locally (optional, for cluster management)
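
The prerequisites above could be checked before running Terraform with a small preflight script. This is a minimal sketch, not something shipped in this repository: the function names `version_ge` and `check_prereqs` are hypothetical, it assumes `jq` is installed, and it assumes plain numeric versions such as 1.5.0 (no pre-release suffixes).

```shell
# Return 0 if dotted numeric version $1 is >= version $2.
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)
  local i h w
  for i in 0 1 2; do
    h=${have[i]:-0}
    w=${want[i]:-0}
    # 10# forces base 10 so a component like "08" is not read as octal.
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0
}

# Verify that the required tools exist and that Terraform is new enough.
check_prereqs() {
  local tool tf_version
  for tool in terraform aws jq; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 1; }
  done
  tf_version=$(terraform version -json | jq -r '.terraform_version')
  version_ge "$tf_version" "1.5.0" || {
    echo "Terraform $tf_version is older than 1.5.0" >&2
    return 1
  }
}
```

Run `check_prereqs` from any directory; it returns non-zero and prints the first problem it finds.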
.
├── README.md # This file
├── terraform/ # Infrastructure as Code
│ ├── main.tf # Main Terraform configuration
│ ├── variables.tf # Input variables
│ ├── outputs.tf # Cluster outputs (NLB endpoint, kubeconfig)
│ ├── terraform.tfvars # Variable values (customize here)
│ ├── terraform.tfvars.example # Example terraform variables
│ └── modules/
│ ├── vpc/ # VPC, subnets, security groups
│ ├── iam/ # IAM roles for RKE2 and AWS CCM
│ ├── security-groups/ # Security group rules
│ ├── nlb/ # Network Load Balancer
│ ├── control-plane/ # Control plane EC2 instances
│ └── workers/ # Worker node EC2 instances
├── scripts/ # Initialization scripts
│ ├── control-plane-init.sh # Control plane bootstrap
│ └── worker-init.sh # Worker node bootstrap
├── app-deployment/ # Example Kubernetes manifests
│ ├── test-apps.yaml # Sample nginx + whoami deployments
│ └── ingress.yaml # Traefik ingress configuration
└── important/ # Configuration files
├── traefik-values.yaml # Traefik Helm chart values
└── values.yaml # Additional Helm values
Clone this repository and customize variables:
cd terraform/
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values:
# - aws_region: AWS region for deployment
# - cluster_name: Name for your RKE2 cluster
# - control_plane_count: Number of control plane nodes (default: 3)
# - worker_count: Number of worker nodes (default: 3)
# - ssh_public_key_path: Path to your SSH public key

# Initialize Terraform
terraform init
# Review planned infrastructure
terraform plan
# Deploy cluster
terraform apply

Expected time: 10-15 minutes
After Terraform completes:
# Get NLB endpoint
terraform output nlb_endpoint
# Retrieve kubeconfig
terraform output -raw kubeconfig > ~/.kube/rke2-cluster.yaml
export KUBECONFIG=~/.kube/rke2-cluster.yaml
# Verify cluster is ready
kubectl get nodes
kubectl get pods -A

The Traefik ingress controller is automatically deployed during cluster initialization.
# Check Traefik deployment
kubectl -n kube-system get deployment traefik
# Get NLB endpoint
NLB_ENDPOINT=$(terraform output -raw nlb_endpoint)
echo "Access your cluster at: http://$NLB_ENDPOINT"

# Create namespace
kubectl create namespace my-apps
# Deploy test applications (nginx + whoami)
kubectl apply -f app-deployment/test-apps.yaml
# Deploy Traefik ingress routes
kubectl apply -f app-deployment/ingress.yaml
# Test endpoints
curl http://$NLB_ENDPOINT/nginx
curl http://$NLB_ENDPOINT/whoami

| Component | Configuration |
|---|---|
| RKE2 Version | v1.34.6+rke2r1 (customizable in terraform.tfvars) |
| Control Planes | 3 nodes across AZs (HA etcd cluster) |
| Workers | 3+ nodes (adjustable in terraform.tfvars) |
| Instance Types | t3.medium (control plane & workers) |
| Network | Custom VPC (10.0.0.0/16) with public subnets |
| Ingress Controller | Traefik v2.x (NLB-backed LoadBalancer service) |
| Cloud Integration | AWS CCM for volumes & load balancers |
- Automatic Volume Provisioning: EBS volumes provisioned and attached automatically
- Load Balancer Integration: Services with type: LoadBalancer create AWS NLBs/CLBs
- Node Health Management: Automatically marks unhealthy nodes as NotReady
- IAM-based Authentication: Uses instance IAM roles (no hardcoded credentials)
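
One visible effect of the CCM is the `spec.providerID` it sets on each node, which on AWS generally has the form `aws:///<availability-zone>/<instance-id>`. The helpers below are a hedged sketch for mapping a node back to its EC2 instance; the function names are hypothetical and the ID in the usage note is a placeholder.

```shell
# Extract the EC2 instance ID from an AWS providerID string.
instance_id_from_provider_id() {
  local provider_id=$1
  case "$provider_id" in
    aws:///*/i-*) echo "${provider_id##*/}" ;;
    *) echo "unexpected providerID: $provider_id" >&2; return 1 ;;
  esac
}

# Extract the availability zone from an AWS providerID string.
az_from_provider_id() {
  local rest=${1#aws:///}
  echo "${rest%%/*}"
}
```

For example, `instance_id_from_provider_id aws:///us-east-1a/i-0123456789abcdef0` prints the instance ID, which can then be passed to `aws ec2 describe-instances --instance-ids`.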
Traefik is configured as follows:
- Service Type: LoadBalancer (creates NLB in AWS)
- Replicas: 2 for HA (configurable in important/traefik-values.yaml)
- Entrypoints:
  - HTTP (port 80)
  - HTTPS (port 443, requires cert configuration)
- Default Ingress Class: Traefik is the default ingress controller
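
To confirm which IngressClass is marked as the default, the standard `ingressclass.kubernetes.io/is-default-class` annotation can be inspected. The following is a sketch under the assumption that `kubectl` already points at this cluster; both function names are hypothetical.

```shell
# Reads "name<TAB>is-default" lines on stdin and prints the names flagged "true".
default_ingress_classes() {
  awk -F '\t' '$2 == "true" { print $1 }'
}

# Queries the cluster for every IngressClass and filters for the default one(s).
list_default_ingress_classes() {
  kubectl get ingressclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.ingressclass\.kubernetes\.io/is-default-class}{"\n"}{end}' \
    | default_ingress_classes
}
```

If `list_default_ingress_classes` prints more than one name, Ingress resources without an explicit `ingressClassName` may be rejected, so it is worth checking after adding a second controller.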
Define ingress routes using Traefik annotations or Kubernetes Ingress resources:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  namespace: my-apps
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
    # Add Traefik-specific middleware, load balancer config, etc.
spec:
  ingressClassName: traefik
  rules:
    - host: your-domain.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 8080

For path rewriting with Traefik, use Middleware resources:
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: strip-prefix
  namespace: my-apps
spec:
  stripPrefix:
    prefixes:
      - /api

# Get node IPs from Terraform output or AWS console
NODE_IP="10.0.1.100" # Replace with actual IP
ssh -i ~/.ssh/your-key.pem ubuntu@$NODE_IP

# Set kubeconfig
export KUBECONFIG=~/.kube/rke2-cluster.yaml
# View cluster nodes
kubectl get nodes
# View all pods
kubectl get pods -A
# View Traefik service
kubectl -n kube-system get svc traefik
# View Traefik logs
kubectl -n kube-system logs -f deployment/traefik

# Check AWS CCM pods
kubectl -n kube-system get pods | grep aws
# View node provider IDs (set by AWS CCM)
kubectl get nodes -o json | jq '.items[].spec.providerID'

- Enable HTTPS/TLS certificates (Cert-Manager + Let's Encrypt)
- Implement RBAC policies for least privilege access
- Use a private VPC with bastion host for SSH access
- Enable VPC Flow Logs for network monitoring
- Encrypt Kubernetes secrets at rest (etcd encryption)
- Configure cluster autoscaling for workers
- Use pod disruption budgets for critical workloads
- Implement cross-AZ pod affinity rules
- Monitor etcd health and replication
- Deploy Prometheus + Grafana for metrics
- Configure CloudWatch agent for OS-level monitoring
- Set up centralized logging (ELK, Loki, etc.)
- Enable AWS CloudTrail for audit logging
- Right-size EC2 instances based on workload
- Use Reserved Instances or Savings Plans
- Clean up unused EBS volumes and load balancers
- Monitor AWS billing and set up alerts
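
As one way to act on the cleanup item above, unattached EBS volumes (status `available`) can be listed and totalled with the AWS CLI. This is a sketch only: the function names are hypothetical, and it assumes the AWS CLI is configured for the region the cluster runs in.

```shell
# Lists unattached volumes as "volume-id<TAB>size-in-GiB" lines.
list_unattached_volumes() {
  aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[].[VolumeId,Size]' \
    --output text
}

# Reads "volume-id size-in-GiB" lines on stdin and prints a count and total size.
summarize_volumes() {
  awk '{ count++; total += $2 } END { printf "%d volumes, %d GiB\n", count, total }'
}
```

Running `list_unattached_volumes | summarize_volumes` gives a quick estimate of how much storage is billed without being attached to any instance. Review each volume before deleting it, since an unattached volume may still hold data that is needed.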
# Check node status
kubectl get nodes
# Check system pod status
kubectl get pods -n kube-system
# View node logs
kubectl describe node <node-name>
# SSH and check RKE2 logs
ssh ubuntu@<node-ip>
sudo journalctl -u rke2-server -f # Control plane
sudo journalctl -u rke2-agent -f # Worker

# Verify Traefik is running
kubectl -n kube-system get pods -l app.kubernetes.io/name=traefik
# Check ingress resource
kubectl -n my-apps describe ingress test-ingress
# Verify service endpoints
kubectl -n my-apps get endpoints
# Test service connectivity from a temporary pod (removed on exit)
kubectl -n my-apps run -it --rm test-pod --image=curlimages/curl -- sh
# Then, inside the pod's shell:
curl http://nginx-svc:80

# Check Traefik service has external IP
kubectl -n kube-system get svc traefik
# Verify security groups allow port 80/443
# - Check AWS Console > Security Groups
# - Ensure inbound rules allow 80/443 from 0.0.0.0/0
# Check NLB health checks
# AWS Console > EC2 > Load Balancers > Select NLB > Target Groups

# Check AWS CCM logs
kubectl -n kube-system logs -f deployment/aws-cloud-controller-manager
# Verify IAM role permissions
# Attached policy should include:
# - ec2:DescribeInstances
# - ec2:DescribeRegions
# - ec2:DescribeVolumes
# - ec2:CreateTags
# - etc.

To destroy the entire cluster and AWS resources:
cd terraform/
terraform destroy

Warning: This will permanently delete all resources, including:
- EC2 instances
- VPC and networking
- Load balancers
- EBS volumes
- All Kubernetes workloads
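
Load balancers that the cluster creates on its own (for example through a Service of type LoadBalancer) are typically not tracked in Terraform state and can block VPC deletion. Whether that applies to this setup is an assumption. The sketch below, with hypothetical function names, lists such Services so they can be deleted before `terraform destroy`.

```shell
# Reads `kubectl get svc -A --no-headers` output on stdin and prints
# "namespace/name" for every Service whose TYPE column is LoadBalancer.
load_balancer_services() {
  awk '$3 == "LoadBalancer" { print $1 "/" $2 }'
}

# Queries the cluster and filters for LoadBalancer Services.
list_load_balancer_services() {
  kubectl get svc -A --no-headers | load_balancer_services
}
```

Each entry printed by `list_load_balancer_services` can be removed with `kubectl -n <namespace> delete svc <name>`, which gives the CCM a chance to delete the matching AWS load balancer first.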
- Simplified HA control plane setup
- Built-in Kubernetes components
- Single binary deployment
- Better resource efficiency than kubeadm on HA clusters
- Lightweight ingress controller
- Excellent Kubernetes integration
- Support for advanced routing (via CRDs)
- Built-in metrics and observability
- Lower resource footprint vs NGINX/HAProxy
- Seamless AWS integration without custom provisioning
- Automatic EBS volume attachment
- AWS load balancer service type support
- Instance metadata-based IAM auth (secure, no credentials in code)
This project is provided as-is for educational and reference purposes.
Last Updated: April 2026
Tested With: RKE2 v1.34.6, Traefik v2.x, Terraform 1.5+