# Building a Home Lab Kubernetes Cluster with GitOps
This post covers the architecture of my home lab Kubernetes cluster built on Raspberry Pi hardware. The cluster is powered by K3s and fully managed through GitOps using ArgoCD. Every piece of infrastructure is defined in code and reconciled automatically from git.
## Hardware Topology
The cluster consists of four Raspberry Pi nodes:
- Control Planes (server nodes): 10.0.0.31, 10.0.0.32, 10.0.0.33
- Worker Node: 10.0.0.34
- Virtual IP (kube-vip): 10.0.0.30
All four nodes run workloads - the control plane nodes are not tainted and can schedule applications in addition to running the Kubernetes control plane. This provides additional capacity for running services. kube-vip provides a single virtual IP (10.0.0.30) for the API server in an HA configuration.
## Provisioning with k3sup
The cluster is provisioned using k3sup, a tool that automates K3s installation over SSH. From your workstation, bootstrap the first control plane:
```shell
k3sup install \
  --ip "10.0.0.31" \
  --user "rush" \
  --ssh-key "~/path/to/your-ssh-key.pem" \
  --cluster \
  --k3s-version "v1.34.1+k3s1" \
  --tls-san "10.0.0.30" \
  --k3s-extra-args "--disable traefik --disable servicelb --disable local-storage"
```
Key decisions in the installation:
- K3s version: v1.34.1+k3s1 (stable channel). Pinning an explicit release keeps every node on the same version
- TLS SAN: The virtual IP (10.0.0.30) is added as a Subject Alternative Name so the API server certificate is valid when accessed through the kube-vip VIP
- Disabled components: Traefik, ServiceLB, and local-storage are disabled because we install our own replacements (Traefik for ingress, MetalLB for load balancing, Longhorn for storage)
After the first control plane is initialized, subsequent control planes and workers join using the k3sup join command. The exact command varies slightly depending on whether you’re joining a control plane or worker node. Here’s an example for joining a control plane:
```shell
k3sup join \
  --host "10.0.0.32" \
  --user "rush" \
  --ssh-key "~/path/to/your-ssh-key.pem" \
  --server-host "https://10.0.0.31:6443" \
  --server \
  --k3s-version "v1.34.1+k3s1"
```
And for worker nodes:
```shell
k3sup join \
  --host "10.0.0.34" \
  --user "rush" \
  --ssh-key "~/path/to/your-ssh-key.pem" \
  --server-host "https://10.0.0.31:6443" \
  --k3s-version "v1.34.1+k3s1"
```
Wait for each node to become ready before proceeding:
```shell
kubectl wait node <node-name> --for condition=ready --timeout=240s
```
## The GitOps Bootstrap
This is the most critical part of the architecture. The cluster uses ArgoCD to manage itself, but how does ArgoCD get installed in the first place?
### Step 1: Manual Helm Install
Once all nodes are joined and ready, add the Argo Helm repository and install ArgoCD:

```shell
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
helm install argocd argo/argo-cd \
  -n argocd \
  --create-namespace
```
Wait for ArgoCD to be ready:
```shell
kubectl wait deployment --all -n argocd --for=condition=Available --timeout=300s
```
### Step 2: Set Admin Password
ArgoCD’s default password is generated at install time. Patch the secret to set your own:
```shell
kubectl -n argocd patch secret argocd-secret \
  -p '{"stringData": {
    "admin.password": "$2a$10$YOUR_HASH_HERE"
  }}'
```
To generate a bcrypt hash, you can use:

```shell
htpasswd -nbBC 10 "" "your-password" | tr -d ':\n' | sed -e 's/\$2y\$/\$2a\$/'
```
### Step 3: Configure ArgoCD to Watch Your Repo
You need to create a Secret with SSH credentials so ArgoCD can clone your Git repository, then create the “core” Application:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: your-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  url: git@github.com:yourusername/your-repo.git
  sshPrivateKey: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    your-private-key
    -----END OPENSSH PRIVATE KEY-----
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: core
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'git@github.com:yourusername/your-repo.git'
    path: path/to/your/app_of_apps
    targetRevision: HEAD
    directory:
      recurse: true
  destination:
    server: 'https://kubernetes.default.svc'
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
This Application tells ArgoCD to watch the app_of_apps/ folder in your repository. Every YAML file in that folder defines an ArgoCD Application (or any Kubernetes resource), and ArgoCD will reconcile them to the cluster.
This creates the GitOps loop: The cluster bootstraps itself, then ArgoCD takes over and ensures the cluster state always matches what’s in Git.
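As a toy sketch of that loop (this is not ArgoCD code, just the concept): the desired state lives in Git, the live state lives in the cluster, and a sync is conceptually a diff followed by an apply of the desired side.

```shell
# Conceptual illustration only: compare "desired" (Git) with "live"
# (cluster) and report drift, the way ArgoCD's diff view does.
workdir="$(mktemp -d)"

# Desired state, as committed to Git.
printf 'replicas: 3\n' > "$workdir/desired.yaml"

# Live state, as currently running in the cluster.
printf 'replicas: 2\n' > "$workdir/live.yaml"

# A sync is conceptually: diff, then re-apply the desired side.
if diff -u "$workdir/live.yaml" "$workdir/desired.yaml"; then
  echo "in sync"
else
  echo "drift detected: sync would re-apply the desired state"
fi
```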
## Core Infrastructure Components
Each infrastructure component is defined as an ArgoCD Application with a sync-wave annotation to control deployment order:
```yaml
annotations:
  argocd.argoproj.io/sync-wave: "-50"
```
Lower (more negative) numbers deploy first. Here’s the order:
| Wave | Component | Purpose |
|---|---|---|
| -100 | namespaces | Namespaces must exist before the apps that use them |
| -60 | Longhorn | Distributed storage |
| -50 | MetalLB | Load balancer |
| -40 | kube-vip | Virtual IP |
| -30 | Traefik | Ingress controller |
| -20 | tinyauth | Auth middleware |
| 10+ | Applications | Non-core apps, like this blog |
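Since wave values are plain integers, the deployment order is just a numeric sort. A quick sketch using the wave values from the table:

```shell
# Sort the sync waves numerically; the output is the order in which
# ArgoCD deploys the components (most negative first).
printf '%s\n' \
  '-50 metallb' \
  '-100 namespaces' \
  '-30 traefik' \
  '-60 longhorn' \
  '-40 kube-vip' \
  '-20 tinyauth' \
  '10 applications' |
sort -n -k1,1
```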
### kube-vip
kube-vip provides a virtual IP for the Kubernetes API server, enabling HA control planes. It’s deployed as a DaemonSet that runs on every node:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-vip
  annotations:
    argocd.argoproj.io/sync-wave: "-39"
spec:
  source:
    chart: kube-vip
    repoURL: https://kube-vip.github.io/helm-charts
    helm:
      valuesObject:
        config:
          address: "10.0.0.30"
```
The VIP (10.0.0.30) is used as the endpoint for kubectl and all cluster communication.
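k3sup fetches a kubeconfig whose server field points at the first control plane (10.0.0.31). Since 10.0.0.30 is in the certificate's SANs, you can repoint that kubeconfig at the VIP so kubectl keeps working even if the first node is down. A sketch on a synthetic file (the real path would be your kubeconfig, e.g. ~/.kube/config):

```shell
# Illustrative kubeconfig fragment as k3sup might write it (server
# points at the first control plane).
kubeconfig="$(mktemp)"
cat > "$kubeconfig" <<'EOF'
clusters:
  - cluster:
      server: https://10.0.0.31:6443
    name: default
EOF

# Repoint the API endpoint at the kube-vip VIP; valid because
# 10.0.0.30 was added as a TLS SAN at install time.
sed -i 's|https://10\.0\.0\.31:6443|https://10.0.0.30:6443|' "$kubeconfig"
grep 'server:' "$kubeconfig"
```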
### MetalLB
MetalLB provides load balancing on bare metal (replacing the K3s built-in ServiceLB). It uses L2 mode (ARP) to announce LoadBalancer service IPs:
```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: metallb-pool
  namespace: metallb
spec:
  addresses:
    - 10.0.0.60-10.0.0.62
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: metallb-l2
  namespace: metallb
```
The address pool (10.0.0.60-10.0.0.62) is used for Kubernetes Services of type LoadBalancer.
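With the pool in place, any Service of type LoadBalancer automatically receives an address from that range. A minimal sketch, assuming a hypothetical whoami deployment in a namespace called app (neither is part of my actual setup):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: whoami
  namespace: app
spec:
  type: LoadBalancer
  selector:
    app: whoami
  ports:
    - port: 80
      targetPort: 8080
```

MetalLB picks a free address from the pool and the L2Advertisement answers ARP for it on the LAN; `kubectl get svc whoami -n app` would then show the assigned EXTERNAL-IP.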
### Traefik
Traefik replaces the default K3s Traefik installation. It’s deployed via Helm with:
- Let’s Encrypt certificate resolver for automatic TLS
- HTTP to HTTPS redirect
- Custom buffer sizes for streaming workloads
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: dashboard
  namespace: traefik
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`traefik.example.com`)
      kind: Rule
      middlewares:
        - name: tinyauth
          namespace: tinyauth
      services:
        - name: api@internal
          kind: TraefikService
  tls:
    certResolver: letsencrypt
```
### Longhorn
Longhorn provides distributed block storage, essential for persistent workloads on a Pi cluster:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: longhorn
  annotations:
    argocd.argoproj.io/sync-wave: "-59"
spec:
  source:
    chart: longhorn
    repoURL: https://charts.longhorn.io
    helm:
      valuesObject:
        preUpgradeChecker:
          jobEnabled: false
```
Longhorn requires iSCSI, which must be installed on each node before deploying Longhorn:
```shell
# Run on each node
ssh user@node 'sudo apt-get update && sudo apt-get -y install open-iscsi'
ssh user@node 'sudo systemctl enable iscsid open-iscsi'
ssh user@node 'sudo systemctl restart iscsid open-iscsi'
```
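These per-node steps are easy to script. A small sketch that only prints the ssh commands (the rush user and node IPs are taken from the topology above), so you can review them before running:

```shell
# Emit one ssh command per node; pipe the output to `sh` to execute.
for node in 10.0.0.31 10.0.0.32 10.0.0.33 10.0.0.34; do
  echo "ssh rush@$node 'sudo apt-get update && sudo apt-get -y install open-iscsi && sudo systemctl enable iscsid open-iscsi && sudo systemctl restart iscsid open-iscsi'"
done
```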
### tinyauth
tinyauth provides authentication middleware for Traefik via forward auth. It’s used to protect internal services:
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: tinyauth
  namespace: tinyauth
spec:
  forwardAuth:
    address: https://auth.yourdomain.com/api/auth/traefik
```
Services that need authentication include the tinyauth middleware in their IngressRoute:
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: app-ingress
  namespace: app
  annotations:
    argocd.argoproj.io/sync-wave: "6"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.yourdomain.com`)
      kind: Rule
      services:
        - name: app
          port: 1234
      middlewares:
        - name: tinyauth
          namespace: tinyauth
  tls:
    certResolver: letsencrypt
```
## Reconciliation Flow
The GitOps reconciliation works as follows:
- Push to Git: A change is made to any file in your `app_of_apps/` folder
- ArgoCD detects: ArgoCD watches the repo and sees the change
- Sync: ArgoCD applies the change to the cluster
- Self-heal: If someone manually changes a resource that ArgoCD manages, ArgoCD reverts it (if `selfHeal: true` is set)
The app_of_apps/ folder uses a recursive directory structure - each subdirectory contains an app.yaml that defines an ArgoCD Application. This allows adding new infrastructure components simply by creating a new folder with an application manifest.
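For example, adding a hypothetical podinfo app (the name, chart, and layout here are illustrative, not taken from my actual repo) is just a new folder and manifest:

```shell
# Create a new app_of_apps entry in a checkout of the GitOps repo;
# committing and pushing this folder is all ArgoCD needs to pick it up.
repo="$(mktemp -d)"   # stand-in for a checkout of your GitOps repo
mkdir -p "$repo/app_of_apps/podinfo"

cat > "$repo/app_of_apps/podinfo/app.yaml" <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: podinfo
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "10"
spec:
  project: default
  source:
    repoURL: https://stefanprodan.github.io/podinfo
    chart: podinfo
    targetRevision: 6.x
  destination:
    server: https://kubernetes.default.svc
    namespace: podinfo
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF

ls "$repo/app_of_apps/podinfo"
```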
## Destroy and Rebuild
To destroy the cluster, run the following on each node:
```shell
# On control plane nodes
ssh user@node '/usr/local/bin/k3s-uninstall.sh'

# On worker nodes
ssh user@node '/usr/local/bin/k3s-agent-uninstall.sh'
```
This removes K3s from each node while leaving the operating system intact. Delete your local kubeconfig file as well.
To rebuild, run the k3sup commands again, then re-install ArgoCD and apply your core Application. Since ArgoCD is the source of truth, it will restore all infrastructure and applications to their desired state.
## Summary
This architecture provides:
- High availability: 3 control planes with kube-vip
- GitOps: Every piece of infrastructure is defined in Git and reconciled automatically by ArgoCD
- Self-healing: Manual changes are reverted so the cluster state always matches Git
- Repeatable: Destroy and rebuild in minutes
- Bare-metal capable: MetalLB for load balancing, Longhorn for storage
The cluster has been running reliably with this setup for years, and adding new services is as simple as adding a new YAML file to the repository.