Kubernetes HA
Some of these design documents date back to older versions, but efforts are made to keep the most important ones up to date - sometimes :)
LambdaStack Kubernetes HA design document
Affected version: 0.6.x
1. Goals
Provide a highly-available control plane for Kubernetes.
2. Cluster components
2.1 Load balancer
2.1.1 External
A Kubernetes HA cluster needs a single TCP load balancer for communication from nodes to masters and from masters to masters (all internal communication has to go through the load balancer).
PROS:
- standard solution
CONS:
- it is not enough to create just one instance of such a load balancer - it also needs failover logic (such as a virtual IP), so a fully highly-available setup requires automation for a whole new service (a minimal keepalived-style sketch is shown after this list)
- requires additional dedicated virtual machines (at least 2 for HA), even in the case of a single-control-plane cluster
- probably requires infrastructure that can handle a virtual IP (depending on the solution chosen for failover)
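For illustration, a minimal sketch of the virtual-IP failover such an external load balancer would need, assuming keepalived is used for it (the VIP 10.10.0.100, the interface name and the password are illustrative assumptions, not part of this design):

# keepalived config for the active load-balancer node; the standby node
# would use "state BACKUP" and a lower priority.
vrrp_instance k8s_api_vip {
    state MASTER
    interface eth0            # assumed interface name
    virtual_router_id 51
    priority 200
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8s-vip-pass
    }
    virtual_ipaddress {
        10.10.0.100           # assumed virtual IP fronting the masters
    }
}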
2.1.2 Internal
Following the idea from kubespray's HA mode, we can skip creating a dedicated external load balancer (2.1.1).
Instead, we can run identical instances of a lightweight load balancer (such as HAProxy) on each master and each kubelet node, so every component talks to the API servers through its local proxy (a kubeconfig fragment illustrating this is shown after the lists below).
PROS:
- no need to create dedicated load-balancer clusters with failover logic
- since the internal load balancer is replicated on every node, it can be considered highly available by definition
CONS:
- increased network traffic
- longer provisioning times, since any change in the load balancer's config requires provisioning to touch every node in the cluster (masters and kubelet nodes)
- debugging load-balancer issues may become slightly harder
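As an illustration of the local-proxy idea, and assuming the HAProxy frontend from the design proposal below (port 3446), the kubelet's kubeconfig on any node would simply point at localhost:

# Fragment of /etc/kubernetes/kubelet.conf (illustrative; certificate data omitted)
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: "<base64-encoded cluster CA>"   # placeholder
    server: https://localhost:3446
  name: kubernetes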
2.2 Etcd cluster
2.2.1 External
PROS:
- under high network / system load, an external etcd cluster deployed on dedicated, premium-quality virtual machines behaves more stably
CONS:
- requires automation for the creation and distribution of etcd's server and client PKI certificates, and for pointing kubeadm at the external cluster (see the fragment after this list)
- upgrading etcd is difficult and requires well-tested automation that operates on multiple nodes at once in perfect coordination - when etcd's quorum fails, the cluster cannot auto-heal and has to be reconstructed from scratch (where data loss or discrepancy is likely)
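If this variant were chosen, a sketch of the relevant kubeadm ClusterConfiguration fragment could look as follows - the endpoints and certificate paths are illustrative assumptions:

etcd:
  external:
    endpoints:
      - https://10.10.1.10:2379   # assumed addresses of the dedicated etcd VMs
      - https://10.10.2.10:2379
      - https://10.10.3.10:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key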
2.2.2 Internal
PROS:
- adding / removing etcd nodes is completely automated and behaves as expected (via kubeadm); see the membership check after these lists
- etcd's PKI is automatically re-distributed when new masters join the control plane
CONS:
- etcd is deployed in containers alongside other internal components, which may impact its stability when system / network load is high
- since etcd is containerized, it may be prone to Docker-related issues
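For completeness, stacked-etcd membership can be inspected from one of the etcd pods; an illustrative check (the pod name follows the hostname convention from section 5, and depending on the etcd version ETCDCTL_API=3 may need to be set explicitly):

$ kubectl -n kube-system exec etcd-k1m1 -- etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list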
3. Legacy single-master solution
After the HA logic is implemented, it is probably better to reuse the new codebase for single-master clusters as well.
In the case of the internal load balancer (2.1.2), it makes sense to use an HA cluster scaled down to a single node (with a single-backend load balancer) and drop the legacy code.
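In that scaled-down case the HAProxy config from the design proposal below would simply shrink to a single backend entry, for example:

backend k8s
    mode tcp
    option tcp-check
    server k1m1 10.10.1.148:6443 check port 6443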
4. Use cases
LambdaStack delivers highly-available Kubernetes clusters, deploying them across multiple availability zones / regions to increase the stability of production environments.
5. Example use
kind: lambdastack-cluster
title: "LambdaStack Cluster Config"
provider: any
name: "k8s1"
build_path: # Dynamically built
specification:
  name: k8s1
  admin_user:
    name: ubuntu
    key_path: id_ed25519
    path: # Dynamically built
  components:
    kubernetes_master:
      count: 3
      machines:
        - default-k8s-master1
        - default-k8s-master2
        - default-k8s-master3
    kubernetes_node:
      count: 2
      machines:
        - default-k8s-node1
        - default-k8s-node2
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 0
    load_balancer:
      count: 0
    rabbitmq:
      count: 0
---
kind: infrastructure/machine
provider: any
name: default-k8s-master1
specification:
  hostname: k1m1
  ip: 10.10.1.148
---
kind: infrastructure/machine
provider: any
name: default-k8s-master2
specification:
  hostname: k1m2
  ip: 10.10.2.129
---
kind: infrastructure/machine
provider: any
name: default-k8s-master3
specification:
  hostname: k1m3
  ip: 10.10.3.16
---
kind: infrastructure/machine
provider: any
name: default-k8s-node1
specification:
  hostname: k1c1
  ip: 10.10.1.208
---
kind: infrastructure/machine
provider: any
name: default-k8s-node2
specification:
  hostname: k1c2
  ip: 10.10.2.168
6. Design proposal
As for the design proposal, the simplest solution is to combine the internal load balancer (2.1.2) with the internal etcd (2.2.2), then carefully observe and tune the network traffic coming from the HAProxy instances when the number of worker nodes grows large.
Example HAProxy config:
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon

defaults
    log global
    retries 3
    maxconn 2000
    timeout connect 5s
    timeout client 120s
    timeout server 120s

frontend k8s
    mode tcp
    bind 0.0.0.0:3446
    default_backend k8s

backend k8s
    mode tcp
    balance roundrobin
    option tcp-check
    server k1m1 10.10.1.148:6443 check port 6443
    server k1m2 10.10.2.129:6443 check port 6443
    server k1m3 10.10.3.16:6443 check port 6443
Example ClusterConfiguration:
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.6
controlPlaneEndpoint: "localhost:3446"
apiServer:
  extraArgs: # https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
    audit-log-maxbackup: "10"
    audit-log-maxsize: "200"
    audit-log-path: "/var/log/apiserver/audit.log"
    enable-admission-plugins: "AlwaysPullImages,DenyEscalatingExec,NamespaceLifecycle,ServiceAccount,NodeRestriction"
    profiling: "False"
controllerManager:
  extraArgs: # https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/
    profiling: "False"
    terminated-pod-gc-threshold: "200"
scheduler:
  extraArgs: # https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/
    profiling: "False"
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
certificatesDir: /etc/kubernetes/pki
To deploy first master run (Kubernetes 1.14):
$ sudo kubeadm init --config /etc/kubernetes/kubeadm-config.yml --experimental-upload-certs
To add one more master run (Kubernetes 1.14):
$ sudo kubeadm join localhost:3446 \
    --token 932b4p.n6teb53a6pd1rinq \
    --discovery-token-ca-cert-hash sha256:bafb8972fe97c2ef84c6ac3efd86fdfd76207cab9439f2adbc4b53cd9b8860e6 \
    --experimental-control-plane --certificate-key f1d2de1e5316233c078198a610c117c65e4e45726150d63e68ff15915ea8574a
To remove one master, run the following on it (it will properly clean up the config inside Kubernetes - do not use kubectl delete node):
$ sudo kubeadm reset --force
In later versions (Kubernetes 1.17) this feature became stable and the "experimental" prefix was removed from the command-line parameters.
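For reference, the same steps with the stable flags would look roughly as follows (token, hash and certificate key are placeholders taken from the kubeadm init output):

$ sudo kubeadm init --config /etc/kubernetes/kubeadm-config.yml --upload-certs

$ sudo kubeadm join localhost:3446 \
    --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <certificate-key>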
7. Post-implementation erratum
- It turned out that the upload-certs init phase does not take the etcd encryption feature into account and does not copy such configuration to newly joined masters (a sketch of this kind of configuration is shown below).
- Instead, for consistency, master joining has been implemented via automated replication of the Kubernetes PKI in Ansible.
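For context, the etcd encryption feature mentioned above is the kube-apiserver encryption-at-rest configuration (passed via --encryption-provider-config); a minimal sketch of such a file, with a placeholder key, which upload-certs does not propagate:

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: "<base64-encoded 32-byte key>"   # placeholder
      - identity: {}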