Kubernetes – HA best practices

In order to build HA Kubernetes systems, it’s important to note that availability is as often a function of people and process as it is of technology failure. Hardware and software fail often, but humans, and their involvement in the process, are a very predictable drag on the availability of all systems.

Note that this book won’t get into how to design a microservices architecture for failure, which is itself a huge part of coping with some (or all) system failures in a cluster scheduling and networking system such as Kubernetes.

There’s another important concept to consider: graceful degradation.

Graceful degradation is the idea that you build functionality in layers and modules, so that even when some pieces of the system fail catastrophically, you can still provide some level of availability. It is the counterpart of progressive enhancement in web design, though we won’t be using that pattern here. Graceful degradation is an outcome of a system having fault tolerance, which is highly desirable for mission-critical and customer-facing systems.

In Kubernetes, there are two methods of graceful degradation:

  • Infrastructure degradation: This kind of degradation relies on complex algorithms and software in order to handle unpredictable failure of hardware, or software-defined hardware (think virtual machines, Software-Defined Networking (SDN), and so on). We’ll explore how to make the essential components of Kubernetes highly available in order to provide graceful degradation in this form.
  • Application degradation: While this is largely determined by the aforementioned strategies of microservice best practice architectures, we’ll explore several patterns here that will enable your users to be successful.

In each of these scenarios, we’re aiming to provide as much functionality as possible to the end user; if the application, Kubernetes components, or the underlying infrastructure fails, the goal should still be to give users some level of access and availability. We’ll strive to abstract away underlying infrastructure failure completely using core Kubernetes strategies, build caching, failover, and rollback mechanisms to deal with application failure, and, lastly, build out the Kubernetes components themselves in a highly available fashion.
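
As a small illustration of application-level degradation, a readiness probe lets Kubernetes stop routing traffic to an unhealthy replica while the remaining replicas keep serving users. The following Deployment is only a minimal sketch; the storefront name, image, port, and /healthz path are hypothetical placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: storefront
spec:
  replicas: 3
  selector:
    matchLabels:
      app: storefront
  template:
    metadata:
      labels:
        app: storefront
    spec:
      containers:
      - name: storefront
        image: example.com/storefront:1.0   # hypothetical image
        ports:
        - containerPort: 8080
        readinessProbe:                     # failing replicas are removed from Service endpoints
          httpGet:
            path: /healthz                  # hypothetical health endpoint
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10

With three replicas behind a Service, a single failing replica degrades capacity rather than availability, which is exactly the behavior we want from graceful degradation.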

Anti-fragility

Before we dig into these items, it makes sense to step back and consider the larger concept of anti-fragility, which Nassim Nicholas Taleb discusses in his book Antifragile.

To read more about Taleb’s book, check out his book’s home page at https://www.penguinrandomhouse.com/books/176227/antifragile-by-nassim-nicholas-taleb/9780812979688/.

There are a number of key concepts that are important to reinforce as we cope with the complexity of the Kubernetes system, and as we leverage the greater Kubernetes ecosystem in order to survive and thrive.

First, redundancy is key. In order to cope with failure across the many layers of a system, it’s important to build redundant, fault-tolerant parts into it. These redundant layers can utilize algorithms such as Raft consensus, which allows the members of a fault-tolerant distributed system to agree on shared state. Redundancy of this type relies on N+1 capacity in order to cope with the loss of a physical or logical component; for example, a three-member etcd cluster can lose one member and keep a quorum, while a five-member cluster can lose two.

We’ll take a look at etcd in a bit to explore redundancy.

Second, triggering, coping with, exploring, and remediating failure scenarios is key. You’ll need to forcefully cause your Kubernetes system to fail in order to understand how it behaves at the limit, or in corner cases. Netflix’s Chaos Monkey is a standard and well-worn approach to testing complex system reliability.

You can read more about Netflix’s Chaos Monkey here: https://github.com/Netflix/chaosmonkey.

Third, we’ll need to make sure that the correct patterns are available to our systems, and that we implement the correct patterns in order to build anti-fragility into Kubernetes. Retry logic, load balancing, circuit breakers, timeouts, health checks, and concurrent connection checks are key items for this dimension of anti-fragility. Istio and other service meshes are advanced players in this topic.

You can read more about Istio and how to manage traffic here: https://istio.io/docs/concepts/traffic-management/.
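
As a brief sketch of what these patterns look like in Istio, the following VirtualService applies a timeout and a retry policy to a hypothetical reviews service; the host name and the specific values are placeholders, not part of this chapter’s cluster setup:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews                # hypothetical in-mesh service host
  http:
  - route:
    - destination:
        host: reviews
    timeout: 10s           # fail fast rather than letting callers hang
    retries:
      attempts: 3          # retry transient failures
      perTryTimeout: 2s    # bound each attempt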

HA clusters

In order to create Kubernetes clusters that can withstand failure and increase the uptime of our cluster, we can build highly available clusters using the core components of the system. Let’s explore the two main methods of setting up highly available Kubernetes clusters. First, though, let’s look at what you get from the major cloud service providers when you spin up a Kubernetes cluster with them.

HA features of the major cloud service providers

What are the pieces of Kubernetes that need to be highly available in order to achieve five nines of uptime for your infrastructure? For one, you should consider how much the cloud service provider (CSP) does for you on the backend.

For Google Kubernetes Engine (GKE), nearly all of the components are managed out of the box. You don’t have to worry about the master nodes or any cost associated with them. GKE also currently has the most robust autoscaling functionality. Azure Kubernetes Service (AKS) and Amazon Elastic Kubernetes Service (EKS) both use a self-managed autoscaling approach, which means that you’re in charge of managing the scale-out of your cluster using autoscaling groups.

GKE can also handle automatic updates to the master nodes without user intervention, and, along with AKS, it offers a turnkey automatic update so that the operator can choose when a seamless upgrade happens. EKS is still working out those details.

EKS provides highly available master and worker nodes across multiple Availability Zones (AZs), while GKE offers something similar in its regional mode, which spreads the control plane across the zones of a region. AKS currently does not provide HA for the master nodes, but the worker nodes in the cluster are spread across multiple AZs in order to provide HA.

HA approaches for Kubernetes

If you’re going to be running Kubernetes outside of a hosted PaaS, you’ll need to adopt one of two strategies for running an HA cluster. In this chapter, we’ll go through an example with stacked masters and describe the more complex external etcd cluster method.

In the first method, stacked masters, you combine the etcd and manager (control plane) nodes in order to reduce the amount of infrastructure required to run your cluster. This means that you’ll need at least three machines in order to achieve HA. If you’re running in the cloud, it also means you’ll want to spread those instances across three availability zones in order to take advantage of the fault isolation that zone spreading provides.

Stacked masters is going to look like this in your architectural diagrams:

The second option builds in more potential availability in exchange for infrastructure complexity. You can use an external etcd cluster in order to separate the control plane from the etcd members, further increasing your potential availability. A setup in this manner requires a bare minimum of six servers, also spread across availability zones, as in the first example:

In order to achieve either of these methods, you’ll need some prerequisites.

Prerequisites

As mentioned in the preceding section, you’ll need three machines for the masters, three machines for the workers, and an extra three machines for the external etcd cluster if you’re going to go down that route.

Here are the minimum requirements for the machines – you should have one of the following operating systems:

  • Ubuntu 16.04+
  • Debian 9
  • CentOS 7
  • RHEL 7
  • Fedora 25/26 (best-effort)
  • Container Linux (tested with 1576.4.0)

On each of the machines, you’ll need 2 GB or more of RAM per machine, two or more CPUs, and full network connectivity between all machines in the cluster (a public or private network is fine). You’ll also need a unique hostname, MAC address, and a product_uuid for every node.

If you’re running in a managed network of any sort (datacenter, cloud, or otherwise), you’ll also need to ensure that the required security groups and ports are open on your machines. Lastly, you’ll need to disable swap in order to get a working kubelet.
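
On most Linux distributions, disabling swap looks something like the following; this is a minimal sketch, and you should adjust the /etc/fstab edit to your own distribution:

# Turn swap off for the running system
sudo swapoff -a
# Comment out swap entries so the change survives a reboot
sudo sed -i '/ swap / s/^/#/' /etc/fstab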

For a list of required open ports, check out https://kubernetes.io/docs/setup/independent/install-kubeadm/#check-required-ports.

In some cloud providers, virtual machines may share identical product_uuids, though it’s unlikely that they’ll share identical MAC addresses. It’s important to check what these are, because Kubernetes networking and Calico will use these as unique identifiers, and we’ll see errors if they’re the same. You can check both with the following commands:

LANG=C ifconfig -a | grep -Po 'HWaddr \K.*$'

The preceding command will get you the MAC address, while the following command will tell you the uuid:

sudo cat /sys/class/dmi/id/product_uuid

Setting up

Now, let’s start setting up the machines.

You’ll need to run all of the commands here on a control plane node, and as root.

First, you’ll need to set up SSH. Calico will be setting up your networking, so we’ll use the IP address of your machine in order to get started with this process. Keep in mind that Kubernetes networking has three basic layers:

  • The containers and pods that run on your nodes, which are either virtual machines or hardware servers.
  • Services, which are an aggregation and abstraction layer that lets you use the various Kubernetes controllers to set up your applications and ensure that your pods are scheduled according to their availability needs.
  • Ingress, which allows traffic from outside of your cluster to be routed to the right container.

So, we need to set up Calico in order to deal with these different layers. You’ll need to choose a pod network CIDR that matches your CNI plugin; for Calico, which we recommend for this example, that’s the 192.168.0.0/16 podSubnet used in the configuration templates that follow.

You can find more information on the CNI network documentation at https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network.

You’ll need to make sure that the SSH agent on the configuration machine has access to all of the other nodes in the cluster. Turn on the agent, and then add our identity to the session:

eval $(ssh-agent)
ssh-add ~/.ssh/id_rsa

You can test that this is working correctly by connecting to another node with the -A flag, which forwards your SSH agent across the tunnel. Once you’re on another node, you can use sudo’s -E flag to preserve the environment:

sudo -E -s
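
For example, assuming an ubuntu login user on the target machine (the user name is a placeholder), the flow looks like this:

# Forward your SSH agent to the second control plane node
ssh -A ubuntu@CONTROL02_IP
# Once on that node, become root while preserving the environment
sudo -E -s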

Next, we’ll need to put a load balancer from our cloud environment in front of the kube-apiserver. This will allow your cluster’s API server to remain reachable if one of the machines goes down or becomes unresponsive. For this example, you should use a TCP-capable load balancer such as an Elastic Load Balancer (AWS), an Azure Load Balancer (Azure), or a TCP/UDP load balancer (GCE).

Make sure that your load balancer is resolvable via DNS, and that you set a health check that listens on the kube-apiserver port, 6443. Once the load balancer is in place, you can test the connection to the API server with nc -v LB_DNS_NAME PORT. Also make sure that all of the control plane nodes are added to the load balancer.
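
For instance, if your load balancer resolves to a hypothetical name such as k8s-api.example.com, a quick check looks like this (a connection refused error is expected until the first control plane node is up):

nc -v k8s-api.example.com 6443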

Stacked nodes

In order to run a set of stacked nodes, you’ll need to bootstrap the first control plane node with a kubeadm-conf-01.yaml template. Again, this example uses Calico, but you can configure the networking as you please. You’ll need to substitute the following values with your own in order to make the example work:

  • LB_DNS
  • LB_PORT
  • CONTROL01_IP
  • CONTROL01_HOSTNAME

Open up a new file, kubeadm-conf-01.yaml, with your favorite IDE:

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
apiServerCertSANs:
- "LB_DNS"
api:
  controlPlaneEndpoint: "LB_DNS:LB_PORT"
etcd:
  local:
    extraArgs:
      listen-client-urls: "https://127.0.0.1:2379,https://CONTROL01_IP:2379"
      advertise-client-urls: "https://CONTROL01_IP:2379"
      listen-peer-urls: "https://CONTROL01_IP:2380"
      initial-advertise-peer-urls: "https://CONTROL01_IP:2380"
      initial-cluster: "CONTROL01_HOSTNAME=https://CONTROL01_IP:2380"
    serverCertSANs:
    - CONTROL01_HOSTNAME
    - CONTROL01_IP
    peerCertSANs:
    - CONTROL01_HOSTNAME
    - CONTROL01_IP
networking:
  podSubnet: "192.168.0.0/16"

Once you have this file, execute it with the following command:

kubeadm init --config kubeadm-conf-01.yaml
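
Before copying anything, it’s worth checking that the first control plane node came up cleanly. One way to do this, using the admin kubeconfig that kubeadm writes out, is the following; the control plane pods should reach the Running state:

export KUBECONFIG=/etc/kubernetes/admin.conf
# kube-apiserver, kube-controller-manager, kube-scheduler, and etcd should appear here
kubectl get pods -n kube-system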

Once this command is complete, you’ll need to copy the following list of certificates and files to the other control plane nodes:

/etc/kubernetes/pki/ca.crt
/etc/kubernetes/pki/ca.key
/etc/kubernetes/pki/sa.key
/etc/kubernetes/pki/sa.pub
/etc/kubernetes/pki/front-proxy-ca.crt
/etc/kubernetes/pki/front-proxy-ca.key
/etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/etcd/ca.key
/etc/kubernetes/admin.conf
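
One way to copy these files is with a small scp loop run from the first control plane node. This is only a sketch: it assumes an ubuntu login user on the other nodes and that CONTROL02_IP and CONTROL03_IP are exported with their addresses. The etcd CA files are renamed to etcd-ca.crt and etcd-ca.key so that they don’t collide with the cluster CA, matching the mv commands used later:

REMOTE_USER=ubuntu    # assumed login user on the other control plane nodes
for host in ${CONTROL02_IP} ${CONTROL03_IP}; do
  scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
      /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub \
      /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key \
      ${REMOTE_USER}@${host}:
  scp /etc/kubernetes/pki/etcd/ca.crt ${REMOTE_USER}@${host}:etcd-ca.crt
  scp /etc/kubernetes/pki/etcd/ca.key ${REMOTE_USER}@${host}:etcd-ca.key
  scp /etc/kubernetes/admin.conf ${REMOTE_USER}@${host}:
done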

In order to move forward, we’ll need to add another template file, kubeadm-conf-02.yaml, on our second node to create the second stacked node. As we did previously, you’ll need to replace the following values with your own:

  • LB_DNS
  • LB_PORT
  • CONTROL02_IP
  • CONTROL02_HOSTNAME

Open up a new file, kubeadm-conf-02.yaml, with your favorite IDE:

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
apiServerCertSANs:
- "LB_DNS"
api:
  controlPlaneEndpoint: "LB_DNS:LB_PORT"
etcd:
  local:
    extraArgs:
      listen-client-urls: "https://127.0.0.1:2379,https://CONTROL02_IP:2379"
      advertise-client-urls: "https://CONTROL02_IP:2379"
      listen-peer-urls: "https://CONTROL02_IP:2380"
      initial-advertise-peer-urls: "https://CONTROL02_IP:2380"
      initial-cluster: "CONTROL01_HOSTNAME=https://CONTROL01_IP:2380,CONTROL02_HOSTNAME=https://CONTROL02_IP:2380"
      initial-cluster-state: existing
    serverCertSANs:
    - CONTROL02_HOSTNAME
    - CONTROL02_IP
    peerCertSANs:
    - CONTROL02_HOSTNAME
    - CONTROL02_IP
networking:
  podSubnet: "192.168.0.0/16"

Before running this template, you’ll need to move the copied files over to the correct directories. Here’s an example that should be similar on your system:

 mkdir -p /etc/kubernetes/pki/etcd
 mv /home/${USER}/ca.crt /etc/kubernetes/pki/
 mv /home/${USER}/ca.key /etc/kubernetes/pki/
 mv /home/${USER}/sa.pub /etc/kubernetes/pki/
 mv /home/${USER}/sa.key /etc/kubernetes/pki/
 mv /home/${USER}/front-proxy-ca.crt /etc/kubernetes/pki/
 mv /home/${USER}/front-proxy-ca.key /etc/kubernetes/pki/
 mv /home/${USER}/etcd-ca.crt /etc/kubernetes/pki/etcd/ca.crt
 mv /home/${USER}/etcd-ca.key /etc/kubernetes/pki/etcd/ca.key
 mv /home/${USER}/admin.conf /etc/kubernetes/admin.conf

Once you’ve copied those files over, you can run a series of kubeadm commands to absorb the certificates, and then bootstrap the second node:

kubeadm alpha phase certs all --config kubeadm-conf-02.yaml
kubeadm alpha phase kubelet config write-to-disk --config kubeadm-conf-02.yaml
kubeadm alpha phase kubelet write-env-file --config kubeadm-conf-02.yaml
kubeadm alpha phase kubeconfig kubelet --config kubeadm-conf-02.yaml
systemctl start kubelet

Once that’s complete, you can add the node to the etcd cluster as well. You’ll need to set some variables first, using the IPs of the virtual machines that are running your nodes:

export CONTROL01_IP=<YOUR_IP_HERE>
export CONTROL01_HOSTNAME=cp01H
export CONTROL02_IP=<YOUR_IP_HERE>
export CONTROL02_HOSTNAME=cp02H

Once you’ve set up those variables, run the following kubectl and kubeadm commands. First, add the new etcd member:

export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl exec -n kube-system etcd-${CONTROL01_HOSTNAME} -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://${CONTROL01_IP}:2379 member add ${CONTROL02_HOSTNAME} https://${CONTROL02_IP}:2380

Next, phase in the configuration for etcd:

kubeadm alpha phase etcd local --config kubeadm-conf-02.yaml

This command will cause the etcd cluster to become unavailable for a short period of time, but that is by design. You can then deploy the remaining components in the kubeconfig and controlplane, and then mark the node as a master:

kubeadm alpha phase kubeconfig all --config kubeadm-conf-02.yaml
kubeadm alpha phase controlplane all --config kubeadm-conf-02.yaml
kubeadm alpha phase mark-master --config kubeadm-conf-02.yaml
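
At this point, you can optionally confirm that etcd now has two healthy members by reusing the same etcdctl invocation with member list:

kubectl exec -n kube-system etcd-${CONTROL01_HOSTNAME} -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://${CONTROL01_IP}:2379 member list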

We’ll run through this once more with the third node, adding one more member to the initial-cluster value under etcd’s extraArgs.

You’ll need to create a third file, kubeadm-conf-03.yaml, on the third machine. Follow this template and substitute the variables, as we did previously:

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
apiServerCertSANs:
- "LB_DNS"
api:
  controlPlaneEndpoint: "LB_DNS:LB_PORT"
etcd:
  local:
    extraArgs:
      listen-client-urls: "https://127.0.0.1:2379,https://CONTROL03_IP:2379"
      advertise-client-urls: "https://CONTROL03_IP:2379"
      listen-peer-urls: "https://CONTROL03_IP:2380"
      initial-advertise-peer-urls: "https://CONTROL03_IP:2380"
      initial-cluster: "CONTROL01_HOSTNAME=https://CONTROL01_IP:2380,CONTROL02_HOSTNAME=https://CONTROL02_IP:2380,CONTROL03_HOSTNAME=https://CONTROL03_IP:2380"
      initial-cluster-state: existing
    serverCertSANs:
    - CONTROL03_HOSTNAME
    - CONTROL03_IP
    peerCertSANs:
    - CONTROL03_HOSTNAME
    - CONTROL03_IP
networking:
  podSubnet: "192.168.0.0/16"

You’ll need to move the files again:

 mkdir -p /etc/kubernetes/pki/etcd
 mv /home/${USER}/ca.crt /etc/kubernetes/pki/
 mv /home/${USER}/ca.key /etc/kubernetes/pki/
 mv /home/${USER}/sa.pub /etc/kubernetes/pki/
 mv /home/${USER}/sa.key /etc/kubernetes/pki/
 mv /home/${USER}/front-proxy-ca.crt /etc/kubernetes/pki/
 mv /home/${USER}/front-proxy-ca.key /etc/kubernetes/pki/
 mv /home/${USER}/etcd-ca.crt /etc/kubernetes/pki/etcd/ca.crt
 mv /home/${USER}/etcd-ca.key /etc/kubernetes/pki/etcd/ca.key
 mv /home/${USER}/admin.conf /etc/kubernetes/admin.conf

Once again, you’ll need to run the following commands in order to bootstrap the third node:

kubeadm alpha phase certs all --config kubeadm-conf-03.yaml
kubeadm alpha phase kubelet config write-to-disk --config kubeadm-conf-03.yaml
kubeadm alpha phase kubelet write-env-file --config kubeadm-conf-03.yaml
kubeadm alpha phase kubeconfig kubelet --config kubeadm-conf-03.yaml
systemctl start kubelet

Then, set the variables needed to add the node to the etcd cluster once more:

export CONTROL01_IP=<YOUR_IP_HERE>
export CONTROL01_HOSTNAME=cp01H
export CONTROL03_IP=<YOUR_IP_HERE>
export CONTROL03_HOSTNAME=cp03H

Next, we can add the member and configure etcd:

export KUBECONFIG=/etc/kubernetes/admin.conf

kubectl exec -n kube-system etcd-${CONTROL01_HOSTNAME} -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://${CONTROL01_IP}:2379 member add ${CONTROL03_HOSTNAME} https://${CONTROL03_IP}:2380

kubeadm alpha phase etcd local --config kubeadm-conf-03.yaml


After that’s complete, we can once again deploy the rest of the components of the control plane and mark the node as a master. Run the following commands:

kubeadm alpha phase kubeconfig all --config kubeadm-conf-03.yaml

kubeadm alpha phase controlplane all --config kubeadm-conf-03.yaml

kubeadm alpha phase mark-master --config kubeadm-conf-03.yaml
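
As a final sanity check, you can confirm that all three control plane nodes have registered with the cluster; the node names will match your hostnames:

export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes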

Great work!

Installing workers

Once you’ve configured the masters, you can join the worker nodes to the cluster. You can only do this once you’ve installed networking, the container runtime, and any other prerequisites you’ve added to your cluster, such as DNS. In particular, before you add the worker nodes, you’ll need to configure a pod network. You can find more information about the pod network add-on here: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network.
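
To actually join a worker, run the kubeadm join command that kubeadm init printed on the first control plane node, pointed at the load balancer endpoint. The following is only a sketch of its shape; the token and hash are placeholders that come from your own kubeadm init output:

# <token> and <hash> come from the kubeadm init output on the first master
kubeadm join LB_DNS:LB_PORT --token <token> --discovery-token-ca-cert-hash sha256:<hash>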
