
Kubernetes – Building a Foundation with Core Kubernetes Constructs

This chapter will cover the core Kubernetes constructs, namely pods, services, replication controllers, replica sets, and labels. We will describe Kubernetes components, dimensions of the API, and Kubernetes objects. We will also dig into the major Kubernetes cluster components. A few simple application examples will be included to demonstrate each construct. This chapter will also cover basic operations for your cluster. Finally, health checks and scheduling will be introduced with a few examples.

The following topics will be covered in this chapter:

  • Kubernetes’ overall architecture
  • The context of Kubernetes architecture within system theory
  • Introduction to core Kubernetes constructs, architecture, and components
  • How labels can simplify the management of a Kubernetes cluster
  • Monitoring services and container health
  • Setting up scheduling constraints based on available cluster resources

The Kubernetes system

To understand the complex architecture and components of Kubernetes, we should take a step back and look at the landscape of the overall system in order to understand the context and place of each moving piece. This book focuses mainly on the technical pieces and processes of the Kubernetes software, but let’s examine the system from a top-down perspective. In the following diagram, you can see the major parts of the Kubernetes system, which is a great way to think about the classification of the parts we’ll describe and utilize in this book:

Let’s take a look at each piece, starting from the bottom.

Nucleus

The nucleus of the Kubernetes system is devoted to providing a standard API and manner in which operators and/or software can execute work on the cluster. The nucleus is the bare minimum set of functionality that should be considered absolutely stable in order to build up the layers above. Each piece of this layer is clearly documented, and these pieces are required to build higher-order concepts at other layers of the system. You can consider the APIs here to make up the core bits of the Kubernetes control plane.

The cluster control plane is the first half of the Kubernetes nucleus, and it provides the RESTful APIs that allow operators to utilize the mostly CRUD-based operations of the cluster. It is important to note that the Kubernetes nucleus, and consequently the cluster control plane, was built with multi-tenancy in mind, so the layer must be flexible enough to provide logical separation of teams or workloads within a single cluster. The cluster control plane follows API conventions that allow it to take advantage of shared services such as identity and auditing, and has access to the namespaces and events of the cluster.
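
As a quick illustration of these CRUD-style APIs, kubectl can issue raw REST calls against the control plane. This is only a sketch; the team-a namespace is an arbitrary example:

$ kubectl get --raw /api/v1/namespaces
$ kubectl create namespace team-a
$ kubectl delete namespace team-a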

The second half of the nucleus is execution. While there are a number of controllers in Kubernetes, such as the replication controller, replica set, and deployments, the kubelet is the most important controller and it forms the basis of the node and pod APIs that allow us to interact with the container execution layer. Kubernetes builds upon the kubelet with the concept of pods, which allow us to manage many containers and their constituent storage as a core capability of the system. We’ll dig more into pods later.
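
To make the pod concept concrete before we return to it, here is a minimal pod manifest as a sketch; the name and image are arbitrary examples:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod          # arbitrary example name
spec:
  containers:
  - name: web                # a single container in the pod
    image: nginx:1.15        # any container image will do
    ports:
    - containerPort: 80

You would submit this to the control plane with kubectl create -f example-pod.yaml, and the kubelet on whichever node is chosen takes responsibility for running the container.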

Below the nucleus, we can see the various pieces that the kubelet depends on in order to manage the container, network, container storage, image storage, cloud provider, and identity. We’ve left these intentionally vague as there are several options for each box, and you can pick and choose from standard and popular implementations or experiment with emerging tech. To give you an idea of how many options there are in the base layer, we’ll outline container runtime and network plugin options here.

Container Runtime options: You’ll use the Kubernetes Container Runtime Interface (CRI) to interact with the two main container runtimes:

  • containerd
  • rkt

You’re still able to run Docker containers on Kubernetes at this point, and since containerd is the default runtime, this will be largely transparent to the operator. You’ll still be able to run the same docker <action> commands on the cluster nodes to introspect and gather information about your containers.
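
If you want to confirm which runtime a node is actually using, the node object reports it; both commands below are standard kubectl, and the jsonpath expression reads the node’s nodeInfo field:

$ kubectl get nodes -o wide
$ kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.containerRuntimeVersion}'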

There are also several competing, emerging formats:

  • cri-containerd: https://github.com/containerd/cri-containerd
  • runv and clear containers, which are hypervisor-based solutions: https://github.com/hyperhq/runv and https://github.com/clearcontainers/runtime
  • kata containers, which are a combination of runv and clear containers: https://katacontainers.io/
  • frakti containers, which combine runv and Docker: https://github.com/kubernetes/frakti

You can read more about the CRI here: http://blog.kubernetes.io/2016/12/container-runtime-interface-cri-in-kubernetes.html.

Network plugin: You can use the Container Network Interface (CNI) to leverage any of the following plugins, or the simple kubenet networking implementation if you’re going to rely on a cloud provider’s network segmentation or run a single-node cluster (a sample CNI configuration follows the list):

  • Cilium
  • Contiv
  • Contrail
  • Flannel
  • Kube-router
  • Multus
  • Calico
  • Romana
  • Weave net
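
To give a sense of what a CNI plugin expects, here is a minimal sketch of a bridge-type configuration file of the kind that lives under /etc/cni/net.d/ on each node; the network name and subnet are example values, and the exact fields depend on the plugin you choose:

{
  "cniVersion": "0.3.1",
  "name": "examplenet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}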

Application layer

The application layer, often referred to as the service fabric or orchestration layer, does all of the fun things we’ve come to value so highly in Kubernetes: basic deployment and routing, service discovery, load balancing, and self-healing. In order for a cluster operator to manage the life cycle of the cluster, these primitives must be present and functional in this layer. Most containerized applications will depend on the full functionality of this layer, and will interact with these functions in order to provide “orchestration” of the application across multiple cluster hosts. When an application scales up or changes a configuration setting, it is this layer that manages the change. The application layer cares about the desired state of the cluster, the application composition, service discovery, load balancing, and routing, and utilizes all of these pieces to keep data flowing from the correct point A to the correct point B.
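
As a small illustration of the service discovery and load balancing provided here, the following sketch defines a Service that spreads traffic across any pods carrying an example label; the names and ports are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: example-service      # example name
spec:
  selector:
    app: example-app         # any pod with this label receives traffic
  ports:
  - port: 80                 # port exposed by the service
    targetPort: 8080         # port the containers listen on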

Governance layer

The governance layer consists of high-level automation and policy enforcement. This layer can be thought of as an opinionated version of the application management layer, as it provides the ability to enforce tenancy, gather metrics, and do intelligent provisioning and autoscaling of containers. The APIs at this layer should be considered optional for running containerized applications.

The governance layer allows operators to control methods used for authorization, as well as quotas and control around network and storage. At this layer, functionality should be applicable to scenarios that large enterprises care about, such as operations, security, and compliance scenarios.
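
To make the idea of quotas concrete, here is a minimal sketch of a ResourceQuota object that caps what a single namespace can consume; the namespace and limits are example values only:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota
  namespace: team-a           # example namespace
spec:
  hard:
    pods: "20"                # at most 20 pods in this namespace
    requests.cpu: "4"         # total CPU requested across all pods
    requests.memory: 8Gi      # total memory requested across all pods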

Interface layer

The interface layer is made up of commonly used tools, systems, user interfaces, and libraries that other custom Kubernetes distributions might use. The kubectl tool is a great example of the interface layer, and importantly it’s not seen as a privileged part of the Kubernetes system; it’s considered a client tool, in order to provide maximum flexibility for the Kubernetes API. If you run $ kubectl -h, you will get a clear picture of the functionality exposed at this layer.

Other pieces at this layer include cluster federation tools, dashboards, Helm, and client libraries such as client-node, KubernetesClient, and python. These tools take care of common tasks for you, so you don’t have to write code for things such as authentication yourself. These libraries use the Kubernetes Service Account to authenticate to the cluster.
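
As a sketch of what that authentication looks like under the hood, a pod’s service account token is mounted into its containers and can be presented to the API server directly; the environment variables below are the standard in-cluster values, and the command is illustrative only:

$ TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
$ curl -sSk -H "Authorization: Bearer $TOKEN" \
    https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api/v1/namespaces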

Ecosystem

The last layer of the Kubernetes system is the ecosystem, and it’s by far the busiest and most hectic part of the picture. Kubernetes’ approach to container orchestration and management is to present the user with complementary choices; there are plug-in and general-purpose APIs available for external systems to utilize. You can consider three types of ecosystem pieces in the Kubernetes system:

  • Above Kubernetes: All of the glue software and infrastructure that’s needed to “make things go” sits at this level, and includes operational ethos such as ChatOps and DevOps, logging and monitoring, Continuous Integration and Delivery, big data systems, and Functions as a Service.
  • Inside Kubernetes: In short, what’s inside a container is outside of Kubernetes. Kubernetes, or K8s, cares not at all what you run inside of a container.
  • Below Kubernetes: These are the gray squares detailed at the bottom of the diagram. You’ll need a technology for each piece of foundational technology to make Kubernetes function, and the ecosystem is where you get them. The cluster state store is probably the most famous example of an ecosystem component: etcd. Cluster bootstrapping tools such as minikube, bootkube, kops, kube-aws, and kubernetes-anywhere are other examples of community-provided ecosystem tools.

Let’s move on to the architecture of the Kubernetes system, now that we understand the larger context.

The architecture

Although containers bring a helpful layer of abstraction and tooling for application management, Kubernetes brings additional tooling to schedule and orchestrate containers at scale, while managing the full application life cycle.

K8s moves up the stack, giving us constructs to deal with management at the application or service level. This gives us automation and tooling to ensure high availability, as well as application stack and service-wide portability. K8s also allows finer control of resource usage, such as CPU, memory, and disk space, across our infrastructure.

The Kubernetes architecture comprises three main pieces:

  • The cluster control plane (the master)
  • The cluster state (a distributed storage system called etcd)
  • Cluster nodes (individual servers running agents called kubelets)

The Master

The cluster control plane, otherwise known as the Master, makes global decisions based on the current and desired state of the cluster, detecting and responding to events as they propagate across the cluster. This includes starting and stopping pods if the replication factor of a replication controller is unsatisfied or running a scheduled cron job.

The overarching goal of the control plane is to report on and work towards a desired state. The API that the master runs depends on the persistent state store, etcd, and utilizes the watch strategy for minimizing change latency while enabling decentralized component coordination.
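
You can observe this watch-based coordination yourself; the following commands stream changes as the cluster converges on its desired state, and work against any cluster you can reach with kubectl:

$ kubectl get pods --watch
$ kubectl get events --watch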

Components of the Master can be realistically run on any machine in the cluster, but best practices and production-ready systems dictate that master components should be co-located on a single machine (or a multi-master high availability setup). Running all of the Master components on a single machine allows operators to exclude running user containers on those machines, which is recommended for more reliable control plane operations. The less you have running on your Master node, the better!
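
One common way to keep user workloads off the Master is a taint on the master node(s); kubeadm-based clusters apply this by default, and on such a cluster it looks roughly like the following (the node name is an example):

$ kubectl describe node master-0 | grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule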

We’ll dig into the Master components, including kube-apiserver, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager when we get into more detail on the Master node. It is important to note that the Kubernetes goal with these components is to provide a RESTful API against mostly persistent storage resources and a CRUD (Create, Read, Update, and Delete) strategy. We’ll explore the basic primitives around container-specific orchestration and scheduling later in this chapter when we read about services, ingress, pods, deployments, StatefulSet, CronJobs, and ReplicaSets.

Cluster state

The second major piece of the Kubernetes architecture, the cluster state, is the etcd key value store. etcd is consistent and highly available, and is designed to quickly and reliably provide Kubernetes access to critical cluster current and desired state. etcd is able to provide this distributed coordination of data through such core concepts as leader election and distributed locks. The Kubernetes API, via its API server, is in charge of updating objects in etcd that correspond to the RESTful operations of the cluster. This is very important to remember: the API server is responsible for managing what’s stuck into Kubernetes’ picture of the world. Other components in this ecosystem watch etcd for changes in order to modify themselves and enter into the desired state.

This is of particular importance because every component we’ve described in the Kubernetes Master, and those that we’ll investigate in the nodes below, is stateless, which means that their state is stored elsewhere, and that elsewhere is etcd.

Kubernetes doesn’t take specific action to make things happen on the cluster; the Kubernetes API, via the API server, writes into etcd what should be true, and then the various pieces of Kubernetes make it so. etcd provides this interface via a simple HTTP/JSON API, which makes interacting with it quite simple.
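
For example, if you can reach a machine running etcd, you can query its health endpoint directly; the address below assumes a local, unsecured listener, which is common in test clusters but not in production:

$ curl http://127.0.0.1:2379/health
{"health": "true"}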

etcd is also important in considerations of the Kubernetes security model due to it existing at a very low layer of the Kubernetes system, which means that any component that can write data to etcd has root to the cluster. Later on, we’ll look into how the Kubernetes system is divided into layers in order to minimize this exposure. You can consider etcd to underlay Kubernetes with other parts of the ecosystem such as the container runtime, an image registry, a file storage, a cloud provider interface, and other dependencies that Kubernetes manages but does not have an opinionated perspective on.

In non-production Kubernetes clusters, you’ll see single-node instantiations of etcd to save money on compute, simplify operations, or otherwise reduce complexity. It is important to note, however, that a multi-node strategy of 2n+1 etcd members is essential for production-ready clusters, in order to replicate data effectively across masters and ensure fault tolerance. It is recommended that you check the etcd documentation for more information.

Check out the etcd documentation here: https://github.com/coreos/etcd/blob/master/Documentation/docs.md.
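
If you can reach an etcd member directly, the etcdctl client that ships with etcd can list the members of the cluster; the endpoint below is an example for a local, unsecured member:

$ ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 member list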

If you’re in front of your cluster, you can check to see the status of etcd by checking componentstatuses or cs:

[node3 /]$ kubectl get componentstatuses
NAME                 STATUS MESSAGE          ERROR
scheduler            Healthy ok
controller-manager   Healthy ok
etcd-0               Healthy {"health": "true"}

Due to a bug in the AKS ecosystem, this will currently not work on Azure. You can track this issue here to see when it is resolved:

https://github.com/Azure/AKS/issues/173: kubectl get componentstatus fails for scheduler and controller-manager #173

If you were to see an unhealthy etcd service, it would look something like this:

[node3 /]$ kubectl get cs

NAME                  STATUS       MESSAGE      ERROR
etcd-0                Unhealthy                 Get http://127.0.0.1:2379/health: dial tcp 127.0.0.1:2379: getsockopt: connection refused
controller-manager    Healthy      ok
scheduler             Healthy      ok

Cluster nodes

The third and final major piece of the Kubernetes architecture is the cluster nodes. While the Master components run on only a subset of the Kubernetes cluster, the node components run everywhere; they manage the maintenance of running pods, containers, and other primitives, and provide the runtime environment. There are three node components:

  • Kubelet
  • Kube-proxy
  • Container runtime

We’ll dig into the specifics of these components later, but it’s important to note several things about node componentry first. The kubelet can be considered the primary controller within Kubernetes, and it provides the pod/node APIs that are used by the container runtime to execute container functionality. This functionality groups containers and their corresponding storage volumes into the concept of pods. The concept of a pod gives application developers a straightforward packaging paradigm from which to design their application, and allows us to take maximum advantage of the portability of containers, while realizing the power of orchestration and scheduling across the many hosts of a cluster.

It’s interesting to note that a number of Kubernetes components run on Kubernetes itself (in other words, powered by the kubelets), including DNS, ingress, the Dashboard, and the resource monitoring of Heapster:

Kubernetes core architecture

In the preceding diagram, we see the core architecture of Kubernetes. Most administrative interactions are done via the kubectl script and/or RESTful service calls to the API.

As mentioned, note the ideas of the desired state and actual state carefully. This is the key to how Kubernetes manages the cluster and its workloads. All the pieces of K8s are constantly working to monitor the current actual state and synchronize it with the desired state defined by the administrators via the API server or kubectl script. There will be times when these states do not match up, but the system is always working to reconcile the two.

Let’s dig into more detail on the Master and node instances.

Master

We know now that the Master is the brain of our cluster. We have the core API server, which maintains RESTful web services for querying and defining our desired cluster and workload state. It’s important to note that changes are only initiated by accessing the Master through the control plane APIs, never by accessing the nodes directly.

Additionally, the master includes the scheduler. The replication controller/replica set works with the API server to ensure that the correct number of pod replicas are running at any given time. This is exemplary of the desired state concept. If our replication controller/replica set is defining three replicas and our actual state is two copies of the pod running, then the scheduler will be invoked to add a third pod somewhere in our cluster. The same is true if there are too many pods running in the cluster at any given time. In this way, K8s is always pushing toward that desired state.
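
To illustrate the desired state idea, here is a sketch of a ReplicaSet asking for three replicas of an example pod; the names and image are placeholders, and we’ll cover ReplicaSets properly later in this chapter:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: example-rs
spec:
  replicas: 3                 # the desired state: three copies of the pod
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: web
        image: nginx:1.15

Changing the replicas field simply changes the desired state; the control plane then reconciles the cluster to match it.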

As discussed previously, we’ll look more closely into each of the Master components. kube-apiserver has the job of providing the API for the cluster, acting as the frontend of the control plane that the Master provides. In fact, the apiserver is exposed through a service specifically called kubernetes, and we install the API server using the kubelet. This service is configured via the kube-apiserver.yaml file, which lives in /etc/kubernetes/manifests/ on every master node within your cluster.
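
You can see both of these details on a typical kubeadm-style cluster; the exact output depends on how your cluster was built:

$ kubectl get service kubernetes -n default
$ ls /etc/kubernetes/manifests/          # run this on a master node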

kube-apiserver is a key part of high availability in Kubernetes and, as such, it’s designed to scale horizontally. We’ll discuss how to construct highly available clusters later in this book, but suffice it to say that you’ll need to spread the kube-apiserver containers across several Master nodes and put a load balancer in front of them.

Since we’ve gone into some detail about the cluster state store, it will suffice to say that an etcd agent is running on all of the Master nodes.

The next piece of the puzzle is kube-scheduler, which makes sure that all pods are associated with, and assigned to, a node for operation. The scheduler works with the API server to schedule workloads in the form of pods on the actual minion nodes. These pods include the various containers that make up our application stacks. By default, the basic Kubernetes scheduler spreads pods across the cluster and uses different nodes for matching pod replicas. Kubernetes also allows specifying necessary resources, hardware and software policy constraints, affinity or anti-affinity as required, and data volume locality for each container, so scheduling can be altered by these additional factors.
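
For a taste of how such constraints are expressed (scheduling is covered in depth at the end of this chapter), here is a sketch of a pod that requests resources and restricts itself to nodes carrying an example label; the label and the numbers are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod
spec:
  nodeSelector:
    disktype: ssd             # example label a node must carry
  containers:
  - name: web
    image: nginx:1.15
    resources:
      requests:
        cpu: 250m             # the scheduler only picks nodes with this much spare CPU
        memory: 256Mi         # and this much spare memory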

The last two main pieces of the Master nodes are kube-controller-manager and cloud-controller-manager. As you might have guessed based on their names, while both of these services play an important part in container orchestration and scheduling, kube-controller-manager helps to orchestrate core internal components of Kubernetes, while cloud-controller-manager interacts with different vendors and their cloud provider APIs.

kube-controller-manager is actually a Kubernetes daemon that embeds the core control loops, otherwise known as controllers, that are included with Kubernetes:

  • The Node controller, which manages node availability, noticing and responding when nodes go down
  • The Replication controller, which ensures that each replication controller object in the system has the correct number of pods
  • The Endpoints controller, which controls endpoint records in the API, thereby managing DNS resolution of a pod or set of pods backing a service that defines selectors

In order to reduce the complexity of the controller components, they’re all packed and shipped within this single daemon as kube-controller-manager.

cloud-controller-manager, on the other hand, pays attention to external components, and runs controller loops that are specific to the cloud provider that your cluster is using. The original intent of this design was to decouple the internal development of Kubernetes from cloud-specific vendor code. This was accomplished through the use of plugins, which prevents Kubernetes from relying on code that is not inherent to its value proposition. We can expect over time that future releases of Kubernetes will move vendor-specific code completely out of the Kubernetes code base, and that vendor-specific code will be maintained by the vendor themselves, and then called on by the Kubernetes cloud-controller-manager. This design prevents the need for several pieces of Kubernetes to communicate with the cloud provider, namely the kubelet, Kubernetes controller manager, and the API server. 

Nodes (formerly minions)

In each node, we have several components as mentioned already. Let’s look at each of them in detail.

The kubelet interacts with the API server to update the state and to start new workloads that have been invoked by the scheduler. As previously mentioned, this agent runs on every node of the cluster. The kubelet’s primary unit of work is the PodSpec; it takes a set of PodSpecs and ensures that the containers they describe are running and healthy.

The kube-proxy provides basic load balancing and directs the traffic destined for specific services to the proper pod on the backend. It maintains these network rules to enable the service abstraction through connection forwarding.
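
You can see the pod endpoints that kube-proxy forwards traffic to for any given service; the service name here is just an example:

$ kubectl get endpoints example-service
$ kubectl describe service example-service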

The last major component of the node is the container runtime, which is responsible for initiating, running, and stopping containers. The container ecosystem has introduced the Open Container Initiative (OCI) runtime specification to standardize the interface between schedulers/orchestrators and runtimes. While Docker, rkt, and runc are the current major implementations, the OCI aims to provide a common interface so you can bring your own runtime. At this point, Docker is the overwhelmingly dominant runtime.

Read more about the OCI runtime specifications here: https://github.com/opencontainers/runtime-spec. 

In your cluster, the nodes may be virtual machines or bare-metal hardware. Compared to other items such as controllers and pods, the node is not an abstraction that is created by Kubernetes. Rather, Kubernetes leverages cloud-controller-manager to interact with the cloud provider API, which owns the life cycle of the nodes. That means that when we instantiate a node in Kubernetes, we’re simply creating an object that represents a machine in your given infrastructure. It’s up to Kubernetes to determine whether the node has converged with the object definition. Kubernetes validates the node’s availability with health checks, identifying the node via the metadata.name field. The status of these nodes can be discovered through the following status keys.

The addresses are where we’ll find information such as the hostname and private and public IPs. This will be specific to your cloud provider’s implementation. The condition field will give you a view into the state of your node’s status in terms of disk, memory, network, and basic configuration.

Here’s a table with the available node conditions:

Node condition       Description
Ready                True if the node is healthy and accepting pods; False if it is not healthy; Unknown if the node controller has not heard from the node recently
OutOfDisk            True if there is insufficient free space on the node for adding new pods
MemoryPressure       True if available memory on the node is low
DiskPressure         True if disk capacity on the node is low
PIDPressure          True if there are too many processes on the node
NetworkUnavailable   True if the node’s network is not correctly configured

A healthy node will report a Ready condition. If you run the following command, you’ll see output similar to this in the conditions section:

$ kubectl get nodes -o json
"conditions": [
  {
    "type": "Ready",
    "status": "True"
  }
]

Capacity is simple: it’s the available CPU, memory, and resulting number of pods that can be run on a given node. Nodes self-report their capacity and leave the responsibility for scheduling the appropriate number of resources to Kubernetes. The Info key is similarly straightforward and provides version information for Docker, OS, and Kubernetes.
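
If you want to inspect these fields directly, jsonpath output is a handy way to pull out just the capacity and node info; the [0] index simply picks the first node in the list:

$ kubectl get nodes -o jsonpath='{.items[0].status.capacity}'
$ kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo}'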

It’s important to note that the major component of the Kubernetes and node relationship is the node controller, which we called out previously as one of the core system controllers. There are three strategic pieces to this relationship:

  • Node health: When you run large clusters in private, public, or hybrid cloud scenarios, you’re bound to lose machines from time to time. Even within the data center, given a large enough cluster, you’re bound to see regular failures at scale. The node controller is responsible for updating the NodeReady condition in the node’s NodeStatus, setting it to ConditionUnknown when the instance becomes unreachable. This management is key, as Kubernetes will need to migrate pods (and therefore containers) to available nodes if ConditionUnknown occurs. You can set the health check interval for nodes in your cluster with --node-monitor-period (see the manifest fragment after this list).
  • IP assignment: Every node needs some IP addresses, so that it can distribute IPs to services and/or containers.
  • Node list: In order to manage pods across a number of machines, we need to keep an up-to-date list of available machines. Based on the aforementioned NodeStatus, the node controller will keep this list current.
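
As a sketch of where that interval is set, the flag lives on kube-controller-manager; in a static pod setup you would adjust its manifest rather than the command line directly, and the values shown are only examples of the defaults:

# fragment of /etc/kubernetes/manifests/kube-controller-manager.yaml (illustrative)
spec:
  containers:
  - command:
    - kube-controller-manager
    - --node-monitor-period=5s            # how often the node controller checks node state
    - --node-monitor-grace-period=40s     # how long before an unresponsive node is marked unhealthy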

We’ll look into node controller specifics when investigating highly available clusters that span Availability Zones (AZs), which requires the spreading of nodes across AZs in order to provide availability.

Finally, we have some default pods, which run various infrastructure services for the node. As we explored briefly in the previous chapter, the pods include services for the Domain Name System (DNS), logging, and pod health checks. The default pod will run alongside our scheduled pods on every node.
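
You can see these infrastructure pods, along with the other system components mentioned earlier such as DNS and the Dashboard, in the kube-system namespace; what appears will vary with how your cluster was built:

$ kubectl get pods -n kube-system -o wide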

In v1.0, minion was renamed to node, but there are still remnants of the term minion in some of the machine naming scripts and documentation that exists on the web. For clarity, I’ve added the term minion in addition to node in a few places throughout this book.
