
Kubernetes – Built-in monitoring

If you recall from Chapter 1, Introduction to Kubernetes, we noted that our nodes were already running a number of monitoring services. We can see these once again by running the get pods command with the kube-system namespace specified as follows:

$ kubectl get pods --namespace=kube-system

The following screenshot is the result of the preceding command:

System pod listing

Again, we see a variety of services, but how does this all fit together? If you recall the node (formerly minions) section from Chapter 2, Building a Foundation with Core Kubernetes Constructs, each node runs a kubelet. The kubelet is the main interface through which nodes interact with and update the API server. One such update is the metrics of the node resources. The actual reporting of the resource usage is performed by a program named cAdvisor.

The cAdvisor program is another open source project from Google, which provides various metrics on container resource use. Metrics include CPU, memory, and network statistics. There is no need to tell cAdvisor about individual containers; it collects the metrics for all containers on a node and reports them back to the kubelet, which in turn reports to Heapster.

Google’s open source projects: Google has a variety of open source projects related to Kubernetes. Check them out, use them, and even contribute your own code!

Both cAdvisor and Heapster are hosted on GitHub at the following locations:

  • cAdvisor: https://github.com/google/cadvisor
  • Heapster: https://github.com/kubernetes/heapster

Contrib is a catch-all term for a variety of components that are not part of core Kubernetes. It can be found at https://github.com/kubernetes/contrib. LevelDB is a key store library that was used in the creation of InfluxDB. It can be found at https://github.com/google/leveldb.

Heapster is yet another open source project from Google; you may start to see a theme emerging here (see the preceding information box). Heapster runs in a container on one of the minion nodes and aggregates the data from the kubelets. A simple REST interface is provided to query the data.

When using the GCE setup, a few additional packages are set up for us, which saves us time and gives us a complete package to monitor our container workloads. As we can see from the preceding System pod listing screenshot, there is another pod with influx-grafana in the title.

InfluxDB is described on its official website as follows:

An open-source distributed time series database with no external dependencies.

InfluxDB is based on a key store package (refer to the previous Google’s open source projects information box) and is perfect to store and query event- or time-based statistics such as those provided by Heapster.

Finally, we have Grafana, which provides a dashboard and graphing interface for the data stored in InfluxDB. Using Grafana, users can create a custom monitoring dashboard and get immediate visibility into the health of their Kubernetes cluster, and therefore their entire container infrastructure.

Exploring Heapster

Let’s quickly look at the REST interface by connecting over SSH to the node that is running the Heapster pod. First, we can list the pods to find the one that is running Heapster, as follows:

$ kubectl get pods --namespace=kube-system

The name of the pod should start with monitoring-heapster. Run a describe command to see which node it is running on, as follows:

$ kubectl describe pods/<Heapster monitoring Pod> --namespace=kube-system

From the output in the following screenshot, we can see that the pod is running in kubernetes-minion-merd. Also note the IP for the pod, a few lines down, as we will need that in a moment:

Heapster pod details
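If we would rather not read the node and IP off the describe output by eye, the same lookup can be scripted. The following is only a sketch: it assumes we have saved the output of kubectl get pods --namespace=kube-system -o json, and that each pod carries the metadata.name, spec.nodeName, and status.podIP fields. The pod name suffix and the IP address in the sample are made up for illustration.

```python
import json


def find_heapster_pod(pod_list_json, prefix="monitoring-heapster"):
    """Return (pod name, node name, pod IP) for the first pod whose
    name starts with the given prefix, or None if there is no match."""
    pod_list = json.loads(pod_list_json)
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if name.startswith(prefix):
            return name, pod["spec"].get("nodeName"), pod["status"].get("podIP")
    return None


# Hypothetical, heavily trimmed output of:
#   kubectl get pods --namespace=kube-system -o json
SAMPLE_POD_LIST = json.dumps({
    "items": [
        {
            "metadata": {"name": "kube-dns-v9-abcde"},
            "spec": {"nodeName": "kubernetes-minion-xyz1"},
            "status": {"podIP": "10.244.1.2"},
        },
        {
            "metadata": {"name": "monitoring-heapster-v1-abcde"},
            "spec": {"nodeName": "kubernetes-minion-merd"},
            "status": {"podIP": "10.244.2.3"},
        },
    ]
})

if __name__ == "__main__":
    print(find_heapster_pod(SAMPLE_POD_LIST))
```

The node name is what we pass to the SSH command, and the pod IP is what we will use when querying Heapster.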

Next, we can SSH to this box with the familiar gcloud ssh command, as follows:

$ gcloud compute --project "<Your project ID>" ssh --zone "<your gce zone>" "<kubernetes minion from describe>"

From here, we can access the Heapster REST API directly using the pod’s IP address. Remember that pod IPs are routable not only in the containers but also on the nodes themselves. The Heapster API is listening on port 8082, and we can get a full list of metrics at /api/v1/metric-export-schema/.

Let’s look at the list now by issuing a curl command to the pod IP address we saved from the describe command, as follows:

$ curl -G <Heapster IP from describe>:8082/api/v1/metric-export-schema/
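Since the raw listing is long, it can help to boil it down programmatically. Here is a minimal sketch that builds the same URL as the curl command and summarizes the response. The JSON key names (metrics, common_labels, pod_labels, name, units, type, and key) are my assumption about the response layout and should be checked against what your Heapster version actually returns; the sample data is hypothetical.

```python
import json
from urllib.request import urlopen


def schema_url(pod_ip, port=8082):
    """Build the schema URL from the pod IP noted in the describe output."""
    return "http://{}:{}/api/v1/metric-export-schema/".format(pod_ip, port)


def fetch_schema(pod_ip):
    """Fetch and decode the schema. This only works from a machine that
    can route to the pod IP, such as the node we connected to over SSH."""
    with urlopen(schema_url(pod_ip)) as response:
        return json.load(response)


def summarize_schema(schema):
    """Return ({metric name: (units, type)}, common label keys, pod label keys).
    The key names used here are assumed, not confirmed by the text."""
    metrics = {
        m["name"]: (m.get("units", ""), m.get("type", ""))
        for m in schema.get("metrics", [])
    }
    common = [label["key"] for label in schema.get("common_labels", [])]
    pod = [label["key"] for label in schema.get("pod_labels", [])]
    return metrics, common, pod


# Hypothetical, trimmed response used only to exercise the summary.
SAMPLE_SCHEMA = {
    "metrics": [
        {"name": "uptime", "units": "ms", "type": "cumulative"},
        {"name": "cpu/limit", "type": "gauge"},
    ],
    "common_labels": [{"key": "hostname"}],
    "pod_labels": [{"key": "pod_name"}],
}

if __name__ == "__main__":
    print(summarize_schema(SAMPLE_SCHEMA))
```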

We will see a listing that is quite long. The first section shows all the metrics available. The last two sections list fields by which we can filter and group. For your convenience, I’ve added the following tables which are a little bit easier to read:

| Metric | Description | Unit | Type |
|---|---|---|---|
| uptime | The number of milliseconds since the container was started | ms | Cumulative |
| cpu/usage | The cumulative CPU usage on all cores | ns | Cumulative |
| cpu/limit | The CPU limit in millicores | | Gauge |
| memory/usage | Total memory usage | Bytes | Gauge |
| memory/working_set | Total working set usage; the working set is the memory that is being used, and is not easily dropped by the kernel | Bytes | Gauge |
| memory/limit | The memory limit | Bytes | Gauge |
| memory/page_faults | The number of page faults | | Cumulative |
| memory/major_page_faults | The number of major page faults | | Cumulative |
| network/rx | The cumulative number of bytes received over the network | Bytes | Cumulative |
| network/rx_errors | The cumulative number of errors while receiving over the network | | Cumulative |
| network/tx | The cumulative number of bytes sent over the network | Bytes | Cumulative |
| network/tx_errors | The cumulative number of errors while sending over the network | | Cumulative |
| filesystem/usage | The total number of bytes consumed on a filesystem | Bytes | Gauge |
| filesystem/limit | The total size of the filesystem in bytes | Bytes | Gauge |
| filesystem/available | The number of available bytes remaining in the filesystem | Bytes | Gauge |

Table 6.1. Available Heapster metrics
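The Type column matters when we read these values. A gauge can be used as is, but a cumulative metric only ever grows, so it usually needs to be turned into a rate from two samples before it tells us much. The following sketch shows that arithmetic; the sample timestamps and values are invented, and the helper names are my own rather than anything provided by Heapster.

```python
def counter_rate(t0, v0, t1, v1):
    """Per-second rate of a cumulative metric between two samples.
    t0 and t1 are timestamps in seconds; v0 and v1 are counter values."""
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / elapsed


def cpu_cores_used(t0, ns0, t1, ns1):
    """cpu/usage is cumulative nanoseconds of CPU time, so the rate in
    ns per second divided by 1e9 gives the average number of cores busy."""
    return counter_rate(t0, ns0, t1, ns1) / 1e9


if __name__ == "__main__":
    # Hypothetical samples taken 10 seconds apart.
    print(counter_rate(0, 1000, 10, 6000))                      # network/rx in bytes per second
    print(cpu_cores_used(0, 5000000000, 10, 10000000000))       # average cores in use
```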

| Field | Description | Label type |
|---|---|---|
| nodename | The node name where the container ran | Common |
| hostname | The host name where the container ran | Common |
| host_id | An identifier specific to a host, which is set by the cloud provider or user | Common |
| container_base_image | The user-defined image name that is run inside the container | Common |
| container_name | The user-provided name of the container or full container name for system containers | Common |
| pod_name | The name of the pod | Pod |
| pod_id | The unique ID of the pod | Pod |
| pod_namespace | The namespace of the pod | Pod |
| namespace_id | The unique ID of the namespace of the pod | Pod |
| labels | A comma-separated list of user-provided labels | Pod |

Table 6.2. Available Heapster fields

Customizing our dashboards

Now that we have the fields, we can have some fun. Recall the Grafana page that we looked at in Chapter 1, Introduction to Kubernetes. Let’s pull that up again by going to our cluster’s monitoring URL. Note that you may need to log in with your cluster credentials. Refer to the following format of the link you need to use: https://<your master IP>/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana

We’ll see the default Home dashboard. Click on the down arrow next to Home and select Cluster. This shows the Kubernetes cluster dashboard, and now we can add our own statistics to the board. Scroll all the way to the bottom and click on Add a Row. This should create a space for a new row and present a green tab on the left-hand side of the screen.

Let’s start by adding a view into the filesystem usage for each node (minion). Click on the green tab to expand, and then select Add Panel and then Graph. An empty graph should appear on the screen, along with a query panel for our custom graph.

The first field in this panel should show a query that starts with SELECT mean("value") FROM. Click on the A character next to this field to expand it. Leave the first field next to FROM as default and then click on the next field with the select measurement value. A drop-down menu will appear with the Heapster metrics we saw in the previous tables. Select filesystem/usage_bytes_gauge. Now, in the SELECT row, click on mean() and then on the x symbol to remove it. Next, click on the + symbol at the end of the row and choose selectors and then max. Then, you’ll see a GROUP BY row with time($interval) and fill(none). Carefully click on fill and not on the (none) portion, and again on x to remove it.

Then, click on the + symbol at the end of the row and select tag(hostname). Finally, at the bottom of the screen we should see a Group by time interval. Enter 5s there and you should have something similar to the following screenshot:

Heapster pod details
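It can help to see what all of that clicking amounts to. As far as I can tell, the query editor is assembling an InfluxDB query roughly like the one produced by the sketch below. Treat the exact text as an approximation: the build_query helper is hypothetical, and $timeFilter is a placeholder that Grafana fills in with the dashboard’s time range.

```python
def build_query(measurement, aggregate="mean", interval="$interval", group_by_tags=()):
    """Assemble an InfluxQL-style query string similar to the one the
    Grafana query editor shows. Nothing is sent to InfluxDB here."""
    groups = ["time({})".format(interval)]
    groups += ['"{}"'.format(tag) for tag in group_by_tags]
    return 'SELECT {}("value") FROM "{}" WHERE $timeFilter GROUP BY {}'.format(
        aggregate, measurement, ", ".join(groups)
    )


if __name__ == "__main__":
    # The graph from this section: max filesystem usage per host, in 5s buckets.
    print(build_query(
        "filesystem/usage_bytes_gauge",
        aggregate="max",
        interval="5s",
        group_by_tags=["hostname"],
    ))
```

Grouping by the hostname tag is what gives us one line per node on the graph, and max keeps the largest value seen in each 5-second bucket.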

Next, let’s click on the Axes tab, so that we can set the units and legend. Under Left Y Axis, click on the field next to Unit and set it to data | bytes and Label to Disk Space Used. Under Right Y Axis, set Unit to none | none. Next, on the Legend tab, make sure to check Show in Options and Max in Values.

Now, let’s quickly go to the General tab and choose a title. In my case, I named mine Filesystem Disk Usage by Node (max).

We don’t want to lose this nice new graph we’ve created, so let’s click on the save icon in the top-right corner. It looks like a floppy disk (you can do a Google image search if you don’t know what this is).

After we click on the save icon, we will see a green dialog box that verifies that the dashboard was saved. We can now click the x symbol above the graph details panel and below the graph itself.

This will return us to the dashboard page. If we scroll all the way down, we will see our new graph. Let’s add another panel to this row. Again, use the green tab and then select Add Panel | singlestat. Once again, an empty panel will appear with a setting form below it.

Let’s say we want to watch a particular node and monitor network usage. We can easily do this by first going to the Metrics tab. Then, expand the query field and set the second value in the FROM field to network/rx. Now, we can specify the WHERE clause by clicking the + symbol at the end of the row and choosing hostname from the drop-down. After hostname =, click on select tag value and choose one of the minion nodes from the list.

Finally, leave mean() in the second SELECT field, as shown here:

Singlestat options

In the Options tab, make sure that Unit format is set to data bytes and check the Show box next to Spark lines. The spark line gives us a quick historical view of the recent variations in the value. We can use Background mode to take up the entire background; by default, it uses the area below the value.

In Coloring, we can optionally check the Value or Background box and choose Thresholds and Colors. This will allow us to choose different colors for the value based on the threshold tier we specify. Note that an unformatted version of the number must be used for threshold values.
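Because the thresholds need raw numbers, it is handy to convert human-friendly sizes to plain byte counts before typing them in. The following is a small sketch of that arithmetic. It assumes binary multiples (1 MiB = 1,048,576 bytes) and that the two thresholds go into the field as comma-separated values; the 500 MiB and 1 GiB tiers are just examples, and the helper name is my own.

```python
# Binary multiples; swap in powers of 1000 if your panel formats decimal units.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def threshold_field(*sizes):
    """Turn (number, unit) pairs into a comma-separated string of raw
    byte counts, suitable for pasting into a thresholds field."""
    return ",".join(str(int(number * UNITS[unit])) for number, unit in sizes)


if __name__ == "__main__":
    # Example tiers: change color at 500 MiB and again at 1 GiB received.
    print(threshold_field((500, "MiB"), (1, "GiB")))
```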

Now, let’s go back to the General tab and set the title as Network bytes received (Node35ao). Use the identifier for your minion node.

Once again, let’s save our work and return to the dashboard. We should now have a row that looks like the following screenshot:

Custom dashboard panels

Grafana has a number of other panel types that you can play with, such as Dashboard list, Plugin list, Table, and Text.

As we can see, it is pretty easy to build a custom dashboard and monitor the health of our cluster at a glance.
