If you recall from Chapter 1, Introduction to Kubernetes, we noted that our nodes were already running a number of monitoring services. We can see these once again by running the get pods command with the kube-system namespace specified, as follows:
$ kubectl get pods --namespace=kube-system
The following screenshot is the result of the preceding command:

Again, we see a variety of services, but how does this all fit together? If you recall the section on nodes (formerly minions) from Chapter 2, Building a Foundation with Core Kubernetes Constructs, each node runs a kubelet. The kubelet is the main interface through which a node interacts with, and updates, the API server. One such update is the node's resource metrics. The actual reporting of resource usage is performed by a program named cAdvisor.
The cAdvisor program is another open source project from Google that provides various metrics on container resource usage. The metrics include CPU, memory, and network statistics. There is no need to tell cAdvisor about individual containers; it collects the metrics for all containers on a node and reports them back to the kubelet, which in turn reports to Heapster.
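If you'd like to see cAdvisor's raw output for yourself, the kubelet has historically exposed cAdvisor's API on each node. Treat the following as a sketch: the port (4194 was the long-standing default) and the API version are assumptions that vary by Kubernetes release:
$ # Query cAdvisor directly on a node (port and API version are assumptions)
$ curl http://<node IP>:4194/api/v1.3/machine
$ curl http://<node IP>:4194/api/v1.3/containers/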
Both cAdvisor and Heapster can be found on GitHub at the following locations:
- cAdvisor: https://github.com/google/cadvisor
- Heapster: https://github.com/kubernetes/heapster
Contrib is a catch-all term for a variety of components that are not part of core Kubernetes. It can be found at https://github.com/kubernetes/contrib. LevelDB is a key-value store library that was used in the creation of InfluxDB. It can be found at https://github.com/google/leveldb.
Heapster is yet another open source project from Google; you may start to see a theme emerging here (see the preceding information box). Heapster runs in a container on one of the minion nodes and aggregates the data from all of the cluster's kubelets. A simple REST interface is provided for querying the data.
When using the GCE setup, a few additional packages are set up for us, which saves us time and gives us a complete package to monitor our container workloads. As we can see in the preceding listing of system pods, there is another pod with influx-grafana in its name.
InfluxDB is a time series database. It is based on a key-value store package (refer to the preceding information box on Google's open source projects) and is well suited to storing and querying event- or time-based statistics such as those provided by Heapster.
Finally, we have Grafana, which provides a dashboard and graphing interface for the data stored in InfluxDB. Using Grafana, users can create a custom monitoring dashboard and get immediate visibility into the health of their Kubernetes cluster, and therefore their entire container infrastructure.
Exploring Heapster
Let's take a quick look at the REST interface by using SSH to connect to the node that is running the Heapster pod. First, we can list the pods to find the one that is running Heapster, as follows:
$ kubectl get pods --namespace=kube-system
The name of the pod should start with monitoring-heapster. Run a describe command to see which node it is running on, as follows:
$ kubectl describe pods/<Heapster monitoring Pod> --namespace=kube-system
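If you would rather not scan the describe output by eye, the node name and pod IP can also be pulled directly with a jsonpath query. This is a sketch, and it assumes your kubectl version supports the jsonpath output format:
$ kubectl get pods/<Heapster monitoring Pod> --namespace=kube-system -o jsonpath='{.spec.nodeName} {.status.podIP}'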
From the output in the following screenshot, we can see that the pod is running on kubernetes-minion-merd. Also note the IP for the pod, a few lines down, as we will need it in a moment:

Next, we can SSH to this box with the familiar gcloud ssh command, as follows:
$ gcloud compute --project "<Your project ID>" ssh --zone "<your gce zone>" "<kubernetes minion from describe>"
From here, we can access the Heapster REST API directly using the pod's IP address. Remember that pod IPs are routable not only within the containers but also from the nodes themselves. The Heapster API listens on port 8082, and we can get a full list of metrics at /api/v1/metric-export-schema/.
Let’s look at the list now by issuing a curl command to the pod IP address we saved from the describe command, as follows:
$ curl -G <Heapster IP from describe>:8082/api/v1/metric-export-schema/
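The exact shape of the response varies between Heapster versions, but roughly speaking it is a JSON document along the following abbreviated, illustrative lines (the field names here are assumptions based on the schema Heapster exported at the time; piping through python -m json.tool pretty-prints the output):
$ curl -sG <Heapster IP from describe>:8082/api/v1/metric-export-schema/ | python -m json.tool
{
    "metrics": [
        {"name": "uptime", "description": "...", "units": "milliseconds", "type": "cumulative"},
        ...
    ],
    "common_labels": [...],
    "pod_labels": [...]
}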
The full listing is quite long. The first section shows all of the available metrics, and the last two sections list the fields by which we can filter and group. For your convenience, I've reproduced them in the following tables, which are a little easier to read:
| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| uptime | The number of milliseconds since the container was started | ms | Cumulative |
| cpu/usage | The cumulative CPU usage on all cores | ns | Cumulative |
| cpu/limit | The CPU limit in millicores | – | Gauge |
| memory/usage | Total memory usage | Bytes | Gauge |
| memory/working_set | Total working set usage; the working set is the memory that is in use and not easily dropped by the kernel | Bytes | Gauge |
| memory/limit | The memory limit | Bytes | Gauge |
| memory/page_faults | The number of page faults | – | Cumulative |
| memory/major_page_faults | The number of major page faults | – | Cumulative |
| network/rx | The cumulative number of bytes received over the network | Bytes | Cumulative |
| network/rx_errors | The cumulative number of errors while receiving over the network | – | Cumulative |
| network/tx | The cumulative number of bytes sent over the network | Bytes | Cumulative |
| network/tx_errors | The cumulative number of errors while sending over the network | – | Cumulative |
| filesystem/usage | The total number of bytes consumed on a filesystem | Bytes | Gauge |
| filesystem/limit | The total size of the filesystem in bytes | Bytes | Gauge |
| filesystem/available | The number of available bytes remaining on the filesystem | Bytes | Gauge |
| Field | Description | Label type |
| --- | --- | --- |
| nodename | The node name where the container ran | Common |
| hostname | The hostname where the container ran | Common |
| host_id | An identifier specific to a host, which is set by the cloud provider or user | Common |
| container_base_image | The user-defined image name that is run inside the container | Common |
| container_name | The user-provided name of the container, or the full container name for system containers | Common |
| pod_name | The name of the pod | Pod |
| pod_id | The unique ID of the pod | Pod |
| pod_namespace | The namespace of the pod | Pod |
| namespace_id | The unique ID of the pod's namespace | Pod |
| labels | A comma-separated list of user-provided labels | Pod |
Customizing our dashboards
Now that we have the fields, we can have some fun. Recall the Grafana page that we looked at in Chapter 1, Introduction to Kubernetes. Let's pull it up again by going to our cluster's monitoring URL. Note that you may need to log in with your cluster credentials. The link uses the following format: https://<your master IP>/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana
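If you don't have the master IP handy, kubectl can print the proxied add-on URLs for you. On a GCE-built cluster, the Grafana endpoint typically appears in this list (the exact output varies by version):
$ kubectl cluster-info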
We’ll see the default Home dashboard. Click on the down arrow next to Home and select Cluster. This shows the Kubernetes cluster dashboard, and now we can add our own statistics to the board. Scroll all the way to the bottom and click on Add a Row. This should create a space for a new row and present a green tab on the left-hand side of the screen.
Let's start by adding a view into the filesystem usage for each node (minion). Click on the green tab to expand it, and then select Add Panel and then Graph. An empty graph should appear on the screen, along with a query panel for our custom graph.
The first field in this panel should show a query that starts with SELECT mean("value") FROM. Click on the A character next to this field to expand it. Leave the first field next to FROM at its default and then click on the next field, which shows select measurement. A drop-down menu will appear with the Heapster metrics we saw in the previous tables. Select filesystem/usage_bytes_gauge. Now, in the SELECT row, click on mean() and then on the x symbol to remove it. Next, click on the + symbol at the end of the row and add selectors and max. You'll then see a GROUP BY row with time($interval) and fill(none). Carefully click on fill, not on the (none) portion, and then on x to remove it.
Then, click on the + symbol at the end of the row and select tag(hostname). Finally, at the bottom of the screen, we should see a Group by time interval field. Enter 5s there, and you should have something similar to the following screenshot:

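Under the hood, the query editor is simply building an InfluxQL statement. If the preceding clicks went to plan, the generated query should read roughly as follows ($timeFilter and $interval are Grafana template variables, and the exact text may differ slightly between versions):
SELECT max("value") FROM "filesystem/usage_bytes_gauge" WHERE $timeFilter GROUP BY time($interval), "hostname"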
Next, let's click on the Axes tab so that we can set the units and legend. Under Left Y Axis, click on the field next to Unit and set it to data | bytes, and set Label to Disk Space Used. Under Right Y Axis, set Unit to none | none. Next, on the Legend tab, make sure to check Show under Options and Max under Values.
Now, let’s quickly go to the General tab and choose a title. In my case, I named mine Filesystem Disk Usage by Node (max).
We don’t want to lose this nice new graph we’ve created, so let’s click on the save icon in the top-right corner. It looks like a floppy disk (you can do a Google image search if you don’t know what this is).
After we click on the save icon, we will see a green dialog box that verifies that the dashboard was saved. We can now click the x symbol above the graph details panel and below the graph itself.
This will return us to the dashboard page. If we scroll all the way down, we will see our new graph. Let’s add another panel to this row. Again, use the green tab and then select Add Panel | singlestat. Once again, an empty panel will appear with a setting form below it.
Let’s say we want to watch a particular node and monitor network usage. We can easily do this by first going to the Metrics tab. Then, expand the query field and set the second value in the FROM field to network/rx. Now, we can specify the WHERE clause by clicking the + symbol at the end of the row and choosing hostname from the drop-down. After hostname =, click on select tag value and choose one of the minion nodes from the list.
Finally, leave mean() as the second SELECT field, as shown in the following screenshot:

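As before, the editor is producing an InfluxQL statement behind the scenes. Note that, depending on your Heapster version, the measurement name in the drop-down may carry a units/type suffix (for example, network/rx_bytes_cumulative, matching the filesystem/usage_bytes_gauge naming we saw earlier). With your chosen node substituted in, the query should look roughly like this:
SELECT mean("value") FROM "network/rx" WHERE "hostname" = '<your minion node>' AND $timeFilter GROUP BY time($interval)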
In the Options tab, make sure that Unit format is set to data | bytes and check the Show box next to Spark lines. The spark line gives us a quick historical view of the recent variations in the value. We can use Background mode to take up the entire background; by default, it uses the area below the value.
Now, let’s go back to the General tab and set the title as Network bytes received (Node35ao). Use the identifier for your minion node.
Once again, let’s save our work and return to the dashboard. We should now have a row that looks like the following screenshot:

Grafana has a number of other panel types that you can play with, such as Dashboard list, Plugin list, Table, and Text.
As we can see, it is pretty easy to build a custom dashboard and monitor the health of our cluster at a glance.