Detailed explanation of Kubernetes Cluster Monitoring

2025-01-17 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

Introduction

Kubernetes has more than 40,000 stars on GitHub, more than 70,000 commits, and major contributors like Google. It has quickly taken over the container ecosystem and become the true leader among container orchestration platforms.

Understand Kubernetes and its Abstractions

At the infrastructure level, a Kubernetes cluster is a set of physical or virtual machines, each playing a specific role. The machines acting as the Master are the brain of all operations: they orchestrate the containers that run on the worker nodes.

The Master components manage the life cycle of pods, the basic deployment unit in a Kubernetes cluster. When a pod ends its life cycle, the Controller creates a new one. If we scale the number of pod replicas up or down, the Controller creates or destroys pods accordingly to satisfy the request. The Master role includes the following components:

- kube-apiserver: provides the API used by the other Master components

- etcd: a consistent and highly available key/value store for all internal cluster data

- kube-scheduler: uses the information in the pod specification to decide which node a pod should run on

- kube-controller-manager: responsible for node management (detecting node failures), pod replication, and endpoint creation

- cloud-controller-manager: runs the controllers that interact with the underlying cloud provider

The Node components run on the worker machines in Kubernetes and are managed by the Master. A node may be a virtual machine (VM) or a physical machine; Kubernetes can run on either. Each node contains the components needed to run pods:

- kubelet: handles all communication between the Master and the node it runs on. It interfaces with the container runtime to deploy and monitor containers.

- kube-proxy: maintains network rules on the host and handles transmission of packets between pods, the host, and the outside world.

- container runtime: responsible for running containers on the host. Although Kubernetes supports rkt, runc, and various other container runtimes, the most popular engine today is Docker.

From a logical point of view, a Kubernetes deployment consists of various components, each serving a specific purpose in the cluster.

Pods are the basic unit of a Kubernetes deployment. A pod consists of one or more containers that share the same network namespace and IP address. Best practice recommends creating one pod per application so that each can be scaled and controlled separately.
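As a sketch, a minimal pod manifest with a single container might look like the following (the names, labels, and image tag are all illustrative):

```yaml
# Minimal pod: one container sharing the pod's network namespace and IP.
apiVersion: v1
kind: Pod
metadata:
  name: web-pod          # hypothetical name
  labels:
    app: web             # label a Service selector could match on
spec:
  containers:
    - name: web
      image: nginx:1.25  # any container image works here
      ports:
        - containerPort: 80
```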

A Service sits in front of a set of pods, providing them with a consistent IP address and a set of policies that control access to them. The set of pods a Service targets is usually determined by a label selector. This makes it easy to point the Service at a different set of pods during an upgrade or a blue/green deployment.
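A hedged sketch of such a Service, assuming there are pods carrying the label app: web (all names here are illustrative):

```yaml
# Service fronting every pod labeled app=web behind one stable IP.
apiVersion: v1
kind: Service
metadata:
  name: web-svc          # hypothetical name
spec:
  selector:
    app: web             # label selector: matching pods receive the traffic
  ports:
    - port: 80           # port exposed by the Service
      targetPort: 80     # port the pod's container listens on
```

During a blue/green deployment, changing the selector (for example to app: web-v2) repoints the Service at a different pod set without changing its IP.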

ReplicaSets are controlled by Deployments and ensure that the number of pods a deployment requires is actually running.
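As an illustrative sketch, a Deployment whose ReplicaSet keeps three replicas running might look like this (all names are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deploy
spec:
  replicas: 3                # the ReplicaSet keeps three pods running
  selector:
    matchLabels:
      app: web
  template:                  # pod template the ReplicaSet stamps out
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
```

Scaling replicas up or down causes the controller to create or destroy pods to match, as described above.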

Namespaces define a logical namespace for resources such as pods and services. They allow resources in different namespaces to use the same name, while resource names within a single namespace must be unique. Rancher uses namespaces together with role-based access control to provide secure isolation between namespaces and the resources running in them.
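Creating a namespace is a one-resource manifest; the name here is illustrative:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging      # hypothetical name
```

Two pods may then both be named web, as long as they live in different namespaces.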

Metadata labels containers according to their deployment characteristics.

Monitoring Kubernetes

Multiple services and namespaces can be spread across the infrastructure. As described above, each service is made up of pods, and a pod can contain one or more containers. With so many moving parts, monitoring even a small Kubernetes cluster can be challenging. Monitoring it effectively requires a deep understanding of the application's architecture and functionality.

Kubernetes provides tools for monitoring clusters:

Probes actively monitor the health of a container. If a probe determines that a container is unhealthy, Kubernetes restarts that container.
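For example, a liveness probe can poll an HTTP endpoint inside the container; the values below are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probed-pod             # hypothetical name
spec:
  containers:
    - name: web
      image: nginx:1.25
      livenessProbe:
        httpGet:
          path: /              # endpoint the kubelet polls
          port: 80
        initialDelaySeconds: 5 # grace period after the container starts
        periodSeconds: 10      # poll every 10 seconds
        failureThreshold: 3    # restart after 3 consecutive failures
```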

cAdvisor is an open source agent that monitors resource usage and analyzes container performance. Originally created by Google, cAdvisor is now integrated into the kubelet. It collects, aggregates, processes, and exports metrics such as CPU, memory, file, and network usage for all containers running on a given node.

The Kubernetes dashboard is an add-on that provides an overview of the resources running on the cluster. It also provides a very basic way to deploy and interact with those resources.

Kubernetes has a powerful ability to recover from failures automatically. It can restart pods if a process crashes and reassign pods if a node fails. However, despite this ability, there will be situations Kubernetes cannot fix on its own. To detect those conditions, we need additional monitoring.

Levels of Monitoring

Infrastructure

Server-level problems can surface in workloads, so every cluster should monitor the underlying server components.

What should we monitor?

CPU utilization. Monitoring CPU shows not only system and user time, but also iowait. When you run a cluster in the cloud or on any networked storage, iowait indicates the time processes spend waiting on storage reads and writes (I/O) and can reveal storage bottlenecks. An oversubscribed storage backend can hurt performance.

Memory usage. Monitoring memory shows how much memory is in use and how much is available, either as free memory or as cache. Systems under memory pressure begin to swap, which quickly degrades performance.

Disk pressure. If the system runs write-intensive services such as etcd or any data store, running out of disk space can be catastrophic. Failed writes can lead to crashes, which translate into real-world losses. A technology like LVM makes it easy to grow disk space as needed, but monitor it nonetheless.

Network bandwidth. In today's era of gigabit interfaces, it may seem that bandwidth can never be exhausted. However, just a few misbehaving services, a data leak, a system compromise, or a DoS attack can deplete all the bandwidth and cause downtime. Understanding your normal data usage and application patterns also helps you reduce costs and plan capacity.

Pod resources. The Kubernetes scheduler works best when it knows what resources a pod needs; it then ensures the pod is placed on a node with capacity available. When designing your cluster, consider in advance how many nodes may fail, so that the remaining nodes are never left unable to run all the required workloads. A service such as a cloud auto-scaling group can recover capacity quickly, but make sure the remaining nodes can handle the increased load until the failed nodes return.
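Resource needs are declared per container as requests (what the scheduler reserves when placing the pod) and limits (a hard cap at runtime). A sketch with illustrative values:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sized-pod        # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:        # scheduler only places the pod where this much is free
          cpu: 250m      # a quarter of a CPU core
          memory: 128Mi
        limits:          # the container is throttled or killed beyond these
          cpu: 500m
          memory: 256Mi
```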

Kubernetes service

All of the components that make up a Kubernetes Master or Worker, including etcd, are critical to the healthy operation of applications. If any of them fails, the monitoring system needs to detect the failure, fix it, and send an alert.

Internal service

The last layer is the Kubernetes resources themselves. Kubernetes exposes metrics about its resources, and we can also monitor applications directly. Although Kubernetes does its best to maintain the desired state, when there is nothing it can do, we need a way for humans to intervene and solve the problem.

Use Rancher to monitor

In addition to managing Kubernetes clusters running anywhere, on any provider, Rancher monitors the resources running in those clusters and sends alerts when they exceed defined thresholds.

There are already many tutorials on deploying Rancher. If you don't have a running cluster yet, pause here and follow our Quick Start Guide: https://rancher.com/quick-start/. Once the cluster is running, return here to start monitoring.

The cluster overview shows the resources in use and the status of the Kubernetes components. In our example, we are using 78% of the CPU, 26% of the RAM, and 11% of the maximum number of pods.

Click the Nodes tab to see additional information about each node in the cluster; clicking a specific node shows the health status of that member.

The Workloads tab shows the pods running on the cluster. If you don't have any running pods yet, deploy a workload that runs the nginx image and scale it up to multiple replicas.

When you click a workload's name, Rancher opens a page with information about that workload. The top of the page shows the node each pod runs on, the pod's IP address, and its status. Click any pod to see more detail about it. The hamburger menu icon in the upper right corner lets us interact with the pod: through it we can execute a shell, view logs, or delete the pod.

The other tabs show information about different Kubernetes resources: Load Balancing for services of type Ingress or LoadBalancer, Service Discovery for the other service types, and Volumes for the volumes configured in the cluster.

Monitoring using Prometheus

The information in the Rancher UI is very helpful for troubleshooting, but it is not the best way to actively track the state of the cluster at every moment of its life cycle. For that we will use Prometheus, a sibling project of Kubernetes maintained and operated by the Cloud Native Computing Foundation. We will also use Grafana, a tool that turns time series data into attractive graphs and dashboards.

Prometheus is an open source application for monitoring systems and generating alerts. It can monitor almost anything, from servers to applications, databases, and even individual processes. In Prometheus terminology, the things it monitors are called targets, and each unit of data about a target is a metric. The act of retrieving information about a target is called scraping. Prometheus scrapes its targets at a configured interval and stores the information in a time series database. It has its own query language, PromQL.
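A hedged sketch of a prometheus.yml fragment; the job name and interval are illustrative:

```yaml
global:
  scrape_interval: 15s             # how often every target is scraped
scrape_configs:
  - job_name: kubernetes-nodes     # hypothetical job name
    kubernetes_sd_configs:
      - role: node                 # auto-discover each cluster node as a target
```

A PromQL query such as rate(container_cpu_usage_seconds_total[5m]) then turns the stored counters into per-container CPU usage rates.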

Grafana is also open source and runs as a web application. Although it is frequently used with Prometheus, it also supports backend data stores such as InfluxDB, Graphite, and Elasticsearch. Grafana makes it easy to create graphs and combine them into dashboards, which are protected by a strong authentication and authorization layer and can be shared with others without giving them access to the server itself. Grafana makes heavy use of JSON in its object definitions, so its graphs and dashboards are easy to migrate and convenient to keep under version control.

Both Prometheus and Grafana are included in Rancher's application directory, and we can deploy them with just a few mouse clicks.

Install Prometheus and Grafana

Visit the cluster's Catalog Apps page and search for Prometheus. Grafana and AlertManager are installed along with it. For the purposes of this article, use the default values for everything; for a production deployment, read the information under Detailed Descriptions to see how much configuration the chart offers.

Click Launch, and Rancher will deploy the application into the cluster. After a few minutes, you should see all the workloads in the prometheus namespace reach the Active state.

By default, xip.io is used to set up the Layer 7 ingress, which we can see on the Load Balancing tab. Click the link to open the Grafana dashboard.

The Prometheus installation also deploys several dashboards into Grafana, so we can immediately see information about the cluster and its performance.

Summary

Kubernetes does its best to keep applications running, but that doesn't mean we don't need to know how they are running. As soon as you start working with Kubernetes, you should also deploy a monitoring system that helps you understand what is happening and make decisions.

Prometheus and Grafana will help you do this, and if you use Rancher, deploying these two applications takes only a few minutes. The upcoming Rancher 2.2 ships with fully integrated Prometheus and Grafana, enhancing visibility into all Kubernetes clusters while ensuring isolation between projects and users. This makes Rancher the only solution that supports Prometheus in a multi-cluster, multi-tenant environment.

Using Prometheus to monitor a Rancher-managed Kubernetes environment requires only two steps:

Select Cluster

Start the monitoring with one click

Here you can learn how to use Prometheus monitoring in multi-cluster, multi-tenant Kubernetes environments more easily and quickly!





© 2024 shulou.com SLNews company. All rights reserved.
