Detailed explanation of monitoring a k8s cluster with Prometheus + Grafana

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

First, Overview of Prometheus

1. What is Prometheus?

Prometheus is an open source system monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project, maintained independently of any company. To emphasize this point and clarify the governance structure of the project, Prometheus joined the Cloud Native Computing Foundation (CNCF) in 2016 as the second hosted project after Kubernetes.

2. The advantages of Prometheus

The main advantages of Prometheus are:

A multidimensional data model, in which time series are identified by a metric name and key/value pairs.
A powerful query language (PromQL) that takes advantage of this dimensionality.
No reliance on distributed storage; a single server node is autonomous.
Time series are collected with a pull model over HTTP.
Time series can also be pushed through an intermediate gateway.
Monitoring targets are found through static configuration files or service discovery.
Support for many kinds of graphs and dashboards, such as Grafana.
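To make the data model concrete, the sketch below renders samples in the Prometheus text exposition format, where a series is a metric name plus a set of key/value labels. The function names and the `http_requests_total` metric with its labels are illustrative assumptions, not taken from the article.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict, value) -> str:
    """Render one sample as a single line of the text exposition format."""
    if not labels:
        return f"{name} {value}"
    # Sort labels so the same series always renders the same way.
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


# Two distinct time series: same metric name, different value for one label.
print(format_sample("http_requests_total", {"method": "GET", "code": "200"}, 1027))
print(format_sample("http_requests_total", {"method": "POST", "code": "200"}, 3))
```

Each distinct combination of label values is its own time series, which is what lets PromQL filter and aggregate along any label.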

3. The core components of Prometheus

The Prometheus ecosystem consists of several components, many of which are optional:

Prometheus Server: collects metrics, stores time series data, and provides a query interface.
Client Library: client libraries (for Go, Python, Java, etc.) that instrument the services to be monitored and expose their metrics to the Prometheus server on a /metrics endpoint.
Push gateway: mainly used for short-lived jobs. Because such jobs exist only briefly, they may disappear before Prometheus comes to pull from them. These jobs push their metrics to the Pushgateway, and the Prometheus server then pulls from the Pushgateway.
Exporter: exposes the metrics of existing third-party services to Prometheus.
Alertmanager: handles alerts. After receiving alerts from the Prometheus server, it deduplicates and groups them, then routes them to the configured receiver, which sends the notification. The most common receiver is e-mail.
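As a rough illustration of the Pushgateway flow, the sketch below builds (without sending) the HTTP request a short-lived job could use to push one metric. The `/metrics/job/<job>/instance/<instance>` path follows the Pushgateway's documented URL scheme; the gateway address, job name, and metric name are hypothetical, and label values are assumed not to contain slashes.

```python
from urllib.parse import quote
from urllib.request import Request


def build_push_request(gateway_url: str, job: str, instance: str, body: str) -> Request:
    """Build a PUT that replaces all metrics of one job/instance group."""
    path = f"/metrics/job/{quote(job, safe='')}/instance/{quote(instance, safe='')}"
    return Request(
        url=gateway_url.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="PUT",
    )


# Hypothetical batch job reporting when it last succeeded.
# The body is plain text exposition format and must end with a newline.
payload = "backup_last_success_timestamp_seconds 1700000000\n"
request = build_push_request(
    "http://pushgateway.example:9091", "nightly_backup", "node01", payload
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the Prometheus server then scrapes the Pushgateway like any other target.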

4. The architecture of Prometheus

The overall architecture and ecosystem components of Prometheus are shown in the following figure:

The Prometheus server pulls metrics directly from the monitored targets, or indirectly through the push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data, either to aggregate and record new time series from existing data or to generate alerts. The monitoring data can be visualized through Grafana or other tools.
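Grafana and similar tools read this data through the server's HTTP API. The sketch below builds an instant-query URL for the `/api/v1/query` endpoint and parses a response of result type `vector`. The base address is the NodePort URL this article uses for the Prometheus web page; the sample response is hand-written for illustration, and its `job` and `instance` label values are assumptions.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL for an instant query against the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def extract_samples(response_text: str) -> list:
    """Return (labels, value) pairs from an instant-query 'vector' response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each result carries its label set and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in payload["data"]["result"]]


url = instant_query_url("http://172.16.1.30:30200/", "up")

# Hand-written response in the shape the API returns; not captured from a real server.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "job": "node-exporter", "instance": "172.16.1.31:9100"},
                "value": [1700000000, "1"],
            }
        ],
    },
})
samples = extract_samples(sample_response)
print(url)
print(samples)
```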

5. Advantages and disadvantages of Prometheus

Prometheus works well for recording any purely numeric time series, so it suits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. In the microservices world, its support for multi-dimensional data collection and querying is a particular strength.

The greatest value of Prometheus lies in its reliability. Users can see the statistics of the whole monitored system at any time, even when something is wrong with the system. It does not, however, guarantee 100% accuracy. For example, if you want to bill per request, Prometheus is not a good fit, because the data it collects may not be detailed and complete enough. In that case, use another system to collect and analyze the billing data, and use Prometheus to monitor the rest of the system.

Second, Prometheus deployment

Deployment environment:

Node name   Host IP       Operating system
master      172.16.1.30   CentOS 7
node01      172.16.1.31   CentOS 7
node02      172.16.1.32   CentOS 7

1. Get the git project of Prometheus:

1) Install the git toolkit:
[root@master ~]# yum install git -y

2) Get the git project of Prometheus:
[root@master prometheus]# git clone https://github.com/coreos/kube-prometheus.git

# Run git pull to make sure the local clone is up to date:
[root@master kube-prometheus]# git pull
Already up-to-date.

2. Import the component images required to deploy Prometheus:

1) Upload the image archives to every node in the cluster (including the master).

2) Load the images on every node in the cluster:

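# Note: run the loop from the directory that contains the image archives.
[root@master images]# for i in *; do docker load < "$i"; done
[root@node01 images]# for i in *; do docker load < "$i"; done
[root@node02 images]# for i in *; do docker load < "$i"; done

I downloaded all of the above images from the Alibaba Cloud mirror site in China (the tags have already been modified) and uploaded them to a network drive, where you can download them. Link: https://pan.baidu.com/s/1c8pP3vAS9qHCQqc-XaYRXQ Extraction code: 8zk2

Note: the image versions of these components are updated frequently in the git project, so download the images that match the latest version. By default the yaml files pull images from quay.io and gcr.io (the other images can be pulled directly from within China). Access to these registries from within China is blocked, so the images cannot be downloaded directly. They can instead be pulled from the quay-mirror.qiniu.com and registry.aliyuncs.com mirror sites.

### For example:
To pull quay.io/coreos/prometheus-operator:v0.36.0, use quay-mirror.qiniu.com/coreos/prometheus-operator:v0.36.0 instead.
To pull gcr.io/google_containers/kube-proxy, use registry.aliyuncs.com/google_containers/kube-proxy instead.

3. Change the access mode to NodePort

1) Modify the grafana-service file:
[root@master kube-prometheus]# cd manifests/
[root@master manifests]# vim grafana-service.yaml

2) Modify the prometheus-service file:
[root@master manifests]# vim prometheus-service.yaml

3) Modify the alertmanager-service file:

4. Perform the installation

1) First install the resources that Prometheus requires (the yaml files in the manifests/setup directory):
[root@master manifests]# kubectl apply -f setup/

2) Install Prometheus (the yaml files under manifests/):
[root@master manifests]# cd ..
[root@master kube-prometheus]# kubectl apply -f manifests/

5. View the Prometheus resources (make sure the following pods all reach the desired state)

[root@master kube-prometheus]# kubectl get pod -n monitoring
[root@master kube-prometheus]# kubectl get svc -n monitoring

Description of the components above:

MetricServer: an aggregator of resource usage in the k8s cluster; it collects data for use inside the cluster, for example by kubectl, hpa, and the scheduler.
PrometheusOperator: deploys and manages Prometheus and its related monitoring components in the k8s cluster.
NodeExporter: provides the key metrics and status data of each node.
kubeStateMetrics: collects data about the resource objects in the k8s cluster, on which alerting rules can be defined.
Prometheus: uses the pull model to collect data from the apiserver, scheduler, controller-manager, and kubelet components, transmitted over HTTP.
Grafana: a platform for visualizing statistics and monitoring data.

6. Prometheus monitoring pages

1) Access the Prometheus web page. URL: http://172.16.1.30:30200/

# After a successful deployment, it shows the details of each component on the cluster nodes, with the state "up".

2) Access the alertmanager web page. URL: http://172.16.1.30:30300

3) Access the Grafana graphical interface. URL: http://172.16.1.30:30100 The initial username and password are both: admin

# After changing the username and password, click to log in:

Third, using the Prometheus monitoring platform

1. Add a Prometheus data source for grafana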

As shown in the figure above, you can see that a Prometheus data source has been added for us by default after deploying Prometheus, and you can also click the "Add data source" option in the upper right corner to customize the data source you need. As shown in the following figure:
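A custom data source can also be registered through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. The sketch below only builds the request and does not send it. The Grafana address and the initial admin/admin credentials come from this article; the data source name and the in-cluster Prometheus URL `http://prometheus-k8s.monitoring.svc:9090` are assumptions about a default kube-prometheus install and should be checked against `kubectl get svc -n monitoring`.

```python
import base64
import json
from urllib.request import Request


def build_datasource_request(grafana_url, user, password, name, prometheus_url):
    """Build a POST that asks Grafana to create a Prometheus data source."""
    body = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend proxies the queries to Prometheus
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url=grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


request = build_datasource_request(
    "http://172.16.1.30:30100",
    "admin",
    "admin",  # initial credentials; use the new password once it has been changed
    "prometheus-custom",  # hypothetical data source name
    "http://prometheus-k8s.monitoring.svc:9090",  # assumed in-cluster service address
)
print(request.get_method(), request.full_url)
```

Sending it with `urllib.request.urlopen(request)` would create the data source, provided the credentials are still valid.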

2. Add a dashboard for grafana

3. Monitor cluster resources

As shown in the figure above, some built-in resource monitoring templates are provided, and you can choose which resources to view. The following shows several of the important monitored resource objects:

1) View the cluster resource information:

# You can see usage information for the cluster, such as CPU, memory, network, and disk IO.

2) View the usage of each node resource:

3) Pod resource view:

# As shown above, the resource monitoring items provided by Prometheus are quite comprehensive. You can explore the other monitoring items on your own.

4. Other monitoring templates

The monitoring templates that come with grafana are already very rich, but you can also download other monitoring templates from the Grafana website.

1) Download the monitoring template, as shown below:

For example, we choose the Node Exporter for Prometheus template:

2) Import the template on the Grafana web interface:

The template is imported successfully, and you can download other types of monitoring templates on the Grafana official website.

To set up e-mail alerts with Alertmanager, please refer to the blog article: monitoring weapon-Prometheus installation and deployment + realizing mailbox alarm
