How to Monitor Large-Scale Container Clusters by Combining Kvass and Thanos

2025-04-05 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to combine Kvass and Thanos to monitor large-scale container clusters. The method introduced here is simple, fast and practical.

Is Thanos not enough?

Some readers may ask: Thanos already addresses the distributed problems of Prometheus, so can't large-scale Prometheus monitoring be achieved with Thanos alone? Why is Kvass needed? Thanos solves distributed storage and query for Prometheus, but it does not solve distributed collection. If a single Prometheus instance has too many collection tasks or too much data to collect, it will still hit a bottleneck. The first part of this series covered some optimization methods for this problem:

Split collection tasks by service across different Prometheus instances.

Use the Prometheus hashmod relabel action to shard collection tasks.
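The idea behind hashmod sharding can be sketched in a few lines of Python. This is only an illustration of the principle, not Prometheus code: each target address is hashed, the hash is reduced modulo the number of shards, and every shard keeps only the targets whose result equals its own index. The target addresses and the helper names hashmod and targets_for_shard are hypothetical, and the choice of digest bytes is an assumption made for this sketch.

```python
import hashlib


def hashmod(value: str, modulus: int) -> int:
    """Map a label value to a bucket in [0, modulus) using an MD5 digest."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    # Assumption for this sketch: interpret the last 8 digest bytes as an
    # unsigned integer. Any deterministic hash would show the same principle.
    return int.from_bytes(digest[8:], "big") % modulus


def targets_for_shard(targets, shard, total_shards):
    """Return the targets that the shard with the given index keeps."""
    return [t for t in targets if hashmod(t, total_shards) == shard]


# Hypothetical target addresses, for illustration only.
targets = [f"10.0.0.{i}:9091" for i in range(1, 13)]
shards = [targets_for_shard(targets, s, 3) for s in range(3)]

for index, kept in enumerate(shards):
    print(f"shard {index}: {len(kept)} targets")
```

Every target lands on exactly one shard, but the split depends only on the hash of the address. It balances the number of targets roughly, not the number of series each target produces.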

However, these optimization methods still have some disadvantages:

Configuration is cumbersome: the scrape configuration of each Prometheus instance has to be written separately.

You need to estimate the data volume in advance in order to configure the split.

Different Prometheus instances handle different collection tasks, so the load may be unbalanced. Without careful control, some instances can still be overloaded.

Scaling Prometheus out or in has to be done manually; it cannot happen automatically.

Kvass was created to solve these problems, and it is the focus of this article.

What is Kvass?

Kvass is a lightweight horizontal scaling solution for Prometheus, open-sourced by Tencent Cloud. It separates service discovery from the collection process and uses a sidecar to dynamically generate a configuration file for each Prometheus instance, so that different Prometheus instances collect different tasks without manual configuration. It also load-balances the collection tasks to avoid overloading individual Prometheus instances, and if the overall load becomes too high it scales out automatically. Combined with the global view provided by Thanos, a very large-scale cluster monitoring system can be built from a single configuration file.

Figure: architecture diagram of Kvass + Thanos.

For more details on Kvass, see "How to monitor Kubernetes clusters with 100,000 containers with Prometheus", which describes its principles and results in detail.
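To make the load-balancing idea concrete, here is a minimal sketch of series-aware target assignment. It is not the algorithm Kvass actually implements; it is a simple greedy first-fit, written under the assumption that the series count of every target is already known. The function name assign_targets and the sample service names and series counts are hypothetical.

```python
def assign_targets(series_by_target, max_series):
    """Greedy first-fit: put each target on the first shard that still has
    room, and open a new shard when no existing shard can take it."""
    shards = []  # each shard is {"targets": [...], "series": int}
    for target, series in series_by_target.items():
        for shard in shards:
            if shard["series"] + series <= max_series:
                shard["targets"].append(target)
                shard["series"] += series
                break
        else:
            # No shard had room, so the number of shards grows by one.
            # A single target larger than max_series still gets its own shard.
            shards.append({"targets": [target], "series": series})
    return shards


# Hypothetical per-target series counts, for illustration only.
load = {
    "svc-a": 12000,
    "svc-b": 9000,
    "svc-c": 20000,
    "svc-d": 7000,
    "svc-e": 15000,
}
shards = assign_targets(load, max_series=30000)

for index, shard in enumerate(shards):
    print(index, shard["targets"], shard["series"])
```

The point of the sketch is the contrast with static splitting: the number of shards is an output of the assignment rather than something estimated in advance, and no shard is filled beyond the configured limit.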

Deployment Practice

Deployment Preparation

First, clone the Kvass repository and enter the examples directory:

git clone https://github.com/tkestack/kvass.git
cd kvass/examples

Before deploying Kvass, we need services that expose metrics to collect. We provide a metrics generator that produces a specified number of series. In this example we deploy 6 replicas of the metrics generator, each generating 10045 series, to the cluster with a single command:

kubectl create -f metrics.yaml

Deploy Kvass

Then we deploy Kvass:

kubectl create -f kvass-rbac.yaml   # RBAC configuration required by Kvass
kubectl create -f config.yaml       # Prometheus configuration file
kubectl create -f coordinator.yaml  # Kvass coordinator deployment configuration

In config.yaml, the Prometheus configuration file contains a scrape job for the metrics generator we just deployed:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: custom
scrape_configs:
- job_name: 'metrics-test'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
    regex: metrics
    action: keep
  - source_labels: [__meta_kubernetes_pod_ip]
    action: replace
    regex: (.*)
    replacement: ${1}:9091
    target_label: __address__
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
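The three relabel rules in this job can be read as a small function over a pod's discovered labels. The Python sketch below mimics them in simplified form to show the effect on a single pod. It is not how Prometheus implements relabeling, and the function name relabel and the sample pod labels are hypothetical.

```python
import re


def relabel(labels):
    """Apply simplified versions of the three relabel rules to one pod.

    Returns the resulting label set, or None if the pod is dropped.
    """
    # Rule 1 (keep): only pods whose app.kubernetes.io/name label is
    # exactly "metrics" are kept. The regex must match the whole value.
    app = labels.get("__meta_kubernetes_pod_label_app_kubernetes_io_name", "")
    if not re.fullmatch("metrics", app):
        return None

    out = dict(labels)

    # Rule 2 (replace): scrape the pod IP on port 9091.
    match = re.fullmatch(r"(.*)", labels.get("__meta_kubernetes_pod_ip", ""))
    if match:
        out["__address__"] = f"{match.group(1)}:9091"

    # Rule 3 (replace with default settings): copy the pod name into "pod".
    out["pod"] = labels.get("__meta_kubernetes_pod_name", "")
    return out


# Hypothetical discovered labels, for illustration only.
generator_pod = {
    "__meta_kubernetes_pod_label_app_kubernetes_io_name": "metrics",
    "__meta_kubernetes_pod_ip": "10.0.0.5",
    "__meta_kubernetes_pod_name": "metrics-5876dccf65-5cncw",
}
other_pod = {
    "__meta_kubernetes_pod_label_app_kubernetes_io_name": "nginx",
    "__meta_kubernetes_pod_ip": "10.0.0.6",
    "__meta_kubernetes_pod_name": "nginx-abc",
}

print(relabel(generator_pod)["__address__"])
print(relabel(other_pod))
```

In other words, only the metrics generator pods become targets, each one is scraped at its pod IP on port 9091, and every resulting series carries a pod label.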

In coordinator.yaml, the startup parameters of the Coordinator limit the number of head series per shard to at most 30000:

--shard.max-series=30000

Then deploy the Prometheus instance (which contains the Thanos sidecar and the Kvass sidecar). Initially only a single replica is needed:

kubectl create -f prometheus-rep-0.yaml

If you need to store data in object storage, refer to the earlier article "Thanos Deployment and Practice" to modify the configuration of the Thanos sidecar.

Deploy thanos-query

To get global data, we need to deploy a thanos-query:

kubectl create -f thanos-query.yaml

From the figures above, there are 6 monitoring targets with 60270 series in total (6 × 10045). Since each shard is limited to 30000 series, we expect 3 shards to be needed. Checking the pods confirms that the Coordinator changed the replica count of the StatefulSet to 3:

$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
kvass-coordinator-c68f445f6-g9q5z   2/2     Running   0          64s
metrics-5876dccf65-5cncw            1/1     Running   0          75s
metrics-5876dccf65-6tw4b            1/1     Running   0          75s
metrics-5876dccf65-dzj2c            1/1     Running   0          75s
metrics-5876dccf65-gz9qd            1/1     Running   0          75s
metrics-5876dccf65-r25db            1/1     Running   0          75s
metrics-5876dccf65-tdqd7            1/1     Running   0          75s
prometheus-rep-0-0                  3/3     Running   0          54s
prometheus-rep-0-1                  3/3     Running   0          45s
prometheus-rep-0-2                  3/3     Running   0          45s
thanos-query-69b9cb857-d2b45        1/1     Running   0          49s
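The expected shard count can be checked with a few lines of arithmetic. The sketch assumes that a target is always scraped by exactly one shard, so a shard holds a whole number of targets; the variable names are illustrative.

```python
import math

targets = 6
series_per_target = 10045
max_series_per_shard = 30000

total_series = targets * series_per_target

# Assumption: a target is never split across shards, so a shard can hold
# only as many whole targets as fit under the series limit.
targets_per_shard = max_series_per_shard // series_per_target
expected_shards = math.ceil(targets / targets_per_shard)

# A lower bound that ignores target boundaries gives the same answer here.
lower_bound = math.ceil(total_series / max_series_per_shard)

print(total_series, targets_per_shard, expected_shards, lower_bound)
```

Two targets fit into one shard (20090 series), while a third would bring it to 30135 and exceed the limit, so 6 targets need 3 shards.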

We then check the global data through thanos-query and find that the data is complete (metrics0 is the name of the metric produced by the metrics generator).
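The same check can be scripted against the Prometheus-compatible HTTP API that thanos-query serves. The sketch below only builds the request URL and defines a helper for sending it, since the request works only from inside the cluster. The base address is the in-cluster thanos-query service address used in this article; the function names and the query count(metrics0) are illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# In-cluster address of thanos-query in this example deployment.
THANOS_QUERY = "http://thanos-query.default.svc.cluster.local:9090"


def instant_query_url(base_url, promql):
    """Build the URL of an instant query for the given PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def run_instant_query(base_url, promql, timeout=10):
    """Send the query and return the decoded JSON response.

    Only works where the thanos-query service is reachable.
    """
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)


# count(metrics0) returns the number of metrics0 series across all shards.
url = instant_query_url(THANOS_QUERY, "count(metrics0)")
print(url)
```

Calling run_instant_query(THANOS_QUERY, "count(metrics0)") from a pod in the cluster would return the deduplicated global result, which can be compared with the series count expected from the generators.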

If you want to view the monitoring data in Grafana dashboards, add the thanos-query address as a Prometheus data source: http://thanos-query.default.svc.cluster.local:9090

At this point you should have a deeper understanding of how to combine Kvass and Thanos to monitor large-scale container clusters. Try it out in practice!
