In this article, the editor shares how to call Prometheus monitoring data from Python and perform calculations on it. Many people are not familiar with this topic, so the article is shared for your reference; hopefully you will learn a lot from reading it.
What is Prometheus?
Prometheus is an open source monitoring and alerting system written in Go (golang). It combines monitoring, alerting and a time series database in one tool.
It is well suited to monitoring Docker containers, and the popularity of Kubernetes (K8s) has driven its development.
The main features of Prometheus
A multi-dimensional data model: time series are identified by a metric name and key/value label pairs.
As a time series database, collected data is stored locally in files.
A flexible query language: PromQL (Prometheus Query Language), a functional query language.
No reliance on distributed storage; single server nodes are autonomous.
Time series data is pulled over HTTP using a pull model.
Pushing time series is also supported via an intermediary gateway.
Monitoring targets are discovered through service discovery or static configuration.
A variety of charts and dashboard displays are supported.
Prometheus architecture diagram
Prometheus basic concepts
What is time series data
Time series data (TimeSeries Data): data that records the state changes of systems and devices in chronological order is called time series data.
There are many application scenarios, for example:
Longitude, latitude, speed, heading and distance recorded while an autonomous vehicle is operating.
Track data of vehicles in a certain area.
Real-time trading data of traditional securities industry.
Real-time operation and maintenance monitoring data, etc.
Characteristics of time series data:
Good performance and low storage cost
What are targets
Prometheus is a monitoring platform that collects metrics from monitoring targets by scraping the metrics HTTP endpoints exposed on those targets.
After installing the Prometheus server, the first target is the server itself.
For details, please refer to the official documentation.
What are metrics
There are many different monitoring metrics (Metrics) in Prometheus, and different Metrics should be selected in different scenarios.
Prometheus has four metric types: Counter, Gauge, Summary and Histogram.
Counter: a counter that only increases and never decreases
Gauge: a gauge whose value can go up or down
Histogram: analyzes the distribution of data
Summary: used less often
A simple understanding of these types is enough; there is no need to go deeper for the time being.
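As a quick illustration only (not part of the article's workflow), here is a minimal sketch of the four metric types using the official prometheus_client library; the metric names are made up for the example:

import random
import time

from prometheus_client import Counter, Gauge, Histogram, Summary, start_http_server

# Hypothetical metrics, for illustration only
REQUESTS = Counter('demo_requests_total', 'Total requests handled')           # only ever increases
IN_FLIGHT = Gauge('demo_requests_in_flight', 'Requests currently in flight')  # can go up and down
LATENCY = Histogram('demo_request_latency_seconds', 'Request latency')        # bucketed distribution
SIZE = Summary('demo_request_size_bytes', 'Request payload size')             # summary of observations

if __name__ == '__main__':
    start_http_server(8000)  # exposes /metrics on :8000, much like node_exporter does on :9100
    while True:
        REQUESTS.inc()
        IN_FLIGHT.set(random.randint(0, 5))
        LATENCY.observe(random.random())
        SIZE.observe(random.randint(100, 1000))
        time.sleep(1)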
Open http://<monitored IP>:9100/metrics (the port of the monitored host) in a browser to see the monitoring information collected by node_exporter on that host.
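A minimal sketch of fetching that same endpoint from Python with requests (the address is a placeholder for a host running node_exporter):

import requests

# Placeholder address of a host running node_exporter
NODE_EXPORTER_URL = 'http://192.0.2.10:9100/metrics'

resp = requests.get(NODE_EXPORTER_URL, timeout=5)
resp.raise_for_status()

# Print a few non-comment sample lines from the plain-text exposition format
samples = [line for line in resp.text.splitlines() if line and not line.startswith('#')]
print('\n'.join(samples[:10]))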
What is PromQL (functional query language)
Prometheus has a powerful built-in data query language, PromQL, through which monitoring data can be queried and aggregated.
PromQL is also used for data visualization (for example in Grafana) and for alerting.
The following questions, for example, can easily be answered with PromQL (rough example expressions are sketched after this list):
What is the 95th-percentile application latency over a recent period?
What will disk space usage roughly be in 4 hours (forecast)?
Which 5 services consume the most CPU? (filtering)
For query details, please refer to the official documentation.
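As a rough, non-authoritative sketch of what such queries look like, the expressions below roughly match the three questions; the histogram metric name is an assumption, while the CPU expression mirrors the one used later in this article:

# Rough PromQL sketches; http_request_duration_seconds is an assumed histogram metric
promql_examples = {
    # 95th-percentile latency from a (hypothetical) request-duration histogram
    'p95_latency': 'histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))',
    # Linear forecast of free disk space 4 hours ahead, from node_exporter metrics
    'disk_free_in_4h': 'predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[1h], 4 * 3600)',
    # Top 5 instances by CPU usage, derived from the idle CPU counter
    'top5_cpu': 'topk(5, 100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)',
}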
How to monitor remote Linux hosts
Installing the Prometheus server component is very simple: download the package, extract it, and run it in the background; no detailed demonstration is given here.
Install the node_exporter component on the remote Linux host (the monitored side); the download address can be found on the official site.
After downloading and extracting it, there is only a single executable, node_exporter, which can be started directly:
nohup /usr/local/node_exporter/node_exporter > /dev/null 2>&1 &
lsof -i:9100
nohup: if node_exporter is started directly in a terminal, the process stops when the terminal is closed; running it with nohup keeps it alive in the background (lsof -i:9100 confirms it is listening).
Prometheus HTTP API
All stable Prometheus HTTP APIs are under the /api/v1 path. When we need to query data, we can request monitoring data through the query API; data can be submitted using the remote write protocol or the Pushgateway.
Supported API
Authentication method
Authentication is enabled by default, so all API calls must be authenticated; both Bearer Token and Basic Auth are supported.
When calling the API we need to carry a Basic Auth request header, otherwise a 401 is returned.
Bearer Token
A Bearer Token is generated when the instance is created and can be looked up in the console. For more information about Bearer Token, see Bearer Authentication.
Basic Auth
Basic Auth is compatible with native Prometheus query authentication: the user name is your APPID and the password is the bearer token (generated with the instance), both of which can be looked up in the console. For more information about Basic Auth, see Basic Authentication.
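A minimal sketch of both authentication styles from Python with requests; the server address, APPID and token are placeholders:

import requests

SERVER = 'http://prometheus.example.com:9090'   # placeholder address
APPID, TOKEN = 'appid', 'token'                  # placeholder credentials

# Basic Auth: user name = APPID, password = bearer token
basic = requests.get(SERVER + '/api/v1/query', params={'query': 'up'},
                     auth=(APPID, TOKEN), timeout=10)

# Bearer Token: carried in the Authorization header instead
bearer = requests.get(SERVER + '/api/v1/query', params={'query': 'up'},
                      headers={'Authorization': 'Bearer ' + TOKEN}, timeout=10)

print(basic.status_code, bearer.status_code)   # 401 means the credentials were rejected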
Data return format
The response format of every API is JSON, and each successful request returns a 2xx status code.
An invalid request returns a JSON body containing an error object together with one of the documented error status codes.
The response template for invalid request is as follows:
{
  "status": "success" | "error",
  "data": <data>,

  // only set when status is "error"; the data field may still hold partial results
  "errorType": "<string>",
  "error": "<string>",

  // only set when warnings were raised while executing the request
  "warnings": ["<string>"]
}
Writing data
Our operations workflow does not need to write data, so writing is not covered in depth here.
Interested readers can consult the official documentation.
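For readers who do want to push data, a minimal hedged sketch using the Pushgateway helper from the official prometheus_client library might look like this (the gateway address and job name are placeholders):

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

# Placeholder Pushgateway address and job name
registry = CollectorRegistry()
g = Gauge('job_last_success_unixtime', 'Last time the batch job finished successfully',
          registry=registry)
g.set_to_current_time()

# Push the metric to the gateway; Prometheus then scrapes the Pushgateway
push_to_gateway('localhost:9091', job='batch_job_demo', registry=registry)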
Monitoring data query
When we have data query requirements, we can request monitoring data by querying API.
Query API
GET /api/v1/query
POST /api/v1/query
Query parameters:
query=<string>: the PromQL query expression.
time=<rfc3339 | unix_timestamp>: the evaluation timestamp; optional.
timeout=<duration>: the evaluation timeout; optional, defaults to the value of the -query.timeout flag.
Simple query
Query the current up status of the monitored hosts:
curl -u "appid:token" 'http://IP:PORT/api/v1/query?query=up'
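The same instant query issued from Python, parsing the response envelope described earlier; the address and credentials are placeholders:

import requests

resp = requests.get('http://IP:PORT/api/v1/query',   # placeholder address
                    params={'query': 'up'},
                    auth=('appid', 'token'),          # placeholder credentials
                    timeout=10)
body = resp.json()

if body.get('status') == 'success':
    # An instant query returns a "vector": one sample per matching series
    for item in body['data']['result']:
        instance = item['metric'].get('instance', '')
        timestamp, value = item['value']
        print(instance, value)
else:
    print('query failed:', body.get('errorType'), body.get('error'))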
Range query
GET /api/v1/query_range
POST /api/v1/query_range
Query the required data according to the time range, which is also the scenario we use most often.
At this point we need the /api/v1/query_range interface. An example follows:
$ curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s'
{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "job": "prometheus",
          "instance": "localhost:9090"
        },
        "values": [
          [1435781430.781, "1"],
          [1435781445.781, "1"],
          [1435781460.781, "1"]
        ]
      },
      {
        "metric": {
          "__name__": "up",
          "job": "node",
          "instance": "localhost:9091"
        },
        "values": [
          [1435781430.781, "0"],
          [1435781445.781, "0"],
          [1435781460.781, "1"]
        ]
      }
    ]
  }
}
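A comparable range query issued from Python with requests, reusing the start/end/step parameters from the curl example above (the local Prometheus address is an assumption):

import requests

params = {
    'query': 'up',
    'start': '2015-07-01T20:10:30.781Z',
    'end': '2015-07-01T20:11:00.781Z',
    'step': '15s',
}
resp = requests.get('http://localhost:9090/api/v1/query_range', params=params, timeout=10)
body = resp.json()

# A range query returns a "matrix": each series carries a list of [timestamp, value] pairs
for series in body['data']['result']:
    print(series['metric'].get('instance'), series['values'])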
What is Grafana
Grafana is an open source metrics analysis and visualization tool. It can be used to query and analyze the collected data, present it visually, and raise alerts.
Web site: https://grafana.com/
Connect to Prometheus using Grafana
The connection itself is not demonstrated here; the general steps are as follows:
1. Install Grafana on a server; download address: https://grafana.com/grafana/download
2. Open http://<grafana server IP>:3000 in a browser; the default user name and password are both admin.
3. Add the Prometheus server as a data source in Grafana so that Grafana can read the Prometheus data.
4. Build graphs on top of the added data source; the result can then be viewed in a dashboard.
The workflow is not difficult and is not explained point by point here; next we officially start on the query script.
Work usage scenario
At work we need to produce CPU and memory utilization reports, which can be done with a Python script driven by Prometheus's API.
The data is fetched through the API, then sorted, filtered, calculated and aggregated, and finally written to a MySQL database.
CPU peak calculation
Take all CPU values for the last week, then sort them and keep the highest value.
# Module-level imports assumed by the methods below (in the original script they belong to a collection class)
import json
import logging

import requests
import pandas as pd


def get_cpu_peak(self):
    """
    CPU peak: take all CPU values of the last week, sort them and keep the highest (TOP1).
    :return: {'ip': value}
    """
    # Build the query URL
    pre_url = self.server_ip + '/api/v1/query_range?query='
    expr = '100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)' \
           '&start=%s&end=%s&step=300' % (self.time_list[0], self.time_list[-1] - 1)
    url = pre_url + expr
    result = {}
    # Convert the JSON response into a dictionary
    res = json.loads(requests.post(url=url, headers=self.headers).content.decode('utf8', 'ignore'))
    # For every instance, sort its values, keep the highest and store it in the result dictionary
    for da in res.get('data').get('result'):
        values = da.get('values')
        cpu_values = [float(v[1]) for v in values]  # collect the sampled values into a list
        # Take the instance label and strip the port number
        ip = da.get('metric').get('instance')
        ip = ip[:ip.index(':')] if ':' in ip else ip
        cpu_peak = sorted(cpu_values, reverse=True)[0]
        # Store the IP together with its highest value
        result[ip] = cpu_peak
    return result

CPU mean calculation
For each day of the most recent week, average the top portion of the CPU samples (the original describes this as "TOP20 divided by 20") to obtain that day's busy-hour average.
The daily busy-hour averages are then combined across the 7 days to give the busy-hour figure for the whole time range.
def get_cpu_average(self):
    """
    CPU busy-hour average: for each day of the most recent week, average the top 20% of samples
    to get that day's busy-hour average, then combine the daily averages over the week.
    :return: {'ip': value}
    """
    cpu_average = {}
    for t in range(len(self.time_list)):
        if t + 1 < len(self.time_list):
            start_time = self.time_list[t]
            end_time = self.time_list[t + 1]
            # Build the query URL for this day
            pre_url = self.server_ip + '/api/v1/query_range?query='
            expr = '100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)' \
                   '&start=%s&end=%s&step=300' % (start_time, end_time - 1)
            url = pre_url + expr
            # Request the interface data
            data = json.loads(requests.post(url=url, headers=self.headers).content.decode('utf8', 'ignore'))
            for da in data.get('data').get('result'):
                values = da.get('values')
                cpu_load = [float(v[1]) for v in values]      # all sampled values for this instance
                ip = da.get('metric').get('instance')          # take the instance label
                ip = ip[:ip.index(':')] if ':' in ip else ip   # strip the port number
                # Sort in descending order, take the top 20% of samples and average them
                top_n = round(len(cpu_load) * 0.2)
                avg_cpu_load = sum(sorted(cpu_load, reverse=True)[:top_n]) / top_n
                # Append the daily busy-hour average to the dictionary, keyed by IP
                if cpu_average.get(ip):
                    cpu_average[ip].append(avg_cpu_load)
                else:
                    cpu_average[ip] = [avg_cpu_load]
    # Combine the daily top-20% averages over the 7 days
    for k, v in cpu_average.items():
        cpu_average[k] = sum(v)
    return cpu_average

Memory peak calculation
Take the 7-day memory values and, after sorting, select the highest peak (TOP1).
def get_mem_peak(self):
    """
    Memory peak: take the 7-day memory utilisation and keep the TOP1 peak per host.
    :return: {'ip': value}
    """
    pre_url = self.server_ip + '/api/v1/query_range?query='
    # A single combined expression is too long and causes an error, so the
    # fields are queried separately and the utilisation is calculated afterwards
    expr_MenTotal = 'node_memory_MemTotal_bytes&start=%s&end=%s&step=300' % (self.time_list[0], self.time_list[-1] - 1)
    expr_MemFree = 'node_memory_MemFree_bytes&start=%s&end=%s&step=300' % (self.time_list[0], self.time_list[-1] - 1)
    expr_Buffers = 'node_memory_Buffers_bytes&start=%s&end=%s&step=300' % (self.time_list[0], self.time_list[-1] - 1)
    expr_Cached = 'node_memory_Cached_bytes&start=%s&end=%s&step=300' % (self.time_list[0], self.time_list[-1] - 1)
    result = {}
    # Fetch total memory, free memory, buffers and cached memory in turn
    for ur in expr_MenTotal, expr_MemFree, expr_Buffers, expr_Cached:
        url = pre_url + ur
        data = json.loads(requests.post(url=url, headers=self.headers).content.decode('utf8', 'ignore'))
        ip_dict = {}
        # Loop over every series returned for this single field
        for da in data.get('data').get('result'):
            ip = da.get('metric').get('instance')
            ip = ip[:ip.index(':')] if ':' in ip else ip
            if ip_dict.get(ip):  # skip duplicate IPs, they would break the field alignment
                continue
            values = da.get('values')
            # Convert the value list into a dict keyed by timestamp to ease the calculation
            values_dict = {}
            for v in values:
                values_dict[str(v[0])] = v[1]
            ip_dict[ip] = True  # mark this IP as seen
            # Append this field's values to the per-IP list
            if result.get(ip):
                result[ip].append(values_dict)
            else:
                result[ip] = [values_dict]
    # Combine the four fields and keep the peak utilisation
    for ip, values in result.items():
        values_list = []
        for k, v in values[0].items():
            try:
                values_MenTotal = float(v)
                values_MemFree = float(values[1].get(k, 0))
                values_Buffers = float(values[2].get(k, 0)) if values[2] else 0
                values_Cached = float(values[3].get(k, 0)) if values[3] else 0
                # Samples with a zero component do not take part in the calculation
                if values_MemFree == 0.0 or values_Buffers == 0.0 or values_Cached == 0.0:
                    continue
                # Utilisation = (total - (free + buffers + cached)) / total * 100
                values_list.append((values_MenTotal - (values_MemFree + values_Buffers + values_Cached)) / values_MenTotal * 100)
                # Sort the results and keep the highest value
                result[ip] = sorted(values_list, reverse=True)[0]
            except Exception as e:
                logging.exception(e)
    return result

Memory mean calculation
First take the 7 daily time boundaries, fetch each day's data in a loop, sort the values and average the top 20% ("top20 divided by 20"), and finally combine the 7 daily figures.
def get_mem_average(self):
    """
    Memory busy-hour average: fetch the data day by day for the week, sort the values,
    average the top 20% for each day, then combine the daily figures over the 7 days.
    :return: {'ip': value}
    """
    avg_mem_util = {}
    for t in range(len(self.time_list)):
        if t + 1 < len(self.time_list):
            start_time = self.time_list[t]
            end_time = self.time_list[t + 1]
            # Query the data for this day; the fields are requested separately,
            # as in get_mem_peak, because a combined expression is too long
            pre_url = self.server_ip + '/api/v1/query_range?query='
            expr_MenTotal = 'node_memory_MemTotal_bytes&start=%s&end=%s&step=600' % (start_time, end_time - 1)
            expr_MemFree = 'node_memory_MemFree_bytes&start=%s&end=%s&step=600' % (start_time, end_time - 1)
            expr_Buffers = 'node_memory_Buffers_bytes&start=%s&end=%s&step=600' % (start_time, end_time - 1)
            expr_Cached = 'node_memory_Cached_bytes&start=%s&end=%s&step=600' % (start_time, end_time - 1)
            result = {}
            # Fetch the four fields in turn
            for ur in expr_MenTotal, expr_MemFree, expr_Buffers, expr_Cached:
                url = pre_url + ur
                data = json.loads(requests.post(url=url, headers=self.headers).content.decode('utf8', 'ignore'))
                ip_dict = {}
                # Loop over every series returned for this single field
                for da in data.get('data').get('result'):
                    ip = da.get('metric').get('instance')
                    ip = ip[:ip.index(':')] if ':' in ip else ip
                    if ip_dict.get(ip):  # skip duplicate IPs
                        continue
                    values = da.get('values')
                    # Convert the value list into a dict keyed by timestamp
                    values_dict = {}
                    for v in values:
                        values_dict[str(v[0])] = v[1]
                    ip_dict[ip] = True  # mark this IP as seen
                    if result.get(ip):
                        result[ip].append(values_dict)
                    else:
                        result[ip] = [values_dict]
            for ip, values in result.items():
                values_list = []
                for k, v in values[0].items():
                    try:
                        values_MenTotal = float(v)
                        values_MemFree = float(values[1].get(k, 0)) if values[1] else 0
                        values_Buffers = float(values[2].get(k, 0)) if values[2] else 0
                        values_Cached = float(values[3].get(k, 0)) if values[3] else 0
                        if values_MemFree == 0.0 or values_Buffers == 0.0 or values_Cached == 0.0:
                            continue
                        value_calc = (values_MenTotal - (values_MemFree + values_Buffers + values_Cached)) / values_MenTotal * 100
                        if value_calc != float(0):
                            values_list.append(value_calc)
                    except Exception as e:
                        logging.exception(e)
                        continue
                # Sort the values, take the top 20% and average them
                try:
                    top_n = round(len(values_list) * 0.2)
                    avg_mem = sum(sorted(values_list, reverse=True)[:top_n]) / top_n
                except Exception as e:
                    avg_mem = 0
                    logging.exception(e)
                if avg_mem_util.get(ip):
                    avg_mem_util[ip].append(avg_mem)
                else:
                    avg_mem_util[ip] = [avg_mem]
    # Combine the daily figures over the 7 days
    for k, v in avg_mem_util.items():
        avg_mem_util[k] = sum(v)
    return avg_mem_util

Export to Excel
Export the collected data to excel
def export_excel(self, export):
    """
    Export the collected data to an Excel file.
    :param export: the collected data
    :return:
    """
    try:
        # Convert the list of dictionaries into a DataFrame
        pf = pd.DataFrame(list(export))
        # Specify the column order
        order = ['ip', 'cpu_peak', 'cpu_average', 'mem_peak', 'mem_average', 'collector']
        pf = pf[order]
        # Replace the column names with report headers
        columns_map = {
            'ip': 'ip',
            'cpu_peak': 'CPU peak utilization',
            'cpu_average': 'CPU busy-hour average utilization',
            'mem_peak': 'Memory peak utilization',
            'mem_average': 'Memory busy-hour average utilization',
            'collector': 'Source address'
        }
        pf.rename(columns=columns_map, inplace=True)
        # Name of the generated Excel file (drop the :18600 port suffix from the host name)
        writer_name = (self.Host + '.xlsx').replace(':18600', '')
        file_path = pd.ExcelWriter(writer_name)
        # Replace empty cells
        pf.fillna(' ', inplace=True)
        # Write the DataFrame to the Excel file
        pf.to_excel(file_path, encoding='utf-8', index=False)
        # Save the workbook
        file_path.save()
    except Exception as e:
        print(e)
        logging.exception(e)
Because the machine-room data needs to be kept for later display, the script was later changed to write the collected results directly into MySQL.
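The MySQL step itself is not shown in the article; as a hedged sketch (the connection details, table name and columns are assumptions), the aggregated per-IP dictionaries could be written with pymysql roughly like this:

import pymysql

def save_to_mysql(cpu_peak, cpu_average, mem_peak, mem_average):
    """Write the aggregated per-IP results into an assumed MySQL table."""
    # Assumed connection details and table layout, for illustration only
    conn = pymysql.connect(host='127.0.0.1', user='report', password='secret',
                           database='capacity', charset='utf8mb4')
    sql = ("INSERT INTO resource_report (ip, cpu_peak, cpu_average, mem_peak, mem_average) "
           "VALUES (%s, %s, %s, %s, %s)")
    # One row per IP, pulling the matching value from each dictionary returned above
    rows = [(ip, cpu_peak.get(ip), cpu_average.get(ip), mem_peak.get(ip), mem_average.get(ip))
            for ip in cpu_peak]
    try:
        with conn.cursor() as cur:
            cur.executemany(sql, rows)
        conn.commit()
    finally:
        conn.close()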
That is all of the content of the article "how Python calls Prometheus to monitor data and calculate". Thank you for reading! Hopefully sharing this content has been helpful; to learn more, you are welcome to follow the industry information channel.