2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
In this article, the editor shares how to implement ping monitoring between Kubernetes nodes. I hope you will get something out of it. Let's take a look together.
Scripts and configuration
The main component of our solution is a script that watches the .status.addresses value of every node. If this value changes for any node (for example, when a new node is added), the script passes the updated node list to the Helm chart as a value, and the chart renders it into a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ping-exporter-config
  namespace: d8-system
data:
  targets.json: >
    {{ .Values.pingExporter.targets | toJson }}

The value of .Values.pingExporter.targets looks similar to this:

{
  "cluster_targets": [
    {"ipAddress": "192.168.191.11", "name": "kube-a-3"},
    {"ipAddress": "192.168.191.12", "name": "kube-a-2"},
    {"ipAddress": "192.168.191.22", "name": "kube-a-1"},
    {"ipAddress": "192.168.191.23", "name": "kube-db-1"},
    {"ipAddress": "192.168.191.9", "name": "kube-db-2"},
    {"ipAddress": "51.75.130.47", "name": "kube-a-4"}
  ],
  "external_targets": [
    {"host": "8.8.8.8", "name": "google-dns"},
    {"host": "youtube.com"}
  ]
}
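The watcher that produces this node list is not shown in the article. Purely as an illustration, here is a minimal sketch, with made-up node data mimicking the shape of .status.addresses, of how such a list can be reduced to the cluster_targets entries above (the helper name to_cluster_targets is ours, not the article's):

```python
import json

# Hypothetical sample of what the Kubernetes API reports in .status.addresses
# for each node; in the real setup the watcher obtains this from the API server.
nodes = [
    {"name": "kube-a-3",
     "addresses": [{"type": "InternalIP", "address": "192.168.191.11"},
                   {"type": "Hostname", "address": "kube-a-3"}]},
    {"name": "kube-a-2",
     "addresses": [{"type": "InternalIP", "address": "192.168.191.12"}]},
]

def to_cluster_targets(nodes):
    """Keep only each node's InternalIP, in the shape targets.json expects."""
    targets = []
    for node in nodes:
        for addr in node["addresses"]:
            if addr["type"] == "InternalIP":
                targets.append({"ipAddress": addr["address"], "name": node["name"]})
    return targets

print(json.dumps({"cluster_targets": to_cluster_targets(nodes)}))
```

The only transformation is filtering for the InternalIP address type and renaming the fields; everything else in the ConfigMap is rendered verbatim by Helm.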
Here is the Python script:
#!/usr/bin/env python3

import subprocess
import prometheus_client
import re
import statistics
import os
import json
import glob
import better_exchook
import datetime

better_exchook.install()

FPING_CMDLINE = "/usr/sbin/fping -p 1000 -C 30 -B 1 -q -r 1".split(" ")
FPING_REGEX = re.compile(r"^(\S*)\s+: (.*)$", re.MULTILINE)
CONFIG_PATH = "/config/targets.json"

registry = prometheus_client.CollectorRegistry()

prometheus_exceptions_counter = \
    prometheus_client.Counter('kube_node_ping_exceptions', 'Total number of exceptions', [], registry=registry)

prom_metrics_cluster = {
    "sent": prometheus_client.Counter('kube_node_ping_packets_sent_total',
                                      'ICMP packets sent',
                                      ['destination_node', 'destination_node_ip_address'],
                                      registry=registry),
    "received": prometheus_client.Counter('kube_node_ping_packets_received_total',
                                          'ICMP packets received',
                                          ['destination_node', 'destination_node_ip_address'],
                                          registry=registry),
    "rtt": prometheus_client.Counter('kube_node_ping_rtt_milliseconds_total',
                                     'round-trip time',
                                     ['destination_node', 'destination_node_ip_address'],
                                     registry=registry),
    "min": prometheus_client.Gauge('kube_node_ping_rtt_min', 'minimum round-trip time',
                                   ['destination_node', 'destination_node_ip_address'],
                                   registry=registry),
    "max": prometheus_client.Gauge('kube_node_ping_rtt_max', 'maximum round-trip time',
                                   ['destination_node', 'destination_node_ip_address'],
                                   registry=registry),
    "mdev": prometheus_client.Gauge('kube_node_ping_rtt_mdev',
                                    'mean deviation of round-trip times',
                                    ['destination_node', 'destination_node_ip_address'],
                                    registry=registry)}

prom_metrics_external = {
    "sent": prometheus_client.Counter('external_ping_packets_sent_total',
                                      'ICMP packets sent',
                                      ['destination_name', 'destination_host'],
                                      registry=registry),
    "received": prometheus_client.Counter('external_ping_packets_received_total',
                                          'ICMP packets received',
                                          ['destination_name', 'destination_host'],
                                          registry=registry),
    "rtt": prometheus_client.Counter('external_ping_rtt_milliseconds_total',
                                     'round-trip time',
                                     ['destination_name', 'destination_host'],
                                     registry=registry),
    "min": prometheus_client.Gauge('external_ping_rtt_min', 'minimum round-trip time',
                                   ['destination_name', 'destination_host'],
                                   registry=registry),
    "max": prometheus_client.Gauge('external_ping_rtt_max', 'maximum round-trip time',
                                   ['destination_name', 'destination_host'],
                                   registry=registry),
    "mdev": prometheus_client.Gauge('external_ping_rtt_mdev',
                                    'mean deviation of round-trip times',
                                    ['destination_name', 'destination_host'],
                                    registry=registry)}

def validate_envs():
    envs = {"MY_NODE_NAME": os.getenv("MY_NODE_NAME"),
            "PROMETHEUS_TEXTFILE_DIR": os.getenv("PROMETHEUS_TEXTFILE_DIR"),
            "PROMETHEUS_TEXTFILE_PREFIX": os.getenv("PROMETHEUS_TEXTFILE_PREFIX")}

    for k, v in envs.items():
        if not v:
            raise ValueError("{} environment variable is empty".format(k))

    return envs

@prometheus_exceptions_counter.count_exceptions()
def compute_results(results):
    computed = {}

    matches = FPING_REGEX.finditer(results)
    for match in matches:
        host = match.group(1)
        ping_results = match.group(2)
        if "duplicate" in ping_results:
            continue
        splitted = ping_results.split(" ")
        if len(splitted) != 30:
            raise ValueError("ping returned wrong number of results: \"{}\"".format(splitted))

        positive_results = [float(x) for x in splitted if x != "-"]
        if len(positive_results) > 0:
            computed[host] = {"sent": 30, "received": len(positive_results),
                              "rtt": sum(positive_results),
                              "max": max(positive_results), "min": min(positive_results),
                              "mdev": statistics.pstdev(positive_results)}
        else:
            computed[host] = {"sent": 30, "received": len(positive_results), "rtt": 0,
                              "max": 0, "min": 0, "mdev": 0}
    if not len(computed):
        raise ValueError("regex match \"{}\" found nothing in fping output \"{}\"".format(FPING_REGEX, results))
    return computed

@prometheus_exceptions_counter.count_exceptions()
def call_fping(ips):
    cmdline = FPING_CMDLINE + ips
    process = subprocess.run(cmdline, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT, universal_newlines=True)
    if process.returncode == 3:
        raise ValueError("invalid arguments: {}".format(cmdline))
    if process.returncode == 4:
        raise OSError("fping reported syscall error: {}".format(process.stderr))

    return process.stdout

envs = validate_envs()

files = glob.glob(envs["PROMETHEUS_TEXTFILE_DIR"] + "*")
for f in files:
    os.remove(f)

labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}

while True:
    with open(CONFIG_PATH, "r") as f:
        config = json.loads(f.read())
        config["external_targets"] = [] if config["external_targets"] is None else config["external_targets"]
        for target in config["external_targets"]:
            target["name"] = target["host"] if "name" not in target.keys() else target["name"]

    if labeled_prom_metrics["cluster_targets"]:
        for metric in labeled_prom_metrics["cluster_targets"]:
            if (metric["node_name"], metric["ip"]) not in [(node["name"], node["ipAddress"]) for node in config['cluster_targets']]:
                for k, v in prom_metrics_cluster.items():
                    v.remove(metric["node_name"], metric["ip"])

    if labeled_prom_metrics["external_targets"]:
        for metric in labeled_prom_metrics["external_targets"]:
            if (metric["target_name"], metric["host"]) not in [(target["name"], target["host"]) for target in config['external_targets']]:
                for k, v in prom_metrics_external.items():
                    v.remove(metric["target_name"], metric["host"])

    labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}

    for node in config["cluster_targets"]:
        metrics = {"node_name": node["name"], "ip": node["ipAddress"], "prom_metrics": {}}
        for k, v in prom_metrics_cluster.items():
            metrics["prom_metrics"][k] = v.labels(node["name"], node["ipAddress"])
        labeled_prom_metrics["cluster_targets"].append(metrics)

    for target in config["external_targets"]:
        metrics = {"target_name": target["name"], "host": target["host"], "prom_metrics": {}}
        for k, v in prom_metrics_external.items():
            metrics["prom_metrics"][k] = v.labels(target["name"], target["host"])
        labeled_prom_metrics["external_targets"].append(metrics)

    out = call_fping([prom_metric["ip"] for prom_metric in labeled_prom_metrics["cluster_targets"]] + \
                     [prom_metric["host"] for prom_metric in labeled_prom_metrics["external_targets"]])
    computed = compute_results(out)

    for dimension in labeled_prom_metrics["cluster_targets"]:
        result = computed[dimension["ip"]]
        dimension["prom_metrics"]["sent"].inc(result["sent"])
        dimension["prom_metrics"]["received"].inc(result["received"])
        dimension["prom_metrics"]["rtt"].inc(result["rtt"])
        dimension["prom_metrics"]["min"].set(result["min"])
        dimension["prom_metrics"]["max"].set(result["max"])
        dimension["prom_metrics"]["mdev"].set(result["mdev"])

    for dimension in labeled_prom_metrics["external_targets"]:
        result = computed[dimension["host"]]
        dimension["prom_metrics"]["sent"].inc(result["sent"])
        dimension["prom_metrics"]["received"].inc(result["received"])
        dimension["prom_metrics"]["rtt"].inc(result["rtt"])
        dimension["prom_metrics"]["min"].set(result["min"])
        dimension["prom_metrics"]["max"].set(result["max"])
        dimension["prom_metrics"]["mdev"].set(result["mdev"])

    prometheus_client.write_to_textfile(
        envs["PROMETHEUS_TEXTFILE_DIR"] + envs["PROMETHEUS_TEXTFILE_PREFIX"] + envs["MY_NODE_NAME"] + ".prom", registry)
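To make the parsing step concrete, here is a standalone sketch that runs the same kind of regex over synthetic fping-style output (the sample lines and the 4-probe series are made up for brevity; the real script expects 30 probes per series):

```python
import re
import statistics

FPING_REGEX = re.compile(r"^(\S*)\s+: (.*)$", re.MULTILINE)

# Synthetic output in the "host : rtt1 rtt2 ..." shape produced by fping -q -C;
# "-" marks a lost packet.
sample = ("192.168.191.11 : 0.51 0.48 - 0.55\n"
          "8.8.8.8        : 20.1 19.8 20.4 20.2\n")

computed = {}
for match in FPING_REGEX.finditer(sample):
    host = match.group(1)
    # split() (rather than split(" ")) tolerates the padded columns of this sample
    rtts = [float(x) for x in match.group(2).split() if x != "-"]
    computed[host] = {"sent": 4, "received": len(rtts),
                      "rtt": sum(rtts), "min": min(rtts),
                      "max": max(rtts), "mdev": statistics.pstdev(rtts)}

print(computed["192.168.191.11"]["received"])  # 3 of 4 probes answered
```

Lost packets simply drop out of the per-host RTT list, which is why the script tracks sent and received counters separately and computes loss from their difference.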
The script runs on each Kubernetes node and pings all the other members of the cluster: with the fping parameters above it sends a series of 30 ICMP packets, one per second, to every target. The aggregated results are stored in a text file.
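The text file itself uses the Prometheus exposition format. Here is a minimal sketch of what the output looks like and how such a file can be written safely (the metric sample values are made up, and the atomic-rename step is an illustrative precaution; the real script simply calls prometheus_client.write_to_textfile):

```python
import os
import tempfile

# Hypothetical sample in Prometheus exposition format, mirroring what
# prometheus_client.write_to_textfile emits for one labeled counter.
content = (
    "# HELP kube_node_ping_packets_sent_total ICMP packets sent\n"
    "# TYPE kube_node_ping_packets_sent_total counter\n"
    'kube_node_ping_packets_sent_total{destination_node="kube-a-3",'
    'destination_node_ip_address="192.168.191.11"} 30.0\n'
)

textfile_dir = tempfile.mkdtemp()
final_path = os.path.join(textfile_dir, "ping-exporter_kube-a-2.prom")

# Write to a temporary file first and rename it into place, so a collector
# scanning the directory never reads a half-written file.
tmp_path = final_path + ".tmp"
with open(tmp_path, "w") as f:
    f.write(content)
os.rename(tmp_path, final_path)
```

The file name combines the PROMETHEUS_TEXTFILE_PREFIX and MY_NODE_NAME environment variables, so each node's exporter writes a distinct .prom file.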
The script is included in the Docker image:
FROM python:3.6-alpine3.8
COPY rootfs /
WORKDIR /app
RUN pip3 install --upgrade pip && pip3 install -r requirements.txt && apk add --no-cache fping
ENTRYPOINT ["python3", "/app/ping-exporter.py"]
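The requirements.txt that the Dockerfile installs from is not shown in the article; judging by the script's imports, it would need at least the two non-standard packages below (exact version pins are left out here and would be an assumption):

```text
prometheus_client
better_exchook
```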
In addition, we created a ServiceAccount and a corresponding ClusterRole with the single permission needed to list nodes (so the script can learn their IP addresses):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ping-exporter
  namespace: d8-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: d8-system:ping-exporter
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: d8-system:kube-ping-exporter
subjects:
- kind: ServiceAccount
  name: ping-exporter
  namespace: d8-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: d8-system:ping-exporter
Finally, we need a DaemonSet to run the exporter on every node in the cluster:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ping-exporter
  namespace: d8-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      name: ping-exporter
  template:
    metadata:
      labels:
        name: ping-exporter
    spec:
      terminationGracePeriodSeconds: 0
      tolerations:
      - operator: "Exists"
      hostNetwork: true
      serviceAccountName: ping-exporter
      priorityClassName: cluster-low
      containers:
      - image: private-registry.flant.com/ping-exporter/ping-exporter:v1
        name: ping-exporter
        env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: PROMETHEUS_TEXTFILE_DIR
          value: /node-exporter-textfile/
        - name: PROMETHEUS_TEXTFILE_PREFIX
          value: ping-exporter_
        volumeMounts:
        - name: textfile
          mountPath: /node-exporter-textfile
        - name: config
          mountPath: /config
      volumes:
      - name: textfile
        hostPath:
          path: /var/run/node-exporter-textfile
      - name: config
        configMap:
          name: ping-exporter-config
      imagePullSecrets:
      - name: private-registry
The remaining operational details of the solution are as follows:

When the Python script runs, its results (that is, the text files stored in the /var/run/node-exporter-textfile directory on the host) are picked up by node-exporter, which also runs as a DaemonSet.

node-exporter is started with the --collector.textfile.directory /host/textfile flag, where /host/textfile is the mount point of the hostPath directory /var/run/node-exporter-textfile. (See the node-exporter documentation for more details on the textfile collector.)

node-exporter then reads these files, and Prometheus scrapes all the data from the node-exporter instances.
So how did it turn out?
Now it's time to enjoy the long-awaited results. After the metrics are created, we can use them and, of course, visualize them. You can see what they look like below.
First, there is a universal selector that lets us choose the nodes whose "source" and "destination" connections we want to examine. The Grafana dashboard then shows a summary table with the ping results for the selected nodes over the specified period:
The following is a graph that contains combined statistics about the selected nodes:
In addition, there is a list of rows, each of which links to a graph for the specific node selected as the source node:

If you expand a row, you will see detailed ping statistics from that node to all the other nodes selected as destination nodes:

Here are the related graphs:

What does the graph look like when there is a ping problem between nodes?
If you observe a similar situation in real life, it's time to troubleshoot!
Finally, here is the visualization of our pings to external hosts:

We can check the overall view across all nodes, or look at the graph of any particular node:
After reading this article, you should have a good understanding of how to implement ping monitoring between Kubernetes nodes. Thank you for reading!