How to realize ping Monitoring between Kubernetes nodes

2025-01-17 Update From: SLTechnology News&Howtos


In this article, we share how to implement ping monitoring between Kubernetes nodes. We hope you will find it useful.

Scripts and configuration

The main component of our solution is a script that watches the .status.addresses value of every node. Whenever that value changes (for example, when a new node is added), the script passes the updated list of nodes to the Helm chart as a Helm value, and the chart renders it into a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ping-exporter-config
  namespace: d8-system
data:
  targets.json: >
    {{ .Values.pingExporter.targets | toJson }}

The .Values.pingExporter.targets value is similar to the following:

{
  "cluster_targets": [
    {"ipAddress": "192.168.191.11", "name": "kube-a-3"},
    {"ipAddress": "192.168.191.12", "name": "kube-a-2"},
    {"ipAddress": "192.168.191.22", "name": "kube-a-1"},
    {"ipAddress": "192.168.191.23", "name": "kube-db-1"},
    {"ipAddress": "192.168.191.9", "name": "kube-db-2"},
    {"ipAddress": "51.75.130.47", "name": "kube-a-4"}
  ],
  "external_targets": [
    {"host": "8.8.8.8", "name": "google-dns"},
    {"host": "youtube.com"}
  ]
}
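The node-watching part of the pipeline is not shown in the article, but the transformation it performs can be sketched as follows. This is a minimal illustration with made-up Node objects; the real script would read them from the Kubernetes API, so treat the data below as hypothetical:

```python
# Made-up Node objects in the shape the Kubernetes API returns
# (only the fields relevant here are included).
nodes = [
    {"metadata": {"name": "kube-a-3"},
     "status": {"addresses": [{"type": "InternalIP", "address": "192.168.191.11"},
                              {"type": "Hostname", "address": "kube-a-3"}]}},
    {"metadata": {"name": "kube-db-1"},
     "status": {"addresses": [{"type": "InternalIP", "address": "192.168.191.23"}]}},
]

# Build the cluster_targets list in the format the ConfigMap expects:
# one {"name", "ipAddress"} entry per node, taken from .status.addresses.
cluster_targets = [
    {"name": n["metadata"]["name"],
     "ipAddress": next(a["address"] for a in n["status"]["addresses"]
                       if a["type"] == "InternalIP")}
    for n in nodes
]
print(cluster_targets)
```

Whenever the resulting list differs from the previous one, the script would hand it to Helm, which re-renders the ConfigMap above.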

Here is the Python script:

#!/usr/bin/env python3

import subprocess
import prometheus_client
import re
import statistics
import os
import json
import glob
import better_exchook
import datetime

better_exchook.install()

FPING_CMDLINE = "/usr/sbin/fping -p 1000 -C 30 -B 1 -q -r 1".split(" ")
FPING_REGEX = re.compile(r"^(\S*)\s+: (.*)$", re.MULTILINE)
CONFIG_PATH = "/config/targets.json"

registry = prometheus_client.CollectorRegistry()

prometheus_exceptions_counter = \
    prometheus_client.Counter('kube_node_ping_exceptions', 'Total number of exceptions', [], registry=registry)

prom_metrics_cluster = {
    "sent": prometheus_client.Counter('kube_node_ping_packets_sent_total',
                                      'ICMP packets sent',
                                      ['destination_node', 'destination_node_ip_address'],
                                      registry=registry),
    "received": prometheus_client.Counter('kube_node_ping_packets_received_total',
                                          'ICMP packets received',
                                          ['destination_node', 'destination_node_ip_address'],
                                          registry=registry),
    "rtt": prometheus_client.Counter('kube_node_ping_rtt_milliseconds_total',
                                     'round-trip time',
                                     ['destination_node', 'destination_node_ip_address'],
                                     registry=registry),
    "min": prometheus_client.Gauge('kube_node_ping_rtt_min', 'minimum round-trip time',
                                   ['destination_node', 'destination_node_ip_address'],
                                   registry=registry),
    "max": prometheus_client.Gauge('kube_node_ping_rtt_max', 'maximum round-trip time',
                                   ['destination_node', 'destination_node_ip_address'],
                                   registry=registry),
    "mdev": prometheus_client.Gauge('kube_node_ping_rtt_mdev',
                                    'mean deviation of round-trip times',
                                    ['destination_node', 'destination_node_ip_address'],
                                    registry=registry)}

prom_metrics_external = {
    "sent": prometheus_client.Counter('external_ping_packets_sent_total',
                                      'ICMP packets sent',
                                      ['destination_name', 'destination_host'],
                                      registry=registry),
    "received": prometheus_client.Counter('external_ping_packets_received_total',
                                          'ICMP packets received',
                                          ['destination_name', 'destination_host'],
                                          registry=registry),
    "rtt": prometheus_client.Counter('external_ping_rtt_milliseconds_total',
                                     'round-trip time',
                                     ['destination_name', 'destination_host'],
                                     registry=registry),
    "min": prometheus_client.Gauge('external_ping_rtt_min', 'minimum round-trip time',
                                   ['destination_name', 'destination_host'],
                                   registry=registry),
    "max": prometheus_client.Gauge('external_ping_rtt_max', 'maximum round-trip time',
                                   ['destination_name', 'destination_host'],
                                   registry=registry),
    "mdev": prometheus_client.Gauge('external_ping_rtt_mdev',
                                    'mean deviation of round-trip times',
                                    ['destination_name', 'destination_host'],
                                    registry=registry)}


def validate_envs():
    envs = {"MY_NODE_NAME": os.getenv("MY_NODE_NAME"),
            "PROMETHEUS_TEXTFILE_DIR": os.getenv("PROMETHEUS_TEXTFILE_DIR"),
            "PROMETHEUS_TEXTFILE_PREFIX": os.getenv("PROMETHEUS_TEXTFILE_PREFIX")}

    for k, v in envs.items():
        if not v:
            raise ValueError("{} environment variable is empty".format(k))

    return envs


@prometheus_exceptions_counter.count_exceptions()
def compute_results(results):
    computed = {}

    matches = FPING_REGEX.finditer(results)
    for match in matches:
        host = match.group(1)
        ping_results = match.group(2)
        if "duplicate" in ping_results:
            continue
        splitted = ping_results.split(" ")
        if len(splitted) != 30:
            raise ValueError("ping returned wrong number of results: \"{}\"".format(splitted))

        positive_results = [float(x) for x in splitted if x != "-"]
        if len(positive_results) > 0:
            computed[host] = {"sent": 30, "received": len(positive_results),
                              "rtt": sum(positive_results),
                              "max": max(positive_results), "min": min(positive_results),
                              "mdev": statistics.pstdev(positive_results)}
        else:
            computed[host] = {"sent": 30, "received": len(positive_results), "rtt": 0,
                              "max": 0, "min": 0, "mdev": 0}
    if not len(computed):
        raise ValueError("regex match \"{}\" found nothing in fping output \"{}\"".format(FPING_REGEX, results))
    return computed


@prometheus_exceptions_counter.count_exceptions()
def call_fping(ips):
    cmdline = FPING_CMDLINE + ips
    process = subprocess.run(cmdline, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT, universal_newlines=True)
    if process.returncode == 3:
        raise ValueError("invalid arguments: {}".format(cmdline))
    if process.returncode == 4:
        raise OSError("fping reported syscall error: {}".format(process.stderr))
    return process.stdout


envs = validate_envs()

files = glob.glob(envs["PROMETHEUS_TEXTFILE_DIR"] + "*")
for f in files:
    os.remove(f)

labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}

while True:
    with open(CONFIG_PATH, "r") as f:
        config = json.loads(f.read())
        config["external_targets"] = [] if config["external_targets"] is None else config["external_targets"]
        for target in config["external_targets"]:
            target["name"] = target["host"] if "name" not in target.keys() else target["name"]

    if labeled_prom_metrics["cluster_targets"]:
        for metric in labeled_prom_metrics["cluster_targets"]:
            if (metric["node_name"], metric["ip"]) not in [(node["name"], node["ipAddress"]) for node in config["cluster_targets"]]:
                for k, v in prom_metrics_cluster.items():
                    v.remove(metric["node_name"], metric["ip"])

    if labeled_prom_metrics["external_targets"]:
        for metric in labeled_prom_metrics["external_targets"]:
            if (metric["target_name"], metric["host"]) not in [(target["name"], target["host"]) for target in config["external_targets"]]:
                for k, v in prom_metrics_external.items():
                    v.remove(metric["target_name"], metric["host"])

    labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}

    for node in config["cluster_targets"]:
        metrics = {"node_name": node["name"], "ip": node["ipAddress"], "prom_metrics": {}}
        for k, v in prom_metrics_cluster.items():
            metrics["prom_metrics"][k] = v.labels(node["name"], node["ipAddress"])
        labeled_prom_metrics["cluster_targets"].append(metrics)

    for target in config["external_targets"]:
        metrics = {"target_name": target["name"], "host": target["host"], "prom_metrics": {}}
        for k, v in prom_metrics_external.items():
            metrics["prom_metrics"][k] = v.labels(target["name"], target["host"])
        labeled_prom_metrics["external_targets"].append(metrics)

    out = call_fping([prom_metric["ip"] for prom_metric in labeled_prom_metrics["cluster_targets"]] +
                     [prom_metric["host"] for prom_metric in labeled_prom_metrics["external_targets"]])
    computed = compute_results(out)

    for dimension in labeled_prom_metrics["cluster_targets"]:
        result = computed[dimension["ip"]]
        dimension["prom_metrics"]["sent"].inc(result["sent"])
        dimension["prom_metrics"]["received"].inc(result["received"])
        dimension["prom_metrics"]["rtt"].inc(result["rtt"])
        dimension["prom_metrics"]["min"].set(result["min"])
        dimension["prom_metrics"]["max"].set(result["max"])
        dimension["prom_metrics"]["mdev"].set(result["mdev"])

    for dimension in labeled_prom_metrics["external_targets"]:
        result = computed[dimension["host"]]
        dimension["prom_metrics"]["sent"].inc(result["sent"])
        dimension["prom_metrics"]["received"].inc(result["received"])
        dimension["prom_metrics"]["rtt"].inc(result["rtt"])
        dimension["prom_metrics"]["min"].set(result["min"])
        dimension["prom_metrics"]["max"].set(result["max"])
        dimension["prom_metrics"]["mdev"].set(result["mdev"])

    prometheus_client.write_to_textfile(
        envs["PROMETHEUS_TEXTFILE_DIR"] + envs["PROMETHEUS_TEXTFILE_PREFIX"] + envs["MY_NODE_NAME"] + ".prom", registry)

The script runs on every Kubernetes node and sends ICMP packets to all the other instances of the Kubernetes cluster once per second per target (the fping command above uses a 1000 ms period and 30 packets per cycle). The collected results are stored in a text file.
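To make the parsing concrete, here is a minimal sketch of the per-host summary computation, assuming fping's -C summary format in which each line reads "host : rtt1 rtt2 …" and "-" marks a lost packet. The sample output below is made up for illustration:

```python
import re
import statistics

# Hypothetical fping -C summary output: one line per target,
# "-" in place of an RTT means the packet was lost.
SAMPLE = "192.168.191.11 : 0.84 0.91 - 0.88\n8.8.8.8 : 12.1 11.9 12.4 -"

# Same shape of regex as in the script: host, then the list of RTTs.
FPING_REGEX = re.compile(r"^(\S*)\s+: (.*)$", re.MULTILINE)

stats = {}
for match in FPING_REGEX.finditer(SAMPLE):
    host, results = match.group(1), match.group(2)
    rtts = [float(x) for x in results.split(" ") if x != "-"]  # drop lost packets
    stats[host] = {
        "sent": len(results.split(" ")),   # total probes in this cycle
        "received": len(rtts),             # probes that got a reply
        "rtt": sum(rtts),                  # accumulated as a Counter upstream
        "min": min(rtts),
        "max": max(rtts),
        "mdev": statistics.pstdev(rtts),   # population std. dev. of RTTs
    }

print(stats["8.8.8.8"]["received"])  # 3 of 4 packets answered
```

The real script additionally validates that exactly 30 results came back and skips lines containing "duplicate".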

The script is included in the Docker image:

FROM python:3.6-alpine3.8
COPY rootfs /
WORKDIR /app
RUN pip3 install --upgrade pip && pip3 install -r requirements.txt && apk add --no-cache fping
ENTRYPOINT ["python3", "/app/ping-exporter.py"]

In addition, we created a ServiceAccount and a corresponding ClusterRole whose only permission is to list nodes (so the script can learn their IP addresses):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: ping-exporter
  namespace: d8-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: d8-system:ping-exporter
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: d8-system:kube-ping-exporter
subjects:
- kind: ServiceAccount
  name: ping-exporter
  namespace: d8-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: d8-system:ping-exporter

Finally, we need a DaemonSet to run the exporter on every node in the cluster:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ping-exporter
  namespace: d8-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      name: ping-exporter
  template:
    metadata:
      labels:
        name: ping-exporter
    spec:
      terminationGracePeriodSeconds: 0
      tolerations:
      - operator: "Exists"
      hostNetwork: true
      serviceAccountName: ping-exporter
      priorityClassName: cluster-low
      containers:
      - image: private-registry.flant.com/ping-exporter/ping-exporter:v1
        name: ping-exporter
        env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: PROMETHEUS_TEXTFILE_DIR
          value: /node-exporter-textfile/
        - name: PROMETHEUS_TEXTFILE_PREFIX
          value: ping-exporter_
        volumeMounts:
        - name: textfile
          mountPath: /node-exporter-textfile
        - name: config
          mountPath: /config
      volumes:
      - name: textfile
        hostPath:
          path: /var/run/node-exporter-textfile
      - name: config
        configMap:
          name: ping-exporter-config
      imagePullSecrets:
      - name: private-registry

The final operational details of the solution are:

When the Python script runs, its results (that is, the text files stored in the /var/run/node-exporter-textfile directory on the host) are picked up by node-exporter, which also runs as a DaemonSet.

node-exporter is started with the --collector.textfile.directory /host/textfile flag, where /host/textfile is a hostPath mount of /var/run/node-exporter-textfile. (See the node-exporter documentation to learn more about its textfile collector.)

Finally, node-exporter reads these files, and Prometheus scrapes all the data from the node-exporter instances.
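The .prom files the script writes are in the Prometheus text exposition format, which the textfile collector reads verbatim. A minimal sketch of how one such line is composed (metric name, label set, value; the label values here are illustrative):

```python
# Compose one sample line of the Prometheus text exposition format,
# as it would appear in the generated .prom textfile.
metric = "kube_node_ping_rtt_min"
labels = {"destination_node": "kube-a-3",
          "destination_node_ip_address": "192.168.191.11"}
value = 0.84

# name{label="value",...} value
label_str = ",".join('{}="{}"'.format(k, v) for k, v in sorted(labels.items()))
line = "{}{{{}}} {}".format(metric, label_str, value)
print(line)
# kube_node_ping_rtt_min{destination_node="kube-a-3",destination_node_ip_address="192.168.191.11"} 0.84
```

In practice, prometheus_client.write_to_textfile produces these lines (plus # HELP and # TYPE comments) for every labeled metric in the registry.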

So how did it turn out?

Now it's time to enjoy the long-awaited results. Once the metrics are collected, we can use them and, of course, visualize them.

First, there is a universal selector that lets us choose the nodes to check as the "source" and "destination" of connections. The Grafana dashboard shows a summary table with ping results between the selected nodes over a specified period of time:

The following is a graph that contains combined statistics about the selected nodes:

In addition, we have a list of records, each linked to a graph for a specific node selected as the Source node:

If you expand a record, you will see detailed ping statistics from that node to all the other nodes selected as destination nodes:

Here are the related graphs:

What does the graph look like when there is a ping problem between nodes?

If you observe a similar situation in real life, it's time to troubleshoot!

Finally, here is the visualization of our pings to external hosts:

We can check the overall view across all nodes, or drill into the graph of any particular node:

After reading this article, we believe you have a good understanding of how to implement ping monitoring between Kubernetes nodes. Thank you for reading!
