2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
In this article, the editor shares how to implement ping monitoring between Kubernetes nodes. I hope you will get something out of it. Let's take a look together.
Scripts and configuration
The main component of our solution is a script that watches the .status.addresses value of every node. If this value changes for any node (for example, when a new node is added), the script passes the updated node list to the Helm chart as a value, and the chart renders it into a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ping-exporter-config
  namespace: d8-system
data:
  targets.json: >
    {{ .Values.pingExporter.targets | toJson }}

The value of .Values.pingExporter.targets looks similar to this:

{
  "cluster_targets": [
    {"ipAddress": "192.168.191.11", "name": "kube-a-3"},
    {"ipAddress": "192.168.191.12", "name": "kube-a-2"},
    {"ipAddress": "192.168.191.22", "name": "kube-a-1"},
    {"ipAddress": "192.168.191.23", "name": "kube-db-1"},
    {"ipAddress": "192.168.191.9", "name": "kube-db-2"},
    {"ipAddress": "51.75.130.47", "name": "kube-a-4"}
  ],
  "external_targets": [
    {"host": "8.8.8.8", "name": "google-dns"},
    {"host": "youtube.com"}
  ]
}
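The watcher that produces this node list is not shown in the article. Purely as an illustration, here is a minimal sketch, with made-up node data mimicking the shape of .status.addresses, of how such a list can be reduced to the cluster_targets entries above (the helper name to_cluster_targets is ours, not the article's):

```python
import json

# Hypothetical sample of what the Kubernetes API reports in .status.addresses
# for each node; in the real setup the watcher obtains this from the API server.
nodes = [
    {"name": "kube-a-3",
     "addresses": [{"type": "InternalIP", "address": "192.168.191.11"},
                   {"type": "Hostname", "address": "kube-a-3"}]},
    {"name": "kube-a-2",
     "addresses": [{"type": "InternalIP", "address": "192.168.191.12"}]},
]

def to_cluster_targets(nodes):
    """Keep only each node's InternalIP, in the shape targets.json expects."""
    targets = []
    for node in nodes:
        for addr in node["addresses"]:
            if addr["type"] == "InternalIP":
                targets.append({"ipAddress": addr["address"], "name": node["name"]})
    return targets

print(json.dumps({"cluster_targets": to_cluster_targets(nodes)}))
```

The only transformation is filtering for the InternalIP address type and renaming the fields; everything else in the ConfigMap is rendered verbatim by Helm.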
Here is the Python script:
#!/usr/bin/env python3

import subprocess
import prometheus_client
import re
import statistics
import os
import json
import glob
import better_exchook
import datetime

better_exchook.install()

FPING_CMDLINE = "/usr/sbin/fping -p 1000 -C 30 -B 1 -q -r 1".split(" ")
FPING_REGEX = re.compile(r"^(\S*)\s+: (.*)$", re.MULTILINE)
CONFIG_PATH = "/config/targets.json"

registry = prometheus_client.CollectorRegistry()

prometheus_exceptions_counter = \
    prometheus_client.Counter('kube_node_ping_exceptions', 'Total number of exceptions', [], registry=registry)

prom_metrics_cluster = {
    "sent": prometheus_client.Counter('kube_node_ping_packets_sent_total',
                                      'ICMP packets sent',
                                      ['destination_node', 'destination_node_ip_address'],
                                      registry=registry),
    "received": prometheus_client.Counter('kube_node_ping_packets_received_total',
                                          'ICMP packets received',
                                          ['destination_node', 'destination_node_ip_address'],
                                          registry=registry),
    "rtt": prometheus_client.Counter('kube_node_ping_rtt_milliseconds_total',
                                     'round-trip time',
                                     ['destination_node', 'destination_node_ip_address'],
                                     registry=registry),
    "min": prometheus_client.Gauge('kube_node_ping_rtt_min', 'minimum round-trip time',
                                   ['destination_node', 'destination_node_ip_address'],
                                   registry=registry),
    "max": prometheus_client.Gauge('kube_node_ping_rtt_max', 'maximum round-trip time',
                                   ['destination_node', 'destination_node_ip_address'],
                                   registry=registry),
    "mdev": prometheus_client.Gauge('kube_node_ping_rtt_mdev',
                                    'mean deviation of round-trip times',
                                    ['destination_node', 'destination_node_ip_address'],
                                    registry=registry)}

prom_metrics_external = {
    "sent": prometheus_client.Counter('external_ping_packets_sent_total',
                                      'ICMP packets sent',
                                      ['destination_name', 'destination_host'],
                                      registry=registry),
    "received": prometheus_client.Counter('external_ping_packets_received_total',
                                          'ICMP packets received',
                                          ['destination_name', 'destination_host'],
                                          registry=registry),
    "rtt": prometheus_client.Counter('external_ping_rtt_milliseconds_total',
                                     'round-trip time',
                                     ['destination_name', 'destination_host'],
                                     registry=registry),
    "min": prometheus_client.Gauge('external_ping_rtt_min', 'minimum round-trip time',
                                   ['destination_name', 'destination_host'],
                                   registry=registry),
    "max": prometheus_client.Gauge('external_ping_rtt_max', 'maximum round-trip time',
                                   ['destination_name', 'destination_host'],
                                   registry=registry),
    "mdev": prometheus_client.Gauge('external_ping_rtt_mdev',
                                    'mean deviation of round-trip times',
                                    ['destination_name', 'destination_host'],
                                    registry=registry)}

def validate_envs():
    envs = {"MY_NODE_NAME": os.getenv("MY_NODE_NAME"),
            "PROMETHEUS_TEXTFILE_DIR": os.getenv("PROMETHEUS_TEXTFILE_DIR"),
            "PROMETHEUS_TEXTFILE_PREFIX": os.getenv("PROMETHEUS_TEXTFILE_PREFIX")}

    for k, v in envs.items():
        if not v:
            raise ValueError("{} environment variable is empty".format(k))

    return envs

@prometheus_exceptions_counter.count_exceptions()
def compute_results(results):
    computed = {}

    matches = FPING_REGEX.finditer(results)
    for match in matches:
        host = match.group(1)
        ping_results = match.group(2)
        if "duplicate" in ping_results:
            continue
        splitted = ping_results.split(" ")
        if len(splitted) != 30:
            raise ValueError("ping returned wrong number of results: \"{}\"".format(splitted))

        positive_results = [float(x) for x in splitted if x != "-"]
        if len(positive_results) > 0:
            computed[host] = {"sent": 30, "received": len(positive_results),
                              "rtt": sum(positive_results),
                              "max": max(positive_results), "min": min(positive_results),
                              "mdev": statistics.pstdev(positive_results)}
        else:
            computed[host] = {"sent": 30, "received": len(positive_results), "rtt": 0,
                              "max": 0, "min": 0, "mdev": 0}
    if not len(computed):
        raise ValueError("regex match \"{}\" found nothing in fping output \"{}\"".format(FPING_REGEX, results))
    return computed

@prometheus_exceptions_counter.count_exceptions()
def call_fping(ips):
    cmdline = FPING_CMDLINE + ips
    process = subprocess.run(cmdline, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT, universal_newlines=True)
    if process.returncode == 3:
        raise ValueError("invalid arguments: {}".format(cmdline))
    if process.returncode == 4:
        raise OSError("fping reported syscall error: {}".format(process.stderr))

    return process.stdout

envs = validate_envs()

files = glob.glob(envs["PROMETHEUS_TEXTFILE_DIR"] + "*")
for f in files:
    os.remove(f)

labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}

while True:
    with open(CONFIG_PATH, "r") as f:
        config = json.loads(f.read())
        config["external_targets"] = [] if config["external_targets"] is None else config["external_targets"]
        for target in config["external_targets"]:
            target["name"] = target["host"] if "name" not in target.keys() else target["name"]

    if labeled_prom_metrics["cluster_targets"]:
        for metric in labeled_prom_metrics["cluster_targets"]:
            if (metric["node_name"], metric["ip"]) not in [(node["name"], node["ipAddress"]) for node in config['cluster_targets']]:
                for k, v in prom_metrics_cluster.items():
                    v.remove(metric["node_name"], metric["ip"])

    if labeled_prom_metrics["external_targets"]:
        for metric in labeled_prom_metrics["external_targets"]:
            if (metric["target_name"], metric["host"]) not in [(target["name"], target["host"]) for target in config['external_targets']]:
                for k, v in prom_metrics_external.items():
                    v.remove(metric["target_name"], metric["host"])

    labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}

    for node in config["cluster_targets"]:
        metrics = {"node_name": node["name"], "ip": node["ipAddress"], "prom_metrics": {}}
        for k, v in prom_metrics_cluster.items():
            metrics["prom_metrics"][k] = v.labels(node["name"], node["ipAddress"])
        labeled_prom_metrics["cluster_targets"].append(metrics)

    for target in config["external_targets"]:
        metrics = {"target_name": target["name"], "host": target["host"], "prom_metrics": {}}
        for k, v in prom_metrics_external.items():
            metrics["prom_metrics"][k] = v.labels(target["name"], target["host"])
        labeled_prom_metrics["external_targets"].append(metrics)

    out = call_fping([prom_metric["ip"] for prom_metric in labeled_prom_metrics["cluster_targets"]] + \
                     [prom_metric["host"] for prom_metric in labeled_prom_metrics["external_targets"]])
    computed = compute_results(out)

    for dimension in labeled_prom_metrics["cluster_targets"]:
        result = computed[dimension["ip"]]
        dimension["prom_metrics"]["sent"].inc(result["sent"])
        dimension["prom_metrics"]["received"].inc(result["received"])
        dimension["prom_metrics"]["rtt"].inc(result["rtt"])
        dimension["prom_metrics"]["min"].set(result["min"])
        dimension["prom_metrics"]["max"].set(result["max"])
        dimension["prom_metrics"]["mdev"].set(result["mdev"])

    for dimension in labeled_prom_metrics["external_targets"]:
        result = computed[dimension["host"]]
        dimension["prom_metrics"]["sent"].inc(result["sent"])
        dimension["prom_metrics"]["received"].inc(result["received"])
        dimension["prom_metrics"]["rtt"].inc(result["rtt"])
        dimension["prom_metrics"]["min"].set(result["min"])
        dimension["prom_metrics"]["max"].set(result["max"])
        dimension["prom_metrics"]["mdev"].set(result["mdev"])

    prometheus_client.write_to_textfile(
        envs["PROMETHEUS_TEXTFILE_DIR"] + envs["PROMETHEUS_TEXTFILE_PREFIX"] + envs["MY_NODE_NAME"] + ".prom", registry)
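To make the parsing step concrete, here is a standalone sketch that runs the same kind of regex over synthetic fping-style output (the sample lines and the 4-probe series are made up for brevity; the real script expects 30 probes per series):

```python
import re
import statistics

FPING_REGEX = re.compile(r"^(\S*)\s+: (.*)$", re.MULTILINE)

# Synthetic output in the "host : rtt1 rtt2 ..." shape produced by fping -q -C;
# "-" marks a lost packet.
sample = ("192.168.191.11 : 0.51 0.48 - 0.55\n"
          "8.8.8.8        : 20.1 19.8 20.4 20.2\n")

computed = {}
for match in FPING_REGEX.finditer(sample):
    host = match.group(1)
    # split() (rather than split(" ")) tolerates the padded columns of this sample
    rtts = [float(x) for x in match.group(2).split() if x != "-"]
    computed[host] = {"sent": 4, "received": len(rtts),
                      "rtt": sum(rtts), "min": min(rtts),
                      "max": max(rtts), "mdev": statistics.pstdev(rtts)}

print(computed["192.168.191.11"]["received"])  # 3 of 4 probes answered
```

Lost packets simply drop out of the per-host RTT list, which is why the script tracks sent and received counters separately and computes loss from their difference.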
The script runs on each Kubernetes node and pings all the other members of the cluster: with the fping parameters above it sends a series of 30 ICMP packets, one per second, to every target. The aggregated results are stored in a text file.
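The text file itself uses the Prometheus exposition format. Here is a minimal sketch of what the output looks like and how such a file can be written safely (the metric sample values are made up, and the atomic-rename step is an illustrative precaution; the real script simply calls prometheus_client.write_to_textfile):

```python
import os
import tempfile

# Hypothetical sample in Prometheus exposition format, mirroring what
# prometheus_client.write_to_textfile emits for one labeled counter.
content = (
    "# HELP kube_node_ping_packets_sent_total ICMP packets sent\n"
    "# TYPE kube_node_ping_packets_sent_total counter\n"
    'kube_node_ping_packets_sent_total{destination_node="kube-a-3",'
    'destination_node_ip_address="192.168.191.11"} 30.0\n'
)

textfile_dir = tempfile.mkdtemp()
final_path = os.path.join(textfile_dir, "ping-exporter_kube-a-2.prom")

# Write to a temporary file first and rename it into place, so a collector
# scanning the directory never reads a half-written file.
tmp_path = final_path + ".tmp"
with open(tmp_path, "w") as f:
    f.write(content)
os.rename(tmp_path, final_path)
```

The file name combines the PROMETHEUS_TEXTFILE_PREFIX and MY_NODE_NAME environment variables, so each node's exporter writes a distinct .prom file.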
The script is included in the Docker image:
FROM python:3.6-alpine3.8
COPY rootfs /
WORKDIR /app
RUN pip3 install --upgrade pip && pip3 install -r requirements.txt && apk add --no-cache fping
ENTRYPOINT ["python3", "/app/ping-exporter.py"]
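The requirements.txt that the Dockerfile installs from is not shown in the article; judging by the script's imports, it would need at least the two non-standard packages below (exact version pins are left out here and would be an assumption):

```text
prometheus_client
better_exchook
```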
In addition, we created a ServiceAccount and a corresponding ClusterRole with the single permission needed to list nodes (so the script can learn their IP addresses):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ping-exporter
  namespace: d8-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: d8-system:ping-exporter
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: d8-system:kube-ping-exporter
subjects:
- kind: ServiceAccount
  name: ping-exporter
  namespace: d8-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: d8-system:ping-exporter
Finally, we need a DaemonSet to run the exporter on every node in the cluster:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ping-exporter
  namespace: d8-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      name: ping-exporter
  template:
    metadata:
      labels:
        name: ping-exporter
    spec:
      terminationGracePeriodSeconds: 0
      tolerations:
      - operator: "Exists"
      hostNetwork: true
      serviceAccountName: ping-exporter
      priorityClassName: cluster-low
      containers:
      - image: private-registry.flant.com/ping-exporter/ping-exporter:v1
        name: ping-exporter
        env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: PROMETHEUS_TEXTFILE_DIR
          value: /node-exporter-textfile/
        - name: PROMETHEUS_TEXTFILE_PREFIX
          value: ping-exporter_
        volumeMounts:
        - name: textfile
          mountPath: /node-exporter-textfile
        - name: config
          mountPath: /config
      volumes:
      - name: textfile
        hostPath:
          path: /var/run/node-exporter-textfile
      - name: config
        configMap:
          name: ping-exporter-config
      imagePullSecrets:
      - name: private-registry
The remaining operational details of the solution are as follows:

When the Python script runs, its results (that is, the text files stored in the /var/run/node-exporter-textfile directory on the host) are picked up by node-exporter, which also runs as a DaemonSet.

node-exporter is started with the --collector.textfile.directory /host/textfile flag, where /host/textfile is the mount point of the hostPath directory /var/run/node-exporter-textfile. (See the node-exporter documentation for more details on the textfile collector.)

node-exporter then reads these files, and Prometheus scrapes all the data from the node-exporter instances.
So how did it turn out?
Now it's time to enjoy the long-awaited results. After the metrics are created, we can use them and, of course, visualize them. You can see what they look like below.
First, there is a universal selector that lets us choose the nodes whose "source" and "destination" connections we want to examine. The Grafana dashboard then shows a summary table with the ping results for the selected nodes over the specified period:
The following is a graph that contains combined statistics about the selected nodes:
In addition, there is a list of rows, each of which links to a graph for the specific node selected as the source node:

If you expand a row, you will see detailed ping statistics from that node to all the other nodes selected as destination nodes:

Here are the related graphs:

What does the graph look like when there is a ping problem between nodes?
If you observe a similar situation in real life, it's time to troubleshoot!
Finally, here is the visualization of our pings to external hosts:

We can check the overall view across all nodes, or look at the graph of any particular node:
After reading this article, you should have a good understanding of how to implement ping monitoring between Kubernetes nodes. Thank you for reading!