
Kubernetes Health Check (9)

2025-01-24 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02 Report

Strong self-healing is an important feature of container orchestration engines such as Kubernetes. The default self-healing behavior is to automatically restart a failed container. Beyond that, users can configure Liveness and Readiness probes for more fine-grained health checks, which makes the following possible:

Zero-downtime deployment. Avoiding the deployment of invalid images. Safer rolling upgrades.

1. Liveness probe

The Liveness probe allows users to customize the conditions that determine whether the container is healthy or not. If the probe fails, Kubernetes restarts the container.

We create a configuration file liveness.yaml for the Pod. You can run kubectl explain pod.spec.containers.livenessProbe to see how the probe is used.

apiVersion: v1
kind: Pod
metadata:
  name: liveness
  labels:
    test: liveness
spec:
  restartPolicy: OnFailure
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 10
      periodSeconds: 5

When the container starts, it first creates the file /tmp/healthy, sleeps 30 seconds, then deletes the file and sleeps another 600 seconds. In our setup, the container is considered healthy as long as /tmp/healthy exists, so a failure is guaranteed to occur after 30 seconds.

The livenessProbe section defines how to perform Liveness probes:

The probe checks for the existence of /tmp/healthy with the cat command. If the command succeeds (returns zero), Kubernetes considers the Liveness probe successful; if it returns a non-zero value, the probe fails. initialDelaySeconds: 10 specifies that the first Liveness probe runs 10 seconds after the container starts. Set this according to the application's startup time; for example, if an application normally takes 30 seconds to start, initialDelaySeconds should be greater than 30. periodSeconds: 5 specifies that the Liveness probe runs every 5 seconds. If three consecutive probes fail, Kubernetes kills and restarts the container.
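Besides initialDelaySeconds and periodSeconds, the probe spec also accepts timeoutSeconds, successThreshold, and failureThreshold. The sketch below uses illustrative values that are not part of the original example:

```yaml
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 1      # the probe command must complete within 1 second
  successThreshold: 1    # one success marks the probe healthy again
  failureThreshold: 3    # three consecutive failures count as probe failure
```

The defaults are timeoutSeconds: 1, successThreshold: 1 and failureThreshold: 3, which is why three consecutive failures trigger the restart described above.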

Create the Pod liveness below:

[root@master ~]# kubectl apply -f liveness.yaml
pod/liveness created

As the configuration shows, /tmp/healthy exists for the first 30 seconds, so the cat command returns 0 and the Liveness probe succeeds. During this period, the Events section of kubectl describe pod liveness shows normal events.

[root@master ~]# kubectl describe pod liveness
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Pulling    25s   kubelet, node02    pulling image "busybox"
  Normal  Pulled     24s   kubelet, node02    Successfully pulled image "busybox"
  Normal  Created    24s   kubelet, node02    Created container
  Normal  Started    23s   kubelet, node02    Started container
  Normal  Scheduled  23s   default-scheduler  Successfully assigned default/liveness to node02

After about 35 seconds, the log shows that /tmp/healthy no longer exists and the Liveness probe fails. Over the following tens of seconds, after several more failed probes, the container is killed and restarted.

[root@master ~]# kubectl describe pod liveness
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m9s                   default-scheduler  Successfully assigned default/liveness to node02
  Normal   Pulled     3m41s (x3 over 6m10s)  kubelet, node02    Successfully pulled image "busybox"
  Normal   Created    3m41s (x3 over 6m10s)  kubelet, node02    Created container
  Normal   Started    3m40s (x3 over 6m9s)   kubelet, node02    Started container
  Warning  Unhealthy  2m57s (x9 over 5m37s)  kubelet, node02    Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
  Normal   Pulling    2m27s (x4 over 6m11s)  kubelet, node02    pulling image "busybox"
  Normal   Killing    60s (x4 over 4m57s)    kubelet, node02    Killing container with id docker://liveness: Container failed liveness probe. Container will be killed and recreated.

Looking at the Pod again, we can see the container has already been restarted.

[root@master ~]# kubectl get pod
NAME       READY   STATUS    RESTARTS   AGE
liveness   1/1     Running   3          5m13s

2. Readiness probe

Users can use Liveness probe to tell Kubernetes when to restart the container to achieve self-healing, while Readiness probe tells Kubernetes when the container can be added to the Service load balancer pool to provide services.

The configuration syntax for the Readiness probe is exactly the same as the Liveness probe, and we create the configuration file readiness.yaml.

apiVersion: v1
kind: Pod
metadata:
  name: readiness
  labels:
    test: readiness
spec:
  restartPolicy: OnFailure
  containers:
  - name: readiness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 10
      periodSeconds: 5

Create a Pod and view its status.

[root@master ~]# kubectl apply -f readiness.yaml
pod/readiness created

When it was first created, the READY status was unavailable.

[root@master ~]# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
readiness   0/1     Running   0          21s

15 seconds later (initialDelaySeconds + periodSeconds), the Readiness probe is performed for the first time and returns successfully, setting READY to available.

[root@master ~]# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
readiness   1/1     Running   0          38s

After 30 seconds, / tmp/healthy is deleted, and after three consecutive Readiness probes fail, READY is set to unavailable.

[root@master ~]# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
readiness   0/1     Running   0          63s

You can also see the log of failed Readiness probes through kubectl describe pod readiness.

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Pulling    5m29s                 kubelet, node01    pulling image "busybox"
  Normal   Scheduled  5m25s                 default-scheduler  Successfully assigned default/readiness to node01
  Normal   Pulled     5m13s                 kubelet, node01    Successfully pulled image "busybox"
  Normal   Created    5m12s                 kubelet, node01    Created container
  Normal   Started    5m12s                 kubelet, node01    Started container
  Warning  Unhealthy  28s (x51 over 4m38s)  kubelet, node01    Readiness probe failed: cat: can't open '/tmp/healthy': No such file or directory

Here's a comparison between Liveness probe and Readiness probe:

Liveness and Readiness probes are two independent Health Check mechanisms. If neither is configured explicitly, Kubernetes applies the same default behavior to both: the probe is considered successful as long as the container's startup process returns a zero value. The two probes are configured in exactly the same way and support the same parameters. The difference lies in the behavior after a failure: a failed Liveness probe restarts the container, while a failed Readiness probe marks the container unavailable so that it no longer receives requests forwarded by the Service. The probes run independently of each other, with no dependency between them, so they can be used alone or together. Use the Liveness probe to decide whether a container needs to be restarted for self-healing; use the Readiness probe to decide whether a container is ready to serve requests.

3. The application of Health Check in Scale Up

For a multi-replica application, a Scale Up operation adds the new replicas to the Service load-balancing pool as backends, where they process client requests together with the existing replicas. Since application startup usually involves a preparation phase, such as loading cached data or connecting to a database, some time passes between container start and being able to serve requests. We can use the Readiness probe to judge whether a container is ready and avoid sending requests to backends that are not.

Let's create a configuration file to illustrate this situation.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  template:
    metadata:
      labels:
        run: web
    spec:
      containers:
      - name: web
        image: myhttpd
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            scheme: HTTP
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    run: web
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80

Focus on the readinessProbe section. Here we use a detection method different from exec: httpGet. Kubernetes considers this probe successful when the HTTP return code is at least 200 and less than 400.

scheme specifies the protocol; HTTP (the default) and HTTPS are supported. path specifies the access path. port specifies the port.
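Besides exec and httpGet, Kubernetes supports a third probe type, tcpSocket, which considers the probe successful if a TCP connection to the given port can be established. A minimal sketch (the port value is illustrative, not from the original example):

```yaml
readinessProbe:
  tcpSocket:
    port: 8080           # probe succeeds if a TCP connection to this port opens
  initialDelaySeconds: 10
  periodSeconds: 5
```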

The purpose of the above configuration is:

The probe starts 10 seconds after the container starts. If http://[container_ip]:8080/health does not return a code in the 200-399 range, the container is not ready and does not receive requests from Service web-svc. The probe then repeats every 5 seconds. Once the return code falls in the 200-399 range, the container is ready and is added to the web-svc load-balancing pool to start handling client requests. The probe keeps running at the same 5-second interval; if it fails three times in a row, the container is removed from the load balancer until a subsequent probe succeeds and it rejoins.

It is recommended to configure Health Check for important applications in production, to ensure that only containers that are ready to handle client requests serve as Service backends.

4. The application of Health Check in Rolling Update

Another important application scenario for Health Check is Rolling Update. Consider the following situation: you have a running multi-replica application and you update it (for example, to a newer image). Kubernetes starts new replicas, and then the following happens:

Normally, a new replica needs 10 seconds to finish its preparation and cannot respond to business requests before that. But due to a human configuration error, the replica can never finish its preparation (for example, it cannot connect to the back-end database).

Because the new replicas themselves do not exit abnormally, the default Health Check mechanism considers the containers ready, and Kubernetes gradually replaces the old replicas with new ones. The result is that once all the old replicas have been replaced, the whole application can no longer process requests or provide services. If this happens in an important production system, the consequences are severe.

If Health Check is configured correctly, a new replica is added to the Service only after it passes the Readiness probe. If it never passes, the rollout stalls without replacing all the existing replicas, and the business continues to run normally.

The following is an example to practice the application of Health Check in Rolling Update.

Simulate a 10-copy application with the following configuration file app.v1.yml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 10
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: app
        image: busybox
        args:
        - /bin/sh
        - -c
        - sleep 10; touch /tmp/healthy; sleep 30000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5

10 seconds after startup, every replica passes the Readiness probe.

[root@master ~]# kubectl apply -f app.v1.yml
deployment.extensions/app created
[root@master ~]# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
app-56878b4676-4rftg   1/1     Running   0          34s
app-56878b4676-6jtn4   1/1     Running   0          34s
app-56878b4676-6smfj   1/1     Running   0          34s
app-56878b4676-8pnc2   1/1     Running   0          34s
app-56878b4676-hxzjk   1/1     Running   0          34s
app-56878b4676-mglht   1/1     Running   0          34s
app-56878b4676-t2qs6   1/1     Running   0          34s
app-56878b4676-vgw44   1/1     Running   0          34s
app-56878b4676-vnxfx   1/1     Running   0          34s
app-56878b4676-wb9rh   1/1     Running   0          34s

Next, perform a rolling update of the application. The configuration file app.v2.yml is as follows:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 10
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: app
        image: busybox
        args:
        - /bin/sh
        - -c
        - sleep 30000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5

Obviously, since /tmp/healthy never exists in the new replicas, they can never pass the Readiness probe. Verify as follows:

[root@master ~]# kubectl apply -f app.v2.yml
deployment.extensions/app configured
[root@master ~]# kubectl get pod
NAME                   READY   STATUS    RESTARTS   AGE
app-56878b4676-4rftg   1/1     Running   0          4m42s
app-56878b4676-6jtn4   1/1     Running   0          4m42s
app-56878b4676-6smfj   1/1     Running   0          4m42s
app-56878b4676-hxzjk   1/1     Running   0          4m42s
app-56878b4676-mglht   1/1     Running   0          4m42s
app-56878b4676-t2qs6   1/1     Running   0          4m42s
app-56878b4676-vgw44   1/1     Running   0          4m42s
app-56878b4676-vnxfx   1/1     Running   0          4m42s
app-56878b4676-wb9rh   1/1     Running   0          4m42s
app-84fc656775-hf954   0/1     Running   0          66s
[root@master ~]# kubectl get deploy
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
app    9/10    2            9           7m1s

Take a look at the kubectl get pod output first:

Judging from the AGE column, the Pods whose names carry the new ReplicaSet hash (app-84fc656775-…) are the new replicas, and they are in a NOT READY state. The number of old replicas has dropped from the original 10 to 9.

Let's look at the output of kubectl get deployment app:

READY 9/10 indicates that 9 of the 10 desired replicas are ready. UP-TO-DATE 2 indicates the number of replicas that have been updated to the new version: the 2 new replicas. AVAILABLE 9 indicates the number of replicas currently available: the 9 old replicas.

In our setup, the new replicas can never pass the Readiness probe, so this state will persist.

Above we simulated a failed rolling update. Fortunately, Health Check kept the defective replicas out of service while preserving most of the old ones, so the business was not affected by the failed update.

A rolling update controls how many replicas are replaced at a time through the parameters maxSurge and maxUnavailable.
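As a sketch of how these parameters could be set, a strategy section can be added to the Deployment spec; the values below are illustrative and not part of the original example:

```yaml
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2          # at most 2 replicas above the desired count during the update
      maxUnavailable: 1    # at most 1 of the desired replicas may be unavailable
```

With these values, the Deployment never runs more than 12 Pods in total, and at least 9 of the 10 desired replicas stay available throughout the update.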
