Why do Kubernetes Pods need container probes (readinessProbe, livenessProbe, and startupProbe) in their life cycle?
Kubernetes relies heavily on asynchronous mechanisms, and many of its object relationships are deliberately decoupled. When the number of application instances is increased or decreased, or a new application version triggers a rolling upgrade, the system cannot guarantee that the Service and Ingress configuration tracking those instances is always refreshed in time. In some cases a new Pod has only just finished initializing, the externally visible access information such as Endpoints and load balancers has not yet been updated, and the old Pod is already being deleted, so the service becomes temporarily unavailable. That is unacceptable in production, and it is exactly the problem the readiness probe (readinessProbe) was introduced to solve.
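As a concrete illustration, here is a minimal sketch of a Deployment whose Pods only join a Service's Endpoints after their readiness check passes; the name web, the /healthz path, and the replica count are assumptions for the example, not from the original article:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical name for illustration
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.14.1
        ports:
        - containerPort: 80
        readinessProbe:          # Pod joins Endpoints only once this succeeds
          httpGet:
            path: /healthz       # assumed health endpoint
            port: 80
          periodSeconds: 5

During a rolling upgrade, a new Pod that has not yet passed this check stays out of the Endpoints list, so traffic keeps flowing to the old Pods until the replacement is actually ready.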
Startup probe (startupProbe)
Sometimes a service cannot serve requests immediately after it starts. What we used to do was give the liveness probe an initialDelaySeconds (how many seconds after the container starts before probing begins) and let it decide whether the service was up, roughly like this:
livenessProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 1
  initialDelaySeconds: 10
  periodSeconds: 10
But suppose our service A takes 60s to start. With the probe above, the Pod falls into an endless loop: the first probe fires 10s after the container starts, fails, the restart policy restarts the Pod, and the cycle repeats forever. You can probably guess the workaround: just raise initialDelaySeconds. But can you really be sure how many seconds every service needs to start?
You may also have thought of raising failureThreshold instead, but how large should it be? Say we set it like this:
livenessProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 5
  initialDelaySeconds: 10
  periodSeconds: 10
Now the Pod starts normally the first time, but once the service is running, a later failure takes up to 5 × 10s = 50s of failed probes before we notice the service is down. That is not acceptable in production either. So instead we add a startupProbe with the same check as the livenessProbe and let it decide whether the service has started successfully:
livenessProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 1
  initialDelaySeconds: 10
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 10
  initialDelaySeconds: 10
  periodSeconds: 10
With this configuration, the service can come up at any point within 10 + 10 × 10s = 110s and the startup probe will eventually succeed, after which it hands over to the livenessProbe for ongoing detection. And once the liveness probe is in charge, a failure is detected within 1 × 10s = 10s, so we can still respond to problems quickly.
Readiness probe (readinessProbe)
Checks whether the application inside the container is ready to serve traffic. Only when this probe succeeds is the Pod marked Ready and allowed to receive requests; until then, even though the container itself started successfully, the Pod is still reported as not ready (because the service inside it has not passed the check).
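A quick way to watch this from the outside (assuming a Pod named nginx, as in the example further down) is to query the Pod's Ready condition:

kubectl get pod nginx -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

This prints True only after the readiness probe has succeeded; kubectl get pod shows the same information in its READY column.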
Liveness probe (livenessProbe) (is the container running?)
Checks whether the container is running. It only checks that the container is alive, not that the service inside it is behaving normally. If the probe fails, the kubelet kills the container and applies its restart policy.
There are three types of probe handlers:
1. ExecAction: probes by running a custom command inside the container; a return value of 0 means alive, non-zero means not alive.
2. TCPSocketAction: performs a TCP check against a port on the container; if the port is open, the container is considered alive.
3. HTTPGetAction: issues an HTTP GET request against a given port and path; a response status code of at least 200 and less than 400 is considered alive.
Each probe can only produce one of three results:
1. Success: the container passed the check.
2. Failure: the container failed the check.
3. Unknown: the check itself failed to run, so no action is taken.
Probe example (ExecAction plus HTTPGetAction):

# cat nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  restartPolicy: OnFailure
  containers:
  - name: nginx
    image: nginx:1.14.1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
    - name: https
      containerPort: 443
      protocol: TCP
    livenessProbe:
      exec:
        command: ["test", "-f", "/usr/share/nginx/html/index.html"]
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        port: 80
        path: /index.html
      initialDelaySeconds: 15
      timeoutSeconds: 1
Let's create the Pod and test the probes:
kubectl create -f nginx.yaml
Then we go into the nginx container and delete the index file, as sketched below, and look at the Pod's events for details.
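A minimal sketch of that step (the Pod name and file path come from the manifest above; the exact invocation is an assumption, since the original article does not show it):

kubectl exec nginx -- rm /usr/share/nginx/html/index.html

Once the file is gone, the readiness probe's HTTP GET for /index.html returns 404, and the exec-based liveness check (test -f) starts failing as well.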
# kubectl describe pod nginx
...
Events:
  Type     Reason     Age                From                    Message
  ----     ------     ----               ----                    -------
  Normal   Scheduled  4m24s              default-scheduler       Successfully assigned default/nginx to 192.168.1.124
  Normal   Pulling    4m23s              kubelet, 192.168.1.124  pulling image "nginx:1.14.1"
  Normal   Pulled     4m1s               kubelet, 192.168.1.124  Successfully pulled image "nginx:1.14.1"
  Warning  Unhealthy  57s                kubelet, 192.168.1.124  Readiness probe failed: HTTP probe failed with statuscode: 404
  Warning  Unhealthy  50s (x3 over 60s)  kubelet, 192.168.1.124  Liveness probe failed:
  Normal   Killing    50s                kubelet, 192.168.1.124  Killing container with id docker://nginx:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled     50s                kubelet, 192.168.1.124  Container image "nginx:1.14.1" already present on machine
  Normal   Created    49s (x2 over 4m)   kubelet, 192.168.1.124  Created container
  Normal   Started    49s (x2 over 4m)   kubelet, 192.168.1.124  Started container
The event log makes it obvious: the readiness probe first reported a 404, the liveness probe then failed, and the container was immediately killed and recreated. You can watch the restart counter climb with the command below.
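A simple way to keep an eye on this (Pod name from the manifest above):

kubectl get pod nginx -w

The -w flag streams updates, so each liveness failure shows up as an increment in the RESTARTS column.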
Probe parameter reference:
exec: probe by running a custom command
httpGet: probe via an HTTP request
tcpSocket: probe via a TCP socket
failureThreshold: how many consecutive failures count as a real failure
initialDelaySeconds: how many seconds after the container starts before probing begins (the service inside the container needs time to start)
periodSeconds: how many seconds between probes
timeoutSeconds: timeout for a single probe attempt
httpGet-specific probe parameters:
host: the host name to probe (defaults to the Pod IP)
path: the path to access (URI)
port: the port to access (required field)
scheme: the scheme used to connect to the host (HTTP or HTTPS); the default is HTTP
httpHeaders: custom headers to set in the request; HTTP allows repeated headers
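To tie the reference together, here is a sketch that exercises the httpGet parameters alongside a tcpSocket liveness check; the container name, header, and timing values are assumptions for illustration, not from the original article:

containers:
- name: api                      # hypothetical container
  image: nginx:1.14.1
  livenessProbe:
    tcpSocket:                   # TCPSocketAction: alive if the port accepts a connection
      port: 80
    initialDelaySeconds: 10
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /index.html
      port: 80
      scheme: HTTP               # default; HTTPS is also allowed
      httpHeaders:
      - name: X-Probe            # custom header, purely illustrative
        value: kubelet
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 2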