Brief introduction
Kubernetes is a system for running and coordinating containerized applications across a group of hosts, providing mechanisms for application deployment, scheduling, updating, and maintenance. Applications running on a Kubernetes cluster can be scaled up and down, updated with rolling updates, and have traffic split between different versions in order to test functionality or roll back a problematic deployment. Kubernetes manages application services by defining various types of resources, such as Deployment, Pod, Service, and Volume. The following article outlines the basics of the Pod and details the Pod life cycle.
Introduction to Pod
The Pod is the basic unit of the Kubernetes system: it is the smallest component a user creates or deploys, and it is the resource object that runs containerized applications on the cluster. The other resource objects in a Kubernetes cluster exist to support Pods, which is how Kubernetes achieves its goal of managing application services.
The main Kubernetes cluster components are the master components API Server, Controller Manager, and Scheduler, and the node components kubelet, the container runtime (such as Docker), and kube-proxy. This article describes the creation, running, and destruction of a Pod from the point of view of its interaction with these components. Over its life cycle a Pod moves through several phases: Pending, Running, Succeeded, Failed, and Unknown.
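For reference, a Pod's current phase can be read through the API Server from any client. Below is a minimal sketch using the official Python client; the Pod name, namespace, and kubeconfig access are assumptions made for this example.
```python
# Minimal sketch: read a Pod's life-cycle phase via the API Server.
# Assumes the official `kubernetes` Python client and access to a cluster.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a Pod
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="demo-pod", namespace="default")  # hypothetical Pod
print(pod.status.phase)    # one of Pending, Running, Succeeded, Failed, Unknown
```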
Interacting with the API Server
The API Server provides the interface through which the cluster interacts with the outside world; submitting a Pod spec to the API Server, via kubectl or another API client, is the start of Pod creation.
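For illustration, submitting a Pod spec does not have to go through kubectl; any API client can do it. The sketch below uses the official Python client with a plain-dict manifest; the names and image are illustrative assumptions, not part of the original article.
```python
# Sketch: submit a minimal Pod spec to the API Server (the start of Pod creation).
# Assumes the official `kubernetes` Python client; names and image are illustrative.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod", "labels": {"app": "demo"}},
    "spec": {
        "containers": [
            {"name": "web", "image": "nginx:1.25", "ports": [{"containerPort": 80}]}
        ]
    },
}

v1.create_namespaced_pod(namespace="default", body=pod_manifest)
# The API Server validates the object, fills in defaults, persists it to etcd,
# and the Pod then sits in the Pending phase until it is scheduled and started.
```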
The main steps of the interaction with the API Server are as follows:
After receiving a request to create a Pod, the API Server builds a runtime Pod object from the parameter values submitted by the user.
The namespace in the Pod's metadata is checked against the namespace in the context of the API Server request; if the two do not match, creation fails.
Once the namespaces match, system metadata is injected into the Pod object; if no name was provided for the Pod, the API Server uses the Pod's UID as its name.
The API Server then checks whether any required fields of the Pod object are empty; if so, creation fails.
After these preparations are complete, the object is persisted to etcd, the result of the asynchronous call is wrapped as a restful.Response, and the response is returned to the caller.
At this point the API Server's part of the creation process is complete; the rest is done by the scheduler and the kubelet, and the Pod is in the Pending phase.
Interacting with the scheduler
When the request to create the Pod has been handled by the API Server, the work passes to the scheduler, whose main job is to decide which node of the cluster the Pod will run on. Note that once the API Server finishes its part, the information is written to etcd, and the scheduler learns about it by listening for those changes through the watch mechanism before doing any work.
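The same watch mechanism is exposed to every API client, not just the scheduler. As a rough sketch of the idea, the snippet below (official Python client, illustrative namespace) streams Pod events and picks out Pods that have not yet been assigned a node.
```python
# Sketch: observe Pod changes through the API Server's watch mechanism,
# similar in spirit to how the scheduler learns about new, unscheduled Pods.
from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod, namespace="default", timeout_seconds=60):
    pod = event["object"]
    # A Pod that has not been scheduled yet has an empty spec.nodeName.
    if event["type"] == "ADDED" and not pod.spec.node_name:
        print("new unscheduled pod:", pod.metadata.name)
```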
The scheduler reads the Pod information written to etcd and then, based on a series of rules, selects a suitable node in the cluster to run it. Scheduling determines the Pod's node in three main steps:
Node pre-selection: every node is checked against a series of pre-selection rules (such as PodFitsResources and MatchNodeSelector), and nodes that do not qualify are filtered out, completing the pre-selection step.
Node scoring (preference): the pre-selected nodes are ranked in order to pick the node best suited to run the Pod object.
Selection: the node with the highest score is chosen to run the Pod object; when several nodes are tied, one of them is selected at random.
Note: if particular Pods need to run on particular nodes, advanced scheduling can be achieved by combining node labels, Pod labels, and label selectors. Pre-selection strategies such as MatchInterPodAffinity, MatchNodeSelector, and PodToleratesNodeTaints give users custom Pod affinity and anti-affinity, node affinity, and taint-and-toleration based scheduling; a sketch of such a spec follows.
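As a concrete illustration of these constraints, the sketch below shows a Pod manifest (written as a Python dict, as the official client accepts) that combines a node label selector with a toleration; the label and taint keys and values are made-up examples.
```python
# Sketch: a Pod spec that steers scheduling with a nodeSelector and a toleration.
# Label and taint keys/values are illustrative; adapt them to a real cluster.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-demo"},
    "spec": {
        "containers": [{"name": "app", "image": "nginx:1.25"}],
        # MatchNodeSelector: only nodes carrying this label pass pre-selection.
        "nodeSelector": {"disktype": "ssd"},
        # PodToleratesNodeTaints: allow scheduling onto nodes tainted with this key.
        "tolerations": [
            {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
        ],
    },
}
# The dict can be submitted like the earlier example, e.g. with create_namespaced_pod().
```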
Pre-selection strategies (predicates)
A pre-selection strategy is a node filter, for example the rules implemented by MatchNodeSelector and by PodFitsResources. When the pre-selection step runs, if no suitable node exists, the Pod remains in the Pending state until at least one node becomes available.
The pre-selection strategies supported in version 1.10 include:
CheckNodeCondition
GeneralPredicates
NoDiskConflict
PodToleratesNodeTaints
PodToleratesNodeNoExecuteTaints
CheckServiceAffinity
MaxEBSVolumeCount
MaxGCEPDVolumeCount
MaxAzureDiskVolumeCount
CheckVolumeBinding
NoVolumeZoneConflict
CheckNodeMemoryPressure
CheckNodePIDPressure
CheckNodeDiskPressure
MatchInterPodAffinity
A brief description of several of them:
CheckNodeCondition: checks whether a Pod can be scheduled onto a node when the node reports that its disk or network is unavailable, or that it is not ready.
NoDiskConflict: checks whether the storage volumes requested by the Pod object are available on the node; the check passes if there is no conflict.
MatchNodeSelector: if the Pod object defines the spec.nodeSelector attribute, checks whether the node's labels match that selector.
Priority functions
Commonly used priority functions:
BalancedResourceAllocation
LeastRequestedPriority
NodePreferAvoidPodsPriority
NodeAffinityPriority
TaintTolerationPriority
InterPodAffinityPriority
SelectorSpreadPriority
NodeLabelPriority
MostRequestedPriority
ImageLocalityPriority
In addition, the scheduler allows a simple integer weight to be specified for each priority function; the weights are used when computing a node's overall priority score. The formula is as follows:
FinalScoreNode = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) + ...
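Expressed as code, the weighted sum above is just the small sketch below; the priority functions and weights are made-up placeholders rather than the scheduler's real plugins.
```python
# Sketch: a node's final score as a weighted sum of priority-function scores.
# The priority functions and weights here are illustrative placeholders.
def final_score_node(node, priority_funcs):
    """priority_funcs: iterable of (weight, func), where func(node) returns a score."""
    return sum(weight * func(node) for weight, func in priority_funcs)

# Example with two dummy priority functions:
least_requested = lambda node: 8   # pretend score on a 0-10 scale
node_affinity = lambda node: 5
print(final_score_node("node-1", [(1, least_requested), (2, node_affinity)]))  # 18
```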
A few of these priority functions in more detail:
TaintTolerationPriority: evaluates nodes based on the Pod's tolerance of node taints. It checks the tolerations list of the Pod object against the node's taints; the more entries that match successfully, the lower the node's score.
NodeAffinityPriority: evaluates nodes based on node affinity scheduling preferences, calculating how well a given node matches the nodeSelector in the Pod resource; the more entries that match successfully, the higher the node's score.
Node scheduling also involves node affinity in two forms, hard affinity and soft affinity, as well as resource affinity scheduling. Hard affinity, soft affinity, anti-affinity, and taint-and-toleration scheduling are all Pod scheduling strategies; they are not described in detail here, but a sketch contrasting hard and soft node affinity follows.
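For reference, the sketch below contrasts hard (required) and soft (preferred) node affinity in a single Pod manifest; the label keys, values, and weight are illustrative assumptions.
```python
# Sketch: hard vs. soft node affinity in a Pod spec (labels and values are illustrative).
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "affinity-demo"},
    "spec": {
        "containers": [{"name": "app", "image": "nginx:1.25"}],
        "affinity": {
            "nodeAffinity": {
                # Hard affinity: the Pod stays Pending until a matching node exists.
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {"matchExpressions": [
                            {"key": "zone", "operator": "In", "values": ["zone-a", "zone-b"]}
                        ]}
                    ]
                },
                # Soft affinity: matching nodes merely score higher (NodeAffinityPriority).
                "preferredDuringSchedulingIgnoredDuringExecution": [
                    {"weight": 50,
                     "preference": {"matchExpressions": [
                         {"key": "disktype", "operator": "In", "values": ["ssd"]}
                     ]}}
                ],
            }
        },
    },
}
```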
Once the scheduler has selected the node on which the Pod will run through this series of policies, the result is reported to the API Server, which persists it to etcd; the scheduling result is thus exposed by the API Server, and the kubelet on the selected node then starts the Pod.
The kubelet component starts the Pod
The kubelet does more than create Pods; its responsibilities also include node management, resource monitoring via cAdvisor, container health checks, and other functions.
Analysis of the Pod start process
The kubelet watches the Pod information (stored in etcd) through the API Server and keeps its Pod list synchronized. If a new Pod is found to be bound to this node, it is created as described by the Pod list; if an existing Pod is found to have been updated, the corresponding changes are made.
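To see which Pods are bound to a particular node, any client can ask the API Server with a field selector, which is close in spirit to how the kubelet tracks the Pods assigned to its node; the node name below is an illustrative assumption.
```python
# Sketch: list the Pods bound to one node through the API Server.
# Assumes the official `kubernetes` Python client; the node name is illustrative.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pods = v1.list_pod_for_all_namespaces(field_selector="spec.nodeName=node-1")
for p in pods.items:
    print(p.metadata.namespace, p.metadata.name, p.status.phase)
```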
After reading the Pod information, if the task is to create or modify the Pod, the kubelet does the following:
Create a data directory for the pod
Read the pod list from API Server
Mount external volumes for this pod
Download the Secrets required by the Pod
Check whether the Pod is already running on the node. If the Pod has no containers, or its pause container is not started, first stop all of the Pod's container processes.
Use the pause image to create a container for each Pod; this pause container takes over the network used by all the other containers in the Pod.
For each container in the Pod, do the following: 1. Compute a hash value for the container spec, then look up the hash of the Docker container with the same name. If such a container is found and the two hash values differ, stop the process in that Docker container and stop the associated pause container; if they are the same, do nothing. 2. If the container has been terminated and no restart policy is specified for it, do nothing. 3. Otherwise, call the Docker client to pull the container image and start the container.
Important behaviors in the Pod life cycle
In addition to the application containers (the main container and any sidecar containers; note that if Istio is deployed in the cluster, an extra Istio-related container is injected when the Pod starts, which is the beginning of another interesting story), a Pod object can define a variety of life-cycle behaviors, such as init containers, container probes, and readiness probes.
Several container life-cycle behaviors
Init containers
An init container runs inside the Pod before the main containers start, mainly to perform preparatory work. Init containers have the following characteristics (a sketch follows the list):
Init containers must run to completion first. If an init container fails, the cluster restarts it until it succeeds; note, however, that if the Pod's restart policy is Never, a failed init container is not restarted.
Init containers run in the order in which they are defined, and they are declared through the Pod's spec.initContainers field.
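As referenced above, here is a minimal sketch of a Pod that declares an init container through spec.initContainers; the images and commands are illustrative assumptions.
```python
# Sketch: declaring an init container via spec.initContainers (illustrative values).
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "init-demo"},
    "spec": {
        # Init containers run to completion, in order, before the main containers start.
        "initContainers": [
            {"name": "wait-for-db", "image": "busybox:1.36",
             "command": ["sh", "-c", "until nslookup db-service; do sleep 2; done"]}
        ],
        "containers": [
            {"name": "app", "image": "nginx:1.25"}
        ],
        # With restartPolicy Never, a failed init container is not restarted.
        "restartPolicy": "Never",
    },
}
```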
Life-cycle hook functions
Kubernetes provides two lifecycle hooks for containers:
postStart: a hook that runs immediately after the container is created.
preStop: a hook that runs immediately before the container terminates; it runs synchronously, so it blocks the call that deletes the container until the hook completes.
Note: hook handlers are implemented in two ways, "Exec" and "HTTP"; a sketch of both follows.
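The sketch below puts both hooks on one container, with one handler of each kind; the commands, path, and port are illustrative assumptions.
```python
# Sketch: postStart and preStop hooks, one Exec handler and one HTTP handler
# (commands, path, and port are illustrative).
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "hooks-demo"},
    "spec": {
        "containers": [{
            "name": "web",
            "image": "nginx:1.25",
            "lifecycle": {
                # Runs immediately after the container is created.
                "postStart": {"exec": {"command": ["sh", "-c", "echo started > /tmp/started"]}},
                # Runs synchronously right before the container terminates.
                "preStop": {"httpGet": {"path": "/shutdown", "port": 80}},
            },
        }],
    },
}
```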
Container probes
Container probes are divided into liveness probes and readiness probes; the kubelet uses them to diagnose the health of a container. There are three main probe mechanisms (a sketch follows the list):
ExecAction: executes a command inside the container and judges health by the exit status code; 0 means success, anything else means failure.
TCPSocketAction: diagnoses by attempting to establish a connection to a TCP port of the container; if the port can be opened, the probe succeeds, otherwise it fails.
HTTPGetAction: sends an HTTP GET request to the container at a specified URL; a response code of 2xx or 3xx is a success, anything else is a failure.
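The sketch below shows the probe mechanisms on one container, using an HTTP GET liveness probe and an exec readiness probe; the paths, ports, commands, and timings are illustrative assumptions.
```python
# Sketch: liveness and readiness probes on one container (illustrative values).
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "probe-demo"},
    "spec": {
        "containers": [{
            "name": "web",
            "image": "nginx:1.25",
            # Liveness probe using HTTPGetAction: a 2xx/3xx response means healthy.
            "livenessProbe": {
                "httpGet": {"path": "/healthz", "port": 80},
                "initialDelaySeconds": 5,
                "periodSeconds": 10,
            },
            # Readiness probe using ExecAction: exit code 0 means the container is ready.
            # (A TCPSocketAction handler would instead look like {"tcpSocket": {"port": 80}}.)
            "readinessProbe": {
                "exec": {"command": ["cat", "/tmp/ready"]},
                "periodSeconds": 5,
            },
        }],
    },
}
```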
Pod termination process
The termination process mainly consists of the following steps (a sketch of issuing the deletion through the API follows the list):
The user issues the command to delete the Pod.
The Pod object in the API Server is updated with a grace period (30 seconds by default), after which the Pod is considered "dead".
The Pod is marked with the "Terminating" status.
At the same time as the third step, the kubelet notices that the Pod has been marked "Terminating" and starts the Pod shutdown process.
Also at the same time as the third step, the endpoints controller notices that the Pod is being shut down and removes it from the endpoints lists of all Services that match it.
If a preStop hook handler is defined for the Pod, it starts running synchronously when the Pod is marked "Terminating"; if preStop has not finished when the grace period ends, the second step is re-executed with a small additional grace period of 2 seconds.
The containers in the Pod receive the TERM signal.
When the grace period expires, any processes still running in the Pod receive the SIGKILL signal.
The kubelet asks the API Server to set the Pod's grace period to 0 (immediate deletion), which removes the Pod object and completes the delete operation.
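As mentioned at the start of this list, the deletion can be issued by kubectl or by any API client. The sketch below uses the official Python client with an explicit grace period; the Pod name and namespace are illustrative assumptions.
```python
# Sketch: delete a Pod with an explicit grace period through the API Server.
# Assumes the official `kubernetes` Python client; 30 seconds is also the default.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

v1.delete_namespaced_pod(name="demo-pod", namespace="default", grace_period_seconds=30)
# The Pod is marked Terminating, its preStop hook (if any) runs, its containers receive
# TERM, and anything still running when the grace period expires receives SIGKILL.
```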
In addition, the kubelet embeds cAdvisor, which collects information such as container CPU, memory, file-system, and network usage and can be combined with Prometheus to monitor the Pods in the cluster.
Finally, beyond the interaction of the three components above during Pod creation, the controller-manager works to keep Pods in the state the user desires (that is, to keep them alive), and kube-proxy handles communication between Pods in the cluster.
This article was originally published by Boyun Research Institute. Please indicate the source when reproducing it.