Brief introduction
Kubernetes is a system for running and coordinating containerized applications across a group of hosts, providing mechanisms for application deployment, scheduling, updating, and maintenance. Applications running on a Kubernetes cluster can be scaled up and down, updated with rolling updates, and have traffic split between different versions in order to test functionality or roll back a problematic deployment. Kubernetes manages application services by defining various types of resources, such as Deployment, Pod, Service, and Volume. The following article outlines the basics of the Pod and details the Pod life cycle.
Introduction to Pod
The Pod is the basic unit of the Kubernetes system: it is the smallest component a user creates or deploys, and it is the resource object that runs containerized applications on the cluster. The other resource objects in a Kubernetes cluster exist to support Pods, which is how Kubernetes achieves its goal of managing application services.
The main Kubernetes cluster components are the master components API Server, Controller Manager, and Scheduler, and the node components kubelet, the container runtime (such as Docker), and kube-proxy. This article describes the creation, running, and destruction of a Pod from the point of view of its interaction with these components. Over its life cycle a Pod moves through several phases: Pending, Running, Succeeded, Failed, and Unknown.
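For reference, a Pod's current phase can be read through the API Server from any client. Below is a minimal sketch using the official Python client; the Pod name, namespace, and kubeconfig access are assumptions made for this example.
```python
# Minimal sketch: read a Pod's life-cycle phase via the API Server.
# Assumes the official `kubernetes` Python client and access to a cluster.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a Pod
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="demo-pod", namespace="default")  # hypothetical Pod
print(pod.status.phase)    # one of Pending, Running, Succeeded, Failed, Unknown
```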
Interacting with the API Server
The API Server provides the interface through which the cluster interacts with the outside world; submitting a Pod spec to the API Server, via kubectl or another API client, is the start of Pod creation.
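For illustration, submitting a Pod spec does not have to go through kubectl; any API client can do it. The sketch below uses the official Python client with a plain-dict manifest; the names and image are illustrative assumptions, not part of the original article.
```python
# Sketch: submit a minimal Pod spec to the API Server (the start of Pod creation).
# Assumes the official `kubernetes` Python client; names and image are illustrative.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod", "labels": {"app": "demo"}},
    "spec": {
        "containers": [
            {"name": "web", "image": "nginx:1.25", "ports": [{"containerPort": 80}]}
        ]
    },
}

v1.create_namespaced_pod(namespace="default", body=pod_manifest)
# The API Server validates the object, fills in defaults, persists it to etcd,
# and the Pod then sits in the Pending phase until it is scheduled and started.
```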
The main steps of the interaction with the API Server are as follows:
After receiving a request to create a Pod, the API Server builds a runtime Pod object from the parameter values submitted by the user.
The namespace in the Pod's metadata is checked against the namespace in the context of the API Server request; if the two do not match, creation fails.
Once the namespaces match, system metadata is injected into the Pod object; if no name was provided for the Pod, the API Server uses the Pod's UID as its name.
The API Server then checks whether any required fields of the Pod object are empty; if so, creation fails.
After these preparations are complete, the object is persisted to etcd, the result of the asynchronous call is wrapped as a restful.Response, and the response is returned to the caller.
At this point the API Server's part of the creation process is complete; the rest is done by the scheduler and the kubelet, and the Pod is in the Pending phase.
Interacting with the scheduler
When the request to create the Pod has been handled by the API Server, the work passes to the scheduler, whose main job is to decide which node of the cluster the Pod will run on. Note that once the API Server finishes its part, the information is written to etcd, and the scheduler learns about it by listening for those changes through the watch mechanism before doing any work.
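The same watch mechanism is exposed to every API client, not just the scheduler. As a rough sketch of the idea, the snippet below (official Python client, illustrative namespace) streams Pod events and picks out Pods that have not yet been assigned a node.
```python
# Sketch: observe Pod changes through the API Server's watch mechanism,
# similar in spirit to how the scheduler learns about new, unscheduled Pods.
from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod, namespace="default", timeout_seconds=60):
    pod = event["object"]
    # A Pod that has not been scheduled yet has an empty spec.nodeName.
    if event["type"] == "ADDED" and not pod.spec.node_name:
        print("new unscheduled pod:", pod.metadata.name)
```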
The scheduler reads the Pod information written to etcd and then, based on a series of rules, selects a suitable node in the cluster to run it. Scheduling determines the Pod's node in three main steps:
Node pre-selection: every node is checked against a series of pre-selection rules (such as PodFitsResources and MatchNodeSelector), and nodes that do not qualify are filtered out, completing the pre-selection step.
Node scoring (preference): the pre-selected nodes are ranked in order to pick the node best suited to run the Pod object.
Selection: the node with the highest score is chosen to run the Pod object; when several nodes are tied, one of them is selected at random.
Note: if particular Pods need to run on particular nodes, advanced scheduling can be achieved by combining node labels, Pod labels, and label selectors. Pre-selection strategies such as MatchInterPodAffinity, MatchNodeSelector, and PodToleratesNodeTaints give users custom Pod affinity and anti-affinity, node affinity, and taint-and-toleration based scheduling; a sketch of such a spec follows.
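As a concrete illustration of these constraints, the sketch below shows a Pod manifest (written as a Python dict, as the official client accepts) that combines a node label selector with a toleration; the label and taint keys and values are made-up examples.
```python
# Sketch: a Pod spec that steers scheduling with a nodeSelector and a toleration.
# Label and taint keys/values are illustrative; adapt them to a real cluster.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-demo"},
    "spec": {
        "containers": [{"name": "app", "image": "nginx:1.25"}],
        # MatchNodeSelector: only nodes carrying this label pass pre-selection.
        "nodeSelector": {"disktype": "ssd"},
        # PodToleratesNodeTaints: allow scheduling onto nodes tainted with this key.
        "tolerations": [
            {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
        ],
    },
}
# The dict can be submitted like the earlier example, e.g. with create_namespaced_pod().
```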
Pre-selection strategies (predicates)
A pre-selection strategy is a node filter, for example the rules implemented by MatchNodeSelector and by PodFitsResources. When the pre-selection step runs, if no suitable node exists, the Pod remains in the Pending state until at least one node becomes available.
The pre-selection strategies supported in version 1.10 include:
CheckNodeCondition
GeneralPredicates
NoDiskConflict
PodToleratesNodeTaints
PodToleratesNodeNoExecuteTaints
CheckServiceAffinity
MaxEBSVolumeCount
MaxGCEPDVolumeCount
MaxAzureDiskVolumeCount
CheckVolumeBinding
NoVolumeZoneConflict
CheckNodeMemoryPressure
CheckNodePIDPressure
CheckNodeDiskPressure
MatchInterPodAffinity
A brief description of several of them:
CheckNodeCondition: checks whether a Pod can be scheduled onto a node when the node reports that its disk or network is unavailable, or that it is not ready.
NoDiskConflict: checks whether the storage volumes requested by the Pod object are available on the node; the check passes if there is no conflict.
MatchNodeSelector: if the Pod object defines the spec.nodeSelector attribute, checks whether the node's labels match that selector.
Priority functions
Commonly used priority functions:
BalancedResourceAllocation
LeastRequestedPriority
NodePreferAvoidPodsPriority
NodeAffinityPriority
TaintTolerationPriority
InterPodAffinityPriority
SelectorSpreadPriority
NodeLabelPriority
MostRequestedPriority
ImageLocalityPriority
In addition, the scheduler allows a simple integer weight to be specified for each priority function; the weights are used when computing a node's overall priority score. The formula is as follows:
FinalScoreNode = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) + ...
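Expressed as code, the weighted sum above is just the small sketch below; the priority functions and weights are made-up placeholders rather than the scheduler's real plugins.
```python
# Sketch: a node's final score as a weighted sum of priority-function scores.
# The priority functions and weights here are illustrative placeholders.
def final_score_node(node, priority_funcs):
    """priority_funcs: iterable of (weight, func), where func(node) returns a score."""
    return sum(weight * func(node) for weight, func in priority_funcs)

# Example with two dummy priority functions:
least_requested = lambda node: 8   # pretend score on a 0-10 scale
node_affinity = lambda node: 5
print(final_score_node("node-1", [(1, least_requested), (2, node_affinity)]))  # 18
```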
A few of these priority functions in more detail:
TaintTolerationPriority: evaluates nodes based on the Pod's tolerance of node taints. It checks the tolerations list of the Pod object against the node's taints; the more entries that match successfully, the lower the node's score.
NodeAffinityPriority: evaluates nodes based on node affinity scheduling preferences, calculating how well a given node matches the nodeSelector in the Pod resource; the more entries that match successfully, the higher the node's score.
Node scheduling also involves node affinity in two forms, hard affinity and soft affinity, as well as resource affinity scheduling. Hard affinity, soft affinity, anti-affinity, and taint-and-toleration scheduling are all Pod scheduling strategies; they are not described in detail here, but a sketch contrasting hard and soft node affinity follows.
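For reference, the sketch below contrasts hard (required) and soft (preferred) node affinity in a single Pod manifest; the label keys, values, and weight are illustrative assumptions.
```python
# Sketch: hard vs. soft node affinity in a Pod spec (labels and values are illustrative).
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "affinity-demo"},
    "spec": {
        "containers": [{"name": "app", "image": "nginx:1.25"}],
        "affinity": {
            "nodeAffinity": {
                # Hard affinity: the Pod stays Pending until a matching node exists.
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {"matchExpressions": [
                            {"key": "zone", "operator": "In", "values": ["zone-a", "zone-b"]}
                        ]}
                    ]
                },
                # Soft affinity: matching nodes merely score higher (NodeAffinityPriority).
                "preferredDuringSchedulingIgnoredDuringExecution": [
                    {"weight": 50,
                     "preference": {"matchExpressions": [
                         {"key": "disktype", "operator": "In", "values": ["ssd"]}
                     ]}}
                ],
            }
        },
    },
}
```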
Once the scheduler has selected the node on which the Pod will run through this series of policies, the result is reported to the API Server, which persists it to etcd; the scheduling result is thus exposed by the API Server, and the kubelet on the selected node then starts the Pod.
The kubelet component starts the Pod
The kubelet does more than create Pods; its responsibilities also include node management, resource monitoring via cAdvisor, container health checks, and other functions.
Analysis of the Pod start process
The kubelet watches the Pod information (stored in etcd) through the API Server and keeps its Pod list synchronized. If a new Pod is found to be bound to this node, it is created as described by the Pod list; if an existing Pod is found to have been updated, the corresponding changes are made.
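To see which Pods are bound to a particular node, any client can ask the API Server with a field selector, which is close in spirit to how the kubelet tracks the Pods assigned to its node; the node name below is an illustrative assumption.
```python
# Sketch: list the Pods bound to one node through the API Server.
# Assumes the official `kubernetes` Python client; the node name is illustrative.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pods = v1.list_pod_for_all_namespaces(field_selector="spec.nodeName=node-1")
for p in pods.items:
    print(p.metadata.namespace, p.metadata.name, p.status.phase)
```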
After reading the Pod information, if the task is to create or modify the Pod, the kubelet does the following:
Create a data directory for the pod
Read the pod list from API Server
Mount external volumes for this pod
Download the Secrets required by the Pod
Check whether the Pod is already running on the node. If the Pod has no containers, or its pause container is not started, first stop all of the Pod's container processes.
Use the pause image to create a container for each Pod; this pause container takes over the network used by all the other containers in the Pod.
For each container in the Pod, do the following: 1. Compute a hash value for the container spec, then look up the hash of the Docker container with the same name. If such a container is found and the two hash values differ, stop the process in that Docker container and stop the associated pause container; if they are the same, do nothing. 2. If the container has been terminated and no restart policy is specified for it, do nothing. 3. Otherwise, call the Docker client to pull the container image and start the container.
Important behaviors in the Pod life cycle
In addition to the application containers (the main container and any sidecar containers; note that if Istio is deployed in the cluster, an extra Istio-related container is injected when the Pod starts, which is the beginning of another interesting story), a Pod object can define a variety of life-cycle behaviors, such as init containers, container probes, and readiness probes.
Several container life-cycle behaviors
Init containers
An init container runs inside the Pod before the main containers start, mainly to perform preparatory work. Init containers have the following characteristics (a sketch follows the list):
Init containers must run to completion first. If an init container fails, the cluster restarts it until it succeeds; note, however, that if the Pod's restart policy is Never, a failed init container is not restarted.
Init containers run in the order in which they are defined, and they are declared through the Pod's spec.initContainers field.
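As referenced above, here is a minimal sketch of a Pod that declares an init container through spec.initContainers; the images and commands are illustrative assumptions.
```python
# Sketch: declaring an init container via spec.initContainers (illustrative values).
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "init-demo"},
    "spec": {
        # Init containers run to completion, in order, before the main containers start.
        "initContainers": [
            {"name": "wait-for-db", "image": "busybox:1.36",
             "command": ["sh", "-c", "until nslookup db-service; do sleep 2; done"]}
        ],
        "containers": [
            {"name": "app", "image": "nginx:1.25"}
        ],
        # With restartPolicy Never, a failed init container is not restarted.
        "restartPolicy": "Never",
    },
}
```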
Life-cycle hook functions
Kubernetes provides two lifecycle hooks for containers:
postStart: a hook that runs immediately after the container is created.
preStop: a hook that runs immediately before the container terminates; it runs synchronously, so it blocks the call that deletes the container until the hook completes.
Note: hook handlers are implemented in two ways, "Exec" and "HTTP"; a sketch of both follows.
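The sketch below puts both hooks on one container, with one handler of each kind; the commands, path, and port are illustrative assumptions.
```python
# Sketch: postStart and preStop hooks, one Exec handler and one HTTP handler
# (commands, path, and port are illustrative).
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "hooks-demo"},
    "spec": {
        "containers": [{
            "name": "web",
            "image": "nginx:1.25",
            "lifecycle": {
                # Runs immediately after the container is created.
                "postStart": {"exec": {"command": ["sh", "-c", "echo started > /tmp/started"]}},
                # Runs synchronously right before the container terminates.
                "preStop": {"httpGet": {"path": "/shutdown", "port": 80}},
            },
        }],
    },
}
```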
Container probes
Container probes are divided into liveness probes and readiness probes; the kubelet uses them to diagnose the health of a container. There are three main probe mechanisms (a sketch follows the list):
ExecAction: executes a command inside the container and judges health by the exit status code; 0 means success, anything else means failure.
TCPSocketAction: diagnoses by attempting to establish a connection to a TCP port of the container; if the port can be opened, the probe succeeds, otherwise it fails.
HTTPGetAction: sends an HTTP GET request to the container at a specified URL; a response code of 2xx or 3xx is a success, anything else is a failure.
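The sketch below shows the probe mechanisms on one container, using an HTTP GET liveness probe and an exec readiness probe; the paths, ports, commands, and timings are illustrative assumptions.
```python
# Sketch: liveness and readiness probes on one container (illustrative values).
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "probe-demo"},
    "spec": {
        "containers": [{
            "name": "web",
            "image": "nginx:1.25",
            # Liveness probe using HTTPGetAction: a 2xx/3xx response means healthy.
            "livenessProbe": {
                "httpGet": {"path": "/healthz", "port": 80},
                "initialDelaySeconds": 5,
                "periodSeconds": 10,
            },
            # Readiness probe using ExecAction: exit code 0 means the container is ready.
            # (A TCPSocketAction handler would instead look like {"tcpSocket": {"port": 80}}.)
            "readinessProbe": {
                "exec": {"command": ["cat", "/tmp/ready"]},
                "periodSeconds": 5,
            },
        }],
    },
}
```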
Pod termination process
The termination process mainly consists of the following steps (a sketch of issuing the deletion through the API follows the list):
The user issues the command to delete the Pod.
The Pod object in the API Server is updated with a grace period (30 seconds by default), after which the Pod is considered "dead".
The Pod is marked with the "Terminating" status.
At the same time as the third step, the kubelet notices that the Pod has been marked "Terminating" and starts the Pod shutdown process.
Also at the same time as the third step, the endpoints controller notices that the Pod is being shut down and removes it from the endpoints lists of all Services that match it.
If a preStop hook handler is defined for the Pod, it starts running synchronously when the Pod is marked "Terminating"; if preStop has not finished when the grace period ends, the second step is re-executed with a small additional grace period of 2 seconds.
The containers in the Pod receive the TERM signal.
When the grace period expires, any processes still running in the Pod receive the SIGKILL signal.
The kubelet asks the API Server to set the Pod's grace period to 0 (immediate deletion), which removes the Pod object and completes the delete operation.
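As mentioned at the start of this list, the deletion can be issued by kubectl or by any API client. The sketch below uses the official Python client with an explicit grace period; the Pod name and namespace are illustrative assumptions.
```python
# Sketch: delete a Pod with an explicit grace period through the API Server.
# Assumes the official `kubernetes` Python client; 30 seconds is also the default.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

v1.delete_namespaced_pod(name="demo-pod", namespace="default", grace_period_seconds=30)
# The Pod is marked Terminating, its preStop hook (if any) runs, its containers receive
# TERM, and anything still running when the grace period expires receives SIGKILL.
```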
In addition, the kubelet embeds cAdvisor, which collects information such as container CPU, memory, file-system, and network usage and can be combined with Prometheus to monitor the Pods in the cluster.
Finally, beyond the interaction of the three components above during Pod creation, the controller-manager works to keep Pods in the state the user desires (that is, to keep them alive), and kube-proxy handles communication between Pods in the cluster.
This article was originally published by Boyun Research Institute. Please indicate the source when reproducing it.