What is the core principle of kubernetes?


This article introduces the core principles of Kubernetes. The material is detailed but easy to follow, and should make a useful reference. Let's take a look.

Core principle: the controller pattern

The control loop consists of three logical components: the controller, the sensor, and the system being controlled. Events are triggered by modifying fields in an object's spec; the controller then compares the current state with the desired state and drives the system toward it.

The sensor consists of three components: reflector, informer, and indexer.

The reflector fetches resource data by issuing list and watch requests to the apiserver:

List performs a full resynchronization of resources when the controller restarts or a watch is interrupted.

Watch performs incremental updates between lists.

After fetching resource data, the reflector appends a delta record, containing the resource object itself and the event type, to the delta queue. The queue guarantees at most one record per object, which avoids duplicate records between the reflector's list and watch paths.

The informer component continuously pops delta records from the delta queue. It hands each resource object to the registered resource callback, and at the same time passes it to the indexer. By default the indexer stores the object in a cache indexed by namespace, and this cache can be shared by the multiple controllers inside controller-manager.

The controller part of the control loop consists mainly of event handler functions and workers.

The event handler functions listen for add, update, and delete events on resources, and decide, according to the controller's logic, whether an event needs handling.

For events that do need handling, the namespace and name of the associated resource are placed into a work queue, to be processed by a worker from the worker pool.

The work queue deduplicates stored events, so multiple workers never process the same resource at the same time.

When handling a resource object, a worker usually re-fetches the latest data by name, then uses it to create or update resource objects, or to call other external services.

If a worker fails, the resource event is put back onto the work queue so it can be retried later.

Controller pattern summary:

Driven by the declarative API: users simply modify k8s resource objects (see the sketch after this list).

The controller asynchronously steers the system toward the declared final state.

This makes automated, unattended operation of the system possible.

Custom resources and controllers (operators) make the system easy to extend.
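To make the declarative model concrete, here is a minimal sketch of a desired-state record (the name, image, and replica count are illustrative): the user declares only spec, and the Deployment controller keeps reconciling the observed state of the cluster toward it.

```yaml
# Desired state only, no imperative steps: the controller watches this
# object and keeps driving the cluster toward spec.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3                # desired state: 3 pods
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9   # editing this field is the "event" that triggers reconciliation
```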

Deployment: the controller that manages application releases

It defines the expected number of pods for a set, and the controller keeps the actual pod count in line with that expectation.

Given a configured rollout strategy, the controller updates pods according to that strategy, keeping the number of unavailable pods during the update within the allowed limit.

If a release goes wrong, "one-click" rollback is supported.

Updating the image

```shell
kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1
```

Quick rollback

```shell
kubectl rollout undo deployment/nginx-deployment
kubectl rollout undo deployment.v1.apps/nginx-deployment --to-revision=2
kubectl rollout history deployment.v1.apps/nginx-deployment
```

Management model

A Deployment manages only the different versions of its ReplicaSets, and each ReplicaSet manages the pod replica count. Each ReplicaSet corresponds to one version of the Deployment template, and all pods under one ReplicaSet are of the same version.

Deployment controller

checkPaused: checks whether the Deployment is paused, i.e. whether new rollouts are disabled.

The Deployment controller does the more complex work, including version management, and delegates maintaining the replica count of a specific version to the ReplicaSet.

ReplicaSet controller

Scale-out, release, rollback.

Spec fields explained

minReadySeconds: the minimum time a pod must be ready before it is considered available.

revisionHistoryLimit: the number of historical revisions (ReplicaSets) to retain; the default is 10.

paused: marks the Deployment as only maintaining replica counts, with no new rollouts.

progressDeadlineSeconds: the maximum time before the Deployment's status condition is judged failed (if the Deployment stays in processing longer than this, its status is considered failed).

Rollout strategy fields

maxUnavailable: the maximum number of pods that may be unavailable during a rolling update.

maxSurge: the maximum number of pods allowed above the desired replica count during a rolling update.

Why can't maxUnavailable and maxSurge both be 0?

When maxSurge=0, an old pod must be deleted before a new one is created, because creating the new pod first would push the ReplicaSet above the desired count.

maxSurge=0 guarantees no extra pods are created, and maxUnavailable=0 guarantees no pod becomes unavailable; if both were 0, the rollout could make no progress at all.
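As a sketch, the fields above map onto a Deployment spec roughly as follows (all values are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 4
  minReadySeconds: 10            # a pod must be ready 10s before it counts as available
  revisionHistoryLimit: 10       # keep at most 10 old ReplicaSets (default)
  progressDeadlineSeconds: 600   # mark the rollout failed if no progress for 600s
  paused: false                  # true = maintain replicas only, no new rollouts
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1          # at most 1 pod below the desired count during the update
      maxSurge: 1                # at most 1 pod above the desired count during the update
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.9.1
```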

Job: the controller for managing tasks, running task processes through pods

Creates one or more pods and ensures that a specified number of them complete successfully.

Tracks pod status and retries failed pods promptly, according to the configuration.

Handles dependencies, ensuring the previous task finishes before the next one runs.

Controls task parallelism, keeping the pod queue size in line with the configuration.

Running Jobs in parallel

completions: the number of times this pod queue must run to completion; 8 means the task is executed 8 times.

parallelism: the number of parallel executions; 2 means two pods run at the same time.
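A sketch of a Job using both fields (the name and the pi-computing command are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: paral-job              # illustrative name
spec:
  completions: 8               # the task must complete successfully 8 times in total
  parallelism: 2               # at most 2 pods run at the same time
  backoffLimit: 4              # retry failed pods up to 4 times
  template:
    spec:
      restartPolicy: Never     # Job pods must use Never or OnFailure
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(100)"]
```

Raising parallelism only changes how many of the 8 completions run concurrently, not the total.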

CronJob syntax

schedule: uses the same time format as crontab.

startingDeadlineSeconds: the deadline, in seconds, for starting the job if it misses its scheduled time.

concurrencyPolicy: whether concurrent runs are allowed (Allow, Forbid, or Replace).

successfulJobsHistoryLimit: the number of completed jobs to keep in history.
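A sketch putting these fields together (the schedule, name, and command are illustrative):

```yaml
apiVersion: batch/v1                 # batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: hello-cron                   # illustrative name
spec:
  schedule: "*/1 * * * *"            # crontab format: here, every minute
  startingDeadlineSeconds: 200       # skip the run if it cannot start within 200s of schedule
  concurrencyPolicy: Forbid          # Allow / Forbid / Replace
  successfulJobsHistoryLimit: 3      # keep the 3 most recent successful jobs
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: hello
            image: busybox
            command: ["sh", "-c", "date; echo hello"]
```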

Management model

The Job controller creates pods based on the configuration.

The Job controller tracks job status and, according to the configuration, retries pods promptly or continues creating them.

The Job controller automatically adds labels to track the corresponding pods, and creates pods in parallel or serially based on the configuration.

Job controller

A Job is itself managed by a controller that watches kube-apiserver. Every time we submit a Job yaml, it is persisted to etcd through the apiserver; the Job controller has registered several handlers, and each add/update/delete operation is delivered to it through a queue. The Job controller then checks the currently active pods,...

DaemonSet: the daemon controller, making every node in the cluster run the same pod

Ensures that every node in the cluster (or a selected subset) runs the same set of pods.

Tracks cluster node status, so newly joined nodes automatically get the corresponding pods created.

Tracks cluster node status, so removed nodes have their corresponding pods deleted.

Tracks pod status to ensure the pod on each node keeps running.

Update strategy

RollingUpdate: the default DaemonSet update strategy. After the DaemonSet template is updated, old pods are deleted first and then new pods are created; rolling updates can be combined with health checks.

OnDelete: after the DaemonSet template is updated, a node's pod is only replaced once the corresponding old pod has been deleted manually.
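A sketch of a DaemonSet with an explicit update strategy (the node-exporter workload and names are illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter              # illustrative name
spec:
  updateStrategy:
    type: RollingUpdate            # default; OnDelete is the other option
    rollingUpdate:
      maxUnavailable: 1            # update at most one node's pod at a time
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter
```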

Management model

The DaemonSet controller creates pods based on the configuration.

The DaemonSet controller tracks pod status and, according to the configuration, retries or continues creating pods.

The DaemonSet controller automatically adds affinity and labels to track the corresponding pods, and creates a pod on every node, or an appropriate subset of nodes, according to the configuration.

DaemonSet controller

Application configuration management

Pod configuration management includes:

Variable configuration (configmap)

Sensitive information (secret)

Identity authentication (serviceaccount)

Resource configuration (spec.containers[].resources.limits/requests)

Security control (spec.containers[].securityContext)

Pre-checks (spec.initContainers)

Configmap

It mainly manages the variable configuration a container needs at runtime: configuration files, environment variables, command-line parameters, and so on. It decouples container images from variable configuration, keeping workloads (pods) portable.

```shell
kubectl create configmap kube-flannel-cfg --from-file=configure-pod-container/configmap/cni-conf.json -n kube-system
kubectl create configmap special-config --from-literal=special.how=very --from-literal=special.type=charm
```

Tips:

ConfigMap file size limit: 1MB (an etcd constraint).

A pod can only reference configmaps in its own namespace.

A pod cannot be created successfully if a configmap it references does not exist; the configmap must be created before the pod.

When using envFrom to populate environment variables from a configmap, keys that are invalid as environment variable names (for example, a key containing a digit in the wrong place) are simply not injected into the container, but the pod is still created normally.

Only pods created through the k8s API can use configmaps; pods created by other means (such as static pods created from manifest files) cannot.
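A sketch of a pod consuming the special-config configmap created above, both via envFrom and as a mounted volume (the pod name and command are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cm-demo                    # illustrative name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "env && cat /etc/config/special.how"]
    envFrom:
    - configMapRef:
        name: special-config       # keys invalid as env names are silently skipped
    volumeMounts:
    - name: config-vol
      mountPath: /etc/config       # each key becomes a file under this path
  volumes:
  - name: config-vol
    configMap:
      name: special-config         # must already exist, or the pod will not start
  restartPolicy: Never
```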

Secret

A resource object for storing sensitive information such as passwords and tokens in the cluster. The data is saved base64-encoded, which is more standardized and marginally safer than keeping it in plaintext in a configmap.

```shell
kubectl create secret generic myregistrykey --from-file=.dockerconfigjson=/root/.docker/config.json --type=kubernetes.io/dockerconfigjson
kubectl create secret generic prod-db-secret --from-literal=username=produser --from-literal=password=Y4nys7f11
```

Tips:

Secret file size limit: 1MB.

Although secrets are base64-encoded, they can be trivially decoded back to the original data. Where real encryption is required, consider combining k8s with vault to handle encryption and access control for sensitive information.

Secret best practice: a list/watch typically returns all secrets in a namespace, so it is not recommended for obtaining secret information; prefer GET, which reduces the chance of exposing more secrets than necessary.
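A sketch of a pod consuming the secrets created above: prod-db-secret injected as an environment variable and myregistrykey used as an image-pull credential (the pod name and command are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo                    # illustrative name
spec:
  imagePullSecrets:
  - name: myregistrykey                # dockerconfigjson secret used to pull private images
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo $DB_USER"]
    env:
    - name: DB_USER
      valueFrom:
        secretKeyRef:
          name: prod-db-secret
          key: username                # base64-decoded automatically when injected
  restartPolicy: Never
```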

ServiceAccount

It mainly solves the problem of authenticating pods within the cluster. The authorization information used for authentication is held in a secret of type kubernetes.io/service-account-token, as mentioned above.

How it works:

When a pod is created, the admission controller mounts the secret corresponding to the specified serviceaccount (default if none is given) into a fixed directory in the container: /var/run/secrets/kubernetes.io/serviceaccount.

When the pod accesses the cluster, the token file in that secret is used by default to authenticate the pod.

The identity carried by the default token is:

Group: system:serviceaccounts:[namespace-name]

User: system:serviceaccount:[namespace-name]:[serviceaccount-name]
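A sketch of a pod using an explicit serviceaccount; the admission controller mounts the token into the fixed directory mentioned above (the pod and serviceaccount names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sa-demo                        # illustrative name
spec:
  serviceAccountName: build-robot      # illustrative; "default" is used if omitted
  containers:
  - name: app
    image: busybox
    # the token is auto-mounted at /var/run/secrets/kubernetes.io/serviceaccount
    command: ["sh", "-c", "cat /var/run/secrets/kubernetes.io/serviceaccount/token"]
  restartPolicy: Never
```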

Container resource configuration management

Application storage and persistent data volumes: pod volumes

If a container in a pod exits abnormally and kubelet restarts it, how do we ensure the important data it produced earlier is not lost?

How do multiple containers in the same pod share data? (See the emptyDir sketch after the volume type list below.)

K8s volume types:

1) Local storage: emptyDir / hostPath

2) Network storage:

in-tree: awsEBS / gcePersistentDisk / nfs...

out-of-tree: flexvolume / csi volume plugins

3) Projected volumes: secret / configmap / downwardAPI / serviceAccountToken

4) PVC and PV
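A sketch of the simplest local case: two containers in one pod sharing an emptyDir volume (names and commands are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-vol-demo            # illustrative name
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello > /data/msg && sleep 3600"]
    volumeMounts:
    - name: shared
      mountPath: /data
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 5 && cat /data/msg && sleep 3600"]
    volumeMounts:
    - name: shared
      mountPath: /data
  volumes:
  - name: shared
    emptyDir: {}                   # lives as long as the pod; survives container restarts
```

An emptyDir outlives individual container crashes but not the pod itself, which is exactly why the persistent volume machinery below exists.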

Persistent Volume

A volume declared inline in a pod shares the pod's lifecycle, which is inadequate in the following common scenarios:

Pod destruction and rebuild (e.g. image upgrades of pods managed by a Deployment)

Host failure and migration (e.g. statefulset-managed pods whose remote volumes must move with them)

Multiple pods sharing the same data volume

Extended data-volume features such as snapshot and resize

A PV can declare multiple access modes. When binding a PVC to a PV, the PV controller first finds the set of PVs whose AccessModes lists are the shortest ones that still match the PVC's access modes, and then picks from that set the smallest-capacity PV that satisfies the PVC's size request.

PVC design intent

Separation of responsibilities: the PVC only declares the storage size and access mode it needs (exclusive to a single node or shared across nodes? read-only or read-write?). The PV and its backing storage details are operated centrally by the cluster administrator, which makes security and access policy easier to control.

The PVC simplifies the user's expression of storage demand, while the PV is the actual carrier of storage information. The PersistentVolumeController in kube-controller-manager binds each PVC to a suitable PV to satisfy the user's actual storage need.

A PVC is like an abstract interface in object-oriented programming, and a PV is a concrete implementation of that interface.

Static Volume Provisioning & Dynamic Volume Provisioning

The former requires cluster administrators to plan or predict storage requirements in advance; the latter lets them define PV templates as storageclasses. Users need not care about PV details: k8s combines the PVC with its storageclass to create PV objects dynamically.

storageClassName:

A PVC can find a PV with the same value of this field (static provisioning).

Alternatively, the storageclass named by this field can dynamically provision a new PV object.
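A sketch of dynamic provisioning: the PVC names a StorageClass, and a matching PV is created from the class's template (the provisioner, names, and sizes are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-disk                      # illustrative name
provisioner: kubernetes.io/aws-ebs     # illustrative in-tree provisioner
parameters:
  type: gp2
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim                     # illustrative name
spec:
  storageClassName: fast-disk          # names the class above; triggers dynamic provisioning
  accessModes:
  - ReadWriteOnce                      # single-node read-write
  resources:
    requests:
      storage: 10Gi                    # the provisioned PV must be at least this large
```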

PV state transitions

Note: a PV that has reached the Released state cannot return to Available and bind a new PVC, regardless of its reclaim policy.

At that point, if you want to reuse the data in the storage behind the old PV, there are only two ways:

Create a new PV object that reuses the storage information recorded in the old PV.

Keep reusing it through the PVC, i.e. never unbind the PVC from the PV. (This is how statefulset handles storage state.)

The complete PV & PVC workflow in detail

The user creates a PVC object. csi-provisioner watches it and, combining the PVC with the storageclass it declares, calls the csi plugin over grpc; the plugin requests the cloud storage service and actually provisions the PV's storage.

The PV controller then binds the PVC to the provisioned PV, and the PV is ready for use.

When the user creates a pod and kube-scheduler assigns it to a node, csi-node-server mounts the provisioned PV onto a path the pod can use, and then the container is created and started.

There are three stages: create, attach, and mount.

Storage topology scheduling

The essential problem

At PV binding or dynamic-provisioning time, we do not yet know which node the pod using the PV will be scheduled to, yet access to the PV itself may be restricted to certain node "locations" (topology).

Process improvement

Delay the binding/dynamic provisioning of the PV until after the pod's scheduling result is determined. Benefits:

For pre-provisioned PV objects with node affinity, once the pod's node is confirmed, a suitable PV can be found for that node and bound to the pod's PVC, guaranteeing that the node the pod runs on satisfies the PV's "location" requirement.

For dynamically provisioned PVs, once the pod's node is confirmed, a PV can be created using that node's location information, guaranteeing the node can access it.

Improvements to related K8s components

PV controller: supports delayed binding.

Dynamic PV provisioner: when creating a PV dynamically, incorporates the "location" information of the node the pod will run on.

Scheduler: considers the pod's PVC binding requirements when choosing a node, i.e. the node affinity of pre-provisioned PVs, and the storageclass.allowedTopologies restrictions for dynamic provisioning.

When the PVC object is created, because its StorageClass's volumeBindingMode is WaitForFirstConsumer, no PV is generated immediately; instead, provisioning waits for the scheduling result of the first pod that uses the PVC, and kube-scheduler selects a node satisfying the topology restrictions in storageclass.allowedTopologies when scheduling that pod.
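A sketch of a topology-aware StorageClass with delayed binding (the provisioner, zone key, and values are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topo-aware                       # illustrative name
provisioner: kubernetes.io/aws-ebs       # illustrative
volumeBindingMode: WaitForFirstConsumer  # delay PV creation until a consuming pod is scheduled
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone     # the scheduler must pick a node within these zones
    values:
    - us-east-1a
    - us-east-1b
```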

How K8s handles volume topology-aware scheduling

K8s storage architecture and plugins: the volume mount procedure

1) The user creates a pod containing a PVC (using a dynamic storage volume).

2) The PV controller sees the PVC waiting to be bound, calls the volume plugin (in-tree or out-of-tree) to create the storage volume and a PV object, then binds the new PV to the PVC.

3) The scheduler assigns the pod to a node, based on the pod configuration, node status, PV configuration, and other information.

4) The AD controller sees that the pod and PVC are waiting for attachment, and calls the volume plugin (in-tree or out-of-tree) to attach the device to the target node (e.g. /dev/vdb).

5) On the node, kubelet (volume manager) waits for the device to be attached, then uses the volume plugin to mount it at the designated directory: /var/lib/kubelet/pods/646154cf-xxx-xxx-xxx/volumes/alicloud~disk/pv-disk

6) Once told the mount directory is ready, kubelet starts the pod's containers and maps the locally mounted volume into the container via docker -v (a bind mount).

K8s storage architecture

PV Controller introduction

Main concepts:

PersistentVolume: a persistent storage volume that defines in detail the parameters of pre-mounted storage space. It is not namespaced, and is generally created and maintained by the cluster administrator.

PersistentVolumeClaim: a persistent storage volume claim, the user-facing interface to storage; the user need not be aware of the storage details. It belongs to a namespace.

StorageClass: a storage class, i.e. a template for creating PVs; storage volumes (both the real storage space and the PV object) are created according to the template the StorageClass defines.

Main tasks:

PV and PVC lifecycle management: creating and deleting PV objects, and handling state transitions of PVs and PVCs.

Binding PVC and PV objects: a PVC must be bound to a PV before it can be used; the PV controller performs bind and unbind operations according to the binding constraints and object states.

PV Controller implementation

ClaimWorker:

Implements the state transitions of PVCs.

Uses the system label pv.kubernetes.io/bind-completed to identify whether a PVC is bound.

When a PVC is unbound, the PV filtering logic is triggered: bind if a suitable PV is found, otherwise provision one (or wait, if the provisioner is not in-tree).

VolumeWorker:

Implements the state transitions of PVs.

Uses ClaimRef to determine whether a PV is bound or released; when a PV's state is Released and its reclaim policy is Delete, the deletion logic is executed.

AD Controller introduction

The AD controller is responsible for attaching and detaching data volumes to and from specific nodes.

Core objects: DesiredStateOfWorld, ActualStateOfWorld

Core logic: Reconciler, DesiredStateOfWorldPopulator

DesiredStateOfWorld: the expected attach state of data volumes in the cluster.

ActualStateOfWorld: the actual attach state of data volumes in the cluster.

DesiredStateOfWorldPopulator: updates DSW and ASW data based on volume attach status.

Reconciler: polls the DSW and ASW objects and performs attach and detach operations based on their states.

Volume Manager introduction

The volume manager is a manager inside kubelet that invokes the attach/detach/mount/unmount operations for volumes on its node.

It scans the pod states on the node in real time and, by calling volume plugins, performs the needed operations on volumes that must be updated.

Depending on the storage type, it also performs some common operations, such as formatting block devices and mounting block devices to a shared directory.

Data structures:

VolumePluginMgr: manages the list of in-tree and out-of-tree plugins on the node.

DesiredStateOfWorld: records the expected mount state of data volumes on the node.

ActualStateOfWorld: records the actual mount state of data volumes on the node.

Core logic:

DesiredStateOfWorldPopulator: synchronizes the state of pods that contain data volumes.

Reconciler: executes attach/detach/mount/unmount in a loop; the concrete operations are performed by calling the interfaces implemented by the volume plugins.

Does the AD controller or the volume manager perform attach/detach operations?

This is controlled by the kubelet flag --enable-controller-attach-detach.

Volume Plugins introduction

Volume plugins provide the concrete implementations of Provision, Delete, Attach, Detach, Mount, and Unmount for different storage types; they are the functional abstraction layer over the many concrete storage backends.
