Preface
In any virtualization solution, data persistence is a major concern; Docker is a typical example, and Kubernetes (K8s) is no exception. In K8s this is addressed by the concept of data volumes.
K8s data volumes mainly solve the following two problems:
- Data persistence: the files a running container writes to its own file system are temporary. When the container crashes, kubelet kills it and starts a new one, and the new container has none of the files from the old one, because it is re-created from the image.
- Data sharing: containers running in the same pod often need to share files or directories.
In K8s, a Volume (data volume) has a well-defined life cycle, the same as that of the pod (container group) that contains it. Consequently a Volume outlives any individual container in that pod, and its data is retained no matter how many times a container restarts. Of course, once the pod no longer exists, the data volume goes away as well. At that point, depending on the type of data volume the pod used, the data may be deleted along with the volume, or it may be truly persisted and remain usable the next time the pod is started.
Basically, a data volume is just a directory or file that a pod can access. Where this directory comes from depends on the type of volume (different volume types use different storage media). Two containers in the same pod can mount the same data volume at different paths.
I. Data volume types
K8s currently supports 28 volume types (most of which are specific to cloud environments). Here we cover several of the volume types most commonly used in K8s.
1. emptyDir
A data volume of type emptyDir is assigned to the pod when it is created and is not released until the pod is removed. When the data volume is initially allocated, it is always an empty directory. Different containers in the same pod can read and write to the directory and share the data in it (although different containers may mount the data volume to different paths in the container). When the pod is deleted, the data in the emptyDir data volume is permanently deleted. (PS: when the container crashes, kubelet does not delete the pod, but simply restarts the container, so the data in emptyDir still exists after the container crashes and restarts.)
The usage scenarios of emptyDir are as follows:
- Blank scratch space, for example for a merge/sort algorithm that temporarily spills data to disk.
- Storing checkpoints (intermediate results) during a long computation, so that if the container crashes it can resume from the last saved checkpoint instead of starting from scratch.
- Shared storage between two containers: a content-management container writes generated pages into the volume, while a webserver container serves those pages to the outside.

By default, emptyDir data volumes are stored on whatever storage medium backs the node (mechanical hard disk, SSD, or network storage).

Example of using emptyDir

[root@master ~]# vim emtydir.yaml    # the pod's yaml file is as follows:
apiVersion: v1
kind: Pod
metadata:
  name: read-write
spec:
  containers:
  - name: write                      # define a container named write
    image: busybox
    volumeMounts:
    - mountPath: /write              # for an emptyDir volume, this is the path inside the container
      name: share-volume             # the name of the volume defined below
    args:                            # after the container starts, write a file and keep running
    - /bin/sh
    - -c
    - echo "emtydir test" > /write/hello; sleep 30000
  - name: read                       # define a container named read
    image: busybox
    volumeMounts:
    - mountPath: /read
      name: share-volume             # the same volume, mounted at a different path
    args:                            # after the container starts, read the file and keep running
    - /bin/sh
    - -c
    - cat /read/hello; sleep 30000
  volumes:                           # declare the volumes mounted above
  - name: share-volume               # must match the name used under volumeMounts
    emptyDir: {}                     # an emptyDir volume: an empty directory on the node

[root@master ~]# kubectl apply -f emtydir.yaml                   # apply the yaml file
[root@master ~]# kubectl exec -it read-write -c write /bin/sh    # enter the first container (write)
/ # cat /write/hello                                             # confirm the command from the yaml file took effect
emtydir test
[root@master ~]# kubectl exec -it read-write -c read /bin/sh     # enter the second container (read)
/ # cat /read/hello                                              # the mounted directory has the same content as in the write container
emtydir test
# So far we can at least confirm that the two containers mount the same local directory and the file contents are identical.
# So which local directory on the node is actually mounted?
[root@master ~]# kubectl get pod -o wide    # first check which node the pod is running on
# In my case it is running on node01, so the following commands are executed on node01.

[root@node01 ~]# docker ps                  # list the running containers and note their ID numbers
CONTAINER ID    IMAGE       # output partially omitted
6186a08c6d5f    busybox
5f19986f0879    busybox
[root@node01 ~]# docker inspect 6186a08c6d5f    # view the details of the first container
        "Mounts": [                             # locate the Mounts field
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/86b67ff4-9ca0-4f40-86d8-6778cfe949ec/volumes/kubernetes.io~empty-dir/share-volume",    # the local directory on the node
                "Destination": "/read",
                "Mode": "Z",
                "RW": true,
                "Propagation": "rprivate"
[root@node01 ~]# docker inspect 5f19986f0879    # view the details of the second container
        "Mounts": [                             # again locate the Mounts field
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/86b67ff4-9ca0-4f40-86d8-6778cfe949ec/volumes/kubernetes.io~empty-dir/share-volume",    # the same local directory as the first container
                "Destination": "/write",
                "Mode": "Z",
                "RW": true,
                "Propagation": "rprivate"
            },
# So far it is confirmed that the mount points of the two containers share the same local directory.
[root@node01 ~]# cat /var/lib/kubelet/pods/86b67ff4-9ca0-4f40-86d8-6778cfe949ec/volumes/kubernetes.io~empty-dir/share-volume/hello    # view the file locally; it matches what was written in the pod
emtydir test
At this point the characteristics of emptyDir are clear: as long as the pod still exists, the data on the node is not lost even if a container is destroyed; but once the pod is deleted, the local data no longer exists.
Verify as follows:
Delete one of the pod's containers on node01 and view the local directory again:
[root@node01 ~]# docker rm -f 6186a08c6d5f    # forcibly delete one of the pod's containers
6186a08c6d5f
[root@node01 ~]# cat /var/lib/kubelet/pods/86b67ff4-9ca0-4f40-86d8-6778cfe949ec/volumes/kubernetes.io~empty-dir/share-volume/hello    # check the local directory; the file is still there
emtydir test
Delete this pod on master and go to the node01 node again to see if the local directory exists:
# Delete the pod from master
[root@master ~]# kubectl delete -f emtydir.yaml
# Check the local directory again on node01; it now reports that the file no longer exists
[root@node01 ~]# cat /var/lib/kubelet/pods/86b67ff4-9ca0-4f40-86d8-6778cfe949ec/volumes/kubernetes.io~empty-dir/share-volume/hello
cat: /var/lib/kubelet/pods/86b67ff4-9ca0-4f40-86d8-6778cfe949ec/volumes/kubernetes.io~empty-dir/share-volume/hello: No such file or directory
emptyDir summary:
Different containers in the same pod share the same persistent directory. When the pod is deleted, the contents of the volume are deleted with it; but if only a container is destroyed while the pod remains, the volume is unaffected. In short, the life cycle of an emptyDir volume is the same as that of the pod that uses it, so it is generally used as temporary storage. (A memory-backed variant is sketched below.)
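As a side note not covered by the walkthrough above: an emptyDir volume can also be backed by the node's memory (tmpfs) instead of its disk, and a size limit can be set. The pod name below is hypothetical; this is only a minimal sketch of those two standard fields.

apiVersion: v1
kind: Pod
metadata:
  name: emptydir-memory-demo    # hypothetical name, for illustration only
spec:
  containers:
  - name: app
    image: busybox
    args:
    - /bin/sh
    - -c
    - sleep 30000
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir:
      medium: Memory            # back the volume with RAM (tmpfs) instead of the node's disk
      sizeLimit: 64Mi           # cap how much the volume may grow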
2. hostPath data volume type
A data volume of type hostPath mounts a file or directory from the file system of the node where the pod (container group) runs into the container, similar to a bind mount in Docker.
This approach to data persistence is rarely used, because it couples pods to specific nodes.
Most container groups do not need to use hostPath data volumes, but in a few cases, hostPath data volumes are useful:
Applicable scenarios are as follows:
- A container that needs to access Docker can use hostPath to mount the host node's /var/lib/docker.
- Running cAdvisor in a container requires a hostPath mount of the host node's /sys.
In a nutshell, this approach is generally reserved for data belonging to the K8s cluster itself or to Docker itself.
Since it is rarely used, there is no full walkthrough here; a minimal sketch of the syntax is shown below.
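For reference only, here is a minimal sketch of the hostPath syntax for the cAdvisor-style scenario mentioned above; the pod name and the mount path inside the container are hypothetical.

apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo             # hypothetical name, for illustration only
spec:
  containers:
  - name: monitor
    image: busybox
    args:
    - /bin/sh
    - -c
    - sleep 30000
    volumeMounts:
    - mountPath: /host-sys        # where the host directory appears inside the container
      name: sys-volume
  volumes:
  - name: sys-volume
    hostPath:
      path: /sys                  # a directory on the node's file system
      type: Directory             # require that the path already exists and is a directory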
3. Persistent data volume type
PersistentVolume (PV, storage volume) is a piece of storage space in the cluster, created and managed by the cluster administrator or provisioned automatically through a StorageClass. Like pod, deployment and Service, a PV is a resource object.
Alongside PV comes the concept of PVC (PersistentVolumeClaim). A PVC represents a user's request for storage, that is, a claim on a PV's persistent space. A K8s cluster may contain many PVs, and new PVs constantly need to be created for different applications.
By analogy: a pod consumes the compute resources of a node, while a PVC consumes the storage resources of a PV. A pod can request a specific amount of compute resources (CPU, memory, and so on), while a PVC requests storage of a specific size and access mode (read-write by a single node / read-only by many nodes / read-write by many nodes), as the sketch below illustrates.
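To make the analogy concrete, here is a minimal sketch (not part of the original example; the names and sizes are illustrative only) showing that a container's compute request and a PVC's storage request use very similar syntax.

# A container requests compute resources from a node:
apiVersion: v1
kind: Pod
metadata:
  name: compute-demo            # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    args:
    - /bin/sh
    - -c
    - sleep 30000
    resources:
      requests:
        cpu: 250m               # request a quarter of a CPU core
        memory: 128Mi           # request 128 MiB of memory
---
# A PVC requests storage resources from a PV in much the same way:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: storage-demo            # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce             # read-write by a single node
  resources:
    requests:
      storage: 1Gi              # request 1 GiB of storage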
The relationship between PV and PVC
The relationship between PV (storage volume) and PVC (storage volume claim) can be summarized as follows:
- A PV is a storage resource in the cluster, usually created and managed by the cluster administrator.
- A StorageClass is used to classify PVs; if configured correctly, a StorageClass can also create PVs dynamically in response to a PVC's request (a minimal sketch of this follows the list).
- A PVC is a request to use such a resource, usually made by an application; it specifies the desired StorageClass and the required amount of space. A PVC can be mounted into a pod as one of its data volumes.
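As a minimal sketch of the dynamic provisioning mentioned in the list above (it is not used in the NFS example that follows): a StorageClass names a provisioner, and a PVC that references the class can have a matching PV created automatically. The class name is hypothetical, and the provisioner value is a placeholder that depends on what is actually deployed in the cluster.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage                            # hypothetical name
provisioner: example.com/external-provisioner   # placeholder; use the provisioner deployed in your cluster
reclaimPolicy: Delete                           # dynamically created PVs are deleted when their PVC is deleted
volumeBindingMode: Immediate                    # bind and provision as soon as the PVC is created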
Management process of a storage volume claim (PVC)
The management process for PV and PVC is as follows:
1. Set aside a separate directory on the host for the PV to use, and define its available size.
2. Create the PVC resource object to request the PV's storage space.
3. Add a data volume to the pod and associate the data volume with the PVC.
4. The containers in the pod mount the data volume.
All of this may still seem abstract, so the following is a worked example, for reference only.
The approximate process of the case is as follows:
The underlying storage is NFS. A PV with 1Gi of capacity is carved out under an NFS directory; a PVC is then created to claim the PV's storage space; finally, a test pod that uses the storage claimed by the PVC is created to persist data.
1) Build NFS storage
For ease of operation, I build nfs storage directly on master.
[root@master ~]# yum -y install nfs-utils
[root@master ~]# systemctl enable rpcbind
[root@master ~]# vim /etc/exports
/nfsdata    *(rw,sync,no_root_squash)
[root@master ~]# systemctl start nfs-server
[root@master ~]# systemctl enable nfs-server
[root@master ~]# showmount -e
Export list for master:
/nfsdata *

2) Create a PV resource object

[root@master ~]# vim test-pv.yaml    # edit the PV's yaml file
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-pv
spec:
  capacity:
    storage: 1Gi                            # this PV can provide 1Gi of space
  accessModes:
    - ReadWriteOnce                         # can only be mounted read-write by a single node
  persistentVolumeReclaimPolicy: Recycle    # the reclaim policy is Recycle
  storageClassName: nfs                     # define the storage class name
  nfs:
    path: /nfsdata/test-pv                  # the directory on the nfs server
    server: 192.168.20.6                    # the IP of the nfs server

# Notes on the fields above:
# capacity: the size of the PV.
# accessModes: the access mode.
#   ReadWriteOnce: can be mounted read-write by a single node only (which also means it can be claimed by a single PVC).
#   ReadOnlyMany:  can be mounted read-only by many nodes.
#   ReadWriteMany: can be mounted read-write by many nodes.
# persistentVolumeReclaimPolicy: the PV reclaim policy.
#   Recycle: clear the data in the PV, then automatically make it available again.
#   Retain:  manual reclamation is required.
#   Delete:  delete the cloud storage resource (cloud storage only).
#   PS: the reclaim policy determines whether the source files stored under the PV are deleted once the PV is released.
# storageClassName: the basis for associating the PV with a PVC.

[root@master ~]# kubectl apply -f test-pv.yaml    # apply the yaml file
[root@master ~]# kubectl get pv test-pv           # PV is a resource object, so its status can be viewed like any other
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
test-pv   1Gi        RWO            Recycle          Available           nfs                     38s
# The PV's status must be Available before it can be used normally.

3) Create a PVC resource object

[root@master ~]# vim test-pvc.yaml    # write the yaml file
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:              # the access mode must be consistent with the one defined by the PV
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi          # request the PV's full capacity
  storageClassName: nfs     # the name must be consistent with the one defined by the PV

[root@master ~]# kubectl apply -f test-pvc.yaml    # apply the yaml file
# Check the status of the PV and PVC again (a status of Bound means the PV is now in use)
[root@master ~]# kubectl get pvc    # view the status of the PVC
NAME       STATUS   VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test-pvc   Bound    test-pv   1Gi        RWO            nfs            2m10s
[root@master ~]# kubectl get pv     # view the status of the PV
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
test-pv   1Gi        RWO            Recycle          Bound    default/test-pvc   nfs                     8m24s

4) Create a pod
The pod created here uses the PVC just created to persist its data.
[root@master ~]# vim test-pod.yaml    # write the pod's yaml file
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: test-pod
    image: busybox
    args:
    - /bin/sh
    - -c
    - sleep 30000
    volumeMounts:
    - mountPath: /testdata
      name: volumedata               # a custom volume name
  volumes:
  - name: volumedata                 # must match the name used under volumeMounts above
    persistentVolumeClaim:
      claimName: test-pvc

[root@master ~]# kubectl apply -f test-pod.yaml    # apply the yaml file
[root@master ~]# kubectl get pod    # check the pod's status; it is stuck in ContainerCreating. What is going on?
NAME       READY   STATUS              RESTARTS   AGE
test-pod   0/1     ContainerCreating   0          23s
# When a pod is in an abnormal state there are generally three ways to debug it:
# 1. use kubectl describe to view the pod's details;
# 2. use kubectl logs to view the pod's logs;
# 3. view the messages log on the host.
# Here the first method is used.
[root@master ~]# kubectl describe pod test-pod
# The last part of the output is as follows:
mount.nfs: mounting 192.168.20.6:/nfsdata/test-pv failed, reason given by server: No such file or directory
# It turns out that the directory specified when mounting the nfs storage does not exist,
# so create the corresponding directory on the nfs server (here, the local machine).
[root@master ~]# mkdir -p /nfsdata/test-pv        # create the corresponding directory
[root@master ~]# kubectl get pod test-pod         # then check the pod's status again
# If the pod is still being created, the kubelet on the node running the pod has not reacted yet.
# If you want the pod to start faster, you can manually restart the kubelet on that node.
[root@master ~]# kubectl get pod test-pod         # wait a moment and check again; the pod is now Running
NAME       READY   STATUS    RESTARTS   AGE
test-pod   1/1     Running   0          8m

5) Test the effect of data persistence

[root@master ~]# kubectl exec -it test-pod /bin/sh        # enter the pod
/ # echo "test pv pvc" > /testdata/test.txt               # write test data into the persistent directory
# Back on the nfs server, check whether the data written in the container appears under the shared directory
[root@master ~]# cat /nfsdata/test-pv/test.txt            # it is there
test pv pvc
# Now find the node the pod's container is running on, go to that node and delete the container
[root@master ~]# kubectl get pod -o wide                  # in my case it is running on node02
NAME       READY   STATUS    RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
test-pod   1/1     Running   0          11m   10.244.2.2   node02
# On node02, look up the ID of the pod's container, then delete it
[root@node02 ~]# docker ps                                # get the container's ID
[root@node02 ~]# docker rm -f dd445dce9530                # delete the newly created container
# Back on the nfs server, the data under its local directory is still there
[root@master ~]# cat /nfsdata/test-pv/test.txt
test pv pvc
# So now, if this pod is deleted, is the local data on nfs still there?
[root@master ~]# kubectl delete -f test-pod.yaml
[root@master ~]# cat /nfsdata/test-pv/test.txt            # the data is still there
test pv pvc
# And what if the PVC is deleted?
[root@master ~]# kubectl delete -f test-pvc.yaml
[root@master ~]# cat /nfsdata/test-pv/test.txt            # now the data is gone
cat: /nfsdata/test-pv/test.txt: No such file or directory
Summary: the reclaim policy chosen when creating the PV resource object was Recycle (clear the data in the PV, then automatically make it available again), and the PV is claimed by the PVC. Deleting a container, or even the whole pod, therefore does not affect the data in the NFS directory used for persistence. Once the PVC is deleted, however, the local data is destroyed along with it. In other words, when data persistence is achieved through a data volume such as this PV, the life cycle of the persisted data is the same as the life cycle of the PVC. (A minimal sketch using the Retain policy instead is shown below.)
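If the data needs to survive the deletion of the PVC, the PV's reclaim policy can be set to Retain instead of Recycle. The following is only a minimal sketch: it reuses the fields of the test-pv.yaml above with a hypothetical name, and is not part of the original walkthrough.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-pv-retain                    # hypothetical name, for illustration only
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the data; the PV must be reclaimed manually
  storageClassName: nfs
  nfs:
    path: /nfsdata/test-pv
    server: 192.168.20.6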