Author | Alibaba Senior R&D Engineer
First, introduction to Volumes
Pod Volumes
First, let's take a look at the usage scenarios of Pod Volumes:
Scenario 1: if a container in a pod exits abnormally at runtime and is restarted by kubelet, how do we ensure that important data generated by the container is not lost?
Scenario 2: how can multiple containers in the same pod share data?
Both scenarios can be solved well with Volumes. Let's first look at the common types of Pod Volumes:
Local storage: the commonly used types are emptyDir and hostPath.
Network storage: there are currently two implementation approaches. The first is in-tree, whose code lives in the K8s code repository; as K8s supports more and more storage types, this approach puts a heavy maintenance and development burden on K8s itself. The second is out-of-tree, which decouples storage drivers from K8s by stripping their implementations out of the K8s code repository behind abstract interfaces; out-of-tree is the network storage plugin approach the community now mainly promotes.
Projected Volumes: these mount configuration data, such as a secret or configmap, into the container as a volume, so programs in the container can read the configuration through the POSIX file interface (see the sketch below).
PV and PVC: these are today's focus.
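As a concrete example of mounting configuration as a volume, here is a minimal sketch (the pod name, image, and ConfigMap name are all illustrative, and the ConfigMap app-settings is assumed to exist already):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: config
          mountPath: /etc/app-config   # keys appear as files here, readable via POSIX calls
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: app-settings             # assumed pre-existing ConfigMap
```

Each key in the ConfigMap shows up as a file under /etc/app-config inside the container.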
Persistent Volumes
Next, let's take a look at PV (Persistent Volumes). Why introduce PV when we already have Pod Volumes? We know that a volume declared in a pod shares the pod's lifecycle, which is a problem in several common scenarios:
Scenario 1: pod rebuild and replacement. For example, a pod managed by a Deployment is replaced during an image upgrade: a new pod is created and the old one deleted. How can data be reused between the new and old pods?
Scenario 2: when a host goes down, its pods must be migrated. A pod managed by a StatefulSet needs its volume to migrate with it, a semantics that obviously cannot be achieved through Pod Volumes.
Scenario 3: how do we declare data shared among multiple pods? We know that containers in the same pod can share data through Pod Volumes, but when multiple pods want to share data, Pod Volumes cannot express that semantics.
Scenario 4: what if we want to extend a data volume with functions such as snapshot and resize?
In the above scenarios, it is hard to accurately express reuse and sharing semantics through Pod Volumes, and hard to extend them. Therefore K8s introduces the concept of Persistent Volumes, which separates storage from compute, manages storage and compute resources through different components, and decouples the lifecycle of the pod from that of the volume. This way, when a pod is deleted, the PV it used still exists and can be reused by a newly created pod.
The design intent of PVC
Now that we know what PV is, how should it be used?
When users consume a PV, they actually do so through a PVC. Why design PVC when PV already exists? The main reason is to simplify how K8s users consume storage and to separate responsibilities: when using storage, a user only needs to declare the required storage size and access mode.
What is the access mode? It answers questions such as: can this storage be shared by multiple nodes, or only accessed exclusively by a single node (note that this is node level, not pod level)? Is access read-only or read-write? Users only need to care about these questions; the implementation details of the storage are not their concern.
Through the concepts of PVC and PV, user requirements are decoupled from implementation details: users declare their storage needs through a PVC, while PVs are operated and controlled uniformly by cluster administrators and storage teams, which simplifies how users consume storage. As you can see, the design of PV and PVC is a bit like the relationship between an object-oriented interface and its implementation: users only need to care about the interface and not its complex internal implementation details.
Since PV is centrally controlled by the cluster administrator, let's take a look at how the PV object is generated.
Static Volume Provisioning
The first mode: static provisioning (Static Volume Provisioning).
In static provisioning, the cluster administrator plans in advance how the cluster's users will consume storage and pre-allocates some of it, i.e., pre-creates a number of PVs. When a user submits a storage request (a PVC), internal K8s components bind the PVC to a matching PV. When the user's pod then consumes storage, it finds the corresponding PV through the PVC and uses it.
What are the shortcomings of static provisioning? First, it requires pre-allocation by the cluster administrator, and pre-allocation can hardly predict users' real needs. The simplest example: if a user needs 20G but the administrator only pre-allocated 80G and 100G volumes, the real need cannot be met exactly and resources are wasted. Is there a better way?
Dynamic Volume Provisioning
The second mode: dynamic provisioning (Dynamic Volume Provisioning).
What does dynamic provisioning mean? The cluster administrator no longer pre-allocates PVs; instead, he writes a template file that captures the parameters needed to create a certain type of storage (block storage, file storage, and so on). These parameters concern the storage implementation, not the user. Users only need to submit their own storage requirements, i.e., a PVC file, and specify in the PVC which storage template (StorageClass) to use.
The control components in the K8s cluster combine the information in the PVC and the StorageClass to dynamically generate the storage (the PV) that the user needs. Once the PVC and PV are bound, the pod can use the PV. By capturing the storage-creation template in a StorageClass and dynamically creating PV objects according to user needs, provisioning happens on demand, which adds no difficulty for users and frees cluster administrators from manual operations.
Second, interpretation of use cases
Let's take a look at exactly how Pod Volumes, PV, PVC, and StorageClass are used.
The use of Pod Volumes
Let's first look at how Pod Volumes are used. In the pod yaml file, we declare each volume's name and type under the volumes field. Suppose we declare two volumes, one using emptyDir and the other using hostPath; both are local volumes. How is a volume used inside a container? Through the volumeMounts field: the name specified there is the volume being used, and mountPath is the mount path inside the container.
There is another field, subPath. What is it for? Suppose both containers mount the same volume, cache-volume. When multiple containers share one volume, subPath can be used to isolate their data: it creates subdirectories inside the volume, so the data container 1 writes under its cache path actually lands in the subdirectory cache1, and the data container 2 writes ends up under the subdirectory cache2 of the same volume.
There is also a readOnly field, which means a read-only mount: nothing can be written through the mount point.
In addition, emptyDir and hostPath are both local storage; what is the subtle difference between them? An emptyDir is a directory created temporarily when the pod is created; it is deleted along with the pod and its data is cleared. A hostPath, as the name implies, is a path on the host; after the pod is deleted, the directory still exists and its data is not lost. That is the subtle difference between the two. A sketch pulling these fields together follows.
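Here is a minimal sketch of the pod just described (the pod name, images, and host path are illustrative), showing emptyDir, hostPath, subPath, and readOnly together:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo
spec:
  containers:
    - name: container-1
      image: nginx
      volumeMounts:
        - name: cache-volume
          mountPath: /cache
          subPath: cache1        # container 1's writes land in <volume>/cache1
        - name: host-logs
          mountPath: /host-logs
          readOnly: true         # no writes possible through this mount point
    - name: container-2
      image: nginx
      volumeMounts:
        - name: cache-volume
          mountPath: /cache
          subPath: cache2        # container 2's writes land in <volume>/cache2
  volumes:
    - name: cache-volume
      emptyDir: {}               # created with the pod, deleted with the pod
    - name: host-logs
      hostPath:
        path: /var/log           # a host path; survives pod deletion
```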
Static PV usage
Let's take a look at how PV and PVC are used.
Let's first look at how a static PV is created. A static PV is created by the administrator; here we take NAS, i.e., Aliyun file storage, as an example. The administrator first creates the NAS storage in Aliyun's file storage console, then fills the NAS details into a PV object. After the PV object is pre-created, users can declare their storage requirements through a PVC and then create pods that mount the storage to a path inside a container through the fields we just explained.
So how is the yaml written? The cluster administrator first creates the storage at the cloud storage vendor, then fills the corresponding information into the PV object.
The PV corresponding to the Aliyun NAS storage just created has some important fields: capacity, the size of the created storage, and accessModes, the ways the storage can be accessed (explained in detail later).
Then there is ReclaimPolicy, which answers: after the storage has been used and its consuming pod and PVC are deleted, should the PV be deleted or retained? It is the PV's reclaim policy. A sketch of such a PV follows.
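A minimal sketch of a statically created NAS PV (the CSI driver name and volumeAttributes are assumptions based on the Aliyun NAS plugin mentioned later; the server endpoint is a placeholder):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-pv
spec:
  capacity:
    storage: 5Gi                 # size of the pre-created NAS storage
  accessModes:
    - ReadWriteMany              # file storage: multi-node read-write
  persistentVolumeReclaimPolicy: Retain   # keep the PV after its PVC is deleted
  csi:
    driver: nasplugin.csi.alibabacloud.com       # assumed CSI driver name
    volumeHandle: nas-pv
    volumeAttributes:
      server: "xxx.cn-hangzhou.nas.aliyuncs.com" # placeholder NAS endpoint
      path: "/"
```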
Let's look at how the user uses this PV object. When using storage, the user first creates a PVC object. The PVC only needs to specify the storage requirements, not the implementation details of the storage itself. What are the requirements? The first is the required size, i.e., resources.requests.storage; the second is the access mode, here declared as ReadWriteMany, i.e., multi-node read-write access, a typical capability of file storage.
The size and access mode in this declaration match the PV we just created statically, so when the user submits the PVC, the relevant components in the K8s cluster bind the PVC and the PV together. After that, when the user submits a pod yaml, a volume of PVC type can be declared in it, specifying through claimName which PVC to use. The mount method is the same as described above. Once the yaml is submitted, the pod finds the bound PV through the PVC and can then use that storage. This is the flow from static provisioning through to consumption by a pod, sketched below.
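A minimal sketch of the PVC and of the pod that consumes it (object names and the image are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nas-pvc
spec:
  accessModes:
    - ReadWriteMany            # must match the statically created PV
  resources:
    requests:
      storage: 5Gi             # required size
---
apiVersion: v1
kind: Pod
metadata:
  name: nas-pod
spec:
  containers:
    - name: nas-container
      image: nginx
      volumeMounts:
        - name: nas-volume
          mountPath: /data     # where the storage appears in the container
  volumes:
    - name: nas-volume
      persistentVolumeClaim:
        claimName: nas-pvc     # which PVC (and thus which bound PV) to use
```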
Dynamic PV usage
Now let's look at dynamic provisioning. As mentioned above, with dynamic provisioning the cluster administrator no longer pre-allocates PVs, but just creates a template file.
This template file is called a StorageClass. In the StorageClass we need to fill in some important information. The first is provisioner. What is provisioner? It specifies which storage plugin should be used to create the PV and the corresponding storage.
Then come the parameters: detailed settings that need to be specified when creating storage through K8s, such as regionId, zoneId, fsType, and the disk type. Users do not need to care about these; they concern the storage implementation itself. reclaimPolicy has the same meaning as on the PV we just explained: what to do with a dynamically created PV once its consumer is finished and the pod and PVC are deleted. Here it is set to delete, meaning that when the user's pod and PVC are deleted, the PV is deleted too. A sketch follows.
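A minimal sketch of such a StorageClass (the provisioner name is an assumption based on the Aliyun cloud disk plugin mentioned later; the parameter values are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-disk
provisioner: diskplugin.csi.alibabacloud.com   # assumed plugin that creates the storage
parameters:                  # storage-side details the user never sees
  regionId: cn-hangzhou      # illustrative values
  zoneId: cn-hangzhou-b
  fsType: ext4
  type: cloud_ssd
reclaimPolicy: Delete        # delete the PV when its PVC is deleted
```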
Next, after the cluster administrator has submitted the StorageClass, that is, the template for creating the PV, the user still needs to write a PVC file first.
The size and access mode in the PVC file are unchanged; what is new is a field called storageClassName, which names the template file used to dynamically create the PV. Here storageClassName is the csi-disk declared above, as in the sketch below.
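A minimal sketch of the dynamic-provisioning PVC (name and size are illustrative; 30Gi matches the later demo):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-pvc
spec:
  accessModes:
    - ReadWriteOnce            # cloud disk: single-node read-write
  resources:
    requests:
      storage: 30Gi
  storageClassName: csi-disk   # the template (StorageClass) used to generate the PV
```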
After the PVC is submitted, the relevant components in the K8s cluster dynamically generate a PV based on the PVC and the corresponding StorageClass, and bind it to the PVC. After that, when the user submits a pod yaml, the usage and the subsequent flow are the same as in the static case: find the dynamically created PV through the PVC and mount it into the corresponding container.
Analysis of important fields in PV Spec
Next, let's take a look at some important fields of PV:
Capacity: easy to understand, the size of the storage object.
AccessModes: also something the user must care about, namely the ways the PV can be used. There are three: single-node read-write access; multi-node read-only access, a common way of sharing data; and multi-node read-write access.
When a user submits a PVC, Capacity and AccessModes are its two most important fields. After the PVC is submitted, how do the relevant components in the K8s cluster find a suitable PV? First, through an AccessModes index built over PVs, they find the list of all PVs satisfying the AccessModes in the user's PVC; then they filter further by the PVC's Capacity, StorageClassName, and Label Selector. If several PVs still qualify, the one with the smallest size and the shortest AccessModes list is chosen, i.e., the minimum-fit principle.
ReclaimPolicy: as just mentioned, what to do with the PV after its consuming PVC is deleted. There are three policies. The first (Recycle) we will not discuss, as it is no longer recommended in K8s; the second is Delete, i.e., the PV is deleted once the PVC is deleted; the third is Retain, i.e., the PV is kept and must afterwards be handled manually by the administrator.
StorageClassName: the field that must be specified for dynamic provisioning, naming the template file used to generate the PV.
NodeAffinity: restricts which nodes the created PV can be mounted on and used from, and therefore also restricts the scheduling of pods that use the PV: such a pod must be scheduled onto a node that can access the PV. This field will be discussed in detail in the later lesson on storage topology scheduling. A sketch follows.
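A minimal sketch of nodeAffinity on a PV, assuming a local volume and a hypothetical node name (both illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  local:
    path: /mnt/disks/ssd1      # a disk that only exists on one node
  nodeAffinity:                # so only that node may mount this PV
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1       # hypothetical node name
```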
PV state transitions
Next, let's look at how a PV's state flows. After the PV object is first created, it is briefly in the Pending state; once the PV is really created, it is in the Available state.
Available means the PV is ready to be used. After a user submits a PVC and the relevant K8s components complete the binding (i.e., find the matching PV), the PV and PVC are combined and both are in the Bound state. When the user finishes with the PVC and deletes it, the PV moves to the Released state. Whether it is then deleted or retained depends on the ReclaimPolicy we just discussed.
One special point to note: a PV already in the Released state cannot go directly back to Available, i.e., it cannot be bound by a new PVC. If we want to reuse a PV that is already released, what can we usually do?
There are two ways. The first: create a new PV object and fill the relevant field information of the released PV into it, so that the storage can be combined with a new PVC (a sketch follows below). The second: after deleting the pod, do not delete the PVC object; the binding between the PVC and the PV then still exists, and the next pod can reuse the storage directly through the PVC. This is exactly how StatefulSet in K8s migrates pods with storage.
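A minimal sketch of the first approach, under the assumption that the csi section is copied verbatim from the released nas-pv so the new object points at the same backing storage:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-pv-reborn          # a brand-new PV object
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:                         # copied from the released PV, so this PV
    driver: nasplugin.csi.alibabacloud.com   # refers to the same NAS volume
    volumeHandle: nas-pv
    volumeAttributes:
      server: "xxx.cn-hangzhou.nas.aliyuncs.com"
      path: "/"
```

Being a fresh object, it starts in Available and can be bound by a new PVC.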
Third, operation demonstration
Next, I'll show you how to operate static Provisioning and dynamic Provisioning in a real environment.
Static Provisioning example
The static provisioning example uses Aliyun NAS file storage; the dynamic provisioning example uses Aliyun cloud disk. They need the corresponding storage plugins, which I have deployed in my K8s cluster in advance (csi-nasplugin is the plugin needed to use Aliyun NAS in K8s, and csi-disk is the plugin needed to use Aliyun cloud disk in K8s).
Let's first take a look at the yaml file of static Provisioning's PV.
volumeAttributes holds the information about the NAS file system that I created in advance in the Aliyun NAS console. The main fields to care about: capacity is 5Gi; accessModes is multi-node read-write; reclaimPolicy is Retain, i.e., the PV will be retained after the user's PVC is deleted; and the driver used to operate this volume.
Then we create the corresponding PV:
Looking at the state of the PV: it is already Available, which means it is ready to be used.
Then create the nas-pvc:
We can see that the PVC has now been created and is already bound to the PV we created above. Let's look at what is written in the PVC's yaml.
It is very simple: just the size I need and the accessModes I need. After submission, it is matched against the PVs that already exist in the cluster; once the match succeeds, the binding is done.
Next, let's create a pod that uses nas-fs:
As you can see, both pods are already in the Running state.
Let's take a look at this pod yaml:
The pod yaml declares the PVC object we just created and mounts it under /data in the nas-container container. Our pod is created as two replicas through a Deployment, and the two replicas are scheduled onto different nodes through anti-affinity, roughly as sketched below.
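A minimal sketch of such a Deployment (the names, labels, and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nas-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nas-demo
  template:
    metadata:
      labels:
        app: nas-demo
    spec:
      affinity:
        podAntiAffinity:       # keep the two replicas on different nodes
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: nas-demo
              topologyKey: kubernetes.io/hostname
      containers:
        - name: nas-container
          image: nginx
          volumeMounts:
            - name: nas-volume
              mountPath: /data
      volumes:
        - name: nas-volume
          persistentVolumeClaim:
            claimName: nas-pvc   # both replicas share the same NAS PVC
```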
We can see that the two pods are located on different hosts.
Next we log into the first pod and run findmnt to look at its mount information; /data is indeed mounted from the nas-fs we declared. Then we touch a file test.test.test under it, and we will log into the other container to check whether it is shared.
Let's exit and log into the other pod (we were just in the first one; now the second).
Running findmnt again, we can see that the remote mount paths of the two pods are the same, i.e., they use the same NAS PV. Let's check whether the file we just created exists here.
It does, which shows that the two pods running on different nodes share the same NAS storage.
Next, let's see what happens after we delete the two pods. Delete the pods first, then the corresponding PVC (the PVC object is protected inside K8s: if a pod is still using the PVC when the PVC object is deleted, the deletion cannot complete), so this may take a while.
Check whether the corresponding PVC has been deleted: it has. Looking again, the NAS PV is still there and its state is Released, i.e., the PVC that was using it has been deleted, so the PV has been released. And because our RECLAIM POLICY is Retain, the PV is retained.
Dynamic Provisioning example
Next, the second example: dynamic provisioning. First we manually delete the leftover PV; now there is no PV in the cluster. Let's demonstrate dynamic provisioning.
First, create the template file that generates PVs, i.e., the storageclass. Its contents are actually very simple.
The provisioner I specified is the volume plugin that will create the storage (the Aliyun cloud disk plugin, developed by the Aliyun team), which we deployed in advance; the parameters section holds the settings needed to create the storage, which users do not need to care about. Then there is reclaimPolicy, i.e., whether a PV created through this storageclass is retained or deleted once its bound PVC is deleted.
Now that there is no PV in the cluster, let's submit a PVC file and first look at its contents. Its accessModes is ReadWriteOnce (because an Aliyun cloud disk can only be read and written by a single node, we declare it this way), its storage size requirement is 30G, and its storageClassName is csi-disk, the storageclass we just created, i.e., it specifies that the PV be generated from this template.
The PVC is in the pending state at this time, which means that its corresponding PV is still in the process of being created.
After a while, we see that a new PV has been generated, dynamically created from the PVC we submitted and the storageclass named in it. K8s then binds the generated PV with the submitted PVC, i.e., disk-pvc, and now we can use it by creating a pod.
Take another look at pod yaml:
The pod yaml is simple: it declares through the PVC field which PVC it uses, plus the mount point. Let's create it and take a look.
Looking at the Events: the pod is first placed by the scheduler; then the attachdetach controller performs the disk attach operation, i.e., attaches the corresponding PV to the node chosen by the scheduler; then the pod's container can start and use that disk.
Next I will delete the PVC and see whether the PV is deleted according to our reclaimPolicy. First a look: at this moment the PVC still exists, and so does the corresponding PV.
Then delete the PVC and look again: our PV has also been deleted. That is, per the reclaimPolicy, when we delete the PVC, the PV is deleted too.
That's all for our demonstration.
Fourth, the processing flow of PV and PVC in architecture design
Next, let's look at the complete processing flow of the PV and PVC system in K8s, starting with CSI.
What is CSI? Its full name is Container Storage Interface, and it is the officially recommended way in the K8s community to implement storage plugins out of tree. A CSI implementation can be divided into two parts:
The first part is the generic part driven by the K8s community, such as the csi-provisioner and csi-attacher controllers; the other part is implemented by the cloud storage vendor, which calls the vendor's OpenAPI to carry out the real create/delete/mount/unmount operations on storage, corresponding to csi-controller-server and csi-node-server.
Next, the internal K8s processing flow after the user submits yaml. When a PVC yaml is submitted, a PVC object is first generated in the cluster; that PVC object is watched by the csi-provisioner controller. csi-provisioner combines the PVC object with the StorageClass it declares, calls csi-controller-server over gRPC, which goes to the cloud storage service to create the real storage and finally creates the PV object. After the PV controller in the cluster binds the PVC and PV objects, the PV can be used.
When the user then submits a pod, the scheduler first selects a suitable node; then, while creating the pod, the kubelet on that node mounts the previously created PV to a path the pod can use, through csi-node-server, after which kubelet creates and starts all the containers in the pod.
The full flow of PV, PVC, and storage use through CSI
Let's take a closer look, with another diagram, at the complete flow of PV, PVC, and storage consumption implemented through CSI.
It is mainly divided into three stages:
The first stage, the Create stage: after the user submits a PVC, csi-provisioner creates the storage and generates a PV object; the PV controller then binds the PVC and the generated PV object, and the Create stage is complete.
The second stage, the Attach stage: when the user submits a pod yaml, the pod is first scheduled to select a suitable node to run on. Once the node is selected, the AD Controller, watching this, sees which PVs the pod uses and generates an internal object called VolumeAttachment. That object triggers csi-attacher to call csi-controller-server to do the real attach operation, which is carried out through the cloud storage vendor's OpenAPI. The attach operation attaches the storage to the node where the pod will run, and the Attach stage is complete.
The third stage happens while kubelet creates the pod: it first performs a mount, taking the disk already attached to the node and mounting it to a specific path the pod can use, before kubelet creates and starts the containers. This is the Mount stage of creating and consuming storage through PV plus PVC.
In general there are three stages: the first, the Create stage, mainly creates the storage; the second, the Attach stage, attaches that storage to the node (it usually appears under /dev on the node); and the third, the Mount stage, further mounts the storage to a path the pod can use. This is the complete flow, from creation to use, of PVC, PV, and volumes implemented through CSI.
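For the internal VolumeAttachment object mentioned in the Attach stage, here is a rough sketch of what such an object looks like (all names are illustrative; in practice it is generated by the AD Controller, not written by hand). It records which attacher should attach which PV to which node:

```yaml
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: csi-demo-attachment    # normally generated, not hand-written
spec:
  attacher: diskplugin.csi.alibabacloud.com   # assumed CSI driver name
  nodeName: node-1                            # the node the pod was scheduled to
  source:
    persistentVolumeName: disk-pv             # the PV to attach
```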
Summary
This is the end of this article; here is a brief summary:
This article introduced the usage scenarios of K8s Volumes and their limitations; through the PVC and PV system of K8s, it explained why and how K8s enhances the capability of Volumes in multi-pod sharing, migration, and storage extension scenarios; through the different supply modes of PV (static and dynamic), we learned how to provide the required storage for pods in a cluster in different ways; and through the complete processing flow of PVC and PV in K8s, we can deeply understand how PVC and PV work.