How to use Kubernetes to achieve persistent storage for containers

2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to use Kubernetes to achieve persistent storage for containers. The approach it describes is simple, fast, and practical.

Containerization has arguably changed the way we think about application development. It brings many benefits: a consistent environment between development and production, containers that share resources yet remain isolated, portability between cloud environments, and rapid deployment. The list goes on. The ephemeral nature of containers is at the core of what makes them great: they are immutable, and identical containers can be started in an instant. But that ephemerality also has a downside: the lack of persistent storage.

Introducing Kubernetes

Persistent state is usually large and hard to move, which sets it apart from containers, which are fast, lightweight, and easy to deploy anywhere at any time. For this reason, container specifications deliberately leave persistent state out and rely on storage plug-ins to hand responsibility for managing it to another party.

The open source container orchestration tool Kubernetes has begun to address this problem. In this article, I will introduce you to several components in Kubernetes that help solve the problem of persisting state in a container environment.

Statefulness

The biggest problem in managing persistent state is deciding where it should live. There are three options, each with its own trade-offs:

State is stored in the container. This works well if the data is reproducible and not critical, but the data is lost whenever the container restarts.

State is stored on the host node. This sidesteps the ephemeral nature of the container, but you may run into a similar problem because the host node itself can fail.

State is stored in remote storage. This removes the unreliability of container and host storage, but requires careful thought about how the remote storage is provisioned and managed.
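As a rough sketch of these three placements in Kubernetes terms, the Pod below writes to its own filesystem, to a directory on the host node, and to a remote NFS export. The Pod name, the paths, and the NFS server address are hypothetical and chosen only for illustration; any remote storage backend could stand in for NFS.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-placement-demo        # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      # 1. Anything written outside a volume mount (for example /tmp) lives in
      #    the container's own writable layer and is lost when the container
      #    is restarted.
      volumeMounts:
        - name: node-data             # 2. stored on the host node
          mountPath: /node-data
        - name: remote-data           # 3. stored on remote storage
          mountPath: /remote-data
  volumes:
    - name: node-data
      hostPath:
        path: /var/lib/demo           # hypothetical directory on the node
        type: DirectoryOrCreate
    - name: remote-data
      nfs:
        server: nfs.example.internal  # hypothetical NFS server
        path: /exports/demo           # hypothetical export
```

Data under /node-data survives container restarts but is tied to one node, while data under /remote-data survives the loss of both the container and the node.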

When do you need to think about state?

An application requires persistent state when it has either of two key characteristics: 1) its data must persist across application interruptions and restarts, or 2) its application state must be managed across those same interruptions and restarts. Typical examples are databases and their replicas, some kinds of logging applications, and distributed applications that require remote storage.

Even so, the persistence requirements of such applications are not all the same, because how critical the data is varies from one application to another. For this reason, when designing stateful applications, I often ask myself a few questions:

How much data do we have to manage?

Can we start from the latest snapshot, or do we need absolutely up-to-date data?

Does restarting from a snapshot take too long, or is it fast enough for this application?

How easy is it to replicate the data?

How critical is this data? Can we tolerate losing it when the container or host terminates, or does it need remote storage?

Are the different Pods in this application interchangeable?

Storage solution

Many applications require data to persist across container and host restarts, which calls for remote storage. Fortunately, Kubernetes recognizes this need and gives Pods a way to interact with remote storage: volumes.

Kubernetes Volumes

Kubernetes volumes provide a way to interact with remote (or local) storage. A volume can be thought of as mounted storage that persists for the lifetime of its enclosing Pod. It outlives any container that spins up or down within that Pod, which makes it a good answer to the ephemeral nature of containers. The following is an example of a Pod definition that uses a volume.

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test-container
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: testvolume
  volumes:
    - name: testvolume
      # This AWS EBS volume must already exist.
      awsElasticBlockStore:
        volumeID:
        fsType: ext4

As the Pod definition above shows, the volumes section under spec specifies the name of the volume and the ID of storage that has already been created (in this case, an EBS volume). To use the volume, the container definition must name it in its volumeMounts field under spec.containers.

Some key points to keep in mind when working with volumes:

Kubernetes provides many types of volumes, and a Pod can use any number of volumes at the same time.

A volume lasts only as long as its enclosing Pod. When the Pod ceases to exist, so does the volume.

Provisioning of persistent storage is not handled by the volume or by the Pod itself. The persistent storage behind the volume has to be provisioned by other means.
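To make the lifetime point concrete, here is a minimal sketch that uses an emptyDir volume, a volume type that Kubernetes creates with the Pod and deletes with it. The Pod name, container names, and file path are hypothetical. Two containers mount the same volume, so the data survives a restart of either container but not the removal of the Pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch-demo     # hypothetical name
spec:
  containers:
    - name: writer
      image: busybox
      # Appends a timestamp to a file on the shared volume every five seconds.
      command: ["sh", "-c", "while true; do date >> /scratch/log.txt; sleep 5; done"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
    - name: reader
      image: busybox
      # Waits for the file to appear, then follows it through a read-only mount.
      command: ["sh", "-c", "until [ -f /scratch/log.txt ]; do sleep 1; done; tail -f /scratch/log.txt"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
          readOnly: true
  volumes:
    - name: scratch
      emptyDir: {}              # created with the Pod, deleted with the Pod
```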

Although volumes solve a huge problem for containerized applications, some applications need their attached volumes to outlive the Pod. Persistent volumes and persistent volume claims are very useful for this use case.

Kubernetes persistent volumes and persistent volume claims

Kubernetes persistent volumes and persistent volume claims provide a way to abstract the details of how storage is provisioned from how it is consumed. A persistent volume (PV, Persistent Volume) is a piece of persistent storage that an administrator has made available in the cluster. PVs exist as cluster resources, just like nodes, and their lifecycle is independent of any individual Pod. A persistent volume claim (PVC, Persistent Volume Claim) is a user's request for storage. Just as a Pod consumes node resources such as CPU and memory, a PVC consumes PV resources such as storage.

The lifecycle of a PV consists of four phases: provisioning, binding, using, and reclaiming.

Provisioning: a PV can be provisioned in one of two ways, statically or dynamically.

Static provisioning requires the cluster administrator to manually create a number of PVs to be consumed.

Dynamic provisioning happens when a PVC requests a PV, without any manual intervention by the cluster administrator.

Dynamic provisioning requires some setup in advance in the form of storage classes (Storage Classes), which we will discuss later.

Binding: when a PVC is created, it requests a specific amount of storage and specific access modes. When a matching PV is available, it is bound exclusively to the requesting PVC for as long as the PVC needs it. If no matching PV exists, the PVC remains unbound indefinitely. With dynamic provisioning, the control loop always binds the PV to the requesting PVC. Otherwise, the PVC gets at least the amount of storage it asked for, though the volume may be larger than requested.

Using: once a PVC has claimed a PV, the PV can be mounted in the enclosing Pod. Users can specify an access mode (such as ReadWriteOnce, ReadOnlyMany, and so on) and other mount options for the attached volume. The mounted PV remains available for as long as the user needs it.

Reclaiming: once a user is done with the storage, they need to decide what happens to the PV being released. There are three reclaim policies to choose from: retain, delete, and recycle.

Retaining the PV simply releases it, without modifying or deleting any of the data it contains, so that the PV can be manually reclaimed later.

Deleting the PV completely removes both the PV and the underlying storage resource.

Recycling the PV scrubs the data from the storage resource and makes the PV available to any other PVC request. (The recycle policy is deprecated in current Kubernetes releases in favor of dynamic provisioning.)

The following is an example of a persistent volume definition (using static provisioning), followed by a persistent volume claim definition.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mypv
spec:
  storageClassName: mysc
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  awsElasticBlockStore:
    volumeID: # This AWS EBS volume must already exist.

Persistent volume

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  storageClassName: mysc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi

Persistent volume claim

Persistent volumes define the capacity of the storage resource along with other volume-specific properties, such as the reclaim policy and access modes. The storageClassName field under spec assigns a PV to a particular storage class, which PVCs can then name when making a claim. The persistent volume claim definition above specifies the properties of the persistent volume it is trying to claim, including storage capacity and access modes. A PVC can request a particular class of PV by specifying the storageClassName field under spec. A PV of a given class can only be bound to PVCs that request that class, and a PV with no class can only be bound to PVCs that request no particular class. Selectors can also be used to narrow down which PVs may be claimed; more documentation on this can be found here (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#selector).
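As an illustration of the selector mechanism, the claim below is a sketch that only binds to PVs carrying a particular label. The claim name and the label key and value are hypothetical; the matching PV would need the same label under its metadata.labels.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc-selected          # hypothetical name
spec:
  storageClassName: mysc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  selector:
    matchLabels:
      environment: test         # hypothetical label the PV must carry
```

Note that a claim with a non-empty selector is matched only against existing PVs; Kubernetes does not dynamically provision a volume for it.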

The following is an example of a Pod definition that uses a persistent volume claim to request storage:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test-container
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: myvolume
  volumes:
    - name: myvolume
      persistentVolumeClaim:
        claimName: mypvc

Comparing this Pod definition with the earlier one that used a volume directly, we can see that they are almost identical. Persistent volume claims are not meant to interact with storage resources directly; they abstract the storage details away from the Pod.

Some key takeaways about persistent volumes and persistent volume claims:

* The lifecycle of a persistent volume is independent of the lifecycle of any Pod.

* A persistent volume claim abstracts the details of storage provisioning away from the Pod's storage consumption.

* As with volumes, persistent volumes and persistent volume claims do not themselves handle the provisioning of storage resources.

Kubernetes storage classes and persistent volume claims

Kubernetes storage classes and persistent volume claims provide a way to provision storage resources dynamically at request time, so cluster administrators no longer need to over-provision or manually provision storage to meet demand. Storage classes let cluster administrators describe the "classes" of storage they offer and use those classes as templates when storage resources and persistent volumes are created dynamically. Different storage classes can be defined for specific application requirements, such as the required quality-of-service level or backup policy.

A storage class definition revolves around three areas:

Reclaim policy

Provisioner

Parameters

Reclaim policy: persistent volumes created dynamically by a storage class can only use Retain or Delete as their reclaim policy, while manually created persistent volumes that are managed by a storage class keep the reclaim policy they were assigned at creation.

Provisioner: the storage class provisioner decides which volume plug-in to use when provisioning PVs (such as AWSElasticBlockStore for AWS EBS or PortworxVolume for Portworx volumes). The provisioner field is not limited to the list of internally available provisioner types; any independent external provisioner that follows the clearly defined specification can be used to create new persistent volume types.

Parameters: the last, and arguably most important, part of a storage class definition is the parameters section. Different provisioners accept different parameters, which describe the specifications of a particular "class" of storage.

Here are a storage class definition and a persistent volume claim definition:

apiVersion: storage.k8s.io/v1   # StorageClass lives in the storage.k8s.io API group
kind: StorageClass
metadata:
  name: mysc                    # must match storageClassName in the claim
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "10"
  fsType: ext4

Storage class

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  storageClassName: mysc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi

Persistent volume claim

If we compare the PVC definition with the definition used in the static provisioning use case above, we can see that they are the same.

This is because there is a clear separation between storage "provisioning" and storage "consumption". Compared with statically created persistent volumes, consuming persistent volumes created through storage classes has some major advantages, one of the biggest being the ability to set storage resource values that are only available at creation time. This means we can provision exactly the amount of storage the user requests, without any manual intervention by the cluster administrator. Because storage classes have to be defined ahead of time by cluster administrators, administrators still control which types of storage are available to end users, while all of the provisioning logic is abstracted away.

Key points about storage classes and persistent volume claims:

* Storage classes and persistent volume claims let end users consume dynamically provisioned storage resources, removing the need for manual intervention by cluster administrators.

* A storage class abstracts the details of storage provisioning and relies on the specified provisioner to handle the provisioning logic.
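The reclaim policy discussed above can also be set on the storage class itself through the reclaimPolicy field. The following is a minimal sketch; the class name is hypothetical, and the provisioner and parameters simply mirror the AWS EBS example used in this article. If reclaimPolicy is omitted, dynamically provisioned PVs default to Delete.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mysc-retain             # hypothetical name
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain           # dynamically created PVs keep their data after the claim is deleted
parameters:
  type: io1
  iopsPerGB: "10"
  fsType: ext4
```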

Application state

Persistent storage is critical when we think about state. Where does my data live? How does it persist when my application fails? But some applications need state management of their own, beyond persisting data. This is easiest to see in applications built from multiple non-interchangeable Pods (for example, a primary database Pod and its replicas, or distributed applications such as ZooKeeper or Elasticsearch). Applications like these need each Pod to be assigned a unique identifier that persists through any rescheduling. Kubernetes provides this capability through StatefulSets.

Kubernetes StatefulSets

Kubernetes StatefulSets provide functionality similar to ReplicaSets and Deployments, but with stable rescheduling. This difference matters for applications that need stable identifiers and ordered deployment, scaling, and deletion. StatefulSets have several features that help provide these capabilities.

Unique network identifiers: each Pod in a StatefulSet derives its hostname from the name of the StatefulSet and the ordinal of the Pod. This identity is sticky, no matter which node the Pod is scheduled on or how many times it is rescheduled. This is particularly useful for applications that form logical groups of non-interchangeable Pods, such as database replicas and agents in distributed systems. The ability to identify individual Pods is at the core of what makes StatefulSets valuable.

Ordered deployment, scaling, and deletion: Pod identifiers in a StatefulSet are not only unique but also ordered. Pods in a StatefulSet are created sequentially, waiting for the previous Pod to become healthy before moving on to the next one. This behavior extends to scaling and deletion as well: no Pod can be updated or scaled until all of its predecessors are healthy. Likewise, all of a Pod's successors must be shut down before that Pod terminates. These properties allow stable and predictable changes to the StatefulSet.

Here is an example of a StatefulSet definition:

apiVersion: apps/v1   # StatefulSet lives in the apps API group
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx-svc   # required field; assumed to be the headless Service defined later in this article
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: ["ReadWriteOnce"]   # required for a claim; value assumed here
        storageClassName: mysc
        resources:
          requests:
            storage: 1Gi

As shown above, the name of the StatefulSet is specified in the name field under metadata and is used when naming the Pods it creates. This StatefulSet definition will produce three Pods named web-0, web-1, and web-2.

This particular StatefulSet uses PVCs, through the volumeClaimTemplates field under spec, to attach a persistent volume to each Pod.

Key points about StatefulSets:

* A StatefulSet gives each of its Pods a unique name, which supports applications that require non-interchangeable Pods.

* Deployment, scaling, and deletion of a StatefulSet are handled in an orderly manner.
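The ordering behavior can be stated explicitly in the manifest. The sketch below spells out the two StatefulSet fields that control it, both set to their default values; the StatefulSet name is hypothetical, and serviceName is assumed to refer to a headless Service called nginx-svc.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-ordered                   # hypothetical name
spec:
  serviceName: nginx-svc              # assumed headless Service name
  replicas: 3
  podManagementPolicy: OrderedReady   # default: create Pods 0, 1, 2 in order, remove them in reverse
  updateStrategy:
    type: RollingUpdate               # default: update one Pod at a time, highest ordinal first
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
```

Setting podManagementPolicy to Parallel instead relaxes the ordering during scaling, which may suit applications that need stable identities but not sequential startup.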

While StatefulSets provide the ability to deploy and manage non-interchangeable Pods, one problem remains: how do we find and use them? This is where headless Services come in.

Kubernetes headless Services

Sometimes an application does not want or need load balancing or a single service IP. Such applications (primary and replica databases, agents in distributed applications, and so on) need a way to route traffic to the individual Pods backing the Service. Headless Services and Pods with unique network identifiers (such as those created by a StatefulSet) can be used together for this use case. The ability to route directly to a single Pod puts a great deal of power in developers' hands, from handling service discovery to routing directly to the primary database Pod.

Here is an example of a headless Service:

apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 30001
    - name: https
      protocol: TCP
      port: 443
      targetPort: 30002

What makes this specification truly headless is that clusterIP under .spec is set to None. This example uses the selector field under spec to determine how DNS is configured: an A record is created for each Pod that matches the app: nginx selector, pointing directly at the Pod backing the Service. More information on how DNS is configured automatically for headless Services can be found here (https://kubernetes.io/docs/concepts/services-networking/service/#headless-services). This particular specification will create endpoints nginx-svc-0, nginx-svc-1, and nginx-svc-2, which route directly to the Pods named web-0, web-1, and web-2, respectively.

Key points about headless Services:

* A headless Service allows direct routing to specific Pods.

* Application developers can handle service discovery as they see fit.

Conclusion

Kubernetes makes stateful application development a reality in the container world, especially when it comes to managing application state and persistent data. Persistent volumes and persistent volume claims build on volumes to support persistent data storage in an environment that is otherwise largely ephemeral. Storage classes extend this idea further by allowing storage resources to be provisioned on demand. StatefulSets give each Pod a unique, sticky identity that persists across Pod interruptions and restarts. Headless Services can be used together with StatefulSets to let application developers take advantage of that Pod uniqueness according to the needs of their application.

This article introduced the basic elements required for stateful applications in Kubernetes. As Kubernetes continues to evolve, capabilities around stateful applications will continue to emerge. It is important for stateful application developers and cluster administrators to understand these basic elements.
