Why is the storage of Kubernetes so difficult?

Updated 2025-07-11. From: SLTechnology News&Howtos

With the popularity of container orchestration tools such as Kubernetes, the way applications are developed and deployed is changing dramatically. The rise of microservice architecture, together with the decoupling of infrastructure from application logic from the developer's point of view, lets developers focus more on building software and delivering value.

Kubernetes abstracts away the physical machines it manages, so developers can obtain resources simply by describing the amount of memory and compute they need, without worrying about the underlying infrastructure.
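As a minimal sketch of what "describing the amount of memory and compute" looks like, the snippet below builds a Pod manifest as a plain Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The Pod name, image, and sizes are illustrative assumptions, not values from the article.

```python
import json

# Hypothetical Pod manifest: the name, image, and sizes are illustrative.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "nginx",
                "resources": {
                    # What the scheduler reserves for this container.
                    "requests": {"cpu": "500m", "memory": "256Mi"},
                    # Upper bound the container may consume.
                    "limits": {"cpu": "1", "memory": "512Mi"},
                },
            }
        ]
    },
}

print(json.dumps(pod, indent=2))
```

Nothing in the manifest names a machine; the scheduler decides which node satisfies the request.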

Kubernetes also makes applications portable by managing Docker images. Once an application is built on a containerized architecture with Kubernetes, it can be deployed anywhere (public cloud, hybrid cloud, or on-premises) without any changes to the underlying code.

Although Kubernetes has many advantages, such as scalability, portability, and manageability, it lacks support for storing state. Almost all production applications are stateful, that is, they require some kind of external storage.

The architecture of Kubernetes is highly dynamic. Containers are created and destroyed depending on the load and on the developer's specifications, and Pods and containers are self-healing and replicated. In essence, they are ephemeral.

A persistent storage solution, however, cannot afford this dynamic behavior: persistent storage cannot be subject to the same rules of dynamic creation and destruction.

Stateful applications face portability challenges when they need to be deployed to another infrastructure, such as a different cloud provider, an on-premises environment, or a hybrid cloud. Persistent storage solutions tend to be tied to a specific cloud provider.

The storage landscape for cloud native applications is also hard to understand. Kubernetes storage terminology can be confusing because many terms carry complex meanings and subtle nuances. On top of that, developers must weigh many options across native Kubernetes, open source frameworks, and hosted or paid services before making a decision.

The CNCF (Cloud Native Computing Foundation) publishes an overview of cloud native storage solutions.

Perhaps the first idea that comes to mind is to deploy the database in Kubernetes: choose a database solution that meets your needs, containerize it to run on local disk, and deploy it to the cluster as just another workload. However, this does not work well because of the inherent properties of databases.

Containers are built on the principle of statelessness, which makes it easy to spin containers up and down. Since there is no data to save and migrate, the cluster does not have to deal with the typically intensive work of disk reads and writes.

A database, by contrast, has to preserve state. If a database deployed on the cluster as a container is not migrated or spun up frequently, the physical characteristics of data storage come into play. Ideally, the containers that use the data should be co-located with the database.

This is not to say that deploying a database in a container is a bad idea; in some use cases the approach is sufficient. In a test environment, or for tasks that do not involve production-level amounts of data, a database in the cluster makes sense because the amount of data stored is small.

In a production environment, developers usually rely on external storage.

How does Kubernetes communicate with storage? Through control plane interfaces, which connect Kubernetes to external storage. The external storage solutions connected to Kubernetes in this way are called volume plug-ins (Volume Plugins). Volume plug-ins abstract the storage and make it portable.

Previously, volume plug-ins were built, linked, compiled, and shipped with the core Kubernetes code base. This greatly limited developers' flexibility and introduced additional maintenance costs, because adding a new storage option required changes to the Kubernetes code base.

With the introduction of CSI and FlexVolume, volume plug-ins can be deployed on a cluster without changing the code base.

Native Kubernetes and storage

How does native Kubernetes handle storage? Kubernetes offers several ways to manage storage: ephemeral options, persistent storage through persistent volumes, persistent volume claims, storage classes, StatefulSets, and so on.

Persistent volumes (PVs) are storage units provisioned by an administrator. They are independent of any single Pod, which frees them from the short life cycle of Pods.

A persistent volume claim (PVC), on the other hand, is a request for storage, that is, for a PV. A PVC binds storage to a claim that a Pod references, which makes the storage available on the node where that Pod runs.
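The relationship between the three objects can be sketched as manifests built from plain Python dictionaries. This is an illustrative example only: the object names, sizes, and the `hostPath` backing (suitable for single-node testing, not production) are assumptions rather than anything the article prescribes.

```python
import json

# Administrator side: a volume that exists independently of any Pod.
pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "pv-demo"},
    "spec": {
        "capacity": {"storage": "10Gi"},
        "accessModes": ["ReadWriteOnce"],
        "persistentVolumeReclaimPolicy": "Retain",
        "hostPath": {"path": "/mnt/data"},  # test-only backing store
    },
}

# Developer side: a request for storage, with no mention of a backend.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "5Gi"}},
    },
}

# The Pod refers to the claim by name, never to the volume directly.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "db"},
    "spec": {
        "containers": [
            {
                "name": "db",
                "image": "postgres",
                "volumeMounts": [
                    {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                ],
            }
        ],
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "data-claim"}}
        ],
    },
}

for manifest in (pv, pvc, pod):
    print(json.dumps(manifest, indent=2))
```

If the Pod is deleted and recreated, the claim and the volume behind it remain, which is the decoupling from the Pod life cycle described above.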

Storage can be provisioned in two ways: statically or dynamically.

With static provisioning, administrators create the PVs they think Pods might need before any actual request is made, and these PVs are manually bound to specific Pods through explicit PVCs.
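A toy model may help make static provisioning concrete. The function below is a simplified assumption, not the actual Kubernetes binding logic: volumes and claims are reduced to hypothetical dictionaries with a size in GiB and a list of access modes, and a claim is matched to the smallest free pre-created volume that satisfies it.

```python
def bind_static(claim, volumes):
    """Bind a claim to the smallest free pre-created volume that fits.

    Returns the chosen volume name, or None if the claim stays pending.
    """
    candidates = [
        v for v in volumes
        if v["claim"] is None
        and v["size_gi"] >= claim["size_gi"]
        and claim["mode"] in v["modes"]
    ]
    if not candidates:
        return None  # nothing was provisioned in advance for this request
    chosen = min(candidates, key=lambda v: v["size_gi"])
    chosen["claim"] = claim["name"]
    return chosen["name"]


# Volumes the administrator guessed would be needed (hypothetical values).
volumes = [
    {"name": "pv-a", "size_gi": 100, "modes": ["ReadWriteOnce"], "claim": None},
    {"name": "pv-b", "size_gi": 10, "modes": ["ReadWriteOnce"], "claim": None},
]

first = bind_static({"name": "small", "size_gi": 5, "mode": "ReadWriteOnce"}, volumes)
second = bind_static({"name": "large", "size_gi": 50, "mode": "ReadWriteOnce"}, volumes)
third = bind_static({"name": "extra", "size_gi": 5, "mode": "ReadWriteOnce"}, volumes)
print(first, second, third)
```

The third claim finds no free volume and stays unbound: static provisioning only works as long as the administrator's guesses cover the actual demand.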

In practice, a statically defined PV is not compatible with the portable structure of Kubernetes because the storage used may be environment-dependent, such as AWS EBS or GCE persistent disks. Manual binding requires changing the YAML file to point to a provider-specific storage solution.
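To illustrate the environment dependence, here is a sketch of two statically defined PVs that differ only in their provider-specific volume source. The field names `awsElasticBlockStore` and `gcePersistentDisk` are the legacy in-tree volume types; the volume ID and disk name are placeholders, and `make_pv` and `volume_source` are hypothetical helpers.

```python
def make_pv(name, size, source):
    """Build a PersistentVolume manifest around a provider-specific source."""
    spec = {"capacity": {"storage": size}, "accessModes": ["ReadWriteOnce"]}
    spec.update(source)
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": spec,
    }


def volume_source(pv):
    """Return the spec keys that are not shared by every PV in this sketch."""
    common = {"capacity", "accessModes"}
    return sorted(set(pv["spec"]) - common)


# Placeholder identifiers; real values come from the cloud provider.
aws_pv = make_pv("pv-aws", "10Gi",
                 {"awsElasticBlockStore": {"volumeID": "vol-EXAMPLE", "fsType": "ext4"}})
gce_pv = make_pv("pv-gce", "10Gi",
                 {"gcePersistentDisk": {"pdName": "example-disk", "fsType": "ext4"}})

print(volume_source(aws_pv), volume_source(gce_pv))
```

Moving the workload from one provider to the other means rewriting exactly this part of the manifest by hand.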

Static provisioning also runs counter to the Kubernetes way of thinking about resources: CPU and memory are not allocated in advance and then bound to Pods or containers; they are granted dynamically.

Dynamic provisioning is done through storage classes. Instead of manually creating PVs in advance, the cluster administrator creates several storage profiles that act as templates. When a developer creates a PVC, a PV is created from one of these templates at request time and attached to the Pod.
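The template idea can be sketched as follows. The StorageClass manifest uses real field names, and `ebs.csi.aws.com` with `type: gp3` is one example of a provisioner and parameter set; the `provision` function is a hypothetical stand-in for what an external provisioner does, not actual Kubernetes code.

```python
import itertools

# Administrator side: a profile (template), not a volume.
fast = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "fast"},
    "provisioner": "ebs.csi.aws.com",  # example CSI provisioner
    "parameters": {"type": "gp3"},
    "reclaimPolicy": "Delete",
}

# Developer side: the claim names the profile it wants.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},
    "spec": {
        "storageClassName": "fast",
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "20Gi"}},
    },
}

_ids = itertools.count(1)


def provision(claim, classes):
    """Create a PV on demand from the storage class the claim names."""
    class_name = claim["spec"]["storageClassName"]
    storage_class = classes[class_name]  # KeyError if no such profile exists
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": f"pvc-{next(_ids)}"},
        "spec": {
            "capacity": {"storage": claim["spec"]["resources"]["requests"]["storage"]},
            "accessModes": list(claim["spec"]["accessModes"]),
            "storageClassName": class_name,
            "persistentVolumeReclaimPolicy": storage_class.get("reclaimPolicy", "Delete"),
            "claimRef": {"name": claim["metadata"]["name"]},
        },
    }


pv = provision(pvc, {"fast": fast})
print(pv["metadata"]["name"], pv["spec"]["capacity"]["storage"])
```

The volume is sized to the request and exists only because the claim asked for it, which is the contrast with pre-created PVs.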

The above is just a very broad overview of how external storage is generally handled with native Kubernetes, and there are many other options to consider.

Container storage interface

First, let's introduce the Container Storage Interface (CSI). CSI is a unifying effort by the CNCF Storage Working Group to define a standard container storage interface that lets storage drivers work with any container orchestrator.

The CSI specification has been adopted in Kubernetes, and many driver plug-ins can be deployed on Kubernetes clusters. Developers can access storage exposed by a CSI-compatible volume driver through the CSI volume type on Kubernetes.
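The idea of a standard interface that any driver can implement can be sketched in Python. CSI itself is a gRPC specification; the method names below are adapted from its `CreateVolume`, `DeleteVolume`, `NodePublishVolume`, and `NodeUnpublishVolume` calls, but the signatures are heavily simplified assumptions and `InMemoryDriver` is a hypothetical driver used only for illustration.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Simplified stand-in for the calls an orchestrator makes to a driver."""

    @abstractmethod
    def create_volume(self, name: str, size_bytes: int) -> str: ...

    @abstractmethod
    def delete_volume(self, volume_id: str) -> None: ...

    @abstractmethod
    def node_publish_volume(self, volume_id: str, target_path: str) -> None: ...

    @abstractmethod
    def node_unpublish_volume(self, volume_id: str, target_path: str) -> None: ...


class InMemoryDriver(StorageDriver):
    """Hypothetical driver that only records what it was asked to do."""

    def __init__(self):
        self.volumes = {}  # volume_id -> size in bytes
        self.mounts = {}   # target_path -> volume_id

    def create_volume(self, name, size_bytes):
        volume_id = f"vol-{name}"
        # Repeating the call with the same name returns the same volume.
        self.volumes.setdefault(volume_id, size_bytes)
        return volume_id

    def delete_volume(self, volume_id):
        self.volumes.pop(volume_id, None)

    def node_publish_volume(self, volume_id, target_path):
        if volume_id not in self.volumes:
            raise KeyError(f"unknown volume {volume_id}")
        self.mounts[target_path] = volume_id

    def node_unpublish_volume(self, volume_id, target_path):
        if self.mounts.get(target_path) == volume_id:
            del self.mounts[target_path]


def attach(driver: StorageDriver, name: str, size_bytes: int, path: str) -> str:
    """Orchestrator-side code: works with any driver behind the interface."""
    volume_id = driver.create_volume(name, size_bytes)
    driver.node_publish_volume(volume_id, path)
    return volume_id


driver = InMemoryDriver()
vid = attach(driver, "data", 5 * 1024**3, "/mnt/pod-1/data")
print(vid, driver.mounts)
```

The `attach` function never mentions a vendor, which is the point of the standard: swapping the driver does not change the orchestrator.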

With the introduction of CSI, storage can be containerized as another workload and deployed on a Kubernetes cluster.

Open source projects

A large number of tools and projects are emerging around cloud native technologies. Because storage is one of the most prominent problems in production, a considerable number of open source projects are dedicated to handling storage on cloud native architecture.

At present, the most popular storage projects are Ceph and Rook.

Ceph is a dynamically managed, horizontally scalable, distributed storage cluster. It provides a logical abstraction over storage resources and is designed to have no single point of failure, to be self-managing, and to be software-based. Ceph exposes block, object, and file system interfaces to the same storage cluster.

The architecture of Ceph is very complex and involves many underlying technologies, such as RADOS, librados, RADOSGW, RBD, the CRUSH algorithm, and components such as monitors, OSDs, and MDSs. The key point is that Ceph is a distributed storage cluster that scales more easily, eliminates single points of failure without sacrificing performance, and provides unified storage with object, block, and file access.

Naturally, Ceph has been adapted to the cloud native environment. There are many ways to deploy a Ceph cluster, for example with Ansible. You can deploy a Ceph cluster and get an interface to it from your Kubernetes cluster through CSI and PVCs.

Ceph architecture

Another interesting and popular project is Rook, a tool designed to converge Kubernetes and Ceph by bringing compute and storage together in one cluster.

Rook is a cloud native storage orchestrator that extends the functionality of Kubernetes. In essence, Rook places Ceph in containers and provides the cluster management logic needed to run Ceph reliably on Kubernetes. Rook automates deployment, bootstrapping, configuration, scaling, and rebalancing, which are tasks a cluster administrator would otherwise perform.

Rook allows Ceph clusters to be deployed from YAML, just like other Kubernetes resources. The YAML file serves as a high-level declaration of what the cluster administrator wants in the cluster. Rook starts the cluster and begins actively monitoring it. Rook acts as a controller, ensuring that the desired state declared in the YAML file is upheld. It runs in a reconciliation loop that observes the state and acts on any differences it detects.
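The reconciliation loop can be sketched in a few lines. This is a toy model under stated assumptions, not Rook's implementation: the desired state is reduced to a hypothetical dictionary of component counts, the observed state is a dictionary the loop mutates directly, and the loop is bounded so the example terminates, whereas a real controller keeps watching indefinitely.

```python
def diff(desired, observed):
    """List the actions needed to move the observed state to the desired one."""
    actions = []
    for component, want in sorted(desired.items()):
        have = observed.get(component, 0)
        if have < want:
            actions.append(("start", component, want - have))
        elif have > want:
            actions.append(("stop", component, have - want))
    return actions


def reconcile(desired, observed, max_rounds=10):
    """Observe, compare, act; repeat until nothing differs."""
    for _ in range(max_rounds):
        actions = diff(desired, observed)
        if not actions:
            break  # observed state matches the declaration
        for verb, component, count in actions:
            change = count if verb == "start" else -count
            observed[component] = observed.get(component, 0) + change
    return observed


# Hypothetical declaration: three monitors and three OSDs.
desired = {"mon": 3, "osd": 3}
# Hypothetical observation: two monitors missing, two OSDs too many.
observed = {"mon": 1, "osd": 5}

planned = diff(desired, observed)
final = reconcile(desired, observed)
print(planned, final)
```

The administrator only edits the declaration; the loop works out which actions close the gap.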

Rook does not have its own persistent state and does not need to be managed, which shows that it is indeed built according to the principles of Kubernetes.

Rook, which combines Ceph and Kubernetes, is one of the most popular cloud native storage solutions, with nearly 4000 stars, 16.3 million downloads, and more than 100 contributors on GitHub.

As the first storage project accepted by CNCF, Rook has recently entered the incubation stage.

Finally, for any problem in the application, it is important to identify the requirements and design the system or select tools accordingly. Storage in cloud native environments is no exception. Although the problem is quite complex, there are many tools and methods. With the development of cloud computing, there is no doubt that new solutions will emerge constantly.

Source: Software Engineering Daily. Author: Gokhan Simsek
