
Analysis of the Kubernetes CSI Implementation for SmartX Hyper-Converged SMTX OS Distributed Block Storage

2025-01-30 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/03 Report --

There is no doubt that containers and container orchestration have become some of the technologies of greatest interest to IT professionals. According to a Gartner survey (1), by 2022 only 10% of CIOs will have no plans for container use, while 27% of CIOs already plan to use containers in production environments.

1. Gartner IOCS 2018 Conference polling results

Early containers were mainly used for stateless applications and did not need persistent storage. But as container adoption has grown, and with the powerful automated management capabilities that Kubernetes brings, more and more stateful applications such as MongoDB, MySQL and PostgreSQL are running in containers, which places more numerous and higher requirements on persistent storage.

SmartX distributed block storage (internal code name: SMTX ZBS) is independently developed by SmartX. As the core engine of SmartX hyper-convergence, ZBS has been widely used for private cloud construction, virtualization consolidation and other scenarios in finance, manufacturing, communications, real estate and other industries, carrying users' production and development workloads. Its stability, ease of use and rich storage features have been tested over a long period and recognized by a large number of industry leaders.

Recently, SMTX ZBS's CSI driver was officially added to the Kubernetes official driver list. Enterprise customers can not only continue to use SMTX ZBS to build private clouds and hyper-converged systems, but also use it to provide persistent storage for K8s and support stateful applications such as databases, further accelerating the adoption of K8s in more scenarios within the enterprise.

Screenshot of Kubernetes official website

The following sections outline the mechanism of CSI and the implementation of the interface between SMTX ZBS and CSI.

Concepts and definitions

1. Persistent storage support for K8s

To support persistent storage, K8s originally connected to common external storage systems such as NFS, iSCSI, CephFS and RBD through embedded native drivers, meeting the needs of different businesses. However, because the storage ecosystem itself keeps evolving, supporting ever-changing storage systems through in-tree drivers imposes great cost and time burdens on the K8s project itself.

So, like other service management systems, K8s gradually separated the concrete storage implementations from the main project and allowed third-party vendors to plug in storage services through defined interfaces. This road has gone through two stages:

1. FlexVolume, introduced in version 1.2.

The third-party storage provider deploys a FlexVolume-compliant driver directly on the K8s host and accesses the storage system through key APIs exposed by K8s such as mount/unmount/attach/detach. The main problem with this model is that the third-party vendor's driver has access to the root file system of the physical node, which brings security risks.

2. Container Storage Interface (CSI), introduced in version 1.9 and GA since 1.13.

CSI defines the storage control primitives and the complete control flows required by container scenarios. In the K8s implementation of CSI, every third-party driver, like other K8s service extensions, runs as a service container, so it does not directly affect the stability of the K8s core.

2. Storage objects

The storage object defined by CSI is a persistent volume, called a Volume. There are two types:

Mounted File Volume: the Node mounts the Volume into the Container in a specified file-system format; from the Container's point of view it is a directory.

Raw Block Volume: the Volume is exposed to the Container directly as a block device (disk). For services that can operate on a disk directly, this form avoids file-system overhead and yields better performance.

Raw Block Volume is still in the Beta phase, so the process description below and the current implementation of SMTX's CSI Driver target Mounted File Volume.

3. Plugin

CSI calls a Driver that implements the CSI services a Plugin. Depending on the functionality provided, Plugins fall into two types:

Controller Plugin, responsible for the life-cycle management of storage objects (Volumes). Only one instance is needed per cluster.

Node Plugin, which interacts with the node where a container using the Volume runs, providing operations such as mounting/unmounting the Volume on that node. It is deployed on every service node where it is needed.

Storage service providers can implement different Plugin combinations according to their own needs. For example, for a storage service provided over NFS, it may be enough to implement only the Controller Plugin to create resources and control access rights; each node can then reach the service through the standard NFS mechanism, with no need for a Node Plugin to perform mount/unmount operations. For a storage service provided over iSCSI, a Node Plugin is required to turn an iSCSI LUN into a container-visible directory through a series of actions on the specified node: mounting the LUN, formatting it, and mounting the file system.

4. Volume life cycle

A typical CSI Volume life cycle is shown in the following figure (from CSI SPEC):

After a Volume is created, it enters the CREATED state, in which the Volume exists only in the storage system and is invisible to any Node or Container.

After performing the Controller Publish operation on a Volume in the CREATED state, it enters the NODE_READY state on the specified Node. The Node can now see the Volume, but Containers still cannot.

Staging the Volume on the Node moves it to the VOL_READY state. At this point the Node has established a connection with the Volume, and the Volume exists in some form on the Node.

Performing the Publish operation on the Node transforms the Volume into a form visible to the Container on that Node; once used by the Container, the Volume enters the PUBLISHED state of formal service.

When the Container's life cycle ends, or the Volume is otherwise no longer needed, the Node executes the Unpublish Volume action to disconnect the Volume from the Container and fall back to VOL_READY.

The Node Unstage operation disconnects the Volume from the Node and falls back to the NODE_READY state.

The Controller Unpublish operation removes Node's access to Volume.

Delete destroys the Volume from the storage system.

CSI requires that state-transition operations be idempotent and, in principle, that Volume state transitions occur in order.

Depending on how the storage is used and on the internal implementation, the state machine can differ slightly, but the corresponding operations must occur in pairs. For example, when no additional Stage/Unstage phase is needed to establish a connection between Node and Volume, the state machine can move directly between NODE_READY and PUBLISHED via Controller Publish/Unpublish without passing through the VOL_READY phase. A Plugin must declare which semantics it supports when registering with CSI.
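The ordered, idempotent state machine described above can be sketched as a small simulation (an illustrative sketch, not part of any real plugin; state and operation names follow the CSI spec):

```python
# Illustrative sketch of the CSI Volume state machine: transitions are
# idempotent and must occur in order, as required by the CSI spec.
from enum import Enum, auto

class State(Enum):
    CREATED = auto()
    NODE_READY = auto()
    VOL_READY = auto()
    PUBLISHED = auto()

# operation -> (required current state, resulting state)
TRANSITIONS = {
    "ControllerPublish":   (State.CREATED,    State.NODE_READY),
    "NodeStage":           (State.NODE_READY, State.VOL_READY),
    "NodePublish":         (State.VOL_READY,  State.PUBLISHED),
    "NodeUnpublish":       (State.PUBLISHED,  State.VOL_READY),
    "NodeUnstage":         (State.VOL_READY,  State.NODE_READY),
    "ControllerUnpublish": (State.NODE_READY, State.CREATED),
}

class Volume:
    def __init__(self):
        self.state = State.CREATED

    def apply(self, op: str) -> State:
        required, result = TRANSITIONS[op]
        if self.state == result:      # idempotent: repeating an op is a no-op
            return self.state
        if self.state != required:    # out-of-order calls are rejected
            raise RuntimeError(f"{op} not allowed in state {self.state.name}")
        self.state = result
        return self.state
```

Repeating an operation that has already taken effect is a no-op, while an out-of-order call fails, mirroring CSI's idempotency and ordering requirements.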

5. RPC

The RPCs that CSI requires a Plugin to support include:

Identity Service: identity service, which both the Controller and Node Plugin must support

GetPluginInfo: returns the Plugin's basic information

GetPluginCapabilities: returns the capabilities the Plugin supports

Probe: checks the Plugin's health status

Controller Service: control service

Volume CRUD, including Volume status checks and operations such as capacity expansion and capacity queries

Controller Publish/Unpublish Volume: management of a Node's access to a Volume

Snapshot CRUD: snapshot creation and deletion. Currently the Snapshot defined by CSI is only used to create Volumes and does not provide rollback semantics.

Node Service: node service

Node Stage/Unstage/Publish/Unpublish/GetStats Volume: management of the Volume's connection state on the node

Node Expand Volume: expands the Volume on the node. After the Volume's logical size is expanded, the file system on it may also need to be expanded in step, and the Container using the Volume must be made aware of it, so a corresponding interface is needed on the Node Plugin.

Node Get Capabilities/Info: queries the Plugin's basic attributes and the Node's attributes
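Taken together, the three service surfaces above can be summarized as Python stubs. This is a sketch only: real plugins implement these as gRPC services defined in the CSI spec, and the plugin name `com.example.csi` and the simplified parameter lists here are assumptions for illustration.

```python
# Sketch of the CSI RPC surface: method names follow the CSI spec,
# bodies are placeholders (a real plugin serves these over gRPC).
class IdentityServicer:
    def GetPluginInfo(self):          # plugin name and version
        return {"name": "com.example.csi", "vendor_version": "0.1"}
    def GetPluginCapabilities(self):  # e.g. whether a controller service exists
        return ["CONTROLLER_SERVICE"]
    def Probe(self):                  # health check
        return True

class ControllerServicer(IdentityServicer):
    # Volume lifecycle, access management, snapshots
    def CreateVolume(self, name, capacity_bytes): ...
    def DeleteVolume(self, volume_id): ...
    def ControllerPublishVolume(self, volume_id, node_id): ...
    def ControllerUnpublishVolume(self, volume_id, node_id): ...
    def CreateSnapshot(self, source_volume_id, name): ...
    def DeleteSnapshot(self, snapshot_id): ...

class NodeServicer(IdentityServicer):
    # Connection-state management of a Volume on the node
    def NodeStageVolume(self, volume_id, staging_path): ...
    def NodeUnstageVolume(self, volume_id, staging_path): ...
    def NodePublishVolume(self, volume_id, target_path): ...
    def NodeUnpublishVolume(self, volume_id, target_path): ...
    def NodeExpandVolume(self, volume_id, capacity_bytes): ...
    def NodeGetCapabilities(self): ...
    def NodeGetInfo(self): ...
```

Note that both the Controller and Node servicers inherit the Identity service, matching the requirement that both Plugin types support it.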

Deployment form

CSI uses the Sidecar pattern to decouple the CSI Plugin from K8s core logic. A Sidecar is a standard container that listens on the APIs specified by CSI; together with the CSI Plugin it forms a Pod that provides the service, the two communicating over a socket. In this mode the Sidecar becomes the intermediary and isolation layer between the CSI Plugin and K8s. Ideally the two can work together without direct interaction or mutual influence, solving the security problem.

CSI defines the following Sidecars:

external-provisioner: watches the Volume CRUD APIs and handles Volume life-cycle management

external-attacher: watches the Controller Publish/Unpublish Volume APIs to control the visibility of Volumes on Nodes

external-snapshotter: watches the Snapshot CRD APIs and handles Snapshot life-cycle management

node-driver-registrar: listens for Node basic-information queries and registers the Node Plugin. The Node Plugin on each node must register itself through the registrar to establish its connection with K8s and receive Node Volume requests.

cluster-driver-registrar: registers the Plugin's overall capabilities with K8s, including whether the Attach phase can be skipped and whether K8s must supply Pod information during the Node Publish Volume phase

livenessprobe: heartbeat detection, used to check that the Plugin is alive

Applicable scenario

In the early stage of containerization, containers mostly carried lightweight stateless services, and data-storage needs were mostly met by sharing local temporary files, or by network access that placed data in a remote log collector or an external store such as a database. From a program-management point of view, business and data in this model are loosely coupled and mutually independent, with no strict dependencies.

On the other hand, in this mode the data itself cannot be part of the service and cannot be uniformly managed by K8s. Moreover, a network channel to the remote storage service must be opened for each application, which from a security standpoint is sometimes not a good choice.

With persistent volumes, the data service provider can also be placed inside a K8s Pod as part of the complete application (for example, mounting the persistent volume as a disk and deploying a container that runs a database on it). The data can then be managed seamlessly along with the application, and business data requests between the Pods of an application all travel over the virtual network provided by K8s. Thanks to the high availability of K8s and the flexible configuration abilities of CSI drivers, the reliability and performance of external storage are retained as well.

SMTX ZBS x CSI

1. SMTX ZBS

SMTX ZBS provides persistent storage for K8s through iSCSI. Its internal structure is shown in the following figure:

SMTX ZBS internal structure

A Chunk Server is deployed on each node to manage the local SSDs/HDDs and provide a unified high-performance hybrid storage service, while Meta, the metadata management service, is deployed on some of the nodes and, together with the Chunk Servers, forms a highly reliable cluster. Each Chunk Server provides protocol access services such as iSCSI Target, and they are logically equivalent for access: every Target and LUN in the cluster can be reached from any Chunk Server.

2. ZBS CSI Driver

The deployment form of ZBS CSI Driver is shown in the following figure:

ZBS CSI Driver deployment form

Each CSI Volume corresponds one-to-one with a ZBS iSCSI LUN, and its life cycle is as follows:

1. Create Volume: after receiving a creation request, the Controller Plugin creates an iSCSI LUN in ZBS, and ZBS automatically creates the required Target if necessary. In the implementation, an iSCSI LUN and the Target it belongs to are logical objects and are not bound to physical disks. 4,096 Targets can be created in a cluster, and each Target can hold a maximum of 4,096 LUNs; multiple CSI Volumes may reside in the same Target.

2. Controller Publish Volume: currently Kubernetes uses Open-iSCSI as the data access service on the node. When it mounts a Target, Open-iSCSI attaches every LUN in that Target to the host as a block device (a disk such as /dev/sdx). To prevent the host from seeing LUNs it does not need and should not access, ZBS uses a mechanism similar to LUN masking. When the LUN corresponding to a new Volume is created, no iSCSI initiator (the iSCSI protocol client) is allowed to discover it. During the Controller Publish Volume phase, the ZBS Controller Plugin registers the initiator of the specified Node with the LUN; only after that can the iSCSI discovery mechanism on that Node discover and access the LUN within the Target.

3. Node Stage Volume: the ZBS Node Plugin attaches the LUN to the host through Open-iSCSI commands, and it appears as a disk.

4. Node Publish Volume: the ZBS Node Plugin formats the disk (this step is skipped if the disk has already been formatted) and mounts it on the host for the Container to use.

5. Node Unpublish Volume: the ZBS Node Plugin unmounts the file system on the disk.

6. Node Unstage Volume: the ZBS Node Plugin disconnects the disk's iSCSI link on the host.

7. Controller Unpublish Volume: the ZBS Controller Plugin revokes the specified Node's access rights to the LUN on the ZBS backend.

8. Delete Volume: the ZBS Controller Plugin asks ZBS to delete the corresponding LUN, and the space the LUN occupied is reclaimed in the storage system.
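The LUN-masking behavior in step 2 above can be illustrated with a minimal simulation (hypothetical names, not the ZBS API): a newly created LUN admits no initiators, and discovery from a node only reveals LUNs whose access has been published to that node's initiator.

```python
# Illustrative sketch of LUN masking: a LUN is invisible to every
# initiator until the Controller Plugin registers that node's initiator.
class Lun:
    def __init__(self, name: str):
        self.name = name
        self.allowed_initiators: set[str] = set()   # empty => masked for all

class Target:
    def __init__(self):
        self.luns: dict[str, Lun] = {}

    def create_lun(self, name: str) -> Lun:
        # Create Volume: the new LUN starts with no initiators allowed.
        lun = Lun(name)
        self.luns[name] = lun
        return lun

    def controller_publish(self, lun_name: str, initiator_iqn: str) -> None:
        # Controller Publish Volume: grant one node's initiator access.
        self.luns[lun_name].allowed_initiators.add(initiator_iqn)

    def discover(self, initiator_iqn: str) -> list[str]:
        # iSCSI discovery from a node reports only the LUNs it may access.
        return [name for name, lun in self.luns.items()
                if initiator_iqn in lun.allowed_initiators]
```

The key property shown: two Volumes in the same Target stay mutually invisible to hosts that have not been granted access.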

The ZBS CSI Driver supports Snapshot CRUD operations, with copy-on-write (COW) snapshots. The related requests involve only simple metadata operations, so the associated APIs respond quickly in synchronous mode; a Snapshot itself, or a Volume created from a Snapshot, is immediately in the Volume Ready state.
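The host-side actions behind steps 3 and 4 above (Node Stage and Node Publish) can be sketched as dry-run command lists. The portal address, IQN, device path and mount path below are hypothetical, and a real Node Plugin would invoke Open-iSCSI and the mount utilities rather than merely build the commands:

```python
# Dry-run sketch: the host-side commands a Node Plugin would typically run
# for Node Stage (iSCSI login) and Node Publish (format if needed, mount).
def node_stage_cmds(portal: str, target_iqn: str) -> list[list[str]]:
    # Discover the Target at the portal, then log in to attach its LUNs.
    return [
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"],
    ]

def node_publish_cmds(device: str, mount_path: str,
                      formatted: bool) -> list[list[str]]:
    # Format only if the disk has not been formatted before, then mount.
    cmds: list[list[str]] = []
    if not formatted:
        cmds.append(["mkfs.ext4", device])
    cmds.append(["mount", device, mount_path])
    return cmds
```

Building the commands without executing them keeps the sketch self-contained; in production these steps would run with error handling and idempotency checks.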

3. Data link

The iSCSI data link between K8s and ZBS is shown in the following figure:

iSCSI data link between K8s and ZBS

ZBS provides its iSCSI access service in iSCSI Redirector mode. The iSCSI server address the CSI Plugin provides is the iSCSI Redirector's address. When an initiator attempts to connect to the iSCSI server (the Target side; in ZBS the Chunk Servers provide the Target service), the Redirector uses a hash method to direct the initiator to reconnect to an available Chunk Server.

After redirection, all data requests travel only between the initiator and that Chunk Server, without passing through the Redirector. All Chunk Servers in a ZBS cluster are logically equivalent when handling iSCSI access requests: any Chunk Server can serve data for any Target LUN in the cluster. Redirection thus effectively spreads the data-access load and makes full use of cluster performance.
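The redirection step can be illustrated with a small sketch. ZBS's actual hash method is not specified here, so this uses a stable digest purely to show the idea: each initiator is deterministically directed to one of the logically equivalent Chunk Servers, spreading load across the cluster.

```python
# Illustrative sketch of hash-based iSCSI redirection (not the ZBS
# algorithm): map an initiator deterministically onto a Chunk Server.
import hashlib

def redirect(initiator_iqn: str, chunk_servers: list[str]) -> str:
    # A stable digest keeps the mapping consistent across reconnects.
    digest = hashlib.sha256(initiator_iqn.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(chunk_servers)
    return chunk_servers[index]
```

Because every Chunk Server can serve every Target LUN, any choice of server is correct; the hash only decides where each initiator's traffic lands.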

4. Reliability and availability

From the outset, SMTX ZBS was designed for high reliability, high availability and high performance. It uses mechanisms such as multiple replicas, silent data checking, and automatic in-cluster balancing and recovery to ensure data safety and reliability. On this basis, the K8s CSI mode achieves higher safety than using local storage.

(1) Exception handling

K8s compute node exception

The Pods on Node A are automatically restarted on another node (Node B) and go through the Publish Volume actions to mount their Volumes on Node B. Node B connects to a Chunk Server to provide IO services for the Volumes associated with those Pods, and data previously written to the Volumes on Node A is not lost.

Chunk node exception

If the Chunk Server a compute node is currently connected to fails, the link to that Chunk node is interrupted, and the compute node obtains a new access node from the iSCSI Redirector service to restore service quickly. The impact typically lasts only seconds.

If a Chunk Server other than the one currently connected fails, there may be a short delay due to the loss of an IO replica, but the iSCSI link itself is unaffected, and the impact likewise lasts only seconds.

iSCSI Redirector node exception

SMTX ZBS provides a VIP (virtual IP) service and guarantees that only one node in the cluster holds the VIP at a time. The iSCSI Redirector runs on the VIP node; when that node fails, a new node naturally takes over the VIP and continues to provide the iSCSI Redirector service. ZBS guarantees that the Redirector service provided by every node is equivalent.

(2) Active-active and remote backup

On top of the basic storage functions, ZBS also provides active-active (stretched) clusters and remote backup. With these two functions, K8s CSI Volumes can obtain strongly consistent data protection across data centers in the same city, or automatic periodic backup across cities or data centers.

5. Overall deployment form

At present, SMTX supports two CSI deployment forms: a VM-based converged mode and a separated mode. Deployment support for an SMTX K8s-native converged mode will follow later.

(1) Separated mode

K8s and SMTX ZBS are independent physical clusters connected by an access network. The access network must be separate from the service network inside K8s and from the storage network used by ZBS.

(2) Converged mode

In converged mode, K8s runs in virtual machines provided by SMTX OS and reaches the ZBS service on SMTX OS through the access network. This deployment makes more efficient use of physical resources.

References:

CSI Design Doc: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md

CSI SPEC: https://github.com/container-storage-interface/spec

CSI Driver Developer Documentation: https://kubernetes-csi.github.io/docs/introduction.html

Learn more: https://www.smartx.com
