Introduction to Cloud Storage Products


Cloud storage products mainly include object storage, block storage, network file systems (NAS), and the most profitable of them all, CDN. We will go over the product characteristics of these mainstream offerings so that we know how to choose among them when facing cloud storage. As technical authors we will also briefly discuss implementation ideas, although for information-security reasons the full industrial solutions cannot be laid out here. Many upper-layer storage products from the major vendors rely heavily on an underlying distributed file system, so we also cover the ancestor of these systems, DFS.

Linux IO Stack

The essence of cloud computing is the unlimited expansion of single-machine computing power, so let's first look at single-machine file and IO management. An IO operation in the Linux operating system passes through the VFS layer, the IO scheduler, and the block device layer, and finally lands on the disk:

(1) The VFS layer includes network-protocol file systems such as NFS/SMB, from which NAS products are derived.

(2) VFS also includes the FUSE file system, which switches the request into a user-space context. As long as the upper-layer distributed storage adapts to the libfuse interface, it can reach the back-end storage (a minimal sketch follows this list).

(3) At the block device layer, block storage products are derived by extending the iSCSI network protocol.
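
To make point (2) concrete, here is a minimal read-only FUSE file system in Python using the fusepy binding (an assumption: fusepy is installed; the class name, file contents, and mount point are hypothetical). A real distributed-storage client would forward these callbacks to the back end over RPC rather than serve an in-memory dict:

import errno
import stat
import time

from fuse import FUSE, FuseOSError, Operations  # fusepy binding over libfuse

class HelloFS(Operations):
    # Serves one read-only in-memory file; a storage client would forward
    # these callbacks to the distributed back end over RPC instead.
    FILES = {"/hello.txt": b"hello from user space\n"}

    def getattr(self, path, fh=None):
        now = time.time()
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2,
                    "st_atime": now, "st_mtime": now, "st_ctime": now}
        if path in self.FILES:
            return {"st_mode": stat.S_IFREG | 0o444, "st_nlink": 1,
                    "st_size": len(self.FILES[path]),
                    "st_atime": now, "st_mtime": now, "st_ctime": now}
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return [".", ".."] + [name.lstrip("/") for name in self.FILES]

    def read(self, path, size, offset, fh):
        return self.FILES[path][offset:offset + size]

if __name__ == "__main__":
    FUSE(HelloFS(), "/mnt/hellofs", foreground=True)  # mount point is hypothetical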

Storage product architecture styles

Layered:

For example HBase: its underlying file system is HDFS, so HBase does not have to consider replication and can focus on problems in its own domain.

Features: development cost drops sharply, but stability depends on the underlying storage; if the bottom layer is unstable, the upper layer suffers.

Vertical (silo):

Implement your own replication, your own replica recovery, and your own master-slave architecture on the write path.

A two-tier index system solves the massive-small-files problem:

In the first tier, the master maintains a routing table and locates the owning slave (ip+port) by file URL.

In the second tier, the slave's stand-alone index system finds the exact location and reads out the raw data, as sketched below.
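
A minimal sketch of the two-tier lookup, under the assumption that both tiers can be modeled as in-memory tables (all names are hypothetical; a real system keeps the slave index persistent and ships the raw data over the network):

# Tier 1: master routing table, file_url -> (ip, port) of the owning slave.
MASTER_ROUTES = {"bucket/a.jpg": ("10.0.0.12", 9090)}

# Tier 2: per-slave stand-alone index, file_url -> (volume file, offset, length).
SLAVE_INDEX = {("10.0.0.12", 9090): {"bucket/a.jpg": ("/data/vol_0001", 4096, 2048)}}

def lookup(file_url):
    slave = MASTER_ROUTES[file_url]                        # tier 1: route to the slave
    volume, offset, length = SLAVE_INDEX[slave][file_url]  # tier 2: locate within the slave
    with open(volume, "rb") as f:                          # read the raw bytes from the volume
        f.seek(offset)
        return f.read(length)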

DFS

Features: rich POSIX-like semantics, with append-only storage as its hallmark; pwrite (random overwrite) is not supported.

Potential problems:

(1) It is a PB-level storage scheme, not EB-level: the centralized NameNode server becomes a memory and QPS bottleneck, so a BAT-scale company ends up operating and maintaining hundreds of clusters.

(2) Three replicas by default, which is costly.

(3) Strongly consistent writes suffer from the slow-node (straggler) problem.

Evolution:

GFS2 splits the NameNode into a directory-tree service and a block service and adds federation, but the defect of a centralized namespace server remains; splitting the namespace image requires stopping the service, so horizontal scaling is not that friendly.

Object Storage:

Metadata management

Blobstorage: blobid -> [raw data]

Metastore (what AWS S3 calls the keymap) is essentially a KV system. It stores file_url -> [blobid list].

IO path

(1) The httpserver receives a multi-part form of fixed-size raw data and cuts it into K equal stripes.

(2) Erasure coding (EC) generates (N-K) parity blocks, giving N shards in total. The question now becomes where to store these N pieces of data.

(3) The client-side agent then applies to blobstorage for a global id, which encodes the address of the actual back-end node and the physical volumes managed by that node. Each shard is stored evenly across these physical volumes.

(4) The N shards are written out in a distributed fashion. Once the number of safe copies is met, the write can return success; shards that failed to write can be repaired later by delayed EC reconstruction.

(5) The httpserver writes the file URL and its shard list to the metastore as a KV pair.
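
A structural sketch of the five-step write path (names are hypothetical, the dicts stand in for remote services, and a single XOR parity shard stands in for real Reed-Solomon EC, so here N = K + 1):

from itertools import count

K = 4                       # number of data stripes per object
BLOBSTORE = {}              # blobid -> shard bytes (stand-in for blobstorage nodes)
METASTORE = {}              # file_url -> [blobid list] (stand-in for the keymap)
_next_blobid = count().__next__

def put_object(file_url, raw):
    stripe = (len(raw) + K - 1) // K or 1
    raw = raw.ljust(K * stripe, b"\0")           # step 1: pad and cut into K equal stripes
    shards = [raw[i * stripe:(i + 1) * stripe] for i in range(K)]
    parity = bytearray(stripe)                   # step 2: one XOR parity shard, so N = K + 1
    for s in shards:
        parity = bytearray(p ^ b for p, b in zip(parity, s))
    shards.append(bytes(parity))
    blobids = [_next_blobid() for _ in shards]   # step 3: global ids from blobstorage
    for bid, shard in zip(blobids, shards):      # step 4: distribute the N shard writes
        BLOBSTORE[bid] = shard
    METASTORE[file_url] = blobids                # step 5: file_url -> shard list, as KV
    return blobids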

Features:

A web service based on the HTTP protocol: the interface is simple (put/get) but latency is high. It is an EB-level storage scheme, well suited to cloud products. The deep directory tree collapses into a two-level structure (bucket + object).

Disadvantages:

POSIX semantics are barely covered: even append is not truly provided (it is actually emulated through overwrite), let alone random writes.

iSCSI model

The part that interacts with the back end is implemented in the kernel; the back-end target parses the iSCSI protocol and maps each request onto the back-end distributed storage.

Features:

(1) Most request sizes are 4K-aligned. A block device is generally used under an upper-layer file system, the block size of most mainstream file systems is 4KB, and the minimum file-operation granularity is one block, so the vast majority of IO requests are 4KB-aligned.

(2) Strong consistency: a block device must guarantee that once a write returns, the written data can be read back.

(3) Random writes with low latency: users build file systems (e.g., ext4) on top of the virtual block device, and file-editing operations are very frequent, so random writes must be supported. Compared with NAS/FUSE products, only the block device's reads and writes are hooked; the upper-layer dentry lookups still follow the original local IO path, so there is none of the per-lookup RPC overhead that NAS/FUSE incurs.

(4) At the product level, capacity must be purchased in advance and expansion requires remounting, so it wastes space more easily than NAS.

Implementation model:

The cloud disk's logical volume is split into blocks; to make recovery easy, the split unit is 1GB. The first routing layer is managed by the blockManager: volumeid+offset maps to a logical block, and each logical block is placed on three blockservers. A blockserver pre-creates a 1GB file (with fallocate, to avoid running out of space mid-write) called a physical block; all IO to that interval of the logical volume lands in this physical block file, which makes pwrite easy to implement. It could equally be built on a raw disk, which from the OS's point of view is one large file divided into 1GB chunks.
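
A small sketch of the physical-block pre-creation described above, assuming a Linux host (os.posix_fallocate reserves the space immediately, which is the point of the fallocate step; the path and mode are hypothetical):

import os

BLOCK_BYTES = 1 << 30  # 1GB physical block, matching the recovery unit in the text

def create_physical_block(path):
    # Reserve the whole 1GB up front so later pwrites cannot fail for lack of space.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    os.posix_fallocate(fd, 0, BLOCK_BYTES)
    return fd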

IO path:

A file system sits above the block device. After the IO scheduler merges operations, the IO requests issued over the iSCSI protocol are all operations on sector LBAs, so they can simply be abstracted as (volume id, offset) operations. Let's briefly walk the EBS (Elastic Block Store) layer IO path.

(1) An IO request arrives from the network as a volume+offset operation; assume it is a write.

(2) The blockManager locates the logical block.

(3) The block's physical address (ip+port) and its replicationGroup are found in memory.

(4) The IO request is sent to the replicationGroup over an industry-standard replication chain such as the Raft protocol; Raft also solves the problem of truncating writes that fail partway.

(5) When a single node receives the IO request, it converts the LBA into a real file offset and writes the data down with pwrite.
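
Putting steps (1) to (5) together, a single-node sketch with the blockManager table and the replication step stubbed out (all names are hypothetical; a real write would traverse the Raft group before landing):

import os

BLOCK_BYTES = 1 << 30                 # 1GB logical/physical block size
SECTOR = 512                          # iSCSI LBAs address 512-byte sectors

# Stand-in for the blockManager: (volume_id, block index) -> fd of the physical block file.
BLOCK_TABLE = {}

def write_volume(volume_id, lba, data):
    offset = lba * SECTOR                         # step 1: volume + byte offset
    block_index = offset // BLOCK_BYTES           # step 2: locate the logical block
    fd = BLOCK_TABLE[(volume_id, block_index)]    # step 3: its physical location (a local fd
                                                  # stands in for ip+port and the Raft group)
    in_block = offset % BLOCK_BYTES               # step 5: LBA -> offset in the block file
    os.pwrite(fd, data, in_block)                 # the random write is served by pwrite
    # Assumes the write does not cross a 1GB boundary; step 4 (replication) is omitted.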

Optimizations

A. Under this storage model the back-end nodes will see a large number of random writes, so throughput will certainly not be high. There is plenty of room for optimization, such as turning random writes into sequential writes in the style of an LSM engine (a toy sketch follows item B); readers can take the idea further, and I will not discuss it in detail in this article.

B. The virtual disk can be striped, following the RAID idea: IO to a single disk becomes IO to several disks, increasing throughput.
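
A toy sketch of optimization A, assuming an in-memory log (real engines add compaction, crash recovery, and on-disk log files): every random write becomes a sequential append, and an index remembers where the newest version of each LBA lives.

class AppendOnlyVolume:
    # Turns random block writes into sequential log appends, LSM-style.
    def __init__(self):
        self.log = bytearray()   # stand-in for a sequentially written log file
        self.index = {}          # lba -> (log offset, length) of the newest version

    def write(self, lba, data):
        self.index[lba] = (len(self.log), len(data))  # the newest version wins
        self.log += data                              # the only write pattern is append

    def read(self, lba):
        off, length = self.index[lba]
        return bytes(self.log[off:off + length])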

NAS

Users access shared files through a mounted directory; the mount point is backed by an NFS-protocol file system and reaches the NFS server over TCP.

The NFS server is an agent that ultimately accesses our back-end storage system through libcfs.

Back-end storage system

The storage system contains a metastore and a datastore; the metastore manages inodes.

We take in the industry's lessons about DFS, solving the NameNode centralized-server bottleneck, and borrow the strengths of Bigtable. The metastore can be built on a distributed database (NewSQL), much like Bigtable. One user's files are scattered across multiple tabletservers, and users may rename across tabletservers, so distributed transactions are required to provide that guarantee. As an improvement over DFS, we persist the directory tree, imitating Linux fs dentry management. The mapping rule uses two tables: a dentry table describing the directory tree, and an inode table describing each file's block list plus metadata such as atime, mtime, uid, and gid. Because of hard links, several dentries may point to the same inode. The dentry table is associated with the inode table through a foreign key.

For example, to look up the children of a directory:

SELECT i.* FROM Dentry d JOIN Inode i ON d.INODE_ID = i.INODE_ID WHERE d.PARENT_DID = $PARENT_ID
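
The same design made runnable with sqlite3; the column names and sample rows are assumptions, added only to show the dentry-to-inode foreign key and the hard-link case:

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Inode  (INODE_ID INTEGER PRIMARY KEY, SIZE INTEGER,
                     ATIME INTEGER, MTIME INTEGER, UID INTEGER, GID INTEGER);
CREATE TABLE Dentry (DID INTEGER PRIMARY KEY, PARENT_DID INTEGER, NAME TEXT,
                     INODE_ID INTEGER REFERENCES Inode(INODE_ID));
""")
db.execute("INSERT INTO Inode VALUES (100, 4096, 0, 0, 1000, 1000)")
db.execute("INSERT INTO Dentry VALUES (2, 1, 'a.txt', 100)")   # /a.txt under root dir 1
db.execute("INSERT INTO Dentry VALUES (3, 1, 'a_link', 100)")  # hard link: second dentry, same inode

# Look up the children of directory 1, joining dentry to inode via the foreign key.
for row in db.execute("""SELECT d.NAME, i.* FROM Dentry d
                         JOIN Inode i ON d.INODE_ID = i.INODE_ID
                         WHERE d.PARENT_DID = ?""", (1,)):
    print(row)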

Datastore

Features: random writes must be provided, so the design follows the same idea as EBS block storage: large files are cut into blocks and organized by block, with real physical block files on the dataserver providing pwrite operations.

Characteristics

Elastic capacity with no hard limit; multiple machines can mount and read/write in parallel, so IO grows linearly; random writes are supported. The advantage over block storage is paying for exactly what you use, with no need to request capacity in advance: true elasticity.

Shortcoming

At the VFS layer, the dentry lookup for every directory level issues an RPC, so latency is high.
