How to Analyze the Technical Architecture of ECS Object Storage

2025-02-28 Update. From: SLTechnology News&Howtos (Shulou.com)

This article analyzes the technical architecture of ECS object storage in detail. I hope it serves as a useful reference.

Today, let's talk about ECS, the object storage product from EMC, which has a long lineage. Here ECS does not mean Elastic Compute Service (a cloud server) but Elastic Cloud Storage.

Why a long lineage? EMC launched the content-addressed storage product Centera in 2001 and then Atmos in 2008. ECS, launched in 2014, is its third generation of object storage. By 2014, S3 had become the de facto standard for object storage, so EMC added S3 support starting with ECS. However, Centera and Atmos still have many users in production, so ECS is forced to keep supporting many of the original APIs. Its legacy burden is clearly heavy.

ECS's product architecture is cleanly layered. From top to bottom:

ECS Portal and Provisioning Services: a web-based GUI for self-service, automation, reporting, and management of ECS nodes. This layer also handles licensing, authentication, multi-tenancy, and configuration services.

Data Services: services, tools, and APIs that support object, HDFS, and NFSv3 access.

Storage Engine: stores and retrieves data, manages transactions, and handles data protection and replication.

Fabric: provides clustering, health monitoring, software and configuration management, upgrades, and alerting.

Infrastructure: the turnkey appliance uses SUSE Linux Enterprise Server 12 as its base operating system. On commodity hardware, ECS runs on a qualified Linux operating system.

Hardware: the turnkey appliance or qualified commodity hardware.

As mentioned earlier, the data services layer supports S3. It also remains backward compatible with CAS (Centera) and Atmos, and it supports Swift. A new user, however, would almost certainly use only S3 rather than those legacy protocols. ECS also supports the HDFS and NFS protocols, which we discuss below.

For HDFS, however, ECS does not use the community S3A connector as other object stores do. Instead, it provides a dedicated ECS HDFS client.

The advantage is that accessing its own objects with HDFS semantics generally gives better performance and more functionality than S3A. EMC is not alone here: Huawei and XSKY follow the same approach and connect to their object storage through a dedicated HDFS client.

Most object stores support multiple sites. Hadoop users therefore generally use ECS object storage for disaster recovery, and they can also run analytics across multiple data centers at the same time.

Separating storage from compute has many advantages. In particular, EC (erasure coding) is very mature in object storage but is a relatively new feature in Hadoop. Production deployments of Hadoop EC are rare, mainly because rebuilds take too long after a disk fails and operations staff find this hard to manage. Production systems therefore use three replicas, which leaves almost every Hadoop project short of space. Yet if capacity keeps being expanded, Hadoop runs into the same problem as HCI: compute and storage scale together, so adding capacity also adds compute that is not needed, wasting a lot of resources.

ECS has built-in NFS support, a unified namespace across 8 sites, global locking, and cross-protocol access among NFS, HDFS, and S3.

ECS uses a fixed chunk size of 128 MB, which is a rather coarse management granularity. By comparison, Ceph's default object size is 4 MB.
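To make the difference in granularity concrete, here is a small arithmetic sketch. It counts how many fixed-size units a system must track per TiB at each size. It assumes binary units (MiB/TiB) for round numbers. It treats an ECS chunk and a Ceph object simply as equal-sized units, which ignores how each system actually packs data into them.

```python
# Sketch: how many fixed-size units must be tracked for a given capacity.
# Assumption: binary units (MiB/TiB); the text says "128 MB" and "4 MB".
MiB = 1024 ** 2
TiB = 1024 ** 4

ECS_CHUNK = 128 * MiB   # fixed ECS chunk size, per the text
CEPH_OBJECT = 4 * MiB   # Ceph default object size, per the text


def units_to_track(total_bytes: int, unit_size: int) -> int:
    """Ceiling division: a partially filled unit still has to be tracked."""
    return -(-total_bytes // unit_size)


if __name__ == "__main__":
    for name, size in (("ECS chunk", ECS_CHUNK), ("Ceph object", CEPH_OBJECT)):
        print(f"{name}: {units_to_track(1 * TiB, size)} units per TiB")
```

At these sizes, 1 TiB is 8,192 ECS chunks versus 262,144 Ceph objects. Larger units mean less location metadata to track. The cost is that each unit is a much bigger piece to move, rebuild, or reclaim.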

ECS manages metadata with B+ trees and supports querying objects by user-defined metadata attributes, but with many restrictions. Each bucket can have at most 5 index keys, and these cannot be changed after the bucket is created. (Earlier versions of the manual list these restrictions. ECS released a new version this year, and I don't know whether they still apply, so test before relying on them.)
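The two restrictions can be modelled in a few lines: a per-bucket cap of 5 index keys, and index keys that are fixed at creation. This is a hypothetical sketch, not the ECS API. The class name `IndexedBucket` and its methods are invented for illustration. A sorted list searched with `bisect` stands in for the B+ tree, since both support ordered range scans.

```python
import bisect

MAX_INDEX_KEYS = 5  # per-bucket limit quoted from the older ECS manual


class IndexedBucket:
    """Hypothetical bucket with metadata indexes fixed at creation time."""

    def __init__(self, index_keys: list[str]) -> None:
        if len(index_keys) > MAX_INDEX_KEYS:
            raise ValueError(f"at most {MAX_INDEX_KEYS} index keys per bucket")
        self._index_keys = tuple(index_keys)  # immutable: cannot change later
        # One sorted list of (value, object name) per key; stand-in for a B+ tree.
        self._indexes: dict[str, list] = {k: [] for k in self._index_keys}

    @property
    def index_keys(self) -> tuple:
        return self._index_keys

    def put(self, name: str, metadata: dict) -> None:
        # Only attributes declared as index keys become searchable.
        for key in self._index_keys:
            if key in metadata:
                bisect.insort(self._indexes[key], (metadata[key], name))

    def query_range(self, key: str, lo, hi) -> list[str]:
        """Object names whose `key` value lies in [lo, hi], in value order."""
        entries = self._indexes[key]
        start = bisect.bisect_left(entries, (lo, ""))
        result = []
        for value, name in entries[start:]:
            if value > hi:
                break
            result.append(name)
        return result
```

Usage: `IndexedBucket(["size", "owner"])` accepts two keys. Calling `query_range("size", 20, 300)` then returns matching names in size order. A sixth key is rejected at construction, and because there is no setter, keys cannot be added afterwards.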

Unlike other SDS products, ECS does not support SSDs as a write cache; it caches writes in memory instead. Memory is not protected against power loss, so ECS, like an Oracle database, must flush its log to disk before acknowledging a write. Data is therefore written twice, which hurts performance. EMC plans to support SSD write caching in this year's new version.
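The following hypothetical model shows where the double write comes from. A write is acknowledged only after it reaches an on-disk journal. The cached copy is later destaged to its final location, so every byte hits disk twice. The class and method names are illustrative assumptions; this does not reproduce ECS internals.

```python
class JournaledStore:
    """Hypothetical RAM write cache protected by an on-disk journal."""

    def __init__(self) -> None:
        self.journal: list[tuple[str, bytes]] = []  # stands in for the on-disk log
        self.mem_cache: dict[str, bytes] = {}       # volatile: lost on power failure
        self.data_disk: dict[str, bytes] = {}       # stands in for chunk storage
        self.disk_bytes_written = 0

    def write(self, key: str, value: bytes) -> str:
        # First disk write: the journal is persisted before the ack,
        # because the RAM cache would not survive a power outage.
        self.journal.append((key, value))
        self.disk_bytes_written += len(value)
        self.mem_cache[key] = value
        return "ack"

    def destage(self) -> None:
        # Second disk write: the same bytes move from cache to their final home.
        for key, value in self.mem_cache.items():
            self.data_disk[key] = value
            self.disk_bytes_written += len(value)
        self.mem_cache.clear()
        self.journal.clear()  # entries are no longer needed once destaged
```

Writing 1,000 bytes and then destaging records 2,000 bytes of disk writes. A non-volatile (e.g. SSD) write cache is meant to avoid paying that second write on the acknowledgement path.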

ECS protects its index data with three copies.

Object data, however, is always stored with EC in the end; replica-only protection is not supported. Data is compressed before it is written. If an object is about 128 MB (the chunk size), it is erasure-coded directly. Otherwise, it is first written as three copies and then erasure-coded asynchronously. This improves performance for small files.
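The write path described above can be sketched as a simple decision function. The text only says "about 128 MB", so using `>= CHUNK_SIZE` on the pre-compression object size as the cutoff is an assumption. The step names are descriptive labels, not ECS terminology.

```python
CHUNK_SIZE = 128 * 1024 * 1024  # chunk size from the text (assumed MiB)


def plan_write(object_size: int) -> list[str]:
    """Hypothetical ordered steps for writing one object of `object_size` bytes."""
    steps = ["compress"]
    if object_size >= CHUNK_SIZE:
        # Large object: fills a chunk, so it can be erasure-coded inline.
        steps += ["erasure-code", "acknowledge"]
    else:
        # Small object: protect cheaply with replicas, convert to EC in the background.
        steps += ["write 3 replicas", "acknowledge", "erasure-code asynchronously"]
    return steps
```

The design trades temporary 3x space usage on small objects for lower write latency. Encoding is deferred until a full chunk's worth of data can be encoded at once.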

For reads, the node that receives the request first locates the node that owns the object's index. It then locates the node that holds the data, reads the data, and returns it to the host. A read therefore generally involves three nodes. The I/O path is long, and accessing one object requires several redirected disk accesses. Caching metadata helps, but the cache is likely to be of limited benefit under large-scale random access.
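This sketch traces the three hops of the read path. It makes two assumptions. First, the index owner is chosen by hashing the object key over a hypothetical 4-node cluster. Second, the index maps each key to a chunk on a specific node. Node names and the `CHUNK_LOCATION` table are invented for illustration.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster

# Hypothetical object index: key -> (chunk id, node that stores the chunk).
CHUNK_LOCATION = {"photos/cat.jpg": ("chunk-42", "node3")}


def index_owner(key: str) -> str:
    """Assumed placement: hash the object key to pick the node owning its index."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]


def read_path(entry_node: str, key: str) -> list[str]:
    """Nodes involved in serving one GET, in order (may coincide by chance)."""
    index_node = index_owner(key)               # hop 1: look up the index entry
    _chunk_id, data_node = CHUNK_LOCATION[key]  # hop 2: chunk location from the index
    return [entry_node, index_node, data_node]  # entry node returns data to the host
```

Caching the index lookup on the entry node would remove the middle hop. As the text notes, though, that cache hits less often under large-scale random access.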

ECS also supports cross-site EC. This improves space utilization, but it places high demands on inter-site bandwidth and latency, especially when a site fails. As far as I know, very few such deployments exist in China.

EMC later recognized this problem and, in version 3.1, introduced a feature that uses a third site for disaster recovery only. This lets users achieve three-site disaster recovery at lower cost.

ECS replicates data asynchronously but replicates metadata synchronously, so from the user's point of view data is strongly consistent. The catch is that if the primary site fails, data that has not yet been replicated is still lost.
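The following hypothetical two-site model illustrates both halves of that statement. Metadata reaches both sites before the write is acknowledged, while object data is copied later. Site B can always see that an object exists and fetch it from site A, so reads stay consistent. If A fails before replication, the object is gone. The class and site names are assumptions, not ECS components.

```python
class TwoSiteBucket:
    """Hypothetical model: synchronous metadata, asynchronous data replication."""

    def __init__(self) -> None:
        self.meta = {"A": {}, "B": {}}  # metadata at each site
        self.data = {"A": {}, "B": {}}  # object data at each site
        self.pending: list[str] = []    # keys whose data has not reached site B

    def put(self, key: str, value: bytes) -> None:
        self.data["A"][key] = value
        # Metadata is written at BOTH sites before the write is acknowledged.
        self.meta["A"][key] = self.meta["B"][key] = len(value)
        self.pending.append(key)        # data is copied to B later

    def replicate(self) -> None:
        for key in self.pending:
            self.data["B"][key] = self.data["A"][key]
        self.pending.clear()

    def read_at_b(self, key: str, site_a_alive: bool = True) -> bytes:
        if key not in self.meta["B"]:
            raise KeyError(key)
        if key in self.data["B"]:
            return self.data["B"][key]
        if site_a_alive:
            # B knows the object exists, so it fetches it from A: a consistent read.
            return self.data["A"][key]
        raise LookupError(f"{key}: site A failed before its data was replicated")
```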

Box-carting is ECS's small-file aggregation feature. ECS merges small writes in memory into 2 MB units to improve write performance.
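Here is a minimal sketch of the aggregation idea. Small writes are appended to an in-memory buffer, and each full buffer is emitted as one large write. An index records where every object landed. The 2 MB size is taken as 2 MiB, which is an assumption. The `BoxCar` class is hypothetical and ignores durability, which a real system must handle.

```python
BOX_SIZE = 2 * 1024 * 1024  # 2 MB merge unit from the text (assumed MiB)


class BoxCar:
    """Hypothetical in-memory aggregator that packs small writes into large boxes."""

    def __init__(self, box_size: int = BOX_SIZE) -> None:
        self.box_size = box_size
        self.buffer = bytearray()                         # box currently being filled
        self.flushed: list[bytes] = []                    # each entry = one large write
        self.index: dict[str, tuple[int, int, int]] = {}  # key -> (box, offset, length)

    def put(self, key: str, data: bytes) -> None:
        # Start a new box if this write would overflow a non-empty one.
        if self.buffer and len(self.buffer) + len(data) > self.box_size:
            self.flush()
        self.index[key] = (len(self.flushed), len(self.buffer), len(data))
        self.buffer += data
        if len(self.buffer) >= self.box_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flushed.append(bytes(self.buffer))
            self.buffer = bytearray()

    def get(self, key: str) -> bytes:
        box, offset, length = self.index[key]
        source = self.flushed[box] if box < len(self.flushed) else self.buffer
        return bytes(source[offset:offset + length])
```

With 4 KiB objects, 512 writes fill one box. A thousand small writes therefore turn into two large writes instead of a thousand small ones.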

For local data protection, ECS supports only two EC schemes, 12+4 and 10+2, which is not very flexible.
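The trade-off between the two schemes is simple arithmetic. A k+m scheme stores (k+m)/k raw bytes per usable byte and survives the loss of any m fragments. The sketch below uses exact fractions to avoid rounding noise.

```python
from fractions import Fraction


def ec_profile(k: int, m: int) -> dict:
    """Capacity and fault tolerance of a k data + m coding fragment scheme."""
    return {
        "raw_per_usable": Fraction(k + m, k),  # bytes on disk per byte stored
        "efficiency": Fraction(k, k + m),      # usable share of raw capacity
        "tolerated_losses": m,                 # fragments that may be lost
    }


if __name__ == "__main__":
    for k, m in ((12, 4), (10, 2)):
        p = ec_profile(k, m)
        print(f"{k}+{m}: {float(p['raw_per_usable']):.3f}x raw, "
              f"{float(p['efficiency']):.0%} efficient, survives {m} lost fragments")
```

12+4 costs about 1.33x raw capacity and tolerates 4 lost fragments. 10+2 costs 1.2x and tolerates 2. For comparison, three replicas cost 3x.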

Like IBM, ECS supports a compact EC layout that reduces the minimum number of nodes. However, ECS still requires 4 nodes, whereas distributed storage generally starts at 3.
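One plausible reason for a 4-node floor can be checked with a placement sketch. Spread the k+m fragments of a 12+4 unit as evenly as possible over N nodes. Losing one node is survivable only if that node holds at most m fragments. The even round-robin placement is an assumption; this is not ECS's documented placement algorithm.

```python
def fragments_per_node(k: int, m: int, nodes: int) -> list[int]:
    """Assumed even spread of k+m fragments over `nodes` nodes."""
    total = k + m
    return [total // nodes + (1 if i < total % nodes else 0) for i in range(nodes)]


def survives_node_loss(k: int, m: int, nodes: int) -> bool:
    """True if losing the most heavily loaded node loses at most m fragments."""
    return max(fragments_per_node(k, m, nodes)) <= m
```

For 12+4, four nodes hold 4 fragments each, so one node can fail. On three nodes, one node must hold 6 fragments, which exceeds the 4 the scheme can lose. Under the same assumption, 10+2 would need 6 nodes.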

Because ECS supports cross-site EC, space utilization rises as sites are added. With traditional replication, by contrast, more sites mean more full copies. However, as mentioned earlier, few enterprise customers in China seem to deploy ECS this way; such deployments are more common in public clouds.
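The sketch below shows why utilization improves with more sites. It assumes an XOR scheme in which equal-sized chunks from N-1 sites are combined into one parity chunk stored at the remaining site. This is a simplified model of cross-site EC, not ECS's exact implementation. Any single lost chunk is the XOR of the parity and the surviving chunks, and the geo overhead is N/(N-1).

```python
from fractions import Fraction
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("chunks must have equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def geo_overhead(sites: int) -> Fraction:
    """Assumed geo-protection overhead: N-1 data chunks share one XOR parity chunk."""
    if sites < 2:
        raise ValueError("cross-site protection needs at least 2 sites")
    return Fraction(sites, sites - 1)
```

Under this model, 2 sites cost 2x (effectively a full copy), 3 sites cost 1.5x, and 8 sites cost 8/7, about 1.14x. The price is that recovering from a lost site means reading chunks from every surviving site, which is why the text stresses inter-site bandwidth and latency.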

ECS advertises data availability of only three nines, which is refreshingly honest and a breath of fresh air in the industry; other vendors like to advertise six nines or more. It is rare for EMC not to worry that competitors will use this figure to shape tender requirements against it. O(∩_∩)O~
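To put "three nines" versus "six nines" in perspective, this worked example converts an availability figure into allowed downtime per year. It assumes a 365-day year.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def downtime_per_year(nines: int) -> float:
    """Seconds of allowed unavailability per year at the given number of nines."""
    return SECONDS_PER_YEAR * 10 ** -nines


if __name__ == "__main__":
    for n in (3, 6):
        secs = downtime_per_year(n)
        print(f"{n} nines: {secs:,.1f} s/year ({secs / 3600:.2f} h)")
```

Three nines allow about 31,536 seconds (roughly 8.76 hours) of downtime per year. Six nines allow about 31.5 seconds.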

ECS components are packaged and deployed as containers.

Although ECS can be deployed as pure software-defined storage, the sales team prefers to promote the appliance, so deals tend to be large. ECS has no built-in load balancing, and the entry configuration starts at 5 nodes, which makes it somewhat expensive.

ECS offers limited options for capacity expansion, and rebalancing may take a long time.

Two minor versions of ECS have been released this year: 3.3, which supports SSDs as cache, and 3.4, which supports new hardware. My analysis above is based on last year's version, and I have little information on the new ones.

In the longer term, ECS plans to support features such as all-flash and hybrid cloud.

ECS is positioned as a leader in the relevant Gartner and IDC quadrants.

Gartner praises ECS's UI and container-based architecture but also points out some problems.

In Gartner's critical-capabilities scoring of the product, only manageability ranks as best in the industry.

Overall, I feel that ECS is rich in features, such as cross-site EC and strong data consistency. However, foreign vendors generally focus on storing warm and cold data and pay less attention to performance. For example, ECS has only just added SSD caching, and its small-file merging strategy is too simple. It also lacks replica-based data protection, whole-pool expansion, and QoS for rebuilds, among other features.

That concludes this analysis of the ECS object storage architecture. I hope it is helpful.
