2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article introduces the design ideas behind Ceph: the application scenarios it targets, the technical characteristics it aims for, and the key technical innovation that makes them achievable. It should be a useful reference for interested readers.
3.1 Target application scenarios for Ceph
To understand the design ideas behind Ceph, first understand the application scenario that Sage (Sage Weil, Ceph's creator) designed it for. In other words: what is this thing for?
Ceph's original target scenario was a large-scale, distributed storage system. Here "large-scale" and "distributed" mean a system that holds at least petabytes (PB) of data and is built from thousands of storage nodes.
Today, with "big data" a household slogan, a PB-level target is far from an exciting design goal. Note, however, that the Ceph project originated in 2004. At that time commodity processors were mostly single-core, and a hard drive typically held only a few dozen GB. That is a very different world from today, where dual-socket servers with 6-core, 12-thread processors and single 3 TB hard disks are commonplace. The design goal should therefore be judged against the conditions of its time. And, as mentioned earlier, Ceph's design has no theoretical upper limit, so the PB level is not a capacity ceiling in practice.
In Sage's view, a storage system of this size cannot be treated as static. The author summarizes its dynamic characteristics as the following three "changes":
Changes in the size of the storage system: the final scale of such a system usually cannot be predicted on the first day of construction, and the very notion of a "final scale" may not exist at all. As the business grows, the system has to carry more and more data. The scale of the system therefore changes naturally, and it keeps growing.
Changes in the devices in the storage system: in a system built from thousands of nodes, node failures and replacements are bound to happen frequently. The system must be reliable enough that the business is not affected by such frequent hardware and low-level software problems, and it should be as intelligent as possible so that the related maintenance work stays cheap.
Changes in the data in the storage system: in a large-scale storage system, typically one serving Internet applications, the stored data is also likely to change very frequently. New data is written constantly, and existing data is updated, moved, and even deleted. The design must account for this as well.
These three "changes" are the defining features of Ceph's target application scenarios, and Ceph's main technical characteristics were proposed in response to them.
3.2 Expected technical characteristics for the target application scenario
Given the application scenarios above, Ceph set out with several expected technical characteristics:
High reliability. "High reliability" refers first to the data stored in the system: the system should do everything possible to ensure that stored data is not lost. It also covers the write path: when a user writes data into the Ceph storage system, that data must not be lost because of an unexpected event during the write.
High automation. This covers automatic data replication, automatic re-balancing, automatic failure detection, and automatic failure recovery. Together, these properties not only support the high reliability of the system but also keep operations and maintenance manageable as the system grows.
High scalability. "Scalability" is meant broadly here. It includes the scalability of system size and storage capacity, the linear growth of aggregate data access bandwidth as nodes are added, and functional extensibility: a rich, powerful underlying API on which multiple functions and multiple kinds of applications can be built.
3.3 Design ideas for the expected technical characteristics
Given the expected characteristics described in Section 3.2, Sage's design ideas for Ceph can be summarized in two points:
Make full use of the computing power of the storage devices themselves. Using a computing device (an ordinary server is the simplest example) as a node of a storage system was not a new idea even at the time. In Sage's view, however, existing systems used such nodes merely as functionally simple storage nodes. If the computing power on each node is put to work, the expected characteristics listed above become achievable. This became the core idea of Ceph's system design.
Remove every central point. Once a central point appears in a system, it introduces a single point of failure, and it inevitably becomes a scale and performance bottleneck as the system grows. If the central point also sits on the critical path of data access, it increases access latency as well. None of these problems should exist in the system Sage envisaged. In most engineering practice, single points of failure and performance bottlenecks are mitigated by adding backups for the central point; Ceph instead adopted a new approach that solves the problem more thoroughly.
3.4 Key technological innovation supporting the design ideas
However novel and elegant a design idea is, it can only be realized with solid technology behind it, and this is where Ceph shines most.
Ceph's core technical innovation is the phrase summarized earlier: "No need to look up the table, just do the math." In general, a large-scale distributed storage system must solve two basic problems:
The first is "where should I write the data?" When a user submits data to be written, the storage system must quickly decide on a storage location and allocate space for it. The speed of this decision affects write latency. More importantly, the quality of the decision affects how evenly data is distributed, which in turn affects storage device lifetime, the reliability of the stored data, data access speed, and more.
The second is "where did I write the data before?" Locating data efficiently and accurately is likewise one of the basic capabilities of a storage system.
To solve these two problems, traditional distributed storage systems commonly introduce a dedicated server node that stores the data structure holding the mapping between data and storage locations. When a user writes or reads data, the client first contacts this server to allocate or look up the location, and only then connects to the corresponding storage node for the actual operation. The traditional solution is therefore prone to a single point of failure and a performance bottleneck, and it also lengthens the latency of every operation.
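The traditional lookup-table approach described above can be sketched in a few lines. This is only an illustrative toy, not code from any real storage system: the `MetadataServer` class, its round-robin allocation policy, and the node names are all assumptions made for the example.

```python
class MetadataServer:
    """Hypothetical central server holding the object-to-node mapping table."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.table = {}      # object ID -> storage node
        self.requests = 0    # every client operation lands here first

    def allocate(self, object_id):
        """Answer "where should I write the data?" (round-robin is an assumption)."""
        self.requests += 1
        if object_id not in self.table:
            self.table[object_id] = self.nodes[len(self.table) % len(self.nodes)]
        return self.table[object_id]

    def lookup(self, object_id):
        """Answer "where did I write the data before?" by consulting the table."""
        self.requests += 1
        return self.table[object_id]


server = MetadataServer(["node-0", "node-1", "node-2"])
for i in range(6):
    server.allocate(f"obj-{i}")   # write path: ask the server first
for i in range(6):
    server.lookup(f"obj-{i}")     # read path: ask the server again
print(server.requests)            # 12: one extra round trip per operation
```

Every write and every read passes through the single `server` object, which is why it is at once the single point of failure, the bottleneck, and the source of the extra latency.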
To solve this problem, Ceph abandoned lookup-table-based data addressing entirely and replaced it with a calculation-based method. In short, any client of a Ceph storage system can determine where a piece of data is stored from its ID alone, using only a small amount of locally held metadata that is updated infrequently, plus a simple calculation. Compared with the traditional solution, this approach removes the single point of failure, the performance bottleneck, and the extra lookup round trip. Almost all of Ceph's attractive features are built on this data addressing method.
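To make "just do the math" concrete, here is a minimal sketch of calculation-based placement. It deliberately uses rendezvous (highest-random-weight) hashing as a simple stand-in and is not Ceph's actual placement algorithm; the function names, the `cluster_map` list, and the node names are assumptions made for the example.

```python
import hashlib
from typing import List, Sequence


def _score(object_id: str, node: str) -> int:
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_id}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def locate(object_id: str, cluster_map: Sequence[str], replicas: int = 2) -> List[str]:
    """Compute the storage nodes for an object from its ID and the node list alone."""
    ranked = sorted(cluster_map, key=lambda node: _score(object_id, node), reverse=True)
    return ranked[:replicas]


# The only shared state is a small list of nodes (the "local metadata").
cluster_map = [f"node-{i}" for i in range(8)]

# A writer and a reader compute the location independently, with no central lookup.
writer_view = locate("photo-42", cluster_map)
reader_view = locate("photo-42", list(reversed(cluster_map)))
print(writer_view == reader_view)  # True

# When the cluster grows, an object's first node either stays or moves to the new node.
grown_map = cluster_map + ["node-8"]
moved = sum(
    locate(f"obj-{i}", cluster_map)[0] != locate(f"obj-{i}", grown_map)[0]
    for i in range(1000)
)
print(f"{moved} of 1000 objects changed their first node")
```

Because placement is a pure function of the object ID and the node list, the same function answers both "where should I write?" and "where did I write before?", and no server sits on the data path. With this particular hashing scheme, adding a node relocates only the objects that the new node now wins, which fits the "change in scale" scenario described in Section 3.1.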
Thank you for reading. I hope this article on the design ideas of Ceph has been helpful.
© 2024 shulou.com SLNews company. All rights reserved.