Today I will talk about how to understand Dynamo's implementation techniques and its decentralization, which many people find hard to grasp. The following summary is meant to make it easier to follow, and I hope you get something out of it.
Here I'd like to focus on its decentralization.
Central node
Most distributed storage systems we see have a central (master) node: for example, Google File System (GFS) has a central Master and Chunk Server data nodes, and HDFS has a central NameNode and DataNode data nodes. These two examples are used below to illustrate the problems a central node brings and how they are handled.
The central node usually holds the placement information of the storage units and the meta-information of the stored content. "Consistency" is the core concern of a distributed system, and introducing a central node brings great benefits when dealing with consistency, but it also easily causes problems:
Single point of failure: the solution mainly relies on hot backup. GFS, for example, relies on a Shadow Master. The situation of HDFS is more complicated: before Hadoop 2.0 it relied on the Secondary NameNode, which is not real HA (High Availability); it only merges the edits log and fsimage periodically to shorten the cluster's start-up time. So when the NameNode fails, it can neither guarantee immediate service nor guarantee data integrity. Nowadays there are several practices for ensuring the HA of the NameNode, including (1) a shared image and (2) data replication; this article gives a systematic introduction.
(Figure omitted here; source: "HDFS HA: the Historical Evolution of Highly Reliable Distributed Storage System Solutions")
Scalability: we can approach this problem along the following lines:
The central node has two basic responsibilities: maintaining the file system metadata (it needs to know which data is stored where on each data node) and scheduling data requests. These two can be separated.
Change the single Master into multiple Masters, which can synchronize data among themselves in various ways. The advantage is that horizontal scaling of the Master becomes easy, but the problem is consistency: if different Masters want to manipulate the same piece of data on the same data node, a special mechanism is needed to handle conflicts.
Things get troublesome when there is a huge amount of file meta-information, for example a large number of small files on HDFS, which are stored inefficiently (this is an example of inappropriate use of HDFS, which I mentioned in this article) and make the NameNode consume a lot of memory. Either avoid using it this way (GFS is better suited to storing large files), or change the storage architecture. A common approach in software systems is to introduce a new layer, for example a regional autonomy layer between the NameNode and the DataNodes, in which each node independently manages part of the DataNodes while remaining subordinate to the NameNode.
Interestingly, the whole Internet can be regarded as one huge distributed system. Practice shows that it is indeed decentralized, but not "decentralized" in every dimension: the root domain name servers, for example, are central nodes. So it would be unwise to crudely remove the central node just for the sake of being distributed. Of course, Dynamo tried, and below are some of the problems caused by removing the central node, together with their solutions.
Decentralization of Dynamo
In Dynamo's 2007 paper mentioned above, it is bluntly emphasized that decentralization is an important principle of Dynamo design:
Decentralization: An extension of symmetry, the design should favor decentralized peer-to-peer techniques over centralized control. In the past, centralized control has resulted in outages and the goal is to avoid it as much as possible. This leads to a simpler, more scalable, and more available system.
The designers of Dynamo were aware of the problems caused by centralized systems, including service outages, so they avoided centralization as much as possible. The other design principles include:
Incremental scalability: the system can be expanded one node at a time, minimizing the impact on the system.
Symmetry: all nodes are peers with the same responsibilities.
Heterogeneity: the system can run on hardware of different types and capabilities, with work distributed in proportion to each machine's capability.
The paper lists the problems encountered and the techniques used to solve them; these are the core of Dynamo's design, and most of them relate to decentralization. Each is described below.
Partitioning
Consistent hashing (Consistent Hashing) is used to solve the problem of node growth and horizontal scaling; the benefit matches the incremental scalability design principle. It is not a new topic in itself, and there is plenty of material introducing it online, so I won't repeat it here. Two points are worth noting in Dynamo's implementation (a small sketch follows the list below):
Each physical device is mapped to a different number of virtual nodes according to its capability.
Each piece of data is mapped to multiple nodes on the hash ring, forming replicas to ensure availability.
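As a rough illustration, here is a minimal consistent-hash ring in Python with virtual nodes and an N-node preference list. The class and method names (ConsistentHashRing, preference_list, the weight parameter) are made up for this sketch, not Dynamo's actual code, and MD5 is used only as a stand-in hash.

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes and an N-node preference list."""

    def __init__(self, replicas=3):
        self.replicas = replicas   # how many distinct physical nodes should hold each key
        self.ring = []             # sorted list of (hash_position, physical_node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node, weight=1):
        # More capable machines are given more virtual nodes (a larger weight).
        for i in range(weight * 100):
            self.ring.append((self._hash(f"{node}#vn{i}"), node))
        self.ring.sort()

    def preference_list(self, key):
        # Walk clockwise from the key's position, collecting distinct physical nodes.
        if not self.ring:
            return []
        start = bisect_right(self.ring, (self._hash(key),))
        nodes, seen = [], set()
        for offset in range(len(self.ring)):
            _, node = self.ring[(start + offset) % len(self.ring)]
            if node not in seen:
                seen.add(node)
                nodes.append(node)
            if len(nodes) == self.replicas:
                break
        return nodes

ring = ConsistentHashRing(replicas=3)
ring.add_node("A", weight=1)
ring.add_node("B", weight=2)   # twice the capacity, so twice the virtual nodes
ring.add_node("C", weight=1)
print(ring.preference_list("user:42"))   # e.g. ['B', 'C', 'A']
```

The list returned by preference_list plays the role of the replica set for a key: the first node is the coordinator and the rest hold copies.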
High availability for writes
Vector clocks (Vector Clock) are used to deal with the consistency problem. A vector clock is simply a list of (node, counter) pairs, as in the scenario from the paper's figure:
A write of D1 is handled by node Sx, producing the vector clock [Sx,1]; Sx handles another write, so its counter increases by 1, giving [Sx,2]. Based on that version, two more writes happen, one handled by Sy and one by Sz, so there are now two conflicting versions, ([Sx,2],[Sy,1]) and ([Sx,2],[Sz,1]). They are later reconciled into D5, with the coordinating node's counter incremented again. There are two ways to perform this reconciliation (a sketch follows the list below):
Last write wins, which relies on node clocks; but clocks can never be perfectly synchronized.
The client decides.
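To make the version comparison concrete, here is a small vector-clock sketch in Python reproducing the D1..D5 scenario above. The helper names (vc_increment, vc_descends, vc_merge) are invented for illustration.

```python
def vc_increment(clock, node):
    """Return a copy of the vector clock with node's counter incremented."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_descends(a, b):
    """True if clock a is equal to or a descendant of clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def vc_conflict(a, b):
    """Two versions conflict when neither clock descends from the other."""
    return not vc_descends(a, b) and not vc_descends(b, a)

def vc_merge(a, b):
    """Pairwise maximum of two clocks, used when reconciling conflicting versions."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

# Reproducing the D1..D5 scenario above (node names follow the paper's figure):
d1 = vc_increment({}, "Sx")                  # {'Sx': 1}
d2 = vc_increment(d1, "Sx")                  # {'Sx': 2}
d3 = vc_increment(d2, "Sy")                  # {'Sx': 2, 'Sy': 1}
d4 = vc_increment(d2, "Sz")                  # {'Sx': 2, 'Sz': 1}
print(vc_conflict(d3, d4))                   # True: a client must reconcile
d5 = vc_increment(vc_merge(d3, d4), "Sx")    # {'Sx': 3, 'Sy': 1, 'Sz': 1}
print(vc_descends(d5, d3), vc_descends(d5, d4))   # True True
```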
Handling temporary failures
Sloppy Quorum: a "loose" quorum. This is the well-known NWR mechanism, in which:
N is the number of replicas kept for each piece of data
W is the number of replicas that must acknowledge a write synchronously (the remaining N-W writes complete asynchronously)
R is the number of replicas that must be read synchronously (each read determines the valid copy by comparing the vector clocks / version numbers mentioned above)
When W + R > N, strong consistency can be guaranteed. A classic example: with N=2, W=2, R=1, both copies are written synchronously, so reading either copy returns valid data.
By tuning the values of N, W, and R, you can trade off consistency against availability (in CAP theory, P cannot be sacrificed, so the trade-off is between C and A); because the W writes and R reads are synchronous, the higher W or R is, the worse the availability tends to be.
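Below is a rough sketch of how a coordinator might apply the NWR rule. The Replica class and its put/get methods are trivial in-memory stand-ins invented for this sketch, not Dynamo's real API, and vc_descends is the same helper as in the vector-clock sketch above.

```python
def vc_descends(a, b):                        # same helper as in the vector-clock sketch
    return all(a.get(n, 0) >= c for n, c in b.items())

class Replica:
    """Trivial in-memory stand-in for a storage node."""
    def __init__(self):
        self.store = {}
    def put(self, key, value, clock):
        self.store[key] = (value, clock)
        return True
    def get(self, key):
        return self.store.get(key, (None, {}))

# N, W, R are chosen by the application; W + R > N makes read and write quorums overlap.
N, W, R = 3, 2, 2

def quorum_write(replicas, key, value, clock):
    """Send the write to the N replicas and report success once W acknowledge
    (in a real system the remaining N-W writes would complete asynchronously)."""
    acks = 0
    for replica in replicas[:N]:
        if replica.put(key, value, clock):
            acks += 1
        if acks >= W:
            return True
    return False

def quorum_read(replicas, key):
    """Read until R replicas have answered, then keep only versions that no other
    returned version strictly descends from; >1 entry means the client reconciles."""
    answers = []
    for replica in replicas[:N]:
        answers.append(replica.get(key))      # each answer is a (value, vector_clock) pair
        if len(answers) >= R:
            break

    def dominated(a):
        return any(vc_descends(b[1], a[1]) and not vc_descends(a[1], b[1]) for b in answers)

    return [a for a in answers if not dominated(a)]

replicas = [Replica() for _ in range(N)]
quorum_write(replicas, "k", "v1", {"Sx": 1})
print(quorum_read(replicas, "k"))             # [('v1', {'Sx': 1})]
```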
Hinted Handoff: if node A is temporarily unavailable during a write, its copy is automatically handed off to another node so that the total number of replicas does not drop. The transferred data is marked with a hint, and when node A recovers, the data is transferred back to A.
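A minimal sketch of the idea, with a toy StorageNode class and function names invented here rather than taken from Dynamo:

```python
class StorageNode:
    """Toy node: a key-value store plus a list of hinted entries held for other nodes."""
    def __init__(self, name, alive=True):
        self.name, self.alive, self.store, self.hints = name, alive, {}, []

def write_with_hinted_handoff(preference_list, standby, key, value):
    """Write to each preferred replica; if one is down, hand its copy to a standby
    node together with a hint naming the intended owner."""
    for node in preference_list:
        if node.alive:
            node.store[key] = value
        else:
            standby.store[key] = value
            standby.hints.append((key, value, node))   # remember who it really belongs to

def return_hinted_data(standby):
    """When an intended owner recovers, hand its hinted copies back and drop them locally."""
    remaining = []
    for key, value, owner in standby.hints:
        if owner.alive:
            owner.store[key] = value
        else:
            remaining.append((key, value, owner))
    standby.hints = remaining

a, b, c, d = (StorageNode(n) for n in "ABCD")
a.alive = False                                   # A is temporarily down
write_with_hinted_handoff([a, b, c], d, "k", "v") # D holds A's copy with a hint
a.alive = True
return_hinted_data(d)
print("k" in a.store)                             # True: the copy went back to A
```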
Recovering from permanent failures
Anti-entropy using Merkle trees (anti-entropy). A Merkle tree is a data structure in which each non-leaf node is the hash of its children, providing layered hashes over the data:
The anti-entropy protocol is used to synchronize replicas, and the main advantage of using Merkle trees is that each branch can be checked independently, without downloading the entire tree or the entire dataset.
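Here is a toy Merkle-tree comparison in Python, assuming both replicas build their trees over the same key ranges; only branches whose hashes differ are descended, which is exactly the bandwidth saving described above. The function names are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_merkle(leaves):
    """Build the tree bottom-up; tree[0] is the leaf level, tree[-1][0] is the root hash."""
    level = [h(leaf) for leaf in leaves]
    tree = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]          # pad odd levels so every node has two children
        tree.append(level)
        if len(level) == 1:
            break
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return tree

def diverging_leaves(tree_a, tree_b, level=None, index=0):
    """Walk two same-shaped trees top-down, descending only into branches whose hashes
    differ, and return the leaf indexes (key ranges) that need to be synchronized."""
    if level is None:
        level = len(tree_a) - 1                  # start at the root
    if tree_a[level][index] == tree_b[level][index]:
        return []                                # identical subtree: nothing to transfer
    if level == 0:
        return [index]
    return (diverging_leaves(tree_a, tree_b, level - 1, 2 * index) +
            diverging_leaves(tree_a, tree_b, level - 1, 2 * index + 1))

replica1 = build_merkle([b"k1:v1", b"k2:v2", b"k3:v3", b"k4:v4"])
replica2 = build_merkle([b"k1:v1", b"k2:STALE", b"k3:v3", b"k4:v4"])
print(diverging_leaves(replica1, replica2))      # [1]: only that key range is exchanged
```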
Membership and failure detection
A Gossip-based membership protocol (membership protocol) and failure detection are used. The Gossip protocol is itself designed for decentralization: although there is no guarantee that all nodes have the same view at any given moment, it guarantees that their views eventually converge. The membership protocol is used to add or remove nodes from the hash ring.
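The following toy gossip round illustrates the idea: each member repeatedly merges its membership view with one random peer, and all views converge with high probability after a few rounds. The Member class and the (generation, status) view format are invented for this sketch, not Dynamo's actual protocol messages.

```python
import random

class Member:
    """Toy cluster member; its view maps node_id -> (generation, status)."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.view = {node_id: (1, "up")}   # initially each node only knows about itself

def gossip_round(members):
    """One round: every member exchanges views with one random peer, and for each
    node_id the entry with the higher generation number wins."""
    for member in members:
        peer = random.choice([m for m in members if m is not member])
        merged = dict(member.view)
        for node_id, (gen, status) in peer.view.items():
            if node_id not in merged or gen > merged[node_id][0]:
                merged[node_id] = (gen, status)
        member.view = dict(merged)
        peer.view = dict(merged)

members = [Member(i) for i in range(5)]
for _ in range(10):
    gossip_round(members)
# With high probability every member now knows about all five nodes.
print(all(len(m.view) == 5 for m in members))
```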
Complaints about Dynamo
Dynamo's decentralization has both merits and drawbacks; after all, it introduces the pile of complex mechanisms described above, and its handling of data consistency in particular has drawn a lot of controversy. With a Master node you give up decentralization, but consistency becomes much easier and the system is simpler; and even if you insist on decentralization, electing a "Master" with a protocol like Paxos can also guarantee consistency fairly concisely. Dynamo's approach of letting users resolve conflicts (when sometimes even users don't know which version to keep) is genuinely awkward, and resolving conflicts by absolute time is inherently flawed, because clocks can never be perfectly synchronized.
There was once a widely circulated online complaint, "Dynamo: A flawed architecture - Part 1". Tim Yang of Sina wrote an article briefly translating it, so I won't repeat that here. In general, the complaints include:
In terms of consistency, Dynamo gives no guarantee of avoiding dirty reads.
In the Quorum mechanism, W + R > N alone does not guarantee strong consistency when nodes become unavailable.
In cross-IDC deployments, the Hinted Handoff mechanism performs poorly because of the long-distance transfer overhead.
In terms of disaster recovery, when an entire IDC goes down, nobody can calculate how much data has been lost.
The paper contradicts itself in places: one example is the description of node symmetry (peers being equal), another is the description of eventual consistency.
Dynamo misleads users into thinking a choice between C and A of CAP is always necessary; in fact, a centralized system within a single data center can achieve both C and A.
Dynamo claims decentralization but does not fully achieve it; for example, when a switch failure partitions the network, the service becomes unavailable.
The title says Part 1, but unfortunately Part 2 never appeared. The post caused a lot of controversy, and the author later wrote "Dynamo-Part I: a followup and re-rebuttals" in response, concluding with a summary of his views on Dynamo:
Try to avoid dirty reads.
Uncontrolled dirty reads are never acceptable, even in a disaster; even losing data is preferable. In most cases operators would rather shut down some or all of the service than respond to users with lost or corrupted data.
Network partitions within a single data center can be avoided, so considering P (partition tolerance) inside one data center is unreasonable.
Centralization does not imply low availability; highly available centralized services are achievable, although scalability may be a problem.
The symmetry in the design does not adapt well to the asymmetry of hardware and network capacity.
Consistency, high availability, and scalability can be achieved at the same time within a single data center (that is, when P is given up); BigTable+GFS, HBase+HDFS, and even Oracle RAC are good examples.
Dynamo's reads and writes can produce dirty reads even within a single data center.
Nobody knows the time bound within which dirty reads stop occurring.
Across data centers, there is no way to track how much data is still waiting to be replicated, and therefore no way to know how much data has been lost when disaster recovery is needed.