2025-04-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article introduces the key concepts of big data systems. Many people run into these questions when working on real cases, so let the editor walk you through them. I hope you read carefully and take something away!
1: a simple data model
Unlike distributed databases, most NoSQL systems use a simpler data model. Whether it is MongoDB, HBase, Redis, Memcached, or another NoSQL store, all adopt a [simple data model]. How simple? Generally speaking, each record has a unique Key, the result of a query is the Value, and the system supports queries and atomicity only at the level of a single record. Few NoSQL systems support foreign keys or maintain relationships across records and across segments. This convention of fetching a single record per operation greatly improves the scalability of the system and removes the overhead of distributed transactions.
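To make the "one Key, one Value" idea concrete, here is a minimal sketch in Python. The class name KeyValueStore and the sample key "user:42" are made up for illustration; this is not how MongoDB, HBase, or Redis are implemented, only the shape of the model they expose.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical toy store: one unique key maps to one opaque value."""

    def __init__(self) -> None:
        self._records: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # A write touches exactly one record, so no cross-record
        # (distributed) transaction is ever needed.
        self._records[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # A read returns the single value stored under the key, or None.
        return self._records.get(key)


store = KeyValueStore()
store.put("user:42", b'{"name": "Ann"}')
profile = store.get("user:42")   # the stored bytes
missing = store.get("user:43")   # None: no such key
```

Because every operation names exactly one key, records can be spread over many machines without any machine needing to coordinate with another to answer a request.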
2: separation of metadata and application data
Generally speaking, a NoSQL system needs to maintain two kinds of data: metadata and application data. Metadata is used for system management and describes the state of the whole cluster, such as the number of nodes in the NoSQL cluster and the mapping of data to nodes. Application data is the data your business system actually uses, and it varies from case to case.
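The separation can be sketched as two distinct structures. Everything below is an assumption made for illustration: the names ClusterMetadata and Cluster, the node names, and the choice of a hash-modulo mapping (real systems often use more elaborate schemes).

```python
import hashlib
from typing import Dict, List, Optional


class ClusterMetadata:
    """Metadata: describes the cluster itself (node list, key-to-node mapping)."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)

    def node_for(self, key: str) -> str:
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]


class Cluster:
    """Keeps metadata and application data in separate structures."""

    def __init__(self, nodes: List[str]) -> None:
        self.metadata = ClusterMetadata(nodes)
        # Application data: the business records, stored per node.
        self.app_data: Dict[str, Dict[str, str]] = {n: {} for n in nodes}

    def put(self, key: str, value: str) -> None:
        self.app_data[self.metadata.node_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.app_data[self.metadata.node_for(key)].get(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("order:1", "paid")
status = cluster.get("order:1")
```

The metadata is small and changes rarely, while the application data is large and changes constantly, which is why the two are usually managed separately.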
3: weak consistency
Weak consistency? Why does consistency need to be strong or weak? What kinds of consistency are there, and how do NoSQL systems achieve them?
4: high-level scalability and low-end hardware clusters
Running on low-end hardware effectively reduces cost, and adding nodes expands the overall capacity of the cluster.
Data consistency theory
1: CAP theory
1.1: CAP concept
CAP is composed of the following three properties:
1.1.1: consistency
Put simply, consistency means the system is still in a consistent state after performing an operation. For example, after you update a value, all users should read the latest value; such a system is considered strongly consistent.
1.1.2: availability
Availability is easier to understand. What does available mean? If I ask you, the NoSQL system, for a value, you cannot take three minutes and still not give it to me. That is called unavailable. There are two basic criteria for judging whether a system is available:
Within "a certain time", "return a certain result".
"Within a certain time" is clearly necessary; otherwise any payment, transaction, or data transfer would time out.
"Return a result" is even more important. You cannot tell the front end that the back end did act within the time limit, but that instead of the data it returned "exception...".
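The two criteria can be checked mechanically. The sketch below is only an illustration: read_with_deadline and the three fetch functions are hypothetical names, and the deadlines are arbitrary. A request counts as served only if it returns real data before the deadline; a timeout or an exception both count as unavailable.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def read_with_deadline(fetch, timeout_s):
    """Return ("ok", value), ("timeout", None) or ("error", None)."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        # Criterion 1: the answer must arrive within "a certain time".
        return ("ok", future.result(timeout=timeout_s))
    except FutureTimeout:
        return ("timeout", None)
    except Exception:
        # Criterion 2: an exception is not "a certain result".
        return ("error", None)
    finally:
        # Do not block the caller waiting for a slow worker.
        pool.shutdown(wait=False)


def fast_fetch():
    return "newString"


def slow_fetch():
    time.sleep(0.3)  # simulates a node that answers too late
    return "newString"


def broken_fetch():
    raise RuntimeError("exception...")


fast_outcome = read_with_deadline(fast_fetch, timeout_s=1.0)
slow_outcome = read_with_deadline(slow_fetch, timeout_s=0.05)
broken_outcome = read_with_deadline(broken_fetch, timeout_s=1.0)
```

Only the first call satisfies both criteria; the other two are what the text calls unavailable.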
1.1.3: partition tolerance
You can think of partitioning as a big cake cut into multiple pieces. Reading data is like eating the cake: the data is really a number of independent partitions that are connected to each other but do not depend on each other, and that can join and leave dynamically. Partition tolerance means the system keeps working even when communication between partitions is interrupted.
1.2: why all three are hard to satisfy
Each of the three CAP properties is an important system requirement to consider when designing and deploying in a distributed environment. Generally speaking, the three cannot all be satisfied at the same time. Take a small example to illustrate; look at the picture:
The system has only two clients, A and B, and two machines, G1 and G2. V0 is the data, and each machine holds a copy of V0.
1: first, A updates the value of V0, for example to the string newString
2: since V0 has changed, the copy of V0 on G2 also needs to be updated
3: the update proper: G1 begins to send messages to G2
4: the value in G2 is changed, and B reads the latest value, newString. The figure shows it as v1, which stands for value1
OK, everything goes well, as shown in the following figure.
But reality is always cruel: in the middle of the update, step 3 is interrupted. The data is not transmitted correctly, the two copies are in an inconsistent state, and the data B reads is not the latest version.
The CAP model tells us that if partition tolerance is to be guaranteed, consistency may not be. Some people will say the problem is easy to solve: why not make the propagation of the update a synchronous operation?
In fact, without synchronization the outcome of the G1 -> G2 update is unknowable; it may well be delayed or interrupted. In that case "A" and "P" can be guaranteed, but "C" cannot.
With synchronization, in large deployments where the number of nodes is in the hundreds or thousands, the overhead is magnified until the service is nominally available but has no practical value. That is to say, the availability "A" is no longer guaranteed.
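The two outcomes above can be reproduced in a toy simulation. The names ReplicatedPair, write_async, write_sync, and link_up are invented for this sketch, and the "network" is just a boolean flag; it only illustrates the choice, it is not a replication protocol.

```python
class ReplicatedPair:
    """Two copies of one value, on G1 and G2; link_up models the network."""

    def __init__(self, initial):
        self.g1 = initial
        self.g2 = initial
        self.link_up = True

    def write_async(self, value):
        # Keeps A and P: always acknowledge after updating G1;
        # replication to G2 is best effort and may silently not happen.
        self.g1 = value
        if self.link_up:
            self.g2 = value
        return True

    def write_sync(self, value):
        # Keeps C and P: refuse the write unless both copies can be updated.
        if not self.link_up:
            return False
        self.g1 = value
        self.g2 = value
        return True


# Step 3 is interrupted: the G1 -> G2 message never arrives.
pair = ReplicatedPair("V0")
pair.link_up = False
async_accepted = pair.write_async("newString")   # A's write succeeds
stale_read = pair.g2                             # B still reads the old value

pair2 = ReplicatedPair("V0")
pair2.link_up = False
sync_accepted = pair2.write_sync("newString")    # A's write is rejected
```

With the asynchronous write, client A is served but client B reads stale data. With the synchronous write, the copies never diverge but client A cannot be served while the link is down.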
If three can't be satisfied at the same time, let's see if we can simply satisfy two:
1.3: trade-offs of CAP
1.3.1: abandon P (sacrifice partition tolerance)
If you want to avoid having to tolerate partitions, the easiest way is of course to put all the data on one machine. That does not guarantee the system is 100% available, but at least the effects of partitioning cannot occur.
1.3.2: abandon A (sacrifice availability)
Giving up availability means that when a partition fault occurs, we ensure partition tolerance and consistency first, even if the service cannot be provided and requests are bound to time out.
1.3.3: abandon C (sacrifice consistency)
Abandoning consistency here does not mean we stop guaranteeing consistency between replicas. It means temporarily allowing inconsistency while ensuring eventual consistency before the data is used by the service.
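One common way to reach eventual consistency is to attach a version to each copy and periodically reconcile. The sketch below assumes a simple "highest version wins" rule; the names Versioned and reconcile are hypothetical, and real systems use more careful conflict resolution.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Versioned:
    value: str
    version: int


def reconcile(replicas: Dict[str, Versioned]) -> None:
    """Copy the highest-version record to every replica (assumed rule)."""
    newest = max(replicas.values(), key=lambda record: record.version)
    for name in replicas:
        replicas[name] = Versioned(newest.value, newest.version)


# G1 accepted the update, G2 missed it: temporarily inconsistent.
replicas = {
    "G1": Versioned("newString", 1),
    "G2": Versioned("V0", 0),
}
stale_before = replicas["G2"].value

reconcile(replicas)                 # background repair pass
fresh_after = replicas["G2"].value  # the copies have converged
```

Between the write and the repair pass a reader of G2 sees the old value; after it, every replica returns the same one.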