How to learn distributed quickly 07/15 Update SLTechnology News&Howtos

How to learn distributed quickly

2025-07-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/03 Report--

This article mainly explains "how to learn distributed quickly". The content of the explanation is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn "how to learn distributed quickly".

Extension of Web application

Why are there distributed applications?

65 Brother: I don't know about this. In the past, all the web applications I wrote were thrown directly into Tomcat and can be accessed by starting Tomcat. This is certainly not a distributed application.

Well, let's start with the fact that we are most familiar with web background applications. In the past, our system visits were small and the business was not complicated. One server and one application could handle all business requests. Later, our company developed, the number of visits increased, and the business expanded. Although the boss still did not give us a raise, he always complained that our system was unstable and could not bear the concurrency. We need to upgrade.

If our server can add configuration indefinitely, then all performance problems are not a problem.

In order to improve the processing power of the system, the first expansion way we think of is to upgrade the system configuration. 8-core cpu is upgraded to 32 cores, 64 cores, 64G memory is upgraded to 128Gpot 256g, bandwidth is 10 gigabytes, 100000, this is called vertical expansion. But this expansion will not be sustainable in the end for the following reasons.

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

The processing capacity of the stand-alone system will eventually reach the bottleneck.

The marginal cost of stand-alone upgrade will become larger and larger.

Nothing can stop us from programming and hitting workers.

As the saying goes, the system can not hold up, add a server, if not, add two.

When vertical scaling reaches the technical bottleneck or the input-output ratio exceeds expectations, we can consider increasing the number of servers to improve concurrency, which is horizontal scaling.

System split

65 Brother: Oh, this is the distributed system? Is it that simple.

Oh, it's not that simple. In horizontal expansion, we have increased the number of servers, but how to make these servers provide stable and effective services as a whole is the key. Now that we have multiple servers, we need to consider how to deploy the system to different nodes.

65 Brother: this is not easy. I will deploy my SpringBoot project to multiple servers and add a nginx in front of it. Now our systems are all like this, stable and efficient perfect. I'll draw you a structure map. (keep your voice down. I knew this when I was at school.)

There are no good years, but someone is carrying a heavy load for you. There are actually two reasons for what you think is simple:

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

The system is not completely separated, and it is important that it is still sharing a database.

In this mature system, there are too many mature middleware serving us, such as nginx mentioned above.

There are also two ways to split the system, vertical split and horizontal split. Note that this does not deal with the same problem as the vertical and horizontal expansion mentioned above. (65 Brother: , I know that everything in the world is nothing but vertical and horizontal.

The vertical split of the system is to deploy multiple sets of the same system, all the nodes are not different, the roles and functions are the same, they share part of the functional requests, so that the processing capacity of the whole system increases.

From the point of view of processing web requests, each node of the vertical split processes a complete request, and each node bears a portion of the request volume.

From the point of view of data storage, each data node stores the same business data, and each node stores part of the data.

The horizontal split of the system is to split the system according to different modules or roles, and different modules deal with different things.

From the point of view of the web request, multiple interdependent systems are required to complete a request, and the requirements handled by each node are inconsistent.

From the point of view of data storage, each data node stores the data related to its own business module, and their data is different.

After the vertical split above, each node forms a cluster, while the horizontal split of each node is distributed. This is the difference between clustering and distribution. The cluster can not only improve the concurrent processing capacity mentioned above, but also ensure the high availability of the system. When some nodes fail, the whole system can still provide complete services. Distribution is the same, in addition to improving the concurrency ability, decoupling the system, making the system boundary clearer, and the system function more cohesive is also one of its advantages, so in the actual system, we often use both of these two methods at the same time. And the distributed system that we often talk about actually contains the concept of cluster.

Distributed target

Induction and deduction is the cornerstone of human reason, learning and thinking is to constantly sum up the past experience, so as to get the universal law, and then deduce the law to other things to guide the better practice process.

We explained the origin of distributed systems above, and now let's review and summarize this process. The introduction of distributed system must be based on realistic needs and goals.

65 Brother: what about the distributed goal?

Distribution is designed to meet the following goals:

Transparency: transparency, that is, users do not care about the distribution behind the system, whether the system is distributed or stand-alone, should be transparent to users, users only need to care about the ability of the system available. Transparency here includes the following aspects:

Access transparency: a fixed and unified access interface and mode, which does not change the access mode of the distributed system because of changes within the distributed system.

Location transparency: external visitors do not need to know the specific address of the distributed system, and changes in system nodes will not affect its function.

Concurrency transparency: several processes can use shared resources concurrently without interfering with each other.

Replication transparency: use multiple instances of resources to improve reliability and performance, while users and programmers do not need to know about replicas.

Fault transparency: the failure of some nodes in the distributed system does not affect the overall function of the system.

Mobile transparency: resources and customers can move within the system without being affected.

Performance transparency: when the load changes, the system can be reconfigured to improve performance.

Scalable transparency: systems and applications can be extended without changing the system structure and application algorithms.

Openness: openness, common protocols and ways of use.

Scalability: scalability, with the increase in the number of resources and user access, the system can still maintain its effectiveness, the system is called scalable. Distributed systems should be scalable in terms of system size and system management.

Performance: performance, distributed systems should use more outstanding performance than single applications.

Reliability: reliability, distributed systems should have better security, consistency and error concealment capabilities than monolithic systems.

Distributed challenge

The distributed challenge comes from uncertainty. Think about it, what are the advantages of distributed systems compared to single applications?

65 Brother: with more service nodes, there is network communication between services.

Yes, it seems that 65 students already know how to think and analyze the system. All the challenges of distributed systems stem from the uncertainty of both.

1. Node failure:

The greater the number of nodes, the higher the probability of failure. The distributed system needs to ensure that when a failure occurs, the system is still available, which requires the system to be able to perceive the service status of all nodes and transfer the computing and storage tasks of the node to other nodes in the event of a node failure.

two。 Unreliable network:

Nodes communicate with each other through the network, we all know that the network is unreliable. Possible network problems include: network segmentation, delay, packet loss, disorder. Compared with stand-alone procedure call, the biggest headache of network communication is the uncertainty of time-out and two-way traffic. When a timeout occurs, the network communication initiator cannot determine whether the current request has been successfully processed.

In unreliable networks and nodes, distributed systems still have to ensure their availability, stability and efficiency, which is the most basic requirement of a system. Therefore, the design and architecture of distributed systems are full of challenges.

divide and rule

Distributed system is to make full use of more resources for parallel computing and storage to improve the performance of the system, which is the principle of divide and conquer.

65 Brother: Oh, I see. Is the partition of MapReduce's map,Elasticsearch 's sharding,Kafka 's distributed divide-and-conquer principle?

Yes, 65 elder brother students can not only sum up, but also draw lessons from others. Yes, whether it is map,sharding or partition, or even request routing load balancing, computing or data are split and distributed to different nodes for computing and storage, thus improving the concurrency of the system.

Scores of different cluster types

Sharding

The same points, in different areas, and even different implementation of the system will usually have different views. Sharding is usually a way to distribute different data to different nodes in a data storage system, which is usually translated into data fragments in Chinese.

In MongoDB, for example, when MongoDB stores large amounts of data, one machine may not be enough to store data, or it may not be enough to provide acceptable read and write throughput. At this time, we can split the data on multiple machines, so that the database system can store and process more data.

For example, in Elasticsearch, each index has one or more shards, and the index data is allocated to each shard, which is equivalent to N cups for a bucket of water. Sharding helps scale out, and N shards are distributed as evenly as possible (rebalance) among different nodes.

Partition

The concept of partition can often be seen in Kafka. In kafka, topic is a logical concept. From the point of view of distributed queues, topic is a queue for users. In the concrete implementation of kafka, topic is composed of partition distributed on different nodes. Each partition is multiple partitions split according to the partition algorithm. In kafka, the same partition cannot be consumed by multiple consumer under the same group. So how much partition a topic has, in a sense, means how much concurrent processing power the topic has.

In Amazing's distributed database DynamoDB, a table is also partitioned into different partition in the underlying implementation.

Load balance

Load balancing is a key component of a highly available network infrastructure and is often used to distribute workloads across multiple servers to improve the performance and reliability of websites, applications, databases, or other services.

For example, nginx load balancing distributes http requests to different nodes of the web application through different load balancing strategies, so as to improve the concurrent processing ability of the application.

For example, the client load capacity of dubbo can route dubbo requests to specific producer providing nodes. Load balancing is a capability that a perfect RPC should have.

In the Spring Cloud system, the Robbin component can distribute the request to different nodes in the cluster through the load balancing problem of communication between the microservices of Spring Cloud.

Sub-strategy

Whether it is partition or fragmentation, or partition routing, there are actually some general partition algorithms. Many students may have seen the following concepts in different fields, such as the reverse proxy server nginx seen above, such as distributed message queuing kafka, such as RPC framework Dubbo, which sometimes makes many students feel confused.

In fact, no matter in any field, as long as you grasp its core functions, you can understand that they are thinking about how to divide the processing requests (that is, computing) evenly among different machines. how to distribute data to different nodes.

From a general point of view, there are two strategies, one can be repeatable, the other is unrepeatable.

Repeatable, this strategy allocates calculation and data according to a certain algorithm, and under the same conditions, the results obtained at any time point are the same, so it is reproducible for requests and data with the same conditions. the same request and data at different points in time are always on the same node. This strategy is generally used in situations where there is a data state.

Can not be repeated, this strategy uses a fully random way, even under the same conditions, different time points of the results are not consistent, so it is irreducible, if it is only for the purpose of restoration, if you record the allocated data through metadata, then you can accurately know the location of the data through metadata when you need to restore.

65 Brother: is it so magical? I want to see what strategies different systems have.

Load balancing of Dubbo

Dubbo is Ali's open source distributed service framework. It implements a variety of load balancing strategies.

Random LoadBalance

Random, you can set the random probability by weight. The probability of collision on a section is high, but the larger the amount of adjustment is, the more uniform the distribution is, and it is also more uniform after using the weight according to probability, which is beneficial to dynamically adjust the weight of the provider.

RoundRobin LoadBalance

Polling, setting the polling rate according to the weight after the convention. There is a problem of slow provider accumulating requests, for example, the second machine is slow, but it doesn't hang up, it gets stuck when the request is transferred to the second machine, and over time, all requests are stuck on the second machine.

LeastActive LoadBalance

The minimum number of active calls, the random number of the same active number, and the active number refers to the difference in count before and after the call. Causes slower providers to receive fewer requests, because the slower the provider, the greater the difference in count before and after invocation.

ConsistentHash LoadBalance

Consistent Hash, requests with the same parameters are always sent to the same provider. When a provider hangs up, the request originally sent to that provider, based on the virtual node and spread equally to other providers, will not cause drastic changes.

Partition allocation Strategy of Kafka

The implementation of the multiple partition allocation algorithm (PartitionAssignor) is provided in Kafka:

RangeAssignor

The principle of the RangeAssignor strategy is to divide the total number of consumers and the total number of partitions to obtain a span, and then distribute the partitions evenly according to the span to ensure that the partitions are distributed to all consumers as evenly as possible. For each Topic,RangeAssignor policy, all consumers in the consumption group who subscribe to this Topic are sorted according to the lexicographic order of their names, and then each consumer is divided into a fixed range of partitions. if it is not evenly distributed, then consumers with the top dictionary order will be assigned one more partition.

RoundRobinAssignor

The allocation strategy of RoundRobinAssignor is to sort all the Topic partitions of subscriptions in the consumption group and all consumers to allocate them as evenly as possible (RangeAssignor is sorted for the partitions of a single Topic).

StickyAssignor

Literally, Sticky is "sticky", which can be understood to mean that the allocation result is "sticky"-each allocation change makes the least change compared to the previous allocation (the previous result is sticky), mainly in order to achieve the following two goals:

The distribution of zones is as balanced as possible.

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

Try to keep the result of each redistribution consistent with that of the previous one.

65 Brother: wow, it seems that excellent systems are all the same.

Copy

Replicas solve the problem of high availability of distributed clusters. In the cluster system, each server node is unreliable, and each system has the risk of downtime. How to ensure the availability of the whole system in the case of a small number of nodes failure is one of the challenges of the distributed system. Copy is the solution to this kind of problem. Replicas can also improve the ability of concurrent processing, such as data can be separated from reading and writing on different nodes, and can be read in parallel.

In fact, there are many theories here, such as Master-Salve, Leader-Follower, Primary-Shard, Leader-Replica and so on.

Master-Slave Architecture of Mysql

At present, most mainstream relational databases provide master-slave hot backup function. By configuring the master-slave relationship of two (or more) databases, the data updates of one database server can be synchronized to another server. This can not only achieve the separation of read and write of the database, so as to improve the load pressure of the database, but also increase the high availability of data, and multiple data backups reduce the risk of data loss.

Copy Mechanism of Elasticsearch

There is the concept of master shard and replica shard in ES. The main purpose of replica sharding is to fail over. If the node holding the master shard dies, a replica shard will be promoted to the role of the primary shard so as to provide external query services.

CAP theory

In theoretical computer science, CAP Theorem (CAP theorem), also known as Brewer's theorem Theorem, points out that for a distributed computing system, it is impossible to satisfy both distributed system consistency, availability and partition fault tolerance (i.e. "C", "A" and "P" in CAP):

65 Brother: what is consistency, availability and partition fault tolerance?

Consistency (Consistency)

Consistency means that all clients see the same data at the same time, no matter which node they are connected to. For this to happen, whenever data is written to one node, it must immediately be forwarded or copied to all other nodes in the system before the write can be considered "successful."

Availability (Availability)

Any client request can get the response data, and there will be no response errors. In other words, usability is another promise to customers who access the system from the point of view of a distributed system: I will certainly return data to you, will not return errors to you, but do not guarantee that the data is up-to-date, emphasizing that there will be no errors.

Partition fault tolerance

Partition is a communication interruption in a distributed system, a lost or temporarily delayed connection between two nodes. Partition fault tolerance means that the cluster must continue to work despite communication failures between nodes in the system.

If these three properties are combined, the following three situations can be obtained:

CA: a completely strict arbitration protocol, such as 2PC (two-stage submission agreement, first-stage voting, second-stage submission)

CP: incomplete (majority) arbitration protocols, such as Paxos, Raft

AP: protocols that use conflict resolution, such as Dynamo, Gossip

The design of CA and CP systems follow the theory of strong consistency. The difference is that the CA system cannot tolerate node failure. The CP system can tolerate the failure of f of the 2f+1 nodes.

Base theory

CAP theory shows that for a distributed system, it can not satisfy Consistency (strong consistency), Availability (availability) and Partition tolerance (partition tolerance) at the same time, and can only satisfy two of them at most.

In a distributed environment, we will find that we have to choose the P (partition tolerance) element, because the network itself is not 100% reliable and may fail, so partitioning is an inevitable phenomenon. In other words, partition fault tolerance is one of the most basic requirements of distributed systems.

CAP theorem limits that we can not satisfy all three at the same time, but we can satisfy C, An and P as far as possible. This is BASE theorem.

BASE theory is the abbreviation of three phrases: Basically Available (basic availability), Soft State (soft state) and Eventually Consistent (ultimate consistency). Even if strong consistency (Strong consistency) can not be achieved, each application can adopt appropriate ways to make the system achieve final consistency (Eventual consistency) according to its own business characteristics.

Basic available (Basically Available)

Basic availability means that when a distributed system fails, partial availability is allowed to be lost, that is, to ensure the availability of the core.

When e-commerce is booming, in order to cope with the surge in traffic, some users may be directed to downgraded pages, and the service layer may only provide downgraded services, which is a sign of the loss of partial availability.

Soft state (Soft State)

What is the soft state? Relative to atomicity, data copies of multiple nodes are required to be consistent, which is a "hard state".

The soft state means that the data in the system is allowed to have an intermediate state, and it is considered that the state does not affect the overall availability of the system, that is, it allows the system to have data delay in the data copies of many different nodes.

Final consistency (Eventual Consistency)

Final consistency means that all copies of data in the system can finally reach a consistent state after a certain period of time.

Weak consistency is the opposite of strong consistency, and the final consistency is a special case of weak consistency.

BASE theory is aimed at large-scale, highly available and scalable distributed systems. Contrary to the characteristics of traditional ACID and different from the strong consistency model of ACID, BASE proposes to achieve availability by sacrificing strong consistency and allow data inconsistency over a period of time, but finally achieve a consistent state.

Distributed is the inevitable direction of system expansion, and the problems encountered by distributed systems are common. With a large number of excellent projects on the distributed road, predecessors have summed up a large number of rich theories. And distributed systems in different fields also emerge in endlessly, we should not only learn these good theoretical knowledge, but also look at the implementation of different distributed systems, summarize their commonness, and find that they have unique highlights in different fields. more importantly, we should use what we have learned in the practice of daily projects and sum up more law theories in practice.

Thank you for your reading. the above is the content of "how to learn distributed quickly". After the study of this article, I believe you have a deeper understanding of how to learn distributed quickly. The specific use situation also needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.