
How to master distributed systems

2025-07-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains how to master distributed systems. The explanation is kept simple and clear so that it is easy to follow. Let's work through it step by step.

1. What is a distributed system?

The academic definition of a distributed system that you find online is, put simply, a set of computers that work together so that users perceive them as one unified system.

However, because this definition is so terse, many beginners end up with a muddled idea of what a distributed system is.

What do I mean? Let me ask you: when we use keepalived to build a high-availability cluster, are we building a distributed system? When one machine cannot handle the concurrency and we put a bunch of machines behind a load balancer, are we building a distributed system?

If you silently answered yes, or you are not sure, then your concept of a distributed system is muddled.

Here we need to draw a boundary around distributed systems: not every pile of machines stacked together is a distributed system. For the two questions above, the correct answer is that neither the high-availability cluster built with keepalived nor a bunch of application instances load-balanced behind Nginx or LVS is a distributed system. They are just clusters.

Similarly, database setups such as MySQL master-slave and dual-master are not distributed systems either, because these clusters lack the core of a distributed system:

Collaboration between the servers on which the application runs

To clarify clustering and distribution, let me give you another easy-to-understand example:

Suppose I start a software company one day, and I am the only programmer in the company. I do all the front-end, back-end and testing work, and I can finish a project in a month.

Later, there are so many projects that I cannot keep up. To make more money, what should I do? I can think of two ways.

Hire another full-stack engineer who is as strong as I am. Each of us works on a project alone, so together we finish two projects a month. The two of us form a cluster.

Hire a front-end engineer and a tester to work with me, so that front-end, back-end, and testing are each done by a different person. By working together, we can finish a project in half a month. At this point, our relationship is distributed.

You can see from the above example:

Multiple servers in a cluster all do the same thing, which does not shorten the time it takes to process any one task.

A distributed system, by contrast, splits a task apart and has multiple servers handle the parts separately, which can shorten the time.
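The numbers in this example can be written out as a short sketch. It only restates the story above; the `Setup` class and its field names are made up for illustration, and real teams (or servers) rarely split work this cleanly.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    name: str
    months_per_project: float  # latency: time to finish ONE project
    parallel_projects: int     # how many projects are in progress at once

    def projects_per_month(self) -> float:
        # throughput = projects in progress / time each one takes
        return self.parallel_projects / self.months_per_project


solo = Setup("one full-stack engineer", 1.0, 1)
cluster = Setup("two full-stack engineers (cluster)", 1.0, 2)
distributed = Setup("front-end + back-end + test (distributed)", 0.5, 1)

for s in (solo, cluster, distributed):
    print(f"{s.name}: {s.months_per_project} months per project, "
          f"{s.projects_per_month():.0f} projects per month")
```

Both options raise the number of projects finished per month, but only the distributed team shortens the time a single project takes.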

Once you know what a distributed system is, what should the simplest distributed system look like?

Suppose we build a system that has only two functions: 1. registration, 2. login.

How do we turn this system into a distributed system? The simplest way is to split registration and login into two sub-services and deploy them on two servers that cooperate with each other. That is the simplest distributed system.
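As a rough sketch of that split, the two sub-services can be modeled as two classes. Everything here is an assumption for illustration: the class and method names are invented, both "servers" live in one process, and the direct method call from the login service to the registration service stands in for what would be a network request between two machines.

```python
import hashlib
import hmac
import os


def _hash(password: str, salt: bytes) -> bytes:
    # Salted password hash from the standard library.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class RegistrationService:
    """Would run on server A. Owns the user records."""

    def __init__(self):
        self._users = {}  # username -> (salt, password hash)

    def register(self, username: str, password: str) -> bool:
        if username in self._users:
            return False  # name already taken
        salt = os.urandom(16)
        self._users[username] = (salt, _hash(password, salt))
        return True

    def get_credentials(self, username: str):
        """Called by the login service; stands in for a remote request."""
        return self._users.get(username)


class LoginService:
    """Would run on server B. Holds no user data of its own."""

    def __init__(self, registration: RegistrationService):
        self._registration = registration  # in reality: a remote client

    def login(self, username: str, password: str) -> bool:
        record = self._registration.get_credentials(username)
        if record is None:
            return False
        salt, expected = record
        return hmac.compare_digest(_hash(password, salt), expected)


registration = RegistrationService()
login = LoginService(registration)
registration.register("alice", "s3cret")
print(login.login("alice", "s3cret"), login.login("alice", "wrong"))
```

The point of the sketch is the dependency: login cannot answer on its own and has to cooperate with registration, which is what separates this from two identical copies of one application.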

You might be shocked to see this:

This is a distributed system?

What about all those distributed technology stacks I am supposed to learn?

What about those sophisticated algorithms?

What about fault tolerance that kicks in within an instant?

What about seamless hot upgrades?

What is going on here?

Is this simple system we build really the distributed system we talk about every day?

2. Why do we need a distributed system?

Why do we want a distributed system? The answer is simple: circumstances force us to! A distributed system is often the solution a business ends up adopting once it has grown.

Suppose the company starts a new online business and we have to build a system for it. At this stage the future of the project is unknown and we need to go live quickly to test the market, so we will probably start with a monolithic architecture and ship that first.

With the development and operation of the business, the first problem we often face is the collapse of the system and the downtime of the server.

At this point we build a high-availability architecture to solve the problem: the same project is deployed on multiple machines, and if one machine fails, we just switch to another to keep serving.

Then, as the business grows further, the bottleneck usually becomes response time. Longer response times directly hurt the user experience, and they are themselves a symptom of a throughput bottleneck.

For this kind of problem, architects usually turn to load balancing. The architecture now starts to get complex, because we must keep the service highly available while also balancing the load.

So far, there seems to be no problem. We ensure the reliability of the system through high availability and disperse the pressure of the system through load balancing.

However, none of the solutions above is distributed, and the system is not a distributed system. It is still the clunky monolith that some technology zealots like to mock.

Do we still need to be distributed?

The picture above is a small part of the payment platform architecture of a large tech company.

From this picture we can see how complicated the business can become. Faced with a business this complex, we find that the kind of cluster we had before is no longer enough.

At this time, it is necessary to split the business.

Even after the business has been split, the parts still have to cooperate to offer one overall service to the outside world. This is when we really need a distributed system: a set of systems that cooperate with each other across different servers.

That is why we say a distributed system is the solution a business ends up with as it develops. Eventually the business becomes complex enough that it must be split, and a distributed system is the natural consequence.

Here we can also answer the questions from the previous section. What we need is not a simple, meaningless distribution that merely spreads modules across machines. What we need is a system that, after the split, can still:

Maintain excellent performance

Offer highly reliable availability

Remain highly flexible

It is to guarantee these three properties that the complicated and difficult technology stack of distributed systems exists.

3. Technology stack of distributed systems

As we said above, distributed systems emerge because circumstances force them on us; they are the end result of business growth. Once the business is split, we need more distributed technologies to deal with the consequences:

Because the business is split into many parts, the corresponding modules need to communicate with each other. To keep that communication fast and reliable, we need to master distributed communication technology.

With so many parts, each module may itself need to be clustered. With that many server resources, we need distributed resource management and load scheduling technology to allocate them accurately.

After the split, modules need to access a lot of shared data. To keep that data in a safe and complete state, we need distributed coordination and synchronization technology.

By the time a business needs to be split, its data volume is bound to be huge. To store data reliably and keep read and write performance high, we need distributed storage technology.

With a business this complex, and for the company to keep growing, we need more precise marketing and operations, which means real-time and offline data processing and analysis. This is where distributed computing technology comes in.

After the split, the overall architecture has changed dramatically, and high availability can no longer be approached with the old cluster mindset. Distributed reliability technology is therefore something we must also master.

You see, the technology stack of distributed systems is large and complicated, right? Don't panic.

I am not writing this article to scare you off. We can learn step by step and topic by topic, and gradually master the whole distributed technology stack.

4. How to learn the technology stack of distributed systems

Looking at the distributed technology stack, we can see that the technologies fall into categories, and we can learn the concepts and ideas behind each category in turn. No matter how many implementations exist, each one is built on the principles of the category it belongs to.

At the same time, in the course of learning, we must combine theory with practice and learn according to our actual development and architecture.

Moreover, a business develops gradually, and a project does not become huge all at once. This gives us the time and opportunity to learn and master things step by step.

4.1 distributed communication

So how exactly do you do it?

First of all, what is the foundation of distribution? In my experience it is communication, and the most important thing is the communication mechanism between the modules of a distributed system.

And how do we learn about communication mechanisms? I think the first step is to understand the differences between the mechanisms available to us. It is particularly important to understand their shortcomings. Yes, you read that right: the shortcomings.

Why are shortcomings the most important? Because one of an architect's most important jobs when designing an architecture is technology selection. In many cases the target use case is only vaguely defined, and knowing the shortcomings of each option goes a long way toward making the right choice.

For example, if we want to communicate between modules now, should we use RPC or MQ? At this point, if we know the shortcomings of RPC and MQ, we can easily make a more accurate selection.

Disadvantages of RPC:

Cannot smooth out traffic peaks

Cannot broadcast to multiple modules

Does not guarantee message delivery

Does not decouple modules from each other

Disadvantages of MQ:

Latency cannot be guaranteed

Not suitable for transactions that need strong consistency

Increases the complexity of the system

Reduces the availability of the system

Knowing the disadvantages makes the selection easy. If we have a business that deducts fees in real time, we definitely want RPC, because it is latency-sensitive and requires strong consistency.

If we have a business that needs to send accounting requests to both the accounting system and a partner, we would probably choose MQ.
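The two scenarios can be contrasted in a minimal in-process sketch. It is not a real RPC framework or message queue: `deduct_fee_rpc`, `Broker`, and the topic name are hypothetical, and a plain function call and `queue.Queue` stand in for the network.

```python
import queue
from collections import defaultdict


def deduct_fee_rpc(balances, account, amount):
    """RPC style: the caller blocks and learns the outcome immediately."""
    if balances.get(account, 0) < amount:
        raise ValueError("insufficient balance")
    balances[account] -= amount
    return balances[account]


class Broker:
    """Toy in-memory message broker that fans out to every subscriber."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of inboxes

    def subscribe(self, topic):
        inbox = queue.Queue()
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        # The publisher returns at once. It does not know when, or in
        # which order, each consumer will process the message.
        for inbox in self._subscribers[topic]:
            inbox.put(message)


# Real-time fee deduction: we need the answer now, so call directly.
balances = {"alice": 100}
remaining = deduct_fee_rpc(balances, "alice", 30)

# Accounting request: two receivers, neither needs an immediate reply.
broker = Broker()
accounting_inbox = broker.subscribe("accounting")
partner_inbox = broker.subscribe("accounting")
broker.publish("accounting", {"account": "alice", "amount": 30})
print(remaining, accounting_inbox.qsize(), partner_inbox.qsize())
```

The direct call gives an immediate, definite result but reaches only one callee. The broker reaches both receivers with one publish and keeps the sender unaware of them, at the price of an extra component and no bound on when the message is handled.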

4.2 distributed coordination and synchronization

After we understand distributed communication, I think the next step is distributed coordination and synchronization.

In practice, even a distributed system is often not very large, so distributed resource management can be ignored for the time being. Storage needs can often still be met with a database in active/standby or sharded mode. And the need for distributed computing may be less urgent.

However, once the global state of a distributed system goes wrong, it is an incident. So understanding distributed coordination and synchronization is both urgent and important.

So how to learn coordination and synchronization?

We need to know exactly what we mean by coordinating data access and synchronizing data access. The essence of coordination is ordering requests for data access by priority. How is priority defined, and how is it decided? That is what we need to learn.

Synchronization, for its part, is really about protecting data access. How is access to the data restricted, and under what policy? That is the essence of synchronization.

Then, if we already understand data coordination and synchronization in multithreading, we can grasp the technical essence of distributed coordination more quickly by comparing the similarities and differences between the distributed and multithreaded cases.
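To make the comparison with multithreading concrete, here is a small sketch. `SafeCounter` uses a real `threading.Lock`; `LeaseLock` is a hypothetical, in-memory stand-in for a lock service shared by several machines and is not any particular product's API. The one difference it tries to show is an assumption about the distributed case: a holder may crash or be cut off, so the lock expires after a lease time instead of being held forever.

```python
import threading
import time


class SafeCounter:
    """Multithreading: a lock restricts access to data inside one process."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


class LeaseLock:
    """Toy distributed-style lock: ownership expires after `ttl` seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._owner = None
        self._expires_at = 0.0
        self._guard = threading.Lock()

    def acquire(self, owner, ttl):
        with self._guard:
            now = self._clock()
            # Free, or the previous holder's lease has run out.
            if self._owner is None or now >= self._expires_at:
                self._owner, self._expires_at = owner, now + ttl
                return True
            return False

    def release(self, owner):
        with self._guard:
            if self._owner == owner:  # only the current holder may release
                self._owner = None
                return True
            return False


counter = SafeCounter()
workers = [threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.value)
```

In both cases the policy is the same (one holder at a time). What changes in the distributed case is that failure of the holder has to be part of the design.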

4.3 distributed storage

Once we understand distributed coordination and synchronization, we should turn to distributed storage. Data is the core of the business, and massive amounts of data ultimately need distributed storage for secure and reliable persistence.

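What is the most important thing about distributed storage? It is not the various storage implementations but the foundation of distributed storage: CAP theory. CAP stands for consistency, availability, and partition tolerance; when a network partition occurs, a system has to give up either consistency (AP) or availability (CP).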

Once we understand CAP theory, it becomes much easier to see how a distributed storage implementation achieves CP or AP. We can also judge from the real business requirements whether the business needs CP or AP, and select distributed storage accordingly.
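The CP versus AP choice can be illustrated with a deliberately tiny model. This is a toy, not how any real store works: `TwoNodeStore`, its `partitioned` flag, and the synchronous replication are all assumptions made so that the trade-off fits in a few lines.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeStore:
    """Two replicas; `partitioned` simulates a broken link between them."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            replica.data[key] = value
            other.data[key] = value  # replicate before acknowledging
            return True
        if self.mode == "CP":
            return False  # refuse the write: consistent but unavailable
        replica.data[key] = value  # AP: accept locally, replicas diverge
        return True

    def read(self, replica, key):
        return replica.data.get(key)


for mode in ("CP", "AP"):
    store = TwoNodeStore(mode)
    store.write(store.a, "balance", 100)
    store.partitioned = True
    accepted = store.write(store.a, "balance", 90)
    print(mode, accepted, store.read(store.a, "balance"), store.read(store.b, "balance"))
```

In CP mode the write during the partition is rejected and both replicas still agree. In AP mode the write succeeds, but a reader on the other replica sees the old value until the partition heals and the replicas are reconciled, which this sketch does not attempt.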

4.4 distributed computing

After distributed storage, we should learn about distributed computing, because it is likely to become an important operational requirement. On the whole, there are four modes of distributed computing, and whatever the variation, it comes down to these four.

In terms of calculation, there are only two methods:

MR mode (MapReduce)

Stream mode
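As a single-machine sketch of these two modes, here is the same word count written both ways. The function names are invented for illustration, and nothing here is distributed: in a real system the map and reduce steps would run on many machines, and the stream would be unbounded.

```python
from collections import Counter, defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}


def word_count_mapreduce(lines):
    """MR mode: the whole data set is available; one result at the end."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle(mapped))


def word_count_stream(lines):
    """Stream mode: records arrive one by one; emit updated totals each time."""
    totals = Counter()
    for line in lines:
        totals.update(word.lower() for word in line.split())
        yield dict(totals)


lines = ["to be or not to be", "to do"]
print(word_count_mapreduce(lines))
for snapshot in word_count_stream(lines):
    print(snapshot)
```

MapReduce suits offline analysis over data that is already stored, while the stream version produces a result after every record, which is the shape real-time processing needs.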

In terms of the processing flow, there are only two modes:

Actor mode

Pipeline mode

4.5 distributed reliability

With this knowledge, an architect can comfortably handle the architecture work of a typical company. In fact, having worked through the learning path up to this point is almost enough.

We also need to know the general approaches to distributed reliability. There are really no more than three:

Cluster the heavily loaded modules and load balance across them.

Apply flow control to modules with limited resources.

When the server behind any module has a problem, keep it from affecting the normal operation of the rest of the system. This is called fault isolation.
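Of the three, flow control is the easiest to show in a few lines. The sketch below uses a token bucket, which is one common way to do it and not necessarily what the author has in mind; the class name and parameters are made up, and the code is not thread-safe.

```python
import time


class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum number of stored tokens
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill according to the time elapsed since the last call.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller should reject, queue, or delay the request


bucket = TokenBucket(rate=5, capacity=2)
print([bucket.allow() for _ in range(4)])
```

A module with limited resources would call `allow()` before doing the work and turn the request away when it returns `False`, so that overload shows up as rejected requests instead of a collapsed server.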

Of these three approaches, the first two are very general techniques that we need to learn and understand even if we never build a distributed system.

Only the third, fault isolation, needs deeper understanding. But fault isolation is not some fancy high-end technology. In a distributed system, different modules run on different machines and those machines are clustered, so this kind of fault isolation comes naturally.

Sometimes, however, we want fault isolation at a finer granularity, for example at the thread or process level. In that case we can run tasks in threads or containers and apply a scheduling strategy so that faults are isolated at the thread or process level.
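A minimal sketch of thread-level isolation, using the standard library's `ThreadPoolExecutor`. The helper `run_isolated` and the task names are hypothetical. Note the limit of this approach: a thread contains an exception, but not a crash of the whole interpreter, which is where separate processes or containers come in.

```python
from concurrent.futures import ThreadPoolExecutor


def run_isolated(tasks):
    """Run each named task in its own worker thread.

    A task that raises is recorded as failed; the others still complete.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, future in futures.items():
            try:
                results[name] = ("ok", future.result())
            except Exception as exc:  # the fault stays inside this task
                results[name] = ("failed", repr(exc))
    return results


def billing():
    return 42


def reports():
    raise RuntimeError("disk full")


print(run_isolated({"billing": billing, "reports": reports}))
```

The failure of `reports` is reported alongside the successful result of `billing` instead of taking the whole batch down.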

4.6 distributed resource management

Finally, we may want to go further and cope with larger distributed systems; after all, people strive for progress. At that point we need to learn about distributed architecture and distributed resource management.

Fortunately, the technology stack of distributed resource management is itself quite small. There are two kinds of distributed architecture:

Centralized structure

Decentralized structure

There are three ways to allocate or schedule distributed resources:

Single-unit (monolithic) scheduling

Two-tier scheduling

Shared state scheduling
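To give a feel for the first of these, here is a sketch that reads it as one central scheduler that sees every node and makes every placement decision. That reading, the `Node` and `CentralScheduler` names, the single CPU resource, and the "most free CPU first" rule are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: int
    tasks: list = field(default_factory=list)


class CentralScheduler:
    """One scheduler holds the full cluster state and places every task."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def place(self, task, cpu):
        candidates = [n for n in self.nodes if n.free_cpu >= cpu]
        if not candidates:
            return None  # no node can fit the task right now
        node = max(candidates, key=lambda n: n.free_cpu)  # least loaded
        node.free_cpu -= cpu
        node.tasks.append(task)
        return node.name


scheduler = CentralScheduler([Node("n1", 4), Node("n2", 8)])
for task, cpu in [("web", 6), ("db", 3), ("batch", 4)]:
    print(task, "->", scheduler.place(task, cpu))
```

Because a single component makes every decision, the logic stays simple and globally informed, but that component is also the bottleneck as the cluster grows. The other two approaches in the list split or share that responsibility.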

Thank you for reading. That concludes "how to master distributed systems". How these ideas apply in a specific situation still needs to be verified in practice.
