What are the characteristics of web server distributed system

2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces the characteristics of web server distributed systems, the problems such systems commonly run into in practice, and the theories and techniques used to deal with them.

Basic knowledge of distributed systems

The era in which a single Tomcat instance could serve everything has passed, but that setup has not been completely phased out. It is still common in small projects such as internal management systems, where it is perfectly adequate and keeps costs down. If you want to scale to high-concurrency scenarios, however, you must understand distributed systems.

Characteristics of distributed systems

Distributed system: a system in which hardware or software components are located on different networked computers and communicate and coordinate with each other only through message passing.

In other words, the components may run on different hardware, different software, and different networks, and messages are their only means of communication and coordination.

In more detail, such a system has five characteristics: distribution, equivalence (peer replicas), concurrency, the lack of a global clock, and failures that can occur at any time.

1. Distribution

Since it is a distributed system, its most prominent feature is distribution. Take an e-commerce project as an example: the project is split by function into what are technically called microservices, such as a user microservice, a product microservice, and an order microservice. These services are deployed on different Tomcat instances, different servers, or even different clusters, so the whole architecture is spread across different places. The placement is arbitrary in space, and server nodes can be added or removed at any time. This is the first feature.

2. Equivalence

Equivalence (peer replicas) is a design goal of distributed systems. Take the e-commerce website again. Building a distributed architecture is not just a matter of splitting a large monolith into microservices and deploying them on different server clusters, because any one of the split-out microservices may fail and take part of the website's functionality down with it.

Take the order service. To guard against failures, it generally needs a backup that can take over when the original instance has a problem. This requires the two (or more) order service instances to be completely equivalent, with exactly the same functionality. This is redundancy of service replicas.

The other kind is redundancy of data replicas, such as databases and caches. The three replicas that HDFS keeps in big data systems follow the same idea as the order service above: for safety, identical copies are needed, and that is what equivalence means.
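The takeover between equivalent replicas described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class ReplicaFailover and the method callFirstAvailable are hypothetical names, and each replica is modelled as a plain Supplier rather than a real remote call.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: tries equivalent replicas in order and returns the first answer. */
class ReplicaFailover {
    static <T> T callFirstAvailable(List<Supplier<T>> replicas) {
        RuntimeException last = null;
        for (Supplier<T> replica : replicas) {
            try {
                // Any replica is acceptable because all of them are equivalent.
                return replica.get();
            } catch (RuntimeException e) {
                // Remember the failure and move on to the next replica.
                last = e;
            }
        }
        throw new IllegalStateException("all replicas failed", last);
    }
}
```

Called with a failing primary and a healthy backup, callFirstAvailable returns the backup's result; the caller never needs to know which instance answered, which is exactly why the replicas must behave identically.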

3. Concurrency

Concurrency is nothing new to us. Most of us have worked with multithreading, which is the basis of concurrency. In the past, though, concurrency lived inside a single JVM. What we now face is one level higher: multiple processes and multiple JVMs. Multiple nodes in a distributed system may operate on shared resources concurrently, and the question is how to coordinate these distributed concurrent operations accurately and efficiently. That is what distributed locks are for.
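To make the idea of a distributed lock more concrete, here is a minimal sketch of a lease-based lock. It is an in-memory stand-in, not a real lock service: the class LeaseLockService, its methods, and the resource and node names are all hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow. A production system would keep this state in a coordination service such as ZooKeeper.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a lock service; each lock is a lease that expires. */
class LeaseLockService {
    private static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();

    /** Grants the lock if it is free, expired, or already held by the same owner (renewal). */
    synchronized boolean tryLock(String resource, String owner, long nowMillis, long leaseMillis) {
        Lease current = leases.get(resource);
        boolean heldByOther = current != null
                && current.expiresAtMillis > nowMillis
                && !current.owner.equals(owner);
        if (heldByOther) {
            return false;
        }
        leases.put(resource, new Lease(owner, nowMillis + leaseMillis));
        return true;
    }

    /** Releases the lock only if the caller is the recorded owner. */
    synchronized boolean unlock(String resource, String owner) {
        Lease current = leases.get(resource);
        if (current == null || !current.owner.equals(owner)) {
            return false;
        }
        leases.remove(resource);
        return true;
    }
}
```

The lease expiry is the design choice worth noting: because a node can crash while holding the lock, the lock must be able to free itself after a bounded time instead of waiting for an unlock that never comes.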

4. Lack of a global clock

In a distributed system, a node may be located anywhere and each node keeps its own time, so it is hard to say which of two events on different nodes happened first. The reason is that there is no global clock to impose an order. In practice this is much less of a problem today, because nodes can synchronize their clocks against widely available time servers.
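Besides time servers, a common way to order events without a global clock is a logical clock. The sketch below shows a Lamport clock, a technique the article itself does not mention; the class name LamportClock and its method names are chosen here for illustration only.

```java
/** Lamport logical clock: orders events with a counter instead of wall-clock time. */
class LamportClock {
    private long time = 0;

    /** Local event or message send: advance the counter and return the new timestamp. */
    synchronized long tick() {
        return ++time;
    }

    /** Message receive: move past the sender's timestamp so the receive is ordered after the send. */
    synchronized long receive(long senderTimestamp) {
        time = Math.max(time, senderTimestamp) + 1;
        return time;
    }

    synchronized long current() {
        return time;
    }
}
```

Each node keeps its own LamportClock and attaches the value of tick() to every message it sends. The receiver calls receive() with that value, which guarantees that a message's receipt always carries a larger timestamp than its sending, even if the two machines' wall clocks disagree.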

5. Failures can occur at any time

Any node may lose power or crash. The more servers a cluster has, the more likely a failure becomes, and as clusters grow failure becomes the norm. How to keep the system serving visitors normally while some part of it has failed is something system builders must consider.

Large-scale website architecture diagram

To see what a distributed system looks like, consider the architecture diagram of a large website. The whole architecture is divided into layers: the application layer, the service layer, the infrastructure layer, and the data service layer, each made up of several nodes. This is a typical distributed architecture, and much of the study that follows is devoted to the individual parts of this system.

So what role does ZooKeeper play in it? Think of ZooKeeper as a traffic policeman and each node as a vehicle on the road (cars, buses). To keep the whole traffic system available, ZooKeeper must know the health of each node (if a bus breaks down, a new bus is sent out: service registration and discovery), and on a very narrow road at rush hour it lets vehicles pass in only one direction at a time (distributed locks).

Just as the traffic policeman is the commander of the traffic system, ZooKeeper is the commander of the distributed system made up of these nodes.

Distributed system problems

If we compare a distributed system to an everyday traffic system, then no matter how robust the traffic system is, accidents still happen. Likewise, a distributed system has many problems to overcome, such as communication failures, network partitions, the three-state problem, and node failures.

1. Communication failures

A communication failure is really a network failure. The network itself is unreliable: a distributed system has to transmit data over it, and problems with fiber, routers, and other hardware are inevitable. Whenever the network has a problem, the sending and receiving of messages is affected, so lost or delayed messages are very common.

2. Network partition

A network partition leads to the phenomenon known as split-brain (see Hadoop NameNode). Suppose one traffic policeman manages traffic for a whole area and everything is in order. Suddenly a power outage, an earthquake, or another disaster strikes, and some roads can no longer receive his instructions. A stand-in may then step up to direct traffic on those roads. But note that the original policeman is still there; only the communication link is broken. Now different people are directing traffic in the same area, which inevitably leads to traffic jams and chaos. When a single region (a distributed cluster) ends up with two conflicting leaders like this, the situation is called split-brain, and its underlying cause is a network partition.

3. Three states

What are the three states? The third state is the one besides success and failure: timeout.

Within a single JVM, an application gets a definite result after calling a method: either success or failure. In a distributed system, a call returns a success or failure response in the vast majority of cases, but once the network misbehaves a timeout becomes very likely. When a timeout occurs, the initiator of the request cannot tell whether the request was processed successfully.
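The three outcomes can be made explicit in code. The following sketch uses the standard CompletableFuture API to classify a call; the class ThreeStateCall, the enum Outcome, and the method classify are hypothetical names introduced for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that classifies a remote call as success, failure, or timeout. */
class ThreeStateCall {
    enum Outcome { SUCCESS, FAILURE, TIMEOUT }

    static Outcome classify(CompletableFuture<?> call, long timeoutMillis) {
        try {
            call.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.SUCCESS;
        } catch (TimeoutException e) {
            // No response arrived in time: the request may or may not have been processed.
            return Outcome.TIMEOUT;
        } catch (ExecutionException e) {
            // The call completed, but with an error.
            return Outcome.FAILURE;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and treat the call as failed.
            Thread.currentThread().interrupt();
            return Outcome.FAILURE;
        }
    }
}
```

The TIMEOUT branch is the one that needs care in real systems. Since the caller cannot know whether the work was done, a retry is only safe if the remote operation is idempotent or the caller can query the result afterwards.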

4. Node failure

As mentioned earlier, node failure is a common problem in distributed systems: the nodes that make up a server cluster go down or hang, and this happens often.

CAP theory

The characteristics of distributed systems described above bring plenty of headaches, and there are theories that guide how to deal with them. The next sections cover these theories; CAP and BASE are the foundation and are often asked about in interviews.

First, CAP: an acronym for Consistency, Availability, and Partition tolerance.

Consistency

Consistency is also one of the ACID properties of transactions [Atomicity, Consistency, Isolation, Durability].

The consistency here is similar in spirit, but the setting is now a distributed environment rather than a single database.

In a distributed system, consistency describes whether data stays identical across multiple replicas. It is related to the equivalence discussed above. If, after an update to a data item succeeds, all users can immediately read the latest value, the system is considered strongly consistent.

Availability

Availability means that the service provided by the system must always be usable: for every user request, a result is returned within a bounded time. The two key points are bounded time and returning a result. To meet the time bound, caching and load balancing are needed; in this case extra server nodes are added for performance.

To return a result, consider a master/slave setup: when the master node has a problem, a backup node must take over as quickly as possible. There must be no OutOfMemory errors or HTTP 500 or 404 errors; otherwise the system is considered unavailable.

Partition tolerance

When a distributed system encounters a network partition, it must still be able to provide service that satisfies consistency and availability, unless the entire network fails. Split-brain must not occur.

PS:

A distributed system cannot satisfy all three requirements of consistency, availability, and partition tolerance at the same time; it can satisfy at most two. Because network partitions cannot be ruled out in practice, designers usually spend their effort on balancing A and C according to the business scenario.

BASE theory

According to CAP, designers must balance consistency and availability. Letting the system become completely unavailable, even for a short time, is generally not acceptable, so in practice strong consistency is the property that gets relaxed in a distributed environment.

BASE theory:

Even if strong consistency cannot be achieved, a distributed system can reach eventual consistency in a way suited to its own business characteristics.

Basically Available

When an unforeseen failure occurs in a distributed system, some availability may be sacrificed to keep the system basically available. The loss shows up as longer response times or reduced functionality.

E.g.: during the Singles' Day peak, some Taobao users see slow pages or degraded pages.

Soft state

This is related to the three states mentioned earlier: the data in the system is allowed to be in an intermediate state. That is, synchronization between replicas on different nodes may lag, and this lag is considered not to affect the availability of the system.

E.g.: when the 12306 website sells train tickets, requests are placed in a queue.

Eventually consistent

After a period of synchronization, all replicas eventually reach a consistent state.

E.g.: the total recharge amount shown on the front page of a financial product may be inconsistent for a short time.
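Soft state and eventual consistency can be illustrated with a toy store that acknowledges writes immediately and copies them to a replica later. Everything in this sketch is an assumption made for illustration: the class EventuallyConsistentStore, its methods, and the in-memory maps that stand in for two database nodes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical toy store: a primary that accepts writes and a replica that lags behind. */
class EventuallyConsistentStore {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();
    private final Queue<String[]> replicationLog = new ArrayDeque<>();

    /** The write succeeds on the primary at once; the replica is updated later. */
    void write(String key, String value) {
        primary.put(key, value);
        // Soft state: the change is pending and not yet visible on the replica.
        replicationLog.add(new String[] {key, value});
    }

    String readFromPrimary(String key) {
        return primary.get(key);
    }

    /** May return a stale value (or null) until synchronize() has run. */
    String readFromReplica(String key) {
        return replica.get(key);
    }

    /** Applies all pending changes in order; afterwards both copies agree. */
    void synchronize() {
        String[] change;
        while ((change = replicationLog.poll()) != null) {
            replica.put(change[0], change[1]);
        }
    }
}
```

Between write() and synchronize() a reader of the replica sees old data, which corresponds to the short-lived inconsistency in the recharge total above. Once synchronization has run, both reads return the same value.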

Common terms

Service avalanche

Assume the following call chain exists: Service A calls Service B, which in turn calls Service C.

Suppose the traffic of Service A fluctuates greatly and often surges. Even if Service A itself can handle the requests, Service B and Service C may not be able to withstand the sudden load.

If Service C becomes unavailable because it cannot cope with the load, Service B's requests to it block, Service B's threads are gradually exhausted, and Service B becomes unavailable. Service A then becomes unavailable in the same way.

This situation, in which the failure of one service brings down every service along the call chain, is called a service avalanche.

Service circuit breaking and service degradation are two of the means used to deal with service avalanches.

Service circuit breaker

Service circuit breaking: when a downstream service suddenly becomes unavailable or responds too slowly, the upstream service stops calling it and returns quickly, releasing resources, in order to protect its own overall availability. When the target service recovers, calls resume.

Note that circuit breaking is usually handled at the framework level. Its design basically follows the circuit breaker pattern, with the state transitions shown in the diagram provided by Martin Fowler:

At the beginning the breaker is in the closed state. Once the number of detected errors reaches a threshold, it switches to the open state.

In the open state a reset timeout runs; when it expires, the breaker moves to the half-open state.

In the half-open state some requests are let through to the backend as a trial. If the trial succeeds, the breaker returns to the closed state and service is restored; if it fails, the breaker opens again.
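The three states and their transitions can be sketched as a small Java class. This is a simplified illustration, not the code of Hystrix or Sentinel: the class SimpleCircuitBreaker and its parameters are hypothetical, failures are counted consecutively rather than over a rolling window, and the current time is passed in so the transitions are easy to trace.

```java
import java.util.function.Supplier;

/** Hypothetical minimal circuit breaker with CLOSED, OPEN and HALF_OPEN states. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;     // consecutive failures that open the breaker
    private final long resetTimeoutMillis;  // how long to stay open before a trial call
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAtMillis = 0;

    SimpleCircuitBreaker(int failureThreshold, long resetTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMillis = resetTimeoutMillis;
    }

    synchronized State state() {
        return state;
    }

    /** Runs the remote call, or the fallback when the breaker is open or the call fails. */
    synchronized <T> T call(Supplier<T> remote, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAtMillis < resetTimeoutMillis) {
                // Fail fast without touching the remote service.
                return fallback.get();
            }
            // The reset timeout has elapsed: allow one trial call.
            state = State.HALF_OPEN;
        }
        try {
            T result = remote.get();
            // A success closes the breaker and clears the failure count.
            state = State.CLOSED;
            failures = 0;
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;
                openedAtMillis = nowMillis;
            }
            return fallback.get();
        }
    }
}
```

For brevity the sketch holds its lock while the remote call runs, which serializes callers; a real implementation would only guard the state changes. The important behaviour is in the first branch: while the breaker is open, the remote service is not called at all, which is what frees the upstream threads.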

There are several popular circuit breaker implementations in the industry, such as Alibaba's Sentinel and the widely used Hystrix. In Hystrix, the corresponding configuration is as follows:

    // Minimum number of requests in the rolling window before the breaker can trip. Default: 20
    circuitBreaker.requestVolumeThreshold
    // How long the breaker stays open before it lets a trial request through. Default: 5000 (5 s)
    circuitBreaker.sleepWindowInMilliseconds
    // Error percentage at or above which the breaker trips. Default: 50
    circuitBreaker.errorThresholdPercentage

With the defaults, once at least 20 requests have arrived in the rolling window and 50% or more of them have failed, the breaker opens. Further calls to the service fail immediately without calling the remote service. After 5 seconds a trial request is let through to decide whether to close the breaker or keep it open.
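The trip condition just described is simple arithmetic, and writing it out removes any ambiguity about the two thresholds. The sketch below restates the rule with the default values as constants; the class TripRule and the method shouldOpen are hypothetical names, and this is not Hystrix's own code.

```java
/** Hypothetical restatement of the trip rule, using the default values quoted above. */
class TripRule {
    static final int REQUEST_VOLUME_THRESHOLD = 20;    // circuitBreaker.requestVolumeThreshold
    static final int ERROR_THRESHOLD_PERCENTAGE = 50;  // circuitBreaker.errorThresholdPercentage

    /** Returns true when the counts observed in one rolling window should open the breaker. */
    static boolean shouldOpen(int totalRequests, int failedRequests) {
        if (totalRequests < REQUEST_VOLUME_THRESHOLD) {
            // Too few requests to judge, no matter how many of them failed.
            return false;
        }
        int errorPercentage = failedRequests * 100 / totalRequests;
        return errorPercentage >= ERROR_THRESHOLD_PERCENTAGE;
    }
}
```

Note the consequence of the volume threshold: 19 failures out of 19 requests do not open the breaker, while 10 failures out of 20 do.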

These are framework-level implementations; we only need to configure them and implement the corresponding interfaces.

Service degradation

What is service degradation? There are two scenarios:

When a downstream service responds too slowly for some reason, it proactively stops some of its less important business to free server resources and speed up responses.

When a downstream service is unavailable for some reason, the upstream service invokes local fallback logic to avoid hanging and returns to the user quickly.

At first glance many people do not see the difference between circuit breaking and degradation. A useful way to think about it:

There are many ways to degrade a service, such as switch-based degradation, rate-limiting degradation, and circuit-breaker degradation.

Circuit breaking is one of those ways.

Some people may object that circuit breaking and degradation are obviously two different things. In terms of implementation, however, they occur together: when a downstream service is unavailable, the upstream service must run its fallback logic so that the end user still gets a response. It therefore makes sense to regard circuit breaking as one form of degradation.

Putting frameworks aside, here is the simplest possible code. The upstream code is as follows:

    try {
        // call the downstream helloWorld service
        xxRpc.helloWorld();
    } catch (Exception e) {
        // the call failed because the circuit breaker is open
        doSomething();
    }

Note that the downstream helloWorld service cannot be called because the circuit breaker is open. The upstream service therefore enters the catch block, and the logic executed there can be regarded as the degradation logic.

What, you say you don't catch the exception and simply let the page fail? OK, then I concede; forget I said anything.

Service degradation is mostly handled at the business level. The approach I want to discuss here is switch-based degradation, which is also commonly used in our production environment.

The practice is simple: create a switch and store it in the configuration center. Changing the switch in the configuration center determines which services are degraded. How the application detects that the configuration has changed is outside the scope of this article.
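A switch of this kind can be sketched as follows. The example makes several assumptions: the classes DegradeSwitches and ProductPage, the feature name "recommendations", and the in-memory map that stands in for values delivered by a configuration center are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical switch holder; the map stands in for values read from a configuration center. */
class DegradeSwitches {
    private final Map<String, Boolean> switches = new ConcurrentHashMap<>();

    /** Would be called when the configuration center reports a change. */
    void set(String feature, boolean enabled) {
        switches.put(feature, enabled);
    }

    /** Features are on unless they have been switched off explicitly. */
    boolean isEnabled(String feature) {
        return switches.getOrDefault(feature, true);
    }
}

/** Hypothetical business code with one switch check on a secondary feature. */
class ProductPage {
    private final DegradeSwitches switches;

    ProductPage(DegradeSwitches switches) {
        this.switches = switches;
    }

    String render(String productId) {
        String page = "product " + productId;
        // Switch point: the secondary feature is skipped when the switch is off.
        if (switches.isEnabled("recommendations")) {
            page += " + recommendations";
        }
        return page;
    }
}
```

Defaulting to "enabled" means a missing configuration entry never degrades the service by accident; operators have to turn a feature off deliberately.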

Placing these switch checks in the application code also has a name in the industry: burying points.

The key question is then: which business paths need such switch points?

There are generally the following approaches:

Simplify the execution flow

Separate core business flows from non-core ones, then add a switch to each non-core flow. When the system cannot cope, turn the switch off to stop these secondary flows.

Turn off secondary features

A microservice usually offers many functions, which can be divided into primary and secondary ones. Add a switch to each secondary function and turn it off when degradation is needed.

Reduce consistency

Suppose the execution flow cannot be simplified and there is no secondary function to turn off. The only remaining option is to reduce consistency: change the core business flow from synchronous to asynchronous, trading strong consistency for eventual consistency.

All of these are manual degradation. Is there a way to degrade automatically?

We do not use automatic degradation in production. Scenarios that need degradation are usually predictable, such as a scheduled promotion. If an emergency does occur and traffic becomes abnormal, the monitoring system sends an email alert reminding us to degrade.

That does not mean automatic degradation is impossible. If I were asked to implement it, I would approach it as follows:

Set a threshold, for example a number of failures within a few seconds, at which degradation starts.

Monitor your interfaces yourself (RxJava is worth a look if you are interested) and push a configuration change when the threshold is reached. How? If the configuration is stored in git, use JGit to change it; if it is stored in a database, use JDBC.

Once the configuration center has been updated, the application detects the change automatically and degrades. (If this is unclear, read up on the hot-refresh feature of configuration centers.)
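The first two steps above, counting failures in a time window and pushing a change once a threshold is reached, can be sketched like this. The class AutoDegrader is hypothetical, and the Runnable it receives is only a stand-in for the real configuration change (for example a JGit commit or a JDBC update), which is not shown here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical monitor that triggers degradation when failures pile up inside a time window. */
class AutoDegrader {
    private final int maxFailures;
    private final long windowMillis;
    private final Runnable turnSwitchOff;   // stand-in for the configuration-center update
    private final Deque<Long> failureTimes = new ArrayDeque<>();
    private boolean degraded = false;

    AutoDegrader(int maxFailures, long windowMillis, Runnable turnSwitchOff) {
        this.maxFailures = maxFailures;
        this.windowMillis = windowMillis;
        this.turnSwitchOff = turnSwitchOff;
    }

    /** Records one failed call observed at the given time. */
    synchronized void recordFailure(long nowMillis) {
        failureTimes.addLast(nowMillis);
        // Drop failures that have fallen out of the window.
        while (!failureTimes.isEmpty() && nowMillis - failureTimes.peekFirst() >= windowMillis) {
            failureTimes.removeFirst();
        }
        if (!degraded && failureTimes.size() >= maxFailures) {
            degraded = true;
            // Push the configuration change exactly once.
            turnSwitchOff.run();
        }
    }

    synchronized boolean isDegraded() {
        return degraded;
    }
}
```

The sketch deliberately has no automatic recovery: once degraded, it stays degraded until someone resets it. Whether to switch back automatically is a separate decision, and the caution about automatic degradation expressed above applies to it as well.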

