
How to ensure stateless service in highly available architecture design

2025-01-18 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/03 Report

Today I would like to talk about how to ensure stateless services in high-availability architecture design. Many readers may not know much about this topic, so I have summarized the following; I hope you get something out of this article.

Stateless service for highly available architecture design

Talking about Architecture Design with a laugh

An outage is the result of accumulated problems; nothing is as simple as it seems. As software runs and the number of users grows, if high availability is ignored, failure will occur sooner or later. High availability must be designed for in advance, and it is a huge subject.

What do I consider when designing a highly available system? During architecture design:

Consider what pitfalls each option brings, and in the worst case prepare an emergency plan for when failure occurs.

Monitor the system so you can detect that a failure has occurred, and when.

Prepare automatic recovery and automatic early-warning schemes.

At the code level, consider processing speed, code performance, and error handling.

Also consider minimizing the impact of failures: service degradation, rate limiting, and circuit breakers.

And so on.

Stateless service: the service stores no data (except caches) at any time; instances can be destroyed and created at will, user data is never lost, and requests can be switched to any replica. High availability for a stateless service means: never lose data under any circumstances, keep the service from failing, and, when some instances do fail, keep the impact minimal and recover quickly.

This can be approached from the following aspects:

Redundant deployment: deploy at least one extra node to avoid a single point of failure

Vertical scaling: increase single-machine performance

Horizontal scaling: scale out quickly when traffic surges

Redundant deployment

In a single-node architecture, as data volume grows, the load on the single node becomes too heavy, which easily causes the service to collapse and become unavailable. For stateless services, we can deploy multiple nodes to spread the load.

For scheduling incoming requests, load balancing can be used to make the fullest possible use of server resources.

Stateless service: a service that does not need to store data. Even if a node crashes and restarts, no data is lost.

Load balancing: an algorithm that distributes a large number of requests across different nodes.

Load balancing of stateless services

You can use the following common algorithms provided by load balancing:

Random algorithm: given the list of backend servers, pick one at random for each request; the larger the request volume, the closer the result approaches an even distribution.

Round-robin algorithm: request the backend servers in turn.

The problem with the first two algorithms is that when backend servers differ in load or configuration, they cannot send more traffic to servers under light pressure and less to servers under heavy pressure. Hence the following were introduced:

Weighted round-robin algorithm: servers with stronger capacity are given higher weight, making them more likely to be hit and reducing the risk of overload; requests are distributed to the backend servers in weight order.

Weighted random algorithm: like weighted round-robin, except that distribution is random according to weight. If several servers share the same weight, it has the same problem as the plain random algorithm: it only approaches balance when the request volume is large; with low volume, the same machine may be hit repeatedly.

[Weighted] least-connections algorithm: the most intelligent of these; it selects a server based on its current number of connections, so servers that process requests faster are more likely to be chosen.

The algorithms above are for stateless applications. If you need to preserve communication state, use the source-address hashing algorithm: hash the source address so that requests from the same client always land on the same machine, eliminating the need to re-establish connections repeatedly.
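To make these selection strategies concrete, here is a minimal Python sketch of each. The backend pool, addresses, weights, and connection counts are hypothetical; a real load balancer (nginx, LVS, a K8s Service) implements these far more carefully, e.g. with smooth weighted round-robin and live connection tracking.

```python
import hashlib
import itertools
import random

# Hypothetical backend pool; addresses, weights, and connection counts
# are illustrative, not from the article.
BACKENDS = [
    {"name": "10.0.0.1", "weight": 5, "connections": 0},
    {"name": "10.0.0.2", "weight": 3, "connections": 0},
    {"name": "10.0.0.3", "weight": 1, "connections": 0},
]

_rr = itertools.cycle(BACKENDS)

def round_robin():
    """Plain round-robin: take turns, ignoring weight and load."""
    return next(_rr)

def weighted_random():
    """Weighted random: pick a backend with probability proportional to weight."""
    weights = [b["weight"] for b in BACKENDS]
    return random.choices(BACKENDS, weights=weights, k=1)[0]

# Expand each backend by its weight, then cycle: a simple (non-smooth)
# form of weighted round-robin.
_wrr = itertools.cycle([b for b in BACKENDS for _ in range(b["weight"])])

def weighted_round_robin():
    return next(_wrr)

def least_connections():
    """Pick the backend with the fewest active connections."""
    return min(BACKENDS, key=lambda b: b["connections"])

def source_hash(client_ip):
    """Hash the client IP so the same source always maps to the same backend."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]
```

Note how `source_hash` is the only strategy whose result depends on the client rather than on backend state, which is exactly why it can provide session persistence.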

How to choose the load balancing algorithm?

First, set aside the random algorithm. The simplest setup can use plain round-robin, which suits scenarios where server configurations are identical, for example virtual machines whose configuration can be adjusted dynamically, provided the VMs are dedicated and no other applications are deployed on them.

In practice, however, servers often run multiple applications, so choose between weighted round-robin and least connections.

Weighted round-robin suits short-connection scenarios, such as HTTP services. In K8s, because each pod is independent, the default Service strategy is unweighted round-robin.

Least connections suits long connections, such as FTP.

If the architecture must handle clients without cookie support, the source-address hashing algorithm can map a source IP to the same real server (RS) every time; in K8s this is called session persistence (session affinity), forwarding each request to the same pod.

Recommendations:

If containers are handed to K8s for scheduling, use cookies for session persistence and the default round-robin algorithm. Scheduling specifics will be covered in a future K8s article.

For applications using persistent connections (FTP, sockets, or download connections), choose weighted least connections.

For short-connection applications (static websites, microservice components, etc.), choose weighted round-robin and use cookies for session persistence. Minimize server-side session design: sessions increase both code complexity and server load, which is bad for distributed applications.

Identifying highly concurrent applications

The main metric is QPS, the number of requests processed per second. For example, with 100,000 PV per day:

Formula: (100,000 × 80%) / (86,400 × 20%) ≈ 4.63 QPS (peak QPS)

Rationale: 80% of daily visits are concentrated in 20% of the day, which is called the peak time.

Another example: a system I built hosts up to 50,000 machines, each generating one PV per minute, with fairly uniform traffic:

(60 × 24 × 50,000) / 86,400 ≈ 833 QPS
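The two calculations above can be written as a small helper. The 80/20 split is the article's rule of thumb, not a law, so it is exposed as parameters:

```python
def peak_qps(daily_pv, traffic_share=0.8, time_share=0.2):
    """Estimate peak QPS assuming `traffic_share` of daily traffic
    arrives within `time_share` of the day (86,400 seconds)."""
    return (daily_pv * traffic_share) / (86400 * time_share)

def uniform_qps(daily_pv):
    """Average QPS when traffic is spread evenly across the day."""
    return daily_pv / 86400

# 100,000 PV/day with the 80/20 rule -> about 4.63 peak QPS
print(round(peak_qps(100_000), 2))

# 50,000 machines, one PV per minute each, uniform traffic:
# 50,000 * 60 * 24 = 72,000,000 PV/day -> about 833 QPS
print(round(uniform_qps(50_000 * 60 * 24)))
```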

A few hundred QPS can generally be called high concurrency. According to figures found online, systems with over 100 million PV per day, such as Weibo, generally run around 1,500 QPS, with peaks around 5,000 QPS.

Besides QPS, service response time and concurrent-user counts are also useful indicators.

When server load is high, it shows up as slow processing, dropped connections, failed requests, error reports, and so on. Each problem should be analyzed specifically, not generalized.

Server performance status can be obtained through monitoring, with dynamic adjustment and retries used to keep the service available and reduce maintenance cost. When a server is under heavy pressure, vertical scaling is usually the first thing to consider.

Vertical scaling

Vertical scaling increases the processing capacity of a single server. There are three main approaches:

Server upgrade: CPU, memory, swap, disk capacity, network cards, etc.

Hardware performance: SSDs, tuned system parameters, etc.

Architecture adjustment: at the software level, use async processing, caching, lock-free structures, etc.

Boosting single-machine performance is the fastest and simplest approach, but a single machine has performance limits, and with single-machine deployment any failure is fatal to the application. We must keep the application available at all times; that is, as the saying goes, guarantee "five nines" (99.999%) reliability.

Horizontal auto scaling

Once you understand the limits of a single machine, consider horizontal scaling.

Horizontal scaling means adding new nodes to share the load as pressure grows. But multi-node deployment alone is not enough: a growing business will one day exceed the service's capacity ceiling, and in a traffic-surge scenario a manual response will certainly be caught unprepared, so an automatic scaling mechanism is needed.

For private-cloud deployments, a scheduler can be implemented to detect system status and call the IaaS layer to scale in and out.

You can also use the auto-scaling service provided by cloud virtual machines (e.g. CVM).

For containers, automatic scaling and scheduling policies can be configured, given an elastic IaaS layer or enough spare nodes, to guard against single-machine failure.

Term: IaaS (Infrastructure as a Service) denotes services that manage hardware resources such as servers, storage, and networks. Note: the business scenario for auto scaling here is stateless services.
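As one illustration of the scaling decision described above, here is a sketch of the replica formula used by threshold-based autoscalers (the K8s Horizontal Pod Autoscaler uses essentially desired = ceil(current × currentMetric / targetMetric)). The target QPS per node and the bounds are made-up values for the example:

```python
import math

def desired_replicas(current_nodes, avg_qps_per_node,
                     target_qps_per_node=500, min_nodes=2, max_nodes=20):
    """Threshold-based scaling decision: size the pool so each node
    handles roughly target_qps_per_node, keeping at least min_nodes
    for redundancy (no single point of failure) and capping at
    max_nodes to bound cost."""
    wanted = math.ceil(current_nodes * avg_qps_per_node / target_qps_per_node)
    return max(min_nodes, min(max_nodes, wanted))

# 4 nodes each seeing 800 QPS -> scale out to 7 nodes
print(desired_replicas(4, 800))

# Traffic fell away -> scale in, but never below 2 nodes
print(desired_replicas(4, 50))
```

A real autoscaler would also smooth the metric over a window and add cooldown periods to avoid flapping; this sketch only shows the sizing formula.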

In addition, horizontal scaling is usually triggered when stateless machines can no longer carry the request traffic, generally in the thousands of QPS; at that point the database also comes under pressure, so it is recommended not to deploy stateful services on servers that scale horizontally.

The pressure on stateful services will be covered in subsequent articles.

CDN and OSS

For a website, the user-facing pages are a special kind of service containing many static resources, such as images, videos, and pages (HTML/CSS/JS). These resources are downloaded when the user requests them, and download speed determines loading speed. Even when the web service fails, the site should still be able to respond to the user.

At this level, consider using a CDN (content delivery network) to cache static front-end assets on edge servers. For more information, see [xxx].

Term: an edge server (edge node) is the server that interacts with the user, that is, a server node close to the user; proximity reduces network transfer time. With a CDN in front of a web service, you can bind the HTTPS certificate to the CDN, and in the origin-pull policy you can configure things like origin-pull timeouts and following 301/302 status codes. A CDN can also intelligently compress pages and serve custom error pages, which is very convenient.

OSS is a storage scheme that stores data as objects and can, in theory, hold an unlimited number of files.

Consider combining OSS object storage with a CDN: store media resources in object storage, or compress and archive cold data to OSS.

Most mainstream video websites use OSS; Weibo's data from years ago has presumably been archived to object storage.

Having read the above, do you have a better understanding of how high-availability architecture design ensures stateless services? Thank you for reading.
