
What are the knowledge points of high concurrency in web development


This article introduces the main knowledge points of high concurrency in web development. Many people have doubts about this topic in their daily work, so the material below has been sorted into simple, practical methods, in the hope of answering those doubts. Please read on.

01 How should we understand high concurrency?

High concurrency means large traffic: you need technical means to withstand the impact of that traffic, shaping it so that the system handles it more smoothly and users get a better experience.

Common high-concurrency scenarios include Taobao's Double 11, ticket grabbing during the Spring Festival travel rush, and trending news on Weibo. Beyond these typical cases, flash-sale systems handling hundreds of thousands of requests per second, order systems processing tens of millions of orders per day, and feed systems with hundreds of millions of daily active users can all be classified as high concurrency.

Clearly, the scenarios above differ widely in scale, so how much concurrency counts as high concurrency?

1. You can't just look at the numbers; you have to look at the specific business scenario. You cannot say that a flash sale at 100,000 QPS is high concurrency while a feed at 10,000 QPS is not. The feed scenario involves complex recommendation models and various manual strategies, and its business logic may be more than 10 times as complex as the flash-sale scenario. They are not in the same dimension, so the comparison is meaningless.

2. Businesses are built from 0 to 1, and concurrency and QPS are only reference indicators. What matters most is this: as business volume grows 10x or 100x, do you use high-concurrency techniques to evolve your system, preventing and solving the resulting problems through architecture design, coding, and even product decisions, rather than blindly upgrading hardware and adding machines for horizontal scaling?

In addition, each high-concurrency scenario has its own business characteristics: there are read-heavy feed scenarios and write-heavy transaction scenarios. Is there a general technical solution that addresses high concurrency across such different scenarios?

I think the big ideas can be borrowed, and other people's solutions can serve as references, but in the actual implementation there will be countless pitfalls in the details. Moreover, since the hardware and software environment, technology stack, and product logic are never exactly the same, the same business scenario with the same technical solution can still run into different problems, and those pits have to be stepped through one by one.

Therefore, in this article I will focus on the fundamentals, general ideas, and effective experience I have put into practice, hoping to give you a deeper understanding of high concurrency.

02 What are the goals of high-concurrency system design?

It only makes sense to first identify the goals of high-concurrency system design, and then discuss design schemes and practical experience on that basis.

2.1 Macro objectives

High concurrency does not mean pursuing high performance alone, which is a one-sided understanding held by many. From a macro point of view, high-concurrency system design has three goals: high performance, high availability, and high scalability.

1. High performance: performance reflects the system's parallel processing capacity. With a fixed hardware budget, improving performance means saving cost. Performance also shapes user experience: a response time of 100 milliseconds feels completely different to a user than one of 1 second.

2. High availability: the proportion of time the system can serve normally. One system runs all year without downtime or failures; another suffers online incidents and outages every few days. Users will certainly choose the former. Moreover, if a system is only 90% available, it is a heavy drag on the business.

3. High scalability: the system's ability to scale, i.e., whether capacity can be expanded on short notice during traffic peaks to absorb them smoothly, as with Double 11 promotions, celebrity-divorce news, and other hot events.

These three goals need to be considered together, because they are interrelated and even influence one another.

For example, with the system's scalability in mind, you will design services to be stateless. This cluster design guarantees high scalability and indirectly improves the system's performance and availability as well.

For another example, to ensure availability, timeouts are usually set on service interfaces to prevent a large number of threads blocked on slow requests from causing a system avalanche. So what is a reasonable timeout? Generally, it is set by reference to the performance of the dependent service.
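As a rough illustration (the dependency name and the numbers here are assumptions for this sketch, not from the article), a caller using Java's built-in HttpClient might size its timeout from the dependency's observed TP99:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class TimeoutExample {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(200)) // fail fast on connect
                .build();
        // Hypothetical dependency whose TP99 is ~100ms: a 300ms request timeout
        // leaves headroom while preventing threads from piling up behind slow calls.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://inventory-service/stock/42"))
                .timeout(Duration.ofMillis(300))
                .build();
        // client.send(request, HttpResponse.BodyHandlers.ofString()) would then
        // throw HttpTimeoutException instead of blocking indefinitely.
    }
}
```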

2.2 Micro objectives

At the micro level, which specific metrics measure high performance, high availability, and high scalability? And why choose these metrics?

Performance metrics

Performance metrics can quantify existing performance problems and serve as the evaluation basis for performance optimization. Generally, interface response time over a period of time is used as the metric.

1. Average response time: the most commonly used, but its defect is obvious: it is insensitive to slow requests. For example, with 10,000 requests of which 9,900 take 1ms and 100 take 100ms, the average response time is 1.99ms. Although the average only rises by 0.99ms, the response time of 1% of requests has increased 100-fold.

2. Quantiles such as TP90 and TP99: sort the response times from smallest to largest; TP90 is the response time at the 90th percentile. The higher the quantile, the more sensitive it is to slow requests.

3. Throughput: inversely proportional to response time; for example, a response time of 1ms corresponds to a throughput of 1,000 requests per second.

In general, performance targets balance throughput and response time, for example: at 10,000 requests per second, keep AVG below 50ms and TP99 below 100ms. For a high-concurrency system, AVG and the TP quantiles must both be considered.

In addition, from the user-experience point of view, 200 milliseconds is considered the first threshold: users do not perceive the delay. 1 second is the second threshold: users perceive the delay but can accept it.

Therefore, for a healthy high-concurrency system, TP99 should be kept within 200ms, and TP999 or TP9999 within 1 second.
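To make the metrics concrete, here is a minimal sketch (nearest-rank percentiles are one common convention, not something mandated by the article) that reuses the 10,000-request example above:

```java
import java.util.Arrays;

public class LatencyStats {
    // Nearest-rank percentile: value at position ceil(p * n) in the sorted samples.
    static long percentile(long[] sortedMillis, double p) {
        int idx = (int) Math.ceil(p * sortedMillis.length) - 1;
        return sortedMillis[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // 9,900 requests at 1ms plus 100 requests at 100ms, as in the AVG example.
        long[] samples = new long[10_000];
        Arrays.fill(samples, 0, 9_900, 1L);
        Arrays.fill(samples, 9_900, 10_000, 100L);
        Arrays.sort(samples);

        double avg = Arrays.stream(samples).average().orElse(0);
        System.out.printf("AVG=%.2fms TP90=%dms TP99=%dms TP999=%dms%n",
                avg, percentile(samples, 0.90),
                percentile(samples, 0.99), percentile(samples, 0.999));
        // Prints: AVG=1.99ms TP90=1ms TP99=1ms TP999=100ms
        // Only the higher quantile (TP999) exposes the slow 1% of requests.
    }
}
```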

Availability metrics

High availability means the system has a strong ability to run without failure: availability = uptime / total running time, and it is generally described in terms of how many 9s the system achieves.

For high-concurrency systems, the most basic requirement is to guarantee 3 9s or 4 9s. The reason is simple: achieving only 2 9s means the system is down 1% of the time. For large companies with over 100 billion in annual GMV or revenue, 1% translates into a business impact on the order of 1 billion.
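A quick sketch of what those 9s translate to in allowed downtime per year (simple arithmetic, not from the article):

```java
public class Nines {
    public static void main(String[] args) {
        double hoursPerYear = 365 * 24.0;
        for (int n = 2; n <= 4; n++) {
            double availability = 1 - Math.pow(10, -n); // e.g. 0.99 for 2 nines
            System.out.printf("%d nines (%.2f%%): %.1f hours of downtime per year%n",
                    n, availability * 100, hoursPerYear * (1 - availability));
        }
        // 2 nines ≈ 87.6 hours, 3 nines ≈ 8.8 hours, 4 nines ≈ 53 minutes
    }
}
```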

Scalability metrics

In the face of sudden traffic, you cannot modify the architecture on the spot; the fastest way is to add machines to linearly increase the system's processing capacity.

For business clusters or basic components, scalability = performance-improvement ratio / machine-addition ratio. Ideally, adding N times the resources yields N times the performance. Generally, scalability should be kept above 70%. For example, if doubling the machines raises throughput to 1.5x, the scalability is 75%.

However, from the perspective of the overall architecture of a high-concurrency system, the goal of scalability is not just to make services stateless: when traffic grows 10x, the business services can quickly scale out 10x, but the database may become the new bottleneck.

Stateful storage services like MySQL are usually the technical hard part of scaling; if the architecture has not been planned in advance (vertical and horizontal splitting), large-scale data migration will be involved.

Therefore, high scalability must take into account: service clusters, middleware such as databases, caches, and message queues, load balancing, bandwidth, third-party dependencies, and so on. Once concurrency reaches a certain order of magnitude, any of these factors may become the bottleneck for scaling.

03 What are the practical solutions for high concurrency?

Having covered the three goals of high-concurrency design, we can now summarize the design schemes systematically, in two parts: first the general design methods, then specific practical solutions for high performance, high availability, and high scalability.

3.1 General design methods

The general design methods proceed along two dimensions, "vertical" and "horizontal", commonly known as the two axes of high-concurrency processing: vertical scaling and horizontal scaling.

Vertical scaling (scale-up)

Its goal is to raise the processing capacity of a single machine. Approaches include:

1. Improve the hardware performance of a single machine: add memory, CPU cores, and storage capacity, or upgrade disks to SSD and otherwise stack up hardware.

2. Improve the software performance of a single machine: use caching to reduce the number of IO operations, and use concurrency or asynchrony to increase throughput.
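A small sketch of the concurrent/asynchronous point (fetchUser and fetchOrders are hypothetical stand-ins for real IO calls):

```java
import java.util.concurrent.CompletableFuture;

public class ParallelFetch {
    static String fetchUser(long id)   { return "user-" + id; }   // e.g. an RPC call
    static String fetchOrders(long id) { return "orders-" + id; } // e.g. a DB query

    public static void main(String[] args) {
        // Two independent IO calls run in parallel instead of back to back,
        // so total latency is roughly max(user, orders) rather than their sum.
        CompletableFuture<String> user   = CompletableFuture.supplyAsync(() -> fetchUser(42));
        CompletableFuture<String> orders = CompletableFuture.supplyAsync(() -> fetchOrders(42));
        String page = user.thenCombine(orders, (u, o) -> u + " / " + o).join();
        System.out.println(page);
    }
}
```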

Horizontal scaling (scale-out)

Because single-machine performance always has a limit, horizontal scaling must eventually be introduced to further raise concurrent processing capacity through cluster deployment. It involves two directions:

1. Build a good layered architecture: this is the precondition for horizontal scaling. High-concurrency systems often carry complex business logic, and layering simplifies complex problems, making horizontal scaling easier.

The most common Internet layered architecture separates the access, web, service, and storage layers. A real high-concurrency architecture refines this further: for example, separating static from dynamic content and introducing a CDN; the reverse-proxy layer can be LVS + Nginx; the web layer can be a unified API gateway; the business service layer can be further split into microservices by vertical business; and the storage layer can use a variety of heterogeneous databases.

2. Scale each layer horizontally: stateless layers scale out directly, while stateful layers use sharded routing. Business clusters can usually be designed to be stateless, whereas databases and caches are typically stateful, so partition keys must be designed to shard the storage properly; read performance can also be improved through master-slave replication and read/write separation.
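As a minimal sketch of sharded routing (the shard key, shard count, and database names are illustrative assumptions):

```java
public class ShardRouter {
    private static final int SHARDS = 4;

    // Route a record to a physical database by its partition key.
    static String databaseFor(long userId) {
        // Math.floorMod keeps the shard index non-negative for any key.
        return "order_db_" + Math.floorMod(userId, SHARDS);
    }

    public static void main(String[] args) {
        System.out.println(databaseFor(10_007L)); // -> order_db_3
    }
}
```

Note that simple modulo routing makes adding shards expensive (most keys move), which is why consistent hashing or pre-split virtual shards are common refinements.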

3.2 Specific practical solutions

The following, drawing on my personal experience, summarizes concrete, battle-tested solutions for high performance, high availability, and high scalability.

High-performance practical solutions

1. Cluster deployment to reduce the pressure on a single machine through load balancing.

2. Multi-level caching, including CDN for static data, local caches, and distributed caches, along with handling hot keys, cache penetration, cache stampedes, data consistency, and other issues specific to cache scenarios.

3. Shard databases and tables and optimize indexes, and use a search engine to handle complex queries.

4. Consider the use of NoSQL databases, such as HBase, TiDB, etc., but the team must be familiar with these components and have strong operation and maintenance capabilities.

5. Asynchronization: handle secondary flows asynchronously through multithreading, MQ, or even delayed tasks.

6. Rate limiting: first consider whether the business tolerates it (flash-sale scenarios do), including frontend rate limiting, Nginx access-layer rate limiting, and server-side rate limiting.

7. Peak shaving and valley filling: absorb traffic bursts through MQ.

8. Concurrent processing, parallelizing serial logic through multithreading.

9. Pre-computation: in red-envelope scenarios, for example, the amounts can be computed in advance and cached, then used directly when the envelopes are handed out.

10. Cache warm-up, prefetch data to local cache or distributed cache in advance through asynchronous tasks.

11. Reduce the number of IO operations: batch reads and writes for the database and cache, batch interfaces in RPC, or eliminate RPC calls entirely through data redundancy.

12. Reduce the packet size of each IO, including adopting a lightweight communication protocol and suitable data structures, removing redundant fields from interfaces, shrinking cache keys, compressing cache values, and so on.

13. Optimize program logic: move checks that are likely to short-circuit the execution flow to the front, optimize the computation inside for-loops, or choose more efficient algorithms.

14. Use the various pooling techniques and size the pools properly, including HTTP connection pools, thread pools (set the core parameters according to whether the workload is CPU-intensive or IO-intensive; see the sizing sketch after this list), and database and Redis connection pools.

15. JVM tuning, including the sizes of the young and old generations and the choice of GC algorithm, to minimize GC frequency and pause time.

16. Lock selection: use optimistic locking in read-heavy, write-light scenarios, or consider segmented locks to reduce lock contention.
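For item 14, a sketch of thread-pool sizing by workload type (the formulas are the usual rules of thumb, and the queue sizes and wait-to-compute ratio are assumptions for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class Pools {
    static final int CORES = Runtime.getRuntime().availableProcessors();

    public static void main(String[] args) {
        // CPU-intensive: about one thread per core; a bounded queue plus
        // AbortPolicy rejects excess work instead of buffering it forever.
        ExecutorService cpuPool = new ThreadPoolExecutor(
                CORES, CORES, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(1_000),
                new ThreadPoolExecutor.AbortPolicy());

        // IO-intensive: roughly cores * (1 + wait/compute); here a ~9:1
        // wait-to-compute ratio is assumed.
        int ioThreads = CORES * (1 + 9);
        ExecutorService ioPool = new ThreadPoolExecutor(
                ioThreads, ioThreads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(10_000),
                new ThreadPoolExecutor.CallerRunsPolicy());

        cpuPool.shutdown();
        ioPool.shutdown();
    }
}
```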

The solutions above cover possible optimization points along the two dimensions of computation and IO. You also need a supporting monitoring system to see current performance in real time and locate bottlenecks, then apply the 80/20 rule and attack the principal contradiction first.

Highly available practical solutions

1. Failover between peer nodes. Both Nginx and service-governance frameworks support routing to another node after one fails.

2. Failover between non-peer nodes: master/slave switchover driven by heartbeat detection (such as Redis's sentinel or cluster mode, MySQL master-slave switchover, and so on).

3. Timeout setting, retry strategy and idempotent design at the interface level.

4. Degradation handling: protect core services by sacrificing non-core ones, with circuit breaking if necessary; and if a core link fails, have an alternative link to fall back on.

5. Rate limiting: directly reject requests that exceed the system's processing capacity, or return error codes (see the token-bucket sketch after this list).

6. Guarantee message reliability in MQ scenarios, including the producer-side retry mechanism, broker-side persistence, the consumer-side ack mechanism, and so on.

7. Grayscale release: deploy to a small slice of traffic by machine, observe system logs and business metrics, and roll out fully once it runs smoothly.

8. Monitoring and alerting: an all-round monitoring system covering the basics (CPU, memory, disk, network) as well as web servers, the JVM, databases, all kinds of middleware, and business metrics.

9. Disaster drills: similar to today's chaos engineering, apply destructive measures to the system and observe whether local failures cause availability problems.
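For item 5, a minimal single-machine token-bucket sketch (the capacity and refill rate are illustrative; in production, Nginx's limit_req, Guava's RateLimiter, or a Redis-based limiter are the usual choices):

```java
public class TokenBucket {
    private final long capacity;
    private final long refillPerSecond;
    private double tokens;
    private long lastRefillNanos = System.nanoTime();

    public TokenBucket(long capacity, long refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;
    }

    // Returns false when the caller should reject the request or return an error code.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity,
                tokens + (now - lastRefillNanos) / 1e9 * refillPerSecond);
        lastRefillNanos = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket limiter = new TokenBucket(100, 50); // burst of 100, steady 50 req/s
        System.out.println(limiter.tryAcquire());       // true while tokens remain
    }
}
```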

Highly available solutions are mainly considered from three angles: redundancy, trade-offs, and system operations. They also require a supporting on-call rotation and an incident-handling process so that online problems are followed up in time.

Highly scalable practical solutions

1. A reasonable layered architecture: for example, the common Internet layered architecture above, or finer-grained layering of microservices into a data-access layer and a business-logic layer (though the performance cost must be evaluated, as it adds one more network hop).

2. Split the storage layer: split vertically by business dimension, and further split horizontally by data-characteristic dimension (sharding databases and tables).

3. Split the business layer: the most common splits are by business dimension (such as commodity services and order services in e-commerce), by core versus non-core interfaces, and by request source (such as To C and To B, App and H5).
