
How to understand high concurrency in web development


This article explains how to understand high concurrency in web development. The ideas are laid out step by step and are meant to be simple, clear, and easy to follow.

How to understand high concurrency?

High concurrency means large traffic, and technical means are needed to absorb the impact of that traffic. These means shape the traffic so that it is processed more smoothly by the system and gives users a better experience.

Common high-concurrency scenarios include Taobao's Double 11, the Spring Festival ticket rush, and breaking news around Weibo celebrities with huge followings.

Beyond these typical cases, a flash-sale (seckill) system handling hundreds of thousands of requests per second, an order system processing tens of millions of orders per day, and an information-feed system with hundreds of millions of daily active users can all be classified as high concurrency.

Obviously, these scenarios involve very different levels of concurrency, so how much concurrency counts as high concurrency?

① You cannot just look at the numbers; you have to look at the specific business scenario. It cannot be said that a flash sale at 100,000 QPS is high concurrency while an information feed at 10,000 QPS is not.

The information-feed scenario involves complex recommendation models and various manual strategies, and its business logic may be more than ten times as complex as that of a flash sale. The two are not in the same dimension, so the comparison is meaningless.

② A business grows from 0 to 1; concurrency and QPS are only reference indicators. What matters most is this: as business volume gradually grows to 10x or 100x its original size, do you use high-concurrency design methods to evolve your system?

That is, do you prevent and solve the problems brought by high concurrency at the levels of architecture design, coding, and even product design, instead of blindly upgrading hardware and adding machines for horizontal scaling?

In addition, the business characteristics of each high-concurrency scenario are completely different: some are read-heavy information feeds, others are transaction scenarios with heavy reads and heavy writes. Is there a common technical solution that handles high concurrency across these different scenarios?

I think the big ideas and other people's solutions can be used as references, but in the real implementation process there will be countless pitfalls in the details.

Moreover, because the software and hardware environment, technology stack, and product logic are never exactly the same, even the same business scenario using the same technical solution will run into different problems, and these pits have to be stepped through one by one.

Therefore, in this article I will focus on the basics, common ideas, and effective experiences I have practiced, hoping to give you a deeper understanding of high concurrency.

What is the goal of high concurrency system design?

It is only meaningful to discuss design schemes and practical experience after the goals of high-concurrency system design have been made clear.

Macro goals

High concurrency does not mean only pursuing high performance; that is a one-sided understanding held by many people. From a macro perspective, there are three goals for high-concurrency system design: high performance, high availability, and high scalability.

① High performance: Performance reflects the parallel processing capability of the system. Under limited hardware investment, improving performance means saving costs.

Performance also reflects user experience: a response time of 100 milliseconds feels completely different to the user than a response time of 1 second.

② High availability: indicates how much of the time the system can provide normal service. One system runs all year round without failures; another has incidents and downtime every few days. Users will certainly choose the former. Also, if a system is only 90% available, it will drag the business down significantly.

③ High scalability: indicates the system's ability to expand, i.e., whether capacity can be added quickly during traffic peaks so the spike is absorbed more smoothly, for example during the Double 11 shopping festival or hot events such as a celebrity divorce.

These three objectives need to be considered as a whole because they are interrelated and even affect each other.

For example, considering the scalability of the system, you might design services to be stateless. This clustering design ensures high scalability, which indirectly improves system performance and availability.

Another example: to ensure availability, timeouts are usually set on service interfaces so that a large number of threads blocked on slow requests do not cause a system avalanche. How should the timeout be set reasonably? In general, we refer to the performance of the dependent services.

Micro goals

From a micro perspective, what are the specific indicators for high performance, high availability, and high scalability? And why were these indicators chosen?

Performance indicators: performance indicators can be used to measure existing performance problems and serve as the basis for evaluating performance optimizations. In general, interface response time over a period of time is used as the metric.

① Average response time: the most commonly used, but its defect is obvious: it is insensitive to slow requests. For example, with 10,000 requests, of which 9,900 take 1 ms and 100 take 100 ms, the average response time is 1.99 ms. Although the average has only increased by 0.99 ms, the response time of 1% of requests has increased 100-fold.

② TP90, TP99 quantile values: response times are sorted from small to large; TP90 is the response time at the 90th percentile. The higher the percentile, the more sensitive the metric is to slow requests.

③ Throughput: inversely proportional to response time; for example, for a single thread of execution, a response time of 1 ms corresponds to a throughput of 1,000 requests per second.

Typically, performance targets combine throughput and response time, for example: at 10,000 requests per second, keep AVG below 50 ms and TP99 below 100 ms. For a high-concurrency system, AVG and TP percentiles must be considered together.
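As a minimal illustration of how these metrics relate, the following Java sketch (class name and sample data are made up, reusing the 9,900 × 1 ms plus 100 × 100 ms example above) computes the average and a TP percentile from recorded latencies:

import java.util.Arrays;

public class LatencyStats {
    // Value at the given percentile (e.g. 0.99 for TP99) from latencies in milliseconds.
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);                                  // sort from small to large
        int idx = (int) Math.ceil(p * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        long[] latencies = new long[10_000];
        Arrays.fill(latencies, 0, 9_900, 1L);                 // 9,900 requests at 1 ms
        Arrays.fill(latencies, 9_900, 10_000, 100L);          // 100 requests at 100 ms

        double avg = Arrays.stream(latencies).average().orElse(0);
        System.out.printf("AVG = %.2f ms, TP99 = %d ms, TP999 = %d ms%n",
                avg, percentile(latencies, 0.99), percentile(latencies, 0.999));
        // Prints AVG = 1.99 ms, TP99 = 1 ms, TP999 = 100 ms: the higher percentile
        // is the one that exposes the slow 1% of requests.
    }
}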

Also, from a user experience perspective, 200 milliseconds is considered the first cut-off point where the user does not feel the delay, and 1 second is the second cut-off point where the user can feel the delay, but can accept it.

Therefore, for a healthy high-concurrency system, TP99 should be controlled within 200 milliseconds and TP999 or TP9999 should be controlled within 1 second.

Availability index: high availability means the system has a strong ability to run without failure. Availability is the proportion of total running time during which the system provides normal service (commonly computed as MTBF / (MTBF + MTTR)), and it is generally described by a number of nines.

For a high-concurrency system, the most basic requirement is to guarantee three or four nines. The reason is simple: if you can only achieve two nines, that means 1% downtime; for a large company with more than 100 billion in annual GMV or revenue, 1% corresponds to a business impact on the order of 1 billion.
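To make the nines concrete, a year has about 8,760 hours, so two nines (99%) allow roughly 87.6 hours (about 3.65 days) of downtime per year, three nines (99.9%) about 8.76 hours, and four nines (99.99%) only about 52.6 minutes.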

Scalability metrics: in the face of sudden traffic, you cannot change the architecture on the spot; the fastest way is to add machines so that the system's processing capacity grows linearly.

For business clusters or basic components, scalability = performance improvement ratio / machine increase ratio. The ideal is that when resources increase several times, performance increases by the same factor. Generally speaking, scalability should be kept above 70%.
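A purely illustrative calculation: if a cluster grows from 10 machines to 20 (a 2x increase) and measured throughput rises from 10,000 QPS to 15,000 QPS (a 1.5x increase), then scalability = 1.5 / 2 = 75%, which just clears the 70% bar.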

However, from the perspective of the overall architecture of a high-concurrency system, the goal of scalability is not just making services stateless: when traffic increases 10-fold, the business services can indeed be scaled out 10-fold quickly, but the database may then become the new bottleneck.

Stateful storage services such as MySQL are often technical difficulties in scaling, and if the architecture is not well planned in advance (vertical and horizontal splitting), it will involve a large amount of data migration.

Therefore, high scalability needs to consider: service cluster, database, middleware such as cache and message queue, Load Balancer, bandwidth, dependent third party, etc. When concurrency reaches a certain magnitude, each of the above factors may become a bottleneck point for expansion.

What are the high concurrency practices?

After understanding the three goals of high-concurrency design, we can systematically summarize the design approaches. This will be expanded in two parts: first the general design methods, then concrete practical solutions for high performance, high availability, and high scalability.

Universal design methodology

The common design method is mainly from the "vertical" and "horizontal" two dimensions, commonly known as the two axes of high concurrency processing: vertical expansion and horizontal expansion.

Scale-up: Its goal is to increase the processing power of a single machine.

There are two options:

Improve the hardware performance of a single machine: add memory and CPU cores, expand storage capacity, or upgrade disks to SSD, i.e., "stacking hardware".

Improve the software performance of a single machine: use caching to reduce the number of IO operations, and use concurrency or asynchrony to increase throughput.

Scale-out: Because there will always be limits to stand-alone performance, it will eventually be necessary to introduce scale-out to further increase concurrency through cluster deployment.

It includes the following two directions:

① Do a good job of layered architecture: this is the prerequisite for horizontal scaling. High-concurrency systems tend to be complex; layering simplifies a complex problem and makes horizontal scaling easier.

The classic layered Internet architecture (reverse proxy layer, Web layer, business service layer, and storage layer) is the most common starting point, and of course a real high-concurrency architecture will be further refined on this basis.

For example, static and dynamic content will be separated and a CDN introduced; the reverse proxy layer can be LVS + Nginx, the Web layer can be a unified API gateway, the business service layer can be further split into microservices by vertical business, and the storage layer can use various heterogeneous databases.

② Horizontal scaling of each layer: stateless layers scale by simply adding nodes; stateful layers need shard routing. Business clusters can usually be designed to be stateless, while databases and caches are usually stateful, so a partition key must be designed to shard the storage properly. Of course, read performance can also be improved through master-slave replication and read-write separation.
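As a minimal sketch of shard routing for stateful storage (the shard count and key below are assumptions for illustration), the idea is simply to derive a stable shard index from the partition key:

public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        this.shardCount = shardCount;              // e.g. 8 database shards
    }

    // Map a partition key (e.g. a userId) to a shard index in [0, shardCount).
    public int route(String partitionKey) {
        int h = partitionKey.hashCode();
        return Math.floorMod(h, shardCount);       // floorMod avoids negative indexes
    }
}

// Usage: all data for the same key always lands on the same shard, so reads and
// writes can be routed without any shared state, e.g.
// int shard = new ShardRouter(8).route("user-10086");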

Specific practical solutions

Below is a summary, combined with my personal experience, of practical solutions that can actually be implemented for high performance, high availability, and high scalability.

High performance practical solutions:

Cluster deployment, reducing the pressure on individual machines through load balancing.

Multi-level caching, including CDN, local caches, and distributed caches for static or hot data, while handling hot keys, cache penetration, cache concurrency, data consistency, and other issues that arise in caching scenarios.
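A minimal sketch of a multi-level lookup, going local cache first, then distributed cache, then the database (RemoteCache and loadFromDb are hypothetical stand-ins, not a specific client library):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MultiLevelCache {
    interface RemoteCache {                        // stand-in for a Redis/Memcached client
        String get(String key);
        void set(String key, String value, int ttlSeconds);
    }

    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final RemoteCache remoteCache;

    public MultiLevelCache(RemoteCache remoteCache) {
        this.remoteCache = remoteCache;
    }

    public String get(String key) {
        String value = localCache.get(key);        // L1: in-process cache
        if (value != null) return value;

        value = remoteCache.get(key);              // L2: distributed cache
        if (value == null) {
            value = loadFromDb(key);               // L3: source of truth
            // Cache an empty marker as well, to limit cache penetration on missing keys.
            remoteCache.set(key, value == null ? "" : value, 60);
        }
        if (value != null && !value.isEmpty()) localCache.put(key, value);
        return value;
    }

    private String loadFromDb(String key) { return null; /* query the database here */ }
}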

Database and table sharding and index optimization, as well as solving complex query problems with the help of search engines.

Consider the use of NoSQL databases, such as HBase, TiDB, etc., but the team must be familiar with these components and have strong operational capabilities.

Asynchronization: handle secondary flows asynchronously through multithreading, MQ, or even delayed tasks.
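For instance, a secondary step such as sending a notification can be pushed off the main flow; a minimal thread-pool sketch (sendNotification is a hypothetical secondary step):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OrderService {
    private final ExecutorService asyncPool = Executors.newFixedThreadPool(4);

    public void placeOrder(String orderId) {
        saveOrder(orderId);                                   // core flow: synchronous
        asyncPool.submit(() -> sendNotification(orderId));    // secondary flow: asynchronous
        // In production an MQ is usually preferred here, so the task survives a restart.
    }

    private void saveOrder(String orderId) { /* write the order */ }
    private void sendNotification(String orderId) { /* notify the user */ }
}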

Traffic limiting: first consider whether the business allows limiting (for example, flash-sale scenarios do), then apply it at multiple levels: front-end limiting, limiting at the Nginx access layer, and limiting on the server side.
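A minimal token-bucket sketch of server-side limiting (capacity and refill rate are illustrative; a production system would usually rely on a well-tested component rather than hand-rolled code):

public class TokenBucketLimiter {
    private final long capacity;            // maximum burst size
    private final double refillPerNano;     // tokens added per nanosecond
    private double tokens;
    private long lastRefill = System.nanoTime();

    public TokenBucketLimiter(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;                     // admit the request
            return true;
        }
        return false;                        // reject or return an error code
    }
}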

Peak clipping and valley filling: absorb traffic spikes through MQ and consume them at a steady pace.

Concurrent processing: parallelize serial logic through multithreading.
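A minimal sketch of turning two independent serial calls into parallel ones with CompletableFuture (fetchUser and fetchOrders are hypothetical remote calls):

import java.util.concurrent.CompletableFuture;

public class ParallelCalls {
    public String buildPage(String userId) {
        CompletableFuture<String> userF   = CompletableFuture.supplyAsync(() -> fetchUser(userId));
        CompletableFuture<String> ordersF = CompletableFuture.supplyAsync(() -> fetchOrders(userId));
        // Total latency is roughly max(fetchUser, fetchOrders) instead of their sum.
        return userF.join() + ordersF.join();
    }

    private String fetchUser(String userId)   { return "user";   }
    private String fetchOrders(String userId) { return "orders"; }
}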

Pre-computation: in a red-envelope grabbing scenario, for example, the envelope amounts can be computed in advance and cached, then used directly when the envelopes are handed out.

Cache warm-up: pre-load data into local or distributed caches ahead of time through asynchronous tasks.

Reduce the number of IO operations, for example through batch reads and writes to databases and caches, batch RPC interfaces, or eliminating RPC calls by storing redundant data.
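Batching is mostly about replacing N round trips with one; a tiny sketch (ProductClient and batchGet are a hypothetical RPC client and bulk interface):

import java.util.List;
import java.util.Map;

public class BatchReadExample {
    interface ProductClient {                              // hypothetical RPC client
        String get(String id);                             // one round trip per id
        Map<String, String> batchGet(List<String> ids);    // one round trip for many ids
    }

    static Map<String, String> load(ProductClient client, List<String> ids) {
        // N calls: ids.forEach(id -> client.get(id));     -> N network round trips
        return client.batchGet(ids);                       // 1 round trip instead of N
    }
}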

Reduce the packet size during IO, including adopting lightweight communication protocols and appropriate data structures, removing redundant fields from interfaces, reducing the size of cache keys, compressing cache values, and so on.

Program logic optimization, such as moving forward the checks that are most likely to short-circuit the execution flow, optimizing the computation inside for loops, or adopting more efficient algorithms.

Use of various pooling techniques and pool size settings, including HTTP connection pools, thread pools (set the core parameters according to whether the work is CPU-intensive or IO-intensive), and database and Redis connection pools.
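A minimal sketch of creating a bounded thread pool, with the usual sizing heuristics noted in comments (the numbers are illustrative, not prescriptions):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class Pools {
    static ThreadPoolExecutor newWorkerPool() {
        int cores = Runtime.getRuntime().availableProcessors();
        // Common heuristics: CPU-intensive work -> about cores (or cores + 1);
        // IO-intensive work -> roughly cores * (1 + waitTime / computeTime).
        int coreSize = cores;
        int maxSize  = cores * 2;
        return new ThreadPoolExecutor(
                coreSize, maxSize,
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1_000),                // bounded queue to fail fast
                new ThreadPoolExecutor.CallerRunsPolicy());     // back-pressure when saturated
    }
}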

JVM optimization, including the sizes of the young and old generations and the choice of GC algorithm, to minimize GC frequency and pause time.

Lock selection: use optimistic locking in read-heavy, write-light scenarios, or consider segmented locks to reduce lock contention.
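For a read-heavy counter, optimistic locking typically means a compare-and-swap retry loop or a version-guarded update; a minimal sketch (table and column names in the comment are hypothetical):

import java.util.concurrent.atomic.AtomicLong;

public class OptimisticCounter {
    private final AtomicLong stock = new AtomicLong(100);

    // Optimistic retry loop: no lock is held; on conflict we simply retry.
    public boolean deductOne() {
        while (true) {
            long current = stock.get();
            if (current <= 0) return false;                           // sold out
            if (stock.compareAndSet(current, current - 1)) return true;
        }
    }
    // The database equivalent is a version-guarded update, e.g.:
    // UPDATE stock SET count = count - 1, version = version + 1
    //   WHERE id = ? AND version = ? AND count > 0
}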

The above solutions simply cover the possible optimization points along the two dimensions of computation and IO. You also need a supporting monitoring system to understand current performance in real time and to analyze performance bottlenecks, and then follow the 80/20 rule and optimize the main bottleneck first.

High availability practices:

Failover of peer nodes: both Nginx and the service governance framework support switching to another node after one node fails.

Failover of non-peer nodes: use heartbeat detection to implement active/standby switching (such as Redis Sentinel or Cluster mode, MySQL master-slave switching, etc.).

Timeouts, retry policies, and idempotent design at the interface level.
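Retries only stay safe when the interface is idempotent; a minimal sketch that deduplicates by request id (the in-memory set is a stand-in for something like a Redis key or a unique database index):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentHandler {
    // In production this would be backed by Redis or a unique index, not process memory.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    public String handle(String requestId, Runnable businessLogic) {
        if (!processed.add(requestId)) {
            return "DUPLICATE";          // a retried request: do not execute again
        }
        businessLogic.run();             // execute the business logic exactly once
        return "OK";
    }
}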

Degradation: protect core services and sacrifice non-core ones, applying circuit breakers when necessary; when a core link fails, there should be an alternative link to switch to.

Rate limiting: directly reject requests that exceed the system's processing capacity, or return an error code.

Message reliability assurance in MQ scenarios, including retry mechanism at Producer side, persistence at Broker side, Ack mechanism at Consumer side, etc.

Grayscale release: support deploying to a small portion of machines first, observe system logs and business metrics, and roll out fully once it runs stably.

Monitoring alarm: comprehensive monitoring system, including the most basic CPU, memory, disk, network monitoring, as well as Web server, JVM, database, various middleware monitoring and business indicators monitoring.

Disaster recovery drills: similar to today's chaos engineering, apply destructive measures to the system and see whether a local failure causes availability problems.

High availability solutions are mainly considered from three directions: redundancy, trade-offs, and system operations. There also needs to be a supporting on-call mechanism and fault-handling process so that online problems can be followed up and handled in time.

Highly scalable practical solutions:

Reasonable layered architecture: for example, the common layered Internet architecture mentioned above; in addition, microservices can be further layered into a data access layer and a business logic layer (but the performance impact must be evaluated, since there will be one more network hop).

Storage layer splitting: split vertically by business dimension, and further split horizontally by data characteristics (database and table sharding).

Business layer splitting: most commonly split by business dimension (such as commodity services and order services in e-commerce), or by core versus non-core interfaces, or by request source (such as To C and To B, App and H5).

Thank you for reading. That concludes "how to understand high concurrency in web development"; I believe it has given you a deeper understanding of the topic, and the specific approaches still need to be verified in practice.
